1
|
Han MV, Thomas GWC, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol 2013; 30:1987-97. [PMID: 23709260 DOI: 10.1093/molbev/mst100] [Citation(s) in RCA: 582] [Impact Index Per Article: 48.5] [Indexed: 01/11/2023] Open
Abstract
Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.
|
Research Support, Non-U.S. Gov't |
12 |
582 |
2
|
Chen S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. IMETA 2023; 2:e107. [PMID: 38868435 PMCID: PMC10989850 DOI: 10.1002/imt2.107] [Citation(s) in RCA: 452] [Impact Index Per Article: 226.0] [Received: 02/28/2023] [Revised: 04/11/2023] [Accepted: 04/11/2023] [Indexed: 06/14/2024]
Abstract
A large amount of sequencing data is generated and processed every day with the continuous evolution of sequencing technology and the expansion of sequencing applications. One consequence of this sequencing data explosion is the increasing cost and complexity of data processing. The preprocessing of FASTQ data, which means removing adapter contamination, filtering low-quality reads, and correcting wrongly represented bases, is an indispensable but resource-intensive part of sequencing data analysis. Therefore, although many software applications have been developed to solve this problem, bioinformatics scientists and engineers are still pursuing faster, simpler, and more energy-efficient software. Several years ago, the author developed fastp, which is an ultrafast all-in-one FASTQ data preprocessor with many modern features. This software has been approved by many bioinformatics users and has been continuously maintained and updated. Since the first publication on fastp, it has been greatly improved, making it even faster and more powerful. For instance, the duplication evaluation module has been improved, and a new deduplication module has been added. This study aimed to introduce the new features of fastp and demonstrate how it was designed and implemented.
|
brief-report |
2 |
452 |
3
|
Lang D, Ullrich KK, Murat F, Fuchs J, Jenkins J, Haas FB, Piednoel M, Gundlach H, Van Bel M, Meyberg R, Vives C, Morata J, Symeonidi A, Hiss M, Muchero W, Kamisugi Y, Saleh O, Blanc G, Decker EL, van Gessel N, Grimwood J, Hayes RD, Graham SW, Gunter LE, McDaniel SF, Hoernstein SNW, Larsson A, Li FW, Perroud PF, Phillips J, Ranjan P, Rokshar DS, Rothfels CJ, Schneider L, Shu S, Stevenson DW, Thümmler F, Tillich M, Villarreal Aguilar JC, Widiez T, Wong GKS, Wymore A, Zhang Y, Zimmer AD, Quatrano RS, Mayer KFX, Goodstein D, Casacuberta JM, Vandepoele K, Reski R, Cuming AC, Tuskan GA, Maumus F, Salse J, Schmutz J, Rensing SA. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2018; 93:515-533. [PMID: 29237241 DOI: 10.1111/tpj.13801] [Citation(s) in RCA: 292] [Impact Index Per Article: 41.7] [Received: 10/11/2017] [Revised: 11/20/2017] [Accepted: 11/24/2017] [Indexed: 05/18/2023]
Abstract
The draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. To enable analyses of genome structure and evolution, we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene- and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA-mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
|
|
7 |
292 |
4
|
Ingason A, Rujescu D, Cichon S, Sigurdsson E, Sigmundsson T, Pietiläinen OPH, Buizer-Voskamp JE, Strengman E, Francks C, Muglia P, Gylfason A, Gustafsson O, Olason PI, Steinberg S, Hansen T, Jakobsen KD, Rasmussen HB, Giegling I, Möller HJ, Hartmann A, Crombie C, Fraser G, Walker N, Lonnqvist J, Suvisaari J, Tuulio-Henriksson A, Bramon E, Kiemeney LA, Franke B, Murray R, Vassos E, Toulopoulou T, Mühleisen TW, Tosato S, Ruggeri M, Djurovic S, Andreassen OA, Zhang Z, Werge T, Ophoff RA, GROUP Investigators, Rietschel M, Nöthen MM, Petursson H, Stefansson H, Peltonen L, Collier D, Stefansson K, St Clair DM. Copy number variations of chromosome 16p13.1 region associated with schizophrenia. Mol Psychiatry 2011; 16:17-25. [PMID: 19786961 PMCID: PMC3330746 DOI: 10.1038/mp.2009.101] [Citation(s) in RCA: 192] [Impact Index Per Article: 13.7] [Received: 06/11/2009] [Revised: 08/18/2009] [Accepted: 08/21/2009] [Indexed: 01/22/2023]
Abstract
Deletions and reciprocal duplications of the chromosome 16p13.1 region have recently been reported in several cases of autism and mental retardation (MR). As genomic copy number variants found in these two disorders may also associate with schizophrenia, we examined 4345 schizophrenia patients and 35,079 controls from 8 European populations for duplications and deletions at the 16p13.1 locus, using microarray data. We found a threefold excess of duplications and deletions in schizophrenia cases compared with controls, with duplications present in 0.30% of cases versus 0.09% of controls (P=0.007) and deletions in 0.12% of cases and 0.04% of controls (P>0.05). The region can be divided into three intervals defined by flanking low copy repeats. Duplications spanning intervals I and II showed the most significant (P=0.00010) association with schizophrenia. The age of onset in duplication and deletion carriers among cases ranged from 12 to 35 years, and the majority were males with a family history of psychiatric disorders. In a single Icelandic family, a duplication spanning intervals I and II was present in two cases of schizophrenia, and individual cases of alcoholism, attention deficit hyperactivity disorder and dyslexia. Candidate genes in the region include NTAN1 and NDE1. We conclude that duplications and perhaps also deletions of chromosome 16p13.1, previously reported to be associated with autism and MR, also confer risk of schizophrenia.
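The "threefold excess" stated in this abstract can be checked against the carrier frequencies it reports. The snippet below is only a reader's arithmetic sketch: the variable names are illustrative, and the inputs are the rounded percentages from the abstract, not the underlying counts.

```python
# Carrier frequencies (percent) as quoted in the abstract; names are illustrative.
dup_cases_pct, dup_controls_pct = 0.30, 0.09
del_cases_pct, del_controls_pct = 0.12, 0.04

# Fold excess = frequency in cases divided by frequency in controls.
dup_fold = dup_cases_pct / dup_controls_pct
del_fold = del_cases_pct / del_controls_pct

print(f"duplications: {dup_fold:.1f}-fold, deletions: {del_fold:.1f}-fold")
```

Both ratios come out close to three, which is consistent with the abstract's summary statement.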
|
Multicenter Study |
14 |
192 |
5
|
Cho NH, Kim HR, Lee JH, Kim SY, Kim J, Cha S, Kim SY, Darby AC, Fuxelius HH, Yin J, Kim JH, Kim J, Lee SJ, Koh YS, Jang WJ, Park KH, Andersson SGE, Choi MS, Kim IS. The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes. Proc Natl Acad Sci U S A 2007; 104:7981-6. [PMID: 17483455 PMCID: PMC1876558 DOI: 10.1073/pnas.0611553104] [Citation(s) in RCA: 184] [Impact Index Per Article: 10.2] [Received: 12/26/2006] [Indexed: 11/18/2022] Open
Abstract
Scrub typhus is caused by the obligate intracellular rickettsia Orientia tsutsugamushi (previously called Rickettsia tsutsugamushi). The bacterium is maternally inherited in trombiculid mites and transmitted to humans by feeding larvae. We report here the 2,127,051-bp genome of the Boryong strain, which represents the most highly repeated bacterial genome sequenced to date. The repeat density of the scrub typhus pathogen is 200-fold higher than that of its close relative Rickettsia prowazekii, the agent of epidemic typhus. A total of 359 tra genes for components of conjugative type IV secretion systems were identified at 79 sites in the genome. Associated with these are >200 genes for signaling and host-cell interaction proteins, such as histidine kinases, ankyrin-repeat proteins, and tetratricopeptide-repeat proteins. Additionally, the O. tsutsugamushi genome contains >400 transposases, 60 phage integrases, and 70 reverse transcriptases. Deletions and rearrangements have yielded unique gene combinations as well as frequent pseudogenization in the tra clusters. A comparative analysis of the tra clusters within the genome and across strains indicates sequence homogenization by gene conversion, whereas complexity, diversity, and pseudogenization are acquired by duplications, deletions, and transposon integrations into the amplified segments. The results suggest intragenomic duplications or multiple integrations of a massively proliferating conjugative transfer system. Diversifying selection on host-cell interaction genes along with repeated population bottlenecks may drive rare genome variants to fixation, thereby short-circuiting selection for low complexity in bacterial genomes.
|
research-article |
18 |
184 |
6
|
Adams IR, Kilmartin JV. Localization of core spindle pole body (SPB) components during SPB duplication in Saccharomyces cerevisiae. J Cell Biol 1999; 145:809-23. [PMID: 10330408 PMCID: PMC2133189 DOI: 10.1083/jcb.145.4.809] [Citation(s) in RCA: 163] [Impact Index Per Article: 6.3] [Received: 01/19/1999] [Revised: 03/26/1999] [Indexed: 11/22/2022] Open
Abstract
We have examined the process of spindle pole body (SPB) duplication in Saccharomyces cerevisiae by electron microscopy and found several stages. These include the assembly, probably from the satellite, of a large plaque-like structure, the duplication plaque, on the cytoplasmic face of the half-bridge and its insertion into the nuclear envelope. We analyzed the role of the main SPB components in the formation of these structures by identifying them from an SPB core fraction by mass spectrometry. Temperature-sensitive mutants for two of the components, Spc29p and Nud1p, were prepared to partly define their function. The composition of two of the intermediates in SPB duplication, the satellite and the duplication plaque, was examined by immunoelectron microscopy. Both contain cytoplasmic SPB components showing that duplication has already been partly achieved by the end of the preceding cell cycle when the satellite is formed. We show that by overexpression of SPB components the structure of the satellite can be changed and SPB duplication inhibited by disrupting the attachment of the plaque-like intermediate to the half-bridge. We present a model for SPB duplication where binding of SPB components to either end of the bridge structure ensures two separate SPBs.
|
research-article |
26 |
163 |
7
|
Chen W, Lee MK, Jefcoate C, Kim SC, Chen F, Yu JH. Fungal cytochrome p450 monooxygenases: their distribution, structure, functions, family expansion, and evolutionary origin. Genome Biol Evol 2014; 6:1620-34. [PMID: 24966179 PMCID: PMC4122930 DOI: 10.1093/gbe/evu132] [Citation(s) in RCA: 162] [Impact Index Per Article: 14.7] [Indexed: 01/12/2023] Open
Abstract
The cytochrome P450 (CYP) monooxygenase superfamily contributes a broad array of biological functions in living organisms. In fungi, CYPs play diverse and pivotal roles in versatile metabolism and fungal adaptation to specific ecological niches. In this report, CYPomes in the 47 genomes of fungi belonging to the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota have been studied. The comparison of fungal CYPomes suggests that generally fungi possess abundant CYPs belonging to a variety of families with the two global families CYP51 and CYP61, indicating individuation of CYPomes during the evolution of fungi. Fungal CYPs show highly conserved characteristic motifs, but very low overall sequence similarities. The characteristic motifs of fungal CYPs are distinguishable from those of CYPs in animals, plants, and especially archaea and bacteria. The four representative motifs contribute to the general function of CYPs. Fungal CYP51s and CYP61s can be used as models for the analysis of substrate recognition sites. The CYP proteins are clustered into 15 clades and the phylogenetic analyses suggest that the wide variety of fungal CYPs has mainly arisen from gene duplication. Two large duplication events might have been associated with the booming of Ascomycota and Basidiomycota. In addition, horizontal gene transfer also contributes to the diversification of fungal CYPs. Finally, a possible evolutionary scenario for fungal CYPs along with fungal divergences is proposed. Our results provide the fundamental information for a better understanding of CYP distribution, structure and function, and new insights into the evolutionary events of fungal CYPs along with the evolution of fungi.
|
Research Support, Non-U.S. Gov't |
11 |
162 |
8
|
Abstract
The Williams-Beuren syndrome is a genomic disorder (prevalence: 1/7,500 to 1/20,000) caused by a hemizygous contiguous gene deletion on chromosome 7q11.23. Typical symptoms comprise supravalvular aortic stenosis, mental retardation, overfriendliness and visuospatial impairment. The common deletions range in size from 1.5 to 1.8 megabase pairs (Mb) and encompass approximately 28 genes. For a few genes, a genotype-phenotype correlation has been established. The best-explored gene within this region is the elastin gene; its haploinsufficiency causes arterial stenosis. The region of the Williams-Beuren syndrome consists of a single copy gene region (approximately 1.2 Mb) flanked by repetitive sequences, the low copy repeats (LCRs). The deletions arise as a consequence of misalignment of these repetitive sequences during meiosis, followed by unequal crossing over due to the high similarity of the LCRs. This review presents an overview of the Williams-Beuren syndrome region, considering the genomic assembly, chromosomal rearrangements and their mechanisms (i.e. deletions, duplications, inversions), and evolutionary and historical aspects.
|
Review |
16 |
147 |
9
|
Moreno-De-Luca D, Sanders SJ, Willsey AJ, Mulle JG, Lowe JK, Geschwind DH, State MW, Martin CL, Ledbetter DH. Using large clinical data sets to infer pathogenicity for rare copy number variants in autism cohorts. Mol Psychiatry 2013; 18:1090-5. [PMID: 23044707 PMCID: PMC3720840 DOI: 10.1038/mp.2012.138] [Citation(s) in RCA: 125] [Impact Index Per Article: 10.4] [Received: 05/24/2012] [Revised: 07/24/2012] [Accepted: 08/20/2012] [Indexed: 11/16/2022]
Abstract
Copy number variants (CNVs) have a major role in the etiology of autism spectrum disorders (ASD), and several of these have reached statistical significance in case-control analyses. Nevertheless, current ASD cohorts are not large enough to detect very rare CNVs that may be causative or contributory (that is, risk alleles). Here, we use a tiered approach, in which clinically significant CNVs are first identified in large clinical cohorts of neurodevelopmental disorders (including but not specific to ASD), after which these CNVs are then systematically identified within well-characterized ASD cohorts. We focused our initial analysis on 48 recurrent CNVs (segmental duplication-mediated 'hotspots') from 24 loci in 31 516 published clinical cases with neurodevelopmental disorders and 13 696 published controls, which yielded a total of 19 deletion CNVs and 11 duplication CNVs that reached statistical significance. We then investigated the overlap of these 30 CNVs in a combined sample of 3955 well-characterized ASD cases from three published studies. We identified 73 deleterious recurrent CNVs, including 36 deletions from 11 loci and 37 duplications from seven loci, for a frequency of 1 in 54; had we considered the ASD cohorts alone, only 58 CNVs from eight loci (24 deletions from three loci and 34 duplications from five loci) would have reached statistical significance. In conclusion, until there are sufficiently large ASD research cohorts with enough power to detect very rare causative or contributory CNVs, data from larger clinical cohorts can be used to infer the likely clinical significance of CNVs in ASD.
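The counts in this abstract can be cross-checked with a few lines of arithmetic. The snippet below is a reader's sketch, not part of the study: the variable names are illustrative, and every number is taken directly from the abstract text.

```python
# Counts quoted in the abstract; names are illustrative.
asd_cases = 3955
deletions, duplications = 36, 37           # deleterious recurrent CNVs found
significant_del, significant_dup = 19, 11  # CNVs significant in the clinical cohorts
asd_only_del, asd_only_dup = 24, 34        # counts if ASD cohorts were used alone

total_cnvs = deletions + duplications      # stated as 73
cases_per_cnv = asd_cases / total_cnvs     # stated as "1 in 54"

print(total_cnvs, round(cases_per_cnv))
print(significant_del + significant_dup, asd_only_del + asd_only_dup)
```

The sums and the "1 in 54" frequency agree with the figures reported in the abstract (73, 30, and 58 CNVs respectively).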
|
research-article |
12 |
125 |
10
|
Abstract
Floret fertility is a key determinant of the number of grains per inflorescence in cereals. During the evolution of wheat (Triticum sp.), floret fertility has increased, such that current bread wheat (Triticum aestivum) cultivars set three to five grains per spikelet. However, little is known regarding the genetic basis of floret fertility. The locus Grain Number Increase 1 (GNI1) is shown here to be an important contributor to floret fertility. GNI1 evolved in the Triticeae through gene duplication. The gene, which encodes a homeodomain leucine zipper class I (HD-Zip I) transcription factor, was expressed most abundantly in the most apical floret primordia and in parts of the rachilla, suggesting that it acts to inhibit rachilla growth and development. The level of GNI1 expression has decreased over the course of wheat evolution under domestication, leading to the production of spikes bearing more fertile florets and setting more grains per spikelet. Genetic analysis has revealed that the reduced-function allele GNI-A1 contributes to the increased number of fertile florets per spikelet. The RNAi-based knockdown of GNI1 led to an increase in the number of both fertile florets and grains in hexaploid wheat. Mutants carrying an impaired GNI-A1 allele out-yielded WT allele carriers under field conditions. The data show that gene duplication generated evolutionary novelty affecting floret fertility while mutations favoring increased grain production have been under selection during wheat evolution under domestication.
|
Research Support, U.S. Gov't, Non-P.H.S. |
6 |
122 |
11
|
Dopman EB, Hartl DL. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci U S A 2007; 104:19920-5. [PMID: 18056801 PMCID: PMC2148398 DOI: 10.1073/pnas.0709888104] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.7] [Received: 09/18/2007] [Indexed: 11/18/2022] Open
Abstract
Thomas Hunt Morgan and colleagues identified variation in gene copy number in Drosophila in the 1920s and 1930s and linked such variation to phenotypic differences [Bridges CB (1936) Science 83:210]. Yet the extent of variation in the number of chromosomes, chromosomal regions, or gene copies, and the importance of this variation within species, remain poorly understood. Here, we focus on copy-number variation in Drosophila melanogaster. We characterize copy-number polymorphism (CNP) across genomic regions, and we contrast patterns to infer the evolutionary processes acting on this variation. Copy-number variation in D. melanogaster is nonrandomly distributed, presumably because of a mutational bias produced by tandem repeats or other mechanisms. Comparisons of coding and noncoding CNPs, however, reveal a strong effect of purifying selection in the removal of structural variation from functionally constrained regions. Most patterns of CNP in D. melanogaster suggest that negative selection and mutational biases are the primary agents responsible for shaping structural variation.
|
Research Support, N.I.H., Extramural |
18 |
120 |
12
|
Zimmer CT, Garrood WT, Singh KS, Randall E, Lueke B, Gutbrod O, Matthiesen S, Kohler M, Nauen R, Davies TGE, Bass C. Neofunctionalization of Duplicated P450 Genes Drives the Evolution of Insecticide Resistance in the Brown Planthopper. Curr Biol 2018; 28:268-274.e5. [PMID: 29337073 PMCID: PMC5788746 DOI: 10.1016/j.cub.2017.11.060] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 11/04/2017] [Revised: 11/22/2017] [Accepted: 11/23/2017] [Indexed: 11/25/2022]
Abstract
Gene duplication is a major source of genetic variation that has been shown to underpin the evolution of a wide range of adaptive traits [1, 2]. For example, duplication or amplification of genes encoding detoxification enzymes has been shown to play an important role in the evolution of insecticide resistance [3, 4, 5]. In this context, gene duplication performs an adaptive function as a result of its effects on gene dosage and not as a source of functional novelty [3, 6, 7, 8]. Here, we show that duplication and neofunctionalization of a cytochrome P450, CYP6ER1, led to the evolution of insecticide resistance in the brown planthopper. Considerable genetic variation was observed in the coding sequence of CYP6ER1 in populations of brown planthopper collected from across Asia, but just two sequence variants are highly overexpressed in resistant strains and metabolize imidacloprid. Both variants are characterized by profound amino-acid alterations in substrate recognition sites, and the introduction of these mutations into a susceptible P450 sequence is sufficient to confer resistance. CYP6ER1 is duplicated in resistant strains with individuals carrying paralogs with and without the gain-of-function mutations. Despite numerical parity in the genome, the susceptible and mutant copies exhibit marked asymmetry in their expression with the resistant paralogs overexpressed. In the primary resistance-conferring CYP6ER1 variant, this results from an extended region of novel sequence upstream of the gene that provides enhanced expression. Our findings illustrate the versatility of gene duplication in providing opportunities for functional and regulatory innovation during the evolution of an adaptive trait.
The cytochrome P450 CYP6ER1 is duplicated in imidacloprid-resistant N. lugens strains. Amino-acid alterations in certain CYP6ER1 variants confer resistance to imidacloprid. Resistant hoppers have paralogs with and without the gain-of-function mutations. The susceptible and mutant CYP6ER1 copies show marked divergence in their expression.
|
Research Support, Non-U.S. Gov't |
7 |
116 |
13
|
Katju V, Bergthorsson U. Copy-number changes in evolution: rates, fitness effects and adaptive significance. Front Genet 2013; 4:273. [PMID: 24368910 PMCID: PMC3857721 DOI: 10.3389/fgene.2013.00273] [Citation(s) in RCA: 111] [Impact Index Per Article: 9.3] [Received: 10/07/2013] [Accepted: 11/18/2013] [Indexed: 11/13/2022] Open
Abstract
Gene copy-number differences due to gene duplications and deletions are rampant in natural populations and play a crucial role in the evolution of genome complexity. Per-locus analyses of gene duplication rates in the pre-genomic era revealed that gene duplication rates are much higher than the per nucleotide substitution rate. Analyses of gene duplication and deletion rates in mutation accumulation lines of model organisms have revealed that these high rates of copy-number mutations occur at a genome-wide scale. Furthermore, comparisons of the spontaneous duplication and deletion rates to copy-number polymorphism data and bioinformatic-based estimates of duplication rates from sequenced genomes suggest that the vast majority of gene duplications are detrimental and removed by natural selection. The rate at which new gene copies appear in populations greatly influences their evolutionary dynamics and standing gene copy-number variation in populations. The opportunity for mutations that result in the maintenance of duplicate copies, either through neofunctionalization or subfunctionalization, also depends on the equilibrium frequency of additional gene copies in the population, and hence on the spontaneous gene duplication (and loss) rate. The duplication rate may therefore have profound effects on the role of adaptation in the evolution of duplicated genes as well as important consequences for the evolutionary potential of organisms. We further discuss the broad ramifications of this standing gene copy-number variation on fitness and adaptive potential from a population-genetic and genome-wide perspective.
|
Review |
12 |
111 |
14
|
Gout JF, Lynch M. Maintenance and Loss of Duplicated Genes by Dosage Subfunctionalization. Mol Biol Evol 2015; 32:2141-8. [PMID: 25908670 DOI: 10.1093/molbev/msv095] [Citation(s) in RCA: 110] [Impact Index Per Article: 11.0] [Indexed: 01/24/2023] Open
Abstract
Whole-genome duplications (WGDs) have contributed to gene-repertoire enrichment in many eukaryotic lineages. However, most duplicated genes are eventually lost, and it is still unclear why some duplicated genes are evolutionarily successful whereas others quickly turn to pseudogenes. Here, we show that dosage constraints are major factors opposing post-WGD gene loss in several Paramecium species that share a common ancestral WGD. We propose a model where a majority of WGD-derived duplicates preserve their ancestral function and are retained to produce enough of the proteins performing this same ancestral function. Under this model, the expression level of individual duplicated genes can evolve neutrally as long as they maintain a roughly constant summed expression, and this allows random genetic drift toward uneven contributions of the two copies to total expression. Our analysis suggests that once a high level of imbalance is reached, which can require substantial lengths of time, the copy with the lowest expression level contributes a small enough fraction of the total expression that selection no longer opposes its loss. Extension of our analysis to yeast species sharing a common ancestral WGD yields similar results, suggesting that duplicated-gene retention for dosage constraints followed by divergence in expression level and eventual deterministic gene loss might be a universal feature of post-WGD evolution.
|
Research Support, U.S. Gov't, Non-P.H.S. |
10 |
110 |
15
|
Beckers A, Lodish MB, Trivellin G, Rostomyan L, Lee M, Faucz FR, Yuan B, Choong CS, Caberg JH, Verrua E, Naves LA, Cheetham TD, Young J, Lysy PA, Petrossians P, Cotterill A, Shah NS, Metzger D, Castermans E, Ambrosio MR, Villa C, Strebkova N, Mazerkina N, Gaillard S, Barra GB, Casulari LA, Neggers SJ, Salvatori R, Jaffrain-Rea ML, Zacharin M, Santamaria BL, Zacharieva S, Lim EM, Mantovani G, Zatelli MC, Collins MT, Bonneville JF, Quezado M, Chittiboina P, Oldfield EH, Bours V, Liu P, De Herder W, Pellegata N, Lupski JR, Daly AF, Stratakis CA. X-linked acrogigantism syndrome: clinical profile and therapeutic responses. Endocr Relat Cancer 2015; 22:353-67. [PMID: 25712922 PMCID: PMC4433400 DOI: 10.1530/erc-15-0038] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Accepted: 02/23/2015] [Indexed: 12/31/2022]
Abstract
X-linked acrogigantism (X-LAG) is a new syndrome of pituitary gigantism, caused by microduplications on chromosome Xq26.3, encompassing the gene GPR101, which is highly upregulated in pituitary tumors. We conducted this study to explore the clinical, radiological, and hormonal phenotype and responses to therapy in patients with X-LAG syndrome. The study included 18 patients (13 sporadic) with X-LAG and microduplication of chromosome Xq26.3. All sporadic cases had unique duplications and the inheritance pattern in two families was dominant, with all Xq26.3 duplication carriers being affected. Patients began to grow rapidly as early as 2-3 months of age (median 12 months). At diagnosis (median delay 27 months), patients had median height and weight standard deviation scores (SDS) of >+3.9. Apart from the increased overall body size, the children had acromegalic symptoms including acral enlargement and facial coarsening. More than a third of cases had increased appetite. Patients had marked hypersecretion of GH/IGF1 and usually prolactin, due to a pituitary macroadenoma or hyperplasia. Primary neurosurgical control was achieved with extensive anterior pituitary resection, but postoperative hypopituitarism was frequent. Control with somatostatin analogs was not readily achieved despite moderate to high levels of expression of somatostatin receptor subtype-2 in tumor tissue. Postoperative use of adjuvant pegvisomant resulted in control of IGF1 in all five cases where it was employed. X-LAG is a new infant-onset gigantism syndrome that has a severe clinical phenotype leading to challenging disease management.
|
Research Support, N.I.H., Extramural |
10 |
109 |
16
|
Rees E, Kirov G, Sanders A, Walters JTR, Chambert KD, Shi J, Szatkiewicz J, O'Dushlaine C, Richards AL, Green EK, Jones I, Davies G, Legge SE, Moran JL, Pato C, Pato M, Genovese G, Levinson D, Duan J, Moy W, Göring HHH, Morris D, Cormican P, Kendler KS, O'Neill FA, Riley B, Gill M, Corvin A, Wellcome Trust Case Control Consortium 19, Craddock N, Sklar P, Hultman C, Sullivan PF, Gejman PV, McCarroll SA, O'Donovan MC, Owen MJ. Evidence that duplications of 22q11.2 protect against schizophrenia. Mol Psychiatry 2014; 19:37-40. [PMID: 24217254 PMCID: PMC3873028 DOI: 10.1038/mp.2013.156] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Received: 05/03/2013] [Revised: 09/03/2013] [Accepted: 09/25/2013] [Indexed: 01/15/2023]
Abstract
A number of large, rare copy number variants (CNVs) are deleterious for neurodevelopmental disorders, but large, rare, protective CNVs have not been reported for such phenotypes. Here we show in a CNV analysis of 47 005 individuals, the largest CNV analysis of schizophrenia to date, that large duplications (1.5-3.0 Mb) at 22q11.2 (the reciprocal of the well-known, risk-inducing deletion of this locus) are substantially less common in schizophrenia cases than in the general population (0.014% vs 0.085%, OR=0.17, P=0.00086). 22q11.2 duplications represent the first putative protective mutation for schizophrenia.
|
Research Support, N.I.H., Extramural |
11 |
104 |
17
|
Pereira-Leal JB, Levy ED, Teichmann SA. The origins and evolution of functional modules: lessons from protein complexes. Philos Trans R Soc Lond B Biol Sci 2006; 361:507-17. [PMID: 16524839 PMCID: PMC1609335 DOI: 10.1098/rstb.2005.1807] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022] Open
Abstract
Modularity is an attribute of a system that can be decomposed into a set of cohesive entities that are loosely coupled. Many cellular networks can be decomposed into functional modules, each functionally separable from the other modules. The protein complexes in physical protein interaction networks are a good example of this, and here we focus on their origins and evolution. We investigate the emergence of protein complexes and physical interactions between proteins by duplication, and review other mechanisms. We dissect the dataset of protein complexes of known three-dimensional structure, and show that roughly 90% of these complexes contain contacts between identical proteins within the same complex. Proteins that are shared across different complexes occur frequently, and they tend to be essential genes more often than members of a single protein complex. We also provide a perspective on the evolutionary mechanisms driving the growth of other modular cellular networks such as transcriptional regulatory and metabolic networks.
|
Review |
19 |
100 |
18
|
Middendorp S, Küntziger T, Abraham Y, Holmes S, Bordes N, Paintrand M, Paoletti A, Bornens M. A role for centrin 3 in centrosome reproduction. J Cell Biol 2000; 148:405-16. [PMID: 10662768 PMCID: PMC2174797 DOI: 10.1083/jcb.148.3.405] [Citation(s) in RCA: 100] [Impact Index Per Article: 4.0] [Indexed: 11/22/2022] Open
Abstract
Centrosome reproduction by duplication is essential for the bipolarity of cell division, but the molecular basis of this process is still unknown. Mutations in Saccharomyces cerevisiae CDC31 gene prevent the duplication of the spindle pole body (SPB). The product of this gene belongs to the calmodulin super-family and is concentrated at the half bridge of the SPB. We present a functional analysis of HsCEN3, a human centrin gene closely related to the CDC31 gene. Transient overexpression of wild-type or mutant forms of HsCen3p in human cells demonstrates that centriole localization depends on a functional fourth EF-hand, but does not produce mitotic phenotype. However, injection of recombinant HsCen3p or of RNA encoding HsCen3p in one blastomere of two-cell stage Xenopus laevis embryos resulted in undercleavage and inhibition of centrosome duplication. Furthermore, HsCEN3 does not complement mutations or deletion of CDC31 in S. cerevisiae, but specifically blocks SPB duplication, indicating that the human protein acts as a dominant negative mutant of CDC31. Several lines of evidence indicate that HsCen3p acts by titrating Cdc31p-binding protein(s). Our results demonstrate that, in spite of the large differences in centrosome structure among widely divergent species, the centrosome pathway of reproduction is conserved.
|
research-article |
25 |
100 |
19
|
Li J, Shou J, Guo Y, Tang Y, Wu Y, Jia Z, Zhai Y, Chen Z, Xu Q, Wu Q. Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9. J Mol Cell Biol 2015; 7:284-98. [PMID: 25757625 PMCID: PMC4524425 DOI: 10.1093/jmcb/mjv016] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Received: 02/12/2015] [Accepted: 03/02/2015] [Indexed: 12/26/2022] Open
Abstract
The human genome contains millions of DNA regulatory elements and a large number of gene clusters, most of which have not been tested experimentally. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9) programmed with a synthetic single-guide RNA (sgRNA) emerges as a method for genome editing in virtually any organism. Here we report that targeted DNA fragment inversions and duplications could easily be achieved in human and mouse genomes by CRISPR with two sgRNAs. Specifically, we found that, in cultured human cells and mice, efficient precise inversions of DNA fragments ranging in size from a few tens of bp to hundreds of kb could be generated. In addition, DNA fragment duplications and deletions could also be generated by CRISPR through trans-allelic recombination between the Cas9-induced double-strand breaks (DSBs) on two homologous chromosomes (chromatids). Moreover, junctions of combinatorial inversions and duplications of the protocadherin (Pcdh) gene clusters induced by Cas9 with four sgRNAs could be detected. In mice, we obtained founders with alleles of precise inversions, duplications, and deletions of DNA fragments of variable sizes by CRISPR. Interestingly, we found that very efficient inversions were mediated by microhomology-mediated end joining (MMEJ) through short inverted repeats. We showed for the first time that DNA fragment inversions could be transmitted through germlines in mice. Finally, we applied this CRISPR method to a regulatory element of the Pcdhα cluster and found a new role in the regulation of members of the Pcdhγ cluster. This simple and efficient method should be useful in manipulating mammalian genomes to study millions of regulatory DNA elements as well as vast numbers of gene clusters.
|
research-article |
10 |
99 |
20
|
Martinez-Castilla LP, Alvarez-Buylla ER. Adaptive evolution in the Arabidopsis MADS-box gene family inferred from its complete resolved phylogeny. Proc Natl Acad Sci U S A 2003; 100:13407-12. [PMID: 14597714 PMCID: PMC263827 DOI: 10.1073/pnas.1835864100] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.3] [Received: 05/31/2003] [Indexed: 11/18/2022] Open
Abstract
Gene duplication is a substrate of evolution. However, the relative importance of positive selection versus relaxation of constraints in the functional divergence of gene copies is still under debate. Plant MADS-box genes encode transcriptional regulators key in various aspects of development and have undergone extensive duplications to form a large family. We recovered 104 MADS sequences from the Arabidopsis genome. Bayesian phylogenetic trees recover type II lineage as a monophyletic group and resolve a branching sequence of monophyletic groups within this lineage. The type I lineage is comprised of several divergent groups. However, contrasting gene structure and patterns of chromosomal distribution between type I and II sequences suggest that they had different evolutionary histories and support the placement of the root of the gene family between these two groups. Site-specific and site-branch analyses of positive Darwinian selection (PDS) suggest that different selection regimes could have affected the evolution of these lineages. We found evidence for PDS along the branch leading to flowering time genes that have a direct impact on plant fitness. Sites with high probabilities of having been under PDS were found in the MADS and K domains, suggesting that these played important roles in the acquisition of novel functions during MADS-box diversification. Detected sites are targets for further experimental analyses. We argue that adaptive changes in MADS-domain protein sequences have been important for their functional divergence, suggesting that changes within coding regions of transcriptional regulators have influenced phenotypic evolution of plants.
|
research-article |
22 |
94 |
21
|
Cellular Phenotypes in Human iPSC-Derived Neurons from a Genetic Model of Autism Spectrum Disorder. Cell Rep 2018; 21:2678-2687. [PMID: 29212016 DOI: 10.1016/j.celrep.2017.11.037] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 01/12/2017] [Revised: 09/01/2017] [Accepted: 11/10/2017] [Indexed: 01/26/2023] Open
Abstract
A deletion or duplication in the 16p11.2 region is associated with neurodevelopmental disorders, including autism spectrum disorder and schizophrenia. In addition to clinical characteristics, carriers of the 16p11.2 copy-number variant (CNV) manifest opposing neuroanatomical phenotypes, e.g., macrocephaly in deletion carriers (16pdel) and microcephaly in duplication carriers (16pdup). Using fibroblasts obtained from 16pdel and 16pdup carriers, we generated induced pluripotent stem cells (iPSCs) and differentiated them into neurons to identify causal cellular mechanisms underlying neurobiological phenotypes. Our study revealed increased soma size and dendrite length in 16pdel neurons and reduced neuronal size and dendrite length in 16pdup neurons. The functional properties of iPSC-derived neurons corroborated aspects of these contrasting morphological differences that may underlie brain size. Interestingly, both 16pdel and 16pdup neurons displayed reduced synaptic density, suggesting that distinct mechanisms may underlie brain size and neuronal connectivity at this locus.
|
Journal Article |
7 |
90 |
22
|
Chakraborty M, Jarvis ED. Brain evolution by brain pathway duplication. Philos Trans R Soc Lond B Biol Sci 2016; 370:rstb.2015.0056. [PMID: 26554045 PMCID: PMC4650129 DOI: 10.1098/rstb.2015.0056] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Indexed: 12/17/2022] Open
Abstract
Understanding the mechanisms of evolution of brain pathways for complex behaviours is still in its infancy. Making further advances requires a deeper understanding of brain homologies, novelties and analogies. It also requires an understanding of how adaptive genetic modifications lead to restructuring of the brain. Recent advances in genomic and molecular biology techniques applied to brain research have provided exciting insights into how complex behaviours are shaped by selection of novel brain pathways and functions of the nervous system. Here, we review and further develop some insights to a new hypothesis on one mechanism that may contribute to nervous system evolution, in particular by brain pathway duplication. Like gene duplication, we propose that whole brain pathways can duplicate and the duplicated pathway diverge to take on new functions. We suggest that one mechanism of brain pathway duplication could be through gene duplication, although other mechanisms are possible. We focus on brain pathways for vocal learning and spoken language in song-learning birds and humans as example systems. This view presents a new framework for future research in our understanding of brain evolution and novel behavioural traits.
|
Review |
9 |
83 |
23
|
Book A, Guella I, Candido T, Brice A, Hattori N, Jeon B, Farrer MJ, SNCA Multiplication Investigators of the GEoPD Consortium. A Meta-Analysis of α-Synuclein Multiplication in Familial Parkinsonism. Front Neurol 2018; 9:1021. [PMID: 30619023 PMCID: PMC6297377 DOI: 10.3389/fneur.2018.01021] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 08/23/2018] [Accepted: 11/13/2018] [Indexed: 11/18/2022] Open
Abstract
Chronic alpha-synuclein (SNCA) overexpression is a relatively homogenous and well-defined cause of parkinsonism and dementia. Parkinson's disease (PD), PD with dementia, dementia with Lewy bodies and multiple system atrophy all manifest in SNCA multiplication families. Herein we summarize genealogic, clinical and genetic data from 59 families (25 not previously published) with parkinsonism caused by SNCA multiplications. Longitudinal clinical assessments and genealogic relationships were documented for all family members. All probands were genotyped with an Illumina MEGA high-density genotyping array to identify copy number variants (CNV) and enable SNCA multiplication breakpoints to be defined. Three SNCA short tandem repeat (STR) markers were genotyped in all available samples to validate genomic dosage and inheritance. A web-application was built as a forum for future data sharing. CNV analysis identified 49 subjects with heterozygous SNCA duplication (CNV3), 2 with homozygous duplication (CNV4) and 7 with a triplication mutation (CNV4). Clinical presentations varied greatly throughout the cohort. SNCA dosage correlates with disease onset (mean age of onset CNV3: 46.9 ± 10.5 years vs. 34.5 ± 7.4 CNV4, p = 0.003). Atypical or more severe clinical courses were described in several patients and dementia was noted in 50.9% of the probands. Neither the multiplication size (average 2.05 ± 2.45 Mb) nor the number of genes included (range 1-50) was associated with motor symptom onset or dementia. Families with SNCA multiplication are rare and globally-distributed. Nevertheless, they may both inform and benefit from the development of SNCA targeted therapeutic strategies relevant to the treatment of all alpha-synucleinopathies.
|
research-article |
7 |
81 |
24
|
Sinkus ML, Lee MJ, Gault J, Logel J, Short M, Freedman R, Christian SL, Lyon J, Leonard S. A 2-base pair deletion polymorphism in the partial duplication of the alpha7 nicotinic acetylcholine gene (CHRFAM7A) on chromosome 15q14 is associated with schizophrenia. Brain Res 2009; 1291:1-11. [PMID: 19631623 PMCID: PMC2747474 DOI: 10.1016/j.brainres.2009.07.041] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 04/21/2009] [Revised: 07/13/2009] [Accepted: 07/14/2009] [Indexed: 11/27/2022]
Abstract
Multiple genetic linkage studies support the hypothesis that the 15q13-14 chromosomal region contributes to the etiology of schizophrenia. Among the putative candidate genes in this area are the alpha7 nicotinic acetylcholine receptor gene (CHRNA7) and its partial duplication, CHRFAM7A. A large chromosomal segment including the CHRFAM7A gene locus, but not the CHRNA7 locus, is deleted in some individuals. The CHRFAM7A gene contains a polymorphism consisting of a 2 base pair (2 bp) deletion at position 497-498 bp of exon 6. We employed PCR-based methods to quantify the copy number of CHRFAM7A and the presence of the 2 bp polymorphism in a large, multi-ethnic population. The 2 bp polymorphism was associated with schizophrenia in African Americans (genotype p=0.005, allele p=0.015), and in Caucasians (genotype p=0.015, allele p=0.009). We conclude that the presence of the 2 bp polymorphism at the CHRFAM7A locus may have a functional significance in schizophrenia.
|
Research Support, N.I.H., Extramural |
16 |
76 |
25
|
Huchard E, Martinez M, Alout H, Douzery EJ, Lutfalla G, Berthomieu A, Berticat C, Raymond M, Weill M. Acetylcholinesterase genes within the Diptera: takeover and loss in true flies. Proc Biol Sci 2006; 273:2595-604. [PMID: 17002944 PMCID: PMC1635460 DOI: 10.1098/rspb.2006.3621] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.9] [Received: 01/19/2006] [Accepted: 05/13/2006] [Indexed: 11/12/2022] Open
Abstract
It has recently been reported that the synaptic acetylcholinesterase (AChE) in mosquitoes is encoded by the ace-1 gene, distinct and divergent from the ace-2 gene, which performs this function in Drosophila. This is an unprecedented situation within the Diptera order because both ace genes derive from an old duplication and are present in most insects and arthropods. Nevertheless, Drosophila possesses only the ace-2 gene. Thus, a secondary loss occurred during the evolution of Diptera, implying a vital function switch from one gene (ace-1) to the other (ace-2). We sampled 78 species, representing 50 families (27% of the Dipteran families) spread over all major subdivisions of the Diptera, and looked for ace-1 and ace-2 by systematic PCR screening to determine which taxonomic groups within the Diptera have this gene change. We show that this loss probably extends to all true flies (or Cyclorrhapha), a large monophyletic group of the Diptera. We also show that ace-2 plays a non-detectable role in the synaptic AChE in a lower Diptera species, suggesting that it has non-synaptic functions. A relative molecular evolution rate test showed that the intensity of purifying selection on ace-2 sequences is constant across the Diptera, irrespective of the presence or absence of ace-1, confirming the evolutionary importance of non-synaptic functions for this gene. We discuss the evolutionary scenarios for the takeover of ace-2 and the loss of ace-1, taking into account our limited knowledge of non-synaptic functions of ace genes and some specific adaptations of true flies.
|
Comparative Study |
19 |
74 |