1
Stavropoulou A, Tassios E, Kalyva M, Georgoulopoulos M, Vakirlis N, Iliopoulos I, Nikolaou C. Distinct chromosomal “niches” in the genome of Saccharomyces cerevisiae provide the background for genomic innovation and shape the fate of gene duplicates. NAR Genom Bioinform 2022; 4:lqac086. [PMID: 36381424 PMCID: PMC9661399 DOI: 10.1093/nargab/lqac086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Revised: 10/20/2022] [Accepted: 10/25/2022] [Indexed: 11/15/2022]
Abstract
Nearly one third of Saccharomyces cerevisiae protein coding sequences correspond to duplicate genes, equally split between small-scale duplicates (SSD) and whole-genome duplicates (WGD). While duplicate genes have distinct properties compared to singletons, to date, there has been no systematic analysis of their positional preferences. In this work, we show that SSD and WGD genes are organized in distinct gene clusters that occupy different genomic regions, with SSD being more peripheral and WGD more centrally positioned close to centromeric chromatin. Duplicate gene clusters differ from the rest of the genome in terms of gene size and spacing, gene expression variability and regulatory complexity, properties that are also shared by singleton genes residing within them. Singletons within duplicate gene clusters have longer promoters, more complex structure and a higher number of protein–protein interactions. Particular chromatin architectures appear to be important for gene evolution, as we find SSD gene-pair co-expression to be strongly associated with the similarity of nucleosome positioning patterns. We propose that specific regions of the yeast genome provide a favourable environment for the generation and maintenance of small-scale gene duplicates, segregating them from WGD-enriched genomic domains. Our findings provide a valuable framework linking genomic innovation with positional genomic preferences.
Affiliation(s)
- Athanasia Stavropoulou
  - Medical School, University of Crete, Heraklion 70013, Greece
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
- Emilios Tassios
  - Medical School, University of Crete, Heraklion 70013, Greece
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
- Maria Kalyva
  - European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Nikolaos Vakirlis
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
- Christoforos Nikolaou
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
  - Hellenic Open University, Patras 26335, Greece
2
Zhang Y, Yu Z, Zheng C, Sankoff D. Integrated synteny- and similarity-based inference on the polyploidization-fractionation cycle. Interface Focus 2021; 11:20200059. [PMID: 34123351 PMCID: PMC8193467 DOI: 10.1098/rsfs.2020.0059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 04/13/2021] [Indexed: 11/21/2022]
Abstract
Whole-genome doubling, tripling or replicating to a greater degree, due to fixation of polyploidization events, is attested in almost all lineages of the flowering plants, recurring in the ancestry of some plants two, three or more times in retracing their history to the earliest angiosperm. This major mechanism in plant genome evolution, which generally appears as instantaneous on the evolutionary time scale, sets in operation a compensatory process called fractionation, the loss of duplicate genes, initially rapid, but continuing at a diminishing rate over millions and tens of millions of years. We study this process by statistically comparing the distribution of duplicate gene pairs as a function of their time of creation through polyploidization, as measured by sequence similarity. The stochastic model that accounts for this distribution, though exceedingly simple, still has too many parameters to be estimated based only on the similarity distribution, while the computational procedures for compiling the distribution from annotated genomic data are heavily biased against earlier polyploidization events, a bias known as syntenic ‘crumble’. Other parameters, such as the size of the initial gene complement and the ploidy of the various events giving rise to duplicate gene pairs, are even more inaccessible to estimation. Here, we show how the frequency of unpaired genes, identified via their embedding in stretches of duplicate pairs, together with previously established constraints among some parameters, adds enormously to the range of successive polyploidization events that can be analysed. This also allows us to estimate the initial gene complement and to correct for the bias due to crumble. We explore the applicability of our methodology to four flowering plant genomes covering a range of different polyploidization histories.
Affiliation(s)
- Yue Zhang
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5
- Zhe Yu
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5
- Chunfang Zheng
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5
- David Sankoff
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5
3
Yu Z, Zheng C, Albert VA, Sankoff D. Excision Dominates Pseudogenization During Fractionation After Whole Genome Duplication and in Gene Loss After Speciation in Plants. Front Genet 2021; 11:603056. [PMID: 33391353 PMCID: PMC7775554 DOI: 10.3389/fgene.2020.603056] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/04/2020] [Accepted: 11/27/2020] [Indexed: 11/20/2022]
Abstract
We take advantage of synteny blocks, the analytical construct enabled at the evolutionary moment of speciation or polyploidization, to follow the independent loss of duplicate genes in two sister species or the loss through fractionation of syntenic paralogs in a doubled genome. By examining how much sequence remains after a contiguous series of genes is deleted, we find that this residue remains at a constant low level independent of how many genes are lost—there are few if any relics of the missing sequence. Pseudogenes are rare or extremely transient in this context. The potential exceptions lie exclusively with a few examples of speciation, where the synteny blocks in some larger genomes tolerate degenerate sequence during genomic divergence of two species, but not after whole genome doubling in the same species where fractionation pressure eliminates virtually all non-coding sequence.
Affiliation(s)
- Zhe Yu
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
- Chunfang Zheng
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
- Victor A Albert
  - Department of Biological Sciences, University at Buffalo, Buffalo, NY, United States
- David Sankoff
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
4
Martín-Vide C, Vega-Rodríguez MA, Wheeler T. Gaps and Runs in Syntenic Alignments. Algorithms for Computational Biology 2020. [PMCID: PMC7197063 DOI: 10.1007/978-3-030-42266-0_5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
Gene loss is the obverse of novel gene acquisition by a genome through a variety of evolutionary processes. It serves a number of functional and structural roles, compensating for the energy and material costs of gene complement expansion. A type of gene loss widespread in the lineages of plant genomes is “fractionation” after whole genome doubling or tripling, where one of a pair or triplet of paralogous genes in parallel syntenic contexts is discarded. The detailed syntenic mechanisms of gene loss, especially in fractionation, remain controversial. We focus on the frequency distribution of gap lengths (number of deleted genes, not nucleotides) within syntenic blocks calculated during the comparison of chromosomes from two genomes. We mathematically characterize a simple model in some detail and show how it is an adequate description neither of the Coffea arabica subgenomes nor of its two progenitor genomes. We find that a mixture of two models, a random, one-gene-at-a-time model and a geometric-length distributed excision for removing a variable number of genes, fits well.
5
Abstract
The recurrent cycle of whole genome duplication (WGD) followed by massive duplicate gene loss (fractionation) differentiates plant evolutionary history from that of most other phylogenetic domains, where WGD has occurred relatively rarely, even on an evolutionary time scale. We discuss the mechanism of WGD and its biological consequences. We survey the prevalence of WGD in the flowering plants. We outline some of the major kinds of combinatorial optimization problems arising in computational biology for analyzing WGD. Fractionation and its consequences are the subject of mathematical modeling questions and further combinatorial algorithms. A strong connection is made between WGD in phylogenetic context and the theory of gene trees and species trees. We illustrate the analysis of WGD with studies involving a large number of sequenced plant genomes, including grape, the crucifers and other rosids, the asterid tomato, the eudicot Nelumbo nucifera and pineapple, a monocot.
Affiliation(s)
- David Sankoff
  - Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Ave., Ottawa, ON, K1N 6N5, Canada
- Chunfang Zheng
  - Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Ave., Ottawa, ON, K1N 6N5, Canada
6
Yu Z, Sankoff D. A continuous analog of run length distributions reflecting accumulated fractionation events. BMC Bioinformatics 2016; 17:412. [PMID: 28185566 PMCID: PMC5123346 DOI: 10.1186/s12859-016-1265-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Background: We propose a new, continuous model of the fractionation process (duplicate gene deletion after polyploidization) on the real line. The aim is to infer how much DNA is deleted at a time, based on segment lengths for alternating deleted (invisible) and undeleted (visible) regions. Results: After deriving a number of analytical results for “one-sided” fractionation, we undertake a series of simulations that help us identify the distribution of segment lengths as a gamma with shape and rate parameters evolving over time. This leads to an inference procedure based on observed length distributions for visible and invisible segments. Conclusions: We suggest extensions of this mathematical and simulation work to biologically realistic discrete models, including two-sided fractionation.
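The gamma description of segment lengths can be made concrete with a small sketch. The abstract does not spell out the inference procedure, so the code below uses a generic method-of-moments fit as a stand-in; it is not the authors' method. The function name `fit_gamma_moments` and the simulated lengths are assumptions made purely for illustration.

```python
import random
import statistics


def fit_gamma_moments(lengths):
    """Method-of-moments estimate of gamma (shape, rate) from segment lengths.

    For a gamma distribution, mean = shape / rate and variance = shape / rate**2,
    so shape = mean**2 / variance and rate = mean / variance.
    """
    mean = statistics.fmean(lengths)
    variance = statistics.pvariance(lengths, mu=mean)
    if variance <= 0:
        raise ValueError("segment lengths must not all be identical")
    return mean ** 2 / variance, mean / variance


# Hypothetical data: simulated segment lengths with a known shape and rate.
rng = random.Random(1)
true_shape, true_rate = 2.0, 0.5
# random.gammavariate takes (shape, scale), where scale = 1 / rate.
lengths = [rng.gammavariate(true_shape, 1.0 / true_rate) for _ in range(50_000)]
shape_hat, rate_hat = fit_gamma_moments(lengths)
```

Applied separately to visible and invisible segment lengths at several time points, a fit of this kind would show how the two parameters drift as fractionation accumulates, which is the behaviour the abstract describes.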
Affiliation(s)
- Zhe Yu
  - Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
- David Sankoff
  - Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
7
Sankoff D, Zheng C, Wang B, Abad Najar C. Structural vs. functional mechanisms of duplicate gene loss following whole genome doubling. BMC Bioinformatics 2015; 16 Suppl 17:S9. [PMID: 26680009 PMCID: PMC4674901 DOI: 10.1186/1471-2105-16-s17-s9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Background: The loss of duplicate genes (fractionation) after whole genome doubling (WGD) is the subject of a debate as to whether it proceeds gene by gene or through deletion of multi-gene chromosomal segments. Results: WGD produces two copies of every chromosome, namely two identical copies of a sequence of genes. We assume deletion events excise a geometrically distributed number of consecutive genes with mean µ ≥ 1, and these events can combine to produce single-copy runs of length l. If µ = 1, the process is gene-by-gene. If µ > 1, the process at least occasionally excises more than one gene at a time. In the latter case, if deletions overlap, the later one simply extends the existing run of single-copy genes. We explore aspects of the predicted distribution of the lengths of single-copy regions analytically, but resort to simulations to show how observing run lengths l allows us to discriminate between the two hypotheses. Conclusions: Deletion run length distributions can discriminate between gene-by-gene fractionation and deletion of segments of geometrically distributed length, even if µ is only slightly larger than 1, as long as the genome is large enough and fractionation has not proceeded too far towards completion.
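The deletion model in this abstract lends itself to a short simulation. The following is a minimal sketch under simplifying assumptions that the abstract does not state: deletions fall on one copy only, start positions are uniform, and excisions are truncated at the chromosome end. All function names and parameters (`simulate_fractionation`, `target_fraction_deleted`, and so on) are hypothetical.

```python
import random


def geometric(mu, rng):
    """Sample an excision length >= 1 from a geometric distribution with mean mu."""
    p = 1.0 / mu  # success probability; mu = 1 always yields length 1
    k = 1
    while rng.random() > p:
        k += 1
    return k


def simulate_fractionation(n_genes, mu, target_fraction_deleted, seed=0):
    """Apply random excisions until the target fraction of genes is single-copy.

    Overlapping excisions simply extend an existing run, as in the abstract.
    Returns a list of booleans, True where the duplicate copy has been deleted.
    """
    rng = random.Random(seed)
    deleted = [False] * n_genes
    n_deleted = 0
    while n_deleted < target_fraction_deleted * n_genes:
        start = rng.randrange(n_genes)
        length = geometric(mu, rng)
        for i in range(start, min(start + length, n_genes)):
            if not deleted[i]:
                deleted[i] = True
                n_deleted += 1
    return deleted


def run_lengths(deleted):
    """Lengths l of maximal runs of consecutive single-copy (deleted) positions."""
    runs, current = [], 0
    for is_deleted in deleted:
        if is_deleted:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Compare gene-by-gene loss (mu = 1) with segmental loss (mu = 3).
runs_single = run_lengths(simulate_fractionation(20_000, 1.0, 0.3))
runs_segment = run_lengths(simulate_fractionation(20_000, 3.0, 0.3))
```

Comparing the two run-length samples (for example, their means or histograms) at the same deleted fraction is the kind of contrast the abstract relies on to separate the two hypotheses.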
8
Abstract
Background: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. Results: As a null hypothesis, we first assume deletion events, on either homeolog, excise a geometrically distributed number of genes with unknown mean μ, and a number r of these events overlap to produce deleted runs of length l. There is a fractionation bias 0 ≤ φ ≤ 1 for deletions to fall on one homeolog rather than the other. The parameter r is a random variable with distribution π(·). We simulate the distribution of run lengths l, as well as the underlying π(·), as a function of μ, φ and θ, the proportion of remaining genes in duplicate form. We show how sampling l allows us to estimate μ and φ. The main part of this work is the derivation of a deterministic recurrence to calculate each π(r) as a function of μ, φ and θ. Conclusions: The recurrence for π provides a deeper mathematical understanding of the fractionation process than simulations. The parameters μ and φ can be estimated based on run lengths of single-copy regions.
Affiliation(s)
- David Sankoff
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa K1N 6N5, Canada
9
Abstract
Background: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. Results: As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean µ, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution π(·). A more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying π(r), as a function of µ, and show how sampling l allows us to estimate µ. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving π(·) analytically, we develop a deterministic recurrence to calculate each π(r) as a function of µ and the proportion of unreduced paralog pairs. Conclusions: The parameter µ can be estimated based on run lengths of single-copy regions. Estimates of µ in real data do not exclude the possibility that duplicate gene deletion is largely gene by gene, although it may sometimes involve longer segments.
10
Zwart MP, Dieu BTM, Hemerik L, Vlak JM. Evolutionary trajectory of white spot syndrome virus (WSSV) genome shrinkage during spread in Asia. PLoS One 2010; 5:e13400. [PMID: 20976239 PMCID: PMC2954812 DOI: 10.1371/journal.pone.0013400] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 07/16/2010] [Accepted: 09/19/2010] [Indexed: 01/21/2023]
Abstract
Background: White spot syndrome virus (WSSV) is the sole member of the novel Nimaviridae family, and the source of major economic problems in shrimp aquaculture. WSSV appears to have rapidly spread worldwide after the first reported outbreak in the early 1990s. Genomic deletions of various sizes occur at two loci in the WSSV genome, the ORF14/15 and ORF23/24 variable regions, and these have been used as molecular markers to study patterns of viral spread over space and time. We describe the dynamics underlying the process of WSSV genome shrinkage using empirical data and a simple mathematical model. Methodology/Principal Findings: We genotyped new WSSV isolates from five Asian countries, and analyzed this information together with published data. Genome size appears to stabilize over time, and deletion size in the ORF23/24 variable region was significantly related to the time of the first WSSV outbreak in a particular country. Parameter estimates derived from fitting a simple mathematical model of genome shrinkage to the data support a geometric progression (k<1) of the genomic deletions, with k = 0.371 ± 0.150. Conclusions/Significance: The data suggest that the rate of genome shrinkage decreases over time before attenuating. Bioassay data provided support for a link between genome size and WSSV fitness in an aquaculture setting. Differences in genomic deletions between geographic WSSV isolates suggest that WSSV spread did not follow a smooth pattern of geographic radiation, suggesting spread of WSSV over long distances by commercial activities. We discuss two hypotheses for genome shrinkage, an adaptive and a neutral one. We argue in favor of the adaptive hypothesis, given that there is support for a link between WSSV genome size and fitness.
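A geometric progression with ratio k < 1 implies that cumulative deletion converges, which is consistent with genome size stabilizing. The sketch below illustrates this under an assumed parametrization, since the abstract does not give the model's exact form: each successive deletion is taken to be k times the previous one, and `first_deletion` is a hypothetical unit-sized first step rather than a value from the paper.

```python
def cumulative_deletion(first_deletion, k, n_events):
    """Total sequence removed after n_events deletions shrinking by factor k."""
    total = 0.0
    size = first_deletion
    for _ in range(n_events):
        total += size
        size *= k  # each deletion is k times the size of the previous one
    return total


def limiting_deletion(first_deletion, k):
    """Limit of the cumulative deletion as the number of events grows (0 <= k < 1)."""
    if not 0 <= k < 1:
        raise ValueError("the series converges only for 0 <= k < 1")
    return first_deletion / (1.0 - k)


# With the reported point estimate k = 0.371 and a unit-sized first deletion
# (the unit size is an assumption for illustration only):
after_five = cumulative_deletion(1.0, 0.371, 5)
limit = limiting_deletion(1.0, 0.371)
```

Under this reading, most of the eventual shrinkage happens in the first few deletion steps, and later steps contribute progressively less.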
Affiliation(s)
- Mark P Zwart
  - Laboratory of Virology, Wageningen University, Wageningen, The Netherlands
11
The collapse of gene complement following whole genome duplication. BMC Genomics 2010; 11:313. [PMID: 20482863 PMCID: PMC2896955 DOI: 10.1186/1471-2164-11-313] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 08/19/2009] [Accepted: 05/19/2010] [Indexed: 01/15/2023]
Abstract
Background: Genome amplification through duplication or proliferation of transposable elements has its counterpart in genome reduction, by elimination of DNA or by gene inactivation. Whether loss is primarily due to excision of random length DNA fragments or the inactivation of one gene at a time is controversial. Reduction after whole genome duplication (WGD) represents an inexorable collapse in gene complement. Results: We compare fifteen genomes descending from six eukaryotic WGD events 20-450 Mya. We characterize the collapse over time through the distribution of runs of reduced paralog pairs in duplicated segments. Descendant genomes of the same WGD event behave as replicates. Choice of paralog pairs to be reduced is random except for some resistant regions of contiguous pairs. For those paralog pairs that are reduced, conserved copies tend to concentrate on one chromosome. Conclusions: Both the contiguous regions of reduction-resistant pairs and the concentration of runs of single copy genes on a single chromosome are evidence of transcriptional co-regulation, dosage sensitivity or other functional interaction constraining the reduction process. These constraints and their evolution over time show a consistent pattern across evolutionary domains and a highly reproducible pattern, as replicates, for the several descendants of a single WGD.
12
Brinza L, Viñuelas J, Cottret L, Calevro F, Rahbé Y, Febvay G, Duport G, Colella S, Rabatel A, Gautier C, Fayard JM, Sagot MF, Charles H. Systemic analysis of the symbiotic function of Buchnera aphidicola, the primary endosymbiont of the pea aphid Acyrthosiphon pisum. C R Biol 2009; 332:1034-49. [PMID: 19909925 DOI: 10.1016/j.crvi.2009.09.007] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 01/09/2023]
Abstract
Buchnera aphidicola is the primary obligate intracellular symbiont of most aphid species. B. aphidicola and aphids have been evolving in parallel since their association started, about 150 Myr ago. Both partners have lost their autonomy, and aphid diversification has been confined to smaller ecological niches by this co-evolution. B. aphidicola has undergone major genomic and biochemical changes as a result of adapting to intracellular life. Several genomes of B. aphidicola from different aphid species have been sequenced in the last decade, making it possible to carry out analyses and comparative studies using system-level in silico methods. This review attempts to provide a systemic description of the symbiotic function of aphid endosymbionts, particularly of B. aphidicola from the pea aphid Acyrthosiphon pisum, by analyzing their structural genomic properties, as well as their genetic and metabolic networks.
Affiliation(s)
- Lilia Brinza
  - UMR203 BF2I, Biologie fonctionnelle insectes et interactions, Université de Lyon, INRA, INSA-Lyon, IFR41, 20, avenue A. Einstein, 69621 Villeurbanne, France
13
Zheng C, Kerr Wall P, Leebens-Mack J, DE Pamphilis C, Albert VA, Sankoff D. Gene loss under neighborhood selection following whole genome duplication and the reconstruction of the ancestral Populus genome. J Bioinform Comput Biol 2009; 7:499-520. [PMID: 19507287 DOI: 10.1142/s0219720009004199] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/18/2008] [Revised: 11/06/2008] [Accepted: 11/11/2008] [Indexed: 11/18/2022]
Abstract
We develop criteria to detect neighborhood selection effects on gene loss following whole genome duplication, and apply them to the recently sequenced poplar (Populus trichocarpa) genome. We improve on guided genome halving algorithms so that several thousand gene sets, each containing two paralogs in the descendant T of the doubling event and their single ortholog from an undoubled reference genome R, can be analyzed to reconstruct the ancestor A of T at the time of doubling. At the same time, large numbers of defective gene sets, either missing one paralog from T or missing their ortholog in R, may be incorporated into the analysis in a consistent way. We apply this genomic rearrangement distance-based approach to the poplar and grapevine (Vitis vinifera) genomes, as T and R respectively. We conclude that, after chromosome doubling, the "choice" of which paralogous gene pairs will lose copies is random, but that the retention of strings of single-copy genes on one chromosome versus the other is decidedly non-random.
Affiliation(s)
- Chunfang Zheng
  - Department of Biology, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada