1. Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level - From metagenomic sequence reads to annotated protein clusters. Frontiers in Bioinformatics 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
  - Hellenic Army Academy, Vari, Greece
2. Stavropoulou A, Tassios E, Kalyva M, Georgoulopoulos M, Vakirlis N, Iliopoulos I, Nikolaou C. Distinct chromosomal “niches” in the genome of Saccharomyces cerevisiae provide the background for genomic innovation and shape the fate of gene duplicates. NAR Genom Bioinform 2022; 4:lqac086. [PMID: 36381424 PMCID: PMC9661399 DOI: 10.1093/nargab/lqac086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Revised: 10/20/2022] [Accepted: 10/25/2022] [Indexed: 11/15/2022] Open
Abstract
Nearly one third of Saccharomyces cerevisiae protein coding sequences correspond to duplicate genes, equally split between small-scale duplicates (SSD) and whole-genome duplicates (WGD). While duplicate genes have distinct properties compared to singletons, to date, there has been no systematic analysis of their positional preferences. In this work, we show that SSD and WGD genes are organized in distinct gene clusters that occupy different genomic regions, with SSD being more peripheral and WGD more centrally positioned close to centromeric chromatin. Duplicate gene clusters differ from the rest of the genome in terms of gene size and spacing, gene expression variability and regulatory complexity, properties that are also shared by singleton genes residing within them. Singletons within duplicate gene clusters have longer promoters, more complex structure and a higher number of protein–protein interactions. Particular chromatin architectures appear to be important for gene evolution, as we find SSD gene-pair co-expression to be strongly associated with the similarity of nucleosome positioning patterns. We propose that specific regions of the yeast genome provide a favourable environment for the generation and maintenance of small-scale gene duplicates, segregating them from WGD-enriched genomic domains. Our findings provide a valuable framework linking genomic innovation with positional genomic preferences.
Affiliation(s)
- Athanasia Stavropoulou
  - Medical School, University of Crete, Heraklion 70013, Greece
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
- Emilios Tassios
  - Medical School, University of Crete, Heraklion 70013, Greece
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
- Maria Kalyva
  - European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Nikolaos Vakirlis
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
- Christoforos Nikolaou
  - Computational Genomics Group, Biomedical Sciences Research Center “Alexander Fleming”, Athens 16672, Greece
  - Hellenic Open University, Patras 26335, Greece
3. Van Dyke K, Lutz S, Mekonnen G, Myers CL, Albert FW. Trans-acting genetic variation affects the expression of adjacent genes. Genetics 2021; 217:6126816. [PMID: 33789351 DOI: 10.1093/genetics/iyaa051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/13/2020] [Accepted: 12/16/2020] [Indexed: 11/13/2022] Open
Abstract
Gene expression differences among individuals are shaped by trans-acting expression quantitative trait loci (eQTLs). Most trans-eQTLs map to hotspot locations that influence many genes. The molecular mechanisms perturbed by hotspots are often assumed to involve "vertical" cascades of effects in pathways that can ultimately affect the expression of thousands of genes. Here, we report that trans-eQTLs can affect the expression of adjacent genes via "horizontal" mechanisms that extend along a chromosome. Genes affected by trans-eQTL hotspots in the yeast Saccharomyces cerevisiae were more likely to be located next to each other than expected by chance. These paired hotspot effects tended to occur at adjacent genes that also show coexpression in response to genetic and environmental perturbations, suggesting shared mechanisms. Physical proximity and shared chromatin state, in addition to regulation of adjacent genes by similar transcription factors, were independently associated with paired hotspot effects among adjacent genes. Paired effects of trans-eQTLs can occur at neighboring genes even when these genes do not share a common function. This phenomenon could result in unexpected connections between regulatory genetic variation and phenotypes.
Affiliation(s)
- Krisna Van Dyke
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Sheila Lutz
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Gemechu Mekonnen
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Chad L Myers
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Frank W Albert
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
4. Hasan AR, Ness RW. Recombination Rate Variation and Infrequent Sex Influence Genetic Diversity in Chlamydomonas reinhardtii. Genome Biol Evol 2021; 12:370-380. [PMID: 32181819 PMCID: PMC7186780 DOI: 10.1093/gbe/evaa057] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Accepted: 03/13/2020] [Indexed: 12/12/2022] Open
Abstract
Recombination confers a major evolutionary advantage by breaking up linkage disequilibrium between harmful and beneficial mutations, thereby facilitating selection. However, in species that are only periodically sexual, such as many microbial eukaryotes, the realized rate of recombination is also affected by the frequency of sex, meaning that infrequent sex can increase the effects of selection at linked sites despite high recombination rates. Despite this, the rate of sex of most facultatively sexual species is unknown. Here, we use genomewide patterns of linkage disequilibrium to infer fine-scale recombination rate variation in the genome of the facultatively sexual green alga Chlamydomonas reinhardtii. We observe recombination rate variation of up to two orders of magnitude and find evidence of recombination hotspots across the genome. Recombination rate is highest flanking genes, consistent with trends observed in other nonmammalian organisms, though intergenic recombination rates vary by intergenic tract length. We also find a positive relationship between nucleotide diversity and physical recombination rate, suggesting a widespread influence of selection at linked sites in the genome. Finally, we use estimates of the effective rate of recombination to calculate the rate of sex that occurs in natural populations, estimating a sexual cycle roughly every 840 generations. We argue that the relatively infrequent rate of sex and large effective population size creates a population genetic environment that increases the influence of selection on linked sites across the genome.
Affiliation(s)
- Ahmed R Hasan
  - Department of Cell and Systems Biology, University of Toronto, Ontario, Canada
  - Department of Biology, University of Toronto Mississauga, Ontario, Canada
- Rob W Ness
  - Department of Cell and Systems Biology, University of Toronto, Ontario, Canada
  - Department of Biology, University of Toronto Mississauga, Ontario, Canada
5. Wright BW, Ruan J, Molloy MP, Jaschke PR. Genome Modularization Reveals Overlapped Gene Topology Is Necessary for Efficient Viral Reproduction. ACS Synth Biol 2020; 9:3079-3090. [PMID: 33044064 DOI: 10.1021/acssynbio.0c00323] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
Sequence overlap between two genes is common across all genomes, with viruses having high proportions of these gene overlaps. Genome modularization and refactoring is the process of disrupting natural gene overlaps to separate coding sequences to enable their individual manipulation. The biological function and fitness effects of gene overlaps are not fully understood, and their effects on gene cluster and genome-level refactoring are unknown. The bacteriophage φX174 genome has ∼26% of nucleotides involved in encoding more than one gene. In this study we use an engineered φX174 phage containing a genome with all gene overlaps removed to show that gene overlap is critical to maintaining optimal viral fecundity. Through detailed phenotypic measurements we reveal that genome modularization in φX174 causes virion replication, stability, and attachment deficiencies. Quantitation of the complete phage proteome across an infection cycle reveals 30% of proteins display abnormal expression patterns. Taken together, we have for the first time comprehensively demonstrated that gene modularization severely perturbs the coordinated functioning of a bacteriophage replication cycle. This work highlights the biological importance of gene overlap in natural genomes and that reducing gene overlap disruption should be an integral part of future genome engineering projects.
Affiliation(s)
- Bradley W. Wright
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Juanfang Ruan
  - Electron Microscope Unit, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Mark P. Molloy
  - Kolling Institute, Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
- Paul R. Jaschke
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
6. Gilet J, Conte R, Torchet C, Benard L, Lafontaine I. Additional Layer of Regulation via Convergent Gene Orientation in Yeasts. Mol Biol Evol 2020; 37:365-378. [PMID: 31580446 PMCID: PMC6993858 DOI: 10.1093/molbev/msz221] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022] Open
Abstract
Convergent gene pairs can produce transcripts with complementary sequences. We had shown that mRNA duplexes form in vivo in Saccharomyces cerevisiae via interactions of overlapping mRNA 3′-ends and can lead to posttranscriptional regulatory events. Here we show that mRNA duplex formation is restricted to convergent genes separated by a short intergenic distance, independently of their 3′-untranslated region (UTR) length. We disclose an enrichment in genes involved in biological processes related to stress among these convergent genes. They are markedly conserved in convergent orientation in budding yeasts, meaning that this mode of posttranscriptional regulation could be shared in these organisms, conferring an additional level for modulating stress response. We thus investigated the mechanistic advantages potentially conferred by 3′-UTR mRNA interactions. Analysis of genome-wide transcriptome data revealed that the Pat1 and Lsm1 factors, which have a 3′-UTR binding preference and participate in the remodeling of messenger ribonucleoprotein particles, bind these duplex-forming messenger-interacting mRNAs differently from mRNAs that do not interact (solo mRNAs). Functionally, messenger-interacting mRNAs show limited translational repression upon stress. We thus propose that mRNA duplex formation modulates the regulation of mRNA expression by limiting their access to translational repressors. Our results thus show that posttranscriptional regulation is an additional factor that determines the order of coding genes.
Affiliation(s)
- Jules Gilet
  - Institut de Biologie Physico-Chimique, UMR7141 Laboratoire de Biologie du Chloroplaste et Perception de la Lumière chez les Microalgues, CNRS, Sorbonne Université, Paris, France
  - Institut de Biologie Physico-Chimique, UMR8226, CNRS, Sorbonne Université, Laboratoire de Biologie Moléculaire et Cellulaire des Eucaryotes, Paris, France
- Romain Conte
  - Institut de Biologie Physico-Chimique, UMR7141 Laboratoire de Biologie du Chloroplaste et Perception de la Lumière chez les Microalgues, CNRS, Sorbonne Université, Paris, France
  - Institut de Biologie Physico-Chimique, UMR8226, CNRS, Sorbonne Université, Laboratoire de Biologie Moléculaire et Cellulaire des Eucaryotes, Paris, France
- Claire Torchet
  - Institut de Biologie Physico-Chimique, UMR8226, CNRS, Sorbonne Université, Laboratoire de Biologie Moléculaire et Cellulaire des Eucaryotes, Paris, France
- Lionel Benard
  - Institut de Biologie Physico-Chimique, UMR8226, CNRS, Sorbonne Université, Laboratoire de Biologie Moléculaire et Cellulaire des Eucaryotes, Paris, France
- Ingrid Lafontaine
  - Institut de Biologie Physico-Chimique, UMR7141 Laboratoire de Biologie du Chloroplaste et Perception de la Lumière chez les Microalgues, CNRS, Sorbonne Université, Paris, France
  - Institut de Biologie Physico-Chimique, FRC 550, CNRS, Paris, France
7. Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype. Sci Rep 2019; 9:19537. [PMID: 31863070 PMCID: PMC6925100 DOI: 10.1038/s41598-019-55984-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/26/2019] [Accepted: 12/02/2019] [Indexed: 01/01/2023] Open
Abstract
Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions – corresponding to semantically distant Gene Ontology terms – are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict phenotype of individuals from their genome sequence.
8. Comparative Analysis of Intra- and Inter-Specific Genomic Variability in the Peach Potato Aphid, Myzus persicae. Insects 2019; 10:insects10100368. [PMID: 31652640 PMCID: PMC6835256 DOI: 10.3390/insects10100368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/24/2019] [Revised: 10/09/2019] [Accepted: 10/17/2019] [Indexed: 12/20/2022]
Abstract
The availability of genomic data in the last decade relating to different aphid species has allowed the analysis of the genomic variability occurring among such species, whereas intra-specific variability has hitherto very largely been neglected. In order to analyse the intra-genomic variability in the peach potato aphid, Myzus persicae, comparative analyses were performed revealing several clone-specific gene duplications, together with numerous deletions/rearrangements. Our comparative approach also allowed us to evaluate the synteny existing between the two M. persicae clones tested and between the peach potato aphid and the pea aphid, Acyrthosiphon pisum. Even if part of the observed rearrangements are related to a low quality of some assembled contigs and/or to the high number of contigs present in these aphid genomes, our evidence reveals that aphid clones are genetically more different than expected. These results suggest that the choice of performing genome sequencing combining different biotypes/populations, as revealed in the case of the soybean aphid, Aphis glycines, is unlikely to be very informative in aphids. Interestingly, it is possible that the holocentric nature of aphid chromosomes favours genome rearrangements that can be successively inherited transgenerationally via the aphid's apomictic (parthenogenetic) mode of reproduction. Lastly, we evaluated the structure of the cluster of genes coding for the five histones (H1, H2A, H2B, H3 and H4) in order to better understand the quality of the two M. persicae genomes and thereby to improve our knowledge of this functionally important gene family.
9. Salazar AN, Abeel T. Approximate, simultaneous comparison of microbial genome architectures via syntenic anchoring of quiver representations. Bioinformatics 2018; 34:i732-i742. [PMID: 30423098 PMCID: PMC6129293 DOI: 10.1093/bioinformatics/bty614] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022] Open
Abstract
Motivation: A long-standing limitation in comparative genomic studies is the dependency on a reference genome, which hinders the spectrum of genetic diversity that can be identified across a population of organisms. This is especially true in the microbial world where genome architectures can significantly vary. There is therefore a need for computational methods that can simultaneously analyze the architectures of multiple genomes without introducing bias from a reference. Results: In this article, we present Ptolemy: a novel method for studying the diversity of genome architectures, such as structural variation and pan-genomes, across a collection of microbial assemblies without the need for a reference. Ptolemy is a 'top-down' approach to compare whole genome assemblies. Genomes are represented as labeled multi-directed graphs, known as quivers, which are then merged into a single, canonical quiver by identifying 'gene anchors' via synteny analysis. The canonical quiver represents an approximate, structural alignment of all genomes in a given collection, encoding structural variation across (sub-) populations within the collection. We highlight various applications of Ptolemy by analyzing structural variation and the pan-genomes of different datasets composed of Mycobacterium, Saccharomyces, Escherichia and Shigella species. Our results show that Ptolemy is flexible and can handle both conserved and highly dynamic genome architectures. Ptolemy is user-friendly (it requires only a FASTA-formatted assembly along with a corresponding GFF-formatted file) and resource-friendly (it can align 24 genomes in ∼10 min with four CPUs and <2 GB of RAM). Availability and implementation: GitHub: https://github.com/AbeelLab/ptolemy. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alex N Salazar
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Thomas Abeel
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
10. Vakirlis N, Sarilar V, Drillon G, Fleiss A, Agier N, Meyniel JP, Blanpain L, Carbone A, Devillers H, Dubois K, Gillet-Markowska A, Graziani S, Huu-Vang N, Poirel M, Reisser C, Schott J, Schacherer J, Lafontaine I, Llorente B, Neuvéglise C, Fischer G. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus. Genome Res 2016; 26:918-32. [PMID: 27247244 PMCID: PMC4937564 DOI: 10.1101/gr.204420.116] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 01/14/2016] [Accepted: 04/28/2016] [Indexed: 12/22/2022]
Abstract
Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene duplications outnumbered losses and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework.
Affiliation(s)
- Nikolaos Vakirlis
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Véronique Sarilar
  - Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Guénola Drillon
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Aubin Fleiss
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Nicolas Agier
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Jean-Philippe Meyniel
  - ISoft, Route de l'Orme, Parc "Les Algorithmes" Bâtiment Euclide, 91190 Saint-Aubin, France
- Lou Blanpain
  - Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Alessandra Carbone
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Hugo Devillers
  - Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Kenny Dubois
  - CRCM, CNRS, UMR7258, Inserm, U1068; Institut Paoli-Calmettes, Aix-Marseille Université, UM 105, F-13009, Marseille, France
- Alexandre Gillet-Markowska
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Stéphane Graziani
  - ISoft, Route de l'Orme, Parc "Les Algorithmes" Bâtiment Euclide, 91190 Saint-Aubin, France
- Nguyen Huu-Vang
  - Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Marion Poirel
  - ISoft, Route de l'Orme, Parc "Les Algorithmes" Bâtiment Euclide, 91190 Saint-Aubin, France
- Cyrielle Reisser
  - Department of Genetics, Genomics and Microbiology, University of Strasbourg/CNRS, UMR 7156, 67083 Strasbourg, France
- Jonathan Schott
  - CRCM, CNRS, UMR7258, Inserm, U1068; Institut Paoli-Calmettes, Aix-Marseille Université, UM 105, F-13009, Marseille, France
- Joseph Schacherer
  - Department of Genetics, Genomics and Microbiology, University of Strasbourg/CNRS, UMR 7156, 67083 Strasbourg, France
- Ingrid Lafontaine
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Bertrand Llorente
  - CRCM, CNRS, UMR7258, Inserm, U1068; Institut Paoli-Calmettes, Aix-Marseille Université, UM 105, F-13009, Marseille, France
- Cécile Neuvéglise
  - Micalis Institute, INRA, AgroParisTech, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Gilles Fischer
  - Sorbonne Universités, UPMC Univ. Paris 06, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
11. Riadi G, Ossandón F, Larraín J, Melo F. Towards the bridging of molecular genetics data across Xenopus species. BMC Genomics 2016; 17:161. [PMID: 26925848 PMCID: PMC4772642 DOI: 10.1186/s12864-016-2440-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/25/2015] [Accepted: 02/05/2016] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND: The clawed African frog Xenopus laevis has been one of the main vertebrate models for studies in developmental biology. However, for genetic studies, Xenopus tropicalis has been the experimental model of choice because of its shorter life cycle and its more tractable genome, which does not result from genome duplication as in the case of X. laevis. Today, although still organized in a large number of scaffolds, nearly 85% of the X. tropicalis and 89% of the X. laevis genomes have been sequenced. There is expectation for a comparative physical map that can be used as a Rosetta Stone between X. laevis genetic studies and X. tropicalis genomic research. RESULTS: In this work, we have mapped, using coarse-grained alignment, the 18 chromosomes of X. laevis, release 9.1, onto the 10 reference scaffolds representing the haploid genome of X. tropicalis, release 9.0. After validating the mapping with theoretical data and estimating reference averages of genome sequence identity (37 to 44% between the two species), we carried out a synteny analysis for 2,112 orthologous genes. We found that 99.6% of genes are in the same organization. CONCLUSIONS: Taken together, our results make it possible to establish the correspondence between 62 and 65.5% of both genomes, percentage of identity, synteny and automatic annotation of transcripts of both species, providing a new and more comprehensive tool for comparative analysis of these two species by allowing molecular genetics data to be bridged between them.
Collapse
Affiliation(s)
- Gonzalo Riadi
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile. .,Centro de Bioinformática y Simulación Molecular, Facultad de Ingeniería, Universidad de Talca, Talca, Chile.
| | | | - Juan Larraín
- Center for Aging and Regeneration and Millennium Nucleus in Regenerative Biology, Santiago, Chile.
| | - Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
|
12
|
Three-dimensional Genomic Organization of Genes’ Function in Eukaryotes. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
13
|
Wang D. DLGP: A database for lineage-conserved and lineage-specific gene pairs in animal and plant genomes. Biochem Biophys Res Commun 2015; 469:542-5. [PMID: 26697753 DOI: 10.1016/j.bbrc.2015.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2015] [Accepted: 12/10/2015] [Indexed: 10/22/2022]
Abstract
The conservation of gene organization in the genome with lineage-specificity is an invaluable resource to decipher their potential functionality with diverse selective constraints, especially in higher animals and plants. Gene pairs appear to be the minimal structure for such kind of gene clusters that tend to reside in their preferred locations, representing the distinctive genomic characteristics in single species or a given lineage. Despite gene families having been investigated in a widespread manner, the definition of gene pair families in various taxa still lacks adequate attention. To address this issue, we report DLGP (http://lcgbase.big.ac.cn/DLGP/) that stores the pre-calculated lineage-based gene pairs in currently available 134 animal and plant genomes and inspect them under the same analytical framework, bringing out a set of innovational features. First, the taxonomy or lineage has been classified into four levels such as Kingdom, Phylum, Class and Order. It adopts all-to-all comparison strategy to identify the possible conserved gene pairs in all species for each gene pair in certain species and reckon those that are conserved in over a significant proportion of species in a given lineage (e.g. Primates, Diptera or Poales) as the lineage-conserved gene pairs. Furthermore, it predicts the lineage-specific gene pairs by retaining the above-mentioned lineage-conserved gene pairs that are not conserved in any other lineages. Second, it carries out pairwise comparison for the gene pairs between two compared species and creates the table including all the conserved gene pairs and the image elucidating the conservation degree of gene pairs in chromosomal level. Third, it supplies gene order browser to extend gene pairs to gene clusters, allowing users to view the evolution dynamics in the gene context in an intuitive manner. 
This database will be able to facilitate the particular comparison between animals and plants, between vertebrates and arthropods, and between monocots and eudicots, accounting for the significant contribution of gene pairs to speciation and diversification in specific lineages.
Affiliation(s)
- Dapeng Wang
- Stem Cell Laboratory, UCL Cancer Institute, University College London, London WC1E 6BT, UK; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, PR China.
|
14
|
Tang H, Bomhoff MD, Briones E, Zhang L, Schnable JC, Lyons E. SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand. Genome Biol Evol 2015; 7:3286-98. [PMID: 26560340 PMCID: PMC4700967 DOI: 10.1093/gbe/evv219] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 01/01/2023] Open
Abstract
The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc.
Affiliation(s)
- Haibao Tang
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China; School of Plant Sciences, iPlant Collaborative, University of Arizona
- Matthew D Bomhoff
- School of Plant Sciences, iPlant Collaborative, University of Arizona
- Evan Briones
- School of Plant Sciences, iPlant Collaborative, University of Arizona
- Liangsheng Zhang
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China
- James C Schnable
- Department of Agronomy and Horticulture, University of Nebraska, Lincoln
- Eric Lyons
- School of Plant Sciences, iPlant Collaborative, University of Arizona
|
15
|
Diament A, Tuller T. Improving 3D Genome Reconstructions Using Orthologous and Functional Constraints. PLoS Comput Biol 2015; 11:e1004298. [PMID: 26000633 PMCID: PMC4441392 DOI: 10.1371/journal.pcbi.1004298] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/01/2015] [Accepted: 04/24/2015] [Indexed: 11/19/2022] Open
Abstract
The study of the 3D architecture of chromosomes has been advancing rapidly in recent years. While a number of methods for 3D reconstruction of genomic models based on Hi-C data were proposed, most of the analyses in the field have been performed on different 3D representation forms (such as graphs). Here, we reproduce most of the previous results on the 3D genomic organization of the eukaryote Saccharomyces cerevisiae using analysis of 3D reconstructions. We show that many of these results can be reproduced in sparse reconstructions, generated from a small fraction of the experimental data (5% of the data), and study the properties of such models. Finally, we propose for the first time a novel approach for improving the accuracy of 3D reconstructions by introducing additional predicted physical interactions to the model, based on orthologous interactions in an evolutionary-related organism and based on predicted functional interactions between genes. We demonstrate that this approach indeed leads to the reconstruction of improved models. Understanding the importance of genome architecture, the arrangement of genes within the genome and how this organization evolved has been intensively studied in recent years. Despite rapid progress in the field, accurate 3D modeling of genome organization remains a challenge. While a number of methods for 3D reconstruction of genomic models based on genome-wide experimental data were proposed, most of the analyses in the field have been performed on different 3D representation forms (such as graphs). Here, we reproduce most of the previous results on the 3D genome organization of the eukaryote Saccharomyces cerevisiae using analysis of 3D reconstructions. We show that many of these results can be reproduced in sparse reconstructions, generated from a small fraction of the experimental data (5% of the data), and study the properties of such models. 
Finally, we propose for the first time a novel approach for improving the accuracy of 3D reconstructions by introducing additional predicted physical interactions to the model, based on orthologous interactions in a different organism and based on predicted functional interactions between genes. Our proposed approach can facilitate future studies of 3D genome organization via improved models.
Affiliation(s)
- Alon Diament
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- Tamir Tuller
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
|
16
|
Abstract
When considering the evolution of a gene’s expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking.
Affiliation(s)
- Avazeh T Ghanbarian
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
17
|
Diament A, Pinter RY, Tuller T. Three-dimensional eukaryotic genomic organization is strongly correlated with codon usage expression and function. Nat Commun 2014; 5:5876. [PMID: 25510862 DOI: 10.1038/ncomms6876] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/28/2014] [Accepted: 11/17/2014] [Indexed: 01/08/2023] Open
Abstract
It has been shown that the distribution of genes in eukaryotic genomes is not random; however, formerly reported relations between gene function and genomic organization were relatively weak. Previous studies have demonstrated that codon usage bias is related to all stages of gene expression and to protein function. Here we apply a novel tool for assessing functional relatedness, codon usage frequency similarity (CUFS), which measures similarity between genes in terms of codon and amino acid usage. By analyzing chromosome conformation capture data, describing the three-dimensional (3D) conformation of the DNA, we show that the functional similarity between genes captured by CUFS is directly and very strongly correlated with their 3D distance in Saccharomyces cerevisiae, Schizosaccharomyces pombe, Arabidopsis thaliana, mouse and human. This emphasizes the importance of three-dimensional genomic localization in eukaryotes and indicates that codon usage is tightly linked to genome architecture.
Affiliation(s)
- Alon Diament
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel
- Ron Y Pinter
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Tamir Tuller
- [1] Department of Biomedical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel [2] The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
|
18
|
DBC1/CCAR2 and CCAR1 Are Largely Disordered Proteins that Have Evolved from One Common Ancestor. BIOMED RESEARCH INTERNATIONAL 2014; 2014:418458. [PMID: 25610865 PMCID: PMC4287135 DOI: 10.1155/2014/418458] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/11/2014] [Revised: 09/18/2014] [Accepted: 09/18/2014] [Indexed: 01/07/2023]
Abstract
Deleted in breast cancer 1 (DBC1, CCAR2, KIAA1967) is a large, predominantly nuclear, multidomain protein that modulates gene expression by inhibiting several epigenetic modifiers, including the deacetylases SIRT1 and HDAC3, and the methyltransferase SUV39H1. DBC1 shares many highly conserved protein domains with its paralog cell cycle and apoptosis regulator 1 (CCAR1, CARP-1). In this study, we examined the full-length sequential and structural properties of DBC1 and CCAR1 from multiple species and correlated these properties with evolution. Our data shows that the conserved domains shared between DBC1 and CCAR1 have similar domain structures, as well as similar patterns of predicted disorder in less-conserved intrinsically disordered regions. Our analysis indicates similarities between DBC1, CCAR1, and the nematode protein lateral signaling target 3 (LST-3), suggesting that DBC1 and CCAR1 may have evolved from LST-3. Our data also suggests that DBC1 emerged later in evolution than CCAR1. DBC1 contains regions that show less conservation across species as compared to the same regions in CCAR1, suggesting a continuously evolving scenario for DBC1. Overall, this study provides insight into the structure and evolution of DBC1 and CCAR1, which may impact future studies on the biological functions of these proteins.
|
19
|
Dai Z, Xiong Y, Dai X. Neighboring genes show interchromosomal colocalization after their separation. Mol Biol Evol 2014; 31:1166-72. [PMID: 24505120 DOI: 10.1093/molbev/msu065] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023] Open
Abstract
The order of genes on eukaryotic chromosomes is nonrandom. Some neighboring genes show order conservation among species, while some neighboring genes separate during evolution. Here, we investigated whether neighboring genes show interactions after their separation. We found that neighboring gene pairs tend to show interchromosomal colocalization (i.e., nuclear colocalization) in the species in which they separate. These nuclear colocalized separated neighboring gene pairs 1) show neighborhood conservation in more species, 2) tend to be regulated by the same transcription factor, and 3) tend to be regulated by the same histone modification. These results suggest a mechanism by which neighboring genes could retain nuclear proximity after their separation.
Affiliation(s)
- Zhiming Dai
- Department of Electronics and Communication Engineering, School of Information Science and Technology, Sun Yat-Sen University, Guangzhou, China
|
20
|
Rubin AF, Green P. Expression-based segmentation of the Drosophila genome. BMC Genomics 2013; 14:812. [PMID: 24256206 PMCID: PMC3909303 DOI: 10.1186/1471-2164-14-812] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/27/2013] [Accepted: 11/18/2013] [Indexed: 01/22/2023] Open
Abstract
Background It is generally accepted that gene order in eukaryotes is nonrandom, with adjacent genes often sharing expression patterns across tissues, and that this organization may be important for gene regulation. Here we describe a novel method, based on an explicit probability model instead of correlation analysis, for identifying coordinately expressed gene clusters (‘coexpression segments’), apply it to Drosophila melanogaster, and look for epigenetic associations using publicly available data. Results We find that two-thirds of Drosophila genes fall into multigenic coexpression segments, and that such segments are of two main types, housekeeping and tissue-restricted. Consistent with correlation-based studies, we find that adjacent genes within the same segment tend to be physically closer to each other than to the adjacent genes in different segments, and that tissue-restricted segments are enriched for testis-expressed genes. Our segmentation pattern correlates with Hi-C based physical interaction domains, but segments are generally much smaller than domains. Intersegment regions (including those which do not correspond to physical domain boundaries) are enriched for insulator binding sites. Conclusions We describe a novel approach for identifying coexpression clusters that does not require arbitrary cutoff values or heuristics, and find that coexpression of adjacent genes is widespread in the Drosophila genome. Coexpression segments appear to reflect a level of regulatory organization related to, but below that of physical interaction domains, and depending in part on insulator binding.
Affiliation(s)
- Alan F Rubin
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
|
21
|
Physical linkage of metabolic genes in fungi is an adaptation against the accumulation of toxic intermediate compounds. Proc Natl Acad Sci U S A 2013; 110:11481-6. [PMID: 23798424 DOI: 10.1073/pnas.1304461110] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 11/18/2022] Open
Abstract
Genomic analyses have proliferated without being tied to tangible phenotypes. For example, although coordination of both gene expression and genetic linkage have been offered as genetic mechanisms for the frequently observed clustering of genes participating in fungal metabolic pathways, elucidation of the phenotype(s) favored by selection, resulting in cluster formation and maintenance, has not been forthcoming. We noted that the cause of certain well-studied human metabolic disorders is the accumulation of toxic intermediate compounds (ICs), which occurs when the product of an enzyme is not used as a substrate by a downstream neighbor in the metabolic network. This raises the hypothesis that the phenotype favored by selection to drive gene clustering is the mitigation of IC toxicity. To test this, we examined 100 diverse fungal genomes for the simplest type of cluster, gene pairs that are both metabolic neighbors and chromosomal neighbors immediately adjacent to each other, which we refer to as "double neighbor gene pairs" (DNGPs). Examination of the toxicity of their corresponding ICs shows that, compared with chromosomally nonadjacent metabolic neighbors, DNGPs are enriched for ICs that have acutely toxic LD50 doses or reactive functional groups. Furthermore, DNGPs are significantly more likely to be divergently oriented on the chromosome; remarkably, ∼40% of these DNGPs have ICs known to be toxic. We submit that the structure of synteny in metabolic pathways of fungi is a signature of selection for protection against the accumulation of toxic metabolic intermediates.
|
22
|
Díaz-Castillo C. Females and males contribute in opposite ways to the evolution of gene order in Drosophila. PLoS One 2013; 8:e64491. [PMID: 23696898 PMCID: PMC3655977 DOI: 10.1371/journal.pone.0064491] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/09/2012] [Accepted: 04/16/2013] [Indexed: 11/19/2022] Open
Abstract
An intriguing association between the spatial layout of chromosomes within nuclei and the evolution of chromosome gene order was recently uncovered. Chromosome regions with conserved gene order in the Drosophila genus are larger if they interact with the inner side of the nuclear envelope in D. melanogaster somatic cells. This observation opens a new door to understand the evolution of chromosomes in the light of the dynamics of the spatial layout of chromosomes and the way double-strand breaks are repaired in D. melanogaster germ lines. Chromosome regions at the nuclear periphery in somatic cell nuclei relocate to more internal locations of male germ line cell nuclei, which might prefer a gene order-preserving mechanism to repair double-strand breaks. Conversely, chromosome regions at the nuclear periphery in somatic cells keep their location in female germ line cell nuclei, which might be inaccessible for cellular machinery that causes gene order-disrupting chromosome rearrangements. Thus, the gene order stability for genome regions at the periphery of somatic cell nuclei might result from the active repair of double-strand breaks using conservative mechanisms in male germ line cells, and the passive inaccessibility for gene order-disrupting factors at the periphery of nuclei of female germ line cells. In the present article, I find evidence consistent with a DNA break repair-based differential contribution of both D. melanogaster germ lines to the stability/disruption of gene order. The importance of germ line differences for the layout of chromosomes and DNA break repair strategies with regard to other genomic patterns is briefly discussed.
|
23
|
Lemay DG, Martin WF, Hinrichs AS, Rijnkels M, German JB, Korf I, Pollard KS. G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes. BMC Bioinformatics 2012; 13:253. [PMID: 23020263 PMCID: PMC3575404 DOI: 10.1186/1471-2105-13-253] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/31/2012] [Accepted: 09/23/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. RESULTS Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. CONCLUSIONS Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains.
The software is available at http://docpollard.org/software.html.
Affiliation(s)
- Danielle G Lemay
- Genome Center, University of California Davis, 451 Health Science Dr, Davis, CA, 95616, United States of America.
|
24
|
Collins PL, Henderson MA, Aune TM. Lineage-specific adjacent IFNG and IL26 genes share a common distal enhancer element. Genes Immun 2012; 13:481-8. [PMID: 22622197 PMCID: PMC4180225 DOI: 10.1038/gene.2012.22] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 03/08/2012] [Accepted: 04/23/2012] [Indexed: 12/24/2022]
Abstract
Certain groups of physically linked genes remain linked over long periods of evolutionary time. The general view is that such evolutionary conservation confers 'fitness' to the species. Why gene order confers 'fitness' to the species is incompletely understood. For example, linkage of IL26 and IFNG is preserved over evolutionary time yet Th17 lineages express IL26 and Th1 lineages express IFNG. We considered the hypothesis that distal enhancer elements may be shared between adjacent genes, which would require linkage be maintained in evolution. We test this hypothesis using a bacterial artificial chromosome transgenic model with deletions of specific conserved non-coding sequences. We identify one enhancer element uniquely required for IL26 expression but not for IFNG expression. We identify a second enhancer element positioned between IL26 and IFNG required for both IL26 and IFNG expression. One function of this enhancer is to facilitate recruitment of RNA polymerase II to promoters of both genes. Thus, sharing of distal enhancers between adjacent genes may contribute to evolutionary preservation of gene order.
Affiliation(s)
- P L Collins
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232-2681, USA
|
25
|
Irimia M, Tena JJ, Alexis MS, Fernandez-Miñan A, Maeso I, Bogdanovic O, de la Calle-Mustienes E, Roy SW, Gómez-Skarmeta JL, Fraser HB. Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints. Genome Res 2012; 22:2356-67. [PMID: 22722344 PMCID: PMC3514665 DOI: 10.1101/gr.139725.112] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Indexed: 12/22/2022]
Abstract
The order of genes in eukaryotic genomes has generally been assumed to be neutral, since gene order is largely scrambled over evolutionary time. Only a handful of exceptional examples are known, typically involving deeply conserved clusters of tandemly duplicated genes (e.g., Hox genes and histones). Here we report the first systematic survey of microsynteny conservation across metazoans, utilizing 17 genome sequences. We identified nearly 600 pairs of unrelated genes that have remained tightly physically linked in diverse lineages across over 600 million years of evolution. Integrating sequence conservation, gene expression data, gene function, epigenetic marks, and other genomic features, we provide extensive evidence that many conserved ancient linkages involve (1) the coordinated transcription of neighboring genes, or (2) genomic regulatory blocks (GRBs) in which transcriptional enhancers controlling developmental genes are contained within nearby bystander genes. In addition, we generated ChIP-seq data for key histone modifications in zebrafish embryos, which provided further evidence of putative GRBs in embryonic development. Finally, using chromosome conformation capture (3C) assays and stable transgenic experiments, we demonstrate that enhancers within bystander genes drive the expression of genes such as Otx and Islet, critical regulators of central nervous system development across bilaterians. These results suggest that ancient genomic functional associations are far more common than previously thought—involving ∼12% of the ancestral bilaterian genome—and that cis-regulatory constraints are crucial in determining metazoan genome architecture.
Affiliation(s)
- Manuel Irimia
- Department of Biology, Stanford University, Stanford, California 94305, USA
|
26
|
Wang GZ, Chen WH, Lercher MJ. Coexpression of linked gene pairs persists long after their separation. Genome Biol Evol 2011; 3:565-70. [PMID: 21737396 PMCID: PMC3156566 DOI: 10.1093/gbe/evr049] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/29/2022] Open
Abstract
In many eukaryotes, physically linked gene pairs tend to be coexpressed. However, it is still controversial to what extent this neighbor coexpression is maintained by selection and to what extent it is nonselective, purely mechanistic "leaky expression." Here, we analyze expression patterns of gene pairs that have lost their linkage in the evolution of Saccharomyces cerevisiae since its last common ancestor with Kluyveromyces waltii or that were never linked in the S. cerevisiae lineage but became neighbors in a related yeast. We demonstrate that coexpression of many linked genes is retained long after their separation and is thus likely to be functionally important. In addition, unlinked gene pairs that recently became neighbors in other yeast species tend to be coexpressed in S. cerevisiae. This suggests that natural selection often favors chromosomal rearrangements in which coexpressed genes become neighbors. Contrary to previous suggestions, selectively favorable coexpression appears not to be restricted to bidirectional promoters.
Affiliation(s)
- Guang-Zhong Wang
- Institute for Computer Science, Heinrich-Heine-University, 40225 Düsseldorf, Germany
|
27
|
Chen WH, Wei W, Lercher MJ. Minimal regulatory spaces in yeast genomes. BMC Genomics 2011; 12:320. [PMID: 21679449 PMCID: PMC3128071 DOI: 10.1186/1471-2164-12-320] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/07/2011] [Accepted: 06/16/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The regulatory information encoded in the DNA of promoter regions usually enforces a minimal, non-zero distance between the coding regions of neighboring genes. However, the size of this minimal regulatory space is not generally known. In particular, it is unclear if minimal promoter size differs between species and between uni- and bi-directionally acting regulatory regions. RESULTS Analyzing the genomes of 11 yeasts, we show that the lower size limit on promoter-containing regions is species-specific within a relatively narrow range (80-255 bp). This size limit applies equally to regions that initiate transcription on one or both strands, indicating that bi-directional promoters and uni-directional promoters are constrained similarly. We further find that young, species-specific regions are on average much longer than older regions, suggesting either a bias towards deletions or selection for genome compactness in yeasts. While the length evolution of promoter-less intergenic regions is well described by a simplistic, purely neutral model, regions containing promoters typically show an excess of unusually long regions. Regions flanked by divergently transcribed genes have a bi-modal length distribution, with short lengths found preferentially among older regions. These old, short regions likely harbor evolutionarily conserved bi-directionally active promoters. Surprisingly, some of the evolutionarily youngest regions in two of the eleven species (S. cerevisiae and K. waltii) are shorter than the lower limit observed in older regions. CONCLUSIONS The minimal chromosomal space required for transcriptional regulation appears to be relatively similar across yeast species, and is the same for uni-directional and bi-directional promoters. New intergenic regions created by genome rearrangements tend to evolve towards the more narrow size distribution found among older regions.
Affiliation(s)
- Wei-Hua Chen
- Institute for Computer Science, Heinrich-Heine-University Düsseldorf, Germany
|
28
|
Sugino RP, Innan H. Natural Selection on Gene Order in the Genome Reorganization Process After Whole-Genome Duplication of Yeast. Mol Biol Evol 2011; 29:71-9. [DOI: 10.1093/molbev/msr118] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/29/2022] Open
|
29
|
Weber CC, Hurst LD. Support for multiple classes of local expression clusters in Drosophila melanogaster, but no evidence for gene order conservation. Genome Biol 2011; 12:R23. [PMID: 21414197 PMCID: PMC3129673 DOI: 10.1186/gb-2011-12-3-r23] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 01/14/2011] [Revised: 03/04/2011] [Accepted: 03/17/2011] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: Gene order in eukaryotic genomes is not random, with genes with similar expression profiles tending to cluster. In yeasts, the model taxon for gene order analysis, such syntenic clusters of non-homologous genes tend to be conserved over evolutionary time. Whether similar clusters show gene order conservation in other lineages is, however, undecided. Here, we examine this issue in Drosophila melanogaster using high-resolution chromosome rearrangement data. RESULTS: We show that D. melanogaster has at least three classes of expression clusters: first, as observed in mammals, large clusters of functionally unrelated housekeeping genes; second, small clusters of functionally related highly co-expressed genes; and finally, as previously defined by Spellman and Rubin, larger domains of co-expressed but functionally unrelated genes. The latter are, however, not independent of the small co-expression clusters and likely reflect a methodological artifact. While the small co-expression and housekeeping/essential gene clusters resemble those observed in yeast, in contrast to yeast, we see no evidence that any of the three cluster types are preserved as synteny blocks. If anything, adjacent co-expressed genes are more likely to become rearranged than expected. Again in contrast to yeast, in D. melanogaster, gene pairs with short intergene distance or in divergent orientations tend to have higher rearrangement rates. These findings are consistent with co-expression being partly due to shared chromatin environment. CONCLUSIONS: We conclude that, while similar in terms of cluster types, gene order evolution has strikingly different patterns in yeasts and in D. melanogaster, although recombination is associated with gene order rearrangement in both.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
|
30
|
Wang GZ, Lercher MJ, Hurst LD. Transcriptional coupling of neighboring genes and gene expression noise: evidence that gene orientation and noncoding transcripts are modulators of noise. Genome Biol Evol 2011; 3:320-31. [PMID: 21402863 PMCID: PMC5654408 DOI: 10.1093/gbe/evr025] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/22/2022] Open
Abstract
How is noise in gene expression modulated? Do mechanisms of noise control impact genome organization? In yeast, the expression of one gene can affect that of a very close neighbor. As the effect is highly regionalized, we hypothesize that genes in different orientations will have differing degrees of coupled expression and, in turn, different noise levels. Divergently organized gene pairs, in particular those with bidirectional promoters, have close promoters, maximizing the likelihood that expression of one gene affects the neighbor. With more distant promoters, the same is less likely to hold for gene pairs in nondivergent orientation. Stochastic models suggest that coupled chromatin dynamics will typically result in low abundance-corrected noise (ACN). Transcription of noncoding RNA (ncRNA) from a bidirectional promoter, we thus hypothesize to be a noise-reduction, expression-priming, mechanism. The hypothesis correctly predicts that protein-coding genes with a bidirectional promoter, including those with a ncRNA partner, have lower ACN than other genes and divergent gene pairs uniquely have correlated ACN. Moreover, as predicted, ACN increases with the distance between promoters. The model also correctly predicts ncRNA transcripts to be often divergently transcribed from genes that a priori would be under selection for low noise (essential genes, protein complex genes) and that the latter genes should commonly reside in divergent orientation. Likewise, that genes with bidirectional promoters are rare subtelomerically, cluster together, and are enriched in essential gene clusters is expected and observed. We conclude that gene orientation and transcription of ncRNAs are candidate modulators of noise.
Affiliation(s)
- Guang-Zhong Wang
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
|
31
|
Muro EM, Mah N, Moreno-Hagelsieb G, Andrade-Navarro MA. The pseudogenes of Mycobacterium leprae reveal the functional relevance of gene order within operons. Nucleic Acids Res 2010; 39:1732-8. [PMID: 21051341 PMCID: PMC3061063 DOI: 10.1093/nar/gkq1067] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Abstract
Almost 50 years following the discovery of the prokaryotic operon, the functional relevance of gene order within operons remains unclear. In this work, we take advantage of the eroded genome of Mycobacterium leprae to add evidence supporting the notion that functionally less important genes have a tendency to be located at the end of its operons. M. leprae’s genome includes 1133 pseudogenes and 1614 protein-coding genes and can be compared with the close genome of M. tuberculosis. Assuming M. leprae’s pseudogenes to represent dispensable genes, we have studied the position of these pseudogenes in the operons of M. leprae and of their orthologs in M. tuberculosis. We observed that both tend to be located in the 3′ (downstream) half of the operon (P-values of 0.03 and 0.18, respectively). Analysis of pseudogenes in all available prokaryotic genomes confirms this trend (P-value of 7.1 × 10⁻⁷). In a complementary analysis, we found a significant tendency for essential genes to be located at the 5′ (upstream) half of the operon (P-value of 0.006). Our work provides an indication that, in prokarya, functionally less important genes have a tendency to be located at the end of operons, while more relevant genes tend to be located toward operon starts.
Affiliation(s)
- Enrique M Muro
- Computational Biology and Data Mining Group, Max Delbrück Center for Molecular Medicine, Robert-Rössle Strasse 10, 13125, Berlin, Germany.
|
32
|
Characterizing the metabolism of Dehalococcoides with a constraint-based model. PLoS Comput Biol 2010; 6. [PMID: 20811585 PMCID: PMC2930330 DOI: 10.1371/journal.pcbi.1000887] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 01/13/2010] [Accepted: 07/15/2010] [Indexed: 01/26/2023] Open
Abstract
Dehalococcoides strains respire a wide variety of chloro-organic compounds and are important for the bioremediation of toxic, persistent, carcinogenic, and ubiquitous ground water pollutants. In order to better understand metabolism and optimize their application, we have developed a pan-genome-scale metabolic network and constraint-based metabolic model of Dehalococcoides. The pan-genome was constructed from publicly available complete genome sequences of Dehalococcoides sp. strain CBDB1, strain 195, strain BAV1, and strain VS. We found that the Dehalococcoides pan-genome consisted of 1118 core genes (shared by all), 457 dispensable genes (shared by some), and 486 unique genes (found in only one genome). The model included 549 metabolic genes that encoded 356 proteins catalyzing 497 gene-associated model reactions. Of these 497 reactions, 477 were associated with core metabolic genes, 18 with dispensable genes, and 2 with unique genes. This study, in addition to analyzing the metabolism of an environmentally important phylogenetic group on a pan-genome scale, provides valuable insights into Dehalococcoides metabolic limitations, low growth yields, and energy conservation. The model also provides a framework to anchor and compare disparate experimental data, as well as to give insights on the physiological impact of "incomplete" pathways, such as the TCA-cycle, CO2 fixation, and cobalamin biosynthesis pathways. The model, referred to as iAI549, highlights the specialized and highly conserved nature of Dehalococcoides metabolism, and suggests that evolution of Dehalococcoides species is driven by the electron acceptor availability.
|
33
|
Dávila López M, Martínez Guerra JJ, Samuelsson T. Analysis of gene order conservation in eukaryotes identifies transcriptionally and functionally linked genes. PLoS One 2010; 5:e10654. [PMID: 20498846 PMCID: PMC2871058 DOI: 10.1371/journal.pone.0010654] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 02/19/2010] [Accepted: 04/26/2010] [Indexed: 01/03/2023] Open
Abstract
The order of genes in eukaryotes is not entirely random. Studies of gene order conservation are important to understand genome evolution and to reveal why certain neighboring genes are more difficult to separate during evolution. Here, genome-wide gene order information was compiled for 64 species, representing a wide variety of eukaryotic phyla. This information is presented in a browser where gene order may be displayed and compared between species. Factors related to non-random gene order in eukaryotes were examined by considering pairs of neighboring genes. The evolutionary conservation of gene pairs was studied with respect to relative transcriptional direction, intergenic distance and functional relationship as inferred by gene ontology. The results show that among gene pairs that are conserved the divergently and co-directionally transcribed genes are much more common than those that are convergently transcribed. Furthermore, highly conserved pairs, in particular those of fungi, are characterized by a short intergenic distance. Finally, gene pairs of metazoa and fungi that are evolutionarily conserved and that are divergently transcribed are much more likely to be related by function as compared to poorly conserved gene pairs. One example is the ribosomal protein gene pair L13/S16, which is unusual as it occurs both in fungi and alveolates. A specific functional relationship between these two proteins is also suggested by the fact that they are part of the same operon in both eubacteria and archaea. In conclusion, factors associated with non-random gene order in eukaryotes include relative gene orientation, intergenic distance and functional relationships. It seems likely that certain pairs of genes are conserved because the genes involved have a transcriptional and/or functional relationship. The results also indicate that studies of gene order conservation aid in identifying genes that are related in terms of transcriptional control.
Affiliation(s)
- Marcela Dávila López
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Göteborg, Sweden
- Juan José Martínez Guerra
- Departmento de Química, Centro de Ciencias Básicas, Universidad Autónoma de Aguascalientes, Aguascalientes, Aguascalientes, Mexico
- Tore Samuelsson
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Göteborg, Sweden
|
34
|
Zheng C, Kerr Wall P, Leebens-Mack J, dePamphilis C, Albert VA, Sankoff D. Gene loss under neighborhood selection following whole genome duplication and the reconstruction of the ancestral Populus genome. J Bioinform Comput Biol 2009; 7:499-520. [PMID: 19507287 DOI: 10.1142/s0219720009004199] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 07/18/2008] [Revised: 11/06/2008] [Accepted: 11/11/2008] [Indexed: 11/18/2022]
Abstract
We develop criteria to detect neighborhood selection effects on gene loss following whole genome duplication, and apply them to the recently sequenced poplar (Populus trichocarpa) genome. We improve on guided genome halving algorithms so that several thousand gene sets, each containing two paralogs in the descendant T of the doubling event and their single ortholog from an undoubled reference genome R, can be analyzed to reconstruct the ancestor A of T at the time of doubling. At the same time, large numbers of defective gene sets, either missing one paralog from T or missing their ortholog in R, may be incorporated into the analysis in a consistent way. We apply this genomic rearrangement distance-based approach to the poplar and grapevine (Vitis vinifera) genomes, as T and R respectively. We conclude that, after chromosome doubling, the "choice" of which paralogous gene pairs will lose copies is random, but that the retention of strings of single-copy genes on one chromosome versus the other is decidedly non-random.
Affiliation(s)
- Chunfang Zheng
- Department of Biology, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada.
|
35
|
Yanai I, Hunter CP. Comparison of diverse developmental transcriptomes reveals that coexpression of gene neighbors is not evolutionarily conserved. Genome Res 2009; 19:2214-20. [PMID: 19745112 DOI: 10.1101/gr.093815.109] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
Abstract
Genomic analyses have shown that adjacent genes are often coexpressed. However, it remains unclear whether the observed coexpression is a result of functional organization or a consequence of adjacent active chromatin or transcriptional read-through, which may be free of selective biases. Here, we compare temporal expression profiles of one-to-one orthologs in conserved or divergent genomic positions in two genetically distant nematode species (Caenorhabditis elegans and C. briggsae) that share a near-identical developmental program. We find, for all major patterns of temporal expression, a substantive amount of gene expression divergence. However, this divergence is not random: Genes that function in essential developmental processes show less divergence than genes whose functions are not required for viability. Coexpression of gene neighbors in either species is highly divergent in the other, in particular when the neighborhood is not conserved. Interestingly, essential genes appear to maintain their expression profiles despite changes in neighborhoods, suggesting exposure to stronger selection. Our results suggest that a significant fraction of the coexpression observed among gene neighbors may be accounted for by neutral processes, and further that these may be distinguished by comparative gene expression analyses.
Affiliation(s)
- Itai Yanai
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
|
36
|
Tuller T, Ruppin E, Kupiec M. Properties of untranslated regions of the S. cerevisiae genome. BMC Genomics 2009; 10:391. [PMID: 19698117 PMCID: PMC2737003 DOI: 10.1186/1471-2164-10-391] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 01/19/2009] [Accepted: 08/22/2009] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND: During evolution, selection forces such as changing environments shape the architecture of genomes. The distribution of genes along chromosomes and the length of intragenic regions are basic genomic features known to play a major role in the regulation of gene transcription and translation. RESULTS: In this work we perform the first large scale analysis of the length distribution of untranslated regions (promoters, 5' and 3' untranslated regions, terminators) in the genome of the yeast Saccharomyces cerevisiae. Our analysis shows that the length of each open reading frame (ORF) and that of its associated regulatory and untranslated regions significantly correlate with each other. Moreover, significant correlations with other features related to gene expression and evolution (number of regulating transcription factors, mRNA and protein abundance, evolutionary rate, etc.) were observed. Furthermore, the function of genes seems to have an important role in the evolution of these lengths. Notably, genes that are related to RNA metabolism tend to have shorter untranslated regions and thus tend to be closer to their neighbouring genes, while genes coding for cell wall proteins tend to be isolated in the genome. CONCLUSION: These results indicate that genome architecture has a significant role in regulating gene expression, and in shaping the characteristics and functionality of proteins.
Affiliation(s)
- Tamir Tuller
- School of Computer Science, Tel Aviv University, Ramat Aviv 69978, Israel.
|
37
|
Pignatelli M, Serras F, Moya A, Guigó R, Corominas M. CROC: finding chromosomal clusters in eukaryotic genomes. Bioinformatics 2009; 25:1552-3. [PMID: 19389737 DOI: 10.1093/bioinformatics/btp248] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 02/07/2023] Open
Abstract
SUMMARY: There is increasing evidence showing that co-expression of genes that cluster along the genome is a common characteristic of eukaryotic transcriptomes. Several algorithms have been used to date in the identification of these kinds of gene organization. Here, we present a web tool called CROC that aims to help in the identification and analysis of genomic gene clusters. This method has been successfully used before in the identification of chromosomal clusters in different eukaryotic species. AVAILABILITY: The web server is freely available to non-commercial users at the following address: http://metagenomics.uv.es/CROC/.
Affiliation(s)
- Miguel Pignatelli
- Instituto Cavanilles of Biodiversity and Evolutionary Biology, University of Valencia, Apdo, Valencia, Spain.
|
38
|
Tuller T, Rubinstein U, Bar D, Gurevitch M, Ruppin E, Kupiec M. Higher-order genomic organization of cellular functions in yeast. J Comput Biol 2009; 16:303-16. [PMID: 19193148 DOI: 10.1089/cmb.2008.15tt] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
Previous studies have shown that the distribution of genes in prokaryotes and eukaryotic genomes is not random. Using the thousands of cellular functions that appear in the Gene Ontology (GO) project, we exhaustively studied the relation between functionality and genomic localization of genes across 16 organisms with rich GO ontologies (one prokaryote and 15 eukaryotes). Overall, we found that the genomic distribution of cellular functions tends to be more similar in organisms that have higher evolutionary proximity. At the primary level, which measures localization of functionally related genes, the prokaryote Escherichia coli exhibits the highest level of organization, as one would expect given its operon-based genomic organization. However, examining a higher level of genomic organization by analyzing the co-localization of pairs of different functional gene groups, we surprisingly find that the eukaryote yeast Saccharomyces cerevisiae is markedly more organized than E. coli. A network-based analysis further supports this notion and suggests that the eukaryotic genomic architecture is more organized than previously thought. See online Supplementary Material at (www.liebertonline.com).
Affiliation(s)
- Tamir Tuller
- School of Computer Sciences, Tel Aviv University, Ramat Aviv, Israel.
|
39
|
Hurst LD. Fundamental concepts in genetics: genetics and the understanding of selection. Nat Rev Genet 2009; 10:83-93. [PMID: 19119264 DOI: 10.1038/nrg2506] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Indexed: 01/01/2023]
Abstract
At first sight selection is a simple notion, and some consider it the most important evolutionary force. But how important is selection, is it really so trivial to understand and what are the alternatives? Here I discuss how genetics is crucial for addressing all of these questions: genetics allowed the concept of natural selection to become viable, it contributed to our understanding of the complexities of selection and it spurred the development of competing models of evolution. Understanding how and why selection acts has important potential applications, from understanding the mechanisms of disease and microbial resistance, to improving the design of transgenes and drugs.
Affiliation(s)
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Somerset, BA2 7AY, UK.
|
40
|
Liu X, Han B. Evolutionary conservation of neighbouring gene pairs in plants. Gene 2009; 437:71-9. [PMID: 19264115 DOI: 10.1016/j.gene.2009.02.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/03/2008] [Revised: 02/12/2009] [Accepted: 02/16/2009] [Indexed: 12/12/2022]
Abstract
Evolutionary conservation of neighbouring gene pairs has been widely explored in many species, but remains poorly understood in plants. The availability of several plant genome sequences allows for an in-depth investigation of this problem in plants. Here, we analyzed the phylogenetic conservation of physically linked gene pairs in nine plant genomes and compared the conservation in different orientations. We also examined several potential determinants to detect whether they affect the conservation of neighbouring gene pairs. Our results suggested that among the three types of neighbouring gene pairs, closely linked parallel pairs might be the least conserved. Intergenic distance was shown to be a major determinant of linkage conservation, suggesting that the conservation of gene order in plants was determined primarily by chance. The enrichment of housekeeping genes was identified to contribute to the conservation of all three types and the enrichment of genes involved in protein metabolism might contribute to the conservation of parallel pairs. Moreover, a co-expressed signal was detected in conserved divergent pairs, which might be determined by intergenic distance.
Affiliation(s)
- Xiling Liu
- National Center for Gene Research/Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China
|
41
|
Babu MM, Janga SC, de Santiago I, Pombo A. Eukaryotic gene regulation in three dimensions and its impact on genome evolution. Curr Opin Genet Dev 2008; 18:571-82. [PMID: 19007886 DOI: 10.1016/j.gde.2008.10.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 09/10/2008] [Revised: 10/07/2008] [Accepted: 10/09/2008] [Indexed: 12/11/2022]
Abstract
Recent advances in molecular techniques and high-resolution imaging are beginning to provide exciting insights into the higher order chromatin organization within the cell nucleus and its influence on eukaryotic gene regulation. This improved understanding of gene regulation also raises fundamental questions about how spatial features might have constrained the organization of genes on eukaryotic chromosomes and how mutations that affect these processes might contribute to disease conditions. In this review, we discuss recent studies that highlight the role of spatial components in gene regulation and their impact on genome evolution. We then address implications for human diseases and outline new directions for future research.
Affiliation(s)
- M Madan Babu
- MRC Laboratory of Molecular Biology, Cambridge, UK.
|
42
|
Labbé J, Zhang X, Yin T, Schmutz J, Grimwood J, Martin F, Tuskan GA, Le Tacon F. A genetic linkage map for the ectomycorrhizal fungus Laccaria bicolor and its alignment to the whole-genome sequence assemblies. The New Phytologist 2008; 180:316-328. [PMID: 18783356 DOI: 10.1111/j.1469-8137.2008.02614.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 05/26/2023]
Abstract
A genetic linkage map for the ectomycorrhizal basidiomycete Laccaria bicolor was constructed from 45 sib-homokaryotic haploid mycelial lines derived from the parental S238N strain progeny. For map construction, 294 simple sequence repeats (SSRs), single-nucleotide polymorphisms (SNPs), amplified fragment length polymorphisms (AFLPs) and random amplified polymorphic DNA (RAPD) markers were employed to identify and assay loci that segregated in backcross configuration. Using SNP, RAPD and SSR sequences, the L. bicolor whole-genome sequence (WGS) assemblies were aligned onto the linkage groups. A total of 37.36 Mbp of the assembled sequences was aligned to 13 linkage groups. Most mapped genetic markers used in alignment were colinear with the sequence assemblies, indicating that both the genetic map and sequence assemblies achieved high fidelity. The resulting matrix of recombination rates between all pairs of loci was used to construct an integrated linkage map using JoinMap. The final map consisted of 13 linkage groups spanning 812 centiMorgans (cM) at an average distance of 2.76 cM between markers (range 1.9-17 cM). The WGS and the present linkage map represent an initial step towards the identification and cloning of quantitative trait loci associated with development and functioning of the ectomycorrhizal symbiosis.
Affiliation(s)
- J Labbé
- UMR 1136, INRA-Nancy Université, Interactions Arbres/Microorganismes, INRA-Nancy, 54280 Champenoux, France
- X Zhang
- Environmental Sciences Division, Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, TN 37831-6422, USA
- Joint Genome Institute, 2500 Mitchell St, Walnut Creek, CA 94250, USA
- T Yin
- Environmental Sciences Division, Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, TN 37831-6422, USA
- Joint Genome Institute, 2500 Mitchell St, Walnut Creek, CA 94250, USA
- J Schmutz
- Stanford Human Genome Center, Department of Genetics, Stanford University School of Medicine, 975 California Avenue, Palo Alto, CA 94304, USA
- J Grimwood
- Stanford Human Genome Center, Department of Genetics, Stanford University School of Medicine, 975 California Avenue, Palo Alto, CA 94304, USA
- F Martin
- UMR 1136, INRA-Nancy Université, Interactions Arbres/Microorganismes, INRA-Nancy, 54280 Champenoux, France
- G A Tuskan
- Environmental Sciences Division, Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, TN 37831-6422, USA
- Joint Genome Institute, 2500 Mitchell St, Walnut Creek, CA 94250, USA
- F Le Tacon
- UMR 1136, INRA-Nancy Université, Interactions Arbres/Microorganismes, INRA-Nancy, 54280 Champenoux, France
|
43
|
Liao BY, Zhang J. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol 2008; 25:1555-65. [PMID: 18440951 PMCID: PMC2734128 DOI: 10.1093/molbev/msn101] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Accepted: 04/19/2008] [Indexed: 01/06/2023] Open
Abstract
Similarity in gene expression pattern between closely linked genes is known in several eukaryotes. Two models have been proposed to explain the presence of such coexpression patterns. The adaptive model assumes that coexpression is advantageous and is established by relocation of initially unlinked but coexpressed genes, whereas the neutral model asserts that coexpression is a type of leaky expression due to similar expressional environments of linked genes, but is neither advantageous nor detrimental. However, these models are incompatible with several empirical observations. Here, we propose that coexpression of linked genes is a form of transcriptional interference that is disadvantageous to the organism. We show that even distantly linked genes that are tens of megabases away exhibit significant coexpression in the human genome. However, the linkage is more likely to be broken during evolution between genes of high coexpression than those of low coexpression and the breakage of linkage reduces gene coexpression. These results support our hypothesis that coexpression of linked genes in mammalian genomes is generally disadvantageous, implying that many mammalian genes may never reach their optimal expression pattern due to the interference of their genomic environment and that such transcriptional interference may be a force promoting recurrent relocation of genes in the genome.
Affiliation(s)
- Ben-Yang Liao
- Department of Ecology and Evolutionary Biology, University of Michigan, USA
|
44
|
Current awareness on yeast. Yeast 2008. [DOI: 10.1002/yea.1563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
|
45
|
Makino T, McLysaght A. Interacting gene clusters and the evolution of the vertebrate immune system. Mol Biol Evol 2008; 25:1855-62. [PMID: 18573844 DOI: 10.1093/molbev/msn137] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 12/29/2022] Open
Abstract
Unraveling the "code" of genome structure is an important goal of genomics research. Colocalization of genes in eukaryotic genomes may facilitate preservation of favorable allele combinations between epistatic loci or coregulation of functionally related genes. However, the presence of interacting gene clusters in the human genome has remained unclear. We systematically searched the human genome for evidence of closely linked genes whose protein products interact. We find 83 pairs of interacting genes that are located within 1 Mbp in the human genome or 37 if we exclude hub proteins. This number of interacting gene clusters is significantly more than expected by chance and is not the result of tandem duplications. Furthermore, we find that these clusters are significantly more conserved across vertebrate (but not chordate) genomes than other pairs of genes located within 1 Mbp in the human genome. In many cases, the genes are both present but not clustered in older vertebrate lineages. These results suggest gene cluster creation along the human lineage. These clusters are not enriched for housekeeping genes, but we find a significant contribution from genes involved in "response to stimulus." Many of these genes are involved in the immune response, including, but not limited to, known clusters such as the major histocompatibility complex. That these clusters were formed contemporaneously with the origin of adaptive immunity within the vertebrate lineage suggests that novel evolutionary and regulatory constraints were associated with the operation of the immune system.
Affiliation(s)
- Takashi Makino
- Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
|
46
|
Niculita-Hirzel H, Labbé J, Kohler A, Le Tacon F, Martin F, Sanders IR, Kües U. Gene organization of the mating type regions in the ectomycorrhizal fungus Laccaria bicolor reveals distinct evolution between the two mating type loci. The New Phytologist 2008; 180:329-342. [PMID: 18557817 DOI: 10.1111/j.1469-8137.2008.02525.x] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 05/26/2023]
Abstract
In natural conditions, basidiomycete ectomycorrhizal fungi such as Laccaria bicolor are typically in the dikaryotic state when forming symbioses with trees, meaning that two genetically different individuals have to fuse or 'mate'. Nevertheless, nothing is known about the molecular mechanisms of mating in these ecologically important fungi. Here, advantage was taken of the first sequenced genome of the ectomycorrhizal fungus, Laccaria bicolor, to determine the genes that govern the establishment of cell-type identity and orchestrate mating. The L. bicolor mating type loci were identified through genomic screening. The evolutionary history of the genomic regions that contained them was determined by genome-wide comparison of L. bicolor sequences with those of known tetrapolar and bipolar basidiomycete species, and by phylogenetic reconstruction of gene family history. It is shown that the genes of the two mating type loci, A and B, are conserved across the Agaricales, but they are contained in regions of the genome with different evolutionary histories. The A locus is in a region where the gene order is under strong selection across the Agaricales. By contrast, the B locus is in a region where the gene order is likely under a low selection pressure but where gene duplication, translocation and transposon insertion are frequent.
Affiliation(s)
- Hélène Niculita-Hirzel
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, CH-1015 Lausanne, Switzerland
- Jessy Labbé
- UMR 1136, Interactions Arbres/Microorganismes, INRA-Nancy, F-54280 Champenoux, France
- Annegret Kohler
- UMR 1136, Interactions Arbres/Microorganismes, INRA-Nancy, F-54280 Champenoux, France
- François Le Tacon
- UMR 1136, Interactions Arbres/Microorganismes, INRA-Nancy, F-54280 Champenoux, France
- Francis Martin
- UMR 1136, Interactions Arbres/Microorganismes, INRA-Nancy, F-54280 Champenoux, France
- Ian R Sanders
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, CH-1015 Lausanne, Switzerland
- Ursula Kües
- Molecular Wood Biotechnology and Technical Mycology, Büsgen-Institute, Georg-August-University Göttingen, D-37077 Göttingen, Germany
|
47
|
Kensche PR, Oti M, Dutilh BE, Huynen MA. Conservation of divergent transcription in fungi. Trends Genet 2008; 24:207-11. [PMID: 18375009 DOI: 10.1016/j.tig.2008.02.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 11/19/2007] [Revised: 01/23/2008] [Accepted: 02/05/2008] [Indexed: 11/26/2022]
Abstract
The comparison of fully sequenced genomes enables the study of selective constraints that determine genome organisation. We show that, in fungi, adjacent divergently transcribed (<---->) genes are more conserved in orientation than convergent (--><--) or co-oriented (-->-->) gene pairs. Furthermore, the length of time over which the divergent orientation of two genes is conserved correlates with the degree of their co-expression and with the likelihood of them being functionally related. The functional interactions of the proteins encoded by the conserved divergent gene pairs indicate a potential for protein function prediction in eukaryotes.
Affiliation(s)
- Philip R Kensche
- Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
|