1
Abstract
Syntenies are genomic segments of consecutive genes identified by a certain conservation in gene content and order. The notion of conservation varies from one definition to another: the most constrained definitions require identical gene content and gene order, while more relaxed ones require only a certain similarity in gene content, not necessarily in the same order. Regardless of how they are identified, the goal is to characterize homologous genomic regions, i.e., regions deriving from a common ancestral region, reflecting a certain gene co-evolution that can shed light on important functional properties. In addition to identifying them, it is also necessary to infer the evolutionary history that has led from the ancestral segment to the extant ones. In this field, most algorithmic studies address the problem of inferring rearrangement scenarios explaining the disruption in gene order between segments with the same gene content, some of them extending the evolutionary model to gene insertion and deletion. However, syntenies also evolve through other events modifying their gene content, such as duplications, losses or horizontal gene transfers, i.e., the movement of genes from one species to another. Although the reconciliation approach between a gene tree and a species tree addresses the problem of inferring such events for single-gene families, little effort has been dedicated to its generalization to segmental events and to syntenies. This paper reviews some of the main algorithmic methods for inferring ancestral syntenies and focuses on those integrating both gene orders and gene trees.
2
Velandia-Huerto CA, Berkemer SJ, Hoffmann A, Retzlaff N, Romero Marroquín LC, Hernández-Rosales M, Stadler PF, Bermúdez-Santana CI. Orthologs, turn-over, and remolding of tRNAs in primates and fruit flies. BMC Genomics 2016; 17:617. [PMID: 27515907 PMCID: PMC4981973 DOI: 10.1186/s12864-016-2927-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/10/2016] [Accepted: 07/11/2016] [Indexed: 12/26/2022] Open
Abstract
Background: Transfer RNAs (tRNAs) are ubiquitous in all living organisms. They implement the genetic code so that most genomes contain distinct tRNAs for almost all 61 codons. They behave similarly to mobile elements and proliferate in genomes, spawning both local and non-local copies. Most tRNA families are therefore typically present as multicopy genes. The members of the individual tRNA families evolve under concerted or rapid birth-death evolution, so that paralogous copies maintain almost identical sequences over long evolutionary time-scales. To a good approximation these are functionally equivalent. Individual tRNA copies are thus evolutionarily unstable and easily turn into pseudogenes and disappear. This leads to a rapid turnover of tRNAs and often large differences in the tRNA complements of closely related species. Since tRNA paralogs are not distinguished by sequence, common methods cannot be used to establish orthology between tRNA genes. Results: In this contribution we introduce a general framework to distinguish orthologs and paralogs in gene families that are subject to concerted evolution. It is based on the use of uniquely aligned adjacent sequence elements as anchors to establish syntenic conservation of sequence intervals. In practice, anchors and intervals can be extracted from genome-wide multiple sequence alignments. Syntenic clusters of concertedly evolving genes of different families can then be subdivided by list alignments, leading to usually small clusters of candidate co-orthologs. On the basis of recent advances in phylogenetic combinatorics, these candidate clusters can be further processed by cograph editing to recover their duplication histories. We developed a workflow that can be conceptualized as stepwise refinement of a graph of homologous genes. We apply this analysis strategy with different types of synteny anchors to investigate the evolution of tRNAs in primates and fruit flies. We identified a large number of tRNA remolding events concentrated at the tips of the phylogeny. With one notable exception, all phylogenetically old tRNA remoldings do not change the isoacceptor class. Conclusions: Gene families evolving under concerted evolution are not amenable to classical phylogenetic analyses since paralogs maintain identical, species-specific sequences, precluding the estimation of correct gene trees from sequence differences. This leaves conservation of syntenic arrangements with respect to "anchor elements" that are not subject to concerted evolution as the only viable source of phylogenetic information. We have demonstrated here that a purely synteny-based analysis of tRNA gene histories is indeed feasible. Although the choice of synteny anchors influences the resolution, in particular when tight gene clusters are present, and the quality of sequence alignments, genome assemblies, and genome rearrangements limits the scope of the analysis, largely coherent results can be obtained for tRNAs. In particular, we conclude that a large fraction of the tRNAs are recent copies. This proliferation is compensated by rapid pseudogenization, as exemplified by many very recent alloacceptor remoldings. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2927-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Cristian A Velandia-Huerto
- Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
- Sarah J Berkemer
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Anne Hoffmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Nancy Retzlaff
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Liliana C Romero Marroquín
- Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
- Maribel Hernández-Rosales
- CONACYT - Instituto de Matemáticas, UNAM Juriquilla, Av. Juriquilla #3001, Santiago de Querétaro, MX-76230, QRO, México
- Peter F Stadler
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany; Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria; Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Clara I Bermúdez-Santana
- Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
3
Expansion of stochastic expression repertoire by tandem duplication in mouse Protocadherin-α cluster. Sci Rep 2014; 4:6263. [PMID: 25179445 PMCID: PMC4151104 DOI: 10.1038/srep06263] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/05/2014] [Accepted: 08/13/2014] [Indexed: 11/08/2022] Open
Abstract
Tandem duplications are concentrated within the Pcdh cluster throughout vertebrate evolution and as copy number variations (CNVs) in human populations, but the effects of tandem duplication in the Pcdh cluster remain elusive. To investigate these effects, we generated and analyzed a new line of Pcdh cluster mutant mice. In the mutant allele, a 218-kb region containing the Pcdh-α2 to Pcdh-αc2 variable exons with their promoters was duplicated, and the individual duplicated Pcdh isoforms can be distinguished. The individual duplicated Pcdh-α isoforms showed diverse expression levels in a stochastic manner, even though they have identical promoter sequences. Interestingly, the 5'-located duplicated Pcdh-αc2, which is constitutively expressed in the wild-type brain, shifted to stochastic expression accompanied by increased DNA methylation. These results demonstrate that tandem duplication in the Pcdh cluster expands the stochastic expression repertoire irrespective of sequence divergence.
4
Trujillo DI, Silverstein KAT, Young ND. Genomic characterization of the LEED..PEEDs, a gene family unique to the Medicago lineage. G3 (Bethesda, Md.) 2014; 4:2003-12. [PMID: 25155275 PMCID: PMC4199706 DOI: 10.1534/g3.114.011874] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 06/25/2014] [Accepted: 08/18/2014] [Indexed: 12/18/2022]
Abstract
The LEED..PEED (LP) gene family in Medicago truncatula (A17) is composed of 13 genes coding small putatively secreted peptides with one to two conserved domains of negatively charged residues. This family is not present in the genomes of Glycine max, Lotus japonicus, or the IRLC species Cicer arietinum. LP genes were also not detected in a Trifolium pratense draft genome or Pisum sativum nodule transcriptome, which were sequenced de novo in this study, suggesting that the LP gene family arose within the past 25 million years. M. truncatula accession HM056 has 13 LP genes with high similarity to those in A17, whereas M. truncatula ssp. tricycla (R108) and M. sativa have 11 and 10 LP gene copies, respectively. In M. truncatula A17, 12 LP genes are located on chromosome 7 within a 93-kb window, whereas one LP gene copy is located on chromosome 4. A phylogenetic analysis of the gene family is consistent with most gene duplications occurring prior to Medicago speciation events, mainly through local tandem duplications and one distant duplication across chromosomes. Synteny comparisons between R108 and A17 confirm that gene order is conserved between the two subspecies, although a further duplication occurred solely in A17. In M. truncatula A17, all 13 LPs are exclusively transcribed in nodules and absent from other plant tissues, including roots, leaves, flowers, seeds, seed shells, and pods. The recent expansion of LP genes in Medicago spp. and their timing and location of expression suggest a novel function in nodulation, possibly as an aftermath of the evolution of bacteroid terminal differentiation or potentially associated with rhizobial-host specificity.
Affiliation(s)
- Diana I Trujillo
- Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108
- Nevin D Young
- Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108
5
The Genetic Basis of Primate Behavior: Genetics and Genomics in Field-Based Primatology. Int J Primatol 2013; 35:1-10. [PMID: 25013243 DOI: 10.1007/s10764-013-9732-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
6
Kubota A, Bainy ACD, Woodin BR, Goldstone JV, Stegeman JJ. The cytochrome P450 2AA gene cluster in zebrafish (Danio rerio): expression of CYP2AA1 and CYP2AA2 and response to phenobarbital-type inducers. Toxicol Appl Pharmacol 2013; 272:172-9. [PMID: 23726801 DOI: 10.1016/j.taap.2013.05.017] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 04/10/2013] [Revised: 05/15/2013] [Accepted: 05/18/2013] [Indexed: 11/17/2022]
Abstract
The cytochrome P450 (CYP) 2 gene family is the largest and most diverse CYP gene family in vertebrates. In zebrafish, we have identified 10 genes in a new subfamily, CYP2AA, which does not show orthology to any human or other mammalian CYP genes. Here we report evolutionary and structural relationships of the 10 CYP2AA genes and expression of the first two genes, CYP2AA1 and CYP2AA2. Parsimony reconstruction of the tandem duplication pattern for the CYP2AA cluster suggests that CYP2AA1, CYP2AA2 and CYP2AA3 likely arose in the earlier duplication events and thus are most diverged in function from the other CYP2AAs. On the other hand, CYP2AA8 and CYP2AA9 arose in the latest duplication event, implying functional similarity between these two CYPs. A molecular model of CYP2AA1 showing the sequence conservation across the CYP2AA cluster reveals that the regions with the highest variability within the cluster map onto CYP2AA1 near the substrate access channels, suggesting differing substrate specificities. Zebrafish CYP2AA1 transcript was expressed predominantly in the intestine, while CYP2AA2 was most highly expressed in the kidney, suggesting differing roles in physiology. In the liver, CYP2AA2 expression, but not that of CYP2AA1, was increased by 1,4-bis[2-(3,5-dichloropyridyloxy)]benzene (TCPOBOP) and, to a lesser extent, by phenobarbital (PB). In contrast, pregnenolone 16α-carbonitrile (PCN) increased CYP2AA1 expression, but not CYP2AA2 expression, in the liver. The results identify a CYP2 subfamily in zebrafish that includes genes apparently induced by PB-type chemicals and PXR agonists, the first concrete in vivo evidence for a PB-type response in fish.
Affiliation(s)
- Akira Kubota
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA
7
Bérard S, Gallien C, Boussau B, Szöllősi GJ, Daubin V, Tannier E. Evolution of gene neighborhoods within reconciled phylogenies. Bioinformatics 2012; 28:i382-i388. [PMID: 22962456 PMCID: PMC3436801 DOI: 10.1093/bioinformatics/bts374] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 02/05/2023]
Abstract
Motivation: Most models of genome evolution integrating gene duplications, losses and chromosomal rearrangements are computationally intractable, even when comparing only two genomes. This prevents large-scale studies that consider different types of genome structural variations. Results: We define an 'adjacency phylogenetic tree' that describes the evolution of an adjacency, a neighborhood relation between two genes, by speciation, duplication or loss of one or both genes, and rearrangement. We describe an algorithm that, given a species tree and a set of gene trees where the leaves are connected by adjacencies, computes an adjacency forest that minimizes the number of gains and breakages of adjacencies (caused by rearrangements) and runs in polynomial time. We use this algorithm to reconstruct contiguous regions of mammalian and plant ancestral genomes in a few minutes for a dozen species and several thousand genes. We show that this method yields reduced conflict between ancestral adjacencies. We detect duplications involving several genes and compare the different modes of evolution between phyla and among lineages. Availability: C++ implementation using BIO++ package, available upon request to Sèverine Bérard. Contact: Severine.Berard@cirad.fr or Eric.Tannier@inria.fr. Supplementary information: Supplementary material is available at Bioinformatics online.
8
Chauve C, El-Mabrouk N, Guéguen L, Semeria M, Tannier E. Duplication, Rearrangement and Reconciliation: A Follow-Up 13 Years Later. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 02/04/2023]
9
Romero-Campero FJ, Lucas-Reina E, Said FE, Romero JM, Valverde F. A contribution to the study of plant development evolution based on gene co-expression networks. Frontiers in Plant Science 2013; 4:291. [PMID: 23935602 PMCID: PMC3732916 DOI: 10.3389/fpls.2013.00291] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/29/2013] [Accepted: 07/13/2013] [Indexed: 05/04/2023]
Abstract
Phototrophic eukaryotes are among the most successful organisms on Earth due to their unparalleled efficiency at capturing light energy and fixing carbon dioxide to produce organic molecules. A conserved and efficient network of light-dependent regulatory modules could be at the basis of this success. This regulatory system conferred early advantages on phototrophic eukaryotes that allowed for specialization, complex developmental processes and modern plant characteristics. We have studied light-dependent gene regulatory modules from algae to plants employing integrative omics approaches based on gene co-expression networks. Our study reveals some remarkably conserved ways in which eukaryotic phototrophs deal with day length and light signaling. Here we describe how a family of Arabidopsis transcription factors involved in photoperiod response has evolved from a single algal gene according to the innovation, amplification and divergence theory of gene evolution by duplication. These modifications of the gene co-expression networks from the ancient unicellular green alga Chlamydomonas reinhardtii to the modern brassica Arabidopsis thaliana may hint at the evolution and specialization of plants and other organisms.
Affiliation(s)
- Eva Lucas-Reina
- Molecular Plant Development and Metabolism, Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas y Universidad de Sevilla, Sevilla, Spain
- Fatima E. Said
- Molecular Plant Development and Metabolism, Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas y Universidad de Sevilla, Sevilla, Spain
- José M. Romero
- Molecular Plant Development and Metabolism, Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas y Universidad de Sevilla, Sevilla, Spain
- Federico Valverde
- Molecular Plant Development and Metabolism, Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas y Universidad de Sevilla, Sevilla, Spain
- *Correspondence: Federico Valverde, Molecular Plant Development and Metabolism Group, Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas y Universidad de Sevilla, 49th, Americo Vespucio Avenue, 41092 Sevilla, Spain e-mail:
10
Abstract
The purpose of this chapter is to provide a comprehensive review of the field of genome rearrangement, i.e., comparative genomics, based on the representation of genomes as ordered sequences of signed genes. We specifically focus on the "hard part" of genome rearrangement, how to handle duplicated genes. The main questions are: how have present-day genomes evolved from a common ancestor? What are the most realistic evolutionary scenarios explaining the observed gene orders? What was the content and structure of ancestral genomes? We aim to provide a concise but complete overview of the field, starting with the practical problem of finding an appropriate representation of a genome as a sequence of ordered genes or blocks, namely the problems of orthology, paralogy, and synteny block identification. We then consider three levels of gene organization: the gene family level (evolution by duplication, loss, and speciation), the cluster level (evolution by tandem duplications), and the genome level (all types of rearrangement events, including whole genome duplication).
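As a small illustration of the signed gene order representation mentioned in this abstract, the following Python sketch counts breakpoints between two toy genomes. It is an assumption-laden example, not a method taken from the chapter: the function names (`canonical`, `adjacencies`, `breakpoints`) and the sample gene orders are hypothetical, genomes are treated as single linear chromosomes, and duplicated genes (the "hard part") are deliberately left out.

```python
def canonical(a, b):
    """Canonical key for the adjacency between consecutive signed genes a, b.

    Reading the chromosome from the other end turns (a, b) into (-b, -a),
    so both readings must map to the same key.
    """
    return min((a, b), (-b, -a))


def adjacencies(genome):
    """Set of canonical adjacencies of a linear signed gene order."""
    return {canonical(a, b) for a, b in zip(genome, genome[1:])}


def breakpoints(genome_a, genome_b):
    """Adjacencies of genome_a that are absent from genome_b."""
    return adjacencies(genome_a) - adjacencies(genome_b)


# Hypothetical toy data: each integer is a gene, the sign is its orientation.
ancestor = [1, 2, 3, 4, 5]
# Inversion of the segment (2, 3): reverse the order and flip the signs.
descendant = [1, -3, -2, 4, 5]

print(len(breakpoints(ancestor, descendant)))  # prints 2
```

A single inversion disrupts the two adjacencies at its boundaries and preserves the one inside the inverted segment, which is why the count here is 2.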
Affiliation(s)
- Nadia El-Mabrouk
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, QC, Canada
11
Tremblay Savard O, Bertrand D, El-Mabrouk N. Evolution of orthologous tandemly arrayed gene clusters. BMC Bioinformatics 2011; 12 Suppl 9:S2. [PMID: 22152029 PMCID: PMC3283317 DOI: 10.1186/1471-2105-12-s9-s2] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 12/26/2022] Open
Abstract
Background: Tandemly Arrayed Gene (TAG) clusters are groups of paralogous genes that are found adjacent on a chromosome. TAGs represent an important repertoire of genes in eukaryotes. In addition to tandem duplication events, TAG clusters are affected during their evolution by other mechanisms, such as inversion and deletion events, that affect the order and orientation of genes. The DILTAG algorithm developed in [1] makes it possible to infer a set of optimal evolutionary histories explaining the evolution of a single TAG cluster, from an ancestral single gene, through tandem duplications (simple or multiple, direct or inverted), deletions and inversion events. Results: We present a general methodology, which is an extension of DILTAG, for the study of the evolutionary history of a set of orthologous TAG clusters in multiple species. In addition to the speciation events reflected by the phylogenetic tree of the considered species, the evolutionary events that are taken into account are simple or multiple tandem duplications, direct or inverted, simple or multiple deletions, and inversions. We analysed the performance of our algorithm on simulated data sets and we applied it to the protocadherin gene clusters of human, chimpanzee, mouse and rat. Conclusions: Our results obtained on simulated data sets showed a good performance in inferring the total number and size distribution of duplication events. A limitation of the algorithm, however, lies in dealing with multiple gene deletions, as the algorithm is highly exponential in this case and quickly becomes intractable.
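The event types listed in this abstract (direct or inverted tandem duplications of one or several genes, deletions, inversions) can be illustrated by applying them forward to a toy cluster. The Python sketch below is only that: it is not DILTAG or its extension, which infer histories rather than simulate them. The function names, the half-open `start:end` index convention, and the integer gene labels are all assumptions made for illustration.

```python
def tandem_duplication(cluster, start, end, inverted=False):
    """Insert a copy of cluster[start:end] right after the segment.

    A direct copy keeps gene order and orientation; an inverted copy is
    reversed and has every sign flipped.
    """
    segment = cluster[start:end]
    copy = [-g for g in reversed(segment)] if inverted else list(segment)
    return cluster[:end] + copy + cluster[end:]


def inversion(cluster, start, end):
    """Reverse cluster[start:end] and flip the orientation of its genes."""
    flipped = [-g for g in reversed(cluster[start:end])]
    return cluster[:start] + flipped + cluster[end:]


def deletion(cluster, start, end):
    """Remove the genes in cluster[start:end]."""
    return cluster[:start] + cluster[end:]


# Hypothetical cluster of three paralogs; the sign encodes orientation.
cluster = [1, 2, 3]
step1 = tandem_duplication(cluster, 1, 3)               # multiple, direct
step2 = tandem_duplication(step1, 0, 1, inverted=True)  # simple, inverted
step3 = inversion(step2, 2, 4)
step4 = deletion(step3, 1, 2)
print(step1, step2, step3, step4)
```

Copies keep the label of the gene they were duplicated from, so the final order shows which paralogs share an origin. Inferring the history means running this process backwards, which is where the combinatorial difficulty noted in the conclusions arises.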
Affiliation(s)
- Denis Bertrand
- Computational and Mathematical Biology, Genome Institute of Singapore, Singapore
- Nadia El-Mabrouk
- Department of Computer Science (DIRO), University of Montreal, Montreal, Quebec, Canada
12
Gabriško M, Janeček Š. Characterization of Maltase Clusters in the Genus Drosophila. J Mol Evol 2010; 72:104-18. [DOI: 10.1007/s00239-010-9406-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/16/2010] [Accepted: 10/27/2010] [Indexed: 11/28/2022]