1
Wang D. DLGP: A database for lineage-conserved and lineage-specific gene pairs in animal and plant genomes. Biochem Biophys Res Commun 2015; 469:542-5. [PMID: 26697753 DOI: 10.1016/j.bbrc.2015.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2015] [Accepted: 12/10/2015] [Indexed: 10/22/2022]
Abstract
The conservation of gene organization in the genome with lineage-specificity is an invaluable resource to decipher their potential functionality with diverse selective constraints, especially in higher animals and plants. Gene pairs appear to be the minimal structure for such kind of gene clusters that tend to reside in their preferred locations, representing the distinctive genomic characteristics in single species or a given lineage. Despite gene families having been investigated in a widespread manner, the definition of gene pair families in various taxa still lacks adequate attention. To address this issue, we report DLGP (http://lcgbase.big.ac.cn/DLGP/) that stores the pre-calculated lineage-based gene pairs in currently available 134 animal and plant genomes and inspect them under the same analytical framework, bringing out a set of innovational features. First, the taxonomy or lineage has been classified into four levels such as Kingdom, Phylum, Class and Order. It adopts all-to-all comparison strategy to identify the possible conserved gene pairs in all species for each gene pair in certain species and reckon those that are conserved in over a significant proportion of species in a given lineage (e.g. Primates, Diptera or Poales) as the lineage-conserved gene pairs. Furthermore, it predicts the lineage-specific gene pairs by retaining the above-mentioned lineage-conserved gene pairs that are not conserved in any other lineages. Second, it carries out pairwise comparison for the gene pairs between two compared species and creates the table including all the conserved gene pairs and the image elucidating the conservation degree of gene pairs in chromosomal level. Third, it supplies gene order browser to extend gene pairs to gene clusters, allowing users to view the evolution dynamics in the gene context in an intuitive manner. 
This database will be able to facilitate the particular comparison between animals and plants, between vertebrates and arthropods, and between monocots and eudicots, accounting for the significant contribution of gene pairs to speciation and diversification in specific lineages.
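The two-step selection described in this abstract (lineage-conserved pairs first, then lineage-specific pairs) can be illustrated with a minimal Python sketch. It is not DLGP's implementation: the function names, the input layout, and the `min_fraction` threshold are hypothetical, and the abstract does not state what proportion counts as "significant". The sketch also assumes that "not conserved in any other lineages" means "not lineage-conserved in any other lineage".

```python
def lineage_conserved_pairs(conserved_in, lineages, min_fraction=0.8):
    """Return, per lineage, the gene pairs conserved in enough of its species.

    conserved_in: dict mapping a gene pair (tuple) to the set of species
                  in which that pair is conserved.
    lineages:     dict mapping a lineage name to the set of its species.
    min_fraction: hypothetical cutoff; the source gives no numeric value.
    """
    result = {}
    for lineage, species in lineages.items():
        if not species:  # avoid dividing by zero for an empty lineage
            result[lineage] = set()
            continue
        result[lineage] = {
            pair
            for pair, where in conserved_in.items()
            if len(where & species) / len(species) >= min_fraction
        }
    return result


def lineage_specific_pairs(conserved_by_lineage):
    """Keep lineage-conserved pairs that are lineage-conserved nowhere else.

    This is one reading of the abstract, labeled as an assumption above.
    """
    specific = {}
    for lineage, pairs in conserved_by_lineage.items():
        elsewhere = set()
        for other, other_pairs in conserved_by_lineage.items():
            if other != lineage:
                elsewhere |= other_pairs
        specific[lineage] = pairs - elsewhere
    return specific
```

Passing the output of `lineage_conserved_pairs` to `lineage_specific_pairs` reproduces the order of operations in the abstract; a stricter reading (no conservation in any single species outside the lineage) would need the per-species sets as a second argument.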
Affiliation(s)
- Dapeng Wang
- Stem Cell Laboratory, UCL Cancer Institute, University College London, London WC1E 6BT, UK; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, PR China.
2
Wang D, Yu J. LCGserver: A Webserver for Exploring Evolutionary Trajectory of Gene Orders in a Large Number of Genomes. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2015; 19:574-7. [PMID: 26258441 DOI: 10.1089/omi.2015.0060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Genes and chromosomes are highly organized; together with protein-coding sequence, gene structure at per gene level and gene order at cluster level are both variable in a context of lineages and under natural selection. How gene order and chromosome organization are related and selected remains to be illuminated. The number of newly-sequenced genomes from various taxa has been increasing rapidly, but there have not been easy-to-use web tools that allow better visualization for gene order in a large genome collection. Here, we describe a webserver, LCGserver (http://lcgbase.big.ac.cn/LCGserver/), for exploring evolutionary dynamics of gene orders over diverse lineages. This server provides gene order information at three levels: single gene, paired gene (a minimal cluster), and clustered gene (more than two genes). The most exclusive feature of LCGserver is alignment and visualization of neighboring genes based on orthology, allowing users to inspect all conserved and dynamic events of gene order along chromosomes in a lineage-specific manner. In addition, it categorizes paired genes into six patterns and identifies fully-conserved gene clusters within and among lineages.
Affiliation(s)
- Dapeng Wang
- 1 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, People's Republic of China; 2 Stem Cell Laboratory, UCL Cancer Institute, University College London, London, United Kingdom
- Jun Yu
- 1 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, People's Republic of China
3
Louis A, Murat F, Salse J, Crollius HR. GenomicusPlants: a web resource to study genome evolution in flowering plants. PLANT & CELL PHYSIOLOGY 2015; 56:e4. [PMID: 25432975 PMCID: PMC4301744 DOI: 10.1093/pcp/pcu177] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 05/03/2023]
Abstract
Comparative genomics combined with phylogenetic reconstructions are powerful approaches to study the evolution of genes and genomes. However, the current rapid expansion of the volume of genomic information makes it increasingly difficult to interrogate, integrate and synthesize comparative genome data while taking into account the maximum breadth of information available. GenomicusPlants (http://www.genomicus.biologie.ens.fr/genomicus-plants) is an extension of the Genomicus webserver that addresses this issue by allowing users to explore flowering plant genomes in an intuitive way, across the broadest evolutionary scales. Extant genomes of 26 flowering plants can be analyzed, as well as 23 ancestral reconstructed genomes. Ancestral gene order provides a long-term chronological view of gene order evolution, greatly facilitating comparative genomics and evolutionary studies. Four main interfaces ('views') are available where: (i) PhyloView combines phylogenetic trees with comparisons of genomic loci across any number of genomes; (ii) AlignView projects loci of interest against all other genomes to visualize its topological conservation; (iii) MatrixView compares two genomes in a classical dotplot representation; and (iv) Karyoview visualizes chromosome karyotypes 'painted' with colours of another genome of interest. All four views are interconnected and benefit from many customizable features.
Affiliation(s)
- Alexandra Louis
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, F-75005 France; CNRS, UMR 8197, Paris, F-75005 France; Inserm, U1024, Paris, F-75005 France
- Florent Murat
- INRA/UBP UMR 1095 GDEC (Génétique, Diversité et Ecophysiologie des Céréales), Clermont Ferrand, France
- Jérôme Salse
- INRA/UBP UMR 1095 GDEC (Génétique, Diversité et Ecophysiologie des Céréales), Clermont Ferrand, France
- Hugues Roest Crollius
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, F-75005 France; CNRS, UMR 8197, Paris, F-75005 France; Inserm, U1024, Paris, F-75005 France
4
Hu F, Lin Y, Tang J. MLGO: phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinformatics 2014; 15:354. [PMID: 25376663 PMCID: PMC4236499 DOI: 10.1186/s12859-014-0354-6] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 07/16/2014] [Accepted: 10/16/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid accumulation of whole-genome data has renewed interest in the study of using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements. RESULTS MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility. CONCLUSIONS To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php .
Affiliation(s)
- Fei Hu
- Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China; Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA.
- Yu Lin
- Department of Computer Science and Engineering, University of California, San Diego, 92093 La Jolla, CA, USA.
- Jijun Tang
- Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China; Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA.
5
Wang D, Yu J. Plastid-LCGbase: a collection of evolutionarily conserved plastid-associated gene pairs. Nucleic Acids Res 2014; 43:D990-5. [PMID: 25378306 PMCID: PMC4383908 DOI: 10.1093/nar/gku1070] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
Plastids carry their own genetic material that encodes a variable set of genes that are limited in number but functionally important. Aside from orthology, the lineage-specific order and orientation of these genes are also relevant. Here, we develop a database, Plastid-LCGbase (http://lcgbase.big.ac.cn/plastid-LCGbase/), which focuses on organizational variability of plastid genes and genomes from diverse taxonomic groups. The current Plastid-LCGbase contains information from 470 plastid genomes and exhibits several unique features. First, through a genome-overview page generated from OrganellarGenomeDRAW, it displays general arrangement of all plastid genes (circular or linear). Second, it shows patterns and modes of all paired plastid genes and their physical distances across user-defined lineages, which are facilitated by a step-wise stratification of taxonomic groups. Third, it divides the paired genes into three categories (co-directionally-paired genes or CDPGs, convergently-paired genes or CPGs and divergently-paired genes or DPGs) and three patterns (separation, overlap and inclusion) and provides basic statistics for each species. Fourth, the gene pairing scheme is expandable, where neighboring genes can also be included in species-/lineage-specific comparisons. We hope that Plastid-LCGbase facilitates gene variation (insertion-deletion, translocation and rearrangement) and transcription-level studies of plastid genomes.
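The three orientation categories (CDPG, CPG, DPG) and three positional patterns (separation, overlap, inclusion) named in this abstract can be sketched as a small Python classifier. This is an illustrative reading, not Plastid-LCGbase's code: the `Gene` record, the function names, and the exact boundary rules are assumptions, and the sketch treats the genome as linear, so the wrap-around pair of a circular plastid genome is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are inclusive, start <= end."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def orientation(a, b):
    """Classify a pair where `a` lies upstream of `b` on the forward strand."""
    if a.strand == b.strand:
        return "CDPG"  # co-directional: both genes on the same strand
    if a.strand == "+":
        return "CPG"   # convergent: a on "+", b on "-", 3' ends face each other
    return "DPG"       # divergent: a on "-", b on "+", 5' ends face each other


def pattern(a, b):
    """Classify how the two intervals relate, assuming a sorts before b."""
    if b.start > a.end:
        return "separation"  # no shared bases
    if b.end <= a.end or b.start == a.start:
        return "inclusion"   # one gene lies entirely within the other
    return "overlap"         # partial overlap only


def classify_adjacent_pairs(genes):
    """Pair each gene with its next neighbor in coordinate order."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    return [
        (a.name, b.name, orientation(a, b), pattern(a, b))
        for a, b in zip(ordered, ordered[1:])
    ]
```

Sorting by `(start, end)` before pairing is what lets `orientation` and `pattern` assume that the first gene is the upstream one.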
Affiliation(s)
- Dapeng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China; Stem Cell Laboratory, UCL Cancer Institute, University College London, London WC1E 6BT, UK
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China
6
Moret BME, Lin Y, Tang J. Rearrangements in Phylogenetic Inference: Compare, Model, or Encode? MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
7
Louis A, Muffato M, Roest Crollius H. Genomicus: five genome browsers for comparative genomics in eukaryota. Nucleic Acids Res 2012. [PMID: 23193262 PMCID: PMC3531091 DOI: 10.1093/nar/gks1156] [Citation(s) in RCA: 132] [Impact Index Per Article: 11.0] [Indexed: 11/24/2022] Open
Abstract
Genomicus (http://www.dyogen.ens.fr/genomicus/) is a database and an online tool that allows easy comparative genomic visualization in >150 eukaryote genomes. It provides a way to explore spatial information related to gene organization within and between genomes and temporal relationships related to gene and genome evolution. For the specific vertebrate phylum, it also provides access to ancestral gene order reconstructions and conserved non-coding elements information. We extended the Genomicus database originally dedicated to vertebrate to four new clades, including plants, non-vertebrate metazoa, protists and fungi. This visualization tool allows evolutionary phylogenomics analysis and exploration. Here, we describe the graphical modules of Genomicus and show how it is capable of revealing differential gene loss and gain, segmental or genome duplications and study the evolution of a locus through homology relationships.
Affiliation(s)
- Alexandra Louis
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, France.
8
Levasseur A, Paganini J, Dainat J, Thompson JD, Poch O, Pontarotti P, Gouret P. The chordate proteome history database. Evol Bioinform Online 2012; 8:437-47. [PMID: 22904610 PMCID: PMC3418167 DOI: 10.4137/ebo.s9186] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022] Open
Abstract
The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and automatic functional searches. Firstly, phylogenetic analyses based on high quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs, and predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Secondly, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects: protein identifier, keyword or domain, but can also be based on events, eg, domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches along with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
Affiliation(s)
- Anthony Levasseur
- INRA, UMR1163 Biotechnologie des Champignons Filamenteux, Aix Marseille Université, ESIL Polytech, 163 avenue de Luminy, CP 925, 13288 Marseille Cedex 09, France
9
Wang D, Zhang Y, Fan Z, Liu G, Yu J. LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes. Evol Bioinform Online 2011; 8:39-46. [PMID: 22267903 PMCID: PMC3256993 DOI: 10.4137/ebo.s8540] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022] Open
Abstract
Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, which are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what is the minimal number of genes in a cluster and what is the driving force leading to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on evolutionary dynamics of gene clustering and ordering within animal kingdoms in two different lineages: vertebrates and arthropods. The database is constructed on a web-based Linux-Apache-MySQL-PHP framework and effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three comprehensible advantages. First, our database is inclusive, including all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric since we map all gene clusters from other genomes in an order of lineage-ranks (such as primates, mammals, warm-blooded, and reptiles) onto human genome and start the database from well-defined gene pairs (a minimal cluster where the two adjacent genes are oriented as co-directional, convergent, and divergent pairs) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance of transcription start sites between two adjacent genes, which is extendable to genes flanking the cluster across species.
We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in a context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
Affiliation(s)
- Dapeng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, PR China