1
Enav H, Paz I, Ley RE. Strain tracking in complex microbiomes using synteny analysis reveals per-species modes of evolution. Nat Biotechnol 2024:10.1038/s41587-024-02276-2. [PMID: 38898177 DOI: 10.1038/s41587-024-02276-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 05/10/2024] [Indexed: 06/21/2024]
Abstract
Microbial species diversify into strains through single-nucleotide mutations and structural changes, such as recombination, insertions and deletions. Most strain-comparison methods quantify differences in single-nucleotide polymorphisms (SNPs) and are insensitive to structural changes. However, recombination is an important driver of phenotypic diversification in many species, including human pathogens. We introduce SynTracker, a tool that compares microbial strains using genome synteny (the order of sequence blocks in homologous genomic regions) in pairs of metagenomic assemblies or genomes. Genome synteny is a rich source of genomic information untapped by current strain-comparison tools. SynTracker has low sensitivity to SNPs, has no database requirement and is robust to sequencing errors. It outperforms existing tools when tracking strains in metagenomic data and is particularly suited for phages, plasmids and other low-data contexts. Applied to single-species datasets and human gut metagenomes, SynTracker, combined with an SNP-based tool, detects strains enriched in either point mutations or structural changes, providing insights into microbial evolution in situ.
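To make the contrast between SNP-based and synteny-based comparison concrete, the Python sketch below scores two genomes by how many block adjacencies they share. It is a generic adjacency (breakpoint-style) measure, not SynTracker's actual scoring; the function name `synteny_score` and the block IDs are hypothetical, and each genome is assumed to be already reduced to an ordered list of unique homologous block IDs. In this toy model a point mutation does not change block order and so leaves the score unchanged, while insertions and rearrangements lower it.

```python
from __future__ import annotations

from typing import Sequence


def adjacencies(blocks: Sequence[str]) -> set[frozenset[str]]:
    """Unordered pairs of consecutive blocks (orientation of each pair is ignored)."""
    return {frozenset(pair) for pair in zip(blocks, blocks[1:])}


def synteny_score(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Jaccard similarity of the block-adjacency sets of two genomes.

    1.0 means identical block order; lower values indicate insertions,
    deletions or rearrangements. Returns 0.0 if neither genome has an adjacency.
    """
    adj_a, adj_b = adjacencies(order_a), adjacencies(order_b)
    union = adj_a | adj_b
    if not union:
        return 0.0  # nothing to compare
    return len(adj_a & adj_b) / len(union)


if __name__ == "__main__":
    reference = ["A", "B", "C", "D", "E"]
    print(synteny_score(reference, ["A", "B", "C", "D", "E"]))       # same order
    print(synteny_score(reference, ["A", "B", "X", "C", "D", "E"]))  # insertion of X
    print(synteny_score(reference, ["A", "B", "D", "C", "E"]))       # swap of C and D
```

The three calls print 1.0, 0.5 and about 0.33, showing that an insertion and a local rearrangement are both detected even though no block sequence itself changed.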
Affiliation(s)
- Hagay Enav
- Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Inbal Paz
- Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Ruth E Ley
- Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany.
- Cluster of Excellence EXC 2124: Controlling Microbes to Fight Infections (CMFI), University of Tübingen, Tübingen, Germany.
2
Zeng W, Qiao X, Li Q, Liu C, Wu J, Yin H, Zhang S. Genome-wide identification and comparative analysis of the ADH gene family in Chinese white pear (Pyrus bretschneideri) and other Rosaceae species. Genomics 2020; 112:3484-3496. [PMID: 32585175 DOI: 10.1016/j.ygeno.2020.06.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/28/2020] [Revised: 05/27/2020] [Accepted: 06/19/2020] [Indexed: 01/26/2023]
Abstract
Alcohol dehydrogenase (ADH) is essential to the formation of aromatic compounds in fruits. However, the evolutionary history and expression characteristics of ADH genes remain largely unclear in Rosaceae fruit species. In this study, 464 ADH genes were identified in eight Rosaceae fruit species, 68 of which were from pear; these were classified into four subgroups. Frequent single-gene duplication events were found to have contributed to the formation of ADH gene clusters and the expansion of the ADH gene family in these eight Rosaceae species. Purifying selection was the major force in ADH gene evolution. The younger genes derived from tandem and proximal duplications had evolved faster than those derived from other types of duplication. RNA-Seq and qRT-PCR analyses revealed that the expression levels of three ADH genes were closely correlated with the content of aromatic compounds detected during fruit development.
Affiliation(s)
- Weiwei Zeng
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China
- Xin Qiao
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China
- Qionghou Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China.
- Chunxin Liu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China.
- Jun Wu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China.
- Hao Yin
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China.
- Shaoling Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China.
3
Gao K, Miller J. Primary orthologs from local sequence context. BMC Bioinformatics 2020; 21:48. [PMID: 32028880 PMCID: PMC7006074 DOI: 10.1186/s12859-020-3384-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/13/2019] [Accepted: 01/22/2020] [Indexed: 02/05/2023]
Abstract
BACKGROUND The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes do not code for proteins, creating a need to infer the evolutionary history of sequences irrespective of the kind of functional element they may encode. Thus, sequence-centric, as opposed to gene-centric, modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed "primary" (or "positional") orthologs. Methods based solely on similarity do not reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive. RESULTS We demonstrate that short-range sequence context, as short as a single "maximal" match, distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by RepeatMasker, potential orthologs are extracted by genome intersection as "non-nested maximal matches": maximal matches that are not nested into other maximal matches. On both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one thirtieth of the computation time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by RepeatMasker, non-nested maximal matches recover orthologs that are inaccessible to LASTZ net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee. CONCLUSIONS We describe an intersection-based method that requires neither repeat-masking nor alignment to infer the evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
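The nesting criterion can be illustrated with a small interval filter. This Python sketch assumes each maximal match has already been computed and is represented only by its half-open coordinates `(start, end)` on one genome; it keeps a match unless it lies entirely inside a different match. That single-genome containment test is a simplifying assumption for illustration, not necessarily the exact definition used by Gao and Miller, and `non_nested` is a hypothetical name.

```python
from __future__ import annotations


def non_nested(matches: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the half-open intervals not contained in any other distinct interval.

    Sorting by start ascending and end descending guarantees that any interval
    containing the current one has already been seen, so a running maximum of
    end coordinates is enough to detect nesting in O(n log n) time.
    """
    kept: list[tuple[int, int]] = []
    max_end: int | None = None
    for start, end in sorted(set(matches), key=lambda m: (m[0], -m[1])):
        if max_end is not None and end <= max_end:
            continue  # nested inside an earlier interval that starts no later
        kept.append((start, end))
        max_end = end if max_end is None else max(max_end, end)
    return kept


if __name__ == "__main__":
    matches = [(0, 10), (2, 5), (8, 15), (20, 30), (20, 25)]
    print(non_nested(matches))  # (2, 5) and (20, 25) are dropped as nested
```

Overlapping but non-contained matches such as `(0, 10)` and `(8, 15)` are both kept, since neither lies fully inside the other.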
Affiliation(s)
- Kun Gao
- School of Science, Southwest University of Science and Technology, 59 Qinglong Road, Mianyang, Sichuan Province, 621010, People's Republic of China.
- Jonathan Miller
- Physics and Biology Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan
4
Liu C, Qiao X, Li Q, Zeng W, Wei S, Wang X, Chen Y, Wu X, Wu J, Yin H, Zhang S. Genome-wide comparative analysis of the BAHD superfamily in seven Rosaceae species and expression analysis in pear (Pyrus bretschneideri). BMC Plant Biol 2020; 20:14. [PMID: 31914928 PMCID: PMC6950883 DOI: 10.1186/s12870-019-2230-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/06/2019] [Accepted: 12/30/2019] [Indexed: 05/26/2023]
Abstract
BACKGROUND The BAHD acyltransferase superfamily has various biological roles in plants, including regulating fruit quality, catalyzing the synthesis of terpenes, phenolics and esters, and improving stress resistance. However, the copy numbers, expression characteristics and associations with fruit aroma formation of the BAHD genes remain unclear. RESULTS In total, 717 BAHD genes were obtained from the genomes of seven Rosaceae species (Pyrus bretschneideri, Malus domestica, Prunus avium, Prunus persica, Fragaria vesca, Pyrus communis and Rubus occidentalis). Based on detailed phylogenetic analysis and classifications in model plants, we divided the BAHD family genes into seven groups: I-a, I-b, II-a, II-b, III-a, IV and V. An inter-species synteny analysis, which detected 78 syntenic gene pairs among the seven Rosaceae species, revealed an ancient origin of the BAHD superfamily. Different types of gene duplication events jointly drive the expansion of the BAHD superfamily, and purifying selection, supported by small Ka/Ks ratios, dominates the evolution of BAHD genes. Based on correlation analysis between ester content and expression levels of BAHD genes at different developmental stages, four candidate genes were selected for verification by qRT-PCR. The results implied that Pbr020016.1, Pbr019034.1, Pbr014028.1 and Pbr029551.1 are important candidate genes involved in aroma formation during pear fruit development. CONCLUSION We have thoroughly identified the BAHD superfamily genes and performed a comprehensive comparative analysis of their phylogenetic relationships, expansion patterns, and expression characteristics in seven Rosaceae species, and we also obtained four candidate genes involved in aroma synthesis in pear fruit. These results provide a theoretical basis for future studies of the specific biological functions of BAHD superfamily members and the improvement of pear fruit quality.
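The step from "small Ka/Ks ratios" to purifying selection rests on a simple comparison: Ka is the number of nonsynonymous substitutions per nonsynonymous site and Ks the number of synonymous substitutions per synonymous site, so Ka/Ks < 1 means amino-acid-changing mutations have been removed more often than silent ones. The Python sketch below assumes the site counts and observed differences for an aligned gene pair are already available (counting them is the harder part, handled in practice by dedicated tools), applies the Jukes-Cantor correction used in the Nei-Gojobori approach, and labels the ratio. The function names, the example counts and the 0.05 neutral band around 1 are illustrative assumptions, not values from the study.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75); saturation otherwise")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from precomputed site counts and observed differences."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


def selection_mode(ratio: float, neutral_band: float = 0.05) -> str:
    """Label a Ka/Ks ratio; the width of the neutral band is an arbitrary choice."""
    if ratio < 1.0 - neutral_band:
        return "purifying"
    if ratio > 1.0 + neutral_band:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical gene pair: 10 of 300 nonsynonymous and 30 of 100 synonymous sites differ.
    r = ka_ks(10, 300, 30, 100)
    print(f"Ka/Ks = {r:.3f} -> {selection_mode(r)}")
```

With these made-up counts the ratio is well below 1, which is the pattern the abstract reports for BAHD gene pairs.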
Affiliation(s)
- Chunxin Liu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Xin Qiao
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Qionghou Li
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Weiwei Zeng
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Shuwei Wei
- Shandong Institute of Pomology, Taian, 271000, Shandong, China
- Xin Wang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Yangyang Chen
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Xiao Wu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jun Wu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Hao Yin
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
- Shaoling Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
5
Seoane P, Tapia-Paniagua ST, Bautista R, Alcaide E, Esteve C, Martínez-Manzanares E, Balebona MC, Claros MG, Moriñigo MA. TarSynFlow, a workflow for bacterial genome comparisons that revealed genes putatively involved in the probiotic character of Shewanella putrefaciens strain Pdp11. PeerJ 2019; 7:e6526. [PMID: 30842906 PMCID: PMC6397758 DOI: 10.7717/peerj.6526] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Accepted: 01/26/2019] [Indexed: 11/20/2022]
Abstract
Probiotic microorganisms are of great interest in clinical, livestock and aquaculture settings. Knowledge of the genomic basis of probiotic characteristics can help explain why some strains of a species are pathogenic while others are probiotic. An automated workflow called TarSynFlow (Targeted Synteny Workflow) was therefore developed to compare finished or draft bacterial genomes based on a set of proteins. When it was used to analyze the finished genome of the probiotic strain Pdp11 of Shewanella putrefaciens and draft genomes of seven known non-probiotic strains of the same species obtained in this work, 15 genes were found to be exclusive to Pdp11. Their presence was confirmed by PCR using Pdp11-specific primers. Functional inspection of the 15 genes allowed us to hypothesize that Pdp11 underwent genome rearrangements spurred by plasmids and mobile elements. As a result, Pdp11 presents specific proteins for gut colonization, bile salt resistance and inhibition of gut pathogen adhesion, which can explain some probiotic features of Pdp11.
Affiliation(s)
- Pedro Seoane
- Department of Molecular Biology and Biochemistry, Universidad de Málaga, Málaga, Spain
- Rocío Bautista
- Andalusian Platform for Bioinformatics, Universidad de Málaga, Málaga, Spain
- Elena Alcaide
- Department of Microbiology and Ecology, Universidad de Valencia, Valencia, Spain
- Consuelo Esteve
- Department of Microbiology and Ecology, Universidad de Valencia, Valencia, Spain
- M. Gonzalo Claros
- Department of Molecular Biology and Biochemistry, Universidad de Málaga, Málaga, Spain
- Andalusian Platform for Bioinformatics, Universidad de Málaga, Málaga, Spain
6
Xu L, Qiao X, Zhang M, Zhang S. Genome-Wide analysis of aluminum-activated malate transporter family genes in six rosaceae species, and expression analysis and functional characterization on malate accumulation in Chinese white pear. Plant Sci 2018; 274:451-465. [PMID: 30080635 DOI: 10.1016/j.plantsci.2018.06.022] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/13/2018] [Revised: 06/12/2018] [Accepted: 06/25/2018] [Indexed: 06/08/2023]
Abstract
Aluminum-activated malate transporters (ALMTs) have a variety of physiological roles in plants, including the regulation of fruit quality, but the evolutionary history of the ALMT family in Rosaceae species remains unknown. In this study, a total of 113 ALMT homologous genes were identified from six Rosaceae species (Pyrus bretschneideri, Malus × domestica, Prunus persica, Fragaria vesca, Prunus mume, and Pyrus communis), 27 of which came from Chinese white pear and were designated PbrALMT. Based on phylogenetic analysis, we divided these ALMT genes into three main clusters (A-C). Conserved domain analysis indicated that all PbrALMT proteins contained the ALMT domain and the FUSC_2 domain, while fewer proteins included the FUSC domain. Subcellular localization experiments showed that some of the PbrALMT proteins containing the FUSC domain were located in the membrane. Collinearity analysis revealed that segmental and dispersed duplications were the primary forces underlying ALMT gene family expansion in the Rosaceae. Calculation of Ka/Ks between paralogous pairs indicated that all of the genes in the PbrALMT family have evolved under negative selection. Based on changes in malate content combined with transcriptome data, five genes belonging to Cluster B were chosen for qRT-PCR, and the results revealed that Pbr020270.1, as a candidate gene, may play important roles in malate accumulation during pear fruit development. A further transgenic assay confirmed this conclusion. The present study provides a foundation for better understanding the molecular evolution of ALMT genes in pear and for the future functional characterization of these genes.
Affiliation(s)
- Xu Linlin
- Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China
- Qiao Xin
- Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China
- Zhang Mingyue
- Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China
- Zhang Shaoling
- Centre of Pear Engineering Technology Research, Nanjing Agricultural University, Nanjing 210095, China.
7
Sutphin GL, Mahoney JM, Sheppard K, Walton DO, Korstanje R. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning. PLoS Comput Biol 2016; 12:e1005182. [PMID: 27812085 PMCID: PMC5094675 DOI: 10.1371/journal.pcbi.1005182] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/11/2016] [Accepted: 10/05/2016] [Indexed: 01/01/2023]
Abstract
The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction, with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.
Affiliation(s)
- J. Matthew Mahoney
- Department of Neurological Sciences, University of Vermont College of Medicine, Burlington, VT, United States of America
- Keith Sheppard
- The Jackson Laboratory, Bar Harbor, ME, United States of America
- David O. Walton
- The Jackson Laboratory, Bar Harbor, ME, United States of America
- Ron Korstanje
- The Jackson Laboratory, Bar Harbor, ME, United States of America
8
Glover NM, Redestig H, Dessimoz C. Homoeologs: What Are They and How Do We Infer Them? Trends Plant Sci 2016; 21:609-621. [PMID: 27021699 PMCID: PMC4920642 DOI: 10.1016/j.tplants.2016.02.005] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 09/22/2015] [Revised: 02/09/2016] [Accepted: 02/20/2016] [Indexed: 05/18/2023]
Abstract
The evolutionary history of nearly all flowering plants includes a polyploidization event. Homologous genes resulting from allopolyploidy are commonly referred to as 'homoeologs', although this term has not always been used precisely or consistently in the literature. With several allopolyploid genome sequencing projects under way, there is a pressing need for computational methods for homoeology inference. Here we review the definition of homoeology in historical and modern contexts and propose a precise and testable definition highlighting the connection between homoeologs and orthologs. In the second part, we survey experimental and computational methods of homoeolog inference, considering the strengths and limitations of each approach. Establishing a precise and evolutionarily meaningful definition of homoeology is essential for understanding the evolutionary consequences of polyploidization.
Affiliation(s)
- Natasha M Glover
- Bayer CropScience NV, Technologiepark 38, 9052 Gent, Belgium; University College London, Gower Street, London WC1E 6BT, UK
- Christophe Dessimoz
- University College London, Gower Street, London WC1E 6BT, UK; University of Lausanne, Biophore, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, Biophore, 1015 Lausanne, Switzerland.
9
GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm. BMC Evol Biol 2016; 16:120. [PMID: 27260514 PMCID: PMC4893229 DOI: 10.1186/s12862-016-0684-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/02/2015] [Accepted: 05/12/2016] [Indexed: 11/24/2022]
Abstract
Background Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity. Results In this study, we validate GenFamClust by comparing it with well-known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms to homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes, and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs. Conclusions The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.
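To illustrate how gene-order conservation can be quantified alongside sequence similarity, the Python sketch below scores a candidate homolog pair by the fraction of its flanking genes that also have homologs flanking the partner gene, then blends that with a similarity score. The neighborhood size `k`, the linear weighting `alpha` and all function names are hypothetical; this is a generic local-synteny measure, not the scoring function published for GenFamClust.

```python
from __future__ import annotations


def flanks(genome: list[str], gene: str, k: int) -> list[str]:
    """Up to k genes on each side of `gene` in an ordered gene list."""
    i = genome.index(gene)
    return genome[max(0, i - k):i] + genome[i + 1:i + 1 + k]


def neighborhood_support(genome_a: list[str], genome_b: list[str],
                         gene_a: str, gene_b: str,
                         homologs: set[frozenset[str]], k: int = 2) -> float:
    """Fraction of gene_a's neighbors with a homolog among gene_b's neighbors."""
    near_a = flanks(genome_a, gene_a, k)
    near_b = flanks(genome_b, gene_b, k)
    if not near_a:
        return 0.0
    hits = sum(
        any(frozenset((g, h)) in homologs for h in near_b) for g in near_a
    )
    return hits / len(near_a)


def combined_score(similarity: float, synteny: float, alpha: float = 0.5) -> float:
    """Linear blend of a normalized similarity score and local synteny support."""
    return alpha * similarity + (1.0 - alpha) * synteny


if __name__ == "__main__":
    genome_a = ["a1", "a2", "a3", "a4", "a5"]
    genome_b = ["b1", "b2", "b3", "b4", "b5"]
    homologs = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"),
                                       ("a3", "b3"), ("a4", "b4")]}
    support = neighborhood_support(genome_a, genome_b, "a3", "b3", homologs)
    print(support, combined_score(0.9, support))
```

Here three of the four neighbors of `a3` have a homolog next to `b3`, so the pair gets a synteny support of 0.75; a pair with equally high similarity but scrambled neighbors would score lower.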
10
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss and transfer of genes, and through segmental and whole-genome duplication, hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties in assessing orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling, together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events, appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
11
Qiao X, Li M, Li L, Yin H, Wu J, Zhang S. Genome-wide identification and comparative analysis of the heat shock transcription factor family in Chinese white pear (Pyrus bretschneideri) and five other Rosaceae species. BMC Plant Biol 2015; 15:12. [PMID: 25604453 PMCID: PMC4310194 DOI: 10.1186/s12870-014-0401-5] [Citation(s) in RCA: 88] [Impact Index Per Article: 9.8] [Received: 09/06/2014] [Accepted: 12/22/2014] [Indexed: 05/03/2023]
Abstract
BACKGROUND Heat shock transcription factors (Hsfs), which act as important transcriptional regulatory proteins in eukaryotes, play a central role in controlling the expression of heat-responsive genes. At present, the genomes of Chinese white pear ('Dangshansuli') and five other Rosaceae fruit crops have been fully sequenced. However, information about the Hsf gene family in these Rosaceae species is limited, and the evolutionary history of the Hsf gene family also remains unresolved. RESULTS In this study, 137 Hsf genes were identified from six Rosaceae species (Pyrus bretschneideri, Malus × domestica, Prunus persica, Fragaria vesca, Prunus mume, and Pyrus communis), 29 of which came from Chinese white pear and were designated PbHsf. Based on the structural characteristics and phylogenetic analysis of these sequences, the Hsf family genes could be classified into three main groups (classes A, B, and C). Segmental and dispersed duplications were the primary forces underlying Hsf gene family expansion in the Rosaceae. Most of the PbHsf duplicated gene pairs dated back to the recent whole-genome duplication (WGD; 30-45 million years ago, MYA). Purifying selection also played a critical role in the evolution of Hsf genes. Transcriptome data demonstrated that the expression levels of the PbHsf genes differed widely. Six PbHsf genes were upregulated in fruit under naturally increased temperature. CONCLUSION A comprehensive analysis of Hsf genes was performed in six Rosaceae species, and 137 full-length Hsf genes were identified. The results presented here will undoubtedly be useful for better understanding the complexity of the Hsf gene family and will facilitate functional characterization in future studies.
Affiliation(s)
- Xin Qiao
- College of Horticulture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China.
- Meng Li
- College of Horticulture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China.
- Leiting Li
- College of Horticulture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China.
- Hao Yin
- College of Horticulture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China.
- Juyou Wu
- College of Horticulture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China.
- Shaoling Zhang
- College of Horticulture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China.
12
Pereira C, Denise A, Lespinet O. A meta-approach for improving the prediction and the functional annotation of ortholog groups. BMC Genomics 2014; 15 Suppl 6:S16. [PMID: 25573073 PMCID: PMC4240552 DOI: 10.1186/1471-2164-15-s6-s16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods to the same proteome set can lead to distinct orthology predictions. METHODS We developed a method based on a meta-approach that is able to combine the results of several methods for orthologous group prediction. The purpose of this method is to produce better quality results by using the overlapping results obtained from several individual orthologous gene prediction procedures. Our method proceeds in two steps. The first aims to construct seeds for groups of orthologous genes; these seeds correspond to the exact overlaps between the results of all or several methods. In the second step, these seed groups are expanded by using HMM profiles. RESULTS We evaluated our method on two standard reference benchmarks, OrthoBench and Orthology Benchmark Service. Our method predicts orthologous groups more accurately than the individual input methods of orthologous group prediction. Moreover, our method increases the number of annotated orthologous pairs without decreasing the annotation quality compared to twelve state-of-the-art methods. CONCLUSIONS The meta-approach based method appears to be a reliable procedure for predicting orthologous groups. Since a large number of methods for predicting groups of orthologous genes exist, this meta-approach could be applied to many different combinations of methods.
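The seed-construction step can be sketched as counting identical groups across methods. In this Python sketch each method's output is assumed to be a list of orthologous groups given as sets of gene IDs, and a group becomes a seed when exactly the same membership is predicted by at least `min_support` methods. The function name and the `min_support` parameter are hypothetical, and the second step (expanding seeds with HMM profiles) is not shown.

```python
from __future__ import annotations

from collections import Counter


def seed_groups(predictions: list[list[set[str]]],
                min_support: int) -> list[frozenset[str]]:
    """Groups predicted with identical membership by at least `min_support` methods."""
    counts: Counter[frozenset[str]] = Counter()
    for method_groups in predictions:
        # Count each distinct group once per method, even if a method repeats it.
        counts.update({frozenset(group) for group in method_groups})
    seeds = [group for group, n in counts.items() if n >= min_support]
    return sorted(seeds, key=sorted)  # deterministic order for display


if __name__ == "__main__":
    method_1 = [{"a", "b", "c"}, {"d", "e"}]
    method_2 = [{"a", "b", "c"}, {"d", "e", "f"}]
    method_3 = [{"a", "b", "c"}, {"d", "e"}]
    print(seed_groups([method_1, method_2, method_3], min_support=3))
    print(seed_groups([method_1, method_2, method_3], min_support=2))
```

Requiring all three methods to agree yields only `{a, b, c}`; relaxing to two methods also admits `{d, e}`, while `{d, e, f}` (predicted by one method) never becomes a seed.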
13
Peeters N, Carrère S, Anisimova M, Plener L, Cazalé AC, Genin S. Repertoire, unified nomenclature and evolution of the Type III effector gene set in the Ralstonia solanacearum species complex. BMC Genomics 2013; 14:859. [PMID: 24314259 PMCID: PMC3878972 DOI: 10.1186/1471-2164-14-859] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Received: 07/24/2013] [Accepted: 11/29/2013] [Indexed: 12/21/2022]
Abstract
Background Ralstonia solanacearum is a soil-borne beta-proteobacterium that causes bacterial wilt disease in many food crops and is a major problem for agriculture in intertropical regions. R. solanacearum is a heterogeneous species, both phenotypically and genetically, and is considered as a species complex. Pathogenicity of R. solanacearum relies on the Type III secretion system that injects Type III effector (T3E) proteins into plant cells. T3E collectively perturb host cell processes and modulate plant immunity to enable bacterial infection. Results We provide the catalogue of T3E in the R. solanacearum species complex, as well as candidates in newly sequenced strains. 94 T3E orthologous groups were defined on phylogenetic bases and ordered using a uniform nomenclature. This curated T3E catalog is available on a public website and a bioinformatic pipeline has been designed to rapidly predict T3E genes in newly sequenced strains. Systematical analyses were performed to detect lateral T3E gene transfer events and identify T3E genes under positive selection. Our analyses also pinpoint the RipF translocon proteins as major discriminating determinants among the phylogenetic lineages. Conclusions Establishment of T3E repertoires in strains representatives of the R. solanacearum biodiversity allowed determining a set of 22 T3E present in all the strains but provided no clues on host specificity determinants. The definition of a standardized nomenclature and the optimization of predictive tools will pave the way to understanding how variation of these repertoires is correlated to the diversification of this species complex and how they contribute to the different strain pathotypes.
Affiliation(s)
- Nemo Peeters
- INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, F-31326 Castanet-Tolosan, France.
14
Wolf YI, Koonin EV. A tight link between orthologs and bidirectional best hits in bacterial and archaeal genomes. Genome Biol Evol 2012; 4:1286-94. [PMID: 23160176 PMCID: PMC3542571 DOI: 10.1093/gbe/evs100] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 11/29/2022]
Abstract
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes are flanked by two BBH and show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of closely related in-paralogs, paralogous displacement in situ, or even less frequent genuine violations of the BBH–orthology conjecture caused by acceleration of evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
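A bidirectional best hit is a pair (a, b) where b is the top-scoring match of a in genome B and a is the top-scoring match of b in genome A. A minimal sketch, assuming similarity scores (for example, BLAST bit scores) are already available as dicts keyed by (query, subject); the function names and toy scores are hypothetical, and ties are broken by keeping the first hit encountered:

```python
def best_hit(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): similarity score}; ties keep the first hit seen.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(scores_ab, scores_ba):
    """Return the set of (a, b) pairs that are each other's best hit."""
    best_ab = best_hit(scores_ab)  # genome A queried against genome B
    best_ba = best_hit(scores_ba)  # genome B queried against genome A
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


scores_ab = {("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b2"): 70, ("a3", "b2"): 75}
scores_ba = {("b1", "a1"): 88, ("b2", "a2"): 69, ("b2", "a3"): 74}
bbh = bidirectional_best_hits(scores_ab, scores_ba)
```

Both `a2` and `a3` hit `b2` best, but `b2` points back only to `a3`, so the result is {(a1, b1), (a3, b2)}. Paralog pairs like `a2`/`a3` are the situation the abstract describes as random assignment of the best hit among in-paralogs.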
15
Wise MJ. Mean protein evolutionary distance: a method for comparative protein evolution and its application. PLoS One 2013; 8:e61276. [PMID: 23613826 PMCID: PMC3626687 DOI: 10.1371/journal.pone.0061276] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/16/2012] [Accepted: 03/08/2013] [Indexed: 12/26/2022]
Abstract
Proteins are under tight evolutionary constraints, so if a protein changes it can only do so in ways that do not compromise its function. In addition, the proteins in an organism evolve at different rates. Leveraging the history of patristic distance methods, a new method for analysing comparative protein evolution, called Mean Protein Evolutionary Distance (MeaPED), measures differential resistance to evolutionary pressure across viral proteomes and is thereby able to point to the proteins’ roles. Different species’ proteomes can also be compared because the results, consistent across virus subtypes, concisely reflect the very different lifestyles of the viruses. The MeaPED method is here applied to influenza A virus, hepatitis C virus, human immunodeficiency virus (HIV), dengue virus, rotavirus A, polyomavirus BK and measles, which span the positive and negative single-stranded, double-stranded and reverse transcribing RNA viruses, and double-stranded DNA viruses. From this analysis, host interaction proteins including hemagglutinin (influenza), and viroporins agnoprotein (polyomavirus), p7 (hepatitis C) and VPU (HIV) emerge as evolutionary hot-spots. By contrast, RNA-directed RNA polymerase proteins including L (measles), PB1/PB2 (influenza) and VP1 (rotavirus), and internal serine proteases such as NS3 (dengue and hepatitis C virus) emerge as evolutionary cold-spots. The hot spot influenza hemagglutinin protein is contrasted with the related cold spot H protein from measles. It is proposed that evolutionary cold-spot proteins can become significant targets for second-line anti-viral therapeutics, in cases where front-line vaccines are not available or have become ineffective due to mutations in the hot-spot, generally more antigenically exposed proteins. The MeaPED package is available from www.pam1.bcs.uwa.edu.au/~michaelw/ftp/src/meaped.tar.gz.
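The abstract does not spell out the MeaPED computation. As a hedged illustration of the underlying quantity only, the sketch below computes the mean pairwise patristic distance (the sum of branch lengths on the path between two leaves) for one protein's tree. The parent-pointer tree encoding, the function names and the branch lengths are hypothetical, and MeaPED's actual procedure and normalization across proteins may differ:

```python
from itertools import combinations


def distances_to_ancestors(node, parent):
    """Map node and each of its ancestors to the path length from node.

    parent: {child: (parent_node, branch_length)}; the root has no entry.
    """
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        up, length = parent[node]
        total += length
        node = up
        dist[node] = total
    return dist


def patristic(u, v, parent):
    """Patristic distance between leaves u and v."""
    du = distances_to_ancestors(u, parent)
    dv = distances_to_ancestors(v, parent)
    # With non-negative branch lengths, the minimum over shared ancestors
    # is reached at the lowest common ancestor.
    return min(du[n] + dv[n] for n in du if n in dv)


def mean_patristic_distance(leaves, parent):
    pairs = list(combinations(leaves, 2))
    return sum(patristic(u, v, parent) for u, v in pairs) / len(pairs)


# Hypothetical tree ((A:0.1, B:0.2)n1:0.3, C:0.4)root
tree = {"A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("root", 0.3), "C": ("root", 0.4)}
mean_dist = mean_patristic_distance(["A", "B", "C"], tree)
```

The pairwise distances are 0.3 (A-B), 0.8 (A-C) and 0.9 (B-C), so the mean is about 0.667. Under this reading, a protein whose trees have larger means across isolates would be a candidate hot-spot.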
Affiliation(s)
- Michael J Wise
- School of Chemistry and Biochemistry, University of Western Australia, Crawley, Western Australia, Australia.
16
Wilkins AD, Bachman BJ, Erdin S, Lichtarge O. The use of evolutionary patterns in protein annotation. Curr Opin Struct Biol 2012; 22:316-25. [PMID: 22633559 DOI: 10.1016/j.sbi.2012.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 04/03/2012] [Accepted: 05/01/2012] [Indexed: 01/13/2023]
Abstract
With genomic data skyrocketing, their biological interpretation remains a serious challenge. Diverse computational methods address this problem by pointing to the existence of recurrent patterns among sequence, structure, and function. These patterns emerge naturally from evolutionary variation, natural selection, and divergence--the defining features of biological systems--and they identify molecular events and shapes that underlie specificity of function and allosteric communication. Here we review these methods, and the patterns they identify in case studies and in proteome-wide applications, to infer and rationally redesign function.
Affiliation(s)
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
17
Dewey CN. Positional orthology: putting genomic evolutionary relationships into context. Brief Bioinform 2011; 12:401-12. [PMID: 21705766 PMCID: PMC3178058 DOI: 10.1093/bib/bbr040] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Indexed: 12/31/2022]
Abstract
Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 5785 Medical Sciences Center, 1300 University Ave, Madison, WI 53706, USA.
18
Deniélou YP, Sagot MF, Boyer F, Viari A. Bacterial syntenies: an exact approach with gene quorum. BMC Bioinformatics 2011; 12:193. [PMID: 21605461 PMCID: PMC3121647 DOI: 10.1186/1471-2105-12-193] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 01/24/2011] [Accepted: 05/24/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND The automatic identification of syntenies across multiple species is a key step in comparative genomics that helps biologists shed light both on evolutionary and functional problems. RESULTS In this paper, we present a versatile tool to extract all syntenies from multiple bacterial species based on a clear-cut and very flexible definition of the synteny blocks that allows for gene quorum, partial gene correspondence, gaps, and a partial or total conservation of the gene order. CONCLUSIONS We apply this tool to two different kinds of studies. The first one is a search for functional gene associations. In this context, we compare our tool to a widely used heuristic--I-ADHORE--and show that at least up to ten genomes, the problem remains tractable with our exact definition and algorithm. The second application is linked to evolutionary studies: we verify in a multiple alignment setting that pairs of orthologs in synteny are more conserved than pairs outside, thus extending a previous pairwise study. We then show that this observation is in fact a function of the size of the synteny: the larger the block of synteny is, the more conserved the genes are.
Affiliation(s)
- Yves-Pol Deniélou
- INRIA Grenoble-Rhône-Alpes, Team BAMBOO, 655 Avenue de l'Europe, 38334 Montbonnot Cedex, France.
19
Salichos L, Rokas A. Evaluating ortholog prediction algorithms in a yeast model clade. PLoS One 2011; 6:e18755. [PMID: 21533202 PMCID: PMC3076445 DOI: 10.1371/journal.pone.0018755] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 11/10/2010] [Accepted: 03/15/2011] [Indexed: 11/18/2022]
Abstract
BACKGROUND Accurate identification of orthologs is crucial for evolutionary studies and for functional annotation. Several algorithms have been developed for ortholog delineation, but so far, manually curated genome-scale biological databases of orthologous genes for algorithm evaluation have been lacking. We evaluated four popular ortholog prediction algorithms (MultiParanoid, OrthoMCL, RBH: Reciprocal Best Hit, and RSD: Reciprocal Smallest Distance; the last two extended into clustering algorithms cRBH and cRSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. RESULTS Examination of sensitivity [TP/(TP+FN)], specificity [TN/(TN+FP)], and accuracy [(TP+TN)/(TP+TN+FP+FN)] across a broad parameter range showed that cRBH was the most accurate and specific algorithm, whereas OrthoMCL was the most sensitive. Evaluation of the algorithms across a varying number of species showed that cRBH had the highest accuracy and lowest false discovery rate [FP/(FP+TP)], followed by cRSD. Of the six species in our set, three descended from an ancestor that underwent whole genome duplication. Subsequent differential duplicate loss events in the three descendants resulted in distinct classes of gene loss patterns, including cases where the genes retained in the three descendants are paralogs, constituting 'traps' for ortholog prediction algorithms. We found that the false discovery rate of all algorithms dramatically increased in these traps. CONCLUSIONS These results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones (e.g., OrthoMCL and MultiParanoid) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.
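The four evaluation metrics quoted above follow directly from the confusion-matrix counts. A small sketch applying exactly those formulas; the function name and the counts in the example are hypothetical, not results from the study:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Metrics as defined in the abstract, from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "false_discovery_rate": fp / (fp + tp),
    }


# Hypothetical counts for one algorithm at one parameter setting
metrics = confusion_metrics(tp=80, tn=900, fp=20, fn=20)
```

With these counts, sensitivity is 0.8 and the false discovery rate is 0.2, while specificity (900/920) and accuracy (980/1020) stay high because true negatives dominate, which is why the study reports the false discovery rate separately.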
Affiliation(s)
- Leonidas Salichos
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
20
argC Orthologs from Rhizobiales show diverse profiles of transcriptional efficiency and functionality in Sinorhizobium meliloti. J Bacteriol 2010; 193:460-72. [PMID: 21075924 DOI: 10.1128/jb.01010-10] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/17/2023]
Abstract
Several factors can influence ortholog replacement between closely related species. We evaluated the transcriptional expression and metabolic performance of ortholog substitution complementing a Sinorhizobium meliloti argC mutant with argC from Rhizobiales (Agrobacterium tumefaciens, Rhizobium etli, and Mesorhizobium loti). The argC gene is necessary for the synthesis of arginine, an amino acid that is central to protein and cellular metabolism. Strains were obtained carrying plasmids with argC orthologs expressed under the speB and argC (S. meliloti) and lac (Escherichia coli) promoters. Complementation analysis was assessed by growth, transcriptional activity, enzymatic activity, mRNA levels, specific detection of ArgC proteomic protein, and translational efficiency. The argC orthologs performed differently in each complementation, reflecting the diverse factors influencing gene expression and the ability of the ortholog product to function in a foreign metabolic background. Optimal complementation was directly related to sequence similarity with S. meliloti, and was inversely related to species signature, with M. loti argC showing the poorest performance, followed by R. etli and A. tumefaciens. Different copy numbers of genes and amounts of mRNA and protein were produced, even with genes transcribed from the same promoter, indicating that coding sequences play a role in the transcription and translation processes. These results provide relevant information for further genomic analyses and suggest that orthologous gene substitutions between closely related species are not completely functionally equivalent.
21
Dutta A, Paul S, Dutta C. GC-rich intra-operonic spacers in prokaryotes: Possible relation to gene order conservation. FEBS Lett 2010; 584:4633-8. [DOI: 10.1016/j.febslet.2010.10.037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/10/2010] [Revised: 10/12/2010] [Accepted: 10/15/2010] [Indexed: 11/28/2022]
22
Mahmood K, Konagurthu AS, Song J, Buckle AM, Webb GI, Whisstock JC. EGM: encapsulated gene-by-gene matching to identify gene orthologs and homologous segments in genomes. Bioinformatics 2010; 26:2076-84. [DOI: 10.1093/bioinformatics/btq339] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
23
Grossetête S, Labedan B, Lespinet O. FUNGIpath: a tool to assess fungal metabolic pathways predicted by orthology. BMC Genomics 2010; 11:81. [PMID: 20122162 PMCID: PMC2829015 DOI: 10.1186/1471-2164-11-81] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 06/25/2009] [Accepted: 02/01/2010] [Indexed: 11/29/2022]
Abstract
Background More and more completely sequenced fungal genomes are becoming available and many more sequencing projects are in progress. This deluge of data should improve our knowledge of the various primary and secondary metabolisms of Fungi, including their synthesis of useful compounds such as antibiotics or toxic molecules such as mycotoxins. Functional annotation of many fungal genomes is imperfect, especially of genes encoding enzymes, so we need dedicated tools to analyze their metabolic pathways in depth. Description FUNGIpath is a new tool built using a two-stage approach. Groups of orthologous proteins predicted using complementary methods of detection were collected in a relational database. Each group was further mapped on to steps in the metabolic pathways published in the public databases KEGG and MetaCyc. As a result, FUNGIpath allows the primary and secondary metabolisms of the different fungal species represented in the database to be compared easily, making it possible to assess the level of specificity of various pathways at different taxonomic distances. It is freely accessible at http://www.fungipath.u-psud.fr. Conclusions As more and more fungal genomes are expected to be sequenced during the coming years, FUNGIpath should help progressively to reconstruct the ancestral primary and secondary metabolisms of the main branches of the fungal tree of life and to elucidate the evolution of these ancestral fungal metabolisms to various specific derived metabolisms.
Affiliation(s)
- Sandrine Grossetête
- Institut de Génétique et de Microbiologie, Université Paris-Sud 11, CNRS UMR 8621, Bâtiment 400, 91405 Orsay Cedex, France
24
Assessing the quality of whole genome alignments in bacteria. Adv Bioinformatics 2010:749027. [PMID: 20049164 PMCID: PMC2798158 DOI: 10.1155/2009/749027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/29/2009] [Accepted: 08/28/2009] [Indexed: 11/17/2022]
Abstract
Comparing genomes is an essential preliminary step to solve many problems in biology. Matching long similar segments between two genomes is a precondition for their evolutionary, genetic, and genome rearrangement analyses. Though various comparison methods have been developed in recent years, a quantitative assessment of their performance is lacking. Here, we describe two families of assessment measures whose purpose is to evaluate bacteria-oriented comparison tools. The first measure is based on how well the genome segmentation fits the gene annotation of the studied organisms; the second uses the number of segments created by the segmentation and the percentage of the two genomes that are conserved. The effectiveness of the two measures is demonstrated by applying them to the results of genome comparison tools obtained on 41 pairs of bacterial species. Despite the difference in the nature of the two types of measurements, both show consistent results, providing insights into the subtle differences between the mapping tools.
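The abstract names the ingredients of the second measure (the number of segments and the percentage of each genome that is conserved) but not how they are combined. A minimal sketch computing those two ingredients from a list of aligned segments, assuming 0-based half-open coordinates and a hypothetical (start_a, end_a, start_b, end_b) tuple layout; overlapping segments are merged so no base is counted twice:

```python
def covered_length(intervals):
    """Total length covered by half-open [start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def segmentation_summary(segments, len_a, len_b):
    """Segment count and percentage of each genome covered by the segments."""
    cov_a = covered_length([(s[0], s[1]) for s in segments])
    cov_b = covered_length([(s[2], s[3]) for s in segments])
    return {
        "n_segments": len(segments),
        "pct_conserved_a": 100.0 * cov_a / len_a,
        "pct_conserved_b": 100.0 * cov_b / len_b,
    }


# Hypothetical segmentation of two 1,000-bp genomes
summary = segmentation_summary(
    [(0, 100, 500, 600), (50, 150, 0, 100), (300, 400, 700, 800)], 1000, 1000
)
```

In this toy case the first two segments overlap in genome A, so genome A is 25% conserved while genome B is 30% conserved, even though both are covered by the same three segments.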
25
Jun J, Mandoiu II, Nelson CE. Identification of mammalian orthologs using local synteny. BMC Genomics 2009; 10:630. [PMID: 20030836 PMCID: PMC2807883 DOI: 10.1186/1471-2164-10-630] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 06/23/2009] [Accepted: 12/23/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny and sequence based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals. RESULTS We test a simple method for the measurement and utilization of gene order (local synteny) in the identification of mammalian orthologs by investigating the agreement between coding sequence based orthology (Inparanoid) and local synteny based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were found to be concordant between the two orthology methods, illustrating that local synteny is a robust substitute to coding sequence for identifying orthologs. However, 7% of pairs were found to be discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements. 
CONCLUSIONS By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.
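One simple way to measure local synteny, in the spirit of the abstract though not necessarily the authors' exact scoring, is to count how many neighbours of a gene in genome A have a homolog among the neighbours of a candidate gene in genome B, then keep the best-scoring candidate. The window size, function names and toy gene orders below are hypothetical:

```python
def local_synteny_score(gene_a, gene_b, order_a, order_b, homologs, window=5):
    """Count neighbours of gene_a (within `window` genes) whose homologs lie
    within `window` genes of gene_b.

    order_a, order_b: gene names in chromosomal order.
    homologs: gene in A -> set of sequence-similar genes in B.
    """
    ia, ib = order_a.index(gene_a), order_b.index(gene_b)
    neighbours_a = order_a[max(0, ia - window):ia] + order_a[ia + 1:ia + 1 + window]
    near_b = set(order_b[max(0, ib - window):ib] + order_b[ib + 1:ib + 1 + window])
    return sum(1 for g in neighbours_a if homologs.get(g, set()) & near_b)


def best_syntenic_candidate(gene_a, candidates, order_a, order_b, homologs, window=5):
    """Pick the candidate with the highest local synteny score (first wins ties)."""
    return max(candidates, key=lambda gb: local_synteny_score(
        gene_a, gb, order_a, order_b, homologs, window))


order_a = ["x1", "x2", "g", "x3", "x4"]
# Hypothetical genome B with the positional ortholog and a retrocopy elsewhere
order_b = ["y1", "y2", "g_true", "y3", "y4", "z1", "z2", "z3", "g_retro", "z4"]
homologs = {"x1": {"y1"}, "x2": {"y2"}, "x3": {"y3"}, "x4": {"y4"}}
choice = best_syntenic_candidate("g", ["g_retro", "g_true"], order_a, order_b, homologs, window=2)
```

The retrocopy `g_retro` may be just as similar in sequence, but it scores 0 because none of its neighbours match, while `g_true` scores 4; this mirrors how local synteny can separate true orthologs from recent retrogenes.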
Affiliation(s)
- Jin Jun
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
26
Lemoine F, Labedan B, Lespinet O. SynteBase/SynteView: a tool to visualize gene order conservation in prokaryotic genomes. BMC Bioinformatics 2008; 9:536. [PMID: 19087285 PMCID: PMC2667195 DOI: 10.1186/1471-2105-9-536] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/04/2008] [Accepted: 12/16/2008] [Indexed: 11/16/2022]
Abstract
Background It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances. Results After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on-the-fly and display the distribution of synteny blocks which are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene, to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Webstart at . Conclusion SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way which provides a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.
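The distribution of synteny blocks conserved between a pair of genomes can be sketched by splitting one genome's gene order into maximal runs whose positional orthologs are consecutive in the other genome. This uses a strict no-gap definition chosen for illustration (SynteView's own block definition is not given in the abstract), and the function names, inputs and the two-gene minimum block size are assumptions:

```python
from collections import Counter


def synteny_blocks(order_a, pos_b):
    """Maximal runs of genome A genes whose orthologs are adjacent in genome B.

    order_a: genes of genome A in chromosomal order.
    pos_b: gene in A -> index of its positional ortholog in genome B
           (genes without an ortholog are absent).
    A run may follow genome B forwards or backwards, but not both.
    """
    blocks, current, direction = [], [], 0
    for gene in order_a:
        if gene not in pos_b:
            if current:
                blocks.append(current)
            current, direction = [], 0
            continue
        if current:
            step = pos_b[gene] - pos_b[current[-1]]
            if abs(step) == 1 and direction in (0, step):
                current.append(gene)
                direction = step
                continue
            blocks.append(current)
        current, direction = [gene], 0
    if current:
        blocks.append(current)
    return blocks


def block_size_distribution(blocks, min_genes=2):
    """Count blocks by size, ignoring runs shorter than min_genes."""
    return Counter(len(b) for b in blocks if len(b) >= min_genes)


order_a = ["a", "b", "c", "d", "e", "f", "g"]
pos_b = {"a": 10, "b": 11, "c": 12, "d": 40, "e": 39, "g": 5}
blocks = synteny_blocks(order_a, pos_b)
```

Here `a`, `b`, `c` form a forward block of three, `d` and `e` an inverted block of two, and `g` stays a singleton, so the size distribution is one block of 3 and one block of 2.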
Affiliation(s)
- Frédéric Lemoine
- Institut de Génétique et Microbiologie, Université Paris Sud XI, CNRS UMR 8621, Bât. 400, 91405 Orsay Cedex, France.
27
Lemoine F, Labedan B, Froidevaux C. GenoQuery: a new querying module for functional annotation in a genomic warehouse. Bioinformatics 2008; 24:i322-9. [PMID: 18586731 PMCID: PMC2718637 DOI: 10.1093/bioinformatics/btn159] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr
Affiliation(s)
- Frédéric Lemoine
- Institut de Génétique et Microbiologie, Université Paris-Sud XI, 91405 Orsay Cedex, France