1
Blair MW, Cortés AJ, Farmer AD, Huang W, Ambachew D, Penmetsa RV, Carrasquilla-Garcia N, Assefa T, Cannon SB. Uneven recombination rate and linkage disequilibrium across a reference SNP map for common bean (Phaseolus vulgaris L.). PLoS One 2018; 13:e0189597. [PMID: 29522524 PMCID: PMC5844515 DOI: 10.1371/journal.pone.0189597] [Received: 07/26/2017] [Accepted: 11/28/2017]
Abstract
Recombination (R) rate and linkage disequilibrium (LD) analyses are the basis for plant breeding. These vary by breeding system, by generation of inbreeding or outcrossing, and by region within the chromosome. Common bean (Phaseolus vulgaris L.) is a favored food legume with a small sequenced genome (514 Mb) and n = 11 chromosomes. The goal of this study was to describe R and LD in the common bean genome using a 768-marker array of single nucleotide polymorphisms (SNPs) based on Trans-legume Orthologous Group (TOG) genes, together with an advanced-generation recombinant inbred line (RIL) reference mapping population (BAT93 x Jalo EEP558) and an internationally available diversity panel. A whole-genome genetic map was created that covered all eleven linkage groups (LGs). The LGs were linked to the physical map by comparing the TOG sequences with each chromosome sequence of common bean. The total genetic map length was smaller than that of previous maps, reflecting the precision of allele calling and mapping with SNP technology as well as the use of gene-based markers. A total of 91.4% of TOG markers had singleton hits with annotated Pv genes, and all mapped outside regions of resistance gene clusters. LD was stronger within the Mesoamerican genepool and decayed more rapidly within the Andean genepool. The recombination rate across the genome was 2.13 cM/Mb, but R was highly repressed around centromeres and frequent outside peri-centromeric regions. These results have important implications for association mapping, genetic mapping, and crop improvement in common bean.
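In its simplest form, a genome-wide rate in cM/Mb is total genetic map length divided by physical length. Local rates, which are what reveal suppression around centromeres, can be estimated from adjacent markers that have both a genetic (cM) and a physical (Mb) position. The Python sketch below illustrates that arithmetic only. It is not the authors' pipeline, and the `Marker` type, the function names, the threshold and the toy positions are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Marker:
    """A mapped marker with a genetic and a physical position (hypothetical type)."""
    name: str
    cm: float  # genetic position on the linkage group, centiMorgans
    mb: float  # physical position on the chromosome, megabases


def average_rate(map_length_cm: float, physical_length_mb: float) -> float:
    """Average recombination rate in cM/Mb over a chromosome or genome."""
    return map_length_cm / physical_length_mb


def interval_rates(markers: list[Marker]) -> list[tuple[str, str, float]]:
    """Recombination rate (cM/Mb) between physically adjacent markers on one chromosome."""
    ordered = sorted(markers, key=lambda m: m.mb)
    rates = []
    for left, right in zip(ordered, ordered[1:]):
        span_mb = right.mb - left.mb
        if span_mb <= 0:
            continue  # markers at the same physical position: rate undefined
        # abs() tolerates a linkage group oriented opposite to the physical sequence
        rates.append((left.name, right.name, abs(right.cm - left.cm) / span_mb))
    return rates


def suppressed(rates: list[tuple[str, str, float]], threshold: float) -> list[tuple[str, str]]:
    """Intervals whose rate falls below `threshold` cM/Mb (e.g. around a centromere)."""
    return [(a, b) for a, b, r in rates if r < threshold]


if __name__ == "__main__":
    # Invented toy chromosome with a low-recombination middle interval.
    toy = [
        Marker("m1", 0.0, 0.0),
        Marker("m2", 10.0, 4.0),
        Marker("m3", 12.0, 20.0),
        Marker("m4", 22.0, 24.0),
    ]
    rates = interval_rates(toy)
    for a, b, r in rates:
        print(f"{a}-{b}: {r:.3f} cM/Mb")
    print("suppressed:", suppressed(rates, threshold=0.5))
```

On real data, a sliding-window or smoothed fit such as a Marey map is usually preferred over raw adjacent-marker slopes, because single intervals are noisy.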
Affiliation(s)
- Matthew W. Blair
- Department of Agricultural & Environmental Science, Tennessee State University (TSU), Nashville, Tennessee, United States of America
- Andrés J. Cortés
- Colombian Corporation for Agricultural Research (CORPOICA), C.I. La Selva, Rionegro, Department of Antioquia, Colombia
- Andrew D. Farmer
- National Center for Genome Resources (NCGR), Santa Fe, New Mexico, United States of America
- Wei Huang
- Iowa State University (ISU), Ames, Iowa, United States of America
- Daniel Ambachew
- Department of Agricultural & Environmental Science, Tennessee State University (TSU), Nashville, Tennessee, United States of America
- R. Varma Penmetsa
- University of California, Davis (US-D), California, United States of America
- Teshale Assefa
- Iowa State University (ISU), Ames, Iowa, United States of America
- United States Department of Agriculture - Agricultural Research Service (USDA-ARS), Corn Insects and Crop Genetics Research Unit, Ames, Iowa, United States of America
- Steven B. Cannon
- Iowa State University (ISU), Ames, Iowa, United States of America
- United States Department of Agriculture - Agricultural Research Service (USDA-ARS), Corn Insects and Crop Genetics Research Unit, Ames, Iowa, United States of America
2
Zhao SY, Chen LY, Muchuku JK, Hu GW, Wang QF. Genetic Adaptation of Giant Lobelias (Lobelia aberdarica and Lobelia telekii) to Different Altitudes in East African Mountains. Front Plant Sci 2016; 7:488. [PMID: 27148313 PMCID: PMC4828460 DOI: 10.3389/fpls.2016.00488] [Received: 01/25/2016] [Accepted: 03/25/2016]
Abstract
The giant lobelias of the East African mountains are good models for studying the molecular mechanisms of adaptation to different altitudes. In this study, we generated RNA-seq data for a middle-altitude species, Lobelia aberdarica, and a high-altitude species, L. telekii, and then estimated the selective pressure on their orthologous genes. Our aim was to explore the genes potentially involved in adaptation to different altitudes. About 9.3 Gb of clean nucleotides and 167,929-170,534 unigenes, with total lengths of 159,762,099-171,138,936 bp, were generated for each of the two species. The OrthoMCL method identified 3,049 1:1 orthologous genes (each species represented by one ortholog). The ratio of non-synonymous to synonymous substitution rates was estimated using an approximate method and a maximum likelihood method in PAML. Eighty-five orthologous genes were under positive selection. At least 8 of these genes are possibly involved in DNA repair, response to DNA damage and temperature stimulus, and regulation of gene expression, which hints at how giant lobelias adapt to high-altitude environments characterized by cold, low oxygen, and strong ultraviolet radiation. The negatively selected genes are over-represented in Gene Ontology terms such as hydrolase and macromolecular complex assembly, among others. This study sheds light on the molecular mechanisms of adaptation to different altitudes and provides genomic resources for further studies of giant lobelias.
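The 1:1 criterion (each species represented by exactly one gene in an ortholog group) is simple to apply once ortholog groups exist. The Python sketch below filters groups written as `group_id: taxon|gene taxon|gene ...`, a layout assumed here to resemble OrthoMCL's groups file. It is not the authors' script, and the taxon codes `Lab` and `Ltk` and the gene IDs are hypothetical.

```python
from collections import Counter


def parse_groups(lines):
    """Parse 'group_id: taxon|gene taxon|gene ...' lines (assumed format) into
    {group_id: [(taxon, gene), ...]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = [tuple(m.split("|", 1)) for m in members.split()]
    return groups


def one_to_one(groups, taxa):
    """Keep groups where each taxon in `taxa` has exactly one gene and no other taxon appears."""
    wanted = set(taxa)
    kept = {}
    for group_id, members in groups.items():
        counts = Counter(taxon for taxon, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept[group_id] = dict(members)  # {taxon: gene}
    return kept


if __name__ == "__main__":
    example = [
        "g1: Lab|c101 Ltk|c202",           # one gene per species -> kept as 1:1
        "g2: Lab|c103 Lab|c104 Ltk|c205",  # two genes from Lab -> rejected
        "g3: Lab|c106",                     # absent from Ltk -> rejected
    ]
    print(one_to_one(parse_groups(example), ["Lab", "Ltk"]))
```

The retained pairs would then be aligned codon by codon and passed to a dN/dS estimator such as PAML. That step is not shown here.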
Affiliation(s)
- Shu-Ying Zhao
- Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan, China
- Ling-Yun Chen
- Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan, China
- John K. Muchuku
- Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan, China
- Guang-Wan Hu
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan, China
- Qing-Feng Wang
- Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Sino-Africa Joint Research Centre, Chinese Academy of Sciences, Wuhan, China
3
de Vries S, Nemesio-Gorriz M, Blair PB, Karlsson M, Mukhtar MS, Elfstrand M. Heterotrimeric G-proteins in Picea abies and their regulation in response to Heterobasidion annosum s.l. infection. BMC Plant Biol 2015; 15:287. [PMID: 26654722 PMCID: PMC4676809 DOI: 10.1186/s12870-015-0676-1] [Received: 09/21/2015] [Accepted: 12/03/2015]
Abstract
BACKGROUND Heterotrimeric G-proteins are important signalling switches, present in all eukaryotic kingdoms. In plants, they regulate several developmental functions and play an important role in plant-microbe interactions. Current knowledge of plant G-proteins is mostly based on model angiosperms, and little is known about the G-protein repertoire and function in other lineages. In this study, we investigate the heterotrimeric G-protein subunit repertoire in Pinaceae, including phylogenetic relationships, radiation and sequence diversity levels in relation to other plant lineages. We also investigate functional diversification of the G-protein complex in Picea abies by analysing transcriptional regulation of the G-protein subunits in different tissues and in response to pathogen infection. RESULTS A full repertoire of G-protein subunits in several conifer species was identified in silico. The full-length P. abies coding regions of one Gα-, one Gβ- and four Gγ-subunits were cloned and sequenced. The phylogenetic analysis of the Gγ-subunits showed that PaGG1 clustered with A-type-like subunits, PaGG3 and PaGG4 clustered with C-type-like subunits, while PaGG2 and its orthologs represented a novel conifer-specific putative Gγ-subunit type. Gene expression analyses by quantitative PCR of P. abies G-protein subunits showed specific up-regulation of the Gα-subunit gene PaGPA1 and the Gγ-subunit gene PaGG1 in response to Heterobasidion annosum sensu lato infection. CONCLUSIONS Conifers possess a full repertoire of G-protein subunits. The differential regulation of PaGPA1 and PaGG1 indicates that the heterotrimeric G-protein complex represents a critical linchpin in Heterobasidion annosum s.l. perception and downstream signaling in P. abies.
Affiliation(s)
- Sophie de Vries
- Department of Forest Mycology and Plant Pathology, Uppsala Biocenter, Swedish University of Agricultural Sciences, Uppsala, Sweden.
- Institute of Population Genetics, Heinrich Heine-University, Düsseldorf, Germany.
- Miguel Nemesio-Gorriz
- Department of Forest Mycology and Plant Pathology, Uppsala Biocenter, Swedish University of Agricultural Sciences, Uppsala, Sweden.
- Peter B Blair
- Department of Biology, The University of Alabama at Birmingham, Birmingham, AL, USA.
- Magnus Karlsson
- Department of Forest Mycology and Plant Pathology, Uppsala Biocenter, Swedish University of Agricultural Sciences, Uppsala, Sweden.
- M Shahid Mukhtar
- Department of Biology, The University of Alabama at Birmingham, Birmingham, AL, USA.
- Malin Elfstrand
- Department of Forest Mycology and Plant Pathology, Uppsala Biocenter, Swedish University of Agricultural Sciences, Uppsala, Sweden.
4
Alexeyenko A, Lindberg J, Pérez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discov Today Technol 2014; 3:137-43. [PMID: 24980400 DOI: 10.1016/j.ddtec.2006.06.002]
Abstract
Orthologs are an indispensable bridge to transfer biological knowledge between species, from protein annotations to sophisticated disease models. However, orthology assignment is not trivial. A large number of resources now exist, each with its own idiosyncrasies. The goal of this review is to compare their contents and clarify which database is most suited for a given task.
Affiliation(s)
- Andrey Alexeyenko
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Julia Lindberg
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Asa Pérez-Bercoff
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden.
5
Zhang LN, Zhang XZ, Zhang YX, Zeng CX, Ma PF, Zhao L, Guo ZH, Li DZ. Identification of putative orthologous genes for the phylogenetic reconstruction of temperate woody bamboos (Poaceae: Bambusoideae). Mol Ecol Resour 2014; 14:988-99. [PMID: 24606129 DOI: 10.1111/1755-0998.12248] [Received: 08/05/2013] [Revised: 03/02/2014] [Accepted: 03/04/2014]
Abstract
The temperate woody bamboos (Arundinarieae) are highly diverse in morphology but have little genetic variation. The taxonomy of this lineage is intractable, and the relationships within the tribe have not been well resolved. Recent studies indicated that this tribe could have a complex evolutionary history. Although phylogenetic studies of the tribe have been carried out, most of these reconstructions were based on plastid data, which provide lower phylogenetic resolution than nuclear data. In this study, we aimed to identify a set of desirable nuclear genes for resolving the phylogeny of the temperate woody bamboos. Using two different methodologies, we identified 209 and 916 genes, respectively, as putative single-copy orthologous genes. A total of 112 genes were successfully amplified and sequenced by next-generation sequencing technologies in five species sampled from the tribe. As most of the genes exhibited intra-individual allelic heterozygosity, we investigated their phylogenetic utility by reconstructing the phylogeny based on individual genes. Discordance among gene trees was observed, and to resolve the conflict, we performed a range of analyses using BUCKy and HybTree. While caution should be taken when inferring a phylogeny from multiple conflicting genes, our analysis indicated that 74 of the 112 investigated genes are potential markers for resolving the phylogeny of the temperate woody bamboos.
Affiliation(s)
- Li-Na Zhang
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China; Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
6
Niu C, Yu D, Wang Y, Ren H, Jin Y, Zhou W, Li B, Cheng Y, Yue J, Gao Z, Liang L. Common and pathogen-specific virulence factors are different in function and structure. Virulence 2013; 4:473-82. [PMID: 23863604 PMCID: PMC5359729 DOI: 10.4161/viru.25730]
Abstract
In the process of host–pathogen interactions, bacterial pathogens employ specialized genes, such as virulence factors (VFs), to interact with the host and cause damage or disease. A number of VFs have been identified in bacterial pathogens that confer upon them the ability to cause various types of damage or disease. However, some of the identified VFs have also been found in the genomes of nonpathogenic bacteria, and this finding has given rise to considerable controversy about the definition of a virulence factor.
Here, 1988 virulence factors from 51 sequenced pathogenic bacterial genomes in the virulence factor database (VFDB) were collected, and an orthologous comparison against a non-pathogenic bacterial protein database was conducted using the reciprocal-best-BLAST-hits approach. Six hundred and twenty pathogen-specific VFs and 1368 common VFs (present in both pathogens and nonpathogens) were identified, accounting for 31.19% and 68.81% of the total VFs, respectively. The distribution of pathogen-specific VFs and common VFs in pathogenicity islands (PAIs) was systematically investigated, and pathogen-specific VFs were more likely to be located in PAIs than common VFs. The functions of the two classes of VFs were also analyzed and compared in depth. Our results indicated that most, but not all, type III secretion system (T3SS) proteins are pathogen-specific. T3SS effector proteins tended to be distributed among pathogen-specific VFs, whereas T3SS translocation proteins, apparatus proteins, and chaperones tended to be distributed among common VFs. We also observed that exotoxins occurred among both pathogen-specific and common VFs. In addition, the architecture of the two classes of VFs was compared: common VFs had a higher domain number and a lower domain coverage value, suggesting that common VFs tend to be more complex and less compact proteins.
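The reciprocal-best-BLAST-hits (RBH) test pairs gene a with gene b only when b is a's top hit and a is also b's top hit. The Python sketch below shows that logic on parsed hit tables. It assumes BLAST tabular output (`-outfmt 6`, default columns, bit score in column 12). It is not the authors' code, and the classification rule in `split_vfs` (a VF with any RBH partner counts as "common") and all IDs are hypothetical. Real pipelines usually also apply E-value and coverage cut-offs, which are omitted here.

```python
def read_blast_tab(lines):
    """Yield (query, subject, bitscore) from BLAST tabular output (-outfmt 6, default columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[11])  # column 12 is the bit score


def best_hits(hits):
    """Map each query to its highest-scoring subject (the first one seen wins a tie)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def split_vfs(all_vfs, rbh_pairs):
    """Hypothetical rule: VFs with an RBH partner are 'common', the rest 'pathogen-specific'."""
    with_partner = {a for a, _ in rbh_pairs}
    common = sorted(v for v in all_vfs if v in with_partner)
    specific = sorted(v for v in all_vfs if v not in with_partner)
    return common, specific


if __name__ == "__main__":
    # Invented hits: VF proteins vs. nonpathogen proteins, and the reverse search.
    vf_vs_np = [("vf1", "np1", 200.0), ("vf1", "np2", 150.0),
                ("vf2", "np3", 90.0), ("vf3", "np2", 80.0)]
    np_vs_vf = [("np1", "vf1", 198.0), ("np3", "vf9", 120.0),
                ("np3", "vf2", 88.0), ("np2", "vf1", 140.0)]
    pairs = reciprocal_best_hits(vf_vs_np, np_vs_vf)
    print(pairs)
    print(split_vfs(["vf1", "vf2", "vf3"], pairs))
```

As a check on the reported figures, 620/1988 ≈ 0.3119 and 1368/1988 ≈ 0.6881, and 620 + 1368 = 1988, which is consistent with the stated 31.19% and 68.81%.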
Affiliation(s)
- Chao Niu
- Tianjin Institute of Health & Environmental Medicine, Tianjin, People's Republic of China
7
Nakaya A, Katayama T, Itoh M, Hiranuka K, Kawashima S, Moriya Y, Okuda S, Tanaka M, Tokimatsu T, Yamanishi Y, Yoshizawa AC, Kanehisa M, Goto S. KEGG OC: a large-scale automatic construction of taxonomy-based ortholog clusters. Nucleic Acids Res 2012. [PMID: 23193276 PMCID: PMC3531156 DOI: 10.1093/nar/gks1239]
Abstract
The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying a quasi-clique-based clustering method to all possible protein-coding genes in all complete genomes, based on their amino acid sequence similarities. Because OCs are computationally efficient to calculate, the contents can be updated regularly. KEGG OC has two main features: (i) it consists of all complete genomes of a wide variety of organisms from the three domains of life, and the number of organisms is the largest among the existing databases; and (ii) it is compatible with the KEGG database, sharing the same sets of genes and identifiers, which enables seamless integration of OCs with useful KEGG components such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.
Affiliation(s)
- Akihiro Nakaya
- Center for Transdisciplinary Research, Niigata University, 1-757 Asahimachi-dori, Chuo-ku, Niigata 951-8585, Japan
8
Luo S, Zhang Y, Hu Q, Chen J, Li K, Lu C, Liu H, Wang W, Kuang H. Dynamic nucleotide-binding site and leucine-rich repeat-encoding genes in the grass family. Plant Physiol 2012; 159:197-210. [PMID: 22422941 PMCID: PMC3375961 DOI: 10.1104/pp.111.192062] [Received: 12/09/2011] [Accepted: 03/12/2012]
Abstract
The proper use of resistance genes (R genes) requires a comprehensive understanding of their genomics and evolution. We analyzed genes encoding nucleotide-binding sites and leucine-rich repeats in the genomes of rice (Oryza sativa), maize (Zea mays), sorghum (Sorghum bicolor), and Brachypodium distachyon. Frequent deletions and translocations of R genes generated prevalent presence/absence polymorphisms between different accessions/species. The deletions were caused by unequal crossover, homologous repair, nonhomologous repair, or other unknown mechanisms. R gene loci identified from different genomes were mapped onto the chromosomes of rice cv Nipponbare using comparative genomics, resulting in an integrated map of 495 R loci. Sequence analysis of R genes from the partially sequenced genomes of an African rice cultivar and 10 wild accessions suggested that there are many additional R gene lineages in the AA genome of Oryza. The R genes with chimeric structures (termed type I R genes) are diverse in different rice accessions but account for only 5.8% of all R genes in the Nipponbare genome. In contrast, the vast majority of R genes in the rice genome are type II R genes, which are highly conserved in different accessions. Surprisingly, pseudogene-causing mutations in some type II lineages are often conserved, indicating that their conservation is not due to function. Among the functional R genes cloned from rice so far, type II R genes outnumber type I R genes, but type I R genes are predicted to contribute considerable diversity in wild species. Type I R genes tend to reduce the microsynteny of their flanking regions significantly more than type II R genes, and their flanking regions have slightly but significantly lower G/C content than those of type II R genes.
Affiliation(s)
- Qun Hu
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
- Jiongjiong Chen
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
- Kunpeng Li
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
- Chen Lu
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
- Hui Liu
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
- Wen Wang
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
- Hanhui Kuang
- Key Laboratory of Horticulture Biology, Ministry of Education, and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People’s Republic of China, 430070 (S.L., Y.Z., Q.H., J.C., K.L, C.L., H.K.); and Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, People’s Republic of China, 650223 (H.L., W.W.)
9
Woodcock MR. Nested Hierarchal Organization of Conservation for MicroRNAs and Their Putative Targets to Drosophila melanogaster. Chem Biodivers 2012; 9:945-64. [DOI: 10.1002/cbdv.201100358]
10
Song G, Riemer C, Dickins B, Kim HL, Zhang L, Zhang Y, Hsu CH, Hardison RC, NISC Comparative Sequencing Program, Green ED, Miller W. Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 2012; 4:586-601. [PMID: 22454131 PMCID: PMC3342878 DOI: 10.1093/gbe/evs032] [Accepted: 03/19/2012]
Abstract
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, PA, USA.
11
Thatoi H, Patra JK. Biotechnology and Pharmacological Evaluation of Medicinal Plants: An Overview. ACTA ACUST UNITED AC 2011. [DOI: 10.1080/10496475.2011.602471]
12
Antonescu C, Antonescu V, Sultana R, Quackenbush J. Using the DFCI gene index databases for biological discovery. ACTA ACUST UNITED AC 2010; Chapter 1:1.6.1-1.6.36. [PMID: 20205187 DOI: 10.1002/0471250953.bi0106s29]
Abstract
The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
13
Genetic analysis of gene expression for pigmentation in Chinese cabbage (Brassica rapa). BioChip J 2010. [DOI: 10.1007/s13206-010-4206-9]
14
Mazza R, Strozzi F, Caprera A, Ajmone-Marsan P, Williams JL. The other side of comparative genomics: genes with no orthologs between the cow and other mammalian species. BMC Genomics 2009; 10:604. [PMID: 20003425 PMCID: PMC2808326 DOI: 10.1186/1471-2164-10-604] [Received: 07/13/2009] [Accepted: 12/14/2009]
Abstract
Background With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies on comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genomes may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies. Results A BLAST-based method was exploited to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from a high-quality and specific set of orthology predictions, as provided by Ensembl, hidden relationships between genes and genomes of different mammalian species were unveiled using a highly sensitive approach, based on sequence similarity and genomic comparison. Conclusions The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, among which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may potentially have orthologs that were missed in the annotation process, despite having a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features and gave an overview of the other side of orthology, which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.
Affiliation(s)
- Raffaele Mazza
- Istituto di Zootecnica, Università Cattolica del Sacro Cuore, 29100 Piacenza, Italy.
15
Schreiber AW, Sutton T, Caldo RA, Kalashyan E, Lovell B, Mayo G, Muehlbauer GJ, Druka A, Waugh R, Wise RP, Langridge P, Baumann U. Comparative transcriptomics in the Triticeae. BMC Genomics 2009; 10:285. [PMID: 19558723 PMCID: PMC2717122 DOI: 10.1186/1471-2164-10-285] [Received: 02/18/2009] [Accepted: 06/29/2009]
Abstract
Background Barley and particularly wheat are two grass species of immense agricultural importance. In spite of polyploidization events within the latter, studies have shown that genotypically and phenotypically these species are very closely related and, indeed, fertile hybrids can be created by interbreeding. The advent of two genome-scale Affymetrix GeneChips now allows their transcriptomes to be compared. Results We have used the Wheat GeneChip to create a "gene expression atlas" for the wheat transcriptome (cv. Chinese Spring). For this, we chose mRNA from a range of tissues and developmental stages closely mirroring a comparable study carried out for barley (cv. Morex) using the Barley1 GeneChip. This, together with large-scale clustering of the probesets from the two GeneChips into "homologous groups", has allowed us to perform a genomic-scale comparative study of expression patterns in these two species. We explore the influence of the polyploidy of wheat on the results obtained with the Wheat GeneChip and quantify the correlation between conservation in gene sequence and gene expression in wheat and barley. In addition, we show how the conservation of expression patterns can be used to elucidate, probeset by probeset, the reliability of the Wheat GeneChip. Conclusion While there are many differences in expression at the level of individual genes and tissues, we demonstrate that the wheat and barley transcriptomes appear highly correlated. This finding is significant not only because it is widely expected given the small evolutionary distance between the two species, but also because it demonstrates that it is possible to use the two GeneChips for comparative studies. This is the case even though their probeset composition reflects rather different design principles as well as, of course, the present incomplete knowledge of the gene content of the two species.
We also show that, in general, the Wheat GeneChip is not able to distinguish contributions from individual homoeologs. Furthermore, the comparison between the two species leads us to conclude that the conservation of both gene sequence and gene expression is positively correlated with absolute expression levels, presumably reflecting increased selection pressure on genes coding for proteins present at high levels. In addition, the results indicate the presence of a correlation between sequence and expression conservation within the Triticeae.
Affiliation(s)
- Andreas W Schreiber
- Australian Centre for Plant Functional Genomics, Univ of Adelaide, PMB 1 Glen Osmond, SA 5064, Australia.
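One common way to quantify how well expression is conserved between a wheat probeset and its barley homologue is to correlate their signals across matched tissues and stages. The sketch below assumes that measure, which is a standard choice but not necessarily the statistic the authors used. It computes a Pearson correlation for one homologous pair. The signal values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and spreads, each left unnormalised (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sd_x * sd_y)


# Hypothetical log-scale signals for one homologous probeset pair, measured
# in the same five tissue/developmental stages in wheat and in barley.
wheat = [5.1, 7.8, 6.2, 9.0, 4.3]
barley = [5.4, 7.5, 6.0, 8.7, 4.9]
print(round(pearson(wheat, barley), 3))
```

Correlating profiles across tissues ignores the absolute expression level. That is why a separate analysis would still be needed to relate conservation to how highly a gene is expressed.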
16
Bettoni F, Filho FC, Grosso DM, Galante PAF, Parmigiani RB, Geraldo MV, Henrique-Silva F, Oba-Shinjo SM, Marie SKN, Soares FA, Brentani HP, Simpson AJG, de Souza SJ, Camargo AA. Identification of FAM46D as a novel cancer/testis antigen using EST data and serological analysis. Genomics 2009; 94:153-60. [PMID: 19540335 DOI: 10.1016/j.ygeno.2009.06.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 02/19/2009] [Revised: 05/13/2009] [Accepted: 06/11/2009] [Indexed: 11/28/2022]
Abstract
Cancer/testis antigens (CTAs) are immunogenic proteins with a restricted expression pattern in normal tissues and aberrant expression in different types of tumors, and they are considered promising candidates for immunotherapy. We used the alignment between EST sequences and the human genome sequence to identify novel CT genes. By examining the EST tissue composition of known CT clusters, we defined parameters for the selection of 1184 EST clusters corresponding to putative CT genes. The expression pattern of 70 CT gene candidates was evaluated by RT-PCR in 21 normal tissues, 17 tumor cell lines and 160 primary tumors. We identified 4 CT genes expressed in different types of tumors. Antibodies against the protein encoded by 1 of these 4 CT genes (FAM46D) were detected exclusively in plasma samples from cancer patients. Owing to its restricted expression pattern and immunogenicity, FAM46D represents a novel target for cancer immunotherapy.
Affiliation(s)
- Fabiana Bettoni
- Ludwig Institute for Cancer Research, Hospital Alemão Oswaldo Cruz, São Paulo, SP, Brazil
17
Natarajan S, Jakobsson E. Functional equivalency inferred from "authoritative sources" in networks of homologous proteins. PLoS One 2009; 4:e5898. [PMID: 19521530 PMCID: PMC2690840 DOI: 10.1371/journal.pone.0005898] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/09/2008] [Accepted: 04/29/2009] [Indexed: 11/18/2022]
Abstract
A one-to-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is its use of the user's knowledge to assign high confidence to selected functional identifications. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. A comparison is made with the OMA methodology, which also identifies one-to-one mappings between proteins from different species. Based on that comparison, we believe that incorporating the user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.
Affiliation(s)
- Shreedhar Natarajan
- Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Eric Jakobsson
- Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, United States of America
- National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Department of Molecular and Integrative Physiology, University of Illinois, Urbana-Champaign, Illinois, United States of America
18
Pan ZX, Xu D, Zhang JB, Lin F, Wu BJ, Liu HL. [Reviews in comparative genomic research based on orthologs]. YI CHUAN = HEREDITAS 2009; 31:457-463. [PMID: 19586838 DOI: 10.3724/sp.j.1005.2009.00457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Orthologs have similar or even identical functions in different species, share analogous regulatory pathways, and play similar or even identical roles across species. Furthermore, the vast majority of core biological functions are attributed to a considerable number of orthologous genes in organisms. Orthologs are the most reliable choice for functional annotation and analysis of genomic sequences, and their unique biological characteristics indicate that comparative genomics research based on orthologs will provide clues to the origin, expression and loss of important functional genes during biological evolution in different organisms. In this review, the fundamental characteristics of orthologous genes and the relationship between orthologs and comparative genomics are described. The corresponding approaches and the current status of comparative genomic research based on orthologs are summarized.
Affiliation(s)
- Zeng-Xiang Pan
- College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
19
E(mu)-TCL1 mice represent a model for immunotherapeutic reversal of chronic lymphocytic leukemia-induced T-cell dysfunction. Proc Natl Acad Sci U S A 2009; 106:6250-5. [PMID: 19332800 DOI: 10.1073/pnas.0901166106] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Indexed: 12/26/2022]
Abstract
Preclinical animal models have largely ignored the immune-suppressive mechanisms that are important in human cancers. The identification and use of such models should allow better predictions of successful human responses to immunotherapy. As a model for changes induced in nonmalignant cells by cancer, we examined T-cell function in the chronic lymphocytic leukemia (CLL) Emu-TCL1 transgenic mouse model. With development of leukemia, Emu-TCL1 transgenic mice developed functional T-cell defects and alteration of gene and protein expression closely resembling changes seen in CLL human patients. Furthermore, infusion of CLL cells into young Emu-TCL1 mice induced defects comparable to those seen in mice with developed leukemia, demonstrating a causal relationship between leukemia and the T-cell defects. Altered pathways involved genes regulating actin remodeling, and T cells exhibited dysfunctional immunological synapse formation and T-cell signaling, which was reversed by the immunomodulatory drug lenalidomide. These results further demonstrate the utility of this animal model of CLL and define a versatile model to investigate both the molecular mechanisms of cancer-induced immune suppression and immunotherapeutic repair strategies.
20
Ortutay C, Vihinen M. Immunome knowledge base (IKB): an integrated service for immunome research. BMC Immunol 2009; 10:3. [PMID: 19134210 PMCID: PMC2632617 DOI: 10.1186/1471-2172-10-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 10/10/2008] [Accepted: 01/09/2009] [Indexed: 01/17/2023]
Abstract
Background Functioning of the immune system requires the coordinated expression and action of many genes and proteins. With the emergence of high-throughput technologies, a great amount of molecular data is available for the genes and proteins of the immune system. However, these data are scattered across several databases and the literature, and therefore integration is needed. Description The Immunome Knowledge Base (IKB) is a dedicated resource for immunological information. We identified and collected genes that are essential for the immunome. Nucleotide and protein sequences, as well as information about the related pseudogenes, are available for 893 human essential immunome genes. To allow the study of the evolution of the immune system, data for the orthologs of human genes were collected. In addition to the human immunome, ortholog groups of 1811 metazoan immunity genes are available with information about the evidence for their immunity function. IKB combines three previous databases and several additional data items in an integrated system. Conclusion IKB provides access to several databases and resources in one single service and contains plenty of new data about the immune system. The most recent addition is variation data on genomic, transcriptomic and proteomic levels for all the immunome genes and proteins. In the future, more data will be added on the function of these genes. The service has a free and public web interface.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, FI-33014 University of Tampere, Finland.
21
Multi-granularity Parallel Computing in a Genome-Scale Molecular Evolution Application. ACTA ACUST UNITED AC 2009. [PMID: 21841894 DOI: 10.1007/978-3-642-03275-2_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Previously [1], we reported a coarse-grained parallel computational approach to identifying rare molecular evolutionary events often referred to as horizontal gene transfers. Very high degrees of parallelism (up to 65x speedup on 4,096 processors) were reported, yet the overall execution time for a realistic problem size was still on the order of 12 days. With the availability of large numbers of compute clusters, as well as genomic sequence from more than 2,000 species containing as many as 35,000 genes each, and trillions of sequence nucleotides in all, we demonstrated the computational feasibility of a method to examine "clusters" of genes using phylogenetic tree similarity as a distance metric. A full serial solution to this problem requires years of CPU time, yet only makes modest IPC and memory demands; thus, it is an ideal candidate for a grid computing approach involving low-cost compute nodes. This paper now describes a multiple granularity parallelism solution that includes exploitation of multi-core shared memory nodes to address fine-grained aspects in the tree-clustering phase of our previous deployment of XenoCluster 1.0. In addition to benchmarking results that show up to 80% speedup efficiency on 8 CPU cores, we report on the biological accuracy and relevance of our results compared to a reported set of known xenologs in yeast.
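The 8-core benchmark can be turned into a concrete speedup, assuming that "speedup efficiency" means the conventional parallel efficiency. That is the speedup divided by the number of cores, where $T_1$ is the serial run time and $T_p$ the run time on $p$ cores:

```latex
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
\qquad\Longrightarrow\qquad
S_8 = E_8 \cdot 8 = 0.80 \times 8 = 6.4,
\qquad T_8 \approx \frac{T_1}{6.4}
```

Under that reading, 80% efficiency on 8 cores means the tree-clustering phase runs about 6.4 times faster than on a single core.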
22
Arai Y, Hayashi M, Nishimura M. Proteomic identification and characterization of a novel peroxisomal adenine nucleotide transporter supplying ATP for fatty acid beta-oxidation in soybean and Arabidopsis. THE PLANT CELL 2008; 20:3227-40. [PMID: 19073762 PMCID: PMC2630451 DOI: 10.1105/tpc.108.062877] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Received: 08/25/2008] [Revised: 11/05/2008] [Accepted: 11/18/2008] [Indexed: 05/17/2023]
Abstract
We have identified the novel protein Glycine max PEROXISOMAL ADENINE NUCLEOTIDE CARRIER (Gm PNC1) by proteomic analyses of peroxisomal membrane proteins using a blue native/SDS-PAGE technique combined with peptide mass fingerprinting. Gm PNC1, and the Arabidopsis thaliana orthologs At PNC1 and At PNC2, were targeted to peroxisomes. Functional integration of Gm PNC1 and At PNC2 into the cytoplasmic membranes of intact Escherichia coli cells revealed ATP and ADP import activities. The amount of Gm PNC1 in cotyledons increased until 5 d after germination under constant darkness and then decreased very rapidly in response to illumination. We investigated the physiological functions of PNC1 in peroxisomal metabolism by analyzing a transgenic Arabidopsis plant in which At PNC1 and At PNC2 expression was suppressed using RNA interference. The pnc1/2i mutant required sucrose for germination and suppressed the degradation of storage lipids during postgerminative growth. These results suggest that PNC1 contributes to the transport of adenine nucleotides that are consumed by reactions that generate acyl-CoA for peroxisomal fatty acid beta-oxidation during postgerminative growth.
Affiliation(s)
- Yuko Arai
- Department of Cell Biology, National Institute for Basic Biology, Okazaki 444-8585 Japan
23
Abstract
Automated use of phylogenetic trees to deduce orthology relationships in proteins. Reliable orthology prediction is central to comparative genomics. Although orthology is defined by phylogenetic criteria, most automated prediction methods are based on pairwise sequence comparisons. Recently, automated phylogeny-based orthology prediction has emerged as a feasible alternative for genome-wide studies.
Affiliation(s)
- Toni Gabaldón
- Bioinformatics and Genomics Program, Center for Genomic Regulation, Doctor Aiguader 88, Barcelona, Spain.
24
McMillan LEM, Martin ACR. Automatically extracting functionally equivalent proteins from SwissProt. BMC Bioinformatics 2008; 9:418. [PMID: 18838004 PMCID: PMC2576269 DOI: 10.1186/1471-2105-9-418] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/03/2008] [Accepted: 10/06/2008] [Indexed: 11/10/2022]
Abstract
Background There is a frequent need to obtain sets of functionally equivalent homologous proteins (FEPs) from different species. While it is usually the case that orthology implies functional equivalence, this is not always true; therefore datasets of orthologous proteins are not appropriate. The information relevant to extracting FEPs is contained in databanks such as UniProtKB/Swiss-Prot, and a manual analysis of these data allows FEPs to be extracted on a one-off basis. However, there has been no resource allowing the easy, automatic extraction of groups of FEPs – for example, all instances of protein C. We have developed FOSTA, an automatically generated database of FEPs annotated as having the same function in UniProtKB/Swiss-Prot, which can be used for large-scale analysis. The method builds a candidate list of homologues and filters out functionally diverged proteins on the basis of functional annotations using a simple text-mining approach. Results Large-scale evaluation of our FEP extraction method is difficult, as there is no gold-standard dataset against which the method can be benchmarked. However, a manual analysis of five protein families confirmed a high level of performance. A more extensive comparison with two manually verified functional equivalence datasets also demonstrated very good performance. Conclusion In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot that enables groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations.
We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format, which would facilitate text-mining of UniProtKB/Swiss-Prot.
Affiliation(s)
- Lisa E M McMillan
- Research Department of Structural & Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
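The abstract describes two steps: building a list of candidate homologues, then filtering them on their functional annotations with simple text mining. The sketch below shows only the filtering idea. It keeps the homologues whose normalised description text matches the seed protein's. The normalisation rules, the function names and the accession IDs are all hypothetical; FOSTA's actual parsing of Swiss-Prot records is not reproduced here.

```python
import re
from typing import Dict, List


def normalise_description(text: str) -> str:
    """Lower-case, drop 'precursor'/'fragment' tags, collapse punctuation.

    These normalisation rules are illustrative assumptions only.
    """
    text = text.lower()
    text = re.sub(r"\b(precursor|fragment)\b", " ", text)
    text = re.sub(r"[^a-z0-9]+", " ", text)  # punctuation -> spaces
    return " ".join(text.split())


def functionally_equivalent(
    seed_id: str, descriptions: Dict[str, str], homologues: List[str]
) -> List[str]:
    """Keep homologues whose normalised description equals the seed's."""
    target = normalise_description(descriptions[seed_id])
    return [h for h in homologues if normalise_description(descriptions[h]) == target]


# Hypothetical accessions and description lines.
descriptions = {
    "P1": "Vitamin K-dependent protein C precursor",
    "P2": "Vitamin K-dependent protein C",
    "P3": "Vitamin K-dependent protein S",
}
print(functionally_equivalent("P1", descriptions, ["P2", "P3"]))
```

Exact matching after normalisation is deliberately strict. A homologue with a diverged annotation, such as protein S here, is dropped rather than merged into the group.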
25
The quest for orthologs: finding the corresponding gene across genomes. Trends Genet 2008; 24:539-51. [PMID: 18819722 DOI: 10.1016/j.tig.2008.08.009] [Citation(s) in RCA: 238] [Impact Index Per Article: 14.9] [Received: 09/12/2007] [Revised: 08/20/2008] [Accepted: 08/21/2008] [Indexed: 11/23/2022]
Abstract
Orthology is a key evolutionary concept in many areas of genomic research. It provides a framework for subjects as diverse as the evolution of genomes, gene functions, cellular networks and functional genome annotation. Although orthologous proteins usually perform equivalent functions in different species, establishing true orthologous relationships requires a phylogenetic approach, which combines both trees and graphs (networks) using reliable species phylogeny and available genomic data from more than two species, and an insight into the processes of molecular evolution. Here, we evaluate the available bioinformatics tools and provide a set of guidelines to aid researchers in choosing the most appropriate tool for any situation.
26
Fu Z, Jiang T. Clustering of main orthologs for multiple genomes. J Bioinform Comput Biol 2008; 6:573-84. [PMID: 18574863 DOI: 10.1142/s0219720008003540] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/01/2007] [Revised: 12/01/2007] [Accepted: 01/03/2008] [Indexed: 11/18/2022]
Abstract
The identification of orthologous genes shared by multiple genomes is critical for both functional and evolutionary studies in comparative genomics. While it is usually done by sequence similarity search and reconciled tree construction in practice, recently a new combinatorial approach and high-throughput system MSOAR for ortholog identification between closely related genomes based on genome rearrangement and gene duplication has been proposed in Fu et al. MSOAR assumes that orthologous genes correspond to each other in the most parsimonious evolutionary scenario, minimizing the number of genome rearrangement and (postspeciation) gene duplication events. However, the parsimony approach used by MSOAR limits it to pairwise genome comparisons. In this paper, we extend MSOAR to multiple (closely related) genomes and propose an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes. As a preliminary experiment, we apply MultiMSOAR to rat, mouse, and human genomes, and validate our results using gene annotations and gene function classifications in the public databases. We further compare our results to the ortholog clusters predicted by MultiParanoid, which is an extension of the well-known program InParanoid for pairwise genome comparisons. The comparison reveals that MultiMSOAR gives more detailed and accurate orthology information, since it can effectively distinguish main orthologs from inparalogs.
Affiliation(s)
- Zheng Fu
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA 92521, USA.
27
Baek JM, Han P, Iandolino A, Cook DR. Characterization and comparison of intron structure and alternative splicing between Medicago truncatula, Populus trichocarpa, Arabidopsis and rice. PLANT MOLECULAR BIOLOGY 2008; 67:499-510. [PMID: 18438730 DOI: 10.1007/s11103-008-9334-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 10/03/2007] [Accepted: 04/01/2008] [Indexed: 05/26/2023]
Abstract
Alignment of transcripts and genome sequences yielded a set of alternatively spliced transcripts in four angiosperm genomes: three dicotyledon species, Medicago truncatula (Medicago), Populus trichocarpa (poplar) and Arabidopsis thaliana (Arabidopsis), and the monocotyledon Oryza sativa (rice). Intron retention was the predominant mode of alternative splicing (AS) in each species, consistent with previous reports for Arabidopsis and rice. We analyzed the structure of 5'-splice junctions and observed commonalities between species. There was dependency of base composition between sites flanking the 5'-splice junction, with the potential to create a subset of splice sites that interact more weakly or strongly than average with U1 snRNA. Such altered nucleotide composition was correlated with splicing fidelity in all four species. For Medicago, poplar and Arabidopsis, but not for rice, alternative splicing was most prevalent for introns with decreased UA content, consistent with lower UA content for monocot introns and potentially reflecting evolved differences in splicing mechanisms. Similarly, the occurrence of AS across transcript Gene Ontology categories was positively correlated between Arabidopsis and Medicago, with no correlation between dicots and rice. Analysis of within-species paralogs and between-species reciprocal best-hit homologs yielded rare cases of potentially conserved AS events. Reverse transcriptase PCR and amplicon sequencing were used to confirm a subset of the in silico-predicted AS events within Medicago, as well as to characterize conserved AS events between Medicago and Arabidopsis.
Affiliation(s)
- Jong-Min Baek
- College of Agricultural and Environmental Sciences Genomics Facility, University of California, 117 Robbins hall, Davis, CA 95616, USA.
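The between-species "reciprocal best-hit homologs" mentioned above follow a simple rule. Gene a in species A and gene b in species B are paired only if b is a's highest-scoring hit in B and a is b's highest-scoring hit in A. The sketch below applies that rule to precomputed (query, subject, score) triples. The input format, the tie-breaking (the first hit seen wins) and the gene IDs are assumptions, not details from the paper.

```python
from typing import Dict, List, Tuple

# (query, subject, score) triples; a higher score means a better hit.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Strict '>' keeps the first hit seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical Medicago-vs-Arabidopsis and Arabidopsis-vs-Medicago hits.
mt_vs_at = [("Mt1", "At1", 250.0), ("Mt1", "At2", 120.0),
            ("Mt2", "At2", 300.0), ("Mt3", "At2", 310.0)]
at_vs_mt = [("At1", "Mt1", 245.0), ("At2", "Mt3", 305.0), ("At2", "Mt2", 290.0)]
print(reciprocal_best_hits(mt_vs_at, at_vs_mt))
```

In this toy input, Mt2 and Mt3 both hit At2 best, but only Mt3 is At2's own best hit, so only the pairs Mt1/At1 and Mt3/At2 are kept.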
28
Abstract
In recent years, it has become clear that all of the organisms on the Earth are related to each other in ways that can be documented by molecular sequence comparison. In this review, we focus on the evolutionary relationships among the proteins of the eukaryotes, especially those that allow inference of function from one species to another. Data and illustrations are derived from specific comparison of eight species: Homo sapiens, Mus musculus, Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Saccharomyces cerevisiae, and Plasmodium falciparum.
Affiliation(s)
- Kara Dolinski
- Department of Molecular Biology, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
29
Lee Y, Quackenbush J. Using the TIGR gene index databases for biological discovery. ACTA ACUST UNITED AC 2008; Chapter 1:Unit 1.6. [PMID: 18428690 DOI: 10.1002/0471250953.bi0106s03] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Affiliation(s)
- Yuandan Lee
- The Institute for Genomic Research, Rockville, Maryland, USA
30
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022]
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) assess their geometric and evolutionary similarity to other structures; and (c) transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively (in one case even identifying an annotation error), while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
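The ETA abstract says an enzyme annotation is transferred "whenever a plurality was reached across matches." A minimal reading of that rule is sketched below. Each structural match is assumed to carry one EC-number label, and the most frequent label is transferred only if it strictly outnumbers every other label. Treating a tie as "no plurality" is an assumption, and the EC numbers in the example are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def plurality_annotation(match_labels: List[str]) -> Optional[str]:
    """Return the label held by more matches than any other label, else None."""
    if not match_labels:
        return None  # no matches, nothing to transfer
    ranked = Counter(match_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie at the top: no plurality, so no transfer
    return ranked[0][0]


# Hypothetical EC numbers attached to the reciprocal matches of one query.
print(plurality_annotation(["3.4.21.4", "3.4.21.4", "2.7.11.1"]))
```

Returning `None` instead of picking arbitrarily between tied labels favours specificity over sensitivity. That mirrors the trade-off the abstract reports.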
31
Arai Y, Hayashi M, Nishimura M. Proteomic analysis of highly purified peroxisomes from etiolated soybean cotyledons. PLANT & CELL PHYSIOLOGY 2008; 49:526-39. [PMID: 18281324 DOI: 10.1093/pcp/pcn027] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 05/21/2023]
Abstract
To identify previously unknown peroxisomal proteins, we established an optimized method for isolating highly purified peroxisomes from etiolated soybean cotyledons using Percoll density gradient centrifugation followed by iodixanol density gradient centrifugation. Proteins in highly purified peroxisomes were separated by two-dimensional PAGE. We performed peptide mass fingerprinting of proteins separated in the gel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry and used the peptide mass fingerprints to search a non-redundant soybean expressed sequence tag database. We succeeded in assigning 92 proteins to 70 sequences in the database. Among them, proteins encoded by 30 sequences were judged to be located in peroxisomes. These included enzymes for fatty acid beta-oxidation, the glyoxylate cycle, photorespiratory glycolate metabolism, stress response and metabolite transport. We also show experimental evidence that plant peroxisomes contain a short-chain dehydrogenase/reductase family protein, enoyl-CoA hydratase/isomerase family protein, 3-hydroxyacyl-CoA dehydrogenase-like protein and a voltage-dependent anion-selective channel protein.
Affiliation(s)
- Yuko Arai
- Department of Cell Biology, National Institute for Basic Biology, Okazaki 444-8585 Japan
32
Fu Z, Chen X, Vacic V, Nan P, Zhong Y, Jiang T. MSOAR: a high-throughput ortholog assignment system based on genome rearrangement. J Comput Biol 2008; 14:1160-75. [PMID: 17990975 DOI: 10.1089/cmb.2007.0048] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While it is usually done using sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for ortholog assignment on a genome scale, called MSOAR, and applied it to the human and mouse genomes. As the results show, MSOAR is able to find 99 more true orthologs than the INPARANOID program. In comparison to the iterated exemplar algorithm on simulated data, MSOAR performed favorably in terms of assignment accuracy. We also validated our predicted main ortholog pairs between human and mouse using public ortholog assignment datasets, synteny information, and gene function classification. These test results indicate that our approach is very promising for genome-wide ortholog assignment. Supplemental material and the MSOAR program are available at http://msoar.cs.ucr.edu.
Affiliation(s)
- Zheng Fu
- Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA.
33
Barbosa-Silva A, Satagopam VP, Schneider R, Ortega JM. Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence. BMC Bioinformatics 2008; 9:141. [PMID: 18321373 PMCID: PMC2277401 DOI: 10.1186/1471-2105-9-141] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/31/2007] [Accepted: 03/05/2008] [Indexed: 01/10/2023]
Abstract
Background Modern proteomes evolved by modification of pre-existing ones. It is extremely important to comparative biology that related proteins be identified as members of the same cognate group, since a characterized putative homolog could be used to find clues about the function of uncharacterized proteins from the same group. Typically, databases of related proteins focus on those from completely sequenced genomes. Unfortunately, relatively few organisms have had their genomes fully sequenced; accordingly, many proteins are ignored by the currently available databases of cognate proteins, despite the large number of important genes that have been functionally described only in these incomplete proteomes. Results We have developed a method to cluster cognate proteins from multiple organisms beginning with only one sequence, through connectivity saturation with that seed sequence. We show that the generated clusters are in agreement with some other approaches based on full genome comparison. Conclusion The method produced results that are as reliable as those produced by conventional clustering approaches. Generating clusters based only on individual proteins of interest is less time consuming than generating clusters for whole proteomes.
Affiliation(s)
- Adriano Barbosa-Silva
- Laboratório de Biodados, Dep. Bioquímica e Imunologia, Instituto de Ciências Biológicas, UFMG, Av. Antônio Carlos 6627, Belo Horizonte, MG, Brasil.
34
Abstract
The promise of the genome project was that a complete sequence would provide us with information that would transform biology and medicine. But the 'parts list' that has emerged from the genome project is far from the 'wiring diagram' and 'circuit logic' we need to understand the link between genotype, environment and phenotype. While genomic technologies such as DNA microarrays, proteomics and metabolomics have given us new tools and new sources of data to address these problems, a number of crucial elements remain to be addressed before we can begin to close the loop and develop the predictive quantitative biology that is the stated goal of so much of current biological research, including systems biology. Our approach to this problem has largely been one of integration: bringing together a vast wealth of information to better interpret the experimental data we are generating in genomic assays, and creating publicly available databases and software tools to facilitate the work of others. Recently, we have applied a similar approach to understanding the biological networks that underlie the phenotypic responses we observe, which has started us on the road to developing a predictive biology.
Affiliation(s)
- John Quackenbush
- Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
35
Rannikko K, Ortutay C, Vihinen M. Immunity genes and their orthologs: a multi-species database. Int Immunol 2007; 19:1361-70. [PMID: 17965450 DOI: 10.1093/intimm/dxm109] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]
Abstract
Metazoan species, from sponges to insects and mammals, possess successful defence systems against their pathogens and parasites. The evolutionary origins of these diverse systems are beginning to be more comprehensively investigated and mapped out. We have collected 1811 metazoan immunity genes from literature and gene ontology annotations. Tentative orthologs of these genes were identified using reciprocal protein-protein Blast searches against proteins from the GenBank and RefSeq databases. We have defined different levels or classes of ortholog group according to the order of reciprocal ortholog pairs among the seed immunity genes. The genes were clustered into these different ortholog groups. Initial phylogenetic analysis of these ortholog groups suggests that by this approach, we can collect a spectrum of immunity genes representing well the taxa in which they appear. All the immunity genes and their evidence of immune function, orthologs and ortholog groups have been combined into an open access database, ImmunomeBase, which is publicly available at http://bioinf.uta.fi/ImmunomeBase.
Affiliation(s)
- Kathryn Rannikko
- Bioinformatics Research Group, Institute of Medical Technology, FI-33014, University of Tampere, Finland
36
Nickel GC, Tefft D, Adams MD. Human PAML browser: a database of positive selection on human genes using phylogenetic methods. Nucleic Acids Res 2007; 36:D800-8. [PMID: 17962310 PMCID: PMC2238824 DOI: 10.1093/nar/gkm764] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
With the recent increase in the number of mammalian genomes being sequenced, large-scale genome scans for human-specific positive selection are now possible. Selection can be inferred through phylogenetic analysis by comparing the rates of silent and replacement substitution between related species. Maximum-likelihood (ML) analysis of codon substitution models can be used to identify genes with an accelerated pattern of amino acid substitution on a particular lineage. However, the ML methods are computationally intensive and awkward to configure. We have created a database that contains the results of tests for positive selection along the human lineage in 13 721 genes with orthologs in the UCSC multispecies genome alignments. The Human PAML Browser is a resource through which researchers can search for a gene of interest or groups of genes by Gene Ontology category, and obtain coding sequence alignments for the gene as well as results of tests for positive selection from the software package Phylogenetic Analysis by Maximum Likelihood (PAML). The Human PAML Browser is available at http://mendel.gene.cwru.edu/adamslab/pbrowser.py.
Affiliation(s)
- Gabrielle C Nickel
- Department of Genetics, Case Western Reserve University, Cleveland, OH, USA
37
Jensen LJ, Julien P, Kuhn M, von Mering C, Muller J, Doerks T, Bork P. eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res 2007; 36:D250-4. [PMID: 17942413 PMCID: PMC2238944 DOI: 10.1093/nar/gkm796] [Citation(s) in RCA: 317] [Impact Index Per Article: 18.6] [Indexed: 11/28/2022]
Abstract
The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database (‘evolutionary genealogy of genes: Non-supervised Orthologous Groups’), which contains orthologous groups constructed from Smith–Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.de.
Affiliation(s)
- Lars Juhl Jensen
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
38
Heinicke S, Livstone MS, Lu C, Oughtred R, Kang F, Angiuoli SV, White O, Botstein D, Dolinski K. The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists. PLoS One 2007; 2:e766. [PMID: 17712414 PMCID: PMC1942082 DOI: 10.1371/journal.pone.0000766] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 05/18/2007] [Accepted: 07/18/2007] [Indexed: 02/07/2023]
Abstract
Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.
Affiliation(s)
- Sven Heinicke
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Michael S. Livstone
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Charles Lu
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Rose Oughtred
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Fan Kang
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Samuel V. Angiuoli
- The Institute for Genomic Research, Rockville, Maryland, United States of America
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Owen White
- The Institute for Genomic Research, Rockville, Maryland, United States of America
- David Botstein
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- * To whom correspondence should be addressed. E-mail:
39
Zimmer A, Lang D, Richardt S, Frank W, Reski R, Rensing SA. Dating the early evolution of plants: detection and molecular clock analyses of orthologs. Mol Genet Genomics 2007; 278:393-402. [PMID: 17593393 DOI: 10.1007/s00438-007-0257-6] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Received: 04/10/2007] [Accepted: 05/24/2007] [Indexed: 11/28/2022]
Abstract
Orthologs generally are under selective pressure against loss of function, while paralogs usually accumulate mutations and eventually die or diverge in terms of function or regulation. Most ortholog detection methods contaminate the resulting datasets with a substantial number of paralogs. We therefore aimed to implement a straightforward method that detects ortholog clusters with fewer paralogs from completely sequenced genomes. The described cross-species expansion of the reciprocal best BLAST hit method is a time-effective method for ortholog detection: 68% of the resulting clusters are truly orthologous, and the procedure specifically enriches single-copy orthologs. The detection of true orthologs can provide a phylogenetic toolkit to better understand evolutionary processes. In a study across six photosynthetic eukaryotes, nuclear genes of putative mitochondrial origin were shown to be over-represented among single-copy orthologs. These orthologs are involved in fundamental biological processes such as amino acid metabolism or translation. Molecular clock analyses based on this dataset yielded divergence time estimates for the red/green algae (1,142 MYA), green algae/land plant (725 MYA), mosses/seed plant (496 MYA), gymno-/angiosperm (385 MYA) and monocotyledons/core eudicotyledons (301 MYA) divergences.
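The reciprocal best hit (RBH) criterion that this method expands across species can be illustrated with a minimal pairwise sketch. It is not the authors' cross-species procedure; it assumes BLAST results have already been parsed into hypothetical `(query, subject, bit_score)` tuples, and the gene IDs and scores are invented toy values.

```python
from typing import Dict, List, Tuple

# One BLAST hit: (query_id, subject_id, bit_score). Assumed to be pre-parsed
# from BLAST tabular output; the values used below are hypothetical.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query (ties: first seen wins)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 120.0),
              ("A2", "B2", 90.0), ("A3", "B2", 180.0)]
    b_vs_a = [("B1", "A1", 248.0), ("B2", "A3", 175.0), ("B2", "A2", 88.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1'), ('A3', 'B2')]
```

In this toy input, A2 is dropped: its best hit B2 prefers A3, the kind of non-reciprocal (possibly paralogous) match that the RBH filter is meant to exclude.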
Affiliation(s)
- Andreas Zimmer
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104, Freiburg, Germany
40
Schneider A, Dessimoz C, Gonnet GH. OMA Browser--exploring orthologous relations across 352 complete genomes. Bioinformatics 2007; 23:2180-2. [PMID: 17545180 DOI: 10.1093/bioinformatics/btm295] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Inference of the evolutionary relation between proteins, in particular the identification of orthologs, is a central problem in comparative genomics. Several large-scale efforts with various methodologies and scope tackle this problem, including OMA (the Orthologous MAtrix project). RESULTS Based on the results of the OMA project, we introduce here the OMA Browser, a web-based tool allowing the exploration of orthologous relations over 352 complete genomes. Orthologs can be viewed as groups across species, but also at the level of sequence pairs, allowing the distinction among one-to-one, one-to-many and many-to-many orthologs. AVAILABILITY http://omabrowser.org.
41
Abstract
We present BLAST on Orthologous groups (BLASTO), a modified BLAST tool for searching orthologous group data. It treats each orthologous group as a unit and outputs a ranked list of orthologous groups instead of single sequences. By filtering out redundancy and putative paralogs, sequence comparisons to orthologous groups, instead of to single sequences in the database, can improve both functional prediction and phylogenetic inference. BLASTO computes the significance score of each orthologous group based on the individual BLAST hits in the orthologous group, using the number of taxa in the group as an optional weight. This allows users to control the species diversity of the orthologous groups. BLASTO incorporates the best-known multispecies ortholog databases, including NCBI Clusters of Orthologous Group, NCBI euKaryotic Orthologous Group database, OrthoMCL, MultiParanoid and TIGR Eukaryotic Gene Orthologues database, and offers a useful platform to integrate orthology information into functional inference and evolutionary studies of individual sequences. BLASTO is accessible online at http://oxytricha.princeton.edu/BlastO.
Affiliation(s)
- Laura F. Landweber
- *To whom correspondence should be addressed: +1 609 258 1947; +1 609 258 7892
42
Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One 2007; 2:e383. [PMID: 17440619 PMCID: PMC1849888 DOI: 10.1371/journal.pone.0000383] [Citation(s) in RCA: 311] [Impact Index Per Article: 18.3] [Received: 03/06/2007] [Accepted: 03/13/2007] [Indexed: 12/02/2022]
Abstract
Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity > 80%: INPARANOID identifies orthologs across two species, while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than both the (manually curated) KOG database and the homolog clustering algorithm TribeMCL. Using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications. 
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.
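The abstract summarizes each method by its sensitivity and specificity. The sketch below only shows how those two quantities are defined when a reference labeling is available; it does not implement Latent Class Analysis, which the authors use precisely because no such gold standard exists. The candidate-pair IDs and labels are hypothetical, and a pair missing from `predicted` is treated as not called an ortholog.

```python
from typing import Dict, Tuple


def sensitivity_specificity(predicted: Dict[str, bool],
                            truth: Dict[str, bool]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP), over pairs in `truth`."""
    tp = fp = tn = fn = 0
    for pair, is_ortholog in truth.items():
        called = predicted.get(pair, False)  # absent pair: treated as "not an ortholog"
        if is_ortholog and called:
            tp += 1
        elif is_ortholog:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Hypothetical reference labels and one method's calls for five candidate pairs.
    truth = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}
    predicted = {"p1": True, "p2": False, "p3": True, "p5": True}
    print(sensitivity_specificity(predicted, truth))  # (0.6666666666666666, 0.5)
```

A method that calls more pairs orthologous tends to raise sensitivity at the cost of specificity, which is the trade-off the abstract reports between BLAST-based and tree-based methods.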
Affiliation(s)
- Feng Chen
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Aaron J. Mackey
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jeroen K. Vermunt
- Department of Methodology and Statistics, Tilburg University, The Netherlands
- David S. Roos
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- * To whom correspondence should be addressed. E-mail:
43
Wei T, Geiser AG, Qian HR, Su C, Helvering LM, Kulkarini NH, Shou J, N'Cho M, Bryant HU, Onyia JE. DNA microarray data integration by ortholog gene analysis reveals potential molecular mechanisms of estrogen-dependent growth of human uterine fibroids. BMC Womens Health 2007; 7:5. [PMID: 17407572 PMCID: PMC1852551 DOI: 10.1186/1472-6874-7-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 11/07/2006] [Accepted: 04/02/2007] [Indexed: 01/28/2023]
Abstract
Background Uterine fibroids (leiomyomas) are common benign smooth muscle tumors. Their growth is well known to be estrogen-dependent; however, the molecular mechanisms of this estrogen dependency are not well understood. Methods Differentially expressed genes in human uterine fibroids were either retrieved from published papers or obtained from our own statistical analysis of downloaded array data. Probes for the same genes on different Affymetrix chips were mapped based on probe comparison information provided by Affymetrix. Genes identified by two or three array studies were submitted for ortholog analysis. Human and rat ortholog genes were identified using the ortholog gene databases HomoloGene and TOGA and were confirmed by synteny analysis with the MultiContigView tool in the Ensembl genome browser. Results By integrated analysis of three recently published DNA microarray studies with human tissue, thirty-eight genes were found by at least two research groups to be differentially expressed in the same direction in fibroids compared with adjacent uterine myometrium. Among these genes, twelve with rat orthologs were identified as estrogen-regulated from our array study investigating uterine expression in ovariectomized rats treated with estrogen. Functional and pathway analyses of the twelve genes suggested multiple molecular mechanisms for estrogen-dependent cell survival and tumor growth. Firstly, estrogen increased expression of the anti-apoptotic PCP4 gene and suppressed the expression of the growth-inhibitory receptors PTGER3 and TGFBR2. Secondly, estrogen may antagonize PPARγ signaling, thought to inhibit fibroid growth and survival, at two points in the PPAR pathway: 1) through increased ANXA1 gene expression, which can inhibit phospholipase A2 activity and in turn decrease arachidonic acid synthesis, and 2) by decreasing L-PGDS expression, which would reduce synthesis of PGJ2, an endogenous ligand for PPARγ. 
Lastly, estrogen affects retinoic acid (RA) synthesis and mobilization by regulating expression of CRABP2 and ALDH1A1. RA has been shown to play a significant role in the development of uterine fibroids in an animal model. Conclusion Integrated analysis of multiple array datasets revealed twelve human and rat ortholog genes that were differentially expressed in human uterine fibroids and transcriptionally responsive to estrogen in the rat uterus. Functional and pathway analysis of these genes suggests multiple potential molecular mechanisms for the poorly understood estrogen-dependent growth of uterine fibroids. Fully understanding the exact molecular interactions among these gene products requires further study to validate their roles in uterine fibroids. This work provides new avenues of study which could influence the future direction of therapeutic intervention for the disease.
Affiliation(s)
- Tao Wei
- Integrative Biology, Lilly Research Laboratories, Greenfield, Indiana 46140, USA
- Andrew G Geiser
- Bone and Inflammation, Lilly Research Laboratories, Indianapolis, Indiana 46285, USA
- Hui-Rong Qian
- Discovery Statistics, Lilly Research Laboratories, Indianapolis, Indiana 46285, USA
- Chen Su
- Integrative Biology, Lilly Research Laboratories, Greenfield, Indiana 46140, USA
- Leah M Helvering
- Bone and Inflammation, Lilly Research Laboratories, Indianapolis, Indiana 46285, USA
- Nalini H Kulkarini
- Integrative Biology, Lilly Research Laboratories, Greenfield, Indiana 46140, USA
- Jianyong Shou
- Integrative Biology, Lilly Research Laboratories, Greenfield, Indiana 46140, USA
- Mathias N'Cho
- Integrative Biology, Lilly Research Laboratories, Greenfield, Indiana 46140, USA
- Henry U Bryant
- Bone and Inflammation, Lilly Research Laboratories, Indianapolis, Indiana 46285, USA
- Jude E Onyia
- Integrative Biology, Lilly Research Laboratories, Greenfield, Indiana 46140, USA
44
Ortutay C, Siermala M, Vihinen M. ImmTree: database of evolutionary relationships of genes and proteins in the human immune system. Immunome Res 2007; 3:4. [PMID: 17376226 PMCID: PMC1845140 DOI: 10.1186/1745-7580-3-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/12/2007] [Accepted: 03/21/2007] [Indexed: 11/10/2022]
Abstract
Background The immune system, a complex machinery, depends on the highly coordinated expression of a wide array of genes and proteins. The evolutionary history of the human immune system is not well characterised. Although several studies related to the development and evolution of immunological processes have been published, a full-scale genome-based analysis is still missing. A database focused on the evolutionary relationships of immune-related genes would contribute to and facilitate research on immunology and evolutionary biology. Results An Internet resource called ImmTree was constructed for studying the evolution and evolutionary trees of the human immune system. ImmTree contains information about orthologs in 80 species collected from the HomoloGene, OrthoMCL and EGO databases. In addition to phylogenetic trees, the service provides data for the comparison of human-mouse ortholog pairs, including synonymous and non-synonymous mutation rates, Z values, and Ka/Ks quotients. A versatile search engine allows complex queries of the database. Currently, data are available for 847 human immune system related genes and proteins. Conclusion ImmTree provides a unique data set of genes and proteins from the human immune system, their phylogenetics, and information for comparisons of human-mouse ortholog pairs, synonymous and non-synonymous mutation rates, as well as other statistical information.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Markku Siermala
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Mauno Vihinen
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Research Unit, Tampere University Hospital, FI-33520 Tampere, Finland
45
Ortutay C, Siermala M, Vihinen M. Molecular characterization of the immune system: emergence of proteins, processes, and domains. Immunogenetics 2007; 59:333-48. [PMID: 17294181 DOI: 10.1007/s00251-007-0191-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 09/27/2006] [Accepted: 01/08/2007] [Indexed: 12/27/2022]
Abstract
Many genes and proteins are required to carry out the processes of innate and adaptive immunity. For many studies, including systems biology, it is necessary to have a clear and comprehensive definition of the immune system, including the genes and proteins that take part in immunological processes. We have identified and cataloged a large portion of the human immunology-related genes, which we call the essential immunome. The 847 identified genes and proteins were annotated, and their chromosomal localizations were compared to the mouse genome. Relation to disease was also taken into account. We identified numerous pseudogenes, many of which are expressed, and found two putative new genes. We also carried out an evolutionary analysis of immune processes based on gene orthologs to gain an overview of the evolutionary past and molecular present of the human immune system. A list of genes and proteins was compiled. A comprehensive characterization of the member genes and proteins, including the corresponding pseudogenes, is presented. Immunome genes were found to have three types of emergence in independent studies of their ontologies, domains, and functions.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, University of Tampere, 33014, Tampere, Finland
46
Sanderson MJ, McMahon MM. Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evol Biol 2007; 7 Suppl 1:S3. [PMID: 17288576 PMCID: PMC1796612 DOI: 10.1186/1471-2148-7-s1-s3] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants. RESULTS A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead. CONCLUSION Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.
Affiliation(s)
- Michael J Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Michelle M McMahon
- Department of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
47
Grow M, Neff AW, Mescher AL, King MW. Global analysis of gene expression in Xenopus hindlimbs during stage-dependent complete and incomplete regeneration. Dev Dyn 2007; 235:2667-85. [PMID: 16871633 DOI: 10.1002/dvdy.20897] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022]
Abstract
Xenopus laevis tadpoles are capable of limb regeneration after amputation, in a process that initially involves the formation of a blastema. However, Xenopus has full regenerative capacity only through premetamorphic stages. We have used the Affymetrix Xenopus laevis Genome Genechip microarray to perform a large-scale screen of gene expression in the regeneration-complete, stage 53 (st53), and regeneration-incomplete, stage 57 (st57), hindlimbs at 1 and 5 days postamputation. Through an exhaustive reannotation of the Genechip and a variety of comparative bioinformatic analyses, we have identified genes that are differentially expressed between the regeneration-complete and -incomplete stages, detected the transcriptional changes associated with the regenerating blastema, and compared these results with those of other regeneration researchers. We focus particular attention on striking transcriptional activity observed in genes associated with patterning, stress response, and inflammation. Overall, this work provides the most comprehensive view yet of a regenerating limb and of the different transcriptional compositions of regeneration-competent and regeneration-deficient tissues.
Affiliation(s)
- Matthew Grow
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
48
Chan AP, Rabinowicz PD, Quackenbush J, Buell CR, Town CD. Plant database resources at The Institute for Genomic Research. Methods Mol Biol 2007; 406:113-136. [PMID: 18287690 DOI: 10.1007/978-1-59745-535-0_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]
Abstract
With the completion of the genome sequences of the model plants Arabidopsis and rice, and the continuing sequencing efforts of other economically important crop plants, an unprecedented amount of genome sequence data is now available for large-scale genomics studies and analyses, such as the identification and discovery of novel genes, comparative genomics, and functional genomics. Efficient utilization of these large data sets is critically dependent on the ease of access and organization of the data. The plant databases at The Institute for Genomic Research (TIGR) have been set up to maintain various data types including genomic sequence, annotation and analyses, expressed transcript assemblies and analyses, and gene expression profiles from microarray studies. We present here an overview of the TIGR database resources for plant genomics and describe methods to access the data.
Affiliation(s)
- Agnes P Chan
- The Institute for Genomic Research, Rockville, MD, USA
49
Chen J, Blackwell TW, Fermin D, Menon R, Chen Y, Gao J, Lee AW, States DJ. Evolutionary-conserved gene expression response profiles across mammalian tissues. OMICS 2007; 11:96-115. [PMID: 17411398 DOI: 10.1089/omi.2006.0007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/22/2023]
Abstract
Gene expression responses are complex and frequently involve the actions of many genes to effect coordinated patterns. We hypothesized that these coordinated responses are evolutionarily conserved and compared human and mouse gene expression profiles to identify the most prominent conserved features across a set of normal mammalian tissues. Using data from multiple studies across multiple tissues in human and mouse, we applied principal component analysis to identify 13 gene expression modes in each species. Strikingly, 12 of the 13 modes obtained independently from the two species formed 1-to-1 human-mouse pairs. These paired modes define evolutionarily conserved gene expression response modes (CGEMs). Notably, this approach extracted biological responses that are not overwhelmed by laboratory-to-laboratory or species-to-species variation. These CGEMs explain 84% of the variation in our gene expression dataset. Functional annotation was performed using Gene Ontology, pathway, and transcription factor binding site overrepresentation. We conclude that this is an unbiased way of obtaining conserved gene response modes that account for a considerable portion of the gene expression variation in a given dataset, and that it validates the conservation of major gene expression response modes across mammals.
Affiliation(s)
- Ji Chen
- Bioinformatics Program, University of Michigan, Ann Arbor, Michigan 48109, USA
50
Mattes WB. Cross-species comparative toxicogenomics as an aid to safety assessment. Expert Opin Drug Metab Toxicol 2006; 2:859-74. [PMID: 17125406 DOI: 10.1517/17425255.2.6.859] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
Abstract
Cross-species comparative toxicogenomics has the potential to improve the understanding, at the molecular level, of how different animal models respond to toxicants. This understanding could then lead to a more accurate extrapolation of the risk these toxicants pose to humans. Cross-species comparative studies have been carried out at the genomic sequence level and with microarrays that examine changes in global mRNA profiles. However, these studies face considerable bioinformatic challenges in identifying which genes are truly orthologous across species. The resources for analysing such studies in the context of these orthologues need improvement. Finally, the experimental design of such studies must be carefully considered to make their results fully interpretable. This review discusses these issues along with the current state of the art in cross-species comparative toxicogenomics.