101
|
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
102
|
Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev 2015; 35:40-9. [PMID: 26451979 PMCID: PMC4686048 DOI: 10.1016/j.gde.2015.09.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 01/25/2023]
Abstract
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces which expands the repertoire of molecules they interact with and actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a ‘domain grammar of function’ whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modifies function.
Collapse
Affiliation(s)
- Sayoni Das
- Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
| | - Natalie L Dawson
- Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK.
| |
Collapse
|
103
|
Applications of comparative evolution to human disease genetics. Curr Opin Genet Dev 2015; 35:16-24. [PMID: 26338499 DOI: 10.1016/j.gde.2015.08.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2015] [Revised: 08/11/2015] [Accepted: 08/12/2015] [Indexed: 12/15/2022]
Abstract
Direct comparison of human diseases with model phenotypes allows exploration of key areas of human biology which are often inaccessible for practical or ethical reasons. We review recent developments in comparative evolutionary approaches for finding models for genetic disease, including high-throughput generation of gene/phenotype relationship data, the linking of orthologous genes and phenotypes across species, and statistical methods for linking human diseases to model phenotypes.
Collapse
|
104
|
|
105
|
Genome-wide identification of MAPK, MAPKK, and MAPKKK gene families and transcriptional profiling analysis during development and stress response in cucumber. BMC Genomics 2015; 16:386. [PMID: 25976104 PMCID: PMC4432876 DOI: 10.1186/s12864-015-1621-2] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2014] [Accepted: 05/05/2015] [Indexed: 12/02/2022] Open
Abstract
Background The mitogen-activated protein kinase (MAPK) cascade consists of three types of reversibly phosphorylated kinases, namely, MAPK, MAPK kinase (MAPKK/MEK), and MAPK kinase kinase (MAPKKK/MEKK), playing important roles in plant growth, development, and defense response. The MAPK cascade genes have been investigated in detail in model plants, including Arabidopsis, rice, and tomato, but poorly characterized in cucumber (Cucumis sativus L.), a major popular vegetable in Cucurbitaceae crops, which is highly susceptible to environmental stress and pathogen attack. Results A genome-wide analysis revealed the presence of at least 14 MAPKs, 6 MAPKKs, and 59 MAPKKKs in the cucumber genome. Phylogenetic analyses classified all the CsMAPK and CsMAPKK genes into four groups, whereas the CsMAPKKK genes were grouped into the MEKK, RAF, and ZIK subfamilies. The expansion of these three gene families was mainly contributed by segmental duplication events. Furthermore, the ratios of non-synonymous substitution rates (Ka) and synonymous substitution rates (Ks) implied that the duplicated gene pairs had experienced strong purifying selection. Real-time PCR analysis demonstrated that some MAPK, MAPKK and MAPKKK genes are preferentially expressed in specific organs or tissues. Moreover, the expression levels of most of these genes significantly changed under heat, cold, drought, and Pseudoperonospora cubensis treatments. Exposure to abscisic acid and jasmonic acid markedly affected the expression levels of these genes, thereby implying that they may play important roles in the plant hormone network. Conclusion A comprehensive genome-wide analysis of gene structure, chromosomal distribution, and evolutionary relationship of MAPK cascade genes in cucumber are present here. Further expression analysis revealed that these genes were involved in important signaling pathways for biotic and abiotic stress responses in cucumber, as well as the response to plant hormones. Our first systematic description of the MAPK, MAPKK, and MAPKKK families in cucumber will help to elucidate their biological roles in plant. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1621-2) contains supplementary material, which is available to authorized users.
Collapse
|
106
|
Gfeller D, Zoete V. Protein homology reveals new targets for bioactive small molecules. Bioinformatics 2015; 31:2721-7. [PMID: 25900917 DOI: 10.1093/bioinformatics/btv214] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2015] [Accepted: 04/14/2015] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION The functional impact of small molecules is increasingly being assessed in different eukaryotic species through large-scale phenotypic screening initiatives. Identifying the targets of these molecules is crucial to mechanistically understand their function and uncover new therapeutically relevant modes of action. However, despite extensive work carried out in model organisms and human, it is still unclear to what extent one can use information obtained in one species to make predictions in other species. RESULTS Here, for the first time, we explore and validate at a large scale the use of protein homology relationships to predict the targets of small molecules across different species. Our results show that exploiting target homology can significantly improve the predictions, especially for molecules experimentally tested in other species. Interestingly, when considering separately orthology and paralogy relationships, we observe that mapping small molecule interactions among orthologs improves prediction accuracy, while including paralogs does not improve and even sometimes worsens the prediction accuracy. Overall, our results provide a novel approach to integrate chemical screening results across multiple species and highlight the promises and remaining challenges of using protein homology for small molecule target identification. AVAILABILITY AND IMPLEMENTATION Homology-based predictions can be tested on our website http://www.swisstargetprediction.ch. CONTACT david.gfeller@unil.ch or vincent.zoete@isb-sib.ch. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- David Gfeller
- Department of Fundamental Oncology, Ludwig Center for Cancer Research, University of Lausanne, 1066 Epalinges, Switzerland and Swiss Institute of Bioinformatics (SIB), Quartier Sorge, Bâtiment Génopode, 1015 Lausanne, Switzerland
| | - Vincent Zoete
- Swiss Institute of Bioinformatics (SIB), Quartier Sorge, Bâtiment Génopode, 1015 Lausanne, Switzerland
| |
Collapse
|
107
|
Crutcher FK, Moran-Diez ME, Ding S, Liu J, Horwitz BA, Mukherjee PK, Kenerley CM. A paralog of the proteinaceous elicitor SM1 is involved in colonization of maize roots by Trichoderma virens. Fungal Biol 2015; 119:476-86. [PMID: 25986544 DOI: 10.1016/j.funbio.2015.01.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2014] [Revised: 01/17/2015] [Accepted: 01/19/2015] [Indexed: 11/17/2022]
Abstract
The biocontrol agent, Trichoderma virens, has the ability to protect plants from pathogens by eliciting plant defense responses, involvement in mycoparasitism, or secreting antagonistic secondary metabolites. SM1, an elicitor of induced systemic resistance (ISR), was found to have three paralogs within the T. virens genome. The paralog sm2 is highly expressed in the presence of plant roots. Gene deletion mutants of sm2 were generated and the mutants were found to overproduce SM1. The ability to elicit ISR in maize against Colletotrichum graminicola was not compromised for the mutants compared to that of wild type isolate. However, the deletion strains had a significantly lowered ability to colonize maize roots. This appears to be the first report on the involvement of an effector-like protein in colonization of roots by Trichoderma.
Collapse
Affiliation(s)
- Frankie K Crutcher
- Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX 77843, USA; Southern Plains Agricultural Research Center, USDA, Agricultural Research Service, 2765 F and B Road, College Station, TX 77845, USA
| | - Maria E Moran-Diez
- Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX 77843, USA; Bioprotection Research Centre, Lincoln University, PO Box 84, Lincoln 7647 Canterbury, New Zealand
| | - Shengli Ding
- Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX 77843, USA
| | - Jinggao Liu
- Southern Plains Agricultural Research Center, USDA, Agricultural Research Service, 2765 F and B Road, College Station, TX 77845, USA
| | - Benjamin A Horwitz
- Department of Biology, Technion-Israel Institute of Technology, 32000 Haifa, Israel
| | - Prasun K Mukherjee
- Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX 77843, USA; Nuclear Agriculture and Biotechnology Division, Bhabha Atomic Research Center, Trombay, Mumbai 400085, India
| | - Charles M Kenerley
- Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX 77843, USA.
| |
Collapse
|
108
|
Rogozin IB, Managadze D, Shabalina SA, Koonin EV. Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture. Genome Biol Evol 2015; 6:754-62. [PMID: 24610837 PMCID: PMC4007545 DOI: 10.1093/gbe/evu051] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
The ortholog conjecture (OC), which is central to functional annotation of genomes, posits that orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of Gene Ontology (GO) annotations and expression profiles, among within-species paralogs compared with orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. However, several subsequent studies suggest that GO annotations and microarray data could artificially inflate functional similarity between paralogs from the same organism. We sought to test the OC using approaches distinct from those used in previous studies. Analysis of a large RNAseq data set from multiple human and mouse tissues shows that expression similarity (correlations coefficients, rank's, or Z-scores) between orthologs is substantially greater than that for between-species paralogs with the same sequence divergence, in agreement with the OC and the results of recent detailed analyses. These findings are further corroborated by a fine-grain analysis in which expression profiles of orthologs and paralogs were compared separately for individual gene families. Expression profiles of within-species paralogs are more strongly correlated than profiles of orthologs but it is shown that this is caused by high background noise, that is, correlation between profiles of unrelated genes in the same organism. Z-scores and rank scores show a nonmonotonic dependence of expression profile similarity on sequence divergence. This complexity of gene expression evolution after duplication might be at least partially caused by selection for protein dosage rebalancing following gene duplication.
Collapse
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| | | | | | | |
Collapse
|
109
|
Mohanta TK, Arora PK, Mohanta N, Parida P, Bae H. Identification of new members of the MAPK gene family in plants shows diverse conserved domains and novel activation loop variants. BMC Genomics 2015; 16:58. [PMID: 25888265 PMCID: PMC4363184 DOI: 10.1186/s12864-015-1244-7] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2014] [Accepted: 01/15/2015] [Indexed: 11/30/2022] Open
Abstract
Background Mitogen Activated Protein Kinase (MAPK) signaling is of critical importance in plants and other eukaryotic organisms. The MAPK cascade plays an indispensible role in the growth and development of plants, as well as in biotic and abiotic stress responses. The MAPKs are constitute the most downstream module of the three tier MAPK cascade and are phosphorylated by upstream MAP kinase kinases (MAPKK), which are in turn are phosphorylated by MAP kinase kinase kinase (MAPKKK). The MAPKs play pivotal roles in regulation of many cytoplasmic and nuclear substrates, thus regulating several biological processes. Results A total of 589 MAPKs genes were identified from the genome wide analysis of 40 species. The sequence analysis has revealed the presence of several N- and C-terminal conserved domains. The MAPKs were previously believed to be characterized by the presence of TEY/TDY activation loop motifs. The present study showed that, in addition to presence of activation loop TEY/TDY motifs, MAPKs are also contain MEY, TEM, TQM, TRM, TVY, TSY, TEC and TQY activation loop motifs. Phylogenetic analysis of all predicted MAPKs were clustered into six different groups (group A, B, C, D, E and F), and all predicted MAPKs were assigned with specific names based on their orthology based evolutionary relationships with Arabidopsis or Oryza MAPKs. Conclusion We conducted global analysis of the MAPK gene family of plants from lower eukaryotes to higher eukaryotes and analyzed their genomic and evolutionary aspects. Our study showed the presence of several new activation loop motifs and diverse conserved domains in MAPKs. Advance study of newly identified activation loop motifs can provide further information regarding the downstream signaling cascade activated in response to a wide array of stress conditions, as well as plant growth and development. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1244-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Tapan Kumar Mohanta
- School of Biotechnology, Yeungnam University, Daehak Gyeongsan, Gyeonsangbook, 712749, Republic of Korea.
| | - Pankaj Kumar Arora
- School of Biotechnology, Yeungnam University, Daehak Gyeongsan, Gyeonsangbook, 712749, Republic of Korea.
| | - Nibedita Mohanta
- Department of Biotechnology, North Orissa University, Sri Ramchandra Vihar, Takatpur, Baripada, Mayurbhanj, Orissa, 757003, India.
| | - Pratap Parida
- Center for Studies in Biotechnology, Dibrugarh University, Dibrugarh, Assam, 786004, India.
| | - Hanhong Bae
- School of Biotechnology, Yeungnam University, Daehak Gyeongsan, Gyeonsangbook, 712749, Republic of Korea.
| |
Collapse
|
110
|
Bolser DM, Kerhornou A, Walts B, Kersey P. Triticeae resources in Ensembl Plants. PLANT & CELL PHYSIOLOGY 2015; 56:e3. [PMID: 25432969 PMCID: PMC4301745 DOI: 10.1093/pcp/pcu183] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/22/2014] [Accepted: 11/12/2014] [Indexed: 05/21/2023]
Abstract
Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison with other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants (http://plants.ensembl.org) is an integrative resource organizing, analyzing and visualizing genome-scale information for important crop and model plants. Available data include reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualization and analysis. Driven by the case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment.
Collapse
Affiliation(s)
- Dan M Bolser
- Ensembl Genomes, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
| | - Arnaud Kerhornou
- Ensembl Genomes, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
| | - Brandon Walts
- Ensembl Genomes, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
| | - Paul Kersey
- Ensembl Genomes, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
| |
Collapse
|
111
|
Snyder-Talkington BN, Dong C, Zhao X, Dymacek J, Porter DW, Wolfarth MG, Castranova V, Qian Y, Guo NL. Multi-walled carbon nanotube-induced gene expression in vitro: concordance with in vivo studies. Toxicology 2014; 328:66-74. [PMID: 25511174 DOI: 10.1016/j.tox.2014.12.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2014] [Revised: 12/08/2014] [Accepted: 12/11/2014] [Indexed: 11/26/2022]
Abstract
There is a current interest in reducing the in vivo toxicity testing of nanomaterials in animals by increasing toxicity testing using in vitro cellular assays; however, toxicological results are seldom concordant between in vivo and in vitro models. This study compared global multi-walled carbon nanotube (MWCNT)-induced gene expression from human lung epithelial and microvascular endothelial cells in monoculture and coculture with gene expression from mouse lungs exposed to MWCNT. Using a cutoff of 10% false discovery rate and 1.5 fold change, we determined that there were more concordant genes (gene expression both up- or downregulated in vivo and in vitro) expressed in both cell types in coculture than in monoculture. When reduced to only those genes involved in inflammation and fibrosis, known outcomes of in vivo MWCNT exposure, there were more disease-related concordant genes expressed in coculture than monoculture. Additionally, different cellular signaling pathways are activated in response to MWCNT dependent upon culturing conditions. As coculture gene expression better correlated with in vivo gene expression, we suggest that cellular cocultures may offer enhanced in vitro models for nanoparticle risk assessment and the reduction of in vivo toxicological testing.
Collapse
Affiliation(s)
- Brandi N Snyder-Talkington
- Pathology and Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505, USA
| | - Chunlin Dong
- Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV 26506-9300, USA
| | - Xiangyi Zhao
- Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV 26506-9300, USA
| | - Julian Dymacek
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506-6070, USA
| | - Dale W Porter
- Pathology and Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505, USA
| | - Michael G Wolfarth
- Pathology and Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505, USA
| | - Vincent Castranova
- Pathology and Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505, USA; Department of Basic Pharmaceutical Sciences, School of Pharmacy, West Virginia University, Morgantown, WV 26506, USA
| | - Yong Qian
- Pathology and Physiology Research Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505, USA.
| | - Nancy L Guo
- Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV 26506-9300, USA.
| |
Collapse
|
112
|
Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C, Bork P. A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation. PLoS One 2014; 9:e111122. [PMID: 25369365 PMCID: PMC4219706 DOI: 10.1371/journal.pone.0111122] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2014] [Accepted: 09/23/2014] [Indexed: 11/19/2022] Open
Abstract
Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a “core” species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.
Collapse
Affiliation(s)
- Kalliopi Trachana
- Institute for Systems Biology, Seattle, WA, United States of America
| | - Kristoffer Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Tomas Larsson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Sean Powell
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Tobias Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany
- * E-mail:
| |
Collapse
|
113
|
Sonnhammer ELL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C. Big data and other challenges in the quest for orthologs. Bioinformatics 2014; 30:2993-8. [PMID: 25064571 PMCID: PMC4201156 DOI: 10.1093/bioinformatics/btu492] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2014] [Revised: 06/25/2014] [Accepted: 07/16/2014] [Indexed: 01/29/2023] Open
Abstract
UNLABELLED Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION All such materials are available at http://questfororthologs.org.
Collapse
Affiliation(s)
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London
| | - Toni Gabaldón
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London
| | - Alan W Sousa da Silva
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
| | - Maria Martin
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
| | - Marc Robinson-Rechavi
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London
| | - Brigitte Boeckmann
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
| | - Paul D Thomas
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
| | - Christophe Dessimoz
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London
| |
Collapse
|
114
|
Pereira C, Denise A, Lespinet O. A meta-approach for improving the prediction and the functional annotation of ortholog groups. BMC Genomics 2014; 15 Suppl 6:S16. [PMID: 25573073 PMCID: PMC4240552 DOI: 10.1186/1471-2164-15-s6-s16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions. METHODS We developed a method based on a meta-approach that is able to combine the results of several methods for orthologous group prediction. The purpose of this method is to produce better quality results by using the overlapping results obtained from several individual orthologous gene prediction procedures. Our method proceeds in two steps. The first aims to construct seeds for groups of orthologous genes; these seeds correspond to the exact overlaps between the results of all or several methods. In the second step, these seed groups are expanded by using HMM profiles. RESULTS We evaluated our method on two standard reference benchmarks, OrthoBench and Orthology Benchmark Service. Our method presents a higher level of accurately predicted groups than the individual input methods of orthologous group prediction. Moreover, our method increases the number of annotated orthologous pairs without decreasing the annotation quality compared to twelve state-of-the-art methods. CONCLUSIONS The meta-approach based method appears to be a reliable procedure for predicting orthologous groups. Since a large number of methods for predicting groups of orthologous genes exist, it is quite conceivable to apply this meta-approach to several combinations of different methods.
Collapse
|
115
|
Wittwer LD, Piližota I, Altenhoff AM, Dessimoz C. Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology. PeerJ 2014; 2:e607. [PMID: 25320677 PMCID: PMC4193403 DOI: 10.7717/peerj.607] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2014] [Accepted: 09/12/2014] [Indexed: 11/20/2022] Open
Abstract
Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". As this process scales quadratically in terms of the number of sequences analysed, this step can become a bottleneck, thus limiting the number of genomes that can be simultaneously analysed. Here, we explored ways of speeding-up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept resulted in a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequences sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but only recover 3-14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
Collapse
Affiliation(s)
- Lucas D Wittwer
- University College London, London, United Kingdom.,Swiss Institute of Bioinformatics, Zurich, Switzerland.,ETH Zurich, Department of Computer Science, Zurich, Switzerland
| | | | - Adrian M Altenhoff
- University College London, London, United Kingdom.,Swiss Institute of Bioinformatics, Zurich, Switzerland.,ETH Zurich, Department of Computer Science, Zurich, Switzerland
| | - Christophe Dessimoz
- University College London, London, United Kingdom.,Swiss Institute of Bioinformatics, Zurich, Switzerland
| |
Collapse
|
116
|
Complexity of gene expression evolution after duplication: protein dosage rebalancing. GENETICS RESEARCH INTERNATIONAL 2014; 2014:516508. [PMID: 25197576 PMCID: PMC4150538 DOI: 10.1155/2014/516508] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/26/2014] [Accepted: 08/03/2014] [Indexed: 11/17/2022]
Abstract
Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes.
Collapse
|
117
|
Vanneste K, Baele G, Maere S, Van de Peer Y. Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Res 2014; 24:1334-47. [PMID: 24835588 PMCID: PMC4120086 DOI: 10.1101/gr.168997.113] [Citation(s) in RCA: 277] [Impact Index Per Article: 27.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2013] [Accepted: 05/16/2014] [Indexed: 02/02/2023]
Abstract
Ancient whole-genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road toward evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous-Paleogene (K-Pg) extinction event about 66 million years ago. Here we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We include 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly nonrandom pattern of genome duplications over time with many WGDs clustering around the K-Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids.
Collapse
Affiliation(s)
- Kevin Vanneste
- Department of Plant Systems Biology, VIB, Ghent B-9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent B-9052, Belgium
| | - Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven B-3000, Belgium
| | - Steven Maere
- Department of Plant Systems Biology, VIB, Ghent B-9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent B-9052, Belgium
| | - Yves Van de Peer
- Department of Plant Systems Biology, VIB, Ghent B-9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent B-9052, Belgium; Department of Genetics, Genomics Research Institute, University of Pretoria, Pretoria 0002, South Africa
| |
Collapse
|
118
|
Ward N, Moreno-Hagelsieb G. Quickly finding orthologs as reciprocal best hits with BLAT, LAST, and UBLAST: how much do we miss? PLoS One 2014; 9:e101850. [PMID: 25013894 PMCID: PMC4094424 DOI: 10.1371/journal.pone.0101850] [Citation(s) in RCA: 107] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2014] [Accepted: 06/11/2014] [Indexed: 11/30/2022] Open
Abstract
Reciprocal Best Hits (RBH) are a common proxy for orthology in comparative genomics. Essentially, a RBH is found when the proteins encoded by two genes, each in a different genome, find each other as the best scoring match in the other genome. NCBI's BLAST is the software most usually used for the sequence comparisons necessary to finding RBHs. Since sequence comparison can be time consuming, we decided to compare the number and quality of RBHs detected using algorithms that run in a fraction of the time as BLAST. We tested BLAT, LAST and UBLAST. All three programs ran in a hundredth to a 25th of the time required to run BLAST. A reduction in the number of homologs and RBHs found by the faster algorithms compared to BLAST becomes apparent as the genomes compared become more dissimilar, with BLAT, a program optimized for quickly finding very similar sequences, missing both the most homologs and the most RBHs. Though LAST produced the closest number of homologs and RBH to those produced with BLAST, UBLAST was very close, with either program producing between 0.6 and 0.8 of the RBHs as BLAST between dissimilar genomes, while in more similar genomes the differences were barely apparent. UBLAST ran faster than LAST, making it the best option among the programs tested.
Collapse
Affiliation(s)
- Natalie Ward
- Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
| | | |
Collapse
|
119
|
Szövényi P, Devos N, Weston DJ, Yang X, Hock Z, Shaw JA, Shimizu KK, McDaniel SF, Wagner A. Efficient purging of deleterious mutations in plants with haploid selfing. Genome Biol Evol 2014; 6:1238-52. [PMID: 24879432 PMCID: PMC4041004 DOI: 10.1093/gbe/evu099] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
In diploid organisms, selfing reduces the efficiency of selection in removing deleterious mutations from a population. This need not be the case for all organisms. Some plants, for example, undergo an extreme form of selfing known as intragametophytic selfing, which immediately exposes all recessive deleterious mutations in a parental genome to selective purging. Here, we ask how effectively deleterious mutations are removed from such plants. Specifically, we study the extent to which deleterious mutations accumulate in a predominantly selfing and a predominantly outcrossing pair of moss species, using genome-wide transcriptome data. We find that the selfing species purge significantly more nonsynonymous mutations, as well as a greater proportion of radical amino acid changes which alter physicochemical properties of amino acids. Moreover, their purging of deleterious mutation is especially strong in conserved regions of protein-coding genes. Our observations show that selfing need not impede but can even accelerate the removal of deleterious mutations, and do so on a genome-wide scale.
Collapse
Affiliation(s)
- Péter Szövényi
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, SwitzerlandInstitute of Systematic Botany, University of Zurich, SwitzerlandSwiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, SwitzerlandMTA-ELTE-MTM Ecology Research Group, ELTE, Biological Institute, Hungary
| | | | - David J Weston
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN
| | - Xiaohan Yang
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN
| | - Zsófia Hock
- Institute of Systematic Botany, University of Zurich, Switzerland
| | | | - Kentaro K Shimizu
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Switzerland
| | | | - Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, SwitzerlandSwiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, SwitzerlandBioinformatics Institute, Agency for Science, Technology and Research (A*STAR), SingaporeThe Santa Fe Institute, Santa Fe NM
| |
Collapse
|
120
|
Abstract
The use of model organisms as tools for the investigation of human genetic variation has significantly and rapidly advanced our understanding of the aetiologies underlying hereditary traits. However, while equivalences in the DNA sequence of two species may be readily inferred through evolutionary models, the identification of equivalence in the phenotypic consequences resulting from comparable genetic variation is far from straightforward, limiting the value of the modelling paradigm. In this review, we provide an overview of the emerging statistical and computational approaches to objectively identify phenotypic equivalence between human and model organisms with examples from the vertebrate models, mouse and zebrafish. Firstly, we discuss enrichment approaches, which deem the most frequent phenotype among the orthologues of a set of genes associated with a common human phenotype as the orthologous phenotype, or phenolog, in the model species. Secondly, we introduce and discuss computational reasoning approaches to identify phenotypic equivalences made possible through the development of intra- and interspecies ontologies. Finally, we consider the particular challenges involved in modelling neuropsychiatric disorders, which illustrate many of the remaining difficulties in developing comprehensive and unequivocal interspecies phenotype mappings.
Collapse
Affiliation(s)
- Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- * E-mail: (PNR); (CW)
| | - Caleb Webber
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- * E-mail: (PNR); (CW)
| |
Collapse
|
121
|
Kim KM, Park JH, Bhattacharya D, Yoon HS. Applications of next-generation sequencing to unravelling the evolutionary history of algae. Int J Syst Evol Microbiol 2014; 64:333-345. [PMID: 24505071 DOI: 10.1099/ijs.0.054221-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open
Abstract
First-generation Sanger DNA sequencing revolutionized science over the past three decades and the current next-generation sequencing (NGS) technology has opened the doors to the next phase in the sequencing revolution. Using NGS, scientists are able to sequence entire genomes and to generate extensive transcriptome data from diverse photosynthetic eukaryotes in a timely and cost-effective manner. Genome data in particular shed light on the complicated evolutionary history of algae that form the basis of the food chain in many environments. In the Eukaryotic Tree of Life, the fact that photosynthetic lineages are positioned in four supergroups has important evolutionary consequences. We now know that the story of eukaryotic photosynthesis unfolds with a primary endosymbiosis between an ancestral heterotrophic protist and a captured cyanobacterium that gave rise to the glaucophytes, red algae and Viridiplantae (green algae and land plants). These primary plastids were then transferred to other eukaryotic groups through secondary endosymbiosis. A red alga was captured by the ancestor(s) of the stramenopiles, alveolates (dinoflagellates, apicomplexa, chromeridae), cryptophytes and haptophytes, whereas green algae were captured independently by the common ancestors of the euglenophytes and chlorarachniophytes. A separate case of primary endosymbiosis is found in the filose amoeba Paulinella chromatophora, which has at least nine heterotrophic sister species. Paulinella genome data provide detailed insights into the early stages of plastid establishment. Therefore, genome data produced by NGS have provided many novel insights into the taxonomy, phylogeny and evolutionary history of photosynthetic eukaryotes.
Collapse
Affiliation(s)
- Kyeong Mi Kim
- Department of Biological Sciences, Sungkyunkwan University, Suwon 440-746, Republic of Korea
| | - Jun-Hyung Park
- Codes Division, Insilicogen Inc., Suwon, 440-746, Republic of Korea
| | - Debashish Bhattacharya
- Department of Ecology, Evolution and Natural Resources and Institute of Marine and Coastal Science, Rutgers University, NJ 08901, USA
| | - Hwan Su Yoon
- Department of Biological Sciences, Sungkyunkwan University, Suwon 440-746, Republic of Korea
| |
Collapse
|
122
|
Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, Warren WC, Mello CV. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol 2014; 15:565. [PMID: 25518852 PMCID: PMC4290089 DOI: 10.1186/s13059-014-0565-1] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2014] [Accepted: 12/08/2014] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. RESULTS Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. CONCLUSIONS Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
Collapse
Affiliation(s)
- Peter V Lovell
- />Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
| | - Morgan Wirthlin
- />Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
| | - Larry Wilhelm
- />Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
- />Oregon National Primate Research Center, West Campus, Oregon Health and Science University, Portland, OR USA
| | - Patrick Minx
- />The Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Nathan H Lazar
- />Oregon National Primate Research Center, West Campus, Oregon Health and Science University, Portland, OR USA
- />Bioinformatics and Computational Biology Division, Oregon Health & Science University, Portland, OR USA
| | - Lucia Carbone
- />Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
- />Oregon National Primate Research Center, West Campus, Oregon Health and Science University, Portland, OR USA
| | - Wesley C Warren
- />The Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Claudio V Mello
- />Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
| |
Collapse
|
123
|
Haggerty LS, Jachiet PA, Hanage WP, Fitzpatrick DA, Lopez P, O'Connell MJ, Pisani D, Wilkinson M, Bapteste E, McInerney JO. A pluralistic account of homology: adapting the models to the data. Mol Biol Evol 2013; 31:501-16. [PMID: 24273322 PMCID: PMC3935183 DOI: 10.1093/molbev/mst228] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Defining homologous genes is important in many evolutionary studies but raises obvious issues. Some of these issues are conceptual and stem from our assumptions of how a gene evolves, others are practical, and depend on the algorithmic decisions implemented in existing software. Therefore, to make progress in the study of homology, both ontological and epistemological questions must be considered. In particular, defining homologous genes cannot be solely addressed under the classic assumptions of strong tree thinking, according to which genes evolve in a strictly tree-like fashion of vertical descent and divergence and the problems of homology detection are primarily methodological. Gene homology could also be considered under a different perspective where genes evolve as “public goods,” subjected to various introgressive processes. In this latter case, defining homologous genes becomes a matter of designing models suited to the actual complexity of the data and how such complexity arises, rather than trying to fit genetic data to some a priori tree-like evolutionary model, a practice that inevitably results in the loss of much information. Here we show how important aspects of the problems raised by homology detection methods can be overcome when even more fundamental roots of these problems are addressed by analyzing public goods thinking evolutionary processes through which genes have frequently originated. This kind of thinking acknowledges distinct types of homologs, characterized by distinct patterns, in phylogenetic and nonphylogenetic unrooted or multirooted networks. In addition, we define “family resemblances” to include genes that are related through intermediate relatives, thereby placing notions of homology in the broader context of evolutionary relationships. We conclude by presenting some payoffs of adopting such a pluralistic account of homology and family relationship, which expands the scope of evolutionary analyses beyond the traditional, yet relatively narrow focus allowed by a strong tree-thinking view on gene evolution.
Collapse
Affiliation(s)
- Leanne S Haggerty
- Bioinformatics and Molecular Evolution Unit, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
| | | | | | | | | | | | | | | | | | | |
Collapse
|
124
|
Ebersberger I, Simm S, Leisegang MS, Schmitzberger P, Mirus O, von Haeseler A, Bohnsack MT, Schleiff E. The evolution of the ribosome biogenesis pathway from a yeast perspective. Nucleic Acids Res 2013; 42:1509-23. [PMID: 24234440 PMCID: PMC3919561 DOI: 10.1093/nar/gkt1137] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Ribosome biogenesis is fundamental for cellular life, but surprisingly little is known about the underlying pathway. In eukaryotes a comprehensive collection of experimentally verified ribosome biogenesis factors (RBFs) exists only for Saccharomyces cerevisiae. Far less is known for other fungi, animals or plants, and insights are even more limited for archaea. Starting from 255 yeast RBFs, we integrated ortholog searches, domain architecture comparisons and, in part, manual curation to investigate the inventories of RBF candidates in 261 eukaryotes, 26 archaea and 57 bacteria. The resulting phylogenetic profiles reveal the evolutionary ancestry of the yeast pathway. The oldest core comprising 20 RBF lineages dates back to the last universal common ancestor, while the youngest 20 factors are confined to the Saccharomycotina. On this basis, we outline similarities and differences of ribosome biogenesis across contemporary species. Archaea, so far a rather uncharted domain, possess 38 well-supported RBF candidates of which some are known to form functional sub-complexes in yeast. This provides initial evidence that ribosome biogenesis in eukaryotes and archaea follows similar principles. Within eukaryotes, RBF repertoires vary considerably. A comparison of yeast and human reveals that lineage-specific adaptation via RBF exclusion and addition characterizes the evolution of this ancient pathway.
Collapse
Affiliation(s)
- Ingo Ebersberger
- Institute for Cell Biology and Neuroscience, Goethe University, Frankfurt 60438, Germany, Center for Integrative Bioinformatics, Max F Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna 1030, Austria, Institute for Molecular Biosciences, Goethe University, Frankfurt 60438, Germany, Faculty of Computer Science, University of Vienna, Vienna 1030, Austria, Cluster of Excellence Macromolecular Complexes, Goethe University, Frankfurt 60438, Germany, Department of Biochemistry I, Universitätsmedizin Göttingen, Göttingen 37073, Germany and Center of Membrane Proteomics, Goethe University, Frankfurt 60438, Germany
| | | | | | | | | | | | | | | |
Collapse
|
125
|
Wang P, Lai WF, Li MJ, Xu F, Yalamanchili HK, Lovell-Badge R, Wang J. Inference of gene-phenotype associations via protein-protein interaction and orthology. PLoS One 2013; 8:e77478. [PMID: 24194887 PMCID: PMC3806783 DOI: 10.1371/journal.pone.0077478] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2013] [Accepted: 08/30/2013] [Indexed: 01/23/2023] Open
Abstract
One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.
Collapse
Affiliation(s)
- Panwen Wang
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
| | - Wing-Fu Lai
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Mulin Jun Li
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
| | - Feng Xu
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
| | - Hari Krishna Yalamanchili
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
| | - Robin Lovell-Badge
- Division of Developmental Genetics, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom
| | - Junwen Wang
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
- Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- * E-mail:
| |
Collapse
|
126
|
Prasad A, Kumar SS, Dessimoz C, Bleuler S, Laule O, Hruz T, Gruissem W, Zimmermann P. Global regulatory architecture of human, mouse and rat tissue transcriptomes. BMC Genomics 2013; 14:716. [PMID: 24138449 PMCID: PMC4008137 DOI: 10.1186/1471-2164-14-716] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2013] [Accepted: 09/25/2013] [Indexed: 01/30/2023] Open
Abstract
Background Predicting molecular responses in human by extrapolating results from model organisms requires a precise understanding of the architecture and regulation of biological mechanisms across species. Results Here, we present a large-scale comparative analysis of organ and tissue transcriptomes involving the three mammalian species human, mouse and rat. To this end, we created a unique, highly standardized compendium of tissue expression. Representative tissue specific datasets were aggregated from more than 33,900 Affymetrix expression microarrays. For each organism, we created two expression datasets covering over 55 distinct tissue types with curated data from two independent microarray platforms. Principal component analysis (PCA) revealed that the tissue-specific architecture of transcriptomes is highly conserved between human, mouse and rat. Moreover, tissues with related biological function clustered tightly together, even if the underlying data originated from different labs and experimental settings. Overall, the expression variance caused by tissue type was approximately 10 times higher than the variance caused by perturbations or diseases, except for a subset of cancers and chemicals. Pairs of gene orthologs exhibited higher expression correlation between mouse and rat than with human. Finally, we show evidence that tissue expression profiles, if combined with sequence similarity, can improve the correct assignment of functionally related homologs across species. Conclusion The results demonstrate that tissue-specific regulation is the main determinant of transcriptome composition and is highly conserved across mammalian species.
Collapse
|
127
|
Repo H, Kuokkanen E, Oksanen E, Goldman A, Heikinheimo P. Is the bovine lysosomal phospholipase B-like protein an amidase? Proteins 2013; 82:300-11. [PMID: 23934913 DOI: 10.1002/prot.24388] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2013] [Revised: 07/17/2013] [Accepted: 07/26/2013] [Indexed: 12/17/2022]
Abstract
The main function of lysosomal proteins is to degrade cellular macromolecules. We purified a novel lysosomal protein to homogeneity from bovine kidneys. By gene annotation, this protein is defined as a bovine phospholipase B-like protein 1 (bPLBD1) and, to better understand its biological function, we solved its structure at 1.9 Å resolution. We showed that bPLBD1 has uniform noncomplex-type N-glycosylation and that it localized to the lysosome. The first step in lysosomal protein transport, the initiation of mannose-6-phosphorylation by a N-acetylglucosamine-1-phosphotransferase, requires recognition of at least two distinct lysines on the protein surface. We identified candidate lysines by analyzing the structural and sequentially conserved N-glycosylation sites and lysines in bPLBD1 and in the homologous mouse PLBD2. Our model suggests that N408 is the primarily phosphorylated glycan, and K358 a key residue for N-acetylglucosamine-1-phosphotransferase recognition. Two other lysines, K334 and K342, provide the required second site for N-acetylglucosamine-1-phosphotransferase recognition. bPLBD1 is an N-terminal nucleophile (Ntn) hydrolase. By comparison with other Ntn-hydrolases, we conclude that the acyl moiety of PLBD1 substrate must be small to fit the putative binding pocket, whereas the space for the rest of the substrate is a large open cleft. Finally, as all the known substrates of Ntn-hydrolases have amide bonds, we suggest that bPLBD1 may be an amidase or peptidase instead of lipase, explaining the difficulty in finding a good substrate for any members of the PLBD family.
Collapse
Affiliation(s)
- Heidi Repo
- Institute of Biotechnology, University of Helsinki, FI-00014, Helsinki, Finland; Department of Biosciences, University of Helsinki, FI-00014, Helsinki, Finland
| | | | | | | | | |
Collapse
|
128
|
GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis. Appl Environ Microbiol 2013; 79:7696-701. [PMID: 24096415 DOI: 10.1128/aem.02411-13] [Citation(s) in RCA: 585] [Impact Index Per Article: 53.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023] Open
Abstract
GET_HOMOLOGUES is an open-source software package that builds on popular orthology-calling approaches making highly customizable and detailed pangenome analyses of microorganisms accessible to nonbioinformaticians. It can cluster homologous gene families using the bidirectional best-hit, COGtriangles, or OrthoMCL clustering algorithms. Clustering stringency can be adjusted by scanning the domain composition of proteins using the HMMER3 package, by imposing desired pairwise alignment coverage cutoffs, or by selecting only syntenic genes. The resulting homologous gene families can be made even more robust by computing consensus clusters from those generated by any combination of the clustering algorithms and filtering criteria. Auxiliary scripts make the construction, interrogation, and graphical display of core genome and pangenome sets easy to perform. Exponential and binomial mixture models can be fitted to the data to estimate theoretical core genome and pangenome sizes, and high-quality graphics can be generated. Furthermore, pangenome trees can be easily computed and basic comparative genomics performed to identify lineage-specific genes or gene family expansions. The software is designed to take advantage of modern multiprocessor personal computers as well as computer clusters to parallelize time-consuming tasks. To demonstrate some of these capabilities, we survey a set of 50 Streptococcus genomes annotated in the Orthologous Matrix (OMA) browser as a benchmark case. The package can be downloaded at http://www.eead.csic.es/compbio/soft/gethoms.php and http://maya.ccg.unam.mx/soft/gethoms.php.
Collapse
|
129
|
Verma D, Jacobs DJ, Livesay DR. Variations within class-A β-lactamase physiochemical properties reflect evolutionary and environmental patterns, but not antibiotic specificity. PLoS Comput Biol 2013; 9:e1003155. [PMID: 23874193 PMCID: PMC3715408 DOI: 10.1371/journal.pcbi.1003155] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2012] [Accepted: 06/10/2013] [Indexed: 11/19/2022] Open
Abstract
The bacterial enzyme β-lactamase hydrolyzes the β-lactam ring of penicillin and chemically related antibiotics, rendering them ineffective. Due to rampant antibiotic overuse, the enzyme is evolving new resistance activities at an alarming rate. Related, the enzyme's global physiochemical properties exhibit various amounts of conservation and variability across the family. To that end, we characterize the extent of property conservation within twelve different class-A β-lactamases, and conclusively establish that the systematic variations therein parallel their evolutionary history. Large and systematic differences within electrostatic potential maps and pairwise residue-to-residue couplings are observed across the protein, which robustly reflect phylogenetic outgroups. Other properties are more conserved (such as residue pKa values, electrostatic networks, and backbone flexibility), yet they also have systematic variations that parallel the phylogeny in a statistically significant way. Similarly, the above properties also parallel the environmental condition of the bacteria they are from in a statistically significant way. However, it is interesting and surprising that the only one of the global properties (protein charge) parallels the functional specificity patterns; meaning antibiotic resistance activities are not significantly constraining the global physiochemical properties. Rather, extended spectrum activities can emerge from the background of nearly any set of electrostatic and dynamic properties.
Collapse
Affiliation(s)
- Deeptak Verma
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
| | - Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
| | - Dennis R. Livesay
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
| |
Collapse
|
130
|
Primmer CR, Papakostas S, Leder EH, Davis MJ, Ragan MA. Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research. Mol Ecol 2013; 22:3216-41. [DOI: 10.1111/mec.12309] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2012] [Revised: 02/22/2013] [Accepted: 02/26/2013] [Indexed: 02/01/2023]
Affiliation(s)
- C. R. Primmer
- Department of Biology; University of Turku; 20014 Turku Finland
| | - S. Papakostas
- Department of Biology; University of Turku; 20014 Turku Finland
| | - E. H. Leder
- Department of Biology; University of Turku; 20014 Turku Finland
| | - M. J. Davis
- Institute for Molecular Bioscience; The University of Queensland; Brisbane Qld 4072 Australia
| | - M. A. Ragan
- Institute for Molecular Bioscience; The University of Queensland; Brisbane Qld 4072 Australia
| |
Collapse
|
131
|
Wolf YI, Koonin EV. A tight link between orthologs and bidirectional best hits in bacterial and archaeal genomes. Genome Biol Evol 2013; 4:1286-94. [PMID: 23160176 PMCID: PMC3542571 DOI: 10.1093/gbe/evs100] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes are flanked by two BBH show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of closely related in-paralogs, paralogous displacement in situ, or even less frequent genuine violations of the BBH–orthology conjecture caused by acceleration of evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
Collapse
|
132
|
Pegueroles C, Laurie S, Albà MM. Accelerated evolution after gene duplication: a time-dependent process affecting just one copy. Mol Biol Evol 2013; 30:1830-42. [PMID: 23625888 DOI: 10.1093/molbev/mst083] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these processes, remains unclear. Focusing on rodent genes that duplicated before and after the mouse and rat split, we find significantly increased sequence divergence after duplication in only one of the copies, which in nearly all cases corresponds to the novel daughter copy, independent of the mechanism of duplication. We observe that the evolutionary rate of the accelerated copy, measured as the ratio of nonsynonymous to synonymous substitutions, is on average 5-fold higher in the period spanning 4-12 My after the duplication than it was before the duplication. This increase can be explained, at least in part, by the action of positive selection according to the results of the maximum likelihood-based branch-site test. Subsequently, the rate decelerates until purifying selection completely returns to preduplication levels. Reversion to the original rates has already been accomplished 40.5 My after the duplication event, corresponding to a genetic distance of about 0.28 synonymous substitutions per site. Differences in tissue gene expression patterns parallel those of substitution rates, reinforcing the role of neofunctionalization in explaining the evolution of young gene duplicates.
Collapse
Affiliation(s)
- Cinta Pegueroles
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | | | | |
Collapse
|
133
|
Zhang YB, Liu TK, Jiang J, Shi J, Liu Y, Li S, Gui JF. Identification of a novel Gig2 gene family specific to non-amniote vertebrates. PLoS One 2013; 8:e60588. [PMID: 23593256 PMCID: PMC3617106 DOI: 10.1371/journal.pone.0060588] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2012] [Accepted: 02/28/2013] [Indexed: 12/15/2022] Open
Abstract
Gig2 (grass carp reovirus (GCRV)-induced gene 2) is first identified as a novel fish interferon (IFN)-stimulated gene (ISG). Overexpression of a zebrafish Gig2 gene can protect cultured fish cells from virus infection. In the present study, we identify a novel gene family that is comprised of genes homologous to the previously characterized Gig2. EST/GSS search and in silico cloning identify 190 Gig2 homologous genes in 51 vertebrate species ranged from lampreys to amphibians. Further large-scale search of vertebrate and invertebrate genome databases indicate that Gig2 gene family is specific to non-amniotes including lampreys, sharks/rays, ray-finned fishes and amphibians. Phylogenetic analysis and synteny analysis reveal lineage-specific expansion of Gig2 gene family and also provide valuable evidence for the fish-specific genome duplication (FSGD) hypothesis. Although Gig2 family proteins exhibit no significant sequence similarity to any known proteins, a typical Gig2 protein appears to consist of two conserved parts: an N-terminus that bears very low homology to the catalytic domains of poly(ADP-ribose) polymerases (PARPs), and a novel C-terminal domain that is unique to this gene family. Expression profiling of zebrafish Gig2 family genes shows that some duplicate pairs have diverged in function via acquisition of novel spatial and/or temporal expression under stresses. The specificity of this gene family to non-amniotes might contribute to a large extent to distinct physiology in non-amniote vertebrates.
Collapse
Affiliation(s)
- Yi-Bing Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- * E-mail: (YZ) (YZ); (JG) (JG)
| | - Ting-Kai Liu
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Jun Jiang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Jun Shi
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Ying Liu
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Shun Li
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Jian-Fang Gui
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- * E-mail: (YZ) (YZ); (JG) (JG)
| |
Collapse
|
134
|
Abstract
Orthologues and paralogues are types of homologous genes that are related by speciation or duplication, respectively. Orthologous genes are generally assumed to retain equivalent functions in different organisms and to share other key properties. Several recent comparative genomic studies have focused on testing these expectations. Here we discuss the complexity of the evolution of gene-phenotype relationships and assess the validity of the key implications of orthology and paralogy relationships as general statistical trends and guiding principles.
Collapse
|
135
|
Dalquen DA, Altenhoff AM, Gonnet GH, Dessimoz C. The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study. PLoS One 2013; 8:e56925. [PMID: 23451112 PMCID: PMC3581572 DOI: 10.1371/journal.pone.0056925] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2012] [Accepted: 01/16/2013] [Indexed: 11/19/2022] Open
Abstract
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another.Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
Collapse
Affiliation(s)
- Daniel A. Dalquen
- Eldgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
| | - Adrian M. Altenhoff
- Eldgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
| | - Gaston H. Gonnet
- Eldgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
| | - Christophe Dessimoz
- Eldgenössische Technische Hochschule Zurich, Department of Computer Science, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Zürich, Switzerland
- European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
| |
Collapse
|
136
|
Phyletic profiling with cliques of orthologs is enhanced by signatures of paralogy relationships. PLoS Comput Biol 2013; 9:e1002852. [PMID: 23308060 PMCID: PMC3536626 DOI: 10.1371/journal.pcbi.1002852] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2012] [Accepted: 11/05/2012] [Indexed: 11/19/2022] Open
Abstract
New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs-homologs separated by a speciation and a duplication event, respectively-provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ~400000 specific annotations with the estimated Precision of 90%, ~19000 of which are highly specific-e.g. "penicillin binding," "tRNA aminoacylation for protein translation," or "pathogenesis"-and are freely available at http://gorbi.irb.hr/.
Collapse
|
137
|
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013; 41:D808-15. [PMID: 23203871 PMCID: PMC3531103 DOI: 10.1093/nar/gks1094] [Citation(s) in RCA: 3220] [Impact Index Per Article: 292.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2012] [Revised: 10/15/2012] [Accepted: 10/18/2012] [Indexed: 12/12/2022] Open
Abstract
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made-particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
Collapse
Affiliation(s)
- Andrea Franceschini
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Damian Szklarczyk
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Sune Frankild
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Michael Kuhn
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Milan Simonovic
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Alexander Roth
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Jianyi Lin
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Pablo Minguez
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Peer Bork
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Christian von Mering
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| | - Lars J. Jensen
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
| |
Collapse
|
138
|
The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data. PLoS Comput Biol 2012; 8:e1002784. [PMID: 23209392 PMCID: PMC3510086 DOI: 10.1371/journal.pcbi.1002784] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2012] [Accepted: 10/02/2012] [Indexed: 11/19/2022] Open
Abstract
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.
Collapse
|
139
|
Whiteside MD, Winsor GL, Laird MR, Brinkman FSL. OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis. Nucleic Acids Res 2012. [PMID: 23203876 PMCID: PMC3531125 DOI: 10.1093/nar/gks1241] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Prediction of orthologs (homologous genes that diverged because of speciation) is an integral component of many comparative genomics methods. Although orthologs are more likely to have similar function versus paralogs (genes that diverged because of duplication), recent studies have shown that their degree of functional conservation is variable. Also, there are inherent problems with several large-scale ortholog prediction approaches. To address these issues, we previously developed Ortholuge, which uses phylogenetic distance ratios to provide more precise ortholog assessments for a set of predicted orthologs. However, the original version of Ortholuge required manual intervention and was not easily accessible; therefore, we now report the development of OrtholugeDB, available online at http://www.pathogenomics.sfu.ca/ortholugedb. OrtholugeDB provides ortholog predictions for completely sequenced bacterial and archaeal genomes from NCBI based on reciprocal best Basic Local Alignment Search Tool hits, supplemented with further evaluation by the more precise Ortholuge method. The OrtholugeDB web interface facilitates user-friendly and flexible ortholog analysis, from single genes to genomes, plus flexible data download options. We compare Ortholuge with similar methods, showing how it may more consistently identify orthologs with conserved features across a wide range of taxonomic distances. OrtholugeDB facilitates rapid, and more accurate, bacterial and archaeal comparative genomic analysis and large-scale ortholog predictions.
Collapse
Affiliation(s)
- Matthew D Whiteside
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
| | | | | | | |
Collapse
|
140
|
Current challenges in genome annotation through structural biology and bioinformatics. Curr Opin Struct Biol 2012; 22:594-601. [PMID: 22884875 DOI: 10.1016/j.sbi.2012.07.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2012] [Revised: 06/29/2012] [Accepted: 07/09/2012] [Indexed: 01/25/2023]
Abstract
With the huge volume in genomic sequences being generated from high-throughout sequencing projects the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component to this process is to use experimentally determined structures, which provide a means to detect homology that is not discernable from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry.
Collapse
|