1
Langleib M, Calvelo J, Costábile A, Castillo E, Tort JF, Hoffmann FG, Protasio AV, Koziol U, Iriarte A. Evolutionary analysis of species-specific duplications in flatworm genomes. Mol Phylogenet Evol 2024; 199:108141. [PMID: 38964593 DOI: 10.1016/j.ympev.2024.108141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 06/15/2024] [Accepted: 07/01/2024] [Indexed: 07/06/2024]
Abstract
Platyhelminthes, also known as flatworms, is a phylum of bilaterian invertebrates infamous for their parasitic representatives. The classes Cestoda, Monogenea, and Trematoda comprise parasitic helminths inhabiting multiple hosts, including fishes, humans, and livestock, and are responsible for considerable economic damage and burden on human health. As in other animals, the genomes of flatworms have a wide variety of paralogs, genes related via duplication, whose origins could be mapped throughout the evolution of the phylum. Through in-silico analysis, we studied inparalogs, i.e., species-specific duplications, focusing on their biological functions, expression changes, and evolutionary rate. These genes are thought to be key players in the adaptation process of species to each particular niche. Our results showed that genes related to specific functional terms, such as response to stress, transferase activity, oxidoreductase activity, and peptidases, are overrepresented among inparalogs. This trend is conserved among species from different classes, including free-living species. Available expression data from Schistosoma mansoni, a parasite from the trematode class, demonstrated high conservation of expression patterns between inparalogs, but with notable exceptions, which also display evidence of rapid evolution. We discuss how natural selection may operate to maintain these genes and the particular duplication models that better fit the observations. Our work supports the critical role of gene duplication in the evolution of flatworms, representing the first study of inparalog evolution at the genome-wide level in this group.
Affiliation(s)
- Mauricio Langleib
  - Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay; Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Javier Calvelo
  - Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Alicia Costábile
  - Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Estela Castillo
  - Laboratorio de Biología Parasitaria, Instituto de Higiene, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- José F Tort
  - Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Federico G Hoffmann
  - Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Mississippi, United States of America; Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi, United States of America
- Anna V Protasio
  - Department of Pathology, University of Cambridge, Tennis Court Road, CB2 1QP, Cambridge, United Kingdom
- Uriel Koziol
  - Sección Biología Celular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Andrés Iriarte
  - Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
2
Bohutínská M, Peichel CL. Divergence time shapes gene reuse during repeated adaptation. Trends Ecol Evol 2024; 39:396-407. [PMID: 38155043 DOI: 10.1016/j.tree.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/30/2023]
Abstract
When diverse lineages repeatedly adapt to similar environmental challenges, the extent to which the same genes are involved (gene reuse) varies across systems. We propose that divergence time among lineages is a key factor driving this variability: as lineages diverge, the extent of gene reuse should decrease due to reductions in allele sharing, functional differentiation among genes, and restructuring of genome architecture. Indeed, we show that many genomic studies of repeated adaptation find that more recently diverged lineages exhibit higher gene reuse during repeated adaptation, but the relationship becomes less clear at older divergence time scales. Thus, future research should explore the factors shaping gene reuse and their interplay across broad divergence time scales for a deeper understanding of evolutionary repeatability.
Affiliation(s)
- Magdalena Bohutínská
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland; Department of Botany, Faculty of Science, Charles University, Prague, 12800, Czech Republic
- Catherine L Peichel
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland
3
McCartney N, Kondakath G, Tai A, Trimmer BA. Functional annotation of insecta transcriptomes: A cautionary tale from Lepidoptera. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2024; 165:104038. [PMID: 37952902 DOI: 10.1016/j.ibmb.2023.104038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 10/30/2023] [Accepted: 11/07/2023] [Indexed: 11/14/2023]
Abstract
Functional annotation is a critical step in the analysis of genomic data, as it provides insight into the function of individual genes and the pathways in which they participate. Currently, there is no consensus on the best computational approach for assigning functional annotation. This study compares three functional annotation methods (BLAST, eggNOG-Mapper, and InterProScan) in their ability to assign Gene Ontology terms in two species of Insecta with differing levels of annotation, Bombyx mori and Manduca sexta. The methods were compared for their annotation coverage, number of term assignments, term agreement and non-overlapping terms. Here we show that there are large discrepancies in Gene Ontology term assignment among the three computational methods, which could lead to confounding interpretations of data and non-comparable results. This study provides insight into the strengths and weaknesses of each computational method and highlights the need for more standardized methods of functional annotation.
Affiliation(s)
- Naya McCartney
  - Department of Biology, Tufts University, 200 Boston Ave, Medford, MA, 02155, USA
- Gayathri Kondakath
  - Department of Biology, Tufts University, 200 Boston Ave, Medford, MA, 02155, USA
- Albert Tai
  - School of Medicine, Tufts University, 136 Harrison Ave, Boston, MA, 02111, USA
- Barry A Trimmer
  - Department of Biology, Tufts University, 200 Boston Ave, Medford, MA, 02155, USA
4
Islam M, Behura SK. Role of paralogs in the sex-bias transcriptional and metabolic regulation of the brain-placental axis in mice. Placenta 2024; 145:143-150. [PMID: 38134547 DOI: 10.1016/j.placenta.2023.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 12/12/2023] [Accepted: 12/14/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION: Duplicated genes, or paralogs, play important roles in the adaptive function of eukaryotic genomes. Animal studies have shown evidence for the functional role of paralogs in pregnancy, but our knowledge about the role of paralogs in fetoplacental regulation remains limited. In particular, whether fetoplacental metabolic regulation is modulated by differential expression of paralogs remains unexamined. METHODS: In this study, gene expression profiles of day-15 placenta and fetal brain were compared to identify families or groups of paralogous genes expressed in the placenta and brain of male versus female fetuses in mice. Bayesian modeling was applied to infer the directional relationship of transcriptional variation of the paralogs relative to the phylogenetic variation of the genes in each family. Gas chromatography-mass spectrometry (GC-MS) was used to perform untargeted metabolomics analysis of day-15 placenta and fetal brain of both sexes. RESULTS: We identified paralog groups that were expressed in a sex- and/or tissue-biased manner between the placenta and fetal brain. Bayesian modeling showed evidence for a directional relationship between expression and phylogeny of specific paralogs. These relationships were sex specific. GC-MS analysis identified metabolites that were expressed in a sex-biased manner between the placenta and fetal brain. By performing integrative analysis of the metabolomics and gene expression data, we showed that specific groups of metabolites and paralogous genes were expressed in a coordinated manner between the placenta and fetal brain. DISCUSSION: The findings of this study collectively suggest that paralogs play an influential role in the regulation of the brain-placental axis in mice.
Affiliation(s)
- Maliha Islam
  - Division of Animal Sciences, University of Missouri, 920 East Campus Drive, Columbia, Missouri, 65211, USA
- Susanta K Behura
  - Division of Animal Sciences, University of Missouri, 920 East Campus Drive, Columbia, Missouri, 65211, USA; MU Institute for Data Science and Informatics, University of Missouri, USA; Interdisciplinary Reproduction and Health Group, University of Missouri, USA; Interdisciplinary Neuroscience Program, University of Missouri, USA
5
Carrion SA, Michal JJ, Jiang Z. Imprinted Genes: Genomic Conservation, Transcriptomic Dynamics and Phenomic Significance in Health and Diseases. Int J Biol Sci 2023; 19:3128-3142. [PMID: 37416777 PMCID: PMC10321285 DOI: 10.7150/ijbs.83712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 05/25/2023] [Indexed: 07/08/2023] Open
Abstract
Since its discovery in 1991, genomic imprinting has been the subject of numerous studies into its mechanisms of establishment and regulation, evolution and function, and presence in multiple genomes. Disturbance of imprinting has been implicated in a range of diseases, ranging from debilitating syndromes to cancers to fetal deficiencies. Despite this, studies done on the prevalence and relevance of imprinting on genes have been limited in scope, tissue types available, and focus, by both availability and resources. This has left a gap in comparative studies. To address this, we assembled a collection of imprinted genes available in current literature covering five species. Here we sought to identify trends and motifs in the imprinted gene set (IGS) in three distinct arenas: evolutionary conservation, across-tissue expression, and health phenomics. Overall, we found that imprinted genes displayed less conservation and higher proportions of non-coding RNA while maintaining synteny. Maternally expressed genes (MEGs) and paternally expressed genes (PEGs) occupied distinct roles in tissue expression and biological pathway use, while imprinted genes collectively showed a broader tissue range, notable preference for tissue specific expression and limited gene pathways than comparable sex differentiation genes. Both human and murine imprinted genes showed the same clear phenotypic trends, that were distinct from those displayed by sex differentiation genes which were less involved in mental and nervous system disease. While both sets had representation across the genome, the IGS showed clearer clustering as expected, with PEGs significantly more represented than MEGs.
Affiliation(s)
- Zhihua Jiang
  - Corresponding author: Dr. Zhihua Jiang (ORCID ID: 0000-0003-1986-088X), Professor of Genome Biology. Phone: 509-335 8761
6
Zhang J. What Has Genomics Taught An Evolutionary Biologist? GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1-12. [PMID: 36720382 PMCID: PMC10373158 DOI: 10.1016/j.gpb.2023.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 01/06/2023] [Accepted: 01/19/2023] [Indexed: 01/30/2023]
Abstract
Genomics, an interdisciplinary field of biology on the structure, function, and evolution of genomes, has revolutionized many subdisciplines of life sciences, including my field of evolutionary biology, by supplying huge data, bringing high-throughput technologies, and offering a new approach to biology. In this review, I describe what I have learned from genomics and highlight the fundamental knowledge and mechanistic insights gained. I focus on three broad topics that are central to evolutionary biology and beyond (variation, interaction, and selection) and use primarily my own research and study subjects as examples. In the next decade or two, I expect that the most important contributions of genomics to evolutionary biology will be to provide genome sequences of nearly all known species on Earth, facilitate high-throughput phenotyping of natural variants and systematically constructed mutants for mapping genotype-phenotype-fitness landscapes, and assist the determination of causality in evolutionary processes using experimental evolution.
Affiliation(s)
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
7
Suetsugu K, Fukushima K, Makino T, Ikematsu S, Sakamoto T, Kimura S. Transcriptomic heterochrony and completely cleistogamous flower development in the mycoheterotrophic orchid Gastrodia. THE NEW PHYTOLOGIST 2023; 237:323-338. [PMID: 36110047 DOI: 10.1111/nph.18495] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2022] [Accepted: 09/09/2022] [Indexed: 06/15/2023]
Abstract
Cleistogamy, in which plants can reproduce via self-fertilization within permanently closed flowers, has evolved in > 30 angiosperm lineages; however, consistent with Darwin's doubts about its existence, complete cleistogamy - the production of only cleistogamous flowers - has rarely been recognized. Thus far, the achlorophyllous orchid genus, Gastrodia, is the only known genus with several plausible completely cleistogamous species. Here, we analyzed the floral developmental transcriptomes of two recently evolved, completely cleistogamous Gastrodia species and their chasmogamous sister species to elucidate the possible changes involved in producing common cleistogamous traits. The ABBA-BABA test did not support introgression and protein sequence convergence as evolutionary mechanisms leading to cleistogamy, leaving convergence in gene expression as a plausible mechanism. Regarding transcriptomic differentiation, the two cleistogamous species had common modifications in the expression of developmental regulators, exhibiting a gene family-wide signature of convergent expression changes in MADS-box genes. Our transcriptomic pseudotime analysis revealed a prolonged juvenile state and eventual maturation, a heterochronic pattern consistent with partial neoteny, in cleistogamous flower development. These findings indicate that transcriptomic partial neoteny, arising from changes in the expression of conserved developmental regulators, might have contributed to the rapid and repeated evolution of cleistogamous flowers in Gastrodia.
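For readers unfamiliar with the ABBA-BABA test mentioned in this abstract, the sketch below computes the standard Patterson's D statistic from site-pattern counts, D = (ABBA - BABA) / (ABBA + BABA). The function name and the counts are hypothetical and do not come from the study. The abstract does not describe the authors' pipeline, and a real analysis would also assess significance (for example with a block jackknife), which is omitted here.

```python
def patterson_d(n_abba: int, n_baba: int) -> float:
    """Patterson's D from counts of ABBA and BABA site patterns.

    D near 0 is consistent with no introgression; a clear excess of
    either pattern pushes D toward +1 or -1.
    """
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (n_abba - n_baba) / total


# Hypothetical counts, for illustration only.
d_balanced = patterson_d(100, 100)   # equal pattern counts
d_excess = patterson_d(150, 50)      # excess of ABBA sites
```

A D value close to zero, as in the balanced case, is the pattern that would fail to support introgression.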
Affiliation(s)
- Kenji Suetsugu
  - Department of Biology, Graduate School of Science, Kobe University, 1-1 Rokkodai, Nada-ku, Kobe, 657-8501, Japan
- Kenji Fukushima
  - Institute for Molecular Plant Physiology and Biophysics, University of Würzburg, Julius-von-Sachs Platz 2, 97082, Würzburg, Germany
- Takashi Makino
  - Graduate School of Life Sciences, Tohoku University, 6-3, Aramaki Aza Aoba, Aoba-ku, Sendai, 980-8578, Japan
- Shuka Ikematsu
  - Faculty of Life Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
  - Center for Plant Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
- Tomoaki Sakamoto
  - Faculty of Life Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
  - Center for Plant Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
- Seisuke Kimura
  - Faculty of Life Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
  - Center for Plant Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
8
Ahsan F, Yan Z, Precup D, Blanchette M. PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information. Bioinformatics 2022; 38:i299-i306. [PMID: 35758792 PMCID: PMC9235490 DOI: 10.1093/bioinformatics/btac259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open
Abstract
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Faizy Ahsan
  - School of Computer Science, McGill University, Montreal H3A 0G4, Canada
- Zichao Yan
  - School of Computer Science, McGill University, Montreal H3A 0G4, Canada
- Doina Precup
  - School of Computer Science, McGill University, Montreal H3A 0G4, Canada
9
Sánchez AL, Lafond M. Colorful orthology clustering in bounded-degree similarity graphs. J Bioinform Comput Biol 2021; 19:2140010. [PMID: 34775924 DOI: 10.1142/s0219720021400102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Clustering genes in similarity graphs is a popular approach for orthology prediction. Most algorithms group genes without considering their species, which results in clusters that contain several paralogous genes. Moreover, clustering is known to be problematic when in-paralogs arise from ancient duplications. Recently, we proposed a two-step process that avoids these problems. First, we infer clusters of only orthologs (i.e. with only genes from distinct species), and second, we infer the missing inter-cluster orthologs. In this paper, we focus on the first step, which leads to a problem we call Colorful Clustering. In general, this is as hard as classical clustering. However, in similarity graphs, the number of species is usually small, as well as the neighborhood size of genes in other species. We therefore study the problem of clustering in which the number of colors is bounded by [Formula: see text], and each gene has at most [Formula: see text] neighbors in another species. We show that the well-known cluster editing formulation remains NP-hard even when [Formula: see text] and [Formula: see text]. We then propose a fixed-parameter algorithm in [Formula: see text] to find the single best cluster in the graph. We implemented this algorithm and included it in the aforementioned two-step approach. Experiments on simulated data show that this approach performs favorably to applying only an unconstrained clustering step.
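The constraint behind Colorful Clustering, as described in this abstract, is that a cluster of orthologs contains genes from distinct species only, with species acting as colors. A minimal Python sketch of that check follows. The gene identifiers, the species map, and the function name `is_colorful` are hypothetical illustrations, and this is only the validity test for a single cluster, not the authors' fixed-parameter algorithm or their cluster editing formulation.

```python
from typing import Dict, Iterable


def is_colorful(cluster: Iterable[str], species_of: Dict[str, str]) -> bool:
    """Return True if no two genes in the cluster come from the same species."""
    seen = set()
    for gene in cluster:
        species = species_of[gene]
        if species in seen:
            # Two genes of one species: the cluster contains paralogs.
            return False
        seen.add(species)
    return True


# Hypothetical toy data: gene identifier -> species (the gene's "color").
species_of = {
    "hsa_g1": "human",
    "hsa_g2": "human",
    "mmu_g1": "mouse",
    "dre_g1": "zebrafish",
}
ortholog_cluster = ["hsa_g1", "mmu_g1", "dre_g1"]   # one gene per species
paralog_cluster = ["hsa_g1", "hsa_g2", "mmu_g1"]    # two human genes
```

Here `is_colorful(ortholog_cluster, species_of)` holds, while `paralog_cluster` fails the check because it contains two human genes.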
Affiliation(s)
- Alitzel López Sánchez
  - Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
- Manuel Lafond
  - Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
10
Mo N, Zhang X, Shi W, Yu G, Chen X, Yang JR. Bidirectional Genetic Control of Phenotypic Heterogeneity and Its Implication for Cancer Drug Resistance. Mol Biol Evol 2021; 38:1874-1887. [PMID: 33355660 PMCID: PMC8097262 DOI: 10.1093/molbev/msaa332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023] Open
Abstract
Negative genetic regulators of phenotypic heterogeneity, or phenotypic capacitors/stabilizers, elevate population average fitness by limiting deviation from the optimal phenotype and increase the efficacy of natural selection by enhancing the phenotypic differences among genotypes. Stabilizers can presumably be switched off to release phenotypic heterogeneity in the face of extreme or fluctuating environments to ensure population survival. This task could, however, also be achieved by positive genetic regulators of phenotypic heterogeneity, or "phenotypic diversifiers," as shown by recently reported evidence that a bacterial divisome factor enhances antibiotic resistance. We hypothesized that such active creation of phenotypic heterogeneity by diversifiers, which is functionally independent of stabilizers, is more common than previously recognized. Using morphological phenotypic data from 4,718 single-gene knockout strains of Saccharomyces cerevisiae, we systematically identified 324 stabilizers and 160 diversifiers and constructed a bipartite network between these genes and the morphological traits they control. Further analyses showed that, compared with stabilizers, diversifiers tended to be weaker and more promiscuous (regulating more traits) regulators targeting traits unrelated to fitness. Moreover, there is a general division of labor between stabilizers and diversifiers. Finally, by incorporating NCI-60 human cancer cell line anticancer drug screening data, we found that human one-to-one orthologs of yeast diversifiers/stabilizers likely regulate the anticancer drug resistance of human cancer cell lines, suggesting that these orthologs are potential targets for auxiliary treatments. Our study therefore highlights stabilizers and diversifiers as the genetic regulators for the bidirectional control of phenotypic heterogeneity as well as their distinct evolutionary roles and functional independence.
Affiliation(s)
- Ning Mo
  - Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xiaoyu Zhang
  - Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Wenjun Shi
  - Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Gongwang Yu
  - Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xiaoshu Chen
  - Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - Corresponding author
- Jian-Rong Yang
  - Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou, China
  - RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
  - Corresponding author
11
Begum T, Robinson-Rechavi M. Special Care Is Needed in Applying Phylogenetic Comparative Methods to Gene Trees with Speciation and Duplication Nodes. Mol Biol Evol 2021; 38:1614-1626. [PMID: 33169790 PMCID: PMC8042747 DOI: 10.1093/molbev/msaa288] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022] Open
Abstract
How gene function evolves is a central question of evolutionary biology. It can be investigated by comparing functional genomics results between species and between genes. Most comparative studies of functional genomics have used pairwise comparisons. Yet it has been shown that this can provide biased results, as genes, like species, are phylogenetically related. Phylogenetic comparative methods should be used to correct for this, but they depend on strong assumptions, including unbiased tree estimates relative to the hypothesis being tested. Such methods have recently been used to test the “ortholog conjecture,” the hypothesis that functional evolution is faster in paralogs than in orthologs. Although pairwise comparisons of tissue specificity (τ) provided support for the ortholog conjecture, phylogenetic independent contrasts did not. Our reanalysis on the same gene trees identified problems with the time calibration of duplication nodes. We find that the gene trees used suffer from important biases, due to the inclusion of trees with no duplication nodes, to the relative age of speciations and duplications, to systematic differences in branch lengths, and to non-Brownian motion of tissue specificity on many trees. We find that incorrect implementation of phylogenetic method in empirical gene trees with duplications can be problematic. Controlling for biases allows successful use of phylogenetic methods to study the evolution of gene function and provides some support for the ortholog conjecture using three different phylogenetic approaches.
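The abstract refers to tissue specificity (τ) without defining it. A widely used definition, due to Yanai and colleagues, is τ = Σ(1 - x_i / x_max) / (n - 1), where x_i is a gene's expression in tissue i and n is the number of tissues. The sketch below assumes that definition and non-negative expression values. Any normalization or log transformation applied in the study itself is not stated here, and the example values are hypothetical.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index: 0 = uniformly expressed, 1 = one tissue only.

    Assumes non-negative expression values, one per tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression profiles across four tissues.
housekeeping_like = [5.0, 5.0, 5.0, 5.0]   # same level everywhere
tissue_specific = [10.0, 0.0, 0.0, 0.0]    # expressed in a single tissue
```

A pairwise comparison of the kind the abstract mentions would then compare τ between two orthologs or two paralogs, for example through the absolute difference of their τ values.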
Affiliation(s)
- Tina Begum
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
12
Swamy KBS, Schuyler SC, Leu JY. Protein Complexes Form a Basis for Complex Hybrid Incompatibility. Front Genet 2021; 12:609766. [PMID: 33633780 PMCID: PMC7900514 DOI: 10.3389/fgene.2021.609766] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/28/2020] [Accepted: 01/20/2021] [Indexed: 12/20/2022] Open
Abstract
Proteins are the workhorses of the cell and execute many of their functions by interacting with other proteins, forming protein complexes. Multi-protein complexes are an admixture of subunits, change their interaction partners, and modulate their functions and cellular physiology in response to environmental changes. When two species mate, the hybrid offspring are usually inviable or sterile because of large-scale differences in the genetic makeup between the two parents causing incompatible genetic interactions. Such reciprocal-sign epistasis between inter-specific alleles is not limited to incompatible interactions between just one gene pair and usually involves multiple genes. Many of these multi-locus incompatibilities show visible defects only in the presence of all the interactions, making them hard to characterize. Understanding the dynamics of protein-protein interactions (PPIs) leading to multi-protein complexes is better suited to characterizing multi-locus incompatibilities than the traditional approaches of genetics and molecular biology. The advances in omics technologies, which include genomics, transcriptomics, and proteomics, can help achieve this end. This is especially relevant when studying non-model organisms. Here, we discuss the recent progress in the understanding of hybrid genetic incompatibility and in omics technologies, and how together they have helped in characterizing protein complexes and, in turn, multi-locus incompatibilities. We also review advances in bioinformatic techniques suitable for this purpose and propose directions for leveraging the knowledge gained from model organisms to identify genetic incompatibilities in non-model organisms.
Affiliation(s)
- Krishna B. S. Swamy
  - Division of Biological and Life Sciences, School of Arts and Sciences, Ahmedabad University, Ahmedabad, India
- Scott C. Schuyler
  - Department of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Division of Head and Neck Surgery, Department of Otolaryngology, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Jun-Yi Leu
  - Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
13
Dong N, Bandura J, Zhang Z, Wang Y, Labadie K, Noel B, Davison A, Koene JM, Sun HS, Coutellec MA, Feng ZP. Ion channel profiling of the Lymnaea stagnalis ganglia via transcriptome analysis. BMC Genomics 2021; 22:18. [PMID: 33407100 PMCID: PMC7789530 DOI: 10.1186/s12864-020-07287-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/04/2020] [Accepted: 11/28/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: The pond snail Lymnaea stagnalis (L. stagnalis) has been widely used as a model organism in neurobiology, ecotoxicology, and parasitology due to the relative simplicity of its central nervous system (CNS). However, its usefulness is restricted by a limited availability of transcriptome data. While sequence information for the L. stagnalis CNS transcripts has been obtained from EST libraries and a de novo RNA-seq assembly, the quality of these assemblies is limited by a combination of low coverage of EST libraries, the fragmented nature of de novo assemblies, and lack of reference genome. RESULTS: In this study, taking advantage of the recent availability of a preliminary L. stagnalis genome, we generated an RNA-seq library from the adult L. stagnalis CNS, using a combination of genome-guided and de novo assembly programs to identify 17,832 protein-coding L. stagnalis transcripts. We combined our library with existing resources to produce a transcript set with greater sequence length, completeness, and diversity than previously available ones. Using our assembly and functional domain analysis, we profiled L. stagnalis CNS transcripts encoding ion channels and ionotropic receptors, which are key proteins for CNS function, and compared their sequences to other vertebrate and invertebrate model organisms. Interestingly, L. stagnalis transcripts encoding numerous putative Ca2+ channels showed the most sequence similarity to those of Mus musculus, Danio rerio, Xenopus tropicalis, Drosophila melanogaster, and Caenorhabditis elegans, suggesting that many calcium channel-related signaling pathways may be evolutionarily conserved. CONCLUSIONS: Our study provides the most thorough characterization to date of the L. stagnalis transcriptome and provides insights into differences between vertebrates and invertebrates in CNS transcript diversity, according to function and protein class. Furthermore, this study provides a complete characterization of the ion channels of Lymnaea stagnalis, opening new avenues for future research on fundamental neurobiological processes in this model system.
Affiliation(s)
- Nancy Dong
- Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Julia Bandura
- Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Zhaolei Zhang
- Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Yan Wang
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, M5S 3B2, Canada
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, M1C 1A4, Canada
- Karine Labadie
- Genoscope, Institut de biologie François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, BP5706, 91057, Evry, France
- Benjamin Noel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, University of Evry, Université Paris-Saclay, 91057, Evry, France
- Angus Davison
- School of Life Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Joris M Koene
- Department of Ecological Science, Faculty of Science, Vrije Universiteit, Amsterdam, The Netherlands
- Hong-Shuo Sun
- Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Department of Surgery, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
- Zhong-Ping Feng
- Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
|
14
|
Purification and Characterization of Two Novel Laccases from Peniophora lycii. J Fungi (Basel) 2020; 6:jof6040340. [PMID: 33291231 PMCID: PMC7762197 DOI: 10.3390/jof6040340] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/16/2020] [Revised: 12/02/2020] [Accepted: 12/03/2020] [Indexed: 01/09/2023]
Abstract
Although, currently, more than 100 laccases have been purified from basidiomycete fungi, the majority of these laccases were obtained from fungi of the Polyporales order, and only scarce data are available about the laccases from other fungi. In this article, laccase production by the white-rot basidiomycete fungus Peniophora lycii, belonging to the Russulales order, was investigated. It was shown that, under copper induction, this fungus secreted three different laccase isozymes. Two laccase isozymes—Lac5 and LacA—were purified and their corresponding nucleotide sequences were determined. Both purified laccases were relatively thermostable with periods of half-life at 70 °C of 10 and 8 min for Lac5 and LacA, respectively. The laccases demonstrated the highest activity toward ABTS (97 U·mg−1 for Lac5 and 121 U·mg−1 for LacA at pH 4.5); Lac5 demonstrated the lowest activity toward 2,6-DMP (2.5 U·mg−1 at pH 4.5), while LacA demonstrated this towards gallic acid (1.4 U·mg−1 at pH 4.5). Both Lac5 and LacA were able to efficiently decolorize such dyes as RBBR and Bromcresol Green. Additionally, phylogenetic relationships among laccases of Peniophora spp. were reconstructed, and groups of orthologous genes were determined. Based on these groups, all currently available data about laccases of Peniophora spp. were systematized.
|
15
|
Ahrens JB, Teufel AI, Siltberg-Liberles J. A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs. J Mol Evol 2020; 88:720-730. [PMID: 33118098 DOI: 10.1007/s00239-020-09969-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2020] [Accepted: 10/15/2020] [Indexed: 10/23/2022]
Abstract
Heterotachy, the change in sequence evolutionary rate over time, is a common feature of protein molecular evolution. Decades of studies have shed light on the conditions under which heterotachy occurs, and there is evidence that site-specific evolutionary rate shifts are correlated with changes in protein function. Here, we present a large-scale, computational analysis using thousands of protein sequence alignments from animal and plant proteomes, representing genes related either by orthology (speciation events) or paralogy (gene duplication), to compare sequence divergence patterns in orthologous vs. paralogous sequence alignments. We use sequence-based phylogenetic analyses to infer overall sequence divergence (tree length/number of sequences) and to fit site-specific rates to a discrete gamma distribution with a shape parameter α. This inference method is applied to real protein sequence alignments, as well as alignments simulated under various models of protein sequence evolution. Our simulations indicate that sequence divergence and the α parameter are positively correlated when sequences evolve with heterotachy, meaning that inferred site rate distributions appear more uniform as sequences diverge. Divergence and α are also positively correlated in both orthologous and paralogous genes, but the average increase in α (as a function of divergence) is significantly higher in paralogous protein alignments than in orthologous alignments. This result is consistent with the widely held view that recently duplicated proteins initially evolve under relaxed selective pressure, promoting functional divergence by accumulation of amino acid replacements, and hence experience more evolutionary rate fluctuations than orthologous proteins. We discuss these findings in the context of the ortholog conjecture, a long-standing assumption in molecular evolution, which posits that protein sequences related by orthology tend to be more functionally conserved than paralogous proteins.
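The divergence measure named in this abstract (tree length divided by the number of sequences) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the regex-based Newick handling, and the toy tree are assumptions made for illustration, and the sketch assumes every leaf has a name and no label is quoted.

```python
import re

# Branch lengths are the numbers that follow a colon in a Newick string.
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")
# Leaf labels directly follow "(" or ","; internal labels follow ")".
LEAF_LABEL = re.compile(r"[(,]\s*([^(),:;\s]+)")

def tree_length(newick):
    """Sum of all branch lengths in the tree."""
    return sum(float(value) for value in BRANCH_LENGTH.findall(newick))

def count_leaves(newick):
    """Number of named leaves (assumes unquoted, non-empty leaf names)."""
    return len(LEAF_LABEL.findall(newick))

def divergence(newick):
    """Tree length normalised by the number of sequences (leaves)."""
    return tree_length(newick) / count_leaves(newick)

# Hypothetical three-sequence tree, used only as an example.
toy_tree = "((A:0.1,B:0.2):0.05,C:0.3);"
toy_divergence = divergence(toy_tree)
```

For real alignments a dedicated tree library would be a safer choice than regular expressions; the sketch only makes the ratio concrete.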
Affiliation(s)
- Joseph B Ahrens
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA; Department of Biochemistry and Molecular Genetics, Computational Bioscience Program, University of Colorado Denver, Aurora, CO, USA
- Ashley I Teufel
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA; Santa Fe Institute, Santa Fe, NM, USA
- Jessica Siltberg-Liberles
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA
|
16
|
Hernández-Salmerón JE, Moreno-Hagelsieb G. Progress in quickly finding orthologs as reciprocal best hits: comparing blast, last, diamond and MMseqs2. BMC Genomics 2020; 21:741. [PMID: 33099302 PMCID: PMC7585182 DOI: 10.1186/s12864-020-07132-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/26/2020] [Accepted: 10/09/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Finding orthologs remains an important bottleneck in comparative genomics analyses. While the authors of software for the quick comparison of protein sequences evaluate the speed of their software and compare their results against the most usual software for the task, it is not common for them to evaluate their software for more particular uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2. RESULTS We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program producing the most similar number of RBH to blastp was diamond run with the "ultra-sensitive" option. However, this option was diamond's slowest, with the "very-sensitive" option offering the best balance between speed and RBH results. The speeding up of the programs was much more evident when dealing with eukaryotic genomes, which code for more numerous proteins. For example, lastal took a median of approx. 1.5% of the blastp time to run with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Though estimated error rates were very similar among the RBH obtained with all programs, RBH obtained with MMseqs2 had the lowest error rates among the programs tested. CONCLUSIONS The fast algorithms for pairwise protein comparison produced results very similar to blast in a fraction of the time, with diamond offering the best compromise in speed, sensitivity and quality, as long as a sensitivity option other than the default was chosen.
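The reciprocal best hit (RBH) criterion evaluated in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the input is assumed to be already-parsed (query, subject, bit score) tuples from any of the aligners, the function names and toy hits are hypothetical, and ties are resolved by keeping the first hit encountered.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples.
    On ties, the first hit seen for a query is kept (an assumption).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical hits between proteome A and proteome B.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 90.0),
          ("a2", "b1", 150.0), ("a2", "b2", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 88.0), ("b2", "a2", 70.0)]
rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In the toy data, a2's best hit is b1, but b1's best hit is a1, so only the a1/b1 pair satisfies the reciprocal condition.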
Affiliation(s)
- Gabriel Moreno-Hagelsieb
- Wilfrid Laurier University, Department of Biology, 75 University Ave W, Waterloo, N2L 3C5 ON Canada
|
17
|
Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nat Commun 2020; 11:4459. [PMID: 32900997 PMCID: PMC7479108 DOI: 10.1038/s41467-020-18090-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/16/2019] [Accepted: 07/29/2020] [Indexed: 12/24/2022]
Abstract
The origins of multicellular physiology are tied to evolution of gene expression. Genes can shift expression as organisms evolve, but how ancestral expression influences altered descendant expression is not well understood. To examine this, we amalgamate 1,903 RNA-seq datasets from 182 research projects, including 6 organs in 21 vertebrate species. Quality control eliminates project-specific biases, and expression shifts are reconstructed using gene-family-wise phylogenetic Ornstein-Uhlenbeck models. Expression shifts following gene duplication result in more drastic changes in expression properties than shifts without gene duplication. The expression properties are tightly coupled with protein evolutionary rate, depending on whether and how gene duplication occurred. Fluxes in expression patterns among organs are nonrandom, forming modular connections that are reshaped by gene duplication. Thus, if expression shifts, ancestral expression in some organs induces a strong propensity for expression in particular organs in descendants. Regardless of whether the shifts are adaptive or not, this supports a major role for what might be termed preadaptive pathways of gene expression evolution.
|
18
|
Costa SS, Guimarães LC, Silva A, Soares SC, Baraúna RA. First Steps in the Analysis of Prokaryotic Pan-Genomes. Bioinform Biol Insights 2020; 14:1177932220938064. [PMID: 32843837 PMCID: PMC7418249 DOI: 10.1177/1177932220938064] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 05/13/2020] [Accepted: 05/26/2020] [Indexed: 01/14/2023]
Abstract
The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, accessory genome, and species- or strain-specific genes. The pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families will continuously increase with the addition of new genomes to the analysis, while in a closed pan-genome, the number of gene families will not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation. The same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, several software tools, such as BPGA, GET_HOMOLOGUES, and PGAP, among others, are used to calculate the pan-genome. This review presents all these initial steps for those who want to perform a pan-genome analysis, explaining key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species. These are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we cite the perspectives of several research areas where pan-genome analysis can be used to answer important issues.
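The open/closed classification mentioned in this abstract can be sketched as a power-law fit. The sketch below assumes the commonly used formulation in which the number of new gene families found when the N-th genome is added follows n = κ·N^(−α), with α ≤ 1 read as an open pan-genome and α > 1 as closed; that threshold, the function names, and the synthetic data are assumptions not stated in the abstract, and the fit is a plain log-log least-squares regression rather than the procedure of any particular tool.

```python
import math

def fit_heaps(points):
    """Fit n = kappa * N**(-alpha) by least squares in log-log space.

    `points` is a list of (n_genomes, n_new_gene_families) pairs with
    strictly positive values. Returns (kappa, alpha).
    """
    xs = [math.log(n_genomes) for n_genomes, _ in points]
    ys = [math.log(n_new) for _, n_new in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def classify(alpha):
    """Assumed convention: alpha <= 1 is open, alpha > 1 is closed."""
    return "open" if alpha <= 1.0 else "closed"

# Synthetic, noise-free data generated with kappa = 500 and alpha = 0.7.
synthetic = [(n, 500.0 * n ** -0.7) for n in range(2, 11)]
kappa, alpha = fit_heaps(synthetic)
status = classify(alpha)
```

With real data the counts are usually averaged over many random genome orderings before fitting, which the sketch leaves out.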
Affiliation(s)
- Sávio Souza Costa
- Centro de Genômica e Biologia de Sistemas, Universidade Federal do Pará, Belém, Brazil
- Laboratório de Engenharia Biológica, Espaço Inovação, Parque de Ciência e Tecnologia Guamá, Belém, Brazil
- Luís Carlos Guimarães
- Centro de Genômica e Biologia de Sistemas, Universidade Federal do Pará, Belém, Brazil
- Artur Silva
- Centro de Genômica e Biologia de Sistemas, Universidade Federal do Pará, Belém, Brazil
- Laboratório de Engenharia Biológica, Espaço Inovação, Parque de Ciência e Tecnologia Guamá, Belém, Brazil
- Siomar Castro Soares
- Instituto de Ciências Biológicas e Naturais, Universidade Federal do Triângulo Mineiro, Uberaba, Brazil
- Rafael Azevedo Baraúna
- Centro de Genômica e Biologia de Sistemas, Universidade Federal do Pará, Belém, Brazil
- Laboratório de Engenharia Biológica, Espaço Inovação, Parque de Ciência e Tecnologia Guamá, Belém, Brazil
|
19
|
Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020; 36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
MOTIVATION The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data cast doubt on the conclusions. RESULTS We use experimental annotations from over 40 000 proteins, drawn from over 80 000 publications, to revisit the ortholog conjecture in two pairs of species: (i) Homo sapiens and Mus musculus and (ii) Saccharomyces cerevisiae and Schizosaccharomyces pombe. By making a distinction between questions about the evolution of function versus questions about the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction, though questions about the evolution of function remain difficult to address. In both pairs of species, we quantify the amount of information that would be ignored if paralogs are discarded, as well as the resulting loss in prediction accuracy. Taken as a whole, our results support the view that the types of homologs used for function transfer are largely irrelevant to the task of function prediction. Maximizing the amount of data used for this task, regardless of whether it comes from orthologs or paralogs, is most likely to lead to higher prediction accuracy. AVAILABILITY AND IMPLEMENTATION https://github.com/predragradivojac/oc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moses Stamboulian
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Rafael F Guerrero
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Matthew W Hahn
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
|
20
|
Parey E, Louis A, Cabau C, Guiguen Y, Roest Crollius H, Berthelot C. Synteny-Guided Resolution of Gene Trees Clarifies the Functional Impact of Whole-Genome Duplications. Mol Biol Evol 2020; 37:3324-3337. [DOI: 10.1093/molbev/msaa149] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
Whole-genome duplications (WGDs) have major impacts on the evolution of species, as they produce new gene copies contributing substantially to adaptation, isolation, phenotypic robustness, and evolvability. They result in large, complex gene families with recurrent gene losses in descendant species that sequence-based phylogenetic methods fail to reconstruct accurately. As a result, orthologs and paralogs are difficult to identify reliably in WGD-descended species, which hinders the exploration of functional consequences of WGDs. Here, we present Synteny-guided CORrection of Paralogies and Orthologies (SCORPiOs), a novel method to reconstruct gene phylogenies in the context of a known WGD event. WGDs generate large duplicated syntenic regions, which SCORPiOs systematically leverages as a complement to sequence evolution to infer the evolutionary history of genes. We applied SCORPiOs to the 320-My-old WGD at the origin of teleost fish. We find that almost one in four teleost gene phylogenies in the Ensembl database (3,394) are inconsistent with their syntenic contexts. For 70% of these gene families (2,387), we were able to propose an improved phylogenetic tree consistent with both the molecular substitution distances and the local syntenic information. We show that these synteny-guided phylogenies are more congruent with the species tree, with sequence evolution and with expected expression conservation patterns than those produced by state-of-the-art methods. Finally, we show that synteny-guided gene trees emphasize contributions of WGD paralogs to evolutionary innovations in the teleost clade.
Affiliation(s)
- Elise Parey
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Alexandra Louis
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Cédric Cabau
- SIGENAE, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France
- Hugues Roest Crollius
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
- Camille Berthelot
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
|
21
|
Kim S, Park J, Kim T, Lee JS. The functional study of human proteins using humanized yeast. J Microbiol 2020; 58:343-349. [PMID: 32342338 DOI: 10.1007/s12275-020-0136-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/12/2020] [Revised: 04/13/2020] [Accepted: 04/13/2020] [Indexed: 12/18/2022]
Abstract
The functional and optimal expression of genes is crucial for survival of all living organisms. Numerous experiments and efforts have been performed to reveal the mechanisms required for the functional and optimal expression of human genes. The yeast Saccharomyces cerevisiae has evolved independently of humans for billions of years. Nevertheless, S. cerevisiae has many conserved genes and expression mechanisms that are similar to those in humans. Yeast is the most commonly used model organism for studying the function and expression mechanisms of human genes because it has a relatively simple genome structure, which is easy to manipulate. Many previous studies have focused on understanding the functions and mechanisms of human proteins using orthologous genes and biological systems of yeast. In this review, we mainly introduce two recent studies that replaced human genes and nucleosomes with those of yeast. Here, we suggest that, although yeast is a relatively small eukaryotic cell, its humanization is useful for the direct study of human proteins. In addition, yeast can be used as a model organism in a broader range of studies, including drug screening.
Affiliation(s)
- Seho Kim
- Department of Molecular Bioscience, College of Biomedical Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Juhee Park
- Department of Molecular Bioscience, College of Biomedical Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Taekyung Kim
- Department of Biology Education, Pusan National University, Busan, 26241, Republic of Korea
- Jung-Shin Lee
- Department of Molecular Bioscience, College of Biomedical Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
|
22
|
David KT, Oaks JR, Halanych KM. Patterns of gene evolution following duplications and speciations in vertebrates. PeerJ 2020; 8:e8813. [PMID: 32266119 PMCID: PMC7120047 DOI: 10.7717/peerj.8813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/10/2019] [Accepted: 02/27/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Eukaryotic genes typically form independent evolutionary lineages through either speciation or gene duplication events. Generally, gene copies resulting from speciation events (orthologs) are expected to maintain similarity over time with regard to sequence, structure and function. After a duplication event, however, resulting gene copies (paralogs) may experience a broader set of possible fates, including partial (subfunctionalization) or complete loss of function, as well as gain of new function (neofunctionalization). This assumption, known as the Ortholog Conjecture, is prevalent throughout molecular biology and notably plays an important role in many functional annotation methods. Unfortunately, studies that explicitly compare evolutionary processes between speciation and duplication events are rare and conflicting. METHODS To provide an empirical assessment of ortholog/paralog evolution, we estimated ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) for 251,044 lineages in 6,244 gene trees across 77 vertebrate taxa. RESULTS Overall, we found ω to be more similar between lineages descended from speciation events (p < 0.001) than lineages descended from duplication events, providing strong support for the Ortholog Conjecture. The asymmetry in ω following duplication events appears to be largely driven by an increase along one of the paralogous lineages, while the other remains similar to the parent. This trend is commonly associated with neofunctionalization, suggesting that gene duplication is a significant mechanism for generating novel gene functions.
Affiliation(s)
- Kyle T. David
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Jamie R. Oaks
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
|
23
|
Lafond M, Meghdari Miardan M, Sankoff D. Accurate prediction of orthologs in the presence of divergence after duplication. Bioinformatics 2019; 34:i366-i375. [PMID: 29950018 PMCID: PMC6022570 DOI: 10.1093/bioinformatics/bty242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/23/2022]
Abstract
Motivation When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences on the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we make the distinction between the primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and the secondary orthologs, which have. Similarity-based prediction methods will tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types. Results We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas to practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs. Availability and implementation HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada; Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
- David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
|
24
|
Zhang R, Pan Y, Ahmed L, Block E, Zhang Y, Batista VS, Zhuang H. A Multispecific Investigation of the Metal Effect in Mammalian Odorant Receptors for Sulfur-Containing Compounds. Chem Senses 2019; 43:357-366. [PMID: 29659735 DOI: 10.1093/chemse/bjy022] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Metal-coordinating compounds are generally known to have strong smells, a phenomenon that can be attributed to the fact that odorant receptors for intense-smelling compounds, such as those containing sulfur, may be metalloproteins. We previously identified a mouse odorant receptor (OR), Olfr1509, that requires copper ions for sensitive detection of a series of metal-coordinating odorants, including (methylthio)methanethiol (MTMT), a strong-smelling component of male mouse urine that attracts female mice. By combining mutagenesis and quantum mechanics/molecular mechanics (QM/MM) modeling, we identified candidate binding sites in Olfr1509 that may bind to the copper-MTMT complex. However, whether there are other receptors utilizing metal ions for ligand-binding and other sites important for receptor activation is still unknown. In this study, we describe a second mouse OR for MTMT with a copper effect, namely Olfr1019. In an attempt to investigate the functional changes of metal-coordinating ORs in multiple species and to decipher additional sites involved in the metal effect, we cloned various mammalian orthologs of the 2 mouse MTMT receptors, and a third mouse MTMT receptor, Olfr15, that does not have a copper effect. We found that the function of all 3 MTMT receptors varies greatly among species and that the response to MTMT always co-occurred with the copper effect. Furthermore, using ancestral reconstruction and QM/MM modeling combined with receptor functional assay, we found that the amino acid residue R260 in Olfr1509 and the respective R261 site in Olfr1019 may be important for receptor activation.
Affiliation(s)
- Ruina Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
- Yi Pan
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
- Lucky Ahmed
- Department of Chemistry, Yale University, New Haven, CT, USA
- Eric Block
- Department of Chemistry, University at Albany, State University of New York, NY, USA
- Yuetian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
- Hanyi Zhuang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of the Chinese Ministry of Education, Shanghai Jiaotong University School of Medicine, Huangpu District, Shanghai, P. R. China
- Institute of Health Sciences, Shanghai Jiaotong University School of Medicine/Shanghai Institutes for Biological Sciences of Chinese Academy of Sciences, Xuhui District, Shanghai, P. R. China
|
25
|
Zmasek CM, Knipe DM, Pellett PE, Scheuermann RH. Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO). Virology 2019; 529:29-42. [PMID: 30660046 PMCID: PMC6502252 DOI: 10.1016/j.virol.2019.01.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/16/2018] [Revised: 01/04/2019] [Accepted: 01/04/2019] [Indexed: 12/13/2022]
Abstract
We developed a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO) for the analysis of protein orthology by combining phylogenetic and protein domain-architecture information. Using DAIO, we performed a systematic study of the proteomes of all human Herpesviridae species to define Strict Ortholog Groups (SOGs). In addition to assessing the taxonomic distribution for each protein based on sequence similarity, we performed a protein domain-architecture analysis for every protein family and computationally inferred gene duplication events. While many herpesvirus proteins have evolved without any detectable gene duplications or domain rearrangements, numerous herpesvirus protein families do exhibit complex evolutionary histories. Some proteins acquired additional domains (e.g., DNA polymerase), whereas others show a combination of domain acquisition and gene duplication (e.g., betaherpesvirus US22 family), with possible functional implications. This novel classification system of SOGs for human Herpesviridae proteins is available through the Virus Pathogen Resource (ViPR, www.viprbrc.org).
Affiliation(s)
- David M Knipe
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Philip E Pellett
- Department of Biochemistry, Microbiology & Immunology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, CA 92037, USA; Department of Pathology, University of California, San Diego, CA 92093, USA; Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA
|
26
|
Savinova OS, Moiseenko KV, Vavilova EA, Chulkin AM, Fedorova TV, Tyazhelova TV, Vasina DV. Evolutionary Relationships Between the Laccase Genes of Polyporales: Orthology-Based Classification of Laccase Isozymes and Functional Insight From Trametes hirsuta. Front Microbiol 2019; 10:152. [PMID: 30792703 PMCID: PMC6374638 DOI: 10.3389/fmicb.2019.00152] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/30/2018] [Accepted: 01/22/2019] [Indexed: 01/06/2023] Open
Abstract
Laccase is one of the oldest known and intensively studied fungal enzymes capable of oxidizing recalcitrant lignin-resembling phenolic compounds. It is currently well established that fungal genomes almost always contain several non-allelic copies of laccase genes (laccase multigene families); nevertheless, many aspects of laccase multigenicity, for example, their precise biological functions or evolutionary relationships, are mostly unknown. Here, we present a detailed evolutionary analysis of the sensu stricto laccase genes (CAZy - AA1_1) from fungi of the Polyporales order. The conducted analysis provides a better understanding of the Polyporales laccase multigenicity and allows for the systemization of the individual features of different laccase isozymes. In addition, we provide a comparison of the biochemical and catalytic properties of the four laccase isozymes from Trametes hirsuta and suggest their functional diversification within the multigene family.
Affiliation(s)
- Olga S Savinova
  - Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Konstantin V Moiseenko
  - Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Ekaterina A Vavilova
  - Laboratory of Gene Expression Optimization, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Andrey M Chulkin
  - Laboratory of Gene Expression Optimization, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Tatiana V Fedorova
  - Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Tatiana V Tyazhelova
  - Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
- Daria V Vasina
  - Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
|
27
|
Yang Q, Han XM, Gu JK, Liu YJ, Yang MJ, Zeng QY. Functional and structural profiles of GST gene family from three Populus species reveal the sequence-function decoupling of orthologous genes. New Phytologist 2019; 221:1060-1073. [PMID: 30204242 DOI: 10.1111/nph.15430] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/26/2018] [Accepted: 08/08/2018] [Indexed: 05/07/2023]
Abstract
A common assumption in comparative genomics is that orthologous genes are functionally more similar than paralogous genes. However, the validity of this assumption needs to be assessed using robust experimental data. We conducted tissue-specific gene expression and protein function analyses of orthologous groups within the glutathione S-transferase (GST) gene family in three closely related Populus species: Populus trichocarpa, Populus euphratica and Populus yatungensis. This study identified 21 GST orthologous groups in the three Populus species. Although the sequences of the GST orthologous groups were highly conserved, the divergence in enzymatic functions was prevalent. Through site-directed mutagenesis of orthologous proteins, this study revealed that nonsynonymous substitutions at key amino acid sites played an important role in the divergence of enzymatic functions. In particular, a single amino acid mutation (Arg39→Trp39) contributed to P. euphratica PeGSTU30 possessing high enzymatic activity via increasing the hydrophobicity of the active cavity. This study provided experimental evidence showing that orthologues belonging to the gene family have functional divergences. The nonsynonymous substitutions at a few amino acid sites resulted in functional divergence of the orthologous genes. Our findings provide new insights into the evolution of orthologous genes in closely related species.
Affiliation(s)
- Qi Yang
  - State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, 100091, China
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xue-Min Han
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Jin-Ke Gu
  - State Key Laboratory of Biomembrane and Membrane Biotechnology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yan-Jing Liu
  - State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, 100091, China
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Mao-Jun Yang
  - State Key Laboratory of Biomembrane and Membrane Biotechnology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Qing-Yin Zeng
  - State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, 100091, China
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
|
28
|
Ambrosino L, Ruggieri V, Bostan H, Miralto M, Vitulo N, Zouine M, Barone A, Bouzayen M, Frusciante L, Pezzotti M, Valle G, Chiusano ML. Multilevel comparative bioinformatics to investigate evolutionary relationships and specificities in gene annotations: an example for tomato and grapevine. BMC Bioinformatics 2018; 19:435. [PMID: 30497367 PMCID: PMC6266932 DOI: 10.1186/s12859-018-2420-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/01/2022] Open
Abstract
Background: “Omics” approaches may provide useful information for a deeper understanding of speciation events, diversification and function innovation. This can be achieved by investigating the molecular similarities at sequence level between species, allowing the definition of ortholog and paralog genes. However, the growing number of sequenced genomes, often endowed with still preliminary annotations, requires suitable bioinformatics to be appropriately exploited in this framework. Results: We present a multilevel comparative approach to investigate the genome evolutionary relationships and peculiarities of two fleshy fruit species of relevant agronomic interest, Solanum lycopersicum (tomato) and Vitis vinifera (grapevine). We defined 17,823 orthology relationships between tomato and grapevine reference gene annotations. The resulting orthologs are associated with the detected paralogs in each species, permitting the definition of gene networks, useful to investigate the different relationships. The reconciliation of the compared collections in terms of an updating of the functional descriptions was also exploited. All the results were made accessible in ComParaLogs, a dedicated bioinformatics platform available at http://biosrv.cab.unina.it/comparalogs/gene/search. Conclusions: The aim of the work was to suggest a reliable approach to detect all similarities of gene loci between two species based on the integration of results from different levels of information, such as the gene, the transcript and the protein sequences, overcoming possible limits due to exclusive protein versus protein comparisons. This makes it possible to define reliable ortholog and paralog genes, as well as species-specific gene loci in the two species, overcoming limits due to the possible draft nature of preliminary gene annotations. Moreover, reconciled functional descriptions, as well as common or peculiar enzymatic classes and protein domains from tomato and grapevine, together with the definition of species-specific gene sets after the pairwise comparisons, contributed a comprehensive set of information useful to comparatively exploit the two species' gene annotations and investigate differences between species with climacteric and non-climacteric fruits. In addition, the definition of networks of ortholog genes and of associated paralogs, and the organization of web-based interfaces for the exploration of the results, provided a user-friendly computational workbench in support of comparative analyses between two species.
Affiliation(s)
- Luca Ambrosino
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
- Valentino Ruggieri
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; current address: Center for Research in Agricultural Genomics, Cerdanyola, Barcelona, Spain
- Hamed Bostan
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; current address: Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, USA
- Marco Miralto
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
- Nicola Vitulo
  - Department of Biotechnology, University of Verona, Verona, Italy
- Mohamed Zouine
  - Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
- Amalia Barone
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Mondher Bouzayen
  - Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
- Luigi Frusciante
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Mario Pezzotti
  - Department of Biotechnology, University of Verona, Verona, Italy
- Giorgio Valle
  - CRIBI Biotechnology Centre, University of Padova, Padova, Italy
- Maria Luisa Chiusano
  - Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
|
29
|
Mier P, Pérez-Pulido AJ, Andrade-Navarro MA. Automated selection of homologs to track the evolutionary history of proteins. BMC Bioinformatics 2018; 19:431. [PMID: 30453878 PMCID: PMC6245638 DOI: 10.1186/s12859-018-2457-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/25/2018] [Accepted: 10/31/2018] [Indexed: 11/26/2022] Open
Abstract
Background: The selection of distant homologs of a query protein under study is a common and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitate static browsing for orthologs. However, these resources have limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give an idea of the evolutionary path of the protein as a series of consecutive steps, for example gaining or losing domains. However, a step-wise orthology search is a time-consuming task if the number of steps is high. Results: To illustrate a solution to this problem, we present the web tool ProteinPathTracker, which allows users to track the evolutionary history of a query protein by locating homologs in selected proteomes along several evolutionary paths. Additional functionalities include locking a region of interest to follow its evolution in the discovered homologous sequences and studying the evolution of protein function by analysis of the annotations of the homologs. Conclusions: ProteinPathTracker is an easy-to-use web tool that automates the practice of looking for selected homologs in distant species in a straightforward way for non-expert users.
Affiliation(s)
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
|
30
|
OrthoList 2: A New Comparative Genomic Analysis of Human and Caenorhabditis elegans Genes. Genetics 2018; 210:445-461. [PMID: 30120140 DOI: 10.1534/genetics.118.301307] [Citation(s) in RCA: 184] [Impact Index Per Article: 30.7] [Received: 06/27/2018] [Accepted: 08/15/2018] [Indexed: 11/18/2022] Open
Abstract
OrthoList, a compendium of Caenorhabditis elegans genes with human orthologs compiled in 2011 by a meta-analysis of four orthology-prediction methods, has been a popular tool for identifying conserved genes for research into biological and disease mechanisms. However, the efficacy of orthology prediction depends on the accuracy of gene-model predictions, an ongoing process, and orthology-prediction algorithms have also been updated over time. Here we present OrthoList 2 (OL2), a new comparative genomic analysis between C. elegans and humans, and the first assessment of how changes over time affect the landscape of predicted orthologs between two species. Although we find that updates to the orthology-prediction methods significantly changed the landscape of C. elegans-human orthologs predicted by individual programs and, unexpectedly, reduced agreement among them, we also show that our meta-analysis approach "buffered" against changes in gene content. We show that adding results from more programs did not lead to many additions to the list and discuss reasons to avoid assigning "scores" based on support by individual orthology-prediction programs; the treatment of "legacy" genes no longer predicted by these programs; and the practical difficulties of updating due to encountering deprecated, changed, or retired gene identifiers. In addition, we consider what other criteria may support claims of orthology and alternative approaches to find potential orthologs that elude identification by these programs. Finally, we created a new web-based tool that allows for rapid searches of OL2 by gene identifiers, protein domains [InterPro and SMART (Simple Modular Architecture Research Tool)], or human disease associations [OMIM (Online Mendelian Inheritance in Man)], and also includes available RNA-interference resources to facilitate potential translational cross-species studies.
|
31
|
Dunn CW, Zapata F, Munro C, Siebert S, Hejnol A. Pairwise comparisons across species are problematic when analyzing functional genomic data. Proc Natl Acad Sci U S A 2018; 115:E409-E417. [PMID: 29301966 PMCID: PMC5776959 DOI: 10.1073/pnas.1707515115] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 11/26/2022] Open
Abstract
There is considerable interest in comparing functional genomic data across species. One goal of such work is to provide an integrated understanding of genome and phenotype evolution. Most comparative functional genomic studies have relied on multiple pairwise comparisons between species, an approach that does not incorporate information about the evolutionary relationships among species. The statistical problems that arise from not considering these relationships can lead pairwise approaches to the wrong conclusions and are a missed opportunity to learn about biology that can only be understood in an explicit phylogenetic context. Here, we examine two recently published studies that compare gene expression across species with pairwise methods, and find reason to question the original conclusions of both. One study interpreted pairwise comparisons of gene expression as support for the ortholog conjecture, the hypothesis that orthologs tend to have more similar attributes (expression in this case) than paralogs. The other study interpreted pairwise comparisons of embryonic gene expression across distantly related animals as evidence for a distinct evolutionary process that gave rise to phyla. In each study, distinct patterns of pairwise similarity among species were originally interpreted as evidence of particular evolutionary processes, but instead, we find that they reflect species relationships. These reanalyses concretely show the inadequacy of pairwise comparisons for analyzing functional genomic data across species. It will be critical to adopt phylogenetic comparative methods in future functional genomic work. Fortunately, phylogenetic comparative biology is also a rapidly advancing field with many methods that can be directly applied to functional genomic data.
Affiliation(s)
- Casey W Dunn
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912
- Felipe Zapata
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
- Catriona Munro
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912
- Stefan Siebert
  - Department of Molecular and Cellular Biology, University of California, Davis, CA 95616
- Andreas Hejnol
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen 5006, Norway
|
32
|
Ramakrishnan Varadarajan A, Mopuri R, Streelman JT, McGrath PT. Genome-wide protein phylogenies for four African cichlid species. BMC Evol Biol 2018; 18:1. [PMID: 29368592 PMCID: PMC5784529 DOI: 10.1186/s12862-017-1072-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/10/2017] [Accepted: 11/15/2017] [Indexed: 11/29/2022] Open
Abstract
Background: The thousands of species of closely related cichlid fishes in the great lakes of East Africa are a powerful model for understanding speciation and the genetic basis of trait variation. Recently, the genomes of five species of African cichlids representing five distinct lineages were sequenced and used to predict protein products at a genome-wide level. Here we characterize the evolutionary relationship of each cichlid protein to previously sequenced animal species. Results: We used the Treefam database, a set of preexisting protein phylogenies built using 109 previously sequenced genomes, to identify Treefam families for each protein annotated from four cichlid species: Metriaclima zebra, Astatotilapia burtoni, Pundamilia nyererei and Neolamprologus brichardi. For each of these Treefam families, we built new protein phylogenies containing each of the cichlid protein hits. Using these new phylogenies we identified the evolutionary relationship of each cichlid protein to its nearest human and zebrafish protein. These data are available either through download or through a webserver we have implemented. Conclusion: These phylogenies will be useful for any cichlid researchers trying to predict biological and protein function for a given cichlid gene, understand the evolutionary history of a given cichlid gene, identify recently duplicated cichlid genes, or perform genome-wide analyses in cichlids that rely on databases generated from other species.
Affiliation(s)
- Rohini Mopuri
  - Department of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr., Atlanta, GA, 30332, USA
- J Todd Streelman
  - Department of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr., Atlanta, GA, 30332, USA
- Patrick T McGrath
  - Department of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Dr., Atlanta, GA, 30332, USA
|
33
|
Darby CA, Stolzer M, Ropp PJ, Barker D, Durand D. Xenolog classification. Bioinformatics 2017; 33:640-649. [PMID: 27998934 PMCID: PMC5860392 DOI: 10.1093/bioinformatics/btw686] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 08/05/2016] [Accepted: 10/26/2016] [Indexed: 01/31/2023] Open
Abstract
Motivation: Orthology analysis is a fundamental tool in comparative genomics. Sophisticated methods have been developed to distinguish between orthologs and paralogs and to classify paralogs into subtypes depending on the duplication mechanism and timing, relative to speciation. However, no comparable framework exists for xenologs: gene pairs whose history, since their divergence, includes a horizontal transfer. Further, the diversity of gene pairs that meet this broad definition calls for classification of xenologs with similar properties into subtypes. Results: We present a xenolog classification that uses phylogenetic reconciliation to assign each pair of genes to a class based on the event responsible for their divergence and the historical association between genes and species. Our classes distinguish between genes related through transfer alone and genes related through duplication and transfer. Further, they separate closely-related genes in distantly-related species from distantly-related genes in closely-related species. We present formal rules that assign gene pairs to specific xenolog classes, given a reconciled gene tree with an arbitrary number of duplications and transfers. These xenology classification rules have been implemented in software and tested on a collection of ∼13 000 prokaryotic gene families. In addition, we present a case study demonstrating the connection between xenolog classification and gene function prediction. Availability and Implementation: The xenolog classification rules have been implemented in Notung 2.9, a freely available phylogenetic reconciliation software package (http://www.cs.cmu.edu/~durand/Notung). Gene trees are available at http://dx.doi.org/10.7488/ds/1503. Contact: durand@cmu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charlotte A Darby
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Maureen Stolzer
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Patrick J Ropp
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Daniel Barker
  - School of Biology, University of St. Andrews, St. Andrews, Fife KY16 9TH, UK
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
34
|
Guschanski K, Warnefors M, Kaessmann H. The evolution of duplicate gene expression in mammalian organs. Genome Res 2017; 27:1461-1474. [PMID: 28743766 PMCID: PMC5580707 DOI: 10.1101/gr.215566.116] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 09/05/2016] [Accepted: 07/18/2017] [Indexed: 12/16/2022]
Abstract
Gene duplications generate genomic raw material that allows the emergence of novel functions, likely facilitating adaptive evolutionary innovations. However, global assessments of the functional and evolutionary relevance of duplicate genes in mammals were until recently limited by the lack of appropriate comparative data. Here, we report a large-scale study of the expression evolution of DNA-based functional gene duplicates in three major mammalian lineages (placental mammals, marsupials, egg-laying monotremes) and birds, on the basis of RNA sequencing (RNA-seq) data from nine species and eight organs. We observe dynamic changes in tissue expression preference of paralogs with different duplication ages, suggesting differential contribution of paralogs to specific organ functions during vertebrate evolution. Specifically, we show that paralogs that emerged in the common ancestor of bony vertebrates are enriched for genes with brain-specific expression and provide evidence for differential forces underlying the preferential emergence of young testis- and liver-specific expressed genes. Further analyses uncovered that the overall spatial expression profiles of gene families tend to be conserved, with several exceptions of pronounced tissue specificity shifts among lineage-specific gene family expansions. Finally, we trace new lineage-specific genes that may have contributed to the specific biology of mammalian organs, including the little-studied placenta. Overall, our study provides novel and taxonomically broad evidence for the differential contribution of duplicate genes to tissue-specific transcriptomes and for their importance for the phenotypic evolution of vertebrates.
Affiliation(s)
- Katerina Guschanski
  - Department of Animal Ecology, Evolutionary Biology Centre, Uppsala University, S-75105 Uppsala, Sweden
- Maria Warnefors
  - Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, D-69120 Heidelberg, Germany
- Henrik Kaessmann
  - Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, D-69120 Heidelberg, Germany
|
35
|
Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
Affiliation(s)
- Domenico Cozzetto
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- David T Jones
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
|
36
|
Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs. PLoS Comput Biol 2016; 12:e1005274. [PMID: 28030541 PMCID: PMC5193323 DOI: 10.1371/journal.pcbi.1005274] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 08/05/2016] [Accepted: 11/26/2016] [Indexed: 11/18/2022] Open
Abstract
The ortholog conjecture implies that functional similarity between orthologous genes is higher than between paralogs. It has been supported using levels of expression and Gene Ontology term analysis, although the evidence was rather weak and there were also conflicting reports. In this study on 12 species we provide strong evidence of high conservation in tissue-specificity between orthologs, in contrast to low conservation between within-species paralogs. This allows us to shed new light on the evolution of gene expression patterns. While there have been several studies of the correlation of expression between species, little is known about the evolution of tissue-specificity itself. Ortholog tissue-specificity is strongly conserved between all tetrapod species, with the lowest Pearson correlation between mouse and frog at r = 0.66. Tissue-specificity correlation decreases strongly with divergence time. Paralogs in human show much lower conservation, even for recent Primate-specific paralogs. When both paralogs from an ancient whole-genome duplication are tissue-specific, it is often to different tissues, while other tissue-specific paralogs are mostly specific to the same tissue. The same patterns are observed using human or mouse as focal species, and are robust to choices of datasets and of thresholds. Our results support the following model of evolution: in the absence of duplication, tissue-specificity evolves slowly, and tissue-specific genes do not change their main tissue of expression; after small-scale duplication the less expressed paralog loses the ancestral specificity, leading to an immediate difference between paralogs; over time, both paralogs become more broadly expressed, but remain poorly correlated. Finally, there is a small number of paralog pairs which stay tissue-specific with the same main tissue of expression, for at least 300 million years.
Affiliation(s)
- Nadezda Kryuchkova-Mostacci
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
37
|
Harkess A, Leebens-Mack J. A Century of Sex Determination in Flowering Plants. J Hered 2016; 108:69-77. [PMID: 27974487 DOI: 10.1093/jhered/esw060] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/28/2016] [Accepted: 09/07/2016] [Indexed: 11/14/2022] Open
Abstract
Plants have evolved a diverse array of strategies for sexual reproduction, particularly through the modification of male and female organs at distinct points in development. The immense variation in sexual systems across the land plants provides a unique opportunity to study the genetic, epigenetic, phylogenetic, and ecological underpinnings of sex determination. Here, we reflect on more than a century of research into flowering plant sex determination, placing a particular focus on the foundational genetic and cytogenetic observations, experiments, and hypotheses. Building on the seminal work on the genetics of plant sex, modern comparative genomic analyses now allow us to address longstanding questions about sex determination and the origins of sex chromosomes.
Affiliation(s)
- Alex Harkess
- Department of Plant Biology, University of Georgia, Athens, GA 30602. Now at the Donald Danforth Plant Science Center, St. Louis, MO 63132
- Jim Leebens-Mack
- Department of Plant Biology, University of Georgia, Athens, GA 30602
38
Sutphin GL, Mahoney JM, Sheppard K, Walton DO, Korstanje R. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning. PLoS Comput Biol 2016; 12:e1005182. [PMID: 27812085 PMCID: PMC5094675 DOI: 10.1371/journal.pcbi.1005182] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/11/2016] [Accepted: 10/05/2016] [Indexed: 01/01/2023]
Abstract
The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.
Affiliation(s)
- J. Matthew Mahoney
- Department of Neurological Sciences, University of Vermont College of Medicine, Burlington, VT, United States of America
- Keith Sheppard
- The Jackson Laboratory, Bar Harbor, ME, United States of America
- David O. Walton
- The Jackson Laboratory, Bar Harbor, ME, United States of America
- Ron Korstanje
- The Jackson Laboratory, Bar Harbor, ME, United States of America
39
Chen G, Chen J, Yang J, Chen L, Qu X, Shi C, Ning B, Shi L, Tong W, Zhao Y, Zhang M, Shi T. Significant variations in alternative splicing patterns and expression profiles between human-mouse orthologs in early embryos. SCIENCE CHINA-LIFE SCIENCES 2016; 60:178-188. [PMID: 27378339 DOI: 10.1007/s11427-015-0348-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/18/2015] [Accepted: 02/11/2016] [Indexed: 02/05/2023]
Abstract
Human and mouse orthologs are expected to have similar biological functions; however, many discrepancies have also been reported. We systematically compared human and mouse orthologs in terms of alternative splicing patterns and expression profiles. Human-mouse orthologs are divergent in alternative splicing, as human orthologs could generally encode more isoforms than their mouse orthologs. In early embryos, exon skipping is far more common with human orthologs, whereas constitutive exons are more prevalent with mouse orthologs. This may correlate with divergence in expression of splicing regulators. Orthologous expression similarities are different in distinct embryonic stages, with the highest in morula. Expression differences for orthologous transcription factor genes could play an important role in orthologous expression discordance. We further detected largely orthologous divergence in differential expression between distinct embryonic stages. Collectively, our study uncovers significant orthologous divergence from multiple aspects, which may result in functional differences and dynamics between human-mouse orthologs during embryonic development.
Affiliation(s)
- Geng Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China; Center for Pharmacogenomics, School of Pharmacy, Fudan University, Shanghai, 201203, China
- Jiwei Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Jianmin Yang
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Long Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Xiongfei Qu
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Caiping Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Baitang Ning
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Leming Shi
- Center for Pharmacogenomics, School of Pharmacy, Fudan University, Shanghai, 201203, China; National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Yongxiang Zhao
- Biological Targeting Diagnosis and Therapy Research Center, Guangxi Medical University, Nanning, 530021, China
- Meixia Zhang
- Department of Ophthalmology, West China Hospital, Sichuan University, Chengdu, 610041, China
- Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
40
Abstract
Correctly estimating the age of a gene or gene family is important for a variety of fields, including molecular evolution, comparative genomics, and phylogenetics, and increasingly for systems biology and disease genetics. However, most studies use only a point estimate of a gene’s age, neglecting the substantial uncertainty involved in this estimation. Here, we characterize this uncertainty by investigating the effect of algorithm choice on gene-age inference and calculate consensus gene ages with attendant error distributions for a variety of model eukaryotes. We use 13 orthology inference algorithms to create gene-age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. We also found that different sources of error can affect downstream analyses, such as gene ontology enrichment. Our consensus gene-age datasets, with associated error terms, are made fully available at geneages.org so that researchers can propagate this uncertainty through their analyses.
Affiliation(s)
- Benjamin J Liebeskind
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin; Center for Computational Biology and Bioinformatics, University of Texas at Austin
- Claire D McWhite
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
41
Standardized benchmarking in the quest for orthologs. Nat Methods 2016; 13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision–recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
42
Applications of comparative evolution to human disease genetics. Curr Opin Genet Dev 2015; 35:16-24. [PMID: 26338499 DOI: 10.1016/j.gde.2015.08.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/18/2015] [Revised: 08/11/2015] [Accepted: 08/12/2015] [Indexed: 12/15/2022]
Abstract
Direct comparison of human diseases with model phenotypes allows exploration of key areas of human biology which are often inaccessible for practical or ethical reasons. We review recent developments in comparative evolutionary approaches for finding models for genetic disease, including high-throughput generation of gene/phenotype relationship data, the linking of orthologous genes and phenotypes across species, and statistical methods for linking human diseases to model phenotypes.
43
44
Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM. Evolution. Systematic humanization of yeast genes reveals conserved functions and genetic modularity. Science 2015; 348:921-5. [PMID: 25999509 PMCID: PMC4718922 DOI: 10.1126/science.aaa0769] [Citation(s) in RCA: 281] [Impact Index Per Article: 31.2] [Indexed: 01/20/2023]
Abstract
To determine whether genes retain ancestral functions over a billion years of evolution and to identify principles of deep evolutionary divergence, we replaced 414 essential yeast genes with their human orthologs, assaying for complementation of lethal growth defects upon loss of the yeast genes. Nearly half (47%) of the yeast genes could be successfully humanized. Sequence similarity and expression only partly predicted replaceability. Instead, replaceability depended strongly on gene modules: Genes in the same process tended to be similarly replaceable (e.g., sterol biosynthesis) or not (e.g., DNA replication initiation). Simulations confirmed that selection for specific function can maintain replaceability despite extensive sequence divergence. Critical ancestral functions of many essential genes are thus retained in a pathway-specific manner, resilient to drift in sequences, splicing, and protein interfaces.
Affiliation(s)
- Aashiq H Kachroo
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA
- Jon M Laurent
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA
- Christopher M Yellman
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA
- Austin G Meyer
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA; Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, TX 78712, USA
- Claus O Wilke
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA; Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, TX 78712, USA; Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA; Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, TX 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
45
Wang C, Liu Y, Li SS, Han GZ. Insights into the origin and evolution of the plant hormone signaling machinery. PLANT PHYSIOLOGY 2015; 167:872-86. [PMID: 25560880 PMCID: PMC4348752 DOI: 10.1104/pp.114.247403] [Citation(s) in RCA: 158] [Impact Index Per Article: 17.6] [Indexed: 05/13/2023]
Abstract
Plant hormones modulate plant growth, development, and defense. However, many aspects of the origin and evolution of plant hormone signaling pathways remain obscure. Here, we use a comparative genomic and phylogenetic approach to investigate the origin and evolution of nine major plant hormone (abscisic acid, auxin, brassinosteroid, cytokinin, ethylene, gibberellin, jasmonate, salicylic acid, and strigolactone) signaling pathways. Our multispecies genome-wide analysis reveals that: (1) auxin, cytokinin, and strigolactone signaling pathways originated in charophyte lineages; (2) abscisic acid, jasmonate, and salicylic acid signaling pathways arose in the last common ancestor of land plants; (3) gibberellin signaling evolved after the divergence of bryophytes from land plants; (4) the canonical brassinosteroid signaling originated before the emergence of angiosperms but likely after the split of gymnosperms and angiosperms; and (5) the origin of the canonical ethylene signaling pathway postdates shortly the emergence of angiosperms. Our findings might have important implications in understanding the molecular mechanisms underlying the emergence of land plants.
Affiliation(s)
- Chunyang Wang
- Jiangsu Key Laboratory for Microbes and Functional Genomics, Jiangsu Engineering and Technology Research Center for Microbiology, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu 210023, China; State Key Laboratory of Crop Biology, Shandong Agricultural University, Tai'an, Shandong 271018, China
- Yang Liu
- State Key Laboratory of Crop Biology, Shandong Agricultural University, Tai'an, Shandong 271018, China
- Si-Shen Li
- State Key Laboratory of Crop Biology, Shandong Agricultural University, Tai'an, Shandong 271018, China
- Guan-Zhu Han
- Jiangsu Key Laboratory for Microbes and Functional Genomics, Jiangsu Engineering and Technology Research Center for Microbiology, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu 210023, China; Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
46
Rogozin IB, Managadze D, Shabalina SA, Koonin EV. Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture. Genome Biol Evol 2015; 6:754-62. [PMID: 24610837 PMCID: PMC4007545 DOI: 10.1093/gbe/evu051] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 01/03/2023]
Abstract
The ortholog conjecture (OC), which is central to functional annotation of genomes, posits that orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of Gene Ontology (GO) annotations and expression profiles, among within-species paralogs compared with orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. However, several subsequent studies suggest that GO annotations and microarray data could artificially inflate functional similarity between paralogs from the same organism. We sought to test the OC using approaches distinct from those used in previous studies. Analysis of a large RNAseq data set from multiple human and mouse tissues shows that expression similarity (correlation coefficients, ranks, or Z-scores) between orthologs is substantially greater than that for between-species paralogs with the same sequence divergence, in agreement with the OC and the results of recent detailed analyses. These findings are further corroborated by a fine-grain analysis in which expression profiles of orthologs and paralogs were compared separately for individual gene families. Expression profiles of within-species paralogs are more strongly correlated than profiles of orthologs but it is shown that this is caused by high background noise, that is, correlation between profiles of unrelated genes in the same organism. Z-scores and rank scores show a nonmonotonic dependence of expression profile similarity on sequence divergence. This complexity of gene expression evolution after duplication might be at least partially caused by selection for protein dosage rebalancing following gene duplication.
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
47
Sonnhammer ELL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C. Big data and other challenges in the quest for orthologs. Bioinformatics 2014; 30:2993-8. [PMID: 25064571 PMCID: PMC4201156 DOI: 10.1093/bioinformatics/btu492] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 04/16/2014] [Revised: 06/25/2014] [Accepted: 07/16/2014] [Indexed: 01/29/2023]
Abstract
Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. Availability and implementation: All such materials are available at http://questfororthologs.org.
Affiliation(s)
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Toni Gabaldón
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Alan W Sousa da Silva
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Maria Martin
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Marc Robinson-Rechavi
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Brigitte Boeckmann
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Paul D Thomas
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Christophe Dessimoz
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
|
48
|
Complexity of gene expression evolution after duplication: protein dosage rebalancing. Genetics Research International 2014; 2014:516508. [PMID: 25197576 PMCID: PMC4150538 DOI: 10.1155/2014/516508] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/26/2014] [Accepted: 08/03/2014] [Indexed: 11/17/2022]
Abstract
Ongoing debates about the functional importance of gene duplications have recently been intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than by evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution, but that the complete picture of the evolution of gene expression must also incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of the proteins encoded by duplicate genes.
|
49
|
Ward N, Moreno-Hagelsieb G. Quickly finding orthologs as reciprocal best hits with BLAT, LAST, and UBLAST: how much do we miss? PLoS One 2014; 9:e101850. [PMID: 25013894 PMCID: PMC4094424 DOI: 10.1371/journal.pone.0101850] [Citation(s) in RCA: 107] [Impact Index Per Article: 10.7] [Received: 04/12/2014] [Accepted: 06/11/2014] [Indexed: 11/30/2022]
Abstract
Reciprocal Best Hits (RBH) are a common proxy for orthology in comparative genomics. Essentially, an RBH is found when the proteins encoded by two genes, each in a different genome, find each other as the best scoring match in the other genome. NCBI's BLAST is the software most commonly used for the sequence comparisons needed to find RBHs. Since sequence comparison can be time consuming, we decided to compare the number and quality of RBHs detected using algorithms that run in a fraction of the time required by BLAST. We tested BLAT, LAST and UBLAST. All three programs ran in a hundredth to a 25th of the time required to run BLAST. A reduction in the number of homologs and RBHs found by the faster algorithms compared to BLAST becomes apparent as the genomes compared become more dissimilar, with BLAT, a program optimized for quickly finding very similar sequences, missing both the most homologs and the most RBHs. Though LAST produced the numbers of homologs and RBHs closest to those produced with BLAST, UBLAST was very close, with either program producing between 0.6 and 0.8 of the RBHs found by BLAST between dissimilar genomes, while in more similar genomes the differences were barely apparent. UBLAST ran faster than LAST, making it the best option among the programs tested.
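The RBH definition in this abstract can be illustrated with a minimal Python sketch. It assumes the hits from each search direction are already available as (query, subject, score) tuples. The function names, the gene identifiers, and the scores are hypothetical and are not taken from the paper or from any of the tools it compares.

```python
from typing import Dict, Iterable, List, Tuple

# One hit: (query gene, subject gene, score). Higher score means a better match.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single best-scoring subject.

    Ties are resolved in favour of the hit seen first (an assumption of
    this sketch, not a rule stated in the abstract).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hits for genome A searched against genome B, and the reverse.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 245.0), ("b1", "a3", 118.0),
          ("b2", "a2", 175.0), ("b3", "a3", 60.0)]

rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh_pairs)
```

In this toy input, a3 has b1 as its best hit, but b1 prefers a1, so a3 contributes no pair. This is the sense in which a faster search that misses a true best hit in either direction also loses the RBH.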
Affiliation(s)
- Natalie Ward
- Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
|
50
|
Abstract
The use of model organisms as tools for the investigation of human genetic variation has significantly and rapidly advanced our understanding of the aetiologies underlying hereditary traits. However, while equivalences in the DNA sequence of two species may be readily inferred through evolutionary models, the identification of equivalence in the phenotypic consequences resulting from comparable genetic variation is far from straightforward, limiting the value of the modelling paradigm. In this review, we provide an overview of the emerging statistical and computational approaches to objectively identify phenotypic equivalence between human and model organisms with examples from the vertebrate models, mouse and zebrafish. Firstly, we discuss enrichment approaches, which deem the most frequent phenotype among the orthologues of a set of genes associated with a common human phenotype as the orthologous phenotype, or phenolog, in the model species. Secondly, we introduce and discuss computational reasoning approaches to identify phenotypic equivalences made possible through the development of intra- and interspecies ontologies. Finally, we consider the particular challenges involved in modelling neuropsychiatric disorders, which illustrate many of the remaining difficulties in developing comprehensive and unequivocal interspecies phenotype mappings.
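The enrichment idea described in this abstract, taking the most frequent phenotype among the orthologues of a human gene set, might be sketched in Python as follows. This is a simplified illustration only: it counts raw frequencies and does not test them against a background distribution, which a real enrichment analysis would be expected to do. The function name, gene identifiers, and phenotype labels are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def most_frequent_ortholog_phenotype(
    human_genes: Iterable[str],
    orthologs: Dict[str, List[str]],
    model_phenotypes: Dict[str, List[str]],
) -> Optional[Tuple[str, int]]:
    """Return (phenotype, count) for the phenotype annotated to the most
    model-organism orthologues of the given human genes, or None if no
    orthologue carries any annotation."""
    counts: Counter = Counter()
    for gene in human_genes:
        for ortholog in orthologs.get(gene, ()):
            # Use a set so each orthologue counts once per phenotype.
            counts.update(set(model_phenotypes.get(ortholog, ())))
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical human genes sharing one phenotype, with made-up orthologues.
human_genes = ["HG1", "HG2", "HG3"]
orthologs = {"HG1": ["mg1"], "HG2": ["mg2"], "HG3": ["mg3"]}
model_phenotypes = {
    "mg1": ["abnormal gait", "small body"],
    "mg2": ["abnormal gait"],
    "mg3": ["abnormal gait", "pale liver"],
}

candidate = most_frequent_ortholog_phenotype(human_genes, orthologs, model_phenotypes)
print(candidate)
```

Here "abnormal gait" is annotated to all three orthologues and so becomes the candidate equivalent phenotype in the model species under this simplified counting rule.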
Affiliation(s)
- Peter N. Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- * E-mail: (PNR); (CW)
- Caleb Webber
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- * E-mail: (PNR); (CW)
|