1
Assis R, Conant G, Holland B, Liberles DA, O'Reilly MM, Wilson AE. Models for the retention of duplicate genes and their biological underpinnings. F1000Res 2024; 12:1400. [PMID: 38173826 PMCID: PMC10762295 DOI: 10.12688/f1000research.141786.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/08/2024] [Indexed: 01/05/2024]
Abstract
Gene content in genomes changes through several different processes, with gene duplication being an important contributor to such changes. Gene duplication occurs over a range of scales from individual genes to whole genomes, and the dynamics of this process can be context dependent. Still, there are rules by which genes are retained or lost from genomes after duplication, and probabilistic modeling has enabled characterization of these rules, including their context-dependence. Here, we describe the biology and corresponding mathematical models that are used to understand duplicate gene retention and its contribution to the set of biochemical functions encoded in a genome.
Affiliation(s)
- Raquel Assis
- Florida Atlantic University, Boca Raton, Florida, USA
- Gavin Conant
- North Carolina State University, Raleigh, North Carolina, USA
2
Wilson AE, Liberles DA. Expectations of duplicate gene retention under the gene duplicability hypothesis. BMC Ecol Evol 2023; 23:76. [PMID: 38097959 PMCID: PMC10720195 DOI: 10.1186/s12862-023-02174-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 11/02/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND Gene duplication is an important process in evolution. What causes some genes to be retained after duplication and others to be lost is a process not well understood. The most prevalent theory is the gene duplicability hypothesis, that something about the function and number of interacting partners (number of subunits of protein complex, etc.), determines whether copies have more opportunity to be retained for long evolutionary periods. Some genes are also more susceptible to dosage balance effects following WGD events, making them more likely to be retained for longer periods of time. One would expect these processes that affect the retention of duplicate copies to affect the conditional probability ratio after consecutive whole genome duplication events. The probability that a gene will be retained after a second whole genome duplication event (WGD2), given that it was retained after the first whole genome duplication event (WGD1) versus the probability a gene will be retained after WGD2, given it was lost after WGD1 defines the probability ratio that is calculated. RESULTS Since duplicate gene retention is a time heterogeneous process, the time between the events (t1) and the time since the most recent event (t2) are relevant factors in calculating the expectation for observation in any genome. Here, we use a survival analysis framework to predict the probability ratio for genomes with different values of t1 and t2 under the gene duplicability hypothesis, that some genes are more susceptible to selectable functional shifts, some more susceptible to dosage compensation, and others only drifting. We also predict the probability ratio with different values of t1 and t2 under the mutational opportunity hypothesis, that probability of retention for certain genes changes in subsequent events depending upon how they were previously retained. 
These models are nested such that the mutational opportunity model encompasses the gene duplicability model with shifting duplicability over time. Here we present a formalization of the gene duplicability and mutational opportunity hypotheses to characterize evolutionary dynamics and explanatory power in a recently developed statistical framework. CONCLUSIONS This work presents expectations of the gene duplicability and mutational opportunity hypotheses over time under different sets of assumptions. This expectation will enable formal testing of processes leading to duplicate gene retention.
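The probability ratio described in this abstract can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the paper: R_1 and R_2 denote retention of a gene's duplicate after WGD1 and WGD2, L_1 denotes loss after WGD1, and rho is a hypothetical name for the ratio.

```latex
% Conditional probability ratio after two consecutive WGD events.
% R_1, R_2: duplicate retained after WGD1, WGD2; L_1: duplicate lost after WGD1.
% t_1: time between the two events; t_2: time since the most recent event.
\[
  \rho(t_1, t_2) \;=\;
  \frac{P\left(R_2 \mid R_1;\, t_1, t_2\right)}
       {P\left(R_2 \mid L_1;\, t_1, t_2\right)}
\]
```

Under this reading, a ratio above 1 would mean that genes retained after the first event are more likely to be retained again after the second than genes that were lost the first time.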
Affiliation(s)
- Amanda E Wilson
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, 1900 N. 12th Street, Philadelphia, PA, 19122, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, 1900 N. 12th Street, Philadelphia, PA, 19122, USA
3
Vance Z, McLysaght A. Ohnologs and SSD Paralogs Differ in Genomic and Expression Features Related to Dosage Constraints. Genome Biol Evol 2023; 15:evad174. [PMID: 37776514 PMCID: PMC10563793 DOI: 10.1093/gbe/evad174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 09/21/2023] [Accepted: 09/26/2023] [Indexed: 10/02/2023]
Abstract
Gene duplication is recognized as a critical process in genome evolution; however, many questions about this process remain unanswered. Although gene duplicability has been observed to differ by duplication mechanism and evolutionary rate, there is so far no broad characterization of its determinants. Many features correlate with this difference in duplicability; however, our ability to exploit these observations to advance our understanding of the role of duplication in evolution is hampered by limitations within existing work. In particular, the existence of methodological differences across studies impedes meaningful comparison. Here, we use consistent definitions of duplicability in the human lineage to explore these associations, allow resolution of the impact of confounding factors, and define the overall relevance of individual features. Using a classifier approach and controlling for the confounding effect of duplicate longevity, we find a subset of gene features important in differentiating genes duplicable by small-scale duplication from those duplicable by whole-genome duplication, revealing critical roles for gene dosage and expression costs in duplicability. We further delve into patterns of functional enrichment and find a lack of constraint on duplicate retention in any context for genes duplicable by small-scale duplication.
Affiliation(s)
- Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
4
Yang X, Wang X, Zou Y, Zhang S, Xia M, Fu L, Vollger MR, Chen NC, Taylor DJ, Harvey WT, Logsdon GA, Meng D, Shi J, McCoy RC, Schatz MC, Li W, Eichler EE, Lu Q, Mao Y. Characterization of large-scale genomic differences in the first complete human genome. Genome Biol 2023; 24:157. [PMID: 37403156 PMCID: PMC10320979 DOI: 10.1186/s13059-023-02995-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Accepted: 06/23/2023] [Indexed: 07/06/2023]
Abstract
BACKGROUND The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies have not yet been characterized in detail. RESULTS Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp), excluding telomeric and centromeric regions, are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.
Affiliation(s)
- Xiangyu Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Xuankai Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yawen Zou
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Manying Xia
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Lianting Fu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Dan Meng
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Junfeng Shi
- Shanghai Engineering Research Center of Advanced Dental Technology and Materials, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Weidong Li
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
5
Fajardo D, Saint Jean R, Lyons PJ. Acquisition of new function through gene duplication in the metallocarboxypeptidase family. Sci Rep 2023; 13:2512. [PMID: 36781897 PMCID: PMC9925722 DOI: 10.1038/s41598-023-29800-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 02/10/2023] [Indexed: 02/15/2023]
Abstract
Gene duplication is a key first step in the process of expanding the functionality of a multigene family. In order to better understand the process of gene duplication and its role in the formation of new enzymes, we investigated recent duplication events in the M14 family of proteolytic enzymes. Within vertebrates, four of 23 M14 genes were frequently found in duplicate form. While AEBP1, CPXM1, and CPZ genes were duplicated once through a large-scale, likely whole-genome duplication event, the CPO gene underwent many duplication events within fish and Xenopus lineages. Bioinformatic analyses of enzyme specificity and conservation suggested a greater amount of neofunctionalization and purifying selection in CPO paralogs compared with other CPA/B enzymes. To examine the functional consequences of evolutionary changes on CPO paralogs, the four CPO paralogs from Xenopus tropicalis were expressed in Sf9 and HEK293T cells. Immunocytochemistry showed subcellular distribution of Xenopus CPO paralogs to be similar to that of human CPO. Upon activation with trypsin, the enzymes demonstrated differential activity against three substrates, suggesting an acquisition of new function following duplication and subsequent mutagenesis. Characteristics such as gene size and enzyme activation mechanisms are possible contributors to the evolutionary capacity of the CPO gene.
Affiliation(s)
- Daniel Fajardo
- Department of Biology, Andrews University, Berrien Springs, MI, 49104, USA
- Ritchie Saint Jean
- Department of Biology, Andrews University, Berrien Springs, MI, 49104, USA
- Peter J Lyons
- Department of Biology, Andrews University, Berrien Springs, MI, 49104, USA
6
Vance Z, Niezabitowski L, Hurst LD, McLysaght A. Evidence from Drosophila Supports Higher Duplicability of Faster Evolving Genes. Genome Biol Evol 2022; 14:6501445. [PMID: 35018456 PMCID: PMC8765793 DOI: 10.1093/gbe/evac003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/05/2022] [Indexed: 12/03/2022]
Abstract
The faster rate of evolution of duplicated genes relative to singletons has been well documented in multiple lineages. This observation has generally been attributed to a presumed release from constraint following creation of a redundant, duplicate copy. However, it is not obvious that the relationship operates in this direction. An alternative possibility—that the faster rate of evolution predates the duplication event and the observed differences result from a higher propensity to duplicate in fast-evolving genes—has been tested in primates and in insects. However, these studies arrived at different conclusions and clarity is needed on whether these contrasting results relate to differences in methodology or legitimate biological differences between the lineages selected. Here, we test whether duplicable genes are faster evolving independent of duplication in the Drosophila lineage and find that our results support the conclusion that faster evolving genes are more likely to duplicate, in agreement with previous work in primates. Our findings indicate that this characteristic of gene duplication is not restricted to a single lineage and has broad implications for the interpretation of the impact of gene duplication. We identify a subset of “singletons” which defy the general trends and appear to be faster evolving. Further investigation implicates homology detection failure and suggests that these may be duplicable genes with unidentifiable paralogs.
Affiliation(s)
- Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, Ireland
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, United Kingdom
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, Ireland
7
Sánchez AL, Lafond M. Colorful orthology clustering in bounded-degree similarity graphs. J Bioinform Comput Biol 2021; 19:2140010. [PMID: 34775924 DOI: 10.1142/s0219720021400102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Clustering genes in similarity graphs is a popular approach for orthology prediction. Most algorithms group genes without considering their species, which results in clusters that contain several paralogous genes. Moreover, clustering is known to be problematic when in-paralogs arise from ancient duplications. Recently, we proposed a two-step process that avoids these problems. First, we infer clusters of only orthologs (i.e. with only genes from distinct species), and second, we infer the missing inter-cluster orthologs. In this paper, we focus on the first step, which leads to a problem we call Colorful Clustering. In general, this is as hard as classical clustering. However, in similarity graphs, the number of species is usually small, as well as the neighborhood size of genes in other species. We therefore study the problem of clustering in which the number of colors is bounded by [Formula: see text], and each gene has at most [Formula: see text] neighbors in another species. We show that the well-known cluster editing formulation remains NP-hard even when [Formula: see text] and [Formula: see text]. We then propose a fixed-parameter algorithm in [Formula: see text] to find the single best cluster in the graph. We implemented this algorithm and included it in the aforementioned two-step approach. Experiments on simulated data show that this approach performs favorably to applying only an unconstrained clustering step.
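The "only orthologs" constraint described in this abstract (each cluster contains genes from distinct species, with species acting as colors) can be illustrated with a minimal Python sketch. The function name `is_colorful`, the gene identifiers, and the species mapping are hypothetical and chosen for illustration; this only checks the constraint on a given cluster and is not the authors' clustering algorithm.

```python
from typing import Iterable, Mapping


def is_colorful(cluster: Iterable[str], species_of: Mapping[str, str]) -> bool:
    """Return True if no two genes in the cluster come from the same species."""
    species = [species_of[gene] for gene in cluster]
    # A cluster is "colorful" when every species (color) appears at most once.
    return len(species) == len(set(species))


# Hypothetical toy data: gene identifier -> species (the gene's "color").
species_of = {
    "geneA_hs": "human",
    "geneA_mm": "mouse",
    "geneB_hs": "human",
}

print(is_colorful(["geneA_hs", "geneA_mm"], species_of))  # one gene per species
print(is_colorful(["geneA_hs", "geneB_hs"], species_of))  # two human genes
```

The first call reports a valid ortholog-only cluster, while the second contains two genes from the same species and so would have to be split under the constraint.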
Affiliation(s)
- Alitzel López Sánchez
- Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
- Manuel Lafond
- Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
8
Campos TL, Korhonen PK, Hofmann A, Gasser RB, Young ND. Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications. Biotechnol Adv 2021; 54:107822. [PMID: 34461202 DOI: 10.1016/j.biotechadv.2021.107822] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less-studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake an historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML-approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows a reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to fundamental and applied research areas. Evidence of some commonality in the essential gene-complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets become/are available for proper feature engineering, and for the robust training and validation of algorithms. 
This area warrants detailed exploration to, for example, facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control and with drug resistance in parasite populations.
Affiliation(s)
- Tulio L Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas Hofmann
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
9
van Leeuwen J, Pons C, Tan G, Wang JZ, Hou J, Weile J, Gebbia M, Liang W, Shuteriqi E, Li Z, Lopes M, Ušaj M, Dos Santos Lopes A, van Lieshout N, Myers CL, Roth FP, Aloy P, Andrews BJ, Boone C. Systematic analysis of bypass suppression of essential genes. Mol Syst Biol 2021; 16:e9828. [PMID: 32939983 PMCID: PMC7507402 DOI: 10.15252/msb.20209828] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/01/2020] [Revised: 08/11/2020] [Accepted: 08/13/2020] [Indexed: 12/15/2022]
Abstract
Essential genes tend to be highly conserved across eukaryotes, but, in some cases, their critical roles can be bypassed through genetic rewiring. From a systematic analysis of 728 different essential yeast genes, we discovered that 124 (17%) were dispensable essential genes. Through whole-genome sequencing and detailed genetic analysis, we investigated the genetic interactions and genome alterations underlying bypass suppression. Dispensable essential genes often had paralogs, were enriched for genes encoding membrane-associated proteins, and were depleted for members of protein complexes. Functionally related genes frequently drove the bypass suppression interactions. These gene properties were predictive of essential gene dispensability and of specific suppressors among hundreds of genes on aneuploid chromosomes. Our findings identify yeast's core essential gene set and reveal that the properties of dispensable essential genes are conserved from yeast to human cells, correlating with human genes that display cell line-specific essentiality in the Cancer Dependency Map (DepMap) project.
Affiliation(s)
- Jolanda van Leeuwen
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Guihong Tan
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Jason Zi Wang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Jing Hou
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Jochen Weile
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Marinella Gebbia
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Wendy Liang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Ermira Shuteriqi
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Zhijian Li
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Maykel Lopes
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
- Matej Ušaj
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Andreia Dos Santos Lopes
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
- Natascha van Lieshout
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, MN, USA
- Frederick P Roth
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Brenda J Andrews
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Charles Boone
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
10
Álvarez-Lugo A, Becerra A. The Role of Gene Duplication in the Divergence of Enzyme Function: A Comparative Approach. Front Genet 2021; 12:641817. [PMID: 34335678 PMCID: PMC8318041 DOI: 10.3389/fgene.2021.641817] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/15/2020] [Accepted: 06/21/2021] [Indexed: 11/13/2022]
Abstract
Gene duplication is a crucial process involved in the appearance of new genes and functions. It is thought to have played a major role in the growth of enzyme families and the expansion of metabolism at the biosphere's dawn and in recent times. Here, we analyzed paralogous enzyme content within each of the seven enzymatic classes for a representative sample of prokaryotes by a comparative approach. We found a high ratio of paralogs for three enzymatic classes: oxidoreductases, isomerases, and translocases, and within each of them, most of the paralogs belong to only a few subclasses. Our results suggest an intricate scenario for the evolution of prokaryotic enzymes, involving different fates for duplicated enzymes fixed in the genome, where around 20-40% of prokaryotic enzymes have paralogs. Intracellular organisms have a lower ratio of duplicated enzymes, whereas free-living organisms show the highest ratios. We also found that phylogenetically close phyla, and some unrelated phyla with the same lifestyle, share similar genomic and biochemical traits, which ultimately supports the idea that gene duplication is associated with environmental adaptation.
Affiliation(s)
- Alejandro Álvarez-Lugo
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City, Mexico; Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Arturo Becerra
- Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
11
Campos TL, Korhonen PK, Young ND. Cross-Predicting Essential Genes between Two Model Eukaryotic Species Using Machine Learning. Int J Mol Sci 2021; 22:5056. [PMID: 34064595 PMCID: PMC8150380 DOI: 10.3390/ijms22105056] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/03/2021] [Revised: 05/07/2021] [Accepted: 05/08/2021] [Indexed: 12/24/2022]
Abstract
Experimental studies of Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular and cellular processes in metazoans at large. Since the publication of their genomes, functional genomic investigations have identified genes that are essential or non-essential for survival in each species. Recently, a range of features linked to gene essentiality have been inferred using a machine learning (ML)-based approach, allowing essentiality predictions within a species. Nevertheless, predictions between species are still elusive. Here, we undertake a comprehensive study using ML to discover and validate features of essential genes common to both C. elegans and D. melanogaster. We demonstrate that the cross-species prediction of gene essentiality is possible using a subset of features linked to nucleotide/protein sequences, protein orthology and subcellular localisation, single-cell RNA-seq, and histone methylation markers. Complementary analyses showed that essential genes are enriched for transcription and translation functions and are preferentially located away from heterochromatin regions of C. elegans and D. melanogaster chromosomes. The present work should enable the cross-prediction of essential genes between model and non-model metazoans.
Affiliation(s)
- Tulio L. Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife 50740-465, PE, Brazil
- Pasi K. Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Neil D. Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
|
12
|
Schonfeld E, Vendrow E, Vendrow J, Schonfeld E. On the relation of gene essentiality to intron structure: a computational and deep learning approach. Life Sci Alliance 2021; 4:4/6/e202000951. [PMID: 33906938 PMCID: PMC8127325 DOI: 10.26508/lsa.202000951] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/01/2020] [Revised: 04/12/2021] [Accepted: 04/15/2021] [Indexed: 11/24/2022] Open
Abstract
Essential genes have been studied by copy number variants and deletions, both associated with introns. The premise of our work is that introns of essential genes have distinct characteristic properties. We provide support for this by training a deep learning model and demonstrating that introns alone can be used to classify essentiality. The model, limited to first introns, performs at an increased level, implicating first introns in essentiality. We identify unique properties of introns of essential genes, finding that their structure protects against deletion and intron-loss events, especially centered on the first intron. We show that GC density is increased in the first introns of essential genes, allowing for increased enhancer activity, protection against deletions, and improved splice site recognition. We find that first introns of essential genes are of remarkably smaller size than their nonessential counterparts, and to protect against common 3' end deletion events, essential genes carry an increased number of (smaller) introns. To demonstrate the importance of the seven features we identified, we train a feature-based model using only these features and achieve high performance.
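One of the intron properties named in this abstract, GC density, is straightforward to compute. The following is a minimal sketch only, not the authors' pipeline: the function name `gc_fraction` and the two example sequences are hypothetical and were invented for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    # Sequences with no unambiguous bases (e.g. all N) get 0.0 instead of an error.
    return gc / total if total else 0.0


# Hypothetical first-intron sequences, invented for illustration only.
first_introns = {"gene_A": "GTGCGCGGCCAG", "gene_B": "GTATATTTAAAG"}
for name, intron in first_introns.items():
    print(name, round(gc_fraction(intron), 2))
```

Ambiguous bases such as N are ignored here rather than counted in the denominator; that is a design choice of this sketch, not something the abstract specifies.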
Affiliation(s)
- Joshua Vendrow
- University of California, Los Angeles, Los Angeles, CA, USA
|
13
|
Correa M, Lerat E, Birmelé E, Samson F, Bouillon B, Normand K, Rizzon C. The Transposable Element Environment of Human Genes Differs According to Their Duplication Status and Essentiality. Genome Biol Evol 2021; 13:6273345. [PMID: 33973013 PMCID: PMC8155550 DOI: 10.1093/gbe/evab062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 03/17/2021] [Indexed: 12/13/2022] Open
Abstract
Transposable elements (TEs) are major components of eukaryotic genomes and represent approximately 45% of the human genome. TEs can be important sources of novelty in genomes and there is increasing evidence that TEs contribute to the evolution of gene regulation in mammals. Gene duplication is an evolutionary mechanism that also provides new genetic material and opportunities to acquire new functions. To investigate how duplicated genes are maintained in genomes, here, we explored the TE environment of duplicated and singleton genes. We found that singleton genes have more short-interspersed nuclear elements and DNA transposons in their vicinity than duplicated genes, whereas long-interspersed nuclear elements and long-terminal repeat retrotransposons have accumulated more near duplicated genes. We also discovered that this result is highly associated with the degree of essentiality of the genes with an unexpected accumulation of short-interspersed nuclear elements and DNA transposons around the more-essential genes. Our results underline the importance of taking into account the TE environment of genes to better understand how duplicated genes are maintained in genomes.
Affiliation(s)
- Margot Correa
- Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), UMR CNRS 8071, ENSIIE, USC INRA, Université d'Evry Val d'Essonne, Evry, France
- Emmanuelle Lerat
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France
- Etienne Birmelé
- Laboratoire MAP5 UMR 8145, Université de Paris, Paris, France
- Franck Samson
- Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), UMR CNRS 8071, ENSIIE, USC INRA, Université d'Evry Val d'Essonne, Evry, France
- Bérengère Bouillon
- Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), UMR CNRS 8071, ENSIIE, USC INRA, Université d'Evry Val d'Essonne, Evry, France
- Kévin Normand
- Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), UMR CNRS 8071, ENSIIE, USC INRA, Université d'Evry Val d'Essonne, Evry, France
- Carène Rizzon
- Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), UMR CNRS 8071, ENSIIE, USC INRA, Université d'Evry Val d'Essonne, Evry, France
|
14
|
Baker EA, Gilbert SPR, Shimeld SM, Woollard A. Extensive non-redundancy in a recently duplicated developmental gene family. BMC Ecol Evol 2021; 21:33. [PMID: 33648446 PMCID: PMC7919330 DOI: 10.1186/s12862-020-01735-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/27/2020] [Accepted: 12/13/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND It has been proposed that recently duplicated genes are more likely to be redundant with one another compared to ancient paralogues. The evolutionary logic underpinning this idea is simple, as the assumption is that recently derived paralogous genes are more similar in sequence compared to members of ancient gene families. We set out to test this idea by using molecular phylogenetics and exploiting the genetic tractability of the model nematode, Caenorhabditis elegans, in studying the nematode-specific family of Hedgehog-related genes, the Warthogs. Hedgehog is one of a handful of signal transduction pathways that underpins the development of bilaterian animals. While having lost a bona fide Hedgehog gene, most nematodes have evolved an expanded repertoire of Hedgehog-related genes, ten of which reside within the Warthog family. RESULTS We have characterised their evolutionary origin and their roles in C. elegans and found that these genes have adopted new functions in aspects of post-embryonic development, including left-right asymmetry and cell fate determination, akin to the functions of their vertebrate counterparts. Analysis of various double and triple mutants of the Warthog family reveals that more recently derived paralogues are not redundant with one another, while a pair of divergent Warthogs do display redundancy with respect to their function in cuticle biosynthesis. CONCLUSIONS We have shown that newer members of taxon-restricted gene families are not always functionally redundant despite their recent inception, whereas much older paralogues can be, which is considered paradoxical according to the current framework in gene evolution.
Affiliation(s)
- E A Baker
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
- S P R Gilbert
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
- S M Shimeld
- Department of Zoology, University of Oxford, Oxford, OX1 3SZ, UK
- A Woollard
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
|
15
|
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Zhang W, Tautz D. Dedicated transcriptomics combined with power analysis lead to functional understanding of genes with weak phenotypic changes in knockout lines. PLoS Comput Biol 2020; 16:e1008354. [PMID: 33180766 PMCID: PMC7685438 DOI: 10.1371/journal.pcbi.1008354] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/20/2020] [Revised: 11/24/2020] [Accepted: 09/20/2020] [Indexed: 12/26/2022] Open
Abstract
Systematic knockout studies in mice have shown that a large fraction of the gene replacements show no lethal or other overt phenotypes. This has led to the development of more refined analysis schemes, including physiological, behavioral, developmental and cytological tests. However, transcriptomic analyses have not yet been systematically evaluated for non-lethal knockouts. We conducted a power analysis to determine the experimental conditions under which even small changes in transcript levels can be reliably traced. We have applied this to two gene disruption lines of genes for which no function was known so far. Dedicated phenotyping tests informed by the tissues and stages of highest expression of the two genes show small effects on the tested phenotypes. For the transcriptome analysis of these stages and tissues, we used a prior power analysis to determine the number of biological replicates and the sequencing depth. We find that under these conditions, the knockouts have a significant impact on the transcriptional networks, with thousands of genes showing small transcriptional changes. GO analysis suggests that A930004D18Rik is involved in developmental processes through contributing to protein complexes, and A830005F24Rik in extracellular matrix functions. Subsampling analysis of the data reveals that increasing the number of biological replicates was more important than increasing the sequencing depth to arrive at these results. Hence, our proof-of-principle experiment suggests that transcriptomic analysis is indeed an option to study gene functions of genes with weak or no traceable phenotypic effects and it provides the boundary conditions under which this is possible.
Knockout mice benefit the understanding of gene functions in mammals. However, for many genes it has proven difficult to identify clear phenotypes, partly owing to a lack of sufficient assays. As Lewis Wolpert famously asked, “But did you take them to the opera?”, metaphorically alluding to the need to extend phenotyping efforts. This insight led to the establishment of phenotyping pipelines that are nowadays routinely used to characterize knockout lines. However, transcriptomic approaches based on RNA-Seq have been much less explored for such deep-level studies. Here, we conducted both a theoretical power analysis and practical RNA-Seq experiments on two knockout lines with small phenotypic effects to investigate parameters including sample size, sequencing depth, fold change, and dispersion. Our dedicated RNA-Seq studies discovered thousands of genes with small transcriptional changes, enriched in specific functions, in both knockout lines. We find that it is more important to increase the number of samples than to increase the sequencing depth. Our work shows that a deep RNA-Seq study on knockouts is powerful for understanding gene functions in cases of weak phenotypic effects, and provides a guideline for the experimental design of such studies.
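The trade-off between replicates and sequencing depth described above can be illustrated with a rough calculation. This is a sketch under stated assumptions, not the authors' power analysis: it assumes negative-binomial counts, uses the delta-method approximation Var(log count) ≈ 1/mean + dispersion, and applies a two-sided z-test with no multiple-testing correction. The function name `approx_power` and all parameter values are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(n_per_group, mean_count, dispersion, fold_change, alpha=0.05):
    """Approximate power to detect a fold change for one gene, two groups.

    Assumes Var(log count) ~= 1/mean_count + dispersion (delta method for a
    negative-binomial count) and a two-sided z-test on the log fold change.
    """
    std_normal = NormalDist()
    per_sample_var = 1.0 / mean_count + dispersion
    # Standard error of the difference between two group means of log counts.
    se = sqrt(2.0 * per_sample_var / n_per_group)
    z_effect = abs(log(fold_change)) / se
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Hypothetical settings: a 1.2-fold change, dispersion 0.05, mean count 100.
baseline = approx_power(n_per_group=5, mean_count=100, dispersion=0.05, fold_change=1.2)
double_depth = approx_power(n_per_group=5, mean_count=200, dispersion=0.05, fold_change=1.2)
double_reps = approx_power(n_per_group=10, mean_count=100, dispersion=0.05, fold_change=1.2)
print(f"baseline={baseline:.2f} depth x2={double_depth:.2f} replicates x2={double_reps:.2f}")
```

In this approximation, extra depth shrinks only the 1/mean term, whereas extra replicates shrink the whole variance, including the biological dispersion. That is one way to see why replicates can matter more once counts are moderately high.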
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Kristian K. Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Wenyu Zhang
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
|
16
|
Lee YH, Kim MS, Kim DH, Kim IC, Hagiwara A, Lee JS. Genome-wide identification of DNA double-strand break repair genes and transcriptional modulation in response to benzo[α]pyrene in the monogonont rotifer Brachionus spp. Aquat Toxicol 2020; 227:105614. [PMID: 32932040 DOI: 10.1016/j.aquatox.2020.105614] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/29/2020] [Revised: 08/19/2020] [Accepted: 08/24/2020] [Indexed: 06/11/2023]
Abstract
The DNA repair system has evolved from the common ancestor of all life forms and its function is highly conserved within eukaryotes. In this study, to reveal the role of DNA double-strand break repair (DSB) genes in response to benzo[α]pyrene (B[α]P), we first identified DSB genes in relation to homologous recombination and non-homologous end joining events in four Brachionus rotifer spp.: B. calyciflorus, B. koreanus, B. plicatilis, and B. rotundiformis. In all the Brachionus spp., 39 orthologous genes to human DSB repair genes were identified. Furthermore, three genes in B. koreanus, two genes in B. plicatilis, and one gene in B. calyciflorus and B. rotundiformis were present as duplicated genes, indicating that these genes were diversified over speciation in the genus Brachionus. Moreover, we compared DSB repair genes on the gene structures in four monogonont Brachionus rotifers and the bdelloid rotifer Adineta vaga, which possesses highly efficient DNA repair ability. The transcriptional responses of four monogonont Brachionus rotifers in response to B[α]P exposure showed how B[α]P exposure led to DSBs and subsequently recruited DNA DSB repair pathways in the rotifer B. koreanus. Taken together, this study provides a better understanding of the potential role of DSB repair genes in the monogonont rotifer Brachionus spp. in response to B[α]P.
Affiliation(s)
- Young Hwan Lee
- Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
- Min-Sub Kim
- Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
- Duck-Hyun Kim
- Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
- Il-Chan Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon 21990, South Korea
- Atsushi Hagiwara
- Institute of Integrated Science and Technology, Nagasaki University, Nagasaki 852-8521, Japan; Organization for Marine Science and Technology, Nagasaki University, Nagasaki 852-8521, Japan
- Jae-Seong Lee
- Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
|
17
|
Transcriptional activity and strain-specific history of mouse pseudogenes. Nat Commun 2020; 11:3695. [PMID: 32728065 PMCID: PMC7392758 DOI: 10.1038/s41467-020-17157-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/02/2018] [Accepted: 06/08/2020] [Indexed: 01/07/2023] Open
Abstract
Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse, and 303, in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.
|
18
|
Conant GC. The lasting after-effects of an ancient polyploidy on the genomes of teleosts. PLoS One 2020; 15:e0231356. [PMID: 32298330 PMCID: PMC7161988 DOI: 10.1371/journal.pone.0231356] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/16/2020] [Accepted: 03/20/2020] [Indexed: 12/20/2022] Open
Abstract
The ancestor of most teleost fishes underwent a whole-genome duplication event three hundred million years ago. Despite its antiquity, the effects of this event are evident both in the structure of teleost genomes and in how the surviving duplicated genes still operate to drive form and function. I inferred a set of shared syntenic regions that survive from the teleost genome duplication (TGD) using eight teleost genomes and the outgroup gar genome (which lacks the TGD). I then phylogenetically modeled the TGD's resolution via shared and independent gene losses and applied a new simulation-based statistical test for the presence of bias toward the preservation of genes from one parental subgenome. On the basis of that test, I argue that the TGD was likely an allopolyploidy. I find that duplicate genes surviving from this duplication in zebrafish are less likely to function in early embryo development than are genes that have returned to single copy at some point in this species' history. The tissues these ohnologs are expressed in, as well as their biological functions, lend support to recent suggestions that the TGD was the source of a morphological innovation in the structure of the teleost retina. Surviving duplicates also appear less likely to be essential than singletons, despite the fact that their single-copy orthologs in mouse are no less essential than other genes.
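The idea of testing for biased retention between two parental subgenomes can be illustrated with a much simpler stand-in than the phylogenetic, simulation-based test the abstract describes. The sketch below uses an exact two-sided binomial test on hypothetical counts of single-copy survivors assigned to each subgenome. The function name and the counts are invented for illustration, and the test ignores the shared-loss structure that the author models.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null success probability p.
    """
    probs = [comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1.0 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Hypothetical counts: surviving single-copy genes traced to each subgenome.
from_subgenome_1, from_subgenome_2 = 80, 20
p_value = binomial_two_sided_p(from_subgenome_1, from_subgenome_1 + from_subgenome_2)
print(f"p = {p_value:.3g}")
```

A small p-value here would only indicate an imbalance in the raw counts; whether that imbalance supports allopolyploidy depends on the fuller model described in the abstract.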
Affiliation(s)
- Gavin C. Conant
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
- Program in Genetics, North Carolina State University, Raleigh, NC, United States of America
- Division of Animal Sciences, University of Missouri, Columbia, MO, United States of America
|
19
|
Modeling succinate dehydrogenase loss disorders in C. elegans through effects on hypoxia-inducible factor. PLoS One 2019; 14:e0227033. [PMID: 31887185 PMCID: PMC6936837 DOI: 10.1371/journal.pone.0227033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/21/2019] [Accepted: 12/10/2019] [Indexed: 12/03/2022] Open
Abstract
Mitochondrial disorders arise from defects in nuclear genes encoding enzymes of oxidative metabolism. Mutations of metabolic enzymes in somatic tissues can cause cancers due to oncometabolite accumulation. Paraganglioma and pheochromocytoma are examples, whose etiology and therapy are complicated by the absence of representative cell lines or animal models. These tumors can be driven by loss of the tricarboxylic acid cycle enzyme succinate dehydrogenase. We exploit the relationship between succinate accumulation, hypoxic signaling, egg-laying behavior, and morphology in C. elegans to create genetic and pharmacological models of succinate dehydrogenase loss disorders. With optimization, these models may enable future high-throughput screening efforts.
|
20
|
Lafond M, Meghdari Miardan M, Sankoff D. Accurate prediction of orthologs in the presence of divergence after duplication. Bioinformatics 2019; 34:i366-i375. [PMID: 29950018 PMCID: PMC6022570 DOI: 10.1093/bioinformatics/bty242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/23/2022] Open
Abstract
MOTIVATION When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences on the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we make the distinction between the primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and the secondary orthologs, which have. Similarity-based prediction methods will tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types. RESULTS We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas to practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs. AVAILABILITY AND IMPLEMENTATION HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada; Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
- David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
|
21
|
Posner R, Toker IA, Antonova O, Star E, Anava S, Azmon E, Hendricks M, Bracha S, Gingold H, Rechavi O. Neuronal Small RNAs Control Behavior Transgenerationally. Cell 2019; 177:1814-1826.e15. [PMID: 31178120 PMCID: PMC6579485 DOI: 10.1016/j.cell.2019.04.029] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 12/19/2018] [Revised: 02/18/2019] [Accepted: 04/13/2019] [Indexed: 12/21/2022]
Abstract
It is unknown whether the activity of the nervous system can be inherited. In Caenorhabditis elegans nematodes, parental responses can transmit heritable small RNAs that regulate gene expression transgenerationally. In this study, we show that a neuronal process can impact the next generations. Neuron-specific synthesis of RDE-4-dependent small RNAs regulates germline amplified endogenous small interfering RNAs (siRNAs) and germline gene expression for multiple generations. Further, the production of small RNAs in neurons controls the chemotaxis behavior of the progeny for at least three generations via the germline Argonaute HRDE-1. Among the targets of these small RNAs, we identified the conserved gene saeg-2, which is transgenerationally downregulated in the germline. Silencing of saeg-2 following neuronal small RNA biogenesis is required for chemotaxis under stress. Thus, we propose a small-RNA-based mechanism for communication of neuronal processes transgenerationally.
- C. elegans neuronal small RNAs are characterized by RNA sequencing
- RDE-4-dependent neuronal endogenous small RNAs communicate with the germline
- Germline HRDE-1 mediates transgenerational regulation by neuronal small RNAs
- Neuronal small RNAs regulate germline genes to control behavior transgenerationally
Affiliation(s)
- Rachel Posner
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Itai Antoine Toker
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Olga Antonova
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Ekaterina Star
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Sarit Anava
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Eran Azmon
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Michael Hendricks
- Department of Biology, McGill University, Montreal, QC H3A 1B1, Canada
- Shahar Bracha
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Hila Gingold
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
- Oded Rechavi
- Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
|
22
|
Abstract
An attractive and long-standing hypothesis regarding the evolution of genes after duplication posits that the duplication event creates new evolutionary possibilities by releasing a copy of the gene from constraint. Apparent support was found in numerous analyses, particularly, the observation of higher rates of evolution in duplicated as compared with singleton genes. Could it, instead, be that more duplicable genes (owing to mutation, fixation, or retention biases) are intrinsically faster evolving? To uncouple the measurement of rates of evolution from the determination of duplicate or singleton status, we measure the rates of evolution in singleton genes in outgroup primate lineages but classify these genes as to whether they have duplicated or not in a crown group of great apes. We find that rates of evolution are higher in duplicable genes prior to the duplication event. In part this is owing to a negative correlation between coding sequence length and rate of evolution, coupled with a bias toward smaller genes being more duplicable. The effect is masked by difference in expression rate between duplicable genes and singletons. Additionally, in contradiction to the classical assumption, we find no convincing evidence for an increase in dN/dS after duplication, nor for rate asymmetry between duplicates. We conclude that high rates of evolution of duplicated genes are not solely a consequence of the duplication event, but are rather a predictor of duplicability. These results are consistent with a model in which successful gene duplication events in mammals are skewed toward events of minimal phenotypic impact.
Affiliation(s)
- Áine N O'Toole
- Department of Genetics, Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom
- Aoife McLysaght
- Department of Genetics, Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
|
23
|
Friedrich M. Ancient genetic redundancy of eyeless and twin of eyeless in the arthropod ocular segment. Dev Biol 2017; 432:192-200. [PMID: 28993201 DOI: 10.1016/j.ydbio.2017.10.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/10/2017] [Revised: 10/02/2017] [Accepted: 10/03/2017] [Indexed: 01/28/2023]
Abstract
Pax6 transcription factors are essential upstream regulators in the developing anterior brain and peripheral visual system of most bilaterian animals. While a single homolog is in charge of these functions in vertebrates, two Pax6 genes exist in Drosophila: eyeless (ey) and twin of eyeless (toy). At first glance, their co-existence seems sufficiently explained by their differential involvement in the specification of two types of insect visual organs: the lateral compound eyes (ey) and the dorsal ocelli (toy). Less straightforward to understand, however, is their genetic redundancy in promoting defined early and late growth phases of the precursor tissue to these organs: the eye-antennal imaginal disc. Drawing on comparative sequence, expression, and gene function evidence, I here conclude that this gene regulatory network module dates back to the dawn of arthropod evolution, securing the embryonic development of the ocular head segment. Thus, ey and toy constitute a paradigm to explore the organization and functional significance of long-term conserved genetic redundancy of duplicated genes. Indeed, as first steps in this direction, recent studies uncovered the shared use of binding sites in shared enhancers of target genes that are under redundant (string) and, strikingly, even subfunctionalized control by ey and toy (atonal). Equally significant, the evolutionarily recent and paralog-specific function of ey to repress the transcription of the antenna fate regulator Distal-less offers a functionally and phylogenetically well-defined opportunity to study the reconciliation of shared, partitioned, and newly acquired functions in a duplicated developmental gene pair.
Affiliation(s)
- Markus Friedrich
- Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA; Department of Anatomy and Cell Biology, Wayne State University, School of Medicine, 540 East Canfield Avenue, Detroit, MI 48201, USA
|
24
|
Guschanski K, Warnefors M, Kaessmann H. The evolution of duplicate gene expression in mammalian organs. Genome Res 2017; 27:1461-1474. [PMID: 28743766 PMCID: PMC5580707 DOI: 10.1101/gr.215566.116] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 09/05/2016] [Accepted: 07/18/2017] [Indexed: 12/16/2022]
Abstract
Gene duplications generate genomic raw material that allows the emergence of novel functions, likely facilitating adaptive evolutionary innovations. However, global assessments of the functional and evolutionary relevance of duplicate genes in mammals were until recently limited by the lack of appropriate comparative data. Here, we report a large-scale study of the expression evolution of DNA-based functional gene duplicates in three major mammalian lineages (placental mammals, marsupials, egg-laying monotremes) and birds, on the basis of RNA sequencing (RNA-seq) data from nine species and eight organs. We observe dynamic changes in tissue expression preference of paralogs with different duplication ages, suggesting differential contribution of paralogs to specific organ functions during vertebrate evolution. Specifically, we show that paralogs that emerged in the common ancestor of bony vertebrates are enriched for genes with brain-specific expression and provide evidence for differential forces underlying the preferential emergence of young testis- and liver-specific expressed genes. Further analyses uncovered that the overall spatial expression profiles of gene families tend to be conserved, with several exceptions of pronounced tissue specificity shifts among lineage-specific gene family expansions. Finally, we trace new lineage-specific genes that may have contributed to the specific biology of mammalian organs, including the little-studied placenta. Overall, our study provides novel and taxonomically broad evidence for the differential contribution of duplicate genes to tissue-specific transcriptomes and for their importance for the phenotypic evolution of vertebrates.
Affiliation(s)
- Katerina Guschanski
  - Department of Animal Ecology, Evolutionary Biology Centre, Uppsala University, S-75105 Uppsala, Sweden
- Maria Warnefors
  - Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, D-69120 Heidelberg, Germany
- Henrik Kaessmann
  - Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, D-69120 Heidelberg, Germany
|
25
|
Structural and Functional Characterization of a Caenorhabditis elegans Genetic Interaction Network within Pathways. PLoS Comput Biol 2016; 12:e1004738. [PMID: 26871911 PMCID: PMC4752231 DOI: 10.1371/journal.pcbi.1004738] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/01/2014] [Accepted: 01/05/2016] [Indexed: 12/02/2022] Open
Abstract
A genetic interaction (GI) is defined when the mutation of one gene modifies the phenotypic expression associated with the mutation of a second gene. Genome-wide efforts to map GIs in yeast revealed structural and functional properties of a GI network. This provided insights into the mechanisms underlying the robustness of yeast to genetic and environmental insults, and also into the link existing between genotype and phenotype. While a significant conservation of GIs and GI network structure has been reported between distant yeast species, such a conservation is not clear between unicellular and multicellular organisms. Structural and functional characterization of a GI network in these latter organisms is consequently of high interest. In this study, we present an in-depth characterization of ~1.5K GIs in the nematode Caenorhabditis elegans. We identify and characterize six distinct classes of GIs by examining a wide-range of structural and functional properties of genes and network, including co-expression, phenotypical manifestations, relationship with protein-protein interaction dense subnetworks (PDS) and pathways, molecular and biological functions, gene essentiality and pleiotropy. Our study shows that GI classes link genes within pathways and display distinctive properties, specifically towards PDS. It suggests a model in which pathways are composed of PDS-centric and PDS-independent GIs coordinating molecular machines through two specific classes of GIs involving pleiotropic and non-pleiotropic connectors. Our study provides the first in-depth characterization of a GI network within pathways of a multicellular organism. It also suggests a model to understand better how GIs control system robustness and evolution.
Network biology has focused for years on protein-protein interaction (PPI) networks, identifying nodes with central structural functions and modules associated to bioprocesses, phenotypes and diseases. Network biology field moved to a higher level of abstraction, and started characterizing a less intuitive kind of interactions, called genetic interactions (GIs) or epistasis. Mostly due to technical challenges associated to the genome-wide mapping of GIs, these studies primarily focused on unicellular organisms. They uncovered modules embedded within the structure of these networks and started characterizing their relationship with PPI-network and biological functions. We provide here the first in-depth characterization of a network composed of ~600 GIs within signaling and metabolic pathways of a multicellular organism, the nematode Caenorhabditis elegans. We characterize the structure of this network, and the function of GI classes found in this network. We also discuss how these GI classes contribute to the genomic robustness and the adaptive evolution of multicellular organisms.
|
26
|
Li Z, Defoort J, Tasdighian S, Maere S, Van de Peer Y, De Smet R. Gene Duplicability of Core Genes Is Highly Consistent across All Angiosperms. Plant Cell 2016; 28:326-44. [PMID: 26744215 PMCID: PMC4790876 DOI: 10.1105/tpc.15.00877] [Citation(s) in RCA: 136] [Impact Index Per Article: 17.0] [Received: 10/13/2015] [Accepted: 01/04/2016] [Indexed: 05/02/2023]
Abstract
Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly due to the fact that previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. The intermediate class was overrepresented in regulatory genes, further suggesting that these represent putative dosage-balance-sensitive genes.
Affiliation(s)
- Zhen Li
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Jonas Defoort
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Setareh Tasdighian
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Steven Maere
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Yves Van de Peer
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
- Riet De Smet
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
|
27
|
The Constrained Maximal Expression Level Owing to Haploidy Shapes Gene Content on the Mammalian X Chromosome. PLoS Biol 2015; 13:e1002315. [PMID: 26685068 PMCID: PMC4686125 DOI: 10.1371/journal.pbio.1002315] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/10/2015] [Accepted: 11/02/2015] [Indexed: 11/19/2022] Open
Abstract
X chromosomes are unusual in many regards, not least of which is their nonrandom gene content. The causes of this bias are commonly discussed in the context of sexual antagonism and the avoidance of activity in the male germline. Here, we examine the notion that, at least in some taxa, functionally biased gene content may more profoundly be shaped by limits imposed on gene expression owing to haploid expression of the X chromosome. Notably, if the X, as in primates, is transcribed at rates comparable to the ancestral rate (per promoter) prior to the X chromosome formation, then the X is not a tolerable environment for genes with very high maximal net levels of expression, owing to transcriptional traffic jams. We test this hypothesis using The Encyclopedia of DNA Elements (ENCODE) and data from the Functional Annotation of the Mammalian Genome (FANTOM5) project. As predicted, the maximal expression of human X-linked genes is much lower than that of genes on autosomes: on average, maximal expression is three times lower on the X chromosome than on autosomes. Similarly, autosome-to-X retroposition events are associated with lower maximal expression of retrogenes on the X than seen for X-to-autosome retrogenes on autosomes. Also as expected, X-linked genes have a lesser degree of increase in gene expression than autosomal ones (compared to the human/Chimpanzee common ancestor) if highly expressed, but not if lowly expressed. The traffic jam model also explains the known lower breadth of expression for genes on the X (and the Z of birds), as genes with broad expression are, on average, those with high maximal expression. As then further predicted, highly expressed tissue-specific genes are also rare on the X and broadly expressed genes on the X tend to be lowly expressed, both indicating that the trend is shaped by the maximal expression level not the breadth of expression per se. Importantly, a limit to the maximal expression level explains biased tissue of expression profiles of X-linked genes. Tissues whose tissue-specific genes are very highly expressed (e.g., secretory tissues, tissues abundant in structural proteins) are also tissues in which gene expression is relatively rare on the X chromosome. These trends cannot be fully accounted for in terms of alternative models of biased expression. In conclusion, the notion that it is hard for genes on the Therian X to be highly expressed, owing to transcriptional traffic jams, provides a simple yet robustly supported rationale of many peculiar features of X’s gene content, gene expression, and evolution.
Laurence Hurst, Lukasz Huminiecki, and the FANTOM5 consortium propose a new explanation for the peculiar expression properties of genes on the human X chromosome, based on the premise that very high expression levels cannot be achieved on a haploid-expressed chromosome.
Genes located on the human X chromosome are not a random mix of genes: they tend to be expressed in relatively few tissues or are specific for a particular set of tissues, e.g., brain regions. Prior attempts to explain this skewed gene content have hypothesized that the X chromosome might be peculiar because it has to balance mutations that are advantageous to one sex but deleterious to the other, or because it has to shut down during the process of sperm manufacture in males. Here we suggest and test a third possible explanation: that genes on the X chromosome are limited in their transcription levels and thus tend to be genes that are lowly or specifically expressed. We consider the suggestion that since these genes can only be expressed from one chromosome, as males only have one X, the ability to express a gene at very high rates is limited owing to potential transcriptional traffic jams. As predicted, we find that human X-located genes have maximal expression rates far below that of genes residing on autosomes. When we look at genes that have moved onto or off the X chromosome during recent evolution, we find the maximal expression is higher when not on the X chromosome. We also find that X-located genes that are relatively highly expressed are not able to increase their expression level further. Our model explains both the enrichment for tissue specificity and the paucity of certain tissues with X-located genes. Genes underrepresented on the X are either expressed in many tissues (such genes tend to have high maximal expression) or are from tissues that require a lot of transcription (e.g., fast secreting tissues like the liver). Just as many of the findings cannot be explained by the two earlier models, neither can the traffic jam model explain all the peculiar features of the genes found on the X chromosome. Indeed, we find evidence of a reproduction-related bias in X-located genes, even after allowing for the traffic jam problem.
|
28
|
Miura S, Tate S, Kumar S. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins. Evol Bioinform Online 2015; 11:245-51. [PMID: 26604664 PMCID: PMC4631161 DOI: 10.4137/ebo.s30594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2015] [Revised: 09/14/2015] [Accepted: 09/18/2015] [Indexed: 11/09/2022] Open
Abstract
Gene duplication enables the functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernable paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced an opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (ie, functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.
Affiliation(s)
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Stephanie Tate
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
29
|
Calpena E, Palau F, Espinós C, Galindo MI. Evolutionary History of the Smyd Gene Family in Metazoans: A Framework to Identify the Orthologs of Human Smyd Genes in Drosophila and Other Animal Species. PLoS One 2015; 10:e0134106. [PMID: 26230726 PMCID: PMC4521844 DOI: 10.1371/journal.pone.0134106] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/07/2015] [Accepted: 07/06/2015] [Indexed: 01/01/2023] Open
Abstract
The Smyd gene family codes for proteins containing a conserved core consisting of a SET domain interrupted by a MYND zinc finger. Smyd proteins are important in epigenetic control of development and carcinogenesis, through posttranslational modifications in histones and other proteins. Previous reports indicated that the Smyd family is quite variable in metazoans, so a rigorous phylogenetic reconstruction of this complex gene family is of central importance to understand its evolutionary history and functional diversification or conservation. We have performed a phylogenetic analysis of Smyd protein sequences, and our results show that the extant metazoan Smyd genes can be classified in three main classes, Smyd3 (which includes chordate-specific Smyd1 and Smyd2 genes), Smyd4 and Smyd5. In addition, there is an arthropod-specific class, SmydA. While the evolutionary history of the Smyd3 and Smyd5 classes is relatively simple, the Smyd4 class has undergone several events of gene loss, gene duplication and lineage-specific expansions in the animal phyla included in our analysis. A more specific study of the four Smyd4 genes in Drosophila melanogaster shows that they are not redundant, since their patterns of expression are different and knock-down of individual genes can have dramatic phenotypes despite the presence of the other family members.
Affiliation(s)
- Eduardo Calpena
  - Program in Rare and Genetic Diseases, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Valencia, Spain
- Francesc Palau
  - Program in Rare and Genetic Diseases, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Valencia, Spain
- Carmen Espinós
  - Program in Rare and Genetic Diseases, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Valencia, Spain
- Máximo Ibo Galindo
  - Program in Rare and Genetic Diseases, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Valencia, Spain
|
30
|
Zhang Z, Ren Q. Why are essential genes essential? - The essentiality of Saccharomyces genes. Microbial Cell 2015; 2:280-287. [PMID: 28357303 PMCID: PMC5349100 DOI: 10.15698/mic2015.08.218] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
Essential genes are defined as those required for the survival of an organism or a cell. They are of particular interest, not only for their essential biological functions, but also for practical applications, such as identifying effective drug targets against pathogenic bacteria and fungi. The budding yeast Saccharomyces cerevisiae has approximately 6,000 open reading frames, 15 to 20% of which are deemed essential. Some of the essential genes, however, appear to perform non-essential functions, such as aging and cell death, while many of the non-essential genes play critical roles in cell survival. In this paper, we reviewed and analyzed the levels of essentiality of the Saccharomyces cerevisiae genes and grouped the genes into four categories: (1) Conditional essential: essential only under certain circumstances or growth conditions; (2) Essential: required for survival under optimal growth conditions; (3) Redundant essential: synthetic lethal due to redundant pathways or gene duplication; and (4) Absolute essential: the minimal genes required for maintaining a cellular life under a stress-free environment. The essential and non-essential functions of the essential genes were further analyzed.
Affiliation(s)
- Zhaojie Zhang
  - Department of Zoology and Physiology, University of Wyoming, Laramie, WY 82071, USA
- Qun Ren
  - Department of Zoology and Physiology, University of Wyoming, Laramie, WY 82071, USA
|
31
|
Tanaka K, Diekmann Y, Hazbun A, Hijazi A, Vreede B, Roch F, Sucena É. Multispecies Analysis of Expression Pattern Diversification in the Recently Expanded Insect Ly6 Gene Family. Mol Biol Evol 2015; 32:1730-47. [PMID: 25743545 PMCID: PMC4476152 DOI: 10.1093/molbev/msv052] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022] Open
Abstract
Gene families often consist of members with diverse expression domains reflecting their functions in a wide variety of tissues. However, how the expression of individual members, and thus their tissue-specific functions, diversified during the course of gene family expansion is not well understood. In this study, we approached this question through the analysis of the duplication history and transcriptional evolution of a rapidly expanding subfamily of insect Ly6 genes. We analyzed different insect genomes and identified seven Ly6 genes that have originated from a single ancestor through sequential duplication within the higher Diptera. We then determined how the original embryonic expression pattern of the founding gene diversified by characterizing its tissue-specific expression in the beetle Tribolium castaneum, the butterfly Bicyclus anynana, and the mosquito Anopheles stephensi and those of its duplicates in three higher dipteran species, representing various stages of the duplication history (Megaselia abdita, Ceratitis capitata, and Drosophila melanogaster). Our results revealed that frequent neofunctionalization episodes contributed to the increased expression breadth of this subfamily and that these events occurred after duplication and speciation events at comparable frequencies. In addition, at each duplication node, we consistently found asymmetric expression divergence. One paralog inherited most of the tissue-specificities of the founder gene, whereas the other paralog evolved drastically reduced expression domains. Our approach attests to the power of combining a well-established duplication history with a comprehensive coverage of representative species in acquiring unequivocal information about the dynamics of gene expression evolution in gene families.
Affiliation(s)
- Assia Hijazi
  - Centre de Biologie du Développement, CNRS UMR 5547, Université de Toulouse UPS, Toulouse, France
- Fernando Roch
  - Centre de Biologie du Développement, CNRS UMR 5547, Université de Toulouse UPS, Toulouse, France
- Élio Sucena
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal; Departamento de Biologia Animal, Faculdade de Ciências, Edifício C2, Universidade de Lisboa, Lisboa, Portugal
|
32
|
Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet 2015; 16:172-83. [DOI: 10.1038/nrg3871] [Citation(s) in RCA: 565] [Impact Index Per Article: 62.8] [Indexed: 02/06/2023]
|
33
|
Hurst LD, Sachenkova O, Daub C, Forrest ARR, Huminiecki L. A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators. Genome Biol 2014; 15:413. [PMID: 25079787 PMCID: PMC4310617 DOI: 10.1186/s13059-014-0413-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/10/2013] [Accepted: 07/15/2014] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Conventional wisdom holds that, owing to the dominance of features such as chromatin level control, the expression of a gene cannot be readily predicted from knowledge of promoter architecture. This is reflected, for example, in a weak or absent correlation between promoter divergence and expression divergence between paralogs. However, an inability to predict may reflect an inability to accurately measure or employment of the wrong parameters. Here we address this issue through integration of two exceptional resources: ENCODE data on transcription factor binding and the FANTOM5 high-resolution expression atlas. RESULTS Consistent with the notion that in eukaryotes most transcription factors are activating, the number of transcription factors binding a promoter is a strong predictor of expression breadth. In addition, evolutionarily young duplicates have fewer transcription factor binders and narrower expression. Nonetheless, we find several binders and cooperative sets that are disproportionately associated with broad expression, indicating that models more complex than simple correlations should hold more predictive power. Indeed, a machine learning approach improves fit to the data compared with a simple correlation. Machine learning could at best moderately predict tissue of expression of tissue specific genes. CONCLUSIONS We find robust evidence that some expression parameters and paralog expression divergence are strongly predictable with knowledge of transcription factor binding repertoire. While some cooperative complexes can be identified, consistent with the notion that most eukaryotic transcription factors are activating, a simple predictor, the number of binding transcription factors found on a promoter, is a robust predictor of expression breadth.
Affiliation(s)
- Laurence D Hurst
  - Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY UK
- Oxana Sachenkova
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
- Carsten Daub
  - Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
- Alistair RR Forrest
  - RIKEN Omics Science Center, Yokohama, Japan
  - Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa Japan
- the FANTOM consortium
  - Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY UK
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
  - RIKEN Omics Science Center, Yokohama, Japan
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
  - BILS bioinformatics infrastructure for life sciences, Stockholm, Sweden
  - Department of Immunology Genetics and Pathology, Uppsala University, Uppsala, Sweden
  - Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa Japan
- Lukasz Huminiecki
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, SciLifeLab, Stockholm, Sweden
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
  - BILS bioinformatics infrastructure for life sciences, Stockholm, Sweden
  - Department of Immunology Genetics and Pathology, Uppsala University, Uppsala, Sweden
|
34
|
Abstract
Gene duplication and alternative splicing are important mechanisms in the production of genomic novelties. Previous work has shown that a gene’s family size and the number of splice variants it produces are inversely related, although the underlying reason is not well understood. Here, we report that gene length and expression level together explain this relationship. We found that gene lengths correlate with both gene duplication and alternative splicing: Longer genes are less likely to produce duplicates and more likely to exhibit alternative splicing. We show that gene length is a dynamic property, increasing with evolutionary time—due in part to the insertions of transposable elements—and decreasing following partial gene duplications. However, gene length alone does not account for the relationship between alternative splicing and gene duplication. A gene’s expression level appears both to impose a strong constraint on its length and to restrict gene duplications. Furthermore, high gene expression promotes alternative splicing, in particular for long genes, and alternatively, short genes with low expression levels have large gene families. Our analysis of the human and mouse genomes shows that gene length and expression level are primary genic properties that together account for the relationship between gene duplication and alternative splicing and bias the origin of novelties in the genome.
Affiliation(s)
- Itai Yanai
  - Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
|
35
|
Abstract
Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation.
Affiliation(s)
- Wenfeng Qian
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA; Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
36
|
Cheng J, Xu Z, Wu W, Zhao L, Li X, Liu Y, Tao S. Training set selection for the prediction of essential genes. PLoS One 2014; 9:e86805. [PMID: 24466248 PMCID: PMC3899339 DOI: 10.1371/journal.pone.0086805] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/05/2013] [Accepted: 12/13/2013] [Indexed: 01/23/2023] Open
Abstract
Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as the training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited a remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.
Affiliation(s)
- Jian Cheng
  - College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
- Zhao Xu
  - College of Science, Northwest A&F University, Yangling, Shaanxi, China
- Wenwu Wu
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
  - Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Li Zhao
  - College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
- Xiangchen Li
  - College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
- Yanlin Liu
  - College of Wine, Northwest A&F University, Yangling, Shaanxi, China
  - * E-mail: (YL); (ST)
- Shiheng Tao
  - College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling, Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
  - * E-mail: (YL); (ST)
37
Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, Warringer J, Durbin R, Liti G. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 2014; 31:872-88. [PMID: 24425782; PMCID: PMC3969562; DOI: 10.1093/molbev/msu037] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Indexed: 12/13/2022] Open
Abstract
The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.
Affiliation(s)
- Anders Bergström
  - Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France
38
Cheng J, Wu W, Zhang Y, Li X, Jiang X, Wei G, Tao S. A new computational strategy for predicting essential genes. BMC Genomics 2013; 14:910. [PMID: 24359534; PMCID: PMC3880044; DOI: 10.1186/1471-2164-14-910] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/15/2013] [Accepted: 11/29/2013] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. RESULTS We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called the feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and a genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as the support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. CONCLUSIONS FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
Affiliation(s)
- Gehong Wei
  - College of Life Science, State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, China