1
Vakirlis N, Acar O, Cherupally V, Carvunis AR. Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence. Genome Biol Evol 2024; 16:evae151. [PMID: 39004885 DOI: 10.1093/gbe/evae151] [Received: 01/03/2024] [Revised: 06/17/2024] [Accepted: 07/09/2024] [Indexed: 07/16/2024]
Abstract
New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has occurred throughout evolution and across the tree of life. Yet, confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo or not, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy, and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters and by introducing a formal criterion that allows one to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases, we cannot rule out different evolutionary scenarios, including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.
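The abstract does not spell out how the randomization behind its empirical P-value is performed. As a rough illustration only, a generic permutation-style empirical P-value can be sketched as below, assuming that the conservation score is percent identity between an already aligned extant and ancestral sequence and that the null distribution is built by shuffling the ancestral residues. The function names are hypothetical and this is not the authors' workflow.

```python
import random


def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences share a residue.

    Gap characters ('-') never count as matches.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def empirical_p_value(extant: str, ancestral: str, n_perm: int = 1000, seed: int = 0) -> float:
    """Share of shuffled ancestral sequences scoring at least as high as the observed one.

    Uses the (hits + 1) / (n_perm + 1) estimator so the result is never exactly zero.
    """
    rng = random.Random(seed)
    observed = percent_identity(extant, ancestral)
    residues = list(ancestral)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residues)  # keeps residue composition, destroys order
        if percent_identity(extant, "".join(residues)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy, made-up sequence used for illustration only.
    extant = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    print(empirical_p_value(extant, extant))
```

Shuffling preserves composition, so a low P-value reflects conserved order rather than similar residue content; a real analysis would likely need a null model that also respects alignment gaps and codon structure.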
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Omer Acar
- Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Vijay Cherupally
- Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
- Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
2
Sanejouand YH. Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs? J Mol Evol 2024:10.1007/s00239-024-10174-z. [PMID: 38916610 DOI: 10.1007/s00239-024-10174-z] [Received: 11/17/2023] [Accepted: 05/03/2024] [Indexed: 06/26/2024]
Abstract
By looking for a lack of homologs in a reference database of 27 well-annotated proteomes of primates and 52 well-annotated proteomes of other mammals, 170 putative human-specific proteins were identified. While most of them are deemed uncertain, 2 are known at the protein level and 23 at the transcript level, according to UniProt. Interestingly, 23 of these 25 proteins are found to be encoded, or to have close homologs, in an open reading frame of a long noncoding human RNA. However, half of them are predicted to be at least 80% globular, with a single structural domain, according to IUPred, and with at least 80% ordered residues, according to flDPnn. Strikingly, there is a near-complete lack of structural knowledge about these proteins: no tertiary structure is presently available in the Protein Data Bank, and the AlphaFold Protein Structure Database provides a fair prediction for only one of them. Moreover, knowledge about the function of these possibly key proteins remains scarce.
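To make "an open reading frame of a long noncoding RNA" concrete, the sketch below scans the three forward frames of a transcript for ORFs that start at ATG and end at a stop codon. It is a hypothetical helper, not the procedure used in the study: it assumes the standard genetic code, forward strand only, and an arbitrary minimum length in codons (stop included).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of ATG...stop ORFs on the forward strand.

    Within each frame, only the first ATG after the previous stop opens an ORF,
    so each stop codon yields at most one (the longest) ORF. The stop codon is
    included in the interval, and open frames without a stop are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as cDNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence; real lncRNA ORFs would be screened with a larger min_codons.
    print(find_orfs("CCATGTTTTGACC", min_codons=1))
```

Scanning the reverse complement as well, or using an alternative genetic code, would be straightforward extensions of the same loop.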
Affiliation(s)
- Yves-Henri Sanejouand
- US2B, UMR 6286 of CNRS, Nantes University, 2 rue de la Houssinière, Nantes, 44322, Pays de la Loire, France.
3
McCoy MJ, Fire AZ. Parallel gene size and isoform expansion of ancient neuronal genes. Curr Biol 2024; 34:1635-1645.e3. [PMID: 38460513 PMCID: PMC11043017 DOI: 10.1016/j.cub.2024.02.021] [Received: 08/22/2023] [Revised: 12/16/2023] [Accepted: 02/11/2024] [Indexed: 03/11/2024]
Abstract
How nervous systems evolved is a central question in biology. A diversity of synaptic proteins is thought to play a central role in the formation of specific synapses leading to nervous system complexity. The largest animal genes, often spanning hundreds of thousands of base pairs, are known to be enriched for expression in neurons at synapses and are frequently mutated or misregulated in neurological disorders and diseases. Although many of these genes have been studied independently in the context of nervous system evolution and disease, general principles underlying their parallel evolution remain unknown. To investigate this, we directly compared orthologous gene sizes across eukaryotes. By comparing relative gene sizes within organisms, we identified a distinct class of large genes with origins predating the diversification of animals and, in many cases, the emergence of neurons as dedicated cell types. We traced this class of ancient large genes through evolution and found orthologs of the large synaptic genes potentially driving the immense complexity of metazoan nervous systems, including in humans and cephalopods. Moreover, we found that while these genes are evolving under strong purifying selection, as demonstrated by low dN/dS ratios, they have simultaneously grown larger and gained the most isoforms in animals. This work provides a new lens through which to view this distinctive class of large and multi-isoform genes and demonstrates how intrinsic genomic properties, such as gene length, can provide flexibility in molecular evolution and allow groups of genes and their host organisms to evolve toward complexity.
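The abstract cites low dN/dS ratios as evidence of strong purifying selection. As a hedged illustration of what that ratio measures (the abstract does not describe the authors' pipeline), the snippet below applies a Nei-Gojobori-style calculation with a Jukes-Cantor correction to pre-computed counts of nonsynonymous and synonymous sites and differences. Counting those sites from a codon alignment is assumed to have been done elsewhere, and the numbers in the example are made up.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """Ratio of corrected nonsynonymous (dN) to synonymous (dS) distances."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


if __name__ == "__main__":
    # Hypothetical counts: few nonsynonymous changes relative to synonymous ones.
    print(round(dn_ds(n_diffs=12, n_sites=900, s_diffs=40, s_sites=300), 3))
```

Values well below 1 are conventionally read as purifying selection, values near 1 as neutral evolution, and values above 1 as a sign of positive selection.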
Affiliation(s)
- Matthew J McCoy
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA.
- Andrew Z Fire
- Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA.
4
Pividori M, Lu S, Li B, Su C, Johnson ME, Wei WQ, Feng Q, Namjou B, Kiryluk K, Kullo IJ, Luo Y, Sullivan BD, Voight BF, Skarke C, Ritchie MD, Grant SFA, Greene CS. Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms. Nat Commun 2023; 14:5562. [PMID: 37689782 PMCID: PMC10492839 DOI: 10.1038/s41467-023-41057-4] [Received: 09/05/2021] [Accepted: 08/18/2023] [Indexed: 09/11/2023]
Abstract
Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observe that diseases are significantly associated with gene modules expressed in relevant cell types, and our approach is accurate in predicting known drug-disease pairs and inferring mechanisms of action. Furthermore, using a CRISPR screen to analyze lipid regulation, we find that functionally important players lack associations but are prioritized in trait-associated modules by PhenoPLIER. By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies.
Affiliation(s)
- Milton Pividori
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Sumei Lu
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Binglan Li
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Chun Su
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Matthew E Johnson
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Wei-Qi Wei
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Qiping Feng
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Bahram Namjou
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229, USA
- Krzysztof Kiryluk
- Department of Medicine, Division of Nephrology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, 10032, USA
- Yuan Luo
- Northwestern University, Chicago, IL, 60611, USA
- Blair D Sullivan
- Kahlert School of Computing, University of Utah, Salt Lake City, UT, 84112, USA
- Benjamin F Voight
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Carsten Skarke
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Marylyn D Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Struan F A Grant
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Division of Endocrinology and Diabetes, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Casey S Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
5
McCoy MJ, Fire AZ. Ancient origins of complex neuronal genes. bioRxiv 2023:2023.03.28.534655. [PMID: 37034725 PMCID: PMC10081198 DOI: 10.1101/2023.03.28.534655] [Indexed: 06/19/2023]
Abstract
How nervous systems evolved is a central question in biology. An increasing diversity of synaptic proteins is thought to play a central role in the formation of specific synapses leading to nervous system complexity. The largest animal genes, often spanning millions of base pairs, are known to be enriched for expression in neurons at synapses and are frequently mutated or misregulated in neurological disorders and diseases. While many of these genes have been studied independently in the context of nervous system evolution and disease, general principles underlying their parallel evolution remain unknown. To investigate this, we directly compared orthologous gene sizes across eukaryotes. By comparing relative gene sizes within organisms, we identified a distinct class of large genes with origins predating the diversification of animals and in many cases the emergence of dedicated neuronal cell types. We traced this class of ancient large genes through evolution and found orthologs of the large synaptic genes driving the immense complexity of metazoan nervous systems, including in humans and cephalopods. Moreover, we found that while these genes are evolving under strong purifying selection as demonstrated by low dN/dS scores, they have simultaneously grown larger and gained the most isoforms in animals. This work provides a new lens through which to view this distinctive class of large and multi-isoform genes and demonstrates how intrinsic genomic properties, such as gene length, can provide flexibility in molecular evolution and allow groups of genes and their host organisms to evolve toward complexity.
Affiliation(s)
- Matthew J. McCoy
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Whitman Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- Andrew Z. Fire
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
6
Sanejouand YH. On the Unknown Proteins of Eukaryotic Proteomes. J Mol Evol 2023:10.1007/s00239-023-10116-1. [PMID: 37219573 DOI: 10.1007/s00239-023-10116-1] [Received: 09/30/2022] [Accepted: 05/07/2023] [Indexed: 05/24/2023]
Abstract
To study unknown proteins on a large scale, a reference system has been set up for the three best-studied eukaryotic kingdoms, built with 36 proteomes as taxonomically diverse as possible. Proteins from 362 other eukaryotic proteomes with no known homologue in this set were then analyzed, focusing notably on singletons, that is, on such proteins with no known homologue in their own proteome. Consistently, for a given species, no more than 12% of the singletons thus found are known at the protein level, according to UniProt. In addition, since AlphaFold2 relies on the information found in alignments of homologous sequences, its predictions of their three-dimensional structure are poor. In the case of metazoan species, the number of singletons rarely exceeds 1000 for the species closest to the reference system (divergence times below 75 Myr). Interestingly, in the cases of Viridiplantae and fungi, larger numbers of singletons are found for such species, as if the timescale on which singletons are added to proteomes were different in Metazoa and in other eukaryotic kingdoms. Further studies of proteomes closer to those of the reference system are, however, needed to confirm this phenomenon.
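A singleton, as defined in the abstract, is a protein with no known homologue in its own proteome. One way to operationalize that, sketched here under stated assumptions, is an all-against-all similarity search within a proteome, keeping the proteins whose only significant hits are to themselves. The sketch assumes BLAST+ default tabular output (`-outfmt 6`, whose 11th column is the E-value) and an arbitrary E-value cutoff; the abstract does not say which search tool or threshold was used, so both are placeholders, and `find_singletons` is a hypothetical helper.

```python
def find_singletons(protein_ids: set[str], blast_tab_lines: list[str], max_evalue: float = 1e-3) -> set[str]:
    """Return proteins with no significant hit to any *other* protein of the same proteome.

    Expects BLAST+ default tabular lines (-outfmt 6): column 1 = query ID,
    column 2 = subject ID, column 11 = E-value.
    """
    has_homologue = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:
            # A significant non-self hit makes both proteins non-singletons.
            has_homologue.update((query, subject))
    return protein_ids - has_homologue
```

Self-hits are ignored on purpose, since every protein trivially matches itself in an all-against-all search.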
Affiliation(s)
- Yves-Henri Sanejouand
- US2B, UMR 6286 of CNRS, Nantes University, rue de la Houssinière, 44322, Nantes, France.
7
Li S, Hannenhalli S, Ovcharenko I. De novo human brain enhancers created by single-nucleotide mutations. Sci Adv 2023; 9:eadd2911. [PMID: 36791193 PMCID: PMC9931207 DOI: 10.1126/sciadv.add2911] [Received: 06/01/2022] [Accepted: 01/12/2023] [Indexed: 05/30/2023]
Abstract
Advanced human cognition is attributed to increased neocortex size and complexity, but the underlying evolutionary and regulatory mechanisms are largely unknown. Using human and macaque embryonic neocortical H3K27ac data coupled with a deep learning model of enhancers, we identified ~4000 enhancer gains in humans, which, per our model, can often be attributed to single-nucleotide essential mutations. Our analyses suggest that functional gains in embryonic brain development are associated with de novo enhancers whose putative target genes exhibit increased expression in progenitor cells and interneurons and partake in critical neural developmental processes. Essential mutations alter enhancer activity through altered binding of key transcription factors (TFs) of embryonic neocortex, including ISL1, POU3F2, PITX1/2, and several SOX TFs, and are associated with central nervous system disorders. Overall, our results suggest that essential mutations lead to gain of embryonic neocortex enhancers, which orchestrate expression of genes involved in critical developmental processes associated with human cognition.
Affiliation(s)
- Shan Li
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sridhar Hannenhalli
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
8
Moreyra NN, Almeida FC, Allan C, Frankel N, Matzkin LM, Hasson E. Phylogenomics provides insights into the evolution of cactophily and host plant shifts in Drosophila. Mol Phylogenet Evol 2023; 178:107653. [PMID: 36404461 DOI: 10.1016/j.ympev.2022.107653] [Received: 06/16/2022] [Revised: 09/30/2022] [Accepted: 10/25/2022] [Indexed: 11/06/2022]
Abstract
Cactophilic species of the Drosophila buzzatii cluster (repleta group) are an excellent model group for investigating genomic changes underlying adaptation to extreme climate conditions and host plants. In particular, these species form a tractable system to study the transition from chemically simpler breeding sites (like prickly pears of the genus Opuntia) to chemically more complex hosts (columnar cacti). Here, we report four highly contiguous genome assemblies of three species of the buzzatii cluster. Based on these genomic data and inferred phylogenetic relationships, we identified candidate taxonomically restricted genes (TRGs) likely involved in the evolution of cactophily and cactus host specialization. Functional enrichment analyses of TRGs within the buzzatii cluster identified genes involved in detoxification, water preservation, immune system response, anatomical structure development, and morphogenesis. In contrast, in the species that dwell in columnar cacti, TRGs were associated with processes that regulate responses to stress, as well as with the metabolism of nitrogen compounds, transport, and secretion. These findings are in line with the hypothesis that these genomic changes brought about key mechanisms underlying the adaptation of the buzzatii cluster species to arid regions in South America.
Affiliation(s)
- Nicolás Nahuel Moreyra
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
- Francisca Cunha Almeida
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
- Carson Allan
- Department of Entomology, University of Arizona, Tucson, AZ 85719, USA.
- Nicolás Frankel
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
- Esteban Hasson
- Departamento de Ecología, Genética y Evolución (EGE), Facultad de Ciencias Exactas y Naturales (FCEyN), Universidad de Buenos Aires (UBA), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires C1428EGA, Argentina.
9
Zhang N, Li Y, Halanych KM, Kong L, Li Q. A comparative analysis of mitochondrial ORFs provides new insights on expansion of mitochondrial genome size in Arcidae. BMC Genomics 2022; 23:809. [PMID: 36474182 PMCID: PMC9727918 DOI: 10.1186/s12864-022-09040-3] [Received: 06/22/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Arcidae, comprising about 260 species of ark shells, is an ecologically and economically important lineage of bivalve mollusks. Interestingly, mitochondrial genomes of several Arcidae species are 2-3 times larger than those of most bilaterians and are among the largest bilaterian mitochondrial genomes reported to date. The large mitochondrial genome size is mainly due to the expansion of unassigned regions (regions that are functionally unassigned). Previous work on unassigned regions of Arcidae mtDNA genomes has focused on nucleotide-level analyses to observe sequence characteristics; however, the origin of the expansion remains unclear. RESULTS We assembled six new mitogenomes and sequenced six transcriptomes of Scapharca broughtonii to identify conserved functional ORFs that are transcribed in unassigned regions. Sixteen lineage-specific ORFs with different copy numbers were identified from seven Arcidae species, and 11 of the 16 ORFs were expressed and likely biologically active. Unassigned regions of 32 Arcidae mitogenomes were compared to verify the presence and distribution of these novel mitochondrial ORFs. Strikingly, multiple structural analyses and functional predictions suggested that these additional mtDNA-encoded proteins have potential functional significance. In addition, our results revealed that the ORFs are strongly connected to the expansion of Arcidae mitochondrial genomes and that their large-scale duplication plays an important role in multiple expansion events. We discuss the possible origin of the ORFs and hypothesize that they may originate from duplication of mitochondrial genes. CONCLUSIONS The presence of lineage-specific mitochondrial ORFs with transcriptional activity and potential functional significance points to novel features of Arcidae mitochondrial genomes. Given our observations and analyses, these ORFs may be products of mitochondrial gene duplication. These findings shed light on the origin and function of novel mitochondrial genes in bivalves and provide new insights into the evolution of mitochondrial genome size in metazoans.
Affiliation(s)
- Ning Zhang
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China
- Yuanning Li
- Shandong University, Qingdao, China
- Kenneth M. Halanych
- Center for Marine Science, University of North Carolina Wilmington, Wilmington, NC 28409 USA
- Lingfeng Kong
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Qi Li
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China; Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
10
Jiang M, Li X, Dong X, Zu Y, Zhan Z, Piao Z, Lang H. Research Advances and Prospects of Orphan Genes in Plants. Front Plant Sci 2022; 13:947129. [PMID: 35874010 PMCID: PMC9305701 DOI: 10.3389/fpls.2022.947129] [Received: 05/18/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]
Abstract
Orphan genes (OGs) are defined as genes having no sequence similarity with genes present in other lineages. OGs are thought to play a key role in the development of lineage-specific adaptations and can also serve as a constant source of evolutionary novelty. These genes are often associated with stress responses, species-specific traits, and distinctive expression regulation, and they also participate in primary metabolism. Advances in sequencing tools and genome analysis methods have made the identification and characterization of OGs comparatively easier, and significant progress has been made in studying OG functions in plants. We review recent advances in the fast-evolving characteristics, expression modulation, and functional analysis of OGs, with a focus on their role in plant biology. We also highlight current challenges and adoptable strategies, and discuss possible future directions for the functional study of OGs.
Affiliation(s)
- Mingliang Jiang
- School of Agriculture, Jilin Agricultural Science and Technology College, Jilin, China
- Xiaonan Li
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Xiangshu Dong
- School of Agriculture, Yunnan University, Kunming, China
- Ye Zu
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Zongxiang Zhan
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Zhongyun Piao
- College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Hong Lang
- School of Agriculture, Jilin Agricultural Science and Technology College, Jilin, China
11
Raxwal VK, Singh S, Agarwal M, Riha K. Transcriptional and post-transcriptional regulation of young genes in plants. BMC Biol 2022; 20:134. [PMID: 35676681 PMCID: PMC9178820 DOI: 10.1186/s12915-022-01339-7] [Received: 04/29/2022] [Accepted: 05/30/2022] [Indexed: 12/03/2022]
Abstract
Background New genes continuously emerge from non-coding DNA or by diverging from existing genes, but most of them are rapidly lost and only a few become fixed within the population. We hypothesized that young genes are subject to transcriptional and post-transcriptional regulation to limit their expression and minimize their exposure to purifying selection. Results We performed a protein-based homology search across the tree of life to determine the evolutionary age of protein-coding genes present in the rice genome. We found that young genes in rice have relatively low expression levels, which can be attributed to distal enhancers, and closed chromatin conformation at their transcription start sites (TSS). The chromatin in TSS regions can be re-modeled in response to abiotic stress, indicating conditional expression of young genes. Furthermore, transcripts of young genes in Arabidopsis tend to be targeted by nonsense-mediated RNA decay, presenting another layer of regulation limiting their expression. Conclusions These data suggest that transcriptional and post-transcriptional mechanisms contribute to the conditional expression of young genes, which may alleviate purging selection while providing an opportunity for phenotypic exposure and functionalization.
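The abstract dates rice genes by a protein homology search across the tree of life. A common way to turn such hits into an age, phylostratigraphy, is sketched below: each gene is assigned to the oldest phylostratum in which a homologue is detected, and genes without detectable homologues outside the focal lineage fall into the youngest stratum. The strata names, the taxon-to-stratum mapping and the hit sets are hypothetical inputs; the abstract does not specify the strata or search settings actually used.

```python
# Hypothetical phylostrata for a rice gene, ordered from oldest (index 0) to youngest.
STRATA = ["cellular_organisms", "Eukaryota", "Viridiplantae", "Poaceae", "Oryza"]


def gene_age(hit_taxa: set[str], taxon_to_stratum: dict[str, str]) -> str:
    """Assign a gene to the oldest stratum containing at least one homologue.

    taxon_to_stratum maps each searched species to the stratum in which it
    shares its most recent common ancestor with the focal species. Genes with
    no mapped hit fall into the youngest stratum.
    """
    ages = [STRATA.index(taxon_to_stratum[t]) for t in hit_taxa if t in taxon_to_stratum]
    return STRATA[min(ages)] if ages else STRATA[-1]


if __name__ == "__main__":
    mapping = {
        "E_coli": "cellular_organisms",
        "S_cerevisiae": "Eukaryota",
        "A_thaliana": "Viridiplantae",
        "T_aestivum": "Poaceae",
        "O_glaberrima": "Oryza",
    }
    print(gene_age({"A_thaliana", "T_aestivum"}, mapping))
```

Because the oldest hit wins, a single spurious distant hit can make a young gene look old, while an undetected homologue can make an old gene look young; both are known caveats of homology-based dating.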
Affiliation(s)
- Vivek Kumar Raxwal
- Department of Botany, University of Delhi, Delhi, 110007, India; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czech Republic.
- Somya Singh
- Department of Botany, University of Delhi, Delhi, 110007, India
- Manu Agarwal
- Department of Botany, University of Delhi, Delhi, 110007, India.
- Karel Riha
- Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czech Republic.
12
Chen Y, Wang D, Li N, Wang D, Liu XH, Song Y. Accelerated evolution of Vkorc1 in desert rodent species reveals genetic preadaptation to anticoagulant rodenticides. Pest Manag Sci 2022; 78:2704-2713. [PMID: 35394111 DOI: 10.1002/ps.6905] [Received: 07/16/2021] [Revised: 03/23/2022] [Accepted: 04/08/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Some rodent species living in arid areas show elevated physiological tolerance to anti-vitamin K rodenticides (AVKs), which seems to be due to unknown selective pressures that rodents may experience in desert habitats. Genes involved in the γ-carboxylation of blood coagulation factors, including vitamin K epoxide reductase complex subunit 1 (Vkorc1), γ-glutamyl carboxylase (Ggcx) and NAD(P)H quinone dehydrogenase 1 (Nqo1), are associated with anticoagulant resistance, or some level of elevated tolerance, in rodents. To detect whether the DNA sequences of these three genes are also under natural selection in desert rodent species, we analyzed the Vkorc1, Ggcx and Nqo1 genes of desert rodents and compared them with those of other rodent species. RESULTS We found an accelerated evolutionary rate in Vkorc1 of desert rodents, especially in Mus spretus, Nannospalax galili and Psammomys obesus. By contrast, signals of positive selection were absent for Ggcx and Nqo1 in all species. Mapping the amino acid variations onto a three-dimensional model of the VKORC1 protein suggested that most interspecific amino acid variations occur on the outer surface of the VKORC1 pocket, whereas most intraspecific amino acid changes and known AVK resistance mutations occur on the inner surface and in the endoplasmic reticulum luminal loop regions. Some desert-species-specific amino acid variations were found at positions where known resistance mutations occur, indicating that these variations might be related to the elevated physiological tolerance to AVKs in desert rodents. CONCLUSION The evolution of Vkorc1 has been accelerated in some desert rodent species, indicating genetic preadaptation to anticoagulant rodenticides. Positive selection and relaxed selection have been detected in Psammomys obesus and Nannospalax galili, indicating that these two rodent species might also show tolerance to AVKs, which needs further verification.
Affiliation(s)
- Yan Chen
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Dawei Wang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Ning Li
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Deng Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, China
- Xiao-Hui Liu
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Ying Song
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
13
Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022; 29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/29/2022]
Abstract
Natural selection has received considerable attention because it relates to the adaptation of populations to their biotic and abiotic environments. An allele is selected when it is favored by natural selection; the favored allele then increases in frequency in the population and variation at neighboring linked sites diminishes, producing so-called selective sweeps. High-throughput genome sequencing makes it possible to disentangle the evolutionary forces at play in populations and has made these selective sweeps (selection signatures) easier to detect. Various methods can be used to detect selective sweeps, ranging from simple implementations based on summary statistics to complex statistical approaches. An important problem with these statistical models is that they can give inaccurate results when their assumptions are violated. Machine learning (ML) has been introduced in population genetics as an alternative way of detecting selection, treating the detection of selection signatures as a classification problem. As population genomics data become increasingly available, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can aid in detecting and studying patterns of natural selection using population genomic data.
Affiliation(s)
- Harshit Kumar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Manjit Panigrahi
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Anuradha Panwar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Divya Rajawat
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Sonali Sonejita Nayak
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- K A Saravanan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Kaiho Kaisa
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Subhashree Parida
- Divisions of Pharmacology and Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Bharat Bhushan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
- Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, India
14
Soni V, Eyre-Walker A. OUP accepted manuscript. Genome Biol Evol 2022; 14:6528851. [PMID: 35166775 PMCID: PMC8882387 DOI: 10.1093/gbe/evac028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/09/2022] [Indexed: 12/05/2022]
Abstract
The rate of amino acid substitution has been shown to be correlated with a number of factors, including the rate of recombination, the age of the gene, the length of the protein, mean expression level, and gene function. However, the extent to which these correlations are due to adaptive and nonadaptive evolution has not been studied in detail, at least not in hominids. We find that the rate of adaptive evolution is significantly positively correlated with the rate of recombination, protein length and gene expression level, and negatively correlated with gene age. These correlations remain significant when each factor is controlled for in turn, except when controlling for expression in an analysis of protein length; they also generally remain significant when biased gene conversion is taken into account. However, the positive correlations could be an artifact of population size contraction. We also find that the rate of nonadaptive evolution is negatively correlated with each factor, and all these correlations survive controlling for each other and for biased gene conversion. Finally, we examine the effect of gene function on rates of adaptive and nonadaptive evolution; we confirm that virus-interacting proteins (VIPs) have higher rates of adaptive and lower rates of nonadaptive evolution, but we also demonstrate that there is significant variation in the rates of adaptive and nonadaptive evolution among GO categories when VIPs are removed. We estimate that the VIP/non-VIP axis explains about 5- to 8-fold more of the variance in evolutionary rate than GO categories do.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Corresponding author
15
Zhao Z, Ma D. Genome-Wide Identification, Characterization and Function Analysis of Lineage-Specific Genes in the Tea Plant Camellia sinensis. Front Genet 2021; 12:770570. [PMID: 34858483 PMCID: PMC8631334 DOI: 10.3389/fgene.2021.770570] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/04/2021] [Accepted: 10/14/2021] [Indexed: 11/22/2022]
Abstract
Genes that have no homologous sequences in other species, called lineage-specific genes (LSGs), are common in living organisms and play an important role in the generation of new functions, adaptive evolution and phenotypic change. Camellia sinensis var. sinensis (CSS) is one of the most widely distributed cultivars for quality green tea production. The catechins abundant in tea have antioxidant, free-radical-scavenging, fat-reducing and cancer-preventive potential. To further understand the evolution and exploit the function of LSGs in tea, we used a comparative genomics approach to identify Camellia-specific genes (CSGs). We identified 1701 CSGs specific to CSS, accounting for 3.37% of all protein-coding genes. The majority of CSGs (57.08%) arose by gene duplication, and the timing of these duplications coincides with the two whole-genome duplication (WGD) events in the CSS genome. Gene structure analysis revealed that CSGs have shorter gene lengths, fewer exons, higher GC content and higher isoelectric points. Gene expression analysis showed that CSGs have more tissue-specific expression than evolutionarily conserved genes (ECs). Weighted gene co-expression network analysis (WGCNA) showed that 18 CSGs are mainly associated with catechin synthesis-related pathways, including phenylalanine biosynthesis, biosynthesis of amino acids, the pentose phosphate pathway, photosynthesis and carbon metabolism. In addition, the expression of three CSGs (CSS0030246, CSS0002298, and CSS0030939) was significantly down-regulated in response to both salt and drought stress. Our study is the first to systematically identify LSGs in CSS and to comprehensively analyze the features and potential functions of CSGs.
We also identified key candidate genes, which will provide valuable assistance for further studies on catechin synthesis and a molecular basis for exploring elite germplasm resources.
Affiliation(s)
- Zhizhu Zhao
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
- Dongna Ma
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
16
Yousaf A, Liu J, Ye S, Chen H. Current Progress in Evolutionary Comparative Genomics of Great Apes. Front Genet 2021; 12:657468. [PMID: 34456962 PMCID: PMC8385753 DOI: 10.3389/fgene.2021.657468] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/22/2021] [Accepted: 07/15/2021] [Indexed: 12/04/2022]
Abstract
The availability of high-quality genome sequences of great ape species provides unprecedented opportunities for genomic analyses. Herein, we review recent progress in evolutionary comparative genomic studies of the extant great ape species, including human, chimpanzee, bonobo, gorilla, and orangutan. We discuss discoveries concerning the evolutionary history, natural selection, structural variations, and new genes of these species, which are informative for understanding the origin of human-specific phenotypes.
Affiliation(s)
- Aisha Yousaf
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Junfeng Liu
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China
- Sicheng Ye
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Hua Chen
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
17
Cancer, Retrogenes, and Evolution. Life (Basel) 2021; 11:life11010072. [PMID: 33478113 PMCID: PMC7835786 DOI: 10.3390/life11010072] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/19/2020] [Revised: 01/14/2021] [Accepted: 01/15/2021] [Indexed: 12/18/2022]
Abstract
This review summarizes current knowledge about retrogenes in the context of cancer and evolution. Retroposition, in which processed mRNA from parental genes is reverse transcribed and the resulting cDNA is integrated back into the genome, produces additional copies of existing genes. Contrary to the initial misconception, retroposition-derived copies can become functional, and because of their role in the molecular evolution of genomes they have been called the “seeds of evolution”. It is likely that retrogenes, as important elements in the evolution of species, also take part in the evolution of neoplastic tumors at the cell and species levels. Specific “resistance mechanisms” to neoplastic transformation have been noted in some species, and this phenomenon has been linked to additional gene copies, including retrogenes. The role of retrogenes in the evolution of tumors has also been described. Retrogene expression correlates with the occurrence of specific cancer subtypes, their stages, and their response to therapy. Phylogenetic analyses of retrogenes show that most cancer-related retrocopies arose in the primate lineage, and the number of identified cancer-related retrogenes indicates that these duplicates are important players in human carcinogenesis.
18
Nguyen P, Hess K, Smulders L, Le D, Briseno C, Chavez CM, Nikolaidis N. Origin and Evolution of the Human Bcl2-Associated Athanogene-1 (BAG-1). Int J Mol Sci 2020; 21:ijms21249701. [PMID: 33353252 PMCID: PMC7766421 DOI: 10.3390/ijms21249701] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/20/2020] [Revised: 12/13/2020] [Accepted: 12/17/2020] [Indexed: 02/07/2023]
Abstract
Molecular chaperones, particularly the 70-kDa heat shock proteins (Hsp70s), are key orchestrators of the cellular stress response. To perform their critical functions, Hsp70s require specific co-chaperones, which include nucleotide exchange factors containing the BCL2-associated athanogene (BAG) domain. BAG-1 is one of these proteins and functions in a wide range of cellular processes, including apoptosis, protein refolding and degradation, and tumorigenesis. However, the origin of BAG-1 proteins and their evolution between and within species are mostly uncharacterized. This report investigated the macro- and micro-evolution of BAG-1 using orthologous sequences and single nucleotide polymorphisms (SNPs) to elucidate its evolution and understand how natural variation affects the cellular stress response. We first collected and analyzed several BAG-1 sequences across animals, plants, and fungi; mapped intron positions and phases; reconstructed the phylogeny; and analyzed protein characteristics. These data indicated that BAG-1 originated before the split of animals, plants, and fungi, yet most extant fungal species have lost BAG-1. Furthermore, although BAG-1's structure has remained relatively conserved, kingdom-specific conserved differences exist at sites of known function, suggesting functional specialization within each kingdom. We then analyzed SNPs from the 1000 Genomes Project to determine evolutionary patterns within humans. These analyses revealed that SNP density is unevenly distributed within the BAG1 gene and that the ratio of non-synonymous to synonymous SNPs is significantly higher than 1 in the BAG domain region, an indication of positive selection. To explore this further, we performed several biochemical assays and found that only one of the five mutations tested altered the major co-chaperone properties of BAG-1.
These data collectively suggest that although the co-chaperone functions of BAG-1 are highly conserved and can probably tolerate several radical mutations, BAG-1 might have acquired specialized and potentially unexplored functions during the evolutionary process.
Affiliation(s)
- Peter Nguyen
- Center for Applied Biotechnology Studies, and Center for Computational and Applied Mathematics, Department of Biological Science, College of Natural Sciences and Mathematics, California State University Fullerton, Fullerton, CA 92834-6850, USA
- Kyle Hess
- Department of Genome Sciences, Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98195, USA
- Larissa Smulders
- Center for Applied Biotechnology Studies, and Center for Computational and Applied Mathematics, Department of Biological Science, College of Natural Sciences and Mathematics, California State University Fullerton, Fullerton, CA 92834-6850, USA
- Dat Le
- Center for Applied Biotechnology Studies, and Center for Computational and Applied Mathematics, Department of Biological Science, College of Natural Sciences and Mathematics, California State University Fullerton, Fullerton, CA 92834-6850, USA
- Carolina Briseno
- Center for Applied Biotechnology Studies, and Center for Computational and Applied Mathematics, Department of Biological Science, College of Natural Sciences and Mathematics, California State University Fullerton, Fullerton, CA 92834-6850, USA
- Christina M. Chavez
- Center for Applied Biotechnology Studies, and Center for Computational and Applied Mathematics, Department of Biological Science, College of Natural Sciences and Mathematics, California State University Fullerton, Fullerton, CA 92834-6850, USA
- Nikolas Nikolaidis
- Center for Applied Biotechnology Studies, and Center for Computational and Applied Mathematics, Department of Biological Science, College of Natural Sciences and Mathematics, California State University Fullerton, Fullerton, CA 92834-6850, USA
- Correspondence: Tel.: +1-657-278-4526
19
Wang M, Wang D, Yu J, Huang S. Enrichment in conservative amino acid changes among fixed and standing missense variations in slowly evolving proteins. PeerJ 2020; 8:e9983. [PMID: 32995099 PMCID: PMC7501800 DOI: 10.7717/peerj.9983] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Accepted: 08/27/2020] [Indexed: 11/20/2022]
Abstract
The process of molecular evolution has many elements that are not yet fully understood. Evolutionary rates are known to vary among protein-coding and noncoding DNA, and the neutral theory of molecular evolution assumes that most observed changes in amino acid or nucleotide sequences are non-adaptive. However, it remains unclear whether fixed and standing missense changes in slowly evolving proteins are more or less neutral than those in fast-evolving genes. Here, using evolutionary rates inferred from identity scores between orthologs in human and rhesus macaque (Macaca mulatta), we found that the fraction of conservative substitutions between species was significantly higher in slowly evolving proteins. Similar results were obtained with four different methods of scoring conservative substitutions, including three that remove the impact of substitution probability, given that conservative changes require fewer mutations. We also examined single nucleotide polymorphisms (SNPs) from the 1000 Genomes Project and found that missense SNPs in slowly evolving proteins also had a higher fraction of conservative changes, especially among common SNPs, consistent with more non-conservative substitutions, and hence stronger natural selection on SNPs (particularly rare ones), in fast-evolving proteins. These results suggest that fixed and standing missense variants in slowly evolving proteins are more likely to be neutral.
Affiliation(s)
- Mingrui Wang
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, P.R. China
- Dapeng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, Beijing, P.R. China
- Current affiliation: LeedsOmics, University of Leeds, Leeds, UK
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, Beijing, P.R. China
- Shi Huang
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, P.R. China
20
Dapper AL, Wade MJ. Relaxed Selection and the Rapid Evolution of Reproductive Genes. Trends Genet 2020; 36:640-649. [PMID: 32713599 DOI: 10.1016/j.tig.2020.06.014] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 04/29/2020] [Revised: 06/22/2020] [Accepted: 06/23/2020] [Indexed: 10/23/2022]
Abstract
Evolutionary genomic studies find that reproductive protein genes, those directly involved in reproductive processes, diversify more rapidly than most other gene categories. Strong postcopulatory sexual selection acting within species is the predominant hypothesis proposed to account for the observed pattern. Recently, relaxed selection due to sex-specific gene expression has also been put forward to explain the relatively rapid diversification. We contend that relaxed selection due to sex-limited gene expression is the correct null model for tests of molecular evolution of reproductive genes and argue that it may play a more significant role in the evolutionary diversification of reproductive genes than previously recognized. We advocate for a re-evaluation of adaptive explanations for the rapid diversification of reproductive genes.
Affiliation(s)
- Amy L Dapper
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA; Department of Biology, Indiana University, Bloomington, IN 47401, USA.
- Michael J Wade
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
21
Mohamed SK, Nounu A, Nováček V. Biological applications of knowledge graph embedding models. Brief Bioinform 2020; 22:1679-1693. [PMID: 32065227 DOI: 10.1093/bib/bbaa012] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/18/2019] [Revised: 01/10/2020] [Accepted: 01/21/2020] [Indexed: 01/04/2023]
Abstract
Complex biological systems are traditionally modelled as graphs of interconnected biological entities. These graphs, i.e. biological knowledge graphs, are then processed using graph exploratory approaches to perform different types of analytical and predictive tasks. Despite their high predictive accuracy, these approaches have limited scalability because they depend on time-consuming path exploratory procedures. In recent years, owing to rapid advances in computational technologies, new approaches for modelling graphs and mining them with high accuracy and scalability have emerged. These approaches, i.e. knowledge graph embedding (KGE) models, operate by learning low-rank vector representations of graph nodes and edges that preserve the graph's inherent structure. They have been used to analyse knowledge graphs from different domains, where they showed superior performance and accuracy compared with previous graph exploratory approaches. In this work, we study this class of models in the context of biological knowledge graphs and their different applications. We then show how KGE models can be a natural fit for representing complex biological knowledge modelled as graphs. We also discuss their predictive and analytical capabilities in different biology applications. In this regard, we present two example case studies that demonstrate the capabilities of KGE models: prediction of drug-target interactions and of polypharmacy side effects. Finally, we analyse different practical considerations for KGEs, and we discuss possible opportunities and challenges related to adopting them for modelling biological systems.
Affiliation(s)
- Aayah Nounu
- Insight Centre for Data Analytics, NUI Galway, Galway, Ireland
- Vít Nováček
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
22
Mohamed SK. Predicting tissue-specific protein functions using multi-part tensor decomposition. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.08.061] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/02/2023]
23
Rödelsperger C, Prabh N, Sommer RJ. New Gene Origin and Deep Taxon Phylogenomics: Opportunities and Challenges. Trends Genet 2019; 35:914-922. [DOI: 10.1016/j.tig.2019.08.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/27/2019] [Revised: 08/07/2019] [Accepted: 08/29/2019] [Indexed: 01/22/2023]
24
Yin H, Li M, Xia L, He C, Zhang Z. Computational determination of gene age and characterization of evolutionary dynamics in human. Brief Bioinform 2019; 20:2141-2149. [PMID: 30184145 DOI: 10.1093/bib/bby074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/14/2018] [Revised: 08/01/2018] [Accepted: 08/02/2018] [Indexed: 12/23/2022]
Abstract
Genes originate at different evolutionary time scales and thus have different ages, accordingly presenting diverse functional characteristics and reflecting distinct adaptive evolutionary innovations. In past decades, progress has been made in gene age identification by a variety of methods, principally based on comparative genomics. Here we summarize methods for computational determination of gene age and evaluate the effectiveness of different computational methods for age identification. Our results show that age determination can be improved by combining homolog clustering with phylogeny inference, which enables more accurate age identification for human genes. Accordingly, we characterize the evolutionary dynamics of human genes over an extremely long evolutionary time scale spanning ~4,000 million years from archaea/bacteria to human, revealing that young genes are clustered on certain chromosomes and that Mendelian disease genes (including monogenic and polygenic disease genes) and cancer genes have divergent evolutionary origins. Taken together, deciphering gene ages and evolutionary dynamics is of fundamental significance for unveiling the underlying mechanisms of evolution and for better understanding how young or new genes become indispensable components coupled with novel phenotypes and biological diversity.
Affiliation(s)
- Hongyan Yin
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, Institute of Tropical Agriculture and Forestry, Hainan University, China
- Mengwei Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Lin Xia
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Chaozu He
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, Institute of Tropical Agriculture and Forestry, Hainan University, China
- Zhang Zhang
- BIG Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
25
Gupta MK, Vadde R. Genetic Basis of Adaptation and Maladaptation via Balancing Selection. Zoology 2019; 136:125693. [PMID: 31513936 DOI: 10.1016/j.zool.2019.125693] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2019] [Accepted: 07/03/2019] [Indexed: 10/26/2022]
26
Liu J, Robinson-Rechavi M. Adaptive Evolution of Animal Proteins over Development: Support for the Darwin Selection Opportunity Hypothesis of Evo-Devo. Mol Biol Evol 2019; 35:2862-2872. [PMID: 30184095 PMCID: PMC6278863 DOI: 10.1093/molbev/msy175] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/19/2023]
Abstract
A driving hypothesis of evolutionary developmental biology is that animal morphological diversity is shaped both by adaptation and by developmental constraints. Here, we have tested Darwin’s “selection opportunity” hypothesis, according to which high evolutionary divergence in late development is due to strong positive selection. We contrasted it with a “developmental constraint” hypothesis, according to which late development is under relaxed negative selection. Indeed, the highest divergence between species, both at the morphological and molecular levels, is observed late in embryogenesis and postembryonically. To distinguish between the adaptation and relaxation hypotheses, we investigated evidence of positive selection on protein-coding genes in relation to their expression over development in the fly Drosophila melanogaster, the zebrafish Danio rerio, and the mouse Mus musculus. First, we found that genes specifically expressed in late development have stronger signals of positive selection. Second, over the full transcriptome, genes with evidence for positive selection tend to be expressed in late development. Finally, genes involved in pathways with cumulative evidence of positive selection have higher expression in late development. Overall, there is a consistent signal that positive selection mainly affects genes and pathways expressed in late embryonic development and in adults. Our results imply that the evolution of embryogenesis is mostly conservative, with most adaptive evolution affecting some stages of postembryonic gene expression, and thus postembryonic phenotypes. This is consistent with the diversity of environmental challenges to which juveniles and adults are exposed.
Affiliation(s)
- Jialin Liu
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
27
Dussert Y, Mazet ID, Couture C, Gouzy J, Piron MC, Kuchly C, Bouchez O, Rispe C, Mestre P, Delmotte F. A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes. Genome Biol Evol 2019; 11:954-969. [PMID: 30847481 PMCID: PMC6660063 DOI: 10.1093/gbe/evz048] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Accepted: 03/05/2019] [Indexed: 02/06/2023]
Abstract
Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds for a N50 of 706.5 kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant–pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species.
Affiliation(s)
- Yann Dussert
- SAVE, INRA, Bordeaux Sciences Agro, Villenave d'Ornon, France
- Carole Couture
- SAVE, INRA, Bordeaux Sciences Agro, Villenave d'Ornon, France
- Jérôme Gouzy
- LIPM, INRA, Université de Toulouse, CNRS, Castanet-Tolosan, France
- Claire Kuchly
- US 1426 GeT-PlaGe, Genotoul, INRA, Castanet-Tolosan, France
- Pere Mestre
- SVQV, INRA, Université de Strasbourg, Colmar, France

28
Dussert Y, Mazet ID, Couture C, Gouzy J, Piron MC, Kuchly C, Bouchez O, Rispe C, Mestre P, Delmotte F. A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes. Genome Biol Evol 2019. [PMID: 30847481 DOI: 10.1101/350041]
Abstract
Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds with an N50 of 706.5 kb) owing to better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant-pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species.
Affiliation(s)
- Yann Dussert
- SAVE, INRA, Bordeaux Sciences Agro, Villenave d'Ornon, France
- Carole Couture
- SAVE, INRA, Bordeaux Sciences Agro, Villenave d'Ornon, France
- Jérôme Gouzy
- LIPM, INRA, Université de Toulouse, CNRS, Castanet-Tolosan, France
- Claire Kuchly
- US 1426 GeT-PlaGe, Genotoul, INRA, Castanet-Tolosan, France
- Pere Mestre
- SVQV, INRA, Université de Strasbourg, Colmar, France

29
Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, Coon JJ, Lafontaine I. A Molecular Portrait of De Novo Genes in Yeasts. Mol Biol Evol 2018; 35:631-645. [PMID: 29220506 DOI: 10.1093/molbev/msx315]
Abstract
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate into the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast-evolving sequences and erroneously annotated protein-coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, with only a few such genes ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions, where the probability of finding a fortuitous, transcribed ORF is highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
Affiliation(s)
- Nikolaos Vakirlis
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Alex S Hebert
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Dana A Opulente
- Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Guillaume Achaz
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; SMILE Group, CIRB UMR7241, Collège de France, Paris, France
- Chris Todd Hittinger
- DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Gilles Fischer
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Joshua J Coon
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI; Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI; Department of Chemistry, University of Wisconsin-Madison, Madison, WI; Morgridge Institute for Research, Madison, WI
- Ingrid Lafontaine
- Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Physico-Chimique, Physiologie Membranaire et Moléculaire du Chloroplaste UMR7141, 75005 Paris, France

30
Moyers BA, Zhang J. Toward Reducing Phylostratigraphic Errors and Biases. Genome Biol Evol 2018; 10:2037-2048. [PMID: 30060201 PMCID: PMC6105108 DOI: 10.1093/gbe/evy161] [Accepted: 07/28/2018]
Abstract
Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is more severe for genes with certain properties, creating spurious age distributions of these properties and of properties correlated with them. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and the reevaluation of previous phylostratigraphic findings.
Affiliation(s)
- Bryan A Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan

31
Yu L, Zhao J, Gao L. Predicting Potential Drugs for Breast Cancer based on miRNA and Tissue Specificity. Int J Biol Sci 2018; 14:971-982. [PMID: 29989066 PMCID: PMC6036744 DOI: 10.7150/ijbs.23350] [Received: 10/16/2017] [Accepted: 12/14/2017]
Abstract
Network-based computational methods, with their emphasis on biomolecular interactions and biological data integration, have succeeded in drug development and opened new directions, such as drug repositioning and drug combination. Drug repositioning, that is, finding new uses for existing drugs to treat more patients, offers time, cost and efficiency benefits in drug development, especially when in silico techniques are used. MicroRNAs (miRNAs) play important roles in multiple biological processes and have recently attracted much scientific attention. Moreover, cumulative studies demonstrate that mature miRNAs, as well as their precursors, can be targeted by small-molecule drugs. At the same time, human diseases result from the disordered interplay of tissue- and cell lineage-specific processes. However, few computational studies predict potential drug-disease relationships based on miRNA data and tissue specificity. Therefore, based on miRNA data and the tissue specificity of diseases, we propose a new method named miTS to predict potential treatments for diseases. First, based on miRNA data, target genes and information on FDA (Food and Drug Administration) approved drugs, we evaluate the relationships between miRNAs and drugs in the tissue-specific protein-protein interaction (PPI) network. Then, we construct a tripartite drug-miRNA-disease network. Finally, we obtain the potential drug-disease associations based on the tripartite network. In this paper, we take breast cancer as a case study and focus on the top 30 predicted drugs. Of these, 25 (83.3%) have known connections with breast cancer in the CTD (Comparative Toxicogenomics Database) benchmark, and the other 5 are potential new drugs for breast cancer. We further evaluate the 5 newly predicted drugs using clinical records, literature mining, KEGG pathway enrichment analysis and overlapping genes between enriched pathways.
For each of the 5 new drugs, strong supporting evidence can be found in three or more of these aspects. In particular, Regorafenib (DB08896) shares 15 overlapping KEGG pathways with breast cancer, all with very small p-values. In addition, in both literature curation and clinical validation, Regorafenib shows a strong correlation with breast cancer. Taken together, these findings suggest that Regorafenib is likely to be a truly effective drug worthy of further study. They also indicate that miTS is effective and practical for predicting new drug indications and may provide potential value for the treatment of complex diseases.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, P.R. China
- Jin Zhao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, P.R. China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, P.R. China

32
Banerjee S, Chakraborty S. Protein intrinsic disorder negatively associates with gene age in different eukaryotic lineages. Mol Biosyst 2017; 13:2044-2055. [PMID: 28783193 DOI: 10.1039/c7mb00230k]
Abstract
The emergence of new protein-coding genes in a specific lineage or species provides raw material for evolutionary adaptation. Until recently, the biology of new genes emerging from non-genic sequences in particular remained unexplored. Although new genes are subject to variable selection pressure and face rapid deletion, some of them become functional and are retained in the gene pool. To acquire functional novelties, new genes often become integrated into pre-existing ancestral networks. However, the mechanism by which young proteins acquire novel interactions remains unresolved to date. Because protein structure contributes greatly to the mode of proteins' physical interactions, we asked: do new genes encode proteins with stable folds? Addressing this question, we demonstrated that intrinsic disorder inversely correlates with evolutionary gene age, i.e., young proteins are richer in intrinsic disorder than ancient ones. We further noted that young proteins, which are initially poorly connected nodes, tend to be structurally more disordered than well-connected ancient proteins. This phenomenon strikingly defies the usual trend of well-connected proteins being highly disordered in structure. We argue that structural disorder might help poorly connected young proteins to undergo promiscuous interactions, which provides the foundation for novel protein interactions. The study focuses on the evolutionary perspectives of young proteins in the light of structural adaptations.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India.

33
Translation of neutrally evolving peptides provides a basis for de novo gene evolution. Nat Ecol Evol 2018; 2:890-896. [DOI: 10.1038/s41559-018-0506-6] [Received: 06/19/2017] [Accepted: 02/16/2018]

34
Carlson DE, Hedin M. Comparative transcriptomics of Entelegyne spiders (Araneae, Entelegynae), with emphasis on molecular evolution of orphan genes. PLoS One 2017; 12:e0174102. [PMID: 28379977 PMCID: PMC5381867 DOI: 10.1371/journal.pone.0174102] [Received: 10/15/2016] [Accepted: 03/04/2017]
Abstract
Next-generation sequencing technology is rapidly transforming the landscape of evolutionary biology, and has become a cost-effective and efficient means of collecting exome information for non-model organisms. Due to their taxonomic diversity, production of interesting venom and silk proteins, and the relative scarcity of existing genomic resources, spiders in particular are excellent targets for next-generation sequencing (NGS) methods. In this study, the transcriptomes of six entelegyne spider species from three genera (Cicurina travisae, C. vibora, Habronattus signatus, H. ustulatus, Nesticus bishopi, and N. cooperi) were sequenced and de novo assembled. Each assembly was assessed for quality and completeness and functionally annotated using gene ontology information. Approximately 100 transcripts with evidence of homology to venom proteins were discovered. After identifying more than 3,000 putatively orthologous genes across all six taxa, we used comparative analyses to identify 24 instances of positively selected genes. In addition, between ~550 and 1,100 unique orphan genes were found in each genus. These unique, uncharacterized genes exhibited elevated rates of amino acid substitution, potentially consistent with lineage-specific adaptive evolution. The data generated for this study represent a valuable resource for future phylogenetic and molecular evolutionary research, and our results provide new insight into the forces driving genome evolution in taxa that span the root of entelegyne spider phylogeny.
Affiliation(s)
- David E. Carlson
- Department of Biology, San Diego State University, San Diego, California, United States of America
- Department of Ecology & Evolution, Stony Brook University, Stony Brook, New York, United States of America
- Marshal Hedin
- Department of Biology, San Diego State University, San Diego, California, United States of America

35
Pang E, Hao Y, Sun Y, Lin K. Differential variation patterns between hubs and bottlenecks in human protein-protein interaction networks. BMC Evol Biol 2016; 16:260. [PMID: 27903259 PMCID: PMC5131443 DOI: 10.1186/s12862-016-0840-8] [Received: 11/22/2016] [Accepted: 11/25/2016]
Abstract
BACKGROUND The identification, description and understanding of protein-protein interaction networks are important in cell biology and medicine, especially for systems biology, where the focus is on the interaction of biomolecules. Hubs and bottlenecks are the important proteins of a protein interaction network. Until now, very little attention has been paid to differentiating these two protein groups. RESULTS By integrating human protein-protein interaction networks and human genome-wide variation across populations, we described the differences between hubs and bottlenecks in this study. Our findings showed that, as observed between species, hubs and bottlenecks changed significantly more slowly than non-hubs and non-bottlenecks. To distinguish hubs from bottlenecks, we extracted their distinctive members: hub-non-bottlenecks and non-hub-bottlenecks. The differences between these two groups represent the differences between hubs and bottlenecks. We found that the variation rate of hubs was significantly lower than that of bottlenecks. In addition, we verified that stronger constraint is exerted on hubs than on bottlenecks. We further observed fewer non-synonymous sites in the domains of hubs than in those of bottlenecks, as well as different molecular functions between them. CONCLUSIONS Based on these results, we conclude that in recent human history, hubs and bottlenecks in protein interaction networks have exhibited different variation patterns. By revealing the difference between hubs and bottlenecks, our results might provide further insights into the relationship between evolution and biological structure.
Affiliation(s)
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, No 19 Xinjiekouwai Street, Beijing, 100875, China; Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Yu Hao
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, No 19 Xinjiekouwai Street, Beijing, 100875, China; Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Ying Sun
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, No 19 Xinjiekouwai Street, Beijing, 100875, China; Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, No 19 Xinjiekouwai Street, Beijing, 100875, China; Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China

36
Lopes KDP, Campos-Laborie FJ, Vialle RA, Ortega JM, De Las Rivas J. Evolutionary hallmarks of the human proteome: chasing the age and coregulation of protein-coding genes. BMC Genomics 2016; 17:725. [PMID: 27801289 PMCID: PMC5088522 DOI: 10.1186/s12864-016-3062-y]
Abstract
Background The development of large-scale technologies for quantitative transcriptomics has enabled comprehensive analysis of gene expression profiles in complete genomes. RNA-Seq allows gene expression levels to be measured far more precisely and globally than previous methods. Studies using this technology are altering our view of the extent and complexity of eukaryotic transcriptomes. In this respect, multiple efforts have been made to determine and analyse the gene expression patterns of human cell types under different conditions, in both normal and pathological states. However, until recently, little had been reported about the evolutionary marks present in human protein-coding genes, particularly from the combined perspective of gene expression and protein evolution. Results We present a combined analysis of human protein-coding gene expression profiling and time-scale ancestry mapping that places the genes in taxonomic clades and reveals eight major evolutionary steps (“hallmarks”) that include clusters of functionally coherent proteins. The human expressed genes are analysed using an RNA-Seq dataset of 116 samples from 32 tissues. The evolutionary analysis of the human proteins combines information from: (i) a database of orthologous proteins (OMA), (ii) the taxonomy mapping of genes to lineage clades (from NCBI Taxonomy) and (iii) the evolutionary time-scale mapping provided by TimeTree (Timescale of Life). The human protein-coding genes are also placed in a relational context based on the construction of a robust gene coexpression network, which reveals tighter links between age-related protein-coding genes and finds functionally coherent gene modules. Conclusions Understanding the relational landscape of the human protein-coding genes is essential for interpreting the functional elements and modules of our active genome.
Moreover, decoding the evolutionary history of human genes can provide valuable information to uncover their origin and function.
Affiliation(s)
- Katia de Paiva Lopes
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain; Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil
- Francisco José Campos-Laborie
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain
- Ricardo Assunção Vialle
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil
- José Miguel Ortega
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brasil
- Javier De Las Rivas
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC), Salamanca, Spain

37
Abstract
As genes originate at different evolutionary times, they harbor distinctive genomic signatures of evolutionary ages. Although previous studies have investigated different gene age-related signatures, which signatures dominantly associate with gene age remains unresolved. Here we address this question via a combined approach of comprehensive assignment of gene ages, gene family identification, and multivariate analyses. We first provide a comprehensive and improved gene age assignment by combining homolog clustering with phylogeny inference and categorize human genes into 26 age classes spanning the whole tree of life. We then explore the dominant age-related signatures based on a collection of 10 potential signatures (including gene composition, gene length, selection pressure, expression level, connectivity in the protein–protein interaction network and DNA methylation). Our results show that GC content and connectivity in the protein–protein interaction network (PPIN) associate dominantly with gene age. Furthermore, we investigate the heterogeneity of dominant signatures in duplicates and singletons. We find that GC content is a consistent primary factor of gene age in duplicates and singletons, whereas PPIN connectivity is more strongly associated with gene age in singletons than in duplicates. Taken together, GC content and PPIN connectivity are two dominant signatures in close association with gene age, exhibiting heterogeneity in duplicates and singletons and presumably reflecting complex differential interplays between natural selection and mutation.
Affiliation(s)
- Hongyan Yin
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Guangyu Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Lina Ma
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China
- Soojin V Yi
- School of Biology, Georgia Institute of Technology, Atlanta
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

38
Saber MM, Adeyemi Babarinde I, Hettiarachchi N, Saitou N. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences. Genome Biol Evol 2016; 8:2076-2092. [PMID: 27289096 PMCID: PMC4987104 DOI: 10.1093/gbe/evw132]
Abstract
Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing availability of genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the most promising evolutionary candidates for driving clade-specific characters and phenotypes. On this premise, we analyzed whole-genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1 and LTR/ERVL retrotransposition and transversion. Using the genomic distance as a neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analyses of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions and in close proximity to genes involved in sensory perception of sound and developmental processes, and they also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some form of positive selection, after which purifying selection began to maintain these functions.
Affiliation(s)
- Morteza Mahmoudi Saber
- Department of Biological Sciences, Graduate School of Science, University of Tokyo; Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Isaac Adeyemi Babarinde
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan; Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Nilmini Hettiarachchi
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan; Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Naruya Saitou
- Department of Biological Sciences, Graduate School of Science, University of Tokyo; Division of Population Genetics, National Institute of Genetics, Mishima, Japan; Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan

39
Prabh N, Rödelsperger C. Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? BMC Bioinformatics 2016; 17:226. [PMID: 27245157 PMCID: PMC4888513 DOI: 10.1186/s12859-016-1102-x] [Received: 02/24/2016] [Accepted: 05/24/2016]
Abstract
Background Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real genes, genomic artifacts, or non-coding RNAs. Results Here, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is considered a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129 and 7997 (42–81%) of predicted orphan genes are expressed and 3818–7545 (39–76%) of orphan genes are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8–10% of orphan genes. Conclusions Our study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes.
Affiliation(s)
- Neel Prabh
- Department for Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
- Christian Rödelsperger
- Department for Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
40
Xu Y, Wu G, Hao B, Chen L, Deng X, Xu Q. Identification, characterization and expression analysis of lineage-specific genes within sweet orange (Citrus sinensis). BMC Genomics 2015; 16:995. [PMID: 26597278 PMCID: PMC4657247 DOI: 10.1186/s12864-015-2211-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/07/2015] [Accepted: 11/13/2015] [Indexed: 11/23/2022]
Abstract
Background: With the rapidly increasing number of available genome and transcriptome sequences, lineage-specific genes (LSGs) can now be identified and characterized. Like evolutionarily conserved genes, LSGs play important roles in biological function and evolution. Results: Two sets of citrus LSGs, 296 citrus-specific genes (CSGs) and 1039 orphan genes specific to sweet orange, were identified by comparing the sweet orange genome sequence with 41 genomes and 273 transcriptomes. For both sets, gene structure and gene expression patterns were investigated. On average, both the CSGs and the orphan genes have fewer exons, shorter gene length and higher GC content than evolutionarily conserved genes (ECs). Expression profiling indicated that most of the LSGs were expressed in various tissues of sweet orange, and some of them exhibited distinct temporal and spatial expression patterns. In particular, the orphan genes were preferentially expressed in callus, an important pluripotent tissue of citrus. In addition, some of the CSGs and orphan genes were expressed in response to abiotic stress, indicating potential functions in interactions with the environment. Conclusion: This study identified and characterized two sets of LSGs in citrus, dissected their sequence features and expression patterns, and provided valuable clues for future functional analysis of the LSGs in sweet orange. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2211-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yuantao Xu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, 430070, China.
- Guizhi Wu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, 430070, China.
- Baohai Hao
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan, 430070, China.
- Lingling Chen
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan, 430070, China.
- Xiuxin Deng
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, 430070, China.
- Qiang Xu
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), Huazhong Agricultural University, Wuhan, 430070, China.
41
Choi JH, Balasubramanian R, Lee PH, Shaw ND, Hall JE, Plummer L, Buck CL, Kottler ML, Jarzabek K, Wołczynski S, Quinton R, Latronico AC, Dode C, Ogata T, Kim HG, Layman LC, Gusella JF, Crowley WF. Expanding the Spectrum of Founder Mutations Causing Isolated Gonadotropin-Releasing Hormone Deficiency. J Clin Endocrinol Metab 2015; 100. [PMID: 26207952 PMCID: PMC4596034 DOI: 10.1210/jc.2015-2262] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 11/19/2022]
Abstract
CONTEXT: Loss-of-function (LoF) mutations in more than 20 genes are now known to cause isolated GnRH deficiency (IGD) in humans. Most causal IGD mutations are private, i.e., limited to a single individual or pedigree. Somewhat paradoxically, however, four IGD genes (GNRH1, TAC3, PROKR2, and GNRHR) have been shown to harbor LoF founder mutations that are shared by multiple unrelated individuals. It is not known whether similar founder mutations occur in other IGD genes. OBJECTIVE: The objective of the study was to determine whether shared deleterious mutations in IGD-associated genes represent founder alleles. SETTING: This study was an international collaboration among academic medical centers. METHODS: IGD patients with shared mutations in 14 IGD-associated genes, defined as mutations documented in three or more unrelated probands, were identified from various academic institutions, the Human Gene Mutation Database, and literature reports by other international investigators. Haplotypes of single-nucleotide polymorphisms and short tandem repeats surrounding the mutations were constructed to assess genetic ancestry. RESULTS: A total of eight founder mutations in five genes were identified: GNRHR (Q106R, R262Q, R139H), TACR3 (W275X), PROKR2 (R85H), FGFR1 (R250Q, G687R), and HS6ST1 (R382W). Most founder alleles were present at low frequency in the general population. The estimated ages of these mutant alleles ranged from 1,925 to 5,600 years, corresponding to the period of rapid human population expansion. CONCLUSIONS: We have expanded the spectrum of founder alleles associated with IGD to a total of eight founder mutations. In contrast to the approximately 9,000-year-old PROKR2 founder allele, which may confer a heterozygote advantage, the remaining founder alleles are of more recent origin, in keeping with the timing of recent human population expansion; any selective heterozygote advantage of these alleles requires further evaluation.
Affiliation(s)
- Jin-Ho Choi
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Ravikumar Balasubramanian
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Phil H Lee
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Natalie D Shaw
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Janet E Hall
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Lacey Plummer
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Cassandra L Buck
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Marie-Laure Kottler
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Katarzyna Jarzabek
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Sławomir Wołczynski
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Richard Quinton
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Ana Claudia Latronico
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Catherine Dode
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Tsutomu Ogata
- Harvard Reproductive Endocrine Sciences Center and Reproductive Endocrine Unit (J.-H.C., R.B., N.D.S., J.E.H., L.P., C.L.B., W.F.C.), and Department of Medicine, Psychiatric, and Neurodevelopmental Genetics Unit (P.H.L.), Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, and Center for Human Genetic Research (J.F.G.), Massachusetts General Hospital, and Department of Genetics, Harvard Medical School, Boston, Massachusetts Boston, Massachusetts 02114; Department of Genetics (M.-L.K.), University Hospital, Caen, 14003, Caen Cedex, France; Department of Biology and Pathology of Human Reproduction in Bialystok (K.J.), Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, and Department of Reproduction and Gynecological Endocrinology (S.W.), Medical University of Bialystok, Sklodowskiej 24A, 15-276 Bialystok, Poland; Institute for Genetic Medicine (R.Q.), Newcastle University, Newcastle-upon-Tyne, NE1 3BZ, United Kingdom; Disciplina de Endocrinologia (A.C.L.), Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, 05403-900 Sao Paulo, Brazil; Laboratoire de Biochimie et Génétique Moléculaire (C.D.), Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Université Paris-Descartes, 75014 Paris, France; Departments of Molecular Endocrinology and Pediatrics (T.O.), Hamamatsu University of School of Medicine, Hamamatsu 431-3192, Japan; Section of Reproductive Endocrinology, Infertility, and Genetics (H.-G.K., L.C.L.), Departments of Obstetrics and Gynecology and Neuroscience and Regenerative Medicine, Medical College of Georgia at Georgia Regents University, Augusta, Georgia 30912; and Department of Pediatrics (J.-H.C.), Asan Medical Center Children's Hospital, University of Ulsan College of Medicine, Seoul 138-736, Republic of Korea
| | - Hyung-Goo Kim
| | - Lawrence C Layman
| | - James F Gusella
| | - William F Crowley
42
Zhou K, Huang B, Zou M, Lu D, He S, Wang G. Genome-wide identification of lineage-specific genes within Caenorhabditis elegans. Genomics 2015; 106:242-8. [PMID: 26188256 DOI: 10.1016/j.ygeno.2015.07.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/15/2015] [Revised: 07/08/2015] [Accepted: 07/09/2015] [Indexed: 11/19/2022]
Abstract
With the rapid growth of sequencing technology, a number of genomes and transcriptomes of various species have been sequenced, contributing to the study of lineage-specific genes (LSGs). We identified two sets of LSGs using BLAST: one included Caenorhabditis elegans species-specific genes (1423, SSGs), and the other consisted of Caenorhabditis genus-specific genes (4539, GSGs). The subsequent characterization and analysis of the SSGs and GSGs showed that they have significant differences in evolution and that most LSGs were generated by gene duplication and integration of transposable elements (TEs). We then performed temporal expression profiling and protein function prediction and observed that many SSGs and GSGs are expressed and that genes involved with sex determination, specific stress, immune response, and morphogenesis are over-represented, suggesting that these specific genes may be related to the Caenorhabditis nematodes' special ability to survive in severe and extreme environments.
Affiliation(s)
- Kun Zhou
- Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan 430079, China.
| | - Beibei Huang
- Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan 430079, China.
| | - Ming Zou
- Huazhong Agricultural University, Wuhan 430070, China.
| | - Dandan Lu
- Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan 430079, China.
| | - Shunping He
- The Key Laboratory of Aquatic Biodiversity and Conservation of the Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
| | - Guoxiu Wang
- Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan 430079, China.
43
Schatz MC, Maron LG, Stein JC, Hernandez Wences A, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, Wright MH, Chia JM, Ware D, McCouch SR, McCombie WR. Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 2015; 15:506. [PMID: 25468217 DOI: 10.1186/preaccept-2784872521277375] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 04/24/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. RESULTS Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the "pan-genome" of three divergent rice varieties and document several megabases of each genome absent in the other two. CONCLUSIONS Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
44
Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 2015; 47:569-76. [PMID: 25915600 PMCID: PMC4828725 DOI: 10.1038/ng.3259] [Citation(s) in RCA: 543] [Impact Index Per Article: 60.3] [Received: 12/10/2014] [Accepted: 03/06/2015] [Indexed: 12/17/2022]
Abstract
Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, reveal genes’ changing functional roles across tissues, and illuminate disease-disease relationships. We introduce NetWAS, which combines genes with nominally significant GWAS p-values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS, and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than one hundred human tissues and cell types.
45
Gossmann TI, Santure AW, Sheldon BC, Slate J, Zeng K. Highly variable recombinational landscape modulates efficacy of natural selection in birds. Genome Biol Evol 2015; 6:2061-75. [PMID: 25062920 PMCID: PMC4231635 DOI: 10.1093/gbe/evu157] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 12/25/2022] Open
Abstract
Determining the rate of protein evolution and identifying the causes of its variation across the genome are powerful ways to understand forces that are important for genome evolution. By using a multitissue transcriptome data set from great tit (Parus major), we analyzed patterns of molecular evolution between two passerine birds, great tit and zebra finch (Taeniopygia guttata), using the chicken genome (Gallus gallus) as an outgroup. We investigated whether a special feature of avian genomes, the highly variable recombinational landscape, modulates the efficacy of natural selection through the effects of Hill-Robertson interference, which predicts that selection should be more effective in removing deleterious mutations and incorporating beneficial mutations in high-recombination regions than in low-recombination regions. In agreement with these predictions, genes located in low-recombination regions tend to have a high proportion of neutrally evolving sites and relaxed selective constraint on sites subject to purifying selection, whereas genes that show strong support for past episodes of positive selection appear disproportionally in high-recombination regions. There is also evidence that genes located in high-recombination regions tend to have higher gene expression specificity than those located in low-recombination regions. Furthermore, more compact genes (i.e., those with fewer/shorter introns or shorter proteins) evolve faster than less compact ones. In sum, our results demonstrate that transcriptome sequencing is a powerful method to answer fundamental questions about genome evolution in nonmodel organisms.
Affiliation(s)
- Toni I Gossmann
- Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
| | - Anna W Santure
- Department of Animal and Plant Sciences, University of Sheffield, United Kingdom; School of Biological Sciences, University of Auckland, New Zealand
| | - Ben C Sheldon
- Edward Grey Institute, Department of Zoology, University of Oxford, United Kingdom
| | - Jon Slate
- Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
| | - Kai Zeng
- Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
46
Kretzler M, Ju W. A Transcriptional Map of the Renal Tubule: Linking Structure to Function. J Am Soc Nephrol 2015; 26:2603-5. [PMID: 25817354 DOI: 10.1681/asn.2015030242] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022] Open
Affiliation(s)
- Matthias Kretzler
- Department of Internal Medicine, Division of Nephrology, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
| | - Wenjun Ju
- Department of Internal Medicine, Division of Nephrology, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
47
Guillén Y, Rius N, Delprat A, Williford A, Muyas F, Puig M, Casillas S, Ràmia M, Egea R, Negre B, Mir G, Camps J, Moncunill V, Ruiz-Ruano FJ, Cabrero J, de Lima LG, Dias GB, Ruiz JC, Kapusta A, Garcia-Mas J, Gut M, Gut IG, Torrents D, Camacho JP, Kuhn GCS, Feschotte C, Clark AG, Betrán E, Barbadilla A, Ruiz A. Genomics of ecological adaptation in cactophilic Drosophila. Genome Biol Evol 2014; 7:349-66. [PMID: 25552534 PMCID: PMC4316639 DOI: 10.1093/gbe/evu291] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Indexed: 01/06/2023] Open
Abstract
Cactophilic Drosophila species provide a valuable model to study gene–environment interactions and ecological adaptation. Drosophila buzzatii and Drosophila mojavensis are two cactophilic species that belong to the repleta group, but have very different geographical distributions and primary host plants. To investigate the genomic basis of ecological adaptation, we sequenced the genome and developmental transcriptome of D. buzzatii and compared its gene content with that of D. mojavensis and two other noncactophilic Drosophila species in the same subgenus. The newly sequenced D. buzzatii genome (161.5 Mb) comprises 826 scaffolds (>3 kb) and contains 13,657 annotated protein-coding genes. Using RNA sequencing data of five life-stages we found expression of 15,026 genes, 80% protein-coding genes, and 20% noncoding RNA genes. In total, we detected 1,294 genes putatively under positive selection. Interestingly, among genes under positive selection in the D. mojavensis lineage, there is an excess of genes involved in metabolism of heterocyclic compounds that are abundant in Stenocereus cacti and toxic to nonresident Drosophila species. We found 117 orphan genes in the shared D. buzzatii–D. mojavensis lineage. In addition, gene duplication analysis identified lineage-specific expanded families with functional annotations associated with proteolysis, zinc ion binding, chitin binding, sensory perception, ethanol tolerance, immunity, physiology, and reproduction. In summary, we identified genetic signatures of adaptation in the shared D. buzzatii–D. mojavensis lineage, and in the two separate D. buzzatii and D. mojavensis lineages. Many of the novel lineage-specific genomic features are promising candidates for explaining the adaptation of these species to their distinct ecological niches.
Affiliation(s)
- Yolanda Guillén
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain
| | - Núria Rius
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain
| | - Alejandra Delprat
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain
| | - Francesc Muyas
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain
| | - Marta Puig
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain
| | - Sònia Casillas
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain; Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Spain
| | - Miquel Ràmia
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain; Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Spain
| | - Raquel Egea
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain; Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Spain
| | - Barbara Negre
- EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Gisela Mir
- IRTA, Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Barcelona, Spain; The Peter MacCallum Cancer Centre, East Melbourne, Victoria, Australia
| | - Jordi Camps
- Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, Torre I, Barcelona, Spain
| | - Valentí Moncunill
- Barcelona Supercomputing Center (BSC), Edifici TG (Torre Girona), Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Josefa Cabrero
- Departamento de Genética, Facultad de Ciencias, Universidad de Granada, Spain
| | - Leonardo G de Lima
- Instituto de Ciências Biológicas, Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Guilherme B Dias
- Instituto de Ciências Biológicas, Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Jeronimo C Ruiz
- Informática de Biossistemas, Centro de Pesquisas René Rachou-Fiocruz Minas, Belo Horizonte, MG, Brazil
| | - Aurélie Kapusta
- Department of Human Genetics, University of Utah School of Medicine
| | - Jordi Garcia-Mas
- IRTA, Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Barcelona, Spain
| | - Marta Gut
- Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, Torre I, Barcelona, Spain
| | - Ivo G Gut
- Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, Torre I, Barcelona, Spain
| | - David Torrents
- Barcelona Supercomputing Center (BSC), Edifici TG (Torre Girona), Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Juan P Camacho
- Departamento de Genética, Facultad de Ciencias, Universidad de Granada, Spain
| | - Gustavo C S Kuhn
- Instituto de Ciências Biológicas, Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Cédric Feschotte
- Department of Human Genetics, University of Utah School of Medicine
| | - Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University
| | - Esther Betrán
- Department of Biology, University of Texas at Arlington
| | - Antonio Barbadilla
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain; Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Spain
| | - Alfredo Ruiz
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Spain
48
Pilotti M, Brunetti A, Uva P, Lumia V, Tizzani L, Gervasi F, Iacono M, Pindo M. Kinase domain-targeted isolation of defense-related receptor-like kinases (RLK/Pelle) in Platanus×acerifolia: phylogenetic and structural analysis. BMC Res Notes 2014; 7:884. [PMID: 25486898 PMCID: PMC4295470 DOI: 10.1186/1756-0500-7-884] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/18/2014] [Accepted: 11/18/2014] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND The plant receptor-like kinase (RLK/Pelle) family regulates growth and developmental processes and interaction with pathogens and symbionts. Platanaceae is one of the earliest branches of Eudicots, temporally located before the split which gave rise to Rosids and Asterids. Thus, investigations into the RLK family in Platanus can provide information on the evolution of this gene family in the land plants. Moreover, RLKs are good candidates for finding genes that are able to confer resistance to Platanus pathogens. RESULTS Degenerate oligonucleotide primers targeting the kinase domain of stress-related RLKs were used to isolate for the first time 111 RLK gene fragments in Platanus×acerifolia. Sequences were classified as candidates of the following subfamilies: CrRLK1L, LRR XII, WAK-like, and LRR X-BRI1 group. All the structural features typical of the RLK kinase domain were identified, including the non-RD motif which marks potential pathogen recognition receptors (PRRs). The LRR XII candidates, whose counterpart in Arabidopsis and rice comprises non-RD PRRs, were mostly non-RD kinases, suggesting a group of PRRs. Region-specific signatures of relaxed purifying selection in the LRR XII candidates were also found, which is novel for the plant RLK kinase domain and further supports the role of LRR XII candidates as PRRs. As we obtained CrRLK1L candidates using primers designed on Pto of tomato, we analysed the phylogenetic relationship between CrRLK1L and Pto-like genes of plant species. We thus classified all non-solanaceous Pto-like genes as CrRLK1L and highlighted for the first time the close phylogenetic vicinity between CrRLK1L and the Pto group. The origin of Pto from CrRLK1L is proposed as an evolutionary mechanism.
CONCLUSIONS The signatures of relaxed purifying selection highlight that a group of RLKs might have been involved in the expression of phenotypic plasticity and is thus a good candidate for investigations into pathogen resistance. The search for Pto-like genes in Platanus highlighted the close relationship between CrRLK1L and the Pto group. It will be exciting to verify whether sensu stricto Pto genes are present in taxonomic groups other than Solanaceae, in order to further clarify the evolutionary link with CrRLK1L. We obtained a first valuable resource useful for an in-depth study of stress perception systems.
Affiliation(s)
- Massimo Pilotti
- Plant Pathology Research Center, CRA-PAV Agricultural Research Council, V. C.G. Bertero 22, 00156 Rome, Italy
| | - Angela Brunetti
- Plant Pathology Research Center, CRA-PAV Agricultural Research Council, V. C.G. Bertero 22, 00156 Rome, Italy
| | - Paolo Uva
- CRS4 Bioinformatics Laboratory POLARIS Science and Technology Park, 09010 Pula, Cagliari, Italy
| | - Valentina Lumia
- Plant Pathology Research Center, CRA-PAV Agricultural Research Council, V. C.G. Bertero 22, 00156 Rome, Italy
| | - Lorenza Tizzani
- Plant Pathology Research Center, CRA-PAV Agricultural Research Council, V. C.G. Bertero 22, 00156 Rome, Italy
| | - Fabio Gervasi
- Fruit Tree Research Center, CRA-FRU Agricultural Research Council, V. Fioranello, 52, 00134 Rome, Italy
| | - Michele Iacono
- Roche Diagnostics SpA, V. G.B. Stucchi 110, 20052 Monza Milano, Italy
| | - Massimo Pindo
- Research and Innovation Centre, Edmund Mach Foundation, V. E. Mach 1, 38010 San Michele a/A, Trento, Italy
49
Schatz MC, Maron LG, Stein JC, Wences AH, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, Wright MH, Chia JM, Ware D, McCouch SR, McCombie WR. Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 2014. [PMID: 25468217 PMCID: PMC4268812 DOI: 10.1186/s13059-014-0506-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 12/18/2022] Open
Abstract
Background The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Results Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the ‘pan-genome’ of three divergent rice varieties and document several megabases of each genome absent in the other two. Conclusions Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
50
Gossmann TI, Ziegler M. Sequence divergence and diversity suggests ongoing functional diversification of vertebrate NAD metabolism. DNA Repair (Amst) 2014; 23:39-48. [PMID: 25084685 PMCID: PMC4248024 DOI: 10.1016/j.dnarep.2014.07.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/03/2014] [Revised: 05/22/2014] [Accepted: 07/09/2014] [Indexed: 12/04/2022]
Abstract
NAD is not only an important cofactor in redox reactions but has also received attention in recent years because of its physiological importance in metabolic regulation, DNA repair and signaling. In contrast to the redox reactions, these regulatory processes involve degradation of NAD and therefore necessitate a constant replenishment of its cellular pool. NAD biosynthetic enzymes are common to almost all species in all clades, but the number of NAD degrading enzymes varies substantially across taxa. In particular, vertebrates, including humans, have a manifold of NAD degrading enzymes which require a high turnover of NAD. As there is currently a lack of a systematic study of how natural selection has shaped enzymes involved in NAD metabolism we conducted a comprehensive evolutionary analysis based on intraspecific variation and interspecific divergence. We compare NAD biosynthetic and degrading enzymes in four eukaryotic model species and subsequently focus on human NAD metabolic enzymes and their orthologs in other vertebrates. We find that the majority of enzymes involved in NAD metabolism are subject to varying levels of purifying selection. While NAD biosynthetic enzymes appear to experience a rather high level of evolutionary constraint, there is evidence for positive selection among enzymes mediating NAD-dependent signaling. This is particularly evident for members of the PARP family, a diverse protein family involved in DNA damage repair and programmed cell death. Based on haplotype information and substitution rate analysis we pinpoint sites that are potential targets of positive selection. We also link our findings to a three dimensional structure, which suggests that positive selection occurs in domains responsible for DNA binding and polymerization rather than the NAD catalytic domain. Taken together, our results indicate that vertebrate NAD metabolism is still undergoing functional diversification.
Affiliation(s)
- Toni I Gossmann
- Department of Animal and Plant Sciences, University of Sheffield, Alfred Denny Building, S10 2TN Sheffield, United Kingdom.
| | - Mathias Ziegler
- Department of Molecular Biology, University of Bergen, Postbox 7803, 5020 Bergen, Norway