1
Zhang Y, Ahsan MU, Wang K. Noncoding de novo mutations in SCN2A are associated with autism spectrum disorders. medRxiv 2024:2024.05.05.24306908. [PMID: 38766206 PMCID: PMC11100849 DOI: 10.1101/2024.05.05.24306908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Coding de novo mutations (DNMs) contribute to the risk for autism spectrum disorders (ASD), but the contribution of noncoding DNMs remains relatively unexplored. Here we use whole genome sequencing (WGS) data of 12,411 individuals (including 3,508 probands and 2,218 unaffected siblings) from 3,357 families collected in Simons Foundation Powering Autism Research for Knowledge (SPARK) to detect DNMs associated with ASD, while examining the Simons Simplex Collection (SSC), with 6,383 individuals from 2,274 families, to replicate the results. For coding DNMs, SCN2A reached exome-wide significance (p = 2.06×10⁻¹¹) in SPARK. The 618 known dominant ASD genes as a group are strongly enriched for coding DNMs in cases compared with sibling controls (fold change = 1.51, p = 1.13×10⁻⁵ for SPARK; fold change = 1.86, p = 2.06×10⁻⁹ for SSC). For noncoding DNMs, we used two methods to assess statistical significance: a point-based test that analyzes sites with a Combined Annotation Dependent Depletion (CADD) score ≥15, and a segment-based test that analyzes 1 kb genomic segments with segment-specific background mutation rates (inferred from expected rare mutations in Gnocchi genome constraint scores). The point-based test identified SCN2A as marginally significant (p = 6.12×10⁻⁴) in SPARK, yet the segment-based test identified CSMD1, RBFOX1 and CHD13 as exome-wide significant. We did not identify significant enrichment of noncoding DNMs (in all 1 kb segments or those with Gnocchi > 4) in the 618 known ASD genes as a group in cases compared with sibling controls. When combining evidence from both coding and noncoding DNMs, we found that SCN2A, with 11 coding and 5 noncoding DNMs, exhibited the strongest significance (p = 4.15×10⁻¹³). In summary, we identified both coding and noncoding DNMs in SCN2A associated with ASD, while nominating additional candidates for further examination in future studies.
Affiliation(s)
- Yuan Zhang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
2
Vasilyeva TA, Marakhonov AV, Kutsev SI, Zinchenko RA. Relative Frequencies of PAX6 Mutational Events in a Russian Cohort of Aniridia Patients in Comparison with the World's Population and the Human Genome. Int J Mol Sci 2022; 23:ijms23126690. [PMID: 35743132 PMCID: PMC9223373 DOI: 10.3390/ijms23126690] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/14/2022] [Revised: 06/06/2022] [Accepted: 06/13/2022] [Indexed: 12/10/2022]
Abstract
Genome-wide sequencing metadata allow researchers to infer bias in the relative frequencies of mutational events and to predict putative mutagenic models. Much smaller datasets can also be useful for evaluating the mutational frequency spectrum and the prevalent local mutagenic process. Here we analyzed the PAX6 gene locus for mutational spectra obtained in our own and previous studies and compared them with data on other genes as well as the whole human genome. MLPA and Sanger sequencing were used to search for mutations in a cohort of 199 index patients from Russia with aniridia and aniridia-related phenotypes. The relative frequencies of different categories of PAX6 mutations were consistent with those previously reported by other researchers. The ratio between substitutions, small indels, and chromosome deletions in the 11p13 locus was within the interval previously published for 20 disease-associated genomic loci, but fell at its higher end because of very high frequencies of small indels and chromosome deletions. The ratio between substitutions, small indels, and chromosome deletions for disease-associated genes, including the PAX6 gene, as well as the share of PAX6 missense mutations, differed considerably from those typical for the whole genome.
Affiliation(s)
- Tatyana A. Vasilyeva
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Andrey V. Marakhonov
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
  - Correspondence: Tel.: +7-499-320-60-90
- Sergey I. Kutsev
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Rena A. Zinchenko
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
  - N.A. Semashko National Research Institute of Public Health, 105064 Moscow, Russia
3
Hanson HE, Wang C, Schrey AW, Liebl AL, Ravinet M, Jiang RH, Martin LB. Epigenetic Potential and DNA Methylation in an Ongoing House Sparrow (Passer domesticus) Range Expansion. Am Nat 2022; 200:662-674. [DOI: 10.1086/720950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
4
Han J, Munro JE, Kocoski A, Barry AE, Bahlo M. Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species. PLoS Genet 2022; 18:e1009604. [PMID: 35007277 PMCID: PMC8782505 DOI: 10.1371/journal.pgen.1009604] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/20/2021] [Revised: 01/21/2022] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structure and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).
Affiliation(s)
- Jiru Han
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Jacob E. Munro
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Anthony Kocoski
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
- Alyssa E. Barry
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
  - Disease Elimination Program, Burnet Institute, Melbourne, Australia
  - IMPACT Institute for Innovation in Mental and Physical Health and Clinical Translation, Deakin University, Geelong, Australia
- Melanie Bahlo
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
  - Department of Medical Biology, The University of Melbourne, Melbourne, Australia
5
Hassan NE, Al-Janabi AA. Investigation of Interferon Gamma Activity Using Bioinformatics Methods. Arch Razi Inst 2021; 76:1245-1253. [PMID: 35355749 PMCID: PMC8934094 DOI: 10.22092/ari.2021.356106.1780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2021] [Accepted: 10/02/2021] [Indexed: 05/25/2023]
Abstract
Breast cancer arises from breast tissue and is a severe health problem worldwide. Genetics is believed to be the primary cause of all cases of breast cancer via gene mutation. Bioinformatics methodology has been used to determine the sequences and structures of bioactive substances. This study aimed to analyze the function and structure of interferon gamma (IFNγ) in healthy controls and patients with breast cancer using bioinformatics methods. Blood samples were collected from 75 patients with breast cancer and 25 healthy subjects as controls. The results showed transition mutations (30%) and transversion mutations (70%) in patients with breast cancer. Moreover, missense mutations (84%) and silent mutations (16%) were detected by BLAST. In addition, the secondary structure of the IFNγ protein, consisting of alpha-helix, β-sheet, and coil, was determined using BioEdit. ProtParam was used to compute the physicochemical properties of the IFNγ protein, including its molecular weight, isoelectric point, and instability index, which reflect its function and stability. Moreover, the mutations altered the percentages of alpha-helix, β-turn, and coil in breast cancer patients compared with the healthy group and the NCBI reference, as predicted by the PSIPRED program. Additionally, the PHYRE2 server and the RasMol program showed the tertiary structure of the IFNγ protein in breast cancer patients. Furthermore, the STRING program revealed that the IFNγ protein interacts with other proteins to perform its functions normally. From the data recorded in the current study, it was concluded that IFNγ can be considered a marker for patients with breast cancer.
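The transition and transversion percentages quoted in the abstract above rest on a standard classification of single-base substitutions: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine/pyrimidine swap is a transversion. The following is a minimal Python sketch of that rule only. It is not the authors' pipeline, and the function name `classify_substitution` is a hypothetical choice for illustration.

```python
# Illustrative sketch: classify a single-base substitution.
# The function name and error handling are assumptions, not from the cited paper.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a DNA base change ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

Counting the two labels over a list of observed substitutions and dividing by the total would give percentages of the kind reported in the abstract.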
Affiliation(s)
- N E Hassan
  - Department of Applied Science, University of Technology, Baghdad, Iraq
- A A Al-Janabi
  - Department of Applied Science, University of Technology, Baghdad, Iraq
6
Manawasinghe IS, Phillips AJL, Xu J, Balasuriya A, Hyde KD, Stępień Ł, Harischandra DL, Karunarathna A, Yan J, Weerasinghe J, Luo M, Dong Z, Cheewangkoon R. Defining a species in fungal plant pathology: beyond the species level. Fungal Divers 2021. [DOI: 10.1007/s13225-021-00481-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
7
Khan AA, Ali MS, Babar F, Fatima A, Shafqat MA, Asghar B, Ilyas N, Fatima M, Liaqat A, Gondal MA. Lack of CpG islands in human unitary pseudogenes and its implication. Mamm Genome 2021; 32:443-447. [PMID: 34272576 DOI: 10.1007/s00335-021-09893-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2021] [Accepted: 07/07/2021] [Indexed: 11/24/2022]
Abstract
CpG islands (CGIs) are aggregations of CpG dinucleotides in the promoters of mammalian genes. These CGIs are present in almost all housekeeping genes and some tissue-specific genes in the mammalian genome. Extensive research has been done on the prevalence and role of CGIs in protein-coding genes. However, little is known about CGIs in pseudogenes. In the current research project, we focused on CGIs in the three main classes of pseudogenes, i.e., duplicated pseudogenes (DPGs), processed pseudogenes (PPGs), and unitary pseudogenes (UPGs). We discovered a predominant absence of CGIs in the promoters of all three classes of pseudogenes. We also compared the CGI profile of these pseudogenes with their parent genes and found that UPGs differ from DPGs and PPGs: in the latter two, the lack of CGIs is a consequence of pseudogenization, whereas in UPGs the lack of CGIs in their promoters is not a result of the pseudogenization process. We also discuss the implications of the results obtained from this comparison. To our knowledge, this is the first study highlighting this aspect of UPGs, offering new insights into genome evolution in general and pseudogene evolution in particular.
Affiliation(s)
- Ammad Aslam Khan
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Muhammad Shahryar Ali
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Farah Babar
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Anees Fatima
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Muhammad Awais Shafqat
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Bisma Asghar
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Nimra Ilyas
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Maheen Fatima
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
- Ayesha Liaqat
  - Department of Bioinformatics and Computational Biology, Virtual University, Lahore, 547 92, Pakistan
8
The Long-Term Evolutionary History of Gradual Reduction of CpG Dinucleotides in the SARS-CoV-2 Lineage. Biology 2021; 10:biology10010052. [PMID: 33445785 PMCID: PMC7828247 DOI: 10.3390/biology10010052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Revised: 12/29/2020] [Accepted: 01/09/2021] [Indexed: 12/24/2022]
Abstract
Simple Summary: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the coronavirus disease 2019 (COVID-19), a pandemic that infected over 81 million people worldwide. This has led the scientific community to characterize the genome of this virus, including its nucleotide composition. Investigation of the dinucleotide frequency revealed that the proportion of CG dinucleotides (CpG) is highly reduced in the viral genomes. Since CpG dinucleotides are the target sites for the host antiviral zinc finger protein, it has been suggested that the reduction in the proportion of CpG is the viral response to escape from the host defense machinery. In the present study, we investigated the time of origin of the reduction in CpG content. Whole genome analyses based on all representative viral genomes of the genus Betacoronavirus revealed that the CpG content in the lineage of SARS-CoV-2 has been progressively declining over the past 1213 years. The depletion of CpG was found to occur at neutral as well as selectively constrained positions of the viral genomes.
Abstract: Recent studies suggested that the fraction of CG dinucleotides (CpG) is severely reduced in the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The CpG deficiency was predicted to be the adaptive response of the virus to evade degradation of the viral RNA by the antiviral zinc finger protein that specifically binds to CpG nucleotides. By comparing all representative genomes belonging to the genus Betacoronavirus, this study examined the potential time of origin of CpG depletion. The results of this investigation revealed a highly significant correlation between the proportions of CpG nucleotides (CpG content) of the betacoronavirus species and their times of divergence from SARS-CoV-2. Species that are distantly related to SARS-CoV-2 had much higher CpG contents than that of SARS-CoV-2. Conversely, closely related species had low CpG contents that are similar to or slightly higher than that of SARS-CoV-2. These results suggest a systematic and continuous reduction in the CpG content in the SARS-CoV-2 lineage that might have started when the Sarbecovirus + Hibecovirus clade separated from Nobecovirus, which was estimated to be 1213 years ago. This depletion was not found to be mediated by the GC contents of the genomes. Our results also showed that the depletion of CpG occurred at neutral positions of the genome as well as those under selection. The latter is evident from the progressive reduction in the proportion of the amino acid arginine (coded by CpG dinucleotides) in the SARS-CoV-2 lineage over time. The results of this study suggest that shedding CpG nucleotides from their genome is a continuing process in this viral lineage, potentially to escape from their host defense mechanisms.
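The CpG content discussed in the abstract above is a simple sequence statistic, and a small example may make it concrete. The Python sketch below computes the fraction of dinucleotide positions that are CG and, as an additional commonly used normalization, the observed/expected CpG ratio based on the C and G base counts. This is an illustration under stated assumptions, not the study's method: the function name `cpg_stats` is hypothetical, and the abstract does not say which normalization the study applied.

```python
# Illustrative sketch: CpG proportion and observed/expected CpG ratio.
# `cpg_stats` is a hypothetical helper; the O/E formula used here is the
# common (CpG count * length) / (C count * G count) normalization.
from collections import Counter


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (CpG fraction of dinucleotide positions, observed/expected CpG)."""
    seq = seq.upper().replace("U", "T")  # accept RNA genomes written with U
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    fraction = cpg / (n - 1)
    expected = base_counts["C"] * base_counts["G"] / n
    ratio = cpg / expected if expected > 0 else float("nan")
    return fraction, ratio
```

An observed/expected ratio well below 1 indicates fewer CpG dinucleotides than the base composition alone would predict, which is the kind of depletion the abstract describes.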
9
Rong S, Buerer L, Rhine CL, Wang J, Cygan KJ, Fairbrother WG. Mutational bias and the protein code shape the evolution of splicing enhancers. Nat Commun 2020; 11:2845. [PMID: 32504065 PMCID: PMC7275064 DOI: 10.1038/s41467-020-16673-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/30/2019] [Accepted: 04/28/2020] [Indexed: 02/06/2023]
Abstract
Exonic splicing enhancers (ESEs) are enriched in exons relative to introns and bind splicing activators. This study considers a fundamental question of co-evolution: How did ESE motifs become enriched in exons prior to the evolution of ESE recognition? We hypothesize that the high exon to intron motif ratios necessary for ESE function were created by mutational bias coupled with purifying selection on the protein code. These two forces retain certain coding motifs in exons while passively depleting them from introns. Through the use of simulations, genomic analyses, and high throughput splicing assays, we confirm the key predictions of this hypothesis, including an overlap between protein and splicing information in ESEs. We discuss the implications of mutational bias as an evolutionary driver in other cis-regulatory systems. Splicing is regulated by cis-acting elements in pre-mRNAs such as exonic or intronic splicing enhancers and silencers. Here the authors show that exonic splicing enhancers are enriched in exons compared to introns due to mutational bias coupled with purifying selection on the protein code.
Affiliation(s)
- Stephen Rong
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
  - Ecology and Evolutionary Biology, Brown University, Providence, RI, 02912, USA
- Luke Buerer
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
- Christy L Rhine
  - Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, 02912, USA
- Jing Wang
  - Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, 02912, USA
- Kamil J Cygan
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
  - Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, 02912, USA
- William G Fairbrother
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
  - Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI, 02912, USA
  - Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI, 02912, USA
10
McDew-White M, Li X, Nkhoma SC, Nair S, Cheeseman I, Anderson TJC. Mode and Tempo of Microsatellite Length Change in a Malaria Parasite Mutation Accumulation Experiment. Genome Biol Evol 2020; 11:1971-1985. [PMID: 31273388 PMCID: PMC6644851 DOI: 10.1093/gbe/evz140] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Accepted: 06/29/2019] [Indexed: 12/12/2022]
Abstract
Malaria parasites have small extremely AT-rich genomes: microsatellite repeats (1–9 bp) comprise 11% of the genome and genetic variation in natural populations is dominated by repeat changes in microsatellites rather than point mutations. This experiment was designed to quantify microsatellite mutation patterns in Plasmodium falciparum. We established 31 parasite cultures derived from a single parasite cell and maintained these for 114–267 days with frequent reductions to a single cell, so parasites accumulated mutations during ∼13,207 cell divisions. We Illumina sequenced the genomes of both progenitor and end-point mutation accumulation (MA) parasite lines in duplicate to validate stringent calling parameters. Microsatellite calls were 99.89% (GATK), 99.99% (freeBayes), and 99.96% (HipSTR) concordant in duplicate sequence runs from independent sequence libraries, whereas introduction of microsatellite mutations into the reference genome revealed a low false negative calling rate (0.68%). We observed 98 microsatellite mutations. We highlight several conclusions: microsatellite mutation rates (3.12 × 10⁻⁷ to 2.16 × 10⁻⁸/cell division) are associated with both repeat number and repeat motif like other organisms studied. However, 41% of changes resulted from loss or gain of more than one repeat: this was particularly true for long repeat arrays. Unlike other eukaryotes, we found no insertions or deletions that were not associated with repeats or homology regions. Overall, microsatellite mutation rates are among the lowest recorded and comparable to those in another AT-rich protozoan (Dictyostelium). However, a single infection (>10¹¹ parasites) will still contain over 2.16 × 10³ to 3.12 × 10⁴ independent mutations at any single microsatellite locus.
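The closing figure in the abstract above follows from a one-line calculation: the per-division mutation rate at a locus multiplied by the number of parasites in an infection. The Python sketch below reproduces that arithmetic, reading the extraction-damaged numbers as rates of 2.16 × 10^-8 to 3.12 × 10^-7 per cell division and an infection size of 10^11 parasites. It assumes, for illustration only, that each parasite contributes one cell division; the helper name `expected_mutations` is hypothetical.

```python
# Illustrative sketch: expected independent mutations at one microsatellite
# locus in a single infection. Assumes one cell division per parasite, which
# is a simplification made here, not a statement from the cited paper.
RATE_LOW = 2.16e-8    # mutations per locus per cell division (abstract)
RATE_HIGH = 3.12e-7   # mutations per locus per cell division (abstract)
PARASITES = 1e11      # lower bound on parasites in one infection (abstract)


def expected_mutations(rate_per_division: float, n_divisions: float) -> float:
    """Expected mutation count = per-division rate x number of divisions."""
    return rate_per_division * n_divisions


low = expected_mutations(RATE_LOW, PARASITES)    # about 2.16e3
high = expected_mutations(RATE_HIGH, PARASITES)  # about 3.12e4
print(f"{low:.3g} to {high:.3g} mutations per locus per infection")
```

Because the abstract gives the infection size as a lower bound, these products are lower bounds as well, which matches its wording of "over" 2.16 × 10^3 to 3.12 × 10^4 mutations.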
Affiliation(s)
- Xue Li
  - Texas Biomedical Research Institute, San Antonio, Texas
- Standwell C Nkhoma
  - Texas Biomedical Research Institute, San Antonio, Texas
  - Malaria Research and Reference Reagent Resource Center (MR4), BEI Resources, American Type Culture Collection, 10801 University Boulevard, Manassas, VA
- Shalini Nair
  - Texas Biomedical Research Institute, San Antonio, Texas
- Ian Cheeseman
  - Texas Biomedical Research Institute, San Antonio, Texas
11
Bin Y, Wang X, Zhao L, Wen P, Xia J. An analysis of mutational signatures of synonymous mutations across 15 cancer types. BMC Med Genet 2019; 20:190. [PMID: 31815613 PMCID: PMC6900878 DOI: 10.1186/s12881-019-0926-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/19/2022]
Abstract
Background: Synonymous mutations have been found to play important roles in cancer development, although they do not modify protein sequences. However, relatively little research has specifically delineated the functionality of synonymous mutations in cancer.
Results: We investigated the nucleotide-based and amino acid-based features of synonymous mutations across 15 cancer types from The Cancer Genome Atlas (TCGA), and revealed novel driver candidates by identifying hotspot mutations. First, synonymous mutations were compared between TCGA and the 1000 Genomes Project at the nucleotide and amino acid levels. We found that C:G → T:A transitions were the most frequent single-base substitutions, and leucine underwent the largest number of synonymous mutations in TCGA due to the prevalent C → T transition, which induced the transformation between optimal and non-optimal codons. Next, 97 synonymous hotspot mutations in 86 genes were nominated as candidate drivers with potential cancer risk by considering the mutational rates across different sequence contexts. We observed that the non-CpG-island GC transition sequence context was positively selected across most cancer types, and the different sequence contexts in which hotspot mutations occur could be significant for genetic differences and functional features. We also found that the hotspots were more conserved than neutral mutations of hotspot-mutation-containing genes and frequently occurred at leucine. In addition, we mapped hotspot, neutral and non-hotspot mutations of hotspot-mutation-containing genes to their respective protein domains and found that the ion transport domain was the most frequent one, which could mediate cell interaction and has relevant implications for tumor therapy. The signatures of synonymous hotspots were qualitatively similar to those of harmful missense variants.
Conclusions: We illustrated the preferences of cancer-associated synonymous mutations, especially hotspots, and laid the groundwork for understanding how synonymous mutations act as drivers in cancer.
Affiliation(s)
- Yannan Bin
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
- Xiaojuan Wang
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
- Le Zhao
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
- Pengbo Wen
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
12
Maze EA, Ham C, Kelly J, Ussher L, Almond N, Towers GJ, Berry N, Belshaw R. Variable Baseline Papio cynocephalus Endogenous Retrovirus (PcEV) Expression Is Upregulated in Acutely SIV-Infected Macaques and Correlated to STAT1 Expression in the Spleen. Front Immunol 2019; 10:901. [PMID: 31156613 PMCID: PMC6529565 DOI: 10.3389/fimmu.2019.00901] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/28/2019] [Accepted: 04/08/2019] [Indexed: 01/12/2023]
Abstract
Retroviral replication leaves a DNA copy in the host cell chromosome, which over millions of years of infection of germline cells has led to 5% of the human genome sequence being comprised of endogenous retroviruses (ERVs), distributed throughout an estimated 100,000 loci. Over time these loci have accrued mutations such as premature stop codons that prevent continued replication. However, many loci remain both transcriptionally and translationally active and ERVs have been implicated in interacting with the host immune system. Using archived plasma and tissue samples from past macaque studies, experimentally infected with simian immunodeficiency virus (SIV), the expression of one macaque ERV in response to acute viral infection was explored together with a measure of the innate immune response. Specifically, RNA levels were determined for (a) Papio cynocephalus Endogenous Retrovirus (PcEV), an ERV (b) STAT1, a key gene in the interferon signaling pathway, and (c) SIV, an exogenous pathogen. Bioinformatic analysis of DNA sequences of the PcEV loci within the macaque reference genome revealed the presence of open reading frames (ORFs) consistent with potential protein expression but not ERV replication. Quantitative RT-PCR analysis of DNase-treated RNA extracts from plasma derived from acute SIV-infection detected PcEV RNA at low levels in 7 of 22 macaques. PcEV RNA levels were significantly elevated in PBMC and spleen samples recovered during acute SIV infection, but not in the thymus and lymph nodes. A strong positive correlation was identified between PcEV and STAT1 RNA levels in spleen samples recovered from SIV-positive macaques. One possibility is that SIV infection induces PcEV expression in infected lymphoid tissue that contributes to induction of an antiviral response.
Affiliation(s)
- Emmanuel Atangana Maze
  - School of Biomedical Sciences, Faculty of Medicine and Dentistry, University of Plymouth, Plymouth, United Kingdom
  - Division of Infectious Disease Diagnostics, National Institute for Biological Standards and Control (NIBSC), Potters Bar, United Kingdom
- Claire Ham
  - Division of Infectious Disease Diagnostics, National Institute for Biological Standards and Control (NIBSC), Potters Bar, United Kingdom
- Jack Kelly
  - School of Biomedical Sciences, Faculty of Medicine and Dentistry, University of Plymouth, Plymouth, United Kingdom
- Lindsay Ussher
  - School of Biomedical Sciences, Faculty of Medicine and Dentistry, University of Plymouth, Plymouth, United Kingdom
- Neil Almond
  - Division of Infectious Disease Diagnostics, National Institute for Biological Standards and Control (NIBSC), Potters Bar, United Kingdom
- Greg J Towers
  - Division of Infection and Immunity, University College London, London, United Kingdom
- Neil Berry
  - Division of Infectious Disease Diagnostics, National Institute for Biological Standards and Control (NIBSC), Potters Bar, United Kingdom
- Robert Belshaw
  - School of Biomedical Sciences, Faculty of Medicine and Dentistry, University of Plymouth, Plymouth, United Kingdom
13
Pranckėnienė L, Jakaitienė A, Ambrozaitytė L, Kavaliauskienė I, Kučinskas V. Insights Into de novo Mutation Variation in Lithuanian Exome. Front Genet 2018; 9:315. [PMID: 30154829 PMCID: PMC6102505 DOI: 10.3389/fgene.2018.00315] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/03/2018] [Accepted: 07/24/2018] [Indexed: 01/23/2023]
Abstract
In the last decade, one of the biggest challenges in genomics research has been to distinguish definitive pathogenic variants from all likely pathogenic variants identified by next-generation sequencing. This task is particularly complex because of our lack of knowledge regarding overall genome variation and pathogenicity of the variants. Therefore, obtaining sufficient information about genome variants in the general population is necessary as such data could be used for the interpretation of de novo mutations (DNMs) in the context of patient's phenotype in cases of sporadic genetic disease. In this study, data from whole-exome sequencing of the general population in Lithuania were directly examined. In total, 84 (VarScan) and 95 (VarSeq™) DNMs were identified and validated using different algorithms. Thirty-nine of these mutations were considered likely to be pathogenic based on gene function, evolutionary conservation, and mutation impact. The mutation rate estimated per position pair per generation was 2.74 × 10⁻⁸ [95% CI: 2.24 × 10⁻⁸ to 3.35 × 10⁻⁸] (VarScan) and 2.4 × 10⁻⁸ [95% CI: 1.96 × 10⁻⁸ to 2.99 × 10⁻⁸] (VarSeq™), with 1.77 × 10⁻⁸ [95% CI: 6.03 × 10⁻⁹ to 5.2 × 10⁻⁸] de novo indels per position per generation. The rate of germline DNMs in the Lithuanian population and the effects of the genomic and epigenetic context on DNM formation were calculated for the first time in this study, providing a basis for further analysis of DNMs in individuals with genetic diseases. Considering these findings, additional studies in patient groups with genetic diseases with unclear etiology may facilitate our ability to distinguish certain pathogenic or adaptive DNMs from tolerated background DNMs and to reliably identify disease-causing DNMs by their properties through direct observation.
Affiliation(s)
- Laura Pranckėnienė
- Department of Human and Medical Genetics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Vilnius, Lithuania
14
Talukder SK, Azhaguvel P, Chekhovskiy K, Saha MC. Molecular discrimination of tall fescue morphotypes in association with Festuca relatives. PLoS One 2018; 13:e0191343. [PMID: 29342197 PMCID: PMC5771633 DOI: 10.1371/journal.pone.0191343] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/17/2017] [Accepted: 01/03/2018] [Indexed: 11/18/2022] Open
Abstract
Tall fescue (Festuca arundinacea Schreb.) is an important cool-season perennial grass species used as forage and turf, and in conservation plantings. There are three morphotypes in hexaploid tall fescue: Continental, Mediterranean and Rhizomatous. This study was conducted to develop morphotype-specific molecular markers to distinguish Continental and Mediterranean tall fescues, and establish their relationships with other species of the Festuca genus for genomic inference. Chloroplast sequence variation and simple sequence repeat (SSR) polymorphism were explored in 12 genotypes of three tall fescue morphotypes and four Festuca species. Hypervariable chloroplast regions were retrieved by using 33 specifically designed primers followed by sequencing the PCR products. SSR polymorphism was studied using 144 tall fescue SSR primers. Four chloroplast (NFTCHL17, NFTCHL43, NFTCHL45 and NFTCHL48) and three SSR (nffa090, nffa204 and nffa338) markers were identified which can distinctly differentiate Continental and Mediterranean morphotypes. A primer pair, NFTCHL45, which amplifies a 47 bp deletion between the two morphotypes, is being routinely used in the Noble Research Institute's core facility for morphotype discrimination. Both chloroplast sequence variation and SSR diversity showed a close association between Rhizomatous and Continental morphotypes, while the Mediterranean morphotype was in a distant clade. F. pratensis and F. arundinacea var. glaucescens, the P and G1G2 genome donors, respectively, were grouped with the Continental clade, and F. mairei (M1M2 genome) grouped with the Mediterranean clade in chloroplast sequence variation, while both F. pratensis and F. mairei formed independent clades in SSR analysis. Age estimation based on chloroplast sequence variation indicated that the Continental and Mediterranean clades might have been colonized independently 0.65 ± 0.06 and 0.96 ± 0.1 million years ago (Mya), respectively. The findings of the study will enhance tall fescue breeding for persistence and productivity.
Affiliation(s)
- Perumal Azhaguvel
- Noble Research Institute, LLC, Ardmore, OK, United States of America
- Malay C. Saha
- Noble Research Institute, LLC, Ardmore, OK, United States of America
15
Hurst LD, Batada NN. Depletion of somatic mutations in splicing-associated sequences in cancer genomes. Genome Biol 2017; 18:213. [PMID: 29115978 PMCID: PMC5678748 DOI: 10.1186/s13059-017-1337-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/08/2017] [Accepted: 10/12/2017] [Indexed: 01/01/2023] Open
Abstract
Background: An important goal of cancer genomics is to systematically identify cancer-causing mutations. A common approach is to identify sites with high ratios of non-synonymous to synonymous mutations; however, if synonymous mutations are under purifying selection, this methodology leads to identification of false-positive mutations. Here, using synonymous somatic mutations (SSMs) identified in over 4000 tumours across 15 different cancer types, we sought to test this assumption by focusing on coding regions required for splicing. Results: Exon flanks, which are enriched for sequences required for splicing fidelity, have ~17% lower SSM density compared to exonic cores, even after excluding canonical splice sites. While it is impossible to eliminate a mutation bias of unknown cause, multiple lines of evidence support a purifying selection model above a mutational bias explanation. The flank/core difference is not explained by skewed nucleotide content, replication timing, nucleosome occupancy or deficiency in mismatch repair. The depletion is not seen in tumour suppressors, consistent with their role in positive tumour selection, but is otherwise observed in cancer-associated and non-cancer genes, both essential and non-essential. Consistent with a role in splicing modulation, exonic splice enhancers have a lower SSM density before and after controlling for nucleotide composition; moreover, flanks at the 5′ end of the exons have significantly lower SSM density than at the 3′ end. Conclusions: These results suggest that the observable mutational spectrum of cancer genomes is not simply a product of various mutational processes and positive selection, but might also be shaped by negative selection.
Affiliation(s)
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Nizar N Batada
- Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK.
16
Elurbe DM, Paranjpe SS, Georgiou G, van Kruijsbergen I, Bogdanovic O, Gibeaux R, Heald R, Lister R, Huynen MA, van Heeringen SJ, Veenstra GJC. Regulatory remodeling in the allo-tetraploid frog Xenopus laevis. Genome Biol 2017; 18:198. [PMID: 29065907 PMCID: PMC5655803 DOI: 10.1186/s13059-017-1335-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 04/06/2017] [Accepted: 10/03/2017] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Genome duplication has played a pivotal role in the evolution of many eukaryotic lineages, including the vertebrates. A relatively recent vertebrate genome duplication is that in Xenopus laevis, which resulted from the hybridization of two closely related species about 17 million years ago. However, little is known about the consequences of this duplication at the level of the genome, the epigenome, and gene expression. RESULTS: The X. laevis genome consists of two subgenomes, referred to as L (long chromosomes) and S (short chromosomes), that originated from distinct diploid progenitors. Of the parental subgenomes, S chromosomes have degraded faster than L chromosomes from the point of genome duplication until the present day. Deletions appear to have the largest effect on pseudogene formation and loss of regulatory regions. Deleted regions are enriched for long DNA repeats and the flanking regions have high alignment scores, suggesting that non-allelic homologous recombination has played a significant role in the loss of DNA. To assess innovations in the X. laevis subgenomes we examined p300-bound enhancer peaks that are unique to one subgenome and absent from X. tropicalis. A large majority of new enhancers comprise transposable elements. Finally, to dissect early and late events following interspecific hybridization, we examined the epigenome and the enhancer landscape in X. tropicalis × X. laevis hybrid embryos. Strikingly, young X. tropicalis DNA transposons are derepressed and recruit p300 in hybrid embryos. CONCLUSIONS: The results show that erosion of X. laevis genes and functional regulatory elements is associated with repeats and non-allelic homologous recombination and furthermore that young repeats have also contributed to the p300-bound regulatory landscape following hybridization and whole-genome duplication.
Affiliation(s)
- Dei M Elurbe
- Radboud University Medical Center, Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Sarita S Paranjpe
- Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Georgios Georgiou
- Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Ila van Kruijsbergen
- Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands
- Ozren Bogdanovic
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, Australia
- St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, Australia
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Romain Gibeaux
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Rebecca Heald
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Ryan Lister
- Harry Perkins Institute of Medical Research and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA, 6009, Australia
- Martijn A Huynen
- Radboud University Medical Center, Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands.
- Simon J van Heeringen
- Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands.
- Gert Jan C Veenstra
- Radboud University, Faculty of Science, Department of Molecular Developmental Biology, Radboud Institute for Molecular Life Sciences, 6500 HB, Nijmegen, The Netherlands.
17
Enge M, Arda HE, Mignardi M, Beausang J, Bottino R, Kim SK, Quake SR. Single-Cell Analysis of Human Pancreas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns. Cell 2017; 171:321-330.e14. [PMID: 28965763 DOI: 10.1016/j.cell.2017.09.004] [Citation(s) in RCA: 332] [Impact Index Per Article: 47.4] [Received: 03/23/2017] [Revised: 07/02/2017] [Accepted: 08/30/2017] [Indexed: 12/20/2022]
Abstract
As organisms age, cells accumulate genetic and epigenetic errors that eventually lead to impaired organ function or catastrophic transformation such as cancer. Because aging reflects a stochastic process of increasing disorder, cells in an organ will be individually affected in different ways, thus rendering bulk analyses of postmitotic adult cells difficult to interpret. Here, we directly measure the effects of aging in human tissue by performing single-cell transcriptome analysis of 2,544 human pancreas cells from eight donors spanning six decades of life. We find that islet endocrine cells from older donors display increased levels of transcriptional noise and potential fate drift. By determining the mutational history of individual cells, we uncover a novel mutational signature in healthy aging endocrine cells. Our results demonstrate the feasibility of using single-cell RNA sequencing (RNA-seq) data from primary cells to derive insights into genetic and transcriptional processes that operate on aging human tissue.
Affiliation(s)
- Martin Enge
- Department of Bioengineering and Applied Physics, Stanford University, Stanford, CA 94305, USA
- H Efsun Arda
- Department of Developmental Biology, Stanford University School of Medicine, CA 94305, USA
- Marco Mignardi
- Department of Bioengineering and Applied Physics, Stanford University, Stanford, CA 94305, USA; Department of Information Technology, Uppsala University, Sweden and SciLifeLab, Uppsala, Sweden SE-751 05
- John Beausang
- Department of Bioengineering and Applied Physics, Stanford University, Stanford, CA 94305, USA
- Rita Bottino
- Institute of Cellular Therapeutics, Allegheny Health Network, 320 East North Avenue, Pittsburgh, PA 15212, USA
- Seung K Kim
- Department of Developmental Biology, Stanford University School of Medicine, CA 94305, USA
- Stephen R Quake
- Department of Bioengineering and Applied Physics, Stanford University, Stanford, CA 94305, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA; Institute of Cellular Therapeutics, Allegheny Health Network, 320 East North Avenue, Pittsburgh, PA 15212, USA.
18
Abstract
Strong DNA conservation among divergent species is an indicator of enduring functionality. With weaker sequence conservation we enter a vast ‘twilight zone’ in which sequence subject to transient or lower constraint cannot be distinguished easily from neutrally evolving, non-functional sequence. Twilight zone functional sequence is illuminated instead by principles of selective constraint and positive selection using genomic data acquired from within a species’ population. Application of these principles reveals that despite being biochemically active, most twilight zone sequence is not functional.
Affiliation(s)
- Chris P Ponting
- MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK.
19
Sahakyan AB, Balasubramanian S. Single genome retrieval of context-dependent variability in mutation rates for human germline. BMC Genomics 2017; 18:81. [PMID: 28086752 PMCID: PMC5237266 DOI: 10.1186/s12864-016-3440-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/09/2016] [Accepted: 12/19/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS: The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and produce a Trek-based estimate of the overall mutation rate in humans. CONCLUSIONS: The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.
Affiliation(s)
- Aleksandr B Sahakyan
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
- Shankar Balasubramanian
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
- School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK.
20
Stephens ZD, Hudson ME, Mainzer LS, Taschuk M, Weber MR, Iyer RK. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. PLoS One 2016; 11:e0167047. [PMID: 27893777 PMCID: PMC5125660 DOI: 10.1371/journal.pone.0167047] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 05/15/2016] [Accepted: 11/08/2016] [Indexed: 12/31/2022] Open
Abstract
An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads.
Affiliation(s)
- Zachary D. Stephens
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Matthew E. Hudson
- Department of Crop Sciences, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Institute for Genomic Biology, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Liudmila S. Mainzer
- Institute for Genomic Biology, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- National Center for Supercomputing Applications, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Morgan Taschuk
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Matthew R. Weber
- National Center for Supercomputing Applications, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Ravishankar K. Iyer
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, United States of America
21
Sahakyan AB, Balasubramanian S. Long genes and genes with multiple splice variants are enriched in pathways linked to cancer and other multigenic diseases. BMC Genomics 2016; 17:225. [PMID: 26968808 PMCID: PMC4788956 DOI: 10.1186/s12864-016-2582-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/21/2016] [Accepted: 03/08/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. With the view that complex genes should be particularly vulnerable to such events, here we explore the link between the simple properties of the human genes, such as transcript length, number of splice variants, exon/intron composition, and their involvement in the pathways linked to cancer and other multigenic diseases. RESULTS: We reveal a substantial enrichment of cancer pathways with long genes and genes that have multiple splice variants. Although the latter two factors are interdependent, we show that the overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey for the pathways enriched with the longest genes and with genes that have multiple splice variants reveals, along with cancer pathways, the pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We outline a correlation between the gene length and the number of somatic mutations. CONCLUSIONS: Our work is a step forward in the assessment of the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. Unlike the cancer pathways, we note that the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to also be a potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.
Affiliation(s)
- Aleksandr B. Sahakyan
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- Shankar Balasubramanian
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE UK
- School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP UK
22
Price N, Graur D. Are Synonymous Sites in Primates and Rodents Functionally Constrained? J Mol Evol 2015; 82:51-64. [PMID: 26563252 DOI: 10.1007/s00239-015-9719-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/07/2015] [Accepted: 11/04/2015] [Indexed: 11/28/2022]
Abstract
It has been claimed that synonymous sites in mammals are under selective constraint. Furthermore, in many studies the selective constraint at such sites in primates was claimed to be more stringent than that in rodents. Given the larger effective population sizes in rodents than in primates, the theoretical expectation is that selection in rodents would be more effective than that in primates. To resolve this contradiction between expectations and observations, we used processed pseudogenes as a model for strict neutral evolution, and estimated selective constraint on synonymous sites using the rate of substitution at pseudosynonymous and pseudononsynonymous sites in pseudogenes as the neutral expectation. After controlling for the effects of GC content, our results were similar to those from previous studies, i.e., synonymous sites in primates exhibited evidence for higher selective constraint than those in rodents. Specifically, our results indicated that in primates up to 24% of synonymous sites could be under purifying selection, while in rodents synonymous sites evolved neutrally. To further control for shifts in GC content, we estimated selective constraint at fourfold degenerate sites using a maximum parsimony approach. This allowed us to estimate selective constraint using mutational patterns that cause a shift in GC content (GT ↔ TG, CT ↔ TC, GA ↔ AG, and CA ↔ AC) and ones that do not (AT ↔ TA and CG ↔ GC). Using this approach, we found that synonymous sites evolve neutrally in both primates and rodents. Apparent deviations from neutrality were caused by a higher rate of C → A and C → T mutations in pseudogenes. Such differences are most likely caused by the shift in GC content experienced by pseudogenes. We conclude that previous estimates according to which 20-40% of synonymous sites in primates were under selective constraint were most likely artifacts of the biased pattern of mutation.
Affiliation(s)
- Nicholas Price
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO, 80523, USA.
- Dan Graur
- Department of Biology and Biochemistry, University of Houston, Houston, TX, 77204-5001, USA
23
Koziol U, Radio S, Smircich P, Zarowiecki M, Fernández C, Brehm K. A Novel Terminal-Repeat Retrotransposon in Miniature (TRIM) Is Massively Expressed in Echinococcus multilocularis Stem Cells. Genome Biol Evol 2015; 7:2136-53. [PMID: 26133390 PMCID: PMC4558846 DOI: 10.1093/gbe/evv126] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Accepted: 06/26/2015] [Indexed: 12/14/2022] Open
Abstract
Taeniid cestodes (including the human parasites Echinococcus spp. and Taenia solium) have very few mobile genetic elements (MGEs) in their genome, despite lacking a canonical PIWI pathway. The MGEs of these parasites are virtually unexplored, and nothing is known about their expression and silencing. In this work, we report the discovery of a novel family of small nonautonomous long terminal repeat retrotransposons (also known as terminal-repeat retrotransposons in miniature, TRIMs) which we have named ta-TRIM (taeniid TRIM). ta-TRIMs are only the second family of TRIM elements discovered in animals, and are likely the result of convergent reductive evolution in different taxonomic groups. These elements originated at the base of the taeniid tree and have expanded during taeniid diversification, including after the divergence of closely related species such as Echinococcus multilocularis and Echinococcus granulosus. They are massively expressed in larval stages, from a small proportion of full-length copies and from isolated terminal repeats that show transcriptional read-through into downstream regions, generating novel noncoding RNAs and transcriptional fusions to coding genes. In E. multilocularis, ta-TRIMs are specifically expressed in the germinative cells (the somatic stem cells) during asexual reproduction of metacestode larvae. This would provide a developmental mechanism for insertion of ta-TRIMs into cells that will eventually generate the adult germ line. Future studies of active and inactive ta-TRIM elements could give the first clues on MGE silencing mechanisms in cestodes.
Affiliation(s)
- Uriel Koziol
- Institute of Hygiene and Microbiology, University of Würzburg, Germany; Sección Bioquímica y Biología Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Santiago Radio
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay; Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Pablo Smircich
- Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay; Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Magdalena Zarowiecki
- Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Cecilia Fernández
- Cátedra de Inmunología, Facultad de Química, Universidad de la República, Montevideo, Uruguay
- Klaus Brehm
- Institute of Hygiene and Microbiology, University of Würzburg, Germany
24
Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, van Duijn CM, Swertz M, Wijmenga C, van Ommen G, Slagboom PE, Boomsma DI, Ye K, Guryev V, Arndt PF, Kloosterman WP, de Bakker PIW, Sunyaev SR. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 2015; 47:822-826. [PMID: 25985141 PMCID: PMC4485564 DOI: 10.1038/ng.3292] [Citation(s) in RCA: 247] [Impact Index Per Article: 27.4] [Received: 10/06/2014] [Accepted: 04/07/2015] [Indexed: 12/12/2022]
Abstract
Mutations create variation in the population, fuel evolution, and cause genetic diseases. Current knowledge about de novo mutations is incomplete and mostly indirect 1–10. Here, we analyze 11,020 de novo mutations from whole genomes of 250 families. We show that de novo mutations in offspring of older fathers are not only more numerous 11–13 but also occur more frequently in early-replicating, genic regions. Functional regions exhibit higher mutation rates due to CpG dinucleotides and reveal signatures of transcription-coupled repair, while mutation clusters with a unique signature point to a novel mutational mechanism. Mutation and recombination rates independently associate with nucleotide diversity, and regional variation in human-chimpanzee divergence is only partly explained by mutation rate heterogeneity. Finally, we provide a genome-wide mutation rate map for medical and population genetics applications. Our results reveal novel insights and refine long-standing hypotheses about human mutagenesis.
Affiliation(s)
- Laurent C Francioli
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Paz P Polak
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Amnon Koren
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Androniki Menelaou
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Sung Chun
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ivo Renkens
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Morris Swertz
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, The Netherlands
- Cisca Wijmenga
- University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, The Netherlands
- Gertjan van Ommen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- P Eline Slagboom
- Section of Molecular Epidemiology, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
- Dorret I Boomsma
- Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands
- Kai Ye
- Section of Molecular Epidemiology, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands; The Genome Institute, Washington University, St. Louis, MO, USA
- Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Peter F Arndt
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Wigard P Kloosterman
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Paul I W de Bakker
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands; Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Collapse
|
25
|
Magiorkinis G, Blanco-Melo D, Belshaw R. The decline of human endogenous retroviruses: extinction and survival. Retrovirology 2015; 12:8. [PMID: 25640971 PMCID: PMC4335370 DOI: 10.1186/s12977-015-0136-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 10/01/2014] [Accepted: 01/06/2015] [Indexed: 12/21/2022] Open
Abstract
Background Endogenous Retroviruses (ERVs) are retroviruses that over the course of evolution have integrated into germline cells and eventually become part of the host genome. They proliferate within the germline of their host, making up ~5% of the human and mouse genome sequences. Several lines of evidence have suggested a decline in the rate of ERV integration into the human genome in recent evolutionary history but this has not been investigated quantitatively or possible causes explored. Results By dating the integration of ERV loci in 40 mammal species, we show that the human genome and that of other hominoids (great apes and gibbons) have experienced an approximately four-fold decline in the ERV integration rate over the last 10 million years. A major cause is the recent extinction of one very large ERV lineage (HERV-H), which is responsible for most of the integrations over the last 30 million years. The decline however affects most other ERV lineages. Only about 10% of the decline might be attributed to an accompanying increase in body mass (a trait we have shown recently to be negatively correlated with ERV integration rate). Humans are unusual compared to related species – Old World monkeys, great apes and gibbons – in (a) having not acquired any new ERV lineages during the last 30 million years and (b) the possession of an old ERV lineage that has continued to replicate up until at least the last few hundred thousand years – the potentially medically significant HERVK(HML2). Conclusions The human genome shares with the genome of other great apes and gibbons a recent decline in ERV integration that is not typical of other primates and mammals. The human genome differs from that of related species both in maintaining up until at least recently a replicating old ERV lineage and in not having acquired any new lineages. 
We speculate that the decline in ERV integration in the human genome has been exacerbated by a relatively low burden of horizontally-transmitted retroviruses and subsequent reduced risk of endogenization.
|
26
|
Fares M. Modeling Evolution of Molecular Sequences. NATURAL SELECTION 2014:28-47. [DOI: 10.1201/b17795-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
|
27
|
Mazzarella L, Riva L, Luzi L, Ronchini C, Pelicci PG. The Genomic and Epigenomic Landscapes of AML. Semin Hematol 2014; 51:259-72. [DOI: 10.1053/j.seminhematol.2014.08.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]
|
28
|
Abstract
Mutational heterogeneity must be taken into account when reconstructing evolutionary histories, calibrating molecular clocks, and predicting links between genes and disease. Selective pressures and various DNA transactions have been invoked to explain the heterogeneous distribution of genetic variation between species, within populations, and in tissue-specific tumors. To examine relationships between such heterogeneity and variations in leading- and lagging-strand replication fidelity and mismatch repair, we accumulated 40,000 spontaneous mutations in eight diploid yeast strains in the absence of selective pressure. We found that replicase error rates vary by fork direction, coding state, nucleosome proximity, and sequence context. Further, error rates and DNA mismatch repair efficiency both vary by mismatch type, responsible polymerase, replication time, and replication origin proximity. Mutation patterns implicate replication infidelity as one driver of variation in somatic and germline evolution, suggest mechanisms of mutual modulation of genome stability and composition, and predict future observations in specific cancers.
|
29
|
Abstract
One lineage of human endogenous retroviruses (HERVs), HERV-K(HML2), is upregulated in many cancers, some autoimmune/inflammatory diseases, and HIV-infected cells. Despite 3 decades of research, it is not known if these viruses play a causal role in disease, and there has been recent interest in whether they can be used as immunotherapy targets. Resolution of both these questions will be helped by an ability to distinguish between the effects of different integrated copies of the virus (loci). Research so far has concentrated on the 20 or so recently integrated loci that, with one exception, are in the human reference genome sequence. However, this viral lineage has been copying in the human population within the last million years, so some loci will inevitably be present in the human population but absent from the reference sequence. We therefore performed the first detailed search for such loci by mining whole-genome sequences generated by next-generation sequencing. We found a total of 17 loci, and the frequency of their presence ranged from only 2 of the 358 individuals examined to over 95% of them. On average, each individual had six loci that are not in the human reference genome sequence. Comparing the number of loci that we found to an expectation derived from a neutral population genetic model suggests that the lineage was copying until at least ∼250,000 years ago. IMPORTANCE About 5% of the human genome sequence is composed of the remains of retroviruses that over millions of years have integrated into the chromosomes of egg and/or sperm precursor cells. There are indications that protein expression of these viruses is higher in some diseases, and we need to know (i) whether these viruses have a role in causing disease and (ii) whether they can be used as immunotherapy targets in some of them. Answering both questions requires a better understanding of how individuals differ in the viruses that they carry. 
We carried out the first careful search for new viruses in some of the many human genome sequences that are now available thanks to advances in sequencing technology. We also compared the number that we found to a theoretical expectation to see if it is likely that these viruses are still replicating in the human population today.
|
30
|
Anvar SY, Khachatryan L, Vermaat M, van Galen M, Pulyakhina I, Ariyurek Y, Kraaijeveld K, den Dunnen JT, de Knijff P, ’t Hoen PAC, Laros JFJ. Determining the quality and complexity of next-generation sequencing data without a reference genome. Genome Biol 2014; 15:555. [PMID: 25514851 PMCID: PMC4298064 DOI: 10.1186/s13059-014-0555-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 10/06/2014] [Accepted: 11/27/2014] [Indexed: 01/22/2023] Open
Abstract
We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.
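The idea of comparing sequencing datasets through k-mer frequencies can be illustrated with a minimal sketch. This is not kPAL's own API or distance measure: the function names `kmer_profile` and `profile_distance`, the toy sequences, and the choice of a Euclidean distance on normalized frequencies are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper, not kPAL's API)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_distance(p: Counter, q: Counter) -> float:
    """Euclidean distance between two profiles after scaling counts to frequencies."""
    total_p = sum(p.values()) or 1
    total_q = sum(q.values()) or 1
    keys = set(p) | set(q)
    # A Counter returns 0 for k-mers it has never seen, so missing keys are safe.
    return sqrt(sum((p[key] / total_p - q[key] / total_q) ** 2 for key in keys))


# Toy "libraries": real input would be the reads of two sequencing runs.
profile_a = kmer_profile("ACGTACGT", 2)
profile_b = kmer_profile("ACGTTTTT", 2)

print(profile_a)
print(profile_distance(profile_a, profile_b))
```

Two libraries prepared the same way would be expected to give a small distance, while contamination or heavy duplication would shift the profile; which distance measure is most informative is a design choice this sketch does not settle.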
Affiliation(s)
- Seyed Yahya Anvar
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Lusine Khachatryan
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Martijn Vermaat
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Michiel van Galen
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Irina Pulyakhina
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Yavuz Ariyurek
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Ken Kraaijeveld
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Department of Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands
- Johan T den Dunnen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter de Knijff
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter AC ’t Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jeroen FJ Laros
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
|
31
|
Zhu T, Xu PZ, Liu JP, Peng S, Mo XC, Gao LZ. Phylogenetic relationships and genome divergence among the AA- genome species of the genus Oryza as revealed by 53 nuclear genes and 16 intergenic regions. Mol Phylogenet Evol 2013; 70:348-61. [PMID: 24148990 DOI: 10.1016/j.ympev.2013.10.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 10/14/2012] [Revised: 08/17/2013] [Accepted: 10/09/2013] [Indexed: 12/17/2022]
Abstract
Rapid radiations have long been regarded as the most challenging issue for elucidating poorly resolved phylogenies in evolutionary biology. The eight diploid AA- genome species in the genus Oryza represent a typical example of a closely spaced series of recent speciation events in plants. However, questions regarding when and how they diversified have long been an issue of extensive interest but remain a mystery. Here, a data set comprising >60 kb of 53 singleton fragments and 16 intergenic regions is used to perform phylogenomic analyses of all eight AA- genome species plus four diploid Oryza species with BB-, CC-, EE- and GG- genomes. We fully reconstruct phylogenetic relationships of AA- genome species with confidence. Oryza meridionalis, native to Australia, is found to be the earliest divergent lineage around 2.93 mya, whereas O. punctata, a BB- genome species, serves as the best outgroup to distinguish their phylogenetic relationships. They separated from O. punctata approximately 9.11 mya during the Miocene epoch, and subsequently radiated to generate the entire AA- genome lineage diversity. The success in resolving the phylogeny of AA- genome species highlights the potential of phylogenomics to determine their divergence and evolutionary histories.
Affiliation(s)
- Ting Zhu
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, The Chinese Academy of Sciences, Kunming 650204, China; University of the Chinese Academy of Sciences, Beijing 100039, China.
|
32
|
Rauch A, Wieczorek D, Graf E, Wieland T, Endele S, Schwarzmayr T, Albrecht B, Bartholdi D, Beygo J, Di Donato N, Dufke A, Cremer K, Hempel M, Horn D, Hoyer J, Joset P, Röpke A, Moog U, Riess A, Thiel CT, Tzschach A, Wiesener A, Wohlleber E, Zweier C, Ekici AB, Zink AM, Rump A, Meisinger C, Grallert H, Sticht H, Schenck A, Engels H, Rappold G, Schröck E, Wieacker P, Riess O, Meitinger T, Reis A, Strom TM. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 2012; 380:1674-82. [PMID: 23020937 DOI: 10.1016/s0140-6736(12)61480-9] [Citation(s) in RCA: 754] [Impact Index Per Article: 62.8] [Indexed: 12/16/2022]
Abstract
BACKGROUND The genetic cause of intellectual disability in most patients is unclear because of the absence of morphological clues, information about the position of such genes, and suitable screening methods. Our aim was to identify de-novo variants in individuals with sporadic non-syndromic intellectual disability. METHODS In this study, we enrolled children with intellectual disability and their parents from ten centres in Germany and Switzerland. We compared exome sequences between patients and their parents to identify de-novo variants. 20 children and their parents from the KORA Augsburg Diabetes Family Study were investigated as controls. FINDINGS We enrolled 51 participants from the German Mental Retardation Network. 45 (88%) participants in the case group and 14 (70%) in the control group had de-novo variants. We identified 87 de-novo variants in the case group, with an exomic mutation rate of 1·71 per individual per generation. In the control group we identified 24 de-novo variants, which is 1·2 events per individual per generation. More participants in the case group had loss-of-function variants than in the control group (20/51 vs 2/20; p=0·022), suggesting their contribution to disease development. 16 patients carried de-novo variants in known intellectual disability genes with three recurrently mutated genes (STXBP1, SYNGAP1, and SCN2A). We deemed at least six loss-of-function mutations in six novel genes to be disease causing. We also identified several missense alterations with potential pathogenicity. INTERPRETATION After exclusion of copy-number variants, de-novo point mutations and small indels are associated with severe, sporadic non-syndromic intellectual disability, accounting for 45-55% of patients with high locus heterogeneity. Autosomal recessive inheritance seems to contribute little in the outbred population investigated. 
The large number of de-novo variants in known intellectual disability genes is only partially attributable to known non-specific phenotypes. Several patients did not meet the expected syndromic manifestation, suggesting a strong bias in present clinical syndrome descriptions. FUNDING German Ministry of Education and Research, European Commission 7th Framework Program, and Swiss National Science Foundation.
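The per-individual rates and proportions quoted in this abstract follow directly from the reported counts, as the short arithmetic sketch below shows. The variable names are illustrative assumptions; the counts are the ones stated in the abstract, and the reported p value (0·022) is not recomputed here.

```python
# Counts as reported in the abstract (variable names are illustrative only).
case_n, control_n = 51, 20
case_variants, control_variants = 87, 24
case_with_dnm, control_with_dnm = 45, 14
case_with_lof, control_with_lof = 20, 2

# De-novo variants per individual per generation.
case_rate = case_variants / case_n            # about 1.71
control_rate = control_variants / control_n   # 1.2

# Share of participants carrying at least one de-novo variant.
case_dnm_fraction = case_with_dnm / case_n            # about 88%
control_dnm_fraction = control_with_dnm / control_n   # 70%

# Share of participants carrying a loss-of-function variant.
case_lof_fraction = case_with_lof / case_n            # about 39%
control_lof_fraction = control_with_lof / control_n   # 10%

print(round(case_rate, 2), round(control_rate, 2))
print(round(100 * case_dnm_fraction), round(100 * control_dnm_fraction))
print(round(100 * case_lof_fraction), round(100 * control_lof_fraction))
```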
Affiliation(s)
- Anita Rauch
- Institute of Medical Genetics, University of Zurich, Schwerzenbach-Zurich, Switzerland
|
33
|
|
34
|
Sen K, Ghosh TC. Evolutionary conservation and disease gene association of the human genes composing pseudogenes. Gene 2012; 501:164-70. [PMID: 22521745 DOI: 10.1016/j.gene.2012.04.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/14/2011] [Revised: 02/09/2012] [Accepted: 04/05/2012] [Indexed: 01/16/2023]
Abstract
Pseudogenes, the 'genomic fossils', portray the evolutionary history of the human genome. The human genes that give rise to pseudogenes are also emerging as important resources in the study of human protein evolution. In this communication, we explored the evolutionary conservation of genes that form pseudogenes relative to genes lacking any pseudogene and, delving deeper, probed the difference in evolutionary rate between the disease genes in the two groups. We illustrated this differential evolutionary pattern by gene expressivity, the number of regulatory miRNAs targeting each gene, the abundance of protein-complex-forming genes, and a lower percentage of intrinsic protein disorder. Furthermore, pseudogenes are observed to harbor sequence variations over their entire length that become degenerative disease-causing mutations, although the disease involvement of their progenitor genes is still unexplored. Here, we uncovered a strong association with disease genes among the genes that give rise to pseudogenes in human. We interpreted this finding in terms of disease-associated miRNA targeting, genes containing polymorphisms in miRNA target sites, the abundance of genes with disease-causing non-synonymous mutations, disease-gene-specific network properties, the presence of genes with repeat regions, the abundance of dosage-sensitive genes, and the presence of intrinsically unstructured protein regions.
Affiliation(s)
- Kamalika Sen
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India.
|
35
|
Iengar P. An analysis of substitution, deletion and insertion mutations in cancer genes. Nucleic Acids Res 2012; 40:6401-13. [PMID: 22492711 PMCID: PMC3413105 DOI: 10.1093/nar/gks290] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 01/01/2023] Open
Abstract
Cancer-associated mutations in cancer genes constitute a diverse set of mutations associated with the disease. To gain insight into features of the set, substitution, deletion and insertion mutations were analysed at the nucleotide level, from the COSMIC database. The most frequent substitutions were c→t, g→a, g→t, and the most frequent codon changes were to termination codons. Deletions more than insertions, FS (frameshift) indels more than I-F (in-frame) ones, and single-nucleotide indels, were frequent. FS indels cause loss of significant fractions of proteins. The 5′-cut in FS deletions, and 5′-ligation in FS insertions, often occur between pairs of identical bases. Interestingly, the cut-site and 3′-ligation in insertions, and 3′-cut and join-pair in deletions, were each found to be the same significantly often (p < 0.001). It is suggested that these features aid the incorporation of indel mutations. Tumor suppressors undergo larger numbers of mutations, especially disruptive ones, over the entire protein length, to inactivate two alleles. Proto-oncogenes undergo fewer, less-disruptive mutations, in selected protein regions, to activate a single allele. Finally, catalogues, in ranked order, of genes mutated in each cancer, and cancers in which each gene is mutated, were created. The study highlights the nucleotide level preferences and disruptive nature of cancer mutations.
Affiliation(s)
- Prathima Iengar
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
|
36
|
Uno Y, Osada N. CpG site degeneration triggered by the loss of functional constraint created a highly polymorphic macaque drug-metabolizing gene, CYP1A2. BMC Evol Biol 2011; 11:283. [PMID: 21961956 PMCID: PMC3199271 DOI: 10.1186/1471-2148-11-283] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 08/02/2011] [Accepted: 10/01/2011] [Indexed: 11/29/2022] Open
Abstract
Background Elucidating the pattern of evolutionary changes in drug-metabolizing genes is an important subject not only for evolutionary but also for biomedical research. We investigated the pattern of divergence and polymorphisms of macaque CYP1A1 and CYP1A2 genes, which are major drug-metabolizing genes in humans. In humans, CYP1A2 is specifically expressed in livers while CYP1A1 has a wider gene expression pattern in extrahepatic tissues. In contrast, macaque CYP1A2 is expressed at a much lower level than CYP1A1 in livers. Interestingly, a previous study has shown that Macaca fascicularis CYP1A2 harbored unusually high genetic diversity within species. Genomic regions showing high genetic diversity within species are occasionally interpreted as a result of balancing selection, where natural selection maintains highly diverged alleles with different functions. Nevertheless, many other forces could create such signatures. Results We found that the CYP1A1/2 gene copy number and orientation have been highly conserved among mammalian genomes. The signature of gene conversion between CYP1A1 and CYP1A2 was detected, but the last gene conversion event in the simian primate lineage occurred before the Catarrhini-Platyrrhini divergence. The high genetic diversity of macaque CYP1A2 therefore cannot be explained by gene conversion between CYP1A1 and CYP1A2. By surveying CYP1A2 polymorphisms in a total of 91 M. fascicularis and M. mulatta, we found several null alleles segregating in these species, indicating that functional constraint on CYP1A2 in macaques may have weakened after the divergence between humans and macaques. We propose that the high genetic diversity in macaque CYP1A2 is partly due to the degeneration of CpG sites, which had been maintained at a high level by purifying selection, and the rapid degeneration process was initiated by the loss of functional constraint on macaque CYP1A2.
Conclusions Our findings show that the highly polymorphic CYP1A2 gene in macaques has not been created by balancing selection but by the burst of CpG site degeneration after loss of functional constraint. Because the functional importance of CYP1A1/2 genes is different between humans and macaques, we have to be cautious in extrapolating drug-testing data using substrates metabolized by CYP1A genes from macaques to humans, despite their somewhat overlapping substrate specificity.
Affiliation(s)
- Yasuhiro Uno
- Pharmacokinetics and Bioanalysis Center, Shin Nippon Biomedical Laboratories, Ltd., Kainan, Wakayama 642-0017, Japan
|
37
|
Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K. Statistics and truth in phylogenomics. Mol Biol Evol 2011; 29:457-72. [PMID: 21873298 DOI: 10.1093/molbev/msr202] [Citation(s) in RCA: 164] [Impact Index Per Article: 12.6] [Indexed: 01/23/2023] Open
Abstract
Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that the assessment of the robustness of results to biological factors, that may systematically mislead (bias) the outcomes of statistical estimation, will be a key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
Affiliation(s)
- Sudhir Kumar
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Arizona, USA.
|
38
|
Arbiza L, Patricio M, Dopazo H, Posada D. Genome-wide heterogeneity of nucleotide substitution model fit. Genome Biol Evol 2011; 3:896-908. [PMID: 21824869 PMCID: PMC3175760 DOI: 10.1093/gbe/evr080] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022] Open
Abstract
At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated in the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of 88 models considered were selected as best-fit models at least in one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence interval set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this interval. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. All together our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of different substitution models and that model choice has implications on different parameter estimates including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. All together, we conclude that the use of model selection techniques is important to obtain consistent phylogenetic estimates from real data at a genomic scale.
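Best-fit model selection by the Akaike Information Criterion, as used in this abstract, amounts to computing AIC = 2k - 2 ln L for each candidate and keeping the smallest value. The sketch below assumes three familiar substitution models and made-up log-likelihoods for a single alignment; the numbers are hypothetical and are not taken from the study. The parameter counts cover substitution parameters only, since branch lengths are shared when the tree is the same and add the same constant to every score.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical maximized log-likelihoods for one alignment (illustrative values).
# Free substitution parameters: JC69 has 0, HKY85 has 4, GTR has 8.
candidates = {
    "JC69": (-5230.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR": (-5103.5, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("best-fit model:", best_model)
```

With these illustrative numbers GTR fits slightly better than HKY85 in raw likelihood, but the penalty for its four extra parameters outweighs the gain, so HKY85 is selected.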
Affiliation(s)
- Leonardo Arbiza
- Department of Biochemistry, Genetics, and Immunology, University of Vigo, Vigo, Spain
|
39
|
Suzuki Y. Statistical methods for detecting natural selection from genomic data. Genes Genet Syst 2011; 85:359-76. [PMID: 21415566 DOI: 10.1266/ggs.85.359] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/23/2022] Open
Abstract
In the study of molecular and phenotypic evolution, understanding the relative importance of random genetic drift and positive selection as the mechanisms for driving divergences between populations and maintaining polymorphisms within populations has been a central issue. A variety of statistical methods has been developed for detecting natural selection operating at the amino acid and nucleotide sequence levels. These methods may be largely classified into those aimed at detecting recurrent and/or recent/ongoing natural selection by utilizing the divergence and/or polymorphism data. Using these methods, pervasive positive selection has been identified for protein-coding and non-coding sequences in the genomic analysis of some organisms. However, many of these methods have been criticized by using computer simulation and real data analysis to produce excessive false-positives and to be sensitive to various disturbing factors. Importantly, some of these methods have been invalidated experimentally. These facts indicate that many of the statistical methods for detecting natural selection are unreliable. In addition, the signals that have been believed as the evidence for fixations of advantageous mutations due to positive selection may also be interpreted as the evidence for fixations of deleterious mutations due to random genetic drift. The genomic diversity data are rapidly accumulating in various organisms, and detection of natural selection may play a critical role for clarifying the relative role of random genetic drift and positive selection in molecular and phenotypic evolution. It is therefore important to develop reliable statistical methods that are unbiased as well as robust against various disturbing factors, for inferring natural selection.
Affiliation(s)
- Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, Japan.
|
40
|
Brown CA, Scharner J, Felice K, Meriggioli MN, Tarnopolsky M, Bower M, Zammit PS, Mendell JR, Ellis JA. Novel and recurrent EMD mutations in patients with Emery–Dreifuss muscular dystrophy, identify exon 2 as a mutation hot spot. J Hum Genet 2011; 56:589-94. [DOI: 10.1038/jhg.2011.65] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
|
41
|
Inference of mutation parameters and selective constraint in mammalian coding sequences by approximate Bayesian computation. Genetics 2011; 187:1153-61. [PMID: 21288873 DOI: 10.1534/genetics.110.124073] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 01/20/2023] Open
Abstract
We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result, considering that murid effective population sizes are likely to be considerably higher than the other two taxa.
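The general logic of ABC can be sketched with the simplest variant, plain rejection sampling, applied to a toy one-parameter divergence model. This is not the method of the abstract, which uses regression adjustment and models CpG hypermutability, transition/transversion bias and selection; the function names, the uniform prior, the tolerance and the observed count below are all assumptions chosen for illustration.

```python
import random


def simulate_differences(divergence: float, n_sites: int, rng: random.Random) -> int:
    """Toy simulator: each site differs between the two species with probability `divergence`."""
    return sum(rng.random() < divergence for _ in range(n_sites))


def abc_rejection(observed: int, n_sites: int, n_draws: int,
                  tolerance: int, seed: int = 0) -> list[float]:
    """Keep prior draws whose simulated summary statistic lands near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        candidate = rng.uniform(0.0, 0.5)  # assumed flat prior on per-site divergence
        simulated = simulate_differences(candidate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(candidate)
    return accepted


# Hypothetical data: 20 differences observed across 200 aligned sites.
posterior = abc_rejection(observed=20, n_sites=200, n_draws=5000, tolerance=2)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), "accepted draws; approximate posterior mean:", posterior_mean)
```

The accepted draws approximate the posterior of the divergence parameter; regression-based ABC, as described in the abstract, additionally corrects each accepted draw for the remaining gap between simulated and observed statistics.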
42
Stover DA, Verrelli BC. Comparative Vertebrate Evolutionary Analyses of Type I Collagen: Potential of COL1a1 Gene Structure and Intron Variation for Common Bone-Related Diseases. Mol Biol Evol 2010; 28:533-42. [DOI: 10.1093/molbev/msq221] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 01/27/2023]
43
Sen K, Podder S, Ghosh TC. Insights into the genomic features and evolutionary impact of the genes configuring duplicated pseudogenes in human. FEBS Lett 2010; 584:4015-8. [PMID: 20708614 DOI: 10.1016/j.febslet.2010.08.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/02/2010] [Revised: 08/05/2010] [Accepted: 08/06/2010] [Indexed: 10/19/2022]
Abstract
Pseudogenes, regarded as 'genomic fossils', are DNA sequences that resemble functional genes in sequence homology but are completely non-functional. In this study, we explored the characteristic features of the human genes that give rise to classical duplicated pseudogenes. We found that progenitors of duplicated pseudogenes are characterized by high expressivity and the ability to encode hub proteins, in association with a high evolutionary rate. Such unusual features are supported by longer protein length, elevated CpG content, and a high recombination rate. The non-functionalization of their duplicated copies can be attributed to an overabundance of gene paralogs in concert with functional redundancy.
Affiliation(s)
- Kamalika Sen
- Bioinformatics Centre, Bose Institute, Kolkata, India
44
Claw KG, Tito RY, Stone AC, Verrelli BC. Haplotype structure and divergence at human and chimpanzee serotonin transporter and receptor genes: implications for behavioral disorder association analyses. Mol Biol Evol 2010; 27:1518-29. [PMID: 20118193 DOI: 10.1093/molbev/msq030] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Genetic variation in the human serotonin system has long been studied because of its functional consequences, its links to various behavior-related disorders, and its routine targeting in research and development for drug therapy. However, aside from clinical studies, little is known about this genetic diversity and how it differs within and between human populations with respect to haplotype structure, which can greatly impact phenotype association studies. In addition, no evolutionary approach among humans and other primates has examined how long- and short-term selective pressures explain existing serotonin variation. Here, we examine DNA sequence variation in natural population samples of 192 human and 40 chimpanzee chromosome sequences for the most commonly implicated approximately 38-kb serotonin transporter (SLC6A4) and approximately 63-kb serotonin 2A receptor (HTR2A) genes. Our comparative population genetic analyses find significant linkage disequilibrium associated with functionally relevant variants in humans, as well as geographic variation for these haplotypes, at both loci. In addition, although amino acid divergence is consistent with purifying selection, promoter and untranslated regions exhibit significantly high divergence in both species lineages. These evolutionary analyses imply that the serotonin system may have accumulated significant regulatory variation over both recent and ancient periods of time in both humans and chimpanzees. We discuss the implications of this variation for disease association studies and for the evolution of behavior-related phenotypes during the divergence of humans and our closest primate relatives.
Affiliation(s)
- Katrina G Claw
- Center for Evolutionary Functional Genomics, The Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, AZ, USA
45
Medvedeva YA, Fridman MV, Oparina NJ, Malko DB, Ermakova EO, Kulakovskiy IV, Heinzel A, Makeev VJ. Intergenic, gene terminal, and intragenic CpG islands in the human genome. BMC Genomics 2010; 11:48. [PMID: 20085634 PMCID: PMC2817693 DOI: 10.1186/1471-2164-11-48] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 02/26/2009] [Accepted: 01/19/2010] [Indexed: 11/10/2022]
Abstract
Background: Recently, it has been discovered that the human genome contains many transcription start sites for non-coding RNA. Regulatory regions related to transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from transcription start sites of any protein coding gene. The human genome contains many such CpG islands; however, until now their properties were not systematically studied. Results: We studied CpG islands located in different regions of the human genome using methods of bioinformatics and comparative genomics. We have observed that CpG islands have a preference to overlap with exons, including exons located far from the transcription start site, but usually extend well into introns. Synonymous substitution rate of CpG-containing codons becomes substantially reduced in regions where CpG islands overlap with protein-coding exons, even if they are located far downstream from the transcription start site. CAGE tag analysis displayed frequent transcription start sites in all CpG islands, including those found far from transcription start sites of protein coding genes. Computational prediction and analysis of published ChIP-chip data revealed that CpG islands contain an increased number of sites recognized by Sp1 protein. CpG islands containing more CAGE tags usually also contain more Sp1 binding sites. This is especially relevant for CpG islands located in 3' gene regions. Various examples of transcription, confirmed by mRNAs or ESTs, but with no evidence of protein coding genes, were found in CAGE-enriched CpG islands located far from the transcription start site of any known protein coding gene. Conclusions: CpG islands located far from transcription start sites of protein coding genes have transcription initiation activity and display Sp1 binding properties. In exons overlapping with these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that these CpG islands are involved in transcription initiation, possibly of some non-coding RNAs.
Affiliation(s)
- Yulia A Medvedeva
- Research Institute for Genetics and Selection of Industrial Microorganisms, Genetika, 1st Dorozhny proezd, 1, Moscow, 117545, Russia.
46
Suzuki Y, Gojobori T, Kumar S. Methods for incorporating the hypermutability of CpG dinucleotides in detecting natural selection operating at the amino acid sequence level. Mol Biol Evol 2009; 26:2275-84. [PMID: 19581348 DOI: 10.1093/molbev/msp133] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 02/03/2023]
Abstract
In detecting natural selection operating at the amino acid sequence level by comparing the rates of synonymous (r(S)) and nonsynonymous (r(N)) substitutions, the rates of synonymous and nonsynonymous mutations are assumed to be approximately the same. In reality, however, these rates may not be the same if different proportions of synonymous and nonsynonymous sites overlap with CpG dinucleotides, which are known to be hypermutable in some organisms. Here, we develop the evolutionary pathway methods for comparing r(S) and r(N) at multiple codon sites (all-sites analysis) and at single codon sites (single-site analysis) that take into account the hypermutability at CpG dinucleotides in estimating the number of synonymous substitutions per synonymous site (d(S)) and nonsynonymous substitutions per nonsynonymous site (d(N)). Computer simulations show that the direction and magnitude of the bias in the estimation of d(N)/d(S) caused by the hypermutability of CpGs are determined by both the number of CpGs and the relative proportions of synonymous and nonsynonymous sites overlapping with CpGs. This bias is greatly reduced when using the methods we propose to account for the hypermutability of CpG dinucleotides. In an all-sites analysis of protamine 1 genes from primates, d(N)/d(S) > 1 was observed for many pairs if the hypermutability was ignored. However, d(N)/d(S) becomes ≤ 1 for most of these pairs when the CpG sites are assumed to be hypermutable. Therefore, statistical indications of positive selection in some sequences or individual codons may be caused by mutation rate differences in synonymous and nonsynonymous sites.
Affiliation(s)
- Yoshiyuki Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Mishima, Shizuoka, Japan.
47
Li JB, Gao Y, Aach J, Zhang K, Kryukov GV, Xie B, Ahlford A, Yoon JK, Rosenbaum AM, Zaranek AW, LeProust E, Sunyaev SR, Church GM. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res 2009; 19:1606-15. [PMID: 19525355 DOI: 10.1101/gr.092213.109] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 11/24/2022]
Abstract
Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
Affiliation(s)
- Jin Billy Li
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
48
MacEachern S, McEwan J, McCulloch A, Mather A, Savin K, Goddard M. Molecular evolution of the Bovini tribe (Bovidae, Bovinae): is there evidence of rapid evolution or reduced selective constraint in Domestic cattle? BMC Genomics 2009; 10:179. [PMID: 19393048 PMCID: PMC2681479 DOI: 10.1186/1471-2164-10-179] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 01/29/2008] [Accepted: 04/24/2009] [Indexed: 11/10/2022]
Abstract
Background: If mutation within the coding region of the genome is largely not adaptive, the ratio of nonsynonymous (dN) to synonymous substitutions (dS) per site (dN/dS) should be approximately equal among closely related species. Furthermore, dN/dS in divergence between species should be equivalent to dN/dS in polymorphisms. This hypothesis is of particular interest in closely related members of the Bovini tribe, because domestication has promoted rapid phenotypic divergence through strong artificial selection of some species while others remain undomesticated. We examined a number of genes that may be involved in milk production in Domestic cattle and a number of their wild relatives for evidence that domestication had affected molecular evolution. Elevated rates of dN/dS were further queried to determine if they were the result of positive selection, low effective population size (N(e)) or reduced selective constraint. Results: We have found that the domestication process has contributed to higher dN/dS ratios in cattle, especially in the lineages leading to the Domestic cow (Bos taurus) and Mithan (Bos frontalis) and within some breeds of Domestic cow. However, the high rates of dN/dS polymorphism within B. taurus when compared to species divergence suggest that positive selection has not elevated evolutionary rates in these genes. Likewise, the low rate of dN/dS in Bison, which has undergone a recent population bottleneck, indicates a reduction in population size alone is not responsible for these observations. Conclusion: The effect of selection depends on effective population size and the selection coefficient (N(e)s). Typically, under domestication both selection pressure for traits important in fitness in the wild and N(e) are reduced. Therefore, reduced selective constraint could be responsible for the observed elevated evolutionary ratios in domesticated species, especially in B. taurus and B. frontalis, which have the highest dN/dS in the Bovini. This may have important implications for tests of selection such as the McDonald-Kreitman test. Surprisingly, we have also detected a significant difference in the supposed neutral substitution rate between synonymous and noncoding sites in the Bovine genome, with a 30% higher rate of substitution at synonymous sites. This is due, at least in part, to an excess of the highly mutable CpG dinucleotides at synonymous sites, which will have implications for time of divergence estimates from molecular data.
Affiliation(s)
- Sean MacEachern
- Primary Industries Research Victoria, Animal Genetics and Genomics, Attwood, VIC, Australia.
49
MacEachern S, Hayes B, McEwan J, Goddard M. An examination of positive selection and changing effective population size in Angus and Holstein cattle populations (Bos taurus) using a high density SNP genotyping platform and the contribution of ancient polymorphism to genomic diversity in Domestic cattle. BMC Genomics 2009; 10:181. [PMID: 19393053 PMCID: PMC2681480 DOI: 10.1186/1471-2164-10-181] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Received: 01/29/2008] [Accepted: 04/24/2009] [Indexed: 12/03/2022]
Abstract
Background: Identifying recent positive selection signatures in domesticated animals could provide information on genome response to strong directional selection from domestication and artificial selection. With the completion of the cattle genome, private companies are now providing large numbers of polymorphic markers for probing variation in domestic cattle (Bos taurus). We analysed over 7,500 single nucleotide polymorphisms (SNPs) in beef (Angus) and dairy (Holstein) cattle and the outgroup species Bison, Yak and Banteng in an indirect test of inbreeding and positive selection in Domestic cattle. Results: The outgroup species (Bison, Yak and Banteng) were genotyped with high levels of success (90%) and used to determine ancestral and derived allele states in domestic cattle. Frequency spectra of the derived alleles in Angus and Holstein were examined using Fay and Wu's H test. Significant divergences from the frequency spectra expected under neutrality were identified. This appeared to be the result of combined influences of positive selection, inbreeding and ascertainment bias for moderately frequent SNPs. Approximately 10% of all polymorphisms identified as segregating in B. taurus were also segregating in Bison, Yak or Banteng, highlighting a large number of polymorphisms that are ancient in origin. Conclusion: These results suggest that a large effective population size (Ne) of approximately 90,000 or more existed in B. taurus since they shared a common ancestor with Bison, Yak and Banteng ~1–2 million years ago (MYA). More recently, Ne decreased sharply, probably in association with domestication. This may partially explain the paradox of high levels of polymorphism in Domestic cattle and the relatively small recent Ne in this species. The period of inbreeding caused Fay and Wu's H statistic to depart from its expectation under neutrality, mimicking the effect of selection. However, there was also evidence for selection, because high frequency derived alleles tended to cluster near each other on the genome.
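For readers unfamiliar with the statistic named in this abstract, Fay and Wu's H is commonly written as the difference between two estimators of the population mutation parameter computed from the derived-allele frequency spectrum: H = π − θ_H, with π = Σ 2·S_i·i·(n − i) / (n(n − 1)) and θ_H = Σ 2·S_i·i² / (n(n − 1)), where S_i is the number of sites at which the derived allele appears i times in a sample of n chromosomes. The following is a minimal sketch of that formula only. The function name `fay_wu_h` and the toy spectra are illustrative assumptions and are not taken from the study.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H from an unfolded site frequency spectrum.

    derived_counts maps i (copies of the derived allele, 1 <= i <= n - 1)
    to the number of sites S_i with that derived-allele count.
    """
    pairs = n * (n - 1)
    # Pairwise-difference estimator of theta (pi)
    theta_pi = sum(2 * s * i * (n - i) for i, s in derived_counts.items()) / pairs
    # Estimator weighted towards high-frequency derived alleles (theta_H)
    theta_h = sum(2 * s * i * i for i, s in derived_counts.items()) / pairs
    return theta_pi - theta_h


n = 10
# Toy spectrum (assumption): five sites where the derived allele is nearly fixed
h_high_freq = fay_wu_h({9: 5}, n)
# Expected neutral spectrum, S_i proportional to 1/i, for which H should be zero
h_neutral = fay_wu_h({i: 6.0 / i for i in range(1, n)}, n)
print(h_high_freq, h_neutral)
```

An excess of high-frequency derived alleles drives H negative, which is why both selection and demographic events such as the inbreeding described above can move the statistic away from its neutral expectation of zero.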
Affiliation(s)
- Sean MacEachern
- Primary Industries Research Victoria, Animal Genetics and Genomics, Attwood, VIC, Australia.
50
MacEachern S, McEwan J, Goddard M. Phylogenetic reconstruction and the identification of ancient polymorphism in the Bovini tribe (Bovidae, Bovinae). BMC Genomics 2009; 10:177. [PMID: 19393045 PMCID: PMC2694835 DOI: 10.1186/1471-2164-10-177] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 01/29/2008] [Accepted: 04/24/2009] [Indexed: 11/22/2022]
Abstract
Background: The Bovinae subfamily incorporates an array of antelope, buffalo and cattle species. All of the members of this subfamily have diverged recently. Not surprisingly, a number of phylogenetic studies from molecular and morphological data have resulted in ambiguous trees and relationships amongst species, especially for Yak and Bison species. A partial phylogenetic reconstruction of 13 extant members of the Bovini tribe (Bovidae, Bovinae) from 15 complete or partially sequenced autosomal genes is presented. Results: We identified 3 distinct lineages after the Bovini split from the Boselaphini and Tragelaphini tribes, which have led to the (1) Buffalo clade (Bubalus and Syncerus species) and a more recent divergence leading to the (2) Banteng, Gaur and Mithan and (3) Domestic cattle clades. A fourth lineage may also exist that leads to Bison and Yak. However, there was some ambiguity as to whether this was a divergence from the Banteng/Gaur/Mithan or the Domestic cattle clade. From an analysis of approximately 30,000 sites that were amplified in all species, 133 sites were identified with ambiguous inheritance, in that all trees implied more than one mutation at the same site. Closer examination of these sites has identified that they are the result of ancient polymorphisms that have subsequently undergone lineage sorting in the Bovini tribe, of which 53 have remained polymorphic since Bos and Bison species last shared a common ancestor with Bubalus between 5 and 8 million years ago (MYA). Conclusion: Uncertainty arises in our phylogenetic reconstructions because many species in the Bovini diverged over a short period of time. It appears that a number of sites with ambiguous inheritance have been maintained in subsequent populations by chance (lineage sorting) and that they have contributed to an association between Yak and Domestic cattle and an unreliable phylogenetic reconstruction for the Bison/Yak clade. Interestingly, a number of these aberrant sites are in coding sections of the genome and their identification may have important implications for studying the neutral rate of mutation at nonsynonymous sites. The presence of these sites could help account for the apparent contradiction between levels of polymorphism and effective population size in domesticated cattle.
Affiliation(s)
- Sean MacEachern
- Primary Industries Research Victoria, Animal Genetics and Genomics, Attwood, VIC, Australia.