1
Shafat Z, Ahmed A, Parvez MK, Parveen S. Sequence to structure analysis of the ORF4 protein from Hepatitis E virus. Bioinformation 2022; 17:818-828. [PMID: 35539889 PMCID: PMC9049080 DOI: 10.6026/97320630017818] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/14/2021] [Revised: 09/21/2021] [Accepted: 09/21/2021] [Indexed: 02/06/2023] Open
Abstract
Hepatitis E virus (HEV) is the main cause of acute hepatitis worldwide. HEV accounts for a mortality rate of up to 30% in pregnant women, with the highest incidence reported for genotype 1 (G1) HEV. The factors contributing to adverse outcomes of HEV infection during pregnancy are still debated. The mechanism underlying the pathogenesis of viral infection is attributed to different genomic components of HEV, i.e., the open reading frames (ORFs) ORF1, ORF2, ORF3 and ORF4. Recently, ORF4 was found to enhance the replication of G1 isolates of HEV through regulation of an IRES-like RNA element. However, its characterization through computational methodologies remains unexplored. In this study, we provide a comprehensive overview of the genetic and molecular characteristics of the ORF4 protein by analyzing its sequence and different structural levels. Three datasets (Human, Rat and Ferret) of ORF4 genomes were built and comparatively analyzed. Several non-synonymous mutations, in conjunction with higher entropy values, were observed in the rat and ferret datasets, whereas limited variation was observed in human ORF4 genomes. A higher transition-to-transversion ratio was observed in the ORF4 genomes. Studies have reported an association between intrinsically disordered proteins (IDPs) and drug discovery, owing to their role in several signaling and regulatory processes through protein-protein interactions (PPIs). Because PPIs are potent sources of drug targets, the ORF4 protein was explored by analyzing its polypeptide structure to shed light on its intrinsic disorder. Pressures that lead towards a preponderance of disorder-promoting amino acid residues shaped the evolution of ORF4. The intrinsic disorder propensity analysis revealed the ORF4 protein (Human) to be a highly disordered protein (IDP). The predominance of coils and lack of secondary structure further substantiated our findings, suggesting its involvement in binding to ligand molecules. Thus, ORF4 contributes to cellular signaling processes through protein-protein interactions, as IDPs are targets for regulation to accelerate the process of drug designing strategies against HEV infections.
Affiliation(s)
- Zoya Shafat
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Anwar Ahmed
  - Centre of Excellence in Biotechnology Research, College of Science, King Saud University, Riyadh, Saudi Arabia
- Mohammad K Parvez
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Shama Parveen
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
2
Keegan NP, Wilton SD, Fletcher S. Analysis of Pathogenic Pseudoexons Reveals Novel Mechanisms Driving Cryptic Splicing. Front Genet 2022; 12:806946. [PMID: 35140743 PMCID: PMC8819188 DOI: 10.3389/fgene.2021.806946] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/01/2021] [Accepted: 12/09/2021] [Indexed: 12/16/2022] Open
Abstract
Understanding pre-mRNA splicing is crucial to accurately diagnosing and treating genetic diseases. However, mutations that alter splicing can exert highly diverse effects. Of all the known types of splicing mutations, perhaps the rarest and most difficult to predict are those that activate pseudoexons, sometimes also called cryptic exons. Unlike other splicing mutations that either destroy or redirect existing splice events, pseudoexon mutations appear to create entirely new exons within introns. Since exon definition in vertebrates requires coordinated arrangements of numerous RNA motifs, one might expect that pseudoexons would only arise when rearrangements of intronic DNA create novel exons by chance. Surprisingly, although such mutations do occur, a far more common cause of pseudoexons is deep-intronic single nucleotide variants, raising the question of why these latent exon-like tracts near the mutation sites have not already been purged from the genome by the evolutionary advantage of more efficient splicing. Possible answers may lie in deep intronic splicing processes such as recursive splicing or poison exon splicing. Because these processes utilize intronic motifs that benignly engage with the spliceosome, the regions involved may be more susceptible to exonization than other intronic regions would be. We speculated that a comprehensive study of reported pseudoexons might detect alignments with known deep intronic splice sites and could also permit the characterisation of novel pseudoexon categories. In this report, we present and analyse a catalogue of over 400 published pseudoexon splice events. In addition to confirming prior observations of the most common pseudoexon mutation types, the size of this catalogue also enabled us to suggest new categories for some of the rarer types of pseudoexon mutation. 
By comparing our catalogue against published datasets of non-canonical splice events, we also found that 15.7% of pseudoexons exhibit some splicing activity at one or both of their splice sites in non-mutant cells. Importantly, this included seven examples of experimentally confirmed recursive splice sites, confirming for the first time a long-suspected link between these two splicing phenomena. These findings have the potential to improve the fidelity of genetic diagnostics and reveal new targets for splice-modulating therapies.
Affiliation(s)
- Niall P. Keegan
  - Centre for Molecular Medicine and Innovative Therapeutics, Health Futures Institute, Murdoch University, Perth, WA, Australia
  - Centre for Neuromuscular and Neurological Disorders, Perron Institute for Neurological and Translational Science, The University of Western Australia, Perth, WA, Australia
  - *Correspondence: Niall P. Keegan
- Steve D. Wilton
  - Centre for Molecular Medicine and Innovative Therapeutics, Health Futures Institute, Murdoch University, Perth, WA, Australia
  - Centre for Neuromuscular and Neurological Disorders, Perron Institute for Neurological and Translational Science, The University of Western Australia, Perth, WA, Australia
- Sue Fletcher
  - Centre for Molecular Medicine and Innovative Therapeutics, Health Futures Institute, Murdoch University, Perth, WA, Australia
  - Centre for Neuromuscular and Neurological Disorders, Perron Institute for Neurological and Translational Science, The University of Western Australia, Perth, WA, Australia
3
Ciprofloxacin induced antibiotic resistance in Salmonella Typhimurium mutants and genome analysis. Arch Microbiol 2021; 203:6131-6142. [PMID: 34585273 DOI: 10.1007/s00203-021-02577-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 09/07/2021] [Accepted: 09/12/2021] [Indexed: 10/20/2022]
Abstract
Antibiotic resistance of Salmonella species is well reported. Ciprofloxacin is the frontline antibiotic for salmonellosis. Repeated exposure to ciprofloxacin leads to resistant strains. After 20 cycles of antibiotic exposure, resistant bacterial clones were evaluated. The mutants formed small colonies and had an extended lag phase compared to the parent strain. Whole genome sequencing showed 40,513 mutations across the genome. A small percentage (5.2%) of the mutations were non-synonymous. Four-fold more transitions were observed than transversions. A transition vs transversion ratio of < 1 showed positive selection for the antibiotic-resistant trait. Mutation distribution across the genome was uniform. The native plasmid was an exception: 2 mutations were observed on the 90 kb plasmid. Important genes involved in antibiotic resistance, such as dnaE, gyrA, iroC, metH and rpoB, had point mutations. The genome analysis revealed that most of the metabolic pathways were affected.
4
Zou Z, Zhang J. Are Nonsynonymous Transversions Generally More Deleterious than Nonsynonymous Transitions? Mol Biol Evol 2021; 38:181-191. [PMID: 32805043 PMCID: PMC7783172 DOI: 10.1093/molbev/msaa200] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023] Open
Abstract
It has been suggested that, due to the structure of the genetic code, nonsynonymous transitions are less likely than transversions to cause radical changes in amino acid physicochemical properties so are on average less deleterious. This view was supported by some but not all mutagenesis experiments. Because laboratory measures of fitness effects have limited sensitivities and relative frequencies of different mutations in mutagenesis studies may not match those in nature, we here revisit this issue using comparative genomics. We extend the standard codon model of sequence evolution by adding the parameter η that quantifies the ratio of the fixation probability of transitional nonsynonymous mutations to that of transversional nonsynonymous mutations. We then estimate η from the concatenated alignment of all protein-coding DNA sequences of two closely related genomes. Surprisingly, η ranges from 0.13 to 2.0 across 90 species pairs sampled from the tree of life, with 51 incidences of η < 1 and 30 incidences of η >1 that are statistically significant. Hence, whether nonsynonymous transversions are overall more deleterious than nonsynonymous transitions is species-dependent. Because the corresponding groups of amino acid replacements differ between nonsynonymous transitions and transversions, η is influenced by the relative exchangeabilities of amino acid pairs. Indeed, an extensive search reveals that the large variation in η is primarily explainable by the recently reported among-species disparity in amino acid exchangeabilities. These findings demonstrate that genome-wide nucleotide substitution patterns in coding sequences have species-specific features and are more variable among evolutionary lineages than are currently thought.
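The verbal definition of η in the abstract above can be written compactly as follows. This is only a notational sketch: the symbol P_fix and the subscripts are assumptions introduced here for illustration and are not taken from the paper.

```latex
% eta: ratio of fixation probabilities for nonsynonymous mutations,
% transitions (Ts) in the numerator, transversions (Tv) in the denominator.
% P_fix is a hypothetical symbol used only for this sketch.
\eta \;=\; \frac{P_{\mathrm{fix}}(\text{nonsynonymous Ts})}{P_{\mathrm{fix}}(\text{nonsynonymous Tv})}
```

Read this way, η > 1 corresponds to nonsynonymous transitions being fixed more readily than nonsynonymous transversions (transversions more deleterious on average), and η < 1 to the reverse, which matches the two groups of species pairs reported in the abstract.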
Affiliation(s)
- Zhengting Zou
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
  - Corresponding author
Associate editor: Jeffrey Townsend
5
Thomforde J, Fu I, Rodriguez F, Pujari SS, Broyde S, Tretyakova N. Translesion Synthesis Past 5-Formylcytosine-Mediated DNA-Peptide Cross-Links by hPolη Is Dependent on the Local DNA Sequence. Biochemistry 2021; 60:1797-1807. [PMID: 34080848 DOI: 10.1021/acs.biochem.1c00130] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/17/2022]
Abstract
DNA-protein cross-links (DPCs) are unusually bulky DNA lesions that form when cellular proteins become trapped on DNA following exposure to ultraviolet light, free radicals, aldehydes, and transition metals. DPCs can also form endogenously when naturally occurring epigenetic marks [5-formyl cytosine (5fC)] in DNA react with lysine and arginine residues of histones to form Schiff base conjugates. Our previous studies revealed that DPCs inhibit DNA replication and transcription but can undergo proteolytic cleavage to produce smaller DNA-peptide conjugates. We have shown that 5fC-conjugated DNA-peptide cross-links (DpCs) placed within the CXA sequence (X = DpC) can be bypassed by human translesion synthesis (TLS) polymerases η and κ in an error-prone manner. However, the local nucleotide sequence context can have a strong effect on replication bypass of bulky lesions by influencing the geometry of the ternary complex among the DNA template, polymerase, and the incoming dNTP. In this work, we investigated polymerase bypass of 5fC-DNA-11-mer peptide cross-links placed in seven different sequence contexts (CXC, CXG, CXT, CXA, AXA, GXA, and TXA) in the presence of human TLS polymerase η. Primer extension products were analyzed by gel electrophoresis, and steady-state kinetics of the misincorporation of dAMP opposite the DpC lesion in different base sequence contexts was investigated. Our results revealed a strong impact of nearest neighbor base identity on polymerase η activity in the absence and presence of a DpC lesion. Molecular dynamics simulations were used to structurally explain the experimental findings. Our results suggest a possible role of local DNA sequence in promoting TLS-related mutational hot spots in the presence and absence of DpC lesions.
Affiliation(s)
- Jenna Thomforde
  - Department of Medicinal Chemistry and Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Iwen Fu
  - Department of Biology, New York University, New York, New York 10003-6688, United States
- Freddys Rodriguez
  - Department of Medicinal Chemistry and Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Suresh S Pujari
  - Department of Medicinal Chemistry and Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Suse Broyde
  - Department of Biology, New York University, New York, New York 10003-6688, United States
- Natalia Tretyakova
  - Department of Medicinal Chemistry and Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
6
Di Gioacchino A, Šulc P, Komarova AV, Greenbaum BD, Monasson R, Cocco S. The Heterogeneous Landscape and Early Evolution of Pathogen-Associated CpG Dinucleotides in SARS-CoV-2. Mol Biol Evol 2021; 38:2428-2445. [PMID: 33555346 PMCID: PMC7928797 DOI: 10.1093/molbev/msab036] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 02/06/2023] Open
Abstract
COVID-19 can lead to acute respiratory syndrome, which can be due to dysregulated immune signaling. We analyze the distribution of CpG dinucleotides, a pathogen-associated molecular pattern, in the SARS-CoV-2 genome. We characterize CpG content by a CpG force that accounts for statistical constraints acting on the genome at the nucleotidic and amino acid levels. The CpG force, as the CpG content, is overall low compared with other pathogenic betacoronaviruses; however, it widely fluctuates along the genome, with a particularly low value, comparable with the circulating seasonal HKU1, in the spike coding region and a greater value, comparable with SARS and MERS, in the highly expressed nucleocapside coding region (N ORF), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3'UTRs of all subgenomic RNA. This dual nature of CpG content could confer to SARS-CoV-2 the ability to avoid triggering pattern recognition receptors upon entry, while eliciting a stronger response during replication. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic, finding a signature of CpG loss in regions with a greater CpG force. Sequence motifs preceding the CpG-loss-associated loci in the N ORF match recently identified binding patterns of the zinc finger antiviral protein. Using a model of the viral gene evolution under human host pressure, we find that synonymous mutations seem driven in the SARS-CoV-2 genome, and particularly in the N ORF, by the viral codon bias, the transition-transversion bias, and the pressure to lower CpG content.
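The idea of CpG content fluctuating along a genome can be illustrated with a much simpler statistic than the CpG force used in the paper. The sketch below computes the classical observed/expected CpG ratio in sliding windows; it is a stand-in chosen for illustration, not the authors' force parameter, and the function names, window size and step are all assumptions.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    This is the simple textbook ratio, NOT the CpG force of the paper.
    Returns 0.0 when the ratio is undefined (no C or no G in the sequence).
    """
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    cpg_count = s.count("CG")
    return cpg_count * n / (c_count * g_count)


def windowed_cpg(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, observed/expected CpG ratio) for each full window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, cpg_observed_expected(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence, not viral data: a CpG-rich half followed by a CpG-free half.
    for start, ratio in windowed_cpg("CGCGAAAA", window=4, step=4):
        print(start, ratio)
```

On a real genome one would read the sequence from a FASTA file and use windows of hundreds of nucleotides; the toy input here only shows that the ratio can differ sharply between regions of the same sequence.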
Affiliation(s)
- Andrea Di Gioacchino
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
- Petr Šulc
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Anastassia V Komarova
  - Molecular Genetics of RNA Viruses, Department of Virology, Institut Pasteur, CNRS UMR-3569, Paris, France
- Benjamin D Greenbaum
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Rémi Monasson
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
- Simona Cocco
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
7
Di Gioacchino A, Šulc P, Komarova AV, Greenbaum BD, Monasson R, Cocco S. The heterogeneous landscape and early evolution of pathogen-associated CpG dinucleotides in SARS-CoV-2. bioRxiv 2020. [PMID: 32511407 DOI: 10.1101/2020.05.06.074039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
COVID-19 can lead to acute respiratory syndrome, which can be due to dysregulated immune signaling. We analyze the distribution of CpG dinucleotides, a pathogen-associated molecular pattern, in the SARS-CoV-2 genome. We find that the CpG content, which we characterize by a force parameter that accounts for statistical constraints acting on the genome at the nucleotidic and amino-acid levels, is, on average, low compared to other pathogenic betacoronaviruses. However, the CpG force widely fluctuates along the genome, with a particularly low value, comparable to the circulating seasonal HKU1, in the spike coding region and a greater value, comparable to SARS and MERS, in the highly expressed nucleocapside coding region (N ORF), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3'UTRs of all subgenomic RNA. This dual nature of CpG content could confer to SARS-CoV-2 the ability to avoid triggering pattern recognition receptors upon entry, while eliciting a stronger response during replication. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic, finding a signature of CpG loss in regions with a greater CpG force. Sequence motifs preceding the CpG-loss-associated loci in the N ORF match recently identified binding patterns of the Zinc finger Anti-viral Protein. Using a model of the viral gene evolution under human host pressure, we find that synonymous mutations seem driven in the SARS-CoV-2 genome, and particularly in the N ORF, by the viral codon bias, the transition-transversion bias and the pressure to lower CpG content.
8
Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, Myers RM, Boehnke M, Kang HM, Scott LJ, Li JZ, Zöllner S. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 2018; 9:3753. [PMID: 30218074 PMCID: PMC6138700 DOI: 10.1038/s41467-018-05936-5] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 10/04/2017] [Accepted: 07/30/2018] [Indexed: 12/30/2022] Open
Abstract
A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.
Affiliation(s)
- Jedidiah Carlson
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Adam E Locke
  - McDonnell Genome Institute & Department of Medicine, Washington University, St. Louis, MO, 63108, USA
- Matthew Flickinger
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Matthew Zawistowski
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Shawn Levy
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
- Michael Boehnke
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Hyun Min Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Laura J Scott
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Jun Z Li
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, 48109, USA
9
Lyons DM, Lauring AS. Evidence for the Selective Basis of Transition-to-Transversion Substitution Bias in Two RNA Viruses. Mol Biol Evol 2018; 34:3205-3215. [PMID: 29029187 PMCID: PMC5850290 DOI: 10.1093/molbev/msx251] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Indexed: 12/21/2022] Open
Abstract
The substitution rates of transitions are higher than expected by chance relative to those of transversions. Many have argued that selection disfavors transversions, as nonsynonymous transversions are less likely to conserve biochemical properties of the original amino acid. Only recently has it become feasible to directly test this selective hypothesis by comparing the fitness effects of a large number of transition and transversion mutations. For example, a recent study of six viruses and one beta-lactamase gene did not find evidence supporting the selective hypothesis. Here, we analyze the relative fitness effects of transition and transversion mutations from our recently published genome-wide study of mutational fitness effects in influenza virus. In contrast to prior work, we find that transversions are significantly more detrimental than transitions. Using what we believe to be an improved statistical framework, we also identify a similar trend in two HIV data sets. We further demonstrate a fitness difference in transition and transversion mutations using four deep mutational scanning data sets of influenza virus and HIV, which provided adequate statistical power. We find that three of the most commonly cited radical/conservative amino acid categories are predictive of fitness, supporting their utility in studies of positive selection and codon usage bias. We conclude that selection is a major contributor to the transition:transversion substitution bias in viruses and that this effect is only partially explained by the greater likelihood of transversion mutations to cause radical as opposed to conservative amino acid changes.
Affiliation(s)
- Daniel M Lyons
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Adam S Lauring
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI
  - Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI
10
Somatic Tumor Mutations Detected by Targeted Next Generation Sequencing in Minute Amounts of Serum-Derived Cell-Free DNA. Sci Rep 2017; 7:2136. [PMID: 28522829 PMCID: PMC5437051 DOI: 10.1038/s41598-017-02388-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 11/04/2016] [Accepted: 04/18/2017] [Indexed: 01/15/2023] Open
Abstract
The use of blood-circulating cell-free DNA (cfDNA) as 'liquid-biopsy' is explored worldwide, with hopes for its potential in providing prognostic or predictive information in cancer treatment. In exploring cfDNA, valuable repositories are biobanks containing material collected over time, however these retrospective cohorts have restrictive resources. In this study, we aimed to detect tumor-specific mutations in only minute amounts of serum-derived cfDNA by using a targeted next generation sequencing (NGS) approach. In a retrospective cohort of ten metastatic breast cancer patients, we profiled DNA from primary tumor tissue (frozen), tumor-adjacent normal tissue (formalin-fixed paraffin embedded), and three consecutive serum samples (frozen). Our presented workflow includes comparisons with matched normal DNA or in silico reference DNA to discriminate germline from somatic variants, validation of variants through the detection in at least two DNA samples of an individual, and the use of public databases on variants. By our workflow, we were able to detect a total of four variants traceable as circulating tumor DNA (ctDNA) in the sera of three of the ten patients.
11
Shimada MK, Sanbonmatsu R, Yamaguchi-Kabata Y, Yamasaki C, Suzuki Y, Chakraborty R, Gojobori T, Imanishi T. Selection pressure on human STR loci and its relevance in repeat expansion disease. Mol Genet Genomics 2016; 291:1851-69. [PMID: 27290643 DOI: 10.1007/s00438-016-1219-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/07/2015] [Accepted: 05/21/2016] [Indexed: 12/30/2022]
Abstract
Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: Why are STRs causing repeat expansion diseases maintained in the human population; and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, significant enrichments of apoptosis and neurodevelopment were biological processes found specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
Affiliation(s)
- Makoto K Shimada
  - Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan
  - National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
  - Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Ryoko Sanbonmatsu
  - Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yumi Yamaguchi-Kabata
  - National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573, Japan
- Chisato Yamasaki
  - National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
  - Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yoshiyuki Suzuki
  - Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya, Aichi, 467-8501, Japan
- Ranajit Chakraborty
  - Health Science Center, University of North Texas, 3500 Camp Bowie Blvd., Fort Worth, TX, 76107, USA
- Takashi Gojobori
  - National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Ibn Al-Haytham Building (West), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tadashi Imanishi
  - National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
  - Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
12
Abstract
A pattern in which nucleotide transitions are favored several fold over transversions is common in molecular evolution. When this pattern occurs among amino acid replacements, explanations often invoke an effect of selection, on the grounds that transitions are more conservative in their effects on proteins. However, the underlying hypothesis of conservative transitions has never been tested directly. Here we assess support for this hypothesis using direct evidence: the fitness effects of mutations in actual proteins measured via individual or paired growth experiments. We assembled data from 8 published studies, ranging in size from 24 to 757 single-nucleotide mutations that change an amino acid. Every study has the statistical power to reveal significant effects of amino acid exchangeability, and most studies have the power to discern a binary conservative-vs-radical distinction. However, only one study suggests that transitions are significantly more conservative than transversions. In the combined set of 1,239 replacements (544 transitions, 695 transversions), the chance that a transition is more conservative than a transversion is 53 % (95 % confidence interval 50 to 56) compared with the null expectation of 50 %. We show that this effect is not large compared with that of most biochemical factors, and is not large enough to explain the several-fold bias observed in evolution. In short, the available data have the power to verify the “conservative transitions” hypothesis if true, but suggest instead that selection on proteins plays at best a minor role in the observed bias.
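The transition/transversion distinction that this abstract relies on can be sketched in a few lines. The snippet below classifies a single-nucleotide change by whether it stays within the purines (A, G) or the pyrimidines (C, T) and counts both kinds between two aligned sequences. The function names and the toy sequences are assumptions made for illustration; this is not the analysis pipeline of any study listed here.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine) for one nucleotide change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError(f"unsupported base in {ref!r}>{alt!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences.

    Positions that are identical or contain anything other than A/C/G/T
    (gaps, ambiguity codes) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in BASES or b not in BASES:
            continue
        if classify_substitution(a, b) == "transition":
            ts += 1
        else:
            tv += 1
    return ts, tv


if __name__ == "__main__":
    # Toy aligned pair: A>G and C>T are transitions, T>A is a transversion.
    print(count_ts_tv("ACGTACGT", "GCGTATGA"))
```

For scale: of the 12 possible single-nucleotide changes, 4 are transitions and 8 are transversions, so unbiased mutation would give a ratio of 0.5. The combined set quoted in the abstract (544 transitions, 695 transversions) corresponds to 544/695, about 0.78.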
Affiliation(s)
- Arlin Stoltzfus
- Institute for Bioscience and Biotechnology Research, Rockville, MD Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD
| | - Ryan W Norris
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University
|
13
|
Murata C, Kuroki Y, Imoto I, Tsukahara M, Ikejiri N, Kuroiwa A. Initiation of recombination suppression and PAR formation during the early stages of neo-sex chromosome differentiation in the Okinawa spiny rat, Tokudaia muenninki. BMC Evol Biol 2015; 15:234. [PMID: 26514418 PMCID: PMC4625939 DOI: 10.1186/s12862-015-0514-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2015] [Accepted: 10/20/2015] [Indexed: 11/17/2022] Open
Abstract
Background: Sex chromosomes of extant eutherian species are too ancient to reveal the process that initiated sex-chromosome differentiation. By contrast, the neo-sex chromosomes generated by sex-autosome fusions of recent origin in Tokudaia muenninki are expected to be evolutionarily ‘young’, and therefore provide a good model in which to elucidate the early phases of eutherian sex chromosome evolution. Here we describe the genomic evolution of T. muenninki in neo-sex chromosome differentiation. Results: FISH mapping of a T. muenninki male, using 50 BAC clones as probes, revealed no chromosomal rearrangements between the neo-sex chromosomes. Substitution-direction analysis disclosed that sequence evolution toward GC-richness, which positively correlates with recombination activity, occurred in the peritelomeric regions, but not in the middle regions, of the neo-sex chromosomes. In contrast, sequence evolution toward AT-richness was observed in the pericentromeric regions. Furthermore, we showed genetic differentiation between the pericentromeric regions as well as an accelerated rate of evolution in the neo-Y region through the detection of male-specific substitutions by gene sequencing in multiple males and females, and by sequencing of BACs derived from each neo-sex chromosome. Conclusions: Our results suggest that recombination has been suppressed in the pericentromeric region of neo-sex chromosomes without chromosome rearrangement, whereas high levels of recombination activity are limited to the peritelomeric region of almost undifferentiated neo-sex chromosomes. We conclude that the PAR might have been formed in the peritelomeric region of sex chromosomes as an event independent of the spread of recombination suppression during the early stages of sex chromosome differentiation.
Affiliation(s)
- Chie Murata
- Department of Human Genetics, Institute of Health Biosciences, Tokushima University Graduate School, 3-18-15, Kuramoto-cho, Tokushima, Japan.
| | - Yoko Kuroki
- RIKEN, Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, Japan. .,Present address: Division of Pediatric Disease Genomics, Department of Genome Medicine, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya-ku, Tokyo, Japan.
| | - Issei Imoto
- Department of Human Genetics, Institute of Health Biosciences, Tokushima University Graduate School, 3-18-15, Kuramoto-cho, Tokushima, Japan.
| | - Masaru Tsukahara
- Student Laboratory, Faculty of Medicine, Tokushima University, 3-18-15, Kuramoto-cho, Tokushima, Japan.
| | - Naoto Ikejiri
- Student Laboratory, Faculty of Medicine, Tokushima University, 3-18-15, Kuramoto-cho, Tokushima, Japan.
| | - Asato Kuroiwa
- Laboratory of Animal Cytogenetics, Faculty of Science, Hokkaido University, Kita 10 Nishi 8, Kita-ku, Sapporo, Hokkaido, Japan.
|
14
|
Pereira FJC, Teixeira A, Kong J, Barbosa C, Silva AL, Marques-Ramos A, Liebhaber SA, Romão L. Resistance of mRNAs with AUG-proximal nonsense mutations to nonsense-mediated decay reflects variables of mRNA structure and translational activity. Nucleic Acids Res 2015; 43:6528-44. [PMID: 26068473 PMCID: PMC4513866 DOI: 10.1093/nar/gkv588] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2015] [Accepted: 05/23/2015] [Indexed: 11/25/2022] Open
Abstract
Nonsense-mediated mRNA decay (NMD) is a surveillance pathway that recognizes and selectively degrades mRNAs carrying premature termination codons (PTCs). The level of sensitivity of a PTC-containing mRNA to NMD is multifactorial. We have previously shown that human β-globin mRNAs carrying PTCs in close proximity to the translation initiation AUG codon escape NMD. This was called the ‘AUG-proximity effect’. The present analysis of nonsense codons in the human α-globin mRNA illustrates that the determinants of the AUG-proximity effect are in fact quite complex, reflecting the ability of the ribosome to re-initiate translation 3′ to the PTC and the specific sequence and secondary structure of the translated ORF. These data support a model in which the time taken to translate the short ORF, impacted by distance, sequence, and structure, not only modulates translation re-initiation, but also impacts on the exact boundary of AUG-proximity protection from NMD.
Affiliation(s)
- Francisco J C Pereira
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal
| | - Alexandre Teixeira
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal Centro de Investigação em Genética Molecular Humana, Faculdade de Ciências Médicas, Universidade Nova de Lisboa, 1349-008 Lisboa, Portugal
| | - Jian Kong
- Departments of Genetics and Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Cristina Barbosa
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
| | - Ana Luísa Silva
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal
| | - Ana Marques-Ramos
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal
| | - Stephen A Liebhaber
- Departments of Genetics and Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Luísa Romão
- Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge, 1649-016 Lisboa, Portugal BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
|
15
|
Xia J, Han L, Zhao Z. Investigating the relationship of DNA methylation with mutation rate and allele frequency in the human genome. BMC Genomics 2012; 13 Suppl 8:S7. [PMID: 23281708 PMCID: PMC3535710 DOI: 10.1186/1471-2164-13-s8-s7] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023] Open
Abstract
Background: DNA methylation, which mainly occurs at CpG dinucleotides, is a dynamic epigenetic regulation mechanism in most eukaryotic genomes. It is already known that methylated CpG dinucleotides can lead to a high rate of C to T mutation at these sites. However, less is known about whether and how the methylation level causes a different mutation rate, especially at single-base resolution. Results: In this study, we used genome-wide single-base resolution methylation data to perform a comprehensive analysis of the mutation rate of methylated cytosines from human embryonic stem cells. Through the analysis of the density of single nucleotide polymorphisms, we first confirmed that the mutation rate in methylated CpG sites is greater than that in unmethylated CpG sites. Then, we showed that among methylated CpG sites, the mutation rate is markedly increased in low-intermediately (20-40% methylation level) to intermediately methylated CpG sites (40-60% methylation level) of the human genome. This mutation pattern was observed regardless of DNA strand direction and the sequence coverage over the site on which the methylation level was calculated. Moreover, this highly non-random mutation pattern was more apparent in intergenic and intronic regions than in promoter regions and CpG islands. Our investigation suggested that this pattern appears primarily in autosomes rather than sex chromosomes. Further analysis based on human-chimpanzee divergence confirmed these observations. Finally, we observed a significant correlation between the methylation level and cytosine allele frequency. Conclusions: Our results showed a high mutation rate in low-intermediately to intermediately methylated CpG sites at different scales, from the categorized genomic region and the whole chromosome to the whole genome level, thereby providing the first supporting evidence of mutation rate variation at human methylated CpG sites using genome-wide single-base resolution methylation data.
Affiliation(s)
- Junfeng Xia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|
16
|
Analysis of compensatory substitution and gene evolution on the MAGEA/CSAG-palindrome of the primate X chromosomes. Comput Biol Chem 2012; 42:18-22. [PMID: 23257410 DOI: 10.1016/j.compbiolchem.2012.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2012] [Revised: 11/06/2012] [Accepted: 11/13/2012] [Indexed: 11/20/2022]
Abstract
The human X chromosome contains a large number of inverted repeat DNA palindromes. Although arbitrary substitutions destroyed the inverted repeat structure of the MAGEA/CSAG-palindrome during the evolutionary process of the primates, most of the substitutions are compensatory. Using maximum parsimony, it is demonstrated that the compensatory substitutions are prone to occur between bases with similar structures on the human, chimpanzee and orangutan MAGEA/CSAG-palindromes. Furthermore, it is found by homology searching that MAGEA/CSAG genes also exist in orangutan and rhesus monkey palindromes. This suggests that the MAGEA/CSAG-palindrome might predate the divergence of human and other primate lineages. Comparative sequence analysis of the arms and genes on the primate MAGEA/CSAG-palindromes provides possible evidence of subsequent arm-to-arm gene conversion. These compensatory substitutions on the MAGEA/CSAG-palindrome of the primate X chromosomes play an important role in maintaining their structural symmetry during the process of formation.
|
17
|
Freudenberg J, Gregersen PK, Freudenberg-Hua Y. A simple method for analyzing exome sequencing data shows distinct levels of nonsynonymous variation for human immune and nervous system genes. PLoS One 2012; 7:e38087. [PMID: 22701602 PMCID: PMC3368947 DOI: 10.1371/journal.pone.0038087] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2012] [Accepted: 05/03/2012] [Indexed: 11/29/2022] Open
Abstract
To measure the strength of natural selection that acts upon single nucleotide variants (SNVs) in a set of human genes, we calculate the ratio between nonsynonymous SNVs (nsSNVs) per nonsynonymous site and synonymous SNVs (sSNVs) per synonymous site. We transform this ratio with a respective factor f that corrects for the bias of synonymous sites towards transitions in the genetic code and different mutation rates for transitions and transversions. This method approximates the relative density of nsSNVs (rdnsv) in comparison with the neutral expectation as inferred from the density of sSNVs. Using SNVs from a diploid genome and 200 exomes, we apply our method to immune system genes (ISGs), nervous system genes (NSGs), randomly sampled genes (RSGs), and gene ontology annotated genes. The estimate of rdnsv in an individual exome is around 20% for NSGs and 30-40% for ISGs and RSGs. This smaller rdnsv of NSGs indicates overall stronger purifying selection. To quantify the relative shift of nsSNVs towards rare variants, we next fit a linear regression model to the estimates of rdnsv over different SNV allele frequency bins. The obtained regression models show a negative slope for NSGs, ISGs and RSGs, supporting an influence of purifying selection on the frequency spectrum of segregating nsSNVs. The y-intercept of the model predicts rdnsv for an allele frequency close to 0. This parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0 in the population, given the performed normalization of the observed nsSNV to sSNV ratio. A smaller y-intercept is displayed by NSGs, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de-novo mutation diseases that affect the nervous system.
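The density ratio described in this abstract can be sketched as follows. The sketch computes nsSNVs per nonsynonymous site divided by sSNVs per synonymous site, as the abstract states. How the correction factor f is derived and applied is not specified in the abstract, so the multiplicative use of `f` here, its default of 1.0 (no correction), and the function name `relative_nonsyn_density` are all assumptions made purely for illustration.

```python
# Illustrative sketch only. The function name is hypothetical, and the
# multiplicative use of the correction factor `f` is an assumption: the
# abstract says the ratio is "transformed" with f but does not give the formula.
def relative_nonsyn_density(n_nonsyn, nonsyn_sites, n_syn, syn_sites, f=1.0):
    """Ratio of nsSNVs per nonsynonymous site to sSNVs per synonymous site.

    n_nonsyn, n_syn         -- observed counts of nsSNVs and sSNVs
    nonsyn_sites, syn_sites -- numbers of nonsynonymous and synonymous sites
    f                       -- correction factor (assumed multiplicative; 1.0 = none)
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if n_syn <= 0:
        raise ValueError("need at least one synonymous SNV for the neutral baseline")
    nonsyn_density = n_nonsyn / nonsyn_sites  # nsSNVs per nonsynonymous site
    syn_density = n_syn / syn_sites           # sSNVs per synonymous site
    return f * nonsyn_density / syn_density
```

With made-up inputs of 30 nsSNVs over 300 nonsynonymous sites and 20 sSNVs over 100 synonymous sites, the uncorrected ratio is 0.1 / 0.2; values below 1 correspond to fewer nonsynonymous variants than the synonymous baseline would predict.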
Affiliation(s)
- Jan Freudenberg
- Robert S. Boas Center for Human Genetics and Genomics, The Feinstein Institute for Medical Research, Northshore LIJ Healthsystem, Manhasset, New York, United States of America.
|
18
|
Xia J, Wang Q, Jia P, Wang B, Pao W, Zhao Z. NGS catalog: A database of next generation sequencing studies in humans. Hum Mutat 2012; 33:E2341-55. [PMID: 22517761 DOI: 10.1002/humu.22096] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2011] [Accepted: 03/09/2011] [Indexed: 11/10/2022]
Abstract
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since their advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits.
Affiliation(s)
- Junfeng Xia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
|
19
|
Suzuki Y. Overestimation of nonsynonymous/synonymous rate ratio by reverse-translation of aligned amino acid sequences. Genes Genet Syst 2011; 86:123-9. [PMID: 21670552 DOI: 10.1266/ggs.86.123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
In the analysis of protein-coding nucleotide sequences, the ratio of the number of nonsynonymous substitutions to that of synonymous substitutions (d(N)/d(S)) is used as an indicator for the direction and magnitude of natural selection operating at the amino acid sequence level. The d(S) and d(N) values are estimated based on the comparison of homologous codons, which are often identified by converting (reverse-translating) aligned amino acid sequences into codon sequences. In this method, however, homologous codons may be mis-identified when frame-shifts occurred or amino acid sequences were mis-aligned, which may lead to overestimation of the d(N)/d(S) ratio. Here the effect of reverse-translating aligned amino acid sequences on the estimation of d(N)/d(S) ratio was examined through a large-scale analysis of protein-coding nucleotide sequences from vertebrate species. Apparently, 1-9% of codon sites that were identified as homologous with reverse-translation contained non-homologous codons, where the d(N)/d(S) ratio was unduly high. By correcting the d(N)/d(S) ratio for these codon sites, it was inferred that the ratio was 5-43% overestimated with reverse-translation. These results suggest that caution should be exerted in the study of natural selection using the d(N)/d(S) ratio by reverse-translating aligned amino acid sequences.
Affiliation(s)
- Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Nagoya-shi, Aichi-ken 467-8501, Japan.
|
20
|
Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 2011; 32:1075-99. [PMID: 21853507 PMCID: PMC3177966 DOI: 10.1002/humu.21557] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2011] [Accepted: 06/17/2011] [Indexed: 12/21/2022]
Abstract
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher order features of the genomic architecture. The human genome is now recognized to contain "pervasive architectural flaws" in that certain DNA sequences are inherently mutation prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here, we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of noncanonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair and may serve to increase mutation frequencies in generalized fashion (i.e., both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom.
|
21
|
Zaina S, Pérez-Luque EL, Lund G. Genetics talks to epigenetics? The interplay between sequence variants and chromatin structure. Curr Genomics 2011; 11:359-67. [PMID: 21286314 PMCID: PMC2945002 DOI: 10.2174/138920210791616662] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2010] [Revised: 06/12/2010] [Accepted: 06/15/2010] [Indexed: 12/29/2022] Open
Abstract
Transcription is regulated by two major mechanisms. On the one hand, changes in DNA sequence are responsible for genetic gene regulation. On the other hand, chromatin structure regulates gene activity at the epigenetic level. Given the fundamental participation of these mechanisms in transcriptional regulation of virtually any gene, they are likely to co-regulate a significant proportion of the genome. The simple concept behind this idea is that a mutation may have a significant impact on local chromatin structure by modifying DNA methylation patterns or histone type recruitment. Yet, the relevance of these interactions is poorly understood. Elucidating how genetic and epigenetic mechanisms co-participate in regulating transcription may assist in some of the unresolved cases of genetic variant-phenotype association. One example is loci that have biologically predictable functions but genotypes that fail to correlate with phenotype, particularly disease outcome. Conversely, a crosstalk between genetics and epigenetics may provide a mechanistic explanation for cases in which a convincing association between phenotype and a genetic variant has been established, but the latter does not lie in a promoter or protein coding sequence. Here, we review recently published data in the field and discuss their implications for genetic variant-phenotype association studies.
Affiliation(s)
- Silvio Zaina
- Department of Medical Research, Division of Health Sciences, Leon Campus, University of Guanajuato, Leon, Mexico
|
22
|
Li M, Chen SS. The tendency to recreate ancestral CG dinucleotides in the human genome. BMC Evol Biol 2011; 11:3. [PMID: 21208429 PMCID: PMC3025853 DOI: 10.1186/1471-2148-11-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2010] [Accepted: 01/05/2011] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: The CG dinucleotides are known to be deficient in the human genome, due to a high mutation rate from 5-methylated CG to TG and its complementary pair CA. Meanwhile, many cellular functions rely on these CG dinucleotides, such as gene expression controlled by cytosine methylation status. Thus, CG dinucleotides that provide essential functional substrates should be retained in genomes. How these two conflicting processes regarding the fate of CG dinucleotides (i.e., a high mutation rate destroying CG dinucleotides vs. functional processes that require their preservation) are reconciled remains an unsolved question. RESULTS: By analyzing the mutation and frequency spectrum of newly derived alleles in the human genome, a tendency towards generating more CGs was observed, which was mainly contributed by an excess number of mutations from CA/TG to CG. Simultaneously, we found a fixation preference for CGs derived from TG/CA rather than CGs generated by other dinucleotides. These tendencies were observed both in intergenic and genic regions. An analysis of Integrated Extended Haplotype Homozygosity provided no evidence of selection for newly derived CGs. CONCLUSIONS: Ancestral CG dinucleotides that were subsequently lost by mutation tend to be recreated in the human genome, as indicated by a biased mutation and fixation pattern favoring new CGs that derived from TG/CA.
Affiliation(s)
- Mingkun Li
- CAS-MPG Partner Institute of Computational Biology, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences, 200000 Shanghai, PR China.
|
23
|
Meyer K, Ueland PM. Use of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry for multiplex genotyping. Adv Clin Chem 2011; 53:1-29. [PMID: 21404912 DOI: 10.1016/b978-0-12-385855-9.00001-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
After completion of the human genome project, the focus of geneticists has shifted to elucidation of gene function and genetic diversity to understand the mechanisms of complex diseases or variation of patient response in drug treatment. In the past decade, many different genotyping techniques have been described for the detection of single-nucleotide polymorphisms (SNPs) and other common polymorphic variants. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is among the most powerful and widely used genotyping technologies. The method offers great flexibility in assay design and enables highly accurate genotyping at high sample throughput. Different strategies for allele discrimination and quantification have been combined with MALDI (hybridization, ligation, cleavage, and primer extension). Approaches based on primer extension have become the most popular applications. This combination enables rapid and reliable multiplexing of SNPs and other common variants, and makes MALDI-TOF-MS well suited for large-scale studies in fine-mapping and verification of genome-wide scans. In contrast to standard genotyping, more demanding approaches have enabled genotyping of DNA pools, molecular haplotyping or the detection of free circulating DNA for prenatal or cancer diagnostics. In addition, MALDI can also be used in novel applications such as DNA methylation analysis, expression profiling, and resequencing. This review gives an introduction to multiplex genotyping by MALDI-MS and will focus on the latest developments of this technology.
|
24
|
Nakken S, Rødland EA, Hovig E. Impact of DNA physical properties on local sequence bias of human mutation. Hum Mutat 2010; 31:1316-25. [PMID: 20886615 DOI: 10.1002/humu.21371] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2010] [Accepted: 08/31/2010] [Indexed: 01/07/2023]
Abstract
In selectively neutral regions of the human genome, nucleotide substitutions do not occur at random with respect to the local DNA sequence neighborhood. However, apart from the hypermutability of methylated CpG dinucleotides, which can explain the overrepresentation of nucleotide transitions in this context, the sequence-specific factors underlying point mutation bias remain largely to be determined, both in nature and in quantitative impact. One hypothesis suggests that the physical characteristics of a DNA context could have a modulating effect on its mutability, adjusting the impact of damage or the efficiency of repair. Here, we report a genome-wide computational test of this hypothesis, in which we utilize a constrained set of human non-CpG SNPs as the source of selectively neutral germline mutations. Interestingly, we observe that the quantitative context-dependencies of some substitution types display significant associations to measures of local structural topography and helix stability in DNA. Most prominently, we find that the local sequence bias of transition mutations is significantly associated with the sequence-dependent level of helix instability imposed by the potentially underlying DNA mismatches. The results of our work indicate the extent to which DNA physical properties could have shaped the recent point mutational spectrum in the human genome.
Affiliation(s)
- Sigve Nakken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Norwegian Radium Hospital, Norway
|
25
|
Analysis of BCR-ABL1 tyrosine kinase domain mutational spectra in primitive chronic myeloid leukemia cells suggests a unique mutator phenotype. Leukemia 2010; 24:1817-21. [PMID: 20739956 DOI: 10.1038/leu.2010.179] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
26
|
Features of recent codon evolution: a comparative polymorphism-fixation study. J Biomed Biotechnol 2010; 2010:202918. [PMID: 20622912 PMCID: PMC2896653 DOI: 10.1155/2010/202918] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2010] [Accepted: 03/31/2010] [Indexed: 11/17/2022] Open
Abstract
Features of amino-acid and codon changes can provide us important insights on protein evolution. So far, investigators have often examined mutation patterns at either interspecies fixed substitution or intraspecies nucleotide polymorphism level, but not both. Here, we performed a unique analysis of a combined set of intra-species polymorphisms and inter-species substitutions in human codons. Strong difference in mutational pattern was found at codon positions 1, 2, and 3 between the polymorphism and fixation data. Fixation had strong bias towards increasing the rarest codons but decreasing the most frequently used codons, suggesting that codon equilibrium has not been reached yet. We detected strong CpG effect on CG-containing codons and subsequent suppression by fixation. Finally, we detected the signature of purifying selection against Amid R:U dinucleotides at synonymous dicodon boundaries. Overall, fixation process could effectively and quickly correct the volatile changes introduced by polymorphisms so that codon changes could be gradual and directional and that codon composition could be kept relatively stable during evolution.
|
27
|
Han L, Zhao Z. Contrast features of CpG islands in the promoter and other regions in the dog genome. Genomics 2009; 94:117-24. [PMID: 19409480 PMCID: PMC2729786 DOI: 10.1016/j.ygeno.2009.04.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2008] [Revised: 04/21/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]
Abstract
The recent release of the domestic dog genome provides us with an ideal opportunity to investigate dog-specific genomic features. In this study, we performed a systematic analysis of CpG islands (CGIs), which are often considered gene markers, in the dog genome. Relative to the human and mouse genomes, the dog genome has a remarkably large number of CGIs and high CGI density, which is contributed by its noncoding sequences. Surprisingly, the dog genome has fewer CGIs associated with the promoter regions of genes than the human or the mouse. Further examination of functional features of dog-human-mouse homologous genes suggests that the dog might have undergone a faster erosion rate of promoter-associated CGIs than the human or mouse. Some genetic or genomic factors such as local recombination rate and karyotype may be related to the unique dog CGI features.
Affiliation(s)
- Leng Han
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Graduate School, Chinese Academy of Sciences, Beijing 100039, China
| | - Zhongming Zhao
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA
- Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA
|
28
|
Qi YJ, Qiu WY. Symmetry Analysis of an X-palindrome in Human and Chimpanzee. Chin J Chem Phys 2009. [DOI: 10.1088/1674-0068/22/04/401-405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
29
|
Suzuki Y, Gojobori T, Kumar S. Methods for incorporating the hypermutability of CpG dinucleotides in detecting natural selection operating at the amino acid sequence level. Mol Biol Evol 2009; 26:2275-84. [PMID: 19581348 DOI: 10.1093/molbev/msp133] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 02/03/2023]
Abstract
In detecting natural selection operating at the amino acid sequence level by comparing the rates of synonymous (r(S)) and nonsynonymous (r(N)) substitutions, the rates of synonymous and nonsynonymous mutations are assumed to be approximately the same. In reality, however, these rates may not be the same if different proportions of synonymous and nonsynonymous sites overlap with CpG dinucleotides, which are known to be hypermutable in some organisms. Here, we develop the evolutionary pathway methods for comparing r(S) and r(N) at multiple codon sites (all-sites analysis) and at single codon sites (single-site analysis) that take into account the hypermutability at CpG dinucleotides in estimating the number of synonymous substitutions per synonymous site (d(S)) and nonsynonymous substitutions per nonsynonymous site (d(N)). Computer simulations show that the direction and magnitude of the bias in the estimation of d(N)/d(S) caused by the hypermutability of CpGs are determined by both the number of CpGs and the relative proportions of synonymous and nonsynonymous sites overlapping with CpGs. This bias is greatly reduced when using the methods we propose to account for the hypermutability of CpG dinucleotides. In an all-sites analysis of protamine 1 genes from primates, d(N)/d(S) > 1 was observed for many pairs if the hypermutability was ignored. However, d(N)/d(S) becomes ≤ 1 for most of these pairs when the CpG sites are assumed to be hypermutable. Therefore, statistical indications of positive selection in some sequences or individual codons may be caused by mutation rate differences in synonymous and nonsynonymous sites.
Affiliation(s)
- Yoshiyuki Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Mishima, Shizuoka, Japan.
|
30
|
Han L, Zhao Z. CpG islands or CpG clusters: how to identify functional GC-rich regions in a genome? BMC Bioinformatics 2009; 10:65. [PMID: 19232104 PMCID: PMC2652441 DOI: 10.1186/1471-2105-10-65] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 08/29/2008] [Accepted: 02/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located in the 5' end of genes and considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which uses a completely different mathematical approach from previous traditional algorithms. Their evaluation suggests that CpGcluster provides a much more efficient approach to detecting functional clusters or islands of CpGs. RESULTS We systematically compared CpGcluster with the traditional algorithm by Takai and Jones (2002). Our comparisons of (1) the number of islands versus the number of genes in a genome, (2) the distribution of islands in different genomic regions, (3) island length, (4) the distance between two neighboring islands, and (5) methylation status suggest that Takai and Jones' algorithm is overall more appropriate for identifying promoter-associated islands of CpGs in vertebrate genomes. CONCLUSION The generation of genome sequence and DNA methylation data is expected to accelerate greatly. The information in this study is important for its extensive utility in gene feature analysis and epigenomics including gene prediction and methylation chip design in different genomes.
Affiliation(s)
- Leng Han
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA.
|
31
|
Nakken S, Rødland EA, Rognes T, Hovig E. Large-scale inference of the point mutational spectrum in human segmental duplications. BMC Genomics 2009; 10:43. [PMID: 19161616 PMCID: PMC2640414 DOI: 10.1186/1471-2164-10-43] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/02/2008] [Accepted: 01/22/2009] [Indexed: 01/22/2023]
Abstract
BACKGROUND Recent segmental duplications are relatively large (≥ 1 kb) genomic regions of high sequence identity (≥ 90%). They cover approximately 4-5% of the human genome and play important roles in gene evolution and genomic disease. The DNA sequence differences between copies of a segmental duplication represent the result of various mutational events over time, since any two duplication copies originated from the same ancestral DNA sequence. Based on this fact, we have developed a computational scheme for inference of point mutational events in human segmental duplications, which we collectively term duplication-inferred mutations (DIMs). We have characterized these nucleotide substitutions by comparing them with high-quality SNPs from dbSNP, both in terms of sequence context and frequency of substitution types. RESULTS Overall, DIMs show a lower ratio of transitions relative to transversions than SNPs, although this ratio approaches that of SNPs when considering DIMs within the most recent duplications. Our findings indicate that DIMs and SNPs in general are caused by similar mutational mechanisms, with some deviations at the CpG dinucleotide. Furthermore, we discover a large number of reference SNPs that coincide with computationally inferred DIMs. The latter reflects how sequence variation in duplicated sequences can be misinterpreted as ordinary allelic variation. CONCLUSION In summary, we show how DNA sequence analysis of segmental duplications can provide a genome-wide mutational spectrum that mirrors recent genome evolution. The inferred set of nucleotide substitutions represents a valuable complement to SNPs for the analysis of genetic variation and point mutagenesis.
Affiliation(s)
- Sigve Nakken
- Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316 Oslo, Norway.
|
32
|
Borštnik B, Oblak B, Pumpernik D. The Evolutionary Constraints in Mutational Replacements. Evol Biol 2009. [DOI: 10.1007/978-3-642-00952-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
33
|
Masumura K, Nohmi T. Spontaneous Mutagenesis in Rodents: Spontaneous Gene Mutations Identified by Neutral Reporter Genes in gpt Delta Transgenic Mice and Rats. J Health Sci 2009. [DOI: 10.1248/jhs.55.40] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Affiliation(s)
- Kenichi Masumura
- Division of Genetics and Mutagenesis, National Institute of Health Sciences
- Takehiko Nohmi
- Division of Genetics and Mutagenesis, National Institute of Health Sciences
|
34
|
Abstract
The dystrobrevin-binding protein 1 (DTNBP1) gene has been one of the most studied and promising schizophrenia susceptibility genes since it was first reported to be associated with schizophrenia in the Irish Study of High Density Schizophrenia Families (ISHDSF). Although many studies have been performed both at the functional level and in association with psychiatric disorders, there has been no systematic review of the features of the DTNBP1 gene, protein or the relationship between function and phenotype. Using a bioinformatics approach, we identified the DTNBP1 gene in 13 vertebrate species. The comparison of these genes revealed a conserved gene structure, protein-coding sequence and dysbindin domain, but a diverse noncoding sequence. The molecular evolutionary analysis suggests the DTNBP1 gene probably originated in chordates and matured in vertebrates. No signature of recent positive selection was seen in any primate lineage. The DTNBP1 gene likely has many more alternative transcripts than the current three major isoforms annotated in the NCBI database. Our examination of risk haplotypes revealed that, although the frequency of a single nucleotide polymorphism (SNP) or haplotype might be significantly different in cases from controls, difference between major geographic populations was even larger. Finally, we constructed the first DTNBP1 interactome and explored its network features. Besides the biogenesis of lysosome-related organelles complex 1 and dystrophin-associated protein complex, several molecules in the DTNBP1 network likely provide insight into the role of DTNBP1 in biological systems: retinoic acid, beta-estradiol, calmodulin and tumour necrosis factor. Studies of these subnetworks and pathways may provide opportunities to deepen our understanding of the mechanisms of action of DTNBP1 variants.
|
35
|
Suzuki Y. False-positive results obtained from the branch-site test of positive selection. Genes Genet Syst 2008; 83:331-8. [PMID: 18931458 DOI: 10.1266/ggs.83.331] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/23/2022]
Abstract
Natural selection operating at the amino acid sequence level can be detected by comparing the rates of synonymous (r(S)) and nonsynonymous (r(N)) nucleotide substitutions, where r(N)/r(S) (omega) > 1 and omega < 1 suggest positive and negative selection, respectively. The branch-site test has been developed for detecting positive selection operating at a group of amino acid sites for a pre-specified (foreground) branch of a phylogenetic tree by taking into account the heterogeneity of omega among sites and branches. Here the performance of the branch-site test was examined by computer simulation, with special reference to the false-positive rate when the divergence of the sequences analyzed was small. The false-positive rate was found to inflate when the assumptions made on the omega values for the foreground and other (background) branches in the branch-site test were violated. In addition, under a similar condition, false-positive results were often obtained even when Bonferroni correction was conducted and the false-discovery rate was controlled in a large-scale analysis. False-positive results were also obtained even when the number of nonsynonymous substitutions for the foreground branch was smaller than the minimum value required for detecting positive selection. The existence of a codon site with a possibility of occurrence of multiple nonsynonymous substitutions for the foreground branch often caused the branch-site test to falsely identify positive selection. In the re-analysis of orthologous trios of protein-coding genes from humans, chimpanzees, and macaques, most of the genes previously identified to be positively selected for the human or chimpanzee branch by the branch-site test contained such a codon site, suggesting a possibility that a significant fraction of these genes are false-positives.
Affiliation(s)
- Yoshiyuki Suzuki
- Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, PA, USA.
|
36
|
Meyer K, Fredriksen A, Ueland PM. MALDI-TOF MS genotyping of polymorphisms related to 1-carbon metabolism using common and mass-modified terminators. Clin Chem 2008; 55:139-49. [PMID: 18988749 DOI: 10.1373/clinchem.2008.115378] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
BACKGROUND Large cohort studies may provide sufficient power to disentangle the role of polymorphisms related to 1-carbon metabolism and chronic diseases, but they require fast, accurate, high-throughput genotyping techniques. MALDI-TOF mass spectrometry has been adapted to rapid fine mapping using various approaches for allele discrimination. We developed a genotyping method based on MALDI-TOF MS and compared assay performance for formats based on standard and mass-modified terminators. METHODS The assay includes 20 polymorphisms of 14 genes involved in 1-carbon metabolism (BHMT 742G>A, CBS 844ins68 and 699C>T, CTH 1364G>T, DHFR del19, NOS3 -786T>C and 894G>T, FOLR1 1314G>A, MTHFD1 -105T>C and 1958G>A, MTHFR 677C>T and 1298A>C, MTR 2756A>G, MTRR 66A>G and 524C>T, SLC19A1 80G>A, SHMT1 1420C>T, TCN2 67A>G and 776C>G, and TYMS 1494del6). RESULTS Missing calls were observed for 4.7% of the DNA samples, attributed to failed liquid sample handling. Highly accurate genotyping was obtained by mass-modified as well as standard ddNTPs, with an average error rate of ≤ 0.1% by analysis of sample duplicates. A semiquantitative approach enabled unambiguous identification of the CBS 844ins68. Cluster plots of the relative allele intensities showed allele-specific bias according to type of minisequencing terminator and revealed a potential structural variation in the BHMT gene. CONCLUSIONS MALDI-TOF MS-based genotyping using either standard or mass-modified terminators allows the accurate determination of single nucleotides as well as structural genetic variants. This was demonstrated with 20 polymorphisms involved in 1-carbon metabolism.
Affiliation(s)
- Klaus Meyer
- Bevital A/S, Armauer Hansens Hus, University of Bergen, Bergen, Norway.
|
37
|
Simmen MW. Genome-scale relationships between cytosine methylation and dinucleotide abundances in animals. Genomics 2008; 92:33-40. [PMID: 18485662 DOI: 10.1016/j.ygeno.2008.03.009] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 03/06/2008] [Accepted: 03/26/2008] [Indexed: 01/11/2023]
Abstract
In mammalian genomes CpGs occur at one-fifth their expected frequency. This is accepted as resulting from cytosine methylation and deamination of 5-methylcytosine leading to TpG and CpA dinucleotides. The corollary that a CpG deficit should correlate with TpG excess has not hitherto been systematically tested at a genomic level. I analyzed genome sequences (human, chimpanzee, mouse, pufferfish, zebrafish, sea squirt, fruitfly, mosquito, and nematode) to do this and generally to assess the hypothesis that CpG deficit, TpG excess, and other data are accountable in terms of 5-methylcytosine mutation. In all methylated genomes local CpG deficit decreases with higher G + C content. Local TpG surplus, while positively associated with G + C level in mammalian genomes but negatively associated with G + C in nonmammalian methylated genomes, is always explicable in terms of the CpG trend under the methylation model. Covariance of dinucleotide abundances with G + C demonstrates that correlation analyses should control for G + C. Doing this reveals a strong negative correlation between local CpG and TpG abundances in methylated genomes, in accord with the methylation hypothesis. CpG deficit also correlates with CpT excess in mammals, which may reflect enhanced cytosine mutation in the context 5'-YCG-3'. Analyses with repeat-masked sequences show that the results are not attributable to repetitive elements.
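The CpG deficit described in this abstract is usually expressed as an observed/expected ratio. The sketch below is illustrative only: it uses one common convention, (number of XpY × N) / (number of X × number of Y), and the function name `dinuc_obs_exp` is hypothetical. The abstract does not state which estimator the author used, so this should not be read as the paper's method.

```python
def dinuc_obs_exp(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio for a dinucleotide on one strand.

    Assumed convention (not taken from the cited paper):
        o/e = count(XY) * N / (count(X) * count(Y))
    where N is the sequence length.
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two letters")
    n = len(seq)
    first = seq.count(dinuc[0])
    second = seq.count(dinuc[1])
    if first == 0 or second == 0:
        return 0.0
    # Count overlapping occurrences so that e.g. "AA" in "AAA" is counted twice.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    return observed * n / (first * second)
```

Under this convention a value near 1 means the dinucleotide occurs about as often as base composition predicts, and a value near 0.2 corresponds to the one-fifth CpG frequency mentioned above.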
Affiliation(s)
- Martin W Simmen
- School of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK.
|
38
|
Han L, Su B, Li WH, Zhao Z. CpG island density and its correlations with genomic features in mammalian genomes. Genome Biol 2008; 9:R79. [PMID: 18477403 PMCID: PMC2441465 DOI: 10.1186/gb-2008-9-5-r79] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Received: 04/07/2008] [Revised: 04/08/2008] [Accepted: 05/13/2008] [Indexed: 11/25/2022]
Abstract
A systematic analysis of CpG islands in ten mammalian genomes suggests that an increase in chromosome number elevates GC content and prevents loss of CpG islands. Background CpG islands, which are clusters of CpG dinucleotides in GC-rich regions, are considered gene markers and represent an important feature of mammalian genomes. Previous studies of CpG islands have largely been on specific loci or within one genome. To date, there seems to be no comparative analysis of CpG islands and their density at the DNA sequence level among mammalian genomes and of their correlations with other genome features. Results In this study, we performed a systematic analysis of CpG islands in ten mammalian genomes. We found that both the number of CpG islands and their density vary greatly among genomes, though many of these genomes encode similar numbers of genes. We observed significant correlations between CpG island density and genomic features such as number of chromosomes, chromosome size, and recombination rate. We also observed a trend of higher CpG island density in telomeric regions. Furthermore, we evaluated the performance of three computational algorithms for CpG island identifications. Finally, we compared our observations in mammals to other non-mammal vertebrates. Conclusion Our study revealed that CpG islands vary greatly among mammalian genomes. Some factors such as recombination rate and chromosome size might have influenced the evolution of CpG islands in the course of mammalian evolution. Our results suggest a scenario in which an increase in chromosome number increases the rate of recombination, which in turn elevates GC content to help prevent loss of CpG islands and maintain their density. These findings should be useful for studying mammalian genomes, the role of CpG islands in gene function, and molecular evolution.
Affiliation(s)
- Leng Han
- Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA.
|
39
|
Li MK, Gu L, Chen SS, Dai JQ, Tao SH. Evolution of the isochore structure in the scale of chromosome: insight from the mutation bias and fixation bias. J Evol Biol 2007; 21:173-182. [DOI: 10.1111/j.1420-9101.2007.01455.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
|
40
|
Shastry BS. SNPs in disease gene mapping, medicinal drug development and evolution. J Hum Genet 2007; 52:871-880. [PMID: 17928948 DOI: 10.1007/s10038-007-0200-z] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.2] [Received: 07/28/2007] [Accepted: 09/18/2007] [Indexed: 01/02/2023]
Abstract
Single nucleotide polymorphism (SNP) technologies can be used to identify disease-causing genes in humans and to understand the inter-individual variation in drug response. These areas of research have major medical benefits. By establishing an association between the genetic make-up of an individual and drug response it may be possible to develop a genome-based diet and medicines that are more effective and safer for each individual. Additionally, SNPs can be used to understand the molecular mechanisms of sequence evolution. It has been found that throughout the given gene, the rate, type and site of nucleotide substitutions as well as the selection pressure on codons is not uniform. The residues that evolve under strong selective pressures are found to be significantly associated with human disease. Deleterious mutations that affect biological function of proteins are effectively being rejected by natural selection from the gene pool. If substituted nucleotides are fixed during evolution then they may have selection advantages, they may be neutral, or they may be deleterious and cause pathology. Therefore, it is possible that disease-associated SNPs (or pathology) and evolution can be related to one another.
Affiliation(s)
- Barkur S Shastry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA.
|
41
|
Jiang C, Han L, Su B, Li WH, Zhao Z. Features and Trend of Loss of Promoter-Associated CpG Islands in the Human and Mouse Genomes. Mol Biol Evol 2007; 24:1991-2000. [PMID: 17591602 DOI: 10.1093/molbev/msm128] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022]
Abstract
CpG islands (CGIs) are often considered gene markers, but the number of CGIs varies among mammalian genomes that have similar numbers of genes. In this study, we investigated the distribution of CGIs in the promoter regions of 3,197 human-mouse orthologous gene pairs and found that the mouse genome has notably fewer CGIs in the promoter regions and less pronounced CGI characteristics than does the human genome. We further inferred the ancestral state of each CGI using the dog genome as a reference and examined the nucleotide substitution pattern and the mutational direction in the conserved regions of human and mouse CGIs. The results reveal many losses of CGIs in both genomes, but the loss rate in the mouse lineage is two to four times the rate in the human lineage. We found an intriguing feature of CGI loss, namely that the loss of a CGI usually starts with erosion at both edges and gradually moves towards the center. We found functional bias in the genes that have lost promoter-associated CGIs in the human or mouse lineage. Finally, our analysis indicates that the association of CGIs with housekeeping genes is not as strong as previously estimated. Our study provides a detailed view of the evolution of promoter-associated CGIs in the human and mouse genomes, and our findings are helpful for understanding the evolution of mammalian genomes and the role of CGIs in gene function.
Affiliation(s)
- Cizhong Jiang
- Department of Psychiatry and Center for the Study of Biological Complexity, Virginia Commonwealth, USA
|
42
|
Wang GZ, Chen LL, Zhang HY. Phase-dependent nucleotide substitution in protein-coding sequences. Biochem Biophys Res Commun 2007; 355:599-602. [PMID: 17300744 DOI: 10.1016/j.bbrc.2007.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/26/2006] [Accepted: 01/02/2007] [Indexed: 11/21/2022]
Abstract
It is well known that due to the degeneracy of genetic code, most of the silent substitutions appear in the third codon position, so the mutation frequency of the third codon position is much higher than that of the first two positions. However, it remains unknown whether the directionality of point mutation in three codon positions is similar or not. In this paper, through analyzing 15 sets of orthologous genes, it is revealed that most of the substitution types are significantly different between any two codon positions, especially between the 2nd and the 3rd phases. Furthermore, the average frequencies of each type of substitution calculated from the fifteen sets of orthologous genes are similar to those identified in single nucleotide polymorphisms (SNPs) of human and mouse genome. The present analyses suggest that the nucleotide substitution in protein-coding sequences is not only context-dependent (so called neighboring-nucleotide effects), but also phase-dependent, which is of significance to improving the prevalent nucleotide-evolution models.
Affiliation(s)
- Guang-Zhong Wang
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Center for Advanced Study, Shandong University of Technology, Zibo 255049, PR China
|
43
|
Seo D, Jiang C, Zhao Z. A novel statistical method to estimate the effective SNP size in vertebrate genomes and categorized genomic regions. BMC Genomics 2006; 7:329. [PMID: 17196097 PMCID: PMC1769377 DOI: 10.1186/1471-2164-7-329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2006] [Accepted: 12/29/2006] [Indexed: 11/29/2022]
Abstract
Background The local environment of single nucleotide polymorphisms (SNPs) contains abundant genetic information for the study of mechanisms of mutation, genome evolution, and causes of diseases. Recent studies revealed that neighboring-nucleotide biases on SNPs were strong and that the genome-wide bias patterns could be represented by a small subset of the total SNPs. Estimating the effective SNP size, that is, the number of SNPs sufficient to represent the bias patterns observed in the whole SNP data, remains an unsolved problem. Results To estimate the effective SNP size, we developed a novel statistical method, SNPKS, which considers both statistical and biological significance. SNPKS consists of two major steps: obtaining an initial effective size by the Kolmogorov-Smirnov test (KS test) and finding an intermediate effective size by interval evaluation. The SNPKS algorithm was implemented in computer programs and applied to real SNP data. The effective SNP size was estimated to be 38,200, 39,300, 38,000, and 38,700 in the human, chimpanzee, dog, and mouse genomes, respectively, and 39,100, 39,600, 39,200, and 42,200 in human intergenic, genic, intronic, and CpG island regions, respectively. Conclusion SNPKS is the first statistical method to estimate the effective SNP size. It runs efficiently and greatly outperforms the algorithm implemented in SNPNB. Applying SNPKS to real SNP data revealed similarly small effective SNP sizes (38,000 – 42,200) in the human, chimpanzee, dog, and mouse genomes as well as in human genomic regions. The findings suggest a strong influence of genetic factors across vertebrate genomes.
|
44
|
Zhao Z, Jiang C. Methylation-dependent transition rates are dependent on local sequence lengths and genomic regions. Mol Biol Evol 2006; 24:23-5. [PMID: 17056644 DOI: 10.1093/molbev/msl156] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
Recently, Fryxell and Moon (2005) examined methylation-dependent transition rates (5mC deamination rates), which were calculated as the difference between the CpG transition and GpC transition rates, using 4,437 transition mutations in CpG or GpC dinucleotides. They concluded that 5mC deamination rates were highly dependent on local GC content but not on the local sequence lengths over which GC content was calculated or the genomic regions where the mutations occurred. Here, we reexamined these statements by using 292,216 CpG→TpG/CpA and GpC→GpT/ApC mutations, about 66 times as much data. Contrary to Fryxell and Moon's conclusions, our analysis indicated that 5mC deamination rates in the human genome were dependent on both the local sequence length and the genomic region. Some explanations for their conclusions were provided.
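The rate definition in this abstract, the CpG transition rate minus the GpC transition rate, can be sketched as simple arithmetic. This is an illustration under assumptions: the function names are hypothetical, and normalizing each transition count by the number of available dinucleotide sites is assumed here, since the abstract does not say how the rates were normalized.

```python
def transition_rate(n_transitions: int, n_sites: int) -> float:
    """Transitions per dinucleotide site (assumed normalization)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_transitions / n_sites


def methylation_dependent_rate(cpg_transitions: int, cpg_sites: int,
                               gpc_transitions: int, gpc_sites: int) -> float:
    """CpG transition rate minus GpC transition rate.

    GpC has the same base composition as CpG but is not a methylation
    target, so its rate serves as the methylation-independent baseline.
    """
    return (transition_rate(cpg_transitions, cpg_sites)
            - transition_rate(gpc_transitions, gpc_sites))
```

With made-up counts of 50 transitions at 100 CpG sites and 25 at 100 GpC sites, the methylation-dependent rate would be 0.25 per site.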
|