1
Liao X, Li M, Hu K, Wu FX, Gao X, Wang J. A sensitive repeat identification framework based on short and long reads. Nucleic Acids Res 2021; 49:e100. [PMID: 34214175 PMCID: PMC8464074 DOI: 10.1093/nar/gkab563] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/07/2020] [Revised: 06/08/2021] [Accepted: 06/18/2021] [Indexed: 12/11/2022]
Abstract
Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot identify repeats satisfactorily in terms of both accuracy and size, because NGS reads are too short to identify long repeats, whereas SMS (Single Molecule Sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer-based multiple sequence alignment, for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify repeats more completely; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates repeats faster and more accurately; (iii) by using multi-alignment unique k-mers rather than high-frequency k-mers to identify repeats in overlap sequences, it obtains repeats more comprehensively and stably; (iv) by applying a parallel alignment model based on the multi-alignment unique k-mers, it greatly improves data-processing efficiency; and (v) by applying corresponding identification strategies, it can identify structural variations that occur between repeats. Comprehensive experimental results show that LongRepMarker achieves better results than existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
Affiliation(s)
- Xingyu Liao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Kang Hu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
2
Long H, Miller SF, Williams E, Lynch M. Specificity of the DNA Mismatch Repair System (MMR) and Mutagenesis Bias in Bacteria. Mol Biol Evol 2018; 35:2414-2421. [PMID: 29939310 DOI: 10.1093/molbev/msy134] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 12/22/2022]
Abstract
The mutation rate of an organism is influenced by the interaction of evolutionary forces such as natural selection and genetic drift. However, the mutation spectrum (i.e., the frequency distribution of different types of mutations) can be heavily influenced by DNA repair. Using mutation-accumulation lines of the extremophile bacterium Deinococcus radiodurans ΔmutS1 and the model soil bacterium Pseudomonas fluorescens wild-type and MMR- (Methyl-dependent Mismatch Repair-deficient) strains, we report the mutational features of these two important bacteria. We find that P. fluorescens has one of the highest MMR repair efficiencies among tested bacteria. We also discover that MMR of D. radiodurans preferentially repairs deletions, unlike that of all other bacteria examined. We then, for the first time, quantify the genome-wide efficiency and specificity of MMR in repairing different genomic regions and mutation types by evaluating the P. fluorescens and D. radiodurans mutation data sets, along with previously reported data sets for Bacillus subtilis subsp. subtilis, Escherichia coli, Vibrio cholerae, and V. fischeri. MMR in all six bacteria shares two general features: 1) repair efficiency is influenced by the neighboring base composition for both transitions and transversions, not only for transversions as previously reported; and 2) MMR recognizes only indels <4 bp in length. This study demonstrates the power of mutation accumulation lines in quantifying DNA repair and mutagenesis patterns.
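The abstract compares MMR repair efficiencies without defining the term. This sketch assumes a convention common in mutation-accumulation work rather than one taken from the paper: efficiency = 1 − μ_WT/μ_MMR−. Under it, a k-fold elevation in the mutant corresponds to an efficiency of 1 − 1/k. The minimal Python sketch below applies the formula to a single mutation class. The function name is hypothetical.

```python
def mmr_repair_efficiency(mu_wild_type: float, mu_mmr_deficient: float) -> float:
    """Return 1 - mu_WT / mu_MMR-, the assumed fraction of mutations removed by MMR.

    Assumes both rates describe the same mutation class (e.g. one substitution
    type, per site per generation) measured under the same conditions.
    """
    if mu_mmr_deficient <= 0:
        raise ValueError("MMR-deficient mutation rate must be positive")
    return 1.0 - mu_wild_type / mu_mmr_deficient


# Only the ratio of the two rates matters, so fold elevations can be passed directly.
print(mmr_repair_efficiency(1.0, 4.0))  # 0.75: a 4-fold elevation in the mutant
print(round(mmr_repair_efficiency(1.0, 100.0), 4))  # 0.99: a 100-fold elevation
```

Under this convention, the 4-fold elevation reported for MMR(-) D. radiodurans in entry 13 corresponds to about 75% efficiency. The >100-fold elevations seen in most other organisms correspond to more than 99%.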
Affiliation(s)
- Hongan Long
- Institute of Evolution & Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong, China
- Samuel F Miller
- Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ
- Emily Williams
- Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ
- Michael Lynch
- Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ
3
Jiang X, Tang H, Mohammed Ismail W, Lynch M. A Maximum-Likelihood Approach to Estimating the Insertion Frequencies of Transposable Elements from Population Sequencing Data. Mol Biol Evol 2018; 35:2560-2571. [PMID: 30099533 PMCID: PMC6188571 DOI: 10.1093/molbev/msy152] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Transposable elements (TEs) account for a large fraction of the expansion of many eukaryotic genomes, owing to their capability to duplicate themselves through transposition. A first step toward understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population. The method is based on the genotypes at TE insertion sites detected, using paired-end (PE) sequencing reads, in multiple individuals sampled from the population. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and its detection power increases with sample size, although a substantial fraction of actual TE insertions under selection may go undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70%) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9%) of nonreference TE insertions under selection.
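The abstract does not spell out the likelihood model. The sketch below shows only the general idea behind ML allele-frequency estimation from uncertain diploid genotypes: a generic expectation-maximization (EM) loop under Hardy-Weinberg proportions. It is not the authors' model. It assumes that each individual's genotype likelihoods P(reads | 0, 1 or 2 insertion copies) have already been computed from the PE reads, and it ignores selection. All names are hypothetical.

```python
from typing import Sequence, Tuple

# (P(reads | 0 copies), P(reads | 1 copy), P(reads | 2 copies)) for one individual
GenotypeLikelihoods = Tuple[float, float, float]


def estimate_insertion_frequency(
    individuals: Sequence[GenotypeLikelihoods],
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> float:
    """EM estimate of the TE-insertion allele frequency p in a diploid sample.

    Genotype priors follow Hardy-Weinberg proportions; selection is ignored.
    """
    p = 0.5  # arbitrary starting value
    for _ in range(max_iter):
        priors = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
        expected_copies = 0.0
        informative = 0
        for lik in individuals:
            joint = [lk * prior for lk, prior in zip(lik, priors)]
            total = sum(joint)
            if total == 0:
                continue  # reads incompatible with every genotype allowed by p
            # E-step: posterior mean number of insertion alleles (0, 1 or 2)
            expected_copies += (joint[1] + 2 * joint[2]) / total
            informative += 1
        if informative == 0:
            return p
        # M-step: frequency = expected insertion alleles / total alleles
        new_p = expected_copies / (2 * informative)
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


# Nine individuals confidently homozygous for the insertion, one without it.
sample = [(0.0, 0.0, 1.0)] * 9 + [(1.0, 0.0, 0.0)]
print(estimate_insertion_frequency(sample))  # 0.9
```

With confidently called genotypes, EM reduces to simple allele counting. Its value lies in low-coverage data, where the genotype likelihoods of heterozygotes and homozygotes overlap.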
Affiliation(s)
- Xiaoqian Jiang
- Department of Biology, Indiana University, Bloomington, IN
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, IN
- Michael Lynch
- Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ
4
Nzabarushimana E, Tang H. Insertion sequence elements-mediated structural variations in bacterial genomes. Mob DNA 2018; 9:29. [PMID: 30181787 PMCID: PMC6114881 DOI: 10.1186/s13100-018-0134-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/05/2018] [Accepted: 08/16/2018] [Indexed: 12/20/2022]
Abstract
Mobile genetic elements (MGEs) affect the evolution and stability of their host genomes. Insertion sequence (IS) elements are the most common MGEs in bacterial genomes and play a crucial role in mediating large-scale variations in bacterial genomes. IS elements, and MGEs in general, are understood to coexist in a dynamic equilibrium with their respective hosts. Current studies indicate that the spontaneous movement of IS elements does not follow a constant rate in different bacterial genomes. However, due to the paucity and sparsity of the data, these observations are not yet conclusive. In this paper, we conducted a comparative analysis of IS-mediated genome structural variations in ten mutation accumulation (MA) experiments across eight strains of five bacterial species containing IS elements, including four strains of E. coli. We used the GRASPER algorithm, a de novo structural variation (SV) identification algorithm designed to detect SVs involving repetitive sequences in the genome. We observed highly diverse rates of IS insertions and IS-mediated recombinations across different bacterial species, as well as across different strains of the same species. We also observed different rates for elements from the same IS family in different bacterial genomes, suggesting that the differences in rates might not be due to differences in the composition of IS elements across bacterial genomes.
Affiliation(s)
- Etienne Nzabarushimana
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN USA
- Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN USA
5
Escherichia coli cultures maintain stable subpopulation structure during long-term evolution. Proc Natl Acad Sci U S A 2018; 115:E4642-E4650. [PMID: 29712844 DOI: 10.1073/pnas.1708371115] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 12/30/2022]
Abstract
How genetic variation is generated and maintained remains a central question in evolutionary biology. When presented with a complex environment, microbes can take advantage of genetic variation to exploit new niches. Here we present a massively parallel experiment where WT and repair-deficient (∆mutL) Escherichia coli populations have evolved over 3 y in a spatially heterogeneous and nutritionally complex environment. Metagenomic sequencing revealed that these initially isogenic populations evolved and maintained stable subpopulation structure in just 10 mL of medium for up to 10,000 generations, consisting of up to five major haplotypes with many minor haplotypes. We characterized the genomic, transcriptomic, exometabolomic, and phenotypic differences between clonal isolates, revealing subpopulation structure driven primarily by spatial segregation followed by differential utilization of nutrients. In addition to genes regulating the import and catabolism of nutrients, major polymorphisms of note included insertion elements transposing into fimE (regulator of the type I fimbriae) and upstream of hns (global regulator of environmental-change and stress-response genes), both known to regulate biofilm formation. Interestingly, these genes have also been identified as critical to colonization in uropathogenic E. coli infections. Our findings illustrate the complexity that can arise and persist even in small cultures, raising the possibility that infections may often be promoted by an evolving and complex pathogen population.
6
Schroeder JW, Yeesin P, Simmons LA, Wang JD. Sources of spontaneous mutagenesis in bacteria. Crit Rev Biochem Mol Biol 2017; 53:29-48. [PMID: 29108429 DOI: 10.1080/10409238.2017.1394262] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
Abstract
Mutations in an organism's genome can arise spontaneously, that is, in the absence of exogenous stress and prior to selection. Mutations are often neutral or deleterious to individual fitness but can also provide genetic diversity driving evolution. Mutagenesis in bacteria contributes to the already serious and growing problem of antibiotic resistance. However, the negative impacts of spontaneous mutagenesis on human health are not limited to bacterial antibiotic resistance. Spontaneous mutations also underlie tumorigenesis and evolution of drug resistance. To better understand the causes of genetic change and how they may be manipulated in order to curb antibiotic resistance or the development of cancer, we must acquire a mechanistic understanding of the major sources of mutagenesis. Bacterial systems are particularly well-suited to studying mutagenesis because of their fast growth rate and the panoply of available experimental tools, but efforts to understand mutagenic mechanisms can be complicated by the experimental system employed. Here, we review our current understanding of mutagenic mechanisms in bacteria and describe the methods used to study mutagenesis in bacterial systems.
Affiliation(s)
- Jeremy W Schroeder
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Ponlkrit Yeesin
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Lyle A Simmons
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
- Jue D Wang
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
7
Jiang X, Tang H, Ye Z, Lynch M. Insertion Polymorphisms of Mobile Genetic Elements in Sexual and Asexual Populations of Daphnia pulex. Genome Biol Evol 2017; 9:362-374. [PMID: 28057730 PMCID: PMC5381639 DOI: 10.1093/gbe/evw302] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Accepted: 12/29/2016] [Indexed: 12/20/2022]
Abstract
Transposable elements (TEs) constitute a substantial portion of many eukaryotic genomes, and can in principle contribute to evolutionary innovation as well as genomic deterioration. Daphnia pulex serves as a useful model for studying TE dynamics as a potential cause and/or consequence of asexuality. We analyzed insertion polymorphisms of TEs in 20 sexual and 20 asexual isolates of D. pulex from across North America, using their available whole-genome sequencing data. Our results show that the total fraction of the derived sequences of TEs is not substantially different between asexual and sexual D. pulex isolates. However, in general, sexual clones contain fewer fixed TE insertions but more total insertion polymorphisms than asexual clones, supporting the hypothesis that sexual reproduction facilitates the spread and elimination of TEs. We identified nine asexual-specific fixed TE insertions: eight long terminal repeat retrotransposons and one DNA transposon. By comparison, no sexual-specific fixed TE insertions were observed in our analysis. Furthermore, except for one TE insertion located on a contig from chromosome 7, the other eight asexual-specific insertion sites are located on contigs from chromosome 9, which is known to be associated with obligate asexuality in D. pulex. We found that all nine asexual-specific fixed TE insertions can also be detected in some Daphnia pulicaria isolates, indicating that a substantial number of TE insertions in asexual D. pulex have been directly inherited from D. pulicaria during the origin of obligate asexuals.
Affiliation(s)
- Xiaoqian Jiang
- Department of Biology, Indiana University, Bloomington, Indiana
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, Indiana
- Zhiqiang Ye
- Department of Biology, Indiana University, Bloomington, Indiana
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, Indiana
8
Tincher C, Long H, Behringer M, Walker N, Lynch M. The Glyphosate-Based Herbicide Roundup Does not Elevate Genome-Wide Mutagenesis of Escherichia coli. G3 (Bethesda) 2017; 7:3331-3335. [PMID: 28983068 PMCID: PMC5633383 DOI: 10.1534/g3.117.300133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/26/2017] [Accepted: 08/08/2017] [Indexed: 12/30/2022]
Abstract
Mutations induced by pollutants may promote pathogen evolution, for example by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole-genome sequencing (MA/WGS) method as a mutagenicity test, to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria.
Affiliation(s)
- Clayton Tincher
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Hongan Long
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Megan Behringer
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Noah Walker
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, Indiana 47405
9
Shewaramani S, Finn TJ, Leahy SC, Kassen R, Rainey PB, Moon CD. Anaerobically Grown Escherichia coli Has an Enhanced Mutation Rate and Distinct Mutational Spectra. PLoS Genet 2017; 13:e1006570. [PMID: 28103245 PMCID: PMC5289635 DOI: 10.1371/journal.pgen.1006570] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 06/02/2016] [Revised: 02/02/2017] [Accepted: 01/04/2017] [Indexed: 12/21/2022]
Abstract
Oxidative stress is a major cause of mutation but little is known about how growth in the absence of oxygen impacts the rate and spectrum of mutations. We employed long-term mutation accumulation experiments to directly measure the rates and spectra of spontaneous mutation events in Escherichia coli populations propagated under aerobic and anaerobic conditions. To detect mutations, whole genome sequencing was coupled with methods of analysis sufficient to identify a broad range of mutational classes, including structural variants (SVs) generated by movement of repetitive elements. The anaerobically grown populations displayed a mutation rate nearly twice that of the aerobic populations, showed distinct asymmetric mutational strand biases, and greater insertion element activity. Consistent with mutation rate and spectra observations, genes for transposition and recombination repair associated with SVs were up-regulated during anaerobic growth. Together, these results define differences in mutational spectra affecting the evolution of facultative anaerobes.
Affiliation(s)
- Sonal Shewaramani
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
- New Zealand Institute for Advanced Study, Massey University, Auckland, New Zealand
- Thomas J. Finn
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
- New Zealand Institute for Advanced Study, Massey University, Auckland, New Zealand
- Sinead C. Leahy
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
- Rees Kassen
- Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
- Paul B. Rainey
- New Zealand Institute for Advanced Study, Massey University, Auckland, New Zealand
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Ecole Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI ParisTech), CNRS UMR 8231, PSL Research University, Paris, France
- Christina D. Moon
- AgResearch Ltd, Grasslands Research Centre, Palmerston North, New Zealand
10
Lee H, Doak TG, Popodi E, Foster PL, Tang H. Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli. Nucleic Acids Res 2016; 44:7109-19. [PMID: 27431326 PMCID: PMC5009759 DOI: 10.1093/nar/gkw647] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 05/03/2016] [Accepted: 07/08/2016] [Indexed: 12/27/2022]
Abstract
A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements, and recombination between homologous IS elements, identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole-genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10⁻⁴ insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10⁻⁵ recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased toward proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For none of the ISs studied is any region of the circular genome favored or disfavored for new insertions, but there are notable hotspots for deletions. Some elements prefer non-coding sequence or the beginning and end of coding regions, largely explained by target-site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target sites, 5'-GGGG(N6/N7)CCCC-3'. We also detected 48 long deletions not involving IS elements.
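The IS186 target-site pattern 5'-GGGG(N6/N7)CCCC-3' translates directly into a regular expression. The minimal Python sketch below flags candidate sites in a sequence. It is a hypothetical helper for illustration, not the authors' analysis. It also assumes "N" means any of A, C, G or T, so ambiguous bases in the input never match the spacer.

```python
import re
from typing import List, Tuple

# 5'-GGGG(N6/N7)CCCC-3': GGGG, a spacer of 6 or 7 arbitrary bases, then CCCC.
# Wrapping the pattern in a lookahead lets overlapping sites be reported.
IS186_TARGET = re.compile(r"(?=(GGGG[ACGT]{6,7}CCCC))")


def find_is186_like_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, site) for each motif match.

    At most one match is reported per start position (a 7-bp spacer is tried
    before a 6-bp one). The motif is its own reverse complement, so scanning
    one strand also finds sites on the other strand.
    """
    return [(m.start(), m.group(1)) for m in IS186_TARGET.finditer(sequence.upper())]


print(find_is186_like_sites("ttggggacgtatcccctt"))  # [(2, 'GGGGACGTATCCCC')]
```

Because GGGG and CCCC are reverse complements of each other, a single-strand scan is sufficient. This palindromic structure is consistent with the motif being described as one pattern rather than a strand-specific one.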
Affiliation(s)
- Heewook Lee
- School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Thomas G Doak
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
- National Center for Genome Analysis Support, Indiana University, Bloomington, IN 47401, USA
- Ellen Popodi
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
- Patricia L Foster
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
11
Antibiotic treatment enhances the genome-wide mutation rate of target cells. Proc Natl Acad Sci U S A 2016; 113:E2498-505. [PMID: 27091991 DOI: 10.1073/pnas.1601208113] [Citation(s) in RCA: 131] [Impact Index Per Article: 16.4] [Indexed: 12/30/2022]
Abstract
Although it is well known that microbial populations can respond adaptively to challenges from antibiotics, empirical difficulties in distinguishing the roles of de novo mutation and natural selection have left several issues unresolved. Here, we explore the mutational properties of Escherichia coli exposed to long-term sublethal levels of the antibiotic norfloxacin, using a mutation accumulation design combined with whole-genome sequencing of replicate lines. The genome-wide mutation rate significantly increases with norfloxacin concentration. This response is associated with enhanced expression of error-prone DNA polymerases and may also involve indirect effects of norfloxacin on DNA mismatch and oxidative-damage repair. Moreover, we find that acquisition of antibiotic resistance can be enhanced solely by accelerated mutagenesis, i.e., without direct involvement of selection. Our results suggest that antibiotics may generally enhance the mutation rates of target cells, thereby accelerating the rate of adaptation not only to the antibiotic itself but to additional challenges faced by invasive pathogens.
12
Dozmorov MG, Adrianto I, Giles CB, Glass E, Glenn SB, Montgomery C, Sivils KL, Olson LE, Iwayama T, Freeman WM, Lessard CJ, Wren JD. Detrimental effects of duplicate reads and low complexity regions on RNA- and ChIP-seq data. BMC Bioinformatics 2015; 16 Suppl 13:S10. [PMID: 26423047 PMCID: PMC4597324 DOI: 10.1186/1471-2105-16-s13-s10] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 12/30/2022]
Abstract
Background: Adapter trimming and removal of duplicate reads are common practices in next-generation sequencing pipelines. Sequencing reads ambiguously mapped to repetitive and low-complexity regions can also be problematic for accurate assessment of the biological signal, yet their impact on sequencing data has not received much attention. We investigate how trimming adapters, removing duplicates, and filtering out reads overlapping low-complexity regions influence the significance of the biological signal in RNA- and ChIP-seq experiments.
Methods: We assessed the effect of data processing steps on the alignment statistics and the functional enrichment analysis results of RNA- and ChIP-seq data. We compared differentially processed RNA-seq data with matching microarray data on the same patient samples to determine whether changes in pre-processing improved the correlation between the two. We developed a simple tool to remove low-complexity regions, RepeatSoaker, available at https://github.com/mdozmorov/RepeatSoaker, and tested its effect on the alignment statistics and the results of the enrichment analyses.
Results: Both adapter trimming and duplicate removal moderately improved the strength of biological signals in RNA-seq and ChIP-seq data. Aggressive filtering of reads overlapping low-complexity regions, as defined by RepeatMasker, further improved the strength of biological signals and the correlation between RNA-seq and microarray gene expression data.
Conclusions: Adapter trimming and duplicate removal, coupled with filtering out reads overlapping low-complexity regions, increased the quality and reliability of detecting biological signals in RNA-seq and ChIP-seq data.
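"Aggressive filtering" of reads that overlap low-complexity regions can be read as dropping every read that shares at least one base with an annotated region. The minimal Python sketch below implements that rule for one chromosome, using merged intervals and binary search. It is not RepeatSoaker's implementation. It assumes half-open, 0-based coordinates, and it assumes the region list comes from low-complexity annotations such as RepeatMasker output. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge_regions(regions: Sequence[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching regions."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def drop_reads_overlapping(reads: Sequence[Interval], regions: Sequence[Interval]) -> List[Interval]:
    """Keep only reads that share no base with any low-complexity region."""
    merged = merge_regions(regions)
    ends = [end for _, end in merged]  # strictly increasing after merging
    kept: List[Interval] = []
    for start, end in reads:
        i = bisect_left(ends, start + 1)  # first region ending after the read starts
        if i < len(merged) and merged[i][0] < end:
            continue  # overlaps a region: filter the read out
        kept.append((start, end))
    return kept


regions = [(100, 200), (150, 250), (400, 500)]
reads = [(0, 50), (90, 110), (250, 300), (499, 520), (500, 600)]
print(drop_reads_overlapping(reads, regions))  # [(0, 50), (250, 300), (500, 600)]
```

Merging first guarantees that region ends are sorted. A single binary search per read then finds the only region that could overlap it, so filtering costs O((n + m) log m) rather than O(n·m).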
13
Long H, Kucukyildirim S, Sung W, Williams E, Lee H, Ackerman M, Doak TG, Tang H, Lynch M. Background Mutational Features of the Radiation-Resistant Bacterium Deinococcus radiodurans. Mol Biol Evol 2015; 32:2383-92. [PMID: 25976352 DOI: 10.1093/molbev/msv119] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Indexed: 12/12/2022]
Abstract
Deinococcus bacteria are extremely resistant to radiation, oxidation, and desiccation. Resilience to these factors has been suggested to be due to enhanced damage prevention and repair mechanisms, as well as highly efficient antioxidant protection systems. Here, using mutation-accumulation experiments, we find that the GC-rich Deinococcus radiodurans has an overall background genomic mutation rate similar to that of E. coli, but a different mutation spectrum. Its A/T to G/C mutation rate per site per generation (based on a total count of 88 A:T → G:C transitions and 82 A:T → C:G transversions) is higher than that in the other direction (based on a total count of 157 G:C → A:T transitions and 33 G:C → T:A transversions). We propose that this unique spectrum is shaped mainly by the abundant uracil DNA glycosylases reducing G:C → A:T transitions, adenine methylation elevating A:T → C:G transversions, and the absence of cytosine methylation decreasing G:C → A:T transitions. As opposed to the greater than 100× elevation of the mutation rate in MMR(-) (DNA Mismatch Repair deficient) strains of most other organisms, MMR(-) D. radiodurans exhibits only a 4-fold elevation, raising the possibility that other DNA repair mechanisms compensate for a relatively low-efficiency DNA MMR pathway. Because D. radiodurans has plentiful insertion sequence (IS) elements in its genome and the activities of IS elements have rarely been explored directly, we also estimated the insertion (transposition) rate of the IS elements to be 2.50 × 10⁻³ per genome per generation in the wild-type strain; knocking out MMR did not elevate the IS element insertion rate in this organism.
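The abstract reports fewer raw A/T → G/C events (88 + 82 = 170) than G/C → A/T events (157 + 33 = 190), yet a higher per-site rate in the A/T → G/C direction. The two statements are reconciled by normalizing each count by the number of sites of the corresponding base-pair type. The derivation below is a sketch assuming the usual mutation-accumulation normalization. The symbols are ours, not the paper's: $N_{AT}$ and $N_{GC}$ are the numbers of analysed A:T and G:C sites, and $T$ is the total number of generations summed over all lines.

```latex
\begin{aligned}
\mu_{A/T \to G/C} &= \frac{n_{A:T\to G:C} + n_{A:T\to C:G}}{N_{AT}\,T} = \frac{88 + 82}{N_{AT}\,T} = \frac{170}{N_{AT}\,T},\\
\mu_{G/C \to A/T} &= \frac{n_{G:C\to A:T} + n_{G:C\to T:A}}{N_{GC}\,T} = \frac{157 + 33}{N_{GC}\,T} = \frac{190}{N_{GC}\,T},\\
\frac{\mu_{A/T \to G/C}}{\mu_{G/C \to A/T}} &= \frac{170}{190}\cdot\frac{N_{GC}}{N_{AT}} > 1
\quad\Longleftrightarrow\quad \frac{N_{GC}}{N_{AT}} > \frac{19}{17} \approx 1.12.
\end{aligned}
```

Under these assumptions, the reported ordering holds whenever G:C sites outnumber A:T sites by more than about 12%, which is what the "GC-rich" genome makes possible.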
Affiliation(s)
- Hongan Long
- Department of Biology, Indiana University, Bloomington
- Way Sung
- Department of Biology, Indiana University, Bloomington
- Heewook Lee
- School of Informatics and Computing, Indiana University, Bloomington
- Thomas G Doak
- Department of Biology, Indiana University, Bloomington
- National Center for Genome Analysis Support, Indiana University, Bloomington
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington
- Michael Lynch
- Department of Biology, Indiana University, Bloomington