1
Yang X, Shi X, Lai L, Chen C, Xu H, Deng M. Towards long double-stranded chains and robust DNA-based data storage using the random code system. Front Genet 2023; 14:1179867. [PMID: 37384333 PMCID: PMC10294226 DOI: 10.3389/fgene.2023.1179867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 05/31/2023] [Indexed: 06/30/2023]
Abstract
DNA has become a popular choice for next-generation storage media due to its high storage density and stability. As the storage medium of life's information, DNA has significant storage capacity and low-cost, low-power replication and transcription capabilities. However, utilizing long double-stranded DNA for storage can introduce instability that makes it difficult to meet the constraints of biological systems. To address this challenge, we have designed a highly robust coding scheme called the "random code system," inspired by the idea of fountain codes. The random code system includes the establishment of a random matrix, Gaussian preprocessing, and random equilibrium. Compared to Luby transform codes (LT codes), the random code (RC) is more robust and better able to recover lost information. In biological experiments, we successfully stored 29,390 bits of data in 25,700 bp chains, achieving a storage density of 1.78 bits per nucleotide. These results demonstrate the potential for using long double-stranded DNA and the random code system for robust DNA-based data storage.
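The fountain-code idea mentioned in this abstract can be illustrated with a toy sketch. The Python below is not the authors' random code system; it is a minimal LT-style example under stated assumptions: the helper names (`xor_bytes`, `make_droplet`, `peel_decode`) are hypothetical, and the droplet degree is drawn uniformly rather than from the soliton-type distributions that real LT codes use.

```python
import random


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_droplet(chunks, seed):
    """Combine a pseudo-random subset of equal-length chunks by XOR.

    Hypothetical helper: the degree is drawn uniformly, a simplification
    of the degree distributions used by real LT codes.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, len(chunks))
    indices = sorted(rng.sample(range(len(chunks)), degree))
    payload = bytes(len(chunks[0]))  # all-zero start value
    for i in indices:
        payload = xor_bytes(payload, chunks[i])
    return seed, indices, payload


def peel_decode(droplets, n_chunks):
    """Peeling decoder: resolve droplets that cover exactly one unknown chunk."""
    recovered = {}
    pending = [(set(indices), payload) for _, indices, payload in droplets]
    progress = True
    while progress and len(recovered) < n_chunks:
        progress = False
        for k, (indices, payload) in enumerate(pending):
            # XOR out every chunk that is already known.
            for i in list(indices):
                if i in recovered:
                    payload = xor_bytes(payload, recovered[i])
                    indices.discard(i)
            if len(indices) == 1:
                recovered[indices.pop()] = payload
                progress = True
                payload = b""
            pending[k] = (indices, payload)
    return recovered
```

Because each droplet is a random combination, losing some droplets can be compensated by generating more of them. The peeling decoder shown here may return only a partial result when no droplet can be reduced to a single unknown chunk.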
Affiliation(s)
- Xu Yang
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Xiaolong Shi
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Langwen Lai
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Congzhou Chen
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Huaisheng Xu
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
- Ming Deng
  - Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
2
Mortuza GM, Guerrero J, Llewellyn S, Tobiason MD, Dickinson GD, Hughes WL, Zadegan R, Andersen T. In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA). BMC Bioinformatics 2023; 24:160. [PMID: 37085766 PMCID: PMC10120115 DOI: 10.1186/s12859-023-05264-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 03/30/2023] [Indexed: 04/23/2023]
Abstract
Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.
Affiliation(s)
- Golam Md Mortuza
  - Department of Computer Science, Boise State University, Boise, Idaho, USA
- Jorge Guerrero
  - Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC, USA
- William L Hughes
  - School of Engineering, Kelowna, University of British Columbia, Kelowna, British Columbia, Canada
- Reza Zadegan
  - Department of Nanoengineering, Joint School of Nanoscience and Nanoengineering, North Carolina A&T State University, Greensboro, NC, USA
- Tim Andersen
  - Department of Computer Science, Boise State University, Boise, Idaho, USA
3
Eckert KA. Nontraditional Roles of DNA Polymerase Eta Support Genome Duplication and Stability. Genes (Basel) 2023; 14:genes14010175. [PMID: 36672916 PMCID: PMC9858799 DOI: 10.3390/genes14010175] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/30/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 01/11/2023]
Abstract
DNA polymerase eta (Pol η) is a Y-family polymerase and the product of the POLH gene. Autosomal recessive inheritance of POLH mutations is the cause of the xeroderma pigmentosum variant, a cancer predisposition syndrome. This review summarizes mounting evidence for expanded Pol η cellular functions in addition to DNA lesion bypass that are critical for maintaining genome stability. In vitro, Pol η displays efficient DNA synthesis through difficult-to-replicate sequences, catalyzes D-loop extensions, and utilizes RNA-DNA hybrid templates. Human Pol η is constitutively present at the replication fork. In response to replication stress, Pol η is upregulated at the transcriptional and protein levels, and post-translational modifications regulate its localization to chromatin. Numerous studies show that Pol η is required for efficient common fragile site replication and stability. Additionally, Pol η can be recruited to stalled replication forks through protein-protein interactions, suggesting a broader role in replication fork recovery. During somatic hypermutations, Pol η is recruited by mismatch repair proteins and is essential for VH gene A:T basepair mutagenesis. Within the global context of repeat-dense genomes, the recruitment of Pol η to perform specialized functions during replication could promote genome stability by interrupting pure repeat arrays with base substitutions. Alternatively, not engaging Pol η in genome duplication is costly, as the absence of Pol η leads to incomplete replication and increased chromosomal instability.
Affiliation(s)
- Kristin A Eckert
  - Gittlen Cancer Research Laboratories, Department of Pathology, Penn State University College of Medicine, 500 University Drive, Hershey, PA 17036, USA
4
Nyuykonge B, Siddig EE, Konings M, Bakhiet S, Verbon A, Klaassen CHW, Fahal AH, van de Sande WWJ. Madurella mycetomatis grains within a eumycetoma lesion are clonal. Med Mycol 2022; 60:6643561. [PMID: 35833294 PMCID: PMC9335062 DOI: 10.1093/mmy/myac051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 06/16/2022] [Accepted: 07/12/2022] [Indexed: 11/23/2022]
Abstract
Eumycetoma is a neglected tropical infection of the subcutaneous tissue, characterized by tumor-like lesions and most commonly caused by the fungus Madurella mycetomatis. In the tissue, M. mycetomatis organizes itself in grains, and within a single lesion, thousands of grains can be present. The current hypothesis is that all these grains originate from a single causative agent; however, this hypothesis has never been proven. Here, we used our recently developed MmySTR assay, a highly discriminative typing method, to determine the genotypes of multiple grains within a single lesion. Multiple grains from surgical lesions obtained from 11 patients were isolated and genotyped using the MmySTR panel. Within a single lesion, all tested grains shared the same genotype. In only one grain from one patient was a difference of one repeat unit in one MmySTR marker noted relative to the other grains from that patient. We conclude that within these lesions the grains originate from a single clone and that the inherently unstable nature of the microsatellite markers may lead to small genotypic differences.
Affiliation(s)
- Bertrand Nyuykonge
  - Department of Medical Microbiology and Infectious Diseases, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, Netherlands
- Emmanuel Edwar Siddig
  - Mycetoma Research Centre, University of Khartoum, Khartoum, Sudan; Faculty of Medical Laboratory Sciences, University of Khartoum, Khartoum, Sudan
- Mickey Konings
  - Department of Medical Microbiology and Infectious Diseases, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, Netherlands
- Sahar Bakhiet
  - Mycetoma Research Centre, University of Khartoum, Khartoum, Sudan
- Annelies Verbon
  - Department of Medical Microbiology and Infectious Diseases, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, Netherlands
- Corné H W Klaassen
  - Department of Medical Microbiology and Infectious Diseases, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, Netherlands
- Wendy W J van de Sande
  - Department of Medical Microbiology and Infectious Diseases, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, Netherlands
5
Huo Y, Zhao Y, Xu L, Yi H, Zhang Y, Jia X, Zhao H, Zhao J, Wang F. An integrated strategy for target SSR genotyping with toleration of nucleotide variations in the SSRs and flanking regions. BMC Bioinformatics 2021; 22:429. [PMID: 34496768 PMCID: PMC8424866 DOI: 10.1186/s12859-021-04351-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2021] [Accepted: 08/31/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: With the broad application of high-throughput sequencing and its reduced cost, simple sequence repeat (SSR) genotyping by sequencing (SSR-GBS) has been widely used for interpreting genetic data across different fields, including population genetic diversity and structure analysis, the construction of genetic maps, and the investigation of intraspecies relationships. Accurate and efficient typing strategies for SSR-GBS are urgently needed, and several tools have been published. However, to date, no accurate genotyping method tolerates single nucleotide variations (SNVs) in SSRs and flanking regions. These SNVs may be caused by PCR and sequencing errors or by SNPs among varieties, and they directly affect sequence alignment and genotyping accuracy. RESULTS: Here, we report a new integrated strategy named the accurate microsatellite genotyping tool based on targeted sequencing (AMGT-TS) and provide a user-friendly web-based platform and command-line version of AMGT-TS. To handle SNVs in the SSRs or flanking regions, we developed a broad matching algorithm (BMA) that can quickly and accurately achieve SSR typing for ultradeep coverage and high-throughput analysis of loci, with SNV compatibility and grouping of typed reads for further in-depth information mining. To evaluate this tool, we tested 21 randomly sampled loci in eight maize varieties, accompanied by experimental validation on actual and simulated sequencing data. Our evaluation showed that, compared to other tools, AMGT-TS presented extremely accurate typing results with single-base resolution for both homozygous and heterozygous samples. CONCLUSION: This integrated strategy can achieve accurate SSR genotyping based on targeted sequencing, and it can tolerate single nucleotide variations in the SSRs and flanking regions. This method can be readily applied to divergent sequencing platforms and species and has excellent application prospects in genetic and population biology research. The web-based platform and command-line version of AMGT-TS are available at https://amgt-ts.plantdna.site:8445 and https://github.com/plantdna/amgt-ts , respectively.
Affiliation(s)
- Yongxue Huo
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Yikun Zhao
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Liwen Xu
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Hongmei Yi
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Yunlong Zhang
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Xianqing Jia
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Han Zhao
  - Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, Jiangsu, China
- Jiuran Zhao
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
- Fengge Wang
  - Maize Research Center, Beijing Academy of Agricultural and Forest Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing, 100097, China
6
Jedrzejczak-Silicka M, Lepczynski A, Gołębiowski F, Dolata D, Dybus A. Application of PCR-HRM method for microsatellite polymorphism genotyping in the LDHA gene of pigeons (Columba livia). PLoS One 2021; 16:e0256065. [PMID: 34411134 PMCID: PMC8376019 DOI: 10.1371/journal.pone.0256065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2020] [Accepted: 08/01/2021] [Indexed: 11/19/2022]
Abstract
High-resolution melting (HRM) is a post-PCR method that discriminates genotypes based on fluorescence changes during the melting phase. HRM is used to detect mutations or polymorphisms (e.g. microsatellites, SNPs, indels). Here, the (TTTAT)3-5 microsatellite polymorphism within intron 6 of the LDHA gene in pigeons was analysed using the HRM method. Individuals (123 homing pigeons) were genotyped using conventional PCR. Birds were classified into groups based on genotype, and the results were tested by qPCR-HRM and verified using sequencing. Based on the evaluated protocol, five genotypes were identified that vary in the number of TTTAT repeat units (3/3, 4/4, 3/4, 4/5, and 5/5). Sequencing confirmed the results obtained with qPCR-HRM and verified that HRM is a suitable method for identification of three-allele microsatellite polymorphisms. It can be concluded that the HRM method can be effectively used for rapid (one-step) discrimination of the (TTTAT)3-5 microsatellite polymorphism in the pigeon’s LDHA gene.
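The genotype labels in this abstract (3/3, 3/4, and so on) are counts of consecutive TTTAT units on each allele. As a rough illustration of that notation only, and not of the HRM or sequencing workflow itself, the Python sketch below counts the longest uninterrupted run of a motif in a sequence string. The function names and the toy allele sequences are hypothetical.

```python
import re


def longest_repeat_run(sequence, motif="TTTAT"):
    """Return the largest number of consecutive motif copies in the sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def genotype(allele_a, allele_b, motif="TTTAT"):
    """Format the repeat counts of two alleles as 'smaller/larger'."""
    counts = sorted(longest_repeat_run(a, motif) for a in (allele_a, allele_b))
    return f"{counts[0]}/{counts[1]}"


# Toy alleles with made-up flanking bases (not real LDHA sequence).
allele_short = "AC" + "TTTAT" * 3 + "G"
allele_long = "AC" + "TTTAT" * 4 + "G"
print(genotype(allele_short, allele_long))
```

Exact string matching is used here for simplicity, so a single substitution inside the repeat would split the run.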
Affiliation(s)
- Magdalena Jedrzejczak-Silicka
  - Faculty of Biotechnology and Animal Husbandry, Laboratory of Molecular Biology, West Pomeranian University of Technology, Szczecin, Poland
- Adam Lepczynski
  - Department of Physiology, Cytobiology and Proteomics, West Pomeranian University of Technology, Szczecin, Poland
- Andrzej Dybus
  - Faculty of Biotechnology and Animal Husbandry, Department of Genetics, West Pomeranian University of Technology, Szczecin, Poland
7
Jeong J, Park SJ, Kim JW, No JS, Jeon HH, Lee JW, No A, Kim S, Park H. Cooperative Sequence Clustering and Decoding for DNA Storage System with Fountain Codes. Bioinformatics 2021; 37:3136-3143. [PMID: 33904574 DOI: 10.1093/bioinformatics/btab246] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/18/2020] [Revised: 03/03/2021] [Accepted: 04/13/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it will need more sequence reads for data retrieval. There is potentially a way to improve sequencing and decoding processes so that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past research, clustering, alignment, and decoding were treated as separate stages, but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. RESULTS: For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection, and quality score-based ordering of sequences. We synthesized 513.6 KB of data into DNA oligo pools and sequenced this data successfully with an Illumina MiSeq instrument. Compared to Erlich's research, the proposed decoding method additionally incorporates sequence reads with minor errors, which had been discarded before, and thus was able to make use of 10.6-11.9% more sequence reads from the same sequencing environment. This resulted in a 6.5-8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. AVAILABILITY: The raw data files and the source codes of our experiments are available at: https://github.com/jhjeong0702/dna-storage.
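One of the components named in this abstract, Hamming-distance based clustering, can be sketched in a few lines. The Python below is a generic greedy variant written for illustration; it is not taken from the authors' repository, and the function names, the `max_distance` threshold, and the toy reads are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads, max_distance=2):
    """Assign each read to the first cluster whose representative is close enough.

    Returns a list of (representative, members) pairs. Reads of a different
    length than a representative are never merged into that cluster.
    """
    clusters = []
    for read in reads:
        for representative, members in clusters:
            same_length = len(representative) == len(read)
            if same_length and hamming(representative, read) <= max_distance:
                members.append(read)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append((read, [read]))
    return clusters


# Toy reads: two underlying sequences with single-base errors.
reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCG", "ACGAACGT"]
for representative, members in greedy_cluster(reads):
    print(representative, len(members))
```

The greedy pass compares each read only against cluster representatives, which keeps the sketch short. The result depends on read order, so it should be read as an illustration of the idea only.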
Affiliation(s)
- Jaeho Jeong
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
- Seong-Joon Park
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
- Jae-Won Kim
  - Department of Electronic Engineering, Gyeongsang National University, Jinju, Korea
- Jong-Seon No
  - Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
- Ha Hyeon Jeon
  - Department of Chemical Engineering, POSTECH, Pohang, Korea
- Jeong Wook Lee
  - Department of Chemical Engineering, POSTECH, Pohang, Korea
- Albert No
  - Department of Electronic and Electrical Engineering, Hongik University, Seoul, Korea
- Sunghwan Kim
  - School of Electrical Engineering, University of Ulsan, Ulsan, Korea
- Hosung Park
  - Department of Computer Engineering and Department of ICT Convergence System Engineering, Chonnam National University, Gwangju, Korea
8
Shortt JA, Ruggiero RP, Cox C, Wacholder AC, Pollock DD. Finding and extending ancient simple sequence repeat-derived regions in the human genome. Mob DNA 2020; 11:11. [PMID: 32095164 PMCID: PMC7027126 DOI: 10.1186/s13100-020-00206-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Accepted: 02/04/2020] [Indexed: 12/19/2022]
Abstract
Background: Previously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, kmer-based approach to identify genome regions that are likely derived from SSRs. Results: The sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar kmers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome. Conclusions: Our analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class of Alu (roughly, AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with a mean of about 35 bp even in older Alus. This work demonstrates that the high sensitivity provided by SSR-clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.
Affiliation(s)
- Jonathan A Shortt
  - Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Robert P Ruggiero
  - Department of Biology, Southeast Missouri State University, Cape Girardeau, MO 63701 USA
- Corey Cox
  - Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Aaron C Wacholder
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213 USA
- David D Pollock
  - Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045 USA
9
Barton HJ, Zeng K. The Impact of Natural Selection on Short Insertion and Deletion Variation in the Great Tit Genome. Genome Biol Evol 2019; 11:1514-1524. [PMID: 30924871 PMCID: PMC6543879 DOI: 10.1093/gbe/evz068] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 03/27/2019] [Indexed: 12/11/2022]
Abstract
Insertions and deletions (INDELs) remain understudied, despite being the most common form of genetic variation after single nucleotide polymorphisms. This stems partly from the challenge of correctly identifying the ancestral state of an INDEL and thus identifying it as an insertion or a deletion. Erroneously assigned ancestral states can skew the site frequency spectrum, leading to artificial signals of selection. Consequently, the selective pressures acting on INDELs are, at present, poorly resolved. To tackle this issue, we have recently published a maximum likelihood approach to estimate the mutation rate and the distribution of fitness effects for INDELs. Our approach estimates and controls for the rate of ancestral state misidentification, overcoming issues plaguing previous INDEL studies. Here, we apply the method to INDEL polymorphism data from ten high coverage (∼44×) European great tit (Parus major) genomes. We demonstrate that coding INDELs are under strong purifying selection with a small proportion making it into the population (∼4%). However, among fixed coding INDELs, 71% of insertions and 86% of deletions are fixed by positive selection. In noncoding regions, we estimate that ∼80% of insertions and ∼52% of deletions are effectively neutral; the remainder show signatures of purifying selection. Additionally, we see evidence of linked selection reducing INDEL diversity below background levels, both in proximity to exons and in areas of low recombination.
Affiliation(s)
- Henry J Barton
  - Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
- Kai Zeng
  - Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
10
de Groot T, Meis JF. Microsatellite Stability in STR Analysis Aspergillus fumigatus Depends on Number of Repeat Units. Front Cell Infect Microbiol 2019; 9:82. [PMID: 30984630 PMCID: PMC6449440 DOI: 10.3389/fcimb.2019.00082] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/21/2018] [Accepted: 03/11/2019] [Indexed: 01/02/2023]
Abstract
More than a decade ago a short tandem repeat-based typing method was developed for the fungus Aspergillus fumigatus. This STRAf assay is based on the analysis of nine short tandem repeat markers. Interpretation of this STRAf assay is complicated when there are only one or two differences in tandem repeat markers between isolates, as the stability of these markers is unknown. To determine the stability of these nine markers, a STRAf assay was performed on 73–100 successive generations of five clonally expanded A. fumigatus isolates. Across a total of 473 generations, we observed five increases of one tandem repeat unit. Three changes were found in the trinucleotide repeat marker STRAf 3A, while the other two were found in the trinucleotide repeat marker STRAf 3C. The di- and tetranucleotide repeats were not altered. The altered STRAf markers 3A and 3C had the highest number of repeat units (≥50) compared to the other markers (≤26). Altogether, we demonstrated that 7 of 9 STRAf markers remain stable for 473 generations and that the frequency of alterations in tandem repeats is positively correlated with the number of repeats. The potential low-level instability of STRAf markers 3A and 3C should be taken into account when interpreting STRAf data during an outbreak.
Affiliation(s)
- Theun de Groot
  - Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital (CWZ), Nijmegen, Netherlands
- Jacques F Meis
  - Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital (CWZ), Nijmegen, Netherlands; Centre of Expertise in Mycology, Radboudumc/CWZ, Nijmegen, Netherlands; Department of Medical Microbiology, Radboudumc, Nijmegen, Netherlands
11
Barton HJ, Zeng K. New Methods for Inferring the Distribution of Fitness Effects for INDELs and SNPs. Mol Biol Evol 2019; 35:1536-1546. [PMID: 29635416 PMCID: PMC5967470 DOI: 10.1093/molbev/msy054] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/18/2022]
Abstract
Small insertions and deletions (INDELs; ≤50 bp) are the most common type of variability after single nucleotide polymorphism (SNP). However, compared with SNPs, we know little about the distribution of fitness effects (DFE) of new INDEL mutations and how prevalent adaptive INDEL substitutions are. Studying INDELs has been difficult partly because identifying ancestral states at these sites is error-prone and misidentification can lead to severely biased estimates of the strength of selection. To solve these problems, we develop new maximum likelihood methods, which use polymorphism data to simultaneously estimate the DFE, the mutation rate, and the misidentification rate. These methods are applicable to both INDELs and SNPs. Simulations show that they can provide highly accurate results. We applied the methods to an INDEL polymorphism data set in Drosophila melanogaster. We found that the DFE for polymorphic INDELs in protein-coding regions is bimodal, with the variants being either nearly neutral or strongly deleterious. Based on the DFE, we estimated that 71.5–83.7% of the INDEL substitutions that took place along the D. melanogaster lineage were fixed by positive selection, which is comparable with the prevalence of adaptive substitutions at nonsynonymous sites. The new methods have been implemented in the software package anavar.
Affiliation(s)
- Henry J Barton
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Kai Zeng
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
12
Redford L, Alhilal G, Needham S, O’Brien O, Coaker J, Tyson J, Amorim LM, Middleton I, Izuogu O, Arends M, Oniscu A, Alonso ÁM, Laguna SM, Gallon R, Sheth H, Santibanez-Koref M, Jackson MS, Burn J. A novel panel of short mononucleotide repeats linked to informative polymorphisms enabling effective high volume low cost discrimination between mismatch repair deficient and proficient tumours. PLoS One 2018; 13:e0203052. [PMID: 30157243 PMCID: PMC6114912 DOI: 10.1371/journal.pone.0203052] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/03/2018] [Accepted: 08/14/2018] [Indexed: 12/12/2022]
Abstract
Somatic mutations in mononucleotide repeats are commonly used to assess the mismatch repair status of tumours. Current tests focus on repeats with a length above 15 bp, which tend to be somatically more unstable than shorter ones. These longer repeats also have a substantially higher PCR error rate, and tests that use capillary electrophoresis for fragment size analysis often require expert interpretation. In this communication, we present a panel of 17 short repeats (length 7-12 bp) for sequence-based microsatellite instability (MSI) testing. Using a simple scoring procedure that incorporates the allelic distribution of the mutant repeats, and analysis of two cohorts of tumours totalling 209 samples, we show that this panel is able to discriminate between MMR proficient and deficient tumours, even when constitutional DNA is not available. In the training cohort, the method achieved 100% concordance with fragment analysis, while in the testing cohort, 4 discordant samples were observed (corresponding to 97% concordance). Of these, 2 showed discrepancies between fragment analysis and immunohistochemistry and one was reclassified after re-testing using fragment analysis. These results indicate that our approach offers the option of a reliable, scalable routine test for MSI.
Collapse
Affiliation(s)
- Lisa Redford
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Ghanim Alhilal
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Stephanie Needham
- Pathology Department and Northern Genetics Service, Newcastle Hospitals, NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
| | - Ottie O’Brien
- Pathology Department and Northern Genetics Service, Newcastle Hospitals, NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
| | - Julie Coaker
- QuantuMDx group ltd, Lugano Building, Newcastle upon Tyne, United Kingdom
| | - John Tyson
- QuantuMDx group ltd, Lugano Building, Newcastle upon Tyne, United Kingdom
| | - Leonardo Maldaner Amorim
- Laboratório de Genética Molecular Humana, Departamento de Genética, Universidade Federal do Paraná, Curitiba, CEP, Brazil
| | - Iona Middleton
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Osagi Izuogu
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Mark Arends
- Western General Hospital, Edinburgh, United Kingdom
| | - Anca Oniscu
- Western General Hospital, Edinburgh, United Kingdom
| | - Ángel Miguel Alonso
- Servicio de Genética Médica, Complejo Hospitalario de Navarra, Hospital Virgen del Camino, C/ Irunlarrea 4, Pamplona, Spain
| | - Sira Moreno Laguna
- Servicio de Genética Médica, Complejo Hospitalario de Navarra, Hospital Virgen del Camino, C/ Irunlarrea 4, Pamplona, Spain
| | - Richard Gallon
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Harsh Sheth
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Mauro Santibanez-Koref
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Michael S. Jackson
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - John Burn
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
|
13
|
Zavodna M, Bagshaw A, Brauning R, Gemmell NJ. The effects of transcription and recombination on mutational dynamics of short tandem repeats. Nucleic Acids Res 2018; 46:1321-1330. [PMID: 29300948 PMCID: PMC5814968 DOI: 10.1093/nar/gkx1253] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/05/2017] [Revised: 11/27/2017] [Accepted: 12/27/2017] [Indexed: 01/07/2023]
Abstract
Short tandem repeats (STR) are ubiquitous components of the genomic architecture of most living organisms. Recent work has highlighted the widespread functional significance of such repeats, particularly around gene regulation, but the mutational processes underlying the evolution of these highly abundant and highly variable sequences are not fully understood. Traditional models assume that strand misalignment during replication is the predominant mechanism, but empirical data suggest the involvement of other processes including recombination and transcription. Despite this evidence, the relative influences of these processes have not previously been tested experimentally on a genome-wide scale. Using deep sequencing, we identify mutations at >200 microsatellites, across 700 generations in replicated populations of two otherwise identical sexual and asexual Saccharomyces cerevisiae strains. Using generalized linear models, we investigate correlates of STR mutability including the nature of the mutation, STR composition and contextual factors including recombination, transcription and replication origins. Sexual capability was not a significant predictor of microsatellite mutability, but, intriguingly, we identify transcription as a significant positive predictor. We also find that STR density is substantially increased in regions neighboring, but not within, recombination hotspots.
Affiliation(s)
- Monika Zavodna
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
| | - Andrew Bagshaw
- Department of Pathology, University of Otago, Christchurch 8140, New Zealand
| | - Rudiger Brauning
- AgResearch Limited, Invermay Agricultural Centre, Mosgiel, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin 9054, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin 9054, New Zealand
|
14
|
Abstract
The availability of complete fungal genomes is expanding rapidly and is offering an extensive and accurate view of this "kingdom." The scientific milestone of free access to more than 1000 fungal genomes of different species has been reached, and new and stimulating projects have meanwhile been released. The "1000 Fungal Genomes Project" represents one of the largest sequencing initiatives for fungal organisms and aims to fill some of the gaps in fungal genomics. Presently, there are 329 fungal families with at least one representative genome sequenced, but there is still a large number of fungal families without a single sequenced genome. In addition, further sequencing projects have helped to clarify the genetic diversity within some fungal species. The availability of multiple genomes per species supports taxonomic organization, brings new insights into fungal evolution over short time scales, clarifies geographical and dispersion patterns, and elucidates outbreaks and transmission routes, among other objectives. Genotyping methodologies analyze only a small fraction of an individual's genome but facilitate the comparison of hundreds or thousands of isolates in a small fraction of the time and at low cost. The integration of whole genome strategies and improved genotyping panels targeting specific and relevant SNPs and/or repeated regions can represent fast and practical strategies for studying local, regional, and global epidemiology of fungi.
Affiliation(s)
- Ricardo Araujo
- University of Porto, Porto, Portugal; School of Medicine and Health Sciences, Flinders University, Adelaide, SA, Australia.
|
15
|
Erlich Y, Zielinski D. DNA Fountain enables a robust and efficient storage architecture. Science 2017; 355:950-954. [PMID: 28254941 DOI: 10.1126/science.aaj2038] [Citation(s) in RCA: 287] [Impact Index Per Article: 41.0] [Received: 09/12/2016] [Accepted: 02/09/2017] [Indexed: 12/16/2022]
Abstract
DNA is an attractive medium to store digital information. Here we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
Affiliation(s)
- Yaniv Erlich
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA; Center for Computational Biology and Bioinformatics (C2B2), Department of Systems Biology, Columbia University, New York, NY 10027, USA
|
16
|
Bagshaw AT. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 2017; 9:2428-2443. [PMID: 28957459 PMCID: PMC5622345 DOI: 10.1093/gbe/evx164] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Accepted: 08/23/2017] [Indexed: 02/06/2023]
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
|
17
|
Hosseinzadeh-Colagar A, Haghighatnia MJ, Amiri Z, Mohadjerani M, Tafrihi M. Microsatellite (SSR) amplification by PCR usually led to polymorphic bands: Evidence which shows replication slippage occurs in extend or nascent DNA strands. MOLECULAR BIOLOGY RESEARCH COMMUNICATIONS 2016; 5:167-174. [PMID: 28097170 PMCID: PMC5219911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Microsatellites or simple sequence repeats (SSRs) are very effective molecular markers in population genetics, genome mapping, taxonomic study and other large-scale studies. Variation in the number of tandem repeats within a microsatellite is referred to as simple sequence length polymorphism (SSLP); however, a few studies have shown that SSR replication slippage may occur during in vitro amplification, producing 'stutter products' that differ in length from the main products. The purpose of this study is to introduce a reliable method for detecting SSR replication slippage. First, three unique primers were designed to amplify SSR loci in the great gerbil (Rhombomys opimus) by PCR. The crush and soak method was used to isolate the DNA bands of interest from polyacrylamide gel, and the PCR products were analyzed by sequencing. Our study shows that Taq DNA polymerase slipped during in vitro microsatellite amplification, which led to insertion or deletion of repeats in sense or antisense DNA strands. This produced amplified fragments of various lengths, which appear in gel electrophoresis as 'stutter bands'. Thus, we recommend that replication slippage effects and stutter bands be considered in population studies that use SSR markers.
Affiliation(s)
- Abasalt Hosseinzadeh-Colagar
- Address for correspondence: Department of Molecular and Cell Biology, Faculty of Basic Sciences, University of Mazandaran, Babolsar, Postal Code 47416-95447, Mazandaran, Iran. Tel: +98 (112) 5242161; Fax: +98 (112) 5242161
|
18
|
Fungtammasan A, Tomaszkiewicz M, Campos-Sánchez R, Eckert KA, DeGiorgio M, Makova KD. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol 2016; 33:2744-58. [PMID: 27413049 PMCID: PMC5026258 DOI: 10.1093/molbev/msw139] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022]
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University; Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
| | - Rebeca Campos-Sánchez
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
| | - Kristin A Eckert
- Center for Medical Genomics, Pennsylvania State University; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine
| | - Michael DeGiorgio
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Institute for CyberScience, Pennsylvania State University
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
|
19
|
Shimada MK, Sanbonmatsu R, Yamaguchi-Kabata Y, Yamasaki C, Suzuki Y, Chakraborty R, Gojobori T, Imanishi T. Selection pressure on human STR loci and its relevance in repeat expansion disease. Mol Genet Genomics 2016; 291:1851-69. [PMID: 27290643 DOI: 10.1007/s00438-016-1219-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/07/2015] [Accepted: 05/21/2016] [Indexed: 12/30/2022]
Abstract
Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: Why are STRs causing repeat expansion diseases maintained in the human population; and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, significant enrichments of apoptosis and neurodevelopment were biological processes found specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
Affiliation(s)
- Makoto K Shimada
- Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan; National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
| | - Ryoko Sanbonmatsu
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
| | - Yumi Yamaguchi-Kabata
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573, Japan
| | - Chisato Yamasaki
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
| | - Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya, Aichi, 467-8501, Japan
| | - Ranajit Chakraborty
- Health Science Center, University of North Texas, 3500 Camp Bowie Blvd., Fort Worth, TX, 76107, USA
| | - Takashi Gojobori
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Ibn Al-Haytham Building (West), Thuwal, 23955-6900, Kingdom of Saudi Arabia
| | - Tadashi Imanishi
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
|
20
|
Campos-Sánchez R, Cremona MA, Pini A, Chiaromonte F, Makova KD. Integration and Fixation Preferences of Human and Mouse Endogenous Retroviruses Uncovered with Functional Data Analysis. PLoS Comput Biol 2016; 12:e1004956. [PMID: 27309962 PMCID: PMC4911145 DOI: 10.1371/journal.pcbi.1004956] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 01/15/2016] [Accepted: 04/29/2016] [Indexed: 01/24/2023]
Abstract
Endogenous retroviruses (ERVs), the remnants of retroviral infections in the germ line, occupy ~8% and ~10% of the human and mouse genomes, respectively, and affect their structure, evolution, and function. Yet we still have a limited understanding of how the genomic landscape influences integration and fixation of ERVs. Here we conducted a genome-wide study of the most recently active ERVs in the human and mouse genome. We investigated 826 fixed and 1,065 in vitro HERV-Ks in human, and 1,624 fixed and 242 polymorphic ETns, as well as 3,964 fixed and 1,986 polymorphic IAPs, in mouse. We quantitated >40 human and mouse genomic features (e.g., non-B DNA structure, recombination rates, and histone modifications) in ±32 kb of these ERVs' integration sites and in control regions, and analyzed them using Functional Data Analysis (FDA) methodology. In one of the first applications of FDA in genomics, we identified genomic scales and locations at which these features display their influence, and how they work in concert, to provide signals essential for integration and fixation of ERVs. The investigation of ERVs of different evolutionary ages (young in vitro and polymorphic ERVs, older fixed ERVs) allowed us to disentangle integration vs. fixation preferences. As a result of these analyses, we built a comprehensive model explaining the uneven distribution of ERVs along the genome. We found that ERVs integrate in late-replicating AT-rich regions with abundant microsatellites, mirror repeats, and repressive histone marks. Regions favoring fixation are depleted of genes and evolutionarily conserved elements, and have low recombination rates, reflecting the effects of purifying selection and ectopic recombination removing ERVs from the genome. In addition to providing these biological insights, our study demonstrates the power of exploiting multiple scales and localization with FDA. These powerful techniques are expected to be applicable to many other genomic investigations.
Affiliation(s)
- Rebeca Campos-Sánchez
- Genetics Graduate Program, The Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
| | - Marzia A. Cremona
- MOX—Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
| | - Alessia Pini
- MOX—Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milano, Italy
| | - Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
- Center for Medical Genomics, The Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
| | - Kateryna D. Makova
- Center for Medical Genomics, The Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
|
21
|
Tomaszkiewicz M, Rangavittal S, Cechova M, Campos Sanchez R, Fescemyer HW, Harris R, Ye D, O'Brien PCM, Chikhi R, Ryder OA, Ferguson-Smith MA, Medvedev P, Makova KD. A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y. Genome Res 2016; 26:530-40. [PMID: 26934921 PMCID: PMC4817776 DOI: 10.1101/gr.199448.115] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 09/11/2015] [Accepted: 01/21/2016] [Indexed: 01/25/2023]
Abstract
The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes, and thus, is the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read genome and transcriptome sequencing, and droplet digital PCR with novel and existing computational methods. It can be used to reconstruct sex chromosomes in a heterogametic sex of any species. We applied our strategy to produce a draft of the gorilla Y sequence. The resulting assembly allowed us to refine gene content, evaluate copy number of ampliconic gene families, locate species-specific palindromes, examine the repetitive element content, and produce sequence alignments with human and chimpanzee Y Chromosomes. Our results inform the evolution of the hominine (human, chimpanzee, and gorilla) Y Chromosomes. Surprisingly, we found the gorilla Y Chromosome to be similar to the human Y Chromosome, but not to the chimpanzee Y Chromosome. Moreover, we have utilized the assembled gorilla Y Chromosome sequence to design genetic markers for studying the male-specific dispersal of this endangered species.
Affiliation(s)
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Samarth Rangavittal
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Monika Cechova
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Rebeca Campos Sanchez
- Genetics Program, The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Howard W Fescemyer
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Danling Ye
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Patricia C M O'Brien
- Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, United Kingdom
| | - Rayan Chikhi
- University of Lille 1/CNRS 59655 Villeneuve d'Ascq, France; Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Sciences Institute of the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, Escondido, California 92027, USA
| | | | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Sciences Institute of the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
22
|
Vaksman Z, Garner HR. Somatic microsatellite variability as a predictive marker for colorectal cancer and liver cancer progression. Oncotarget 2016; 6:5760-71. [PMID: 25691061 PMCID: PMC4467400 DOI: 10.18632/oncotarget.3306] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/11/2014] [Accepted: 01/02/2015] [Indexed: 12/13/2022]
Abstract
Microsatellites (MSTs) are short tandem repeated genetic motifs that comprise ~3% of the genome. MST instability (MSI), defined as acquired/lost primary alleles at a small subset of microsatellite loci (e.g. Bethesda markers), is a clinically relevant marker for colorectal cancer. However, these markers are not applicable to other types of cancers, specifically, for liver cancer which has a high mortality rate. Here we show that somatic MST variability (SMV), defined as the presence of additional, non-primary (aka minor) alleles at MST loci, is a complementary measure of MSI, and a genetic marker for colorectal and liver cancer. Re-analysis of Illumina sequenced exomes from The Cancer Genome Atlas indicates that SMV may distinguish a subpopulation of African American patients with colorectal cancer, which represents ~33% of the population in this study. Further, for liver cancer, a higher rate of SMV may be indicative of an earlier age of onset. The work presented here suggests that classical MSI should be expanded to include SMV, going beyond alterations of the primary alleles at a small number of microsatellite loci. This measure of SMV may represent a potential new diagnostic for a variety of cancers and may provide new information for colorectal cancer patients.
Affiliation(s)
- Zalman Vaksman
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
| | - Harold R Garner
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
|
23
|
Sonay TB, Koletou M, Wagner A. A survey of tandem repeat instabilities and associated gene expression changes in 35 colorectal cancers. BMC Genomics 2015; 16:702. [PMID: 26376692 PMCID: PMC4574073 DOI: 10.1186/s12864-015-1902-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 02/10/2015] [Accepted: 09/09/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND Colorectal cancer is a major contributor to cancer morbidity and mortality. Tandem repeat instability and its effect on cancer phenotypes remain so far poorly studied on a genome-wide scale. RESULTS Here we analyze the genomes of 35 colorectal tumors and their matched normal (healthy) tissues for two types of tandem repeat instability, de-novo repeat gain or loss and repeat copy number variation. Specifically, we study for the first time genome-wide repeat instability in the promoters and exons of 18,439 genes, and examine the association of repeat instability with genome-scale gene expression levels. We find that tumors with a microsatellite instable (MSI) phenotype are enriched in genes with repeat instability, and that tumor genomes have significantly more genes with repeat instability compared to healthy tissues. Genes in tumor genomes with repeat instability in their promoters are significantly less expressed and show slightly higher levels of methylation. Genes in well-studied cancer-associated signaling pathways also contain significantly more unstable repeats in tumor genomes. Genes with such unstable repeats in the tumor-suppressor p53 pathway have lower expression levels, whereas genes with repeat instability in the MAPK and Wnt signaling pathways are expressed at higher levels, consistent with the oncogenic role they play in cancer. CONCLUSIONS Our results suggest that repeat instability in gene promoters and associated differential gene expression may play an important role in colorectal tumors, which is a first step towards the development of more effective molecular diagnostic approaches centered on repeat instability.
Affiliation(s)
- Tugce Bilgin Sonay
- Anthropological Institute and Museum, University of Zurich, Zurich, Switzerland.
- Institute of Evolutionary Biology and Environmental Sciences, University of Zurich, Zurich, Switzerland.
| | | | - Andreas Wagner
- Institute of Evolutionary Biology and Environmental Sciences, University of Zurich, Zurich, Switzerland.
- The Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- The Santa Fe Institute, Santa Fe, NM, United States of America.
|
24
|
The Nature, Extent, and Consequences of Genetic Variation in the opa Repeats of Notch in Drosophila. G3-GENES GENOMES GENETICS 2015; 5:2405-19. [PMID: 26362765 PMCID: PMC4632060 DOI: 10.1534/g3.115.021659] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Polyglutamine (pQ) tracts are abundant in proteins co-interacting on DNA. The lengths of these pQ tracts can modulate their interaction strengths. However, pQ tracts >40 residues are pathologically prone to amyloidogenic self-assembly. Here, we assess the extent and consequences of variation in the pQ-encoding opa repeats of Notch in Drosophila melanogaster. We use Sanger sequencing to genotype opa sequences (5′-CAX repeats), which have resisted assembly using short sequence reads. While most sampled lines carry the major allele opa31 encoding Q13HQ17 or the opa32 allele encoding Q13HQ18, many lines carry rare alleles encoding pQ tracts >32 residues: opa33a (Q14HQ18), opa33b (Q15HQ17), opa34 (Q16HQ17), opa35a1/opa35a2 (Q13HQ21), opa36 (Q13HQ22), and opa37 (Q13HQ23). Only one rare allele encodes a tract <31 residues: opa23 (Q13–Q10). This opa23 allele shortens the pQ tract while simultaneously eliminating the interrupting histidine. We introgressed these opa variant alleles into common backgrounds and measured the frequency of Notch-type phenotypes. Homozygotes for the short and long opa alleles have defects in embryonic survival and sensory bristle organ patterning, and sometimes show wing notching. Consistent with functional differences between Notch opa variants, we find that a scute inversion carrying the rare opa33b allele suppresses the bristle patterning defect caused by achaete/scute insufficiency, while an equivalent scute inversion carrying opa31 manifests the patterning defect. Our results demonstrate the existence of potent pQ variants of Notch and the need for long read genotyping of key repeat variables underlying gene regulatory networks.
|
25
|
Bacolla A, Zhu X, Chen H, Howells K, Cooper DN, Vasquez KM. Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes. Nucleic Acids Res 2015; 43:5065-80. [PMID: 25897114 PMCID: PMC4446427 DOI: 10.1093/nar/gkv364] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/02/2015] [Accepted: 04/07/2015] [Indexed: 12/13/2022] Open
Abstract
Single base substitutions (SBSs) and insertions/deletions are critical for generating population diversity and can lead both to inherited disease and cancer. Whereas on a genome-wide scale SBSs are influenced by cellular factors, on a fine scale SBSs are influenced by the local DNA sequence-context, although the role of flanking sequence is often unclear. Herein, we used bioinformatics, molecular dynamics and hybrid quantum mechanics/molecular mechanics to analyze sequence context-dependent mutagenesis at mononucleotide repeats (A-tracts and G-tracts) in human population variation and in cancer genomes. SBSs and insertions/deletions occur predominantly at the first and last base-pairs of A-tracts, whereas they are concentrated at the second and third base-pairs in G-tracts. These positions correspond to the most flexible sites along A-tracts, and to sites where a ‘hole’, generated by the loss of an electron through oxidation, is most likely to be localized in G-tracts. For A-tracts, most SBSs occur in the direction of the base-pair flanking the tracts. We conclude that intrinsic features of local DNA structure, i.e. base-pair flexibility and charge transfer, render specific nucleotides along mononucleotide runs susceptible to base modification, which then yields mutations. Thus, local DNA dynamics contributes to phenotypic variation and disease in the human population.
Affiliation(s)
- Albino Bacolla
- Division of Pharmacology and Toxicology, College of Pharmacy, Dell Pediatric Research Institute, The University of Texas at Austin, 1400 Barbara Jordan Boulevard, Austin, TX 78723, USA
| | - Xiao Zhu
- Texas Advanced Computing Center, Austin, TX 78758-4497, USA
| | - Hanning Chen
- Department of Chemistry, George Washington University, 725 21st Street, NW, Washington, DC 20052, USA
| | - Katy Howells
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
| | - Karen M Vasquez
- Division of Pharmacology and Toxicology, College of Pharmacy, Dell Pediatric Research Institute, The University of Texas at Austin, 1400 Barbara Jordan Boulevard, Austin, TX 78723, USA
| |
|
26
|
Fungtammasan A, Ananda G, Hile SE, Su MSW, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications. Genome Res 2015; 25:736-49. [PMID: 25823460 PMCID: PMC4417121 DOI: 10.1101/gr.185892.114] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 10/15/2014] [Accepted: 03/16/2015] [Indexed: 11/24/2022]
Abstract
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution.
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Guruprasad Ananda
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA
| | - Suzanne E Hile
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
| | - Marcia Shu-Wei Su
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Chen Sun
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Paul Medvedev
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA; Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Kristin Eckert
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
| |
|
27
|
Baptiste BA, Jacob KD, Eckert KA. Genetic evidence that both dNTP-stabilized and strand slippage mechanisms may dictate DNA polymerase errors within mononucleotide microsatellites. DNA Repair (Amst) 2015; 29:91-100. [PMID: 25758780 DOI: 10.1016/j.dnarep.2015.02.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/07/2014] [Revised: 02/15/2015] [Accepted: 02/16/2015] [Indexed: 12/19/2022]
Abstract
Mononucleotide microsatellites are tandem repeats of a single base pair, abundant within coding exons and frequent sites of mutation in the human genome. Because the repeated unit is one base pair, multiple mechanisms of insertion/deletion (indel) mutagenesis are possible, including strand-slippage, dNTP-stabilized, and misincorporation-misalignment. Here, we examine the effects of polymerase identity (mammalian Pols α, β, κ, and η), template sequence, dNTP pool size, and reaction temperature on indel errors during in vitro synthesis of mononucleotide microsatellites. We utilized the ratio of insertion to deletion errors as a genetic indicator of mechanism. Strikingly, we observed a statistically significant bias toward deletion errors within mononucleotide repeats for the majority of the 28 DNA template and polymerase combinations examined, with notable exceptions based on sequence and polymerase identity. Using mutator forms of Pol β did not substantially alter the error specificity, suggesting that a mispairing-misalignment mechanism is not a primary mechanism. Based on our results for mammalian DNA polymerases representing three structurally distinct families, we suggest that dNTP-stabilized mutagenesis may be an alternative mechanism for mononucleotide microsatellite indel mutation. The change from a predominantly dNTP-stabilized mechanism to a strand-slippage mechanism with increasing microsatellite length may account for the differential rates of tandem repeat mutation that are observed genome-wide.
Affiliation(s)
- Beverly A Baptiste
- The Jake Gittlen Laboratories for Cancer Research and the Department of Pathology, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA
| | - Kimberly D Jacob
- The Jake Gittlen Laboratories for Cancer Research and the Department of Pathology, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA
| | - Kristin A Eckert
- The Jake Gittlen Laboratories for Cancer Research and the Department of Pathology, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA.
| |
|
28
|
Kwong M, Pemberton TJ. Sequence differences at orthologous microsatellites inflate estimates of human-chimpanzee differentiation. BMC Genomics 2014; 15:990. [PMID: 25407736 PMCID: PMC4253012 DOI: 10.1186/1471-2164-15-990] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/02/2014] [Accepted: 10/30/2014] [Indexed: 02/06/2023] Open
Abstract
Background: Microsatellites (contiguous arrays of 2–6 base-pair motifs) have formed the cornerstone of population-genetic studies for over two decades. Their genotype data typically takes the form of PCR fragment lengths obtained using locus-specific primer pairs to amplify the genomic region encompassing the microsatellite. Recently, we reported a dataset of 5,795 human and 84 chimpanzee individuals with genotypes at 246 human-derived autosomal microsatellites as a resource to facilitate interspecies comparisons. A major assumption underlying this dataset is that PCR amplicons at orthologous microsatellites are commensurable between species.
Results: We find this assumption to be frequently incorrect owing to discordance in microsatellite organization and variability, as well as nontrivial length imbalances caused by small species-specific indels in microsatellite flanking sequences. Converting PCR fragment lengths into the repeat numbers they represent at 138 microsatellites whose organization and variability was found to be highly similar in both species, we show that interspecies incommensurability among PCR amplicons can inflate FST and DPS estimates by up to 10.6%. Separate investigations of determinants of microsatellite variability in humans and chimpanzees uncover similar patterns with mean and maximum numbers of repeats, as well as numbers and ranges of distinct alleles, all important factors in predicting heterozygosity. In contrast, across microsatellites, numbers of repeats were significantly smaller in chimpanzees than in humans, while numbers and ranges of distinct alleles were instead larger.
Conclusions: Our findings have fundamental implications for interspecies comparisons using microsatellites and offer new opportunities for more accurate comparisons of patterns of human and chimpanzee genetic variation in numerous areas of application.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-990) contains supplementary material, which is available to authorized users.
Affiliation(s)
| | - Trevor J Pemberton
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, Canada.
| |
|
29
|
Exome-wide somatic microsatellite variation is altered in cells with DNA repair deficiencies. PLoS One 2014; 9:e110263. [PMID: 25402475 PMCID: PMC4234249 DOI: 10.1371/journal.pone.0110263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/09/2014] [Accepted: 09/18/2014] [Indexed: 11/19/2022] Open
Abstract
Microsatellites (MST), tandem repeats of 1–6 nucleotide motifs, are mutational hot-spots with a bias for insertions and deletions (INDELs) rather than single nucleotide polymorphisms (SNPs). The majority of MST instability studies are limited to a small number of loci, the Bethesda markers, which are only informative for a subset of colorectal cancers. In this paper we use non-haplotype alleles present within next-gen sequencing data to evaluate somatic MST variation (SMV) within DNA repair proficient and DNA repair defective cell lines. We confirm that alleles present within next-gen data that do not contribute to the haplotype can be reliably quantified and utilized to evaluate the SMV without requiring comparisons of matched samples. We observed that SMV patterns in the DNA repair proficient cell lines MCF10A, HEK293 and PD20 RV:D2 were consistent among samples. Further, we were able to confirm that changes in SMV patterns in cell lines lacking functional BRCA2, FANCD2 and mismatch repair were consistent with the different pathways perturbed. Using this new exome sequencing analysis approach we show that DNA instability can be identified in a sample, that patterns of instability vary depending on the impaired DNA repair mechanism, and that genes harboring minor alleles are strongly associated with cancer pathways. The MST Minor Allele Caller used for this study is available at https://github.com/zalmanv/MST_minor_allele_caller.
|
30
|
Willems T, Gymrek M, Highnam G, Mittelman D, Erlich Y. The landscape of human STR variation. Genome Res 2014; 24:1894-904. [PMID: 25135957 PMCID: PMC4216929 DOI: 10.1101/gr.177774.114] [Citation(s) in RCA: 176] [Impact Index Per Article: 17.6] [Received: 04/30/2014] [Accepted: 08/15/2014] [Indexed: 02/06/2023]
Abstract
Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.
Affiliation(s)
- Thomas Willems
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA
| | - Melissa Gymrek
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Gareth Highnam
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - David Mittelman
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA; Gene by Gene, Ltd., Houston, Texas 77008, USA
| | - Yaniv Erlich
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
| |
|
31
|
Tessereau C, Lesecque Y, Monnet N, Buisson M, Barjhoux L, Léoné M, Feng B, Goldgar DE, Sinilnikova OM, Mousset S, Duret L, Mazoyer S. Estimation of the RNU2 macrosatellite mutation rate by BRCA1 mutation tracing. Nucleic Acids Res 2014; 42:9121-30. [PMID: 25034697 PMCID: PMC4132748 DOI: 10.1093/nar/gku639] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/02/2022] Open
Abstract
Large tandem repeat sequences have been poorly investigated as severe technical limitations and their frequent absence from the genome reference hinder their analysis. Extensive allelotyping of this class of variation has not been possible until now and their mutational dynamics are still poorly known. In order to estimate the mutation rate of a macrosatellite, we analysed in detail the RNU2 locus, which displays at least 50 different alleles containing 5–82 copies of a 6.1 kb repeat unit. Mining data from the 1000 Genomes Project allowed us to precisely estimate copy numbers of the RNU2 repeat unit using read depth of coverage. This further revealed significantly different mean values in various recent modern human populations, favoring a scenario of fast evolution of this locus. Its proximity to a disease gene with numerous founder mutations, BRCA1, within the same linkage disequilibrium block, offered the unique opportunity to trace RNU2 arrays over a large timescale. Analysis of the transmission of RNU2 arrays associated with one ‘private’ mutation in an extended kindred and four founder mutations in multiple kindreds gave an estimation by maximum likelihood of 5 × 10⁻³ mutations per generation, which is close to that of microsatellites.
Affiliation(s)
- Chloé Tessereau
- Genetics of Breast Cancer Team, Cancer Research Centre of Lyon, CNRS UMR5286, Inserm U1052, Université Lyon 1, Centre Léon Bérard, Lyon, France; Genomic Vision, Bagneux, Paris, France
| | - Yann Lesecque
- Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR5558, Université Lyon 1, France
| | - Nastasia Monnet
- Genetics of Breast Cancer Team, Cancer Research Centre of Lyon, CNRS UMR5286, Inserm U1052, Université Lyon 1, Centre Léon Bérard, Lyon, France
| | - Monique Buisson
- Genetics of Breast Cancer Team, Cancer Research Centre of Lyon, CNRS UMR5286, Inserm U1052, Université Lyon 1, Centre Léon Bérard, Lyon, France
| | - Laure Barjhoux
- Genetics of Breast Cancer Team, Cancer Research Centre of Lyon, CNRS UMR5286, Inserm U1052, Université Lyon 1, Centre Léon Bérard, Lyon, France
| | - Mélanie Léoné
- Unité Mixte de Génétique Constitutionnelle des Cancers Fréquents, Hospices Civils de Lyon/Centre Léon Bérard, Lyon, France
| | - Bingjian Feng
- Department of Dermatology and Huntsman Cancer Institute University of Utah School of Medicine, Salt Lake City, Utah, USA
| | - David E Goldgar
- Department of Dermatology and Huntsman Cancer Institute University of Utah School of Medicine, Salt Lake City, Utah, USA
| | - Olga M Sinilnikova
- Genetics of Breast Cancer Team, Cancer Research Centre of Lyon, CNRS UMR5286, Inserm U1052, Université Lyon 1, Centre Léon Bérard, Lyon, France; Unité Mixte de Génétique Constitutionnelle des Cancers Fréquents, Hospices Civils de Lyon/Centre Léon Bérard, Lyon, France
| | - Sylvain Mousset
- Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR5558, Université Lyon 1, France
| | - Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, CNRS UMR5558, Université Lyon 1, France
| | - Sylvie Mazoyer
- Genetics of Breast Cancer Team, Cancer Research Centre of Lyon, CNRS UMR5286, Inserm U1052, Université Lyon 1, Centre Léon Bérard, Lyon, France
| |
|
32
|
Ananda G, Hile SE, Breski A, Wang Y, Kelkar Y, Makova KD, Eckert KA. Microsatellite interruptions stabilize primate genomes and exist as population-specific single nucleotide polymorphisms within individual human genomes. PLoS Genet 2014; 10:e1004498. [PMID: 25033203 PMCID: PMC4102424 DOI: 10.1371/journal.pgen.1004498] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/22/2013] [Accepted: 05/28/2014] [Indexed: 01/01/2023] Open
Abstract
Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. 
The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression. Microsatellites are short tandem repeat DNA sequences located throughout the human genome that display a high degree of inter-individual variation. This characteristic makes microsatellites an attractive tool for population genetics and forensics research. Some microsatellites affect gene expression, and mutations within such microsatellites can cause disease. Interruption mutations disrupt the perfect repeated array and are frequently associated with altered disease risk, but they have not been thoroughly studied in human genomes. We identified interrupted mono-, di-, tri- and tetranucleotide MSs (iMS) within individual genomes from African, European, Asian and American population groups. We show that many iMSs, including some within disease-associated genes, are unique to a single population group. By measuring the conservation of microsatellites between human and chimpanzee genomes, we demonstrate that interruptions decrease the probability of microsatellite mutations throughout the genome. We demonstrate that iMSs arise in the human genome by single base changes within the DNA, and provide biochemical data suggesting that these stabilizing changes may be created by error-prone DNA polymerases. Our genome-wide study supports the model in which iMSs act to stabilize individual genomes, and suggests that population-specific differences in microsatellite architecture may be an avenue by which genetic ancestry impacts individual disease risk.
Affiliation(s)
- Guruprasad Ananda
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
| | - Suzanne E. Hile
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
| | - Amanda Breski
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
| | - Yanli Wang
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
| | - Yogeshwar Kelkar
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
| | - Kateryna D. Makova
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
- * E-mail: (KDM); (KAE)
| | - Kristin A. Eckert
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
- * E-mail: (KDM); (KAE)
| |
|
33
|
Haasl RJ, Payseur BA. Remarkable selective constraints on exonic dinucleotide repeats. Evolution 2014; 68:2737-44. [PMID: 24899386 DOI: 10.1111/evo.12460] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 01/07/2014] [Accepted: 05/14/2014] [Indexed: 01/07/2023]
Abstract
Long dinucleotide repeats found in exons present a substantial mutational hazard: mutations at these loci occur often and generate frameshifts. Here, we provide clear and compelling evidence that exonic dinucleotides experience strong selective constraint. In humans, only 18 exonic dinucleotides have repeat lengths greater than six, which contrasts sharply with the genome-wide distribution of dinucleotides. We genotyped each of these dinucleotides in 200 humans from eight 1000 Genomes Project populations and found a near-absence of polymorphism. More remarkably, divergence data demonstrate that repeat lengths have been conserved across the primate phylogeny in spite of what is likely considerable mutational pressure. Coalescent simulations show that even a very low mutation rate at these loci fails to explain the anomalous patterns of polymorphism and divergence. Our data support two related selective constraints on the evolution of exonic dinucleotides: a short-term intolerance for any change to repeat length and a long-term prevention of increases to repeat length. In general, our results implicate purifying selection as the force that eliminates new, deleterious mutants at exonic dinucleotides. We briefly discuss the evolution of the longest exonic dinucleotide in the human genome, a 10 × CA repeat in fibroblast growth factor receptor-like 1 (FGFRL1), that should possess a considerably greater mutation rate than any other exonic dinucleotide and therefore generate a large number of deleterious variants.
Affiliation(s)
- Ryan J Haasl
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, 53706.
| | | |
|
34
|
Brittain A, Stroebele E, Erives A. Microsatellite repeat instability fuels evolution of embryonic enhancers in Hawaiian Drosophila. PLoS One 2014; 9:e101177. [PMID: 24978198 PMCID: PMC4076327 DOI: 10.1371/journal.pone.0101177] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/02/2014] [Accepted: 06/03/2014] [Indexed: 12/16/2022] Open
Abstract
For ∼30 million years, the eggs of Hawaiian Drosophila were laid in ever-changing environments caused by high rates of island formation. The associated diversification of the size and developmental rate of the syncytial fly embryo would have altered morphogenic gradients, thus necessitating frequent evolutionary compensation of transcriptional responses. We investigate the consequences these radiations had on transcriptional enhancers patterning the embryo to see whether their pattern of molecular evolution is different from non-Hawaiian species. We identify and functionally assay in transgenic D. melanogaster the Neurogenic Ectoderm Enhancers from two different Hawaiian Drosophila groups: (i) the picture wing group, and (ii) the modified mouthparts group. We find that the binding sites in this set of well-characterized enhancers are footprinted by diverse microsatellite repeat (MSR) sequences. We further show that Hawaiian embryonic enhancers in general are enriched in MSR relative to both Hawaiian non-embryonic enhancers and non-Hawaiian embryonic enhancers. We propose embryonic enhancers are sensitive to Activator spacing because they often serve as assembly scaffolds for the aggregation of transcription factor activator complexes. Furthermore, as most indels are produced by microsatellite repeat slippage, enhancers from Hawaiian Drosophila lineages, which experience dynamic evolutionary pressures, would become grossly enriched in MSR content.
Affiliation(s)
- Andrew Brittain
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
- Elizabeth Stroebele
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
- Albert Erives
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
|
35
|
|
36
|
Campos-Sánchez R, Kapusta A, Feschotte C, Chiaromonte F, Makova KD. Genomic landscape of human, bat, and ex vivo DNA transposon integrations. Mol Biol Evol 2014; 31:1816-32. [PMID: 24809961 DOI: 10.1093/molbev/msu138] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
The integration and fixation preferences of DNA transposons, one of the major classes of eukaryotic transposable elements, have never been evaluated comprehensively on a genome-wide scale. Here, we present a detailed study of the distribution of DNA transposons in the human and bat genomes. We studied three groups of DNA transposons that integrated at different evolutionary times: 1) ancient (>40 My) and currently inactive human elements, 2) younger (<40 My) bat elements, and 3) ex vivo integrations of piggyBat and Sleeping Beauty elements in HeLa cells. Although the distribution of ex vivo elements reflected integration preferences, the distribution of human and (to a lesser extent) bat elements was also affected by selection. We used regression techniques (linear, negative binomial, and logistic regression models with multiple predictors) applied to 20-kb and 1-Mb windows to investigate how the genomic landscape in the vicinity of DNA transposons contributes to their integration and fixation. Our models indicate that genomic landscape explains 16-79% of variability in DNA transposon genome-wide distribution. Importantly, we not only confirmed previously identified predictors (e.g., DNA conformation and recombination hotspots) but also identified several novel predictors (e.g., signatures of double-strand breaks and telomere hexamer). Ex vivo integrations showed a bias toward actively transcribed regions. Older DNA transposons were located in genomic regions scarce in most conserved elements, likely reflecting purifying selection. Our study highlights how DNA transposons are integral to the evolution of bat and human genomes, and has implications for the development of DNA transposon assays for gene therapy and mutagenesis applications.
Affiliation(s)
- Rebeca Campos-Sánchez
- Genetics Program, The Huck Institutes of the Life Sciences, Penn State University, University Park, PA
- Aurélie Kapusta
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT
- Cédric Feschotte
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT
- Francesca Chiaromonte
- Center for Medical Genomics, The Huck Institutes of the Life Sciences, Penn State University, University Park, PA; Department of Statistics, Penn State University, University Park, PA
- Kateryna D Makova
- Center for Medical Genomics, The Huck Institutes of the Life Sciences, Penn State University, University Park, PA; Department of Biology, Penn State University, University Park, PA
|
37
|
Lin X, Wu J, Li H, Wang Z, Lin JM. Determination of mini-short tandem repeat (miniSTR) loci by using the combination of polymerase chain reaction (PCR) and microchip electrophoresis. Talanta 2013; 114:131-7. [DOI: 10.1016/j.talanta.2013.04.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/09/2013] [Revised: 03/26/2013] [Accepted: 04/04/2013] [Indexed: 11/27/2022]
|
38
|
Grandi FC, An W. Non-LTR retrotransposons and microsatellites: Partners in genomic variation. Mob Genet Elements 2013; 3:e25674. [PMID: 24195012 PMCID: PMC3812793 DOI: 10.4161/mge.25674] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 04/22/2013] [Revised: 07/07/2013] [Accepted: 07/09/2013] [Indexed: 01/10/2023]
Abstract
The human genome is laden with both non-LTR (long-terminal repeat) retrotransposons and microsatellite repeats. Both types of sequences are able to, either actively or passively, mutagenize the genomes of human individuals and are therefore poised to dynamically alter the human genomic landscape across generations. Non-LTR retrotransposons, such as L1 and Alu, are a major source of new microsatellites, which are born both concurrently and subsequently to L1 and Alu integration into the genome. Likewise, the mutation dynamics of microsatellite repeats have a direct impact on the fitness of their non-LTR retrotransposon parent owing to microsatellite expansion and contraction. This review explores the interactions and dynamics between non-LTR retrotransposons and microsatellites in the context of genomic variation and evolution.
Affiliation(s)
- Fiorella C Grandi
- School of Molecular Biosciences and Center for Reproductive Biology; Washington State University; Pullman, WA USA
|
39
|
Montgomery SB, Goode DL, Kvikstad E, Albers CA, Zhang ZD, Mu XJ, Ananda G, Howie B, Karczewski KJ, Smith KS, Anaya V, Richardson R, Davis J, MacArthur DG, Sidow A, Duret L, Gerstein M, Makova KD, Marchini J, McVean G, Lunter G. The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res 2013; 23:749-61. [PMID: 23478400 PMCID: PMC3638132 DOI: 10.1101/gr.148718.112] [Citation(s) in RCA: 163] [Impact Index Per Article: 14.8] [Indexed: 12/18/2022]
Abstract
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%–48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
Affiliation(s)
- Stephen B Montgomery
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland.
|
40
|
Mature microsatellites: mechanisms underlying dinucleotide microsatellite mutational biases in human cells. G3-GENES GENOMES GENETICS 2013; 3:451-63. [PMID: 23450065 PMCID: PMC3583453 DOI: 10.1534/g3.112.005173] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 10/20/2012] [Accepted: 12/30/2012] [Indexed: 12/19/2022]
Abstract
Dinucleotide microsatellites are dynamic DNA sequences that affect genome stability. Here, we focused on mature microsatellites, defined as pure repeats of lengths above the threshold and unlikely to mutate below it in a single mutational event. We investigated the prevalence and mutational behavior of these sequences by using human genome sequence data, human cells in culture, and purified DNA polymerases. Mature dinucleotides (≥10 units) are present within exonic sequences of >350 genes, resulting in vulnerability to cellular genetic integrity. Mature dinucleotide mutagenesis was examined experimentally using ex vivo and in vitro approaches. We observe an expansion bias for dinucleotide microsatellites up to 20 units in length in somatic human cells, in agreement with previous computational analyses of germ-line biases. Using purified DNA polymerases and human cell lines deficient for mismatch repair (MMR), we show that the expansion bias is caused by functional MMR and is not due to DNA polymerase error biases. Specifically, we observe that the MutSα and MutLα complexes protect against expansion mutations. Our data support a model wherein different MMR complexes shift the balance of mutations toward deletion or expansion. Finally, we show that replication fork progression is stalled within long dinucleotides, suggesting that mutational mechanisms within long repeats may be distinct from shorter lengths, depending on the biochemistry of fork resolution. Our work combines computational and experimental approaches to explain the complex mutational behavior of dinucleotide microsatellites in humans.
|
41
|
Hile SE, Shabashev S, Eckert KA. Tumor-specific microsatellite instability: do distinct mechanisms underlie the MSI-L and EMAST phenotypes? Mutat Res 2012. [PMID: 23206442 DOI: 10.1016/j.mrfmmm.2012.11.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 12/11/2022]
Abstract
Microsatellite DNA sequences display allele length alterations or microsatellite instability (MSI) in tumor tissues, and MSI is used diagnostically for tumor detection and classification. We discuss the known types of tumor-specific MSI patterns and the relevant mechanisms underlying each pattern. Mutation rates of individual microsatellites vary greatly, and the intrinsic DNA features of motif size, sequence, and length contribute to this variation. MSI is used for detecting mismatch repair (MMR)-deficient tumors, which display an MSI-high phenotype due to genome-wide microsatellite destabilization. Because several pathways maintain microsatellite stability, tumors that have undergone other events associated with moderate genome instability may display diagnostic MSI only at specific di- or tetranucleotide markers. We summarize evidence for such alternative MSI forms (A-MSI) in sporadic cancers, also referred to as MSI-low and EMAST. While the existence of A-MSI is not disputed, there is disagreement about the origin and pathologic significance of this phenomenon. Although ambiguities due to PCR methods may be a source, evidence exists for other mechanisms to explain tumor-specific A-MSI. Some portion of A-MSI tumors may result from random mutational events arising during neoplastic cell evolution. However, this mechanism fails to explain the specificity of A-MSI for di- and tetranucleotide instability. We present evidence supporting the alternative argument that some A-MSI tumors arise by a distinct genetic pathway, and give examples of DNA metabolic pathways that, when altered, may be responsible for instability at specific microsatellite motifs. Finally, we suggest that A-MSI in tumors could be molecular signatures of environmental influences and DNA damage. Importantly, A-MSI occurs in several pre-neoplastic inflammatory states, including inflammatory bowel diseases, consistent with a role of oxidative stress in A-MSI. 
Understanding the biochemical basis of A-MSI tumor phenotypes will advance the development of new diagnostic tools and positively impact the clinical management of individual cancers.
Affiliation(s)
- Suzanne E Hile
- Department of Pathology, Gittlen Cancer Research Foundation, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA
- Samion Shabashev
- Department of Pathology, Gittlen Cancer Research Foundation, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA
- Kristin A Eckert
- Department of Pathology, Gittlen Cancer Research Foundation, Pennsylvania State University College of Medicine, 500 University Drive, Hershey, PA 17033, USA.
|