1
Loh CA, Shields DA, Schwing A, Evrony GD. High-fidelity, large-scale targeted profiling of microsatellites. Genome Res 2024; 34:1008-1026. [PMID: 39013593 PMCID: PMC11368184 DOI: 10.1101/gr.278785.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 07/11/2024] [Indexed: 07/18/2024]
Abstract
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depend on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling, "stutter" artifacts, with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data; it uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analyzing de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.
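The abstract states that the genotyping tool integrates three microsatellite genotyping tools in an ensemble, but it does not give the combination rule. As a purely illustrative sketch, assuming a simple majority vote (a hypothetical rule, not necessarily the published one), per-locus calls from three tools could be reconciled as follows. The function name `consensus_genotype` and the `min_agree` threshold are our own.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# A genotype is an unordered pair of allele lengths (e.g. repeat-unit counts).
Genotype = Tuple[int, int]


def consensus_genotype(
    calls: Sequence[Optional[Genotype]], min_agree: int = 2
) -> Optional[Genotype]:
    """Return the genotype reported by at least `min_agree` tools, else None.

    `calls` holds one call per genotyping tool; None means that tool made no call.
    This majority vote is a hypothetical rule used only for illustration.
    """
    # Sort alleles so (14, 12) and (12, 14) count as the same genotype.
    normalized = [tuple(sorted(c)) for c in calls if c is not None]
    if not normalized:
        return None
    genotype, votes = Counter(normalized).most_common(1)[0]
    return genotype if votes >= min_agree else None


# Two of three hypothetical tools agree on a 12/14 heterozygote.
print(consensus_genotype([(12, 14), (14, 12), (12, 13)]))
```

With three tools and `min_agree=2`, a locus is genotyped only when at least two tools agree and is otherwise left uncalled, trading some sensitivity for fidelity.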
Affiliation(s)
- Caitlin A Loh
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Danielle A Shields
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Adam Schwing
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Gilad D Evrony
- Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
- Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
2
Lu J, Toro C, Adams DR, Moreno CAM, Lee WP, Leung YY, Harms MB, Vardarajan B, Heinzen EL. LUSTR: a new customizable tool for calling genome-wide germline and somatic short tandem repeat variants. BMC Genomics 2024; 25:115. [PMID: 38279154 PMCID: PMC10811831 DOI: 10.1186/s12864-023-09935-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 12/21/2023] [Indexed: 01/28/2024]
Abstract
BACKGROUND Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent to which STRs contribute to disease is likely underestimated because of the challenges of calling these variants in short-read next-generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully addresses all of the complexities associated with this variant class. RESULTS Here we introduce LUSTR, which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. CONCLUSIONS LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.
Affiliation(s)
- Jinfeng Lu
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- The Taub Institute for Research On Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA.
- Camilo Toro
- NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- David R Adams
- NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, MD, 20892, USA
- Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mathew B Harms
- Department of Neurology, Division of Neuromuscular Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Badri Vardarajan
- The Taub Institute for Research On Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital, New York, NY, 10032, USA
- Erin L Heinzen
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
3
Sihag P, Sagwal V, Kumar A, Balyan P, Mir RR, Dhankher OP, Kumar U. Discovery of miRNAs and Development of Heat-Responsive miRNA-SSR Markers for Characterization of Wheat Germplasm for Terminal Heat Tolerance Breeding. Front Genet 2021; 12:699420. [PMID: 34394189 PMCID: PMC8356722 DOI: 10.3389/fgene.2021.699420] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/23/2021] [Accepted: 06/30/2021] [Indexed: 11/13/2022]
Abstract
A large proportion of the Asian population meets its energy requirements from wheat (Triticum aestivum L.). Wheat quality and yield are critically affected by terminal heat stress across the globe, which affects approximately 40% of the world's wheat-cultivating regions. Therefore, there is a critical need to develop improved terminal heat-tolerant wheat varieties. Marker-assisted breeding with genic simple sequence repeat (SSR) markers has been used to develop terminal heat-tolerant wheat varieties; however, only a few studies have used microRNA (miRNA)-based SSR markers (miRNA-SSRs) in wheat, even though miRNAs have been found to be key players in various abiotic stresses. In the present study, we identified 104 heat-stress-responsive miRNAs reported in various crops. Of these, 70 miRNA-SSR markers were validated on a set of 20 terminal heat-tolerant and heat-susceptible wheat genotypes. Only 19 of these miRNA-SSR markers were polymorphic, and they were further used to study genetic diversity and population structure. The polymorphic miRNA-SSRs amplified 61 SSR loci, with an average of 2.9 alleles per locus. The polymorphic information content (PIC) of the polymorphic miRNA-SSRs ranged from 0.10 to 0.87, with a mean of 0.48. A dendrogram constructed using the unweighted neighbor-joining method, together with population structure analysis, clustered these 20 wheat genotypes into 3 clusters. The target genes of these miRNAs are involved, directly or indirectly, in providing tolerance to heat stress. Furthermore, two polymorphic markers, miR159c and miR165b, were declared very promising diagnostic markers, since they showed specific alleles and discriminated terminal heat-tolerant genotypes from susceptible genotypes.
Thus, these miRNA-SSR markers will prove useful for characterizing wheat germplasm through genetic diversity and population structure analyses, and in wheat molecular breeding programs aimed at improving terminal heat tolerance.
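The PIC values reported above (0.10 to 0.87, mean 0.48) are computed per marker from allele frequencies, but the abstract does not say which PIC formula was used. The sketch below assumes allele counts per locus as input and computes two commonly used variants: the simplified form 1 - Σp_i² and the Botstein et al. (1980) form, which subtracts an extra term for uninformative matings. The function names and the example allele counts are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def allele_frequencies(allele_counts: Sequence[float]) -> List[float]:
    """Convert allele counts observed at one locus into frequencies that sum to 1."""
    total = sum(allele_counts)
    return [c / total for c in allele_counts]


def pic_simple(allele_counts: Sequence[float]) -> float:
    """Simplified PIC: 1 - sum(p_i^2), equal to expected heterozygosity."""
    p = allele_frequencies(allele_counts)
    return 1.0 - sum(x * x for x in p)


def pic_botstein(allele_counts: Sequence[float]) -> float:
    """Botstein et al. (1980) PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = allele_frequencies(allele_counts)
    correction = sum(2 * (a ** 2) * (b ** 2) for a, b in combinations(p, 2))
    return 1.0 - sum(x * x for x in p) - correction


# Hypothetical marker with three alleles observed 10, 6 and 4 times.
counts = [10, 6, 4]
print(round(pic_simple(counts), 3), round(pic_botstein(counts), 3))  # prints: 0.62 0.548
```

Both measures are 0 for a monomorphic locus and grow with the number and evenness of alleles; the Botstein form is always slightly lower (at most 0.375 for a biallelic marker).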
Affiliation(s)
- Pooja Sihag
- Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, India
- Vijeta Sagwal
- Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, India
- Anuj Kumar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Reyazul Rouf Mir
- Division of Genetics and Plant Breeding, Sher-e-Kashmir University of Agricultural Sciences and Technology, Srinagar, India
- Om Parkash Dhankher
- Stockbridge School of Agriculture, University of Massachusetts, Amherst, MA, United States
- Upendra Kumar
- Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, India
4
Laskar R, Jilani MG, Ali S. Implications of genome simple sequence repeats signature in 98 Polyomaviridae species. 3 Biotech 2021; 11:35. [PMID: 33432281 PMCID: PMC7787124 DOI: 10.1007/s13205-020-02583-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/25/2020] [Accepted: 11/02/2020] [Indexed: 01/21/2023]
Abstract
Simple sequence repeats (SSRs) were analyzed in 98 genomes across four genera of the family Polyomaviridae. Genome size ranged from 3962 bp (BM87) to 7369 bp (BM85), with most genomes in the 5-5.5 kb range. GC content averaged 42% and ranged from 34.69% (BM95) to 52.35% (BM81). A total of 3036 SSRs and 223 compound SSRs (cSSRs) were extracted using IMEx, with per-genome incidence ranging from 18 to 56 and from 0 to 7, respectively. The most prevalent mononucleotide repeat motif was "T" (48.95%), followed by "A" (33.48%). "AT/TA" was the most prevalent dinucleotide motif, closely followed by "CT/TC". As expected, SSRs were concentrated in coding regions (77.6% of SSRs), nearly half of them in the Large T Antigen (LTA) gene. Notably, most viruses whose hosts are humans, apes, or related species had mononucleotide repeats composed exclusively of A or T, a feature proposed as a predictive marker for humans having become the host during the virus's evolution. Each genome has a unique SSR signature, which is pivotal for viral evolution, particularly in terms of host divergence. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s13205-020-02583-w.
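The study extracted SSRs and cSSRs with IMEx. The following is not IMEx: it is a minimal sketch, assuming perfect repeats with 1-6 bp motifs and hypothetical minimum copy numbers (`MIN_REPEATS`), of how SSRs can be detected and then grouped into compound SSRs when the gap between neighboring SSRs is at most a hypothetical `dmax` (here 10 bp). All names are our own.

```python
from typing import Dict, List, Tuple

# (start, end, motif, copies): 0-based, end-exclusive coordinates.
SSR = Tuple[int, int, str, int]

# Hypothetical minimum copy numbers for motif lengths 1-6 bp (IMEx settings may differ).
MIN_REPEATS: Dict[int, int] = {1: 6, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Greedy left-to-right scan for perfect SSRs with 1-6 bp motifs."""
    seq = seq.upper()
    ssrs: List[SSR] = []
    i = 0
    while i < len(seq):
        hit = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            # Skip truncated motifs, non-ACGT characters and non-primitive motifs.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hit = (i, i + copies * k, motif, copies)
                break
        if hit:
            ssrs.append(hit)
            i = hit[1]  # resume scanning right after the repeat
        else:
            i += 1
    return ssrs


def compound_ssrs(ssrs: List[SSR], dmax: int = 10) -> List[List[SSR]]:
    """Group SSRs separated by at most dmax bp into compound SSRs (cSSRs)."""
    groups: List[List[SSR]] = []
    for ssr in ssrs:
        if groups and ssr[0] - groups[-1][-1][1] <= dmax:
            groups[-1].append(ssr)
        else:
            groups.append([ssr])
    return [g for g in groups if len(g) > 1]


# Hypothetical sequence: (A)7 and (CT)5 separated by 2 bp.
seq = "GGC" + "A" * 7 + "GG" + "CT" * 5 + "G"
print(find_ssrs(seq))  # [(3, 10, 'A', 7), (12, 22, 'CT', 5)]
print(compound_ssrs(find_ssrs(seq)))
```

The scan takes the shortest primitive motif that meets its threshold at each position, so a run of A's is reported as a mononucleotide repeat rather than as (AA)n.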
Affiliation(s)
- Rezwanuzzaman Laskar
- Clinical and Applied Genomics (CAG) Laboratory, Department of Biological Sciences, Aliah University, IIA/27, Newtown, Kolkata, 700160 India
- Md Gulam Jilani
- Clinical and Applied Genomics (CAG) Laboratory, Department of Biological Sciences, Aliah University, IIA/27, Newtown, Kolkata, 700160 India
- Safdar Ali
- Clinical and Applied Genomics (CAG) Laboratory, Department of Biological Sciences, Aliah University, IIA/27, Newtown, Kolkata, 700160 India
5
Fazal S, Danzi MC, Cintra VP, Bis-Brewer DM, Dolzhenko E, Eberle MA, Zuchner S. Large scale in silico characterization of repeat expansion variation in human genomes. Sci Data 2020; 7:294. [PMID: 32901039 PMCID: PMC7479135 DOI: 10.1038/s41597-020-00633-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/22/2020] [Accepted: 08/13/2020] [Indexed: 11/21/2022]
Abstract
Significant progress has been made in elucidating single nucleotide polymorphism diversity in the human population. However, the majority of the variation space in the genome is structural and remains partially elusive. One form of structural variation is tandem repeats (TRs). Expansions of TRs are responsible for over 40 diseases, but we hypothesize that these represent only a fraction of the pathogenic repeat expansions that exist. Here we characterize long or expanded TR variation, identified using ExpansionHunter Denovo, in 1,115 human genomes as well as in a replication cohort of 2,504 genomes. We found that individual genomes typically harbor several rare, large TRs, generally in non-coding regions of the genome. These large TRs are enriched in proximity to Alu elements. The vast majority of these large TRs appear to be expansions of smaller TRs that are already present in the reference genome. We provide this TR profile as a resource for comparison with undiagnosed rare disease genomes in order to detect novel disease-causing repeat expansions.
Affiliation(s)
- Sarah Fazal
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Vivian P Cintra
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Dana M Bis-Brewer
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.
6
A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel) 2020; 11:genes11040407. [PMID: 32283633 PMCID: PMC7230257 DOI: 10.3390/genes11040407] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 03/09/2020] [Revised: 03/29/2020] [Accepted: 04/01/2020] [Indexed: 12/31/2022]
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance of and biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and protein length was observed universally; eukaryotes in general have more TRs, but once the difference in protein length is taken into account, the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport, and iron binding. In viruses, TRs are found in proteins essential for virulence.
7
Human Genomics in Immunology. Clin Immunol 2019. [DOI: 10.1016/b978-0-7020-6896-6.00033-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
8
Xu D, Pavlidis P, Taskent RO, Alachiotis N, Flanagan C, DeGiorgio M, Blekhman R, Ruhl S, Gokcumen O. Archaic Hominin Introgression in Africa Contributes to Functional Salivary MUC7 Genetic Variation. Mol Biol Evol 2017; 34:2704-2715. [PMID: 28957509 PMCID: PMC5850612 DOI: 10.1093/molbev/msx206] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 12/14/2022]
Abstract
One of the most abundant proteins in human saliva, mucin-7, is encoded by the MUC7 gene, which harbors copy number variable subexonic repeats (PTS-repeats) that affect the size and glycosylation potential of this protein. We recently documented the adaptive evolution of MUC7 subexonic copy number variation among primates. Yet, the evolution of MUC7 genetic variation in humans remained unexplored. Here, we found that PTS-repeat copy number variation has evolved recurrently in the human lineage, thereby generating multiple haplotypic backgrounds carrying five or six PTS-repeat copy number alleles. Contrary to previous studies, we found no associations between the copy number of PTS-repeats and protection against asthma. Instead, we revealed a significant association of MUC7 haplotypic variation with the composition of the oral microbiome. Furthermore, based on in-depth simulations, we conclude that a divergent MUC7 haplotype likely originated in an unknown African hominin population and introgressed into ancestors of modern Africans.
Affiliation(s)
- Duo Xu
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
- Pavlos Pavlidis
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas, Heraklion, Crete, Greece
- Recep Ozgur Taskent
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
- Nikolaos Alachiotis
- Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas, Heraklion, Crete, Greece
- Colin Flanagan
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
- Michael DeGiorgio
- Department of Biology and the Institute for CyberScience, Pennsylvania State University, University Park, PA
- Ran Blekhman
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Twin Cities, MN
- Stefan Ruhl
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
9
Li Y, Zhao Q, Liu H. Microdeletions at DYS448 and DYS387S1 associate with increased risk of male infertility. Syst Biol Reprod Med 2017; 63:318-323. [PMID: 28481628 DOI: 10.1080/19396368.2017.1321698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Male infertility affects many people of reproductive age. Diagnosis and therapies based on descriptive semen parameters have helped some infertility patients; however, further progress in reproductive therapy demands a better understanding of the molecular and genetic causes of male infertility. Although Y chromosome microdeletions have been a major focus of genetic studies on male infertility, the relationship between male infertility and microdeletions at the Y chromosome loci DYS448, DYS387, and DYS627 remains unclear. Here we analyzed microdeletions at these three loci in 200 infertile male patients and 200 healthy subjects and showed that microdeletions at DYS448 and DYS387 correlate with male infertility. Our results suggest that the Y chromosome loci DYS448 and DYS387 can serve as genetic markers for reproductive diagnosis and therapy.
Affiliation(s)
- Yanqing Li
- Clinical Laboratory, The First Affiliated Hospital, Henan University of Traditional Chinese Medicine, Zhengzhou, Henan Province, China
- Qiurong Zhao
- Zhengzhou Shen You Biological Technology Co. Ltd., Zhengzhou, Henan Province, China
- Hai Liu
- The Institute of Forensic Science and Technology, Henan Provincial Public Security Bureau, Zhengzhou, Henan Province, China
10
Paladin L, Hirsh L, Piovesan D, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures. Nucleic Acids Res 2016; 45:D308-D312. [PMID: 27899671 PMCID: PMC5210593 DOI: 10.1093/nar/gkw1136] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/15/2016] [Revised: 10/20/2016] [Accepted: 10/31/2016] [Indexed: 12/19/2022]
Abstract
RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins with heterogeneous functions that are involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema, including high-quality annotations for ∼5400 protein structures. RepeatsDB 2.0 provides start and end positions of the repeat regions and units for all entries. This extensive growth in repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by extensive manual validation of >60% of the entries. The updated web interface includes a new search engine for complex queries and a fully redesigned entry page for a better overview of structural data. It is now possible to compare unit positions together with secondary structure, fold information, and Pfam domains. Moreover, a new classification level has been introduced on top of the existing scheme as an independent layer for sequence similarity relationships at 40%, 60%, and 90% identity.
Affiliation(s)
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Layla Hirsh
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy; Departamento de Ingeniería, Pontificia Universidad Católica del Perú, 32 Lima, Perú
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Miguel A Andrade-Navarro
- Institute of Molecular Biology, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier, 34293 Montpellier, France; Institut de Biologie Computationnelle (IBC), 34293 Montpellier, France; Institute of Bioengineering, University ITMO, 197101 St. Petersburg, Russia
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy; CNR Institute of Neuroscience, 35121 Padova, Italy
11
Xu D, Pavlidis P, Thamadilok S, Redwood E, Fox S, Blekhman R, Ruhl S, Gokcumen O. Recent evolution of the salivary mucin MUC7. Sci Rep 2016; 6:31791. [PMID: 27558399 PMCID: PMC4997351 DOI: 10.1038/srep31791] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/02/2016] [Accepted: 07/26/2016] [Indexed: 11/23/2022]
Abstract
Genomic structural variants constitute the majority of variable base pairs in primate genomes and affect gene function in multiple ways. While whole-gene duplications and deletions are relatively well studied, the biology of subexonic (i.e., within coding exon sequences) copy number variation remains elusive. The salivary MUC7 gene provides an opportunity for studying such variation, as it harbors copy number variable subexonic repeat sequences that encode densely O-glycosylated domains (PTS-repeats) with microbe-binding properties. To understand the evolution of this gene, we analyzed mammalian and primate genomes within a comparative framework. Our analyses revealed that (i) MUC7 emerged in the placental mammal ancestor and rapidly gained multiple sites for O-glycosylation; (ii) MUC7 has retained its extracellular activity in saliva in placental mammals; (iii) the anti-fungal domain of the protein was remodified under positive selection in the primate lineage; and (iv) MUC7 PTS-repeats have evolved recurrently and under adaptive constraints. Our results establish MUC7 as a major player in salivary adaptation, likely as a response to diverse pathogenic exposure in primates. On a broader scale, our study highlights variable subexonic repeats as a primary source of modular evolutionary innovation that leads to rapid functional adaptation.
Affiliation(s)
- Duo Xu
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
- Pavlos Pavlidis
- Institute of Computer Science (ICS), Foundation of Research and Technology-Hellas, Heraklion, Crete, Greece
- Supaporn Thamadilok
- Department of Oral Biology, School of Dental Medicine, State University of New York at Buffalo, New York 14214, USA
- Emilie Redwood
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
- Sara Fox
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
- Ran Blekhman
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Twin Cities, Minnesota 55455, USA
- Stefan Ruhl
- Department of Oral Biology, School of Dental Medicine, State University of New York at Buffalo, New York 14214, USA
- Omer Gokcumen
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
12
Fungtammasan A, Tomaszkiewicz M, Campos-Sánchez R, Eckert KA, DeGiorgio M, Makova KD. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol 2016; 33:2744-58. [PMID: 27413049 PMCID: PMC5026258 DOI: 10.1093/molbev/msw139] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022]
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next-generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately one order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University; Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Rebeca Campos-Sánchez
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University
- Kristin A Eckert
- Center for Medical Genomics, Pennsylvania State University; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Institute for CyberScience, Pennsylvania State University
- Kateryna D Makova
- Department of Biology, Pennsylvania State University; Center for Medical Genomics, Pennsylvania State University; Huck Institute of Genome Sciences, Pennsylvania State University
13
Survey and analysis of simple sequence repeats (SSRs) in three genomes of Candida species. Gene 2016; 584:129-35. [PMID: 26883055 DOI: 10.1016/j.gene.2016.02.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/26/2015] [Revised: 01/15/2016] [Accepted: 02/12/2016] [Indexed: 11/23/2022]
Abstract
Simple sequence repeats (SSRs), or microsatellites, which are composed of tandemly repeated short units of 1-6 bp, have received continuous attention. Here, the distribution, composition and polymorphism of microsatellites and compound microsatellites were analyzed in three available genomes of Candida species (Candida dubliniensis, Candida glabrata and Candida orthopsilosis). The results show that there were 118,047, 66,259 and 61,119 microsatellites in the genomes of C. dubliniensis, C. glabrata and C. orthopsilosis, respectively. The SSRs covered more than one-third of the genome length in the three species. Microsatellites consisting only of A and/or T, such as (A)n, (T)n, (AT)n, (TA)n, (AAT)n, (TAA)n, (TTA)n, (ATA)n, (ATT)n and (TAT)n, were predominant in the three genomes. Microsatellite lengths were concentrated at 6 bp and 9 bp, both in the three genomes and in their coding sequences. Moreover, the relative abundance (19.89/kbp) and relative density (167.87 bp/kbp) of SSRs in the mitochondrial sequence of C. glabrata were significantly greater than in any of the genomes or chromosomes of the three species. In addition, the distance between any two adjacent microsatellites was an important factor influencing the formation of compound microsatellites. The analysis may be helpful for further studying the roles of microsatellites in the origin, organization and evolution of Candida genomes.
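The abundance and density figures above are normalized per kilobase. As a rough illustration of how such metrics can be computed, the sketch below scans a sequence for perfect tandem repeats with 1-6 bp units and reports SSRs per kbp and SSR-covered bp per kbp. The minimum unit counts in `MIN_UNITS` and the greedy longest-tract rule are illustrative assumptions, not the search parameters of the cited study, and `find_ssrs` and `abundance_and_density` are hypothetical names.

```python
# Hypothetical minimum repeat-unit counts per motif length (1-6 bp).
# These are illustrative assumptions, not the cited study's search criteria.
MIN_UNITS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for perfect tandem repeats, end exclusive.

    Greedy left-to-right scan: at each position every motif length 1-6 is
    tried and the longest qualifying tract wins (ties go to the shorter motif).
    """
    seq = seq.upper()
    ssrs = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for this motif length
            units = 1
            while seq[i + units * k:i + (units + 1) * k] == motif:
                units += 1
            end = i + units * k
            if units >= MIN_UNITS[k] and (best is None or end > best[1]):
                best = (i, end, motif)
        if best:
            ssrs.append(best)
            i = best[1]  # resume scanning after the reported tract
        else:
            i += 1
    return ssrs


def abundance_and_density(seq: str) -> tuple[float, float]:
    """Relative abundance (SSRs per kbp) and relative density (SSR bp per kbp)."""
    ssrs = find_ssrs(seq)
    kbp = len(seq) / 1000
    covered_bp = sum(end - start for start, end, _ in ssrs)
    return len(ssrs) / kbp, covered_bp / kbp


if __name__ == "__main__":
    print(find_ssrs("GCAAAAAAAGC"))  # [(2, 9, 'A')]
    print(abundance_and_density("GCAAAAAAAGC"))
```

Real SSR surveys typically use dedicated tools with their own thresholds and may also allow imperfect repeats; this scan only finds perfect tracts.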
14
Comparative analysis of microsatellites and compound microsatellites in T4-like viruses. Gene 2016; 575:695-701. [DOI: 10.1016/j.gene.2015.09.053] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/22/2015] [Revised: 09/16/2015] [Accepted: 09/21/2015] [Indexed: 01/27/2023]
15
Boschiero C, Gheyas AA, Ralph HK, Eory L, Paton B, Kuo R, Fulton J, Preisinger R, Kaiser P, Burt DW. Detection and characterization of small insertion and deletion genetic variants in modern layer chicken genomes. BMC Genomics 2015; 16:562. [PMID: 26227840 PMCID: PMC4563830 DOI: 10.1186/s12864-015-1711-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/17/2014] [Accepted: 06/22/2015] [Indexed: 01/17/2023]
Abstract
BACKGROUND Small insertions and deletions (InDels) constitute the second most abundant class of genetic variants and have been found to be associated with many traits and diseases. The present study reports on the detection and characterisation of about 883 K high quality InDels from the whole-genome analysis of several modern layer chicken lines from diverse breeds. RESULTS To reduce the error rates seen in InDel detection, this study used the consensus set from two InDel-calling packages, SAMtools and Dindel, as well as stringent post-filtering criteria. By analysing sequence data from 163 chickens from 11 commercial and 5 experimental layer lines, this study detected about 883 K high quality consensus InDels with a 93% validation rate and an average density of 0.78 InDels/kb over the genome. Certain chromosomes, viz. GGAZ, 16, 22 and 25, showed very low densities of InDels, whereas the highest rate was observed on GGA6. In spite of the higher recombination rates on microchromosomes, the InDel density on these chromosomes was generally lower relative to macrochromosomes, possibly due to their higher gene density. About 43-87% of the InDels were found to be fixed within each line. The majority of detected InDels (86%) were 1-5 bases, and about 63% were non-repetitive in nature while the rest were tandem repeats of various motif types. Functional annotation identified 613 frameshift, 465 non-frameshift and 10 stop-gain/loss InDels. Apart from the frameshift and stop-gain/loss InDels that are expected to affect the translation of protein sequences and their biological activity, 33% of the non-frameshift InDels were predicted as evolutionarily intolerant with potential impact on protein functions. Moreover, about 2.5% of the InDels coincided with the most-conserved elements previously mapped on the chicken genome and are likely to define functional elements. InDels potentially affecting protein function were found to be enriched for certain gene classes, e.g., those associated with cell proliferation, chromosome and Golgi organization, spermatogenesis, and muscle contraction. CONCLUSIONS The large catalogue of InDels presented in this study, along with associated information such as functional annotation and estimated allele frequency, is expected to serve as a rich resource for future research and breeding in the chicken.
Affiliation(s)
- Clarissa Boschiero
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK. Current address: Departamento de Zootecnia, University of Sao Paulo/ESALQ, Piracicaba, SP, 13418-900, Brazil.
- Almas A Gheyas
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Hannah K Ralph
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Lel Eory
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Bob Paton
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Richard Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Pete Kaiser
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
16
Schmid M, Smith J, Burt DW, Aken BL, Antin PB, Archibald AL, Ashwell C, Blackshear PJ, Boschiero C, Brown CT, Burgess SC, Cheng HH, Chow W, Coble DJ, Cooksey A, Crooijmans RPMA, Damas J, Davis RVN, de Koning DJ, Delany ME, Derrien T, Desta TT, Dunn IC, Dunn M, Ellegren H, Eöry L, Erb I, Farré M, Fasold M, Fleming D, Flicek P, Fowler KE, Frésard L, Froman DP, Garceau V, Gardner PP, Gheyas AA, Griffin DK, Groenen MAM, Haaf T, Hanotte O, Hart A, Häsler J, Hedges SB, Hertel J, Howe K, Hubbard A, Hume DA, Kaiser P, Kedra D, Kemp SJ, Klopp C, Kniel KE, Kuo R, Lagarrigue S, Lamont SJ, Larkin DM, Lawal RA, Markland SM, McCarthy F, McCormack HA, McPherson MC, Motegi A, Muljo SA, Münsterberg A, Nag R, Nanda I, Neuberger M, Nitsche A, Notredame C, Noyes H, O'Connor R, O'Hare EA, Oler AJ, Ommeh SC, Pais H, Persia M, Pitel F, Preeyanon L, Prieto Barja P, Pritchett EM, Rhoads DD, Robinson CM, Romanov MN, Rothschild M, Roux PF, Schmidt CJ, Schneider AS, Schwartz MG, Searle SM, Skinner MA, Smith CA, Stadler PF, Steeves TE, Steinlein C, Sun L, Takata M, Ulitsky I, Wang Q, Wang Y, Warren WC, Wood JMD, Wragg D, Zhou H. Third Report on Chicken Genes and Chromosomes 2015. Cytogenet Genome Res 2015; 145:78-179. [PMID: 26282327 PMCID: PMC5120589 DOI: 10.1159/000430927] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Indexed: 11/19/2022]
Affiliation(s)
- Michael Schmid
- Department of Human Genetics, University of Würzburg, Würzburg, Germany
17
Pagaduan JV, Sahore V, Woolley AT. Applications of microfluidics and microchip electrophoresis for potential clinical biomarker analysis. Anal Bioanal Chem 2015; 407:6911-22. [PMID: 25855148 DOI: 10.1007/s00216-015-8622-5] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 01/13/2015] [Revised: 02/20/2015] [Accepted: 03/05/2015] [Indexed: 10/23/2022]
Abstract
This article reviews advances over the last five years in microfluidics and microchip-electrophoresis techniques for detection of clinical biomarkers. The variety of advantages of miniaturization compared with conventional benchtop methods for detecting biomarkers has resulted in increased interest in developing cheap, fast, and sensitive techniques. We discuss the development of applications of microfluidics and microchip electrophoresis for analysis of different clinical samples for pathogen identification, personalized medicine, and biomarker detection. We emphasize the advantages of microfluidic techniques over conventional methods, which make them attractive future diagnostic tools. We also discuss the versatility and adaptability of this technology for analysis of a variety of biomarkers, including lipids, small molecules, carbohydrates, nucleic acids, proteins, and cells. Finally, we conclude with a discussion of aspects that need to be improved to move this technology towards routine clinical and point-of-care applications.
Affiliation(s)
- Jayson V Pagaduan
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, 84602, USA
18
The analysis of microsatellites and compound microsatellites in 56 complete genomes of Herpesvirales. Gene 2014; 551:103-9. [DOI: 10.1016/j.gene.2014.08.054] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/07/2014] [Revised: 08/09/2014] [Accepted: 08/26/2014] [Indexed: 01/13/2023]
19
Ananda G, Hile SE, Breski A, Wang Y, Kelkar Y, Makova KD, Eckert KA. Microsatellite interruptions stabilize primate genomes and exist as population-specific single nucleotide polymorphisms within individual human genomes. PLoS Genet 2014; 10:e1004498. [PMID: 25033203 PMCID: PMC4102424 DOI: 10.1371/journal.pgen.1004498] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/22/2013] [Accepted: 05/28/2014] [Indexed: 01/01/2023]
Abstract
Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. 
The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression. Microsatellites are short tandem repeat DNA sequences located throughout the human genome that display a high degree of inter-individual variation. This characteristic makes microsatellites an attractive tool for population genetics and forensics research. Some microsatellites affect gene expression, and mutations within such microsatellites can cause disease. Interruption mutations disrupt the perfect repeated array and are frequently associated with altered disease risk, but they have not been thoroughly studied in human genomes. We identified interrupted mono-, di-, tri- and tetranucleotide MSs (iMS) within individual genomes from African, European, Asian and American population groups. We show that many iMSs, including some within disease-associated genes, are unique to a single population group. By measuring the conservation of microsatellites between human and chimpanzee genomes, we demonstrate that interruptions decrease the probability of microsatellite mutations throughout the genome. We demonstrate that iMSs arise in the human genome by single base changes within the DNA, and provide biochemical data suggesting that these stabilizing changes may be created by error-prone DNA polymerases. Our genome-wide study supports the model in which iMSs act to stabilize individual genomes, and suggests that population-specific differences in microsatellite architecture may be an avenue by which genetic ancestry impacts individual disease risk.
Affiliation(s)
- Guruprasad Ananda
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Suzanne E. Hile
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Amanda Breski
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Yanli Wang
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Yogeshwar Kelkar
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Kateryna D. Makova
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
- Kristin A. Eckert
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
20
Singh AK, Alam CM, Sharfuddin C, Ali S. Frequency and distribution of simple and compound microsatellites in forty-eight Human papillomavirus (HPV) genomes. Infect Genet Evol 2014; 24:92-8. [PMID: 24662441 DOI: 10.1016/j.meegid.2014.03.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/18/2014] [Revised: 03/02/2014] [Accepted: 03/12/2014] [Indexed: 12/14/2022]
Abstract
Simple sequence repeats (SSRs) are tandem-repeated sequences ubiquitously present but differentially distributed across genomes. The present study is a systematic analysis of the incidence, composition and complexity of different microsatellites in 48 representative Human papillomavirus (HPV) genomes. The analysis revealed a total of 1868 SSRs and 120 compound SSRs (cSSRs). Four genomes (HPV-60, HPV-92, HPV-112 and HPV-136) lacked any cSSRs, while HPV-31 contained the maximum of 10 cSSRs. An overall increase in cSSR% was observed with higher dMAX, the maximum distance allowed between adjacent SSRs for them to be counted as one compound microsatellite. The SSRs and cSSRs were prevalent in coding regions. Poly(A/T) repeats were significantly more abundant than poly(G/C) repeats, possibly due to the high A/T content of the HPV genomes. Further, the higher prevalence of dinucleotide repeats over trinucleotide repeats may be attributed to the instability of the former owing to a higher slippage rate. An in-depth study of the satellite sequences would provide insight into the imperfections and evolution of microsatellites.
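The dependence of cSSR% on dMAX can be made concrete with a small sketch. It assumes SSR coordinates are already available as half-open (start, end) intervals, chains each SSR into the current group when its gap to the group's end is at most dMAX, and reports cSSR% as the percentage of individual SSRs that fall inside a compound (a group of two or more). The helper names `group_compound` and `cssr_percent` are hypothetical, and this is not the cited authors' software.

```python
def group_compound(ssrs: list[tuple[int, int]], dmax: int) -> list[list[tuple[int, int]]]:
    """Chain SSRs whose gap to the current group's end is <= dmax.

    ssrs are half-open (start, end) intervals; groups with two or more
    members are treated as compound SSRs (cSSRs).
    """
    groups: list[list[tuple[int, int]]] = []
    group_end = 0
    for start, end in sorted(ssrs):
        if groups and start - group_end <= dmax:
            groups[-1].append((start, end))
            group_end = max(group_end, end)
        else:
            groups.append([(start, end)])
            group_end = end
    return groups


def cssr_percent(ssrs: list[tuple[int, int]], dmax: int) -> float:
    """Percentage of individual SSRs that belong to a compound SSR."""
    if not ssrs:
        return 0.0
    groups = group_compound(ssrs, dmax)
    in_compound = sum(len(g) for g in groups if len(g) >= 2)
    return 100.0 * in_compound / len(ssrs)


if __name__ == "__main__":
    ssrs = [(0, 10), (12, 20), (40, 50)]
    for dmax in (0, 5, 25):
        print(dmax, round(cssr_percent(ssrs, dmax), 1))
```

For this toy input the loop prints 0.0, 66.7 and 100.0 for dMAX = 0, 5 and 25. Because a larger cutoff can only merge groups, never split them, cSSR% under this definition cannot decrease as dMAX grows, which is consistent with the trend reported above.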
Affiliation(s)
- Avadhesh Kumar Singh
- Department of Biomedical Sciences, SRCASW, University of Delhi, Vasundhara Enclave, New Delhi 110096, India
- Safdar Ali
- Department of Biomedical Sciences, SRCASW, University of Delhi, Vasundhara Enclave, New Delhi 110096, India
21
Kidane D, Chae WJ, Czochor J, Eckert KA, Glazer PM, Bothwell ALM, Sweasy JB. Interplay between DNA repair and inflammation, and the link to cancer. Crit Rev Biochem Mol Biol 2014; 49:116-39. [PMID: 24410153 DOI: 10.3109/10409238.2013.875514] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Indexed: 12/21/2022]
Abstract
DNA damage and repair are linked to cancer. DNA damage that is induced endogenously or from exogenous sources has the potential to result in mutations and genomic instability if not properly repaired, eventually leading to cancer. Inflammation is also linked to cancer. Reactive oxygen and nitrogen species (RONs) produced by inflammatory cells at sites of infection can induce DNA damage. RONs can also amplify inflammatory responses, leading to increased DNA damage. Here, we focus on the links between DNA damage, repair, and inflammation, as they relate to cancer. We examine the interplay between chronic inflammation, DNA damage and repair and review recent findings in this rapidly emerging field, including the links between DNA damage and the innate immune system, and the roles of inflammation in altering the microbiome, which subsequently leads to the induction of DNA damage in the colon. Mouse models of defective DNA repair and inflammatory control are extensively reviewed, including treatment of mouse models with pathogens, which leads to DNA damage. The roles of microRNAs in regulating inflammation and DNA repair are discussed. Importantly, DNA repair and inflammation are linked in many important ways, and in some cases balance each other to maintain homeostasis. The failure to repair DNA damage or to control inflammatory responses has the potential to lead to cancer.
Affiliation(s)
- Dawit Kidane
- Departments of Therapeutic Radiology and Genetics
22
Mature microsatellites: mechanisms underlying dinucleotide microsatellite mutational biases in human cells. G3 (Bethesda) 2013; 3:451-63. [PMID: 23450065 PMCID: PMC3583453 DOI: 10.1534/g3.112.005173] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 10/20/2012] [Accepted: 12/30/2012] [Indexed: 12/19/2022]
Abstract
Dinucleotide microsatellites are dynamic DNA sequences that affect genome stability. Here, we focused on mature microsatellites, defined as pure repeats of lengths above the threshold and unlikely to mutate below it in a single mutational event. We investigated the prevalence and mutational behavior of these sequences by using human genome sequence data, human cells in culture, and purified DNA polymerases. Mature dinucleotides (≥10 units) are present within exonic sequences of >350 genes, posing a potential risk to cellular genetic integrity. Mature dinucleotide mutagenesis was examined experimentally using ex vivo and in vitro approaches. We observe an expansion bias for dinucleotide microsatellites up to 20 units in length in somatic human cells, in agreement with previous computational analyses of germ-line biases. Using purified DNA polymerases and human cell lines deficient for mismatch repair (MMR), we show that the expansion bias is caused by functional MMR and is not due to DNA polymerase error biases. Specifically, we observe that the MutSα and MutLα complexes protect against expansion mutations. Our data support a model wherein different MMR complexes shift the balance of mutations toward deletion or expansion. Finally, we show that replication fork progression is stalled within long dinucleotides, suggesting that mutational mechanisms within long repeats may be distinct from those at shorter lengths, depending on the biochemistry of fork resolution. Our work combines computational and experimental approaches to explain the complex mutational behavior of dinucleotide microsatellites in humans.
23
Chen M, Tan Z, Zeng G, Zeng Z. Differential distribution of compound microsatellites in various Human Immunodeficiency Virus Type 1 complete genomes. Infect Genet Evol 2012; 12:1452-7. [DOI: 10.1016/j.meegid.2012.05.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 08/23/2011] [Revised: 05/04/2012] [Accepted: 05/12/2012] [Indexed: 12/21/2022]
24
Khodakov D, Thredgold L, Lenehan CE, Andersson GG, Kobus H, Ellis AV. DNA capture-probe based separation of double-stranded polymerase chain reaction amplification products in poly(dimethylsiloxane) microfluidic channels. Biomicrofluidics 2012; 6:26503. [PMID: 23761843 PMCID: PMC3386992 DOI: 10.1063/1.4729131] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/23/2012] [Accepted: 05/29/2012] [Indexed: 05/12/2023]
Abstract
Herein, we describe the development of a novel primer system that allows for the capture of double-stranded polymerase chain reaction (PCR) amplification products onto a microfluidic channel without any preliminary purification stages. We show that specially designed PCR primers consisting of the main primer sequence and an additional "tag sequence" linked through a poly(ethylene glycol) molecule can be used to generate ds-PCR amplification products, tailed with ss-oligonucleotides, of two forensically relevant genes: amelogenin and the human c-fms (macrophage colony-stimulating factor) proto-oncogene for the CSF-1 receptor (CSF1PO). Furthermore, with a view to enriching and eluting the ds-PCR products of amplification on a capillary electrophoretic-based microfluidic device, we describe the capture of the target ds-PCR products onto poly(dimethylsiloxane) microchannels modified with ss-oligonucleotide capture probes.
Affiliation(s)
- Dmitriy Khodakov
- Flinders Centre for NanoScale Science and Technology, School of Chemical and Physical Sciences, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia
25
Bakhtiarizadeh MR, Ebrahimi M, Ebrahimie E. Discovery of EST-SSRs in lung cancer: tagged ESTs with SSRs lead to differential amino acid and protein expression patterns in cancerous tissues. PLoS One 2011; 6:e27118. [PMID: 22073269 PMCID: PMC3208562 DOI: 10.1371/journal.pone.0027118] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/08/2011] [Accepted: 10/11/2011] [Indexed: 11/18/2022]
Abstract
Tandem repeats are found in both coding and non-coding sequences of higher organisms. These sequences can be used in cancer genetics and diagnosis to unravel the genetic basis of tumor formation and progression. In this study, a possible relationship between SSR distributions and lung cancer was studied by comparative analysis of EST-SSRs in normal and lung cancerous tissues. While the EST-SSR distribution was similar among tumorous tissues, it differed between normal and tumorous tissues. Trinucleotide tandem repeats differed most markedly: the number of trinucleotides in lung cancer ESTs was 3 times higher than in normal tissue. A significant negative correlation between normal and cancerous tissue showed that cancerous tissue generates different types of trinucleotides. GGC and CGC were the more frequently expressed trinucleotides in cancerous tissue, but these SSRs were not expressed in normal tissue. As at the EST level, the expression pattern of EST-SSR-derived amino acids was significantly different between normal and cancerous tissues. Arg, Pro, Ser, Gly, and Lys were the most abundant amino acids in cancerous tissues, and Leu, Cys, Phe, and His were significantly more abundant in normal tissues than in cancerous tissues. Next, the putative functions of triplet SSR-containing genes were analyzed. In cancerous tissue, EST-SSRs produce different types of proteins. Chromodomain helicase DNA binding proteins were one of the major protein products of EST-SSRs in the cancerous library, while these proteins were not produced from EST-SSRs in normal tissue. For the first time, the findings of this study confirmed that EST-SSRs in normal lung tissues differ from those in unhealthy tissues, and that tagged ESTs with SSRs cause remarkable differences in amino acid and protein expression patterns in cancerous tissue.
We suggest that EST-SSRs and EST-SSRs differentially expressed in cancerous tissue may be suitable candidate markers for lung cancer diagnosis and prediction.
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology & Bioinformatics Research Group, University of Qom, Qom, Iran
- Esmaeil Ebrahimie
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
26
Krzyzosiak WJ, Sobczak K, Wojciechowska M, Fiszer A, Mykowska A, Kozlowski P. Triplet repeat RNA structure and its role as pathogenic agent and therapeutic target. Nucleic Acids Res 2011; 40:11-26. [PMID: 21908410 PMCID: PMC3245940 DOI: 10.1093/nar/gkr729] [Citation(s) in RCA: 122] [Impact Index Per Article: 9.4] [Indexed: 11/14/2022]
Abstract
This review presents detailed information about the structure of triplet repeat RNA and addresses the simple sequence repeats of normal and expanded lengths in the context of the physiological and pathogenic roles played in human cells. First, we discuss the occurrence and frequency of various trinucleotide repeats in transcripts and classify them according to the propensity to form RNA structures of different architectures and stabilities. We show that repeats capable of forming hairpin structures are overrepresented in exons, which implies that they may have important functions. We further describe long triplet repeat RNA as a pathogenic agent by presenting human neurological diseases caused by triplet repeat expansions in which mutant RNA gains a toxic function. Prominent examples of these diseases include myotonic dystrophy type 1 and fragile X-associated tremor ataxia syndrome, which are triggered by mutant CUG and CGG repeats, respectively. In addition, we discuss RNA-mediated pathogenesis in polyglutamine disorders such as Huntington's disease and spinocerebellar ataxia type 3, in which expanded CAG repeats may act as an auxiliary toxic agent. Finally, triplet repeat RNA is presented as a therapeutic target. We describe various concepts and approaches aimed at the selective inhibition of mutant transcript activity in experimental therapies developed for repeat-associated diseases.
Affiliation(s)
- Wlodzimierz J Krzyzosiak
- Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.
27
Characterization of Unique Signature Sequences in the Divergent Maternal Protein Bcl2l10. Mol Biol Evol 2011; 28:3271-83. [DOI: 10.1093/molbev/msr152] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
28
Jarem DA, Huckaby LV, Delaney S. AGG interruptions in (CGG)(n) DNA repeat tracts modulate the structure and thermodynamics of non-B conformations in vitro. Biochemistry 2010; 49:6826-37. [PMID: 20695523 DOI: 10.1021/bi1007782] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
The trinucleotide repeat sequence CGG/CCG is known to expand in the human genome. This expansion is the primary pathogenic signature of fragile X syndrome, which is the most common form of inherited mental retardation. It has been proposed that formation of non-B conformations by the repetitive sequence contributes to the expansion mechanism. It is also known that the CGG/CCG repeat sequence of healthy individuals, which is not prone to expansion, contains AGG/CCT interruptions every 8-11 CGG/CCG repeats. Using DNA containing 19 or 39 CGG repeats, we have found that both the position and number of interruptions modulate the non-B conformation adopted by the repeat sequence. Analysis by chemical probes revealed larger loops and the presence of bulges for sequences containing interruptions. Additionally, using optical analysis and calorimetry, the effect of these structural changes on the thermodynamic stability of the conformation has been quantified. Notably, changing even one nucleotide, as occurs when CGG is replaced with an AGG interruption, causes a measurable decrease in the stability of the conformation adopted by the repeat sequence. These results provide insight into the role interruptions may play in preventing expansion in vivo and also contribute to our understanding of the relationship between non-B conformations and trinucleotide repeat expansion.
Affiliation(s)
- Daniel A Jarem
- Department of Chemistry, Brown University, Providence, Rhode Island 02912, USA
29
Chen M, Tan Z, Zeng G, Peng J. Comprehensive analysis of simple sequence repeats in pre-miRNAs. Mol Biol Evol 2010; 27:2227-32. [PMID: 20395311 DOI: 10.1093/molbev/msq100] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 01/02/2023]
Abstract
Simple sequence repeats (SSRs) are tandem repeat units of 1-6 bp that are identified in various complete sequences. However, the distribution, nature, and origin of SSRs in pre-miRNAs, which are characteristic stem-loop sequences that are ultimately processed into ∼22 nt functional miRNAs that help regulate several biological processes, are still not well studied. The availability of large numbers of pre-miRNAs makes it possible to analyze and compare the occurrences of SSRs, the relative count of SSRs, or the longest SSRs in pre-miRNAs. In this study, we analyzed SSRs in 8,619 pre-miRNAs from 87 species, including Arthropoda, Nematoda, Platyhelminthes, Urochordata, Vertebrata, Mycetozoa, Protistae, Viridiplantae, and Viruses. We find that SSRs widely exist in the pre-miRNAs analyzed. Our analysis shows that mononucleotide repeats are the most abundant repeats, followed by dinucleotide repeats, whereas tri-, tetra-, penta-, and hexanucleotide repeats rarely occurred in pre-miRNAs. The number of SSRs per pre-miRNA on average ranges from 4.1 for viruses to 13.5 for Mycetozoa. Our results confirm that the number of repeats correlates inversely with the length of repeats. Generally, in each taxonomic group, the occurrence and relative count of SSRs decrease as the repeat unit increases. SSRs do not exhibit an obvious preference for any particular location in pre-miRNAs. The repeats in pre-miRNAs are complementary to repeats in coding or noncoding regions of genomes, and no significant difference is observed between these two classes with respect to the occurrence of repeats. These data on SSRs may become a useful resource for pre-miRNA research, and their possible functions are discussed.
Affiliation(s)
- Ming Chen
- College of Environmental Science and Engineering, Hunan University, Changsha, China
30
Kozlowski P, de Mezer M, Krzyzosiak WJ. Trinucleotide repeats in human genome and exome. Nucleic Acids Res 2010; 38:4027-39. [PMID: 20215431 PMCID: PMC2896521 DOI: 10.1093/nar/gkq127] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Indexed: 11/24/2022]
Abstract
Trinucleotide repeats (TNRs) are of interest in genetics because they are used as markers for tracing genotype–phenotype relations and because they are directly involved in numerous human genetic diseases. In this study, we searched the human genome reference sequence and annotated exons (exome) for the presence of uninterrupted triplet repeat tracts composed of six or more repeated units. A list of 32 448 TNRs and 878 TNR-containing genes was generated and is provided herein. We found that some triplet repeats, specifically CNG, are overrepresented, while CTT, ATC, AAC and AAT are underrepresented in exons. This observation suggests that the occurrence of TNRs in exons is not random, but undergoes positive or negative selective pressure. Additionally, TNR types strongly determine their localization in mRNA sections (ORF, UTRs). Most genes containing exon-overrepresented TNRs are associated with gene ontology-defined functions. Surprisingly, many groups of genes that contain TNR types coding for different homo-amino acid tracts associate with the same transcription-related GO categories. We propose that TNRs have potential to be functional genetic elements and that their variation may be involved in the regulation of many common phenotypes; as such, TNR polymorphisms should be considered a priority in association studies.
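The search criterion above (uninterrupted triplet tracts of six or more units) can be illustrated with a minimal Python sketch. This is not the authors' search pipeline: the regular-expression approach, the function name find_trinucleotide_repeats, and the choice to skip homopolymer "triplets" such as AAA are assumptions made for illustration only.

```python
import re


def find_trinucleotide_repeats(seq: str, min_units: int = 6) -> list[tuple[int, str, int]]:
    """Hypothetical helper: find uninterrupted trinucleotide tracts of >= min_units copies.

    Returns (0-based start, motif, number of units) for each tract. The motif is
    reported in the phase in which the leftmost match begins.
    """
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    # A triplet followed by the same triplet at least (min_units - 1) more times
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        # Assumption: runs such as AAAAAA... are mononucleotide repeats, not TNRs
        if len(set(motif)) == 1:
            continue
        hits.append((m.start(), motif, len(m.group(0)) // 3))
    return hits


if __name__ == "__main__":
    demo = "TT" + "CAG" * 7 + "TT" + "CTG" * 5 + "A" * 20
    print(find_trinucleotide_repeats(demo))               # [(2, 'CAG', 7)]
    print(find_trinucleotide_repeats(demo, min_units=5))  # [(2, 'CAG', 7), (25, 'CTG', 5)]
```

With the default threshold the five-unit CTG tract is excluded, matching the "six or more repeated units" rule. The sketch is a simplification: a homopolymer run that directly abuts a genuine tract can be consumed first and shorten the reported tract.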
Affiliation(s)
- Piotr Kozlowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.
31
Hannan AJ. Tandem repeat polymorphisms: modulators of disease susceptibility and candidates for ‘missing heritability’. Trends Genet 2010; 26:59-65. [PMID: 20036436 DOI: 10.1016/j.tig.2009.11.008] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.1] [Received: 10/23/2009] [Revised: 11/27/2009] [Accepted: 11/30/2009] [Indexed: 01/26/2023]
32
Chen M, Tan Z, Jiang J, Li M, Chen H, Shen G, Yu R. Similar distribution of simple sequence repeats in diverse completed Human Immunodeficiency Virus Type 1 genomes. FEBS Lett 2009; 583:2959-63. [PMID: 19679131 DOI: 10.1016/j.febslet.2009.08.004] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 06/17/2009] [Revised: 07/31/2009] [Accepted: 08/04/2009] [Indexed: 01/16/2023]
Abstract
Simple sequence repeats (SSRs) have been surveyed extensively in eukaryotes and prokaryotes, but such surveys remain rare in viruses. We therefore surveyed SSRs in Human Immunodeficiency Virus Type 1 (HIV-1), an excellent system for studying the evolution and roles of SSRs in viruses. The distribution of SSRs was examined in 81 complete HIV-1 genome sequences from 34 countries or districts across six continents. Although relative abundance and relative density were highly similar across the surveyed sequences, some sequences showed different preferences for the most common SSRs and the longest SSRs. Our results suggest that the proportions of the various repeat types may be related to genome stability.
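The abstract compares relative abundance and relative density without defining them. SSR surveys commonly define relative abundance as the number of SSRs per kb of sequence and relative density as the total SSR length (bp) per kb; the short Python sketch below assumes those definitions, which are not stated in this abstract, and the function name ssr_relative_metrics is hypothetical.

```python
def ssr_relative_metrics(ssr_lengths_bp: list[int], sequence_length_bp: int) -> tuple[float, float]:
    """Hypothetical helper returning (relative abundance, relative density).

    Assumed definitions:
      relative abundance = number of SSRs per kb of sequence
      relative density   = total SSR length (bp) per kb of sequence
    """
    if sequence_length_bp <= 0:
        raise ValueError("sequence_length_bp must be positive")
    kb = sequence_length_bp / 1000
    return len(ssr_lengths_bp) / kb, sum(ssr_lengths_bp) / kb


if __name__ == "__main__":
    # Hypothetical input: four SSRs totalling 40 bp in an 8,000-bp sequence
    print(ssr_relative_metrics([12, 10, 14, 4], 8000))  # (0.5, 5.0)
```

Normalizing per kb is what lets genomes of slightly different lengths be compared directly, which is the kind of cross-genome comparison the abstract reports.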
Affiliation(s)
- Ming Chen
- Institute of Life Sciences and Biotechnology, Hunan University, Changsha, China
33
Giancarlo R, Scaturro D, Utro F. Textual data compression in computational biology: a synopsis. Bioinformatics 2009; 25:1575-86. [DOI: 10.1093/bioinformatics/btp117] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/15/2022]
34
Somers CM, Cooper DN. Air pollution and mutations in the germline: are humans at risk? Hum Genet 2008; 125:119-30. [PMID: 19112582 DOI: 10.1007/s00439-008-0613-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 12/01/2008] [Accepted: 12/16/2008] [Indexed: 01/27/2023]
Abstract
Genotoxic air pollution is ubiquitous in urban and industrial areas. A variety of studies have linked human exposure to air pollution with a number of different somatic cell endpoints, including cancer. However, the potential for inducing mutations in the human germline remains unclear. Sentinel animal studies of germline mutations at tandem-repeat loci (specifically minisatellites and expanded simple tandem repeats) have recently provided proof of principle that germline mutations can be induced in vertebrates (birds and mice) by air pollution under ambient conditions. Although humans may also be susceptible to induced germline mutations in polluted areas, uncertainties regarding causative agents, doses, and mutational mechanisms at repetitive DNA loci currently preclude extrapolation from animal data to the evaluation of human risk. Nevertheless, several recent studies have linked air pollution exposure to DNA damage in human sperm, indicating that our germ cells are not impervious to the genotoxic effects of air pollution. Thus, both sentinel animal and human studies have raised the possibility that ambient air pollution may increase human germline mutation rates, especially at repetitive DNA loci. Given that some human genetic conditions appear to be modulated by length mutations at tandem-repeat loci (e.g. HRAS1-associated cancers and type 1 diabetes), there is an urgent need for extensive study in this area. Research should focus primarily on: (1) the direct measurement of mutation frequencies at repetitive DNA loci in human male germ cells as a function of air pollution exposure, (2) large-scale epidemiological studies of air pollution in relation to inherited disorders and tandem-repeat-associated genetic conditions, and (3) the characterization of mutational mechanisms at hypervariable tandem-repeat loci.