1
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. In addition, we introduced the diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
  - Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
2
Zhu X, Guo L, Zhu R, Zhou X, Zhang J, Li D, He S, Qiao Y. Phytophthora sojae effector PsAvh113 associates with the soybean transcription factor GmDPB to inhibit catalase-mediated immunity. PLANT BIOTECHNOLOGY JOURNAL 2023. [PMID: 36972124 DOI: 10.1111/pbi.14043] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/12/2023] [Revised: 02/17/2023] [Accepted: 02/28/2023] [Indexed: 06/18/2023]
Abstract
Phytophthora species are the most destructive plant pathogens worldwide and the main threat to agricultural and natural ecosystems; however, their pathogenic mechanism remains largely unknown. Here, we show that the Avh113 effector is required for the virulence of Phytophthora sojae and is important for the development of Phytophthora root and stem rot (PRSR) in soybean (Glycine max). Ectopic expression of PsAvh113 enhanced viral and Phytophthora infection in Nicotiana benthamiana. PsAvh113 directly associated with the soybean transcription factor GmDPB, inducing its degradation by the 26S proteasome. The internal repeat 2 (IR2) motif of PsAvh113 was important for its virulence and interaction with GmDPB, while silencing and overexpression of GmDPB in soybean hairy roots altered the resistance to P. sojae. Upon binding to GmDPB, PsAvh113 decreased the transcription of the downstream gene GmCAT1, which acts as a positive regulator of plant immunity. Furthermore, we revealed that PsAvh113 suppressed GmCAT1-induced cell death by associating with GmDPB, thereby enhancing plant susceptibility to Phytophthora. Together, our findings reveal a vital role of PsAvh113 in inducing PRSR in soybean and offer a novel insight into the interplay between defence and counter-defence during the P. sojae infection of soybean.
Affiliation(s)
- Xiaoguo Zhu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Liang Guo
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Ruiqing Zhu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xiaoyi Zhou
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jianing Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Die Li
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Shidan He
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Yongli Qiao
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
3
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
  - Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
4
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 113] [Impact Index Per Article: 56.5] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023]
Abstract
Rare diseases affect 30 million people in the USA and more than 300-400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25-35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
Affiliation(s)
- Shruti Marwaha
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
- Joshua W Knowles
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Medicine, Diabetes Research Center, Cardiovascular Institute and Prevention Research Center, Stanford, CA, USA
- Euan A Ashley
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
5
Pappalardo XG, Barra V. Losing DNA methylation at repetitive elements and breaking bad. Epigenetics Chromatin 2021; 14:25. [PMID: 34082816 PMCID: PMC8173753 DOI: 10.1186/s13072-021-00400-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 03/14/2021] [Accepted: 05/21/2021] [Indexed: 02/08/2023]
Abstract
Background: DNA methylation is an epigenetic chromatin mark that allows heterochromatin formation and gene silencing. It has a fundamental role in preserving genome stability (including chromosome stability) by controlling both gene expression and chromatin structure. Therefore, the onset of an incorrect pattern of DNA methylation is potentially dangerous for the cells. This is particularly important with respect to repetitive elements, which constitute a third of the human genome.
Main body: Repetitive sequences are involved in several cell processes; however, due to their intrinsic nature, they can be a source of genome instability. Thus, most repetitive elements are usually methylated to maintain a heterochromatic, repressed state. Notably, there is increasing evidence showing that repetitive elements (satellites, long interspersed nuclear elements (LINEs), Alus) are frequently hypomethylated in a variety of human pathologies, from cancer to psychiatric disorders. Hypomethylation of repetitive sequences correlates with chromatin relaxation and unscheduled transcription. Whether and how these alterations are directly involved in the aetiology of human diseases is still under investigation.
Conclusions: Hypomethylation of different families of repetitive sequences is recurrent in many different human diseases, suggesting that the methylation status of these elements can be involved in the preservation of human health. This provides a promising point of view for research into therapeutic strategies focused on specifically tuning DNA methylation of DNA repeats.
Affiliation(s)
- Xena Giada Pappalardo
  - Department of Biomedical and Biotechnological Sciences (BIOMETEC), University of Catania, 95125, Catania, Italy
  - National Council of Research, Institute for Biomedical Research and Innovation (IRIB), Unit of Catania, 95125, Catania, Italy
- Viviana Barra
  - Department of Biological, Chemical and Pharmaceutical Sciences and Technologies (STEBICEF), University of Palermo, 90128, Palermo, Italy
6
Copy Number Variations of Glycoside Hydrolase 45 Genes in Bursaphelenchus xylophilus and Their Impact on the Pathogenesis of Pine Wilt Disease. FORESTS 2021. [DOI: 10.3390/f12030275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The pine wood nematode Bursaphelenchus xylophilus parasitizes millions of pine trees worldwide each year, causing severe wilt and the death of host trees. Glycoside hydrolase 45 genes of B. xylophilus are reported to have been acquired by horizontal gene transfer from fungi and are responsible for cell wall degradation during nematode infection. Previous studies ignored the possibility of copy number variations of such genes. In this study, we determined that two of the glycoside hydrolase 45 genes evolved to maintain multiple copies with distinct expression levels, enabling the nematode to infect a variety of pine hosts. Additionally, tandem repeat variations within coding regions were also detected between different copies of glycoside hydrolase 45 genes that could result in changes in protein sequences and serve as an effective biological marker to detect copy number variations among different B. xylophilus populations. Consequently, we were able to further identify the copy number variations of glycoside hydrolase 45 genes among B. xylophilus strains with different virulence. Our results provide new insights into the pathogenicity of B. xylophilus, provide a practical marker to genotype copy number variations and may aid in population classification.
7
Balzano E, Pelliccia F, Giunta S. Genome (in)stability at tandem repeats. Semin Cell Dev Biol 2020; 113:97-112. [PMID: 33109442 DOI: 10.1016/j.semcdb.2020.10.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/06/2020] [Revised: 09/26/2020] [Accepted: 10/10/2020] [Indexed: 12/12/2022]
Abstract
Repeat sequences account for over half of the human genome and represent a significant source of variation that underlies physiological and pathological states. Yet, their study has been hindered by limitations in short-read sequencing technology and difficulties in assembly. An important category of repetitive DNA in the human genome comprises tandem repeats (TRs), where repetitive units are arranged in a head-to-tail pattern. Compared to other regions of the genome, TRs have a mutation rate that is 10- to 10,000-fold higher. There are several mutagenic mechanisms that can give rise to this propensity toward instability, but their precise contribution remains speculative. Given the high degree of homology between these sequences and their arrangement in tandem, once damaged, TRs have an intrinsic propensity to undergo aberrant recombination with non-allelic exchange and generate harmful rearrangements that may undermine the stability of the entire genome. The dynamic mutagenesis at TRs has been found to underlie individual polymorphism associated with neurodegenerative and neuromuscular disorders, as well as complex genetic diseases like cancer and diabetes. Here, we review our current understanding of the surveillance and repair mechanisms operating within these regions, and we describe how alterations in these protective processes can readily trigger mutational signatures found at TRs, ultimately resulting in the pathological correlation between TR instability and human diseases. Finally, we provide a viewpoint to counter the detrimental effects that TRs pose in light of their selection and conservation, as important drivers of human evolution.
Affiliation(s)
- Elisa Balzano
  - Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
- Franca Pelliccia
  - Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
- Simona Giunta
  - The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
  - Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
8
Huang Y, Huang X, Zhou X, Wang J, Zhang R, Ma F, Wang K, Zhang Z, Dai X, Cao X, Zhang C, Han K, Ren Q. Immune activation by a multigene family of lectins with variable tandem repeats in oriental river prawn (Macrobrachium nipponense). Open Biol 2020; 10:200141. [PMID: 32931720 PMCID: PMC7536079 DOI: 10.1098/rsob.200141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Genomic regions with repeated sequences are unstable and prone to rapid DNA diversification. However, the role of tandem repeats within the coding region is not fully characterized. Here, we have identified a new hypervariable C-type lectin gene family with different numbers of tandem repeats (Rlecs; R means repeat) in oriental river prawn (Macrobrachium nipponense). Two types of repeat units (33 or 30 bp) are identified in the second exon, and the number of repeat units varies from 1 to 9. Rlecs can be classified into 15 types through phylogenetic analysis. The amino acid sequences in the same type of Rlec are highly conserved outside the repeat regions. The main differences among the Rlec types are evident in exon 5. A variable number of tandem repeats in Rlecs may be produced by slip mispairing during gene replication. Alternative splicing contributes to the multiplicity of forms in this lectin gene family, and different types of Rlecs vary in terms of tissue distribution, expression quantity and response to bacterial challenge. These variations suggest that Rlecs have functional diversity. The results of experiments on sugar binding, microbial inhibition and clearance, regulation of antimicrobial peptide gene expression and prophenoloxidase activation indicate that the function of Rlecs with the motif of YRSKDD in innate immunity is enhanced when the number of tandem repeats increases. Our results suggest that Rlecs undergo gene expansion through gene duplication and alternative splicing, which ultimately leads to functional diversity.
Affiliation(s)
- Ying Huang
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
  - College of Oceanography, Hohai University, 1 Xikang Road, Nanjing, Jiangsu 210098, People's Republic of China
- Xin Huang
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Xuming Zhou
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Jialin Wang
  - Hubei Key Laboratory of Genetic Regulation and Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, People's Republic of China
- Ruidong Zhang
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Futong Ma
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Kaiqiang Wang
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Zhuoxing Zhang
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Xiaoling Dai
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Xueying Cao
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Chao Zhang
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Keke Han
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
- Qian Ren
  - College of Marine Science and Engineering, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210023, People's Republic of China
  - Co-Innovation Center for Marine Bio-Industry Technology of Jiangsu Province, Lianyungang, Jiangsu 222005, People's Republic of China
9
Guerrero-Bosagna C. From epigenotype to new genotypes: Relevance of epigenetic mechanisms in the emergence of genomic evolutionary novelty. Semin Cell Dev Biol 2020; 97:86-92. [DOI: 10.1016/j.semcdb.2019.07.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/28/2019] [Revised: 07/08/2019] [Accepted: 07/08/2019] [Indexed: 11/24/2022]
10
11
Genome-wide investigation of microsatellite polymorphism in coding region of the giant panda (Ailuropoda melanoleuca) genome: a resource for study of phenotype diversity and abnormal traits. MAMMAL RES 2019. [DOI: 10.1007/s13364-019-00418-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]
12
Pértille F, Da Silva VH, Johansson AM, Lindström T, Wright D, Coutinho LL, Jensen P, Guerrero-Bosagna C. Mutation dynamics of CpG dinucleotides during a recent event of vertebrate diversification. Epigenetics 2019; 14:685-707. [PMID: 31070073 PMCID: PMC6557589 DOI: 10.1080/15592294.2019.1609868] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/04/2022]
Abstract
DNA methylation in CpG dinucleotides is associated with high mutability and disappearance of CpG sites during evolution. Although the high mutability of CpGs is thought to be relevant for vertebrate evolution, very little is known about the role of CpG-related mutations in the genomic diversification of vertebrates. Our study analysed genetic differences in chickens, between Red Junglefowl (RJF; the closest living relative of the ancestor of domesticated chickens) and domesticated breeds, to identify genomic dynamics that have occurred during the process of their domestication, focusing particularly on CpG-related mutations. Single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) between RJF and these domesticated breeds were assessed in a reduced fraction of their genome. Additionally, DNA methylation in the same fraction of the genome was measured in the sperm of RJF individuals to identify possible correlations with the mutations found between RJF and the domesticated breeds. Our study shows that although the vast majority of CpG-related mutations found relate to CNVs, CpGs associate disproportionately with SNPs in comparison to CNVs, where they are indeed substantially under-represented. Moreover, CpGs seem to be hotspots of mutations related to speciation. We suggest that, on the one hand, CpG-related mutations in CNV regions would promote genomic ‘flexibility’ in evolution, i.e., the ability of the genome to expand its functional possibilities; on the other hand, CpG-related mutations in SNPs would relate to genomic ‘specificity’ in evolution, thus representing mutations that would associate with phenotypic traits relevant for speciation.
Affiliation(s)
- Fábio Pértille
  - Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Vinicius H Da Silva
  - Animal Breeding and Genomics Centre, Wageningen University & Research, Wageningen, The Netherlands
  - Department of Animal Ecology (AnE), Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Anna M Johansson
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Tom Lindström
  - Division of Theoretical Biology, IFM, Linköping University, Linköping, Sweden
- Dominic Wright
  - Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
- Luiz L Coutinho
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Per Jensen
  - Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
- Carlos Guerrero-Bosagna
  - Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
13
de Groot T, Meis JF. Microsatellite Stability in STR Analysis Aspergillus fumigatus Depends on Number of Repeat Units. Front Cell Infect Microbiol 2019; 9:82. [PMID: 30984630 PMCID: PMC6449440 DOI: 10.3389/fcimb.2019.00082] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/21/2018] [Accepted: 03/11/2019] [Indexed: 01/02/2023]
Abstract
More than a decade ago, a short tandem repeat-based typing method was developed for the fungus Aspergillus fumigatus. This STRAf assay is based on the analysis of nine short tandem repeat markers. Interpretation of this STRAf assay is complicated when there are only one or two differences in tandem repeat markers between isolates, as the stability of these markers is unknown. To determine the stability of these nine markers, a STRAf assay was performed on 73–100 successive generations of five clonally expanded A. fumigatus isolates. In a total of 473 generations, we observed five increases of one tandem repeat unit. Three changes were found in the trinucleotide repeat marker STRAf 3A, while the other two were found in the trinucleotide repeat marker STRAf 3C. The di- and tetranucleotide repeats were not altered. The altered STRAf markers 3A and 3C had the highest number of repeat units (≥50) compared to the other markers (≤26). Altogether, we demonstrated that 7 of 9 STRAf markers remain stable for 473 generations and that the frequency of alterations in tandem repeats is positively correlated with the number of repeats. The potential low-level instability of STRAf markers 3A and 3C should be taken into account when interpreting STRAf data during an outbreak.
Affiliation(s)
- Theun de Groot
  - Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital (CWZ), Nijmegen, Netherlands
- Jacques F Meis
  - Department of Medical Microbiology and Infectious Diseases, Canisius Wilhelmina Hospital (CWZ), Nijmegen, Netherlands
  - Centre of Expertise in Mycology, Radboudumc/CWZ, Nijmegen, Netherlands
  - Department of Medical Microbiology, Radboudumc, Nijmegen, Netherlands
14
Ma LS, Wang L, Trippel C, Mendoza-Mendoza A, Ullmann S, Moretti M, Carsten A, Kahnt J, Reissmann S, Zechmann B, Bange G, Kahmann R. The Ustilago maydis repetitive effector Rsp3 blocks the antifungal activity of mannose-binding maize proteins. Nat Commun 2018; 9:1711. [PMID: 29703884 PMCID: PMC5923269 DOI: 10.1038/s41467-018-04149-0] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 06/29/2017] [Accepted: 04/06/2018] [Indexed: 12/22/2022]
Abstract
To cause disease in maize, the biotrophic fungus Ustilago maydis secretes a large arsenal of effector proteins. Here, we functionally characterize the repetitive effector Rsp3 (repetitive secreted protein 3), which shows length polymorphisms in field isolates and is highly expressed during biotrophic stages. Rsp3 is required for virulence and anthocyanin accumulation. During biotrophic growth, Rsp3 decorates the hyphal surface and interacts with at least two secreted maize DUF26-domain family proteins (designated AFP1 and AFP2). AFP1 binds mannose and displays antifungal activity against the rsp3 mutant but not against a strain constitutively expressing rsp3. Maize plants silenced for AFP1 and AFP2 partially rescue the virulence defect of rsp3 mutants, suggesting that blocking the antifungal activity of AFP1 and AFP2 by the Rsp3 effector is an important virulence function. Rsp3 orthologs are present in all sequenced smut fungi, and the ortholog from Sporisorium reilianum can complement the rsp3 mutant of U. maydis, suggesting a novel widespread fungal protection mechanism.
The fungus Ustilago maydis secretes many effector proteins to cause disease in maize. Here, Ma et al. show that the repetitive effector Rsp3 is required for virulence by inhibiting the antifungal activity of two mannose-binding proteins that are secreted by the plant cells.
Collapse
Affiliation(s)
- Lay-Sun Ma
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany
| | - Lei Wang
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany; Department of Pharmacology, Max Planck Institute for Heart and Lung Research, 61231, Bad Nauheim, Germany
| | - Christine Trippel
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany; Department of Plant Cell Biology, Albrecht-von-Haller-Institute, Georg-August-University-Göttingen, 37077, Göttingen, Germany
| | - Artemio Mendoza-Mendoza
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany; Bio-Protection Research Centre, Lincoln University, PO Box 64, Lincoln, 7647, New Zealand
| | - Steffen Ullmann
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany; Düsseldorfer Straße 177, 45481, Mülheim an der Ruhr, Germany
| | - Marino Moretti
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany
| | - Alexander Carsten
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany
| | - Jörg Kahnt
- Mass Spectroscopy Facility, Max Planck Institute for Terrestrial Microbiology, 35043 Marburg, Germany
| | - Stefanie Reissmann
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany
| | - Bernd Zechmann
- Center for Microscopy and Imaging (CMI), Baylor University, Waco, Texas, 76798-7046, USA
| | - Gert Bange
- LOEWE Center for Synthetic Microbiology and Faculty of Chemistry, Philipps-Universität Marburg, 35032 Marburg, Germany
| | - Regine Kahmann
- Department of Organismic Interactions, Max Planck Institute for Terrestrial Microbiology, 35043, Marburg, Germany.
|
15
|
Xiao S, Han Z, Wang P, Han F, Liu Y, Li J, Wang ZY. Functional marker detection and analysis on a comprehensive transcriptome of large yellow croaker by next generation sequencing. PLoS One 2015; 10:e0124432. [PMID: 25909910 PMCID: PMC4409302 DOI: 10.1371/journal.pone.0124432] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/12/2015] [Accepted: 03/15/2015] [Indexed: 01/08/2023] Open
Abstract
Large yellow croaker (Larimichthys crocea) is an economically important fish in China and Eastern Asia. Because of overfishing and overly dense aquaculture, both the wild population and the mariculture of the species face serious challenges from germplasm degeneration and susceptibility to infectious disease agents. However, a comprehensive multi-tissue transcriptome of the species has not been reported, and functional molecular markers have not yet been detected and analyzed. In this work, we applied RNA-seq with the Illumina Hiseq2000 platform to a multi-tissue sample of large yellow croaker and assembled the transcriptome into 88,103 transcripts. Of these, 52,782 transcripts were successfully annotated using the nt/nr, InterPro, GO and KEGG databases. Compared with public fish proteins, we found that 34,576 protein-coding transcripts are shared between large yellow croaker and zebrafish, medaka, pufferfish, and stickleback. For functional markers, we discovered 1,276 polymorphic SSRs and 261,000 SNPs. The functional impact analysis of SNPs showed that the majority (~75%) of small variants cause synonymous mutations in proteins, followed by variations in the 3' UTR region. The functional enrichment analysis showed that transcripts involved in DNA binding, enzyme activities, and signal pathways exhibit markedly fewer single-nucleotide variants, whereas genes for constituents of muscular tissue, the cytoskeleton, and the immune system contain more frequent SNP mutations, which may reflect structural and functional selection on the translated proteins. This is the first work on the high-throughput detection and analysis of functional polymorphic SSR and SNP markers in a comprehensive transcriptome of large yellow croaker. Our study provides valuable transcript sequence and functional marker resources for quantitative trait locus (QTL) identification and molecular selection of the species in the research community.
Affiliation(s)
- Shijun Xiao
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Xiamen, Fujian, China
| | - Zhaofang Han
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Xiamen, Fujian, China
| | - Panpan Wang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Xiamen, Fujian, China
| | - Fang Han
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Xiamen, Fujian, China
| | - Yang Liu
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Xiamen, Fujian, China
| | - Jiongtang Li
- Chinese Academy of Fishery Sciences, Beijing, China
| | - Zhi Yong Wang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Xiamen, Fujian, China
|
16
|
Carlson KD, Sudmant PH, Press MO, Eichler EE, Shendure J, Queitsch C. MIPSTR: a method for multiplex genotyping of germline and somatic STR variation across many individuals. Genome Res 2015; 25:750-61. [PMID: 25659649 PMCID: PMC4417122 DOI: 10.1101/gr.182212.114] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 07/25/2014] [Accepted: 02/05/2015] [Indexed: 12/21/2022]
Abstract
Short tandem repeats (STRs) are highly mutable genetic elements that often reside in regulatory and coding DNA. The cumulative evidence of genetic studies on individual STRs suggests that STR variation profoundly affects phenotype and contributes to trait heritability. Despite recent advances in sequencing technology, STR variation has remained largely inaccessible across many individuals compared to single nucleotide variation or copy number variation. STR genotyping with short-read sequence data is confounded by (1) the difficulty of uniquely mapping short, low-complexity reads; and (2) the high rate of STR amplification stutter. Here, we present MIPSTR, a robust, scalable, and affordable method that addresses these challenges. MIPSTR uses targeted capture of STR loci by single-molecule Molecular Inversion Probes (smMIPs) and a unique mapping strategy. Targeted capture and our mapping strategy resolve the first challenge; the use of single molecule information resolves the second challenge. Unlike previous methods, MIPSTR is capable of distinguishing technical error due to amplification stutter from somatic STR mutations. In proof-of-principle experiments, we use MIPSTR to determine germline STR genotypes for 102 STR loci with high accuracy across diverse populations of the plant A. thaliana. We show that putatively functional STRs may be identified by deviation from predicted STR variation and by association with quantitative phenotypes. Using DNA mixing experiments and a mutant deficient in DNA repair, we demonstrate that MIPSTR can detect low-frequency somatic STR variants. MIPSTR is applicable to any organism with a high-quality reference genome and is scalable to genotyping many thousands of STR loci in thousands of individuals.
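The idea of using single-molecule information to separate amplification stutter from real variation can be illustrated with a minimal Python sketch. This is not the MIPSTR implementation: the input format (pairs of molecule tag and observed repeat count) and the function name `consensus_per_molecule` are assumptions made purely for illustration. Reads sharing a tag are taken to derive from one captured molecule, so a minority repeat count within a tag is treated as stutter, whereas a different count supported by a whole molecule remains a candidate variant.

```python
from collections import Counter, defaultdict


def consensus_per_molecule(reads):
    """Return {molecule_tag: majority repeat count} for (tag, count) reads.

    Hypothetical input format: each read is a (molecule_tag, repeat_count)
    pair, where reads with the same tag come from the same captured molecule.
    """
    counts_by_tag = defaultdict(list)
    for tag, repeat_count in reads:
        counts_by_tag[tag].append(repeat_count)
    # The most frequent repeat count within a tag is used as the consensus;
    # less frequent counts within the same tag are treated as stutter.
    return {
        tag: Counter(counts).most_common(1)[0][0]
        for tag, counts in counts_by_tag.items()
    }


reads = [
    ("m1", 12), ("m1", 12), ("m1", 11),  # one stutter read in molecule m1
    ("m2", 12), ("m2", 12),
    ("m3", 14), ("m3", 14), ("m3", 13),  # m3 consistently supports 14 copies
]
consensus = consensus_per_molecule(reads)
allele_support = Counter(consensus.values())  # molecules per repeat count
```

In this toy input, the 11-copy and 13-copy reads disappear at the molecule level, while the 14-copy allele is still supported by one whole molecule.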
Affiliation(s)
- Keisha D Carlson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Peter H Sudmant
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
17
|
Chen Q, Luo H, Zhang C, Chen YPP. Bioinformatics in protein kinases regulatory network and drug discovery. Math Biosci 2015; 262:147-56. [PMID: 25656386 DOI: 10.1016/j.mbs.2015.01.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/18/2014] [Revised: 01/16/2015] [Accepted: 01/22/2015] [Indexed: 10/24/2022]
Abstract
Protein kinases have been implicated in a number of diseases, in which kinases participate in many aspects of the control of cell growth, movement and death. Deregulated kinase activities and the knowledge of these disorders are of great clinical interest for drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment at lower cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics, including genetic, genomic, mathematical and computational technologies, has become the most promising option for effective drug discovery, and has shown its potential in the early stages of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arising from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets, and describes their potential application in the pharmaceutical industry.
Affiliation(s)
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, 530004, China; State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, China.
| | - Haiqiong Luo
- School of Public Health, Guangxi Medical University, Nanning, 530021, China.
| | - Chengqi Zhang
- Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney P.O. Box 123, Broadway, NSW 2007, Australia.
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Computer Engineering, La Trobe University, Vic 3086, Australia.
|
18
|
Press MO, Carlson KD, Queitsch C. The overdue promise of short tandem repeat variation for heritability. Trends Genet 2014; 30:504-12. [PMID: 25182195 DOI: 10.1016/j.tig.2014.07.008] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 06/12/2014] [Revised: 07/23/2014] [Accepted: 07/24/2014] [Indexed: 12/11/2022]
Abstract
Short tandem repeat (STR) variation has been proposed as a major explanatory factor in the heritability of complex traits in humans and model organisms. However, we still struggle to incorporate STR variation into genotype-phenotype maps. We review here the promise of STRs in contributing to complex trait heritability and highlight the challenges that STRs pose due to their repetitive nature. We argue that STR variants are more likely than single-nucleotide variants to have epistatic interactions, reiterate the need for targeted assays to genotype STRs accurately, and call for more appropriate statistical methods in detecting STR-phenotype associations. Lastly, we suggest that somatic STR variation within individuals may serve as a read-out of disease susceptibility, and is thus potentially a valuable covariate for future association studies.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Keisha D Carlson
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.
|
19
|
Duitama J, Zablotskaya A, Gemayel R, Jansen A, Belet S, Vermeesch JR, Verstrepen KJ, Froyen G. Large-scale analysis of tandem repeat variability in the human genome. Nucleic Acids Res 2014; 42:5728-41. [PMID: 24682812 PMCID: PMC4027155 DOI: 10.1093/nar/gku212] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 01/27/2023] Open
Abstract
Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats.
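As a rough illustration of what "repeated head-to-tail" means computationally, the following Python sketch finds perfect tandem repeats with a backreference regular expression. It is unrelated to the capture technique and filtering protocol reported in the paper; the function name and the default thresholds are hypothetical, and imperfect (mutated) repeats are deliberately not handled.

```python
import re


def find_perfect_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return a list of (start, unit, copies) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    and the backreference requires at least min_copies - 1 further
    head-to-tail copies of that unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Four head-to-tail copies of "AC" starting at 0-based position 2.
hits = find_perfect_tandem_repeats("TTACACACACGG")
```

Because the scan reports the leftmost match, a repeat may be returned in a rotated phase (for example `GCA` rather than `CAG`) when the preceding base happens to fit the unit.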
Affiliation(s)
- Jorge Duitama
- VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium; Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
| | - Alena Zablotskaya
- Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium; Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
| | - Rita Gemayel
- VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium
| | - An Jansen
- VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium; Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium; Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
| | - Stefanie Belet
- Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium; Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
| | - Joris R Vermeesch
- Center for Human Genetics, University Hospitals Leuven, KU Leuven, B-3000 Leuven, Belgium
| | - Kevin J Verstrepen
- VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium
| | - Guy Froyen
- Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium; Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
|
20
|
Alam CM, Singh AK, Sharfuddin C, Ali S. Incidence, complexity and diversity of simple sequence repeats across potexvirus genomes. Gene 2014; 537:189-96. [DOI: 10.1016/j.gene.2014.01.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/17/2013] [Revised: 11/15/2013] [Accepted: 01/04/2014] [Indexed: 01/18/2023]
|
21
|
In-silico analysis of simple and imperfect microsatellites in diverse tobamovirus genomes. Gene 2013; 530:193-200. [DOI: 10.1016/j.gene.2013.08.046] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/13/2013] [Revised: 08/10/2013] [Accepted: 08/13/2013] [Indexed: 11/20/2022]
|
22
|
Churbanov A, Ryan R, Hasan N, Bailey D, Chen H, Milligan B, Houde P. HighSSR: high-throughput SSR characterization and locus development from next-gen sequencing data. Bioinformatics 2012; 28:2797-803. [PMID: 22954626 DOI: 10.1093/bioinformatics/bts524] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Abstract
Motivation: Microsatellites are among the most useful genetic markers in population biology. High-throughput sequencing of microsatellite-enriched libraries dramatically expedites the traditional process of screening recombinant libraries for microsatellite markers. However, sorting through millions of reads to distill high-quality polymorphic markers requires special algorithms tailored to tolerate sequencing errors in locus reconstruction, distinguish paralogous loci, rarify raw reads originating from the same amplicon and sort out various artificial fragments resulting from recombination or concatenation of auxiliary adapters. Existing programs warrant improvement. Results: We describe a microsatellite prediction framework named HighSSR for microsatellite genotyping based on high-throughput sequencing. We demonstrate the utility of HighSSR in comparison to Roche gsAssembler on two Roche 454 GS FLX runs. The majority of the HighSSR-assembled loci were reliably mapped against model organism reference genomes. HighSSR demultiplexes pooled libraries, assesses locus polymorphism and implements Primer3 for the design of PCR primers flanking polymorphic microsatellite loci. As sequencing costs drop and permit the analysis of all project samples on next-generation platforms, this framework can also be used for direct simple sequence repeats genotyping. Availability: http://code.google.com/p/highssr/
Affiliation(s)
- Alexander Churbanov
- New Mexico State University, Biology Department, MSC 3AF, PO Box 30001, Las Cruces, NM 88003, USA.
|
23
|
Gemayel R, Cho J, Boeynaems S, Verstrepen KJ. Beyond junk: variable tandem repeats as facilitators of rapid evolution of regulatory and coding sequences. Genes (Basel) 2012; 3:461-80. [PMID: 24704980 PMCID: PMC3899988 DOI: 10.3390/genes3030461] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 06/25/2012] [Revised: 07/19/2012] [Accepted: 07/21/2012] [Indexed: 01/19/2023] Open
Abstract
Copy Number Variations (CNVs) and Single Nucleotide Polymorphisms (SNPs) have been the major focus of most large-scale comparative genomics studies to date. Here, we discuss a third, largely ignored, type of genetic variation, namely changes in tandem repeat number. Historically, tandem repeats have been designated as non functional “junk” DNA, mostly as a result of their highly unstable nature. With the exception of tandem repeats involved in human neurodegenerative diseases, repeat variation was often believed to be neutral with no phenotypic consequences. Recent studies, however, have shown that as many as 10% to 20% of coding and regulatory sequences in eukaryotes contain an unstable repeat tract. Contrary to initial suggestions, tandem repeat variation can have useful phenotypic consequences. Examples include rapid variation in microbial cell surface, tuning of internal molecular clocks in flies and the dynamic morphological plasticity in mammals. As such, tandem repeats can be useful functional elements that facilitate evolvability and rapid adaptation.
Affiliation(s)
- Rita Gemayel
- Laboratory for Systems Biology, VIB, Gaston Geenslaan 1, B-3001 Heverlee, Belgium.
| | - Janice Cho
- Laboratory for Systems Biology, VIB, Gaston Geenslaan 1, B-3001 Heverlee, Belgium.
| | - Steven Boeynaems
- Laboratory for Systems Biology, VIB, Gaston Geenslaan 1, B-3001 Heverlee, Belgium.
| | - Kevin J Verstrepen
- Laboratory for Systems Biology, VIB, Gaston Geenslaan 1, B-3001 Heverlee, Belgium.
|
24
|
Pellegrini M, Renda ME, Vecchio A. Tandem repeats discovery service (TReaDS) applied to finding novel cis-acting factors in repeat expansion diseases. BMC Bioinformatics 2012; 13 Suppl 4:S3. [PMID: 22536970 PMCID: PMC3303744 DOI: 10.1186/1471-2105-13-s4-s3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022] Open
Abstract
Background: Tandem repeats are multiple duplications of substrings in the DNA that occur contiguously, or at a short distance, and may involve some mutations (such as substitutions, insertions, and deletions). Tandem repeats have also been extensively studied for their association with the class of repeat expansion diseases (mostly affecting the nervous system). Comparative studies on the output of different tools for finding tandem repeats highlighted significant differences among the sets of detected tandem repeats, while many authors pointed out how critical the right choice of parameters is. Results: In this paper we present TReaDS (Tandem Repeats Discovery Service), a tandem repeat meta search engine. TReaDS forwards user requests to several state-of-the-art tools for finding tandem repeats and merges their outcome into a single report, providing a global, synthetic, and comparative view of the results. In particular, TReaDS allows the user to (i) simultaneously run different algorithms on the same data set, (ii) choose for each algorithm a different setting of parameters, and (iii) obtain a report that can be downloaded for further, off-line, investigations. We used TReaDS to investigate sequences associated with repeat expansion diseases. Conclusions: Using TReaDS, we discover that, for 27 repeat expansion diseases out of a currently known set of 29, long fuzzy tandem repeats cover the expansion loci. Tests with control sets confirm the specificity of this association. This finding suggests that long fuzzy tandem repeats can be a new class of cis-acting elements involved in the mechanisms leading to the expansion instability.
We strongly believe that biologists will be interested in a tool that not only lets them use multiple search algorithms at the same time, with the same effort needed to use just one of the systems, but also eases the burden of comparing and merging the results, thus expanding our capabilities in detecting important phenomena related to tandem repeats.
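The merging step of such a meta search can be sketched in Python as follows. This is only an illustration under stated assumptions, not the way TReaDS builds its report: the tool names, the half-open `(start, end)` coordinate convention, and the function `merge_repeat_calls` are all hypothetical. Overlapping calls from different tools are collapsed into one region, and each region records which tools support it.

```python
def merge_repeat_calls(calls_by_tool):
    """Merge overlapping repeat calls from several tools.

    calls_by_tool maps a (hypothetical) tool name to a list of half-open
    (start, end) intervals. Returns a list of dicts with the merged
    coordinates and the set of tools that reported an overlapping call.
    """
    tagged = sorted(
        (start, end, tool)
        for tool, calls in calls_by_tool.items()
        for start, end in calls
    )
    merged = []
    for start, end, tool in tagged:
        if merged and start < merged[-1]["end"]:
            # Overlaps the current region: extend it and record the tool.
            merged[-1]["end"] = max(merged[-1]["end"], end)
            merged[-1]["tools"].add(tool)
        else:
            merged.append({"start": start, "end": end, "tools": {tool}})
    return merged


calls = {
    "toolA": [(10, 50), (200, 240)],
    "toolB": [(40, 80)],
    "toolC": [(300, 320)],
}
regions = merge_repeat_calls(calls)
```

Regions supported by a single tool are the ones where parameter choice or algorithm differences matter most, which is the kind of comparison the abstract describes.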
Affiliation(s)
- Marco Pellegrini
- Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa I-56124, Italy
|
25
|
Chen M, Tan Z, Zeng G. Microsatellite is an important component of complete Hepatitis C virus genomes. Infect Genet Evol 2011; 11:1646-54. [DOI: 10.1016/j.meegid.2011.06.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 01/11/2011] [Revised: 06/02/2011] [Accepted: 06/16/2011] [Indexed: 12/15/2022]
|
26
|
Calvo-Bado LA, Green LE, Medley GF, Ul-Hassan A, Grogono-Thomas R, Buller N, Kaler J, Russell CL, Kennan RM, Rood JI, Wellington EMH. Detection and diversity of a putative novel heterogeneous polymorphic proline-glycine repeat (Pgr) protein in the footrot pathogen Dichelobacter nodosus. Vet Microbiol 2011; 147:358-66. [PMID: 20655152 DOI: 10.1016/j.vetmic.2010.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/10/2009] [Revised: 06/23/2010] [Accepted: 06/25/2010] [Indexed: 11/29/2022]
Abstract
Dichelobacter nodosus, a Gram-negative anaerobic bacterium, is the essential causative agent of footrot in sheep. Currently, depending on the clinical presentation in the field, footrot is described as benign or virulent; D. nodosus strains have also been classified as benign or virulent, but this designation is not always consistent with clinical disease. The aim of this study was to determine the diversity of the pgr gene, which encodes a putative proline-glycine repeat protein (Pgr). The pgr gene was present in all 100 isolates of D. nodosus that were examined and, based on sequence analysis, had two variants, pgrA and pgrB. In pgrA, there were two coding tandem repeat regions, R1 and R2; different strains had variable numbers of repeats within these regions. R1 and R2 were absent from pgrB. Both variants were present in strains from Australia, Sweden and the UK; however, only pgrB was detected in isolates from Western Australia. The pgrA gene was detected in D. nodosus from tissue samples from two flocks in the UK with virulent footrot, and only pgrB was detected in a flock with no virulent or benign footrot for >10 years. Bioinformatic analysis of the putative PgrA protein indicated that it contained a collagen-like cell surface anchor motif. These results suggest that the pgr gene may be a useful molecular marker for epidemiological studies.
Affiliation(s)
- Leo A Calvo-Bado
- Department of Biological Sciences, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK.
|
27
|
Pellegrini M, Renda ME, Vecchio A. TRStalker: an efficient heuristic for finding fuzzy tandem repeats. Bioinformatics 2010; 26:i358-66. [PMID: 20529928 PMCID: PMC2881393 DOI: 10.1093/bioinformatics/btq209] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023]
Abstract
Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for TRs with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events. Results: We have developed an algorithm (christened TRStalker) with the aim of efficiently detecting TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the ‘generalized median string’ that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than the current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences. Availability: TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it. Contact: marco.pellegrini@iit.cnr.it Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marco Pellegrini
- CNR, Istituto di Informatica e Telematica, Via Moruzzi 1, 56124 Pisa, Italy.
|
28
|
Tan JC, Tan A, Checkley L, Honsa CM, Ferdig MT. Variable numbers of tandem repeats in Plasmodium falciparum genes. J Mol Evol 2010; 71:268-78. [PMID: 20730584 DOI: 10.1007/s00239-010-9381-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 12/10/2009] [Accepted: 08/09/2010] [Indexed: 11/29/2022]
Abstract
Genome variation studies in Plasmodium falciparum have focused on SNPs and, more recently, large-scale copy number polymorphisms and ectopic rearrangements. Here, we examine another source of variation: variable number tandem repeats (VNTRs). Interspersed low complexity features, including the well-studied P. falciparum microsatellite sequences, are commonly classified as VNTRs; however, this study is focused on longer coding VNTR polymorphisms, a small class of copy number variations. Selection against frameshift mutation is a main constraint on tandem repeats (TRs) in coding regions, while limited propagation of TRs longer than 975 nt total length is a minor restriction in coding regions. Comparative analysis of three P. falciparum genomes reveals that more than 9% of all P. falciparum ORFs harbor VNTRs, much more than has been reported for any other species. Moreover, genotyping of VNTR loci in a drug-selected line, progeny of a genetic cross, and 334 field isolates demonstrates broad variability in these sequences. Functional enrichment analysis of ORFs harboring VNTRs identifies stress and DNA damage responses along with chromatin modification activities, suggesting an influence on genome mutability and functional variation. Analysis of the repeat units and their flanking regions in both P. falciparum and Plasmodium reichenowi sequences implicates a replication slippage mechanism in the generation of TRs from an initially unrepeated sequence. VNTRs can contribute to rapid adaptation by localized sequence duplication. They also can confound SNP-typing microarrays or mapping short-sequence reads and therefore must be accounted for in such analyses.
Affiliation(s)
- John C Tan
- The Eck Institute for Global Health, University of Notre Dame, 100 Galvin Life Sciences, Notre Dame, IN, 46556, USA.
|
29
|
Abstract
Single nucleotide polymorphisms (SNPs) are widely distributed in the human genome and although most SNPs are the result of independent point-mutations, there are exceptions. When studying distances between SNPs, a periodic pattern in the distance between pairs of identical SNPs has been found to be heavily correlated with periodicity in short tandem repeats (STRs). STRs are short DNA segments, widely distributed in the human genome and mainly found outside known tandem repeats. Because of the biased occurrence of SNPs, special care has to be taken when analyzing SNP-variation in STRs. We present a review of STRs in the human genome and discuss molecular mechanisms related to the biased occurrence of SNPs in STRs, and its implications for genome comparisons and genetic association studies.
Affiliation(s)
- Bo Eskerod Madsen
- AgroTech, Institute for Agri Technology and Food Innovation, Aarhus N, Denmark
30
Cruz F, Roux J, Robinson-Rechavi M. The expansion of amino-acid repeats is not associated to adaptive evolution in mammalian genes. BMC Genomics 2009; 10:619. [PMID: 20021652 PMCID: PMC2806350 DOI: 10.1186/1471-2164-10-619] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/01/2009] [Accepted: 12/18/2009] [Indexed: 01/22/2023]
Abstract
Background The expansion of amino acid repeats is determined by a high mutation rate and can be increased or limited by selection. It has been suggested that recent expansions could be associated with the potential of adaptation to new environments. In this work, we quantify the strength of this association, as well as the contribution of potential confounding factors. Results Mammalian positively selected genes have accumulated more recent amino acid repeats than other mammalian genes. However, we found little support for an accelerated evolutionary rate as the main driver for the expansion of amino acid repeats. The most significant predictors of amino acid repeats are gene function and GC content. There is no correlation with expression level. Conclusions Our analyses show that amino acid repeat expansions are causally independent from protein adaptive evolution in mammalian genomes. Relaxed purifying selection or positive selection do not associate with more or more recent amino acid repeats. Their occurrence is slightly favoured by the sequence context but mainly determined by the molecular function of the gene.
Affiliation(s)
- Fernando Cruz
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
31
Verstrepen KJ, Fink GR. Genetic and epigenetic mechanisms underlying cell-surface variability in protozoa and fungi. Annu Rev Genet 2009; 43:1-24. [PMID: 19640229 DOI: 10.1146/annurev-genet-102108-134156] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Indexed: 11/09/2022]
Abstract
Eukaryotic microorganisms have evolved ingenious mechanisms to generate variability at their cell surface, permitting differential adherence, rapid adaptation to changing environments, and evasion of immune surveillance. Fungi such as Saccharomyces cerevisiae and the pathogen Candida albicans carry a family of mucin and adhesin genes that allow adhesion to various surfaces and tissues. Trypanosoma cruzi, T. brucei, and Plasmodium falciparum likewise contain large arsenals of different cell surface adhesion genes. In both yeasts and protozoa, silencing and differential expression of the gene family results in surface variability. Here, we discuss unexpected similarities in the structure and genomic location of the cell surface genes, the role of repeated DNA sequences, and the genetic and epigenetic mechanisms, all of which contribute to the remarkable cell surface variability in these highly divergent microbes.
32
Gibbons JG, Rokas A. Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes. Mol Biol Evol 2008; 26:591-602. [PMID: 19056904 DOI: 10.1093/molbev/msn277] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in 10 genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication, and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs were evolutionarily labile, their functional associations appeared to be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface-associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.
Affiliation(s)
- John G Gibbons
- Department of Biological Sciences, Vanderbilt University, Nashville, USA
33
Usdin K. The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome Res 2008; 18:1011-9. [PMID: 18593815 DOI: 10.1101/gr.070409.107] [Citation(s) in RCA: 151] [Impact Index Per Article: 9.4] [Indexed: 12/13/2022]
Abstract
Tandem repeats are common features of both prokaryote and eukaryote genomes, where they can be found not only in intergenic regions but also in both the noncoding and coding regions of a variety of different genes. The repeat expansion diseases are a group of human genetic disorders caused by long and highly polymorphic tandem repeats. These disorders provide many examples of the effects that such repeats can have on many biological processes. While repeats in the coding sequence can result in the generation of toxic or malfunctioning proteins, noncoding repeats can also have significant effects including the generation of chromosome fragility, the silencing of the genes in which they are located, the modulation of transcription and translation, and the sequestering of proteins involved in processes such as splicing and cell architecture.
Affiliation(s)
- Karen Usdin
- Section on Gene Structure and Disease, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes, Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0830, USA.
34
Madsen BE, Villesen P, Wiuf C. Short tandem repeats in human exons: a target for disease mutations. BMC Genomics 2008; 9:410. [PMID: 18789129 PMCID: PMC2543027 DOI: 10.1186/1471-2164-9-410] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 03/12/2008] [Accepted: 09/12/2008] [Indexed: 11/30/2022]
Abstract
BACKGROUND In recent years it has been demonstrated that structural variations, such as indels (insertions and deletions), are common throughout the genome, but the implications of structural variations are still not clearly understood. Long tandem repeats (e.g. microsatellites or simple repeats) are known to be hypermutable (indel-rich), but are rare in exons and only occasionally associated with diseases. Here we focus on short (imperfect) tandem repeats (STRs), which fall below the radar of conventional tandem repeat detection, and investigate whether STRs are targets for disease-related mutations in human exons. In particular, we test whether they share the hypermutability of the longer tandem repeats and whether disease-related genes have a higher STR content than non-disease-related genes. RESULTS We show that validated human indels are extremely common in STR regions compared to non-STR regions. In contrast to longer tandem repeats, STRs as we define them are present in exons of most known human genes (92%); 99% of all STR sequences in exons are shorter than 33 base pairs, and 62% of all STR sequences are imperfect repeats. We also demonstrate that STRs are significantly overrepresented in disease-related genes in both human and mouse. These results are preserved when we limit the analysis to STRs outside known longer tandem repeats. CONCLUSION Based on our findings we conclude that STRs represent hypermutable regions in the human genome that are linked to human disease. In addition, STRs constitute an obvious target when screening for rare mutations, because of the relatively low amount of STRs in exons (1,973,844 bp) and the limited length of STR regions.
Affiliation(s)
- Bo Eskerod Madsen
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
- Palle Villesen
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
- Carsten Wiuf
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
35
Merkel A, Gemmell N. Detecting short tandem repeats from genome data: opening the software black box. Brief Bioinform 2008; 9:355-66. [PMID: 18621747 DOI: 10.1093/bib/bbn028] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/23/2023]
Abstract
Short tandem repeats, specifically microsatellites, are widely used genetic markers, associated with human genetic diseases, and play an important role in various regulatory mechanisms and evolution. Despite their importance, much is yet unknown about their mutational dynamics. The increasing availability of genome data has led to several in silico studies of microsatellite evolution which have produced a vast range of algorithms and software for tandem repeat detection. Documentation of these tools is often sparse, or provided in a format that is impenetrable to most biologists without informatics background. This article introduces the major concepts behind repeat detecting software essential for informed tool selection. We reflect on issues such as parameter settings and program bias, as well as redundancy filtering and efficiency using examples from the currently available range of programs, to provide an integrated comparison and practical guide to microsatellite detecting programs.
Affiliation(s)
- Angelika Merkel
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8041, New Zealand.
36
O'Dushlaine CT, Shields DC. Marked variation in predicted and observed variability of tandem repeat loci across the human genome. BMC Genomics 2008; 9:175. [PMID: 18416815 PMCID: PMC2364633 DOI: 10.1186/1471-2164-9-175] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 11/08/2007] [Accepted: 04/16/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2-12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome. RESULTS We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation rho = 0.29, p < 0.0005) better than the correlation between the CEPH and WGS data (rho = 0.17), presumably because the model smoothes noise from small sample sizes. A multivariate logistic model of 8 parameters accounted for 36% of the variation in the WGS data. Validation studies of 70 experimentally investigated TRs revealed high concordance with the model's predictions (p < 0.0001). CONCLUSION Variability among 2-12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y (likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter) and excesses in chromosomes 6, 13, 20 and 21.
Affiliation(s)
- Colm T O'Dushlaine
- Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin 2, Ireland.
37
Srivastava J, Premi S, Kumar S, Ali S. Organization and differential expression of the GACA/GATA tagged somatic and spermatozoal transcriptomes in Buffalo Bubalus bubalis. BMC Genomics 2008; 9:132. [PMID: 18366692 PMCID: PMC2346481 DOI: 10.1186/1471-2164-9-132] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 01/16/2008] [Accepted: 03/20/2008] [Indexed: 01/06/2023]
Abstract
BACKGROUND Simple sequence repeats (SSRs) of GACA/GATA have been implicated in the differentiation of sex chromosomes and in speciation. However, the organization of these repeats within genomes and transcriptomes, even in the best-characterized organisms including human, remains unclear. The main objective of this study was to explore the buffalo transcriptome for its association with GACA/GATA repeats, and to study the structural organization and differential expression of the GACA/GATA repeat-tagged transcripts. Moreover, the distribution of GACA and GATA repeats in prokaryotic and eukaryotic genomes was studied to highlight their significance in genome evolution. RESULTS We explored several genomes and transcriptomes and observed a total absence of these repeats in prokaryotes, with their gradual accumulation in higher eukaryotes. Further, employing a novel microsatellite-associated sequence amplification (MASA) approach using oligos of varying length based on GACA and GATA repeats, we identified and characterized 44 types of known and novel mRNA transcripts tagged with these repeats from different somatic tissues, gonads and spermatozoa of the water buffalo Bubalus bubalis. GACA was found to be associated with a higher number of transcripts than GATA. The exclusive presence of several GACA-tagged transcripts in a tissue or spermatozoa, and the absence of the GATA-tagged ones in lung/heart, highlight their tissue-specific significance. Of all the GACA/GATA-tagged transcripts, approximately 30% demonstrated inter-tissue and/or tissue-spermatozoal sequence polymorphisms. Significantly, approximately 60% of the GACA-tagged and all the GATA-tagged transcripts showed highest or unique expression in the testis and/or spermatozoa. Moreover, approximately 75% of the GACA-tagged and all the GATA-tagged transcripts were found to be conserved across species. CONCLUSION The present study is a pioneering attempt to explore the GACA/GATA-tagged transcriptome in any mammalian species, highlighting its tissue-, stage- and species-specific expression profiles. Comparative analysis suggests the gradual accumulation of these repeats in the higher eukaryotes, and establishes the GACA richness of the buffalo transcriptome. This is envisaged to establish the roles of integral simple sequence repeats and tagged transcripts in gene expression or regulation.
Affiliation(s)
- Jyoti Srivastava
- Molecular Genetics Laboratory, National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi-110 067, India
- Sanjay Premi
- Molecular Genetics Laboratory, National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi-110 067, India
- Sudhir Kumar
- Molecular Genetics Laboratory, National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi-110 067, India
- Sher Ali
- Molecular Genetics Laboratory, National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi-110 067, India
38
Legendre M, Pochet N, Pak T, Verstrepen KJ. Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Res 2007; 17:1787-96. [PMID: 17978285 DOI: 10.1101/gr.6554007] [Citation(s) in RCA: 145] [Impact Index Per Article: 8.5] [Indexed: 11/25/2022]
Abstract
Variable tandem repeats are frequently used for genetic mapping, genotyping, and forensics studies. Moreover, variation in some repeats underlies rapidly evolving traits or certain diseases. However, mutation rates vary greatly from repeat to repeat, and as a consequence, not all tandem repeats are suitable genetic markers or interesting unstable genetic modules. We developed a model, "SERV," that predicts the variability of a broad range of tandem repeats in a wide range of organisms. The nonlinear model uses three basic characteristics of the repeat (number of repeated units, unit length, and purity) to produce a numeric "VARscore" that correlates with repeat variability. SERV was experimentally validated using a large set of different artificial repeats located in the Saccharomyces cerevisiae URA3 gene. Further in silico analysis shows that SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. Using SERV, we demonstrate significant enrichment of variable repeats within human genes involved in transcriptional regulation, chromatin remodeling, morphogenesis, and neurogenesis. Moreover, SERV allows identification of known and candidate genes involved in repeat-based diseases. In addition, we demonstrate the use of SERV for the selection and comparison of suitable variable repeats for genotyping and forensic purposes. Our analysis indicates that tandem repeats used for genotyping should have a VARscore between 1 and 3. SERV is publicly available from http://hulsweb1.cgr.harvard.edu/SERV/.
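The three repeat characteristics named in this abstract (number of repeated units, unit length, and purity) can be illustrated with a minimal Python sketch. The function name `repeat_features` and the definition of purity used here (fraction of positions that agree with a perfect array of the same length built from the unit) are assumptions made for illustration; the abstract does not give SERV's exact purity definition or the formula for the VARscore, so neither is reproduced.

```python
def repeat_features(array: str, unit: str) -> dict:
    """Compute three descriptive features of a tandem repeat array.

    Hypothetical helper: purity is approximated as the fraction of
    positions matching a perfect array of the same length. Only
    substitutions are handled; indels would shift the alignment.
    """
    if not array or not unit:
        raise ValueError("array and unit must be non-empty")
    array = array.upper()
    unit = unit.upper()
    unit_length = len(unit)
    # Number of units may be fractional when the array ends mid-unit.
    n_units = len(array) / unit_length
    # Build a perfect array of identical length for comparison.
    perfect = (unit * (len(array) // unit_length + 1))[: len(array)]
    matches = sum(a == b for a, b in zip(array, perfect))
    return {
        "unit_length": unit_length,
        "n_units": n_units,
        "purity": matches / len(array),
    }


# Four CAG units with one substitution (G -> T) in the third unit.
features = repeat_features("CAGCAGCATCAG", "CAG")
print(features)
```

A model such as the one described would take values like these as inputs; how they are combined into a score is not stated in the abstract.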
Affiliation(s)
- Matthieu Legendre
- FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts 02138, USA
39
Scheffel A, Schüler D. The acidic repetitive domain of the Magnetospirillum gryphiswaldense MamJ protein displays hypervariability but is not required for magnetosome chain assembly. J Bacteriol 2007; 189:6437-46. [PMID: 17601786 PMCID: PMC1951895 DOI: 10.1128/jb.00421-07] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.7] [Indexed: 11/20/2022]
Abstract
Magnetotactic bacteria navigate along the earth's magnetic field using chains of magnetosomes, which are intracellular organelles comprising membrane-enclosed magnetite crystals. The assembly of highly ordered magnetosome chains is under genetic control and involves several specific proteins. Based on genetic and cryo-electron tomography studies, a model was recently proposed in which the acidic MamJ magnetosome protein attaches magnetosome vesicles to the actin-like cytoskeletal filament formed by MamK, thereby preventing magnetosome chains from collapsing. However, the exact functions as well as the mode of interaction between MamK and MamJ are unknown. Here, we demonstrate that several functional MamJ variants from Magnetospirillum gryphiswaldense and other magnetotactic bacteria share an acidic and repetitive central domain, which displays an unusual intra- and interspecies sequence polymorphism, probably caused by homologous recombination between identical copies of Glu- and Pro-rich repeats. Surprisingly, mamJ mutant alleles in which the central domain was deleted retained their potential to restore chain formation in a ΔmamJ mutant, suggesting that the acidic domain is not essential for MamJ's function. Results of two-hybrid experiments indicate that MamJ physically interacts with MamK, and two distinct sequence regions within MamJ were shown to be involved in binding to MamK. Mutant variants of MamJ lacking either of the binding domains were unable to functionally complement the ΔmamJ mutant. In addition, two-hybrid experiments suggest both MamK-binding domains of MamJ confer oligomerization of MamJ. In summary, our data reveal domains required for the functions of the MamJ protein in chain assembly and maintenance and provide the first experimental indications for a direct interaction between MamJ and the cytoskeletal filament protein MamK.
Affiliation(s)
- André Scheffel
- Max Planck Institute for Marine Microbiology, Bremen, Germany
40
Crane CF. Patterned sequence in the transcriptome of vascular plants. BMC Genomics 2007; 8:173. [PMID: 17573970 PMCID: PMC1940011 DOI: 10.1186/1471-2164-8-173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/05/2006] [Accepted: 06/15/2007] [Indexed: 12/17/2022]
Abstract
BACKGROUND Microsatellites (repeated subsequences based on motifs of one to six nucleotides) are widely used as codominant genetic markers because of their frequent polymorphism and relative selective neutrality. Minisatellites are repeats of motifs having seven or more nucleotides. The large number of EST sequences now available in public databases offers an opportunity to compare microsatellite and minisatellite properties and evaluate their evolution over a broad range of plant taxa. RESULTS Repeated motifs from one to 250 nucleotides long were identified in 6793306 expressed sequence tags (ESTs) from 88 genera of vascular plants, using a custom data-processing pipeline that allowed limited variation among repeats. The pipeline processed trimmed but otherwise unfiltered sequence and output nonredundant loci of at least 15 nucleotides, with degree of polymorphism and PCR primers wherever possible. Motifs that were an integral multiple of three in length were more abundant and richer in G/C than other motifs. From 80 to 85% of minisatellite motifs represented repeats within proteins, up to the 228-nucleotide repeat of ubiquitin, but not all of these repeats preserved reading frame. The remaining 15 to 20% of minisatellite motifs were associated with transcribed repetitive elements, e.g., retrotransposons. Relative microsatellite motif frequencies did not correlate tightly to phylogenetic relationship. Evolution of increased microsatellite and EST GC content was evident within the grasses. Microsatellites were less frequent in the transcriptome of genera with large genomes, but there was no evidence for greater dilution of the transcriptome with transposable element transcripts in these genera. CONCLUSION The relatively low correlation of microsatellite spectrum to phylogeny suggests that repeat loci evolve more rapidly than the surrounding sequence, although tissue specificity of the different EST libraries is a complicating factor. 
In-frame motifs are more abundant and higher in GC than frame-shifting motifs, but most EST minisatellite loci appear to represent repeats in translated sequence, regardless of whether reading frame is preserved. Motifs of four to six nucleotides are as polymorphic in EST collections as the commonly used motifs of two and three nucleotides, and they can be exploited as genetic markers with little additional effort.
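The size classes used in this abstract (microsatellite motifs of one to six nucleotides, minisatellite motifs of seven or more, and in-frame motifs whose length is a multiple of three) can be sketched in Python as follows. The function names and the regex-based scan for perfect repeats are assumptions for illustration only; the custom pipeline described above tolerated limited variation among repeats and is not reproduced here. The 15-nucleotide minimum locus length is taken from the abstract.

```python
import re


def classify_motif(motif: str) -> dict:
    """Classify a repeat motif by length, following the abstract's size classes."""
    n = len(motif)
    if n < 1:
        raise ValueError("motif must be non-empty")
    return {
        "length": n,
        "class": "microsatellite" if n <= 6 else "minisatellite",
        "in_frame": n % 3 == 0,  # a multiple of three preserves reading frame
    }


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_repeats(seq: str, max_motif: int = 6, min_locus: int = 15):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Illustrative only: exact matches, no tolerance for mismatches or indels.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            locus_len = len(m.group(0))
            # Skip short loci and motifs already covered by a shorter unit.
            if locus_len >= min_locus and is_primitive(motif):
                hits.append((m.start(), motif, locus_len // k))
    return hits


print(classify_motif("CAG"))
print(find_perfect_repeats("TT" + "CAG" * 5 + "TT"))
```

The primitivity check avoids reporting the same locus twice, for example a mononucleotide run also matching as a dinucleotide repeat.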
Affiliation(s)
- Charles F Crane
- Agricultural Research Service, United States Department of Agriculture, and Department of Botany and Plant Pathology, Purdue University, 915 W. State St, West Lafayette, Indiana 47907-2054, USA.
41
O'Dushlaine CT, Shields DC. Tools for the identification of variable and potentially variable tandem repeats. BMC Genomics 2006; 7:290. [PMID: 17107618 PMCID: PMC1654160 DOI: 10.1186/1471-2164-7-290] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/25/2006] [Accepted: 11/15/2006] [Indexed: 11/28/2022]
Abstract
Background Tandem repeat arrays showing variation between sequences within a population, between strains or across species may have functional effects. The increasing availability of genomic sequence data makes routine description of observed variation possible, creating a need for tools to describe such variability. Results We present a set of programs that facilitate the identification of tandem repeats showing variation across multiple sequences or genomes, and the prediction of potentially polymorphic tandem repeats. The VNTRfinder (Variable Number of Tandem Repeats finder) program enables the detection of sequence length variation between arrays of inter-specific or intra-specific tandem repeats. In the absence of comparable sequences to explore observed variation, predictions are provided describing which tandem repeats are more likely to be variable, to help guide and focus further experimental evaluation. Conclusion These tools represent a resource for researchers interested in tandem repeats in nucleotide sequences that are most likely to be of clinical and evolutionary interest. The tools are available at . Downloadable versions for UNIX/LINUX and WINDOWS which permit the consideration of longer and more numerous sequences are also available.
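The idea of detecting length variation in a tandem repeat array across sequences can be illustrated with a small Python sketch. This is not VNTRfinder's implementation, whose algorithm the abstract does not describe; the function names, the use of exact regex matching, and the example strain labels and sequences are all hypothetical.

```python
import re


def copy_number(seq: str, motif: str) -> int:
    """Longest run of perfect tandem copies of motif in seq (0 if absent)."""
    runs = re.findall(r"(?:%s)+" % re.escape(motif.upper()), seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def compare_copy_numbers(seqs: dict, motif: str):
    """Return per-sequence copy numbers and whether they differ.

    seqs maps a label (e.g. a strain name) to its sequence.
    """
    counts = {name: copy_number(s, motif) for name, s in seqs.items()}
    variable = len(set(counts.values())) > 1
    return counts, variable


# Hypothetical example: the same locus in two strains.
example = {
    "strainA": "GG" + "ACT" * 4 + "TT",
    "strainB": "GG" + "ACT" * 6 + "TT",
}
print(compare_copy_numbers(example, "ACT"))
```

A real comparison would also need to confirm that the arrays being compared are at the same locus, for instance by matching flanking sequence, which this sketch omits.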
Affiliation(s)
- Colm T O'Dushlaine
- Bioinformatics Core, Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Ireland
- Denis C Shields
- UCD Conway Institute, University College Dublin, Dublin 4, Ireland
42
Voynov V, Verstrepen KJ, Jansen A, Runner VM, Buratowski S, Fink GR. Genes with internal repeats require the THO complex for transcription. Proc Natl Acad Sci U S A 2006; 103:14423-8. [PMID: 16983072 PMCID: PMC1599979 DOI: 10.1073/pnas.0606546103] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
The evolutionarily conserved multisubunit THO complex, which is recruited to actively transcribed genes, is required for the efficient expression of FLO11 and other yeast genes that have long internal tandem repeats. FLO11 transcription elongation in Tho- mutants is hindered in the region of the tandem repeats, resulting in a loss of function. Moreover, the repeats become genetically unstable in Tho- mutants. A FLO11 gene without the tandem repeats is transcribed equally well in Tho+ or Tho- strains. The Tho- defect in transcription is suppressed by overexpression of topoisomerase I, suggesting that the THO complex functions to rectify aberrant structures that arise during transcription.
Affiliation(s)
- Vladimir Voynov
- *Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Kevin J. Verstrepen
- Bauer Center for Genomics Research, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138
- Department of Microbial and Molecular Systems, Faculty of Bioscience Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 22, B-3001 Leuven, Belgium
- An Jansen
- *Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142
- Vanessa M. Runner
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 240 Longwood Avenue, Boston, MA 02115
- Stephen Buratowski
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 240 Longwood Avenue, Boston, MA 02115
- Gerald R. Fink
- *Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- To whom correspondence should be addressed. E-mail:
43
Mularoni L, Guigó R, Albà MM. Mutation patterns of amino acid tandem repeats in the human proteome. Genome Biol 2006; 7:R33. [PMID: 16640792 PMCID: PMC1557989 DOI: 10.1186/gb-2006-7-4-r33] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 02/03/2006] [Revised: 03/17/2006] [Accepted: 03/23/2006] [Indexed: 12/19/2022]
Abstract
BACKGROUND Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs). RESULTS We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis showed that at least about 21% of the repeats might be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a five-fold increase in the number of gap mutations compared to adjacent regions, reflecting the action of slippage within the repetitive structures. Gap and substitution mutations were very differently distributed between different amino acid repeat types. Among repeats containing gap variants we identified several disease and candidate disease genes. CONCLUSION This is the first report at a genome-wide scale of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be potentially involved in disease.
Affiliation(s)
- Loris Mularoni
- Research Unit on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Roderic Guigó
- Research Unit on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Centre de Regulació Genòmica, Barcelona 08003, Spain
- M Mar Albà
- Research Unit on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Barcelona 08003, Spain