1
A microsatellite repeat in PCA3 long non-coding RNA is associated with prostate cancer risk and aggressiveness. Sci Rep 2017; 7:16862. [PMID: 29203868 PMCID: PMC5715103 DOI: 10.1038/s41598-017-16700-y] [Received: 05/15/2017] [Accepted: 11/10/2017]
Abstract
Short tandem repeats (STRs) are polymorphic stretches of DNA made up of a repeated unit of two to six nucleotides. We hypothesized that STRs are associated with prostate cancer development and/or progression. We undertook RNA sequencing analysis of prostate tumors and adjacent non-malignant cells to identify polymorphic STRs that are readily expressed in these cells. Most of the expressed STRs in the clinical samples mapped to intronic and intergenic DNA. Our analysis indicated that three of these STRs (TAAA-ACTG2, TTTTG-TRIB1, and TG-PCA3) are polymorphic and differentially expressed in prostate tumors compared to adjacent non-malignant cells. TG-PCA3 STR expression was repressed by the anti-androgen drug enzalutamide in prostate cancer cells. Genetic analysis of prostate cancer patients and healthy controls (N > 2,000) showed a significant association of the most common 11-repeat allele of the TG-PCA3 STR with prostate cancer risk (OR = 1.49; 95% CI 1.11–1.99; P = 0.008). A significant association was also observed with aggressive disease (OR = 2.00; 95% CI 1.06–3.76; P = 0.031) and high mortality rates (HR = 3.0; 95% CI 1.03–8.77; P = 0.045). We propose that the TG-PCA3 STR has both diagnostic and prognostic potential for prostate cancer. We provide a proof of concept that can be applied to other RNA sequencing datasets to identify disease-associated STRs for future exploratory clinical studies.
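The odds ratios above are reported with 95% confidence intervals. The sketch below illustrates how such an interval is formed. It computes an unadjusted odds ratio with a Woolf (log-scale) interval from a 2x2 table of allele carriers. The function name and the counts are hypothetical. The study's own estimates may come from adjusted regression models, so this does not reproduce its analysis.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical 2x2 layout:
        a = cases carrying the allele     b = cases without it
        c = controls carrying the allele  d = controls without it
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for illustration only; these are not data from the study.
print(odds_ratio_woolf(120, 880, 90, 910))
```

The interval is symmetric on the log scale, so it is asymmetric around the odds ratio itself. The reported interval for risk (1.11 to 1.99 around 1.49) shows the same pattern.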
2
Nagpure NS, Rashid I, Pati R, Pathak AK, Singh M, Singh SP, Sarkar UK. FishMicrosat: a microsatellite database of commercially important fishes and shellfishes of the Indian subcontinent. BMC Genomics 2013; 14:630. [PMID: 24047532 PMCID: PMC3852227 DOI: 10.1186/1471-2164-14-630] [Received: 04/16/2013] [Accepted: 09/11/2013]
Abstract
Background: Microsatellite DNA is one of many powerful genetic markers used to construct genetic linkage maps and to study population genetics. Public biological databases hold vast numbers of microsatellite sequences for many organisms, including fishes. The microsatellite data available in these sources were extracted and organized into a database that facilitates sequence analysis and browsing of relevant information. The system also helps design primers for the flanking regions of repeat loci, for PCR identification of polymorphism within populations.

Description: FishMicrosat is a database of microsatellite sequences of fishes and shellfishes that includes important aquaculture species such as Lates calcarifer, Ctenopharyngodon idella, Hypophthalmichthys molitrix, Penaeus monodon, Labeo rohita, Oreochromis niloticus, Fenneropenaeus indicus and Macrobrachium rosenbergii. The database contains 4,398 microsatellite sequences of 41 species belonging to 15 families from the Indian subcontinent. GenBank at NCBI was used as the primary data source for developing the database. The database presents information about simple and compound microsatellites, their clusters, and locus orientation within sequences. The database is integrated with several tools in a web interface, including primer design, locus finding, repeat mapping, detection of similarities among sequences across species, and search by motif and keyword. In addition, users can browse information on the top 10 families and the top 10 species through a record overview.

Conclusions: The FishMicrosat database is a useful resource for fish and shellfish microsatellite analyses and for locus identification across species, with important applications in population genetics, evolutionary studies and studies of genetic relatedness among species. The database can be expanded further to include microsatellite data for fishes and shellfishes from other regions, as well as available information from genome sequencing projects on species of aquaculture importance.
Affiliation(s)
- Naresh Sahebrao Nagpure
- Division of Molecular Biology and Biotechnology, National Bureau of Fish Genetic Resources, Lucknow 226002, India.
3
Chaturvedi A, Tiwari S, Jesudasan RA. RiDs db: Repeats in diseases database. Bioinformation 2011; 7:96-7. [PMID: 21938212 PMCID: PMC3174043 DOI: 10.6026/97320630007096] [Received: 08/12/2011] [Accepted: 08/13/2011]
Abstract
The non-coding fraction of the human genome, approximately 98% of the total, consists mainly of repeats. Transpositions, expansions and deletions of these repeat elements contribute to a number of diseases. None of the available databases consolidates information on both tandem and interspersed repeats with the flexibility of FASTA-based homology search with reference to disease genes. The Repeats in diseases database (RiDs db) is a web-accessible relational database that aids analysis of repeats associated with Mendelian disorders. It is a repository of disease genes that can be searched with the FASTA program or by limited- or free-text keywords. Unlike other databases, RiDs db contains the sequences of these genes, with access to corresponding information on both the interspersed and the tandem repeats they contain, on a unified platform. Comparative analysis of novel or patient sequences against the reference sequences in RiDs db using FASTA search will indicate changes in repeat structure, if any, associated with a particular disorder. The database also provides links to orthologs in model organisms such as zebrafish, mouse and Drosophila.

Availability: The database is freely available at http://115.111.90.196/ridsdb/index.php.
Affiliation(s)
- Anurag Chaturvedi
- Centre for Cellular and Molecular Biology, Habsiguda, Hyderabad 500007, Andhra Pradesh, India
- Shrish Tiwari
- Centre for Cellular and Molecular Biology, Habsiguda, Hyderabad 500007, Andhra Pradesh, India
- Rachel A Jesudasan
- Centre for Cellular and Molecular Biology, Habsiguda, Hyderabad 500007, Andhra Pradesh, India
4
Rouchka EC. Database of exact tandem repeats in the Zebrafish genome. BMC Genomics 2010; 11:347. [PMID: 20515480 PMCID: PMC2901318 DOI: 10.1186/1471-2164-11-347] [Received: 09/10/2009] [Accepted: 06/01/2010]
Abstract
Background: Sequencing of the approximately 1.7 billion bases of the zebrafish genome is currently underway. To date, few high-resolution genetic maps exist for the zebrafish genome, and those that do are based mainly on single nucleotide polymorphisms (SNPs) and short microsatellite repeats. The goal of a higher-resolution genetic map motivated the construction of a database of tandemly repeating elements within the zebrafish Zv8 assembly.

Description: Exact tandem repeats with a repeat length of at least three bases and a copy number of at least 10 were reported. Repeats with a total length of 250 or fewer bases, together with their flanking regions, were masked for known vertebrate repeats. Optimal primer pairs were computationally designed in the regions flanking the detected repeats. Molecular biologists interested in experimentally testing VNTRs within a zebrafish population can use this database of exact tandem repeats as a resource.

Conclusions: A total of 116,915 repeats with a base length of at least three nucleotides were detected. The longest of these was a 54-base repeat with fourteen tandem copies. A significant number of repeats with a base length of 18, 24, 27 and 30 were detected, many of them in potentially novel proline-rich coding regions. Detecting exact tandem repeats in the zebrafish genome yields a wealth of information on potential polymorphic sites for VNTRs. The association of many of these repeats with potentially novel yet similar coding regions offers exciting potential for identifying disease-associated genes. A web interface for querying repeats is available at http://bioinformatics.louisville.edu/zebrafish/. This portal allows users to search for repeats of a selected base size within any valid region of the 25 linkage groups.
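The detection criteria in this abstract are exact repeats, a repeat unit of at least three bases and at least 10 copies. These criteria can be expressed as a simple scan. The sketch below is a naive illustration under stated assumptions and is not the algorithm used to build the database:

- It uses 0-based coordinates.
- It caps the unit length at a hypothetical `max_unit=60`.
- It skips units that are themselves repetitions of a shorter unit, so a poly-A run is not reported as a 3-base repeat.
- It reports the shortest qualifying unit at each start position, then jumps past the tract.

```python
from typing import List, Tuple


def is_primitive(unit: str) -> bool:
    """True if `unit` is not a whole-number repetition of a shorter string."""
    n = len(unit)
    return all(unit != unit[:p] * (n // p) for p in range(1, n) if n % p == 0)


def exact_tandem_repeats(
    seq: str, min_unit: int = 3, max_unit: int = 60, min_copies: int = 10
) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for exact tandem repeats, 0-based starts."""
    seq = seq.upper()
    n = len(seq)
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i < n:
        found = None
        for k in range(min_unit, max_unit + 1):
            if i + k * min_copies > n:
                break  # not enough sequence left for min_copies of this unit
            unit = seq[i:i + k]
            if not is_primitive(unit):
                continue  # e.g. "AAA" is really a 1-base repeat
            copies = 1
            # Extend while the next k-base window is an exact copy of the unit.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                found = (i, unit, copies)
                break  # keep the shortest qualifying unit at this start
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # jump past the whole repeat tract
        else:
            i += 1
    return hits


print(exact_tandem_repeats("TT" + "ACG" * 12 + "TT"))  # [(2, 'ACG', 12)]
```

This brute-force scan is roughly proportional to sequence length times the number of unit sizes tried. That is fine for an illustration, but a genome-scale search would need a more efficient method.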
Affiliation(s)
- Eric C Rouchka
- Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville, Duthie Center, Room 208, Louisville, KY, USA.
5
Chaley MB, Nazipova NN, Kutyrkin VA. Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples. Pattern Recognition and Image Analysis 2009. [DOI: 10.1134/s1054661809020217]
6
Zhang W, He L, Liu W, Sun C, Ratain MJ. Exploring the relationship between polymorphic (TG/CA)n repeats in intron 1 regions and gene expression. Hum Genomics 2009; 3:236-45. [PMID: 19403458 PMCID: PMC2735212 DOI: 10.1186/1479-7364-3-3-236]
Abstract
The putative role of (TG/CA)n repeats in transcriptional regulation has recently been reported for several cancer- and disease-related genes, including the genes encoding the epidermal growth factor receptor (EGFR), hydroxysteroid (11-beta) dehydrogenase 2 (HSD11B2) and interferon-gamma (IFNG). These studies indicated a correlation between gene expression levels and the presence or length of (TG/CA)n repeats in the intron 1 regions of these genes. A genome-wide search for genes with similar features may show whether these dinucleotide repeats form a class of universal regulators of gene expression. Gene expression has itself only recently begun to be investigated as a quantitative complex phenotype. Using a public database of simple repeats, we identified 330 genes containing potentially polymorphic long (TG/CA)n repeats (n ≥ 12) in their intron 1 regions. One known physiological pathway, the calcium signalling pathway, was enriched among the genes containing long repeats. In addition, certain biological processes, such as cation transport, signal transduction and ion transport, were enriched in these genes. Genotyping of the long repeats showed that most of these dinucleotide repeats were polymorphic in the HapMap CEU (Caucasians from Utah, USA) samples of northern and western European ancestry. We observed no significant association between these repeats and gene expression in genes selected on the basis of their expression profiles in the HapMap CEU samples. Our current findings therefore do not support a role for these repeats as a class of universal gene expression regulators. A more comprehensive evaluation of the relationship between these repeats and gene expression, potentially in other tissues, may be needed to clarify their roles in gene regulation.
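The selection criterion here is long (TG/CA)n repeats with n ≥ 12. The sketch below scans a single sequence for perfect (TG)n or (CA)n runs that meet this threshold. It is an illustration only. The study drew its repeats from a public simple-repeat database whose exact matching rules are not given here. The function name and the `min_n` parameter are hypothetical. A run written as GTGT... is matched starting from its first T.

```python
import re
from typing import List, Tuple


def long_tg_ca_repeats(seq: str, min_n: int = 12) -> List[Tuple[int, str, int]]:
    """Find perfect (TG)n or (CA)n runs with n >= min_n (0-based starts)."""
    # A (TG)n run on one strand reads as (CA)n on the other, so search both.
    pattern = re.compile(rf"(?:TG){{{min_n},}}|(?:CA){{{min_n},}}")
    return [
        (m.start(), m.group()[:2], len(m.group()) // 2)
        for m in pattern.finditer(seq.upper())
    ]


example = "AAA" + "TG" * 15 + "C" + "CA" * 12 + "GGG"
print(long_tg_ca_repeats(example))  # [(3, 'TG', 15), (34, 'CA', 12)]
```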
Affiliation(s)
- Wei Zhang
- Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
7
Model of perfect tandem repeat with random pattern and empirical homogeneity testing poly-criteria for latent periodicity revelation in biological sequences. Math Biosci 2008; 211:186-204. [DOI: 10.1016/j.mbs.2007.10.008] [Received: 03/09/2007] [Revised: 10/19/2007] [Accepted: 10/26/2007]
8
Haberman Y, Amariglio N, Rechavi G, Eisenberg E. Trinucleotide repeats are prevalent among cancer-related genes. Trends Genet 2007; 24:14-8. [PMID: 18054813 DOI: 10.1016/j.tig.2007.09.005] [Received: 09/17/2007] [Revised: 09/27/2007] [Accepted: 09/28/2007]
Abstract
Trinucleotide repeats (TNRs) have been primarily connected to neurologic and neuromuscular diseases, with few specific TNRs linked with various tumors. Here we conduct a genome-wide analysis and show that TNRs are five times more prevalent in cancer-related human genes. Interestingly, we also find that cancer-related genes are significantly longer than other genes. Our results suggest that genes containing TNRs are more prone to mutagenesis. The database of TNR genes can be used as a list of candidate cancer-related genes.
Affiliation(s)
- Yael Haberman
- Department of Pediatric Hemato-Oncology, the Edmond and Lily Safra Children's Hospital and Cancer Research Center, Sheba Medical Center and Sackler School of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
9
Aishwarya V, Sharma PC. UgMicroSatdb: database for mining microsatellites from unigenes. Nucleic Acids Res 2007; 36:D53-6. [PMID: 17947328 PMCID: PMC2238862 DOI: 10.1093/nar/gkm811]
Abstract
Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), have been extensively exploited as molecular markers for diverse applications. Recently, their roles in gene regulation and genome evolution have also been widely discussed. We have developed UgMicroSatdb (Unigene MicroSatellite database), a web-based relational database of microsatellites present in unigene sequences covering 80 genomes. UgMicroSatdb supports microsatellite search on multiple parameters:

- microsatellite type (simple perfect, compound perfect and imperfect);
- repeat unit length (mono- to hexanucleotide);
- repeat number;
- microsatellite length;
- repeat sequence class.

Microsatellites can also be retrieved by specifying EST, cDNA or CDS identity, or by using Gene Index, GenBank or UniGene IDs. The database also provides information about trinucleotide repeats encoding various amino acids. Such codon repeats can be searched by characteristics of the encoded amino acids: charge (basic, acidic or neutral), polarity (polar or non-polar), and hydrophobic or hydrophilic nature. The nucleotide sequences of the target UniGenes are also provided to facilitate primer design for PCR amplification of the desired microsatellite. UgMicroSatdb is available at http://ipu.ac.in/usbt/UgMicroSatdb.htm.
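Codon repeats here are searched by the charge and polarity of the amino acid they encode. The sketch below shows that lookup and rests on two assumptions:

- It uses the standard genetic code (NCBI translation table 1).
- It uses a simplified textbook grouping of residues. Histidine is counted as basic, and hydrophobicity is omitted because published scales differ.

The database's own classification may differ. The function names are hypothetical.

```python
from typing import Tuple

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Simplified groupings (an assumption, not UgMicroSatdb's own scheme).
BASIC = set("KRH")
ACIDIC = set("DE")
POLAR = set("STNQCYKRHDE")


def translate_codon(codon: str) -> str:
    """One-letter amino acid for a DNA or RNA codon; '*' marks a stop codon."""
    c = codon.upper().replace("U", "T")
    if len(c) != 3 or any(b not in BASES for b in c):
        raise ValueError(f"not a valid codon: {codon!r}")
    idx = 16 * BASES.index(c[0]) + 4 * BASES.index(c[1]) + BASES.index(c[2])
    return AMINO_ACIDS[idx]


def codon_repeat_properties(unit: str) -> Tuple[str, str, str]:
    """(amino acid, charge class, polarity class) for a trinucleotide repeat unit."""
    aa = translate_codon(unit)
    if aa == "*":
        raise ValueError(f"{unit} is a stop codon")
    charge = "basic" if aa in BASIC else "acidic" if aa in ACIDIC else "neutral"
    polarity = "polar" if aa in POLAR else "non-polar"
    return aa, charge, polarity


# The same (CAG)n tract read in its three frames: (CAG)n, (AGC)n, (GCA)n.
for unit in ("CAG", "AGC", "GCA"):
    print(unit, codon_repeat_properties(unit))
```

The loop illustrates that the encoded residue depends on the reading frame. The same (CAG)n tract gives glutamine (Q), serine (S) or alanine (A), depending on the frame.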
Affiliation(s)
- Veenu Aishwarya
- University School of Biotechnology, Guru Gobind Singh Indraprastha University, Kashmere Gate, Delhi 110 006, India
10
EuMicroSatdb: a database for microsatellites in the sequenced genomes of eukaryotes. BMC Genomics 2007; 8:225. [PMID: 17623061 PMCID: PMC1933429 DOI: 10.1186/1471-2164-8-225] [Received: 11/22/2006] [Accepted: 07/10/2007]
Abstract
Background: Microsatellites have immense utility as molecular markers in fields such as genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility to experimental and computational biologists because of their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database) is a web-based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes.

Description: A user-friendly web interface for microsatellite data retrieval has been developed using Active Server Pages (ASP). The back-end code for data extraction and assembly was written in Perl and C++. Targeted retrieval of microsatellite data is possible using input parameters such as microsatellite type (simple perfect or compound perfect), repeat unit length (mono- to hexanucleotide), repeat number, microsatellite length and chromosomal location in the genome. Information about the clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer design for PCR amplification of any desired microsatellite locus, 200 bp of upstream and downstream sequence is provided.

Conclusion: The database allows easy, systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. Its information content is useful in research areas such as gene tagging, genome mapping, population genetics, germplasm characterization and understanding microsatellite dynamics in eukaryotic genomes.
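This database supplies 200 bp of upstream and downstream sequence around each locus for primer design. The sketch below extracts such flanks from a single in-memory sequence. It assumes 0-based, half-open repeat coordinates and clips flanks at the sequence ends. Both choices are assumptions, and the function name is hypothetical. A real workflow would read chromosomes from FASTA files and pass the flanks to a primer design tool.

```python
from typing import Tuple


def flanking_sequences(
    seq: str, start: int, end: int, width: int = 200
) -> Tuple[str, str]:
    """Return (upstream, downstream) flanks of the repeat seq[start:end].

    Coordinates are 0-based and half-open; flanks are clipped at the
    sequence ends, so they can be shorter than `width` near a boundary.
    """
    if not 0 <= start <= end <= len(seq):
        raise ValueError("repeat coordinates fall outside the sequence")
    upstream = seq[max(0, start - width):start]
    downstream = seq[end:end + width]
    return upstream, downstream


# A (CA)10 repeat at positions 300-320 of a made-up 620-base sequence.
chrom = "A" * 300 + "CA" * 10 + "T" * 300
up, down = flanking_sequences(chrom, 300, 320)
print(len(up), len(down))  # 200 200
```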
11
Holmes SE, Wentzell JS, Seixas AI, Callahan C, Silveira I, Ross CA, Margolis RL. A novel trinucleotide repeat expansion at chromosome 3q26.2 identified by a CAG/CTG repeat expansion detection array. Hum Genet 2006; 120:193-200. [PMID: 16783570 DOI: 10.1007/s00439-006-0207-0] [Received: 02/27/2006] [Revised: 05/11/2006] [Accepted: 05/12/2006]
Abstract
CAG/CTG repeat expansions cause at least 12 different neurological disorders, and additional disorders of this type probably exist. Using the repeat expansion detection (RED) assay, we identified an expanded CAG/CTG repeat in a 50-year-old woman with an autosomal dominant syndrome with prominent progressive sensory neuropathy. None of the CAG/CTG repeats known to undergo expansion could account for this expansion. To identify its locus, we created a PCR array to assess the length of every repeat of eight or more CAG or CTG triplets in the human genome. The expansion was localized to a repeat in an intron of a Genscan-predicted gene, 185 nt downstream of a predicted exon that is conserved through mouse. The closest experimentally verified gene in the region (TNIK, encoding a serine/threonine kinase) lies approximately 63 kb downstream of the repeat. The length of the expansion in the proband is 98 triplets. This repeat is not expanded in the proband's cousin (the only other affected family member for whom DNA is currently available), and no expansions were detected in a set of 230 patients with movement disorders of unknown cause. An expanded allele containing 58 triplets was detected in a single control individual, and no other expansions were detected in a set of 255 controls. The normal repeat length ranges from 5 to 30 triplets, with 8 triplets the most common allele. Our results suggest that this new repeat expansion is probably not the direct cause of the phenotype in the proband. Whether the repeat contributes to the patient's phenotype, or is associated with another phenotype, remains to be determined.
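The PCR array targeted every repeat of eight or more CAG or CTG triplets. The sketch below shows the in-silico side of that selection: it finds perfect (CAG)n or (CTG)n runs in a reference sequence with `min_triplets=8`. It also flags lengths outside the 5 to 30 triplet normal range reported for this locus. This is not the authors' pipeline; the RED assay and PCR array are laboratory methods, and the function names are hypothetical. A flagged length only means "outside the reported range". As the abstract itself concludes, such a length is not evidence of pathogenicity.

```python
import re
from typing import List, Tuple


def cag_ctg_runs(seq: str, min_triplets: int = 8) -> List[Tuple[int, str, int]]:
    """Perfect (CAG)n or (CTG)n runs with n >= min_triplets (0-based starts)."""
    pattern = re.compile(rf"(?:CAG){{{min_triplets},}}|(?:CTG){{{min_triplets},}}")
    return [
        (m.start(), m.group()[:3], len(m.group()) // 3)
        for m in pattern.finditer(seq.upper())
    ]


def outside_reported_normal_range(n_triplets: int, low: int = 5, high: int = 30) -> bool:
    """True if the length falls outside the 5-30 triplet range reported here."""
    return not low <= n_triplets <= high


# Made-up sequence: a 98-triplet CAG run (the proband's length) and an 8-triplet CTG run.
seq = "GG" + "CAG" * 98 + "TT" + "CTG" * 8 + "A"
for start, motif, n in cag_ctg_runs(seq):
    print(start, motif, n, outside_reported_normal_range(n))
```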
Affiliation(s)
- S E Holmes
- Laboratory of Genetic Neurobiology, Division of Neurobiology, Department of Psychiatry, The Johns Hopkins University School of Medicine, CMSC 8-121, Baltimore, MD 21287, USA.
12
Chen JM, Férec C, Cooper DN. A systematic analysis of disease-associated variants in the 3' regulatory regions of human protein-coding genes I: general principles and overview. Hum Genet 2006; 120:1-21. [PMID: 16645853 DOI: 10.1007/s00439-006-0180-7] [Received: 02/09/2006] [Accepted: 03/26/2006]
Abstract
The 3' regulatory regions (3' RRs) of human genes play an important role in regulating mRNA 3' end formation, stability/degradation, nuclear export, subcellular localization and translation and are consequently rich in regulatory elements. Although 3' RRs contain only approximately 0.2% of known disease-associated mutations, this is likely to represent a rather conservative estimate of their actual prevalence. In an attempt to catalogue 3' RR-mediated disease and also to gain a greater understanding of the functional role of regulatory elements within 3' RRs, we have performed a systematic analysis of disease-associated 3' RR variants; 121 3' RR variants in 94 human genes were collated. These included 17 mutations in the upstream core polyadenylation signal sequence (UCPAS), 81 in the upstream sequence (USS) between the translational termination codon and the UCPAS, 6 in the left arm of the 'spacer' sequence (LAS) between the UCPAS and the pre-mRNA cleavage site (CS), 3 in the right arm of the 'spacer' sequence (RAS) or downstream core polyadenylation signal sequence (DCPAS) and 7 in the downstream sequence (DSS) of the 3'-flanking region, with 7 further mutations being treated as isolated examples. All the UCPAS mutations and the rather unusual cases of the DMPK, SCA8, FCMD and GLA mutations exert a significant effect on the mRNA phenotype and are usually associated with monogenic disease. By contrast, most of the remaining variants are polymorphisms that exert a comparatively minor influence on mRNA expression, but which may nevertheless predispose to or otherwise modify complex clinical phenotypes. Considerable efforts have been made to validate/elucidate the mechanisms through which the 3' untranslated region (3' UTR) variants affect gene expression. 
It is hoped that the integrative approach employed here in the study of naturally occurring variants of actual or potential pathological significance will serve to complement ongoing efforts to identify all functional regulatory elements in the human genome.