1
Ranathunge C, Wheeler GL, Chimahusky ME, Kennedy MM, Morrison JI, Baldwin BS, Perkins AD, Welch ME. Transcriptome profiles of sunflower reveal the potential role of microsatellites in gene expression divergence. Mol Ecol 2018; 27:1188-1199. [PMID: 29419922 DOI: 10.1111/mec.14522] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/31/2017] [Revised: 01/18/2018] [Accepted: 01/29/2018] [Indexed: 12/17/2022]
Abstract
The mechanisms by which natural populations generate adaptive genetic variation are not well understood. Some studies propose that microsatellites can function as drivers of adaptive variation. Here, we tested a potentially adaptive role for transcribed microsatellites with natural populations of the common sunflower (Helianthus annuus L.) by assessing the enrichment of microsatellites in genes that show expression divergence across latitudes. Seeds collected from six populations at two distinct latitudes in Kansas and Oklahoma were planted and grown in a common garden. Morphological measurements from the common garden demonstrated that phenotypic variation among populations is largely explained by underlying genetic variation. An RNA-Seq experiment was conducted with 96 of the individuals grown in the common garden, and differentially expressed (DE) transcripts between the two latitudes were identified. A total of 825 DE transcripts were identified. DE transcripts and nondifferentially expressed (NDE) transcripts were then scanned for microsatellites. The abundance of different motif lengths and types in both groups was estimated. Our results indicate that DE transcripts are significantly enriched with mononucleotide repeats and significantly depauperate in trinucleotide repeats. Further, the standardized mononucleotide repeat motif A and dinucleotide repeat motif AG were significantly enriched within DE transcripts, while the motif types C, AT, ACC and AAC in DE transcripts are significantly differentiated in microsatellite tract length between the two latitudes. The tract length differentiation at specific microsatellite motif types across latitudes and their enrichment within DE transcripts indicate a potential functional role for transcribed microsatellites in gene expression divergence in sunflower.
Affiliation(s)
- Chathurani Ranathunge
  - Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
- Gregory L Wheeler
  - Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
- Melody E Chimahusky
  - Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
- Meaghan M Kennedy
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Jesse I Morrison
  - Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS, USA
- Brian S Baldwin
  - Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS, USA
- Andy D Perkins
  - Department of Computer Science and Engineering, Mississippi State University, Starkville, MS, USA
- Mark E Welch
  - Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
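The microsatellite scan described in the abstract of entry 1 (searching transcripts for mono-, di- and trinucleotide repeats and grouping them by a standardized motif) can be illustrated with a minimal sketch. The minimum repeat counts in `MIN_REPEATS`, the function names, and the standardization rule (smallest string among all rotations of the motif and of its reverse complement) are assumptions made for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Assumed minimum number of repeat units per motif length (not from the paper).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def standardize_motif(motif):
    """Return a canonical form of a motif: the alphabetically smallest
    string among all rotations of the motif and of its reverse complement.
    This is one common convention, assumed here for illustration."""
    rc = motif.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, rc) for i in range(len(s))]
    return min(rotations)


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, n in sorted(min_repeats.items()):
        # A motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip homopolymers reported as di-/trinucleotide motifs.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits
```

Counting hits per standardized motif separately for DE and NDE transcripts would give the contingency tables that an enrichment test needs; the choice of test is left open here.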
2
Huang Y, Mrázek J. Assessing diversity of DNA structure-related sequence features in prokaryotic genomes. DNA Res 2014; 21:285-97. [PMID: 24408877 PMCID: PMC4060949 DOI: 10.1093/dnares/dst057] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/31/2023] Open
Abstract
Prokaryotic genomes are diverse in terms of their nucleotide and oligonucleotide composition as well as presence of various sequence features that can affect physical properties of the DNA molecule. We present a survey of local sequence patterns which have a potential to promote non-canonical DNA conformations (i.e. different from standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our present work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are over-represented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT), and decreases with increasing oxygen requirement. Additionally, representations of close direct repeats, palindromes and inverted repeats exhibit clear negative trends with increasing OGT. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches.
Affiliation(s)
- Yongjie Huang
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Jan Mrázek
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Microbiology, University of Georgia, Athens, GA 30602, USA
3
Abstract
Recent developments in DNA vaccine research provide a new momentum for this rather young and potentially disruptive technology. Gene-based vaccines are capable of eliciting protective immunity in humans to persistent intracellular pathogens, such as HIV, malaria, and tuberculosis, for which the conventional vaccine technologies have failed so far. The recent identification and characterization of genes coding for tumor antigens has stimulated the development of DNA-based antigen-specific cancer vaccines. Although most academic researchers consider the production of reasonable amounts of plasmid DNA (pDNA) for immunological studies relatively easy to solve, problems often arise during this first phase of production. In this chapter we review the current state of the art of pDNA production at small (shake flasks) and mid-scales (lab-scale bioreactor fermentations) and address new trends in vector design and strain engineering. We will guide the reader through the different stages of process design, starting from choosing the most appropriate plasmid backbone and the right Escherichia coli (E. coli) strain for production, through to cultivation media and scale-up issues. In addition, we will address some points concerning the safety and potency of the produced plasmids, with special focus on producing antibiotic resistance-free plasmids. The main goal of this chapter is to make immunologists aware of the fact that production of the pDNA vaccine has to be performed with as much attention and care as the rest of their research.
4
Buske FA, Bauer DC, Mattick JS, Bailey TL. Triplexator: detecting nucleic acid triple helices in genomic and transcriptomic data. Genome Res 2012; 22:1372-81. [PMID: 22550012 PMCID: PMC3396377 DOI: 10.1101/gr.130237.111] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Received: 08/04/2011] [Accepted: 03/20/2012] [Indexed: 11/28/2022]
Abstract
Double-stranded DNA is able to form triple-helical structures by accommodating a third nucleotide strand in its major groove. This sequence-specific process offers a potent mechanism for targeting genomic loci of interest that is of great value for biotechnological and gene-therapeutic applications. It is likely that nature has leveraged this addressing system for gene regulation, because computational studies have uncovered an abundance of putative triplex target sites in various genomes, with enrichment particularly in gene promoters. However, to draw a more complete picture of the in vivo role of triplexes, not only the putative targets but also the sequences acting as the third strand and their capability to pair with the predicted target sites need to be studied. Here we present Triplexator, the first computational framework that integrates all aspects of triplex formation, and showcase its potential by discussing research examples for which the different aspects of triplex formation are important. We find that chromatin-associated RNAs have a significantly higher fraction of sequence features able to form triplexes than expected at random, suggesting their involvement in gene regulation. We furthermore identify hundreds of human genes that contain sequence features in their promoter predicted to be able to form a triplex with a target within the same promoter, suggesting the involvement of triplexes in feedback-based gene regulation. With focus on biotechnological applications, we screen mammalian genomes for high-affinity triplex target sites that can be used to target genomic loci specifically and find that triplex formation offers a resolution of ~1300 nt.
Affiliation(s)
- Fabian A. Buske
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072 QLD, Australia
- Denis C. Bauer
  - Division of Mathematics, Informatics, and Statistics, CSIRO, Sydney, 2113 NSW, Australia
  - Queensland Brain Institute, The University of Queensland, Brisbane, 4072 QLD, Australia
- John S. Mattick
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072 QLD, Australia
  - Garvan Institute of Medical Research, Sydney, 2010 NSW, Australia
- Timothy L. Bailey
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072 QLD, Australia
5
Buske FA, Mattick JS, Bailey TL. Potential in vivo roles of nucleic acid triple-helices. RNA Biol 2011; 8:427-39. [PMID: 21525785 DOI: 10.4161/rna.8.3.14999] [Citation(s) in RCA: 143] [Impact Index Per Article: 10.2] [Indexed: 11/19/2022] Open
Abstract
The ability of double-stranded DNA to form a triple-helical structure by hydrogen bonding with a third strand is well established, but the biological functions of these structures remain largely unknown. There is considerable albeit circumstantial evidence for the existence of nucleic triplexes in vivo and their potential participation in a variety of biological processes including chromatin organization, DNA repair, transcriptional regulation, and RNA processing has been investigated in a number of studies to date. There is also a range of possible mechanisms to regulate triplex formation through differential expression of triplex-forming RNAs, alteration of chromatin accessibility, sequence unwinding and nucleotide modifications. With the advent of next generation sequencing technology combined with targeted approaches to isolate triplexes, it is now possible to survey triplex formation with respect to their genomic context, abundance and dynamical changes during differentiation and development, which may open up new vistas in understanding genome biology and gene regulation.
Affiliation(s)
- Fabian A Buske
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
6
Examination of genome homogeneity in prokaryotes using genomic signatures. PLoS One 2009; 4:e8113. [PMID: 19956556 PMCID: PMC2781299 DOI: 10.1371/journal.pone.0008113] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 07/31/2009] [Accepted: 11/05/2009] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND: DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th-order Markov model) as well as genomic signatures normalized by smaller DNA words (1st- and 2nd-order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.
PRINCIPAL FINDINGS: Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.
CONCLUSIONS: Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.
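The idea in the abstract of entry 6 of comparing observed tetranucleotide frequencies with frequencies approximated from a 0th-order Markov model (base composition only) can be sketched as follows. This is an illustrative reading, not the authors' formula: the function names and the use of a mean squared difference as the summary statistic are assumptions, and the sketch expects an input longer than the word size that contains A, C, G and T.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all overlapping k-mers made of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def zero_order_expected(seq, k=4):
    """Expected k-mer frequencies if bases were independent (0th-order
    Markov model): the product of the single-base frequencies."""
    base = kmer_frequencies(seq, 1)
    expected = {}
    for word in map("".join, product("ACGT", repeat=k)):
        p = 1.0
        for b in word:
            p *= base.get(b, 0.0)
        expected[word] = p
    return expected


def usage_deviation(seq, k=4):
    """Mean squared difference between observed and expected k-mer
    frequencies (an assumed stand-in for a usage-bias statistic)."""
    observed = kmer_frequencies(seq, k)
    expected = zero_order_expected(seq, k)
    return sum((observed.get(w, 0.0) - e) ** 2
               for w, e in expected.items()) / len(expected)
```

Computing such a statistic in sliding windows along a chromosome and taking its variance would be one way to approach the intra-genomic comparison the abstract describes; higher-order models would replace `zero_order_expected` with an estimate built from shorter words.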
7
Hallin PF, Stærfeldt HH, Rotenberg E, Binnewies TT, Benham CJ, Ussery DW. GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes. Stand Genomic Sci 2009; 1:204-15. [PMID: 21304658 PMCID: PMC3035224 DOI: 10.4056/sigs.28177] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022] Open
Abstract
We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.
8
Hallin PF, Stærfeldt HH, Rotenberg E, Binnewies TT, Benham CJ, Ussery DW. GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes. Stand Genomic Sci 2009. [DOI: 10.4056/sigs.28608] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022] Open
Affiliation(s)
- Peter F. Hallin
  - Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Hans-Henrik Stærfeldt
  - Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Craig J. Benham
  - UC Davis Genome Center, University of California, Davis, California, U.S.A.
- David W. Ussery
  - Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
9
Bohlin J, Hardy SP, Ussery DW. Stretches of alternating pyrimidine/purines and purines are respectively linked with pathogenicity and growth temperature in prokaryotes. BMC Genomics 2009; 10:346. [PMID: 19646265 PMCID: PMC2728739 DOI: 10.1186/1471-2164-10-346] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/02/2009] [Accepted: 07/31/2009] [Indexed: 02/02/2023] Open
Abstract
Background: The genomic fractions of purine (RR) and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more have been linked to genomic AT content, the formation of different DNA helices, strand-biased gene distribution, DNA structure, and more. Although some of these factors are a consequence of the chemical properties of purines and pyrimidines, a thorough statistical examination of the distributions of YR/RR stretches in sequenced prokaryotic chromosomes has, to the best of our knowledge, not been undertaken. The aim of this study is to expand upon previous research by using regression analysis to investigate how AT content, habitat, growth temperature, pathogenicity, phyla, oxygen requirement and halotolerance correlate with the distribution of RR and YR stretches in prokaryotes.
Results: Our results indicate that RR and YR stretches are differently distributed in prokaryotic phyla. RR stretches are overrepresented in all phyla except for the Actinobacteria and β-Proteobacteria. In contrast, YR tracts are underrepresented in all phyla except for the β-Proteobacterial group. YR stretches are associated with phylum, pathogenicity and habitat, whilst RR tracts are associated with phylum, AT content, oxygen requirement, growth temperature and halotolerance. All associations described were statistically significant with p < 0.001.
Conclusion: Analysis of chromosomal distributions of RR/YR sequences in prokaryotes reveals a set of associations with environmental factors not observed with mono- and oligonucleotide frequencies. This implies that important information can be found in the distribution of RR/YR stretches that is more difficult to obtain from genomic mono- and oligonucleotide frequencies. The association between pathogenicity and fractions of YR stretches is assumed to be linked to recombination and horizontal transfer.
Affiliation(s)
- Jon Bohlin
  - Norwegian School of Veterinary Science, Oslo, Norway
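The quantities discussed in the abstract of entry 9, the genomic fractions of purine (RR) stretches and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more, can be sketched with two regular expressions. The function name is hypothetical, and the sketch scans a single strand only; how the authors treated the complementary strand or ambiguous bases is not stated in the abstract and is not modelled here.

```python
import re

# Runs of at least 10 purines (A or G) on the given strand.
PURINE_RUN = re.compile(r"[AG]{10,}")

# Runs of at least 10 bases that strictly alternate between pyrimidine
# (C/T) and purine (A/G), starting with either class.
ALTERNATING_RUN = re.compile(
    r"(?:[CT][AG]){5,}[CT]?|(?:[AG][CT]){5,}[AG]?"
)


def stretch_fraction(seq, pattern):
    """Fraction of the sequence covered by non-overlapping matches."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)
```

For example, `stretch_fraction(genome, PURINE_RUN)` and `stretch_fraction(genome, ALTERNATING_RUN)` would give one RR value and one YR value per chromosome, which could then serve as response variables in a regression of the kind the abstract describes.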
10
11
Rational vector design for efficient non-viral gene delivery: challenges facing the use of plasmid DNA. Mol Biotechnol 2008; 39:97-104. [PMID: 18327557 DOI: 10.1007/s12033-008-9046-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 12/21/2022]
Abstract
Although non-viral gene delivery is a very straightforward technology, there are currently no FDA-approved gene medicinal products available. Therefore, improving potency, safety, and efficiency of current plasmid DNA vectors will be a major task for the near future. This article will provide an overview of factors influencing production yield and quality as well as safety issues that emerge from the vector design itself. Special focus will be on generating bacterial pDNA vectors that circumvent the use of antibiotic resistance genes, to generate safer gene medicinal products as well as smaller, more efficient DNA vectors.
12
Bohlin J, Skjerve E, Ussery DW. Investigations of oligonucleotide usage variance within and between prokaryotes. PLoS Comput Biol 2008; 4:e1000057. [PMID: 18421372 PMCID: PMC2289840 DOI: 10.1371/journal.pcbi.1000057] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 10/12/2007] [Accepted: 03/12/2008] [Indexed: 11/18/2022] Open
Abstract
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA 'word-sizes' and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.
Affiliation(s)
- Jon Bohlin
  - Norwegian School of Veterinary Science, Oslo, Norway
13
Reva ON, Hallin PF, Willenbrock H, Sicheritz-Ponten T, Tümmler B, Ussery DW. Global features of the Alcanivorax borkumensis SK2 genome. Environ Microbiol 2007; 10:614-25. [PMID: 18081853 DOI: 10.1111/j.1462-2920.2007.01483.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 01/07/2023]
Abstract
The global feature of the completely sequenced Alcanivorax borkumensis SK2 type strain chromosome is its symmetry and homogeneity. The origin and terminus of replication are located opposite to each other in the chromosome and are discerned with high signal to noise ratios by maximal oligonucleotide usage biases on the leading and lagging strand. Genomic DNA structure is rather uniform throughout the chromosome with respect to intrinsic curvature, position preference or base stacking energy. The orthologs and paralogs of A. borkumensis genes with the highest sequence homology were found in most cases among gamma-Proteobacteria, with Acinetobacter and P. aeruginosa as closest relatives. A. borkumensis shares a similar oligonucleotide usage and promoter structure with the Pseudomonadales. A comparatively low number of only 18 genome islands with atypical oligonucleotide usage was detected in the A. borkumensis chromosome. The gene clusters that confer the assimilation of aliphatic hydrocarbons are localized in two genome islands which were probably acquired from an ancestor of the Yersinia lineage, whereas the alk genes of Pseudomonas putida still exhibit the typical Alcanivorax oligonucleotide signature, indicating a complex evolution of this major hydrocarbonoclastic trait.
Affiliation(s)
- Oleg N Reva
  - Klinische Forschergruppe, OE6711, Medizinische Hochschule Hannover, Carl-Neuberg-Strasse 1, D-30625 Hannover, Germany
14
van de Vondervoort PJI, Langeveld SMJ, Visser J, van Peij NNME, Pel HJ, van den Hondel CAMJJ, Ram AFJ. Identification of a mitotic recombination hotspot on chromosome III of the asexual fungus Aspergillus niger and its possible correlation with [corrected] elevated basal transcription. Curr Genet 2007; 52:107-14. [PMID: 17684745 PMCID: PMC2071955 DOI: 10.1007/s00294-007-0143-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/17/2007] [Revised: 06/29/2007] [Accepted: 07/02/2007] [Indexed: 11/17/2022]
Abstract
Genetic recombination is an important tool in strain breeding in many organisms. We studied the possibilities of mitotic recombination in strain breeding of the asexual fungus Aspergillus niger. By identifying genes that complemented mapped auxotrophic mutations, the physical map was compared to the genetic map of chromosome III using the genome sequence. In a program to construct a chromosome III-specific marker strain by selecting mitotic crossing-over in diploids, a mitotic recombination hotspot was identified. Analysis of the mitotic recombination hotspot revealed some physical features, elevated basal transcription and a possible correlation with purine stretches.
Affiliation(s)
- Peter J. I. van de Vondervoort
  - Institute of Biology, Leiden University, Wassenaarseweg 64, 2333AL Leiden, The Netherlands
  - DSM Food Specialties, Delft, P.O. Box 1, 2600MA Delft, The Netherlands
- Sandra M. J. Langeveld
  - Institute of Biology, Leiden University, Wassenaarseweg 64, 2333AL Leiden, The Netherlands
- Jaap Visser
  - FGT Consultancy, P.O. Box 396, 6700AJ Wageningen, The Netherlands
- Herman J. Pel
  - DSM Food Specialties, Delft, P.O. Box 1, 2600MA Delft, The Netherlands
- Arthur F. J. Ram
  - Institute of Biology, Leiden University, Wassenaarseweg 64, 2333AL Leiden, The Netherlands
15
Champ PC, Binnewies TT, Nielsen N, Zinman G, Kiil K, Wu H, Bohlin J, Ussery DW. Genome update: purine strand bias in 280 bacterial genomes. MICROBIOLOGY-SGM 2006; 152:579-583. [PMID: 16514138 DOI: 10.1099/mic.0.28637-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Affiliation(s)
- P Christoph Champ
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Tim T Binnewies
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Natasja Nielsen
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Guy Zinman
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Kristoffer Kiil
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Heng Wu
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Jon Bohlin
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- David W Ussery
  - Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
16
Bacolla A, Collins JR, Gold B, Chuzhanova N, Yi M, Stephens RM, Stefanov S, Olsh A, Jakupciak JP, Dean M, Lempicki RA, Cooper DN, Wells RD. Long homopurine*homopyrimidine sequences are characteristic of genes expressed in brain and the pseudoautosomal region. Nucleic Acids Res 2006; 34:2663-75. [PMID: 16714445 PMCID: PMC1464109 DOI: 10.1093/nar/gkl354] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Received: 02/23/2006] [Revised: 03/13/2006] [Accepted: 04/20/2006] [Indexed: 01/20/2023] Open
Abstract
Homo(purine*pyrimidine) sequences (R*Y tracts) with mirror repeat symmetries form stable triplexes that block replication and transcription and promote genetic rearrangements. A systematic search was conducted to map the location of the longest R*Y tracts in the human genome in order to assess their potential function(s). The 814 R*Y tracts with ≥250 uninterrupted base pairs were preferentially clustered in the pseudoautosomal region of the sex chromosomes and located in the introns of 228 annotated genes whose protein products were associated with functions at the cell membrane. These genes were highly expressed in the brain and particularly in genes associated with susceptibility to mental disorders, such as schizophrenia. The set of 1957 genes harboring the 2886 R*Y tracts with ≥100 uninterrupted base pairs was additionally enriched in proteins associated with phosphorylation, signal transduction, development and morphogenesis. Comparisons of the ≥250 bp R*Y tracts in the mouse and chimpanzee genomes indicated that these sequences have mutated faster than the surrounding regions and are longer in humans than in chimpanzees. These results support a role for long R*Y tracts in promoting recombination and genome diversity during evolution through destabilization of chromosomal DNA, thereby inducing repair and mutation.
Affiliation(s)
- Albino Bacolla
  - Institute of Biosciences and Technology, Center for Genome Research, Texas A&M University System Health Science Center, Texas Medical Center, 2121 West Holcombe Blvd, Houston, TX 77030, USA
  - Advanced Biomedical Computing Center, NCI-Frederick, Frederick, MD 21702, USA
  - Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD 21702, USA
  - Biostatistics and Bioinformatics Unit, Cardiff University, Cardiff CF14 4XN, UK
  - Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
  - National Institute of Standards and Technology, DNA Technologies Group, Biotechnology Division, Gaithersburg, MD 20899, USA
  - Laboratory of Immunopathogenesis and Bioinformatics, SAIC-Frederick, Inc., Frederick, MD 21702, USA
- Jack R. Collins
  - Advanced Biomedical Computing Center, NCI-Frederick, Frederick, MD 21702, USA
- Bert Gold
  - Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD 21702, USA
- Nadia Chuzhanova
  - Biostatistics and Bioinformatics Unit, Cardiff University, Cardiff CF14 4XN, UK
  - Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Ming Yi
  - Advanced Biomedical Computing Center, NCI-Frederick, Frederick, MD 21702, USA
- Robert M. Stephens
  - Advanced Biomedical Computing Center, NCI-Frederick, Frederick, MD 21702, USA
- Stefan Stefanov
  - Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD 21702, USA
- Adam Olsh
  - Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD 21702, USA
- John P. Jakupciak
  - National Institute of Standards and Technology, DNA Technologies Group, Biotechnology Division, Gaithersburg, MD 20899, USA
- Michael Dean
  - Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD 21702, USA
- Richard A. Lempicki
  - Laboratory of Immunopathogenesis and Bioinformatics, SAIC-Frederick, Inc., Frederick, MD 21702, USA
- David N. Cooper
  - Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Robert D. Wells
  - To whom correspondence should be addressed. Tel: +1 713 677 7651; Fax: +1 713 677 7689
17
|
Geisler M, Kleczkowski LA, Karpinski S. A universal algorithm for genome-wide in silicio identification of biologically significant gene promoter putative cis-regulatory-elements; identification of new elements for reactive oxygen species and sucrose signaling in Arabidopsis. Plant J 2006; 45:384-98. [PMID: 16412085 DOI: 10.1111/j.1365-313x.2005.02634.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 05/06/2023]
Abstract
Short motifs of many cis-regulatory elements (CREs) can be found in the promoters of most Arabidopsis genes, and this raises the question of how their presence can confer specific regulation. We developed a universal algorithm to test the biological significance of CREs by first identifying every Arabidopsis gene with a CRE and then statistically correlating the presence or absence of the element with the gene expression profile on multiple DNA microarrays. This algorithm was successfully verified for previously characterized abscisic acid, ethylene, sucrose and drought responsive CREs in Arabidopsis, showing that the presence of these elements indeed correlates with treatment-specific gene induction. Later, we used standard motif sampling methods to identify 128 putative motifs induced by excess light, reactive oxygen species and sucrose. Our algorithm was able to filter 20 out of 128 novel CREs which significantly correlated with gene induction by heat, reactive oxygen species and/or sucrose. The position, orientation and sequence specificity of CREs were tested in silicio by analyzing the expression of genes with naturally occurring sequence variations. In three novel CREs the forward orientation correlated with sucrose induction and the reverse orientation with sucrose suppression. The functionality of the predicted novel CREs was experimentally confirmed using Arabidopsis cell-suspension cultures transformed with short promoter fragments or artificial promoters fused with the GUS reporter gene. Our genome-wide analysis opens up new possibilities for in silicio verification of the biological significance of newly discovered CREs, and allows for subsequent selection of such CREs for experimental studies.
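The two-step idea in this abstract (find every gene whose promoter carries an element, then test whether those genes are over-represented among induced genes) can be sketched as follows. The abstract does not state which statistic the authors used, so the one-sided hypergeometric test here is an assumption, and the function names, the example motif, and the toy promoters are all hypothetical.

```python
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def has_motif(promoter, motif):
    """True if the motif occurs on either strand of the promoter."""
    promoter, motif = promoter.upper(), motif.upper()
    return motif in promoter or reverse_complement(motif) in promoter

def enrichment_p_value(promoters, induced, motif):
    """promoters: dict gene -> promoter sequence; induced: set of gene ids.
    Returns (k, K, p): induced genes with the motif, all genes with the
    motif, and the hypergeometric probability P(X >= k)."""
    total = len(promoters)
    with_motif = {g for g, s in promoters.items() if has_motif(s, motif)}
    n = len(induced & promoters.keys())   # induced genes in the data set
    K = len(with_motif)
    k = len(with_motif & induced)
    tail = sum(comb(K, i) * comb(total - K, n - i)
               for i in range(k, min(K, n) + 1))
    return k, K, tail / comb(total, n)

# Toy data: four promoters, two of them induced by a treatment.
promoters = {"g1": "TTACGTGTT", "g2": "CACGTAA", "g3": "TTTTTTTT", "g4": "GGGGCCCC"}
print(enrichment_p_value(promoters, {"g1", "g2"}, "ACGTG"))
```

Matching on both strands is a simplification; the abstract reports orientation-specific effects, which would call for testing the forward and reverse occurrences separately.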
Affiliation(s)
- Matt Geisler
- Department of Plant Physiology, Umeå Plant Science Centre, Umeå University, 901 87 Umeå, Sweden
|
18
|
Woodward KJ, Cundall M, Sperle K, Sistermans EA, Ross M, Howell G, Gribble SM, Burford DC, Carter NP, Hobson DL, Garbern JY, Kamholz J, Heng H, Hodes ME, Malcolm S, Hobson GM. Heterogeneous duplications in patients with Pelizaeus-Merzbacher disease suggest a mechanism of coupled homologous and nonhomologous recombination. Am J Hum Genet 2005; 77:966-87. [PMID: 16380909 PMCID: PMC1285180 DOI: 10.1086/498048] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Received: 06/02/2005] [Accepted: 09/12/2005] [Indexed: 11/04/2022] [Open Access]
Abstract
We describe genomic structures of 59 X-chromosome segmental duplications that include the proteolipid protein 1 gene (PLP1) in patients with Pelizaeus-Merzbacher disease. We provide the first report of 13 junction sequences, which gives insight into underlying mechanisms. Although proximal breakpoints were highly variable, distal breakpoints tended to cluster around low-copy repeats (LCRs) (50% of distal breakpoints), and each duplication event appeared to be unique (100 kb to 4.6 Mb in size). Sequence analysis of the junctions revealed no large homologous regions between proximal and distal breakpoints. Most junctions had microhomology of 1-6 bases, and one had a 2-base insertion. Boundaries between single-copy and duplicated DNA were identical to the reference genomic sequence in all patients investigated. Taken together, these data suggest that the tandem duplications are formed by a coupled homologous and nonhomologous recombination mechanism. We suggest repair of a double-stranded break (DSB) by one-sided homologous strand invasion of a sister chromatid, followed by DNA synthesis and nonhomologous end joining with the other end of the break. This is in contrast to other genomic disorders that have recurrent rearrangements formed by nonallelic homologous recombination between LCRs. Interspersed repetitive elements (Alu elements, long interspersed nuclear elements, and long terminal repeats) were found at 18 of the 26 breakpoint sequences studied. No specific motif that may predispose to DSBs was revealed, but single or alternating tracts of purines and pyrimidines that may cause secondary structures were common. Analysis of the 2-Mb region susceptible to duplications identified proximal-specific repeats and distal LCRs in addition to the previously reported ones, suggesting that the unique genomic architecture may have a role in nonrecurrent rearrangements by promoting instability.
Affiliation(s)
- Karen J. Woodward, Maria Cundall, Karen Sperle, Erik A. Sistermans, Mark Ross, Gareth Howell, Susan M. Gribble, Deborah C. Burford, Nigel P. Carter, Donald L. Hobson, James Y. Garbern, John Kamholz, Henry Heng, M. E. Hodes, Sue Malcolm, Grace M. Hobson
- Clinical and Molecular Genetics, Institute of Child Health, London; Western Diagnostic Pathology, Perth, Australia; Nemours Biomedical Research, Alfred I. duPont Hospital for Children, Nemours Children’s Clinic, Wilmington, DE; Department of Human Genetics, Radboud University, Nijmegen, The Netherlands; The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom; Department of Neurology and Center for Molecular Medicine and Genetics, Wayne State University, Detroit; Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis; and Department of Pediatrics, Thomas Jefferson University, Philadelphia
|
19
|
Cooke JR, McKie EA, Ward JM, Keshavarz-Moore E. Impact of intrinsic DNA structure on processing of plasmids for gene therapy and DNA vaccines. J Biotechnol 2005; 114:239-54. [PMID: 15522434 DOI: 10.1016/j.jbiotec.2004.06.011] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 01/23/2004] [Revised: 06/21/2004] [Accepted: 06/29/2004] [Indexed: 11/23/2022]
Abstract
Several non-Watson-Crick DNA structures have been discovered to date, which may be incorporated into future plasmid constructs for gene therapy and DNA vaccine products. In this study, intrinsic DNA structures were included at a defined point in a 2.9 kb plasmid, and their effects on cell growth rate, total plasmid yield and topology (i.e. the relative proportions of supercoiled, open circular and linear plasmid forms) were determined. The stability of the inserted sequences was assessed using gel electrophoresis. Z-DNA was shown to be unstable in a batch Escherichia coli DH1 production system grown in complex medium. Encouragingly, the other sequences studied (triplex, bend and quadruplex) did not cause spontaneous deletions, and no detrimental effect was found on growth rate or on total plasmid yield, indicating that such sequences could be included in future DNA products without reducing plasmid yields. However, the intramolecular triplex studied significantly decreased the proportion of supercoiled species.
Affiliation(s)
- James R Cooke
- Department of Biochemical Engineering, UCL, Torrington Place, London WC1E 7JE, UK
|
20
|
Willenbrock H, Binnewies TT, Hallin PF, Ussery DW. Genome update: 2D clustering of bacterial genomes. Microbiology (Reading) 2005; 151:333-336. [PMID: 15699184 DOI: 10.1099/mic.0.27811-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Affiliation(s)
- Hanni Willenbrock, Tim T Binnewies, Peter F Hallin, David W Ussery
- Center for Biological Sequence Analysis, Department of Biotechnology, Building 208, The Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
|
21
|
Homopolymer tract length dependent enrichments in functional regions of 27 eukaryotes and their novel dependence on the organism DNA (G+C)% composition. BMC Genomics 2004; 5:95. [PMID: 15598342 PMCID: PMC539357 DOI: 10.1186/1471-2164-5-95] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Received: 02/22/2004] [Accepted: 12/14/2004] [Indexed: 12/16/2022] [Open Access]
Abstract
Background: DNA homopolymer tracts, poly(dA).poly(dT) and poly(dG).poly(dC), are the simplest of simple sequence repeats. Homopolymer tracts have been systematically examined in the coding, intron and flanking regions of a limited number of eukaryotes. As the number of DNA sequences publicly available increases, the representation (over and under) of homopolymer tracts of different lengths in these regions of different genomes can be compared. Results: We carried out a survey of the extent of homopolymer tract over-representation (enrichment) and over-proportional length distribution (above expected length), primarily in single gene documents but including some whole chromosomes, of 27 eukaryotes across the (G+C)% composition range from 20–60%. A total of 5.2 × 10^7 bases from 15,560 cleaned (redundancy removed) sequence documents were analyzed. Calculated frequencies of non-overlapping long homopolymer tracts were found over-represented in non-coding sequences of eukaryotes. Long poly(dA).poly(dT) tracts demonstrated an exponential increase with tract length compared to predicted frequencies. A novel negative slope was observed for all eukaryotes between their (G+C)% composition and the threshold length N where poly(dA).poly(dT) tracts exhibited over-representation, and a corresponding positive slope was observed for poly(dG).poly(dC) tracts. Tract size thresholds where over-representation of tracts in different eukaryotes began to occur were between 4–11 bp, depending upon the organism (G+C)% composition. The higher the GC%, the lower the threshold N value was for poly(dA).poly(dT) tracts, meaning that the over-representation happens at relatively lower tract length in more GC-rich surrounding sequence. We also observed a novel relationship between the highest over-representations, as well as lengths of homopolymer tracts in excess of their random occurrence expected maximum lengths.
Conclusions: We discuss how our novel tract over-representation observations can be accounted for by a few models. A likely model for poly(dA).poly(dT) tract over-representation involves the known insertion into genomes of DNA synthesized from retroviral mRNAs containing 3' polyA tails. A proposed model that can account for a number of our observed results concerns the origin of the isochore nature of eukaryotic genomes via a non-equilibrium GC% dependent mutation rate mechanism. Our data also suggest that tract lengthening via slip strand replication is not governed by a simple thermodynamic loop energy model.
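Over-representation of homopolymer tracts, as discussed in this abstract, is a ratio of observed to expected tract counts. The sketch below is only an illustration: the abstract does not give the authors' predicted-frequency model, so the expectation here assumes independent bases drawn at the sequence's own composition, and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def observed_tracts(seq, bases="AT"):
    """Count maximal homopolymer runs of the given bases, keyed by run length.
    bases="AT" covers poly(dA).poly(dT) as seen from one strand."""
    counts = Counter()
    for base, run in groupby(seq.upper()):
        if base in bases:
            counts[sum(1 for _ in run)] += 1
    return counts

def expected_tracts(seq, n, bases="AT"):
    """Approximate expected number of maximal runs of exactly length n,
    assuming independent bases: L * p**n * (1 - p)**2 per base
    (edge effects at the sequence ends are ignored)."""
    seq = seq.upper()
    length = len(seq)
    return sum(length * (seq.count(b) / length) ** n
               * (1 - seq.count(b) / length) ** 2 for b in bases)

def enrichment(seq, n, bases="AT"):
    """Observed / expected count of length-n tracts; values > 1 suggest
    over-representation under the independence assumption."""
    expected = expected_tracts(seq, n, bases)
    return observed_tracts(seq, bases)[n] / expected if expected > 0 else float("nan")

toy = "AAAACGTTTTGCAAAACG"
print(observed_tracts(toy), enrichment(toy, 4))
```

Passing `bases="GC"` gives the corresponding poly(dG).poly(dC) counts; scanning `n` upward and noting where the ratio first exceeds 1 would give a rough analogue of the threshold length N mentioned in the abstract.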
|
22
|
Paz A, Mester D, Baca I, Nevo E, Korol A. Adaptive role of increased frequency of polypurine tracts in mRNA sequences of thermophilic prokaryotes. Proc Natl Acad Sci U S A 2004; 101:2951-6. [PMID: 14973185 PMCID: PMC365726 DOI: 10.1073/pnas.0308594100] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] [Open Access]
Abstract
The mechanism of an organism's adaptation to high temperatures has been investigated intensively in recent years. It was suggested that the macromolecules of thermophilic microorganisms (especially proteins) have structural features that enhance their thermostability. We compared mRNA sequences of 72 fully sequenced prokaryotic proteomes (14 thermophilic and 58 mesophilic species). Although the differences between the percentage of adenine plus guanine content of whole mRNAs of different prokaryotic species are much lower than those of guanine plus cytosine content, the purine-pyrimidine (R/Y) ratio within thermophile mRNAs is significantly higher than that of the mesophiles. The first and third codon positions of both thermophiles and mesophiles are purine-biased, with the bias more pronounced in the thermophiles. Thermophile mRNAs that display the highest R/Y ratio (1.43-1.69) are those of the ribosomal proteins, histone-like proteins, DNA-dependent RNA polymerase subunits, and heat-shock proteins. Within mesophilic prokaryotes and five eukaryotic species, the R/Y ratio of the mRNAs of heat-shock proteins is higher than the average over the coding part of the genome. Polypurine tracts (R)n (with n ≥ 5) are much more abundant within the thermophile mRNAs compared with mesophiles. Between two sequential pure-purinic codons of thermophile mRNAs, there is a rather strong tendency for the occurrence of adenine but not guanine tracts. The data suggest that mixed adenine·guanine and polyadenine tracts in mRNAs increase the thermostability beyond the contribution of amino acids encoded by purine tracts, which highlights the importance of ecological stress in the evolution of genome architecture.
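The two quantities this abstract relies on, the R/Y ratio of an mRNA and its polypurine tracts (R)n with n ≥ 5, are simple to compute. The following is a minimal sketch with hypothetical function names and a made-up toy sequence, not the authors' code.

```python
import re

def ry_ratio(mrna):
    """Purine (A+G) to pyrimidine (C+U/T) ratio of a sequence."""
    s = mrna.upper().replace("U", "T")
    purines = s.count("A") + s.count("G")
    pyrimidines = s.count("C") + s.count("T")
    return purines / pyrimidines if pyrimidines else float("inf")

def polypurine_tracts(mrna, min_len=5):
    """All maximal runs of at least min_len consecutive purines."""
    s = mrna.upper().replace("U", "T")
    return [m.group() for m in re.finditer(r"[AG]{%d,}" % min_len, s)]

toy = "AUGGAAGAGGCUAAAGGAUCC"
print(ry_ratio(toy))           # 15 purines / 6 pyrimidines
print(polypurine_tracts(toy))  # runs are reported in DNA alphabet (U read as T)
```

Applied per gene, `ry_ratio` would allow the kind of comparison the abstract makes between, for example, heat-shock protein mRNAs and the genome-wide coding average.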
Affiliation(s)
- Arnon Paz
- Institute of Evolution, Haifa University, Mount Carmel, Haifa 31905, Israel
|
23
|
Goñi JR, de la Cruz X, Orozco M. Triplex-forming oligonucleotide target sequences in the human genome. Nucleic Acids Res 2004; 32:354-60. [PMID: 14726484 PMCID: PMC373298 DOI: 10.1093/nar/gkh188] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.0] [Indexed: 11/14/2022] [Open Access]
Abstract
The existence of sequences in the human genome which can be a target for triplex formation, and accordingly are candidates for anti-gene therapies, has been studied by using bioinformatics tools. It was found that the population of triplex-forming oligonucleotide target sequences (TTS) is much more abundant than that expected from simple random models. The population of TTS is large throughout the genome, without major differences between chromosomes. A wide analysis along annotated regions of the genome allows us to demonstrate that the largest relative concentration of TTS is found in regulatory regions, especially in promoter zones, which suggests a tremendous potential for the triplex strategy in the control of gene expression. The dependence of the stability and selectivity of the triplexes on the length of the TTS is also analysed using knowledge-based rules.
Affiliation(s)
- J Ramon Goñi
- Molecular Modelling and Bioinformatics Unit, Institut de Recerca Biomédica, Parc Científic de Barcelona, Josep Samitier 1-5, Barcelona 08028, Spain
|
24
|
Petersen L, Larsen TS, Ussery DW, On SLW, Krogh A. RpoD promoters in Campylobacter jejuni exhibit a strong periodic signal instead of a -35 box. J Mol Biol 2003; 326:1361-72. [PMID: 12595250 DOI: 10.1016/s0022-2836(03)00034-2] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
We have used a hidden Markov model (HMM) to identify the consensus sequence of the RpoD promoters in the genome of Campylobacter jejuni. The identified promoter consensus sequence is unusual compared to other bacteria, in that the region upstream of the TATA-box does not contain a conserved -35 region, but shows a very strong periodic variation in the AT-content and semi-conserved T-stretches, with a period of 10-11 nucleotides. The TATA-box is in some, but not all, cases preceded by a TGx, similar to an extended -10 promoter. We predicted a total of 764 presumed RpoD promoters in the C. jejuni genome, of which 654 were located upstream of annotated genes. A similar promoter was identified in Helicobacter pylori, a close phylogenetic relative of Campylobacter, but not in Escherichia coli, Vibrio cholerae, or six other Proteobacterial genomes, or in Staphylococcus aureus. We used upstream regions of high confidence genes as training data (n=529, for the C. jejuni genome). We found it necessary to limit the training set to genes that are preceded by an intergenic region of >100 bp or by a gene oriented in the opposite direction to be able to identify a conserved sequence motif, and ended up with a training set of 175 genes. This leads to the conclusion that the remaining genes (354) are more rarely preceded by a (RpoD) promoter, and consequently that operon structure may be more widespread in C. jejuni than has been assumed by others. Structural predictions of the regions upstream of the TATA-box indicate a region of highly curved DNA, and we assume that this facilitates the wrapping of the DNA around the RNA polymerase holoenzyme, and offsets the absence of a conserved -35 binding motif.
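A periodic variation in AT-content such as the one reported in this abstract can be made visible with a much simpler tool than the authors' HMM. The sketch below swaps in a plain autocorrelation of an AT indicator signal; the function name and the synthetic test sequence are hypothetical, and nothing here reproduces the published model.

```python
def at_autocorrelation(seq, max_lag=20):
    """Normalized autocorrelation of the AT indicator (1 for A/T, 0 for G/C)
    at lags 1..max_lag. A peak near lag 10-11 would be consistent with a
    helical-repeat periodicity in AT-content."""
    x = [1.0 if base in "AT" else 0.0 for base in seq.upper()]
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x)
    if variance == 0:
        return {}
    return {
        lag: sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / variance
        for lag in range(1, min(max_lag, n - 1) + 1)
    }

# Synthetic sequence with an exact 10-nucleotide AT/GC alternation.
toy = ("TTTTT" + "GCGCG") * 6
ac = at_autocorrelation(toy)
print(max(ac, key=ac.get))  # lag with the strongest autocorrelation
```

On real upstream regions one would average the autocorrelation over many aligned promoters, since a single short sequence gives a noisy signal.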
Affiliation(s)
- Lise Petersen
- Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark.
|
25
|
|