151. Ueno S, Le Provost G, Léger V, Klopp C, Noirot C, Frigerio JM, Salin F, Salse J, Abrouk M, Murat F, Brendel O, Derory J, Abadie P, Léger P, Cabane C, Barré A, de Daruvar A, Couloux A, Wincker P, Reviron MP, Kremer A, Plomion C. Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics 2010; 11:650. [PMID: 21092232 PMCID: PMC3017864 DOI: 10.1186/1471-2164-11-650] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 05/28/2010] [Accepted: 11/23/2010] [Indexed: 01/04/2023]
Abstract
BACKGROUND The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity. RESULTS We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally, a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. A BLAST search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified, making it possible to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser (http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html). CONCLUSIONS This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations.
Affiliation(s)
- Saneyoshi Ueno
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Forestry and Forest Products Research Institute, Department of Forest Genetics, Tree Genetics Laboratory, 1 Matsunosato, Tsukuba, Ibaraki, 305-8687, Japan
- Valérie Léger
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Christophe Klopp
- Plateforme bioinformatique Genotoul, UR875 Biométrie et Intelligence Artificielle, INRA, 31326 Castanet-Tolosan, France
- Céline Noirot
- Plateforme bioinformatique Genotoul, UR875 Biométrie et Intelligence Artificielle, INRA, 31326 Castanet-Tolosan, France
- Franck Salin
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Jérôme Salse
- INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
- Michael Abrouk
- INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
- Florent Murat
- INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
- Oliver Brendel
- INRA, UMR1137 EEF "Ecologie et Ecophysiologie Forestières", F 54280 Champenoux, France
- Jérémy Derory
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Pierre Abadie
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Patrick Léger
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Cyril Cabane
- Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
- CNRS, UMR 5800, Laboratoire Bordelais de Recherche en Informatique, Talence, France
- Aurélien Barré
- Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
- Antoine de Daruvar
- Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
- CNRS, UMR 5800, Laboratoire Bordelais de Recherche en Informatique, Talence, France
- Arnaud Couloux
- CEA, DSV, Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux CP5706 91057 Evry cedex, France
- Patrick Wincker
- CEA, DSV, Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux CP5706 91057 Evry cedex, France
- Antoine Kremer
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
152. Pellegrini M, Renda ME, Vecchio A. TRStalker: an efficient heuristic for finding fuzzy tandem repeats. Bioinformatics 2010; 26:i358-66. [PMID: 20529928 PMCID: PMC2881393 DOI: 10.1093/bioinformatics/btq209] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023]
Abstract
Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for TRs with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events. Results: We have developed an algorithm (christened TRStalker) with the aim of efficiently detecting TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the ‘generalized median string’ that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than the current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences. Availability: TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it. Contact: marco.pellegrini@iit.cnr.it Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marco Pellegrini
- CNR, Istituto di Informatica e Telematica, Via Moruzzi 1, 56124 Pisa, Italy.
153. Durand J, Bodénès C, Chancerel E, Frigerio JM, Vendramin G, Sebastiani F, Buonamici A, Gailing O, Koelewijn HP, Villani F, Mattioni C, Cherubini M, Goicoechea PG, Herrán A, Ikaran Z, Cabané C, Ueno S, Alberto F, Dumoulin PY, Guichoux E, de Daruvar A, Kremer A, Plomion C. A fast and cost-effective approach to develop and map EST-SSR markers: oak as a case study. BMC Genomics 2010; 11:570. [PMID: 20950475 PMCID: PMC3091719 DOI: 10.1186/1471-2164-11-570] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.3] [Received: 12/07/2009] [Accepted: 10/15/2010] [Indexed: 08/14/2023]
Abstract
Background Expressed Sequence Tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of an SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and addressed the transferability of EST-SSRs in Castanea sativa (chestnut). Results A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes, of which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall, 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental trees for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a phylogenetically related species to oak, was higher.
Conclusion We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.
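As an illustration of what mining ESTs for SSR motifs can involve, the following is a minimal Python sketch that scans a sequence for perfect di- and trinucleotide repeats with a regular expression. It is not the pipeline used in the study: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example, and imperfect or compound repeats are ignored.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of copies).
# These are illustrative assumptions, not the values used in the study.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    example = "TT" + "AG" * 7 + "CC" + "CAG" * 5 + "T"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

In practice the thresholds would be set per motif class, and each hit would be kept only if enough flanking sequence remains for primer design.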
154. Vachon N, Freeland JR. Phylogeographic inferences from chloroplast DNA: quantifying the effects of mutations in repetitive and non-repetitive sequences. Mol Ecol Resour 2010; 11:279-85. [DOI: 10.1111/j.1755-0998.2010.02921.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 10/19/2022]
155. BWtrs: A tool for searching for tandem repeats in DNA sequences based on the Burrows-Wheeler transform. Genomics 2010; 96:316-21. [PMID: 20709168 DOI: 10.1016/j.ygeno.2010.08.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 02/10/2010] [Revised: 08/04/2010] [Accepted: 08/09/2010] [Indexed: 11/20/2022]
Abstract
Genomes of organisms contain a variety of repeated structures of various length and type, interspersed or tandem. Tandem repeats play an important role in molecular biology: they are related to the genetic background of inherited diseases, and they can also serve as markers for DNA mapping and DNA fingerprinting. Improving the efficiency of algorithms for searching for tandem repeats in DNA sequences can lead to many useful applications in the area of genomics. We introduce a very efficient, web-based tool for large-scale searching for exact tandem repeats in genomes, based on the use of the Burrows-Wheeler Transform. The service is a remarkably efficient and powerful application that allows analyzing complete genomes without any restrictions. The Burrows-Wheeler Tandem Repeat Searcher (BWtrs) is an on-line application that searches for the exact occurrences of tandem repetitions in DNA sequences. The BWtrs service is freely available at: http://bioinfo.polsl.pl/BWtrs. We present examples of the use of our web application and we compare results of our computations with the results obtained by using other existing tools for searches for exact tandem repeats.
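To make the notion of an exact tandem repeat concrete, here is a small brute-force Python sketch. It does not use the Burrows-Wheeler transform and is not the BWtrs algorithm; it simply scans each candidate period and reports maximal runs whose motif is primitive. The function name and its parameters (`max_period`, `min_copies`) are assumptions made for this example.

```python
def exact_tandem_repeats(seq, min_period=1, max_period=6, min_copies=2):
    """Brute-force scan for maximal exact tandem repeats.

    Returns a list of (start, period, copies) tuples. Only complete copies
    are counted, and motifs that are themselves repeats of a shorter motif
    (for example ACAC) are skipped.
    """
    n = len(seq)
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + 2 * period <= n:
            # Extend j while the sequence stays periodic with this period.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            copies = (j - i + period) // period
            if copies >= min_copies:
                motif = seq[i:i + period]
                # A string is primitive iff it does not occur inside
                # its own doubling with the first and last characters removed.
                if motif not in (motif + motif)[1:-1]:
                    hits.append((i, period, copies))
                i = j + 1  # jump past the run just examined
            else:
                i += 1
    return hits


if __name__ == "__main__":
    print(exact_tandem_repeats("GGACACACACTT", min_copies=3))
```

The scan takes time roughly proportional to the sequence length times the number of periods tried, which is fine for short sequences; index-based approaches such as the one described above are aimed at whole genomes.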
156. Askitis N, Sinha R. RepMaestro: scalable repeat detection on disk-based genome sequences. Bioinformatics 2010; 26:2368-74. [PMID: 20663848 DOI: 10.1093/bioinformatics/btq433] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION We investigate the problem of exact repeat detection on large genomic sequences. Most existing approaches based on suffix trees and suffix arrays (SAs) are limited either to small sequences or those that are memory resident. We introduce RepMaestro, software that adapts existing in-memory enhanced SA algorithms to enable them to scale efficiently to large sequences that are disk resident. Supermaximal repeats, maximal unique matches (MuMs) and pairwise branching tandem repeats have been used to demonstrate the practicality of our approach; this is the first such study to use an enhanced SA to detect these repeats in large genome sequences. RESULTS The detection of supermaximal repeats was observed to be up to two times faster than Vmatch, but more importantly, was shown to scale efficiently to large genome sequences that Vmatch could not process due to memory constraints (4 GB). Similar results were observed for the detection of MuMs, with RepMaestro shown to scale well and also perform up to six times faster than Vmatch. For tandem repeats, RepMaestro was found to be slower but could nonetheless scale to large disk-resident sequences. These results are a significant advance in the quest for scalable repeat detection. Software availability: RepMaestro is available at http://www.naskitis.com.
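For readers unfamiliar with suffix arrays, the sketch below shows the basic idea behind SA-based exact repeat detection on a small in-memory string: sort the suffixes, compute the longest common prefix (LCP) between neighbours, and read the longest repeated substring off the largest LCP value. This is a naive illustration, not RepMaestro's disk-based method, and it does not compute supermaximal repeats or MuMs; all function names are assumptions for the example.

```python
def suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] is the length of the common prefix of suffixes sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def longest_repeat(text):
    """Return one longest substring that occurs at least twice, or ''."""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    if not lcp or max(lcp) == 0:
        return ""
    k = max(range(len(lcp)), key=lcp.__getitem__)
    return text[sa[k]:sa[k] + lcp[k]]


if __name__ == "__main__":
    print(suffix_array("banana"))
    print(longest_repeat("GATTACAGATTA"))
```

Sorting whole suffixes like this copies a lot of data and only works when the text fits in memory, which is exactly the limitation that disk-based approaches are meant to remove.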
Affiliation(s)
- Nikolas Askitis
- Department of Computer Science and Software Engineering, University of Melbourne, Australia.
157. Sokol D, Atagun F. TRedD--a database for tandem repeats over the edit distance. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2010; 2010:baq003. [PMID: 20624712 PMCID: PMC2911838 DOI: 10.1093/database/baq003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
A ‘tandem repeat’ in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats are common in the genomes of both eukaryotic and prokaryotic organisms. They are significant markers for human identity testing, disease diagnosis, sequence homology and population studies. In this article, we describe a new database, TRedD, which contains the tandem repeats found in the human genome. The database is publicly available online, and the software for locating the repeats is also freely available. The definition of tandem repeats used by TRedD is a new and innovative definition based upon the concept of ‘evolutive tandem repeats’. In addition, we have developed a tool, called TandemGraph, to graphically depict the repeats occurring in a sequence. This tool can be coupled with any repeat finding software, and it should greatly facilitate analysis of results. Database URL: http://tandem.sci.brooklyn.cuny.edu/
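Because the repeats here are defined over the edit distance, a short reminder of how that distance is computed may help. The Python sketch below is the standard Levenshtein dynamic program (unit cost for substitution, insertion and deletion); it is not TRedD's repeat-finding software, and how the distance is applied to consecutive copies in the 'evolutive tandem repeat' definition is not covered by this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two approximate copies of a motif, differing by one deleted base.
    print(edit_distance("ACGT", "AGT"))
```

One possible use, stated as an assumption, is to compare each copy of a candidate repeat with its neighbour and accept the repeat only if every such distance stays below a chosen threshold.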
Affiliation(s)
- Dina Sokol
- Department of Computer and Information Science, Brooklyn College of the City University of New York, 2900 Bedford Avenue, Brooklyn, NY 11210, USA.
158. Frérot H, Faucon MP, Willems G, Godé C, Courseaux A, Darracq A, Verbruggen N, Saumitou-Laprade P. Genetic architecture of zinc hyperaccumulation in Arabidopsis halleri: the essential role of QTL x environment interactions. THE NEW PHYTOLOGIST 2010; 187:355-367. [PMID: 20487314 DOI: 10.1111/j.1469-8137.2010.03295.x] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 05/06/2023]
Abstract
This study sought to determine the main genomic regions that control zinc (Zn) hyperaccumulation in Arabidopsis halleri and to examine genotype x environment effects on phenotypic variance. To do so, quantitative trait loci (QTLs) were mapped using an interspecific A. halleri x Arabidopsis lyrata petraea F2 population. The F2 progeny as well as representatives of the parental populations were cultivated on soils at two different Zn concentrations. A linkage map was constructed using 70 markers. In both low and high pollution treatments, zinc hyperaccumulation showed high broad-sense heritability (81.9 and 74.7%, respectively). Five significant QTLs were detected: two QTLs specific to the low pollution treatment (chromosomes 1 and 4), and three QTLs identified in both treatments (chromosomes 3, 6 and 7). These QTLs explained 50.1 and 36.5% of the phenotypic variance in low and high pollution treatments, respectively. Two QTLs identified in both treatments (chromosomes 3 and 6) showed significant QTL x environment interactions. The QTL on chromosome 3 largely colocalized with a major QTL previously identified for Zn and cadmium (Cd) tolerance. This suggests that Zn tolerance and hyperaccumulation share, at least partially, a common genetic basis and may have simultaneously evolved on heavy metal-contaminated soils.
Affiliation(s)
- Hélène Frérot
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
- Michel-Pierre Faucon
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
- Glenda Willems
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
- Cécile Godé
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
- Adeline Courseaux
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
- Aude Darracq
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
- Nathalie Verbruggen
- Laboratoire de Physiologie et de Génétique Moléculaire des Plantes, Université Libre de Bruxelles, Campus de la Plaine, CP242, boulevard du Triomphe, 1050 Bruxelles, Belgium
- Pierre Saumitou-Laprade
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille - Lille1, F-59655 Villeneuve d'Ascq Cedex, France
159. Findley SD, Cannon S, Varala K, Du J, Ma J, Hudson ME, Birchler JA, Stacey G. A fluorescence in situ hybridization system for karyotyping soybean. Genetics 2010; 185:727-44. [PMID: 20421607 PMCID: PMC2907198 DOI: 10.1534/genetics.109.113753] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 12/31/2009] [Accepted: 04/04/2010] [Indexed: 11/18/2022]
Abstract
The development of a universal soybean (Glycine max [L.] Merr.) cytogenetic map that associates classical genetic linkage groups, molecular linkage groups, and a sequence-based physical map with the karyotype has been impeded due to the soybean chromosomes themselves, which are small and morphologically homogeneous. To overcome this obstacle, we screened soybean repetitive DNA to develop a cocktail of fluorescent in situ hybridization (FISH) probes that could differentially label mitotic chromosomes in root tip preparations. We used genetically anchored BAC clones both to identify individual chromosomes in metaphase spreads and to complete a FISH-based karyotyping cocktail that permitted simultaneous identification of all 20 chromosome pairs. We applied these karyotyping tools to wild soybean, G. soja Sieb. and Zucc., which represents a large gene pool of potentially agronomically valuable traits. These studies led to the identification and characterization of a reciprocal chromosome translocation between chromosomes 11 and 13 in two accessions of wild soybean. The data confirm that this translocation is widespread in G. soja accessions and likely accounts for the semi-sterility found in some G. soja by G. max crosses.
Affiliation(s)
- Seth D. Findley
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- Steven Cannon
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- Kranthi Varala
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- Jianchang Du
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- Jianxin Ma
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- Matthew E. Hudson
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- James A. Birchler
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
- Gary Stacey
- National Center for Soybean Biotechnology, Division of Plant Sciences and Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211, United States Department of Agriculture–Agricultural Research Service, Iowa State University, Ames, Iowa 50011 and Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 and Department of Agronomy, Purdue University, West Lafayette, Indiana 47907
160. Gupta R, Sarthi D, Mittal A, Singh K. A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2010:43596. [PMID: 17713591 PMCID: PMC3171338 DOI: 10.1155/2007/43596] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 09/06/2006] [Revised: 11/20/2006] [Accepted: 12/07/2006] [Indexed: 01/07/2023]
Abstract
The identification and analysis of repetitive patterns are active areas of biological and computational research. Tandem repeats in telomeres play a role in cancer, and hypervariable trinucleotide tandem repeats are linked to over a dozen major neurodegenerative genetic disorders. In this paper, we present an algorithm to identify exact and inexact repeat patterns in DNA sequences based on the orthogonal exactly periodic subspace decomposition technique. Using the new measure, our algorithm resolves problems such as whether a repeat pattern has period P or a multiple of it (i.e., 2P, 3P, etc.), along with several other problems that were present in previous signal-processing-based algorithms. We present an efficient algorithm with complexity O(N L_w log L_w), where N is the length of the DNA sequence and L_w is the window length, for identifying repeats. The algorithm operates in two stages. In the first stage, each nucleotide is analyzed separately for periodicity, and in the second stage, the periodic information of each nucleotide is combined to identify the tandem repeats. Datasets containing exact and inexact repeats were used for the experiments. The experimental results show the effectiveness of the approach.
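The two-stage idea (analyze each nucleotide separately, then combine) can be illustrated with a much simpler measure than the one in the paper. The Python sketch below builds a binary indicator sequence per base and scores each candidate period by the fraction of positions whose base recurs one period later. This is a plain lag-matching score, not the exactly periodic subspace decomposition, and the function names are assumptions for the example.

```python
def indicator(seq, base):
    """Binary indicator sequence: 1 where seq has the given base, else 0."""
    return [1 if c == base else 0 for c in seq]


def lag_score(bits, lag):
    """Count positions i where the base occurs at both i and i + lag."""
    return sum(x & y for x, y in zip(bits, bits[lag:]))


def period_scores(seq, max_period):
    """Score each period 1..max_period; max_period must be below len(seq).

    Stage 1: score every nucleotide's indicator sequence separately.
    Stage 2: sum the four scores and normalise by the number of compared
    positions, giving the fraction of positions that match at that lag.
    """
    seq = seq.upper()
    per_base = {b: indicator(seq, b) for b in "ACGT"}
    scores = {}
    for p in range(1, max_period + 1):
        total = sum(lag_score(bits, p) for bits in per_base.values())
        scores[p] = total / (len(seq) - p)
    return scores


if __name__ == "__main__":
    for period, score in period_scores("ACG" * 10, 6).items():
        print(period, round(score, 2))
```

For a perfect repeat of period 3, this naive score is maximal at 3 and also at 6, which is the period-versus-multiple ambiguity that the abstract says the new measure is designed to resolve.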
Affiliation(s)
- Ravi Gupta
- Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttaranchal 247 667, India
- Divya Sarthi
- Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttaranchal 247 667, India
- Ankush Mittal
- Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttaranchal 247 667, India
- Kuldip Singh
- Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttaranchal 247 667, India
161. Miller RN, Passos MA, Menezes NN, Souza MT, do Carmo Costa MM, Rennó Azevedo VC, Amorim EP, Pappas GJ, Ciampi AY. Characterization of novel microsatellite markers in Musa acuminata subsp. burmannicoides, var. Calcutta 4. BMC Res Notes 2010; 3:148. [PMID: 20507605 PMCID: PMC2893197 DOI: 10.1186/1756-0500-3-148] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 12/21/2009] [Accepted: 05/27/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Banana is a nutritionally important crop across tropical and sub-tropical countries in sub-Saharan Africa, Central and South America and Asia. Although cultivars have evolved from diploid, triploid and tetraploid wild Asian species of Musa acuminata (A genome) and Musa balbisiana (B genome), many of today's commercial cultivars are sterile triploids or diploids, with fruit developing via parthenocarpy. As a result of restricted genetic variation, improvement has been limited, resulting in a crop frequently lacking resistance to pests and disease. Considering the importance of molecular tools to facilitate development of disease resistant genotypes, the objectives of this study were to develop polymorphic microsatellite markers from BAC clone sequences for M. acuminata subsp. burmannicoides, var. Calcutta 4. This wild diploid species is used as a donor cultivar in breeding programs as a source of resistance to diverse biotic stresses. FINDINGS Microsatellite sequences were identified from five Calcutta 4 BAC consensi datasets. Specific primers were designed for 41 loci. Isolated di-nucleotide repeat motifs were the most abundant, followed by tri-nucleotides. From 33 tested loci, 20 displayed polymorphism when screened across 21 diploid M. acuminata accessions, contrasting in resistance to Sigatoka diseases. The number of alleles per SSR locus ranged from two to four, with a total of 56. Six repeat classes were identified, with di-nucleotides the most abundant. Expected heterozygosity values for polymorphic markers ranged from 0.31 to 0.75. CONCLUSIONS This is the first report identifying polymorphic microsatellite markers from M. acuminata subsp. burmannicoides, var. Calcutta 4 across accessions contrasting in resistance to Sigatoka diseases. These BAC-derived polymorphic microsatellite markers are a useful resource for banana, applicable for genetic map development, germplasm characterization, evolutionary studies and marker assisted selection for traits.
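Expected heterozygosity, reported above for the polymorphic markers, is commonly computed from allele frequencies as one minus the sum of squared frequencies. The Python sketch below shows that basic calculation; whether the study applied this form or a sample-size-corrected estimator is not stated in the abstract, so the formula choice and the function name are assumptions.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Expected heterozygosity He = 1 - sum(p_i ** 2) at one locus.

    `alleles` is a flat list of observed allele calls, two per diploid
    individual. No small-sample correction is applied.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) at one SSR locus in four diploid plants.
    calls = [152, 152, 154, 154, 156, 156, 158, 158]
    print(expected_heterozygosity(calls))
```

With k equally frequent alleles the value is 1 - 1/k, so a locus with four alleles cannot exceed 0.75 under this uncorrected formula.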
Affiliation(s)
- Robert Ng Miller
- Universidade de Brasília, Campus Universitário Darcy Ribeiro, Instituto de Ciências Biológicas, Departamento de Biologia Celular, Asa Norte, Brasília, Brazil.
162. Moreno M, Marinotti O, Krzywinski J, Tadei WP, James AA, Achee NL, Conn JE. Complete mtDNA genomes of Anopheles darlingi and an approach to anopheline divergence time. Malar J 2010; 9:127. [PMID: 20470395 PMCID: PMC2877063 DOI: 10.1186/1475-2875-9-127] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 02/12/2010] [Accepted: 05/14/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The complete sequences of the mitochondrial genomes (mtDNA) of members of the northern and southern genotypes of Anopheles (Nyssorhynchus) darlingi were used for comparative studies to estimate the time to the most recent common ancestor for modern anophelines, to evaluate differentiation within this taxon, and to seek evidence of incipient speciation. METHODS The mtDNAs were sequenced from mosquitoes from Belize and Brazil and comparative analyses of structure and base composition, among others, were performed. A maximum likelihood approach linked with phylogenetic information was employed to detect evidence of selection and a Bayesian approach was used to date the split between the subgenus Nyssorhynchus and other Anopheles subgenera. RESULTS The comparison of mtDNA sequences within the Anopheles darlingi taxon does not provide sufficient resolution to establish different units of speciation within the species. In addition, no evidence of positive selection in any protein-coding gene of the mtDNA was detected, and purifying selection is likely the basis for this lack of diversity. Bayesian analysis supports the conclusion that the most recent ancestor of Nyssorhynchus and Anopheles+Cellia was extant ~94 million years ago. CONCLUSION Analyses of mtDNA genomes of Anopheles darlingi do not provide support for speciation in the taxon. The dates estimated for divergence among the anopheline groups tested are in agreement with the geological split of western Gondwana (95 mya), and provide additional support for explaining the absence of Cellia in the New World, and Nyssorhynchus in the Afro-Eurasian continents.
Collapse
Affiliation(s)
- Marta Moreno
- Griffin Laboratory, New York State Department of Health, Wadsworth Center, 5668 State Farm Road, Slingerlands, NY 12159, USA.
|
163
|
Hughes DJ, Kipar A, Milligan SG, Cunningham C, Sanders M, Quail MA, Rajandream MA, Efstathiou S, Bowden RJ, Chastel C, Bennett M, Sample JT, Barrell B, Davison AJ, Stewart JP. Characterization of a novel wood mouse virus related to murid herpesvirus 4. J Gen Virol 2010; 91:867-79. [PMID: 19940063 PMCID: PMC2888160 DOI: 10.1099/vir.0.017327-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2009] [Accepted: 11/19/2009] [Indexed: 11/18/2022] Open
Abstract
Two novel gammaherpesviruses were isolated, one from a field vole (Microtus agrestis) and the other from wood mice (Apodemus sylvaticus). The genome of the latter, designated wood mouse herpesvirus (WMHV), was completely sequenced. WMHV had the same genome structure and predicted gene content as murid herpesvirus 4 (MuHV4; murine gammaherpesvirus 68). Overall nucleotide sequence identity between WMHV and MuHV4 was 85 % and most of the 10 kb region at the left end of the unique region was particularly highly conserved, especially the viral tRNA-like sequences and the coding regions of genes M1 and M4. The partial sequence (71 913 bp) of another gammaherpesvirus, Brest herpesvirus (BRHV), which was isolated ostensibly from a white-toothed shrew (Crocidura russula), was also determined. The BRHV sequence was 99.2 % identical to the corresponding portion of the WMHV genome. Thus, WMHV and BRHV appeared to be strains of a new virus species. Biological characterization of WMHV indicated that it grew with similar kinetics to MuHV4 in cell culture. The pathogenesis of WMHV in wood mice was also extremely similar to that of MuHV4, except for the absence of inducible bronchus-associated lymphoid tissue at day 14 post-infection and a higher load of latently infected cells at 21 days post-infection.
Affiliation(s)
- David J. Hughes
- School of Infection and Host Defence, University of Liverpool, Liverpool L69 3GA, UK
- Anja Kipar
- Department of Veterinary Pathology, University of Liverpool, Liverpool, L69 7ZJ, UK
- Steven G. Milligan
- MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK
- Charles Cunningham
- MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK
- Mandy Sanders
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK
- Michael A. Quail
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK
- Marie-Adele Rajandream
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK
- Stacey Efstathiou
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Rory J. Bowden
- MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK
- Claude Chastel
- Laboratoire de Virologie, Faculté de Médecine, 29285 Brest, France
- Malcolm Bennett
- Department of Veterinary Pathology, University of Liverpool, Liverpool, L69 7ZJ, UK
- Jeffery T. Sample
- Department of Microbiology and Immunology, The Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Bart Barrell
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK
- Andrew J. Davison
- MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK
- James P. Stewart
- School of Infection and Host Defence, University of Liverpool, Liverpool L69 3GA, UK
|
164
|
Faria DA, Mamani EMC, Pappas MR, Pappas GJ, Grattapaglia D. A Selected Set of EST-Derived Microsatellites, Polymorphic and Transferable across 6 Species of Eucalyptus. J Hered 2010; 101:512-20. [DOI: 10.1093/jhered/esq024] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
165
|
Abstract
Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.
|
167
|
Krawitz P, Rödelsperger C, Jäger M, Jostins L, Bauer S, Robinson PN. Microindel detection in short-read sequence data. Bioinformatics 2010; 26:722-9. [PMID: 20144947 DOI: 10.1093/bioinformatics/btq027] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge. RESULTS We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region (eir), which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contain indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platforms indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels. CONTACT peter.krawitz@googlemail.com; peter.robinson@charite.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
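The idea of an equivalent indel region can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `equivalent_deletion_region` and its 0-based coordinates are assumptions made for illustration, and only deletions are handled. The sketch finds every start position at which a deletion of the same length leaves exactly the same sequence, which is why one indel can be reported at different coordinates by different mappers.

```python
def equivalent_deletion_region(ref: str, pos: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` produces the same sequence as deleting at `pos`.

    Hypothetical helper for illustration only; deletions only.
    """
    if length <= 0 or pos < 0 or pos + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shifting the deleted window one base to the left is neutral when the base
    # entering the window equals the base leaving it on the right.
    left = pos
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shifting to the right is neutral under the mirror-image condition.
    right = pos
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


def delete(ref: str, pos: int, length: int) -> str:
    """Apply a deletion of `length` bases starting at `pos`."""
    return ref[:pos] + ref[pos + length:]


if __name__ == "__main__":
    # Deleting any single T from the T-run gives the same sequence.
    print(equivalent_deletion_region("GATTTTC", 3, 1))
```

Normalising each called indel to its full region, rather than to a single coordinate, is one way to compare calls made from different alignments.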
Affiliation(s)
- Peter Krawitz
- Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin.
|
168
|
INVERTER: INtegrated Variable numbER Tandem rEpeat findeR. COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE 2010. [DOI: 10.1007/978-3-642-16750-8_14] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
|
169
|
Abstract
Single nucleotide polymorphisms (SNPs) are widely distributed in the human genome and although most SNPs are the result of independent point-mutations, there are exceptions. When studying distances between SNPs, a periodic pattern in the distance between pairs of identical SNPs has been found to be heavily correlated with periodicity in short tandem repeats (STRs). STRs are short DNA segments, widely distributed in the human genome and mainly found outside known tandem repeats. Because of the biased occurrence of SNPs, special care has to be taken when analyzing SNP-variation in STRs. We present a review of STRs in the human genome and discuss molecular mechanisms related to the biased occurrence of SNPs in STRs, and its implications for genome comparisons and genetic association studies.
Affiliation(s)
- Bo Eskerod Madsen
- AgroTech, Institute for Agri Technology and Food Innovation, Aarhus N, Denmark
|
170
|
Lee YS, Kim WY, Ji M, Kim JH, Bhak J. MitoVariome: a variome database of human mitochondrial DNA. BMC Genomics 2009; 10 Suppl 3:S12. [PMID: 19958475 PMCID: PMC2788364 DOI: 10.1186/1471-2164-10-s3-s12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Background Mitochondrial sequence variation provides critical information for studying human evolution and variation. Mitochondrial DNA provides information on the origin of humans, and plays a substantial role in forensics, degenerative diseases, cancers, and the aging process. Typically, human mitochondrial DNA has various features such as HVSI, HVSII, single-nucleotide polymorphism (SNP), restriction enzyme sites, and short tandem repeat (STR). Results We present a variome database (MitoVariome) of human mitochondrial DNA sequences. Queries against MitoVariome can be made using accession numbers or haplogroup/continent. Query results are presented not only in text but also in HTML tables to report extensive mitochondrial sequence variation information. The variation information includes repeat pattern, restriction enzyme site polymorphism, short tandem repeat, disease information as well as single nucleotide polymorphism. It also provides a graphical interface based on GBrowse that displays all variations at a glance. The web interface also provides a tool for assigning haplogroups based on the haplogroup-diagnostic system with a complete human mitochondrial SNP position list, and for retrieving the sequences that users query by accession number. Conclusion MitoVariome is a freely accessible web application and database that enables human mitochondrial genome researchers to study genetic variation in the mitochondrial genome with textual and graphical views, accompanied by a haplogroup assignment function if users submit their own data. Hence MitoVariome, which contains many kinds of variation features in the human mitochondrial genome, will be useful for understanding mitochondrial variations of each individual, haplogroup, or geographical location to elucidate the history of human evolution.
Affiliation(s)
- Yong Seok Lee
- Korean Bioinformation Center (KOBIC), KRIBB, Daejeon 305-806, Korea.
|
171
|
Meek MH, Baerwald MR, Wintzer AP, May B. Isolation and characterization of microsatellite loci in two non-native hydromedusae in the San Francisco Estuary: Maeotias marginata and Moerisia sp. CONSERV GENET RESOUR 2009. [DOI: 10.1007/s12686-009-9050-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
172
|
Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity (Edinb) 2009; 104:520-33. [PMID: 19935826 DOI: 10.1038/hdy.2009.165] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
The production of genome sequences has led to another important advance in their annotation, which is closely linked to the exact determination of their content in terms of repeats, among which are transposable elements (TEs). The evolutionary implications and the presence of coding regions in some TEs can confuse gene annotation, and also hinder the process of genome assembly, making it particularly crucial to be able to annotate and classify them correctly in genome sequences. This review is intended to provide as comprehensive an overview as possible of the automated methods currently used to annotate and classify TEs in sequenced genomes. Different categories of programs exist according to their methodology and the repeats that they can identify. I describe here the main characteristics of the programs, their main goals and the difficulties they can entail. The drawbacks of the different methods are also highlighted to help biologists who are unfamiliar with algorithmic methods to understand this methodology better. Globally, using several different programs and carrying out a cross comparison of their results has a better chance of finding reliable results than any single program. However, this makes it essential to verify the results provided by each program independently. The ideal solution would be to test all programs against the same data set to obtain a true comparison of their actual performance.
|
173
|
Jorda J, Kajava AV. T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics 2009; 25:2632-8. [PMID: 19671691 DOI: 10.1093/bioinformatics/btp482] [Citation(s) in RCA: 131] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION Over recent years, considerable evidence has accumulated for the high incidence of tandem repeats in proteins that carry out fundamental biological functions and are related to a number of human diseases. At the same time, protein repeats frequently degenerate strongly during evolution and therefore cannot be easily identified. To solve this problem, several computer programs based on different algorithms have been developed. Nevertheless, our tests showed that there is still room for improvement of methods for accurate and rapid detection of tandem repeats in proteins. RESULTS We developed a new program called T-REKS for ab initio identification of tandem repeats. It is based on clustering of lengths between identical short strings by using a K-means algorithm. A benchmark of the existing programs and T-REKS on several sequence datasets is presented. Our program, being linked to the Protein Repeat DataBase, opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to nucleotide sequences. AVAILABILITY The algorithm has been implemented in JAVA; the program is available upon request at http://bioinfo.montp.cnrs.fr/?r=t-reks. The Protein Repeat DataBase generated by using T-REKS is accessible at http://bioinfo.montp.cnrs.fr/?r=repeatDB.
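The phrase "lengths between identical short strings" can be made concrete with a small Python sketch. This is not T-REKS itself: the function names are hypothetical, and the K-means clustering step is replaced here by simply taking the most frequent length, which is a deliberate simplification. The sketch records the distance between consecutive occurrences of each short word and reports the commonest distance as a candidate repeat period.

```python
from collections import Counter


def inter_occurrence_lengths(seq: str, k: int = 2) -> list[int]:
    """Distances between consecutive occurrences of each k-letter word.

    Hypothetical helper; in a tandem repeat of period p, most of these
    distances equal p.
    """
    last_seen: dict[str, int] = {}
    lengths: list[int] = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            lengths.append(i - last_seen[word])
        last_seen[word] = i
    return lengths


def candidate_period(seq: str, k: int = 2):
    """Most frequent inter-occurrence length, or None if no word recurs.

    Stand-in for the clustering step: the mode is used instead of K-means.
    """
    lengths = inter_occurrence_lengths(seq, k)
    if not lengths:
        return None
    return Counter(lengths).most_common(1)[0][0]


if __name__ == "__main__":
    print(candidate_period("ABCDE" * 4))
```

A real detector would also need to tolerate degenerate repeats, where substitutions and indels blur the distribution of lengths; that is the situation a clustering step is meant to handle.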
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France.
|
174
|
Zarlenga DS, Gasbarre LC. From parasite genomes to one healthy world: Are we having fun yet? Vet Parasitol 2009; 163:235-49. [PMID: 19560277 DOI: 10.1016/j.vetpar.2009.06.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
In 1990, the Human Genome Sequencing Project was established. This laid the groundwork for an explosion of sequence data that has since followed. As a result of this effort, the first complete genome of an animal, Caenorhabditis elegans, was published in 1998. The sequence of Drosophila melanogaster was made available in March, 2000 and in the following year, working drafts of the human genome were generated with the completed sequence (92%) being released in 2003. Recent advancements and next-generation technologies have made sequencing commonplace and have infiltrated every aspect of biological research, including parasitology. To date, sequencing of 32 apicomplexa and 24 nematode genomes is either in progress or near completion, and over 600k nematode EST and 200k apicomplexa EST submissions fill the databases. However, the winds have shifted and efforts are now refocusing on how best to store, mine and apply these data to problem solving. Herein we tend not to summarize existing X-omics datasets or present new technological advances that promise future benefits. Rather, the information to follow condenses up-to-date applications of existing technologies to problem solving as it relates to parasite research. Advancements in non-parasite systems are also presented with the proviso that applications to parasite research are in the making.
Affiliation(s)
- Dante S Zarlenga
- USDA, ARS, ANRI Animal Parasitic Diseases Laboratory, Beltsville, MD 20705, USA.
|
175
|
D'haene B, Attanasio C, Beysen D, Dostie J, Lemire E, Bouchard P, Field M, Jones K, Lorenz B, Menten B, Buysse K, Pattyn F, Friedli M, Ucla C, Rossier C, Wyss C, Speleman F, De Paepe A, Dekker J, Antonarakis SE, De Baere E. Disease-causing 7.4 kb cis-regulatory deletion disrupting conserved non-coding sequences and their interaction with the FOXL2 promotor: implications for mutation screening. PLoS Genet 2009; 5:e1000522. [PMID: 19543368 PMCID: PMC2689649 DOI: 10.1371/journal.pgen.1000522] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2009] [Accepted: 05/18/2009] [Indexed: 11/23/2022] Open
Abstract
To date, the contribution of disrupted potentially cis-regulatory conserved non-coding sequences (CNCs) to human disease is most likely underestimated, as no systematic screens for putative deleterious variations in CNCs have been conducted. As a model for monogenic disease we studied the involvement of genetic changes of CNCs in the cis-regulatory domain of FOXL2 in blepharophimosis syndrome (BPES). Fifty-seven molecularly unsolved BPES patients underwent high-resolution copy number screening and targeted sequencing of CNCs. Apart from three larger distant deletions, a de novo deletion as small as 7.4 kb was found at 283 kb 5′ to FOXL2. The deletion appeared to be triggered by an H-DNA-induced double-stranded break (DSB). In addition, it disrupts a novel long non-coding RNA (ncRNA) PISRT1 and 8 CNCs. The regulatory potential of the deleted CNCs was substantiated by in vitro luciferase assays. Interestingly, Chromosome Conformation Capture (3C) of a 625 kb region surrounding FOXL2 in expressing cellular systems revealed physical interactions of three upstream fragments and the FOXL2 core promoter. Importantly, one of these contains the 7.4 kb deleted fragment. Overall, this study revealed the smallest distant deletion causing monogenic disease and impacts upon the concept of mutation screening in human disease and developmental disorders in particular. Long-range genetic control is an inherent feature of genes harbouring a highly complex spatiotemporal expression pattern, requiring a combined action of multiple cis-regulatory elements such as promoters, enhancers, and silencers. Consequently, disruption of the long-range genetic control of a target gene by genomic rearrangements of regulatory elements may lead to aberrant gene transcription and disease. To date, the contribution of mutated regulatory elements to human disease has not been studied frequently. 
Here, we explored the contribution of genetic changes in potentially cis-regulatory elements of the FOXL2 gene in blepharophimosis syndrome (BPES), a developmental monogenic condition of the eyelids and ovaries. We identified a de novo very subtle deletion of 7.4 kb causing BPES. Moreover, we studied the functional capacities and chromosome conformation of the deleted region in FOXL2 expressing cellular systems. Interestingly, the chromosome conformation analysis demonstrated the close proximity of the 7.4 kb deleted fragment and two other conserved regions with the FOXL2 core promoter, and the necessity of their integrity for correct FOXL2 expression. Finally, our study revealed the smallest distant deletion causing monogenic disease and emphasized the importance of mutation screening of cis-regulatory elements in human genetic disease.
Affiliation(s)
- Barbara D'haene
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Catia Attanasio
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Diane Beysen
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Josée Dostie
- Program in Gene Function and Expression and Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Edmond Lemire
- Division of Medical Genetics, Royal University Hospital, Saskatoon, Saskatchewan, Canada
- Kristie Jones
- Department of Clinical Genetics, The Children's Hospital at Westmead, Westmead, Australia
- Birgit Lorenz
- Department of Ophthalmology, Justus-Liebig-University Giessen, Universitaetsklinikum Giessen und Marburg GmbH Giessen Campus, Giessen, Germany
- Björn Menten
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Karen Buysse
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Filip Pattyn
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Marc Friedli
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Catherine Ucla
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Colette Rossier
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Carine Wyss
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Frank Speleman
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Anne De Paepe
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Job Dekker
- Program in Gene Function and Expression and Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Stylianos E. Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
- Elfride De Baere
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
|
176
|
Brandão A, Jiang T. The composition of untranslated regions in Trypanosoma cruzi genes. Parasitol Int 2009; 58:215-9. [PMID: 19505588 DOI: 10.1016/j.parint.2009.06.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2009] [Revised: 05/26/2009] [Accepted: 06/01/2009] [Indexed: 11/25/2022]
Abstract
We collected the UTRs from Trypanosoma cruzi genes that have been experimentally mapped and are publicly available, and made a comprehensive analysis of their composition features including sequence length, G+C content and relationship to ORF, composition of the most frequent words, and distribution of Simple Sequence Repeats (SSR). T. cruzi UTRs range in length from 10-400 bp for the 5' UTR and 17-2800 bp for the 3' UTR. Both UTRs display a mean G+C content of 40%. Ratios between the UTR and protein coding segments show that the 5' UTR is limited to a maximum of 20% of the total length in the final transcript. The most frequent 5' UTR words in the range 4-12 bases are almost exact complements of the respective 3' UTR words. SSRs in the 3' UTR are longer than in the 5' UTR and are mostly derived from TA/AT, TG/GT, and TTA/ATT. SSRs account for up to 20% of the nucleotide composition in the 5' UTR and up to 90% in the 3' UTR.
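Two of the composition features named above, G+C content and SSR distribution, can be computed along the following lines. This Python sketch is an illustration only: the function names, the restriction to di- and trinucleotide motifs, and the minimum of three repeat units are assumptions, not the criteria used in the study.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_ssrs(seq: str, min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Locate simple sequence repeats with 2- or 3-base motifs.

    Returns (start, motif, repeat_count) tuples. Thresholds are assumed
    values chosen for illustration.
    """
    # A motif followed by at least (min_repeats - 1) further copies of itself.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    utr = "GGCTATATATAGGC"
    print(gc_content(utr), find_ssrs(utr))
```

Dividing the total length of the reported repeats by the sequence length gives the share of a UTR occupied by SSRs, the kind of figure quoted at the end of the abstract.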
|
177
|
Chaley MB, Nazipova NN, Kutyrkin VA. Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples. PATTERN RECOGNITION AND IMAGE ANALYSIS 2009. [DOI: 10.1134/s1054661809020217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
178
|
The complete mitochondrial genome of Atelura formicaria (Hexapoda: Zygentoma) and the phylogenetic relationships of basal insects. Gene 2009; 439:25-34. [DOI: 10.1016/j.gene.2009.02.020] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2008] [Revised: 02/18/2009] [Accepted: 02/19/2009] [Indexed: 11/18/2022]
|
179
|
Petersen JL, Ibarra AM, May B. Thirty-seven additional microsatellite loci in the Pacific lion-paw scallop (Nodipecten subnodosus) and cross-amplification in other pectinids. CONSERV GENET RESOUR 2009. [DOI: 10.1007/s12686-009-9025-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
180
|
Treangen TJ, Abraham AL, Touchon M, Rocha EPC. Genesis, effects and fates of repeats in prokaryotic genomes. FEMS Microbiol Rev 2009; 33:539-71. [PMID: 19396957 DOI: 10.1111/j.1574-6976.2009.00169.x] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
DNA repeats are causes and consequences of genome plasticity. Repeats are created by intrachromosomal recombination or horizontal transfer. They are targeted by recombination processes leading to amplifications, deletions and rearrangements of genetic material. The identification and analysis of repeats in nearly 700 genomes of bacteria and archaea is facilitated by the existence of sequence data and adequate bioinformatic tools. These have revealed the immense diversity of repeats in genomes, from those created by selfish elements to the ones used for protection against selfish elements, from those arising from transient gene amplifications to the ones leading to stable duplications. Experimental works have shown that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. All repeats carry some potential to disorganize and destabilize genomes. Because recombination and selection for repeats vary between genomes, the number and types of repeats are also quite diverse and in line with ecological variables, such as host-dependent associations or population sizes, and with genetic variables, such as the recombination machinery. From an evolutionary point of view, repeats represent both opportunities and problems. We describe how repeats are created and how they can be found in genomes. We then focus on the functional and genomic consequences of repeats that dictate their fate.
|
181
|
Peterlongo P, Sacomoto GAT, do Lago AP, Pisanti N, Sagot MF. Lossless filter for multiple repeats with bounded edit distance. Algorithms Mol Biol 2009; 4:3. [PMID: 19183438 PMCID: PMC2661881 DOI: 10.1186/1748-7188-4-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2008] [Accepted: 01/30/2009] [Indexed: 11/10/2022] Open
Abstract
Background Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part in the analysis of biological sequences and of their phylogenetic relationship. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions, is however known to be a computationally expensive task, and consequently exact methods can usually not be applied in practice. Results The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment or repeats inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists in the verification of several strong necessary conditions that can be checked in a fast way. We implemented three versions of the filter. The first is simply a straightforward extension to the case of multiple sequences of an application of conditions already existing in the literature. The second uses a stronger condition which, as our results show, enable to filter sensibly more with negligible (if any) additional time. The third version uses an additional condition and pushes the sensibility of the filter even further with a non negligible additional time in many circumstances; our experiments show that it is particularly useful with large error rates. The latter version was applied as a preprocessing of a multiple alignment tool, obtaining an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with in most cases a better quality alignment. Conclusion To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeats length.
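What a lossless filter based on a necessary condition looks like can be shown with the classic q-gram counting argument. The Python sketch below is not the TUIUIU filter, whose conditions are not spelled out in the abstract; it is a simpler pairwise condition used purely for illustration, and the function names are hypothetical. Each edit operation changes at most q overlapping q-grams, so two strings within edit distance k must share at least (|a| + |b|)/2 - q + 1 - q*k q-grams; any pair sharing fewer can be discarded without losing a true match.

```python
from collections import Counter


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-grams common to both strings, counted with multiplicity."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((grams_a & grams_b).values())


def may_be_within_edit_distance(a: str, b: str, k: int, q: int = 3) -> bool:
    """Necessary (not sufficient) condition for edit distance <= k.

    False means the pair can be safely filtered out; True means it still
    has to be checked by an exact method.
    """
    threshold = (len(a) + len(b)) / 2 - q + 1 - q * k
    return shared_qgrams(a, b, q) >= threshold


if __name__ == "__main__":
    print(may_be_within_edit_distance("ACGTACGTAC", "ACGTACGAAC", k=1))
    print(may_be_within_edit_distance("ACGTACGTAC", "TTTTTTTTTT", k=1))
```

Because the test only ever rejects pairs that cannot match, the expensive alignment step run afterwards sees less input but finds the same results.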
|
182
|
Fisch KM, Petersen JL, Baerwald MR, Pedroia JK, May B. Characterization of 24 microsatellite loci in delta smelt, Hypomesus transpacificus, and their cross-species amplification in two other smelt species of the Osmeridae family. Mol Ecol Resour 2009; 9:405-8. [PMID: 21564663 DOI: 10.1111/j.1755-0998.2008.02254.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
We characterized 24 polymorphic tetranucleotide microsatellite loci for delta smelt (Hypomesus transpacificus) endemic to the San Francisco Bay Estuary, CA, USA. Screening of samples (n = 30) yielded two to 26 alleles per locus with observed levels of heterozygosity ranging from 0.17 to 1.0. Only one locus deviated from Hardy-Weinberg equilibrium, suggesting these individuals originate from a single panmictic population. Linkage disequilibrium was found in two pairs of loci after excluding the locus out of Hardy-Weinberg equilibrium. Twenty-two primer pairs cross-amplified in wakasagi smelt (Hypomesus nipponensis), and 15 primer pairs cross-amplified in longfin smelt (Spirinchus thaleichthys).
Affiliation(s)
- Kathleen M Fisch
- Department of Animal Science, University of California - Davis, 1 Shields Avenue, Davis, CA 95616, USA
|
183
|
Richard GF, Kerrest A, Dujon B. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 2008; 72:686-727. [PMID: 19052325 PMCID: PMC2593564 DOI: 10.1128/mmbr.00011-08] [Citation(s) in RCA: 335] [Impact Index Per Article: 20.9] [Indexed: 12/20/2022] Open
Abstract
Repeated elements can be widely abundant in eukaryotic genomes, composing more than 50% of the human genome, for example. It is possible to classify repeated sequences into two large families, "tandem repeats" and "dispersed repeats." Each of these two families can be itself divided into subfamilies. Dispersed repeats contain transposons, tRNA genes, and gene paralogues, whereas tandem repeats contain gene tandems, ribosomal DNA repeat arrays, and satellite DNA, itself subdivided into satellites, minisatellites, and microsatellites. Remarkably, the molecular mechanisms that create and propagate dispersed and tandem repeats are specific to each class and usually do not overlap. In the present review, we have chosen in the first section to describe the nature and distribution of dispersed and tandem repeats in eukaryotic genomes in the light of complete (or nearly complete) available genome sequences. In the second part, we focus on the molecular mechanisms responsible for the fast evolution of two specific classes of tandem repeats: minisatellites and microsatellites. Given that a growing number of human neurological disorders involve the expansion of a particular class of microsatellites, called trinucleotide repeats, a large part of the recent experimental work on microsatellites has focused on these particular repeats, and thus we also review the current knowledge in this area. Finally, we propose a unified definition for mini- and microsatellites that takes into account their biological properties and try to point out new directions that should be explored in a near future on our road to understanding the genetics of repeated sequences.
Affiliation(s)
- Guy-Franck Richard
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie, UFR927, 25 rue du Dr. Roux, F-75015, Paris, France.
|
184
|
Hua J, Li M, Dong P, Xie Q, Bu W. The mitochondrial genome of Protohermes concolorus Yang et Yang 1988 (Insecta: Megaloptera: Corydalidae). Mol Biol Rep 2008; 36:1757-65. [PMID: 18949579 DOI: 10.1007/s11033-008-9379-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 07/16/2008] [Accepted: 10/07/2008] [Indexed: 11/26/2022]
Abstract
The first complete mitochondrial genome of the dobsonfly Protohermes concolorus Yang et Yang, 1988 (Megaloptera: Corydalidae) was sequenced in this study. The genome was a circular molecule of 15,851 bp containing the typical 37 genes, arranged in the same order as in the putative ancestor of hexapods. Sequence overlaps were observed between several neighboring genes, which made the genome relatively compact. The tRNA-Ser (GCT) could not be folded into the typical secondary structure because its DHU arm was replaced with a simple loop. Six of the 13 protein genes were terminated with a single T adjacent to a downstream tRNA gene on the same strand. The variation of GC content caused the different nucleotide substitution patterns of the protein genes. The genome was AT-biased, with a total A + T content of 75.83%, which was also reflected in the codon usage. The control region was the most AT-rich region, with a sub-region of even higher A + T content. Protein genes on the two strands presented opposite CG-skew trends, which was also reflected in the codon usage. For most of the amino acids, the protein coding sequences did not prefer the cognate codons of the corresponding tRNAs, and the codon usage of the protein genes was not random. The variation of nucleotide substitution patterns of protein genes was significantly correlated with the GC content. The phylogenetic analyses based on all 13 protein genes showed that Megaloptera was the sister group of the other holometabolous insects except Coleoptera.
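The composition statistics quoted in this abstract (A + T content and strand skew) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the skew formula (G - C)/(G + C) is the commonly used definition, assumed here rather than taken from the paper.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("A") + seq.count("T")) / total


def gc_skew(seq: str) -> float:
    """Skew (G - C) / (G + C); assumed definition, 0.0 if no G or C is present."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)
```

Applied to each strand's protein-coding genes separately, opposite signs of `gc_skew` would correspond to the opposite skew trends the abstract reports.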
Affiliation(s)
- Jimeng Hua
- Insect Molecular Systematic Lab, Institute of Entomology, College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin, People's Republic of China
|
185
|
Thierry A, Bouchier C, Dujon B, Richard GF. Megasatellites: a peculiar class of giant minisatellites in genes involved in cell adhesion and pathogenicity in Candida glabrata. Nucleic Acids Res 2008; 36:5970-82. [PMID: 18812401 PMCID: PMC2566889 DOI: 10.1093/nar/gkn594] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/28/2022] Open
Abstract
Minisatellites are DNA tandem repeats that are found in all sequenced genomes. In the yeast Saccharomyces cerevisiae, they are frequently encountered in genes encoding cell wall proteins. Minisatellites present in the completely sequenced genome of the pathogenic yeast Candida glabrata were similarly analyzed, and two new types of minisatellites were discovered: minisatellites that are composed of two different intermingled repeats (called compound minisatellites), and minisatellites containing unusually long repeated motifs (126–429 bp). These long repeat minisatellites may reach unusual length for such elements (up to 10 kb). Due to these peculiar properties, they have been named ‘megasatellites’. They are found essentially in genes involved in cell–cell adhesion, and could therefore be involved in the ability of this opportunistic pathogen to colonize the human host. In addition to megasatellites, found in large paralogous gene families, there are 93 minisatellites with simple shorter motifs, comparable to those found in S. cerevisiae. Most of the time, these minisatellites are not conserved between C. glabrata and S. cerevisiae, although their host genes are well conserved, raising the question of an active mechanism creating minisatellites de novo in hemiascomycetes.
Affiliation(s)
- Agnès Thierry
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, F-75015 Paris, France
|
186
|
Levdansky E, Sharon H, Osherov N. Coding fungal tandem repeats as generators of fungal diversity. Fungal Biol Rev 2008. [DOI: 10.1016/j.fbr.2008.08.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]
|
187
|
Merkel A, Gemmell N. Detecting short tandem repeats from genome data: opening the software black box. Brief Bioinform 2008; 9:355-66. [PMID: 18621747 DOI: 10.1093/bib/bbn028] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/23/2023] Open
Abstract
Short tandem repeats, specifically microsatellites, are widely used genetic markers, associated with human genetic diseases, and play an important role in various regulatory mechanisms and evolution. Despite their importance, much is yet unknown about their mutational dynamics. The increasing availability of genome data has led to several in silico studies of microsatellite evolution which have produced a vast range of algorithms and software for tandem repeat detection. Documentation of these tools is often sparse, or provided in a format that is impenetrable to most biologists without informatics background. This article introduces the major concepts behind repeat detecting software essential for informed tool selection. We reflect on issues such as parameter settings and program bias, as well as redundancy filtering and efficiency using examples from the currently available range of programs, to provide an integrated comparison and practical guide to microsatellite detecting programs.
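To make the notions of parameter settings and redundancy filtering concrete, here is a minimal Python sketch of a perfect-repeat scanner. It is an illustration only and does not reproduce any of the reviewed programs; the function name, parameters, and default thresholds are assumptions.

```python
import re


def find_perfect_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, end, unit, copies) for perfect tandem repeats in seq.

    Only exact (uninterrupted) repeats are reported. Units that are
    themselves made of a shorter repeated motif (e.g. "ATAT") are skipped,
    a simple form of redundancy filtering.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit followed by at least (min_repeats - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            redundant = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if redundant:
                continue
            copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, copies))
    return hits
```

Changing `min_repeats` or `max_unit` changes which loci are reported, which is one reason results from different tools and settings are hard to compare directly.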
Affiliation(s)
- Angelika Merkel
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8041, New Zealand.
|
188
|
Carapelli A, Comandi S, Convey P, Nardi F, Frati F. The complete mitochondrial genome of the Antarctic springtail Cryptopygus antarcticus (Hexapoda: Collembola). BMC Genomics 2008; 9:315. [PMID: 18593463 PMCID: PMC2483729 DOI: 10.1186/1471-2164-9-315] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 04/11/2008] [Accepted: 07/01/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Mitogenomics data, i.e. complete mitochondrial genome sequences, are popular molecular markers used for phylogenetic, phylogeographic and ecological studies in different animal lineages. Their comparative analysis has been used to shed light on the evolutionary history of given taxa and on the molecular processes that regulate the evolution of the mitochondrial genome. A considerable literature is available in the fields of invertebrate biochemical and ecophysiological adaptation to extreme environmental conditions, exemplified by those of the Antarctic. Nevertheless, limited molecular data are available from terrestrial Antarctic species, and this study represents the first attempt towards the description of a mitochondrial genome from one of the most widespread and common collembolan species of Antarctica. RESULTS In this study we describe the mitochondrial genome of the Antarctic collembolan Cryptopygus antarcticus Willem, 1901. The genome contains the standard set of 37 genes usually present in animal mtDNAs and a large non-coding fragment putatively corresponding to the region (A+T-rich) responsible for the control of replication and transcription. All genes are arranged in the gene order typical of Pancrustacea. Three additional short non-coding regions are present at gene junctions. Two of these are located in positions of abrupt shift of the coding polarity of genes oriented on opposite strands suggesting a role in the attenuation of the polycistronic mRNA transcription(s). In addition, remnants of an additional copy of trnL(uag) are present between trnS(uga) and nad1. Nucleotide composition is biased towards a high A% and T% (A+T = 70.9%), as typically found in hexapod mtDNAs. There is also a significant strand asymmetry, with the J-strand being more abundant in A and C. 
Within the A+T-rich region, some short sequence fragments appear to be similar (in position and primary sequence) to those involved in the origin of the N-strand replication of the Drosophila mtDNA. CONCLUSION The mitochondrial genome of C. antarcticus shares several features with other pancrustacean genomes, although the presence of unusual non-coding regions is also suggestive of molecular rearrangements that probably occurred before the differentiation of major collembolan families. Closer examination of gene boundaries also confirms previous observations on the presence of unusual start and stop codons, and suggests a role for tRNA secondary structures as potential cleavage signals involved in the maturation of the primary transcript. Sequences potentially involved in the regulation of replication/transcription are present both in the A+T-rich region and in other areas of the genome. Their position is similar to that observed in a limited number of insect species, suggesting unique replication/transcription mechanisms for basal and derived hexapod lineages. This initial description and characterization of the mitochondrial genome of C. antarcticus will constitute the essential foundation prerequisite for investigations of the evolutionary history of one of the most speciose collembolan genera present in Antarctica and other localities of the Southern Hemisphere.
Affiliation(s)
- Antonio Carapelli
- Department of Evolutionary Biology, University of Siena, Via A. Moro 2, 53100 Siena, Italy.
|
189
|
Poisson Approximation for the Number of Repeats in a Stationary Markov Chain. J Appl Probab 2008. [DOI: 10.1017/s0021900200004344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Detection of repeated sequences within complete genomes is a powerful tool to help understanding genome dynamics and species evolutionary history. To distinguish significant repeats from those that can be obtained just by chance, statistical methods have to be developed. In this paper we show that the distribution of the number of long repeats in long sequences generated by stationary Markov chains can be approximated by a Poisson distribution with explicit parameter. Thanks to the Chen-Stein method we provide a bound for the approximation error; this bound converges to 0 as soon as the length n of the sequence tends to ∞ and the length t of the repeats satisfies $n^2 \rho^t = O(1)$ for some $0 < \rho < 1$. Using this Poisson approximation, p-values can then be easily calculated to determine if a given genome is significantly enriched in repeats of length t.
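As an illustration of the last step, the following Python sketch computes an upper-tail p-value under a Poisson model. The Poisson mean `lam` stands in for the explicit parameter derived in the paper, which the abstract does not give, so it is treated here as an input; the function name is hypothetical.

```python
import math


def poisson_upper_tail(k_observed: int, lam: float) -> float:
    """P(N >= k_observed) for N ~ Poisson(lam).

    `lam` is assumed to be supplied by the user (e.g. computed from the
    fitted Markov chain); this sketch does not derive it.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if k_observed <= 0:
        return 1.0
    # Accumulate P(N = 0) ... P(N = k_observed - 1), then take the complement.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k_observed):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)
```

A small value of `poisson_upper_tail(k, lam)` for the observed repeat count `k` would suggest enrichment in repeats of length t relative to the Markov chain model. For large `lam` the direct summation loses precision, and a library routine for the Poisson survival function would be preferable.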
|
190
|
Grissa I, Bouchon P, Pourcel C, Vergnaud G. On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing. Biochimie 2008; 90:660-8. [PMID: 17822824 DOI: 10.1016/j.biochi.2007.07.014] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 06/16/2007] [Accepted: 07/19/2007] [Indexed: 10/23/2022]
Abstract
The control of bacterial pathogens requires the development of tools allowing the precise identification of strains at the subspecies level. It is now widely accepted that these tools will need to be DNA-based assays (in contrast to identification at the species level, where biochemical based assays are still widely used, even though very powerful 16S DNA sequence databases exist). Typing assays need to be cheap and amenable to the designing of international databases. The success of such subspecies typing tools will eventually be measured by the size of the associated reference databases accessible over the internet. Three methods have shown some potential in this direction, the so-called spoligotyping assay (Mycobacterium tuberculosis, 40,000 entries database), Multiple Loci Sequence Typing (MLST; up to a few thousands entries for the more than 20 bacterial species), and more recently Multiple Loci VNTR Analysis (MLVA; up to a few hundred entries, assays available for more than 20 pathogens). In the present report we will review the current status of the tools and resources we have developed along the past seven years to help in the setting-up or the use of MLVA assays or lately for analysing Clustered Regularly Interspaced Short Palindromic Repeats called CRISPRs which are the basis for spoligotyping assays.
Affiliation(s)
- Ibtissem Grissa
- Univ Paris-Sud, Institut de Génétique et Microbiologie, Orsay F-91405, France.
|
191
|
Wexler Y, Yakhini Z, Kashi Y, Geiger D. Finding approximate tandem repeats in genomic sequences. J Comput Biol 2005; 12:928-42. [PMID: 16201913 DOI: 10.1089/cmb.2005.12.928] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022] Open
Abstract
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.
Affiliation(s)
- Ydo Wexler
- Computer Science Department, Technion, Technion Campus, Haifa, 32000, Israel.
|
192
|
Shelenkov A, Korotkov A, Korotkov E. MMsat—a database of potential micro- and minisatellites. Gene 2008; 409:53-60. [DOI: 10.1016/j.gene.2007.11.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/20/2007] [Revised: 10/08/2007] [Accepted: 11/16/2007] [Indexed: 11/28/2022]
|
193
|
Shelenkov AA, Skryabin KG, Korotkov EV. Classification analysis of a latent dinucleotide periodicity of plant genomes. Russ J Genet 2008. [DOI: 10.1134/s1022795408010134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
194
|
Model of perfect tandem repeat with random pattern and empirical homogeneity testing poly-criteria for latent periodicity revelation in biological sequences. Math Biosci 2008; 211:186-204. [DOI: 10.1016/j.mbs.2007.10.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/09/2007] [Revised: 10/19/2007] [Accepted: 10/26/2007] [Indexed: 11/23/2022]
|
195
|
Grzebelus D, Lasota S, Gambin T, Kucherov G, Gambin A. Diversity and structure of PIF/Harbinger-like elements in the genome of Medicago truncatula. BMC Genomics 2007; 8:409. [PMID: 17996080 PMCID: PMC2213677 DOI: 10.1186/1471-2164-8-409] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 06/11/2007] [Accepted: 11/09/2007] [Indexed: 11/25/2022] Open
Abstract
Background Transposable elements constitute a significant fraction of plant genomes. The PIF/Harbinger superfamily includes DNA transposons (class II elements) carrying terminal inverted repeats and producing a 3 bp target site duplication upon insertion. The presence of an ORF coding for the DDE/DDD transposase, required for transposition, is characteristic for the autonomous PIF/Harbinger-like elements. Based on the above features, PIF/Harbinger-like elements were identified in several plant genomes and divided into several evolutionary lineages. Availability of a significant portion of Medicago truncatula genomic sequence allowed for mining PIF/Harbinger-like elements, starting from a single previously described element MtMaster. Results Twenty two putative autonomous, i.e. carrying an ORF coding for TPase and complete terminal inverted repeats, and 67 non-autonomous PIF/Harbinger-like elements were found in the genome of M. truncatula. They were divided into five families, MtPH-A5, MtPH-A6, MtPH-D,MtPH-E, and MtPH-M, corresponding to three previously identified and two new lineages. The largest families, MtPH-A6 and MtPH-M were further divided into four and three subfamilies, respectively. Non-autonomous elements were usually direct deletion derivatives of the putative autonomous element, however other types of rearrangements, including inversions and nested insertions were also observed. An interesting structural characteristic – the presence of 60 bp tandem repeats – was observed in a group of elements of subfamily MtPH-A6-4. Some families could be related to miniature inverted repeat elements (MITEs). The presence of empty loci (RESites), paralogous to those flanking the identified transposable elements, both autonomous and non-autonomous, as well as the presence of transposon insertion related size polymorphisms, confirmed that some of the mined elements were capable for transposition. 
Conclusion The population of PIF/Harbinger-like elements in the genome of M. truncatula is diverse. A detailed intra-family comparison of the elements' structure proved that they proliferated in the genome generally following the model of abortive gap repair. However, the presence of tandem repeats facilitated more pronounced rearrangements of the element internal regions. The insertion polymorphism of the MtPH elements and related MITE families in different populations of M. truncatula, if further confirmed experimentally, could be used as a source of molecular markers complementary to other marker systems.
Affiliation(s)
- Dariusz Grzebelus
- Department of Genetics, Plant Breeding and Seed Science, Agricultural University of Krakow, Al. 29 Listopada 54, 31-425 Krakow, Poland.
|
196
|
Legendre M, Pochet N, Pak T, Verstrepen KJ. Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Res 2007; 17:1787-96. [PMID: 17978285 DOI: 10.1101/gr.6554007] [Citation(s) in RCA: 145] [Impact Index Per Article: 8.5] [Indexed: 11/25/2022]
Abstract
Variable tandem repeats are frequently used for genetic mapping, genotyping, and forensics studies. Moreover, variation in some repeats underlies rapidly evolving traits or certain diseases. However, mutation rates vary greatly from repeat to repeat, and as a consequence, not all tandem repeats are suitable genetic markers or interesting unstable genetic modules. We developed a model, "SERV," that predicts the variability of a broad range of tandem repeats in a wide range of organisms. The nonlinear model uses three basic characteristics of the repeat (number of repeated units, unit length, and purity) to produce a numeric "VARscore" that correlates with repeat variability. SERV was experimentally validated using a large set of different artificial repeats located in the Saccharomyces cerevisiae URA3 gene. Further in silico analysis shows that SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. Using SERV, we demonstrate significant enrichment of variable repeats within human genes involved in transcriptional regulation, chromatin remodeling, morphogenesis, and neurogenesis. Moreover, SERV allows identification of known and candidate genes involved in repeat-based diseases. In addition, we demonstrate the use of SERV for the selection and comparison of suitable variable repeats for genotyping and forensic purposes. Our analysis indicates that tandem repeats used for genotyping should have a VARscore between 1 and 3. SERV is publicly available from http://hulsweb1.cgr.harvard.edu/SERV/.
Affiliation(s)
- Matthieu Legendre
- FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts 02138, USA
|
197
|
Sharma PC, Grover A, Kahl G. Mining microsatellites in eukaryotic genomes. Trends Biotechnol 2007; 25:490-8. [PMID: 17945369 DOI: 10.1016/j.tibtech.2007.07.013] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.0] [Received: 05/14/2007] [Revised: 07/12/2007] [Accepted: 07/31/2007] [Indexed: 12/13/2022]
Abstract
During recent decades, microsatellites have become the most popular source of genetic markers. More recently, the availability of enormous sequence data for a large number of eukaryotic genomes has accelerated research aimed at understanding the origin and functions of microsatellites and searching for new applications. This review presents recent developments of in silico mining of microsatellites to reveal various facets of the distribution and dynamics of microsatellites in eukaryotic genomes. Two aspects of microsatellite search strategies--using a suitable search tool and accessing a relevant microsatellite database--have been explored. Judicious microsatellite mining not only helps in addressing biological questions but also facilitates better exploitation of microsatellites for diverse applications.
Affiliation(s)
- Prakash C Sharma
- University School of Biotechnology, Guru Gobind Singh Indraprastha University, Kashmere Gate, Delhi 110 006, India.
|
198
|
Lawson MJ, Zhang L. Housekeeping and tissue-specific genes differ in simple sequence repeats in the 5'-UTR region. Gene 2007; 407:54-62. [PMID: 17964742 DOI: 10.1016/j.gene.2007.09.017] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 07/29/2007] [Revised: 09/25/2007] [Accepted: 09/26/2007] [Indexed: 12/22/2022]
Abstract
SSRs (simple sequence repeats) have been shown to have a variety of effects on an organism. In this study, we compared SSRs in housekeeping and tissue-specific genes in human and mouse, in terms of SSR types and distributions in different regions including 5'-UTRs, introns, coding exons, 3'-UTRs, and upstream regions. Among all these regions, SSRs in the 5'-UTR show the most distinction between housekeeping genes and tissue-specific genes in both densities and repeat types. Specifically, SSR densities in 5'-UTRs in housekeeping genes are about 1.7 times higher than those in tissue-specific genes, in contrast to the 0.8-1.2 times differences between the two classes of genes in other regions. Tri-SSRs in 5'-UTRs of housekeeping genes are more GC rich than those of tissue-specific genes and CGG, the dominant type of tri-SSR in 5'-UTR, accounts for 74-79% of the tri-SSRs in housekeeping genes, as compared to 42-57% in tissue-specific genes. 75% of the tri-SSRs in the 5'-UTR of housekeeping genes have 4-5 repeat units, versus the 86-90% in tissue-specific genes. Taken together, our results suggest that SSRs may have an effect on gene expression and may play an important role in contributing to the different expression profiles between housekeeping and tissue-specific genes.
Affiliation(s)
- Mark J Lawson
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
|
199
|
Affiliation(s)
- Haixu Tang
- School of Informatics, Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana 47408, USA.
|
200
|
Huntley MA, Clark AG. Evolutionary Analysis of Amino Acid Repeats across the Genomes of 12 Drosophila Species. Mol Biol Evol 2007; 24:2598-609. [PMID: 17602168 DOI: 10.1093/molbev/msm129] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022] Open
Abstract
Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than in most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate across the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
Affiliation(s)
- Melanie A Huntley
- Department of Molecular Biology and Genetics Cornell University, USA.
|