1
Nunes R, Storer C, Doleck T, Kawahara AY, Pierce NE, Lohman DJ. Predictors of sequence capture in a large-scale anchored phylogenomics project. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.943361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Abstract
Next-generation sequencing (NGS) technologies have revolutionized phylogenomics by decreasing the cost and time required to generate sequence data from multiple markers or whole genomes. Further, the fragmented DNA of biological specimens collected decades ago can be sequenced with NGS, reducing the need for collecting fresh specimens. Sequence capture, also known as anchored hybrid enrichment, is a method to produce reduced representation libraries for NGS. The technique uses single-stranded oligonucleotide probes that hybridize with pre-selected regions of the genome that are sequenced via NGS, culminating in a dataset of numerous orthologous loci from multiple taxa. Phylogenetic analyses using these sequences have the potential to resolve deep and shallow phylogenetic relationships. Identifying the factors that affect sequence capture success could save time, money, and valuable specimens that might be destructively sampled despite low likelihood of sequencing success. We investigated the impacts of specimen age, preservation method, and DNA concentration on sequence capture (number of captured sequences and sequence quality) while accounting for taxonomy and extracted tissue type in a large-scale butterfly phylogenomics project. This project used two probe sets to extract 391 loci or a subset of 13 loci from over 6,000 butterfly specimens. We found that sequence capture is a resilient method capable of amplifying loci in samples of varying age (0–111 years), preservation method (alcohol, papered, pinned), and DNA concentration (0.020–316 ng/μl). Regression analyses demonstrate that sequence capture is positively correlated with DNA concentration. However, sequence capture and DNA concentration are negatively correlated with sample age and preservation method. Our findings suggest that sequence capture projects should prioritize the use of alcohol-preserved samples younger than 20 years old when available. In the absence of such specimens, dried samples of any age can yield sequence data, albeit with returns that diminish with increasing age.
2
Dickson ZW, Hackenberger D, Kuch M, Marzok A, Banerjee A, Rossi L, Klowak JA, Fox-Robichaud A, Mossmann K, Miller MS, Surette MG, Golding GB, Poinar H. Probe design for simultaneous, targeted capture of diverse metagenomic targets. Cell Reports Methods 2021; 1:100069. [PMID: 35474894 PMCID: PMC9017208 DOI: 10.1016/j.crmeth.2021.100069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 06/10/2021] [Accepted: 08/05/2021] [Indexed: 11/20/2022]
Abstract
The compounding challenges of low signal, high background, and uncertain targets plague many metagenomic sequencing efforts. One solution has been DNA capture, wherein probes are designed to hybridize with target sequences, enriching them in relation to their background. However, balancing probe depth with breadth of capture is challenging for diverse targets. To find this balance, we have developed the HUBDesign pipeline, which makes use of sequence homology to design probes at multiple taxonomic levels. This creates an efficient probe set capable of simultaneously and specifically capturing known and related sequences. We validated HUBDesign by generating probe sets targeting the breadth of coronavirus diversity, as well as a suite of bacterial pathogens often underlying sepsis. In separate experiments demonstrating significant, simultaneous enrichment, we captured SARS-CoV-2 and HCoV-NL63 in a human RNA background and seven bacterial strains in human blood. HUBDesign (https://github.com/zacherydickson/HUBDesign) has broad applicability wherever there are multiple organisms of interest.
Affiliation(s)
- Zachery W. Dickson
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Dirk Hackenberger
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- Melanie Kuch
- McMaster aDNA Center, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L9, Canada
- Art Marzok
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
- Arinjay Banerjee
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
- Vaccine and Infectious Disease Organization, Department of Veterinary Microbiology, University of Saskatchewan, Saskatoon, SK S7N 5E3, Canada
- Laura Rossi
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- Karen Mossmann
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
- Matthew S. Miller
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster Immunology Research Center, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. Surette
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- Department of Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
- Hendrik Poinar
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON L8S 4K1, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON L8S 4K1, Canada
- McMaster aDNA Center, Department of Anthropology, McMaster University, Hamilton, ON L8S 4L9, Canada
3
Andermann T, Torres Jiménez MF, Matos-Maraví P, Batista R, Blanco-Pastor JL, Gustafsson ALS, Kistler L, Liberal IM, Oxelman B, Bacon CD, Antonelli A. A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project. Front Genet 2020; 10:1407. [PMID: 32153629 PMCID: PMC7047930 DOI: 10.3389/fgene.2019.01407] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 10/02/2019] [Accepted: 12/24/2019] [Indexed: 12/17/2022]
Abstract
High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing effort on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing coverage. One common approach that enriches loci before sequencing is often referred to as target sequence capture. This technique has been shown to be applicable to phylogenetic studies of greatly varying evolutionary depth. Moreover, it has proven to produce powerful, large multi-locus DNA sequence datasets suitable for phylogenetic analyses. However, target capture requires careful considerations, which may greatly affect the success of experiments. Here we provide a simple flowchart for designing phylogenomic target capture experiments. We discuss necessary decisions from the identification of target loci to the final bioinformatic processing of sequence data. We outline challenges and solutions related to the taxonomic scope, sample quality, and available genomic resources of target capture projects. We hope this review will serve as a useful roadmap for designing and carrying out successful phylogenetic target capture studies.
Affiliation(s)
- Tobias Andermann
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Maria Fernanda Torres Jiménez
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Pável Matos-Maraví
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Institute of Entomology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czechia
- Romina Batista
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Programa de Pós-Graduação em Genética, Conservação e Biologia Evolutiva, PPG GCBEv–Instituto Nacional de Pesquisas da Amazônia—INPA Campus II, Manaus, Brazil
- Coordenação de Zoologia, Museu Paraense Emílio Goeldi, Belém, Brazil
- José L. Blanco-Pastor
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- INRAE, Centre Nouvelle-Aquitaine-Poitiers, Lusignan, France
- Logan Kistler
- Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States
- Isabel M. Liberal
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Bengt Oxelman
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Christine D. Bacon
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Alexandre Antonelli
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Royal Botanic Gardens, Kew, Richmond-Surrey, United Kingdom
4
Veeckman E, Van Glabeke S, Haegeman A, Muylle H, van Parijs FRD, Byrne SL, Asp T, Studer B, Rohde A, Roldán-Ruiz I, Vandepoele K, Ruttink T. Overcoming challenges in variant calling: exploring sequence diversity in candidate genes for plant development in perennial ryegrass (Lolium perenne). DNA Res 2019; 26:1-12. [PMID: 30325414 PMCID: PMC6379033 DOI: 10.1093/dnares/dsy033] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/26/2018] [Accepted: 09/06/2018] [Indexed: 11/13/2022]
Abstract
Revealing DNA sequence variation within the Lolium perenne genepool is important for genetic analysis and development of breeding applications. We reviewed current literature on plant development to select candidate genes in pathways that control agronomic traits, and identified 503 orthologues in L. perenne. Using targeted resequencing, we constructed a comprehensive catalogue of genomic variation for a L. perenne germplasm collection of 736 genotypes derived from current cultivars, breeding material and wild accessions. To overcome challenges of variant calling in heterogeneous outbreeding species, we used two complementary strategies to explore sequence diversity. First, four variant calling pipelines were integrated with the VariantMetaCaller to reach maximal sensitivity. Additional multiplex amplicon sequencing was used to empirically estimate an appropriate precision threshold. Second, a de novo assembly strategy was used to reconstruct divergent alleles for each gene. The advantage of this approach was illustrated by discovery of 28 novel alleles of LpSDUF247, a polymorphic gene co-segregating with the S-locus of the grass self-incompatibility system. Our approach is applicable to other genetically diverse outbreeding species. The resulting collection of functionally annotated variants can be mined for variants causing phenotypic variation, either through genetic association studies, or by selecting carriers of rare defective alleles for physiological analyses.
Affiliation(s)
- Elisabeth Veeckman
- ILVO, Plant Sciences Unit, B Melle, Belgium; Bioinformatics Institute Ghent, Ghent University, B Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B Ghent, Belgium
- Torben Asp
- Department of Molecular Biology and Genetics, Faculty of Science and Technology, Research Center Flakkebjerg Aarhus University, DK Slagelse, Denmark
- Bruno Studer
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, CH Zurich, Switzerland
- Isabel Roldán-Ruiz
- ILVO, Plant Sciences Unit, B Melle, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B Ghent, Belgium
- Klaas Vandepoele
- Bioinformatics Institute Ghent, Ghent University, B Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B Ghent, Belgium; Center for Plant Systems Biology, VIB, B Ghent, Belgium
- Tom Ruttink
- ILVO, Plant Sciences Unit, B Melle, Belgium; Bioinformatics Institute Ghent, Ghent University, B Ghent, Belgium
5
Parisot N, Peyretaillade E, Dugat-Bony E, Denonfoux J, Mahul A, Peyret P. Probe Design Strategies for Oligonucleotide Microarrays. Methods Mol Biol 2016; 1368:67-82. [PMID: 26614069 DOI: 10.1007/978-1-4939-3136-1_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Oligonucleotide microarrays have been widely used for gene detection and/or quantification of gene expression in various samples ranging from a single organism to a complex microbial assemblage. The success of a microarray experiment, however, strongly relies on the quality of designed probes. Consequently, probe design is of critical importance and therefore multiple parameters should be considered for each probe in order to ensure high specificity, sensitivity, and uniformity as well as potentially quantitative power. Moreover, to assess the complete gene repertoire of complex biological samples such as those studied in the field of microbial ecology, exploratory probe design strategies must be also implemented to target not-yet-described sequences. To design such probes, two algorithms, KASpOD and HiSpOD, have been developed and they are available via two user-friendly web services. Here, we describe the use of this software necessary for the design of highly effective probes especially in the context of microbial oligonucleotide microarrays by taking into account all the crucial parameters.
Affiliation(s)
- Nicolas Parisot
- Université d'Auvergne, EA 4678, CIDAM, Clermont Université, BP 10448, F-63000, Clermont-Ferrand, France
- Eric Peyretaillade
- Université d'Auvergne, EA 4678, CIDAM, Clermont Université, BP 10448, F-63000, Clermont-Ferrand, France
- Eric Dugat-Bony
- Génie et Microbiologie des Procédés Alimentaires, Centre de Biotechnologies Agro-Industrielles, INRA, AgroParisTech, UMR 782, Thiverval-Grignon, France
- Jérémie Denonfoux
- Genomic Platform and R&D, Genoscreen, Campus de l'Institut Pasteur, Lille, France
- Pierre Peyret
- Université d'Auvergne, EA 4678, CIDAM, Clermont Université, BP 10448, F-63000, Clermont-Ferrand, France.
6
Copy Number Variation in Chickens: A Review and Future Prospects. Microarrays 2014; 3:24-38. [PMID: 27605028 PMCID: PMC5003453 DOI: 10.3390/microarrays3010024] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/15/2013] [Revised: 01/22/2014] [Accepted: 01/23/2014] [Indexed: 12/19/2022]
Abstract
DNA sequence variations include nucleotide substitution, deletion, insertion, translocation and inversion. Deletion or insertion of a large DNA segment in the genome, referred to as copy number variation (CNV), has caught the attention of many researchers recently. It is believed that CNVs contribute significantly to genome variability, and thus contribute to phenotypic variability. In chickens, genome-wide surveys with array comparative genome hybridization (aCGH), SNP chip detection or whole genome sequencing have revealed a large number of CNVs. A large portion of chicken CNVs involves protein coding or regulatory sequences. A few CNVs have been demonstrated to be the determinant factors for single gene traits, such as late-feathering, pea-comb and dermal hyperpigmentation. The phenotypic effects of the majority of chicken CNVs are to be delineated.
7
Empirical assessment of competitive hybridization and noise in ultra high density canine tiling arrays. BMC Bioinformatics 2013; 14:231. [PMID: 23870167 PMCID: PMC3733988 DOI: 10.1186/1471-2105-14-231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2012] [Accepted: 07/15/2013] [Indexed: 11/25/2022]
Abstract
Background: In addition to probe sequence characteristics, noise in hybridization array data is thought to be influenced by competitive hybridization between probes tiled at high densities. Empirical evaluation of competitive hybridization and an estimation of what other non-sequence related features might affect noisy data is currently lacking. Results: A high density array was designed to a 1.5 megabase region of the canine genome to explore the potential for probe competition to introduce noise. Multivariate assessment of the influence of probe, segment and design characteristics on hybridization intensity demonstrates that whilst increased density significantly depresses fluorescence intensities, this effect is largely consistent when an ultra high density offset is applied. Signal variation not attributable to sequence composition resulted from the reduction in competition when large inter-probe spacing was introduced due to long repetitive elements and when a lower density offset was applied. Tiling of probes immediately adjacent to various classes of repeat elements did not generate noise. Comparison of identical probe sets hybridized with DNA extracted from blood or saliva establishes salivary DNA as a source of noise. Conclusions: This analysis demonstrates the occurrence of competitive hybridization between oligonucleotide probes in high density tiling arrays. It supports the view that probe competition does not generate random noise when it is maintained across a region. To prevent the introduction of noise from this source, the degree of competition should be regulated by minimizing variation in density across the target region. This finding can make an important contribution to optimizing coverage whilst minimizing sources of noise in the design of high density tiling arrays.
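As a rough illustration of the density point in this abstract, the sketch below tiles probes at a fixed offset and skips any probe that overlaps a masked interval, such as a long repetitive element. It is not the authors' design procedure: the function names, the half-open interval convention, and the example coordinates are all assumptions made for illustration. It only shows how masking repeats can introduce a large inter-probe gap, and therefore uneven density, in an otherwise regular tiling.

```python
def tile_probes(region_length, probe_length, offset, masked=()):
    """Return start positions of probes tiled every `offset` bases.

    Probes overlapping any half-open (start, end) interval in `masked`
    (hypothetical repeat coordinates) are skipped.
    """
    if probe_length <= 0 or offset <= 0:
        raise ValueError("probe_length and offset must be positive")
    starts = []
    for start in range(0, region_length - probe_length + 1, offset):
        end = start + probe_length
        # Standard half-open interval overlap test against each masked span.
        if any(start < m_end and end > m_start for m_start, m_end in masked):
            continue
        starts.append(start)
    return starts


def inter_probe_spacings(starts):
    """Distances between consecutive probe starts; uneven values mean uneven density."""
    return [b - a for a, b in zip(starts, starts[1:])]


# Illustrative coordinates only: a 200 bp region, 50 bp probes, 25 bp offset,
# and one masked repeat at positions 100-110.
starts = tile_probes(200, 50, 25, masked=[(100, 110)])
spacings = inter_probe_spacings(starts)
```

Inspecting `spacings` for outliers is one simple way to flag stretches where competition between neighbouring probes would differ from the rest of the region.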
8
Uitdewilligen JGAML, Wolters AMA, D’hoop BB, Borm TJA, Visser RGF, van Eck HJ. A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato. PLoS One 2013; 8:e62355. [PMID: 23667470 PMCID: PMC3648547 DOI: 10.1371/journal.pone.0062355] [Citation(s) in RCA: 238] [Impact Index Per Article: 21.6] [Received: 11/27/2012] [Accepted: 03/20/2013] [Indexed: 11/23/2022]
Abstract
Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome-wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed us to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1–3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 Gigabases of high-quality sequence data was obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60–80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with traits, and alleles strongly influencing maturity and flesh colour were identified.
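The "five possible alternative allele copy number states" in an autotetraploid are dosages 0 to 4 of the alternative allele, with expected read fractions of 0, 0.25, 0.5, 0.75 and 1. The following is a minimal sketch of how a dosage could be called from read counts with a simple binomial likelihood. It is not the genotype caller used in the study: the function name, the symmetric `error_rate` parameter and its default value are assumptions for illustration only.

```python
from math import comb, log

PLOIDY = 4  # autotetraploid: dosage of the alternative allele is 0..4


def call_dosage(alt_reads, total_reads, error_rate=0.01):
    """Return the most likely alt-allele copy number under a binomial model.

    `error_rate` is an assumed symmetric per-read miscall probability; it
    keeps the expected alt fraction strictly between 0 and 1.
    """
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads and total_reads > 0")
    best_dosage, best_ll = 0, float("-inf")
    for dosage in range(PLOIDY + 1):
        frac = dosage / PLOIDY
        # Expected fraction of alt reads after allowing for miscalls.
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        ll = (log(comb(total_reads, alt_reads))
              + alt_reads * log(p)
              + (total_reads - alt_reads) * log(1 - p))
        if ll > best_ll:
            best_dosage, best_ll = dosage, ll
    return best_dosage
```

Because neighbouring states differ by only 0.25 in expected read fraction, shallow coverage leaves them poorly separated under a model like this, which is consistent with the abstract's point that substantial read depth is needed for reliable copy number assessment.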
Affiliation(s)
- Jan G. A. M. L. Uitdewilligen
- Laboratory of Plant Breeding, Wageningen University, Wageningen, The Netherlands
- The Graduate School for Experimental Plant Sciences, Wageningen, The Netherlands
- Anne-Marie A. Wolters
- Laboratory of Plant Breeding, Wageningen University, Wageningen, The Netherlands
- The Graduate School for Experimental Plant Sciences, Wageningen, The Netherlands
- Bjorn B. D’hoop
- Laboratory of Plant Breeding, Wageningen University, Wageningen, The Netherlands
- Theo J. A. Borm
- Laboratory of Plant Breeding, Wageningen University, Wageningen, The Netherlands
- The Graduate School for Experimental Plant Sciences, Wageningen, The Netherlands
- Richard G. F. Visser
- Laboratory of Plant Breeding, Wageningen University, Wageningen, The Netherlands
- The Graduate School for Experimental Plant Sciences, Wageningen, The Netherlands
- Centre for BioSystems Genomics, Wageningen, The Netherlands
- Herman J. van Eck
- Laboratory of Plant Breeding, Wageningen University, Wageningen, The Netherlands
- The Graduate School for Experimental Plant Sciences, Wageningen, The Netherlands
- Centre for BioSystems Genomics, Wageningen, The Netherlands
9
Ward M, Wilson M, Barbosa-Morais N, Schmidt D, Stark R, Pan Q, Schwalie P, Menon S, Lukk M, Watt S, Thybert D, Kutter C, Kirschner K, Flicek P, Blencowe B, Odom D. Latent regulatory potential of human-specific repetitive elements. Mol Cell 2013; 49:262-72. [PMID: 23246434 PMCID: PMC3560060 DOI: 10.1016/j.molcel.2012.11.013] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 05/30/2012] [Revised: 09/28/2012] [Accepted: 11/09/2012] [Indexed: 12/26/2022]
Abstract
At least half of the human genome is derived from repetitive elements, which are often lineage specific and silenced by a variety of genetic and epigenetic mechanisms. Using a transchromosomic mouse strain that transmits an almost complete single copy of human chromosome 21 via the female germline, we show that a heterologous regulatory environment can transcriptionally activate transposon-derived human regulatory regions. In the mouse nucleus, hundreds of locations on human chromosome 21 newly associate with activating histone modifications in both somatic and germline tissues, and influence the gene expression of nearby transcripts. These regions are enriched with primate and human lineage-specific transposable elements, and their activation corresponds to changes in DNA methylation at CpG dinucleotides. This study reveals the latent regulatory potential of the repetitive human genome and illustrates the species specificity of mechanisms that control it.
Affiliation(s)
- Michelle C. Ward
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Michael D. Wilson
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Nuno L. Barbosa-Morais
- Banting and Best Department of Medical Research and Department of Molecular Genetics, Donnelly Centre, Toronto, ON M5S 3E1, Canada
- Dominic Schmidt
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Rory Stark
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Qun Pan
- Banting and Best Department of Medical Research and Department of Molecular Genetics, Donnelly Centre, Toronto, ON M5S 3E1, Canada
- Petra C. Schwalie
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Suraj Menon
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Margus Lukk
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Stephen Watt
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- David Thybert
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Claudia Kutter
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Kristina Kirschner
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Paul Flicek
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
- Benjamin J. Blencowe
- Banting and Best Department of Medical Research and Department of Molecular Genetics, Donnelly Centre, Toronto, ON M5S 3E1, Canada
- Duncan T. Odom
- University of Cambridge, Cancer Research UK-Cambridge Institute, Robinson Way, Cambridge CB2 0RE, UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
10
Coman D, Gruissem W, Hennig L. Transcript profiling in Arabidopsis with genome tiling microarrays. Methods Mol Biol 2013; 1067:35-49. [PMID: 23975784 DOI: 10.1007/978-1-62703-607-8_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Microarray technology is at present a standardized workflow for genome-wide expression analysis. Whole-genome tiling microarrays have emerged as an important platform for flexible and comprehensive expression profiling. In this chapter we describe a detailed standardized workflow for experiments assessing the transcriptome of Arabidopsis using tiling arrays and provide useful hints for critical steps from experimental design to data analysis. Although the protocol is optimized for AGRONOMICS1 arrays, it can readily be adapted to other tiling arrays. AGRONOMICS1 is the first platform that enables strand-specific expression analysis of the Arabidopsis genome with a single array. Moreover, it includes all perfect match probes from the original ATH1 array, allowing ready integration with the large existing ATH1 knowledge base. This workflow is designed for the analysis of raw data for any number of samples and it does not pose any particular hardware requirements.
Affiliation(s)
- Diana Coman
- Plant Biotechnology, Department of Biology, ETH Zurich, Zurich, Switzerland
11
Lemetre C, Zhang ZD. A brief introduction to tiling microarrays: principles, concepts, and applications. Methods Mol Biol 2013; 1067:3-19. [PMID: 23975782 DOI: 10.1007/978-1-62703-607-8_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
Technological achievements have always contributed to the advancement of biomedical research. It has never been more so than in recent times, when the development and application of innovative cutting-edge technologies have transformed biology into a data-rich quantitative science. This stunning revolution in biology primarily ensued from the emergence of microarrays over two decades ago. The completion of whole-genome sequencing projects and the advance in microarray manufacturing technologies enabled the development of tiling microarrays, which gave unprecedented genomic coverage. Since their first description, several types of application of tiling arrays have emerged, each aiming to tackle a different biological problem. Although numerous algorithms have already been developed to analyze microarray data, new method development is still needed not only for better performance but also for integration of available microarray data sets, which without doubt constitute one of the largest collections of biological data ever generated. In this chapter we first introduce the principles behind the emergence and the development of tiling microarrays, and then discuss with some examples how they are used to investigate different biological problems.
Affiliation(s)
- Christophe Lemetre
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
12
Du Y, Murani E, Ponsuksili S, Wimmers K. Flexible and efficient genome tiling design with penalized uniqueness score. BMC Bioinformatics 2012; 13:323. [PMID: 23216884 PMCID: PMC3583072 DOI: 10.1186/1471-2105-13-323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/13/2012] [Accepted: 10/26/2012] [Indexed: 11/24/2022]
Abstract
Background: As a powerful tool in whole genome analysis, tiling arrays have been widely used to answer many genomic questions. They can now also serve as capture devices for library preparation in high-throughput sequencing experiments. Thus, a flexible and efficient tiling array design approach is still needed and could assist in various types and scales of transcriptomic experiment. Results: In this paper, we address issues and challenges in designing probes suitable for tiling array applications and targeted sequencing. In particular, we define the penalized uniqueness score, which serves as a controlling criterion to eliminate potential cross-hybridization, and a flexible tiling array design pipeline. Unlike BLAST or simple suffix array based methods, computing and using our uniqueness measurement can be more efficient for large scale design and require less memory. The parameters provided could assist in various types of genomic tiling task. In addition, using both commercial array data and experiment data we show, contrary to previous claims, that palindromic sequences exhibit relatively lower uniqueness. Conclusions: Our proposed penalized uniqueness score could serve as a better indicator for cross hybridization with higher sensitivity and specificity, giving more control of expected array quality. The flexible tiling design algorithm incorporating the penalized uniqueness score was shown to give higher coverage and resolution. The package to calculate the penalized uniqueness score and the described probe selection algorithm are implemented as a Perl program, which is freely available at http://www1.fbn-dummerstorf.de/en/forschung/fbs/fb3/paper/2012-yang-1/OTAD.v1.1.tar.gz.
Affiliation(s)
- Yang Du
- Research Unit Molecular Biology, Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany
13
Abstract
Rapid advances in next-generation sequencing technology are revolutionizing approaches to genomic and epigenomic studies of skin. Deep sequencing of cutaneous malignancies reveals heavily mutagenized genomes with large numbers of low-prevalence mutations and multiple resistance mechanisms to targeted therapies. Next-generation sequencing approaches have already paid rich dividends in identifying the genetic causes of dermatologic disease, both in heritable mutations and the somatic aberrations that underlie cutaneous mosaicism. Although epigenetic alterations clearly influence tumorigenesis, pluripotent stem cell biology, and epidermal cell lineage decisions, labor and cost-intensive approaches long delayed a genome-scale perspective. New insights into epigenomic mechanisms in skin disease should arise from the accelerating assessment of histone modification, DNA methylation, and related gene expression signatures.
Affiliation(s)
- Jeffrey B Cheng
- Department of Dermatology, University of California, San Francisco, San Francisco, California 94143, USA
14
Hafemeister C, Krause R, Schliep A. Selecting oligonucleotide probes for whole-genome tiling arrays with a cross-hybridization potential. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1642-1652. [PMID: 21358006 DOI: 10.1109/tcbb.2011.39] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Popular current methods for designing oligonucleotide tiling arrays still rely on simple criteria like Hamming distance or longest common factors, neglecting base stacking effects which strongly contribute to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method using hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter using weighted ungapped q-grams to arrive at seeds. The computation of the CHP is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and quality of probes selected. We obtain very favorable results in comparison with prior approaches with respect to specificity and sensitivity for cross-hybridization and genome coverage with high-specificity probes. The combination of OSProbes and our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays, balancing probe distance, hybridization conditions, and uniqueness of hybridization.
Affiliation(s)
- Christoph Hafemeister
- Department of Biology, New York University, 100 Washington Square East, Rm 1009, New York, NY 10003-6688, USA.
15
Weinhouse C, Anderson OS, Jones TR, Kim J, Liberman SA, Nahar MS, Rozek LS, Jirtle RL, Dolinoy DC. An expression microarray approach for the identification of metastable epialleles in the mouse genome. Epigenetics 2011; 6:1105-13. [PMID: 21829099 DOI: 10.4161/epi.6.9.17103] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023] Open
Abstract
Genetic loci displaying environmentally responsive epigenetic marks, termed metastable epialleles, offer a solution to the paradox presented by genetically identical yet phenotypically distinct individuals. The murine viable yellow agouti (A(vy)) metastable epiallele exhibits stochastic DNA methylation and histone modifications associated with coat color variation in isogenic individuals. The distribution of A(vy) variable expressivity shifts following maternal nutritional and environmental exposures. To characterize additional murine metastable epialleles, we utilized genome-wide expression arrays (N = 10 male individuals, 3 tissues per individual) and identified candidates displaying large variability in gene expression among individuals (Vi = inter-individual variance), concomitant with a low variability in gene expression across tissues from the three germ layers (Vt = inter-tissue variance), two features characteristic of the A(vy) metastable epiallele. The CpG island in the promoter of Dnajb1 and two contraoriented ERV class II repeats in Glcci1 were validated to display underlying stochasticity in methylation patterns common to metastable epialleles. Furthermore, liver DNA methylation in mice exposed in utero to 50 mg bisphenol A (BPA)/kg diet (N = 91) or a control diet (N = 79) confirmed environmental lability at validated candidate genes. Significant effects of exposure on mean CpG methylation were observed at the Glcci1 Repeat 1 locus (p < 0.0001). Significant effects of BPA also were observed at the first and fifth CpG sites studied in Glcci1 Repeat 2 (p < 0.0001 and p = 0.004, respectively). BPA did not affect methylation in the promoter of Dnajb1 (p = 0.59). The characterization of metastable epialleles in humans is crucial for the development of novel screening and therapeutic targets for human disease prevention.
Affiliation(s)
- Caren Weinhouse
- Department of Environmental Health Sciences; University of Michigan, Ann Arbor, MI, USA
16
Dufour YS, Wesenberg GE, Tritt AJ, Glasner JD, Perna NT, Mitchell JC, Donohue TJ. chipD: a web tool to design oligonucleotide probes for high-density tiling arrays. Nucleic Acids Res 2010; 38:W321-5. [PMID: 20529880 PMCID: PMC2896189 DOI: 10.1093/nar/gkq517] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022] Open
Abstract
chipD is a web server that facilitates design of DNA oligonucleotide probes for high-density tiling arrays, which can be used in a number of genomic applications such as ChIP-chip or gene-expression profiling. The server implements a probe selection algorithm that takes as input, in addition to the target sequences, a set of parameters that allow probe design to be tailored to specific applications, protocols or the array manufacturer's requirements. The algorithm optimizes probes to meet three objectives: (i) probes should be specific; (ii) probes should have similar thermodynamic properties; and (iii) the target sequence coverage should be homogeneous and avoid significant gaps. The output provides, in text format, the list of probe sequences with their genomic locations, targeted strands and hybridization characteristics. chipD has been used successfully to design tiling arrays for bacteria and yeast. chipD is available at http://chipd.uwbacter.org/.
Affiliation(s)
- Yann S Dufour
- Department of Bacteriology, University of Wisconsin, Madison, WI 53706, USA.
17
Mulle JG, Patel VC, Warren ST, Hegde MR, Cutler DJ, Zwick ME. Empirical evaluation of oligonucleotide probe selection for DNA microarrays. PLoS One 2010; 5:e9921. [PMID: 20360966 PMCID: PMC2847945 DOI: 10.1371/journal.pone.0009921] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 06/03/2009] [Accepted: 12/04/2009] [Indexed: 12/04/2022] Open
Abstract
DNA-based microarrays are increasingly central to biomedical research. Selecting oligonucleotide sequences that will behave consistently across experiments is essential to the design, production and performance of DNA microarrays. Here our aim was to improve on probe design parameters by empirically and systematically evaluating probe performance in a multivariate context. We used experimental data from 19 array CGH hybridizations to assess the probe performance of 385,474 probes tiled in the Duchenne muscular dystrophy (DMD) region of the X chromosome. Our results demonstrate that probe melting temperature, single nucleotide polymorphisms (SNPs), and homocytosine motifs all have a strong effect on probe behavior. These findings, when incorporated into future microarray probe selection algorithms, may improve microarray performance for a wide variety of applications.
Affiliation(s)
- Jennifer G Mulle
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America.
18
Jourdren L, Duclos A, Brion C, Portnoy T, Mathis H, Margeot A, Le Crom S. Teolenn: an efficient and customizable workflow to design high-quality probes for microarray experiments. Nucleic Acids Res 2010; 38:e117. [PMID: 20176570 PMCID: PMC2879536 DOI: 10.1093/nar/gkq110] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
Despite the development of new high-throughput sequencing techniques, microarrays are still attractive tools to study small genome organisms, thanks to sample multiplexing and high-feature densities. However, the oligonucleotide design remains a delicate step for most users. A vast array of software is available to deal with this problem, but each program is developed with its own strategy, which makes the choice of the best solution difficult. Here we describe Teolenn, a universal probe design workflow developed with a flexible and customizable module organization allowing fixed or variable length oligonucleotide generation. In addition, our software is able to supply quality scores for each of the designed probes. In order to assess the relevance of these scores, we performed a real hybridization using a tiling array designed against the Trichoderma reesei fungus genome. We show that our scoring pipeline correlates with signal quality for 97.2% of all the designed probes, allowing for a posteriori comparisons between quality scores and signal intensities. This result is useful in discarding any bad scoring probes during the design step in order to get high-quality microarrays. Teolenn is available at http://transcriptome.ens.fr/teolenn/.
Affiliation(s)
- Laurent Jourdren
- Institut de Biologie de l'Ecole Normale Supérieure, Institut National de la Santé et de la Recherche Médicale U1024, Centre National de la Recherche Scientifique UMR8197, 75005 Paris, France
19
Høvik H, Chen T. Dynamic probe selection for studying microbial transcriptome with high-density genomic tiling microarrays. BMC Bioinformatics 2010; 11:82. [PMID: 20144223 PMCID: PMC2836303 DOI: 10.1186/1471-2105-11-82] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/15/2009] [Accepted: 02/09/2010] [Indexed: 12/27/2022] Open
Abstract
Background Current commercial high-density oligonucleotide microarrays can hold millions of probe spots on a single microscopic glass slide and are ideal for studying the transcriptome of microbial genomes using a tiling probe design. This paper describes a comprehensive computational pipeline implemented specifically for designing tiling probe sets to study microbial transcriptome profiles. Results The pipeline identifies every possible probe sequence from both forward and reverse-complement strands of all DNA sequences in the target genome, including circular or linear chromosomes and plasmids. Final probe sequence lengths are adjusted based on the maximal oligonucleotide synthesis cycles and best isothermality allowed. Optimal probes are then selected in two stages: sequential and gap-filling. In the sequential stage, probes are selected from sequence windows tiled along the genome. In the gap-filling stage, additional probes are selected from the largest gaps between adjacent probes that have already been selected, until a predefined number of probes is reached. Selection of the highest quality probe within each window and gap is based on five criteria: sequence uniqueness, probe self-annealing, melting temperature, oligonucleotide length, and probe position. Conclusions The probe selection pipeline evaluates global and local probe sequence properties and selects a set of probes dynamically and evenly distributed along the target genome. Unlike other similar methods, an exact number of non-redundant probes can be designed to utilize all the available probe spots on any chosen microarray platform. The pipeline can be applied to microbial genomes when designing high-density tiling arrays for comparative genomics, ChIP-chip, gene expression and comprehensive transcriptome studies.
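The two-stage selection described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each candidate probe is reduced to a hypothetical (position, score) pair, where the single score stands in for the five quality criteria, and the names `select_probes`, `window`, and `target` are invented for the example.

```python
def select_probes(candidates, genome_len, window, target):
    """Pick probes in a sequential stage, then a gap-filling stage.

    candidates: list of (position, score) pairs; a higher score means a
    better probe (a hypothetical stand-in for the five criteria).
    """
    chosen = {}
    # Sequential stage: keep the best-scoring candidate in each fixed window.
    for start in range(0, genome_len, window):
        in_window = [c for c in candidates if start <= c[0] < start + window]
        if in_window:
            best = max(in_window, key=lambda c: c[1])
            chosen[best[0]] = best
    # Gap-filling stage: fill the largest gap that still holds a candidate,
    # until the requested number of probes is reached.
    while len(chosen) < target:
        pos = sorted(chosen)
        gaps = sorted(((b - a, a, b) for a, b in zip(pos, pos[1:])), reverse=True)
        for _, a, b in gaps:
            inside = [c for c in candidates if a < c[0] < b]
            if inside:
                best = max(inside, key=lambda c: c[1])
                chosen[best[0]] = best
                break
        else:
            break  # no remaining gap contains an unused candidate
    return [chosen[p] for p in sorted(chosen)]


cands = [(5, 0.9), (8, 0.5), (40, 0.7), (55, 0.8), (95, 0.6)]
picked = select_probes(cands, genome_len=100, window=50, target=3)
```

In this toy run the sequential stage picks one probe per 50-base window, and the gap-filling stage then adds the best candidate inside the largest remaining gap.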
Affiliation(s)
- Hedda Høvik
- Department of Oral Biology, Faculty of Dentistry, University of Oslo, Oslo, Norway
20
Abstract
Unique substrings in genomes may indicate a high level of specificity, which is crucial and fundamental to many genetics studies, such as PCR, microarray hybridization, Southern and Northern blotting, RNA interference (RNAi), and genome (re)sequencing. However, being a unique sequence in the genome is not by itself adequate to guarantee high specificity. For example, nucleotide mismatches within a certain tolerance may impair specificity even if a substring of interest occurs only once in the genome. In this study we propose the concept of unique-m substrings of genomes for controlling specificity in genome-wide assays. A substring is defined as unique-m if it has only a single perfect match on one strand of the entire genome while all other approximate matches have more than m mismatches. We developed a pattern growth approach to systematically mine such unique-m substrings from a given genome. Our algorithm does not need a pre-processing step to extract sequential information, which is required by most rival methods. The search for unique-m substrings from genomes is performed as a single task of regular data mining so that the similarities among queries are utilized to achieve tremendous speedup. The runtime of our algorithm is linear in the sizes of input genomes and the length of unique-m substrings. In addition, the unique-m mining algorithm has been parallelized to facilitate genome-wide computation on a cluster or a single machine of multiple CPUs with shared memory.
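The unique-m definition can be made concrete with a brute-force check. The sketch below is only an illustration of the definition, not the pattern growth algorithm the abstract describes; it scans every window on both strands, and the function names are assumptions made for the example.

```python
def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_unique_m(genome, start, k, m):
    """True if genome[start:start+k] has exactly one site within m mismatches.

    That single site is the substring's own perfect match; any other window
    on either strand must differ by more than m mismatches.
    """
    query = genome[start:start + k]
    hits = 0
    for strand in (genome, revcomp(genome)):
        for i in range(len(strand) - k + 1):
            if hamming(query, strand[i:i + k]) <= m:
                hits += 1
    return hits == 1


genome = "AAAAACCCCC"
strict = is_unique_m(genome, 0, 5, 0)    # only the exact site matches
tolerant = is_unique_m(genome, 0, 5, 1)  # "AAAAC" is one mismatch away
```

Here the same substring is unique when no mismatches are tolerated but stops being unique-1, because the neighbouring window differs by a single base.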
Affiliation(s)
- Kai Ye
- Molecular Epidemiology section, Medical Statistics and Bioinformatics, Leiden University Medical Center, The Netherlands
- Zhenyu Jia
- Department of Pathology & Laboratory Medicine, University of California, Irvine, CA 92697, USA
- Yipeng Wang
- Department of Pathology & Laboratory Medicine, University of California, Irvine, CA 92697, USA; Vaccine Research Institute of San Diego, San Diego, CA 92121, USA
- Paul Flicek
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Rolf Apweiler
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
21
Phillippy AM, Deng X, Zhang W, Salzberg SL. Efficient oligonucleotide probe selection for pan-genomic tiling arrays. BMC Bioinformatics 2009; 10:293. [PMID: 19758451 PMCID: PMC2753849 DOI: 10.1186/1471-2105-10-293] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 03/30/2009] [Accepted: 09/16/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. RESULTS This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. CONCLUSION PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on a single microarray chip. These unique pan-genome tiling arrays provide maximum flexibility for the analysis of both known and uncharacterized strains.
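The idea of selecting conserved probes first and adding strain-specific probes only where needed resembles a greedy covering heuristic. The sketch below uses a generic greedy set cover as a stand-in; it is not the PanArray algorithm itself, and the probe names, the (strain, segment) targets, and the function `greedy_probe_cover` are hypothetical.

```python
def greedy_probe_cover(probe_hits):
    """Greedy cover: repeatedly take the probe covering the most open targets.

    probe_hits: dict mapping a probe name to the set of (strain, segment)
    targets it covers (a hypothetical simplification of tiling coverage).
    """
    uncovered = set().union(*probe_hits.values())
    chosen = []
    while uncovered:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(probe_hits), key=lambda p: len(probe_hits[p] & uncovered))
        chosen.append(best)
        uncovered -= probe_hits[best]
    return chosen


hits = {
    "p_shared": {("s1", 0), ("s2", 0)},  # conserved in both strains
    "p_s1": {("s1", 1)},                 # polymorphic region, strain 1 only
    "p_s2": {("s2", 1)},                 # polymorphic region, strain 2 only
}
order = greedy_probe_cover(hits)
```

The probe shared by both strains is picked first because it covers the most targets, and the strain-specific probes follow only to span what remains uncovered.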
Affiliation(s)
- Adam M Phillippy
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
22
Sasidharan R, Agarwal A, Rozowsky J, Gerstein M. An approach to comparing tiling array and high throughput sequencing technologies for genomic transcript mapping. BMC Res Notes 2009; 2:150. [PMID: 19630981 PMCID: PMC2764720 DOI: 10.1186/1756-0500-2-150] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/31/2009] [Accepted: 07/24/2009] [Indexed: 11/24/2022] Open
Abstract
Background There are two main technologies for transcriptome profiling, namely, tiling microarrays and high-throughput sequencing. Recently there has been a tremendous amount of excitement about the latter because of the advent of next-generation sequencing technologies and its promises. Consequently, the question of the moment is how these two technologies compare. Here we attempt to develop an approach to do a fair comparison of transcripts identified from tiling microarray and MPSS sequencing data. Findings This comparison is a challenging task because the sequencing data is discrete while the tiling array data is continuous. We use the published rice and Arabidopsis datasets which provide currently best matched sets of arrays and sequencing experiments using a slightly earlier generation of sequencing, the MPSS tag sequencing technology. After scoring the arrays consistently in both the organisms, a first pass comparison reveals a surprisingly small overlap in transcripts of 22% and 66% respectively, in rice and Arabidopsis. However, when we do the analysis in detail, we find that this is an underestimate. In particular, when we map the probe intensities onto the sequencing tags and then look at their intensity distribution, we see that they are very similar to exons. Furthermore, restricting our comparison to only protein-coding gene loci revealed a very good overlap between the two technologies. Conclusion Our approach to compare genome tiling microarray and MPSS sequencing data suggests that there is actually a reasonable overlap in transcripts identified by the two technologies. This overlap is distorted by the scoring and thresholding in the tiling array scoring procedure.
Affiliation(s)
- Rajkumar Sasidharan
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA.
23
Tang H, Therneau TM. Statistical metrics for quality assessment of high-density tiling array data. Biometrics 2009; 66:630-5. [PMID: 19645697 DOI: 10.1111/j.1541-0420.2009.01298.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
High-density tiling arrays are designed to blanket an entire genomic region of interest using tiled oligonucleotides at very high resolution and are widely used in various biological applications. Experiments are usually conducted in multiple stages, in which unwanted technical variations may be introduced. As tiling arrays become more popular and are adopted by many research labs, it is pressing to develop quality control tools as was done for expression microarrays. We propose a set of statistical quality metrics analogous to those in expression microarrays with application to tiling array data. We also develop a method to estimate the significance level of an observed quality measurement using randomization tests. These methods have been applied to multiple real data sets, including three independent ChIP-chip experiments and one transcriptome mapping study, and they have successfully identified good quality chips as well as outliers in each study.
Affiliation(s)
- Hui Tang
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota 55905, USA.
24
Mita H, Toyota M, Aoki F, Akashi H, Maruyama R, Sasaki Y, Suzuki H, Idogawa M, Kashima L, Yanagihara K, Fujita M, Hosokawa M, Kusano M, Sabau SV, Tatsumi H, Imai K, Shinomura Y, Tokino T. A novel method, digital genome scanning detects KRAS gene amplification in gastric cancers: involvement of overexpressed wild-type KRAS in downstream signaling and cancer cell growth. BMC Cancer 2009; 9:198. [PMID: 19545448 PMCID: PMC2717977 DOI: 10.1186/1471-2407-9-198] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 07/15/2008] [Accepted: 06/23/2009] [Indexed: 01/02/2023] Open
Abstract
Background Gastric cancer is the third most common malignancy affecting the general population worldwide. Aberrant activation of KRAS is a key factor in the development of many types of tumor; however, oncogenic mutations of KRAS are infrequent in gastric cancer. We have developed a novel quantitative method of analysis of DNA copy number, termed digital genome scanning (DGS), which is based on the enumeration of short restriction fragments, and does not involve PCR or hybridization. In the current study, we used DGS to survey copy-number alterations in gastric cancer cells. Methods DGS of gastric cancer cell lines was performed using the sequences of 5000 to 15000 restriction fragments. We screened 20 gastric cancer cell lines and 86 primary gastric tumors for KRAS amplification by quantitative PCR, and investigated KRAS amplification at the DNA, mRNA and protein levels by mutational analysis, real-time PCR, immunoblot analysis, GTP-RAS pull-down assay and immunohistochemical analysis. The effects of KRAS knock-down on the activation of p44/42 MAP kinase and AKT and on cell growth were examined by immunoblot and colorimetric assay, respectively. Results DGS analysis of the HSC45 gastric cancer cell line revealed the amplification of a 500-kb region on chromosome 12p12.1, which contains the KRAS gene locus. Amplification of the KRAS locus was detected in 15% (3/20) of gastric cancer cell lines (8–18-fold amplification) and 4.7% (4/86) of primary gastric tumors (8–50-fold amplification). KRAS mutations were identified in two of the three cell lines in which KRAS was amplified, but were not detected in any of the primary tumors. Overexpression of KRAS protein correlated directly with increased KRAS copy number. The level of GTP-bound KRAS was elevated following serum stimulation in cells with amplified wild-type KRAS, but not in cells with amplified mutant KRAS. Knock-down of KRAS in gastric cancer cells that carried amplified wild-type KRAS resulted in the inhibition of cell growth and suppression of p44/42 MAP kinase and AKT activity. Conclusion Our study highlights the utility of DGS for identification of copy-number alterations. Using DGS, we identified KRAS as a gene that is amplified in human gastric cancer. We demonstrated that gene amplification likely forms the molecular basis of overactivation of KRAS in gastric cancer. Additional studies using a larger cohort of gastric cancer specimens are required to determine the diagnostic and therapeutic implications of KRAS amplification and overexpression.
Affiliation(s)
- Hiroaki Mita
- Department of Molecular Biology, Cancer Research Institute, Sapporo Medical University, Sapporo, Japan.
25
Thomassen GOS, Rowe AD, Lagesen K, Lindvall JM, Rognes T. Custom design and analysis of high-density oligonucleotide bacterial tiling microarrays. PLoS One 2009; 4:e5943. [PMID: 19536279 PMCID: PMC2691959 DOI: 10.1371/journal.pone.0005943] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 02/12/2009] [Accepted: 05/18/2009] [Indexed: 11/21/2022] Open
Abstract
Background High-density tiling microarrays are a powerful tool for the characterization of complete genomes. The two major computational challenges associated with custom-made arrays are design and analysis. Firstly, several genome dependent variables, such as the genome's complexity and sequence composition, need to be considered in the design to ensure a high quality microarray. Secondly, since tiling projects today very often exceed the limits of conventional array-experiments, researchers cannot use established computer tools designed for commercial arrays, and instead have to redesign previous methods or create novel tools. Principal Findings Here we describe the multiple aspects involved in the design of tiling arrays for transcriptome analysis and detail the normalisation and analysis procedures for such microarrays. We introduce a novel design method to make two 280,000 feature microarrays covering the entire genome of the bacterial species Escherichia coli and Neisseria meningitidis, respectively, as well as the use of multiple copies of control probe-sets on tiling microarrays. Furthermore, a novel normalisation and background estimation procedure for tiling arrays is presented along with a method for array analysis focused on detection of short transcripts. The design, normalisation and analysis methods have been applied in various experiments and several of the detected novel short transcripts have been biologically confirmed by Northern blot tests. Conclusions Tiling-arrays are becoming increasingly applicable in genomic research, but researchers still lack both the tools for custom design of arrays, as well as the systems and procedures for analysis of the vast amount of data resulting from such experiments. We believe that the methods described herein will be a useful contribution and resource for researchers designing and analysing custom tiling arrays for both bacteria and higher organisms.
Affiliation(s)
- Gard O. S. Thomassen
- Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, University of Oslo, Oslo, Norway
- Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Alexander D. Rowe
- Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Karin Lagesen
- Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Torbjørn Rognes
- Centre for Molecular Biology and Neuroscience (CMBN), Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
26
Lemoine S, Combes F, Le Crom S. An evaluation of custom microarray applications: the oligonucleotide design challenge. Nucleic Acids Res 2009; 37:1726-39. [PMID: 19208645 PMCID: PMC2665234 DOI: 10.1093/nar/gkp053] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022] Open
Abstract
The increase in feature resolution and the availability of multipack formats from microarray providers has opened the way to various custom genomic applications. However, oligonucleotide design and selection remains a bottleneck of the microarray workflow. Several tools are available to perform this work, and choosing the best one is not an easy task, nor are the choices obvious. Here we review the oligonucleotide design field to help users make their choice. We have first performed a comparative evaluation of the available solutions based on a set of criteria including: ease of installation, user-friendly access, the number of parameters and settings available. In a second step, we chose to submit two real cases to a selection of programs. Finally, we used a set of tests for the in silico benchmark of the oligo sets obtained from each type of software. We show that the design software must be selected according to the goal of the scientist, depending on factors such as the organism used, the number of probes required and their localization on the target sequence. The present work provides keys to the choice of the most relevant software, according to the various parameters we tested.
Affiliation(s)
- Sophie Lemoine
- INSERM, CNRS, IFR36, Plate-forme Transcriptome, Paris, France
27
Lorenzi H, Thiagarajan M, Haas B, Wortman J, Hall N, Caler E. Genome wide survey, discovery and evolution of repetitive elements in three Entamoeba species. BMC Genomics 2008; 9:595. [PMID: 19077187 PMCID: PMC2657916 DOI: 10.1186/1471-2164-9-595] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 08/22/2008] [Accepted: 12/10/2008] [Indexed: 11/14/2022] Open
Abstract
Background Identification and mapping of repetitive elements is a key step for accurate gene prediction and overall structural annotation of genomes. During the assembly and annotation of three highly repetitive amoeba genomes, Entamoeba histolytica, Entamoeba dispar, and Entamoeba invadens, we performed comparative sequence analysis to identify and map all class I and class II transposable elements in their sequences. Results Here, we report the identification of two novel Entamoeba-specific repeats: ERE1 and ERE2; ERE1 is spread across the three genomes and associated with different repeats in a species-specific manner, while ERE2 is unique to E. histolytica. We also report the identification of two novel subfamilies of LINE and SINE retrotransposons in E. dispar and provide evidence for how the different LINE and SINE subfamilies evolved in these species. Additionally, we found a putative transposase-coding gene in E. histolytica and E. dispar related to the mariner transposon Hydargos from E. invadens. The distribution of transposable elements in these genomes is markedly skewed with a tendency of forming clusters. More than 70% of the three genomes have a repeat density below their corresponding average value indicating that transposable elements are not evenly distributed. We show that repeats and repeat-clusters are found at syntenic break points between E. histolytica and E. dispar and hence, could work as recombination hot spots promoting genome rearrangements. Conclusion The mapping of all transposable elements found in these parasites shows that repeat coverage is up to three times higher than previously reported. LINE, ERE1 and mariner elements were present in the common ancestor to the three Entamoeba species while ERE2 was likely acquired by E. histolytica after its separation from E. dispar. We demonstrate that E. histolytica and E. dispar share their entire repertoire of LINE and SINE retrotransposons and that Eh_SINE3/Ed_SINE1 originated as a chimeric SINE from Eh/Ed_SINE2 and Eh_SINE1/Ed_SINE3. Our work shows that transposable elements are organized in clusters, frequently found at syntenic break points providing insights into their contribution to chromosome instability and therefore, to genomic variation and speciation in these parasites.
Affiliation(s)
- Hernan Lorenzi
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
28
Schliep A, Krause R. Efficient algorithms for the computational design of optimal tiling arrays. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:557-567. [PMID: 18989043 DOI: 10.1109/tcbb.2008.50] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
The representation of a genome by oligonucleotide probes is a prerequisite for the analysis of many of its basic properties, such as transcription factor binding sites, chromosomal breakpoints, gene expression of known genes and detection of novel genes, in particular those coding for small RNAs. An ideal representation would consist of a high-density set of oligonucleotides with similar melting temperatures that do not cross-hybridize with other regions of the genome and are equidistantly spaced. The implementation of such a design is typically called a tiling array or genome array. We formulate the minimal cost tiling path problem for the selection of oligonucleotides from a set of candidates. Computing the selection of probes requires multi-criterion optimization, which we cast into a shortest path problem. Standard algorithms running in linear time allow us to compute globally optimal tiling paths from millions of candidate oligonucleotides on a standard desktop computer for most problem variants. The solutions to this multi-criterion optimization are spatially adaptive to the problem instance. Our formulation easily incorporates experimental constraints with respect to specific regions of interest and trade-offs between hybridization parameters, probe quality and tiling density. A web application is available at http://tileomatic.org.
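The shortest-path framing in this abstract can be illustrated with a small dynamic program. This is only a sketch under assumed inputs, not the authors' algorithm: the function name `min_cost_tiling_path`, the `(position, quality_cost)` candidate tuples, and the cost model (probe cost plus absolute deviation from a target spacing) are all hypothetical. The path is assumed to start at the first candidate and end at the last.

```python
from math import inf

def min_cost_tiling_path(candidates, target_spacing, max_gap):
    """Pick probes from (position, quality_cost) candidates at minimal total cost.

    Hypothetical cost model: each chosen probe adds its quality_cost, and each
    step between consecutive chosen probes adds |gap - target_spacing|.
    Steps longer than max_gap are not allowed.
    """
    cands = sorted(candidates)
    n = len(cands)
    if n == 0:
        return inf, []
    best = [inf] * n   # best[j]: cheapest path from the first candidate to j
    prev = [-1] * n    # predecessor of j on that path
    best[0] = cands[0][1]
    for j in range(1, n):
        # Walk backwards; candidates are sorted, so we can stop at max_gap.
        for i in range(j - 1, -1, -1):
            gap = cands[j][0] - cands[i][0]
            if gap > max_gap:
                break
            if gap == 0 or best[i] == inf:
                continue
            cost = best[i] + abs(gap - target_spacing) + cands[j][1]
            if cost < best[j]:
                best[j], prev[j] = cost, i
    if best[-1] == inf:
        return inf, []
    path, j = [], n - 1
    while j != -1:
        path.append(cands[j][0])
        j = prev[j]
    return best[-1], path[::-1]

cost, path = min_cost_tiling_path(
    [(0, 1.0), (40, 5.0), (50, 0.5), (100, 1.0)], target_spacing=50, max_gap=60
)
print(cost, path)
```

Because each candidate only looks back as far as `max_gap`, the work per candidate is bounded by the number of candidates inside that window, which is what keeps this kind of formulation practical for large candidate sets.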
Affiliation(s)
- Alexander Schliep
- Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 69-73, 14195 Berlin, Germany
29
Srivastava GP, Guo J, Shi H, Xu D. PRIMEGENS-v2: genome-wide primer design for analyzing DNA methylation patterns of CpG islands. Bioinformatics 2008; 24:1837-42. [PMID: 18579568 DOI: 10.1093/bioinformatics/btn320] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION DNA methylation plays important roles in biological processes and human diseases, especially cancers. High-throughput bisulfite genomic sequencing based on a new generation of sequencers, such as the 454-sequencing system, provides an efficient method for analyzing DNA methylation patterns. The successful implementation of this approach depends on the use of primer design software capable of performing a genome-wide scan for optimal primers from in silico bisulfite-treated genome sequences. We have developed a method that fulfills this requirement and conducts primer design for sequences including regions of given promoter CpG islands. RESULTS The developed method has been implemented using the C and Java programming languages. The primer design results were tested in PCR experiments on 96 selected human DNA sequences containing CpG islands in the promoter regions. The results indicate that this method is efficient and reliable for designing sequence-specific primers. AVAILABILITY The sequence-specific primer design for DNA methylated sequences including CpG islands has been integrated into the second version of PRIMEGENS as one of the primer design features. The software is freely available for academic use at http://digbio.missouri.edu/primegens/.
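The "in silico bisulfite-treated" sequences mentioned above can be sketched as a simple string transformation. Bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines stay C. The function below is a minimal illustration, not PRIMEGENS code: the name `bisulfite_convert` and the `cpg_methylated` switch are assumptions, and it handles only the given strand.

```python
def bisulfite_convert(seq: str, cpg_methylated: bool = True) -> str:
    """Return the sequence as it would read after bisulfite treatment.

    Assumption for illustration: every C outside a CpG is unmethylated
    (read as T); C in a CpG is kept only when cpg_methylated is True.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (cpg_methylated and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)

print(bisulfite_convert("ACGTCCA"))                        # methylated CpG kept
print(bisulfite_convert("ACGTCCA", cpg_methylated=False))  # all C read as T
```

Designing primers against both converted versions of a region is one way the methylated and unmethylated templates can be told apart. The source does not say whether PRIMEGENS-v2 does this.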
Affiliation(s)
- Gyan P Srivastava
- Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
30
He H, Wang J, Liu T, Liu XS, Li T, Wang Y, Qian Z, Zheng H, Zhu X, Wu T, Shi B, Deng W, Zhou W, Skogerbø G, Chen R. Mapping the C. elegans noncoding transcriptome with a whole-genome tiling microarray. Genome Res 2007; 17:1471-7. [PMID: 17785534 PMCID: PMC1987347 DOI: 10.1101/gr.6611807] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 11/25/2022]
Abstract
The number of annotated protein coding genes in the genome of Caenorhabditis elegans is similar to that of other animals, but the extent of its non-protein-coding transcriptome remains unknown. Expression profiling on whole-genome tiling microarrays applied to a mixed-stage C. elegans population verified the expression of 71% of all annotated exons. Only a small fraction (11%) of the polyadenylated transcription is non-annotated and appears to consist of approximately 3200 missed or alternative exons and 7800 small transcripts of unknown function (TUFs). Almost half (44%) of the detected transcriptional output is non-polyadenylated and probably not protein coding, and of this, 70% overlaps the boundaries of protein-coding genes in a complex manner. Specific analysis of small non-polyadenylated transcripts verified 97% of all annotated small ncRNAs and suggested that the transcriptome contains approximately 1200 small (<500 nt) unannotated noncoding loci. After combining overlapping transcripts, we estimate that at least 70% of the total C. elegans genome is transcribed.
Affiliation(s)
- Housheng He
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Corresponding author. Fax: 86-10-64889892
- Jie Wang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Corresponding author. Fax: 86-10-64889892
- Tao Liu
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Corresponding author. Fax: 86-10-64889892
- X. Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
- Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Tiantian Li
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Yunfei Wang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Zuwei Qian
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Haixia Zheng
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Xiaopeng Zhu
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Tao Wu
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Baochen Shi
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Graduate School of the Chinese Academy of Science, Beijing 100080, China
- Wei Deng
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Wei Zhou
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Geir Skogerbø
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Corresponding author. Fax: 86-10-64889892
- Runsheng Chen
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Science, Beijing 100080, China
- Chinese National Human Genome Center, Beijing 100176, China
- Corresponding author. Fax: 86-10-64889892
31
Gräf S, Nielsen FGG, Kurtz S, Huynen MA, Birney E, Stunnenberg H, Flicek P. Optimized design and assessment of whole genome tiling arrays. Bioinformatics 2007; 23:i195-204. [PMID: 17646297 PMCID: PMC5892713 DOI: 10.1093/bioinformatics/btm200] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. RESULTS We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs. AVAILABILITY Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.
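The idea of scoring a probe by how unique its sequence is within the genome can be illustrated with a much simpler stand-in than the paper's U score. The sketch below is not the authors' method (U is defined through shortest unique substrings and computed far more efficiently). It counts, for an assumed k-mer length `k`, the fraction of a probe's k-mers that occur exactly once in the genome. The function names are hypothetical, and only the forward strand is considered.

```python
from collections import Counter

def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer on the forward strand of the genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def unique_kmer_fraction(probe: str, genome_counts: Counter, k: int) -> float:
    """Fraction of the probe's k-mers that occur exactly once in the genome."""
    kmers = [probe[i:i + k] for i in range(len(probe) - k + 1)]
    if not kmers:
        return 0.0
    unique = sum(1 for kmer in kmers if genome_counts[kmer] == 1)
    return unique / len(kmers)

genome = "ACGTACGTTTGA"
counts = kmer_counts(genome, k=4)
# "ACGT" occurs twice in this toy genome, so probes containing it score lower.
print(unique_kmer_fraction("ACGTTTGA", counts, k=4))
print(unique_kmer_fraction("ACGTACGT", counts, k=4))
```

A probe that scores higher on a measure of this kind is less likely to cross-hybridize, which is the property the abstract's greedy design aims to maximize.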
Affiliation(s)
- Stefan Gräf
- EMBL–European Bioinformatics Institute, Hinxton, Cambridge, UK
- Fiona G. G. Nielsen
- Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen, The Netherlands
- Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Center, The Netherlands
- Stefan Kurtz
- Center for Bioinformatics, University of Hamburg, Germany
- Martijn A. Huynen
- Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Center, The Netherlands
- Ewan Birney
- EMBL–European Bioinformatics Institute, Hinxton, Cambridge, UK
- Henk Stunnenberg
- Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen, The Netherlands
- Paul Flicek
- EMBL–European Bioinformatics Institute, Hinxton, Cambridge, UK
- To whom correspondence should be addressed.
32
Rivals E, Boureux A, Lejeune M, Ottones F, Pecharromàn Pérez O, Tarhio J, Pierrat F, Ruffle F, Commes T, Marti J. Transcriptome annotation using tandem SAGE tags. Nucleic Acids Res 2007; 35:e108. [PMID: 17709346 PMCID: PMC2034470 DOI: 10.1093/nar/gkm495] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new Poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to test this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict as yet unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation.
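The notion of a tag-delimited genomic sequence can be made concrete with a naive search: find a first tag, then a second tag downstream of it within a bounded span, and report the sequence between and including both. This is only an illustrative sketch with exact matching on one strand. The function names and the `max_span` parameter are hypothetical, and the authors' algorithm is certainly more efficient than this quadratic scan.

```python
def find_all(text: str, pattern: str) -> list:
    """Return every start index at which pattern occurs in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions

def tag_delimited_sequences(genome: str, tag_a: str, tag_b: str, max_span: int) -> list:
    """Return (start, end, sequence) for each tag_a ... tag_b pair within max_span."""
    hits = []
    b_positions = find_all(genome, tag_b)
    for a in find_all(genome, tag_a):
        for b in b_positions:
            end = b + len(tag_b)
            # tag_b must start after tag_a ends, and the whole span must fit.
            if b >= a + len(tag_a) and end - a <= max_span:
                hits.append((a, end, genome[a:end]))
    return hits

genome = "TTCATGAAGGGGATCTT"
print(tag_delimited_sequences(genome, "CATGAA", "GATC", max_span=20))
```

Requiring both tags in order and within a bounded distance is what lets two short tags pin down a site that either tag alone would match ambiguously.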
Affiliation(s)
- Eric Rivals
- Anthony Boureux
- Mireille Lejeune
- Florence Ottones
- Oscar Pecharromàn Pérez
- Jorma Tarhio
- Fabien Pierrat
- Florence Ruffle
- Thérèse Commes
- Jacques Marti
- Laboratoire d’Informatique, de Robotique et de Microélectronique, UMR 5506 CNRS – Université de Montpellier II, 161 rue Ada, 34392 Montpellier 05, Institut de Génétique Humaine, CNRS UPR 1142, 141 rue de la Cardonille, 34396 Montpellier 05, France, Helsinki University of Technology, P.O. Box 5400, FI-02015 HUT, Finland and Skuld-Tech, 134, rue du Curat – Bat. Amarante, 34090 Montpellier, France
- To whom correspondence should be addressed: Thérèse Commes, +33 4 67 14 42 36. Correspondence may also be addressed to Jacques Marti, +33 4 67 14 42 41.
33
Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB, Ruan Y, Snyder M. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res 2007; 17:898-909. [PMID: 17568005 PMCID: PMC1891348 DOI: 10.1101/gr.5583007] [Citation(s) in RCA: 160] [Impact Index Per Article: 9.4] [Indexed: 11/24/2022]
Abstract
Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.
Affiliation(s)
- Ghia M. Euskirchen
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Joel S. Rozowsky
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Zhengdong D. Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Stephen Hartman
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Olof Emanuelsson
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Viktor Stolc
- Center for Nanotechnology, NASA Ames Research Center, Moffett Field, California 94035, USA
- Sherman Weissman
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA
- Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Yijun Ruan
- Genome Institute of Singapore, Singapore 138672
- Michael Snyder
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Corresponding author. Fax: (203) 432-6161
34
An efficient pseudomedian filter for tiling microrrays. BMC Bioinformatics 2007; 8:186. [PMID: 17555595 PMCID: PMC1913926 DOI: 10.1186/1471-2105-8-186] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 03/30/2007] [Accepted: 06/07/2007] [Indexed: 11/17/2022] Open
Abstract
Background Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, an O(n^2 log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. Results We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(n log n) from O(n^2 log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets. Conclusion Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at .
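For orientation, the baseline that this abstract improves upon can be sketched directly. The pseudomedian (Hodges-Lehmann estimator) of a window is the median of all pairwise averages of its values. Forming the roughly n^2/2 averages and sorting them gives the O(n^2 log n) cost quoted above. The code below is that naive baseline only, not Monahan's HLQEST or the skip-list variant. The function names are hypothetical, the window is defined by array index rather than genomic coordinate, and pairs of a value with itself are included (some definitions exclude them).

```python
from itertools import combinations_with_replacement
from statistics import median

def pseudomedian(values):
    """Median of all pairwise (Walsh) averages (x_i + x_j) / 2 with i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)

def sliding_pseudomedian(signal, half_width):
    """Smooth a signal with a pseudomedian over index windows of +/- half_width."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half_width): i + half_width + 1]
        smoothed.append(pseudomedian(window))
    return smoothed

print(pseudomedian([1, 2, 10]))
print(sliding_pseudomedian([1, 2, 10], half_width=1))
```

Note that each window here is recomputed from scratch, even though neighbouring windows share most of their values. That redundancy is what the abstract's sorted-list maintenance is described as exploiting.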
35
Schnieper-Samec S, Feger G, Wells TN. New biological therapies from the human genome. Expert Opin Drug Discov 2007; 2:621-31. [PMID: 23488954 DOI: 10.1517/17460441.2.5.621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Over the past 20 years, the biotechnology industry has been extraordinarily successful in bringing a wide variety of new products to the market, including recombinant versions of natural proteins such as growth hormone, insulin and the gonadotropins. The availability of the human genome sequence has given us the chance to identify the entire catalogue of human secreted proteins, often called the secretome. One of the challenges of biotechnology research has been to identify the biological activities of these proteins and to identify if any of them could have a therapeutic or pharmacologic use. The paradigm has effectively been reversed, in that it used to be easy to know the biological activity, but difficult to clone, whereas now the contrary is true. Five years on, it is clear that finding new biological activities is a very difficult process. Much of the ground gained in this period has either been through the development of antibodies as therapies or by the use of protein engineering to produce better versions of the proteins that are already being produced.
Affiliation(s)
- Sonia Schnieper-Samec
- Merck Serono International SA, Chemin des Mines 9, 1211 Geneva 20, Switzerland. +41 22 414 3951; +41 22 414 3042
36
Abstract
Microarrays are revolutionizing genetics by making it possible to genotype hundreds of thousands of DNA markers and to assess the expression (RNA transcripts) of all of the genes in the genome. Microarrays are slides the size of a postage stamp that contain millions of DNA sequences to which single-stranded DNA or RNA can hybridize. This miniaturization requires little DNA or RNA and makes the method fast and inexpensive; multiple assays of each target make the method highly accurate. DNA microarrays with hundreds of thousands of DNA markers have made it possible to conduct systematic scans of the entire genome to identify genetic associations with complex disorders or dimensions likely to be influenced by many genes of small effect size. RNA microarrays can provide snapshots of gene expression across all of the genes in the genome at any time in any tissue, which has far-reaching applications such as structural and functional 'genetic neuroimaging' and providing a biological basis for understanding environmental influence.
Affiliation(s)
- Robert Plomin
- Social, Genetic and Developmental Psychiatry, Institute of Psychiatry, London, UK.
37
Emanuelsson O, Nagalakshmi U, Zheng D, Rozowsky JS, Urban AE, Du J, Lian Z, Stolc V, Weissman S, Snyder M, Gerstein MB. Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome. Genome Res 2006; 17:886-97. [PMID: 17119069 PMCID: PMC1891347 DOI: 10.1101/gr.5014606] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.
Affiliation(s)
- Olof Emanuelsson
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Ugrappa Nagalakshmi
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Deyou Zheng
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Joel S. Rozowsky
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Alexander E. Urban
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA
- Jiang Du
- Department of Computer Science, Yale University, New Haven, Connecticut 06520-8285, USA
- Zheng Lian
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA
- Viktor Stolc
- Center for Nanotechnology, NASA Ames Research Center, Moffett Field, California 94035, USA
- Sherman Weissman
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA
- Michael Snyder
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Corresponding authors. Fax: (360) 838-7861
- Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Department of Computer Science, Yale University, New Haven, Connecticut 06520-8285, USA
- Corresponding authors. Fax: (360) 838-7861
38
Ryder E, Jackson R, Ferguson-Smith A, Russell S. MAMMOT--a set of tools for the design, management and visualization of genomic tiling arrays. Bioinformatics 2006; 22:883-4. [PMID: 16452111 DOI: 10.1093/bioinformatics/btl031] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
The MAMMOT software suite is a collection of Perl and PHP scripts for designing, annotating and visualizing genome tiling arrays to, for example, facilitate studies into the epigenetics of gene regulation. The web design allows rapid experimental data entry from multiple users, and results can easily be shared between groups and individuals. AVAILABILITY http://www.mammot.org.uk/ CONTACT e.ryder@gen.cam.ac.uk
Affiliation(s)
- E Ryder
- Department of Genetics, University of Cambridge CB2 3EH, UK.