1
Saputra D, Rasmussen S, Larsen MV, Haddad N, Sperotto MM, Aarestrup FM, Lund O, Sicheritz-Pontén T. Reads2Type: a web application for rapid microbial taxonomy identification. BMC Bioinformatics 2015; 16:398. [PMID: 26608174 PMCID: PMC4659212 DOI: 10.1186/s12859-015-0829-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/14/2015] [Accepted: 11/17/2015] [Indexed: 12/03/2022]
Abstract
Background Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. Therefore we have developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Results Raw sequencing data provided by the user are mapped against a set of marker probes that are derived from currently available complete bacterial genomes. Using a dataset of 1003 whole-genome-sequenced bacteria from various sequencing platforms, Reads2Type identified the species with 99.5% accuracy on a time scale of minutes. Conclusions In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of the person using the web application. This also prevents data privacy issues from arising. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.
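The mapping step described in the Results can be illustrated with a minimal sketch. It assumes exact substring matching of short marker probes against reads on both strands; the probe sequences, species labels, and the function name `identify_species` are hypothetical and are not taken from Reads2Type, whose actual probe design and matching algorithm are not described in this abstract.

```python
from collections import Counter

# Hypothetical marker probes: short sequences assumed to be unique to one species.
MARKER_PROBES = {
    "ACGTTGCAAGCT": "Species A",
    "TTGACCGGTAAC": "Species B",
}

# Translation table used to complement A/C/G/T bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def identify_species(reads, probes=MARKER_PROBES):
    """Count probe hits per species across all reads.

    Each read is checked on both strands. Returns the species with the
    most probe hits, or None if no probe matched any read.
    """
    hits = Counter()
    for read in reads:
        for strand in (read, reverse_complement(read)):
            for probe, species in probes.items():
                if probe in strand:
                    hits[species] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

Calling `identify_species(reads)` on a list of read strings returns the best-supported label. A real tool would also need to tolerate sequencing errors and decide when the evidence is too weak to call a species, which this sketch does not attempt.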
Affiliation(s)
- Dhany Saputra
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
- Simon Rasmussen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
- Mette V Larsen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
- Nizar Haddad
  - Bee Research Department, National Centre for Agricultural Research and Extension, P.O. Box 639, Baqa', 19381, Jordan.
- Maria Maddalena Sperotto
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
- Frank M Aarestrup
  - National Food Institute, Division for Epidemiology and Microbial Genomics, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
- Ole Lund
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
- Thomas Sicheritz-Pontén
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, Kgs. Lyngby, DK-2800, Denmark.
2
Label-free DNA sequencing using Millikan detection. Anal Biochem 2015; 487:1-7. [PMID: 26151683 DOI: 10.1016/j.ab.2015.06.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2015] [Revised: 06/22/2015] [Accepted: 06/30/2015] [Indexed: 11/20/2022]
Abstract
A label-free method for DNA sequencing based on the principle of the Millikan oil drop experiment was developed. This sequencing-by-synthesis approach sensed increases in bead charge as nucleotides were added by a polymerase to DNA templates attached to beads. The balance between an electrical force, which was dependent on the number of nucleotide charges on a bead, and opposing hydrodynamic drag and restoring tether forces resulted in a bead velocity that was a function of the number of nucleotides attached to the bead. The velocity of beads tethered via a polymer to a microfluidic channel and subjected to an oscillating electric field was measured using dark-field microscopy and used to determine how many nucleotides were incorporated during each sequencing-by-synthesis cycle. Increases in bead velocity of approximately 1% were reliably detected during DNA polymerization, allowing for sequencing of short DNA templates. The method could lead to a low-cost, high-throughput sequencing platform that could enable routine sequencing in medical applications.
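The force balance described in this abstract can be written out under simplifying assumptions that the abstract does not state: Stokes-like drag with coefficient $\gamma$, a linear (Hookean) tether of stiffness $k$, negligible bead inertia, and a sinusoidal field of amplitude $E_0$ and angular frequency $\omega$. All symbols are introduced here for illustration only; $q$ is the net bead charge, $x$ the bead displacement, and $v_0$ the steady-state velocity amplitude.

```latex
\begin{align}
\gamma \dot{x} + k x &= q E_0 \cos(\omega t), \\
v_0 &= \frac{q E_0\, \omega}{\sqrt{k^2 + \gamma^2 \omega^2}}, \\
\frac{\Delta v_0}{v_0} &= \frac{\Delta q}{q}
\quad \text{(with } \gamma,\ k,\ E_0,\ \omega \text{ held fixed)}.
\end{align}
```

In this idealized model the velocity amplitude is proportional to the bead charge, so a velocity increase of about 1% would correspond to a charge increase of about 1% from newly incorporated nucleotides. The actual tether response and drag in the experiment may well be nonlinear.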
3
Butler MG, Iben JR, Marsden KC, Epstein JA, Granato M, Weinstein BM. SNPfisher: tools for probing genetic variation in laboratory-reared zebrafish. Development 2015; 142:1542-52. [PMID: 25813542 DOI: 10.1242/dev.118786] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 10/24/2014] [Accepted: 02/25/2015] [Indexed: 12/11/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the benchmark molecular markers for modern genomics. Until recently, relatively few SNPs were known in the zebrafish genome. The use of next-generation sequencing for the positional cloning of zebrafish mutations has increased the number of known SNP positions dramatically. Still, the identified SNP variants remain under-utilized, owing to scant annotation of strain specificity and allele frequency. To address these limitations, we surveyed SNP variation in three common laboratory zebrafish strains using whole-genome sequencing. This survey identified an average of 5.04 million SNPs per strain compared with the Zv9 reference genome sequence. By comparing the three strains, 2.7 million variants were found to be strain specific, whereas the remaining variants were shared among all (2.3 million) or some of the strains. We also demonstrate the broad usefulness of our identified variants by validating most in independent populations of the same laboratory strains. We have made all of the identified SNPs accessible through 'SNPfisher', a searchable online database (snpfisher.nichd.nih.gov). The SNPfisher website includes the SNPfisher Variant Reporter tool, which provides the genomic position, alternate allele read frequency, strain specificity, restriction enzyme recognition site changes and flanking primers for all SNPs and Indels in a user-defined gene or region of the zebrafish genome. The SNPfisher site also contains links to display our SNP data in the UCSC genome browser. The SNPfisher tools will facilitate the use of SNP variation in zebrafish research as well as vertebrate genome evolution.
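The comparison across strains described in this abstract (strain-specific variants versus variants shared by some or all strains) amounts to grouping variants by the set of strains that carry them. A minimal sketch follows; the function name `classify_variants`, the tuple layout `(chromosome, position, alt_allele)`, and the strain labels are assumptions made for illustration and do not reflect the SNPfisher pipeline itself.

```python
from collections import defaultdict


def classify_variants(strain_variants):
    """Group variants by how many strains carry them.

    strain_variants maps a strain name to a set of
    (chromosome, position, alt_allele) tuples.
    Returns a dict with the keys 'strain_specific', 'shared_some'
    and 'shared_all', each holding a set of variants.
    """
    # Record which strains carry each variant.
    carriers = defaultdict(set)
    for strain, variants in strain_variants.items():
        for variant in variants:
            carriers[variant].add(strain)

    n_strains = len(strain_variants)
    result = {"strain_specific": set(), "shared_some": set(), "shared_all": set()}
    for variant, strains in carriers.items():
        if len(strains) == 1:
            result["strain_specific"].add(variant)
        elif len(strains) == n_strains:
            result["shared_all"].add(variant)
        else:
            result["shared_some"].add(variant)
    return result
```

With three strains, a variant seen in exactly one strain lands in `strain_specific`, one seen in two strains in `shared_some`, and one seen in all three in `shared_all`.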
Affiliation(s)
- Matthew G Butler
  - Section on Vertebrate Development, Program in the Genomics of Differentiation, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- James R Iben
  - Section on Molecular Dysmorphology, Program in Developmental Endocrinology and Genetics, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- Kurt C Marsden
  - Department of Cell and Developmental Biology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
- Jonathan A Epstein
  - Section on Molecular Dysmorphology, Program in Developmental Endocrinology and Genetics, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- Michael Granato
  - Department of Cell and Developmental Biology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
- Brant M Weinstein
  - Section on Vertebrate Development, Program in the Genomics of Differentiation, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
4
Mikami Y, Fukushima A, Kuwada-Kusunose T, Sakurai T, Kitano T, Komiyama Y, Iwase T, Komiyama K. Whole transcriptome analysis using next-generation sequencing of sterile-cultured Eisenia andrei for immune system research. PLoS One 2015; 10:e0118587. [PMID: 25706644 PMCID: PMC4338202 DOI: 10.1371/journal.pone.0118587] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/06/2014] [Accepted: 01/21/2015] [Indexed: 11/18/2022]
Abstract
Recently, earthworms have become a useful model for research into the immune system, and it is expected that results obtained using this model will shed light on the sophisticated vertebrate immune system and the evolution of the immune response, and additionally help identify new biomolecules with therapeutic applications. However, for earthworms to be used as a genetic model of the invertebrate immune system, basic molecular and genetic resources, such as an expressed sequence tag (EST) database, must be developed for this organism. Next-generation sequencing technologies have generated EST libraries by RNA-seq in many model species. In this study, we used Illumina RNA-sequence technology to perform a comprehensive transcriptome analysis using an RNA sample pooled from sterile-cultured Eisenia andrei. All clean reads were assembled de novo into 41,423 unigenes using the Trinity program. Using this transcriptome data, we performed BLAST analysis against the GenBank non-redundant (NR) database and obtained a total of 12,285 significant BLAST hits. Furthermore, gene ontology (GO) analysis assigned 78 unigenes to 24 immune class GO terms. In addition, we detected a unigene with high similarity to beta-1,3-glucuronyltransferase 1 (GlcAT-P), which mediates a glucuronyl transfer reaction during the biosynthesis of the carbohydrate epitope HNK-1 (human natural killer-1, also known as CD57), a marker of NK cells. The identified transcripts will be used to facilitate future research into the immune system using E. andrei.
Affiliation(s)
- Yoshikazu Mikami
  - Department of Pathology, Nihon University School of Dentistry, 1-8-13, Kanda-Surugadai, Chiyoda-ku, Tokyo 101-8310, Japan
- Atsushi Fukushima
  - RIKEN Center for Sustainable Resource Science, 1-7-22, Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Takao Kuwada-Kusunose
  - Department of Liberal Arts (Chemistry), Nihon University School of Dentistry at Matsudo, 2-870-1, Sakaecho-Nishi, Matsudo, Chiba 271-8587, Japan
- Tetsuya Sakurai
  - RIKEN Center for Sustainable Resource Science, 1-7-22, Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Taiichi Kitano
  - Department of Pathology, Nihon University School of Dentistry, 1-8-13, Kanda-Surugadai, Chiyoda-ku, Tokyo 101-8310, Japan
- Yusuke Komiyama
  - Intensive Care Unit, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Takashi Iwase
  - Department of Pathology, Nihon University School of Dentistry, 1-8-13, Kanda-Surugadai, Chiyoda-ku, Tokyo 101-8310, Japan
- Kazuo Komiyama
  - Department of Pathology, Nihon University School of Dentistry, 1-8-13, Kanda-Surugadai, Chiyoda-ku, Tokyo 101-8310, Japan
5
Affiliation(s)
- Graham Casey
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
- David Conti
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, 90033
- Robert Haile
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, 90033
- David Duggan
  - The Translational Genomics Research Institute (TGen), Phoenix, Arizona 85004
6
Wong K, Bumpstead S, Van Der Weyden L, Reinholdt LG, Wilming LG, Adams DJ, Keane TM. Sequencing and characterization of the FVB/NJ mouse genome. Genome Biol 2012; 13:R72. [PMID: 22916792 PMCID: PMC3491372 DOI: 10.1186/gb-2012-13-8-r72] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 05/30/2012] [Accepted: 08/23/2012] [Indexed: 01/13/2023]
Abstract
Background The FVB/NJ mouse strain has its origins in a colony of outbred Swiss mice established in 1935 at the National Institutes of Health. Mice derived from this source were selectively bred for sensitivity to histamine diphosphate and the B strain of Friend leukemia virus. This led to the establishment of the FVB/N inbred strain, which was subsequently imported to the Jackson Laboratory and designated FVB/NJ. The FVB/NJ mouse has several distinct characteristics, such as large pronuclear morphology, vigorous reproductive performance, and consistently large litters that make it highly desirable for transgenic strain production and general purpose use. Results Using next-generation sequencing technology, we have sequenced the genome of FVB/NJ to approximately 50-fold coverage, and have generated a comprehensive catalog of single nucleotide polymorphisms, small insertion/deletion polymorphisms, and structural variants, relative to the reference C57BL/6J genome. We have examined a previously identified quantitative trait locus for atherosclerosis susceptibility on chromosome 10 and identify several previously unknown candidate causal variants. Conclusion The sequencing of the FVB/NJ genome and generation of this catalog has increased the number of known variant sites in FVB/NJ by a factor of four, and will help accelerate the identification of the precise molecular variants that are responsible for phenotypes observed in this widely used strain.
7
Yalcin B, Adams DJ, Flint J, Keane TM. Next-generation sequencing of experimental mouse strains. Mamm Genome 2012; 23:490-8. [PMID: 22772437 PMCID: PMC3463794 DOI: 10.1007/s00335-012-9402-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 03/26/2012] [Accepted: 05/24/2012] [Indexed: 12/24/2022]
Abstract
Since the turn of the century the complete genome sequence of just one mouse strain, C57BL/6J, has been available. Knowing the sequence of this strain has enabled large-scale forward genetic screens to be performed, the creation of an almost complete set of embryonic stem (ES) cell lines with targeted alleles for protein-coding genes, and the generation of a rich catalog of mouse genomic variation. However, many experiments that use other common laboratory mouse strains have been hindered by a lack of whole-genome sequence data for these strains. The last 5 years have witnessed a revolution in DNA sequencing technologies. Recently, these technologies have been used to expand the repertoire of fully sequenced mouse genomes. In this article we review the main findings of these studies and discuss how the sequence of mouse genomes is helping pave the way from sequence to phenotype. Finally, we discuss the prospects for using de novo assembly techniques to obtain high-quality assembled genome sequences of these laboratory mouse strains, and what advances in sequencing technologies may be required to achieve this goal.
Affiliation(s)
- Binnaz Yalcin
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.
8
Abstract
The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.
Affiliation(s)
- Joseph Henson
  - The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- German Tischler
  - The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Zemin Ning
  - The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
9
Davisson MT, Bergstrom DE, Reinholdt LG, Donahue LR. Discovery Genetics - The History and Future of Spontaneous Mutation Research. Curr Protoc Mouse Biol 2012; 2:103-118. [PMID: 25364627 DOI: 10.1002/9780470942390.mo110200] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 01/06/2023]
Abstract
Historically, spontaneous mutations in mice have served as valuable models of heritable human diseases, contributing substantially to our understanding of both disease mechanisms and basic biological pathways. While advances in molecular technologies have improved our ability to create mouse models of human disease through targeted mutagenesis and transgenesis, spontaneous mutations continue to provide valuable research tools for discovery of novel genes and functions. In addition, the genetic defects caused by spontaneous mutations are molecularly similar to mutations in the human genome and, therefore often produce phenotypes that more closely resemble those characteristic of human disease than do genetically engineered mutations. Due to the rarity with which spontaneous mutations arise and the animal intensive nature of their genetic analysis, large-scale spontaneous mutation analysis has traditionally been limited to large mammalian genetics institutes. More recently, ENU mutagenesis and new screening methods have increased the rate of mutant strain discovery, and high-throughput DNA sequencing has enabled rapid identification of the underlying genes and their causative mutations. Here, we discuss the continued value of spontaneous mutations for biomedical research.
10
Solieri L, Dakal TC, Giudici P. Next-generation sequencing and its potential impact on food microbial genomics. Ann Microbiol 2012. [DOI: 10.1007/s13213-012-0478-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 10/28/2022]
11
Single-cell and regional gene expression analysis in Alzheimer's disease. Cell Mol Neurobiol 2012; 32:477-89. [PMID: 22271178 DOI: 10.1007/s10571-012-9797-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/12/2011] [Accepted: 01/03/2012] [Indexed: 01/22/2023]
Abstract
The clinical manifestations of Alzheimer's disease (AD) are secondary to the substantial loss of cortical neurons. To be effective, neuroprotective strategies will need to target the primary pathogenic mechanisms of AD prior to cell loss. The differences between neurons are largely determined by their specific repertoire of mRNAs. Thus, transcriptomic analyses that do not assume a priori etiological hypotheses are potentially powerful tools that can be used to understand the pathogenesis of complex diseases, including AD. The human brain comprises thousands of different cell types of both neuronal and non-neuronal origins. Information about individual cell-type-specific gene expression patterns will allow for a better understanding of the mechanisms that govern the progression of AD, which may lead to new therapeutic targets for prevention and treatment of the disease. This review provides an overview of the current technologies in use and the developments for single-cell extraction and transcriptome analysis. Recent transcriptome profiling studies on individual AD-afflicted brain cells are also discussed.
12
Pareek CS, Smoczynski R, Tretyn A. Sequencing technologies and genome sequencing. J Appl Genet 2011; 52:413-35. [PMID: 21698376 PMCID: PMC3189340 DOI: 10.1007/s13353-011-0057-x] [Citation(s) in RCA: 370] [Impact Index Per Article: 28.5] [Received: 02/07/2011] [Revised: 05/27/2011] [Accepted: 05/31/2011] [Indexed: 12/21/2022]
Abstract
High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in human and animal genomics research; they can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing development of high-throughput sequencing machines and the advancement of modern bioinformatics tools at an unprecedented pace, the goal of sequencing individual genomes of living organisms at a cost of $1,000 each seems realistically feasible in the near future. In the relatively short time since 2005, HT-NGS technologies have been revolutionizing human and animal genome research through analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole-genome genotyping, genome-wide structural variation, de novo assembly and re-assembly of genomes, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genomes, and personal genomics. In this review, we address the important features of HT-NGS: first-generation DNA sequencers, the birth of HT-NGS, second-generation HT-NGS platforms, third-generation HT-NGS platforms (including the single-molecule Heliscope™, SMRT™ and RNAP sequencers, and Nanopore), the Archon Genomics X PRIZE foundation, a comparison of second- and third-generation HT-NGS platforms, and the applications, advances and future perspectives of sequencing technologies in human and animal genome research.
Affiliation(s)
- Chandra Shekhar Pareek
  - Laboratory of Functional Genomics, Institute of General and Molecular Biology, Nicolaus Copernicus University, Torun, Poland.
13
Lo CL, Shen F, Baumgarner K, Cramer MJ, Lossie AC. Identification of 129S6/SvEvTac-specific polymorphisms on mouse chromosome 11. DNA Cell Biol 2011; 31:402-14. [PMID: 21988490 DOI: 10.1089/dna.2011.1353] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
Polymorphisms such as single-nucleotide polymorphisms (SNPs) and insertions/deletions (Indels) can be associated with phenotypic traits and be used as markers for disease diagnosis. Identification of these genetic variations within laboratory mice is crucial to improve our understanding of the genetic background of the mice used for research. As part of a positional cloning project, we sequenced six genes (Mettl16, Evi2a, Psmd11, Cct6d, Rffl, and Ap2b1) within a 6.8-Mb domain of mmu chr 11 in the C57BL/6J and 129S6/SvEvTac inbred strains. Although 129S6/SvEvTac is widely used in the mouse community, there is very little current (or projected future) sequence information available for this strain. We identified 6 Indels and 21 novel SNPs and confirmed genotype information for 114 additional SNPs in these 6 genes. Mettl16 and Ap2b1 contained the largest numbers of variants between the C57BL/6J and 129S6/SvEvTac strains. In addition, we found five new SNPs between 129S6/SvEvTac and 129S1/SvImJ within the Ap2b1 locus. Although we did not detect differences between C57BL/6J and 129S6/SvEvTac within Evi2a, this locus contains a relatively high SNP density compared with the surrounding sequence. Our study highlights the genetic differences among three inbred mouse strains (C57BL/6J, 129S6/SvEvTac, and 129S1/SvImJ) and provides valuable sequence information that can be used to track alleles in genomics-based studies.
Affiliation(s)
- Chiao-Ling Lo
  - Department of Animal Sciences, Purdue University, West Lafayette, Indiana 47907, USA
14
Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellåker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assunção JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J, Adams DJ. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 2011; 477:289-94. [PMID: 21921910 PMCID: PMC3276836 DOI: 10.1038/nature10413] [Citation(s) in RCA: 1146] [Impact Index Per Article: 88.2] [Received: 07/05/2011] [Accepted: 08/05/2011] [Indexed: 01/16/2023]
Abstract
We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
Affiliation(s)
- Thomas M Keane
  - The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK
15
Schmieder R, Edwards R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS One 2011; 6:e17288. [PMID: 21408061 PMCID: PMC3052304 DOI: 10.1371/journal.pone.0017288] [Citation(s) in RCA: 501] [Impact Index Per Article: 38.5] [Received: 11/30/2010] [Accepted: 01/26/2011] [Indexed: 12/11/2022]
Abstract
High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Those sequence contaminations are a serious concern to the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary and required step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. 
In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/.
Affiliation(s)
- Robert Schmieder
  - Department of Computer Science, San Diego State University, San Diego, California, United States of America
  - Computational Science Research Center, San Diego State University, San Diego, California, United States of America
- Robert Edwards
  - Department of Computer Science, San Diego State University, San Diego, California, United States of America
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America
16
Laarakker MC, van Lith HA, Ohl F. Behavioral characterization of A/J and C57BL/6J mice using a multidimensional test: association between blood plasma and brain magnesium-ion concentration with anxiety. Physiol Behav 2010; 102:205-19. [PMID: 21036185 DOI: 10.1016/j.physbeh.2010.10.019] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 04/08/2010] [Revised: 10/20/2010] [Accepted: 10/26/2010] [Indexed: 11/30/2022]
Abstract
Up to 29% of all adults will experience an anxiety-related disorder during their lives. Treatment of these disorders is still difficult, and the exact mechanisms and pathways behind anxiety disorders remain to be elucidated. Although evidence exists for genetically based susceptibility to human psychiatric diseases, risk genes have rarely been identified so far. Inbred mouse strains are, together with the crosses and genetic reference populations derived from them, important tools for the genetic dissection of complex behavioral traits in the mouse. Thus, inbred mouse models of human anxiety may be a potent starting tool to search for candidate genes in mice, which could then be translated to the human situation via comparative genomics. In this paper we investigate whether the A/J and C57BL/6J inbred mouse strains differ from each other in a limited number of motivational systems (anxiety, exploration, memory, locomotion, and social affinity), and especially in anxiety-related behavior. Young adult individuals of both genders from the A/J and C57BL/6J strains were behaviorally phenotyped using a multidimensional test: the modified hole board. This paradigm is essentially a combination of the traditional hole board and the open field test, allowing anxiety-related avoidance behavior, risk assessment, arousal, exploration, memory, locomotor activity, and social affinity to be assessed in a single test. An acute, aversive stimulus (intra-peritoneal injection with saline) was applied to the animals to test for the robustness of their behavioral phenotype. In addition, presumed physiological indicators for anxiety (circulating glucose, cholesterol, and corticosterone, adrenal tyrosine hydroxylase, and blood plasma and brain magnesium) were investigated. It could be concluded that C57BL/6J and A/J mice differ with respect to almost all tested motivational systems. For some measures, including anxiety-related behavioral parameters, there were clear gender effects.
The high-anxiety phenotype of A/J mice could be shown to represent a primary and robust characteristic. Further, blood plasma and brain magnesium levels were significantly correlated with several anxiety-related behavioral parameters. These results emphasize the hypothesized, and possibly causal, association between magnesium status and emotionality.
Affiliation(s)
- Marijke C Laarakker
- Division of Animal Welfare & Laboratory Animal Science, Department of Animals in Science and Society, Program Emotion and Cognition, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands.
|
17
|
Lappalainen T, Dermitzakis ET. Evolutionary history of regulatory variation in human populations. Hum Mol Genet 2010; 19:R197-203. [DOI: 10.1093/hmg/ddq406] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022] Open
|
18
|
Abstract
The sum of RNA transcripts of a cell, organ structure, or organism can be referred to as the transcriptome. An increasing number of studies report on specific and common alterations in the renal transcriptome in human nephropathies. In this review several challenges in transcriptomic analyses of the human kidney are discussed. These include ways to approach the heterogeneity of the kidney itself as well as the diversity of renal diseases. Conventional and upcoming techniques for transcriptional profiling of minute tissue samples are presented, including so-called next generation sequencing and microRNA detection. Different tools to integrate transcriptomic data in a systematic context are discussed, alongside the current challenge of combining such results with data sets from other integrative biology technologies.
Affiliation(s)
- Jeffrey B Hodgin
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
|
19
|
Zhou X, Ren L, Meng Q, Li Y, Yu Y, Yu J. The next-generation sequencing technology and application. Protein Cell 2010; 1:520-36. [PMID: 21204006 DOI: 10.1007/s13238-010-0065-3] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 05/20/2010] [Accepted: 05/29/2010] [Indexed: 12/11/2022] Open
Abstract
As one of the key technologies in biomedical research, DNA sequencing has not only improved its productivity with an exponential growth rate but also been applied to new areas of application over the past few years. This is largely due to the advent of newer generations of sequencing platforms, offering ever-faster and cheaper ways to analyze sequences. In our previous review, we looked into technical characteristics of the next-generation sequencers and provided prospective insights into their future development. In this article, we present a brief overview of the advantages and shortcomings of key commercially available platforms with a focus on their suitability for a broad range of applications.
Affiliation(s)
- Xiaoguang Zhou
- Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
|
20
|
Datta S, Datta S, Kim S, Chakraborty S, Gill RS. Statistical Analyses of Next Generation Sequence Data: A Partial Overview. Journal of Proteomics & Bioinformatics 2010; 3:183-190. [PMID: 21113236 PMCID: PMC2989618 DOI: 10.4172/jpb.1000138] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 12/15/2022]
Abstract
Next generation sequencing has revolutionized the status of biological research. For a long time, the gold standard of DNA sequencing was considered to be the Sanger method. However, the commercial launch of next generation sequencing in 2005 made it possible to generate massively parallel and high resolution DNA sequence data. Its usefulness in various genomic applications such as genome-wide detection of SNPs, DNA methylation profiling, mRNA expression profiling, whole-genome re-sequencing and so on is now well recognized. There are several platforms for generating next generation sequencing (NGS) data, which we briefly discuss in this mini overview. With new technologies come new challenges for data analysts. This mini review attempts to present a collection of selected topics in the current development of statistical methods dealing with these novel data types. We believe that knowing the advances and bottlenecks of this technology will help researchers to benchmark the analytical tools dealing with these data and will pave the way for its proper application in clinical diagnostics.
Affiliation(s)
- Susmita Datta
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Somnath Datta
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Seongho Kim
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Sutirtha Chakraborty
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Ryan S. Gill
- Department of Mathematics, University of Louisville, Louisville, KY 40202, USA
|
21
|
Renn SCP, Machado HE, Jones A, Soneji K, Kulathinal RJ, Hofmann HA. Using comparative genomic hybridization to survey genomic sequence divergence across species: a proof-of-concept from Drosophila. BMC Genomics 2010; 11:271. [PMID: 20429934 PMCID: PMC2873954 DOI: 10.1186/1471-2164-11-271] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 12/08/2009] [Accepted: 04/29/2010] [Indexed: 01/23/2023] Open
Abstract
Background Genome-wide analysis of sequence divergence among species offers profound insights into the evolutionary processes that shape lineages. When full-genome sequencing is not feasible for a broad comparative study, we propose the use of array-based comparative genomic hybridization (aCGH) in order to identify orthologous genes with high sequence divergence. Here we discuss experimental design, statistical power, success rate, sources of variation and potential confounding factors. We used a spotted PCR product microarray platform from Drosophila melanogaster to assess sequence divergence on a gene-by-gene basis in three fully sequenced heterologous species (D. sechellia, D. simulans, and D. yakuba). Because complete genome assemblies are available for these species this study presents a powerful test for the use of aCGH as a tool to measure sequence divergence. Results We found a consistent and linear relationship between hybridization ratio and sequence divergence of the sample to the platform species. At higher levels of sequence divergence (< 92% sequence identity to D. melanogaster) ~84% of features had significantly less hybridization to the array in the heterologous species than the platform species, and thus could be identified as "diverged". At lower levels of divergence (≥ 97% identity), only 13% of genes were identified as diverged. While ~40% of the variation in hybridization ratio can be accounted for by variation in sequence identity of the heterologous sample relative to D. melanogaster, other individual characteristics of the DNA sequences, such as GC content, also contribute to variation in hybridization ratio, as does technical variation. Conclusions Here we demonstrate that aCGH can accurately be used as a proxy to estimate genome-wide divergence, thus providing an efficient way to evaluate how evolutionary processes and genomic architecture can shape species diversity in non-model systems. Given the increased number of species for which microarray platforms are available, comparative studies can be conducted for many interesting lineages in order to identify highly diverged genes that may be the target of natural selection.
Affiliation(s)
- Suzy C P Renn
- Department of Biology, Reed College, Portland, OR 97202, USA.
|
22
|
Abstract
As our ability to generate sequencing data continues to increase, data analysis is replacing data generation as the rate-limiting step in genomics studies. Here we provide a guide to genomic data visualization tools that facilitate analysis tasks by enabling researchers to explore, interpret and manipulate their data, and in some cases perform on-the-fly computations. We will discuss graphical methods designed for the analysis of de novo sequencing assemblies and read alignments, genome browsing, and comparative genomics, highlighting the strengths and limitations of these approaches and the challenges ahead.
|
23
|
|
24
|
Abstract
There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data. New users are left to navigate a bewildering maze of base calling, alignment, assembly and analysis tools with often incomplete documentation and no idea how to compare and validate their outputs. Bridging this gap is essential, or the coveted $1,000 genome will come with a $20,000 analysis price tag.
|
25
|
Aubin-Horth N, Renn SCP. Genomic reaction norms: using integrative biology to understand molecular mechanisms of phenotypic plasticity. Mol Ecol 2009; 18:3763-80. [PMID: 19732339 DOI: 10.1111/j.1365-294x.2009.04313.x] [Citation(s) in RCA: 256] [Impact Index Per Article: 17.1] [Indexed: 11/28/2022]
Abstract
Phenotypic plasticity is the development of different phenotypes from a single genotype, depending on the environment. Such plasticity is a pervasive feature of life, is observed for various traits and is often argued to be the result of natural selection. A thorough study of phenotypic plasticity should thus include an ecological and an evolutionary perspective. Recent advances in large-scale gene expression technology make it possible to also study plasticity from a molecular perspective, and the addition of these data will help answer long-standing questions about this widespread phenomenon. In this review, we present examples of integrative studies that illustrate the molecular and cellular mechanisms underlying plastic traits, and show how new techniques will grow in importance in the study of these plastic molecular processes. These techniques include: (i) heterologous hybridization to DNA microarrays; (ii) next generation sequencing technologies applied to transcriptomics; (iii) techniques for studying the function of noncoding small RNAs; and (iv) proteomic tools. We also present recent studies on genetic model systems that uncover how environmental cues triggering different plastic responses are sensed and integrated by the organism. Finally, we describe recent work on changes in gene expression in response to an environmental cue that persist after the cue is removed. Such long-term responses are made possible by epigenetic molecular mechanisms, including DNA methylation. The results of these current studies help us outline future avenues for the study of plasticity.
Affiliation(s)
- Nadia Aubin-Horth
- Département de Sciences biologiques, Université de Montréal, Québec, Canada.
|