1
Hastie AR, Dong L, Smith A, Finklestein J, Lam ET, Huo N, Cao H, Kwok PY, Deal KR, Dvorak J, Luo MC, Gu Y, Xiao M. Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome. PLoS One 2013; 8:e55864. [PMID: 23405223 PMCID: PMC3566107 DOI: 10.1371/journal.pone.0055864] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Received: 08/09/2012] [Accepted: 01/03/2013] [Indexed: 02/04/2023] Open
Abstract
Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which, combined with huge genome sizes, have so far made accurate assembly of these large and complex genomes intractable. Using two-color genome mapping of tiling bacterial artificial chromosome (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.
Affiliation(s)
- Alex R. Hastie
  - BioNano Genomics, San Diego, California, United States of America
- Lingli Dong
  - Genomics and Gene Discovery Research Unit, United States Department of Agriculture - Agricultural Research Service, Albany, California, United States of America
  - Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Alexis Smith
  - BioNano Genomics, San Diego, California, United States of America
- Jeff Finklestein
  - BioNano Genomics, San Diego, California, United States of America
- Ernest T. Lam
  - BioNano Genomics, San Diego, California, United States of America
- Naxin Huo
  - Genomics and Gene Discovery Research Unit, United States Department of Agriculture - Agricultural Research Service, Albany, California, United States of America
  - Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Han Cao
  - BioNano Genomics, San Diego, California, United States of America
- Pui-Yan Kwok
  - Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Karin R. Deal
  - Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Jan Dvorak
  - Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Ming-Cheng Luo
  - Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Yong Gu
  - Genomics and Gene Discovery Research Unit, United States Department of Agriculture - Agricultural Research Service, Albany, California, United States of America
  - Department of Plant Sciences, University of California Davis, Davis, California, United States of America
  - * E-mail: (MX); (YG)
- Ming Xiao
  - BioNano Genomics, San Diego, California, United States of America
  - * E-mail: (MX); (YG)
2
Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biol 2012; 10:107. [PMID: 23259493 PMCID: PMC3570280 DOI: 10.1186/1741-7007-10-107] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Received: 10/17/2012] [Accepted: 12/21/2012] [Indexed: 11/26/2022] Open
Abstract
Background: Calcium-activated photoproteins are luciferase variants found in photocyte cells of bioluminescent jellyfish (Phylum Cnidaria) and comb jellies (Phylum Ctenophora). The complete genomic sequence from the ctenophore Mnemiopsis leidyi, a representative of the earliest branch of animals that emit light, provided an opportunity to examine the genome of an organism that uses this class of luciferase for bioluminescence and to look for genes involved in light reception. To determine when photoprotein genes first arose, we examined the genomic sequence from other early-branching taxa. We combined our genomic survey with gene trees, developmental expression patterns, and functional protein assays of photoproteins and opsins to provide a comprehensive view of light production and light reception in Mnemiopsis.
Results: The Mnemiopsis genome has 10 full-length photoprotein genes situated within two genomic clusters with high sequence conservation that are maintained due to strong purifying selection and concerted evolution. Photoprotein-like genes were also identified in the genomes of the non-luminescent sponge Amphimedon queenslandica and the non-luminescent cnidarian Nematostella vectensis, and phylogenomic analysis demonstrated that photoprotein genes arose at the base of all animals. Photoprotein gene expression in Mnemiopsis embryos begins during gastrulation in migrating precursors to photocytes and persists throughout development in the canals where photocytes reside. We identified three putative opsin genes in the Mnemiopsis genome and show that they do not group with well-known bilaterian opsin subfamilies. Interestingly, photoprotein transcripts are co-expressed with two of the putative opsins in developing photocytes. Opsin expression is also seen in the apical sensory organ. We present evidence that one opsin functions as a photopigment in vitro, absorbing light at wavelengths that overlap with peak photoprotein light emission, raising the hypothesis that light production and light reception may be functionally connected in ctenophore photocytes. We also present genomic evidence of a complete ciliary phototransduction cascade in Mnemiopsis.
Conclusions: This study elucidates the genomic organization, evolutionary history, and developmental expression of photoprotein and opsin genes in the ctenophore Mnemiopsis leidyi, introduces a novel dual role for ctenophore photocytes in both bioluminescence and phototransduction, and raises the possibility that light production and light reception are linked in this early-branching non-bilaterian animal.
3
Clark BW, Di Giulio RT. Fundulus heteroclitus adapted to PAHs are cross-resistant to multiple insecticides. Ecotoxicology (London, England) 2012; 21:465-74. [PMID: 22037695 PMCID: PMC3278525 DOI: 10.1007/s10646-011-0807-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Accepted: 09/27/2011] [Indexed: 05/05/2023]
Abstract
Atlantic killifish (Fundulus heteroclitus) from the Atlantic Wood Superfund site on the Elizabeth River (ER), VA are dramatically resistant to the acute toxicity and teratogenesis caused by polycyclic aromatic hydrocarbons (PAHs). To understand the consequences of adaptation to chronic PAH pollution, we have attempted to further define the chemical tolerance associated with this resistance. An important component of the PAH adaptation of ER fish is the dramatic down-regulation of the aryl hydrocarbon receptor (AHR) pathway, resulting in decreased cytochrome P450 (CYP) 1 activity. Herein, we compared the susceptibility of ER fish to several insecticides with that of reference site (King's Creek; KC) fish; use of these chemicals as probes of the resistance will help to demonstrate whether the contaminant adaptation exhibited by ER fish is broad or narrow and AHR-focused. We hypothesized that ER fish would be less susceptible to the organophosphate chlorpyrifos (activated by CYP) and more susceptible to the pyrethroid permethrin (detoxified by CYP). Comparison of acute toxicity in 5-day-old larvae supported this hypothesis for chlorpyrifos. As expected, chemical up-regulation of CYP by co-exposure to β-naphthoflavone (BNF) enhanced the susceptibility of KC larvae but did not affect ER larvae. Unexpectedly, ER larvae were much less susceptible to permethrin than KC larvae. However, co-exposure to BNF greatly decreased the susceptibility of KC larvae, indicating that metabolism of permethrin by CYP was protective. Additionally, fish from each population were compared for susceptibility to the carbamate carbaryl, an acute neurotoxicant and weak AHR agonist that induces teratogenesis similar to that caused by PAHs. ER embryos and larvae were less susceptible than KC fish. These results suggest that the adaptive phenotype of ER fish is multi-faceted and that aspects other than CYP response are likely to greatly affect their response to contaminants.
Affiliation(s)
- Bryan W Clark
  - Nicholas School of the Environment, Duke University, Durham, NC 27708-0328, USA.
4
Haiminen N, Kuhn DN, Parida L, Rigoutsos I. Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results. PLoS One 2011; 6:e24182. [PMID: 21915294 PMCID: PMC3168497 DOI: 10.1371/journal.pone.0024182] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 05/20/2011] [Accepted: 08/01/2011] [Indexed: 12/19/2022] Open
Abstract
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the identity of the used assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.
Affiliation(s)
- Niina Haiminen
  - Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
  - * E-mail: (NH); (IR)
- David N. Kuhn
  - Subtropical Horticulture Research Station, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Miami, Florida, United States of America
- Laxmi Parida
  - Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
- Isidore Rigoutsos
  - Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
  - * E-mail: (NH); (IR)
5
Feuillet C, Leach JE, Rogers J, Schnable PS, Eversole K. Crop genome sequencing: lessons and rationales. Trends in Plant Science 2011; 16:77-88. [PMID: 21081278 DOI: 10.1016/j.tplants.2010.10.005] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Received: 07/29/2010] [Revised: 10/09/2010] [Accepted: 10/16/2010] [Indexed: 05/06/2023]
Abstract
2010 marks the 10th anniversary of the completion of the first plant genome sequence (Arabidopsis thaliana). Triggered by advancements in sequencing technologies, many crop genome sequences have been produced, with eight published since 2008. To date, however, only the rice (Oryza sativa) genome sequence has been finished to a quality level similar to that of the Arabidopsis sequence. This trend to produce draft genomes could affect the ability of researchers to address biological questions of speciation and recent evolution or to link sequence variation accurately to phenotypes. Here, we review the current crop genome sequencing activities, discuss how variability in sequence quality impacts utility for different studies and provide a perspective for a paradigm shift in selecting crops for sequencing in the future.
Affiliation(s)
- Catherine Feuillet
  - Institut National de la Recherche Agronomique-Université Blaise Pascal-UMR1095-Domaine de Crouel, 63100 Clermont-Ferrand, France.
6
Hurle B, Marques-Bonet T, Antonacci F, Hughes I, Ryan JF, Eichler EE, Ornitz DM, Green ED. Lineage-specific evolution of the vertebrate Otopetrin gene family revealed by comparative genomic analyses. BMC Evol Biol 2011; 11:23. [PMID: 21261979 PMCID: PMC3038909 DOI: 10.1186/1471-2148-11-23] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 05/19/2010] [Accepted: 01/24/2011] [Indexed: 11/19/2022] Open
Abstract
Background: Mutations in the Otopetrin 1 gene (Otop1) in mice and fish produce an unusual bilateral vestibular pathology that involves the absence of otoconia without hearing impairment. The encoded protein, Otop1, is the only functionally characterized member of the Otopetrin Domain Protein (ODP) family; the extended sequence and structural preservation of ODP proteins in metazoans suggest a conserved functional role. Here, we use the tools of sequence- and cytogenetic-based comparative genomics to study the Otop1 and the Otop2-Otop3 genes and to establish their genomic context in 25 vertebrates. We extend our evolutionary study to include the gene mutated in Usher syndrome (USH) subtype 1G (Ush1g), both because of the head-to-tail clustering of Ush1g with Otop2 and because Otop1 and Ush1g mutations result in inner ear phenotypes.
Results: We established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. Other lineage-specific evolutionary events included a three-fold expansion of the Otop genes in Xenopus tropicalis and of Ush1g in teleost fish. The tight physical linkage between Otop2 and Ush1g is conserved in all vertebrates. To further understand the functional organization of the Ush1g-Otop2 locus, we deduced a putative map of binding sites for CCCTC-binding factor (CTCF), a mammalian insulator transcription factor, from genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq) data in mouse and human embryonic stem (ES) cells combined with detection of CTCF-binding motifs.
Conclusions: The results presented here clarify the evolutionary history of the vertebrate Otop and Ush1g families, and establish a framework for studying the possible interaction(s) of Ush1g and Otop in developmental pathways.
Affiliation(s)
- Belen Hurle
  - Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
7
Sodeland M, Kent M, Hayes BJ, Grove H, Lien S. Recent and historical recombination in the admixed Norwegian Red cattle breed. BMC Genomics 2011; 12:33. [PMID: 21232164 PMCID: PMC3030550 DOI: 10.1186/1471-2164-12-33] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 09/14/2010] [Accepted: 01/14/2011] [Indexed: 11/30/2022] Open
Abstract
Background: Comparison of recent patterns of recombination derived from linkage maps to historical patterns of recombination from linkage disequilibrium (LD) could help identify genomic regions affected by strong artificial selection, appearing as reduced recent recombination. Norwegian Red cattle (NRF) make an interesting case study for investigating these patterns as it is an admixed breed with an extensively recorded pedigree. NRF have been under strong artificial selection for traits such as milk and meat production, fertility and health. While measures of LD are also crucial for determining the number of markers required for association mapping studies, estimates of recombination rate can be used to assess quality of genomic assemblies.
Results: A dataset containing more than 17,000 genome-wide distributed SNPs and 2600 animals was used to assess recombination rates and LD in NRF. Although low LD measured by r² was observed in NRF relative to some of the breeds from which this breed originates, reports from breeds other than those assessed in this study have described more rapid decline in r² at short distances than what was found in NRF. Rate of decline in r² for NRF suggested that to obtain an expected r² between markers and a causal polymorphism of at least 0.5 for genome-wide association studies, approximately one SNP every 15 kb or a total of 200,000 SNPs would be required. For well-known quantitative trait loci (QTLs) for milk production traits on Bos taurus chromosomes 1, 6 and 20, map length based on historic recombination was greater than map length based on recent recombination in NRF. Further, positions for 130 previously unpositioned contigs from assembly of the bovine genome sequence (Btau_4.0) found using comparative sequence analysis were validated by linkage analysis, and 28% of these positions corresponded to extreme values of population recombination rate.
Conclusion: While LD is reduced in NRF compared to some of the breeds from which this admixed breed originated, it is elevated over short distances compared to some other cattle breeds. Genomic regions in NRF where map length based on historic recombination was greater than map length based on recent recombination coincided with some well-known QTL regions for milk production traits. Linkage analysis in combination with comparative sequence analysis and detection of regions with extreme values of population recombination rate proved to be valuable for detecting problematic regions in the Btau_4.0 genome assembly.
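The marker-density figure quoted in the Results (one SNP every 15 kb, about 200,000 SNPs in total) can be checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state the genome length, so the round value of 3 Gb used here is an assumption, and the function name `snps_required` is hypothetical.

```python
def snps_required(genome_length_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced markers needed for one marker every spacing_bp."""
    # Ceiling division, so a partial final interval still gets a marker.
    return -(-genome_length_bp // spacing_bp)


# Assumption: a round genome length of 3 Gb (not given in the abstract).
ASSUMED_GENOME_LENGTH_BP = 3_000_000_000
# From the abstract: approximately one SNP every 15 kb.
SPACING_BP = 15_000

print(snps_required(ASSUMED_GENOME_LENGTH_BP, SPACING_BP))  # 200000
```

Under that assumed length the spacing and the total agree; a somewhat shorter genome would give a slightly smaller total, consistent with the abstract's "approximately".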
Affiliation(s)
- Marte Sodeland
  - Department of Animal and Aquacultural Sciences, Centre for Integrative Genetics, Norwegian University of Life Sciences, N-1432 Aas, Norway.
8
Knudsen B, Forsberg R, Miyamoto MM. A computer simulator for assessing different challenges and strategies of de novo sequence assembly. Genes (Basel) 2010; 1:263-82. [PMID: 24710045 PMCID: PMC3954094 DOI: 10.3390/genes1020263] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/01/2010] [Revised: 08/18/2010] [Accepted: 08/31/2010] [Indexed: 11/16/2022] Open
Abstract
This study presents a new computer program for assessing the effects of different factors and sequencing strategies on de novo sequence assembly. The program uses reads from actual sequencing studies or from simulations with a reference genome that may also be real or simulated. The simulated reads can be created with our read simulator. They can be of differing length and coverage, consist of paired reads with varying distance, and include sequencing errors such as color space miscalls to imitate SOLiD data. The simulated or real reads are mapped to their reference genome and our assembly simulator is then used to obtain optimal assemblies that are limited only by the distribution of repeats. By way of this mapping, the assembly simulator determines which contigs are theoretically possible, or conversely (and perhaps more importantly), which are not. We illustrate the application and utility of our new simulation tools with several experiments that test the effects of genome complexity (repeats), read length and coverage, word size in De Bruijn graph assembly, and alternative sequencing strategies (e.g., BAC pooling) on sequence assemblies. These experiments highlight just some of the uses of our simulators in the experimental design of sequencing projects and in the further development of assembly algorithms.
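The interaction between repeats and word size in De Bruijn graph assembly that the abstract mentions can be illustrated with a toy example. This is not the authors' simulator: it is a minimal sketch that assumes an error-free, single-stranded sequence, ignores reverse complements and coverage, and uses a hypothetical function name, `branching_nodes`. It counts graph nodes with more than one distinct successor, which are the points where an assembler cannot extend a contig unambiguously.

```python
from collections import defaultdict


def branching_nodes(genome: str, k: int) -> int:
    """Count (k-1)-mer nodes with more than one distinct successor
    in the De Bruijn graph built from all k-mers of genome."""
    successors = defaultdict(set)
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        # Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
        successors[kmer[:-1]].add(kmer[1:])
    return sum(1 for nxt in successors.values() if len(nxt) > 1)


# Toy sequence in which the 3-bp motif "ACG" occurs twice.
toy_genome = "ACGTACGG"

for k in (3, 4, 5):
    print(k, branching_nodes(toy_genome, k))
# prints:
# 3 1
# 4 1
# 5 0
```

In this toy case the ambiguity disappears once the node length k - 1 exceeds the length of the repeated motif, which is one way to see why word size and repeat structure jointly limit the contigs that are theoretically possible.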
Affiliation(s)
- Michael M Miyamoto
  - Department of Biology, Box 118525, University of Florida, Gainesville, Florida, 32611-8525, USA.