1
Walve R, Puglisi SJ, Salmela L. Space-Efficient Indexing of Spaced Seeds for Accurate Overlap Computation of Raw Optical Mapping Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; PP:2454-2462. [PMID: 34057895 DOI: 10.1109/tcbb.2021.3085086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A key problem in processing raw optical mapping data (Rmaps) is finding Rmaps originating from the same genomic region. These sets of related Rmaps can be used to correct errors in Rmap data, and to find overlaps between Rmaps to assemble consensus optical maps. Previous Rmap overlap aligners are computationally very expensive and do not scale to large eukaryotic data sets. We present Selkie, an Rmap overlap aligner based on a spaced (l,k)-mer index which was pioneered in the Rmap error correction tool Elmeri. Here we present a space efficient version of the index which is twice as fast as prior art while using just a quarter of the memory on a human data set. Moreover, our index can be used for filtering candidates for Rmap overlap computation, whereas Elmeri used the index only for error correction of Rmaps. By combining our filtering of Rmaps with the exhaustive, but highly accurate, algorithm of Valouev et al. (2006), Selkie maintains or increases the accuracy of finding overlapping Rmaps on a bacterial dataset while being at least four times faster. Furthermore, for finding overlaps in a human dataset, Selkie is up to two orders of magnitude faster than previous methods.
2
Abid HZ, Young E, McCaffrey J, Raseley K, Varapula D, Wang HY, Piazza D, Mell J, Xiao M. Customized optical mapping by CRISPR-Cas9 mediated DNA labeling with multiple sgRNAs. Nucleic Acids Res 2021; 49:e8. [PMID: 33231685 PMCID: PMC7826249 DOI: 10.1093/nar/gkaa1088] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/02/2020] [Revised: 10/16/2020] [Accepted: 10/27/2020] [Indexed: 01/01/2023] Open
Abstract
Whole-genome mapping technologies have been developed as a complementary tool to provide scaffolds for genome assembly and structural variation analysis (1,2). We recently introduced a novel DNA labeling strategy based on a CRISPR-Cas9 genome editing system, which can target any 20 bp sequence. The labeling strategy is particularly useful in targeting repetitive sequences, and sequences not accessible to other labeling methods. In this report, we present customized mapping strategies that extend the applications of CRISPR-Cas9 DNA labeling. We first design a CRISPR-Cas9 labeling strategy to interrogate and differentiate the single allele differences in NGG protospacer adjacent motifs (PAM sequence). Combined with sequence motif labeling, we can pinpoint the single-base differences in highly conserved sequences. In the second strategy, we design mapping patterns across a genome by selecting sets of specific single-guide RNAs (sgRNAs) for labeling multiple loci of a genomic region or a whole genome. By developing and optimizing a single tube synthesis of multiple sgRNAs, we demonstrate the utility of CRISPR-Cas9 mapping with 162 sgRNAs targeting the 2 Mb Haemophilus influenzae chromosome. These CRISPR-Cas9 mapping approaches could be particularly useful for applications in defining long-distance haplotypes and pinpointing the breakpoints in large structural variants in complex genomes and microbial mixtures.
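The dependence of labeling on a 20 bp target followed by an NGG PAM can be illustrated with a minimal sketch. It is not the authors' sgRNA design procedure: the function name `find_cas9_targets` and the example sequences are hypothetical, and only the forward strand is scanned.

```python
import re
from typing import List, Tuple

def find_cas9_targets(seq: str) -> List[Tuple[int, str, str]]:
    """Return (position, 20-nt target, PAM) for every NGG PAM on the forward strand.

    Illustrative assumption: a candidate site is any 20-nt stretch that is
    immediately followed by a 3-nt PAM matching N-G-G.
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Hypothetical alleles that differ by a single base inside the PAM.
allele_a = "ACGT" * 5 + "TGG" + "CC"   # PAM present: site can be labeled
allele_b = "ACGT" * 5 + "TGA" + "CC"   # PAM destroyed by one substitution

print(find_cas9_targets(allele_a))
print(find_cas9_targets(allele_b))
```

In this toy case the first allele yields one candidate site and the second yields none, which is the kind of single-base PAM difference the abstract describes as distinguishable by labeling.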
MESH Headings
- Alleles
- Base Sequence
- Benzoxazoles/analysis
- CRISPR-Cas Systems
- Chromosome Mapping/methods
- Chromosomes, Bacterial/genetics
- Computer Simulation
- Conserved Sequence/genetics
- DNA-Directed RNA Polymerases
- Drug Resistance, Bacterial/genetics
- Fluorescent Dyes/analysis
- Gene Editing/methods
- Genome, Bacterial
- Genome, Human
- Haemophilus influenzae/drug effects
- Haemophilus influenzae/genetics
- Haplotypes/genetics
- Humans
- Lab-On-A-Chip Devices
- Nalidixic Acid/pharmacology
- Novobiocin/pharmacology
- Nucleotide Motifs/genetics
- Polymorphism, Single Nucleotide
- Quinolinium Compounds/analysis
- RNA, Guide, CRISPR-Cas Systems/chemical synthesis
- RNA, Guide, CRISPR-Cas Systems/genetics
- Repetitive Sequences, Nucleic Acid/genetics
- Sequence Alignment
- Staining and Labeling/methods
- Viral Proteins
Affiliation(s)
- Heba Z Abid
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Eleanor Young
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Jennifer McCaffrey
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Kaitlin Raseley
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Dharma Varapula
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Hung-Yi Wang
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Danielle Piazza
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
  - Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
- Joshua Mell
  - Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
- Ming Xiao
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
3
Salmela L, Mukherjee K, Puglisi SJ, Muggli MD, Boucher C. Fast and accurate correction of optical mapping data via spaced seeds. Bioinformatics 2020; 36:682-689. [PMID: 31504206 PMCID: PMC7005598 DOI: 10.1093/bioinformatics/btz663] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/10/2019] [Revised: 07/25/2019] [Accepted: 08/30/2019] [Indexed: 11/24/2022] Open
Abstract
Motivation Optical mapping data is used in many core genomics applications, including structural variation detection, scaffolding assembled contigs and mis-assembly detection. However, the pervasiveness of spurious and deleted cut sites in the raw data, which are called Rmaps, makes their assembly and alignment challenging. Although there exists another method to error correct Rmap data, named cOMet, it is unable to scale to even moderately large genomes. The challenge faced in error correction is in determining pairs of Rmaps that originate from the same region of the same genome. Results We create an efficient method for determining pairs of Rmaps that contain significant overlaps between them. Our method relies on the novel and nontrivial adaptation and application of spaced seeds in the context of optical mapping, which allows for spurious and deleted cut sites to be accounted for. We apply our method to detecting and correcting these errors. The resulting error correction method, referred to as Elmeri, improves upon the results of state-of-the-art correction methods but in a fraction of the time. More specifically, cOMet required 9.9 CPU days to error correct Rmap data generated from the human genome, whereas Elmeri required less than 15 CPU hours and improved the quality of the Rmaps by more than four times compared to cOMet. Availability and implementation Elmeri is publicly available under GNU Affero General Public License at https://github.com/LeenaSalmela/Elmeri. Supplementary information Supplementary data are available at Bioinformatics online.
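How a spaced seed can tolerate a spurious or deleted cut site may be easier to see in a toy sketch. Everything below is an assumption made for illustration and is not Elmeri's actual seed definition: cut positions are binned, a mask string marks which bins count ("1") and which are ignored ("0"), and the names `spaced_key`, `mask` and `bin_size` are hypothetical. Sizing error near bin boundaries is ignored here.

```python
from typing import List, Tuple

def spaced_key(cuts: List[int], start: int, mask: str, bin_size: int) -> Tuple[int, ...]:
    """Return the indices of the 'care' bins that contain at least one cut site.

    The window begins at `start` and spans len(mask) bins of `bin_size` each.
    Cuts that fall into bins masked with '0' do not influence the key.
    """
    window_len = len(mask) * bin_size
    hit_bins = set()
    for cut in cuts:
        offset = cut - start
        if 0 <= offset < window_len:
            b = offset // bin_size
            if mask[b] == "1":
                hit_bins.add(b)
    return tuple(sorted(hit_bins))

mask = "1101"                     # third bin is a "don't care" position
rmap_a = [0, 12, 35]              # hypothetical cut positions (kb)
rmap_b = [0, 12, 24, 35]          # same region with a spurious cut at 24
rmap_c = [0, 35]                  # a cut missing from a 'care' bin

print(spaced_key(rmap_a, 0, mask, 10))
print(spaced_key(rmap_b, 0, mask, 10))
print(spaced_key(rmap_c, 0, mask, 10))
```

In this sketch `rmap_a` and `rmap_b` produce the same key, because the spurious cut lands in the ignored bin, so an index keyed this way would still pair them; `rmap_c` differs in a bin the mask cares about and gets a different key.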
Affiliation(s)
- Leena Salmela
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, FI-00014 University of Helsinki, Helsinki 00100, Finland
- Kingshuk Mukherjee
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
- Simon J Puglisi
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, FI-00014 University of Helsinki, Helsinki 00100, Finland
- Martin D Muggli
  - Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
4
Abstract
BACKGROUND The long reads produced by third generation sequencing technologies have significantly boosted the results of genome assembly but still, genome-wide assemblies solely based on read data cannot be produced. Thus, for example, optical mapping data has been used to further improve genome assemblies but it has mostly been applied in a post-processing stage after contig assembly. RESULTS We propose OPTICALKERMIT which directly integrates genome wide optical maps into contig assembly. We show how genome wide optical maps can be used to localize reads on the genome and then we adapt the Kermit method, which originally incorporated genetic linkage maps to the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome wide optical maps to the contig assembly of miniasm increases NGA50 while the number of misassemblies decreases or stays the same. Furthermore, when compared to the Canu assembler, OPTICALKERMIT produces an assembly with almost three times higher NGA50 with a lower number of misassemblies on real A. thaliana reads. CONCLUSIONS OPTICALKERMIT successfully incorporates optical mapping data directly to contig assembly of eukaryotic genomes. Our results show that this is a promising approach to improve the contiguity of genome assemblies.
Affiliation(s)
- Miika Leinonen
  - Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Pietari Kalmin katu 5, Helsinki, Finland
- Leena Salmela
  - Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Pietari Kalmin katu 5, Helsinki, Finland
5
Mukherjee K, Alipanahi B, Kahveci T, Salmela L, Boucher C. Aligning optical maps to de Bruijn graphs. Bioinformatics 2020; 35:3250-3256. [PMID: 30698651 DOI: 10.1093/bioinformatics/btz069] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/05/2018] [Revised: 12/31/2018] [Accepted: 01/25/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single molecule optical maps, called Rmaps, or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the usage of optical maps in the sequence assembly step itself. RESULTS We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. AVAILABILITY AND IMPLEMENTATION The software for aligning optical maps to de Bruijn graphs, omGraph, is written in C++ and is publicly available under GNU General Public License at https://github.com/kingufl/omGraph. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kingshuk Mukherjee
  - Department of Computer and Information Science and Engineering, College of Engineering, University of Florida, Gainesville, USA
- Bahar Alipanahi
  - Department of Computer and Information Science and Engineering, College of Engineering, University of Florida, Gainesville, USA
- Tamer Kahveci
  - Department of Computer and Information Science and Engineering, College of Engineering, University of Florida, Gainesville, USA
- Leena Salmela
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Christina Boucher
  - Department of Computer and Information Science and Engineering, College of Engineering, University of Florida, Gainesville, USA
6
Du Y, Wang Y, Hu X, Liu J, Diao J. Single‐molecule quantification of 5‐methylcytosine and 5‐hydroxymethylcytosine in cancer genome. VIEW 2020. [DOI: 10.1002/viw2.9] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023] Open
Affiliation(s)
- Yang Du
  - Department of Biotherapy, Cancer Center, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
  - Department of Cancer Biology, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Yongyao Wang
  - Department of Cancer Biology, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Xiao Hu
  - Department of Cancer Biology, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Jiyan Liu
  - Department of Biotherapy, Cancer Center, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Jiajie Diao
  - Department of Cancer Biology, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
7
Affiliation(s)
- Weihua Pan
  - Department of Computer Science and Engineering, University of California, Riverside, California
- Tao Jiang
  - Department of Computer Science and Engineering, University of California, Riverside, California
- Stefano Lonardi
  - Department of Computer Science and Engineering, University of California, Riverside, California
8
Sousa TDJ, Parise D, Profeta R, Parise MTD, Gomide ACP, Kato RB, Pereira FL, Figueiredo HCP, Ramos R, Brenig B, Costa da Silva ALD, Ghosh P, Barh D, Góes-Neto A, Azevedo V. Re-sequencing and optical mapping reveals misassemblies and real inversions on Corynebacterium pseudotuberculosis genomes. Sci Rep 2019; 9:16387. [PMID: 31705053 PMCID: PMC6841979 DOI: 10.1038/s41598-019-52695-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/04/2019] [Accepted: 10/18/2019] [Indexed: 12/29/2022] Open
Abstract
The number of draft genomes deposited in Genbank from the National Center for Biotechnology Information (NCBI) is higher than the complete ones. Draft genomes are assemblies that contain fragments of misassembled regions (gaps). Such draft genomes present a hindrance to the complete understanding of the biology and evolution of the organism since they lack genomic information. To overcome this problem, strategies to improve the assembly process are developed continuously. Also, the greatest challenge to the assembly progress is the presence of repetitive DNA regions. This article highlights the use of optical mapping, to detect and correct assembly errors in Corynebacterium pseudotuberculosis. We also demonstrate that choosing a reference genome should be done with caution to avoid assembly errors and loss of genetic information.
Affiliation(s)
- Thiago de Jesus Sousa
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Doglas Parise
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Rodrigo Profeta
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Anne Cybelle Pinto Gomide
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Rodrigo Bentos Kato
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Felipe Luiz Pereira
  - National Reference Laboratory for Aquatic Animal Diseases (AQUACEN) of Ministry of Agriculture, Livestock and Food Supply, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Henrique Cesar Pereira Figueiredo
  - National Reference Laboratory for Aquatic Animal Diseases (AQUACEN) of Ministry of Agriculture, Livestock and Food Supply, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Rommel Ramos
  - Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Bertram Brenig
  - Institute of Veterinary Medicine, University Göttingen, Göttingen, Germany
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, Richmond, United States
- Debmalya Barh
  - Institute of Integrative Omics and Applied Biotechnology, Nonakuri West Bengal, India
- Aristóteles Góes-Neto
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Vasco Azevedo
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
9
Haemonchus contortus: Genome Structure, Organization and Comparative Genomics. Advances in Parasitology 2016; 93:569-98. [PMID: 27238013 DOI: 10.1016/bs.apar.2016.02.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 02/04/2023]
Abstract
One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model.
10
McCaffrey J, Sibert J, Zhang B, Zhang Y, Hu W, Riethman H, Xiao M. CRISPR-CAS9 D10A nickase target-specific fluorescent labeling of double strand DNA for whole genome mapping and structural variation analysis. Nucleic Acids Res 2016; 44:e11. [PMID: 26481349 PMCID: PMC4737172 DOI: 10.1093/nar/gkv878] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/29/2015] [Revised: 07/10/2015] [Accepted: 08/20/2015] [Indexed: 12/29/2022] Open
Abstract
We have developed a new, sequence-specific DNA labeling strategy that will dramatically improve DNA mapping in complex and structurally variant genomic regions, as well as facilitate high-throughput automated whole-genome mapping. The method uses the Cas9 D10A protein, which contains a nuclease disabling mutation in one of the two nuclease domains of Cas9, to create a guide RNA-directed DNA nick in the context of an in vitro-assembled CRISPR-CAS9-DNA complex. Fluorescent nucleotides are then incorporated adjacent to the nicking site with a DNA polymerase to label the guide RNA-determined target sequences. This labeling strategy is very powerful in targeting repetitive sequences as well as in barcoding genomic regions and structural variants not amenable to current labeling methods that rely on uneven distributions of restriction site motifs in the DNA. Importantly, it renders the labeled double-stranded DNA available in long intact stretches for high-throughput analysis in nanochannel arrays as well as for lower throughput targeted analysis of labeled DNA regions using alternative methods for stretching and imaging the labeled long DNA molecules. Thus, this method will dramatically improve both automated high-throughput genome-wide mapping as well as targeted analyses of complex regions containing repetitive and structurally variant DNA.
MESH Headings
- Amino Acid Substitution
- Bacterial Proteins/chemistry
- Bacterial Proteins/genetics
- CRISPR-Associated Protein 9
- CRISPR-Cas Systems
- Chromosome Mapping/methods
- Chromosomes, Artificial, Bacterial/chemistry
- Chromosomes, Artificial, Bacterial/metabolism
- Clustered Regularly Interspaced Short Palindromic Repeats
- DNA/chemistry
- DNA/genetics
- Deoxyribonuclease I/chemistry
- Deoxyribonuclease I/genetics
- Endonucleases/chemistry
- Endonucleases/genetics
- Fluorescent Dyes/chemistry
- Genome, Human
- HIV-1/chemistry
- HIV-1/genetics
- Humans
- In Situ Nick-End Labeling/methods
- Mutation
- Plasmids/chemistry
- Plasmids/metabolism
- Protein Structure, Tertiary
- RNA, Guide, CRISPR-Cas Systems/chemistry
- RNA, Guide, CRISPR-Cas Systems/genetics
Affiliation(s)
- Jennifer McCaffrey
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Justin Sibert
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Bin Zhang
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
- Yonggang Zhang
  - Department of Neuroscience, Temple University School of Medicine, Philadelphia, PA, USA
- Wenhui Hu
  - Department of Neuroscience, Temple University School of Medicine, Philadelphia, PA, USA
- Ming Xiao
  - School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
11
Xiao S, Li J, Ma F, Fang L, Xu S, Chen W, Wang ZY. Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the whole-genome mapping in BioNano Genomics Irys system. BMC Genomics 2015; 16:670. [PMID: 26336087 PMCID: PMC4559010 DOI: 10.1186/s12864-015-1871-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/09/2015] [Accepted: 08/21/2015] [Indexed: 12/21/2022] Open
Abstract
Background Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East-Asia. The annual product of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selections of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in BioNano Genomics Irys system. Results For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Conclusion Using the state-of-the-art whole-genome mapping technique in Irys system, the first whole-genome map for large yellow croaker has been constructed and thus highly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by the whole-genome mapping for aquatic-organisms. 
Our study demonstrates a promising application of the whole-genome mapping on genome maps construction for other non-model organisms in a fast and reliable manner. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1871-z) contains supplementary material, which is available to authorized users.
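The N50 statistic quoted in the abstract can be computed as in the short sketch below. It uses the common definition (the length at which the longest maps, taken in decreasing order, first cover at least half of the total length); the function name `n50_and_count` and the example lengths are hypothetical and are not the study's data.

```python
from typing import List, Tuple

def n50_and_count(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, number of maps needed to reach half of the total length)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable for non-empty input")

# Hypothetical map lengths in Mb.
print(n50_and_count([2, 3, 4, 5, 6]))
```

Read this way, a statement such as "N50 of 1.7 Mb, including 126 maps" would mean that the 126 longest maps together cover at least half of the assembled length and that the shortest of them is about 1.7 Mb long.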
Affiliation(s)
- Shijun Xiao
  - Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
- Jiongtang Li
  - Chinese Academy of Fishery Sciences, Yongding Road, Beijing, P.R. China
- Lujing Fang
  - Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
- Shuangbin Xu
  - Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
- Wei Chen
  - Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
- Zhi Yong Wang
  - Key Laboratory of Healthy Mariculture in the East China Sea, Ministry of Agriculture; Fisheries College, Jimei University, Yindou Road, Xiamen, P.R. China
12
Zhou S, Goldstein S, Place M, Bechner M, Patino D, Potamousis K, Ravindran P, Pape L, Rincon G, Hernandez-Ortiz J, Medrano JF, Schwartz DC. A clone-free, single molecule map of the domestic cow (Bos taurus) genome. BMC Genomics 2015; 16:644. [PMID: 26314885 PMCID: PMC4551733 DOI: 10.1186/s12864-015-1823-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/17/2015] [Accepted: 08/07/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation. RESULTS The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances than UMD3.1 across most of the 6 categories of sequence vs. 
map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts). CONCLUSION Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.
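What a restriction map records, and how an average fragment size is derived from it, can be sketched with an in silico BamHI digest. This is an illustration only, not the Optical Mapping System pipeline: the function name `digest` and the toy sequence are hypothetical, and the code assumes an error-free linear molecule. BamHI recognizes GGATCC and cuts after the first G.

```python
from typing import List

def digest(seq: str, site: str = "GGATCC", cut_offset: int = 1) -> List[int]:
    """Return ordered fragment lengths after cutting `seq` at every `site`.

    `cut_offset` is the cut position within the recognition site
    (1 for BamHI, which cuts G^GATCC). The site is palindromic, so
    scanning one strand finds every occurrence.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

toy = "AAAGGATCCTTTTGGATCCAA"          # hypothetical 21-bp molecule
fragments = digest(toy)
print(fragments)                        # ordered fragment sizes, i.e. a toy map
print(sum(fragments) / len(fragments))  # average fragment size
```

The ordered list of fragment sizes is the in silico counterpart of an Rmap; real Rmaps additionally contain sizing error and the extra or missing cuts listed in the abstract's discordance categories.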
Affiliation(s)
- Shiguo Zhou
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Steve Goldstein
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Michael Place
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Michael Bechner
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Diego Patino
  - Departamento de Materiales, Facultad de Minas, Universidad Nacional de Colombia, Sede Medellin, Calle 75 # 79A-51, Bloque M17, Medellin, Colombia, SA
- Konstantinos Potamousis
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Prabu Ravindran
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Louise Pape
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
- Gonzalo Rincon
  - Department of Animal Science, University of California-Davis, Davis, CA, 95616, USA
- Juan Hernandez-Ortiz
  - Departamento de Materiales, Facultad de Minas, Universidad Nacional de Colombia, Sede Medellin, Calle 75 # 79A-51, Bloque M17, Medellin, Colombia, SA
- Juan F Medrano
  - Department of Animal Science, University of California-Davis, Davis, CA, 95616, USA
- David C Schwartz
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, and the UW Biotechnology Center, University of Wisconsin-Madison, 425 Henry Mall, Madison, WI, 53706, USA
| |
Collapse
|
13
|
A fast and scalable kymograph alignment algorithm for nanochannel-based optical DNA mappings. PLoS One 2015; 10:e0121905. [PMID: 25875920 PMCID: PMC4395267 DOI: 10.1371/journal.pone.0121905] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/01/2014] [Accepted: 02/05/2015] [Indexed: 11/26/2022] Open
Abstract
Optical mapping by direct visualization of individual DNA molecules, stretched in nanochannels with sequence-specific fluorescent labeling, represents a promising tool for disease diagnostics and genomics. An important challenge for this technique is thermal motion of the DNA as it undergoes imaging; this blurs fluorescent patterns along the DNA and results in information loss. Correcting for this effect (a process referred to as kymograph alignment) is a common preprocessing step in nanochannel-based optical mapping workflows, and we present here a highly efficient algorithm to accomplish this via pattern recognition. We compare our method with the one previous approach, and we find that our method is orders of magnitude faster while producing data of similar quality. We demonstrate proof of principle of our approach on experimental data consisting of melt mapped bacteriophage DNA.
|
14
|
Abstract
Next-generation sequencing techniques produce millions of short reads from a genomic sequence in a single run. Some regions of the sequence are very likely to receive low read coverage, and erroneous base calling can introduce errors into the reads. As a consequence, sequence assemblers often fail to reconstruct an entire DNA molecule and instead output a set of overlapping segments that together represent a consensus region of the DNA; these segments are collectively called contigs in the literature. The final step of the sequencing process, called scaffolding, is to assemble the contigs into the correct order. Scaffolding techniques typically exploit additional information such as mate pairs, paired ends, or optical restriction maps. In this paper we introduce a series of novel algorithms for scaffolding that exploit optical restriction maps (ORMs). Simulation results show that our algorithms are indeed reliable, scalable, and efficient compared to the best known algorithms in the literature.
|
15
|
BAIT: Organizing genomes and mapping rearrangements in single cells. Genome Med 2013; 5:82. [PMID: 24028793 PMCID: PMC3971352 DOI: 10.1186/gm486] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 05/29/2013] [Accepted: 09/09/2013] [Indexed: 12/30/2022] Open
Abstract
Strand-seq is a single-cell sequencing technique to finely map sister chromatid exchanges (SCEs) and other rearrangements. To analyze these data, we introduce BAIT, software which assigns templates and identifies and localizes SCEs. We demonstrate BAIT can refine completed reference assemblies, identifying approximately 21 Mb of incorrectly oriented fragments and placing over half (2.6 Mb) of the orphan fragments in mm10/GRCm38. BAIT also stratifies scaffold-stage assemblies, potentially accelerating the assembling and finishing of reference genomes. BAIT is available at http://sourceforge.net/projects/bait/.
|
16
|
Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 2012; 30:771-6. [PMID: 22797562 DOI: 10.1038/nbt.2303] [Citation(s) in RCA: 442] [Impact Index Per Article: 40.2] [Received: 03/12/2012] [Accepted: 06/06/2012] [Indexed: 12/21/2022]
Abstract
We describe genome mapping on nanochannel arrays. In this approach, specific sequence motifs in single DNA molecules are fluorescently labeled, and the DNA molecules are uniformly stretched in thousands of silicon channels on a nanofluidic device. Fluorescence imaging allows the construction of maps of the physical distances between occurrences of the sequence motifs. We demonstrate the analysis, individually and as mixtures, of 95 bacterial artificial chromosome (BAC) clones that cover the 4.7-Mb human major histocompatibility complex region. We obtain accurate, haplotype-resolved, sequence motif maps hundreds of kilobases in length, resulting in a median coverage of 114× for the BACs. The final sequence motif map assembly contains three contigs. With an average distance of 9 kb between labels, we detect 22 haplotype differences. We also use the sequence motif maps to provide scaffolds for de novo assembly of sequencing data. Nanochannel genome mapping should facilitate de novo assembly of sequencing reads from complex regions in diploid organisms, haplotype and structural variation analysis and comparative genomics.
|
17
|
|
18
|
AGORA: Assembly Guided by Optical Restriction Alignment. BMC Bioinformatics 2012; 13:189. [PMID: 22856673 PMCID: PMC3431216 DOI: 10.1186/1471-2105-13-189] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 03/28/2012] [Accepted: 06/28/2012] [Indexed: 11/10/2022] Open
Abstract
Background: Genome assembly is difficult due to repeated sequences within the genome, which create ambiguities and cause the final assembly to be broken up into many separate sequences (contigs). Long range linking information, such as mate-pairs or mapping data, is necessary to help assembly software resolve repeats, thereby leading to a more complete reconstruction of genomes. Prior work has used optical maps for validating assemblies and scaffolding contigs, after an initial assembly has been produced. However, optical maps have not previously been used within the genome assembly process. Here, we use optical map information within the popular de Bruijn graph assembly paradigm to eliminate paths in the de Bruijn graph which are not consistent with the optical map and help determine the correct reconstruction of the genome. Results: We developed a new algorithm called AGORA: Assembly Guided by Optical Restriction Alignment. AGORA is the first algorithm to use optical map information directly within the de Bruijn graph framework to help produce an accurate assembly of a genome that is consistent with the optical map information provided. Our simulations on bacterial genomes show that AGORA is effective at producing assemblies closely matching the reference sequences. Additionally, we show that noise in the optical map can have a strong impact on the final assembly quality for some complex genomes, and we also measure how various characteristics of the starting de Bruijn graph may impact the quality of the final assembly. Lastly, we show that a proper choice of restriction enzyme for the optical map may substantially improve the quality of the final assembly. Conclusions: Our work shows that optical maps can be used effectively to assemble genomes within the de Bruijn graph assembly framework. Our experiments also provide insights into the characteristics of the mapping data that most affect the performance of our algorithm, indicating the potential benefit of more accurate optical mapping technologies, such as nano-coding.
|
19
|
Abstract
Several species of filamentous fungi contain so-called dispensable or supernumerary chromosomes. These chromosomes are dispensable for the fungus to survive, but may carry genes required for specialized functions, such as infection of a host plant. It has been shown that at least some dispensable chromosomes are able to transfer horizontally (i.e., in the absence of a sexual cycle) from one fungal strain to another. In this paper, we describe a method by which this can be shown. Horizontal chromosome transfer (HCT) occurs during co-incubation of two strains. To document the actual occurrence of HCT, it is necessary to select for HCT progeny. This is accomplished by transforming two different drug-resistance genes into the two parent strains before their co-incubation. In one of the strains (the "donor"), a drug-resistance gene should be integrated in a chromosome of which the propensity for HCT is under investigation. In the "tester" or "recipient" strain, another drug-resistance gene should be integrated somewhere in the core genome. In this way, after co-incubation, HCT progeny can be selected on plates containing both drugs. HCT can be initiated with equal amounts of asexual spores of both strains, plated on regular growth medium for the particular fungus, followed by incubation until new asexual spores are formed. The new asexual spores are then harvested and plated on plates containing both drugs. Double drug-resistant colonies that appear should carry at least one chromosome from each parental strain. Finally, double drug-resistant strains need to be analysed to assess whether HCT has actually occurred. This can be done by various genome mapping methods, like CHEF-gels, AFLP, RFLP, PCR markers, optical maps, or even complete genome sequencing.
Affiliation(s)
- H Charlotte van der Does
- Plant Pathology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
|
20
|
Desjardins CA, Champion MD, Holder JW, Muszewska A, Goldberg J, Bailão AM, Brigido MM, Ferreira MEDS, Garcia AM, Grynberg M, Gujja S, Heiman DI, Henn MR, Kodira CD, León-Narváez H, Longo LVG, Ma LJ, Malavazi I, Matsuo AL, Morais FV, Pereira M, Rodríguez-Brito S, Sakthikumar S, Salem-Izacc SM, Sykes SM, Teixeira MM, Vallejo MC, Walter MEMT, Yandava C, Young S, Zeng Q, Zucker J, Felipe MS, Goldman GH, Haas BJ, McEwen JG, Nino-Vega G, Puccia R, San-Blas G, Soares CMDA, Birren BW, Cuomo CA. Comparative genomic analysis of human fungal pathogens causing paracoccidioidomycosis. PLoS Genet 2011; 7:e1002345. [PMID: 22046142 PMCID: PMC3203195 DOI: 10.1371/journal.pgen.1002345] [Citation(s) in RCA: 136] [Impact Index Per Article: 10.5] [Received: 04/19/2011] [Accepted: 08/30/2011] [Indexed: 12/29/2022] Open
Abstract
Paracoccidioides is a fungal pathogen and the cause of paracoccidioidomycosis, a health-threatening human systemic mycosis endemic to Latin America. Infection by Paracoccidioides, a dimorphic fungus in the order Onygenales, is coupled with a thermally regulated transition from a soil-dwelling filamentous form to a yeast-like pathogenic form. To better understand the genetic basis of growth and pathogenicity in Paracoccidioides, we sequenced the genomes of two strains of Paracoccidioides brasiliensis (Pb03 and Pb18) and one strain of Paracoccidioides lutzii (Pb01). These genomes range in size from 29.1 Mb to 32.9 Mb and encode 7,610 to 8,130 genes. To enable genetic studies, we mapped 94% of the P. brasiliensis Pb18 assembly onto five chromosomes. We characterized gene family content across Onygenales and related fungi, and within Paracoccidioides we found expansions of the fungal-specific kinase family FunK1. Additionally, the Onygenales have lost many genes involved in carbohydrate metabolism and fewer genes involved in protein metabolism, resulting in a higher ratio of proteases to carbohydrate active enzymes in the Onygenales than their relatives. To determine if gene content correlated with growth on different substrates, we screened the non-pathogenic onygenale Uncinocarpus reesii, which has orthologs for 91% of Paracoccidioides metabolic genes, for growth on 190 carbon sources. U. reesii showed growth on a limited range of carbohydrates, primarily basic plant sugars and cell wall components; this suggests that Onygenales, including dimorphic fungi, can degrade cellulosic plant material in the soil. In addition, U. reesii grew on gelatin and a wide range of dipeptides and amino acids, indicating a preference for proteinaceous growth substrates over carbohydrates, which may enable these fungi to also degrade animal biomass. 
These capabilities for degrading plant and animal substrates suggest a duality in lifestyle that could enable pathogenic species of Onygenales to transfer from soil to animal hosts.
Affiliation(s)
- Mia D. Champion
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jason W. Holder
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Anna Muszewska
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warszawa, Poland
- Jonathan Goldberg
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Alexandre M. Bailão
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Ana Maria Garcia
- Unidad de Biología Celular y Molecular, Corporación para Investigaciones Biológicas, Medellín, Colombia
- Marcin Grynberg
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warszawa, Poland
- Sharvari Gujja
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- David I. Heiman
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Matthew R. Henn
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Chinnappa D. Kodira
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Henry León-Narváez
- Centro de Microbiología y Biología Celular, Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela
- Larissa V. G. Longo
- Departamento de Microbiologia, Imunologia, e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Li-Jun Ma
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Iran Malavazi
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto Universidade de São Paulo, Ribeirão Preto, Brazil
- Alisson L. Matsuo
- Departamento de Microbiologia, Imunologia, e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Flavia V. Morais
- Departamento de Microbiologia, Imunologia, e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Instituto de Pesquisa y Desenvolvimento, Universidade do Vale do Paraíba, São José dos Campos, Brazil
- Maristela Pereira
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Sabrina Rodríguez-Brito
- Centro de Microbiología y Biología Celular, Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela
- Sharadha Sakthikumar
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Silvia M. Salem-Izacc
- Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Sean M. Sykes
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Milene C. Vallejo
- Departamento de Microbiologia, Imunologia, e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Chandri Yandava
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sarah Young
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Qiandong Zeng
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jeremy Zucker
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Maria Sueli Felipe
- Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil
- Gustavo H. Goldman
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto Universidade de São Paulo, Ribeirão Preto, Brazil
- Laboratório Nacional de Ciência e Tecnologia do Bioetanol – CTBE, São Paulo, Brazil
- Brian J. Haas
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Juan G. McEwen
- Unidad de Biología Celular y Molecular, Corporación para Investigaciones Biológicas, Medellín, Colombia
- Facultad de Medicina, Universidad de Antioquia, Medellín, Colombia
- Gustavo Nino-Vega
- Centro de Microbiología y Biología Celular, Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela
- Rosana Puccia
- Departamento de Microbiologia, Imunologia, e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, Brazil
- Gioconda San-Blas
- Centro de Microbiología y Biología Celular, Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela
- Bruce W. Birren
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Christina A. Cuomo
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
|
21
|
Neely RK, Deen J, Hofkens J. Optical mapping of DNA: Single-molecule-based methods for mapping genomes. Biopolymers 2011; 95:298-311. [DOI: 10.1002/bip.21579] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Received: 10/25/2010] [Revised: 12/15/2010] [Accepted: 12/15/2010] [Indexed: 11/09/2022]
|
22
|
Nagarajan N, Cook C, Di Bonaventura M, Ge H, Richards A, Bishop-Lilly KA, DeSalle R, Read TD, Pop M. Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. BMC Genomics 2010; 11:242. [PMID: 20398345 PMCID: PMC2864248 DOI: 10.1186/1471-2164-11-242] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 11/25/2009] [Accepted: 04/16/2010] [Indexed: 12/03/2022] Open
Abstract
While new sequencing technologies have ushered in an era where microbial genomes can be easily sequenced, the goal of routinely producing high-quality draft and finished genomes in a cost-effective fashion has still remained elusive. Due to shorter read lengths and limitations in library construction protocols, shotgun sequencing and assembly based on these technologies often results in fragmented assemblies. Correspondingly, while draft assemblies can be obtained in days, finishing can take many months and hence the time and effort can only be justified for high-priority genomes and in large sequencing centers. In this work, we revisit this issue in light of our own experience in producing finished and nearly-finished genomes for a range of microbial species in a small-lab setting. These genomes were finished with surprisingly little investments in terms of time, computational effort and lab work, suggesting that the increased access to sequencing might also eventually lead to a greater proportion of finished genomes from small labs and genomics cores.
Affiliation(s)
- Niranjan Nagarajan
- Computational and Mathematical Biology, Genome Institute of Singapore 127726, Singapore.
|
23
|
Mir KU. Sequencing genomes: from individuals to populations. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2009; 8:367-78. [PMID: 19808932 DOI: 10.1093/bfgp/elp040] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
The whole genome sequences of Jim Watson and Craig Venter are early examples of personalized genomics, which promises to change how we approach healthcare in the future. Before personal sequencing can have practical medical benefits, however, and before it should be advocated for implementation at the population-scale, there needs to be a better understanding of which genetic variants influence which traits and how their effects are modified by epigenetic factors. Nonetheless, for forging links between DNA sequence and phenotype, efforts to sequence the genomes of individuals need to continue; this includes sequencing sub-populations for association studies which analyse the difference in sequence between disease affected and unaffected individuals. Such studies can only be applied on a large enough scale to be effective if the massive strides in sequencing technology that have recently occurred also continue.
Affiliation(s)
- Kalim U Mir
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|
24
|
Abstract
Several sequencing technologies have been introduced in recent years that dramatically outperform the traditional Sanger technology in terms of throughput and cost. The data generated by these technologies are characterized by generally shorter read lengths (as low as 35 bp) and different error characteristics than Sanger data. Existing software tools for assembly and analysis of sequencing data are, therefore, ill-suited to handle the new types of data generated. This paper surveys the recent software packages aimed specifically at analyzing new generation sequencing data.
Affiliation(s)
- Niranjan Nagarajan
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies and Department of Computer Science, University of Maryland, College Park, MD, USA
|
25
|
|
26
|
Abstract
Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of next generation sequencing data has led to an increased use of sequencing in new settings. For example, the new field of metagenomics relies on large-scale sequencing of entire microbial communities instead of isolate genomes, leading to new computational challenges. In this article, we outline the major algorithmic approaches for genome assembly and describe recent developments in this domain.
Affiliation(s)
- Mihai Pop
- Department of Computer Science and the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park, MD 20742, USA.
|
27
|
Nagarajan N, Read TD, Pop M. Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 2008; 24:1229-35. [PMID: 18356192 PMCID: PMC2373919 DOI: 10.1093/bioinformatics/btn102] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 12/18/2007] [Revised: 03/05/2008] [Accepted: 03/16/2008] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: New, high-throughput sequencing technologies have made it feasible to cheaply generate vast amounts of sequence information from a genome of interest. The computational reconstruction of the complete sequence of a genome is complicated by specific features of these new sequencing technologies, such as the short length of the sequencing reads and absence of mate-pair information. In this article we propose methods to overcome such limitations by incorporating information from optical restriction maps. RESULTS: We demonstrate the robustness of our methods to sequencing and assembly errors using extensive experiments on simulated datasets. We then present the results obtained by applying our algorithms to data generated from two bacterial genomes, Yersinia aldovae and Yersinia kristensenii. The resulting assemblies contain a single scaffold covering a large fraction of the respective genomes, suggesting that the careful use of optical maps can provide a cost-effective framework for the assembly of genomes. AVAILABILITY: The tools described here are available as an open-source package at ftp://ftp.cbcb.umd.edu/pub/software/soma
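As an illustration of the general idea of placing a contig on an optical restriction map, the following is a minimal sketch only, not the algorithm of the package cited above. It assumes the contig has already been digested in silico into an ordered list of fragment sizes, and it ignores missing or extra cut sites, contig orientation, and partial end fragments. The function name `best_placement`, the scoring rule (sum of squared relative size differences), and the example sizes are all hypothetical.

```python
from typing import Sequence, Tuple


def best_placement(optical_map: Sequence[float],
                   contig_fragments: Sequence[float]) -> Tuple[int, float]:
    """Slide the contig's fragment sizes along the optical map.

    Returns (offset, score) for the window with the lowest sum of squared
    relative size differences. Lower scores mean a closer match.
    """
    n, m = len(optical_map), len(contig_fragments)
    if m == 0 or m > n:
        raise ValueError("contig must have between 1 and len(optical_map) fragments")
    if any(c <= 0 for c in contig_fragments):
        raise ValueError("fragment sizes must be positive")

    best_offset, best_score = 0, float("inf")
    for offset in range(n - m + 1):
        window = optical_map[offset:offset + m]
        # Relative error keeps large fragments from dominating the score.
        score = sum(((o - c) / c) ** 2 for o, c in zip(window, contig_fragments))
        if score < best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical fragment sizes in kb; the contig resembles map fragments 2..4.
optical_map = [12.1, 5.3, 20.4, 7.9, 15.2, 3.1]
contig = [20.0, 8.0, 15.0]
offset, score = best_placement(optical_map, contig)
print(offset, round(score, 6))
```

A practical aligner would also need to handle sizing error models and unmatched cut sites, typically with dynamic programming rather than a fixed-length sliding window.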
|
28
|
Affiliation(s)
- C Aston
- Department of Chemistry, W. M. Keck Laboratory for Biomolecular Imaging, New York University, New York 10003, USA
|
29
|
Yokota H, Fung K, Trask BJ, van den Engh G, Sarikaya M, Aebersold R. Sharp DNA bends as landmarks of protein-binding sites on straightened DNA. Anal Chem 1999; 71:1663-7. [PMID: 10330902 DOI: 10.1021/ac981370x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
We have developed a fluorescence-based method for mapping single or multiple protein-binding sites on straightened, large-size DNA molecules (> 5 kbp). In the described method, protein-DNA complexes were straightened and immobilized on a flat surface using surface tension. A fraction of the immobilized complexes displayed a sharp DNA bend with two DNA segments extending from the apex. The presence of DNA-binding proteins at the apex was verified by atomic force microscopy. The position of protein binding relative to the ends of the DNA molecule was determined by measuring the length of two DNA segments using fluorescence microscopy. We demonstrate the potential of the fluorescence-based method to localize protein-binding sites on the DNA template and to evaluate relative binding affinity. The proposed protein-binding-site mapping technique is simple and easy to perform. Practical applications include screening for DNA-binding proteins and the localization of protein-binding sites on large segments of DNA.
Affiliation(s)
- H Yokota
- Department of Molecular Biotechnology and Material Sciences & Engineering, University of Washington, Seattle 98195, USA.
|
30
|
Abstract
High resolution chromatin/DNA fiber fluorescent in situ hybridisation (FISH) is a powerful system for physical mapping and genome research. With direct visualisation of molecular probes along released chromatin or DNA fiber, fiber FISH has become the method of choice to order genes or DNA markers within chromosomal regions of interest. Combined with DNA-protein in situ codetection fiber FISH shall play a more important role for analysis of genome function. In this paper the concept and technical developments of fiber FISH are reviewed with the emphasis of comparison on the various protocols. Future challenges are also discussed along with the highlights of the successful applications achieved by fiber FISH methodology.
Affiliation(s)
- H H Heng
- Department of Genetics, Hospital for Sick Children, Toronto, Ontario, Canada.
|
31
|
Abstract
Genome maps have been constructed for the mycobacterial pathogens Mycobacterium leprae and Mycobacterium tuberculosis, as well as for the attenuated vaccine strain Mycobacterium bovis BCG Pasteur. While the chromosomes of M. tuberculosis and M. bovis BCG Pasteur show extensive conservation at the gross level, comparison with M. leprae revealed a high degree of diversification, with a mosaic-like pattern apparent. The ordered libraries of M. tuberculosis and M. leprae produced during the course of these studies played a central role in the genome sequencing projects of these two bacilli, showing the utility of this approach for systematic sequencing of bacterial genomes.
Affiliation(s)
- W J Philipp
- Institute for Medical Microbiology, University of Berne, Switzerland.
|
32
|
Yokota H, Johnson F, Lu H, Robinson RM, Belu AM, Garrison MD, Ratner BD, Trask BJ, Miller DL. A new method for straightening DNA molecules for optical restriction mapping. Nucleic Acids Res 1997; 25:1064-70. [PMID: 9023119 PMCID: PMC146532 DOI: 10.1093/nar/25.5.1064] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.0] [Indexed: 02/03/2023] Open
Abstract
We have developed an improved method of straightening DNA molecules for use in optical restriction mapping. The DNA was straightened on 3-aminopropyltriethoxysilane-coated glass slides using surface tension generated by a moving meniscus. In our method the meniscus motion was controlled mechanically, which provides advantages of speed and uniformity of the straightened molecules. Variation in the affinity of the silanized surfaces for DNA was compensated by precoating the slide with single-stranded non-target blocking DNA. A small amount of MgCl2 added to the DNA suspension increased the DNA-surface affinity and was necessary for efficient restriction enzyme digestion of the straightened surface-bound DNA. By adjusting the amounts of blocking DNA and MgCl2, we prepared slides that contained many straight parallel DNA molecules. Straightened lambda phage DNA (48 kb) bound to a slide surface was digested by EcoRI restriction endonuclease, and the resulting restriction fragments were imaged by fluorescence microscopy using a CCD camera. The observed fragment lengths showed excellent agreement with their predicted lengths.
Affiliation(s)
- H Yokota
- Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195, USA
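The abstract above compares observed restriction fragment lengths with predicted ones. As a rough sketch of how such predicted lengths can be computed, the snippet below cuts a linear sequence at every EcoRI recognition site (GAATTC, cut between G and A) and reports the fragment lengths. The function name `digest` and the toy sequence are hypothetical and stand in for a real genome such as lambda phage; this is not the authors' own procedure.

```python
from typing import List


def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment boundaries: sequence start, every cut, sequence end.
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy 21-base sequence with two EcoRI sites (illustrative only).
toy = "AAAGAATTCCCCCGAATTCTT"
fragments = digest(toy)
print(fragments)
```

The fragment lengths always sum to the sequence length, which gives a quick sanity check when comparing against measured sizes.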
|