1
Abstract
We describe an efficient algorithm to construct genome-wide haplotype restriction maps of an individual by aligning single-molecule DNA fragments collected with Optical Mapping technology. Using this algorithm and a small amount of genomic material, we can construct the parental haplotypes for each diploid chromosome of any individual. Since such haplotype maps reveal the polymorphisms due to single nucleotide differences (SNPs) and small insertions and deletions (RFLPs), they are useful in association studies, studies involving genomic instabilities in cancer, and genetics, and yet incur relatively low cost and provide high throughput. If the underlying problem is formulated as a combinatorial optimization problem, it can be shown to be NP-complete (a special case of the K-population problem). But by effectively exploiting the structure of the underlying error processes and using a novel analog of the Baum-Welch algorithm for HMM models, we devise a probabilistic algorithm with a time complexity that is linear in the number of markers for an epsilon-approximate solution. The algorithms were tested by constructing the first genome-wide haplotype restriction map of the microbe T. pseudonana, as well as constructing a haplotype restriction map of a 120 Mb region of human chromosome 4. The frequency of false positives and false negatives was estimated using simulated data. The empirical results were found to be very promising.
2
Lim A, Dimalanta ET, Potamousis KD, Yen G, Apodoca J, Tao C, Lin J, Qi R, Skiadas J, Ramanathan A, Perna NT, Plunkett G, Burland V, Mau B, Hackett J, Blattner FR, Anantharaman TS, Mishra B, Schwartz DC. Shotgun optical maps of the whole Escherichia coli O157:H7 genome. Genome Res 2001; 11:1584-93. [PMID: 11544203 PMCID: PMC311123 DOI: 10.1101/gr.172101] [Received: 11/27/2000] [Accepted: 06/04/2001]
Abstract
We have constructed NheI and XhoI optical maps of Escherichia coli O157:H7 solely from genomic DNA molecules to provide a uniquely valuable scaffold for contig closure and sequence validation. E. coli O157:H7 is a common pathogen found in contaminated food and water. Our approach obviated the need for the analysis of clones, PCR products, and hybridizations, because maps were constructed from ensembles of single DNA molecules. Shotgun sequencing of bacterial genomes remains labor-intensive, despite advances in sequencing technology. This is partly due to manual intervention required during the last stages of finishing. The applicability of optical mapping to this problem was enhanced by advances in machine vision techniques that improved mapping throughput and created a path to full automation of mapping. Comparisons were made between maps and sequence data that characterized sequence gaps and guided nascent assemblies.
Affiliation(s)
- A Lim
- Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
3
Perna NT, Plunkett G, Burland V, Mau B, Glasner JD, Rose DJ, Mayhew GF, Evans PS, Gregor J, Kirkpatrick HA, Pósfai G, Hackett J, Klink S, Boutin A, Shao Y, Miller L, Grotbeck EJ, Davis NW, Lim A, Dimalanta ET, Potamousis KD, Apodaca J, Anantharaman TS, Lin J, Yen G, Schwartz DC, Welch RA, Blattner FR. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 2001; 409:529-33. [PMID: 11206551 DOI: 10.1038/35054089]
Abstract
The bacterium Escherichia coli O157:H7 is a worldwide threat to public health and has been implicated in many outbreaks of haemorrhagic colitis, some of which included fatalities caused by haemolytic uraemic syndrome. Close to 75,000 cases of O157:H7 infection are now estimated to occur annually in the United States. The severity of disease, the lack of effective treatment and the potential for large-scale outbreaks from contaminated food supplies have propelled intensive research on the pathogenesis and detection of E. coli O157:H7 (ref. 4). Here we have sequenced the genome of E. coli O157:H7 to identify candidate genes responsible for pathogenesis, to develop better methods of strain detection and to advance our understanding of the evolution of E. coli, through comparison with the genome of the non-pathogenic laboratory strain E. coli K-12 (ref. 5). We find that lateral gene transfer is far more extensive than previously anticipated. In fact, 1,387 new genes encoded in strain-specific clusters of diverse sizes were found in O157:H7. These include candidate virulence factors, alternative metabolic capacities, several prophages and other new functions--all of which could be targets for surveillance.
Affiliation(s)
- N T Perna
- Genome Center of Wisconsin, and Department of Animal Health and Biomedical Sciences, University of Wisconsin, Madison 53706, USA.
4
Lai Z, Jing J, Aston C, Clarke V, Apodaca J, Dimalanta ET, Carucci DJ, Gardner MJ, Mishra B, Anantharaman TS, Paxia S, Hoffman SL, Craig Venter J, Huff EJ, Schwartz DC. A shotgun optical map of the entire Plasmodium falciparum genome. Nat Genet 1999; 23:309-13. [PMID: 10610179 DOI: 10.1038/15484]
Abstract
The unicellular parasite Plasmodium falciparum is the cause of human malaria, resulting in 1.7-2.5 million deaths each year. To develop new means to treat or prevent malaria, the Malaria Genome Consortium was formed to sequence and annotate the entire 24.6-Mb genome. The plan, already underway, is to sequence libraries created from chromosomal DNA separated by pulsed-field gel electrophoresis (PFGE). The AT-rich genome of P. falciparum presents problems in terms of reliable library construction and the relative paucity of dense physical markers or extensive genetic resources. To deal with these problems, we reasoned that a high-resolution, ordered restriction map covering the entire genome could serve as a scaffold for the alignment and verification of sequence contigs developed by members of the consortium. Thus optical mapping was advanced to use simply extracted, unfractionated genomic DNA as its principal substrate. Ordered restriction maps (BamHI and NheI) derived from single molecules were assembled into 14 deep contigs corresponding to the molecular karyotype determined by PFGE (ref. 3).
Affiliation(s)
- Z Lai
- W.M. Keck Laboratory for Biomolecular Imaging, Department of Chemistry, New York University, New York, USA
5
Abstract
Optical mapping is an approach for the rapid, automated, non-electrophoretic construction of ordered restriction maps of DNA from ensembles of single molecules. Previously, we used optical mapping to make high-resolution maps of large insert clones such as bacterial artificial chromosomes (BAC) and large genomic DNA molecules. Here, we describe a combination of optical mapping and long-range polymerase chain reaction (PCR), in a process we term optical PCR, which enables automated construction of ordered restriction maps of long-range PCR products spanning human genomic loci. Specifically, we amplified three long PCR products, each averaging 14.6 kb in length, which span the 37-kb human tissue plasminogen activator (TPA) gene. PCR products were surface mounted in gridded arrays, and samples were mapped in parallel with either ScaI, XmnI, HpaI, ClaI, or BglII. A contig of overlapping high-resolution maps was generated, which agreed closely with maps predicted from sequence data. The data demonstrate an approach to construct physical maps of genomic loci where very little prior sequence information exists, since the only sequence needed is that required to anchor PCR primers. Large segments of genomic DNA (within the practical limits imposed by long-range PCR) can be mapped quickly and to high resolution without the use of cloning vectors.
Affiliation(s)
- J Skiadas
- W.M. Keck Laboratory for Biomolecular Imaging, New York University, Department of Chemistry, Room 866, 31 Washington Place, New York, New York 10003, USA
6
Lin J, Qi R, Aston C, Jing J, Anantharaman TS, Mishra B, White O, Daly MJ, Minton KW, Venter JC, Schwartz DC. Whole-genome shotgun optical mapping of Deinococcus radiodurans. Science 1999; 285:1558-62. [PMID: 10477518 DOI: 10.1126/science.285.5433.1558]
Abstract
A whole-genome restriction map of Deinococcus radiodurans, a radiation-resistant bacterium able to survive up to 15,000 grays of ionizing radiation, was constructed without using DNA libraries, the polymerase chain reaction, or electrophoresis. Very large, randomly sheared, genomic DNA fragments were used to construct maps from individual DNA molecules that were assembled into two circular overlapping maps (2.6 and 0.415 megabases), without gaps. A third smaller chromosome (176 kilobases) was identified and characterized. Aberrant nonlinear DNA structures that may define chromosome structure and organization, as well as intermediates in DNA repair, were directly visualized by optical mapping techniques after gamma irradiation.
Affiliation(s)
- J Lin
- W. M. Keck Laboratory for Biomolecular Imaging, Department of Chemistry, New York University, 31 Washington Place, New York, NY 10003, USA
7
Jing J, Lai Z, Aston C, Lin J, Carucci DJ, Gardner MJ, Mishra B, Anantharaman TS, Tettelin H, Cummings LM, Hoffman SL, Venter JC, Schwartz DC. Optical mapping of Plasmodium falciparum chromosome 2. Genome Res 1999; 9:175-81. [PMID: 10022982 PMCID: PMC310721] [Received: 10/05/1998] [Accepted: 12/15/1998]
Abstract
Detailed restriction maps of microbial genomes are a valuable resource in genome sequencing studies but are toilsome to construct by contig construction of maps derived from cloned DNA. Analysis of genomic DNA enables large stretches of the genome to be mapped and circumvents library construction and associated cloning artifacts. We used pulsed-field gel electrophoresis purified Plasmodium falciparum chromosome 2 DNA as the starting material for optical mapping, a system for making ordered restriction maps from ensembles of individual DNA molecules. DNA molecules were bound to derivatized glass surfaces, cleaved with NheI or BamHI, and imaged by digital fluorescence microscopy. Large pieces of the chromosome containing ordered DNA restriction fragments were mapped. Maps were assembled from 50 molecules producing an average contig depth of 15 molecules and high-resolution restriction maps covering the entire chromosome. Chromosome 2 was found to be 976 kb by optical mapping with NheI, and 946 kb with BamHI, which compares closely to the published size of 947 kb from large-scale sequencing. The maps were used to further verify assemblies from the plasmid library used for sequencing. Maps generated in silico from the sequence data were compared to the optical mapping data, and good correspondence was found. Such high-resolution restriction maps may become an indispensable resource for large-scale genome sequencing projects.
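The comparison of in silico maps with optical maps mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' software: the function names `in_silico_map` and `relative_size_error`, the toy sequence, and the treatment of the molecule as linear are all assumptions made for illustration. Only the enzyme names and the reported chromosome sizes come from the abstract; the recognition sites used are the standard ones for NheI (G^CTAGC) and BamHI (G^GATCC).

```python
import re

# Recognition sequence and cut offset (bases into the site where the cut falls).
# Both sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "NheI": ("GCTAGC", 1),   # G^CTAGC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def in_silico_map(sequence, enzyme):
    """Return ordered fragment lengths (bp) for a linear sequence (assumption)."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # The lookahead pattern reports every start position of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def relative_size_error(measured_kb, reference_kb):
    """Signed relative difference between a measured size and a reference size."""
    return (measured_kb - reference_kb) / reference_kb


# Hypothetical toy sequence with one NheI site and one BamHI site.
toy = "AAAA" + "GCTAGC" + "TTTTT" + "GGATCC" + "CC"
nhe_fragments = in_silico_map(toy, "NheI")
bam_fragments = in_silico_map(toy, "BamHI")

# Sizes quoted in the abstract: 976 kb (NheI) and 946 kb (BamHI) vs. 947 kb published.
nhe_error = relative_size_error(976, 947)
bam_error = relative_size_error(946, 947)
```

With the figures quoted in the abstract, the NheI estimate is about 3% above the published size and the BamHI estimate is about 0.1% below it. A real comparison would also have to model sizing error, missing cuts, and false cuts, which this sketch ignores.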
Affiliation(s)
- J Jing
- W.M. Keck Laboratory for Biomolecular Imaging, New York University, Department of Chemistry, New York, New York 10003 USA
8
Jing J, Reed J, Huang J, Hu X, Clarke V, Edington J, Housman D, Anantharaman TS, Huff EJ, Mishra B, Porter B, Shenker A, Wolfson E, Hiort C, Kantor R, Aston C, Schwartz DC. Automated high resolution optical mapping using arrayed, fluid-fixed DNA molecules. Proc Natl Acad Sci U S A 1998; 95:8046-51. [PMID: 9653137 PMCID: PMC20926 DOI: 10.1073/pnas.95.14.8046] [Received: 02/15/1998] [Accepted: 04/23/1998]
Abstract
New mapping approaches construct ordered restriction maps from fluorescence microscope images of individual, endonuclease-digested DNA molecules. In optical mapping, molecules are elongated and fixed onto derivatized glass surfaces, preserving biochemical accessibility and fragment order after enzymatic digestion. Measurements of relative fluorescence intensity and apparent length determine the sizes of restriction fragments, enabling ordered map construction without electrophoretic analysis. The optical mapping system reported here is based on our physical characterization of an effect using fluid flows developed within tiny, evaporating droplets to elongate and fix DNA molecules onto derivatized surfaces. Such evaporation-driven molecular fixation produces well elongated molecules accessible to restriction endonucleases, and notably, DNA polymerase I. We then developed the robotic means to grid DNA spots in well defined arrays that are digested and analyzed in parallel. To effectively harness this effect for high-throughput genome mapping, we developed: (i) machine vision and automatic image acquisition techniques to work with fixed, digested molecules within gridded samples, and (ii) Bayesian inference approaches that are used to analyze machine vision data, automatically producing high-resolution restriction maps from images of individual DNA molecules. The aggregate significance of this work is the development of an integrated system for mapping small insert clones allowing biochemical data obtained from engineered ensembles of individual molecules to be automatically accumulated and analyzed for map construction. These approaches are sufficiently general for varied biochemical analyses of individual molecules using statistically meaningful population sizes.
Affiliation(s)
- J Jing
- W. M. Keck Laboratory for Biomolecular Imaging, Department of Chemistry, New York University, 31 Washington Place, New York, NY 10003, USA
9
Abstract
In this paper, we describe our algorithmic approach to constructing ordered restriction maps based on the data created from the images of a population of individual DNA molecules (clones) digested by restriction enzymes. The goal is to devise map-making algorithms capable of producing high-resolution, high-accuracy maps rapidly and in a scalable manner. The resulting software is a key component of our optical mapping automation tools and has been used routinely to map cosmid, lambda and BAC clones. The experimental results appear highly promising.
Affiliation(s)
- T S Anantharaman
- Computer Science and Chemistry Department, New York University, New York 10012, USA