1
Lu WF, Hsu WL. A test for the consecutive ones property on noisy data--application to physical mapping and sequence assembly. J Comput Biol 2004; 10:709-35. [PMID: 14633395 DOI: 10.1089/106652703322539051] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
A (0,1)-matrix satisfies the consecutive ones property (COP) for the rows if there exists a column permutation such that the ones in each row of the resultant matrix are consecutive. The consecutive ones test is useful for physical mapping and DNA sequence assembly, for example, in the STS content mapping of a YAC library, and in the Bactig assembly based on STS as well as EST markers. The linear time algorithm by Booth and Lueker (1976) for this problem has a serious drawback: the data must be error free. However, laboratory work is never flawless. We devised a new iterative clustering algorithm for this problem, which has the following advantages: 1. If the original matrix satisfies the COP, then the algorithm will produce a column ordering realizing it without any fill-in. 2. Under moderate assumptions, the algorithm can accommodate the following four types of errors: false negatives, false positives, nonunique probes (NPs), and chimeric clones. Note that in some cases (low quality EST marker identification), NPs occur because of repeat sequences. 3. If some local data are too noisy, our algorithm is likely to detect this and suggest additional lab work to reduce the degree of ambiguity in that part. 4. A unique feature of our algorithm is that, rather than forcing all probes to be included and ordered in the final arrangement, it deletes some noisy probes. Thus, it can produce more than one contig. The gaps are created mostly by noisy probes.
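The definition of the consecutive ones property in the abstract above can be illustrated with a small brute-force check. This is only a sketch for tiny matrices: it tries every column permutation, so it is exponential in the number of columns, and it is neither the linear-time Booth and Lueker algorithm nor the authors' clustering method. The function names and example matrices are hypothetical.

```python
from itertools import permutations

def ones_are_consecutive(row):
    """Return True if all 1s in the row form a single contiguous block."""
    idx = [i for i, v in enumerate(row) if v == 1]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)

def find_cop_ordering(matrix):
    """Brute-force search for a column order realizing the COP.

    Returns a tuple of column indices, or None if no ordering exists.
    Illustrative only: tries all permutations of the columns.
    """
    n_cols = len(matrix[0]) if matrix else 0
    for perm in permutations(range(n_cols)):
        if all(ones_are_consecutive([row[c] for c in perm]) for row in matrix):
            return perm
    return None

# Hypothetical clone x probe incidence matrix that satisfies the COP.
clean = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

# Column 0 would need three distinct neighbours, so no ordering exists.
conflicting = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

print(find_cop_ordering(clean))
print(find_cop_ordering(conflicting))
```

A matrix like `conflicting` is the kind of input on which an exact COP test simply fails, which is the motivation the abstract gives for an error-tolerant method.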
Affiliation(s)
- Wei-Fu Lu
- Institute of Computer and Information Science, National Chiao Tung University, Hsin-chu, Taiwan, ROC
2
Enkerli J, Reed H, Briley A, Bhatt G, Covert SF. Physical map of a conditionally dispensable chromosome in Nectria haematococca mating population VI and location of chromosome breakpoints. Genetics 2000; 155:1083-94. [PMID: 10880471 PMCID: PMC1461165 DOI: 10.1093/genetics/155.3.1083] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Certain isolates of the plant pathogenic fungus Nectria haematococca mating population (MP) VI contain a 1.6-Mb conditionally dispensable (CD) chromosome carrying the phytoalexin detoxification genes MAK1 and PDA6-1. This chromosome is structurally unstable during sexual reproduction. As a first step in our analysis of the mechanisms underlying this chromosomal instability, hybridization between overlapping cosmid clones was used to construct a map of the MAK1 PDA6-1 chromosome. The map consists of 33 probes that are linked by 199 cosmid clones. The polymerase chain reaction and Southern analysis of N. haematococca MP VI DNA digested with infrequently cutting restriction enzymes were used to close gaps and order the hybridization-derived contigs. Hybridization to a probe extended from telomeric repeats was used to anchor the ends of the map to the actual chromosome ends. The resulting map is estimated to cover 95% of the MAK1 PDA6-1 chromosome and is composed of two ordered contigs. Thirty-eight percent of the clones in the minimal map are known to contain repeated DNA sequences. Three dispersed repeats were cloned during map construction; each is present in five to seven copies on the chromosome. The cosmid clones representing the map were probed with deleted forms of the CD chromosome and the results were integrated into the map. This allowed the identification of chromosome breakpoints and deletions.
Affiliation(s)
- J Enkerli
- Department of Botany, University of Georgia, Athens, Georgia 30602, USA
3
Abstract
The parking strategy is an iterative approach to DNA sequencing. Each iteration consists of sequencing a novel portion of target DNA that does not overlap any previously sequenced region. Subject to the constraint of no overlap, each new region is chosen randomly. A parking strategy is often ideal in the early stages of a project for rapidly generating unique data. As a project progresses, parking becomes progressively more expensive and eventually prohibitive. We present a mathematical model with a generalization to allow for overlaps. This model predicts multiple parameters, including progress, costs, and the distribution of gap sizes left by a parking strategy. The highly fragmented nature of the gaps left after an initial parking strategy may make it difficult to finish a project efficiently. Therefore, in addition to our parking model, we model gap closing by walking. Our gap-closing model is generalizable to many other strategies. Our discussion includes modified parking strategies and hybrids with other strategies. A hybrid parking strategy has been employed for portions of the Human Genome Project.
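The no-overlap random placement described in the abstract above can be sketched as a short simulation. The abstract does not give the authors' model, so this is a generic interval-parking sketch under stated assumptions: a continuous target of length `target_len`, equal-length clones of length `clone_len`, and placement continued until no clone fits. It samples the saturated end state by splitting each free interval at a uniformly placed clone. All names and parameter values are hypothetical.

```python
import random

def park_until_full(target_len, clone_len, rng):
    """Place non-overlapping clones at random until none fits.

    Returns the sorted list of clone start positions.
    """
    starts = []
    stack = [(0.0, float(target_len))]  # free intervals still to be filled
    while stack:
        lo, hi = stack.pop()
        if hi - lo < clone_len:
            continue  # gap too small to hold another clone
        s = rng.uniform(lo, hi - clone_len)
        starts.append(s)
        stack.append((lo, s))              # free space to the left
        stack.append((s + clone_len, hi))  # free space to the right
    return sorted(starts)

def gap_sizes(starts, target_len, clone_len):
    """Return the lengths of the uncovered gaps, left to right."""
    edges = [0.0]
    for s in starts:
        edges.extend((s, s + clone_len))
    edges.append(float(target_len))
    return [edges[i + 1] - edges[i] for i in range(0, len(edges), 2)]

rng = random.Random(0)  # fixed seed so the run is repeatable
starts = park_until_full(1000, 10, rng)
gaps = gap_sizes(starts, 1000, 10)
coverage = len(starts) * 10 / 1000
print(f"clones placed: {len(starts)}, coverage: {coverage:.2f}, largest gap: {max(gaps):.2f}")
```

Every remaining gap is shorter than one clone, which illustrates the fragmented gap structure that the abstract says makes finishing by parking alone inefficient.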
Affiliation(s)
- J C Roach
- The Institute for Systems Biology, Seattle, Washington 98105 USA.
4
Burmester T, Mink M, Pál M, Lászlóffy Z, Lepesant J, Maróy P. Genetic and molecular analysis in the 70CD region of the third chromosome of Drosophila melanogaster. Gene 2000; 246:157-67. [PMID: 10767537 DOI: 10.1016/s0378-1119(00)00066-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
A collection of lethal and semi-lethal P-element insertions in the 70CD region of chromosome 3 of Drosophila melanogaster was used to investigate genes and gene arrangements by a combination of genetic, cytological, functional and molecular methods. The 12 lethal insertions studied fall into seven complementation groups of six genes. Lethal phases, expression patterns and other phenotypic aspects of these genes were determined. The genes and additional available sequences were placed on cloned genomic DNA fragments and arranged in an EcoRI map of 150kb that covers approximately the bands 70C7-8 to 70D1. Determination of deficiency breakpoints links the genetic, physical and molecular data. The sequences adjacent to seven independent P-element insertions were established after plasmid rescue or polymerase chain reaction. Similarity searches allowed the assignment of the P-element insertions to known mutations, expressed sequence tags, sequence tagged sites, or homologous genes of other species. Among these were identified a putative transacylase, a putative cell cycle gene, and the gene responsible for the dominant Polycomb-suppressor phenotype of devenir. The genomic sequence of the l(3)70Ca/b gene reveals a novel heat shock protein (hsc70Cb). l(3)70Da was identified as a member of the CDC48/PEX1 ATPase family and its coding sequence was determined.
Affiliation(s)
- T Burmester
- Department of Genetics, Attila Jozsef University, H-6726, Szeged, Hungary
5
Mozo T, Dewar K, Dunn P, Ecker JR, Fischer S, Kloska S, Lehrach H, Marra M, Martienssen R, Meier-Ewert S, Altmann T. A complete BAC-based physical map of the Arabidopsis thaliana genome. Nat Genet 1999; 22:271-5. [PMID: 10391215 DOI: 10.1038/10334] [Citation(s) in RCA: 120] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
Arabidopsis thaliana is a small flowering plant that serves as the major model system in plant molecular genetics. The efforts of many scientists have produced genetic maps that provide extensive coverage of the genome (http://genome-www.stanford.edu/Arabidopsis/maps.html). Recently, detailed YAC, BAC, P1 and cosmid-based physical maps (that is, representations of genomic regions as sets of overlapping clones of corresponding libraries) have been established that extend over wide genomic areas ranging from several hundreds of kilobases to entire chromosomes. These maps provide an entry to gain deeper insight into the A. thaliana genome structure. A. thaliana has been chosen as the subject of the first large-scale project intended to determine the full genome sequence of a plant. This sequencing project, together with the increasing interest in map-based gene cloning, has highlighted the requirement for a complete and accurate physical map of this plant species. To supply the scientific community with a high-quality resource, we present here a complete physical map of A. thaliana using essentially the IGF BAC library. The map consists of 27 contigs that cover the entire genome, except for the presumptive centromeric regions, nucleolar organization regions (NOR) and telomeric areas. This is the first reported map of a complex organism based entirely on BAC clones and it represents the most homogeneous and complete physical map established to date for any plant genome. Furthermore, the analysis performed here serves as a model for an efficient physical mapping procedure using BAC clones that can be applied to other complex genomes.
Affiliation(s)
- T Mozo
- Max-Planck-Institut für molekulare Pflanzenphysiologie, Golm, Germany
6
Mozo T, Fischer S, Meier-Ewert S, Lehrach H, Altmann T. Use of the IGF BAC library for physical mapping of the Arabidopsis thaliana genome. Plant J 1998; 16:377-84. [PMID: 9881158 DOI: 10.1046/j.1365-313x.1998.00299.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 05/02/2023]
Abstract
In order to generate a physical map of the Arabidopsis thaliana genome based on bacterial artificial chromosome clones (BACs), an iterative high throughput hybridisation strategy was applied and its efficiency was evaluated. Thus, probes generated from both ends of 500 BAC clones selected from the Arabidopsis-IGF-BAC library were hybridised to the entire library gridded on high density filters. The 1000 hybridisation reactions identified 4496 clones (41.8% of the complete library, or 50.3% if organellar, centromeric, and ribosomal DNA carrying clones are excluded) which were assembled into a minimum of 220 contigs. These results demonstrate the viability of the applied 'double-end clone-limited/sampling without replacement' hybridisation strategy for the generation of a high resolution physical map, and provide a highly useful resource for map-based gene cloning approaches and further genome analysis.
Affiliation(s)
- T Mozo
- Institut für Genbiologische Forschung Berlin GmbH, Germany
7
Veklerov E, Eeckman FH, Martin CH. MTT: a software tool for quality control in sequence assembly. Microb Comp Genomics 1998; 1:179-84. [PMID: 9689212 DOI: 10.1089/mcg.1996.1.179] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/12/2022]
Abstract
A large-scale sequencing project requires a tool to control the quality of the input data because a sizable number of trace data may be of low quality. If these data are allowed to enter the sequence assembly pipeline, harm will be done. Hence, it is important to detect such data as soon as possible. MTT (Move-Track-Trim) is a software package analyzing the quality of the lanes. It subjects each lane to a series of tests, and if a lane does not pass all tests, it is flagged as a "bad" lane. The user has a chance to examine both the "good" and the "bad" lanes and reclassify a "bad" lane as "good," or vice versa. Alternatively, the user may decide to retrack the gel or get rid of some lanes altogether. As a by-product of the analysis, MTT performs other useful functions. It trims the lanes, compresses the lane files, and moves them to the directories where assembly is carried out. It also generates some useful statistics describing the quality of the gel.
Affiliation(s)
- E Veklerov
- Lawrence Berkeley National Laboratory, University of California, Berkeley, USA
8
Prade RA, Griffith J, Kochut K, Arnold J, Timberlake WE. In vitro reconstruction of the Aspergillus (= Emericella) nidulans genome. Proc Natl Acad Sci U S A 1997; 94:14564-9. [PMID: 9405653 PMCID: PMC25056 DOI: 10.1073/pnas.94.26.14564] [Citation(s) in RCA: 49] [Impact Index Per Article: 1.8] [Indexed: 02/05/2023]
Abstract
A physical map of the 31-megabase Aspergillus nidulans genome is reported, in which 94% of 5,134 cosmids are assigned to 49 contiguous segments. The physical map is the result of a two-way ordering process, in which clones and probes were ordered simultaneously on a binary DNA/DNA hybridization matrix. Compression by elimination of redundant clones resulted in a minimal map, which is a chromosome walk. Repetitive DNA is nonrandomly dispersed in the A. nidulans genome, reminiscent of heterochromatic banding patterns of higher eukaryotes. We hypothesize gene clusters may arise by horizontal transfer and spread by transposition to explain the nonrandom pattern of repeats along chromosomes.
Affiliation(s)
- R A Prade
- Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, OK 74078-0289, USA
9
Hunter K. Application of interspersed repetitive sequence polymerase chain reaction for construction of yeast artificial chromosome contigs. Methods 1997; 13:327-35. [PMID: 9480779 DOI: 10.1006/meth.1997.0541] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023]
Abstract
Construction of physical maps across candidate regions is one of the rate-limiting steps of positional cloning projects. To date, most physical maps have been constructed by polymerase chain reaction (PCR)-based sequence-tagged site (STS) content mapping. While effective, this technique has a number of disadvantages including problems with yeast artificial chromosome (YAC) chimerism, the time and effort required to generate new STSs from YAC ends, the cost of primer synthesis for large contiging projects, and the time, effort, and expense necessary for screening each STS in the two-tiered hierarchical YAC library screening format. An alternative strategy, interspersed repetitive sequence (IRS) PCR genomics, alleviates many of these constraints. Clonal overlap is detected by hybridization of individual IRS-PCR products to IRS-PCR product pools of the three-dimensional coordinate pools of YAC libraries in dot-blot format. Entire libraries can be screened in a single step, and multiple libraries can be screened simultaneously. Cloning YAC fragments, sequencing, and primer generation are eliminated, increasing the efficiency of contig construction and reducing the expense. In addition, the genomic location of the individual IRS-PCR products can also be simultaneously determined by screening either interspecific backcrosses or radiation hybrid panels, in dot-blot format, confirming contig extension in the region of interest.
Affiliation(s)
- K Hunter
- Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA
10
Bouffard GG, Idol JR, Braden VV, Iyer LM, Cunningham AF, Weintraub LA, Touchman JW, Mohr-Tidwell RM, Peluso DC, Fulton RS, Ueltzen MS, Weissenbach J, Magness CL, Green ED. A physical map of human chromosome 7: an integrated YAC contig map with average STS spacing of 79 kb. Genome Res 1997; 7:673-92. [PMID: 9253597 DOI: 10.1101/gr.7.7.673] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.5] [Indexed: 02/05/2023]
Abstract
The construction of highly integrated and annotated physical maps of human chromosomes represents a critical goal of the ongoing Human Genome Project. Our laboratory has focused on developing a physical map of human chromosome 7, an approximately 170-Mb segment of DNA that corresponds to an estimated 5% of the human genome. Using a yeast artificial chromosome (YAC)-based sequence-tagged site (STS)-content mapping strategy, 2150 chromosome 7-specific STSs have been established and mapped to a collection of YACs highly enriched for chromosome 7 DNA. The STSs correspond to sequences generated from a variety of DNA sources, with particular emphasis placed on YAC insert ends, genetic markers, and genes. The YACs include a set of relatively nonchimeric clones from a human-hamster hybrid cell line as well as clones isolated from total genomic libraries. For map integration, we have localized 260 STSs corresponding to Genethon genetic markers and 259 STSs corresponding to markers ordered by radiation hybrid (RH) mapping on our YAC contigs. Analysis of the data with the program SEGMAP results in the assembly of 22 contigs that are "anchored" on the Genethon genetic map, the RH map, and/or the cytogenetic map. These 22 contigs are ordered relative to one another, are (in all but 3 cases) oriented relative to the centromere and telomeres, and contain > 98% of the mapped STSs. The largest anchored YAC contig, accounting for most of 7p, contains 634 STSs and 1260 YACs. An additional 14 contigs, accounting for approximately 1.5% of the mapped STSs, are assembled but remain unanchored on either the genetic or RH map. Therefore, these 14 "orphan" contigs are not ordered relative to other contigs. In our contig maps, adjacent STSs are connected by two or more YACs in > 95% of cases. With 2150 mapped STSs, our map provides an average STS spacing of approximately 79 kb. The physical map we report here exceeds the goal of 100-kb average STS spacing and should provide an excellent framework for systematic sequencing of the chromosome.
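The quoted average spacing follows directly from the two figures in the abstract above (an approximately 170-Mb chromosome and 2150 mapped STSs). A minimal arithmetic check, assuming evenly averaged spacing and 1 Mb = 1000 kb; the function name is hypothetical:

```python
def average_spacing_kb(segment_mb, n_markers):
    """Average marker spacing in kb for a segment given in Mb."""
    return segment_mb * 1000 / n_markers

spacing = average_spacing_kb(170, 2150)
print(f"{spacing:.1f} kb")  # about 79.1 kb, under the 100-kb goal
```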
Affiliation(s)
- G G Bouffard
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
11
Abstract
The aim of this paper is to provide general results for predicting progress in a physical mapping project by anchoring random clones, when clones and anchors are not homogeneously distributed along the genome. A complete physical map of the DNA of an organism consists of overlapping clones spanning the entire genome. Several schemes can be used to construct such a map, depending on the way that clones overlap. We focus here on the approach consisting of assembling clones sharing a common random short sequence called an anchor. Some mathematical analyses providing statistical properties of anchored clones have been developed in the stationary case. Modeling the clone and anchor processes as nonhomogeneous Poisson processes provides such an analysis in a general nonstationary framework. We apply our results to two natural nonhomogeneous models to illustrate the effect of inhomogeneity. This study reveals that using homogeneous processes for clones and anchors provides an overly optimistic assessment of the progress of the mapping project.
Affiliation(s)
- S Schbath
- I.N.R.A., Unité de Biométrie, Jouy-en-Josas, France.
12
Nagaraja R, MacMillan S, Kere J, Jones C, Griffin S, Schmatz M, Terrell J, Shomaker M, Jermak C, Hott C, Masisi M, Mumm S, Srivastava A, Pilia G, Featherstone T, Mazzarella R, Kesterson S, McCauley B, Railey B, Burough F, Nowotny V, D'Urso M, States D, Brownstein B, Schlessinger D. X chromosome map at 75-kb STS resolution, revealing extremes of recombination and GC content. Genome Res 1997; 7:210-22. [PMID: 9074925 DOI: 10.1101/gr.7.3.210] [Citation(s) in RCA: 95] [Impact Index Per Article: 3.5] [Indexed: 02/04/2023]
Abstract
A YAC/STS map of the X chromosome has reached an inter-STS resolution of 75 kb. The map density is sufficient to provide YACs or other large-insert clones that are cross-validated as sequencing substrates across the chromosome. Marker density also permits estimates of regional gene content and a detailed comparison of genetic and physical map distances. Five regions are detected with relatively high G + C, correlated with gene richness; and a 17-Mb region with very low recombination is revealed between the Xq13.3 [XIST] and Xq21.3 XY homology loci.
Affiliation(s)
- R Nagaraja
- Center for Genetics in Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA
13
Bouffard GG, Iyer LM, Idol JR, Braden VV, Cunningham AF, Weintraub LA, Mohr-Tidwell RM, Peluso DC, Fulton RS, Leckie MP, Green ED. A collection of 1814 human chromosome 7-specific STSs. Genome Res 1997; 7:59-64. [PMID: 9037602 DOI: 10.1101/gr.7.1.59] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 02/03/2023]
Abstract
An established goal of the ongoing Human Genome Project is the development and mapping of sequence-tagged sites (STSs) every 100 kb, on average, across all human chromosomes. En route to constructing such a physical map of human chromosome 7, we have generated 1814 chromosome 7-specific STSs. The corresponding PCR assays were designed by the use of DNA sequence determined in our laboratory (79%) or generated elsewhere (21%) and were demonstrated to be suitable for screening yeast artificial chromosome (YAC) libraries. This collection provides the requisite landmarks for constructing a physical map of chromosome 7 at < 100-kb average spacing of STSs.
14
Abstract
The past few years have seen significant advances in our understanding of eukaryotic genomes. In the field of parasitology, this is best exemplified by the application of genome mapping techniques to the study of genome structure and function in the protozoan parasite, Leishmania. Although much is known about the organism and the diseases it causes, molecular genetics has only recently begun to play a major part in elucidating some of the unusual characteristics of this interesting parasite. Mapping of the small (35 Mb) genome and determination of the functional role of genes by the application of in vitro homologous gene targeting techniques are revealing novel avenues for the development of prophylactic measures.
Affiliation(s)
- A C Ivens
- Department of Biochemistry, Imperial College of Science, Technology and Medicine, London, UK.
15
Kimmerly W, Stultz K, Lewis S, Lewis K, Lustre V, Romero R, Benke J, Sun D, Shirley G, Martin C, Palazzolo M. A P1-based physical map of the Drosophila euchromatic genome. Genome Res 1996; 6:414-30. [PMID: 8743991 DOI: 10.1101/gr.6.5.414] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.0] [Indexed: 02/01/2023]
Abstract
A PCR-based sequence-tagged site (STS) content mapping strategy has been used to generate a physical map with 90% coverage of the 120-Mb euchromatic portion of the Drosophila genome. To facilitate map completion, the bulk of the STS markers was chosen in a nonrandom fashion. To ensure that all contigs were localized in relation to each other and the genome, these contig-building procedures were performed in conjunction with a large-scale in situ hybridization analysis of randomly selected clones from a Drosophila genomic library that had been generated in a P1 cloning vector. To date, the map consists of 649 contigs with an STS localized on average every 50 kb. This is the first whole genome that has been mapped based on a library constructed with large inserts in a vector that is maintained in Escherichia coli as a single-copy plasmid.
Affiliation(s)
- W Kimmerly
- Drosophila Genome Center, University of California, Berkeley 94720, USA
16
Affiliation(s)
- G M Rubin
- Department of Molecular and Cell Biology, University of California at Berkeley 94720-3200, USA.
17
Abstract
Arabidopsis thaliana is a small flowering plant that is a member of the family Cruciferae. It has many characteristics--diploid genetics, rapid growth cycle, relatively low repetitive DNA content, and small genome size--that recommend it as the model for a plant genome project. The current status of the genetic and physical maps, as well as efforts to sequence the genome, is presented. Examples are given of genes isolated by using map-based cloning. The importance of the Arabidopsis project for plant biology in general is discussed.
Affiliation(s)
- H M Goodman
- Department of Genetics, Harvard Medical School, Massachusetts General Hospital, Boston, MA 02114, USA
18
Abstract
We present an efficient algorithm for scoring clones given an ordering of probes under a schema proposed by Alizadeh et al. (1994) in the context of physical mapping with unique probes. The algorithm runs in time linear in the number of blocks of ones in the underlying sparse incidence matrix. A sparse and efficient algorithm for this task is important as it appears to be a central task in most algorithms for physical mapping.
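The "blocks of ones" that the running time in the abstract above is measured in can be made concrete with a short sketch. It only counts maximal runs of 1s in one clone's row after the probe columns have been put in a given order. It does not reproduce the scoring scheme of Alizadeh et al., which the abstract does not spell out. The function name and example rows are hypothetical.

```python
def count_blocks_of_ones(row):
    """Count maximal runs of consecutive 1s in a 0/1 row."""
    blocks = 0
    prev = 0
    for v in row:
        if v == 1 and prev == 0:
            blocks += 1  # a new run of 1s starts here
        prev = v
    return blocks

# Hypothetical clone rows under some fixed probe ordering.
print(count_blocks_of_ones([0, 1, 1, 1, 0, 0]))  # 1
print(count_blocks_of_ones([1, 0, 1, 1, 0, 1]))  # 3
```

With unique probes and error-free data, a clone under the correct probe ordering forms a single block, so extra blocks point to errors in the data or in the ordering.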
Affiliation(s)
- M Jain
- Department of Computer Science, University of Arizona, Tucson 85721, USA
19
Alizadeh F, Karp RM, Weisser DK, Zweig G. Physical mapping of chromosomes using unique probes. J Comput Biol 1995; 2:159-84. [PMID: 7497125 DOI: 10.1089/cmb.1995.2.159] [Citation(s) in RCA: 66] [Impact Index Per Article: 2.3] [Indexed: 01/25/2023]
Abstract
The goal of physical mapping of the genome is to reconstruct a strand of DNA given a collection of overlapping fragments, or clones, from the strand. We present several algorithms to infer how the clones overlap, given data about each clone. We focus on data used to map human chromosomes 21 and Y, in which relatively short substrings, or probes, are extracted from the ends of clones. The substrings are long enough to be unique with high probability. The data we are given is an incidence matrix of clones and probes. In the absence of error, the correct placement can be found easily using a PQ-tree. The data are never free from error, however, and algorithms are differentiated by their performance in the presence of errors. We approach errors from two angles: by detecting and removing them, and by using algorithms that are robust in the presence of errors. We have also developed a strategy to recover noiseless data through an interactive process that detects anomalies in the data and retests questionable entries in the incidence matrix of clones and probes. We evaluate the effectiveness of our algorithms empirically, using simulated data as well as real data from human chromosome 21.
Affiliation(s)
- F Alizadeh
- International Computer Science Institute, Berkeley, CA 94705, USA
20
Balding DJ. Design and analysis of chromosome physical mapping experiments. Philos Trans R Soc Lond B Biol Sci 1994; 344:329-35. [PMID: 7800702 DOI: 10.1098/rstb.1994.0071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/27/2023]
Abstract
Mathematical and statistical aspects of constructing ordered-clone physical maps of chromosomes are reviewed. Three broad problems are addressed: analysis of fingerprint data to identify configurations of overlapping clones, prediction of the rate of progress of a mapping strategy and optimal design of pooling schemes for screening large clone libraries.
Affiliation(s)
- D J Balding
- School of Mathematical Sciences, Queen Mary & Westfield College, University of London, U.K
21
Abstract
Intermediate between DNA sequences and broad patterns of karyotypic change there is a major gap in understanding genome structure and evolution. The gap is at the megabase level between genes and chromosomes. New methods for analyzing large DNA fragments cloned in yeast or bacterial vectors provide experimental access to genome evolution at the megabase level by enabling the assembly of megabase-size contiguous regions. Genome evolution at the megabase level can also be studied using high-resolution genetic maps. Rates and patterns of genome evolution in mammals (mouse versus humans) and Drosophila (D. virilis versus D. melanogaster) are compared and contrasted. Opportunities for research in genome evolution using the new technologies are enumerated and discussed.
Affiliation(s)
- D L Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
22
Yoshida K, Strathmann MP, Mayeda CA, Martin CH, Palazzolo MJ. A simple and efficient method for constructing high resolution physical maps. Nucleic Acids Res 1993; 21:3553-62. [PMID: 8393991 PMCID: PMC331458 DOI: 10.1093/nar/21.15.3553] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 01/30/2023]
Abstract
This paper describes a simple and efficient walking method for constructing high resolution physical maps and discusses its applications to genome analysis. The method is an integration of three strategies: (1) use of a highly redundant library of 3Kb-long subclones; (2) construction of a multidimensional pool from the library; (3) direct application of a PCR (polymerase chain reaction)-based screening technique to the pooled library, with two PCR primers, one from the end of the subcloning vector and the other from the leading edge of the walk. This technique allows not only detection of each overlapping subclone but simultaneous determination of its orientation and the size of its overlap. The end of the subclone with the smallest overlap is sequenced and a primer is designed for the next step in the walk. Iteration of the screening procedure with minimum overlapping subclones results in completion of the high resolution map. Using this method, a 3Kb-resolution map was constructed from an 80Kb region of the bithorax complex of Drosophila melanogaster. The method is general enough to be applicable to DNA from other species, and simple enough to be automated.
Affiliation(s)
- K Yoshida
- Lawrence Berkeley Laboratory, University of California, Berkeley 94720
23
Zhang MQ, Marr TG. Genome mapping by nonrandom anchoring: a discrete theoretical analysis. Proc Natl Acad Sci U S A 1993; 90:600-4. [PMID: 8421694 PMCID: PMC45711 DOI: 10.1073/pnas.90.2.600] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 01/30/2023]
Abstract
As part of our effort to construct a physical map of the genome of the fission yeast Schizosaccharomyces pombe, we have made theoretical predictions for the progress expected, as measured by the expected length fraction of island coverage and by the expected properties of the anchored islands such as the number and the size of islands. Our experimental strategy is to construct a random clone library and screen the library for clones having unique sequence at both ends. This scheme is essentially the same as the clone-limited double sequence-tagged-site selection scheme which was used in a computer simulation by Palazzolo et al. [Palazzolo, M. J., Sawyer, S. A., Martin, C. H., Smoller, D. A. & Hartl, D. L. (1991) Proc. Natl. Acad. Sci. USA 88, 8034-8038]. Both simulation and ongoing experiments in our laboratory have shown that the nonrandom anchoring method is far superior to random anchoring. In this paper, we propose a theoretical model to explain the simulated data and the experimental data.
Affiliation(s)
- M Q Zhang
- Cold Spring Harbor Laboratory, NY 11724
|
24
|
Mandel JL, Monaco AP, Nelson DL, Schlessinger D, Willard H. Genome analysis and the human X chromosome. Science 1992; 258:103-9. [PMID: 1439756 DOI: 10.1126/science.1439756] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.8] [Indexed: 12/27/2022]
Abstract
A unified genetic, physical, and functional map of the human X chromosome is being built through a concerted, international effort. About 40 percent of the 160 million base pairs of the X chromosome DNA have been cloned in overlapping, ordered contigs derived from yeast artificial chromosomes. This rapid progress toward a physical map is accelerating the identification of inherited disease genes, 26 of which are already cloned and more than 50 others regionally localized by linkage analysis. This article summarizes the mapping strategies now used and the impact of genome research on the understanding of X chromosome inactivation and X-linked diseases.
Affiliation(s)
- J L Mandel
- Laboratoire de Genetique Moleculaire des Eucaryotes du CNRS, INSERM, Strasbourg, France
|
25
|
Evans GA, McElligott DL. Physical mapping of human chromosomes. GENETIC ENGINEERING 1992; 14:269-78. [PMID: 1368280 DOI: 10.1007/978-1-4615-3424-2_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Affiliation(s)
- G A Evans
- Molecular Genetics Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037
|
26
|
Marr TG, Yan X, Yu Q. Genomic mapping by single copy landmark detection: a predictive model with a discrete mathematical approach. Mamm Genome 1992; 3:644-9. [PMID: 1450514 DOI: 10.1007/bf00352482] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]
Abstract
One of the goals of the Human Genome Project is to produce libraries of largely contiguous, ordered sets of molecular clones for use in sequencing and gene mapping projects. This is planned to be done for human and many model organisms. Theory and practice have shown that long-range contiguity and the degree to which the entire genome is covered by ordered clones can be affected by many biological variables. Many laboratories are currently experimenting with different experimental strategies and theoretical models to help plan strategies for accomplishing long-range molecular mapping of genomes. Here we describe a new mathematical model and formulas for helping to plan genome mapping projects, using various single-copy landmark (SCL) detection, or "anchoring", strategies. We derive formulas that allow us to examine the effects of interactions among the following variables: average insert size of the cloning vector, average size of SCL, the number of SCL, and the redundancy in coverage of the clone library. We also examine and compare three different ways in which anchoring can be implemented: (1) anchors are selected independently of the library to be ordered (random anchoring); (2) anchors are made from end probes from both ends of clones in the library to be ordered (nonrandom anchoring); and (3) anchors are made from one end or the other, randomly, from clones in the library to be ordered (nonrandom anchoring). Our results show that, for biologically realistic conditions, nonrandom anchoring is always more effective than random anchoring for contig building, and there is little to be gained from making SCL from both ends of clones vs. only one end of clones.(ABSTRACT TRUNCATED AT 250 WORDS)
Affiliation(s)
- T G Marr
- Cold Spring Harbor Laboratory, New York 11724
|
27
|
Abstract
An ultimate goal of Drosophila genetics is to identify and define the functions of all the genes in the organism. Traditional approaches based on the isolation of mutant genes have been extraordinarily fruitful. Recent advances in the manipulation and analysis of large DNA fragments have made it possible to develop detailed molecular maps of the Drosophila genome as the initial steps in determining the complete DNA sequence.
Affiliation(s)
- J Merriam
- Department of Biology, University of California, Los Angeles 90024
|