1
Abid HZ, Young E, McCaffrey J, Raseley K, Varapula D, Wang HY, Piazza D, Mell J, Xiao M. Customized optical mapping by CRISPR-Cas9 mediated DNA labeling with multiple sgRNAs. Nucleic Acids Res 2021; 49:e8. [PMID: 33231685 PMCID: PMC7826249 DOI: 10.1093/nar/gkaa1088] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/02/2020] [Revised: 10/16/2020] [Accepted: 10/27/2020] [Indexed: 01/01/2023]
Abstract
Whole-genome mapping technologies have been developed as a complementary tool to provide scaffolds for genome assembly and structural variation analysis (1,2). We recently introduced a novel DNA labeling strategy based on a CRISPR-Cas9 genome editing system, which can target any 20-bp sequence. The labeling strategy is particularly useful for targeting repetitive sequences and sequences not accessible to other labeling methods. In this report, we present customized mapping strategies that extend the applications of CRISPR-Cas9 DNA labeling. We first design a CRISPR-Cas9 labeling strategy to interrogate and differentiate single-allele differences in NGG protospacer adjacent motifs (PAM sequences). Combined with sequence motif labeling, we can pinpoint single-base differences in highly conserved sequences. In the second strategy, we design mapping patterns across a genome by selecting sets of specific single-guide RNAs (sgRNAs) for labeling multiple loci of a genomic region or a whole genome. By developing and optimizing a single-tube synthesis of multiple sgRNAs, we demonstrate the utility of CRISPR-Cas9 mapping with 162 sgRNAs targeting the 2-Mb Haemophilus influenzae chromosome. These CRISPR-Cas9 mapping approaches could be particularly useful for defining long-distance haplotypes and pinpointing the breakpoints of large structural variants in complex genomes and microbial mixtures.
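The PAM-based allele discrimination described above rests on a simple rule: SpCas9 binds a 20-nt protospacer only when it is immediately followed by an NGG PAM on the same strand, so a single-base change in either G of the PAM removes the site. The sketch below is a hypothetical helper, not the authors' design tool; it assumes uppercase ACGT input and exact protospacer matches only (no mismatch or off-target tolerance).

```python
# Hypothetical helper (not from Abid et al.): exact-match SpCas9 site search.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_sites(seq: str, protospacer: str) -> list[tuple[int, str]]:
    """Return (start, strand) for every exact 20-nt match followed by an NGG PAM.

    `start` is the 0-based forward-strand coordinate of the protospacer's
    leftmost base, whichever strand the site lies on.
    """
    if len(protospacer) != 20:
        raise ValueError("SpCas9 protospacers are 20 nt long")
    seq, protospacer = seq.upper(), protospacer.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Protospacer at i..i+19, PAM 'N' at i+20 and 'GG' at i+21..i+22.
        for i in range(n - 22):
            if s[i:i + 20] == protospacer and s[i + 21:i + 23] == "GG":
                # Map reverse-strand hits back to forward-strand coordinates.
                start = i if strand == "+" else n - i - 20
                hits.append((start, strand))
    return sorted(hits)


P = "GATTACAGATTACAGATTAC"  # arbitrary illustrative protospacer
print(find_cas9_sites("TT" + P + "TGG" + "CC", P))  # [(2, '+')]
print(find_cas9_sites("TT" + P + "TGA" + "CC", P))  # [] (PAM G>A abolishes the site)
```

The second call shows the allele-specific behavior in miniature: the two sequences differ at one PAM base, and only the NGG allele yields a labelable site.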
MESH Headings
- Alleles
- Base Sequence
- Benzoxazoles/analysis
- CRISPR-Cas Systems
- Chromosome Mapping/methods
- Chromosomes, Bacterial/genetics
- Computer Simulation
- Conserved Sequence/genetics
- DNA-Directed RNA Polymerases
- Drug Resistance, Bacterial/genetics
- Fluorescent Dyes/analysis
- Gene Editing/methods
- Genome, Bacterial
- Genome, Human
- Haemophilus influenzae/drug effects
- Haemophilus influenzae/genetics
- Haplotypes/genetics
- Humans
- Lab-On-A-Chip Devices
- Nalidixic Acid/pharmacology
- Novobiocin/pharmacology
- Nucleotide Motifs/genetics
- Polymorphism, Single Nucleotide
- Quinolinium Compounds/analysis
- RNA, Guide, CRISPR-Cas Systems/chemical synthesis
- RNA, Guide, CRISPR-Cas Systems/genetics
- Repetitive Sequences, Nucleic Acid/genetics
- Sequence Alignment
- Staining and Labeling/methods
- Viral Proteins
Affiliation(s)
- Heba Z Abid
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Eleanor Young
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Jennifer McCaffrey
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Kaitlin Raseley
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Dharma Varapula
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Hung-Yi Wang
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Danielle Piazza
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, USA
- Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
- Joshua Mell
- Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, USA
- Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
- Ming Xiao
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
2
Trache A, Meininger GA. Total internal reflection fluorescence (TIRF) microscopy. Curr Protoc Microbiol 2008; Chapter 2:Unit 2A.2.1-2A.2.22. [PMID: 18729056 DOI: 10.1002/9780471729259.mc02a02s10] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 01/27/2023]
Abstract
Total internal reflection fluorescence (TIRF) microscopy is a method of exciting and visualizing fluorophores present in the near-membrane region of live or fixed cells grown on coverslips. TIRF microscopy is based on the total internal reflection that occurs when light traveling in a medium of high refractive index (e.g., glass) strikes the interface with a medium of lower refractive index (e.g., cell, water) at an angle greater than the critical angle. The evanescent field produced by the totally internally reflected light excites fluorescent molecules at the cell-substrate interface while exposing the rest of the cell volume only minimally. This technique provides high-contrast fluorescence images, with very low background and virtually no out-of-focus light, which makes it ideal for visualization and spectroscopy of single-molecule fluorescence near a surface. This unit concisely presents the principle of operation, instrument diversity, and applications of TIRF microscopy for the study of biological samples.
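The "near-membrane" selectivity described above follows from standard evanescent-wave optics: total internal reflection requires an incidence angle above the critical angle θc = arcsin(n2/n1), and the evanescent intensity decays as I(z) = I0 exp(-z/d) with d = λ0 / (4π √(n1² sin²θ − n2²)). The sketch below evaluates these textbook formulas; the numbers used (488 nm excitation, n1 = 1.518 for coverslip glass, n2 = 1.33 for water) are illustrative assumptions, not values from this unit.

```python
import math


def critical_angle_deg(n1: float, n2: float) -> float:
    """Incidence angle (degrees) above which light is totally internally reflected."""
    if n2 >= n1:
        raise ValueError("total internal reflection requires n1 > n2")
    return math.degrees(math.asin(n2 / n1))


def penetration_depth_nm(wavelength_nm: float, n1: float, n2: float, theta_deg: float) -> float:
    """1/e intensity decay length d of the evanescent field (same units as wavelength)."""
    s = (n1 * math.sin(math.radians(theta_deg))) ** 2 - n2 ** 2
    if s <= 0:
        raise ValueError("angle is at or below the critical angle; no evanescent field")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


def relative_intensity(z_nm: float, depth_nm: float) -> float:
    """I(z)/I0 = exp(-z/d) at height z above the glass surface."""
    return math.exp(-z_nm / depth_nm)


# Illustrative glass/water interface with 488 nm excitation (assumed values).
theta_c = critical_angle_deg(1.518, 1.33)
d = penetration_depth_nm(488, 1.518, 1.33, 65.0)
print(f"critical angle {theta_c:.1f} deg, depth at 65 deg {d:.0f} nm")
print(f"intensity 300 nm above the surface: {relative_intensity(300, d):.3f} of I0")
```

With these assumed values the critical angle is about 61.2 degrees and the decay length at 65 degrees is on the order of 100 nm; steeper angles shorten d further, which is why only fluorophores within a thin layer above the coverslip are excited.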
Affiliation(s)
- Andreea Trache
- Department of Systems Biology and Translational Medicine, College of Medicine, Texas A&M Health Science Center, College Station, Texas, USA
3
Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, Haugen E, Zerr T, Yamada NA, Tsang P, Newman TL, Tüzün E, Cheng Z, Ebling HM, Tusneem N, David R, Gillett W, Phelps KA, Weaver M, Saranga D, Brand A, Tao W, Gustafson E, McKernan K, Chen L, Malig M, Smith JD, Korn JM, McCarroll SA, Altshuler DA, Peiffer DA, Dorschner M, Stamatoyannopoulos J, Schwartz D, Nickerson DA, Mullikin JC, Wilson RK, Bruhn L, Olson MV, Kaul R, Smith DR, Eichler EE. Mapping and sequencing of structural variation from eight human genomes. Nature 2008; 453:56-64. [PMID: 18451855 PMCID: PMC2424287 DOI: 10.1038/nature06862] [Citation(s) in RCA: 793] [Impact Index Per Article: 49.6] [Received: 11/07/2007] [Accepted: 02/15/2008] [Indexed: 11/08/2022]
Abstract
Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale--particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation--a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
Affiliation(s)
- Jeffrey M Kidd
- Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
4
Hayden HS, Gillett W, Saenphimmachak C, Lim R, Zhou Y, Jacobs MA, Chang J, Rohmer L, D'Argenio DA, Palmieri A, Levy R, Haugen E, Wong GKS, Brittnacher MJ, Burns JL, Miller SI, Olson MV, Kaul R. Large-insert genome analysis technology detects structural variation in Pseudomonas aeruginosa clinical strains from cystic fibrosis patients. Genomics 2008; 91:530-7. [PMID: 18445516 DOI: 10.1016/j.ygeno.2008.02.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 11/13/2007] [Revised: 02/26/2008] [Accepted: 02/28/2008] [Indexed: 11/16/2022]
Abstract
Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the "genome universe" of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation.
Affiliation(s)
- Hillary S Hayden
- Genome Center, University of Washington, Seattle, WA 98195, USA.
5
Closing gaps in the human genome with fosmid resources generated from multiple individuals. Nat Genet 2007; 40:96-101. [PMID: 18157130 DOI: 10.1038/ng.2007.34] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 03/12/2007] [Accepted: 09/19/2007] [Indexed: 01/31/2023]
Abstract
The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.
6
Kidd JM, Newman TL, Tuzun E, Kaul R, Eichler EE. Population stratification of a common APOBEC gene deletion polymorphism. PLoS Genet 2007; 3:e63. [PMID: 17447845 PMCID: PMC1853121 DOI: 10.1371/journal.pgen.0030063] [Citation(s) in RCA: 199] [Impact Index Per Article: 11.7] [Received: 11/02/2006] [Accepted: 03/05/2007] [Indexed: 01/03/2023]
Abstract
The APOBEC3 gene family plays a role in innate cellular immunity, inhibiting retroviral infection, hepatitis B virus propagation, and the retrotransposition of endogenous elements. We present a detailed sequence and population genetic analysis of a 29.5-kb common human deletion polymorphism that removes the APOBEC3B gene. We developed a PCR-based genotyping assay, characterized 1,277 human diversity samples, and found that the frequency of the deletion allele varies significantly among major continental groups (global FST = 0.2843). The deletion is rare in Africans and Europeans (frequencies of 0.9% and 6%, respectively), more common in East Asians and Amerindians (36.9% and 57.7%, respectively), and almost fixed in Oceanic populations (92.9%). Despite a worldwide frequency of 22.5%, analysis of data from the International HapMap Project reveals that no single existing tag single nucleotide polymorphism may serve as a surrogate for the deletion variant, emphasizing that without careful analysis its phenotypic impact may be overlooked in association studies. Application of haplotype-based tests for selection revealed potential pitfalls in the direct application of existing methods to the analysis of genomic structural variation. These data emphasize the importance of directly genotyping structural variation in association studies and of accurately resolving variant breakpoints before proceeding with more detailed population-genetic analysis.
Affiliation(s)
- Jeffrey M Kidd
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Tera L Newman
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Eray Tuzun
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Rajinder Kaul
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail:
7
Xiao M, Phong A, Ha C, Chan TF, Cai D, Leung L, Wan E, Kistler AL, DeRisi JL, Selvin PR, Kwok PY. Rapid DNA mapping by fluorescent single molecule detection. Nucleic Acids Res 2006; 35:e16. [PMID: 17175538 PMCID: PMC1807959 DOI: 10.1093/nar/gkl1044] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.8] [Indexed: 11/22/2022]
Abstract
DNA mapping is an important analytical tool in genomic sequencing, medical diagnostics and pathogen identification. Here we report an optical DNA mapping strategy based on direct imaging of individual DNA molecules and localization of multiple sequence motifs on the molecules. Individual genomic DNA molecules were labeled with fluorescent dyes at specific sequence motifs by the action of nicking endonuclease followed by the incorporation of dye terminators with DNA polymerase. The labeled DNA molecules were then stretched into linear form on a modified glass surface and imaged using total internal reflection fluorescence (TIRF) microscopy. By determining the positions of the fluorescent labels with respect to the DNA backbone, the distribution of the sequence motif recognized by the nicking endonuclease can be established with good accuracy, in a manner similar to reading a barcode. With this approach, we constructed a specific sequence motif map of lambda-DNA. We further demonstrated the capability of this approach to rapidly type a human adenovirus and several strains of human rhinovirus.
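The barcode idea above can be mimicked in silico: list every occurrence of a nicking-enzyme recognition motif on either strand, then collapse the positions into coarse bins to imitate the limited resolution of an optical image. This is a hypothetical sketch, not the authors' software; the motif "GCTCTTC" and the bin width are placeholders, since the abstract does not name the enzyme or the imaging resolution.

```python
def motif_positions(seq: str, motif: str) -> list[int]:
    """0-based start positions of `motif` on either strand (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    rc_motif = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    targets = {motif, rc_motif}  # a palindromic motif is counted once per site
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def barcode(seq: str, motif: str, bin_bp: int = 1000) -> list[int]:
    """Count motif sites in fixed-width bins, a crude stand-in for optical resolution."""
    n_bins = -(-len(seq) // bin_bp)  # ceiling division
    counts = [0] * n_bins
    for pos in motif_positions(seq, motif):
        counts[pos // bin_bp] += 1
    return counts


# Toy sequence: one forward-strand site and one reverse-strand site.
toy = "A" * 10 + "GCTCTTC" + "A" * 10 + "GAAGAGC" + "A" * 5
print(motif_positions(toy, "GCTCTTC"))        # [10, 27]
print(barcode(toy, "GCTCTTC", bin_bp=20))     # [1, 1]
```

Comparing such predicted barcodes against observed label patterns is the general principle behind typing a genome from its label distribution; the real analysis also has to handle stretching variation and missed labels, which this sketch ignores.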
Affiliation(s)
- Ming Xiao
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- To whom correspondence should be addressed at: 513 Parnassus Avenue, HSW-901A, San Francisco, CA 94143, USA. Tel: +1 41 551 43876; Fax: +1 41 547 62956;
- Angie Phong
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Connie Ha
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Ting-Fung Chan
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Dongmei Cai
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Lucinda Leung
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Eunice Wan
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Amy L. Kistler
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94115, USA
- Joseph L. DeRisi
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94115, USA
- Paul R. Selvin
- Department of Physics and Center of Biophysics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Pui-Yan Kwok
- Cardiovascular Research Institute and Center for Human Genetics, University of California, San Francisco, CA 94115, USA
- Department of Dermatology, University of California, San Francisco, CA 94115, USA
8
Yu J, Ni P, Wong GKS. Comparing the whole-genome-shotgun and map-based sequences of the rice genome. Trends Plant Sci 2006; 11:387-91. [PMID: 16843033 DOI: 10.1016/j.tplants.2006.06.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 01/31/2006] [Revised: 05/03/2006] [Accepted: 06/28/2006] [Indexed: 05/10/2023]
Abstract
The rice genome has now been sequenced using whole-genome-shotgun and map-based methods. The relative merits of the two methods are the subject of debate, as they were in the human genome project. In this Opinion article, we will show that the serious discrepancies between the resultant sequences are mostly found in the large transposable elements such as copia and gypsy that populate the intergenic regions of plant genomes. Differences in published gene counts and polymorphism rates are similarly resolved by considering how transposable elements affect the sequence analysis.
Affiliation(s)
- Jun Yu
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing 101300, China
9
Kulasekara BR, Kulasekara HD, Wolfgang MC, Stevens L, Frank DW, Lory S. Acquisition and evolution of the exoU locus in Pseudomonas aeruginosa. J Bacteriol 2006; 188:4037-50. [PMID: 16707695 PMCID: PMC1482899 DOI: 10.1128/jb.02000-05] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Indexed: 01/01/2023]
Abstract
ExoU is a potent Pseudomonas aeruginosa cytotoxin translocated into host cells by the type III secretion system. A comparison of genomes of various P. aeruginosa strains showed that the ExoU determinant is found in the same polymorphic region of the chromosome near a tRNA(Lys) gene, suggesting that exoU is a horizontally acquired virulence determinant. We used yeast recombinational cloning to characterize four distinct ExoU-encoding DNA segments. We then sequenced and annotated three of these four genomic regions. The sequence of the largest DNA segment, named ExoU island A, revealed many plasmid- and genomic island-associated genes, most of which have been conserved across a broad set of beta- and gamma-Proteobacteria. Comparison of the sequenced ExoU-encoding genomic islands to the corresponding PAO1 tRNA(Lys)-linked genomic island, the pathogenicity islands of strain PA14, and pKLC102 of clone C strains allowed us to propose a mechanism for the origin and transmission of the ExoU determinant. The evolutionary history very likely involved transposition of the ExoU determinant onto a transmissible plasmid, followed by transfer of the plasmid into different P. aeruginosa strains. The plasmid subsequently integrated into a tRNA(Lys) gene in the chromosome of each recipient, where it acquired insertion sequences and underwent deletions and rearrangements. We have also applied yeast recombinational cloning to facilitate a targeted mutagenesis of ExoU island A, further demonstrating the utility of the specific features of the yeast capture vector for functional analyses of genes on large horizontally acquired genetic elements.
Affiliation(s)
- Bridget R Kulasekara
- Department of Microbiology and Molecular Genetics, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
10
Newman TL, Rieder MJ, Morrison VA, Sharp AJ, Smith JD, Sprague LJ, Kaul R, Carlson CS, Olson MV, Nickerson DA, Eichler EE. High-throughput genotyping of intermediate-size structural variation. Hum Mol Genet 2006; 15:1159-67. [PMID: 16497726 DOI: 10.1093/hmg/ddl031] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023]
Abstract
The contribution of large-scale and intermediate-size structural variation (ISV) to human genetic disease and disease susceptibility is only beginning to be understood. The development of high-throughput genotyping technologies is one of the most critical aspects for future studies of linkage disequilibrium (LD) and disease association. Using a simple PCR-based method designed to assay the junctions of the breakpoints, we genotyped seven simple insertion and deletion polymorphisms ranging in size from 6.3 to 24.7 kb among 90 CEPH individuals. We then extended this analysis to a larger collection of samples (n = 460) by application of an oligonucleotide extension-ligation genotyping assay. The analysis showed a high level of concordance (approximately 99%) when compared with PCR/sequence-validated genotypes. Using the available HapMap data, we observed significant LD (r² = 0.74-0.95) between each ISV and flanking single nucleotide polymorphisms, but this observation is likely to hold only for similar simple insertion/deletion events. The approach we describe may be used to characterize a large number of individuals in a cost-effective manner once the sequence organization of ISVs is known.
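The LD values quoted above are r² statistics. For two biallelic loci, with D = p_AB − p_A·p_B, the standard definition is r² = D² / (p_A(1 − p_A) p_B(1 − p_B)). The sketch below computes it from phased two-locus haplotypes, treating the ISV as a biallelic marker (1 = insertion present). It assumes phased input, which the PCR genotypes themselves do not provide; it illustrates the statistic, not the authors' HapMap pipeline.

```python
def r_squared(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 between two biallelic loci from phased haplotypes coded 0/1.

    Each tuple is (allele at locus A, allele at locus B), e.g. (ISV, SNP).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(a for a, _ in haplotypes) / n          # freq. of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # freq. of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r^2 is undefined")
    return d * d / denom


# Perfect LD: the SNP allele always travels with the insertion.
print(r_squared([(1, 1)] * 3 + [(0, 0)] * 7))
# No LD: all four haplotypes equally frequent.
print(r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))  # 0.0
```

An r² near 1 means a flanking SNP can stand in for the ISV when genotyping; the abstract's caution that this holds mainly for simple insertion/deletion events is a biological point that the formula itself cannot check.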
Affiliation(s)
- Tera L Newman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
11
Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE. Fine-scale structural variation of the human genome. Nat Genet 2005; 37:727-32. [PMID: 15895083 DOI: 10.1038/ng1562] [Citation(s) in RCA: 711] [Impact Index Per Article: 37.4] [Received: 02/10/2005] [Accepted: 04/01/2005] [Indexed: 02/04/2023]
Abstract
Inversions, deletions and insertions are important mediators of disease and disease susceptibility. We systematically compared the human genome reference sequence with a second genome (represented by fosmid paired-end sequences) to detect intermediate-sized structural variants >8 kb in length. We identified 297 sites of structural variation: 139 insertions, 102 deletions and 56 inversion breakpoints. Using combined literature, sequence and experimental analyses, we validated 112 of the structural variants, including several that are of biomedical relevance. These data provide a fine-scale structural variation map of the human genome and the requisite sequence precision for subsequent genetic studies of human disease.
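The comparison above relies on a general end-pair logic: both ends of a fosmid clone are placed on the reference, and a pair whose mapped span is much longer than the expected insert suggests a deletion in the query genome, a much shorter span suggests an insertion, and ends with unexpected relative orientation suggest an inversion breakpoint. The sketch below is a hypothetical classifier, not the authors' pipeline; the record type, the insert-size mean and standard deviation, and the k = 3 cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Hypothetical record: both fosmid ends placed on the reference."""
    left_pos: int              # reference coordinate of the leftmost mapped end
    right_pos: int             # reference coordinate of the rightmost mapped end
    proper_orientation: bool   # True if the ends face each other as expected


def classify(pair: EndPair, mean_insert: float, sd_insert: float, k: float = 3.0) -> str:
    """Label an end pair as concordant or as evidence for a structural variant."""
    if not pair.proper_orientation:
        return "inversion?"      # orientation conflict: candidate inversion breakpoint
    span = pair.right_pos - pair.left_pos
    if span > mean_insert + k * sd_insert:
        return "deletion"        # reference has extra sequence the query lacks
    if span < mean_insert - k * sd_insert:
        return "insertion"       # query carries sequence absent from the reference
    return "concordant"


# Assumed insert-size distribution: mean 40 kb, SD 2.5 kb.
for p in (EndPair(0, 40_000, True), EndPair(0, 60_000, True),
          EndPair(0, 25_000, True), EndPair(0, 40_000, False)):
    print(p, classify(p, 40_000, 2_500))
```

With these assumed parameters a span must differ from the mean by more than 7.5 kb before it is flagged, which illustrates why end-pair methods report a minimum detectable event size that depends on how tightly insert sizes are distributed.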
Affiliation(s)
- Eray Tuzun
- Department of Genome Sciences, University of Washington School of Medicine, 1705 NE Pacific Street, Seattle, Washington 98195, USA.
12
Smith EE, Sims EH, Spencer DH, Kaul R, Olson MV. Evidence for diversifying selection at the pyoverdine locus of Pseudomonas aeruginosa. J Bacteriol 2005; 187:2138-47. [PMID: 15743962 PMCID: PMC1064051 DOI: 10.1128/jb.187.6.2138-2147.2005] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.8] [Indexed: 12/28/2022]
Abstract
Pyoverdine is the primary siderophore of the gram-negative bacterium Pseudomonas aeruginosa. The pyoverdine region was recently identified as the most divergent locus alignable between strains in the P. aeruginosa genome. Here we report the nucleotide sequence and analysis of more than 50 kb in the pyoverdine region from nine strains of P. aeruginosa. There are three divergent sequence types in the pyoverdine region, which correspond to the three structural types of pyoverdine. The pyoverdine outer membrane receptor fpvA may be driving diversity at the locus: it is the most divergent alignable gene in the region, is the only gene that showed substantial intratype variation that did not appear to be generated by recombination, and shows evidence of positive selection. The hypothetical membrane protein PA2403 also shows evidence of positive selection; residues on one side of the membrane after protein folding are under positive selection. R', previously identified as a type IV strain, is clearly derived from a type III strain via a 3.4-kb deletion which removes one amino acid from the pyoverdine side chain peptide. This deletion represents a natural modification of the product of a nonribosomal peptide synthetase enzyme, whose consequences are predictable from the DNA sequence. There is also linkage disequilibrium between the pyoverdine region and pvdY, a pyoverdine gene separated by 30 kb from the pyoverdine region. The pyoverdine region shows evidence of horizontal transfer; we propose that some alleles in the region were introduced from other soil bacteria and have been subsequently maintained by diversifying selection.
Affiliation(s)
- Eric E Smith
- Program of Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA.
13
Nguyen G, Bukanov N, Oshimura M, Smith CL. Cloneless genomic DNA analysis: an efficient and simple methods for de novo genomic sequencing projects and gap filling. Biomol Eng 2005; 21:135-44. [PMID: 15748687 DOI: 10.1016/j.bioeng.2004.08.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2004] [Accepted: 08/19/2004] [Indexed: 11/23/2022]
Abstract
The utility of using genomic DNA directly in agarose, i.e. cloneless libraries, in place of large clone libraries, radiation hybrid panels, or chromosome dissection was demonstrated. The advantage of the cloneless library approach is that, in principle, a targeted genomic resource can be developed rapidly for any genomic region using any genomic DNA sample. Here, a human chromosome 20 NotI fragment library was generated by slicing a pulsed-field gel lane containing fractionated NotI-cleaved DNA from a monosomic hybrid cell line into 2-mm pieces. A reliable PCR method using agarose-embedded DNA was developed. InterAlu PCR generated unique patterns of products from adjacent slices (i.e., fractions). Further, the specificity of the interAlu products was demonstrated by FISH analysis and by other hybridization experiments against arrayed interAlu products. STS content mapping was used to order the fractions and also to demonstrate the unique content of the library fractions.
Affiliation(s)
- Giang Nguyen
- Molecular Biotechnology Research Laboratory, Department of Biomedical Engineering, 36 Cummington Street, Boston, MA 02215, USA
14
Magrini V, Warren WC, Wallis J, Goldman WE, Xu J, Mardis ER, McPherson JD. Fosmid-based physical mapping of the Histoplasma capsulatum genome. Genome Res 2004; 14:1603-9. [PMID: 15289478 PMCID: PMC509269 DOI: 10.1101/gr.2361404] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022]
Abstract
A fosmid library representing 10-fold coverage of the Histoplasma capsulatum G217B genome was used to construct a restriction-based physical map. The data obtained from three restriction endonuclease fingerprints, generated from each clone using BamHI, HindIII, and PstI endonucleases, were combined and used in FPC for automatic and manual contig assembly builds. Concomitantly, whole-genome shotgun (WGS) paired-end reads from plasmids and fosmids were assembled with PCAP, providing a predicted genome size of up to 43.5 Mbp and 17% repetitive DNA. Fosmid paired-end sequences in the WGS assembly provide anchoring information to the physical map and result in the joining of existing physical map contigs into 84 clusters containing 9551 fosmid clones. Here, we detail comprehensive fosmid mapping of the Histoplasma capsulatum genome, resulting in an efficient paradigm for de novo sequencing that uses a map-assisted whole-genome shotgun approach.
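A restriction fingerprint is the set of fragment sizes produced when a clone is digested with an enzyme; overlapping clones share fragments, which is what contig builders such as FPC exploit. The sketch below predicts fingerprints in silico for the three enzymes named above, using their well-known recognition sites and top-strand cut positions. It is a simplification under stated assumptions: a linear sequence, complete digestion, and fragment lengths measured between top-strand cuts (overhangs ignored); real fingerprints come from measured band sizes, and the input format FPC expects is not modeled here.

```python
# Recognition site and top-strand cut offset (cut falls after this many bases).
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "PstI": ("CTGCAG", 5),     # CTGCA^G
}


def digest(seq: str, enzyme: str) -> list[int]:
    """Sorted fragment lengths (bp) from a complete digest of a linear sequence.

    All three sites are palindromic, so scanning the top strand finds every site.
    """
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def fingerprint(seq: str) -> dict[str, list[int]]:
    """Predicted fragment sizes for each of the three enzymes."""
    return {name: digest(seq, name) for name in ENZYMES}


toy = "A" * 10 + "GGATCC" + "T" * 20
print(fingerprint(toy))
```

For the toy sequence, BamHI cuts once and yields fragments of 11 and 25 bp, while HindIII and PstI find no site and return the full 36-bp molecule; combining several enzymes, as in the abstract, gives each clone a more distinctive fingerprint than any single digest.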
Affiliation(s)
- Vincent Magrini
- Washington University School of Medicine, Genome Sequencing Center, St. Louis, Missouri 63108, USA
15
Ernst RK, D'Argenio DA, Ichikawa JK, Bangera MG, Selgrade S, Burns JL, Hiatt P, McCoy K, Brittnacher M, Kas A, Spencer DH, Olson MV, Ramsey BW, Lory S, Miller SI. Genome mosaicism is conserved but not unique in Pseudomonas aeruginosa isolates from the airways of young children with cystic fibrosis. Environ Microbiol 2004; 5:1341-9. [PMID: 14641578 DOI: 10.1111/j.1462-2920.2003.00518.x] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.4] [Indexed: 11/29/2022]
Abstract
Pseudomonas aeruginosa strains from the chronic lung infections of cystic fibrosis (CF) patients are phenotypically and genotypically diverse. Using strain PAO1 whole genome DNA microarrays, we assessed the genomic variation in P. aeruginosa strains isolated from young children with CF (6 months to 8 years of age) as well as from the environment. Eighty-nine to 97% of the PAO1 open reading frames were detected in 20 strains by microarray analysis, while subsets of 38 gene islands were absent or divergent. No specific pattern of genome mosaicism defined strains associated with CF. Many mosaic regions were distinguished by their low G + C content; their inclusion of phage related or pyocin genes; or by their linkage to a vgr gene or a tRNA gene. Microarray and phenotypic analysis of sequential isolates from individual patients revealed two deletions of greater than 100 kbp formed during evolution in the lung. The gene loss in these sequential isolates raises the possibility that acquisition of pyomelanin production and loss of pyoverdin uptake each may be of adaptive significance. Further characterization of P. aeruginosa diversity within the airways of individual CF patients may reveal common adaptations, perhaps mediated by gene loss, that suggest new opportunities for therapy.
Affiliation(s)
- Robert K Ernst
- Department of Microbiology, University of Washington, Health Sciences Building, K-140, Box 357710, Seattle, WA 98195, USA
16
Yan HH, Mudge J, Kim DJ, Shoemaker RC, Cook DR, Young ND. Comparative physical mapping reveals features of microsynteny between Glycine max, Medicago truncatula, and Arabidopsis thaliana. Genome 2004; 47:141-55. [PMID: 15060611 DOI: 10.1139/g03-106] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
To gain insight into genomic relationships between soybean (Glycine max) and Medicago truncatula, eight groups of bacterial artificial chromosome (BAC) contigs, together spanning 2.60 million base pairs (Mb) in G. max and 1.56 Mb in M. truncatula, were compared through high-resolution physical mapping combined with sequence and hybridization analysis of low-copy BAC ends. Cross-hybridization among G. max and M. truncatula contigs uncovered microsynteny in six of the contig groups and extensive microsynteny in three. Between G. max homoeologous (within genome duplicate) contigs, 85% of coding and 75% of noncoding sequences were conserved at the level of cross-hybridization. By contrast, only 29% of sequences were conserved between G. max and M. truncatula, and some kilobase-scale rearrangements were also observed. Detailed restriction maps were constructed for 11 contigs from the three highly microsyntenic groups, and these maps suggested that sequence order was highly conserved between G. max duplicates and generally conserved between G. max and M. truncatula. One instance of homoeologous BAC contigs in M. truncatula was also observed and examined in detail. A sequence similarity search against the Arabidopsis thaliana genome sequence identified up to three microsyntenic regions in A. thaliana for each of two of the legume BAC contig groups. Together, these results confirm previous predictions of one recent genome-wide duplication in G. max and suggest that M. truncatula also experienced ancient large-scale genome duplications.
Affiliation(s)
- H H Yan
- Department of Plant Pathology, University of Minnesota, St Paul, MN 55108, USA
17
Yan HH, Mudge J, Kim DJ, Larsen D, Shoemaker RC, Cook DR, Young ND. Estimates of conserved microsynteny among the genomes of Glycine max, Medicago truncatula and Arabidopsis thaliana. Theor Appl Genet 2003; 106:1256-65. [PMID: 12748777 DOI: 10.1007/s00122-002-1183-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Received: 01/03/2002] [Accepted: 10/28/2002] [Indexed: 05/24/2023]
Abstract
A growing body of research indicates that microsynteny is common among dicot genomes. However, most studies focus on just one or a few genomic regions, so the extent of microsynteny across entire genomes remains poorly characterized. To estimate the level of microsynteny between Medicago truncatula (Mt) and Glycine max (soybean), and also among homoeologous segments of soybean, we used a hybridization strategy involving bacterial artificial chromosome (BAC) contigs. A Mt BAC library consisting of 30,720 clones was screened with a total of 187 soybean BAC subclones and restriction fragment length polymorphism (RFLP) probes. These probes came from 50 soybean contig groups, defined as one or more related BAC contigs anchored by the same low-copy probe. In addition, 92 whole soybean BAC clones were hybridized to filters of HindIII-digested Mt BAC DNA to identify additional cases of cross-hybridization after removal of those soybean BACs found to be repetitive in Mt. Microsynteny was inferred when at least two low-copy probes from a single soybean contig hybridized to the same Mt BAC or when a soybean BAC clone hybridized to three or more low-copy fragments from a single Mt BAC. Of the 50 soybean contig groups examined, 54% showed microsynteny to Mt. The degree of conservation among 37 groups of soybean contigs was also investigated. The results indicated substantial conservation among soybean contigs in the same group, with 86.5% of the groups showing at least some level of microsynteny. One contig group was examined in detail by a combination of physical mapping and comparative sequencing of homoeologous segments. A TBLASTX similarity search was performed between 1,085 soybean sequences on the 50 BAC contig groups and the entire Arabidopsis genome. Based on a criterion of sequence homologues <100 kb apart, each with an expected value of ≤1e-07, seven of the 50 soybean contig groups (14%) exhibited microsynteny with Arabidopsis.
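The two criteria used above to infer microsynteny amount to simple counting rules. The Python sketch below is illustrative only and is not the authors' software: the input layout (lists of `(probe, Mt BAC)` and `(Mt BAC, fragment)` hit pairs), the function names and the identifiers in the usage note are hypothetical, and it assumes repetitive probes have already been filtered out, as the abstract describes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def contig_probe_rule(hits: Iterable[Tuple[str, str]]) -> Set[str]:
    """Criterion 1 (hypothetical input format): hits are (low-copy probe, Mt BAC)
    pairs for the probes of ONE soybean contig. Returns the Mt BACs that are
    hit by at least two distinct probes."""
    probes_per_bac: Dict[str, Set[str]] = defaultdict(set)
    for probe, mt_bac in hits:
        probes_per_bac[mt_bac].add(probe)  # a set counts each probe only once
    return {bac for bac, probes in probes_per_bac.items() if len(probes) >= 2}


def whole_bac_rule(hits: Iterable[Tuple[str, str]]) -> Set[str]:
    """Criterion 2 (hypothetical input format): hits are (Mt BAC, low-copy
    HindIII fragment) pairs detected by ONE soybean BAC. Returns the Mt BACs
    with at least three distinct hybridizing fragments."""
    fragments_per_bac: Dict[str, Set[str]] = defaultdict(set)
    for mt_bac, fragment in hits:
        fragments_per_bac[mt_bac].add(fragment)
    return {bac for bac, frags in fragments_per_bac.items() if len(frags) >= 3}
```

For example, `contig_probe_rule([("p1", "mt1"), ("p2", "mt1"), ("p3", "mt2")])` returns `{"mt1"}`. Counting distinct probes and fragments with sets keeps a single probe that hybridizes twice from being counted as two independent lines of evidence.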
Affiliation(s)
- H H Yan
- Department of Plant Pathology, 495 Borlaug Hall, 1991 Upper Buford Circle, University of Minnesota, St. Paul 55108, USA
18
Zhang X, Yang H, Yu J, Chen C, Zhang G, Bao J, Du Y, Kibukawa M, Li Z, Wang J, Hu S, Dong W, Wang J, Gregersen N, Niebuhr E, Bolund L. Genomic organization, transcript variants and comparative analysis of the human nucleoporin 155 (NUP155) gene. Gene 2002; 288:9-18. [PMID: 12034489 DOI: 10.1016/s0378-1119(02)00470-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
Nucleoporin 155 (Nup155) is a major component of the nuclear pore complex (NPC) involved in cellular nucleo-cytoplasmic transport. We have acquired the complete sequence and interpreted the genomic organization of the Nup155 orthologs from human (Homo sapiens) and pufferfish (Fugu rubripes), which are approximately 80 and 8 kb in length, respectively. The human gene is ubiquitously expressed in many tissues analyzed and has two major transcript variants, resulting from alternative usage of the 5' cryptic or consensus splice donor in intron 1 and of two polyadenylation signals. We have also cloned DNA complementary to RNAs of the Nup155 orthologs from Fugu and mouse. Comparative analysis of the Nup155 orthologs in many species, including H. sapiens, Mus musculus, Rattus norvegicus, F. rubripes, Arabidopsis thaliana, Drosophila melanogaster, and Saccharomyces cerevisiae, has revealed two paralogs in S. cerevisiae but only a single gene, with an increasing number of introns, in more complex organisms. The amino acid sequences of the Nup155 orthologs are highly conserved in the evolution of eukaryotes. Different gene orders in the human and Fugu genomic regions harboring the Nup155 orthologs advocate cautious interpretation of synteny in comparative genomic analysis even within the vertebrate lineage.
Affiliation(s)
- Xiuqing Zhang
- Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Datun Road, Beijing 100101, China
19
Rowen L, Young J, Birditt B, Kaur A, Madan A, Philipps DL, Qin S, Minx P, Wilson RK, Hood L, Graveley BR. Analysis of the human neurexin genes: alternative splicing and the generation of protein diversity. Genomics 2002; 79:587-97. [PMID: 11944992 DOI: 10.1006/geno.2002.6734] [Citation(s) in RCA: 144] [Impact Index Per Article: 6.5] [Indexed: 11/22/2022]
Abstract
The neurexins are neuronal proteins that function as cell adhesion molecules during synaptogenesis and in intercellular signaling. Although mammalian genomes contain only three neurexin genes, thousands of neurexin isoforms may be expressed through the use of two alternative promoters and alternative splicing at up to five different positions in the pre-mRNA. To begin understanding how the expression of the neurexin genes is regulated, we have determined the complete nucleotide sequence of all three human neurexin genes: NRXN1, NRXN2, and NRXN3. Unexpectedly, two of these, NRXN1 (approximately 1.1 Mb) and NRXN3 (approximately 1.7 Mb), are among the largest known human genes. In addition, we have identified several conserved intronic sequence elements that may participate in the regulation of alternative splicing. The sequences of these genes provide insight into the mechanisms used to generate the diversity of neurexin protein isoforms and raise several interesting questions regarding the expression mechanism of large genes.
Affiliation(s)
- Lee Rowen
- Institute for Systems Biology, 1441 North 34th Street, Seattle, Washington 98103, USA
20
Raymond CK, Sims EH, Olson MV. Linker-mediated recombinational subcloning of large DNA fragments using yeast. Genome Res 2002; 12:190-7. [PMID: 11779844 PMCID: PMC155262 DOI: 10.1101/gr.205201] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.7] [Received: 07/12/2001] [Accepted: 10/16/2001] [Indexed: 11/24/2022]
Abstract
The homologous recombination pathway in yeast is an ideal tool for the sequence-specific assembly of plasmids. Complementary 80-nucleotide oligonucleotides that overlap a vector and a target fragment were found to serve as efficient recombination linkers for fragment subcloning. Using electroporation, single-stranded 80-mers were adequate for routine plasmid construction. A cycloheximide-based counterselection was introduced to increase the specificity of cloning by homologous recombination relative to nonspecific vector background. Reconstruction experiments suggest this counterselection increased cloning specificity by 100-fold. Cycloheximide counterselection was used in conjunction with 80-bp linkers to subclone targeted regions from bacterial artificial chromosomes. This technology may find broad application in the final stages of completing the Human Genome Sequencing Project and in applications of BAC clones to the functional analysis of complex genomes.
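To make the linker design concrete, the Python sketch below builds an 80-nt top strand from the end of a vector sequence and the start of a target sequence, together with its complementary strand. The equal 40 + 40 split, the function names and the orientation convention (vector 3' end joined to target 5' end) are assumptions made for this illustration; the abstract states only that the 80-mers overlap both the vector and the target fragment.

```python
from typing import Tuple

# Translation table for Watson-Crick complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def recombination_linker(vector_seq: str, target_seq: str, arm: int = 40) -> Tuple[str, str]:
    """Return (top, bottom) strands of a hypothetical linker that joins the
    3' end of vector_seq to the 5' end of target_seq; `arm` is the assumed
    overlap on each side, so the default gives an 80-mer."""
    if len(vector_seq) < arm or len(target_seq) < arm:
        raise ValueError("each sequence must be at least `arm` bases long")
    top = vector_seq[-arm:] + target_seq[:arm]  # last bases of vector + first bases of target
    return top, reverse_complement(top)
```

Returning both strands mirrors the complementary oligonucleotide pairs described above; per the abstract, a single strand was also adequate for routine plasmid construction when electroporation was used.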
Affiliation(s)
- Christopher K Raymond
- The University of Washington Genome Center, Department of Medicine, University of Washington, Seattle, Washington 98115, USA.
21
Jiang S, Yu J, Wang J, Tan Z, Xue H, Feng G, He L, Yang H. Complete genomic sequence of 195 Kb of human DNA containing the gene GABRG2. DNA Seq 2001; 11:373-82. [PMID: 11328646 DOI: 10.3109/10425170009033988] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/13/2022]
Abstract
GABA (gamma-aminobutyric acid), the main inhibitory neurotransmitter in the brain, plays an essential role in the overall balance between neuronal excitation and inhibition by acting on GABAA receptors, which are ligand-gated chloride channels. Impaired GABAergic function contributes to certain forms of epilepsy, schizophrenia, Alzheimer's disease, and other neurological disorders. To identify possible genetic features and to further study the biological regulation of GABAA receptor genes, whose promoter elements and sequence anomalies may contribute to epileptic disorders, we shotgun-sequenced, as an initial step, a BAC clone, dj082c10 (195,909 bp), encompassing the human γ2 subunit of the GABAA receptor (GABRG2). It is, we believe, the first genomic sequence of the GABA receptor gamma subunit family. Four contigs were assembled from 2950 reads prior to gap closure, at an average redundancy of eightfold over the entire region. The accuracy of the consensus sequence was predicted to be 99.999% after closing gaps and finishing weak regions. The nine exons of GABRG2 span an 85-kb region containing 81 SINEs (22.32% of the region) and nine L1 elements (3.40%). However, the density of L1 elements in the regions flanking the GABRG2 gene (29.45%, from 45 elements) is significantly higher than that within the gene. The introns of GABRG2 range in length from 1.5 kb to 38.1 kb.
Affiliation(s)
- S Jiang
- Bio-X Life Science Research Center, Shanghai Jiao Tong University, China
22
Hu E, Chen Z, Fredrickson TA, Spurr N, Gentle S, Sims M, Zhu Y, Halsey W, Mao J, Sathe GM, Brooks DP. Rapid isolation of tissue-specific genes from rat kidney. Exp Nephrol 2001; 9:156-64. [PMID: 11150865 DOI: 10.1159/000052607] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
Abstract
A systematic effort to isolate kidney-specific genes was performed using the recently described PCR-select methodology. Using this technique, a kidney-specific mini-gene library was generated and a number of kidney-specific genes that share significant homology to previously characterized kidney genes from rats and other species were isolated. These included three renal-specific transporters (an ADH water channel, the anion transporters RST and ROAT1), a cell adhesion molecule (K-cadherin) and a kidney-specific protein upregulated in renal carcinoma (DD96). In addition, we isolated two novel genes from rat kidney. One of the genes shares limited homology to rat profilin-1, while the other showed no similarity to any gene in GenBank. Northern blot analysis revealed that the mRNA for each of these genes is expressed in a highly kidney-restricted fashion. Our results suggest that tissue-specific genes can be rapidly isolated and characterized using PCR-select techniques and that this methodology may be generally applicable to isolating specific genes from a variety of tissues.
Affiliation(s)
- E Hu
- Department of Renal Pharmacology, SmithKline Beecham Pharmaceuticals, King of Prussia, PA 19406, USA.
23
Cocchia M, Kouprina N, Kim SJ, Larionov V, Schlessinger D, Nagaraja R. Recovery and potential utility of YACs as circular YACs/BACs. Nucleic Acids Res 2000; 28:E81. [PMID: 10954614 PMCID: PMC110718 DOI: 10.1093/nar/28.17.e81] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
A method has been established to convert pYAC4-based linear yeast artificial chromosomes (YACs) into circular chromosomes that can also be propagated in Escherichia coli cells as bacterial artificial chromosomes (BACs). The circularization is based on use of a vector that contains a yeast dominant selectable marker (G418R), a BAC cassette and short targeting sequences adjacent to the edges of the insert in the pYAC4 vector. When it is introduced into yeast, the vector recombines with the YAC target sequences to form a circular molecule, retaining the insert but discarding most of the sequences of the YAC telomeric arms. YACs up to 670 kb can be efficiently circularized using this vector. Re-isolation of megabase-size YAC inserts as a set of overlapping circular YAC/BACs, based on the use of an Alu-containing targeting vector, is also described. We have shown that circular DNA molecules up to 250 kb can be efficiently and accurately transferred into E. coli cells by electroporation. Larger circular DNAs cannot be moved into bacterial cells, but can be purified away from linear yeast chromosomes. We propose that the described system for generation of circular YAC derivatives can facilitate sequencing as well as functional analysis of genomic regions.
Affiliation(s)
- M Cocchia
- Laboratory of Genetics, NIA, NIH, 333 Cassell Drive, Suite 4000, Baltimore, MD 21224, USA
24
Huang GM. High-throughput DNA sequencing: a genomic data manufacturing process. DNA Seq 2000; 10:149-53. [PMID: 10647816 DOI: 10.3109/10425179909033940] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
Trends in the progress of automated DNA sequencing operations are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in an ever-increasing capacity for sequence data production. This progress places higher demands on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to automate different parts of the process, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.
Affiliation(s)
- G M Huang
- Pangea Systems, Inc., Oakland, CA 94612, USA.
25
Marra M, Kucaba T, Sekhon M, Hillier L, Martienssen R, Chinwalla A, Crockett J, Fedele J, Grover H, Gund C, McCombie WR, McDonald K, McPherson J, Mudd N, Parnell L, Schein J, Seim R, Shelby P, Waterston R, Wilson R. A map for sequence analysis of the Arabidopsis thaliana genome. Nat Genet 1999; 22:265-70. [PMID: 10391214 DOI: 10.1038/10327] [Citation(s) in RCA: 99] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]
Abstract
Arabidopsis thaliana has emerged as a model system for studies of plant genetics and development, and its genome has been targeted for sequencing by an international consortium (the Arabidopsis Genome Initiative; http://genome-www.stanford.edu/Arabidopsis/agi.html). To support the genome-sequencing effort, we fingerprinted more than 20,000 BACs (ref. 2) from two high-quality publicly available libraries, generating an estimated 17-fold redundant coverage of the genome, and used the fingerprints to nucleate assembly of the data by computer. Subsequent manual revision of the assemblies resulted in the incorporation of 19,661 fingerprinted BACs into 169 ordered sets of overlapping clones ('contigs'), each containing at least 3 clones. These contigs are ideal for parallel selection of BACs for large-scale sequencing and have supported the generation of more than 5.8 Mb of finished genome sequence submitted to GenBank; analysis of the sequence has confirmed the integrity of contigs constructed using this fingerprint data. Placement of contigs onto chromosomes can now be performed, and is being pursued by groups involved in both sequencing and positional cloning studies. To our knowledge, these data provide the first example of whole-genome random BAC fingerprint analysis of a eukaryote, and have provided a model essential to efforts aimed at generating similar databases of fingerprint contigs to support sequencing of other complex genomes, including that of human.
Affiliation(s)
- M Marra
- Washington University Genome Sequencing Center, St Louis, Missouri 63108, USA.
26
Affiliation(s)
- W Jang
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA.
27
Sasinowska H, Sasinowski M. An algorithm for the assembly of robust physical maps based on a combination of multi-level hybridization data and fingerprinting data. Comput Chem 1999; 23:251-62. [PMID: 10627143 DOI: 10.1016/s0097-8485(99)00018-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract
We have developed an algorithm that combines data obtained from restriction digestion experiments and hybridization experiments to construct robust physical maps of whole chromosomes. The algorithm has been incorporated into a program that accepts hybridization data consisting of an unordered hybridization matrix and fingerprinting data containing band coordinates for each clone. The combined data are used to produce a non-redundant, ordered matrix, which can be further reduced to represent a minimum tile coverage of the chromosome. The method also takes into account multi-level hybridization events, which allows an improved treatment of the hybridization data. The program is evaluated against several other contig-building programs using simulated and real data sets. Finally, it is applied to construct a physical map of the 4.1 Mb genome of Ochrobactrum anthropi based on 1387 clones and 70 probes, as well as 624 fingerprints.
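The abstract does not spell out how the ordered matrix is reduced to a minimum tile coverage. Once clones have been ordered and given map coordinates, one standard way to choose the fewest clones that still cover a region is greedy interval covering, sketched below in Python. This is a swapped-in textbook technique, not the published algorithm, and the `(name, start, end)` clone tuples are a hypothetical input format.

```python
from typing import List, Optional, Tuple

# Hypothetical clone record: (name, start, end) in half-open map coordinates.
Clone = Tuple[str, int, int]


def minimum_tiling_path(clones: List[Clone], length: int) -> Optional[List[str]]:
    """Greedy interval cover: the fewest clones covering [0, length),
    or None if some stretch of the region is not covered by any clone."""
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    path: List[str] = []
    covered = 0  # everything in [0, covered) is already tiled
    i = 0
    while covered < length:
        best: Optional[Clone] = None
        # Among clones starting inside the tiled region, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # no clone extends the tiling: a coverage gap
        path.append(best[0])
        covered = best[2]
    return path
```

With clones `a` (0-100), `b` (50-180), `c` (90-200), `d` (150-300) and `e` (190-310) over a 300-unit region, the function returns `["a", "c", "e"]`; a region with an uncovered stretch returns `None`. Greedy choice is optimal for this one-dimensional covering problem, but it presumes the hard part, a correct clone order, has already been solved.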
Affiliation(s)
- H Sasinowska
- Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA.
28
Ding Y, Johnson MD, Colayco R, Chen YJ, Melnyk J, Schmitt H, Shizuya H. Contig assembly of bacterial artificial chromosome clones through multiplexed fluorescence-labeled fingerprinting. Genomics 1999; 56:237-46. [PMID: 10087190 DOI: 10.1006/geno.1998.5734] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
A rapid multiplexed fingerprinting method has been developed for bacterial artificial chromosome (BAC) contig assembly. Defined subsets of BAC DNA fragments that result from digestion by three paired restriction endonucleases are labeled with a distinct fluorescent F-ddATP for each subset. Lists of the labeled fragment sizes are generated by an ABI 377 DNA sequencer and the GeneScan analysis software and then processed by an assembly program, FPC (Fingerprinted Contigs), to produce contig maps. Data obtained from the multiplexed labeling permit detection of smaller overlaps than is observed when data from a single double digest are analyzed. The method has been tested on 98 BACs from chromosome 22 regions where large-scale sequencing is under way and also through simulation, using randomly generated BAC clones derived from existing DNA sequence data. In each case, contig assembly results demonstrated the advantages of multiplexed fingerprinting.
Affiliation(s)
- Y Ding
- Beckman Institute, Division of Biology, 139-74, California Institute of Technology, Pasadena, California 91125, USA
29
Siegel AF, Trask B, Roach JC, Mahairas GG, Hood L, van den Engh G. Analysis of Sequence-Tagged-Connector Strategies for DNA Sequencing. Genome Res 1999. [DOI: 10.1101/gr.9.3.297] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022]
Abstract
The BAC-end sequencing, or sequence-tagged-connector (STC), approach to genome sequencing involves sequencing the ends of BAC inserts to scatter sequence tags (STCs) randomly across the genome. Once any BAC or other large segment of DNA is sequenced to completion by conventional shotgun approaches, these STC tags can be used to identify a minimum tiling path of BAC clones overlapping the nucleation sequence for sequence extension. Here, we explore the properties of STC-sequencing strategies within a mathematical model of a random target with homologous repeats and imperfect sequencing technology to understand the consequences of varying model parameters on the incidence of problem clones and the cost of the sequencing project. Problem clones are defined as clones for which either (A) there is no identifiable overlapping STC to extend the sequence in a particular direction or (B) the identified STC with minimum overlap comes from a nonoverlapping clone, either owing to random false matches or repeat-family homology. Based on the minimum overlap, we estimate the number of clones to be entirely sequenced and, then, using cost estimates, identify the decision rule (the degree of sequence similarity required before a match is declared between an STC and a clone) to minimize overall sequencing cost. A method to optimize the overlap decision rule is highly desirable, because both the total cost and the number of problem clones are shown to be highly sensitive to this choice. For a target of 3 Gb containing ∼800 Mb of repeats with 85%–90% identity, we expect <10 problem clones with 15 times coverage by 150-kb clones. We derive the optimal redundancy and insert sizes of clone libraries for sequencing genomes of various sizes, from microbial to human. 
We estimate that establishing the resource of STCs as a means of identifying minimally overlapping clones represents only 1%–3% of the total cost of sequencing the human genome, and, up to a point of diminishing returns, a larger STC resource is associated with a smaller total sequencing cost.
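The coverage figures quoted above translate into library sizes with simple arithmetic. The back-of-the-envelope Python sketch below is not the paper's probabilistic model; it assumes clones of uniform length, two STCs per clone (one from each insert end) and evenly spread STCs, and the function name is hypothetical.

```python
def stc_library_estimates(genome_bp: float, coverage: float, insert_bp: float) -> dict:
    """Rough size of an STC resource, assuming uniform-length clones and
    STCs spread evenly across the genome (a simplification)."""
    clones = coverage * genome_bp / insert_bp  # clones needed for the requested coverage
    stcs = 2 * clones                          # one end sequence from each insert end
    spacing_bp = genome_bp / stcs              # mean distance between adjacent STCs
    stcs_per_seed = insert_bp / spacing_bp     # STCs expected inside one fully sequenced clone
    return {
        "clones": clones,
        "stcs": stcs,
        "mean_stc_spacing_bp": spacing_bp,
        "stcs_per_seed_clone": stcs_per_seed,
    }


if __name__ == "__main__":
    est = stc_library_estimates(genome_bp=3e9, coverage=15, insert_bp=150e3)
    for key, value in est.items():
        print(f"{key}: {value:,.0f}")
```

For a 3-Gb target at 15-fold coverage with 150-kb clones this gives 300,000 clones, 600,000 STCs, an average STC spacing of 5 kb, and about 30 STCs expected within a fully sequenced 150-kb seed clone, which is why a nearby overlapping clone can usually be found; the paper's model adds repeats and sequencing errors, which this sketch ignores.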
30
Siegel AF, Trask B, Roach JC, Mahairas GG, Hood L, van den Engh G. Analysis of sequence-tagged-connector strategies for DNA sequencing. Genome Res 1999; 9:297-307. [PMID: 10077536 PMCID: PMC310733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Affiliation(s)
- A F Siegel
- Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195 USA.
31
Stephen Lasky LR, Hood L. Deciphering Genomes Through Automated Large-scale Sequencing. J Microbiol Methods 1999. [DOI: 10.1016/s0580-9517(08)70204-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
32
Thayer EC, Olson MV, Karp RM. Error Checking and Graphical Representation of Multiple–Complete–Digest (MCD) Restriction-Fragment Maps. Genome Res 1999. [DOI: 10.1101/gr.9.1.79] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
Abstract
Genetic and physical maps display the relative positions of objects or markers occurring within a target DNA molecule. In constructing maps, the primary objective is to determine the ordering of these objects. A further objective is to assign a coordinate to each object, indicating its distance from a reference end of the target molecule. This paper describes a computational method and a body of software for assigning coordinates to map objects, given a solution or partial solution to the ordering problem. We describe our method in the context of multiple–complete–digest (MCD) mapping, but it should be applicable to a variety of other mapping problems. Because of errors in the data or insufficient clone coverage to uniquely identify the true ordering of the map objects, a partial ordering is typically the best one can hope for. Once a partial ordering has been established, one often seeks to overlay a metric along the map to assess the distances between the map objects. This problem often proves intractable because of data errors such as erroneous local length measurements (e.g., large clone lengths on low-resolution physical maps). We present a solution to the coordinate assignment problem for MCD restriction-fragment mapping, in which a coordinated set of single-enzyme restriction maps is constructed simultaneously. We show that the coordinate assignment problem can be expressed as the solution of a system of linear constraints. If the linear system is free of inconsistencies, it can be solved using the standard Bellman–Ford algorithm. In the more typical case where the system is inconsistent, our program perturbs it to find a new consistent system of linear constraints, close to those of the given inconsistent system, using a modified Bellman–Ford algorithm. Examples are provided of simple map inconsistencies and the methods by which our program detects candidate data errors and directs the user to potential suspect regions of the map.
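The consistent case described above is the classic system of difference constraints: each bound has the form x_j - x_i <= w, and shortest-path distances from a virtual source give coordinates that satisfy every bound. The Python sketch below shows only that textbook reduction. It is not the authors' program, it does not implement their modified Bellman-Ford perturbation of inconsistent systems, and encoding fragment-length bounds as `(i, j, w)` triples is an assumption made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (i, j, w) encodes the bound x_j - x_i <= w between two map coordinates.
Constraint = Tuple[str, str, float]


def solve_difference_constraints(
    variables: List[str], constraints: List[Constraint]
) -> Optional[Dict[str, float]]:
    """Assign coordinates satisfying every bound, or return None if the
    constraint graph contains a negative cycle (an inconsistent system)."""
    # Starting every distance at 0 is equivalent to a virtual source joined
    # to each variable by a 0-weight edge.
    dist = {v: 0.0 for v in variables}
    # With the virtual source there are len(variables) + 1 nodes, so
    # len(variables) rounds of edge relaxation suffice.
    for _ in range(len(variables)):
        changed = False
        for i, j, w in constraints:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                changed = True
        if not changed:
            return dist
    # If any edge can still be relaxed, a negative cycle exists.
    for i, j, w in constraints:
        if dist[i] + w < dist[j]:
            return None
    return dist


if __name__ == "__main__":
    # Fragment A-B is 10-12 units long and fragment B-C is 5-6 units long.
    bounds = [("A", "B", 12), ("B", "A", -10), ("B", "C", 6), ("C", "B", -5)]
    print(solve_difference_constraints(["A", "B", "C"], bounds))
    # Requiring C to lie at most 14 units beyond A contradicts those lengths.
    print(solve_difference_constraints(["A", "B", "C"], bounds + [("A", "C", 14)]))
```

The first call yields A = -15, B = -5, C = 0, which shifts to positions 0, 10 and 15. The second returns `None` because the added bound closes a negative cycle (14 - 5 - 10 = -1), the kind of inconsistency that the error checking described in the abstract is meant to flag.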
33
Thayer EC, Olson MV, Karp RM. Error checking and graphical representation of multiple-complete-digest (MCD) restriction-fragment maps. Genome Res 1999; 9:79-90. [PMID: 9927487 PMCID: PMC310706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/1998] [Accepted: 12/01/1998] [Indexed: 02/10/2023]
Affiliation(s)
- E C Thayer
- University of Washington Genome Center, Seattle, Washington 98195, USA
34
Guillaudeux T, Janer M, Wong GK, Spies T, Geraghty DE. The complete genomic sequence of 424,015 bp at the centromeric end of the HLA class I region: gene content and polymorphism. Proc Natl Acad Sci U S A 1998; 95:9494-9. [PMID: 9689108 PMCID: PMC21366 DOI: 10.1073/pnas.95.16.9494] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]
Abstract
We report here the genomic sequence of the centromeric portion of HLA class I, extending 424,015 bp from tumor necrosis factor alpha to a newly identified gene approximately 20 kb telomeric of Otf-3. As a source of DNA, we used cosmids centromeric of HLA-B that had been mapped previously with conventional restriction digestion and fingerprinting and previously characterized yeast artificial chromosomes subcloned into cosmids and mapped with multiple complete digest methodologies. The data presented provide a description of the gene content of centromeric HLA class I including new data on intron, promoter and flanking sequences of previously described genes, and a description of putative new genes that remain to be characterized beyond the structural information uncovered. A complete accounting of the repeat structure including abundant di-, tri-, and tetranucleotide microsatellite loci yielded access to precisely localized mapping tools for the major histocompatibility complex. Comparative analysis of a highly polymorphic region between HLA-B and -C was carried out by sequencing over 40 kb of overlapping sequence from two haplotypes. The levels of variation observed were much higher than those seen in other regions of the genome and indeed were higher than those observed between allelic HLA class I loci.
Affiliation(s)
- T Guillaudeux
- The Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue, D2-100, Seattle WA 98109, USA
35
Janer M, Geraghty DE. The human major histocompatibility complex: 42,221 bp of genomic sequence, high-density sequence-tagged site map, evolution, and polymorphism for HLA class I. Genomics 1998; 51:35-44. [PMID: 9693031 DOI: 10.1006/geno.1998.5377] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
We report the isolation and characterization of newly identified yeast artificial chromosome (YAC) and bacterial artificial chromosome (BAC) clones spanning the HLA class I region between HLA-C and HLA-E, and of YACs extending telomeric of HLA-F. Together with previously characterized HLA class I YACs, these clones form a contiguous stretch of over 2.4 Mb, including the entire class I region, isolated as a series of overlapping YAC and BAC clones. Evidence that the cloned DNA faithfully represents the source genomic DNA was obtained by extensive characterization of the YACs and by independent isolation of two or more overlapping YACs or BACs spanning the entire region. This work identified over 80 unique sequence probes, most of which were sequenced to yield 42,221 bp of new major histocompatibility complex (MHC)-derived sequence. Some of these data were converted into sequence-tagged site (STS) primer sets, facilitating the isolation of all or nearly all of HLA class I from a variety of genomic libraries. The sequence data were analyzed for protein-coding capacity and homology to existing expressed sequence tags, and tested for conservation in other mammalian genomes. These results indicated that large portions of the HLA class I region are conserved among mammals. Measurements of polymorphism within non-HLA class I loci yielded additional data of potential relevance to MHC-associated diseases. The combined data and clones presented here set the stage for the determination of the complete nucleotide sequence of HLA class I.
Affiliation(s)
- M Janer
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., Seattle, Washington, 98109, USA
36
Trask BJ, Friedman C, Martin-Gallardo A, Rowen L, Akinbami C, Blankenship J, Collins C, Giorgi D, Iadonato S, Johnson F, Kuo WL, Massa H, Morrish T, Naylor S, Nguyen OT, Rouquier S, Smith T, Wong DJ, Youngblom J, van den Engh G. Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. Hum Mol Genet 1998; 7:13-26. [PMID: 9384599 DOI: 10.1093/hmg/7.1.13] [Citation(s) in RCA: 164] [Impact Index Per Article: 6.3] [Indexed: 02/05/2023]
Abstract
We have identified three new members of the olfactory receptor (OR) gene family within a large segment of DNA that is duplicated with high similarity near many human telomeres. This segment is present at 3q, 15q, and 19p in each of 45 unrelated humans sampled from various populations. Additional copies are present polymorphically at 11 other subtelomeric locations. The frequency with which the block is present at some locations varies among populations. While humans carry seven to 11 copies of the OR-containing block, it is located in chimpanzee and gorilla predominantly at a single site, which is not orthologous to any of the locations in the human genome. The observation that sequences flanking the OR-containing segment are duplicated on larger and different sets of chromosomes than the OR block itself demonstrates that the segment is part of a much larger, complex patchwork of subtelomeric duplications. The population analyses and structural results suggest the types of processes that have shaped these regions during evolution. From its sequence, one of the OR genes in this duplicated block appears to be potentially functional. Our findings raise the possibility that functional diversity in the OR family is generated in part through duplications and inter-chromosomal rearrangements of the DNA near human telomeres.
Affiliation(s)
- B J Trask
- Department of Molecular Biotechnology, Box 357730, University of Washington, Seattle, WA 98195, USA.
37
Affiliation(s)
- J D McPherson
- Department of Genetics and the Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri 63108 USA.
38
Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH. High throughput fingerprint analysis of large-insert clones. Genome Res 1997; 7:1072-84. [PMID: 9371743 PMCID: PMC310686 DOI: 10.1101/gr.7.11.1072] [Citation(s) in RCA: 316] [Impact Index Per Article: 11.7] [Received: 05/05/1997] [Accepted: 09/12/1997] [Indexed: 02/05/2023]
Abstract
As part of the Human Genome Project, the Washington University Genome Sequencing Center has begun systematic sequencing of human chromosome 7. To organize and supply this effort, we have undertaken the construction of sequence-ready physical maps for defined chromosomal intervals. Map construction is a serial process with three main activities. First, candidate STS-positive large-insert PAC and BAC clones are identified. Next, these candidate clones are subjected to fingerprint analysis. Finally, the fingerprint data are used to assemble sequence-ready maps. The fingerprinting method we have devised is key to the success of the overall approach. We present the details of the method and show that the fingerprints are of sufficient quality to permit the construction of megabase-size contigs in defined regions of the human genome. We anticipate that the high throughput and precision of our fingerprinting method will make it of general utility.
Affiliation(s)
- M A Marra
- Washington University School of Medicine, Genome Sequencing Center, St. Louis, Missouri 63108, USA.
39
Bouffard GG, Idol JR, Braden VV, Iyer LM, Cunningham AF, Weintraub LA, Touchman JW, Mohr-Tidwell RM, Peluso DC, Fulton RS, Ueltzen MS, Weissenbach J, Magness CL, Green ED. A physical map of human chromosome 7: an integrated YAC contig map with average STS spacing of 79 kb. Genome Res 1997; 7:673-92. [PMID: 9253597 DOI: 10.1101/gr.7.7.673] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.5] [Indexed: 02/05/2023]
Abstract
The construction of highly integrated and annotated physical maps of human chromosomes is a critical goal of the ongoing Human Genome Project. Our laboratory has focused on developing a physical map of human chromosome 7, an approximately 170-Mb segment of DNA that corresponds to an estimated 5% of the human genome. Using a yeast artificial chromosome (YAC)-based sequence-tagged site (STS)-content mapping strategy, 2150 chromosome 7-specific STSs have been established and mapped to a collection of YACs highly enriched for chromosome 7 DNA. The STSs correspond to sequences generated from a variety of DNA sources, with particular emphasis on YAC insert ends, genetic markers, and genes. The YACs include a set of relatively nonchimeric clones from a human-hamster hybrid cell line as well as clones isolated from total genomic libraries. For map integration, we have localized 260 STSs corresponding to Genethon genetic markers and 259 STSs corresponding to markers ordered by radiation hybrid (RH) mapping on our YAC contigs. Analysis of the data with the program SEGMAP results in the assembly of 22 contigs that are "anchored" on the Genethon genetic map, the RH map, and/or the cytogenetic map. These 22 contigs are ordered relative to one another, are (in all but 3 cases) oriented relative to the centromere and telomeres, and contain >98% of the mapped STSs. The largest anchored YAC contig, accounting for most of 7p, contains 634 STSs and 1260 YACs. An additional 14 contigs, accounting for approximately 1.5% of the mapped STSs, are assembled but remain unanchored on either the genetic or the RH map. Therefore, these 14 "orphan" contigs are not ordered relative to other contigs. In our contig maps, adjacent STSs are connected by two or more YACs in >95% of cases. With 2150 mapped STSs, our map provides an average STS spacing of approximately 79 kb. The physical map we report here exceeds the goal of 100-kb average STS spacing and should provide an excellent framework for systematic sequencing of the chromosome.
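The quoted average spacing matches a simple estimate: divide the chromosome's approximate size by the number of mapped STSs. This assumes the 2150 STSs are spread over the whole ~170 Mb.

```latex
\text{average STS spacing} \approx \frac{170\ \text{Mb}}{2150\ \text{STSs}}
  = \frac{170{,}000\ \text{kb}}{2150} \approx 79\ \text{kb per STS}
  \quad (< 100\ \text{kb target})
```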
Affiliation(s)
- G G Bouffard
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA