1
Midha MK, Wu M, Chiu KP. Long-read sequencing in deciphering human genetics to a greater depth. Hum Genet 2019; 138:1201-1215. [PMID: 31538236 DOI: 10.1007/s00439-019-02064-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 05/22/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
After four decades of development, DNA sequencing has entered the era of single-molecule sequencing (SMS), or third-generation sequencing (TGS), represented by two distinct technical approaches developed independently by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Historically, each generation of sequencing technology has been marked by innovative technical achievements and novel applications. Long reads (LRs), shared by both PacBio and ONT, are considered the most advantageous feature distinguishing SMS from next-generation sequencing (NGS, or second-generation sequencing) and Sanger sequencing (first-generation sequencing). Long reads overcome the limitations of NGS and drastically improve the quality of genome assembly. In addition, ONT contributes several unique features, including ultra-long reads (ULRs) with read lengths above 300 kb (some close to 1 million bp), direct RNA sequencing, and superior portability made possible by the pocket-sized MinION sequencer. Here, we review the history of DNA sequencing technologies and associated applications, with a special focus on the advantages and limitations of ULR sequencing in genome assembly.
Affiliation(s)
- Mohit K Midha
- Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan; Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan
- Mengchu Wu
- Health GeneTech, 22F No. 99, Xin Pu 6th St., Taoyuan, Taiwan
- Kuo-Ping Chiu
- Genomics Research Center, Academia Sinica, 128 Academia Road, Sec. 2, Nankang District, Taipei, 115, Taiwan; Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan; Department of Life Sciences, College of Life Sciences, National Taiwan University, Taipei, Taiwan.
2
Huang M, Tu J, Lu Z. Recent Advances in Experimental Whole Genome Haplotyping Methods. Int J Mol Sci 2017; 18:E1944. [PMID: 28891974 PMCID: PMC5618593 DOI: 10.3390/ijms18091944] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/03/2017] [Revised: 09/01/2017] [Accepted: 09/05/2017] [Indexed: 01/06/2023]
Abstract
Haplotypes play a vital role in diverse fields; however, current sequencing technologies cannot resolve haplotypes directly. Pioneering studies demonstrated several approaches to resolving haplotypes in the early years, and these have been extensively reviewed. Since then, numerous methods have been developed that have significantly improved phasing performance. Here, we review experimental methods that have emerged mainly over the past five years, and categorize them into five classes according to their maximum scale of contiguity: (i) encapsulation, (ii) 3D structure capture and construction, (iii) compartmentalization, (iv) fluorography, and (v) long-read sequencing. Representative methods are described under each class. We also discuss the relative advantages and disadvantages of the different classes and compare representative methods from each class.
Affiliation(s)
- Mengting Huang
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.
- Jing Tu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.
- Zuhong Lu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.
3
Abstract
Deciphering the genetic basis of human disease requires a comprehensive knowledge of genetic variants irrespective of their class or frequency. Although an impressive number of human genetic variants have been catalogued, a large fraction of the genetic difference that distinguishes two human genomes is still not understood at the base-pair level. This is because the emphasis has been on single-nucleotide variation rather than on less tractable and more complex genetic variants, including indels and structural variants. The latter, we propose, will have a large impact on human phenotypes but require a more systematic assessment of genomes at deeper coverage and with alternative sequencing and mapping technologies.
4
Abstract
Human genomes are diploid and, for their complete description and interpretation, it is necessary not only to discover the variation they contain but also to arrange it onto chromosomal haplotypes. Although whole-genome sequencing is becoming increasingly routine, nearly all such individual genomes remain unresolved with respect to haplotype, particularly for rare alleles, which are poorly resolved by inferential methods. Here, we review emerging technologies for experimentally resolving (that is, 'phasing') haplotypes across individual whole-genome sequences. We also discuss computational methods relevant to their implementation, metrics for assessing their accuracy and completeness, and the relevance of haplotype information to applications of genome sequencing in research and clinical medicine.
5
Starkenburg SR, Kwon KJ, Jha RK, McKay C, Jacobs M, Chertkov O, Twary S, Rocap G, Cattolico RA. A pangenomic analysis of the Nannochloropsis organellar genomes reveals novel genetic variations in key metabolic genes. BMC Genomics 2014; 15:212. [PMID: 24646409 PMCID: PMC3999925 DOI: 10.1186/1471-2164-15-212] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 04/24/2013] [Accepted: 03/11/2014] [Indexed: 01/21/2023]
Abstract
BACKGROUND Microalgae in the genus Nannochloropsis are photosynthetic marine Eustigmatophytes of significant interest to the bioenergy and aquaculture sectors due to their ability to efficiently accumulate biomass and lipids for use in renewable transportation fuels, aquaculture feed, and other useful bioproducts. To better understand the genetic complement that drives the metabolic processes of these organisms, we present the assembly and comparative pangenomic analysis of the chloroplast and mitochondrial genomes from Nannochloropsis salina CCMP1776. RESULTS The chloroplast and mitochondrial genomes of N. salina are 98.4% and 97% identical to their counterparts in Nannochloropsis gaditana. Comparison of the Nannochloropsis pangenome to other algae within and outside the same phylum revealed regions of significant genetic divergence in key genes that encode proteins needed for regulation of branched-chain amino acid synthesis (acetohydroxyacid synthase), carbon fixation (RuBisCO activase), energy conservation (ATP synthase), and protein synthesis and homeostasis (Clp protease, ribosome). CONCLUSIONS Many organellar gene modifications in Nannochloropsis are unique and deviate from conserved orthologs found across the tree of life. Secondary and tertiary structure prediction was crucial for functionally characterizing many proteins and should therefore be implemented in automated annotation pipelines. The exceptional similarity of the N. salina and N. gaditana organellar genomes suggests that N. gaditana be reclassified as a strain of N. salina.
Affiliation(s)
- Shawn R Starkenburg
- Bioscience Division, Los Alamos National Laboratory, Los Alamos 87545, NM, USA
- Kyungyoon J Kwon
- Bioscience Division, Los Alamos National Laboratory, Los Alamos 87545, NM, USA
- Department of Molecular and Cell Biology, University of California-Berkeley, Berkeley 94720, CA, USA
- Ramesh K Jha
- Bioscience Division, Los Alamos National Laboratory, Los Alamos 87545, NM, USA
- Cedar McKay
- School of Oceanography, University of Washington, Seattle 98195, WA, USA
- Michael Jacobs
- Biology Department, University of Washington, Seattle 98195, WA, USA
- Olga Chertkov
- Bioscience Division, Los Alamos National Laboratory, Los Alamos 87545, NM, USA
- Scott Twary
- Bioscience Division, Los Alamos National Laboratory, Los Alamos 87545, NM, USA
- Gabrielle Rocap
- School of Oceanography, University of Washington, Seattle 98195, WA, USA
6
Pyo CW, Wang R, Vu Q, Cereb N, Yang SY, Duh FM, Wolinsky S, Martin MP, Carrington M, Geraghty DE. Recombinant structures expand and contract inter and intragenic diversification at the KIR locus. BMC Genomics 2013; 14:89. [PMID: 23394822 PMCID: PMC3606631 DOI: 10.1186/1471-2164-14-89] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 08/09/2012] [Accepted: 01/26/2013] [Indexed: 01/21/2023]
Abstract
Background The human KIR genes are arranged in at least six major gene-content haplotypes, all of which are combinations of four centromeric and two telomeric motifs. Several less frequent or minor haplotypes also exist, including insertions, deletions, and hybridization of KIR genes derived from the major haplotypes. These haplotype structures and their concomitant linkage disequilibrium among KIR genes suggest that more meaningful correlative data from studies of KIR genetics and complex disease may be obtained by measuring haplotypes of the KIR region in total. Results Towards that end, we developed a KIR haplotyping method that reports unambiguous combinations of KIR gene-content haplotypes, including both phase and copy number for each KIR. A total of 37 different gene-content haplotypes were detected in 4,512 individuals, and new sequence data were derived from haplotypes whose detailed structure was not previously available. Conclusions These new structures suggest a number of specific recombination events during the course of KIR evolution, and add to an expanding diversity of potential new KIR haplotypes derived from gene duplication, deletion, and hybridization.
Affiliation(s)
- Chul-Woo Pyo
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
7
Shen S, Pyo CW, Vu Q, Wang R, Geraghty DE. The Essential Detail: The Genetics and Genomics of the Primate Immune Response. ILAR J 2013; 54:181-95. [DOI: 10.1093/ilar/ilt043] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023]
8
Pyo CW, Guethlein LA, Vu Q, Wang R, Abi-Rached L, Norman PJ, Marsh SGE, Miller JS, Parham P, Geraghty DE. Different patterns of evolution in the centromeric and telomeric regions of group A and B haplotypes of the human killer cell Ig-like receptor locus. PLoS One 2010; 5:e15115. [PMID: 21206914 PMCID: PMC3012066 DOI: 10.1371/journal.pone.0015115] [Citation(s) in RCA: 184] [Impact Index Per Article: 13.1] [Received: 08/05/2010] [Accepted: 10/25/2010] [Indexed: 12/21/2022]
Abstract
The fast-evolving human KIR gene family encodes variable lymphocyte receptors specific for polymorphic HLA class I determinants. Nucleotide sequences for 24 representative human KIR haplotypes were determined. With three previously defined haplotypes, this gave a set of 12 group A and 15 group B haplotypes for assessment of KIR variation. The seven gene-content haplotypes are all combinations of four centromeric and two telomeric motifs. 2DL5, 2DS5 and 2DS3 can be present in centromeric and telomeric locations. With one exception, haplotypes having identical gene content differed in their combinations of KIR alleles. Sequence diversity varied between haplotype groups and between centromeric and telomeric halves of the KIR locus. The most variable A haplotype genes are in the telomeric half, whereas the most variable genes characterizing B haplotypes are in the centromeric half. Of the highly polymorphic genes, only the 3DL3 framework gene exhibits a similar diversity when carried by A and B haplotypes. Phylogenetic analysis and divergence time estimates indicate that the centromeric gene-content motifs that distinguish A and B haplotypes emerged ∼6 million years ago, contemporaneously with the separation of human and chimpanzee ancestors. In contrast, the telomeric motifs that distinguish A and B haplotypes emerged more recently, ∼1.7 million years ago, before the emergence of Homo sapiens. Thus the centromeric and telomeric motifs that typify A and B haplotypes have likely been present throughout human evolution. The results suggest that the common ancestor of A and B haplotypes combined a B-like centromeric region with an A-like telomeric region.
Affiliation(s)
- Chul-Woo Pyo
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Lisbeth A. Guethlein
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California, United States of America
- Quyen Vu
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Ruihan Wang
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Laurent Abi-Rached
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California, United States of America
- Paul J. Norman
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California, United States of America
- Jeffrey S. Miller
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Peter Parham
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California, United States of America
- Daniel E. Geraghty
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
9
Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol 2010; 29:59-63. [PMID: 21170042 DOI: 10.1038/nbt.1740] [Citation(s) in RCA: 184] [Impact Index Per Article: 13.1] [Received: 10/26/2010] [Accepted: 11/29/2010] [Indexed: 11/08/2022]
Abstract
Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.
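The claim that each pool yields reads "overwhelmingly derived from only one homologous chromosome" can be checked with a back-of-envelope Poisson calculation. The sketch below assumes, as our own simplification not stated in the abstract, that fosmids land independently and uniformly, so that ~3% physical coverage of the diploid genome corresponds to an expected clone depth of about 0.03 on each haplotype at any position. It also includes a generic N50 helper of the kind used to summarize haplotype block lengths; both function names are illustrative.

```python
import math
from typing import Iterable


def p_both_haplotypes(per_haplotype_depth: float) -> float:
    """Probability that a covered position in one pool is spanned by clones
    from BOTH homologs, assuming Poisson clone placement with the same
    expected depth on each haplotype (an illustrative assumption)."""
    p_one = 1.0 - math.exp(-per_haplotype_depth)        # P(>=1 clone on one haplotype)
    p_any = 1.0 - math.exp(-2.0 * per_haplotype_depth)  # P(>=1 clone on either haplotype)
    return p_one * p_one / p_any                         # P(both | at least one)


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that blocks of length >= L together cover
    at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires a non-empty list of lengths")


if __name__ == "__main__":
    print(f"{p_both_haplotypes(0.03):.4f}")
```

Algebraically the expression reduces to tanh(λ/2), which is about λ/2 for small λ, so at a per-haplotype depth of 0.03 only about 1.5% of covered positions in a pool would carry both homologs under these assumptions. The `n50` helper applies equally to haplotype blocks, contigs, or reads.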
10
Analysis of the genome of the Escherichia coli O157:H7 2006 spinach-associated outbreak isolate indicates candidate genes that may enhance virulence. Infect Immun 2009; 77:3713-21. [PMID: 19564389 DOI: 10.1128/iai.00198-09] [Citation(s) in RCA: 126] [Impact Index Per Article: 8.4] [Indexed: 11/20/2022]
Abstract
In addition to causing diarrhea, Escherichia coli O157:H7 infection can lead to hemolytic-uremic syndrome (HUS), a severe disease characterized by hemolysis and renal failure. Differences in HUS frequency among E. coli O157:H7 outbreaks have been noted, but our understanding of bacterial factors that promote HUS is incomplete. In 2006, in an outbreak of E. coli O157:H7 caused by consumption of contaminated spinach, there was a notably high frequency of HUS. We sequenced the genome of the strain responsible (TW14359) with the goal of identifying candidate genetic factors that contribute to an enhanced ability to cause HUS. The TW14359 genome contains 70 kb of DNA segments not present in either of the two reference O157:H7 genomes. We identified seven putative virulence determinants, including two putative type III secretion system effector proteins, candidate genes that could result in increased pathogenicity or, alternatively, adaptation to plants, and an intact anaerobic nitric oxide reductase gene, norV. We surveyed 100 O157:H7 isolates for the presence of these putative virulence determinants. A norV deletion was found in over one-half of the strains surveyed and correlated strikingly with the absence of stx(1). The other putative virulence factors were found in 8 to 35% of the O157:H7 isolates surveyed, and their presence also correlated with the presence of norV and the absence of stx(1), indicating that the presence of norV may serve as a marker of a greater propensity for HUS, similar to the correlation between the absence of stx(1) and a propensity for HUS.
11
Genome sequence of the fish pathogen Renibacterium salmoninarum suggests reductive evolution away from an environmental Arthrobacter ancestor. J Bacteriol 2008; 190:6970-82. [PMID: 18723615 DOI: 10.1128/jb.00721-08] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022]
Abstract
Renibacterium salmoninarum is the causative agent of bacterial kidney disease and a significant threat to healthy and sustainable production of salmonid fish worldwide. This pathogen is difficult to culture in vitro, genetic manipulation is challenging, and current therapies and preventative strategies are only marginally effective in preventing disease. The complete genome of R. salmoninarum ATCC 33209 was sequenced and shown to be a 3,155,250-bp circular chromosome that is predicted to contain 3,507 open-reading frames (ORFs). A total of 80 copies of three different insertion sequence elements are interspersed throughout the genome. Approximately 21% of the predicted ORFs have been inactivated via frameshifts, point mutations, insertion sequences, and putative deletions. The R. salmoninarum genome has extended regions of synteny to the Arthrobacter sp. strain FB24 and Arthrobacter aurescens TC1 genomes, but it is approximately 1.9 Mb smaller than both Arthrobacter genomes and has a lower G+C content, suggesting that significant genome reduction has occurred since divergence from the last common ancestor. A limited set of putative virulence factors appear to have been acquired via horizontal transmission after divergence of the species; these factors include capsular polysaccharides, heme sequestration molecules, and the major secreted cell surface antigen p57 (also known as major soluble antigen). Examination of the genome revealed a number of ORFs homologous to antibiotic resistance genes, including genes encoding beta-lactamases, efflux proteins, macrolide glycosyltransferases, and rRNA methyltransferases. The genome sequence provides new insights into R. salmoninarum evolution and may facilitate identification of chemotherapeutic targets and vaccine candidates that can be used for prevention and treatment of infections in cultured salmonids.
12
Dapprich J, Ferriola D, Magira EE, Kunkel M, Monos D. SNP-specific extraction of haplotype-resolved targeted genomic regions. Nucleic Acids Res 2008; 36:e94. [PMID: 18611953 PMCID: PMC2528194 DOI: 10.1093/nar/gkn345] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/22/2023]
Abstract
The availability of genotyping platforms for comprehensive genetic analysis of complex traits has resulted in a plethora of studies reporting the association of specific single-nucleotide polymorphisms (SNPs) with common diseases or drug responses. However, detailed genetic analysis of these associated regions that would correlate particular polymorphisms to phenotypes has lagged. This is primarily due to the lack of technologies that provide additional sequence information about genomic regions surrounding specific SNPs, preferably in haploid form. Enrichment methods for resequencing should have the specificity to provide DNA linked to SNPs of interest with sufficient quality to be used in a cost-effective and high-throughput manner. We describe a simple, automated method of targeting specific sequences of genomic DNA that can directly be used in downstream applications. The method isolates haploid chromosomal regions flanking targeted SNPs by hybridizing and enzymatically elongating oligonucleotides with biotinylated nucleotides based on their selective binding to unique sequence elements that differentiate one allele from any other differing sequence. The targeted genomic region is captured by streptavidin-coated magnetic particles and analyzed by standard genotyping, sequencing or microarray analysis. We applied this technology to determine contiguous molecular haplotypes across a ∼150 kb genomic region of the major histocompatibility complex.
13
Cattolico RA, Jacobs MA, Zhou Y, Chang J, Duplessis M, Lybrand T, McKay J, Ong HC, Sims E, Rocap G. Chloroplast genome sequencing analysis of Heterosigma akashiwo CCMP452 (West Atlantic) and NIES293 (West Pacific) strains. BMC Genomics 2008; 9:211. [PMID: 18462506 PMCID: PMC2410131 DOI: 10.1186/1471-2164-9-211] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 10/19/2007] [Accepted: 05/08/2008] [Indexed: 11/16/2022]
Abstract
BACKGROUND Heterokont algae form a monophyletic group within the stramenopile branch of the tree of life. These organisms display wide morphological diversity, ranging from minute unicells to massive, bladed forms. Surprisingly, chloroplast genome sequences are available only for diatoms, representing two (Coscinodiscophyceae and Bacillariophyceae) of approximately 18 classes of algae that comprise this taxonomic cluster. A universal challenge to chloroplast genome sequencing studies is the retrieval of highly purified DNA in quantities sufficient for analytical processing. To circumvent this problem, we have developed a simplified method for sequencing chloroplast genomes, using fosmids selected from a total cellular DNA library. The technique has been used to sequence the chloroplast DNA of two Heterosigma akashiwo strains. This raphidophyte has served as a model system for studies of stramenopile chloroplast biogenesis and evolution. RESULTS H. akashiwo strain CCMP452 (West Atlantic) chloroplast DNA is 160,149 bp in size with a 21,822-bp inverted repeat, whereas NIES293 (West Pacific) chloroplast DNA is 159,370 bp in size and has an inverted repeat of 21,665 bp. The fosmid cloning technique reveals that both strains contain an isomeric chloroplast DNA population resulting from an inversion of their single-copy domains. Both strains contain multiple small inverted and tandem repeats, non-randomly distributed within the genomes. Although the CCMP452 and NIES293 chloroplast DNAs both contain 197 genes, multiple nucleotide polymorphisms are present in both coding and intergenic regions. Several protein-coding genes contain large, in-frame inserts relative to orthologous genes in other plastids. These inserts are maintained in mRNA products. Two genes of interest in H. akashiwo, not previously reported in any chloroplast genome, include tyrC, a tyrosine recombinase, which we hypothesize may have resulted from a lateral gene transfer event, and an unidentified 456-amino-acid protein, which we hypothesize serves as a G-protein-coupled receptor. The H. akashiwo chloroplast genomes share little synteny with other algal chloroplast genomes sequenced to date. CONCLUSION The fosmid cloning technique eliminates chloroplast isolation, does not require chloroplast DNA purification, and reduces sequencing processing time. Application of this method has provided new insights into chloroplast genome architecture, gene content and evolution within the stramenopile cluster.
MESH Headings
- Algal Proteins/genetics
- Amino Acid Sequence
- Atlantic Ocean
- Base Sequence
- Chromosome Mapping
- Cloning, Molecular
- Conserved Sequence
- DNA, Algal/genetics
- DNA, Algal/isolation & purification
- DNA, Chloroplast/genetics
- DNA, Chloroplast/isolation & purification
- Furans
- Genome, Chloroplast
- Molecular Sequence Data
- Pacific Ocean
- Phaeophyceae/classification
- Phaeophyceae/genetics
- Phaeophyceae/isolation & purification
- Polymorphism, Single Nucleotide
- Recombinases/genetics
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA/methods
- Sequence Homology, Amino Acid
- Species Specificity
- Thiophenes
Affiliation(s)
- Rose Ann Cattolico
- Department of Biology, University of Washington, Box 355325, Seattle, WA 98195-5325, USA
- School of Oceanography, University of Washington, Box 357940, Seattle, WA 98195-7940, USA
- Michael A Jacobs
- Department of Medicine, University of Washington, Box 352145, Seattle WA 98195-2145, USA
- Yang Zhou
- Department of Medicine, University of Washington, Box 352145, Seattle WA 98195-2145, USA
- Jean Chang
- Department of Medicine, University of Washington, Box 352145, Seattle WA 98195-2145, USA
- Melinda Duplessis
- Department of Biology, University of Washington, Box 355325, Seattle, WA 98195-5325, USA
- Terry Lybrand
- Vanderbilt University Center for Structural Biology, 5142 Biosci/MRB III, Nashville, TN 37232-8725, USA
- John McKay
- School of Oceanography, University of Washington, Box 357940, Seattle, WA 98195-7940, USA
- Han Chuan Ong
- Department of Biology, University of Washington, Box 355325, Seattle, WA 98195-5325, USA
- School of Oceanography, University of Washington, Box 357940, Seattle, WA 98195-7940, USA
- Division of Science, Lyon College, 2300 Highland Rd, Batesville, AR 72501-3629, USA
- Elizabeth Sims
- Department of Medicine, University of Washington, Box 352145, Seattle WA 98195-2145, USA
- Gabrielle Rocap
- School of Oceanography, University of Washington, Box 357940, Seattle, WA 98195-7940, USA
14
Hayden HS, Gillett W, Saenphimmachak C, Lim R, Zhou Y, Jacobs MA, Chang J, Rohmer L, D'Argenio DA, Palmieri A, Levy R, Haugen E, Wong GKS, Brittnacher MJ, Burns JL, Miller SI, Olson MV, Kaul R. Large-insert genome analysis technology detects structural variation in Pseudomonas aeruginosa clinical strains from cystic fibrosis patients. Genomics 2008; 91:530-7. [PMID: 18445516 DOI: 10.1016/j.ygeno.2008.02.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 11/13/2007] [Revised: 02/26/2008] [Accepted: 02/28/2008] [Indexed: 11/16/2022]
Abstract
Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the "genome universe" of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation.
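The comparison of fosmid end sequences against a reference rests on a simple geometric idea: the two ends of a fosmid whose insert matches the reference map at the expected distance and orientation, while structural variants distort one or the other. The sketch below illustrates only that idea and is not the GenVal software. The `EndHit` record, the inward-facing (+/-) orientation convention, and the 32 to 48 kb size window are our own assumptions for illustration (fosmid inserts are roughly 40 kb).

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    chrom: str    # reference sequence the end read maps to
    pos: int      # leftmost mapped coordinate on the reference
    strand: str   # "+" or "-"


# Hypothetical size window for a fosmid library; a real analysis would
# estimate it from the distribution of concordant pairs.
MIN_INSERT = 32_000
MAX_INSERT = 48_000


def classify_pair(a: EndHit, b: EndHit,
                  min_insert: int = MIN_INSERT,
                  max_insert: int = MAX_INSERT) -> str:
    """Label one fosmid end pair by how its reference mapping deviates
    from the expected span and orientation (illustrative sketch)."""
    if a.chrom != b.chrom:
        return "ends on different reference sequences"
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    # Assume inward-facing ends: leftmost end on "+", rightmost on "-".
    if (left.strand, right.strand) != ("+", "-"):
        return "orientation anomaly (inversion or rearrangement)"
    # Approximate span; ignores the length of the end read itself.
    span = right.pos - left.pos
    if span > max_insert:
        return "deletion in query"
    if span < min_insert:
        return "insertion in query"
    return "concordant"
```

A smaller-than-expected span means the query carries extra sequence between the two ends, i.e. an insertion relative to the reference; a larger span means sequence present in the reference is missing from the query. Discordant fosmids flagged this way are the candidates worth fully sequencing.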
Affiliation(s)
- Hillary S Hayden
- Genome Center, University of Washington, Seattle, WA 98195, USA.
15
Andrés AM, Clark AG, Shimmin L, Boerwinkle E, Sing CF, Hixson JE. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genet Epidemiol 2008; 31:659-71. [PMID: 17922479 DOI: 10.1002/gepi.20185] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Indexed: 01/23/2023]
Abstract
Statistical methods for haplotype inference from multi-site genotypes of unrelated individuals have important applications in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single-chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We use these phase-known genotypes and coalescent simulations to assess the accuracy of statistical haplotype reconstruction by several algorithms. Accuracy of phase inference was low in our biological data even for regions as short as 25-50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the estimated confidence in phase inference is not reliable enough to allow site-specific uncertainty information to be incorporated into subsequent analyses. We show that, in samples of certain mixed ancestry (AA and EA populations), the most accurate haplotypes are probably obtained by increasing sample size through use of the largest, pooled sample, despite the hypothetical problems associated with pooling across those heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed.
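The abstract does not say which accuracy measure was used, but one widely used way to score a reconstructed haplotype against phase-known data is the switch error rate: the fraction of adjacent heterozygous sites whose relative phase is wrong. A minimal sketch, assuming phase is encoded as 0/1 per heterozygous site (an encoding chosen here for illustration; `switch_error_rate` is a hypothetical helper, not from the paper):

```python
from typing import Sequence


def switch_error_rate(true_phase: Sequence[int],
                      inferred_phase: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    differs between the inferred and the known haplotype.

    Each sequence gives, for consecutive heterozygous sites, which allele
    (0 or 1) sits on haplotype 1. Only changes in agreement are counted,
    so the result does not depend on which inferred haplotype is labeled 1.
    """
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phase vectors must have equal length")
    if len(true_phase) < 2:
        return 0.0
    agree = [t == i for t, i in zip(true_phase, inferred_phase)]
    switches = sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)
    return switches / (len(agree) - 1)
```

Counting changes in agreement, rather than raw per-site mismatches, means a single switch in the middle of a region costs one error even though it flips the phase of every downstream site relative to the truth.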
Affiliation(s)
- Aida M Andrés
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA.
16
Closing gaps in the human genome with fosmid resources generated from multiple individuals. Nat Genet 2007; 40:96-101. [PMID: 18157130 DOI: 10.1038/ng.2007.34] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 03/12/2007] [Accepted: 09/19/2007] [Indexed: 01/31/2023]
Abstract
The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.
17
Microarray-based genomic selection for high-throughput resequencing. Nat Methods 2007; 4:907-9. [PMID: 17934469 DOI: 10.1038/nmeth1109] [Citation(s) in RCA: 277] [Impact Index Per Article: 16.3] [Received: 07/27/2007] [Accepted: 09/20/2007] [Indexed: 11/08/2022]
Abstract
We developed a general method, microarray-based genomic selection (MGS), capable of selecting and enriching targeted sequences from complex eukaryotic genomes without the repeat-blocking steps necessary for bacterial artificial chromosome (BAC)-based genomic selection. We demonstrate that large human genomic regions, on the order of hundreds of kilobases, can be enriched and resequenced with resequencing arrays. MGS, when combined with a next-generation resequencing technology, can enable large-scale resequencing in single-investigator laboratories.
18
Spencer DH, Bubb KL, Olson MV. Detecting disease-causing mutations in the human genome by haplotype matching. Am J Hum Genet 2006; 79:958-64. [PMID: 17033972 PMCID: PMC1698563 DOI: 10.1086/508757] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 05/22/2006] [Accepted: 08/29/2006] [Indexed: 11/04/2022]
Abstract
Comparisons between haplotypes from affected patients and the human reference genome are frequently used to identify candidates for disease-causing mutations, even though these alignments are expected to reveal a high level of background neutral polymorphism. This limits the scope of genetic studies to relatively small genomic intervals, because current methods for distinguishing potential causal mutations from neutral variation are inefficient. Here we describe a new strategy for detecting mutations that is based on comparing affected haplotypes with closely matched control sequences from healthy individuals, rather than with the human reference genome. We use theory, simulation, and a real data set to show that this approach is expected to reduce the number of sequence variants that must be subjected to follow-up analysis by at least a factor of 20 when closely matched control sequences are selected from a reference panel with as few as 100 control genomes. We also define a reference data resource that would allow efficient application of this strategy to large critical intervals across the genome.
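The core comparison can be illustrated with a toy example. An affected haplotype is compared once against the reference genome and once against the closest-matching control haplotype from a panel, and the two sets of differing sites are reported. This sketch assumes haplotypes are already aligned allele strings over the same interval, and it picks the best match by simple mismatch count. It is not the authors' pipeline, which the abstract does not detail, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def differing_sites(a: Sequence[str], b: Sequence[str]) -> List[int]:
    """Indices of the aligned sites at which two haplotypes carry different alleles."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same sites")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def candidate_sites(
    affected: str, controls: Sequence[str], reference: str
) -> Tuple[List[int], List[int]]:
    """Return (sites differing from the reference, sites differing from the best control).

    The best control is the panel haplotype with the fewest mismatches to the
    affected haplotype; the second list holds the variants that would remain
    for follow-up after matching.
    """
    best = min(controls, key=lambda c: len(differing_sites(affected, c)))
    return differing_sites(affected, reference), differing_sites(affected, best)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    affected = "ATGTCCGAAT"
    controls = ["ACGTACGTAC", "ATGTCCGAAC", "ACGTCCGTAC"]
    print(candidate_sites(affected, controls, reference))
    # ([1, 4, 7, 9], [9])
```

In this toy case, the closely matched control shares three of the four variants seen against the reference, so only one site is left as a candidate.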
Affiliation(s)
- David H Spencer
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
19
Kouprina N, Larionov V. TAR cloning: insights into gene function, long-range haplotypes and genome structure and evolution. Nat Rev Genet 2006; 7:805-12. [PMID: 16983376 DOI: 10.1038/nrg1943] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.3] [Indexed: 12/13/2022]
Abstract
The structural and functional analysis of mammalian genomes would benefit from the ability to isolate from multiple DNA samples any targeted chromosomal segment that is the size of an average human gene. A cloning technique that is based on transformation-associated recombination (TAR) in the yeast Saccharomyces cerevisiae satisfies this need. It is a unique tool to selectively recover chromosome segments that are up to 250 kb in length from complex genomes. In addition, TAR cloning can be used to characterize gene function and genome variation, including polymorphic structural rearrangements, mutations and the evolution of gene families, and for long-range haplotyping.
Affiliation(s)
- Natalay Kouprina
- Laboratory of Biosystems and Cancer, National Cancer Institute, National Institutes of Health, Building 37, Room 5032, 9000 Rockville Pike, Bethesda, Maryland 20892, USA.
20
Abstract
Three very recent reports provide convincing statistical evidence (P < 10^-8), at a genome-wide level, of the association of common polymorphisms with three different common diseases: systemic lupus erythematosus (IRF5), prostate cancer and type 1 diabetes (IFIH1 region). This adds to the trickle (soon to be a flood) of disease association results that are highly unlikely to be false positives. There are other convincing examples from the last 12 months: age-related macular degeneration (CFH), type 1 diabetes (IL2RA, also known as CD25) and type 2 diabetes (TCF7L2). Given 20 years of a literature full of irreproducible results, what has changed?
Affiliation(s)
- John A Todd
- University of Cambridge, Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, Addenbrooke's Hospital Cambridge, Cambridgeshire CB2 2XY, UK.
21
Bubb KL, Bovee D, Buckley D, Haugen E, Kibukawa M, Paddock M, Palmieri A, Subramanian S, Zhou Y, Kaul R, Green P, Olson MV. Scan of human genome reveals no new loci under ancient balancing selection. Genetics 2006; 173:2165-77. [PMID: 16751668 PMCID: PMC1569689 DOI: 10.1534/genetics.106.055715] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022]
Abstract
There has been much speculation as to what role balancing selection has played in evolution. In an attempt to identify regions, such as HLA, at which polymorphism has been maintained in the human population for millions of years, we scanned the human genome for regions of high SNP density. We found 16 regions that, outside of HLA and ABO, are the most highly polymorphic regions yet described; however, evidence for balancing selection at these sites is notably lacking. Indeed, whole-genome simulations indicate that our findings are expected under neutrality. We propose that (i) because it is rarely stable, long-term balancing selection is an evolutionary oddity, and (ii) when a balanced polymorphism is ancient in origin, the requirements for detection by means of SNP data alone will rarely be met.
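A scan for regions of high SNP density can be sketched as counting SNPs in fixed-size windows tiled along a chromosome and ranking the windows. The abstract does not give the window size, step, or density threshold the authors used. The parameters, the start coordinate of 0 and the function names below are therefore hypothetical. The sketch also omits the neutral whole-genome simulations needed to judge whether a dense window is unusual.

```python
import bisect


def snp_density_windows(positions, window, step, start=0):
    """Count SNPs in windows [s, s + window) tiled every `step` bp from `start`.

    `positions` are SNP coordinates (bp) on one chromosome. Returns a list of
    (window_start, snp_count) tuples for every window up to the last SNP.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    positions = sorted(positions)
    counts = []
    if not positions:
        return counts
    s = start
    while s <= positions[-1]:
        # Binary search gives the number of SNPs in [s, s + window).
        lo = bisect.bisect_left(positions, s)
        hi = bisect.bisect_left(positions, s + window)
        counts.append((s, hi - lo))
        s += step
    return counts


def densest_windows(positions, window, step, top=3):
    """Return the `top` windows with the most SNPs, highest count first."""
    windows = snp_density_windows(positions, window, step)
    # sorted() is stable, so ties keep their left-to-right order.
    return sorted(windows, key=lambda w: w[1], reverse=True)[:top]


if __name__ == "__main__":
    snps = [5, 12, 14, 15, 18, 40, 95]
    print(densest_windows(snps, window=10, step=10, top=2))  # [(10, 4), (0, 1)]
```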
Affiliation(s)
- K L Bubb
- Department of Genome Sciences, University of Washington Genome Center, Seattle, Washington 98195, USA.
22
Bataillon T, Mailund T, Thorlacius S, Steingrimsson E, Rafnar T, Halldorsson MM, Calian V, Schierup MH. The effective size of the Icelandic population and the prospects for LD mapping: inference from unphased microsatellite markers. Eur J Hum Genet 2006; 14:1044-53. [PMID: 16736029 DOI: 10.1038/sj.ejhg.5201669] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Characterizing the extent of linkage disequilibrium (LD) in the genome is a prerequisite for association mapping studies. Patterns of LD also contain information about the past demography of populations. In this study, we focus on the Icelandic population, in which LD was investigated in 12 regions of approximately 15 cM each using regularly spaced microsatellite loci displaying high heterozygosity. A total of 1753 individuals were genotyped for 179 markers. LD was estimated using a composite disequilibrium measure based on unphased data. LD decreases with distance in all 12 regions, and more LD than expected by chance can be detected over approximately 4 cM in our sample. Differences among genomic regions in how LD decreases with distance were mostly due to two regions exhibiting, respectively, higher and lower proportions of pairs in LD than average within the first 4 cM. We pooled data from all regions except these two, and summarized patterns of LD by computing the proportion of pairs of loci exhibiting significant LD (at the 5% level) as a function of distance. We compared observed patterns of LD with simulated data sets obtained under scenarios with varying demography and intensity of recombination. We show that unphased data allow inferences on scaled recombination rates to be drawn from patterns of LD. Patterns of LD in Iceland suggest a genome-wide scaled recombination rate of rho* = 200 (130-330) per cM (or an effective size of roughly 5000), in the low range of estimates recently reported in three populations from the HapMap project.
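The step from the scaled recombination rate to an effective size of roughly 5000 can be checked by hand. The check assumes the common convention rho = 4 N_e r, which the abstract does not state explicitly. It also assumes the small-distance approximation that 1 cM corresponds to a recombination fraction r of about 0.01:

```latex
\rho^{*} = 4 N_e r
\quad\Longrightarrow\quad
N_e = \frac{\rho^{*}}{4r}
    = \frac{200\ \text{per cM}}{4 \times 0.01\ \text{per cM}}
    = 5000
```

Under the same assumptions, the reported interval of 130-330 per cM corresponds to an effective size of roughly 3250-8250.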
Affiliation(s)
- Thomas Bataillon
- Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, DK-8000 Aarhus C, Denmark.