1
Edrees BM, Athar M, Abduljaleel Z, Al-Allaf FA, Taher MM, Khan W, Bouazzaoui A, Al-Harbi N, Safar R, Al-Edressi H, Alansary K, Anazi A, Altayeb N, Ahmed MA. Functional alterations due to amino acid changes and evolutionary comparative analysis of ARPKD and ADPKD genes. Genomics Data 2016; 10:127-134. [PMID: 27843768 PMCID: PMC5099264 DOI: 10.1016/j.gdata.2016.10.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/01/2016] [Revised: 10/18/2016] [Accepted: 10/30/2016] [Indexed: 12/15/2022]
Abstract
Targeted, customized sequencing of genes implicated in the autosomal recessive polycystic kidney disease (ARPKD) phenotype was performed to identify candidate variants using Ion Torrent PGM next-generation sequencing. The results identified four potentially pathogenic variants in the PKHD1 gene [c.4870C > T, p.(Arg1624Trp), c.5725C > T, p.(Arg1909Trp), c.1736C > T, p.(Thr579Met) and c.10628T > G, p.(Leu3543Trp)] among 12 out of 18 samples. However, one variant, c.4870C > T, p.(Arg1624Trp), was common among eight patients. Some patient samples also showed a few variants in the autosomal dominant polycystic kidney disease (ADPKD) causing genes PKD1 and PKD2, such as c.12433G > A, p.(Val4145Ile) and c.1445T > G, p.(Phe482Cys), respectively. All causative variants were validated by capillary sequencing, which confirmed the presence of a novel homozygous variant c.10628T > G, p.(Leu3543Trp) in a male proband. We have recently published the results of these studies (Edrees et al., 2016). Here we report for the first time the effect of the common mutation p.(Arg1624Trp), found in eight samples, on the structure and function of the PKHD1 protein, assessed using molecular dynamics simulations. Such computational approaches provide tools to predict the phenotypic effect of a variant on the structure and function of the altered protein. The native and mutant p.(Arg1624Trp) modeled proteins were also analyzed for solvent accessibility, secondary structure and stabilizing residues to compare the stability of the wild-type and mutant forms. Furthermore, comparative genomics and evolutionary analyses of the variants observed in the PKHD1, PKD1, and PKD2 genes were performed in several mammalian species, including human, to understand the complexity of genomes among closely related mammalian species.
Taken together, the results revealed that the evolutionary comparative analyses and characterization of PKHD1, PKD1, and PKD2 genes among various related and unrelated mammalian species will provide important insights into their evolutionary process and understanding for further disease characterization and management.
Affiliation(s)
- Burhan M Edrees
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
- Mohammad Athar
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
- Zainularifeen Abduljaleel
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
- Faisal A Al-Allaf
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Molecular Diagnostics Unit, Department of Laboratory and Blood Bank, King Abdullah Medical City, Makkah 21955, Saudi Arabia
- Mohiuddin M Taher
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
- Wajahatullah Khan
- Department of Basic Sciences, College of Science and Health Professions, King Saud Bin Abdulaziz University for Health Sciences, P.O. Box 3660, Riyadh 11426, Saudi Arabia
- Abdellatif Bouazzaoui
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
- Naffaa Al-Harbi
- Department of Pediatric, King Faisal Specialist Hospital and Research Centre, P.O. Box 40047, Jeddah 21499, Saudi Arabia
- Ramzia Safar
- Madinah Maternity and Children's Hospital, P.O. Box 5073, Madinah 42318, Saudi Arabia
- Howaida Al-Edressi
- Madinah Maternity and Children's Hospital, P.O. Box 5073, Madinah 42318, Saudi Arabia
- Khawala Alansary
- King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
- Abulkareem Anazi
- King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
- Naji Altayeb
- King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
- Muawia A Ahmed
- King Salman Armed Forces Hospital, P.O. Box 100, Tabuk, Saudi Arabia
2
Ilinsky VV, Korneeva VA, Shatalov PA. Application of whole exome sequencing in the diagnosis of hereditary neurological diseases. Zh Nevrol Psikhiatr Im S S Korsakova 2015; 115:45-52. [DOI: 10.17116/jnevro20151151145-52] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
3
Kuhn H, Sahu B, Rapireddy S, Ly DH, Frank-Kamenetskii MD. Sequence specificity at targeting double-stranded DNA with a γ-PNA oligomer modified with guanidinium G-clamp nucleobases. Artificial DNA, PNA & XNA 2014; 1:45-53. [PMID: 21687526 DOI: 10.4161/adna.1.1.12444] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 01/24/2010] [Revised: 05/19/2010] [Accepted: 05/24/2010] [Indexed: 11/19/2022]
Abstract
γ-PNA, a new class of peptide nucleic acids, promises to overcome previous sequence limitations of double-stranded DNA (dsDNA) targeting with PNA. To check the potential of γ-PNA, we have synthesized a biotinylated, pentadecameric γ-PNA of mixed sequence carrying three guanidinium G-clamp nucleobases. We have found that strand invasion reactions of the γ-PNA oligomer with its fully complementary target within dsDNA occur with significantly higher binding rates than with targets containing single mismatches. Association of the PNA oligomer with mismatched targets does not go to completion but instead reaches a stationary level at or below 60%, even at conditions of very low ionic strength. Initial binding rates to both matched and mismatched targets experience a steep decrease with increasing salt concentration. We demonstrate that a linear DNA target fragment with the correct target sequence can be purified from DNA mixtures containing mismatched target or unrelated genomic DNA by affinity capture with streptavidin-coated magnetic beads. Similarly, supercoiled plasmid DNA is obtained with high purity from an initial sample mixture that included a linear DNA fragment with the fully complementary sequence. Based on the results obtained in this study, we believe that γ-PNA has great potential for specific targeting of chosen duplex DNA sites in a sequence-unrestricted fashion.
Affiliation(s)
- Heiko Kuhn
- Center for Advanced Biotechnology; Department of Biomedical Engineering; Boston University; Boston, MA USA
4
Elsharawy A, Forster M, Schracke N, Keller A, Thomsen I, Petersen BS, Stade B, Stähler P, Schreiber S, Rosenstiel P, Franke A. Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing. BMC Genomics 2012; 13:417. [PMID: 22913592 PMCID: PMC3563481 DOI: 10.1186/1471-2164-13-417] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/10/2011] [Accepted: 08/10/2012] [Indexed: 11/10/2022]
Abstract
Background: Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).
Results: We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
Conclusions: We recommend applying our general ‘two-step’ mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results.
Affiliation(s)
- Abdou Elsharawy
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany
5
Rossetti S, Hopp K, Sikkink RA, Sundsbak JL, Lee YK, Kubly V, Eckloff BW, Ward CJ, Winearls CG, Torres VE, Harris PC. Identification of gene mutations in autosomal dominant polycystic kidney disease through targeted resequencing. J Am Soc Nephrol 2012; 23:915-33. [PMID: 22383692 DOI: 10.1681/asn.2011101032] [Citation(s) in RCA: 132] [Impact Index Per Article: 11.0] [Indexed: 11/03/2022]
Abstract
Mutations in two large multi-exon genes, PKD1 and PKD2, cause autosomal dominant polycystic kidney disease (ADPKD). The duplication of PKD1 exons 1-32 as six pseudogenes on chromosome 16, the high level of allelic heterogeneity, and the cost of Sanger sequencing complicate mutation analysis, which can aid diagnostics of ADPKD. We developed and validated a strategy to analyze both the PKD1 and PKD2 genes using next-generation sequencing by pooling long-range PCR amplicons and multiplexing bar-coded libraries. We used this approach to characterize a cohort of 230 patients with ADPKD. This process detected definitely and likely pathogenic variants in 115 (63%) of 183 patients with typical ADPKD. In addition, we identified atypical mutations, a gene conversion, and one missed mutation resulting from allele dropout, and we characterized the pattern of deep intronic variation for both genes. In summary, this strategy involving next-generation sequencing is a model for future genetic characterization of large ADPKD populations.
Affiliation(s)
- Sandro Rossetti
- Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN 55905, USA.
6
Cronn R, Knaus BJ, Liston A, Maughan PJ, Parks M, Syring JV, Udall J. Targeted enrichment strategies for next-generation plant biology. American Journal of Botany 2012; 99:291-311. [PMID: 22312117 DOI: 10.3732/ajb.1100356] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.0] [Indexed: 05/18/2023]
Abstract
PREMISE OF THE STUDY: The dramatic advances offered by modern DNA sequencers continue to redefine the limits of what can be accomplished in comparative plant biology. Even with recent achievements, however, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms. Factors like large genome size, extensive variation in the proportion of organellar DNA in total DNA, polyploidy, and gene number/redundancy contribute to these challenges, and they demand flexible targeted enrichment strategies to achieve the desired goals.
METHODS: In this article, we summarize the many available targeted enrichment strategies that can be used to target partial-to-complete organellar genomes, as well as known and anonymous nuclear targets. These methods fall under four categories: PCR-based enrichment, hybridization-based enrichment, restriction enzyme-based enrichment, and enrichment of expressed gene sequences.
KEY RESULTS: Examples of plant-specific applications exist for nearly all methods described. While some methods are well established (e.g., transcriptome sequencing), other promising methods are in their infancy (hybridization enrichment). A direct comparison of methods shows that PCR-based enrichment may be a reasonable strategy for accessing small genomic targets (e.g., ≤50 kbp), but that hybridization and transcriptome sequencing scale more efficiently if larger targets are desired.
CONCLUSIONS: While the benefits of targeted sequencing are greatest in plants with large genomes, nearly all comparative projects can benefit from the improved throughput offered by targeted multiplex DNA sequencing, particularly as the amount of data produced from a single instrument approaches a trillion bases per run.
Affiliation(s)
- Richard Cronn
- Pacific Northwest Research Station, USDA Forest Service, Corvallis, Oregon 97331, USA.
7
Next-generation sequencing reveals phylogeographic structure and a species tree for recent bird divergences. Mol Phylogenet Evol 2012; 62:397-406. [DOI: 10.1016/j.ympev.2011.10.012] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 06/01/2011] [Revised: 09/20/2011] [Accepted: 10/15/2011] [Indexed: 02/03/2023]
8
Robinson PN, Krawitz P, Mundlos S. Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clin Genet 2011; 80:127-32. [PMID: 21615730 DOI: 10.1111/j.1399-0004.2011.01713.x] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Indexed: 12/27/2022]
Abstract
In whole-exome sequencing (WES), target capture methods are used to enrich the sequences of the coding regions of genes from fragmented total genomic DNA, followed by massively parallel, 'next-generation' sequencing of the captured fragments. Since its introduction in 2009, WES has been successfully used in several disease-gene discovery projects, but the analysis of whole-exome sequence data can be challenging. In this overview, we present a summary of the main computational strategies that have been applied to identify novel disease genes in whole-exome data, including intersect filters, the search for de novo mutations, and the application of linkage mapping or inference of identity-by-descent (IBD) in family studies.
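The intersect filter named in this abstract can be illustrated with a minimal Python sketch. It assumes the common reading of the term: keep only variants shared by every affected individual, then drop variants already known as common polymorphisms. The function name, sample labels, and variant tuples are hypothetical and are not taken from the paper.

```python
def intersect_filter(variants_by_sample, known_variants=frozenset()):
    """Return variants present in every sample, minus known variants.

    variants_by_sample: dict mapping a sample label to a set of variants,
    each variant being a (chrom, pos, ref, alt) tuple (illustrative format).
    known_variants: variants to exclude, e.g. common polymorphisms.
    """
    sample_sets = list(variants_by_sample.values())
    if not sample_sets:
        return set()
    shared = set(sample_sets[0])
    for variants in sample_sets[1:]:
        shared &= set(variants)  # keep only variants seen in all samples so far
    return shared - set(known_variants)


# Hypothetical toy data: three affected individuals.
affected = {
    "patient_A": {("chr1", 1000, "C", "T"), ("chr2", 500, "G", "A"), ("chr6", 42, "T", "G")},
    "patient_B": {("chr1", 1000, "C", "T"), ("chr6", 42, "T", "G")},
    "patient_C": {("chr1", 1000, "C", "T"), ("chr6", 42, "T", "G"), ("chr9", 7, "A", "C")},
}
common_polymorphisms = {("chr1", 1000, "C", "T")}

candidates = intersect_filter(affected, common_polymorphisms)
print(sorted(candidates))  # [('chr6', 42, 'T', 'G')]
```

In practice such a filter would usually run on annotated variant calls, often at the gene level rather than on exact variant identity; the set logic stays the same.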
Affiliation(s)
- Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany.
9
Voelkerding KV, Dames S, Durtschi JD. Next generation sequencing for clinical diagnostics-principles and application to targeted resequencing for hypertrophic cardiomyopathy: a paper from the 2009 William Beaumont Hospital Symposium on Molecular Pathology. J Mol Diagn 2011; 12:539-51. [PMID: 20805560 DOI: 10.2353/jmoldx.2010.100043] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Indexed: 12/26/2022]
Abstract
During the past five years, new high-throughput DNA sequencing technologies have emerged; these technologies are collectively referred to as next generation sequencing (NGS). By virtue of sequencing clonally amplified DNA templates or single DNA molecules in a massively parallel fashion in a flow cell, NGS provides both qualitative and quantitative sequence data. This combination of information has made NGS the technology of choice for complex genetic analyses that were previously either technically infeasible or cost prohibitive. As a result, NGS has had a fundamental and broad impact on many facets of biomedical research. In contrast, the dissemination of NGS into the clinical diagnostic realm is in its early stages. Though NGS is powerful and can be envisioned to have multiple applications in clinical diagnostics, the technology is currently complex. Successful adoption of NGS into the clinical laboratory will require expertise in both molecular biology techniques and bioinformatics. The current report presents principles that underlie NGS including sequencing library preparation, sequencing chemistries, and an introduction to NGS data analysis. These concepts are subsequently further illustrated by showing representative results from a case study using NGS for targeted resequencing of genes implicated in hypertrophic cardiomyopathy.
10
Abstract
In the few years since its initial application, massively parallel cDNA sequencing, or RNA-seq, has allowed many advances in the characterization and quantification of transcriptomes. Recently, several developments in RNA-seq methods have provided an even more complete characterization of RNA transcripts. These developments include improvements in transcription start site mapping, strand-specific measurements, gene fusion detection, small RNA characterization and detection of alternative splicing events. Ongoing developments promise further advances in the application of RNA-seq, particularly direct RNA sequencing and approaches that allow RNA quantification from very small amounts of cellular materials.
Affiliation(s)
- Fatih Ozsolak
- Helicos BioSciences Corporation, One Kendall Square, Cambridge, Massachusetts 02139, USA.
11
Teer JK, Bonnycastle LL, Chines PS, Hansen NF, Aoyama N, Swift AJ, Abaan HO, Albert TJ, Margulies EH, Green ED, Collins FS, Mullikin JC, Biesecker LG. Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome Res 2010; 20:1420-31. [PMID: 20810667 DOI: 10.1101/gr.106716.110] [Citation(s) in RCA: 187] [Impact Index Per Article: 13.4] [Indexed: 01/08/2023]
Abstract
Massively parallel DNA sequencing technologies have greatly increased our ability to generate large amounts of sequencing data at a rapid pace. Several methods have been developed to enrich for genomic regions of interest for targeted sequencing. We have compared three of these methods: Molecular Inversion Probes (MIP), Solution Hybrid Selection (SHS), and Microarray-based Genomic Selection (MGS). Using HapMap DNA samples, we compared each of these methods with respect to their ability to capture an identical set of exons and evolutionarily conserved regions associated with 528 genes (2.61 Mb). For sequence analysis, we developed and used a novel Bayesian genotype-assigning algorithm, Most Probable Genotype (MPG). All three capture methods were effective, but sensitivities (percentage of targeted bases associated with high-quality genotypes) varied for an equivalent amount of pass-filtered sequence: for example, 70% (MIP), 84% (SHS), and 91% (MGS) for 400 Mb. In contrast, all methods yielded similar accuracies of >99.84% when compared to Infinium 1M SNP BeadChip-derived genotypes and >99.998% when compared to 30-fold coverage whole-genome shotgun sequencing data. We also observed a low false-positive rate with all three methods; of the heterozygous positions identified by each of the capture methods, >99.57% agreed with 1M SNP BeadChip, and >98.840% agreed with the whole-genome shotgun data. In addition, we successfully piloted the genomic enrichment of a set of 12 pooled samples via the MGS method using molecular bar codes. We find that these three genomic enrichment methods are highly accurate and practical, with sensitivities comparable to that of 30-fold coverage whole-genome shotgun data.
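The two metrics this abstract reports can be sketched in Python. Sensitivity follows the abstract's definition (percentage of targeted bases associated with high-quality genotypes). Accuracy is assumed here to be the percentage of agreeing genotypes at positions typed by both the capture method and a reference platform. Function names and the toy data are hypothetical; this is not the authors' MPG algorithm.

```python
def sensitivity(called_genotypes, targeted_positions):
    """Percentage of targeted positions that received a high-quality genotype."""
    targeted = list(targeted_positions)
    covered = sum(1 for pos in targeted if pos in called_genotypes)
    return 100.0 * covered / len(targeted)


def concordance(called_genotypes, reference_genotypes):
    """Percentage of agreeing genotypes at positions typed by both sources."""
    shared = [pos for pos in called_genotypes if pos in reference_genotypes]
    if not shared:
        return float("nan")  # nothing to compare
    agree = sum(1 for pos in shared if called_genotypes[pos] == reference_genotypes[pos])
    return 100.0 * agree / len(shared)


# Hypothetical toy data: 10 targeted positions, 8 with confident calls.
targets = range(1, 11)
calls = {1: "AA", 2: "AG", 3: "GG", 4: "AA", 5: "CC", 6: "CT", 7: "TT", 8: "GG"}
array_genotypes = {1: "AA", 2: "AG", 3: "GG", 4: "AG"}  # e.g. from a SNP array

print(sensitivity(calls, targets))          # 80.0
print(concordance(calls, array_genotypes))  # 75.0
```

Keeping the two functions separate mirrors the abstract's distinction: sensitivity varied by capture method, while accuracy was similar across all three.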
Affiliation(s)
- Jamie K Teer
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
12
Abstract
The development of massively parallel sequencing technologies, coupled with new massively parallel DNA enrichment technologies (genomic capture), has allowed the sequencing of targeted regions of the human genome in rapidly increasing numbers of samples. Genomic capture can target specific areas in the genome, including genes of interest and linkage regions, but this limits the study to what is already known. Exome capture allows an unbiased investigation of the complete protein-coding regions in the genome. Researchers can use exome capture to focus on a critical part of the human genome, allowing larger numbers of samples than are currently practical with whole-genome sequencing. In this review, we briefly describe some of the methodologies currently used for genomic and exome capture and highlight recent applications of this technology.
Affiliation(s)
- Jamie K Teer
- Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, 5625 Fishers Lane, Bethesda, MD 20892, USA
13
Targeted high throughput sequencing of a cancer-related exome subset by specific sequence capture with a fully automated microarray platform. Genomics 2010; 95:241-6. [PMID: 20138981 DOI: 10.1016/j.ygeno.2010.01.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 07/13/2009] [Revised: 01/29/2010] [Accepted: 01/30/2010] [Indexed: 11/24/2022]
Abstract
Sequence capture methods for targeted next generation sequencing promise to massively reduce cost of genomics projects compared to untargeted sequencing. However, evaluated capture methods specifically dedicated to biologically relevant genomic regions are rare. Whole exome capture has been shown to be a powerful tool to discover the genetic origin of disease and provides a reduction in target size and thus calculative sequencing capacity of >90-fold compared to untargeted whole genome sequencing. For further cost reduction, a valuable complementing approach is the analysis of smaller, relevant gene subsets but involving large cohorts of samples. However, effective adjustment of target sizes and sample numbers is hampered by the limited scalability of enrichment systems. We report a highly scalable and automated method to capture a 480 Kb exome subset of 115 cancer-related genes using microfluidic DNA arrays. The arrays are adaptable from 125 Kb to 1 Mb target size and/or one to eight samples without barcoding strategies, representing a further 26 - 270-fold reduction of calculative sequencing capacity compared to whole exome sequencing. Illumina GAII analysis of a HapMap genome enriched for this exome subset revealed a completeness of >96%. Uniformity was such that >68% of exons had at least half the median depth of coverage. An analysis of reference SNPs revealed a sensitivity of up to 93% and a specificity of 98.2% or higher.
14
15
Abstract
The emergence of massively parallel DNA sequencing platforms has made resequencing an affordable approach to study genetic variation. However, the cost of whole genome resequencing remains too high to apply to large numbers of human samples. Genomic partitioning methods allow enrichment for regions of interest at a scale that is matched to the throughput of the new sequencing platforms. We review general categories of methods for genomic partitioning including multiplex PCR, capture-by-circularization, and capture-by-hybridization. Parameters that are relevant to the performance of any given method include multiplexity, specificity, uniformity, input requirements, scalability, and cost. The successful development of genomic partitioning strategies will be key to taking full advantage of massively parallel sequencing, at least until resequencing of complete mammalian genomes becomes widely affordable.
Affiliation(s)
- Emily H Turner
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, USA.
16
HybSelect: high-throughput access to genomic regions of interest for targeted next-generation sequencing. Nat Methods 2009. [DOI: 10.1038/nmeth.f.266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/24/2022]
17
Summerer D. Enabling technologies of genomic-scale sequence enrichment for targeted high-throughput sequencing. Genomics 2009; 94:363-8. [PMID: 19720138 DOI: 10.1016/j.ygeno.2009.08.012] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 07/13/2009] [Revised: 08/12/2009] [Accepted: 08/22/2009] [Indexed: 10/20/2022]
Abstract
Next-generation sequencing has still not reached its full potential because of the technical inability to effectively target desired genomic regions of interest. Once available, methods addressing this bottleneck will dramatically reduce cost and enable the efficient analysis of complex samples. Recently, a number of possible approaches for genomic-scale sequence enrichment have been reported using different strategies. All methods basically rely on sequence-specific nucleic acid hybridization; however, they differ in several aspects such as the use of solid phase versus solution phase hybridization, probe design and overall workflows with implications for automation. Overall, these studies have made clear several key challenges of genome-wide sequence enrichment that remain to be overcome. We summarize the different technologies and highlight individual characteristics related to general potential and different suitabilities for specific applications.
Affiliation(s)
- Daniel Summerer
- febit biomed gmbh, Im Neuenheimer Feld 519, 69120 Heidelberg, Germany.
18
Summerer D, Wu H, Haase B, Cheng Y, Schracke N, Stähler CF, Chee MS, Stähler PF, Beier M. Microarray-based multicycle-enrichment of genomic subsets for targeted next-generation sequencing. Genome Res 2009; 19:1616-21. [PMID: 19638418 DOI: 10.1101/gr.091942.109] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]
Abstract
The lack of efficient high-throughput methods for enrichment of specific sequences from genomic DNA represents a key bottleneck in exploiting the enormous potential of next-generation sequencers. Such methods would allow for a systematic and targeted analysis of relevant genomic regions. Recent studies reported sequence enrichment using a hybridization step to specific DNA capture probes as a possible solution to the problem. However, so far no method has provided sufficient depths of coverage for reliable base calling over the entire target regions. We report a strategy to multiply the enrichment performance and consequently improve depth and breadth of coverage for desired target sequences by applying two iterative cycles of hybridization with microfluidic Geniom biochips. Using this strategy, we enriched and then sequenced the cancer-related genes BRCA1 and TP53 and a set of 1000 individual dbSNP regions of 500 bp using Illumina technology. We achieved overall enrichment factors of up to 1062-fold and average coverage depths of 470-fold. Combined with high coverage uniformity, this resulted in nearly complete consensus coverages with >86% of target region covered at 20-fold or higher. Analysis of SNP calling accuracies after enrichment revealed excellent concordance with the reference sequence, closely mirroring the previously reported performance of Illumina sequencing conducted without sequence enrichment.
19
Li JB, Gao Y, Aach J, Zhang K, Kryukov GV, Xie B, Ahlford A, Yoon JK, Rosenbaum AM, Zaranek AW, LeProust E, Sunyaev SR, Church GM. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res 2009; 19:1606-15. [PMID: 19525355 DOI: 10.1101/gr.092213.109] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 11/24/2022]
Abstract
Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
Affiliation(s)
- Jin Billy Li
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
20
Lister R, Ecker JR. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 2009; 19:959-66. [PMID: 19273618 DOI: 10.1101/gr.083451.108] [Citation(s) in RCA: 251] [Impact Index Per Article: 16.7] [Indexed: 12/22/2022]
Abstract
Complete sequences of myriad eukaryotic genomes, including several human genomes, are now available, and recent dramatic developments in DNA sequencing technology are opening the floodgates to vast volumes of sequence data. Yet, despite knowing for several decades that a significant proportion of cytosines in the genomes of plants and animals are present in the form of methylcytosine, until very recently the precise locations of these modified bases have never been accurately mapped throughout a eukaryotic genome. Advanced "next-generation" DNA sequencing technologies are now enabling the global mapping of this epigenetic modification at single-base resolution, providing new insights into the regulation and dynamics of DNA methylation in genomes.
Affiliation(s)
- Ryan Lister
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037, USA