151
Abstract
Genomic information reported as haplotypes rather than genotypes will be increasingly important for personalized medicine. Current technologies generate diploid sequence data that is rarely resolved into its constituent haplotypes. Furthermore, paradigms for thinking about genomic information are based on interpreting genotypes rather than haplotypes. Nevertheless, haplotypes have historically been useful in contexts ranging from population genetics to disease-gene mapping efforts. The main approaches for phasing genomic sequence data are molecular haplotyping, genetic haplotyping, and population-based inference. Long-read sequencing technologies are enabling longer molecular haplotypes, and decreases in the cost of whole-genome sequencing are enabling the sequencing of whole-chromosome genetic haplotypes. Hybrid approaches combining high-throughput short-read assembly with strategic approaches that enable physical or virtual binning of reads into haplotypes are enabling multi-gene haplotypes to be generated from single individuals. These techniques can be further combined with genetic and population approaches. Here, we review advances in whole-genome haplotyping approaches and discuss the importance of haplotypes for genomic medicine. Clinical applications include diagnosis by recognition of compound heterozygosity and by phasing regulatory variation to coding variation. Haplotypes, which are more specific than less complex variants such as single nucleotide variants, also have applications in prognostics and diagnostics, in the analysis of tumors, and in typing tissue for transplantation. Future advances will include technological innovations, the application of standard metrics for evaluating haplotype quality, and the development of databases that link haplotypes to disease.
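As an illustration of why phase matters for recognizing compound heterozygosity, the following minimal sketch checks whether heterozygous variants in one gene fall on both haplotypes. It assumes phased genotype strings in the VCF style (`0|1`, `1|0`), and the function name and input layout are hypothetical rather than taken from the review.

```python
from typing import List, Tuple

# Hypothetical input layout: (position, phased genotype string).
# "1|0" puts the alternate allele on haplotype 1, "0|1" on haplotype 2.
Variant = Tuple[int, str]


def is_compound_het(variants: List[Variant]) -> bool:
    """Return True if heterozygous alternate alleles occur on both haplotypes."""
    hit = [False, False]
    for _pos, genotype in variants:
        if genotype == "1|0":
            hit[0] = True
        elif genotype == "0|1":
            hit[1] = True
        # Unphased ("0/1") or homozygous calls are ignored: they carry
        # no information about cis versus trans arrangement.
    return hit[0] and hit[1]


in_trans = [(1200, "1|0"), (3400, "0|1")]  # one variant per gene copy
in_cis = [(1200, "1|0"), (3400, "1|0")]    # both variants on the same copy
unphased = [(1200, "0/1"), (3400, "0/1")]  # genotype-only data
```

With genotype-only data the two arrangements are indistinguishable, which is why the unphased case cannot be called either way.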
Affiliation(s)
- Gustavo Glusman
- Institute for Systems Biology, Terry Avenue North, Seattle, WA 98109 USA
- Hannah C Cox
- Institute for Systems Biology, Terry Avenue North, Seattle, WA 98109 USA
- Jared C Roach
- Institute for Systems Biology, Terry Avenue North, Seattle, WA 98109 USA
152
Matsumoto H, Kiryu H. Integrating dilution-based sequencing and population genotypes for single individual haplotyping. BMC Genomics 2014; 15:733. [PMID: 25167975 PMCID: PMC4162929 DOI: 10.1186/1471-2164-15-733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2013] [Accepted: 08/18/2014] [Indexed: 11/30/2022] Open
Abstract
Background: Haplotype information is useful for many genetic analyses, and haplotypes are usually inferred using computational approaches. Among such approaches, single individual haplotyping (SIH), which infers individual haplotypes from sequence fragments, has grown in importance with the advent of novel sequencing techniques such as dilution-based sequencing. These techniques can produce virtual long-read fragments by separating DNA fragments into multiple low-concentration aliquots, sequencing and mapping each aliquot, and merging clustered short reads. Although these experimental techniques are sophisticated, they have the problem of producing chimeric fragments whose left and right parts match different chromosomes. In our previous research, we found that chimeric fragments significantly decrease the accuracy of SIH. Although chimeric fragments can be removed by using haplotypes determined from pedigree genotypes, pedigree genotypes are generally not available. The length of the read cluster and heterozygous calls have also been used to detect chimeric fragments. Although some chimeric fragments can be removed with these features, a considerable number will go undetected because of the dispersion of the lengths and the absence of SNPs in the overlapping regions. For these reasons, a general method to detect and remove chimeric fragments is needed. Results: In this paper, we propose a general method to detect chimeric fragments. The basis of our method is that a chimeric fragment would correspond to an artificial recombinant haplotype and would differ from biological haplotypes. To detect differences from biological haplotypes, we integrated statistical phasing, which is a haplotype inference approach based on population genotypes, into our method. We applied our method to two datasets and detected chimeric fragments with high AUC. The AUC values of our method are higher than those obtained using only cluster length and heterozygous calls. We then used multiple SIH algorithms to compare the accuracy of SIH before and after removing the chimeric fragment candidates. The accuracy of assembled haplotypes increased significantly after removing chimeric fragment candidates. Conclusions: Our method is useful for detecting chimeric fragments and improving SIH accuracy. The Ruby script is available at https://sites.google.com/site/hmatsu1226/software/csp. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-733) contains supplementary material, which is available to authorized users.
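The idea that a chimeric fragment looks like an artificial recombinant can be sketched with a simple mismatch count. This is not the authors' statistical-phasing method: it is a simplified illustration that assumes the individual's two haplotypes are already known at heterozygous sites, and the function name and data layout are hypothetical. A fragment that fits poorly against either haplotype alone but fits well once a single switch is allowed is a chimera candidate.

```python
from typing import Dict, Tuple


def chimera_evidence(fragment: Dict[int, int], hap_a: Dict[int, int]) -> Tuple[int, int]:
    """Return (mismatches with no switch, mismatches allowing one switch).

    fragment maps heterozygous-site index -> observed allele (0 or 1).
    hap_a maps site index -> allele on haplotype A; haplotype B is assumed
    to carry the other allele at every heterozygous site.
    """
    sites = sorted(fragment)
    # 1 where the fragment disagrees with haplotype A, 0 where it agrees.
    diff_a = [int(fragment[s] != hap_a[s]) for s in sites]
    n = len(sites)
    mism_a = sum(diff_a)
    mism_b = n - mism_a  # disagreeing with A means agreeing with B
    no_switch = min(mism_a, mism_b)

    best_switch = no_switch
    prefix_a = 0
    for k in range(1, n):  # breakpoint between site k-1 and site k
        prefix_a += diff_a[k - 1]
        suffix_a = mism_a - prefix_a
        a_then_b = prefix_a + ((n - k) - suffix_a)
        b_then_a = (k - prefix_a) + suffix_a
        best_switch = min(best_switch, a_then_b, b_then_a)
    return no_switch, best_switch


hap_a = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
clean = dict(hap_a)                               # read entirely from haplotype A
chimeric = {0: 0, 1: 1, 2: 0, 3: 0, 4: 1, 5: 0}   # A for sites 0-2, B for sites 3-5
```

A large gap between the two numbers (for example 3 mismatches without a switch versus 0 with one) would flag the fragment for removal before SIH; any threshold on that gap is a further assumption.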
Affiliation(s)
- Hirotaka Matsumoto
- Department of Computational Biology, Faculty of Frontier Science, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 277-8561, Japan.
153
Su ZD, Sheng QH, Li QR, Chi H, Jiang X, Yan Z, Fu N, He SM, Khaitovich P, Wu JR, Zeng R. De novo identification and quantification of single amino-acid variants in human brain. J Mol Cell Biol 2014; 6:421-33. [PMID: 25007923 DOI: 10.1093/jmcb/mju031] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023] Open
Abstract
The detection of single amino-acid variants (SAVs) usually depends on a single-nucleotide polymorphism (SNP) database. Here, we describe a novel method that discovers SAVs at the proteome level independently of SNP data. Using a mass spectrometry-based de novo sequencing algorithm, peptide candidates are identified and compared with a theoretical protein database to generate SAVs under a pairing strategy, followed by database re-searching to control the false discovery rate. In human brain tissues, we can confidently identify known and novel protein variants with diverse origins. Combined with DNA/RNA sequencing, we verify SAVs derived from DNA mutations, RNA alternative splicing, and unknown post-transcriptional mechanisms. Furthermore, quantitative analysis in human brain tissues reveals several tissue-specific differences in SAV expression. This approach provides a novel route to high-throughput detection of protein variants, which may offer potential for clinical biomarker discovery and mechanistic research.
Affiliation(s)
- Zhi-Duan Su
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Quan-Hu Sheng
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Qing-Run Li
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Hao Chi
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Xi Jiang
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Zheng Yan
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Ning Fu
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Si-Min He
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Philipp Khaitovich
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Jia-Rui Wu
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Rong Zeng
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
154
Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and annotation. Evol Appl 2014; 7:1026-42. [PMID: 25553065 PMCID: PMC4231593 DOI: 10.1111/eva.12178] [Citation(s) in RCA: 188] [Impact Index Per Article: 18.8] [Received: 02/04/2014] [Accepted: 05/20/2014] [Indexed: 12/12/2022] Open
Abstract
Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects.
Affiliation(s)
- Robert Ekblom
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Jochen B W Wolf
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
155
Abstract
That each of us is truly biologically unique, extending to even monozygotic, "identical" twins, is not fully appreciated. Now that it is possible to perform a comprehensive "omic" assessment of an individual, including one's DNA and RNA sequence and at least some characterization of one's proteome, metabolome, microbiome, autoantibodies, and epigenome, it has become abundantly clear that each of us has truly one-of-a-kind biological content. Well beyond the allure of the matchless fingerprint or snowflake concept, these singular, individual data and information set up a remarkable and unprecedented opportunity to improve medical treatment and develop preventive strategies to preserve health.
Affiliation(s)
- Eric J Topol
- The Scripps Translational Science Institute, The Scripps Research Institute and Scripps Health, La Jolla, CA 92037, USA.
156
Yanagi I, Akahori R, Hatano T, Takeda KI. Fabricating nanopores with diameters of sub-1 nm to 3 nm using multilevel pulse-voltage injection. Sci Rep 2014; 4:5000. [PMID: 24847795 PMCID: PMC4028839 DOI: 10.1038/srep05000] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 01/23/2014] [Accepted: 04/14/2014] [Indexed: 12/22/2022] Open
Abstract
To date, solid-state nanopores have been fabricated primarily through a focused-electronic beam via TEM. For mass production, however, a TEM beam is not suitable and an alternative fabrication method is required. Recently, a simple method for fabricating solid-state nanopores was reported by Kwok, H. et al. and used to fabricate a nanopore (down to 2 nm in size) in a membrane via dielectric breakdown. In the present study, to fabricate smaller nanopores stably--specifically with a diameter of 1 to 2 nm (which is an essential size for identifying each nucleotide)--via dielectric breakdown, a technique called "multilevel pulse-voltage injection" (MPVI) is proposed and evaluated. MPVI can generate nanopores with diameters of sub-1 nm in a 10-nm-thick Si3N4 membrane with a probability of 90%. The generated nanopores can be widened to the desired size (as high as 3 nm in diameter) with sub-nanometre precision, and the mean effective thickness of the fabricated nanopores was 3.7 nm.
Affiliation(s)
- Itaru Yanagi
- Hitachi Ltd., Central Research Laboratory, 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8603
- Rena Akahori
- Hitachi Ltd., Central Research Laboratory, 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8603
- Toshiyuki Hatano
- Hitachi Ltd., Central Research Laboratory, 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8603
- Ken-ichi Takeda
- Hitachi Ltd., Central Research Laboratory, 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8603
157
Kuleshov V, Xie D, Chen R, Pushkarev D, Ma Z, Blauwkamp T, Kertesz M, Snyder M. Whole-genome haplotyping using long reads and statistical methods. Nat Biotechnol 2014; 32:261-266. [PMID: 24561555 PMCID: PMC4073643 DOI: 10.1038/nbt.2833] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Received: 07/19/2013] [Accepted: 01/17/2014] [Indexed: 12/24/2022]
Abstract
The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.
Affiliation(s)
- Volodymyr Kuleshov
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Illumina, Inc., 5200 Illumina Way, San Diego, CA 92199, USA
- Dan Xie
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Rui Chen
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Zhihai Ma
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Tim Blauwkamp
- Illumina, Inc., 5200 Illumina Way, San Diego, CA 92199, USA
- Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
158
Murphy NM, Pouton CW, Irving HR. Human leukocyte antigen haplotype phasing by allele-specific enrichment with peptide nucleic acid probes. Mol Genet Genomic Med 2014; 2:245-53. [PMID: 24936514 PMCID: PMC4049365 DOI: 10.1002/mgg3.65] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/30/2013] [Revised: 12/10/2013] [Accepted: 12/17/2013] [Indexed: 12/22/2022] Open
Abstract
Targeted capture of large fragments of genomic DNA that enrich for human leukocyte antigen (HLA) system haplotypes has utility in haematopoietic stem cell transplantation. Current methods of HLA matching are based on inference or familial studies of inheritance; and each approach has its own inherent limitations. We have designed and tested a probe–target-extraction method for capturing specific HLA haplotypes by hybridization of peptide nucleic acid (PNA) probes to alleles of the HLA-DRB1 gene. Short target fragments contained in plasmids were initially used to optimize the method followed by testing samples of genomic DNA from human subjects with preselected HLA haplotypes and obtained approximately 10% enrichment for the specific haplotype. When performed with high-molecular-weight genomic DNA, 99.0% versus 84.0% alignment match was obtained for the specific haplotype probed. The allele-specific target enrichment that we obtained can facilitate the elucidation of haplotypes between the 65 kb separating the HLA-DRB1 and the HLA-DQA1 genes, potentially spanning a total distance of at least 130 kb. Allele-specific target enrichment with PNA probes is a straightforward technique that has the capability to improve the resolution of DNA and whole genome sequencing technologies by allowing haplotyping of enriched DNA and crucially, retaining the DNA methylation profile.
Affiliation(s)
- Nicholas M Murphy
- Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University (Parkville Campus) Melbourne, Victoria, 3052, Australia; Department of Preimplantation Genetic Diagnosis, Melbourne IVF 344 Victoria Parade, East Melbourne, Australia
- Colin W Pouton
- Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University (Parkville Campus) Melbourne, Victoria, 3052, Australia
- Helen R Irving
- Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University (Parkville Campus) Melbourne, Victoria, 3052, Australia
160
Selvaraj S, Dixon JR, Bansal V, Ren B. Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing. Nat Biotechnol 2013; 31:1111-8. [PMID: 24185094 DOI: 10.1038/nbt.2728] [Citation(s) in RCA: 222] [Impact Index Per Article: 20.2] [Received: 07/07/2013] [Accepted: 10/02/2013] [Indexed: 12/22/2022]
Abstract
Rapid advances in high-throughput sequencing facilitate variant discovery and genotyping, but linking variants into a single haplotype remains challenging. Here we demonstrate HaploSeq, an approach for assembling chromosome-scale haplotypes by exploiting the existence of 'chromosome territories'. We use proximity ligation and sequencing to show that alleles on homologous chromosomes occupy distinct territories, and therefore this experimental protocol preferentially recovers physically linked DNA variants on a homolog. Computational analysis of such data sets allows for accurate (∼99.5%) reconstruction of chromosome-spanning haplotypes for ∼95% of alleles in hybrid mouse cells with 30× sequencing coverage. To resolve haplotypes for a human genome, which has a low density of variants, we coupled HaploSeq with local conditional phasing to obtain haplotypes for ∼81% of alleles with ∼98% accuracy from just 17× sequencing. Whereas methods based on proximity ligation were originally designed to investigate spatial organization of genomes, our results lend support for their use as a general tool for haplotyping.
Affiliation(s)
- Siddarth Selvaraj
- [1] Ludwig Institute for Cancer Research, La Jolla, California, USA. [2] Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California, USA. [3]
161
Delaneau O, Howie B, Cox AJ, Zagury JF, Marchini J. Haplotype estimation using sequencing reads. Am J Hum Genet 2013; 93:687-96. [PMID: 24094745 DOI: 10.1016/j.ajhg.2013.09.002] [Citation(s) in RCA: 267] [Impact Index Per Article: 24.3] [Received: 05/17/2013] [Revised: 08/19/2013] [Accepted: 09/04/2013] [Indexed: 12/20/2022] Open
Abstract
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved.
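The switch-error metric mentioned in this abstract can be illustrated with a short sketch. It assumes that both the true and inferred phasings are given as the allele carried on haplotype 1 at each heterozygous site, and it approximates the mean distance between switch errors as the spanned distance divided by the number of switches. Both the function names and that approximation are assumptions for illustration, not the definition used by SHAPEIT2.

```python
from typing import List


def switch_error_positions(truth: List[int], inferred: List[int],
                           positions: List[int]) -> List[int]:
    """Return positions where the inferred phase flips relative to the truth.

    truth and inferred hold the allele (0 or 1) on haplotype 1 at each
    heterozygous site; positions holds the matching base-pair coordinates.
    """
    switches = []
    previous_agrees = None
    for pos, t, i in zip(positions, truth, inferred):
        agrees = (t == i)
        # A switch error is a change in agreement between adjacent sites.
        if previous_agrees is not None and agrees != previous_agrees:
            switches.append(pos)
        previous_agrees = agrees
    return switches


def mean_distance_between_switches(positions: List[int],
                                   switches: List[int]) -> float:
    """Spanned distance divided by switch count (infinite if no switches)."""
    if not switches:
        return float("inf")
    return (positions[-1] - positions[0]) / len(switches)


positions = [100, 200, 300, 400, 500]
truth = [0, 1, 0, 1, 0]
inferred = [0, 1, 1, 0, 1]  # phase flips between positions 200 and 300
```

Because only changes in agreement are counted, a phasing that is inverted everywhere (haplotype labels swapped) yields no switch errors, which matches the usual convention that the labels themselves are arbitrary.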
Affiliation(s)
- Olivier Delaneau
- Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
162
Dorn C, Grunert M, Sperling SR. Application of high-throughput sequencing for studying genomic variations in congenital heart disease. Brief Funct Genomics 2013; 13:51-65. [PMID: 24095982 DOI: 10.1093/bfgp/elt040] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] Open
Abstract
Congenital heart diseases (CHD) represent the most common birth defect in humans. The majority of cases are caused by a combination of complex genetic alterations and environmental influences. In the past, many disease-causing mutations have been identified; however, there is still a large proportion of cardiac malformations whose precise origin is unknown. High-throughput sequencing technologies established in recent years offer novel opportunities to further study the genetic background underlying the disease. In this review, we provide a roadmap for designing and analyzing high-throughput sequencing studies focused on CHD, but also with general applicability to other complex diseases. The three main next-generation sequencing (NGS) platforms, including their particular advantages and disadvantages, are presented. To identify potentially disease-related genomic variations and genes, different filtering steps and gene prioritization strategies are discussed. In addition, available control datasets based on NGS are summarized. Finally, we provide an overview of current studies already using NGS technologies and showing that these techniques will help to further unravel the complex genetics underlying CHD.
Affiliation(s)
- Cornelia Dorn
- Department of Cardiovascular Genetics, Experimental and Clinical Research Center (ECRC), Charité-University Medicine Berlin and Max Delbrück Center (MDC) for Molecular Medicine, Lindenberger Weg 80, 13125 Berlin, Germany. Department of Biochemistry, Free University Berlin, Berlin, Germany. Tel.: +49-(0)30-450540123; Fax: +49-(0)30-84131699;
163
Kuk AYC, Li X, Xu J. An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data. BMC Genet 2013; 14:82. [PMID: 24034507 PMCID: PMC3847674 DOI: 10.1186/1471-2156-14-82] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/30/2013] [Accepted: 08/28/2013] [Indexed: 12/19/2022] Open
Abstract
Background: Pooling is a cost-effective way to collect data for genetic association studies, particularly for rare genetic variants. It is of interest to estimate the haplotype frequencies, which contain more information than single-locus statistics. By viewing the pooled genotype data as incomplete data, the expectation-maximization (EM) algorithm is the natural algorithm to use, but it is computationally intensive. A recent proposal to reduce the computational burden is to make use of database information to form a list of frequently occurring haplotypes, and to restrict the haplotypes to come from this list only in implementing the EM algorithm. There is, however, the danger of using an incorrect list, and there may not be enough database information to form a list externally in some applications. Results: We investigate the possibility of creating an internal list from the data at hand. One way to form such a list is to collapse the observed total minor allele frequencies to “zero” or “at least one”, which is shown to have the desirable effect of amplifying the haplotype frequencies. To improve coverage, we propose ways to add and remove haplotypes from the list, and a benchmarking method to determine the frequency threshold for removing haplotypes. Simulation results show that the EM estimates based on a suitably augmented and trimmed collapsed data list (ATCDL) perform satisfactorily. In two scenarios involving 25 and 32 loci respectively, the EM-ATCDL estimates outperform the EM estimates based on other lists as well as the collapsed data maximum likelihood estimates. Conclusions: The proposed augmented and trimmed CD list is a useful list on which to base the EM algorithm when estimating the haplotype distributions of rare variants. It can handle more markers and larger pool sizes than existing methods, and the resulting EM-ATCDL estimates are more efficient than the EM estimates based on other lists.
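The list-restricted EM idea can be sketched in its simplest setting. The example below is not the ATCDL procedure: it assumes each "pool" is a single diploid individual (two haplotypes), takes the candidate haplotype list as given rather than building it from collapsed data, and uses hypothetical names throughout. It shows only the core E-step and M-step over a restricted list.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]


def em_haplotype_freqs(genotypes: List[Tuple[int, ...]],
                       hap_list: List[Haplotype],
                       n_iter: int = 50) -> Dict[Haplotype, float]:
    """Estimate haplotype frequencies with EM, restricted to hap_list.

    Each genotype gives the minor-allele count (0, 1 or 2) per locus.
    """
    freqs = {h: 1.0 / len(hap_list) for h in hap_list}
    # Enumerate, once, the haplotype pairs from the list that explain each genotype.
    compatible = []
    for g in genotypes:
        pairs = [(h1, h2)
                 for h1, h2 in combinations_with_replacement(hap_list, 2)
                 if tuple(a + b for a, b in zip(h1, h2)) == tuple(g)]
        compatible.append(pairs)

    for _ in range(n_iter):
        counts = {h: 0.0 for h in hap_list}
        for pairs in compatible:
            # E-step: posterior weight of each compatible pair.
            weights = [(1 if h1 == h2 else 2) * freqs[h1] * freqs[h2]
                       for h1, h2 in pairs]
            total = sum(weights)
            if total == 0:
                continue  # genotype not explainable by the current list
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: renormalize expected haplotype counts.
        n = sum(counts.values())
        if n == 0:
            break
        freqs = {h: c / n for h, c in counts.items()}
    return freqs


hap_list = [(0, 0), (1, 1), (0, 1), (1, 0)]
genotypes = [(0, 0)] * 6 + [(2, 2)] * 2 + [(1, 1)] * 2
estimates = em_haplotype_freqs(genotypes, hap_list)
```

In this toy data set the double heterozygotes are ambiguous, and the unambiguous individuals pull the estimate toward the (0, 0)/(1, 1) explanation. A genotype that no pair from the list can explain is silently skipped here, which is one way the "incorrect list" risk noted in the abstract would show up.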
Affiliation(s)
- Anthony Y C Kuk
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546, Singapore.
164
Bromberg Y. Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 2013; 425:3993-4005. [PMID: 23928561 DOI: 10.1016/j.jmb.2013.07.038] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/03/2013] [Revised: 07/26/2013] [Accepted: 07/28/2013] [Indexed: 12/24/2022]
Abstract
Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice.
Affiliation(s)
- Y Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08873, USA.
165
Abstract
Genetic susceptibility to type 1 diabetes (T1D) has been a subject of intensive study for nearly four decades. This article will present the history of these studies, beginning with observations of the Human Leukocyte Antigen (HLA) association in the 1970s, through the advent of DNA-based genotyping methodologies, through recent large, international collaborations and genome-wide association studies. More than 40 genetic loci have been associated with T1D in multiple studies; however, the HLA region, with its multiple genes and extreme polymorphism at those loci, remains by far the greatest contributor to the genetic susceptibility to T1D. Even after decades of study, the complete story has yet to unfold, and exact mechanisms by which HLA and other associated loci confer T1D susceptibility remain elusive.
Affiliation(s)
- Janelle A Noble
- Children's Hospital Oakland Research Institute, Oakland, California 94609, USA.
167
Metzger BPH, Gelembiuk GW, Lee CE. Direct sequencing of haplotypes from diploid individuals through a modified emulsion PCR-based single-molecule sequencing approach. Mol Ecol Resour 2013; 13:135-43. [PMID: 23231626 DOI: 10.1111/1755-0998.12034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2012] [Revised: 10/08/2012] [Accepted: 10/11/2012] [Indexed: 11/30/2022]
Abstract
While standard DNA-sequencing approaches readily yield genotypic sequence data, haplotype information is often of greater utility for population genetic analyses. However, obtaining individual haplotype sequences can be costly and time-consuming and sometimes requires statistical reconstruction approaches that are subject to bias and error. Advancements have recently been made in determining individual chromosomal sequences in large-scale genomic studies, yet few options exist for obtaining this information from large numbers of highly polymorphic individuals in a cost-effective manner. As a solution, we developed a simple PCR-based method for obtaining sequence information from individual DNA strands using standard laboratory equipment. The method employs a water-in-oil emulsion to separate the PCR mixture into thousands of individual microreactors. PCR within these small vesicles results in amplification from only a single starting DNA template molecule and thus a single haplotype. We improved upon previous approaches by including SYBR Green I and a melted agarose solution in the PCR, allowing easy identification and separation of individually amplified DNA molecules. We demonstrate the use of this method on a highly polymorphic estuarine population of the copepod Eurytemora affinis for which current molecular and computational methods for haplotype determination have been inadequate.
|
168
|
Dong QZ, Zhang XF, Zhao Y, Jia HL, Zhou HJ, Dai C, Sun HJ, Qin Y, Zhang WD, Ren N, Ye QH, Qin LX. Osteopontin promoter polymorphisms at locus -443 significantly affect the metastasis and prognosis of human hepatocellular carcinoma. Hepatology 2013; 57:1024-34. [PMID: 23079960 DOI: 10.1002/hep.26103] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/28/2012] [Accepted: 09/28/2012] [Indexed: 01/11/2023]
Abstract
Osteopontin (OPN) plays a crucial role in hepatocellular carcinoma (HCC) metastasis. However, little is known about the impact of OPN polymorphisms on cancer progression. In this study, we first identified the single nucleotide polymorphisms (SNPs) in the OPN promoter region by direct sequencing in 30 HCCs, and then evaluated the prognostic values of the selected ones in two large cohorts of 826 HCC patients. The identified SNPs were functionally analyzed using in vitro and in vivo assays and their correlations with OPN levels were also evaluated. Only the SNP at locus -443 and its related haplotypes (Ht2: -1748A/-616G/-443T/-155* [*indicates base deletion]; Ht3: -1748A/-616G/-443C/-155*) were significantly associated with overall survival (OS) and time to recurrence (TTR). The patients with the -443TT/TC genotype or Ht2 had a shorter OS and TTR compared with those with -443CC genotype or Ht3. This was further confirmed in the validation cohort. Moreover, this correlation remained significant in patients with small HCCs (≤5 cm). Multivariate analyses indicated that the prognostic performance of the -443 genotypes (OS, P=0.031; TTR, P=0.005) and their related haplotypes (OS, P=0.002; TTR, P=0.001) was independent of other clinicopathological factors. The Ht2 and -443TT genotype could significantly increase the promoter transcriptional activity and expression level of OPN compared with the Ht3 or -443CC genotype, and lead to an obvious increase in both in vitro invasion and in vivo tumor growth and lung metastasis of HCC cells (P<0.05). CONCLUSION The genetic variation at locus -443 of the OPN promoter plays important roles in the regulation of OPN expression and cancer progression of HCCs, which is a novel determinant and target for HCC metastasis and prognosis.
Affiliation(s)
- Qiong-Zhu Dong
- Liver Cancer Institute & Zhongshan Hospital, Institutes of Biomedical Science, Fudan University, Shanghai, China
|
169
|
Abstract
BACKGROUND Haplotype information is useful for various genetic analyses, including genome-wide association studies. Determining haplotypes experimentally is difficult and there are several computational approaches that infer haplotypes from genomic data. Among such approaches, single individual haplotyping or haplotype assembly, which infers two haplotypes of an individual from aligned sequence fragments, has been attracting considerable attention. To avoid incorrect results in downstream analyses, it is important not only to assemble haplotypes as long as possible but also to provide means to extract highly reliable haplotype regions. Although there are several efficient algorithms for solving haplotype assembly, there is no efficient method that allows extraction of the regions assembled with high confidence. RESULTS We develop a probabilistic model, called MixSIH, for solving the haplotype assembly problem. The model has two mixture components representing two haplotypes. Based on the optimized model, a quality score is defined, which we call the 'minimum connectivity' (MC) score, for each segment in the haplotype assembly. Because existing accuracy measures for haplotype assembly are designed to compare the efficiency between the algorithms and are not suitable for evaluating the quality of the set of partially assembled haplotype segments, we develop an accuracy measure based on the pairwise consistency and evaluate the accuracy on simulated and real data. By using the MC scores, our algorithm can extract highly accurate haplotype segments. We also show evidence that an existing experimental dataset contains chimeric read fragments derived from different haplotypes, which significantly degrade the quality of assembled haplotypes. CONCLUSIONS We develop a novel method for solving the haplotype assembly problem. We also define the quality score which is based on our model and indicates the accuracy of the haplotype segments. In our evaluation, MixSIH has successfully extracted reliable haplotype segments. The C++ source code of MixSIH is available at https://sites.google.com/site/hmatsu1226/software/mixsih.
Affiliation(s)
- Hirotaka Matsumoto
- Department of Computational Biology, Faculty of Frontier Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan.
|
170
|
Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 2013; 30:771-6. [PMID: 22797562 DOI: 10.1038/nbt.2303] [Citation(s) in RCA: 442] [Impact Index Per Article: 40.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2012] [Accepted: 06/06/2012] [Indexed: 12/21/2022]
Abstract
We describe genome mapping on nanochannel arrays. In this approach, specific sequence motifs in single DNA molecules are fluorescently labeled, and the DNA molecules are uniformly stretched in thousands of silicon channels on a nanofluidic device. Fluorescence imaging allows the construction of maps of the physical distances between occurrences of the sequence motifs. We demonstrate the analysis, individually and as mixtures, of 95 bacterial artificial chromosome (BAC) clones that cover the 4.7-Mb human major histocompatibility complex region. We obtain accurate, haplotype-resolved, sequence motif maps hundreds of kilobases in length, resulting in a median coverage of 114× for the BACs. The final sequence motif map assembly contains three contigs. With an average distance of 9 kb between labels, we detect 22 haplotype differences. We also use the sequence motif maps to provide scaffolds for de novo assembly of sequencing data. Nanochannel genome mapping should facilitate de novo assembly of sequencing reads from complex regions in diploid organisms, haplotype and structural variation analysis and comparative genomics.
|
171
|
Lu S, Zong C, Fan W, Yang M, Li J, Chapman AR, Zhu P, Hu X, Xu L, Yan L, Bai F, Qiao J, Tang F, Li R, Xie XS. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science 2013; 338:1627-30. [PMID: 23258895 DOI: 10.1126/science.1229112] [Citation(s) in RCA: 234] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Meiotic recombination creates genetic diversity and ensures segregation of homologous chromosomes. Previous population analyses yielded results averaged among individuals and affected by evolutionary pressures. We sequenced 99 sperm from an Asian male by using the newly developed amplification method, multiple annealing and looping-based amplification cycles, to phase the personal genome and map recombination events at high resolution, which are nonuniformly distributed across the genome in the absence of selection pressure. The paucity of recombination near transcription start sites observed in individual sperm indicates that such a phenomenon is intrinsic to the molecular mechanism of meiosis. Interestingly, a decreased crossover frequency combined with an increase of autosomal aneuploidy is observable on a global per-sperm basis.
Affiliation(s)
- Sijia Lu
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
|
172
|
Exploiting identifiability and intergene correlation for improved detection of differential expression. ISRN BIOINFORMATICS 2013; 2013:404717. [PMID: 25937946 PMCID: PMC4393076 DOI: 10.1155/2013/404717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/13/2012] [Accepted: 11/19/2012] [Indexed: 11/23/2022]
Abstract
Accurate differential analysis of microarray data strongly depends on effective treatment of intergene correlation. Such dependence is ordinarily accounted for in terms of its effect on significance cutoffs. In this paper, it is shown that correlation can, in fact, be exploited to share information across tests and reorder expression differentials for increased statistical power, regardless of the threshold. Significantly improved differential analysis is the result of two simple measures: (i) adjusting test statistics to exploit information from identifiable genes (the large subset of genes represented on a microarray that can be classified a priori as nondifferential with very high confidence), but (ii) doing so in a way that accounts for linear dependencies among identifiable and nonidentifiable genes. A method is developed that builds upon the widely used two-sample t-statistic approach and uses analysis in Hilbert space to decompose the nonidentified gene vector into two components that are correlated and uncorrelated with the identified set. In the application to data derived from a widely studied prostate cancer database, the proposed method outperforms some of the most highly regarded approaches published to date. Algorithms in MATLAB and in R are available for public download.
|
173
|
Xie M, Wang J, Jiang T. A fast and accurate algorithm for single individual haplotyping. BMC SYSTEMS BIOLOGY 2012; 6 Suppl 2:S8. [PMID: 23282221 PMCID: PMC3521186 DOI: 10.1186/1752-0509-6-s2-s8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Background Due to the difficulty in separating two (paternal and maternal) copies of a chromosome, most published human genome sequences only provide genotype information, i.e., the mixed information of the underlying two haplotypes. However, phased haplotype information is needed to completely understand complex genetic polymorphisms and to increase the power of genome-wide association studies for complex diseases. With the rapid development of DNA sequencing technologies, reconstructing a pair of haplotypes from an individual's aligned DNA fragments by computer algorithms (i.e., Single Individual Haplotyping) has become a practical haplotyping approach. Results In the paper, we combine two measures "errors corrected" and "fragments cut" and propose a new optimization model, called Balanced Optimal Partition (BOP), for single individual haplotyping. The model generalizes two existing models, Minimum Error Correction (MEC) and Maximum Fragments Cut (MFC), and could be made either model by using some extreme parameter values. To solve the model, we design a heuristic dynamic programming algorithm H-BOP. By limiting the number of intermediate solutions at each iteration to an appropriately chosen small integer k, H-BOP is able to solve the model efficiently. Conclusions Extensive experimental results on simulated and real data show that when k = 8, H-BOP is generally faster and more accurate than a recent state-of-art algorithm ReFHap in haplotype reconstruction. The running time of H-BOP is linearly dependent on some of the key parameters controlling the input size and H-BOP scales well to large input data. The code of H-BOP is available to the public for free upon request to the corresponding author.
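The abstract above names Minimum Error Correction (MEC) as one of the two models that BOP generalizes. As a rough illustration of that objective only, the sketch below scores a candidate haplotype against aligned fragments and finds the best one by exhaustive search. It is not the H-BOP algorithm (whose code is available only on request); the function names, the dictionary encoding of fragments, and the toy data are all assumptions made for this example.

```python
from itertools import product


def mec_score(haplotype, fragments):
    """Minimum Error Correction cost of one candidate haplotype.

    haplotype: tuple of 0/1 alleles, one per heterozygous site.
    fragments: list of dicts mapping site index -> observed allele (0/1).
    Each fragment is assigned to the haplotype or to its complement,
    whichever needs fewer allele corrections.
    """
    total = 0
    for frag in fragments:
        mismatches = sum(
            1 for site, allele in frag.items() if haplotype[site] != allele
        )
        # Mismatches against the complement are the sites that matched.
        total += min(mismatches, len(frag) - mismatches)
    return total


def brute_force_mec(n_sites, fragments):
    """Exhaustive search; only feasible for a handful of sites."""
    best_h, best_cost = None, None
    # Fix the first allele to 0: a haplotype and its complement score the same.
    for rest in product((0, 1), repeat=n_sites - 1):
        h = (0,) + rest
        cost = mec_score(h, fragments)
        if best_cost is None or cost < best_cost:
            best_h, best_cost = h, cost
    return best_h, best_cost


# Hypothetical toy input: five fragments over four heterozygous sites.
toy_fragments = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {2: 0, 3: 0},
    {0: 0, 3: 0},  # conflicts with the other fragments
]

best_haplotype, best_cost = brute_force_mec(4, toy_fragments)
print(best_haplotype, best_cost)  # (0, 0, 1, 1) 1
```

The exhaustive search is exponential in the number of sites, which is why practical tools rely on heuristics such as the bounded dynamic programming the abstract describes.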
Affiliation(s)
- Minzhu Xie
- College of Physics and Information Science, Hunan Normal University, Changsha 410081, PR China.
|
174
|
Tyson J, Armour JAL. Determination of haplotypes at structurally complex regions using emulsion haplotype fusion PCR. BMC Genomics 2012; 13:693. [PMID: 23231411 PMCID: PMC3543183 DOI: 10.1186/1471-2164-13-693] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2012] [Accepted: 12/07/2012] [Indexed: 12/26/2022] Open
Abstract
Background Genotyping and massively-parallel sequencing projects result in a vast amount of diploid data that is only rarely resolved into its constituent haplotypes. It is nevertheless this phased information that is transmitted from one generation to the next and is most directly associated with biological function and the genetic causes of biological effects. Despite progress made in genome-wide sequencing and phasing algorithms and methods, problems assembling (and reconstructing linear haplotypes in) regions of repetitive DNA and structural variation remain. These dynamic and structurally complex regions are often poorly understood from a sequence point of view. Regions such as these that are highly similar in their sequence tend to be collapsed onto the genome assembly. This in turn means downstream determination of the true sequence haplotype in these regions poses a particular challenge. For structurally complex regions, a more focussed approach to assembling haplotypes may be required. Results In order to investigate reconstruction of spatial information at structurally complex regions, we have used an emulsion haplotype fusion PCR approach to reproducibly link sequences of up to 1kb in length to allow phasing of multiple variants from neighbouring loci, using allele-specific PCR and sequencing to detect the phase. By using emulsion systems linking flanking regions to amplicons within the CNV, this led to the reconstruction of a 59kb haplotype across the DEFA1A3 CNV in HapMap individuals. Conclusion This study has demonstrated a novel use for emulsion haplotype fusion PCR in addressing the issue of reconstructing structural haplotypes at multiallelic copy variable regions, using the DEFA1A3 locus as an example.
Affiliation(s)
- Jess Tyson
- School of Biology, University of Nottingham, Queen's Medical Centre, Nottingham, NG7 2UH, UK.
|
175
|
Pirola Y, Della Vedova G, Biffani S, Stella A, Bonizzoni P. A fast and practical approach to genotype phasing and imputation on a pedigree with erroneous and incomplete information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1582-1594. [PMID: 22848137 DOI: 10.1109/tcbb.2012.100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
The MINIMUM-RECOMBINANT HAPLOTYPE CONFIGURATION problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances that have improved the efficiency, its applicability to real data sets has been limited since it does not take into account some important phenomena such as mutations, genotyping errors, and missing data. In this work, we propose the MINIMUM-RECOMBINANT HAPLOTYPE CONFIGURATION WITH BOUNDED ERRORS problem (MRHCE), which extends the original MRHC formulation by incorporating the two most common characteristics of real data: errors and missing genotypes (including untyped individuals). We describe a practical algorithm for MRHCE that is based on a reduction to the well-known Satisfiability problem (SAT) and exploits recent advances in the constraint programming literature. An experimental analysis demonstrates the biological soundness of the phasing model and the effectiveness (on both accuracy and performance) of the algorithm under several scenarios. The analysis on real data and the comparison with state-of-the-art programs reveals that our approach couples better scalability to large and complex pedigrees with the explicit inclusion of genotyping errors into the model.
Affiliation(s)
- Yuri Pirola
- Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, V.le Sarca, 336, Milan 20126, Italy.
|
176
|
Torkamani A, Pham P, Libiger O, Bansal V, Zhang G, Scott-Van Zeeland AA, Tewhey R, Topol EJ, Schork NJ. Clinical implications of human population differences in genome-wide rates of functional genotypes. Front Genet 2012; 3:211. [PMID: 23125845 PMCID: PMC3485509 DOI: 10.3389/fgene.2012.00211] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2012] [Accepted: 09/26/2012] [Indexed: 12/21/2022] Open
Abstract
There have been a number of recent successes in the use of whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of a patient with an idiopathic condition. This is so because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes made up of other individuals’ genomes of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes. We pursued this exploration in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We took advantage of a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5000 coding and ∼7000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels comprised of genomes from individuals that do not have the same ancestral background as a patient can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
Affiliation(s)
- Ali Torkamani
- The Scripps Translational Science La Jolla, CA, USA ; Scripps Health La Jolla, CA, USA ; Department of Molecular and Experimental Medicine, The Scripps Research Institute La Jolla, CA, USA
|
177
|
Aguiar D, Istrail S. HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data. J Comput Biol 2012; 19:577-90. [PMID: 22697235 DOI: 10.1089/cmb.2012.0084] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational methods of determining haplotype phase from sequence data--known as haplotype assembly--have difficulties producing accurate results for large (1000 genomes-type) data or operate on restricted optimizations that are unrealistic considering modern high-throughput sequencing technologies. We present a novel algorithm, HapCompass, for haplotype assembly of densely sequenced human genome data. The HapCompass algorithm operates on a graph where single nucleotide polymorphisms (SNPs) are nodes and edges are defined by sequence reads and viewed as supporting evidence of co-occurring SNP alleles in a haplotype. In our graph model, haplotype phasings correspond to spanning trees. We define the minimum weighted edge removal optimization on this graph and develop an algorithm based on cycle basis local optimizations for resolving conflicting evidence. We then estimate the amount of sequencing required to produce a complete haplotype assembly of a chromosome. Using these estimates together with metrics borrowed from genome assembly and haplotype phasing, we compare the accuracy of HapCompass, the Genome Analysis ToolKit, and HapCut for 1000 Genomes Project and simulated data. We show that HapCompass performs significantly better for a variety of data and metrics. HapCompass is freely available for download (www.brown.edu/Research/Istrail_Lab/).
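The graph model in the abstract above (SNPs as nodes, reads as evidence on edges, phasings as spanning trees) can be illustrated with a deliberately simplified sketch. It builds signed edge weights from reads and reads a phasing off a greedily chosen spanning forest. It does not implement HapCompass's minimum weighted edge removal or its cycle basis optimization; the greedy Kruskal-style edge ordering, the function names, and the toy reads are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_snp_graph(fragments):
    """Edge weight = (#reads with equal alleles) - (#reads with unequal alleles)."""
    weights = defaultdict(int)
    for frag in fragments:
        for a, b in combinations(sorted(frag), 2):
            weights[(a, b)] += 1 if frag[a] == frag[b] else -1
    return weights


def phase_by_spanning_forest(n_sites, weights):
    """Greedy spanning forest over edges with the strongest evidence first.

    Returns one 0/1 allele per site, each relative to the root of its
    connected component (the root is arbitrary and gets allele 0).
    """
    parent = list(range(n_sites))
    parity = [0] * n_sites  # allele of a node relative to its parent

    def find(x):
        # Union-find with path compression that keeps parities consistent.
        if parent[x] == x:
            return x, 0
        root, parent_parity = find(parent[x])
        parity[x] ^= parent_parity
        parent[x] = root
        return root, parity[x]

    edges = sorted(weights.items(), key=lambda kv: -abs(kv[1]))
    for (a, b), w in edges:
        if w == 0:
            continue  # evidence cancels out; the edge carries no phase signal
        differ = 0 if w > 0 else 1
        root_a, par_a = find(a)
        root_b, par_b = find(b)
        if root_a != root_b:
            # Tree edge: attach b's component so that b == a XOR differ.
            parent[root_b] = root_a
            parity[root_b] = par_a ^ par_b ^ differ
        # Non-tree edges close a cycle and are ignored in this sketch.
    return [find(i)[1] for i in range(n_sites)]


# Hypothetical toy reads over four heterozygous SNPs (site -> allele).
toy_reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {2: 0, 3: 0},
    {0: 0, 3: 0},  # conflicting read; its weak edge closes a cycle
]

graph = build_snp_graph(toy_reads)
phasing = phase_by_spanning_forest(4, graph)
print(phasing)  # [0, 0, 1, 1]
```

In this toy case the conflicting read only contributes a low-weight edge that closes a cycle, so the greedy forest ignores it. Resolving such cycles in a principled way is the part of the problem that the cycle basis optimization described in the abstract addresses.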
Affiliation(s)
- Derek Aguiar
- Department of Computer Science, Brown University, Providence RI 02912, USA
|
178
|
|
179
|
Kuk AY, Li X, Xu J. A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants. Stat Med 2012; 32:1343-60. [DOI: 10.1002/sim.5540] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2012] [Accepted: 06/11/2012] [Indexed: 12/31/2022]
Affiliation(s)
- Anthony Y.C. Kuk
- Department of Statistics and Applied Probability; National University of Singapore; Singapore; Singapore
- Xiang Li
- Department of Statistics and Applied Probability; National University of Singapore; Singapore; Singapore
- Jinfeng Xu
- Department of Statistics and Applied Probability; National University of Singapore; Singapore; Singapore
|
180
|
Wu JR, Zeng R. Molecular basis for population variation: from SNPs to SAPs. FEBS Lett 2012; 586:2841-5. [PMID: 22828278 DOI: 10.1016/j.febslet.2012.07.036] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2012] [Revised: 07/14/2012] [Accepted: 07/16/2012] [Indexed: 01/09/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are one type of genomic DNA variations in a population. Correspondingly, single amino-acid polymorphisms (SAPs) derived from non-synonymous SNPs represent protein variations in a population. Recently, using proteomic approaches, SAPs in the plasma proteomes of an Asian population were systematically identified for the first time. That study showed that heterozygous and homozygous proteins with various SAPs have different associations with particular traits in the population. Recent discoveries of widespread differences between RNA and DNA sequences indicate that RNA editing is also a source of SAPs--one that is independent of genomic SNPs. Furthermore, we argue that there are de novo SAPs that are not encoded by either DNA or RNA sequences.
Affiliation(s)
- Jia-Rui Wu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
|
181
|
Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 2012; 487:190-5. [PMID: 22785314 PMCID: PMC3397394 DOI: 10.1038/nature11236] [Citation(s) in RCA: 207] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2012] [Accepted: 05/15/2012] [Indexed: 12/16/2022]
Abstract
Recent advances in whole genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, Long Fragment Read (LFR) technology, similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 pg of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants (SNVs) were assembled into long haplotype contigs. Removal of false positive SNVs not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 Mb. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
|
182
|
Rosenfeld JA, Mason CE, Smith TM. Limitations of the human reference genome for personalized genomics. PLoS One 2012; 7:e40294. [PMID: 22811759 PMCID: PMC3394790 DOI: 10.1371/journal.pone.0040294] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2012] [Accepted: 06/07/2012] [Indexed: 11/19/2022] Open
Abstract
Data from the 1000 genomes project (1KGP) and Complete Genomics (CG) have dramatically increased the numbers of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the numbers of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset; likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population and significant efforts are needed to create resources that can accurately assess personal genomics for health, disease, and predict treatment outcomes.
Affiliation(s)
- Jeffrey A. Rosenfeld
- Division of High Performance and Research Computing, University of Medicine & Dentistry of New Jersey, Newark, New Jersey, United States of America
- American Museum of Natural History, Sackler Institute for Comparative Genomics, New York, New York, United States of America
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, United States of America
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America
- Todd M. Smith
- PerkinElmer, Seattle, Washington, United States of America
|
183
|
Boulanger J, Muresan L, Tiemann-Boege I. Massively parallel haplotyping on microscopic beads for the high-throughput phase analysis of single molecules. PLoS One 2012; 7:e36064. [PMID: 22558329 PMCID: PMC3340404 DOI: 10.1371/journal.pone.0036064] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2012] [Accepted: 03/30/2012] [Indexed: 12/12/2022] Open
Abstract
In spite of the many advances in haplotyping methods, it is still very difficult to characterize rare haplotypes in tissues and different environmental samples or to accurately assess the haplotype diversity in large mixtures. This would require a haplotyping method capable of analyzing the phase of single molecules with an unprecedented throughput. Here we describe such a haplotyping method capable of analyzing in parallel hundreds of thousands of single molecules in one experiment. In this method, multiple PCR reactions amplify different polymorphic regions of a single DNA molecule on a magnetic bead compartmentalized in an emulsion drop. The allelic states of the amplified polymorphisms are identified with fluorescently labeled probes that are then decoded from images taken of the arrayed beads by a microscope. This method can evaluate the phase of up to 3 polymorphisms separated by up to 5 kilobases in hundreds of thousands of single molecules. We tested the sensitivity of the method by measuring the number of mutant haplotypes synthesized by four different commercially available enzymes: Phusion, Platinum Taq, Titanium Taq, and Phire. The digital nature of the method makes it highly sensitive to detecting haplotype ratios of less than 1:10,000. We also accurately quantified chimera formation during the exponential phase of PCR by different DNA polymerases.
Affiliation(s)
- Jérôme Boulanger
- Cell and Tissue Imaging Core, Centre National de la Recherche Scientifique, Institut Curie, Paris, France
- Radon Institute for Computational and Applied Mathematics of the Austrian Academy of Sciences, Linz, Austria
- Leila Muresan
- Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Linz, Austria
|
184
|
Su ZD, Sun L, Yu DX, Li RX, Li HX, Yu ZJ, Sheng QH, Lin X, Zeng R, Wu JR. Quantitative detection of single amino acid polymorphisms by targeted proteomics. J Mol Cell Biol 2012; 3:309-15. [PMID: 22028381 DOI: 10.1093/jmcb/mjr024] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Single-nucleotide polymorphisms (SNPs) are recognized as one kind of major genetic variants in population scale. However, polymorphisms at the proteome level in population scale remain elusive. In the present study, we named amino acid variances derived from SNPs within coding regions as single amino acid polymorphisms (SAPs) at the proteome level, and developed a pipeline of non-targeted and targeted proteomics to identify and quantify SAP peptides in human plasma. The absolute concentrations of three selected SAP-peptide pairs among 290 Asian individuals were measured by selected reaction monitoring (SRM) approach, and their associations with both obesity and diabetes were further analyzed. This work revealed that heterozygotes and homozygotes with various SAPs in a population could have different associations with particular traits. In addition, the SRM approach allows us for the first time to separately measure the absolute concentration of each SAP peptide in the heterozygotes, which also shows different associations with particular traits.
Affiliation(s)
- Zhi-Duan Su
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
|
185
|
Zaina S, Lund G. Integrating genomic and epigenomic information: a promising strategy for identifying functional DNA variants of human disease. Clin Genet 2012; 81:334-40. [DOI: 10.1111/j.1399-0004.2011.01840.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
186
|
Koepke T, Schaeffer S, Krishnan V, Jiwan D, Harper A, Whiting M, Oraguzie N, Dhingra A. Rapid gene-based SNP and haplotype marker development in non-model eukaryotes using 3'UTR sequencing. BMC Genomics 2012; 13:18. [PMID: 22239826 PMCID: PMC3293726 DOI: 10.1186/1471-2164-13-18] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 10/05/2011] [Accepted: 01/12/2012] [Indexed: 11/25/2022]
Abstract
Background: Sweet cherry (Prunus avium L.), a non-model crop with narrow genetic diversity, is an important member of sub-family Amygdoloideae within Rosaceae. Compared to other important members like peach and apple, sweet cherry lacks in genetic and genomic information, impeding understanding of important biological processes and development of efficient breeding approaches. Availability of single nucleotide polymorphism (SNP)-based molecular markers can greatly benefit breeding efforts in such non-model species. RNA-seq approaches employing second generation sequencing platforms offer a unique avenue to rapidly identify gene-based SNPs. Additionally, haplotype markers can be rapidly generated from transcript-based SNPs since they have been found to be extremely utile in identification of genetic variants related to health, disease and response to environment as highlighted by the human HapMap project. Results: RNA-seq was performed on two sweet cherry cultivars, Bing and Rainier using a 3' untranslated region (UTR) sequencing method yielding 43,396 assembled contigs. In order to test our approach of rapid identification of SNPs without any reference genome information, over 25% (10,100) of the contigs were screened for the SNPs. A total of 207 contigs from this set were identified to contain high quality SNPs. A set of 223 primer pairs were designed to amplify SNP containing regions from these contigs and high resolution melting (HRM) analysis was performed with eight important parental sweet cherry cultivars. Six of the parent cultivars were distantly related to Bing and Rainier, the cultivars used for initial SNP discovery. Further, HRM analysis was also performed on 13 seedlings derived from a cross between two of the parents. Our analysis resulted in the identification of 84 (38.7%) primer sets that demonstrated variation among the tested germplasm.
Reassembly of the raw 3'UTR sequences using upgraded transcriptome assembly software yielded 34,620 contigs containing 2243 putative SNPs in 887 contigs after stringent filtering. Contigs with multiple SNPs were visually parsed to identify 685 putative haplotypes at 335 loci in 301 contigs. Conclusions: This approach, which leverages the advantages of RNA-seq approaches, enabled rapid generation of gene-linked SNP and haplotype markers. The general approach presented in this study can be easily applied to other non-model eukaryotes irrespective of the ploidy level to identify gene-linked polymorphisms that are expected to facilitate efficient Gene Assisted Breeding (GAB), genotyping and population genetics studies. The identified SNP haplotypes reveal some of the allelic differences in the two sweet cherry cultivars analyzed. The identification of these SNP and haplotype markers is expected to significantly improve the genomic resources for sweet cherry and facilitate efficient GAB in this non-model crop.
Affiliation(s)
- Tyson Koepke
- Department of Horticulture, Washington State University, Pullman, WA, USA
|
187
|
Deller JR, Radha H, McCormick JJ, Wang H. Nonlinear dependence in the discovery of differentially expressed genes. ISRN Bioinformatics 2012; 2012:564715. [PMID: 25937940 PMCID: PMC4393074 DOI: 10.5402/2012/564715] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/16/2011] [Accepted: 11/09/2011] [Indexed: 11/23/2022]
Abstract
Microarray data are used to determine which genes are active in response to a changing cell environment. Genes are “discovered” when they are significantly differentially expressed in the microarray data collected under the differing conditions. In one prevalent approach, all genes are assumed to satisfy a null hypothesis, ℍ0, of no difference in expression. A false discovery (type 1 error) occurs when ℍ0 is incorrectly rejected. The quality of a detection algorithm is assessed by estimating its number of false discoveries, 𝔉. Work involving the second-moment modeling of the z-value histogram (representing gene expression differentials) has shown significantly deleterious effects of intergene expression correlation on the estimate of 𝔉. This paper suggests that nonlinear dependencies could likewise be important. With an applied emphasis, this paper extends the “moment framework” by including third-moment skewness corrections in an estimator of 𝔉. This estimator combines observed correlation (corrected for sampling fluctuations) with the information from easily identifiable null cases. Nonlinear-dependence modeling reduces the estimation error relative to that of linear estimation. Third-moment calculations involve empirical densities of 3 × 3 covariance matrices estimated using very few samples. The principle of entropy maximization is employed to connect estimated moments to 𝔉 inference. Model results are tested with BRCA and HIV data sets and with carefully constructed simulations.
Affiliation(s)
- J R Deller
- Department of Electrical and Computer Engineering, Michigan State University, 2120 EB, East Lansing, MI 48824, USA
- Hayder Radha
- Department of Electrical and Computer Engineering, Michigan State University, 2120 EB, East Lansing, MI 48824, USA
- J Justin McCormick
- Carcinogenesis Laboratory, Department of Molecular Biology and Biochemistry, Michigan State University, 341 FST, East Lansing, MI 48824, USA
- Huiyan Wang
- College of Computer Science and Information Engineering, Zhejiang Gongshang University, 18 Xuezheng Street, Zhejiang Province Hangzhou, 310018, China
|
188
|
Kidd MJ, Chen Z, Wang Y, Jackson KJ, Zhang L, Boyd SD, Fire AZ, Tanaka MM, Gaëta BA, Collins AM. The inference of phased haplotypes for the immunoglobulin H chain V region gene loci by analysis of VDJ gene rearrangements. J Immunol 2011; 188:1333-40. [PMID: 22205028 DOI: 10.4049/jimmunol.1102097] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Indexed: 01/30/2023]
Abstract
The existence of many highly similar genes in the lymphocyte receptor gene loci makes them difficult to investigate, and the determination of phased "haplotypes" has been particularly problematic. However, V(D)J gene rearrangements provide an opportunity to infer the association of Ig genes along the chromosomes. The chromosomal distribution of H chain genes in an Ig genotype can be inferred through analysis of VDJ rearrangements in individuals who are heterozygous at points within the IGH locus. We analyzed VDJ rearrangements from 44 individuals for whom sufficient unique rearrangements were available to allow comprehensive genotyping. Nine individuals were identified who were heterozygous at the IGHJ6 locus and for whom sufficient suitable VDJ rearrangements were available to allow comprehensive haplotyping. Each of the 18 resulting IGHV│IGHD│IGHJ haplotypes was unique. Apparent deletion polymorphisms were seen that involved as many as four contiguous, functional IGHV genes. Two deletion polymorphisms involving multiple contiguous IGHD genes were also inferred. Three previously unidentified gene duplications were detected, where two sequences recognized as allelic variants of a single gene were both inferred to be on a single chromosome. Phased genomic data brings clarity to the study of the contribution of each gene to the available repertoire of rearranged VDJ genes. Analysis of rearrangement frequencies suggests that particular genes may have substantially different yet predictable propensities for rearrangement within different haplotypes. Together with data highlighting the extent of haplotypic variation within the population, this suggests that there may be substantial variability in the available Ab repertoires of different individuals.
Affiliation(s)
- Marie J Kidd
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, Sydney, New South Wales 2052, Australia
|
189
|
Duitama J, McEwen GK, Huebsch T, Palczewski S, Schulz S, Verstrepen K, Suk EK, Hoehe MR. Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques. Nucleic Acids Res 2011; 40:2041-53. [PMID: 22102577 PMCID: PMC3299995 DOI: 10.1093/nar/gkr1042] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Indexed: 12/26/2022]
Abstract
Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but the accuracy of such methods has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight algorithms for SIH and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, is able to efficiently produce high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing near-to-complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
Affiliation(s)
- Jorge Duitama
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, D-14195 Berlin, Germany.
|
190
|
Abstract
Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.
Affiliation(s)
- Sharon R. Browning
- Department of Biostatistics, University of Washington, Seattle WA 98195, USA
- Brian L. Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle WA 98195, USA
|
191
|
Roach J, Glusman G, Hubley R, Montsaroff S, Holloway A, Mauldin D, Srivastava D, Garg V, Pollard K, Galas D, Hood L, Smit A. Chromosomal haplotypes by genetic phasing of human families. Am J Hum Genet 2011; 89:382-97. [PMID: 21855840 DOI: 10.1016/j.ajhg.2011.07.023] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 05/07/2011] [Revised: 07/23/2011] [Accepted: 07/30/2011] [Indexed: 01/06/2023]
Abstract
Assignment of alleles to haplotypes for nearly all the variants on all chromosomes can be performed by genetic analysis of a nuclear family with three or more children. Whole-genome sequence data enable deterministic phasing of nearly all sequenced alleles by permitting assignment of recombinations to precise chromosomal positions and specific meioses. We demonstrate this process of genetic phasing on two families each with four children. We generate haplotypes for all of the children and their parents; these haplotypes span all genotyped positions, including rare variants. Misassignments of phase between variants (switch errors) are nearly absent. Our algorithm can also produce multimegabase haplotypes for nuclear families with just two children and can handle families with missing individuals. We implement our algorithm in a suite of software scripts (Haploscribe). Haplotypes and family genome sequences will become increasingly important for personalized medicine and for fundamental biology.
|
192
|
Suk EK, McEwen GK, Duitama J, Nowick K, Schulz S, Palczewski S, Schreiber S, Holloway DT, McLaughlin S, Peckham H, Lee C, Huebsch T, Hoehe MR. A comprehensively molecular haplotype-resolved genome of a European individual. Genome Res 2011; 21:1672-85. [PMID: 21813624 DOI: 10.1101/gr.125047.111] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 02/07/2023]
Abstract
Independent determination of both haplotype sequences of an individual genome is essential to relate genetic variation to genome function, phenotype, and disease. To address the importance of phase, we have generated the most complete haplotype-resolved genome to date, "Max Planck One" (MP1), by fosmid pool-based next generation sequencing. Virtually all SNPs (>99%) and 80,000 indels were phased into haploid sequences of up to 6.3 Mb (N50 ~1 Mb). The completeness of phasing allowed determination of the concrete molecular haplotype pairs for the vast majority of genes (81%) including potential regulatory sequences, of which >90% were found to be constituted by two different molecular forms. A subset of 159 genes with potentially severe mutations in either cis or trans configurations exemplified in particular the role of phase for gene function, disease, and clinical interpretation of personal genomes (e.g., BRCA1). Extended genomic regions harboring manifold combinations of physically and/or functionally related genes and regulatory elements were resolved into their underlying "haploid landscapes," which may define the functional genome. Moreover, the majority of genes and functional sequences were found to contain individual or rare SNPs, which cannot be phased from population data alone, emphasizing the importance of molecular phasing for characterizing a genome in its molecular individuality. Our work provides the foundation to understand that the distinction of molecular haplotypes is essential to resolve the (inherently individual) biology of genes, genomes, and disease, establishing a reference point for "phase-sensitive" personal genomics. MP1's annotated haploid genomes are available as a public resource.
Affiliation(s)
- Eun-Kyung Suk
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
|
193
|
Torkamani A, Scott-Van Zeeland AA, Topol EJ, Schork NJ. Annotating individual human genomes. Genomics 2011; 98:233-41. [PMID: 21839162 DOI: 10.1016/j.ygeno.2011.07.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 05/25/2011] [Accepted: 07/26/2011] [Indexed: 02/03/2023]
Abstract
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
|