1051
Henrichsen CN, Chaignat E, Reymond A. Copy number variants, diseases and gene expression. Hum Mol Genet 2009; 18:R1-8. [PMID: 19297395 DOI: 10.1093/hmg/ddp011] [Citation(s) in RCA: 292] [Impact Index Per Article: 19.5] [Indexed: 12/19/2022] Open
Abstract
Copy number variation (CNV) has recently gained considerable interest as a source of genetic variation likely to play a role in phenotypic diversity and evolution. Much effort has been put into the identification and mapping of regions that vary in copy number among seemingly normal individuals in humans and a number of model organisms, using bioinformatics or hybridization-based methods. These efforts have uncovered associations between copy number changes and complex diseases in whole-genome association studies and have identified new genomic disorders. At the genome-wide scale, however, the functional impact of CNV remains poorly studied. Here we review the current catalogs of CNVs, their association with diseases and how they link genotype and phenotype. We describe initial evidence that genes in CNV regions are expressed at lower and more variable levels than genes mapping elsewhere, and that CNV not only affects the expression of genes varying in copy number but also has a global influence on the transcriptome. Further studies are warranted for complete cataloguing and fine mapping of CNVs, as well as to elucidate the different mechanisms by which they influence gene expression.
Affiliation(s)
- Charlotte N Henrichsen
- The Center for Integrative Genomics, Genopode Building, University of Lausanne, Lausanne, Switzerland
1052
Raymond FL, Whibley A, Stratton MR, Gecz J. Lessons learnt from large-scale exon re-sequencing of the X chromosome. Hum Mol Genet 2009; 18:R60-4. [PMID: 19297402 PMCID: PMC2657946 DOI: 10.1093/hmg/ddp071] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023] Open
Abstract
A candidate gene approach to identifying novel causes of disease is concept-limiting and in the new era of high throughput sequencing there is now no need to restrict the experiment to a few interesting genes. We have recently completed a large-scale exon re-sequencing project using Sanger sequencing technology to analyse approximately 1 Mb of coding sequence of the X chromosome in probands from >200 families with various forms of intellectual disability. We review the lessons learnt from this experience. Comparing large data sets will certainly reveal pathogenic mutations in genes that were not possible to identify previously. However, the task of distinguishing pathogenic mutations from rare sequence variants is not easy and is the most substantial challenge of the next decade. High-throughput technology has the attraction of being cheap, fast and comprehensive, but projects that require exhaustive coverage of a genomic region may need to combine large-scale sequencing with small-scale follow-up of regions that are difficult to sequence. The number of rare truncating variants present in coding regions of the X chromosome that are not pathogenic was 1%. The importance of the quality of the starting material, both clinically and molecularly, and the number of sequence variants, both rare and common, that any one individual has across their coding sequence are discussed.
Affiliation(s)
- F Lucy Raymond
- Cambridge Institute of Medical Research, University of Cambridge, Cambridge, UK.
1053
Abstract
Motivation: Next-generation DNA sequencing machines are generating an enormous amount of sequence data, placing unprecedented demands on traditional single-processor read-mapping algorithms. CloudBurst is a new parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyses including SNP discovery, genotyping and personal genomics. It is modeled after the short read-mapping program RMAP, and reports either all alignments or the unambiguous best alignment for each read with any number of mismatches or differences. This level of sensitivity could be prohibitively time consuming, but CloudBurst uses the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. Results: CloudBurst's running time scales linearly with the number of reads mapped, and it achieves near-linear speedup as the number of processors increases. In a 24-processor core configuration, CloudBurst is up to 30 times faster than RMAP executing on a single core, while computing an identical set of alignments. Using a larger remote compute cloud with 96 cores, CloudBurst improved performance by >100-fold, reducing the running time from hours to mere minutes for typical jobs involving mapping of millions of short reads to the human genome. Availability: CloudBurst is available open-source as a model for parallelizing algorithms with MapReduce at http://cloudburst-bio.sourceforge.net/. Contact: mschatz@umiacs.umd.edu
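The MapReduce idea behind this kind of read mapping can be illustrated with a minimal, single-process sketch. It is not CloudBurst's code and does not use Hadoop. It assumes equal-length reads, mismatches only (no indels), and a pigeonhole seeding rule (a read with at most k mismatches must contain one of k+1 non-overlapping seeds exactly); all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(reference, reads, seed_len, n_seeds):
    """Mapper: emit (seed, record) pairs for the reference and for each read."""
    pairs = []
    # Every seed-length substring of the reference, keyed by its sequence.
    for pos in range(len(reference) - seed_len + 1):
        pairs.append((reference[pos:pos + seed_len], ("ref", pos)))
    # Non-overlapping seeds of each read, tagged with their offset in the read.
    for name, seq in reads.items():
        for i in range(n_seeds):
            off = i * seed_len
            pairs.append((seq[off:off + seed_len], ("read", name, off)))
    return pairs

def shuffle(pairs):
    """Shuffle: group all records that share the same seed."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reference, reads, max_mismatches):
    """Reducer: extend each shared seed to a full alignment and count mismatches."""
    hits = set()
    for records in groups.values():
        ref_positions = [r[1] for r in records if r[0] == "ref"]
        read_seeds = [(r[1], r[2]) for r in records if r[0] == "read"]
        for name, off in read_seeds:
            seq = reads[name]
            for pos in ref_positions:
                start = pos - off
                if start < 0 or start + len(seq) > len(reference):
                    continue  # alignment would run off the reference
                window = reference[start:start + len(seq)]
                mismatches = sum(a != b for a, b in zip(seq, window))
                if mismatches <= max_mismatches:
                    hits.add((name, start, mismatches))
    return sorted(hits)

def map_reads(reference, reads, max_mismatches=1):
    """Run the three phases in one process (assumes non-empty, equal-length reads)."""
    read_len = len(next(iter(reads.values())))
    n_seeds = max_mismatches + 1
    seed_len = read_len // n_seeds
    pairs = map_phase(reference, reads, seed_len, n_seeds)
    return reduce_phase(shuffle(pairs), reference, reads, max_mismatches)
```

In a real cluster the grouping step is what lets each seed bucket be processed on a different node; here it is only a dictionary, so the sketch shows the data flow rather than the speedup.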
Affiliation(s)
- Michael C Schatz
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
1054
Frazer KA, Murray SS, Schork NJ, Topol EJ. Human genetic variation and its contribution to complex traits. Nat Rev Genet 2009; 10:241-51. [PMID: 19293820 DOI: 10.1038/nrg2554] [Citation(s) in RCA: 675] [Impact Index Per Article: 45.0] [Indexed: 02/07/2023]
Abstract
The last few years have seen extensive efforts to catalogue human genetic variation and correlate it with phenotypic differences. Most common SNPs have now been assessed in genome-wide studies for statistical associations with many complex traits, including many important common diseases. Although these studies have provided new biological insights, only a limited amount of the heritable component of any complex trait has been identified and it remains a challenge to elucidate the functional link between associated variants and phenotypic traits. Technological advances, such as the ability to detect rare and structural variants, and a clear understanding of the challenges in linking different types of variation with phenotype, will be essential for future progress.
Affiliation(s)
- Kelly A Frazer
- Scripps Genomic Medicine, Scripps Translational Science Institute and The Scripps Research Institute, La Jolla, California 92037, USA.
1055
Tyner JW, Rutenberg-Schoenberg ML, Erickson H, Willis SG, O'Hare T, Deininger MW, Druker BJ, Loriaux MM. Functional characterization of an activating TEK mutation in acute myeloid leukemia: a cellular context-dependent activating mutation. Leukemia 2009; 23:1345-8. [PMID: 19340004 DOI: 10.1038/leu.2009.66] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
1056
Shen Y, Wu BL. Designing a simple multiplex ligation-dependent probe amplification (MLPA) assay for rapid detection of copy number variants in the genome. J Genet Genomics 2009; 36:257-65. [DOI: 10.1016/s1673-8527(08)60113-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 09/21/2008] [Revised: 10/14/2008] [Accepted: 10/18/2008] [Indexed: 11/26/2022]
1057
Li M, Marin-Muller C, Bharadwaj U, Chow KH, Yao Q, Chen C. MicroRNAs: control and loss of control in human physiology and disease. World J Surg 2009; 33:667-84. [PMID: 19030926 PMCID: PMC2933043 DOI: 10.1007/s00268-008-9836-x] [Citation(s) in RCA: 168] [Impact Index Per Article: 11.2] [Indexed: 12/19/2022]
Abstract
Analysis of the human genome indicates that a large fraction of the genome sequences are RNAs that do not encode any proteins, also known as non-coding RNAs. MicroRNAs (miRNAs) are a group of small non-coding RNA molecules 20-22 nucleotides (nt) in length that are predicted to control the activity of approximately 30% of all protein-coding genes in mammals. miRNAs play important roles in many diseases, including cancer, cardiovascular disease, and immune disorders. The expression of miRNAs can be regulated by epigenetic modification, DNA copy number change, and genetic mutations. miRNAs can serve as a valuable therapeutic target for a large number of diseases. For miRNAs with oncogenic capabilities, potential therapies include miRNA silencing, antisense blocking, and miRNA modifications. For miRNAs with tumor suppression functions, overexpression of those miRNAs might be a useful strategy to inhibit tumor growth. In this review, we discuss the current progress of miRNA research, regulation of miRNA expression, prediction of miRNA targets, and regulatory role of miRNAs in human physiology and diseases, with a specific focus on miRNAs in pancreatic cancer, liver cancer, colorectal cancer, cardiovascular disease, the immune system, and infectious disease. This review provides valuable information for clinicians and researchers who want to recognize the newest advances in this new field and identify possible lines of investigation in miRNAs as important mediators in human physiology and diseases.
Affiliation(s)
- Min Li
- Molecular Surgeon Research Center, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery and Michael E. DeBakey VA Medical Center, Baylor College of Medicine, Houston, Texas, USA
- Christian Marin-Muller
- Molecular Surgeon Research Center, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery and Michael E. DeBakey VA Medical Center, Baylor College of Medicine, Houston, Texas, USA
- Uddalak Bharadwaj
- Molecular Surgeon Research Center, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery and Michael E. DeBakey VA Medical Center, Baylor College of Medicine, Houston, Texas, USA
- Kwong-Hon Chow
- Molecular Surgeon Research Center, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery and Michael E. DeBakey VA Medical Center, Baylor College of Medicine, Houston, Texas, USA
- Qizhi Yao
- Molecular Surgeon Research Center, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery and Michael E. DeBakey VA Medical Center, Baylor College of Medicine, Houston, Texas, USA
- Changyi Chen
- Molecular Surgeon Research Center, Division of Vascular Surgery and Endovascular Therapy, Michael E. DeBakey Department of Surgery and Michael E. DeBakey VA Medical Center, Baylor College of Medicine, Houston, Texas, USA
1058
Voidonikolas G, Kreml SS, Chen C, Fisher WE, Brunicardi FC, Gibbs RA, Gingras MC. Basic principles and technologies for deciphering the genetic map of cancer. World J Surg 2009; 33:615-29. [PMID: 19115029 PMCID: PMC2924149 DOI: 10.1007/s00268-008-9851-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
The progress achieved in the field of genomics in recent years is leading medicine to adopt a personalized model in which the knowledge of individual DNA alterations will allow a targeted approach to cancer. Using pancreatic cancer as a model, we discuss herein the fundamentals that need to be considered for the high throughput and global identification of mutations. These include patient-related issues, sample collection, DNA isolation, gene selection, primer design, and sequencing techniques. We also describe the possible applications of the discovery of DNA changes to the approach of this disease and cite preliminary efforts where the knowledge has been translated into the clinical or preclinical setting.
Affiliation(s)
- Georgios Voidonikolas
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas, USA
- Stephanie S. Kreml
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas, USA
- Changyi Chen
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas, USA
- William E. Fisher
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas, USA
- The Elkins Pancreas Center, Baylor College of Medicine, Houston, Texas, USA
- F. Charles Brunicardi
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas, USA
- Richard A. Gibbs
- Human Genome Sequencing Center; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Marie-Claude Gingras
- Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, Texas, USA
- Human Genome Sequencing Center; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
1059
Ansorge WJ. Next-generation DNA sequencing techniques. N Biotechnol 2009; 25:195-203. [DOI: 10.1016/j.nbt.2008.12.009] [Citation(s) in RCA: 470] [Impact Index Per Article: 31.3] [Received: 12/21/2008] [Accepted: 12/23/2008] [Indexed: 02/07/2023]
1060
Kaput J, Cotton RGH, Hardman L, Watson M, Al Aqeel AI, Al-Aama JY, Al-Mulla F, Alonso S, Aretz S, Auerbach AD, Bapat B, Bernstein IT, Bhak J, Bleoo SL, Blöcker H, Brenner SE, Burn J, Bustamante M, Calzone R, Cambon-Thomsen A, Cargill M, Carrera P, Cavedon L, Cho YS, Chung YJ, Claustres M, Cutting G, Dalgleish R, den Dunnen JT, Díaz C, Dobrowolski S, dos Santos MRN, Ekong R, Flanagan SB, Flicek P, Furukawa Y, Genuardi M, Ghang H, Golubenko MV, Greenblatt MS, Hamosh A, Hancock JM, Hardison R, Harrison TM, Hoffmann R, Horaitis R, Howard HJ, Barash CI, Izagirre N, Jung J, Kojima T, Laradi S, Lee YS, Lee JY, Gil-da-Silva-Lopes VL, Macrae FA, Maglott D, Marafie MJ, Marsh SGE, Matsubara Y, Messiaen LM, Möslein G, Netea MG, Norton ML, Oefner PJ, Oetting WS, O'Leary JC, de Ramirez AMO, Paalman MH, Parboosingh J, Patrinos GP, Perozzi G, Phillips IR, Povey S, Prasad S, Qi M, Quin DJ, Ramesar RS, Richards CS, Savige J, Scheible DG, Scott RJ, Seminara D, Shephard EA, Sijmons RH, Smith TD, Sobrido MJ, Tanaka T, Tavtigian SV, Taylor GR, Teague J, Töpel T, Ullman-Cullere M, Utsunomiya J, van Kranen HJ, Vihinen M, Webb E, Weber TK, Yeager M, Yeom YI, Yim SH, Yoo HS. Planning the human variome project: the Spain report. Hum Mutat 2009; 30:496-510. [PMID: 19306394 PMCID: PMC5879779 DOI: 10.1002/humu.20972] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 01/28/2023]
Abstract
The remarkable progress in characterizing the human genome sequence, exemplified by the Human Genome Project and the HapMap Consortium, has led to the perception that knowledge and the tools (e.g., microarrays) are sufficient for many if not most biomedical research efforts. A large amount of data from diverse studies proves this perception inaccurate at best, and at worst, an impediment for further efforts to characterize the variation in the human genome. Because variation in genotype and environment are the fundamental basis to understand phenotypic variability and heritability at the population level, identifying the range of human genetic variation is crucial to the development of personalized nutrition and medicine. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) was proposed initially to systematically collect mutations that cause human disease and create a cyber infrastructure to link locus specific databases (LSDB). We report here the discussions and recommendations from the 2008 HVP planning meeting held in San Feliu de Guixols, Spain, in May 2008.
Affiliation(s)
- Jim Kaput
- Division of Personalised Nutrition and Medicine, FDA/National Center for Toxicological Research, Jefferson, Arkansas 72079, USA.
1061
Abstract
DNA sequence data are being produced at an ever-increasing rate. The Bowtie sequence-alignment algorithm uses advanced data structures to help data analysis keep pace with data generation.
Affiliation(s)
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
1062
Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, Beeson KY, Schork NJ, Murray SS, Topol EJ, Levy S, Frazer KA. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol 2009; 10:R32. [PMID: 19327155 PMCID: PMC2691003 DOI: 10.1186/gb-2009-10-3-r32] [Citation(s) in RCA: 399] [Impact Index Per Article: 26.6] [Received: 12/14/2008] [Revised: 02/23/2009] [Accepted: 03/27/2009] [Indexed: 12/03/2022] Open
Abstract
Human sequence generated from three next-generation sequencing platforms reveals systematic variability in sequence coverage due to local sequence characteristics. Background: Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals. Results: Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur. Conclusions: Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.
Affiliation(s)
- Olivier Harismendy
- Scripps Genomic Medicine, Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA.
1063
Chen FC, Chen YZ, Chuang TJ. CNVVdb: a database of copy number variations across vertebrate genomes. Bioinformatics 2009; 25:1419-21. [PMID: 19321736 PMCID: PMC2682513 DOI: 10.1093/bioinformatics/btp166] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022] Open
Abstract
SUMMARY CNVVdb is a web interface for identification of putative copy number variations (CNVs) among 16 vertebrate species using same-species self-alignments and cross-species pairwise alignments. When genomic coordinates in the target species are queried, all the potential paralogous/orthologous regions that overlap ≥80-100% (adjustable) of the query sequences with user-specified sequence identity (≥60% to ≥90%) are returned. Additional information is also given for the genes that are included in the returned regions, including gene description, alternatively spliced transcripts, gene ontology descriptions and other biologically important information. CNVVdb also provides information on pseudogenes and single nucleotide polymorphisms (SNPs) for the CNV-related genomic regions. Moreover, multiple sequence alignments of shared CNVs across species are also provided. With the combination of CNV, SNP, pseudogene and functional information, CNVVdb can be very useful for comparative and functional studies in vertebrates. AVAILABILITY CNVVdb is freely accessible at (http://CNVVdb.genomics.sinica.edu.tw).
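The overlap and identity thresholds described in the summary can be pictured as a simple filter over alignment hits. The sketch below is illustrative only and is not CNVVdb's implementation: the record fields (`start`, `end`, `identity`), the half-open coordinate convention, and the function names are assumptions made for the example.

```python
def overlap_fraction(query, region):
    """Fraction of the query interval covered by region (half-open coordinates)."""
    q_start, q_end = query
    r_start, r_end = region
    shared = min(q_end, r_end) - max(q_start, r_start)
    return max(shared, 0) / (q_end - q_start)

def filter_hits(query, hits, min_overlap=0.8, min_identity=0.6):
    """Keep hits that cover enough of the query at sufficient sequence identity.

    Each hit is a dict whose 'start'/'end' give the part of the query covered
    by the aligned region (hypothetical field names) and whose 'identity' is a
    fraction between 0 and 1.
    """
    return [
        h for h in hits
        if overlap_fraction(query, (h["start"], h["end"])) >= min_overlap
        and h["identity"] >= min_identity
    ]
```

The defaults mirror the lower ends of the adjustable ranges quoted in the summary (80% overlap, 60% identity); raising either one narrows the returned set.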
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli County 350, Taipei, Taiwan
1064
Deagle BE, Kirkwood R, Jarman SN. Analysis of Australian fur seal diet by pyrosequencing prey DNA in faeces. Mol Ecol 2009; 18:2022-38. [PMID: 19317847 DOI: 10.1111/j.1365-294x.2009.04158.x] [Citation(s) in RCA: 201] [Impact Index Per Article: 13.4] [Indexed: 11/30/2022]
Abstract
DNA-based techniques have proven useful for defining trophic links in a variety of ecosystems and recently developed sequencing technologies provide new opportunities for dietary studies. We investigated the diet of Australian fur seals (Arctocephalus pusillus doriferus) by pyrosequencing prey DNA from faeces collected at three breeding colonies across the seals' range. DNA from 270 faecal samples was amplified with four polymerase chain reaction primer sets and a blocking primer was used to limit amplification of fur seal DNA. Pooled amplicons from each colony were sequenced using the Roche GS-FLX platform, generating >20,000 sequences. Software was developed to sort and group similar sequences. A total of 54 bony fish, 4 cartilaginous fish and 4 cephalopods were identified based on the most taxonomically informative amplicons sequenced (mitochondrial 16S). The prevalence of sequences from redbait (Emmelichthys nitidus) and jack mackerel (Trachurus declivis) confirms the importance of these species in the seals' diet. A third fish species, blue mackerel (Scomber australasicus), may be a more important prey species than previously recognised. There were major differences in the proportions of prey DNA recovered in faeces from different colonies, probably reflecting differences in prey availability. Parallel hard-part analysis identified largely the same main prey species as did the DNA-based technique, but with lower species diversity and no remains from cartilaginous prey. The pyrosequencing approach presented significantly expands the capabilities of DNA-based methods of dietary analysis and is suitable for large-scale diet investigations on a broad range of animals.
Affiliation(s)
- Bruce E Deagle
- Australian Marine Mammal Centre, Australian Antarctic Division, Tasmania, Australia.
1065
Hasin-Brumshtein Y, Lancet D, Olender T. Human olfaction: from genomic variation to phenotypic diversity. Trends Genet 2009; 25:178-84. [PMID: 19303166 DOI: 10.1016/j.tig.2009.02.002] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.9] [Received: 01/07/2009] [Revised: 02/06/2009] [Accepted: 02/06/2009] [Indexed: 12/19/2022]
Abstract
The sense of smell is a complex molecular device, encompassing several hundred olfactory receptor proteins (ORs). These receptors, encoded by the largest human gene superfamily, integrate odorant signals into an accurate 'odor image' in the brain. Widespread phenotypic diversity in human olfaction is, in part, attributable to prevalent genetic variation in OR genes, owing to copy number variation, deletion alleles and deleterious single nucleotide polymorphisms. The development of new genomic tools, including next generation sequencing and CNV assays, provides opportunities to characterize the genetic variations of this system. The advent of large-scale functional screens of expressed ORs, combined with genetic association studies, has the potential to link variations in ORs to human chemosensory phenotypes. This promises to provide a genome-wide view of human olfaction, resulting in a deeper understanding of personalized odor coding, with the potential to decipher flavor and fragrance preferences.
Affiliation(s)
- Yehudit Hasin-Brumshtein
- Department of Molecular Genetics and the Crown Human Genome Center, Weizmann Institute of Science, Rehovot 76100, Israel
1066
Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, Sunyaev SR. Human mutation rate associated with DNA replication timing. Nat Genet 2009; 41:393-5. [PMID: 19287383 DOI: 10.1038/ng.363] [Citation(s) in RCA: 293] [Impact Index Per Article: 19.5] [Received: 11/12/2008] [Accepted: 02/24/2009] [Indexed: 11/09/2022]
Abstract
Eukaryotic DNA replication is highly stratified, with different genomic regions shown to replicate at characteristic times during S phase. Here we observe that mutation rate, as reflected in recent evolutionary divergence and human nucleotide diversity, is markedly increased in later-replicating regions of the human genome. All classes of substitutions are affected, suggesting a generalized mechanism involving replication time-dependent DNA damage. This correlation between mutation rate and regionally stratified replication timing may have substantial evolutionary implications.
1067
Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009; 6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022] Open
Abstract
Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it places special emphasis on novel annotation fields, such as that of phenotypes, and highlights recent efforts focused on integrating annotations.
Affiliation(s)
- Gabrielle A Reeves
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
1068
Cancer gene discovery in mouse and man. Biochim Biophys Acta Rev Cancer 2009; 1796:140-61. [PMID: 19285540 PMCID: PMC2756404 DOI: 10.1016/j.bbcan.2009.03.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/02/2009] [Revised: 03/03/2009] [Accepted: 03/05/2009] [Indexed: 12/31/2022]
Abstract
The elucidation of the human and mouse genome sequence and developments in high-throughput genome analysis, and in computational tools, have made it possible to profile entire cancer genomes. In parallel with these advances mouse models of cancer have evolved into a powerful tool for cancer gene discovery. Here we discuss the approaches that may be used for cancer gene identification in both human and mouse and discuss how a cross-species 'oncogenomics' approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumourigenesis.
1069
Bailey JA, Kidd JM, Eichler EE. Human copy number polymorphic genes. Cytogenet Genome Res 2009; 123:234-43. [PMID: 19287160 DOI: 10.1159/000184713] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Accepted: 07/20/2008] [Indexed: 11/19/2022] Open
Abstract
Recent large-scale genomic studies within human populations have identified numerous genomic regions as copy number variant (CNV). As these CNV regions often overlap coding regions of the genome, large lists of potentially copy number polymorphic genes have been produced that are candidates for disease association. Most of the current data regarding normal genic variation, however, has been generated using BAC or SNP microarrays, which lack precision especially with respect to exons. To address this, we assessed 2,790 candidate CNV genes defined from available studies in nine well-characterized HapMap individuals by designing a customized oligonucleotide microarray targeted specifically to exons. Using exon array comparative genomic hybridization (aCGH), we detected 255 (9%) of the candidates as true CNVs including 134 with evidence of variation over the entire gene. Individuals differed in copy number from the control by an average of 100 gene loci. Both partial- and whole-gene CNVs were strongly associated with segmental duplications (55 and 71%, respectively) as well as regions of positive selection. We confirmed 37% of the whole-gene CNVs using the fosmid end sequence pair (ESP) structural variation map for these same individuals. If we modify the end sequence pair mapping strategy to include low-sequence identity ESPs (98-99.5%) and ESPs with an everted orientation, we can capture 82% of the missed genes leading to more complete ascertainment of structural variation within duplicated genes. Our results indicate that segmental duplications are the source of the majority of full-length copy number polymorphic genes, most of the variant genes are organized as tandem duplications, and a significant fraction of these genes will represent paralogs with levels of sequence diversity beyond thresholds of allelic variation. 
In addition, these data provide a targeted set of CNV genes enriched for regions likely to be associated with human phenotypic differences due to copy number changes and present a source of copy number responsive oligonucleotide probes for future association studies.
Affiliation(s)
- J A Bailey
- Department of Pathology, Case Western University School of Medicine and University Hospitals of Cleveland, Cleveland, OH, USA.
1070
Xie C, Tammi MT. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009; 10:80. [PMID: 19267900 PMCID: PMC2667514 DOI: 10.1186/1471-2105-10-80] [Citation(s) in RCA: 395] [Impact Index Per Article: 26.3] [Received: 08/21/2008] [Accepted: 03/06/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. RESULTS Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amounts of short reads. CONCLUSION Simulations of various sequencing methods with coverage between 0.1x and 8x show overall specificity of 91.7-99.9% and sensitivity of 72.2-96.5%. We also show the results for assessment of CNV between two individual human genomes.
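The read-count idea in this abstract can be illustrated with a minimal sketch. It assumes reads from a test and a reference genome have already been mapped to start positions on one chromosome, counts them in fixed windows, and reports a log2 ratio per window. The function name `window_log2_ratios`, the fixed-window scheme, and the library-size scaling are illustrative assumptions; this is not the statistical model of CNV-seq itself, and it computes no confidence values.

```python
import math
from collections import Counter


def window_log2_ratios(test_pos, ref_pos, window):
    """Return {window_start: log2(test/reference)} for fixed-size windows.

    test_pos, ref_pos: mapped read start positions (hypothetical input format).
    window: window size in bp (an assumed, user-chosen value).
    """
    # Count reads per window, keyed by the window's start coordinate.
    test_counts = Counter((p // window) * window for p in test_pos)
    ref_counts = Counter((p // window) * window for p in ref_pos)
    # Scale test counts so both libraries have the same effective size.
    scale = len(ref_pos) / len(test_pos)
    ratios = {}
    # Only windows covered in both samples give a defined ratio.
    for start in sorted(set(test_counts) & set(ref_counts)):
        ratios[start] = math.log2(test_counts[start] * scale / ref_counts[start])
    return ratios


if __name__ == "__main__":
    test = [10, 20, 30, 40, 110, 120]
    ref = [15, 25, 105, 115, 125, 135]
    print(window_log2_ratios(test, ref, window=100))
```

Because each window's ratio depends only on how many reads fall inside it, adding more reads tightens the estimate while longer reads do not, which is consistent with the abstract's point that read number rather than read length drives resolution.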
Affiliation(s)
- Chao Xie
- Department of Biological Sciences, National University of Singapore, Singapore.
|
1071
|
Petrosino JF, Highlander S, Luna RA, Gibbs RA, Versalovic J. Metagenomic pyrosequencing and microbial identification. Clin Chem 2009; 55:856-66. [PMID: 19264858 DOI: 10.1373/clinchem.2008.107565] [Citation(s) in RCA: 379] [Impact Index Per Article: 25.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
BACKGROUND The Human Microbiome Project has ushered in a new era for human metagenomics and high-throughput next-generation sequencing strategies. CONTENT This review describes evolving strategies in metagenomics, with a special emphasis on the core technology of DNA pyrosequencing. The challenges of microbial identification in the context of microbial populations are discussed. The development of next-generation pyrosequencing strategies and the technical hurdles confronting these methodologies are addressed. Bioinformatics-related topics include taxonomic systems, sequence databases, sequence-alignment tools, and classifiers. DNA sequencing based on 16S rRNA genes or entire genomes is summarized with respect to potential pyrosequencing applications. SUMMARY Both the approach of 16S rDNA amplicon sequencing and the whole-genome sequencing approach may be useful for human metagenomics, and numerous bioinformatics tools are being deployed to tackle such vast amounts of microbiological sequence diversity. Metagenomics, or genetic studies of microbial communities, may ultimately contribute to a more comprehensive understanding of human health, disease susceptibilities, and the pathophysiology of infectious and immune-mediated diseases.
Affiliation(s)
- Joseph F Petrosino
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
|
1072
|
Li G, Ma L, Song C, Yang Z, Wang X, Huang H, Li Y, Li R, Zhang X, Yang H, Wang J, Wang J. The YH database: the first Asian diploid genome database. Nucleic Acids Res 2009; 37:D1025-8. [PMID: 19073702 PMCID: PMC2686535 DOI: 10.1093/nar/gkn966] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, we illustrate here the genome sequences, SNPs, and sequencing reads in the MapView. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for the purpose of aligning query sequences against the YH genome consensus. The YH database is currently one of the three personal genome databases, organizing the original data and analysis results in a user-friendly interface, which is an endeavor to achieve fundamental goals for establishing personal medicine. The database is available at http://yh.genomics.org.cn.
Affiliation(s)
- Guoqing Li
- Beijing Genomics Institute at Shenzhen 518083, China
|
1073
|
Hirschfield GM, Siminovitch KA. Metagenomics and autoimmune liver disease: searching for the unknown. Liver Int 2009; 29:319-20. [PMID: 19267861 DOI: 10.1111/j.1478-3231.2009.01974.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/13/2023]
|
1074
|
Effects of ploidy and recombination on evolution of robustness in a model of the segment polarity network. PLoS Comput Biol 2009; 5:e1000296. [PMID: 19247428 PMCID: PMC2637435 DOI: 10.1371/journal.pcbi.1000296] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2008] [Accepted: 01/20/2009] [Indexed: 11/19/2022] Open
Abstract
Many genetic networks are astonishingly robust to quantitative variation, allowing these networks to continue functioning in the face of mutation and environmental perturbation. However, the evolution of such robustness remains poorly understood for real genetic networks. Here we explore whether and how ploidy and recombination affect the evolution of robustness in a detailed computational model of the segment polarity network. We introduce a novel computational method that predicts the quantitative values of biochemical parameters from bit sequences representing genotype, allowing our model to bridge genotype to phenotype. Using this, we simulate 2,000 generations of evolution in a population of individuals under stabilizing and truncation selection, selecting for individuals that could sharpen the initial pattern of engrailed and wingless expression. Robustness was measured by simulating a mutation in the network and measuring the effect on the engrailed and wingless patterns; higher robustness corresponded to insensitivity of this pattern to perturbation. We compared robustness in diploid and haploid populations, with either asexual or sexual reproduction. In all cases, robustness increased, and the greatest increase was in diploid sexual populations; diploidy and sex synergized to evolve greater robustness than either acting alone. Diploidy conferred increased robustness by allowing most deleterious mutations to be rescued by a working allele. Sex (recombination) conferred a robustness advantage through "survival of the compatible": those alleles that can work with a wide variety of genetically diverse partners persist, and this selects for robust alleles.
|
1075
|
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res 2009; 19:1117-23. [PMID: 19251739 DOI: 10.1101/gr.089532.108] [Citation(s) in RCA: 2418] [Impact Index Per Article: 161.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
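The N50 statistic quoted in this abstract can be made concrete with a short sketch. It uses the common definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` is an illustrative choice and is not taken from ABySS.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one contig")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

For the toy lengths above, the total is 20 bp; the two longest contigs (6 and 5) sum to 11 bp, which passes the halfway mark, so the N50 is 5.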
Affiliation(s)
- Jared T Simpson
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada
|
1076
|
Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: from basic research to diagnostics. Clin Chem 2009; 55:641-58. [PMID: 19246620 DOI: 10.1373/clinchem.2008.112789] [Citation(s) in RCA: 433] [Impact Index Per Article: 28.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
BACKGROUND For the past 30 years, the Sanger method has been the dominant approach and gold standard for DNA sequencing. The commercial launch of the first massively parallel pyrosequencing platform in 2005 ushered in the new era of high-throughput genomic analysis now referred to as next-generation sequencing (NGS). CONTENT This review describes fundamental principles of commercially available NGS platforms. Although the platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm in that sequencing of spatially separated, clonally amplified DNA templates or single DNA molecules is performed in a flow cell in a massively parallel manner. Through iterative cycles of polymerase-mediated nucleotide extensions or, in one approach, through successive oligonucleotide ligations, sequence outputs in the range of hundreds of megabases to gigabases are now obtained routinely. Highlighted in this review are the impact of NGS on basic research, bioinformatics considerations, and translation of this technology into clinical diagnostics. Also presented is a view into future technologies, including real-time single-molecule DNA sequencing and nanopore-based sequencing. SUMMARY In the relatively short time frame since 2005, NGS has fundamentally altered genomics research and allowed investigators to conduct experiments that were previously not technically feasible or affordable. The various technologies that constitute this new paradigm continue to evolve, and further improvements in technology robustness and process streamlining will pave the path for translation into clinical diagnostics.
Affiliation(s)
- Karl V Voelkerding
- ARUP Institute for Experimental and Clinical Pathology, Salt Lake City, Utah 84108, USA.
|
1077
|
Korbel JO, Abyzov A, Mu XJ, Carriero N, Cayting P, Zhang Z, Snyder M, Gerstein MB. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol 2009; 10:R23. [PMID: 19236709 PMCID: PMC2688268 DOI: 10.1186/gb-2009-10-2-r23] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2008] [Revised: 12/22/2008] [Accepted: 02/23/2009] [Indexed: 11/10/2022] Open
Abstract
Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.
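The general principle behind paired-end mapping can be sketched as follows. The example assumes a library whose mapped pair spans have a known mean and standard deviation and whose concordant pairs map to opposite strands; pairs that deviate are flagged as candidate structural variants. The function `classify_pair`, its parameters, and the single fixed cutoff of three standard deviations are hypothetical simplifications. This is not PEMer's coverage-adjusted multi-cutoff scoring strategy, and it includes no error model or clustering of supporting pairs.

```python
def classify_pair(span, same_strand, mean_span, sd_span, n_sd=3.0):
    """Label one mapped read pair by comparing it with the expected library span.

    span: distance between the two mapped ends on the reference (bp).
    same_strand: True if both ends map to the same strand (assumed abnormal).
    mean_span, sd_span: expected span distribution of the library.
    n_sd: cutoff in standard deviations (an assumed, illustrative value).
    """
    if same_strand:
        # Orientation change suggests an inversion breakpoint between the ends.
        return "inversion_candidate"
    if span > mean_span + n_sd * sd_span:
        # Ends map farther apart than expected: sequence missing in the sample.
        return "deletion_candidate"
    if span < mean_span - n_sd * sd_span:
        # Ends map closer than expected: extra sequence in the sample.
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    for span, same in [(3100, False), (5000, False), (1000, False), (3000, True)]:
        print(span, same, classify_pair(span, same, mean_span=3000, sd_span=300))
```

In practice a single discordant pair is weak evidence; tools in this area typically require several independent pairs supporting the same event before reporting it.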
Affiliation(s)
- Jan O Korbel
- Gene Expression Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstr, Heidelberg, 69117, Germany.
|
1078
|
Kumagai H, Utsunomiya S, Nakamura S, Yamamoto R, Harada A, Kaji T, Hazama M, Ohashi T, Inami A, Ikegami T, Miyamoto K, Endo N, Yoshimi K, Toyoda A, Hattori M, Sakaki Y. Large-scale microfabricated channel plates for high-throughput, fully automated DNA sequencing. Electrophoresis 2009; 29:4723-32. [PMID: 19016243 DOI: 10.1002/elps.200800301] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
We have described a new DNA sequencing platform based on the Sanger chemistry, in which the large-scale microfabricated channel plates and electrophoretic system result in higher-throughput DNA sequencing. Three hundred and eighty-four channels are arranged in a fan-like shape on a 25 × 47 cm glass plate, on which 384 oval sample holes are connected to each channel coupled to the opposite anode access holes. Two microfabricated plates are set on the sequencing apparatus, in which sequencing electrophoresis is conducted on one plate and the preparation process is on another plate. Each sample hole is loaded with a 2.3 µL volume of sample, which is injected into the separation channels electrokinetically. High-quality sequencing data were acquired using the pUC18 template, achieving an average read-length of 1001 bases with 99% accuracy and a throughput of 5 Mbases per day per instrument. To assess the performance in an actual sequencing setting, the bacterial artificial chromosome shotgun library of the Pseudorca crassidens genome was sequenced, using 1/80 of the quantity of Sanger reagent (0.1 µL). We believe that this is the first demonstration of the useful performance of DNA sequencing using monolithic microfabricated devices with walk-away operation.
|
1079
|
A burst of segmental duplications in the genome of the African great ape ancestor. Nature 2009; 457:877-81. [PMID: 19212409 DOI: 10.1038/nature07744] [Citation(s) in RCA: 169] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2008] [Accepted: 12/18/2008] [Indexed: 02/02/2023]
Abstract
It is generally accepted that the extent of phenotypic change between human and great apes is dissonant with the rate of molecular change. Between these two groups, proteins are virtually identical, cytogenetically there are few rearrangements that distinguish ape-human chromosomes, and rates of single-base-pair change and retrotransposon activity have slowed particularly within hominid lineages when compared to rodents or monkeys. Studies of gene family evolution indicate that gene loss and gain are enriched within the primate lineage. Here, we perform a systematic analysis of duplication content of four primate genomes (macaque, orang-utan, chimpanzee and human) in an effort to understand the pattern and rates of genomic duplication during hominid evolution. We find that the ancestral branch leading to human and African great apes shows the most significant increase in duplication activity both in terms of base pairs and in terms of events. This duplication acceleration within the ancestral species is significant when compared to lineage-specific rate estimates even after accounting for copy-number polymorphism and homoplasy. We discover striking examples of recurrent and independent gene-containing duplications within the gorilla and chimpanzee that are absent in the human lineage. Our results suggest that the evolutionary properties of copy-number mutation differ significantly from other forms of genetic mutation and, in contrast to the hominid slowdown of single-base-pair mutations, there has been a genomic burst of duplication activity at this period during human evolution.
|
1080
|
Rodríguez JE, McCudden CR, Willis MS. Familial hypertrophic cardiomyopathy: basic concepts and future molecular diagnostics. Clin Biochem 2009; 42:755-65. [PMID: 19318019 DOI: 10.1016/j.clinbiochem.2009.01.020] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2008] [Revised: 01/24/2009] [Accepted: 01/28/2009] [Indexed: 11/26/2022]
Abstract
Familial hypertrophic cardiomyopathies (FHC) are the most common genetic heart diseases in the United States, affecting nearly 1 in 500 people. Manifesting as increased cardiac wall thickness, this autosomal dominant disease goes mainly unnoticed as most affected individuals are asymptomatic. Up to 1-2% of children and adolescents and 0.5-1% of adults with FHC die of sudden cardiac death, making it critical to quickly and accurately diagnose FHC to institute therapy and potentially reduce mortality. However, due to the heterogeneity of the genetic defects in mainly sarcomere proteins, this is a daunting task even with current diagnostic methods. Exciting new methods that use high-throughput microarray technology to identify FHC mutations, known as array-based resequencing, have recently been described. Additionally, next generation sequencing methodologies may aid in improving FHC diagnosis. In this review, we discuss FHC pathophysiology, the rationale for testing, and the limitations and advantages of current and future diagnostics.
Affiliation(s)
- Jessica E Rodríguez
- Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599-7525, USA
|
1081
|
Rokas A, Abbot P. Harnessing genomics for evolutionary insights. Trends Ecol Evol 2009; 24:192-200. [PMID: 19201503 DOI: 10.1016/j.tree.2008.11.004] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2008] [Revised: 11/07/2008] [Accepted: 11/10/2008] [Indexed: 11/25/2022]
Abstract
Next-generation DNA sequencing technologies can generate unprecedented amounts of genomic data, even for non-model organisms. Here we describe how these new technologies have facilitated recent key advances in ecology and evolutionary biology, and highlight several outstanding ecological and evolutionary questions that are distinctly suited to the innovations they provide. Importantly, using these technologies to their full potential requires careful experimental design and critical consideration of several caveats associated with them. Although several significant challenges remain to be resolved before the integration of next-generation sequencing technologies into single-investigator research programs, we argue that they will soon transform ecology and evolution by fundamentally changing the ranges and types of questions that can be addressed.
Affiliation(s)
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, VU Station B 35-1634, Nashville, TN 37235, USA.
|
1082
|
Buchanan JA, Carson AR, Chitayat D, Malkin D, Meyn MS, Ray PN, Shuman C, Weksberg R, Scherer SW. The cycle of genome-directed medicine. Genome Med 2009; 1:16. [PMID: 19341487 PMCID: PMC2664949 DOI: 10.1186/gm16] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
The genome era in medicine is upon us. Questions that arise from patient and family care are a watershed for research and technology, which in turn fuel the cycle of opportunity for impact through delivery of health services, which feeds back to families. Medical infrastructure needs to adapt to the dramatic pace of technology development in the wake of the Human Genome Project, in order for genome data to be delivered as information and applied as knowledge to benefit health.
Affiliation(s)
- Janet A Buchanan
- The Centre for Applied Genomics, The Hospital for Sick Children, 555 University Avenue, Toronto, ON M5G 1X8, Canada
|
1083
|
Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 2009; 27:182-9. [PMID: 19182786 PMCID: PMC2663421 DOI: 10.1038/nbt.1523] [Citation(s) in RCA: 1020] [Impact Index Per Article: 68.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2008] [Accepted: 01/05/2009] [Indexed: 11/29/2022]
Abstract
Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA “baits” to “fish” targets out of a “pond” of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that ~60% of target bases in the exonic “catch”, and ~80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.
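The uniformity measure reported in this abstract, the share of target bases with at least half the mean coverage, is simple to compute once per-base depths are available. The sketch below assumes a plain list of per-base read depths over the target; the function name `fraction_at_half_mean` and the input format are illustrative and are not taken from the paper's pipeline.

```python
def fraction_at_half_mean(depths):
    """Return the fraction of bases whose depth is >= half the mean depth.

    depths: per-base coverage values over the targeted region (assumed input).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    threshold = mean_depth / 2
    # Count bases that meet the half-mean threshold.
    covered = sum(1 for d in depths if d >= threshold)
    return covered / len(depths)


if __name__ == "__main__":
    print(fraction_at_half_mean([10, 10, 10, 2]))
```

With the toy depths above, the mean is 8 and the threshold is 4, so three of four bases qualify. A value near 1.0 indicates even capture, while a low value means a few regions absorb most of the reads.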
Affiliation(s)
- Andreas Gnirke
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.
|
1084
|
Abstract
Hypertension represents a global public health burden. In addition to the rarer Mendelian forms of hypertension, classic genetic studies have documented a significant heritable component to the most common form, essential hypertension (EH). Extensive efforts are under way to elucidate the genetic basis of this disease. Recently, a new form of Mendelian hypertension has been identified, pharmacogenetic association studies in hypertensive patients have identified novel gene-by-drug interactions, and the first genome-wide association studies of EH have been published. New findings in consomic and congenic rat models also offer new clues to the genetic architecture of this complex phenotype. In this review, the authors summarize and evaluate the most recent findings related to hypertension gene identification.
|
1085
|
Stenson PD, Mort M, Ball EV, Howells K, Phillips AD, Thomas NST, Cooper DN. The Human Gene Mutation Database: 2008 update. Genome Med 2009; 1:13. [PMID: 19348700 PMCID: PMC2651586 DOI: 10.1186/gm13] [Citation(s) in RCA: 638] [Impact Index Per Article: 42.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
The Human Gene Mutation Database (HGMD®) is a comprehensive core collection of germline mutations in nuclear genes that underlie or are associated with human inherited disease. Here, we summarize the history of the database and its current resources. By December 2008, the database contained over 85,000 different lesions detected in 3,253 different genes, with new entries currently accumulating at a rate exceeding 9,000 per annum. Although originally established for the scientific study of mutational mechanisms in human genes, HGMD has since acquired a much broader utility for researchers, physicians, clinicians and genetic counselors as well as for companies specializing in biopharmaceuticals, bioinformatics and personalized genomics. HGMD was first made publicly available in April 1996, and a collaboration was initiated in 2006 between HGMD and BIOBASE GmbH. This cooperative agreement covers the exclusive worldwide marketing of the most up-to-date (subscription) version of HGMD, HGMD Professional, to academic, clinical and commercial users.
Affiliation(s)
- Peter D Stenson
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Edward V Ball
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Katy Howells
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Andrew D Phillips
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Nick ST Thomas
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
|
1086
|
McCabe-Sellers B, Lovera D, Nuss H, Wise C, Ning B, Teitel C, Clark BS, Toennessen T, Green B, Bogle ML, Kaput J. Personalizing nutrigenomics research through community based participatory research and omics technologies. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2009; 12:263-72. [PMID: 19040372 DOI: 10.1089/omi.2008.0041] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Personal and public health information are often obtained from studies of large population groups. Risk factors for nutrients, toxins, genetic variation, and more recently, nutrient-gene interactions are statistical estimates of the percentage reduction in disease in the population if the risk were to be avoided or the gene variant were not present. Because individuals differ in genetic makeup, lifestyle, and dietary patterns from those in the study population, these risk factors are valuable guidelines, but may not apply to individuals. Intervention studies are likewise limited by small sample sizes, short time frames to assess physiological changes, and variable experimental designs that often preclude comparative or consensus analyses. A fundamental challenge for nutrigenomics will be to develop a means to sort individuals into metabolic groups, and eventually, develop risk factors for individuals. To reach the goal of personalizing medicine and nutrition, new experimental strategies are needed for human study designs. A promising approach for more complete analyses of the interaction of genetic makeups and environment relies on community-based participatory research (CBPR) methodologies. CBPR's central focus is developing a partnership among researchers and individuals in a community that allows not only for more in-depth lifestyle analyses but also for translational research that simultaneously helps improve the health of individuals and communities. The USDA-ARS Delta Nutrition Intervention Research program exemplifies CBPR, providing a foundation for expanded personalized nutrition and medicine research for communities and individuals.
|
1087
|
Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genet 2009; 5:e1000336. [PMID: 19148272 PMCID: PMC2613029 DOI: 10.1371/journal.pgen.1000336] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2008] [Accepted: 12/11/2008] [Indexed: 01/28/2023] Open
Abstract
Much effort and interest have focused on assessing the importance of natural selection, particularly positive natural selection, in shaping the human genome. Although scans for positive selection have identified candidate loci that may be associated with positive selection in humans, such scans do not indicate whether adaptation is frequent in general in humans. Studies based on the reasoning of the McDonald-Kreitman test, which, in principle, can be used to evaluate the extent of positive selection, suggested that adaptation is detectable in the human genome but that it is less common than in Drosophila or Escherichia coli. Both positive and purifying natural selection at functional sites should affect levels and patterns of polymorphism at linked nonfunctional sites. Here, we search for these effects by analyzing patterns of neutral polymorphism in humans in relation to the rates of recombination, functional density, and functional divergence with chimpanzees. We find that the levels of neutral polymorphism are lower in the regions of lower recombination and in the regions of higher functional density or divergence. These correlations persist after controlling for the variation in GC content, density of simple repeats, selective constraint, mutation rate, and depth of sequencing coverage. We argue that these results are most plausibly explained by the effects of natural selection at functional sites -- either recurrent selective sweeps or background selection -- on the levels of linked neutral polymorphism. Natural selection at both coding and regulatory sites appears to affect linked neutral polymorphism, reducing neutral polymorphism by 6% genome-wide and by 11% in the gene-rich half of the human genome. These findings suggest that the effects of natural selection at linked sites cannot be ignored in the study of neutral human polymorphism.
|
1088
|
Abstract
Two developments have sparked new directions in the genetics-to-genomics transition for research and medical applications: the advance of whole-genome assays by array or DNA sequencing technologies, and the discovery among human genomes of extensive submicroscopic genomic structural variation, including copy number variation. For health care to benefit from interpretation of genomic data, we need to know how these variants contribute to the phenotype of the individual. Research is revealing the spectrum, both in size and complexity, of structural genotypic variation, and its association with a broad range of human phenotypes. Genomic disorders associated with relatively large, recurrent contiguous variants have been recognized for some time, as have certain Mendelian traits associated with functional disruption of single genes by structural variation. More recent examples from phenotype- and genotype-driven studies demonstrate a greater level of complexity, with evidence of incremental dosage effects, gene interaction networks, buffering and modifiers, and position effects. Mechanisms underlying such variation are emerging to provide a handle on the bulk of human variation, which is associated with complex traits and adaptive potential. Interpreting genotypes for personalized health care and communicating knowledge to the individual will be significant challenges for genomics professionals.
|
1089
|
Ryu GM, Song P, Kim KW, Oh KS, Park KJ, Kim JH. Genome-wide analysis to predict protein sequence variations that change phosphorylation sites or their corresponding kinases. Nucleic Acids Res 2009; 37:1297-307. [PMID: 19139070 PMCID: PMC2651802 DOI: 10.1093/nar/gkn1008] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
We define phosphovariants as genetic variations that change phosphorylation sites or their interacting kinases. Considering the essential role of phosphorylation in protein functions, it is highly likely that phosphovariants change protein functions. Therefore, a comparison of phosphovariants between individuals or between species can give clues about phenotypic differences. We categorized phosphovariants into three subtypes and developed a system that predicts them. Our method can be used to screen important polymorphisms and help to identify the mechanisms of genetic diseases.
Affiliation(s)
- Gil-Mi Ryu
- Center for Genome Science, 5 Nokbun-Dong, Eunpyung-Ku, Seoul, 122-701, Korea
|
1090
|
Abstract
The advent of next-generation sequencing technologies has spurred remarkable progress in the field of genomics. Whereas traditional Sanger sequencing yielded the first complete human genome sequence, next-generation methods have been able to resequence several human genomes. In this manner, next-generation approaches have powerful capabilities for understanding human variation. The throughput for these approaches is often measured in billions of base pairs per run, astounding numbers when compared with the millions of base pairs per day generated by automated capillary DNA sequencers. However, these methods have lower accuracy and shorter read lengths than the traditional Sanger dideoxy gold standard. Are these limitations offset by the higher throughputs? An in-depth look at the single read and composite accuracy of these methods is presented. The stringent requirements for single nucleotide polymorphism (SNP) discovery utilizing these approaches are discussed, along with a review of studies that have successfully employed next-generation sequencing methods for large-scale SNP discovery. Ultimately, the application of these ultra-high-throughput sequencing methods for SNP discovery will open up new horizons for understanding human genomic variation.
|
1091
|
|
1092
|
Inter-individual variation in expression: a missing link in biomarker biology? Trends Biotechnol 2009; 27:5-10. [DOI: 10.1016/j.tibtech.2008.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/16/2008] [Revised: 09/25/2008] [Accepted: 10/01/2008] [Indexed: 11/22/2022]
|
1093
|
|
1094
|
Abstract
Copy number variation (CNV) is a source of genetic diversity in humans. Numerous CNVs are being identified with various genome analysis platforms, including array comparative genomic hybridization (aCGH), single nucleotide polymorphism (SNP) genotyping platforms, and next-generation sequencing. CNV formation occurs by both recombination-based and replication-based mechanisms and de novo locus-specific mutation rates appear much higher for CNVs than for SNPs. By various molecular mechanisms, including gene dosage, gene disruption, gene fusion, position effects, etc., CNVs can cause Mendelian or sporadic traits, or be associated with complex diseases. However, CNV can also represent benign polymorphic variants. CNVs, especially gene duplication and exon shuffling, can be a predominant mechanism driving gene and genome evolution.
Affiliation(s)
- Feng Zhang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|
1095
|
Abstract
Large interindividual variation is observed in both the response and toxicity associated with anticancer therapy. The etiology of this variation is multifactorial, but is due in part to host genetic variations. Pharmacogenetic and pharmacogenomic studies have successfully identified genetic variants that contribute to this variation in susceptibility to chemotherapy. This review provides an overview of the progress made in the field of pharmacogenetics and pharmacogenomics using a five-stage architecture, which includes 1) determining the role of genetics in drug response; 2) screening and identifying genetic markers; 3) validating genetic markers; 4) clinical utility assessment; and 5) pharmacoeconomic impact. Examples are provided to illustrate the identification, validation, utility, and challenges of these pharmacogenetic and pharmacogenomic markers, with the focus on the current application of this knowledge in cancer therapy. With the advance of technology, it becomes feasible to evaluate the human genome in a relatively inexpensive and efficient manner; however, extensive pharmacogenetic research and education are urgently needed to improve the translation of pharmacogenetic concepts from bench to bedside.
Affiliation(s)
- R Stephanie Huang
- Section of Hematology and Oncology, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
|
1096
|
Abstract
Genotype imputation is now an essential tool in the analysis of genome-wide association scans. This technique allows geneticists to accurately evaluate the evidence for association at genetic markers that are not directly genotyped. Genotype imputation is particularly useful for combining results across studies that rely on different genotyping platforms but also increases the power of individual scans. Here, we review the history and theoretical underpinnings of the technique. To illustrate performance of the approach, we summarize results from several gene mapping studies. Finally, we preview the role of genotype imputation in an era when whole genome resequencing is becoming increasingly common.
Affiliation(s)
- Yun Li
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor
- Cristen Willer
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor
- Serena Sanna
- Istituto di Neurogenetica e Neurofarmacologia, Consiglio Nazionale delle Ricerche, Cagliari, Italy
- Gonçalo Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor
|
1097
|
Bhak J, Ghang H, Reja R, Kim SS. Personal Genomics, Bioinformatics, and Variomics. Genomics Inform 2008. [DOI: 10.5808/gi.2008.6.4.161] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
|
1098
|
Abstract
It could be argued that the greatest transformative aspect of the Human Genome Project has been not the sequencing of the genome itself, but the resultant development of new technologies. A host of new approaches has fundamentally changed the way we approach problems in basic and translational research. Now, a new generation of high-throughput sequencing technologies promises to again transform the scientific enterprise, potentially supplanting array-based technologies and opening up many new possibilities. By allowing DNA/RNA to be assayed more rapidly than previously possible, these next-generation platforms promise a deeper understanding of genome regulation and biology. Significantly enhancing sequencing throughput will allow us to follow the evolution of viral and bacterial resistance in real time, to uncover the huge diversity of novel genes that are currently inaccessible, to understand nucleic acid therapeutics, to better integrate biological information for a complete picture of health and disease at a personalized level and to move to advances that we cannot yet imagine.
|
1099
|
Abstract
The 454 Sequencer has dramatically increased the volume of sequencing conducted by the scientific community and expanded the range of problems that can be addressed by the direct readouts of DNA sequence. Key breakthroughs in the development of the 454 sequencing platform included higher throughput, simplified all in vitro sample preparation and the miniaturization of sequencing chemistries, enabling massively parallel sequencing reactions to be carried out at a scale and cost not previously possible. Together with other recently released next-generation technologies, the 454 platform has started to democratize sequencing, providing individual laboratories with access to capacities that rival those previously found only at a handful of large sequencing centers. Over the past 18 months, 454 sequencing has led to a better understanding of the structure of the human genome, allowed the first non-Sanger sequence of an individual human and opened up new approaches to identify small RNAs. To make next-generation technologies more widely accessible, they must become easier to use and less costly. In the longer term, the principles established by 454 sequencing might reduce cost further, potentially enabling personalized genomics.
|
1100
|
Abstract
DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.
Affiliation(s)
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, USA.
|