201
Lundin S, Gruselius J, Nystedt B, Lexow P, Käller M, Lundeberg J. Hierarchical molecular tagging to resolve long continuous sequences by massively parallel sequencing. Sci Rep 2013; 3:1186. [PMID: 23470464 PMCID: PMC3592332 DOI: 10.1038/srep01186] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/04/2012] [Accepted: 01/09/2013] [Indexed: 01/20/2023] Open
Abstract
Here we demonstrate the use of short-read massive sequencing systems to achieve, in effect, longer read lengths through hierarchical molecular tagging. We show how indexed and PCR-amplified targeted libraries are degraded, sub-sampled and arrested at timed intervals to achieve pools of differing average length, each of which is indexed with a new tag. By this process, indices of sample origin, molecular origin, and degree of degradation are incorporated to achieve a nested hierarchical structure, which is later used in the data processing to order the reads over a longer distance than the sequencing system originally allows. With this protocol we show how continuous regions beyond 3000 bp can be decoded by an Illumina sequencing system, and we illustrate the potential applications by calling variants of the lambda genome, analysing TP53 in cancer cell lines, and targeting a variable canine mitochondrial region.
Affiliation(s)
- Sverker Lundin
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- Joel Gruselius
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- Björn Nystedt
- Science for Life Laboratory, Stockholm University, Department of Biochemistry and Biophysics, Stockholm, 106 91, Sweden
- Max Käller
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- Joakim Lundeberg
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
202
Bafna V, Kozanitis C, Deutsch A, Ohno-Machado L, Heiberg A, Varghese G. Abstractions for Genomics. Communications of the ACM 2013; 56:83-93. [PMID: 25284821 PMCID: PMC4183138 DOI: 10.1145/2398356.2398376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Large genomic databases with interactive access require new, layered abstractions, including separating "evidence" from "inference."
203
Ha G, Shah S. Distinguishing somatic and germline copy number events in cancer patient DNA hybridized to whole-genome SNP genotyping arrays. Methods Mol Biol 2013; 973:355-372. [PMID: 23412801 DOI: 10.1007/978-1-62703-281-0_22] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
Chromosomal aneuploidy and segmental copy number changes are common genomic aberrations in cancer. Copy number alterations (CNAs) arise from deletions, insertions, or duplications resulting in chromosomal aberrations and aneuploidy. Genomes of normal cells also exhibit variable copy number called germline copy number variants (CNVs). CNVs in the general population tend to confound interpretation of predictions when attempting to extract relevant driver somatic events in cancer. In large studies of CNAs in cancer patients, it becomes necessary to accurately identify and separate CNAs and CNVs so as to prioritize candidate tumor suppressors and oncogenes. We have developed a probabilistic approach, HMM-Dosage, for segmenting and distinguishing CNAs and CNVs as separate, discrete events in cancer SNP genotyping array data. We outline the steps and computer code for the analysis of whole-genome cancer DNA hybridized to SNP genotyping arrays, focusing on distinguishing somatic CNA and germline CNVs, and describe the combined approach of HMM-Dosage for probabilistic inference and classification of somatic and germline copy number changes.
Affiliation(s)
- Gavin Ha
- Molecular Oncology, BC Cancer Agency, Vancouver, BC, Canada.
204
'Omics' approaches to understanding interstitial cystitis/painful bladder syndrome/bladder pain syndrome. Int Neurourol J 2012; 16:159-68. [PMID: 23346481 PMCID: PMC3547176 DOI: 10.5213/inj.2012.16.4.159] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/17/2012] [Accepted: 12/18/2012] [Indexed: 11/08/2022] Open
Abstract
Recent efforts in the generation of large genomics, transcriptomics, proteomics, metabolomics and other types of 'omics' data sets have provided an unprecedentedly detailed view of certain diseases; however, to date most of this literature has focused on malignancy and other lethal pathological conditions. Very little intensive work on global profiles has been performed to understand the molecular mechanism of interstitial cystitis/painful bladder syndrome/bladder pain syndrome (IC/PBS/BPS), a chronic lower urinary tract disorder characterized by pelvic pain, urinary urgency and frequency, which can lead to long-lasting adverse effects on quality of life. A lack of understanding of the molecular mechanism has been a challenge and dilemma for diagnosis and treatment, and has also led to a delay in basic and translational research focused on biomarker and drug discovery, clinical therapy, and preventive strategies against IC/PBS/BPS. This review describes the current state of 'omics' studies and available data sets relevant to IC/PBS/BPS, and presents opportunities for new research directed at understanding the pathogenesis of this complex condition.
205
Hayes M, Pyon YS, Li J. A model-based clustering method for genomic structural variant prediction and genotyping using paired-end sequencing data. PLoS One 2012; 7:e52881. [PMID: 23300804 PMCID: PMC3531386 DOI: 10.1371/journal.pone.0052881] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 08/28/2012] [Accepted: 11/22/2012] [Indexed: 01/08/2023] Open
Abstract
Structural variation (SV) has been reported to be associated with numerous diseases such as cancer. With the advent of next generation sequencing (NGS) technologies, various types of SV can potentially be identified. We propose a model-based clustering approach utilizing a set of features defined for each type of SV event. Our method, termed SVMiner, not only provides a probability score for each candidate, but also predicts the heterozygosity of genomic deletions. Extensive experiments on genome-wide deep sequencing data have demonstrated that SVMiner is robust against the variability of a single cluster feature, and it significantly outperforms several commonly used SV detection programs. SVMiner can be downloaded from http://cbc.case.edu/svminer/.
Affiliation(s)
- Matthew Hayes
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
- Yoon Soo Pyon
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
- Jing Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
206
Abstract
Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.
Affiliation(s)
- Benjamin J Raphael
- Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America.
207
Lee H, Popodi E, Foster PL, Tang H. Detecting structural variants involving repetitive elements: capturing transposition events of IS elements in the genome of Escherichia coli. BMC Bioinformatics 2012. [PMCID: PMC3522030 DOI: 10.1186/1471-2105-13-s18-a12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
208
Zichner T, Garfield DA, Rausch T, Stütz AM, Cannavó E, Braun M, Furlong EEM, Korbel JO. Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing. Genome Res 2012; 23:568-79. [PMID: 23222910 PMCID: PMC3589545 DOI: 10.1101/gr.142646.112] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 01/11/2023]
Abstract
Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs comprising 8962 deletions and 916 tandem duplications in 39 lines derived from short-read DNA sequencing in a natural population (the “Drosophila melanogaster Genetic Reference Panel,” DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, and clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, along with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events also highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism.
Affiliation(s)
- Thomas Zichner
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
209
Assessing the risks of genotoxicity in the therapeutic development of induced pluripotent stem cells. Mol Ther 2012. [PMID: 23207694 DOI: 10.1038/mt.2012.255] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 02/07/2023] Open
Abstract
Induced pluripotent stem cells (iPSCs) have great potential for regenerative medicine as well as for basic and translational research. However, following the initial excitement over the enormous prospects of this technology, several reports uncovered serious concerns regarding its safety for clinical applications and reproducibility for laboratory applications such as disease modeling or drug screening. In particular, the genomic integrity of iPSCs is the focus of extensive research. Epigenetic remodeling, aberrant expression of reprogramming factors, clonal selection, and prolonged in vitro culture are potential pathways for acquiring genomic alterations. In this review, we will critically discuss current reprogramming technologies, particularly in the context of genotoxicity, and the consequences of these alterations for the potential applications of reprogrammed cells. In addition, current strategies of genetic modification of iPSCs, as well as applicable suicide strategies to control the risk of iPSC-based therapies, will be introduced.
210
Systems genetics in "-omics" era: current and future development. Theory Biosci 2012; 132:1-16. [PMID: 23138757 DOI: 10.1007/s12064-012-0168-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 03/26/2012] [Accepted: 10/25/2012] [Indexed: 02/06/2023]
Abstract
Systems genetics is an emerging discipline that integrates high-throughput expression profiling technology and systems biology approaches to reveal the molecular mechanisms of complex traits, and it will improve our understanding of gene functions in biochemical pathways and of genetic interactions between biological molecules. With the rapid advances of microarray analysis technologies, bioinformatics is extensively used in studies of gene functions, SNP-SNP genetic interactions, LD block-block interactions, miRNA-mRNA interactions, DNA-protein interactions, protein-protein interactions, and functional mapping for LD blocks. Built on a bioinformatics panel that can integrate "-omics" datasets to extract systems knowledge and useful information for explaining the molecular mechanisms of complex traits, systems genetics aims to enhance our understanding of biological processes. Systems biology has provided systems-level recognition of various biological phenomena and has laid the scientific groundwork for the development of systems genetics. In addition, next-generation sequencing technology and post-genome-wide association studies empower the discovery of new genes and rare variants. The integration of different strategies will help to propose novel hypotheses and refine the theoretical framework of systems genetics, which will contribute to its future development and open up a whole new area of genetics.
211
Story M, Ding LH, Brock WA, Ang KK, Alsbeih G, Minna J, Park S, Das A. Defining molecular and cellular responses after low and high linear energy transfer radiations to develop biomarkers of carcinogenic risk or therapeutic outcome. Health Physics 2012; 103:596-606. [PMID: 23032890 PMCID: PMC4492459 DOI: 10.1097/hp.0b013e3182692085] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 06/01/2023]
Abstract
The variability in radiosensitivity across the human population is governed in part by genetic factors. The ability to predict therapeutic response, identify individuals at greatest risk for adverse clinical responses after therapeutic radiation doses, or identify individuals at high risk for carcinogenesis from environmental or medical radiation exposures has a medical and economic impact on both the individual and society at large. As radiotherapy incorporates particles, particularly particles larger than protons, into therapy, the need for such discriminators (i.e., biomarkers) will become ever more important. Cellular assays for survival, DNA repair, or chromatid/chromosomal analysis have been used to identify at-risk individuals, but they are not clinically applicable. Newer approaches, such as genome-wide analysis of gene expression or single nucleotide polymorphisms and small copy number variations within chromosomes, are examples of technologies being applied to the discovery process. Gene expression analysis of primary or immortalized human cells suggests that there are distinct gene expression patterns associated with radiation exposure to both low and high linear energy transfer radiations and that those most radiosensitive are discernible by their basal gene expression patterns. However, because the genetic alterations that drive radio response may be subtle and cumulative, large sample sizes of specific cell or tissue types are required. A systems biology approach will ultimately be necessary. Potential biomarkers from cell lines or animal models will require validation in a human setting where possible, and before any is considered a credible biomarker, some understanding of the molecular mechanism is necessary.
Affiliation(s)
- Michael Story
- Department of Radiation Oncology, Division of Molecular Radiation Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
212
Costelloe SJ, El-Sayed Moustafa JS, Drenos F, Palmen J, Li Q, Qiao L, Whiting S, Thomas M, Kivimaki M, Kumari M, Hingorani AD, Tzoulaki I, Järvelin MR, Marjo-Riitta J, Ruokonen A, Aimo R, Hartikainen AL, Pouta A, Walters RG, Blakemore AIF, Humphries SE, Coin LJM, Talmud PJ. Gene-targeted analysis of copy number variants identifies 3 novel associations with coronary heart disease traits. Circulation: Cardiovascular Genetics 2012; 5:555-60. [PMID: 22972876 DOI: 10.1161/circgenetics.111.961037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Copy number variants (CNVs) are a major form of genomic variation, which may be implicated in complex disease phenotypes. However, investigation of the role of CNVs in coronary heart disease (CHD) traits has been limited. METHODS AND RESULTS: We examined the use of the cnvHap algorithm for CNV detection, using data for 2500 men from the Second Northwick Park Heart Study (NPHS-II). An Illumina custom chip, including 722 single-nucleotide polymorphisms covering 76 coronary heart disease-trait genes, was used. Common CNVs were significantly associated (at P<0.05, after correction) with coronary heart disease phenotypes in 5 genes. Novel associations of CNVs in toll-like receptor-4 with apolipoprotein AI were replicated (P<0.05) in the Whitehall II cohort (4887 subjects), whereas newly described associations of CNVs in sterol regulatory element-binding protein with apolipoprotein AI and associations of interleukin-6 signal transducer with apolipoprotein B were replicated in the data from 3546 subjects from the North Finnish Birth Cohort 1966 (P<0.05). CONCLUSIONS: This study supports the use of CNV detection algorithms such as cnvHap as potential tools for the identification of novel CNVs, some of which show significant association and replication with coronary heart disease risk phenotypes. However, the functional basis for these associations requires further substantiation.
Affiliation(s)
- Seán J Costelloe
- Center for Cardiovascular Genetics, Institute of Cardiovascular Science, The Royal Free London NHS Foundation Trust, Pond St, London, UK.
213
Yao F, Ariyaratne PN, Hillmer AM, Lee WH, Li G, Teo ASM, Woo XY, Zhang Z, Chen JP, Poh WT, Zawack KFB, Chan CS, Leong ST, Neo SC, Choi PSD, Gao S, Nagarajan N, Thoreau H, Shahab A, Ruan X, Cacheux-Rataboul V, Wei CL, Bourque G, Sung WK, Liu ET, Ruan Y. Long span DNA paired-end-tag (DNA-PET) sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons. PLoS One 2012; 7:e46152. [PMID: 23029419 PMCID: PMC3461012 DOI: 10.1371/journal.pone.0046152] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/04/2012] [Accepted: 08/28/2012] [Indexed: 01/23/2023] Open
Abstract
Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10–20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
Affiliation(s)
- Fei Yao
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Pramila N. Ariyaratne
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Axel M. Hillmer
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Wah Heng Lee
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Guoliang Li
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Audrey S. M. Teo
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Xing Yi Woo
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Zhenshui Zhang
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Jieqi P. Chen
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Wan Ting Poh
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Kelson F. B. Zawack
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Chee Seng Chan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- See Ting Leong
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Say Chuan Neo
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Poh Sum D. Choi
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Song Gao
- Graduate School for Integrative Sciences and Engineering, Centre for Life Sciences, National University of Singapore, Singapore, Singapore
- Niranjan Nagarajan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Hervé Thoreau
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Atif Shahab
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Xiaoan Ruan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Valère Cacheux-Rataboul
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Chia-Lin Wei
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Guillaume Bourque
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Wing-Kin Sung
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Edison T. Liu
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Yijun Ruan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- * E-mail:
214
Sindi SS, Onal S, Peng LC, Wu HT, Raphael BJ. An integrative probabilistic model for identification of structural variation in sequencing data. Genome Biol 2012; 13:R22. [PMID: 22452995 PMCID: PMC3439973 DOI: 10.1186/gb-2012-13-3-r22] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 10/26/2011] [Accepted: 03/27/2012] [Indexed: 12/12/2022] Open
Abstract
Paired-end sequencing is a common approach for identifying structural variation (SV) in genomes. Discrepancies between the observed and expected alignments indicate potential SVs. Most SV detection algorithms use only one of the possible signals and ignore reads with multiple alignments. This results in reduced sensitivity to detect SVs, especially in repetitive regions. We introduce GASVPro, an algorithm combining both paired read and read depth signals into a probabilistic model that can analyze multiple alignments of reads. GASVPro outperforms existing methods with a 50 to 90% improvement in specificity on deletions and a 50% improvement on inversions. GASVPro is available at http://compbio.cs.brown.edu/software.
Affiliation(s)
- Suzanne S Sindi
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA.
215
Rolfe PA, Bernstein DA, Grisafi P, Fink GR, Gifford DK. Ruler arrays reveal haploid genomic structural variation. PLoS One 2012; 7:e43210. [PMID: 22952647 PMCID: PMC3428316 DOI: 10.1371/journal.pone.0043210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2012] [Accepted: 07/18/2012] [Indexed: 11/18/2022] Open
Abstract
Despite the known relevance of genomic structural variants to pathogen behavior, cancer, development, and evolution, certain repeat-based structural variants may evade detection by existing high-throughput techniques. Here, we present ruler arrays, a technique to detect genomic structural variants including insertions and deletions (indels), duplications, and translocations. A ruler array exploits DNA polymerase's processivity to detect physical distances between defined genomic sequences regardless of the intervening sequence. The method combines a sample preparation protocol, tiling genomic microarrays, and a new computational analysis. The analysis of ruler array data from two genomic samples enables the identification of structural variation between the samples. In an empirical test between two closely related haploid strains of yeast, ruler arrays detected 78% of the structural variants larger than 100 bp.
Affiliation(s)
- P. Alexander Rolfe
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Douglas A. Bernstein
- The Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Paula Grisafi
- The Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Gerald R. Fink
- The Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- * E-mail: (DKG); (GRF)
- David K. Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- * E-mail: (DKG); (GRF)
216
Testing the genomic enrichment of a large copy number variation within schizophrenia linkage regions. Psychiatr Genet 2012; 22:294-7. [PMID: 22914616 DOI: 10.1097/ypg.0b013e3283586231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
We hypothesize that copy number variants (CNVs) contribute to the location of schizophrenia linkage regions. Therefore, we test whether CNVs published by the International Schizophrenia Consortium are enriched in schizophrenia linkage regions recorded in the Online Mendelian Inheritance in Man database. For each region, the number of overlapping CNV events and the number of CNV base pairs are compared with 10 000 random regions of matched size. This shows an enrichment of CNV events within the linkage regions SCZD4 (22q11), SCZD10 (15q13-q14) and SCZD12 (1p36) for both cases and controls. The magnitude of this genomic enrichment of CNV events is more pronounced among cases for SCZD10 and SCZD12, whereas the number of CNV base pairs is greater among cases for SCZD4 and SCZD10. These results are consistent with a higher mutability that has produced an increased CNV burden in these regions in both cases and controls, with CNVs being more likely to be deleterious among cases.
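The comparison against 10 000 size-matched random regions described in this abstract can be sketched as an empirical enrichment test. This is a minimal illustration and not the authors' code: the function names, the single-chromosome coordinate model, the uniform placement of random regions, and the add-one p-value correction are all assumptions made here for clarity.

```python
import random


def count_overlaps(region, events):
    """Count events (start, end) overlapping the half-open region (start, end)."""
    r_start, r_end = region
    return sum(1 for s, e in events if s < r_end and e > r_start)


def empirical_enrichment(region, events, chrom_length, n_random=10_000, seed=0):
    """Compare the overlap count in `region` with size-matched random regions.

    Assumption: random regions are placed uniformly on one chromosome of
    length `chrom_length`; a genome-wide version would also sample chromosomes.
    """
    rng = random.Random(seed)
    size = region[1] - region[0]
    observed = count_overlaps(region, events)
    at_least = 0
    for _ in range(n_random):
        start = rng.randrange(0, chrom_length - size + 1)
        if count_overlaps((start, start + size), events) >= observed:
            at_least += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (at_least + 1) / (n_random + 1)
    return observed, p_value


# Hypothetical toy data: 20 short events clustered inside the tested region.
toy_events = [(1000 + 50 * i, 1010 + 50 * i) for i in range(20)]
observed, p_value = empirical_enrichment((1000, 2000), toy_events, 1_000_000, n_random=1000)
```

The same loop could count overlapping base pairs instead of events, which is the second statistic the abstract mentions.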
217
Cantsilieris S, White SJ. Correlating multiallelic copy number polymorphisms with disease susceptibility. Hum Mutat 2012; 34:1-13. [PMID: 22837109 DOI: 10.1002/humu.22172] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 05/28/2012] [Accepted: 07/13/2012] [Indexed: 01/20/2023]
Abstract
The human genome contains a significant amount of sequence variation, from single nucleotide polymorphisms to large stretches of DNA that may be present in a range of different copies between individuals. Several such regions are variable in >1% of the population (referred to as copy number polymorphisms or CNPs), and many studies have looked for associations between the copy number of genes within multiallelic CNPs and disease susceptibility. Associations have indeed been described for several genes, including the β-defensins (DEFB4, DEFB103, DEFB104), chemokine ligand 3 like 1 (CCL3L1), Fc gamma receptor 3B (FCGR3B), and complement component C4 (C4). However, follow-up replication in independent cohorts has failed to reproduce a number of these associations. It is clear that replicated associations such as those between C4 and systemic lupus erythematosus, and β-defensin and psoriasis, have used robust genotyping methodologies. Technical issues associated with genotyping sequences of high identity may therefore account for failure to replicate other associations. Here, we compare and contrast the most popular approaches that have been used to genotype CNPs, describe how they have been applied in different situations, and discuss potential reasons for the difficulty in reproducibly linking multiallelic CNPs to complex diseases.
Affiliation(s)
- Stuart Cantsilieris
- Centre for Reproduction and Development, Monash Institute of Medical Research, Monash University, Melbourne, Victoria, Australia
|
218
|
Williams LJS, Tabbaa DG, Li N, Berlin AM, Shea TP, Maccallum I, Lawrence MS, Drier Y, Getz G, Young SK, Jaffe DB, Nusbaum C, Gnirke A. Paired-end sequencing of Fosmid libraries by Illumina. Genome Res 2012; 22:2241-9. [PMID: 22800726 PMCID: PMC3483553 DOI: 10.1101/gr.138925.112] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Abstract
Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this also has made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identify three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger-sequencing based draft assembly.
|
219
|
Liu GE, Bickhart DM. Copy number variation in the cattle genome. Funct Integr Genomics 2012; 12:609-24. [DOI: 10.1007/s10142-012-0289-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/02/2012] [Revised: 06/13/2012] [Accepted: 06/20/2012] [Indexed: 11/29/2022]
|
220
|
Ma J, Amos CI. Investigation of inversion polymorphisms in the human genome using principal components analysis. PLoS One 2012; 7:e40224. [PMID: 22808122 PMCID: PMC3392271 DOI: 10.1371/journal.pone.0040224] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 12/19/2011] [Accepted: 06/02/2012] [Indexed: 11/18/2022] Open
Abstract
Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct “populations” of inversion homozygotes of different orientations and their 1∶1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA-approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility of diseases.
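Illustrative sketch (not the authors' implementation): the substructure effect described in this abstract can be reproduced on simulated data, where a local PCA places inversion heterozygotes midway between the two homozygote groups on the first principal component. The allele frequencies (0.1 and 0.9), sample sizes, and function names below are assumptions chosen for the example, and PC1 is obtained by plain power iteration so that no external library is needed.

```python
import random


def pc1_scores(genotypes, n_iter=50, seed=0):
    """First-principal-component scores of a samples-by-SNPs genotype matrix."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]
    for _ in range(n_iter):
        scores = [sum(xi[j] * v[j] for j in range(m)) for xi in x]          # X v
        v = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return [sum(xi[j] * v[j] for j in range(m)) for xi in x]


def simulate_genotypes(n_per_class=20, n_snps=50, seed=1):
    """Unphased genotypes (0/1/2) for carriers of 0, 1 or 2 inverted copies.

    Assumed toy model: each SNP's alternate allele has frequency 0.1 on the
    standard arrangement and 0.9 on the inverted one.
    """
    rng = random.Random(seed)

    def haplotype(inverted):
        freq = 0.9 if inverted else 0.1
        return [1 if rng.random() < freq else 0 for _ in range(n_snps)]

    labels, genotypes = [], []
    for copies in (0, 1, 2):
        for _ in range(n_per_class):
            h1 = haplotype(copies >= 1)
            h2 = haplotype(copies == 2)
            genotypes.append([a + b for a, b in zip(h1, h2)])
            labels.append(copies)
    return labels, genotypes
```

Grouping the values returned by `pc1_scores` by label should show three separated clusters; the sign of PC1 is arbitrary, so only the relative ordering of the groups is meaningful.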
Affiliation(s)
- Jianzhong Ma
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America.
|
221
|
Steinberg KM, Antonacci F, Sudmant PH, Kidd JM, Campbell CD, Vives L, Malig M, Scheinfeldt L, Beggs W, Ibrahim M, Lema G, Nyambo TB, Omar SA, Bodo JM, Froment A, Donnelly MP, Kidd KK, Tishkoff SA, Eichler EE. Structural diversity and African origin of the 17q21.31 inversion polymorphism. Nat Genet 2012; 44:872-80. [PMID: 22751100 PMCID: PMC3408829 DOI: 10.1038/ng.2335] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Received: 10/04/2011] [Accepted: 06/01/2012] [Indexed: 12/12/2022]
Abstract
The 17q21.31 inversion polymorphism exists either as direct (H1) or inverted (H2) haplotypes with differential predispositions to disease and selection. We investigated its genetic diversity in 2,700 individuals, with an emphasis on African populations. We characterize eight structural haplotypes due to complex rearrangements that vary in size from 1.08-1.49 Mb and provide evidence for a 30-kb H1-H2 double recombination event. We show that recurrent partial duplications of the KANSL1 gene have occurred on both the H1 and H2 haplotypes and have risen to high frequency in European populations. We identify a likely ancestral H2 haplotype (H2') lacking these duplications that is enriched among African hunter-gatherer groups yet essentially absent from West African populations. Whereas H1 and H2 segmental duplications arose independently and before human migration out of Africa, they have reached high frequencies recently among Europeans, either because of extraordinary genetic drift or selective sweeps.
|
222
|
Song HH, Hu HJ, Seok IH, Chung YJ. Identifying Copy Number Variants under Selection in Geographically Structured Populations Based on F-statistics. Genomics Inform 2012; 10:81-7. [PMID: 23105934 PMCID: PMC3480682 DOI: 10.5808/gi.2012.10.2.81] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/10/2012] [Revised: 05/11/2012] [Accepted: 05/17/2012] [Indexed: 11/20/2022] Open
Abstract
Large-scale copy number variants (CNVs) in humans provide the raw material for delineating population differences, as natural selection may have affected at least some of the CNVs thus far discovered. Although the examination of relatively large numbers of specific ethnic groups has recently started in regard to inter-ethnic group differences in CNVs, particular instances of natural selection have not yet been identified or characterized. The traditional FST measure, obtained from differences in allele frequencies between populations, has been used to identify CNV loci subject to geographically varying selection. Here, we review advances and the application of multinomial-Dirichlet likelihood methods of inference for identifying genome regions that have been subject to natural selection with the FST estimates. The material presented is not new; however, this review clarifies how the application of the methods to CNV data, which remains largely unexplored, is possible. A hierarchical Bayesian method, which is implemented via Markov Chain Monte Carlo, estimates locus-specific FST and can identify outlying CNV loci with large values of FST. By applying this Bayesian method to the publicly available CNV data, we identified the CNV loci that show signals of natural selection, which may elucidate the genetic basis of human disease and diversity.
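Illustrative sketch (not the hierarchical Bayesian method reviewed in this abstract): the "traditional FST measure" obtained from allele-frequency differences can be written for a single locus as FST = (HT - HS) / HT, where HS is the mean within-population expected heterozygosity and HT is the expected heterozygosity of the pooled population. Treating a CNV as a biallelic presence/absence locus and weighting populations equally are simplifying assumptions; the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST for one biallelic locus.

    `freqs` holds the frequency of one allele (for example, the deletion
    allele of a CNV) in each population; populations are weighted equally.
    """
    if not freqs:
        raise ValueError("at least one population frequency is required")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic across all populations
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total
```

Loci whose value is unusually large relative to the genome-wide distribution are the kind of outliers that the locus-specific Bayesian estimates described above are meant to flag with proper uncertainty.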
Affiliation(s)
- Hae-Hiang Song
- Division of Biostatistics, Department of Medical Lifescience, The Catholic University of Korea, College of Medicine, Seoul 137-040, Korea
|
223
|
McPherson A, Wu C, Wyatt AW, Shah S, Collins C, Sahinalp SC. nFuse: discovery of complex genomic rearrangements in cancer using high-throughput sequencing. Genome Res 2012; 22:2250-61. [PMID: 22745232 PMCID: PMC3483554 DOI: 10.1101/gr.136572.111] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 02/01/2023]
Abstract
Complex genomic rearrangements (CGRs) are emerging as a new feature of cancer genomes. CGRs are characterized by multiple genomic breakpoints and thus have the potential to simultaneously affect multiple genes, fusing some genes and interrupting other genes. Analysis of high-throughput whole-genome shotgun sequencing (WGSS) is beginning to facilitate the discovery and characterization of CGRs, but further development of computational methods is required. We have developed an algorithmic method for identifying CGRs in WGSS data based on shortest alternating paths in breakpoint graphs. Aiming for a method with the highest possible sensitivity, we use breakpoint graphs built from all WGSS data, including sequences with ambiguous genomic origin. Since the majority of cell function is encoded by the transcriptome, we target our search to find CGRs that underlie fusion transcripts predicted from matched high-throughput cDNA sequencing (RNA-seq). We have applied our method, nFuse, to the discovery of CGRs in publicly available data from the well-studied breast cancer cell line HCC1954 and primary prostate tumor sample 963. We first establish the sensitivity and specificity of the nFuse breakpoint prediction and scoring method using breakpoints previously discovered in HCC1954. We then validate five out of six CGRs in HCC1954 and two out of two CGRs in 963. We show examples of gene fusions that would be difficult to discover using methods that do not account for the existence of CGRs, including one important event that was missed in a previous study of the HCC1954 genome. Finally, we illustrate how CGRs may be used to infer the gene expression history of a tumor.
Affiliation(s)
- Andrew McPherson
- School of Computing Science, Simon Fraser University, Vancouver, British Columbia V5A 1S6, Canada.
|
224
|
Chiara M, Pesole G, Horner DS. SVM²: an improved paired-end-based tool for the detection of small genomic structural variations using high-throughput single-genome resequencing data. Nucleic Acids Res 2012; 40:e145. [PMID: 22735696 PMCID: PMC3467043 DOI: 10.1093/nar/gks606] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023] Open
Abstract
Several bioinformatics methods have been proposed for the detection and characterization of genomic structural variation (SV) from ultra high-throughput genome resequencing data. Recent surveys show that comprehensive detection of SV events of different types between an individual resequenced genome and a reference sequence is best achieved through the combination of methods based on different principles (split mapping, reassembly, read depth, insert size, etc.). The improvement of individual predictors is thus an important objective. In this study, we propose a new method that combines deviations from expected library insert sizes and additional information from local patterns of read mapping and uses supervised learning to predict the position and nature of structural variants. We show that our approach provides greatly increased sensitivity with respect to other tools based on paired end read mapping at no cost in specificity, and it makes reliable predictions of very short insertions and deletions in repetitive and low-complexity genomic contexts that can confound tools based on split mapping of reads.
Affiliation(s)
- Matteo Chiara
- Department of Biomolecular Sciences and Biotechnology, University of Milan, Milan 20133, Italy.
|
225
|
Lee J, Kim B, Yoon J, Lee U. Detection of copy number variation using scale space filtering. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2012; 2011:5555-8. [PMID: 22255597 DOI: 10.1109/iembs.2011.6091417] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
This study proposes a novel CNV detection algorithm based on scale space filtering. It uses a Gaussian filter for the convolution with a scale parameter. The range of the scale parameter is adjusted according to the coverage level of read data. The position of a CNV region is determined through coarse and fine searches over the scales. The results showed that the performance of the proposed method depends less on the coverage level than that of the conventional methods. The results also showed that the proposed method outperforms the conventional methods by 63.29% to 73.57%.
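Illustrative sketch (not the published algorithm): the coarse and fine search over scales can be approximated by smoothing a read-depth profile with a wide Gaussian to locate a breakpoint roughly, then repeating the search with a narrow Gaussian inside a window around the coarse estimate. The two fixed scale values, the window size, and all function names are assumptions; the paper adjusts the scale range according to coverage, which is not modelled here.

```python
import math


def gaussian_kernel(sigma):
    """Normalised Gaussian weights covering roughly +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve `signal` with a Gaussian, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def strongest_step(signal, sigma, lo=0, hi=None):
    """Index i in [lo, hi) with the largest |s[i+1] - s[i]| after smoothing."""
    s = smooth(signal, sigma)
    hi = len(s) - 1 if hi is None else min(hi, len(s) - 1)
    return max(range(lo, hi), key=lambda i: abs(s[i + 1] - s[i]))


def coarse_to_fine_breakpoint(depth, coarse_sigma=8.0, fine_sigma=1.0):
    """Locate one depth change: coarse scale first, then refine nearby."""
    coarse = strongest_step(depth, coarse_sigma)
    window = int(2 * coarse_sigma)
    return strongest_step(depth, fine_sigma, max(0, coarse - window), coarse + window)
```

The returned index is the last position before the depth change. A real detector would need to handle several breakpoints and noisy depth, which this sketch does not attempt.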
Affiliation(s)
- Jongkeun Lee
- Department of Computer Engineering, Hallym University, Korea.
|
226
|
Valsesia A, Stevenson BJ, Waterworth D, Mooser V, Vollenweider P, Waeber G, Jongeneel CV, Beckmann JS, Kutalik Z, Bergmann S. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 2012; 13:241. [PMID: 22702538 PMCID: PMC3464625 DOI: 10.1186/1471-2164-13-241] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 11/28/2011] [Accepted: 06/15/2012] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. CONCLUSION: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
Affiliation(s)
- Armand Valsesia
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
|
227
|
Marotta M, Piontkivska H, Tanaka H. Molecular trajectories leading to the alternative fates of duplicate genes. PLoS One 2012; 7:e38958. [PMID: 22720000 PMCID: PMC3375281 DOI: 10.1371/journal.pone.0038958] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/21/2012] [Accepted: 05/14/2012] [Indexed: 11/21/2022] Open
Abstract
Gene duplication generates extra gene copies in which mutations can accumulate without risking the function of pre-existing genes. Such mutations modify duplicates and contribute to evolutionary novelties. However, the vast majority of duplicates appear to be short-lived and experience duplicate silencing within a few million years. Little is known about the molecular mechanisms leading to these alternative fates. Here we delineate differing molecular trajectories of a relatively recent duplication event between humans and chimpanzees by investigating molecular properties of a single duplicate: DNA sequences, gene expression and promoter activities. The inverted duplication of the Glutathione S-transferase Theta 2 (GSTT2) gene occurred at least 7 million years ago in the common ancestor of African great apes and is preserved in chimpanzees (Pan troglodytes), whereas a deletion polymorphism is prevalent in humans. The alternative fates are associated with expression divergence between these species, and reduced expression in humans is regulated by silencing mutations that have been propagated between duplicates by gene conversion. In contrast, selective constraint preserved duplicate divergence in chimpanzees. The difference in evolutionary processes left a unique DNA footprint in which dying duplicates are significantly more similar to each other (99.4%) than preserved ones. Such molecular trajectories could provide insights for the mechanisms underlying duplicate life and death in extant genomes.
Affiliation(s)
- Michael Marotta
- Department of Molecular Genetics, Cleveland Clinic Foundation, Cleveland, Ohio, United States of America
- Helen Piontkivska
- Department of Biological Sciences, Kent State University, Kent, Ohio, United States of America
- Hisashi Tanaka
- Department of Molecular Genetics, Cleveland Clinic Foundation, Cleveland, Ohio, United States of America
|
228
|
Tan S, Zhong Y, Hou H, Yang S, Tian D. Variation of presence/absence genes among Arabidopsis populations. BMC Evol Biol 2012; 12:86. [PMID: 22697058 PMCID: PMC3433342 DOI: 10.1186/1471-2148-12-86] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 09/01/2011] [Accepted: 06/14/2012] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND: Gene presence/absence (P/A) polymorphisms are commonly observed in plants and are important in individual adaptation and species differentiation. Detecting their abundance, distribution and variation among individuals would help to understand the role played by these polymorphisms in a given species. The recently sequenced 80 Arabidopsis genomes provide an opportunity to address these questions. RESULTS: By systematically investigating these accessions, we identified 2,407 P/A genes (or 8.9%) absent in one or more genomes, averaging 444 absent genes per accession. 50.6% of P/A genes belonged to multi-copy gene families, or 31.0% to clustered genes. However, the highest proportion of P/A genes, outnumbered in singleton genes, was observed in the regions near centromeres. In addition, a significant correlation was observed between the P/A gene frequency among the 80 accessions and the diversity level at P/A loci. Furthermore, the proportion of P/A genes was different among functional gene categories. Finally, a P/A gene tree showed a diversified population structure in the worldwide Arabidopsis accessions. CONCLUSIONS: An estimate of P/A genes and their frequency distribution in the worldwide Arabidopsis accessions was obtained. Our results suggest that there are diverse mechanisms to generate or maintain P/A genes, by which individuals and functionally different genes can selectively maintain P/A polymorphisms for a specific adaptation.
Affiliation(s)
- Shengjun Tan
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Biology, Nanjing University, Nanjing, 210093, China
- Yan Zhong
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Biology, Nanjing University, Nanjing, 210093, China
- Huan Hou
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Biology, Nanjing University, Nanjing, 210093, China
- Sihai Yang
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Biology, Nanjing University, Nanjing, 210093, China
- Dacheng Tian
- State Key Laboratory of Pharmaceutical Biotechnology, Department of Biology, Nanjing University, Nanjing, 210093, China
|
229
|
Abstract
Inversion polymorphisms have occupied a privileged place in Drosophila genetic research since their discovery in the 1920s. Indeed, inversions seem to be nearly ubiquitous, and the majority of species that have been thoroughly surveyed have been found to be polymorphic for one or more chromosomal inversions. Despite enduring interest, however, inversions remain difficult to study because their effects are often cryptic, and few efficient assays have been developed. Even in Drosophila melanogaster, in which inversions can be reliably detected and have received considerable attention, the breakpoints of only three inversions have been characterized molecularly. Hence, inversion detection and assay design remain important unsolved problems. Here, we present a method for identification and local de novo assembly of inversion breakpoints using next-generation paired-end reads derived from D. melanogaster isofemale lines. PCR and cytological confirmations demonstrate that our method can reliably assemble inversion breakpoints, providing tools for future research on D. melanogaster inversions as well as a framework for detection and assay design of inversions and other chromosome aberrations in diverse taxa.
|
230
|
Iskow RC, Gokcumen O, Lee C. Exploring the role of copy number variants in human adaptation. Trends Genet 2012; 28:245-57. [PMID: 22483647 PMCID: PMC3533238 DOI: 10.1016/j.tig.2012.03.002] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 11/16/2011] [Revised: 03/05/2012] [Accepted: 03/06/2012] [Indexed: 11/18/2022]
Abstract
Over the past decade, the ubiquity of copy number variants (CNVs, the gain or loss of genomic material) in the genomes of healthy humans has become apparent. Although some of these variants are associated with disorders, a handful of studies documented an adaptive advantage conferred by CNVs. In this review, we propose that CNVs are substrates for human evolution and adaptation. We discuss the possible mechanisms and evolutionary processes in which CNVs are selected, outline the current challenges in identifying these loci, and highlight that copy number variable regions allow for the creation of novel genes that may diversify the repertoire of such genes in response to rapidly changing environments. We expect that many more adaptive CNVs will be discovered in the coming years, and we believe that these new findings will contribute to our understanding of human-specific phenotypes.
Affiliation(s)
- Rebecca C Iskow
- Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA
|
231
|
Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS. Human genetics and genomics a decade after the release of the draft sequence of the human genome. Hum Genomics 2012; 5:577-622. [PMID: 22155605 PMCID: PMC3525251 DOI: 10.1186/1479-7364-5-6-577] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023] Open
Abstract
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Affiliation(s)
- Nasheen Naidoo
- Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
232
|
Watson CT, Breden F. The immunoglobulin heavy chain locus: genetic variation, missing data, and implications for human disease. Genes Immun 2012; 13:363-73. [PMID: 22551722 DOI: 10.1038/gene.2012.12] [Citation(s) in RCA: 130] [Impact Index Per Article: 10.8] [Indexed: 01/30/2023]
Abstract
The immunoglobulin (IG) loci consist of repeated and highly homologous sets of genes of different types, variable (V), diversity (D) and junction (J), that rearrange in developing B cells to produce an individual's highly variable repertoire of expressed antibodies, designed to bind to a vast array of pathogens. This repeated structure makes these loci susceptible to a high frequency of insertion and deletion events through evolutionary time, and also makes them difficult to characterize at the genomic level or assay with high-throughput techniques. Given the central role of antibodies in the adaptive immune system, it is not surprising that early candidate gene approaches showed that germline polymorphisms in these regions correlated with susceptibility to both infectious and autoimmune diseases. However, more recent studies, particularly those using high-throughput genome-wide arrays, have failed to implicate these loci in disease. In this review of the IG heavy chain variable gene cluster (IGHV), we examine how poorly we understand the distribution of haplotype variation in this genomic region, and we argue that this lack of information may mask candidate loci in the IGHV gene cluster as causative factors for infectious and autoimmune diseases.
Affiliation(s)
- C T Watson
- Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada.
|
233
|
Dennis MY, Nuttle X, Sudmant PH, Antonacci F, Graves TA, Nefedov M, Rosenfeld JA, Sajjadian S, Malig M, Kotkiewicz H, Curry CJ, Shafer S, Shaffer LG, de Jong PJ, Wilson RK, Eichler EE. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell 2012; 149:912-22. [PMID: 22559943 DOI: 10.1016/j.cell.2012.03.033] [Citation(s) in RCA: 252] [Impact Index Per Article: 21.0] [Received: 12/08/2011] [Revised: 02/17/2012] [Accepted: 03/01/2012] [Indexed: 10/28/2022]
Abstract
Gene duplication is an important source of phenotypic change and adaptive evolution. We leverage a haploid hydatidiform mole to identify highly identical sequences missing from the reference genome, confirming that the cortical development gene Slit-Robo Rho GTPase-activating protein 2 (SRGAP2) duplicated three times exclusively in humans. We show that the promoter and first nine exons of SRGAP2 duplicated from 1q32.1 (SRGAP2A) to 1q21.1 (SRGAP2B) ∼3.4 million years ago (mya). Two larger duplications later copied SRGAP2B to chromosome 1p12 (SRGAP2C) and to proximal 1q21.1 (SRGAP2D) ∼2.4 and ∼1 mya, respectively. Sequence and expression analyses show that SRGAP2C is the most likely duplicate to encode a functional protein and is among the most fixed human-specific duplicate genes. Our data suggest a mechanism where incomplete duplication created a novel gene function (antagonizing parental SRGAP2 function) immediately "at birth" 2-3 mya, which is a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.
Affiliation(s)
- Megan Y Dennis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, 98195, USA
|
234
|
Rios JJ, Shastry S, Jasso J, Hauser N, Garg A, Bensadoun A, Cohen JC, Hobbs HH. Deletion of GPIHBP1 causing severe chylomicronemia. J Inherit Metab Dis 2012; 35:531-40. [PMID: 22008945 PMCID: PMC3319888 DOI: 10.1007/s10545-011-9406-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 07/18/2011] [Revised: 09/20/2011] [Accepted: 09/22/2011] [Indexed: 12/19/2022]
Abstract
Lipoprotein lipase (LPL) is a hydrolase that cleaves circulating triglycerides to release fatty acids to the surrounding tissues. The enzyme is synthesized in parenchymal cells and is transported to its site of action on the capillary endothelium by glycophosphatidylinositol (GPI)-anchored high-density lipoprotein-binding protein 1 (GPIHBP1). Inactivating mutations in LPL; in its cofactor, apolipoprotein (Apo) C2; or in GPIHBP1 cause severe hypertriglyceridemia. Here we describe an individual with complete deficiency of GPIHBP1. The proband was an Asian Indian boy who had severe chylomicronemia at 2 months of age. Array-based copy-number analysis of his genomic DNA revealed homozygosity for a 17.5-kb deletion that included GPIHBP1. A 44-year-old aunt with a history of hypertriglyceridemia and pancreatitis was also homozygous for the deletion. A bolus of intravenously administered heparin caused a rapid increase in circulating LPL and decreased plasma triglyceride levels in control individuals but not in two GPIHBP1-deficient patients. Thus, short-term treatment with heparin failed to attenuate the hypertriglyceridemia in patients with GPIHBP1 deficiency. The increasing resolution of copy number microarrays and their widespread adoption for routine cytogenetic analysis is likely to reveal a greater role for submicroscopic deletions in Mendelian conditions. We describe the first neonate with complete GPIHBP1 deficiency due to homozygosity for a deletion of GPIHBP1.
Affiliation(s)
- Jonathan J. Rios
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 USA
| | - Savitha Shastry
- Division of Nutrition and Metabolic Diseases, Center for Human Nutrition, University of Texas Southwestern Medical Center, Dallas, TX USA
| | - Juan Jasso
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 USA
| | - Natalie Hauser
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 USA
| | - Abhimanyu Garg
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 USA
- Division of Nutrition and Metabolic Diseases, Center for Human Nutrition, University of Texas Southwestern Medical Center, Dallas, TX USA
| | - André Bensadoun
- Division of Nutritional Sciences, Cornell University, Ithaca, NY USA
| | - Jonathan C. Cohen
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 USA
- Division of Nutrition and Metabolic Diseases, Center for Human Nutrition, University of Texas Southwestern Medical Center, Dallas, TX USA
| | - Helen H. Hobbs
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 USA
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX USA
|
235
|
Simmons AD, Carvalho CMB, Lupski JR. What have studies of genomic disorders taught us about our genome? Methods Mol Biol 2012; 838:1-27. [PMID: 22228005 DOI: 10.1007/978-1-61779-507-7_1] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023]
Abstract
The elucidation of genomic disorders began with molecular technologies that enabled detection of genomic changes which were (a) smaller than those resolved by traditional cytogenetics (less than 5 Mb) and (b) larger than what could be determined by conventional gel electrophoresis. Methods such as pulsed field gel electrophoresis (PFGE) and fluorescent in situ hybridization (FISH) could resolve such changes but were limited to locus-specific studies. The study of genomic disorders has rapidly advanced with the development of array-based techniques. These enabled examination of the entire human genome at a higher level of resolution, thus allowing elucidation of the basis of many new disorders, mechanisms that result in genomic changes that can result in copy number variation (CNV), and most importantly, a deeper understanding of the characteristics, features, and plasticity of our genome. In this chapter, we focus on the structural and architectural features of the genome, which can potentially result in genomic instability, delineate how mechanisms, such as NAHR, NHEJ, and FoSTeS/MMBIR lead to disease-causing rearrangements, and briefly describe the relationship between the leading methods presently used in studying genomic disorders. We end with a discussion on our new understanding about our genome including: the contribution of new mutation CNV to disease, the abundance of mosaicism, the extent of subtelomeric rearrangements, the frequency of de novo rearrangements associated with sporadic birth defects, the occurrence of balanced and unbalanced translocations, the increasing discovery of insertional translocations, the exploration of complex rearrangements and exonic CNVs. In the postgenomic era, our understanding of the genome has advanced very rapidly as the level of technical resolution has become higher. 
This leads to a greater understanding of the effects of rearrangements present both in healthy subjects and individuals with clinically relevant phenotypes.
|
236
|
Abstract
Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results: By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.
|
237
|
Coe BP, Girirajan S, Eichler EE. The genetic variability and commonality of neurodevelopmental disease. Am J Med Genet C Semin Med Genet 2012; 160C:118-29. [PMID: 22499536 PMCID: PMC4114147 DOI: 10.1002/ajmg.c.31327] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 12/15/2022]
Abstract
Despite detailed clinical definition and refinement of neurodevelopmental disorders and neuropsychiatric conditions, the underlying genetic etiology has proved elusive. Recent genetic studies have revealed some common themes: considerable locus heterogeneity, variable expressivity for the same mutation, and a role for multiple disruptive events in the same individual affecting genes in common pathways. Recurrent copy number variation (CNV), in particular, has emphasized the importance of either de novo or essentially private mutations creating imbalances for multiple genes. CNVs have foreshadowed a model where the distinction between milder neuropsychiatric conditions and those of severe developmental impairment may be a consequence of increased mutational burden affecting more genes.
Affiliation(s)
- Bradley P Coe
- Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, WA, USA
|
238
|
Breheny P, Chalise P, Batzler A, Wang L, Fridley BL. Genetic association studies of copy-number variation: should assignment of copy number states precede testing? PLoS One 2012; 7:e34262. [PMID: 22493684 PMCID: PMC3320903 DOI: 10.1371/journal.pone.0034262] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/09/2011] [Accepted: 02/24/2012] [Indexed: 11/18/2022] Open
Abstract
Recently, structural variation in the genome has been implicated in many complex diseases. Using genomewide single nucleotide polymorphism (SNP) arrays, researchers are able to investigate the impact not only of SNP variation, but also of copy-number variants (CNVs) on the phenotype. The most common analytic approach involves estimating, at the level of the individual genome, the underlying number of copies present at each location. Once this is completed, tests are performed to determine the association between copy number state and phenotype. An alternative approach is to carry out association testing first, between phenotype and raw intensities from the SNP array at the level of the individual marker, and then aggregate neighboring test results to identify CNVs associated with the phenotype. Here, we explore the strengths and weaknesses of these two approaches using both simulations and real data from a pharmacogenomic study of the chemotherapeutic agent gemcitabine. Our results indicate that pooled marker-level testing is capable of offering a dramatic increase in power (> 12-fold) over CNV-level testing, particularly for small CNVs. However, CNV-level testing is superior when CNVs are large and rare; understanding these tradeoffs is an important consideration in conducting association studies of structural variation.
Affiliation(s)
- Patrick Breheny
- Department of Biostatistics, University of Kentucky, Lexington, Kentucky, United States of America.
|
239
|
Itsara A, Vissers L, Steinberg K, Meyer K, Zody M, Koolen D, de Ligt J, Cuppen E, Baker C, Lee C, Graves TA, Wilson R, Jenkins R, Veltman J, Eichler E. Resolving the breakpoints of the 17q21.31 microdeletion syndrome with next-generation sequencing. Am J Hum Genet 2012; 90:599-613. [PMID: 22482802 DOI: 10.1016/j.ajhg.2012.02.013] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 11/30/2011] [Revised: 01/23/2012] [Accepted: 02/16/2012] [Indexed: 01/22/2023] Open
Abstract
Recurrent deletions have been associated with numerous diseases and genomic disorders. Few, however, have been resolved at the molecular level because their breakpoints often occur in highly copy-number-polymorphic duplicated sequences. We present an approach that uses a combination of somatic cell hybrids, array comparative genomic hybridization, and the specificity of next-generation sequencing to determine breakpoints that occur within segmental duplications. Applying our technique to the 17q21.31 microdeletion syndrome, we used genome sequencing to determine copy-number-variant breakpoints in three deletion-bearing individuals with molecular resolution. For two cases, we observed breakpoints consistent with nonallelic homologous recombination involving only H2 chromosomal haplotypes, as expected. Molecular resolution revealed that the breakpoints occurred at different locations within a 145 kbp segment of >99% identity and disrupt KANSL1 (previously known as KIAA1267). In the remaining case, we found that unequal crossover occurred interchromosomally between the H1 and H2 haplotypes and that this event was mediated by a homologous sequence that was once again missing from the human reference. Interestingly, the breakpoints mapped preferentially to gaps in the current reference genome assembly, which we resolved in this study. Our method provides a strategy for the identification of breakpoints within complex regions of the genome harboring high-identity and copy-number-polymorphic segmental duplication. The approach should become particularly useful as high-quality alternate reference sequences become available and genome sequencing of individuals' DNA becomes more routine.
|
240
|
Rodríguez-Santiago B, Armengol L. Tecnologías de secuenciación de nueva generación en diagnóstico genético pre- y postnatal [Next-generation sequencing technologies in pre- and postnatal genetic diagnosis]. Diagnóstico Prenatal 2012. [DOI: 10.1016/j.diapre.2012.02.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
|
241
|
Schrider DR, Stevens K, Cardeño CM, Langley CH, Hahn MW. Genome-wide analysis of retrogene polymorphisms in Drosophila melanogaster. Genome Res 2012; 21:2087-95. [PMID: 22135405 DOI: 10.1101/gr.116434.110] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 12/27/2022]
Abstract
Gene duplication via retrotransposition has been shown to be an important mechanism in evolution, affecting gene dosage and allowing for the acquisition of new gene functions. Although fixed retrotransposed genes have been found in a variety of species, very little effort has been made to identify retrogene polymorphisms. Here, we examine 37 Illumina-sequenced North American Drosophila melanogaster inbred lines and present the first ever data set and analysis of polymorphic retrogenes in Drosophila. We show that this type of polymorphism is quite common, with any two gametes in the North American population differing in the presence or absence of six retrogenes, accounting for ~13% of gene copy-number heterozygosity. These retrogenes were identified by a straightforward method that can be applied using any type of DNA sequencing data. We also use a variant of this method to conduct a genome-wide scan for intron presence/absence polymorphisms, and show that any two chromosomes in the population likely differ in the presence of multiple introns. We show that these polymorphisms are all in fact deletions rather than intron gain events present in the reference genome. Finally, by leveraging the known location of the parental genes that give rise to the retrogene polymorphisms, we provide direct evidence that natural selection is responsible for the excess of fixations of retrogenes moving off of the X chromosome in Drosophila. Further efforts to identify retrogene and intron presence/absence polymorphisms will undoubtedly improve our understanding of the evolution of gene copy number and gene structure.
Affiliation(s)
- Daniel R Schrider
- Department of Biology, Indiana University, Bloomington, Indiana 47405, USA.
|
242
|
Neuman JA, Isakov O, Shomron N. Analysis of insertion-deletion from deep-sequencing data: software evaluation for optimal detection. Brief Bioinform 2012; 14:46-55. [PMID: 22707752 DOI: 10.1093/bib/bbs013] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 11/15/2022] Open
Abstract
Insertion and deletion (indel) mutations, the most common type of structural variation in the human genome, affect a multitude of human traits and diseases. New sequencing technologies, such as deep sequencing, allow massive throughput of sequence data and greatly contribute to the detection of disease-causing mutations in general and of indels specifically. In order to infer indel presence (indel calling), the deep-sequencing data have to undergo comprehensive computational analysis. The choice of indel-calling software can skew the results, and inherent tool limitations may affect downstream analysis. In order to better understand these inter-software differences, we evaluated the performance of several indel-calling tools for short indel (1-10 nt) detection. We compared the tools' sensitivity and predictive values in the presence of varying parameters such as read depth (coverage), read length, indel size and frequency. We pinpoint several key features that assist successful experimental design and appropriate tool selection. Our study may also serve as a basis for future evaluation of additional indel calling methods.
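The two metrics named in this abstract, sensitivity and positive predictive value, can be illustrated with a minimal Python sketch. It is not the evaluation code used in the study: the `Indel` record, the `evaluate_calls` helper and the example coordinates are all hypothetical, and calls are matched by exact position and alleles, which assumes both sets were normalized (for example left-aligned) beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """Hypothetical minimal indel record; frozen so it can live in a set."""
    chrom: str
    pos: int
    ref: str
    alt: str


def evaluate_calls(truth, called):
    """Compare a caller's output against a truth set by exact match."""
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # true indels that were called
    fn = len(truth - called)   # true indels that were missed
    fp = len(called - truth)   # calls with no matching true indel
    sensitivity = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if called else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv}


# Made-up example: four simulated indels, three recovered, one spurious call.
truth = [
    Indel("chr1", 100, "A", "AT"),
    Indel("chr1", 250, "GCC", "G"),
    Indel("chr2", 40, "T", "TGG"),
    Indel("chr3", 7, "CA", "C"),
]
called = truth[:3] + [Indel("chr2", 99, "G", "GA")]
result = evaluate_calls(truth, called)
print(result)
```

Repeating such a comparison while varying read depth, read length or indel size is one way the kind of parameter sweep described above could be organized.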
Affiliation(s)
- Joseph A Neuman
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
|
243
|
Clop A, Vidal O, Amills M. Copy number variation in the genomes of domestic animals. Anim Genet 2012; 43:503-17. [PMID: 22497594 DOI: 10.1111/j.1365-2052.2012.02317.x] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Accepted: 09/28/2011] [Indexed: 12/28/2022]
Abstract
Copy number variation (CNV) might be one of the main contributors to phenotypic diversity and evolutionary adaptation in animals and plants, employing a wide variety of mechanisms, such as gene dosage and transcript structure alterations, to modulate organismal plasticity. In the past 4 years, considerable advances have been made in the characterization of the genomic architecture of CNV in domestic species. First, low-resolution CNV maps were produced for cattle, goat, sheep, pig, dog, chicken, duck and turkey, showing that these structural polymorphisms comprise a significant part of these genomes. Furthermore, CNVs have been associated with several pigmentation (white coat in horse, pig and sheep) and morphological (late feathering and pea comb in chicken) traits, as well as with susceptibility to a wide array of diseases and developmental disorders, for example osteopetrosis, anhidrotic ectodermal dysplasia, copper toxicosis, intersexuality, cone degeneration, periodic fever and dermoid sinus, among others. In the future, development of high-resolution tools for CNV detection and typing combined with the implementation of databases integrating CNV, QTL and gene expression data will be essential to identify and measure the impact of this source of structural variation on the many phenotypes that are relevant to animal breeders and veterinary practitioners.
Affiliation(s)
- A Clop
- Department of Medical and Molecular Genetics, King's College London, Great Maze Pond, SE1 9RT, London, UK
|
244
|
Robinson JI, Carr IM, Cooper DL, Rashid LH, Martin SG, Emery P, Isaacs JD, Barton A, Wilson AG, Barrett JH, Morgan AW. Confirmation of association of FCGR3B but not FCGR3A copy number with susceptibility to autoantibody positive rheumatoid arthritis. Hum Mutat 2012; 33:741-9. [PMID: 22290871 DOI: 10.1002/humu.22031] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 10/18/2011] [Accepted: 01/17/2012] [Indexed: 11/07/2022]
Abstract
The FCGR locus encoding the low-affinity Fcγ receptors (FcγR) for immunoglobulin G has largely been missed by genome-wide association studies due to complications with structural variation and segmental duplication. Recently identified copy number variants (CNVs) affecting FCGR3A and FCGR3B have been linked to a number of autoimmune disorders. We have developed and validated a novel quantitative sequence variant assay in combination with an adapted paralogue ratio test to examine independent CNVs carrying FCGR3A and FCGR3B in rheumatoid arthritis (RA) compared with healthy volunteers (n = 1,115 and 654, respectively). Implementation of a robust statistical analysis framework (CNVtools) allowed for systematic batch effects and for the inherent uncertainty of copy number assignment, thus avoiding two major sources of false positive results. No evidence of association was found for either duplications or deletions of FCGR3A; however, in line with previous studies, there was evidence of overrepresentation of FCGR3B deletions in RA (odds ratio [OR] 1.50, P = 0.028), which was more apparent in rheumatoid factor positive disease (OR 1.61, P = 0.011). The level of expression on neutrophils of FcγRIIIb, which is encoded by FCGR3B, was shown to correlate with gene copy number. Thus, our results may highlight an important role for neutrophils in the pathogenesis of RA, potentially through reduced FcγRIIIb-mediated immune complex clearance.
Affiliation(s)
- James I Robinson
- NIHR-Leeds Musculoskeletal Biomedical Research Unit and Leeds Institute of Molecular Medicine, University of Leeds, Leeds, UK
|
245
|
Advances in BAC-based physical mapping and map integration strategies in plants. J Biomed Biotechnol 2012; 2012:184854. [PMID: 22500080 PMCID: PMC3303678 DOI: 10.1155/2012/184854] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/31/2011] [Revised: 10/26/2011] [Accepted: 11/11/2011] [Indexed: 12/29/2022] Open
Abstract
With the advent of next-generation sequencing (NGS) platforms, the map-based sequencing strategy has recently been sidelined as too expensive and laborious. However, detailed studies of draft assemblies built from NGS data alone indicate that these assemblies remain far from gold-standard reference quality, especially for complex genomes. In this context, conventional BAC-based physical mapping has been identified as an important intermediate layer in current hybrid sequencing strategies. BAC-based physical map construction and its integration with high-density genetic maps have benefited from NGS and high-throughput array platforms. This paper addresses current advances in BAC-based physical mapping and high-throughput map integration strategies for obtaining densely anchored, well-ordered physical maps. The resulting maps are of immediate utility and also provide a template for harnessing the maximum benefit of current NGS platforms.
|
246
|
Copy number variation of age-related macular degeneration relevant genes in the Korean population. PLoS One 2012; 7:e31243. [PMID: 22355348 PMCID: PMC3280288 DOI: 10.1371/journal.pone.0031243] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/31/2011] [Accepted: 01/05/2012] [Indexed: 11/19/2022] Open
Abstract
PURPOSE: Studies that analyzed single nucleotide polymorphisms (SNP) in various genes have shown that genetic factors are strongly associated with age-related macular degeneration (AMD) susceptibility. Copy number variation (CNV) may be an additional type of genetic variation that contributes to AMD pathogenesis. This study investigated CNV in 4 AMD-relevant genes in Korean AMD patients and control subjects. METHODS: Four CNV candidate regions located in AMD-relevant genes (VEGFA, ARMS2/HTRA1, CFH and VLDLR) were selected based on the outcomes of our previous study, which elucidated common CNVs in the Asian populations. Real-time PCR based TaqMan Copy Number Assays were performed on CNV candidates in 273 AMD patients and 257 control subjects. RESULTS: The predicted copy number (PCN, 0, 1, 2 or 3+) of each region was called using the CopyCaller program. All candidate genes except ARMS2/HTRA1 showed CNV in at least one individual, in which losses of VEGFA and VLDLR represent novel findings in the Asian population. When the frequencies of PCN were compared, only the gain in VLDLR showed significant differences between AMD patients and control subjects (p = 0.025). Comparisons of the raw copy values (RCV) revealed that 3 of 4 candidate genes showed significant differences (2.03 vs. 1.92 for VEGFA, p<0.01; 2.01 vs. 1.97 for CFH, p<0.01; 1.97 vs. 2.01 for ARMS2/HTRA1, p<0.01). CONCLUSION: CNVs located in AMD-relevant genes may be associated with AMD susceptibility. Further investigations encompassing larger patient cohorts are needed to elucidate the role of CNV in AMD pathogenesis.
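To make the distinction between a raw copy value and a predicted copy number concrete, the following Python sketch applies the standard comparative Ct (ΔΔCt) calculation commonly used with real-time PCR copy number assays. It is an illustration under stated assumptions, not the CopyCaller algorithm: the function names, the Ct values and the simple round-to-nearest binning are hypothetical, and the calibrator sample is assumed to carry two copies.

```python
def raw_copy_value(ct_target, ct_reference, calibrator_delta_ct,
                   calibrator_copies=2):
    """Relative copy estimate from the comparative Ct (delta-delta-Ct) method.

    ct_target / ct_reference: Ct of the test assay and of the reference
    assay in the same sample. calibrator_delta_ct: (target - reference) Ct
    difference in a calibrator sample assumed to carry `calibrator_copies`.
    """
    delta_ct = ct_target - ct_reference
    delta_delta_ct = delta_ct - calibrator_delta_ct
    # Each extra PCR cycle corresponds to a two-fold lower starting amount.
    return calibrator_copies * 2 ** (-delta_delta_ct)


def predicted_copy_number(rcv):
    """Bin a non-negative raw copy value into 0, 1, 2 or 3 (3 means '3+').

    Nearest-integer binning is a simplification chosen for this sketch.
    """
    return min(int(rcv + 0.5), 3)


# Hypothetical Ct values: one extra cycle relative to the calibrator
# halves the estimate, giving a single-copy (loss) call.
loss = raw_copy_value(ct_target=26.0, ct_reference=24.0, calibrator_delta_ct=1.0)
normal = raw_copy_value(ct_target=25.0, ct_reference=24.0, calibrator_delta_ct=1.0)
print(loss, predicted_copy_number(loss))
print(normal, predicted_copy_number(normal))
```

Group means such as 2.03 vs. 1.92 both bin to a predicted copy number of 2, which shows how raw copy values can differ between groups even when the called copy number states do not.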
|
247
|
Copy-number variations observed in a Japanese population by BAC array CGH: summary of relatively rare CNVs. J Biomed Biotechnol 2012; 2012:789024. [PMID: 22315515 PMCID: PMC3270455 DOI: 10.1155/2012/789024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2011] [Revised: 09/21/2011] [Accepted: 10/07/2011] [Indexed: 11/30/2022] Open
Abstract
Copy-number variations (CNVs) may contribute to genetic variation in humans. Reports regarding the existence and characteristics of CNVs in a large, apparently healthy Japanese cohort are quite limited. We report the data from a screening of 213 unrelated Japanese individuals using comparative genomic hybridization based on a bacterial artificial chromosome microarray (BAC aCGH). In a previous paper, we summarized the data by focusing on highly polymorphic CNVs (in ≥5.0% of the individuals). However, rare variations have recently received attention from scientists who espouse a hypothesis called "common disease and rare variants." Here, we report CNVs identified in fewer than 10 individuals in our study population. We found a total of 126 CNVs at 52 different BAC regions in the genome. The CNVs observed at 27 of the 52 BAC regions were found in only one unrelated individual. The majority of CNVs found in this study were not identified in the Japanese individuals examined in other studies. Family studies were conducted, and the results demonstrated that the CNVs were inherited from one parent in the families.
|
248
|
No evidence for GNAS copy number variants in patients with features of Albright's hereditary osteodystrophy and abnormal platelet Gs activity. J Hum Genet 2012; 57:277-9. [PMID: 22277900 DOI: 10.1038/jhg.2012.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/20/2022]
Abstract
Albright's hereditary osteodystrophy (AHO) is characterized by short stature, round face, calcifications, obesity, brachydactyly and intellectual disability. AHO without hormone resistance is called pseudopseudohypoparathyroidism (PPHP), a rare clinical condition that is difficult to diagnose and has highly variable features. PPHP is caused by paternally inherited loss-of-function mutations in GNAS. Patients with 2q37 microdeletions or HDAC4 mutations are also defined as having an AHO-like phenotype with normal stimulatory G (Gs) function. We have studied 256 patients with AHO features but no other diagnosis. Their platelet Gs activity was determined via the aggregation-inhibition test, showing Gs hypo- or hyperfunction in 24% and 15% of the patients, respectively. Before initiating detailed (epi)genetic GNAS studies, we wanted to exclude copy number variants (CNVs) in GNAS as a cause of AHO using a novel large-scale screening technique. Multiplex amplicon quantification (MAQ) for CNV screening was developed for the 20q13.3 region, including GNAS and potential long-range imprinting control elements such as STX16. This is the first large-scale GNAS CNV study in patients with common AHO features, but no CNVs were detected. In conclusion, CNVs in the GNAS region are not likely to cause an AHO-like phenotype with or without abnormal platelet Gs activity. Future studies will be undertaken to find out whether these AHO patients with abnormal Gs function are characterized by GNAS coding or methylation defects.
|
249
|
Zhang Y, Liu KJ, Wang TL, Shih IM, Wang TH. Mapping DNA quantity into electrophoretic mobility through quantum dot nanotethers for high-resolution genetic and epigenetic analysis. ACS Nano 2012; 6:858-864. [PMID: 22136600 PMCID: PMC3273333 DOI: 10.1021/nn204377k] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
Newly discovered nanoparticle properties have driven the development of novel applications and uses. We report a new observation where the electrophoretic mobility of a quantum dot/DNA nanoassembly can be precisely modulated by the degree of surface DNA conjugation. By using streptavidin-coated quantum dots (QDs) as nanotethers to gather biotin-labeled DNA into electrophoretic nanoassemblies, the QD surface charge is modulated and transformed into electrophoretic mobility shifts using standard agarose gel electrophoresis. Typical fluorescent assays quantify based on relative intensity. However, this phenomenon uses a novel approach that accurately maps DNA quantity into shifts in relative band position. This property was applied in a QD-enabled nanoassay called quantum dot electrophoretic mobility shift assay (QEMSA) that enables accurate quantification of DNA targets down to 1.1-fold (9%) changes in quantity, beyond what is achievable in qPCR. In addition to these experimental findings, an analytical model is presented to explain this behavior. Finally, QEMSA was applied to both genetic and epigenetic analysis of cancer. First, it was used to analyze copy number variation (CNV) of the RSF1/HBXAP gene, where conventional approaches for CNV analysis based on comparative genomic hybridization (CGH), microarrays, and qPCR are unable to reliably differentiate less than 2-fold changes in copy number. Then, QEMSA was used for DNA methylation analysis of the p16/CDKN2A tumor suppressor gene, where its ability to detect subtle changes in methylation was shown to be superior to that of qPCR.
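The pairing of "1.1-fold" with "9%" can be checked with a short calculation. The Python sketch below is only an arithmetic illustration, and the reading it tests is an assumption: that the percentage expresses the difference relative to the larger quantity (1 − 1/1.1 ≈ 9.1%) rather than relative to the smaller one (10%). The function name is hypothetical.

```python
def fold_change_as_percent(fold):
    """Express a fold change (>= 1) as a percentage in two ways.

    Returns (percent relative to the smaller quantity,
             percent relative to the larger quantity).
    """
    relative_to_smaller = (fold - 1.0) * 100.0
    relative_to_larger = (1.0 - 1.0 / fold) * 100.0
    return relative_to_smaller, relative_to_larger


up, down = fold_change_as_percent(1.1)
print(f"1.1-fold: +{up:.1f}% of the smaller, -{down:.1f}% of the larger")

# For comparison, a single-copy gain on a diploid background (2 -> 3 copies)
# is a 1.5-fold change, below the 2-fold threshold mentioned in the abstract.
gain_fold = 3 / 2
print(gain_fold, fold_change_as_percent(gain_fold))
```

Under that reading, a 1.1-fold resolution is finer than the 1.5-fold difference produced by a one-copy gain in a diploid genome.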
Affiliation(s)
- Yi Zhang
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205
- Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287
| | - Kelvin J. Liu
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD 21218
| | - Tian-Li Wang
- Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287
| | - Ie-Ming Shih
- Department of Pathology, The Johns Hopkins University School of Medicine, Baltimore, MD 21231
- Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287
| | - Tza-Huei Wang
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD 21218
- Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287
- Center of Cancer Nanotechnology Excellence at the Johns Hopkins Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD 21218
|
250
|
Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet 2012; 44:226-32. [PMID: 22231483 PMCID: PMC3272472 DOI: 10.1038/ng.1028] [Citation(s) in RCA: 352] [Impact Index Per Article: 29.3] [Received: 04/08/2011] [Accepted: 11/07/2011] [Indexed: 12/24/2022]
Abstract
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
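The idea of a colored de Bruijn graph can be shown with a toy Python sketch. This is not Cortex and does not use its API: every function name and sequence here is hypothetical, nodes are plain k-mers, edges are implied by (k-1)-base overlaps, and reverse complements, sequencing errors and coverage are ignored. Each k-mer records the set of samples ("colors") in which it occurs, so a variant appears as a branch whose two arms carry different colors.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-length substring of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_colored_graph(samples, k):
    """Map each k-mer to the set of colors (sample names) containing it.

    samples: dict of color -> list of sequences.
    """
    graph = defaultdict(set)
    for color, seqs in samples.items():
        for seq in seqs:
            for km in kmers(seq, k):
                graph[km].add(color)
    return graph


def successors(graph, kmer):
    """K-mers in the graph that overlap kmer's last k-1 bases."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


def private_kmers(graph, color):
    """K-mers seen only in the given color, sorted for stable output."""
    return sorted(km for km, colors in graph.items() if colors == {color})


# Toy data: the sample differs from the reference by one substitution.
samples = {"reference": ["ACGTACGGA"], "sample": ["ACGTTCGGA"]}
graph = build_colored_graph(samples, k=3)
print(successors(graph, "CGT"))          # branch point of the bubble
print(private_kmers(graph, "sample"))    # arm carried only by the sample
print(private_kmers(graph, "reference")) # arm carried only by the reference
```

In this toy case the node CGT has two successors, one per arm of the bubble, and the color sets show which arm belongs to which sample; the same bookkeeping extends to many samples sharing one graph.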
Affiliation(s)
- Zamin Iqbal
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Mario Caccamo
- The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Isaac Turner
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Gil McVean
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
- Department of Statistics, 1 South Parks Road, Oxford OX1 3TG, UK
|