1
|
Zhang T, Dong J, Jiang H, Zhao Z, Zhou M, Yuan T. CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data. Front Bioeng Biotechnol 2022; 10:1000638. [DOI: 10.3389/fbioe.2022.1000638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022]
Abstract
Copy number variations (CNVs) significantly influence the diversity of the human genome and the occurrence of many complex diseases. Next-generation sequencing (NGS) technology provides rich data for detecting CNVs, and the read depth (RD)-based approach is widely used. However, low CN (copy number of 3–4) duplication events are challenging to identify with existing methods, especially when the CNVs are small. In addition, the RD-based approach can only obtain rough breakpoints. We propose a new method, CNV-PCC (detection of CNVs based on Principal Component Classifier), to identify CNVs in whole genome sequencing data. CNV-PCC first uses the split read signal to search for potential breakpoints. A two-stage segmentation strategy is then implemented to improve the identification of low CN duplications and small CNVs. Next, outlier scores are calculated for each segment by PCC (Principal Component Classifier). Finally, the OTSU algorithm calculates the threshold used to determine the CNV regions. Results on simulated data indicate that CNV-PCC outperforms the other methods in sensitivity and F1-score and improves breakpoint accuracy. Furthermore, CNV-PCC shows high consistency with other methods on real sequencing samples. This study demonstrates that CNV-PCC is an effective method for detecting CNVs, even for low CN duplications and small CNVs.
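As an illustration of the final thresholding step mentioned in this abstract, the following is a minimal Python sketch of Otsu's method applied to a one-dimensional array of per-segment outlier scores. The function name `otsu_threshold`, the bin count, and the toy scores are assumptions made for illustration; this is not the CNV-PCC implementation, and the abstract does not say how the scores are binned.

```python
import numpy as np

def otsu_threshold(scores, bins=64):
    """Return the cut that maximizes between-class variance of 1-D scores.

    Hypothetical helper: `bins` is an assumed parameter, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    hist, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()

    w0 = np.cumsum(weights)              # mass of the low class (bins 0..k)
    w1 = 1.0 - w0                        # mass of the high class (bins k+1..)
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate split; skip empty classes.
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )

    k = int(np.argmax(between))
    return edges[k + 1]                  # upper edge of the last low-class bin

# Toy scores: many near-normal segments plus a few strong outliers.
toy_scores = np.concatenate([np.linspace(0.0, 1.0, 90), np.linspace(8.0, 10.0, 10)])
cut = otsu_threshold(toy_scores)
candidate_segments = np.flatnonzero(toy_scores > cut)
print(cut, candidate_segments.size)
```

Segments whose score exceeds the returned cut would be the candidates for CNV regions; the choice of a histogram-based search is a design convenience here, since it keeps the scan linear in the number of bins.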
|
2
|
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. Plant Biotechnology Journal 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
  - School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Philipp E. Bayer
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
|
3
|
Liu Y, Zhang M, Sun J, Chang W, Sun M, Zhang S, Wu J. Comparison of multiple algorithms to reliably detect structural variants in pears. BMC Genomics 2020; 21:61. [PMID: 31959124 PMCID: PMC6972009 DOI: 10.1186/s12864-020-6455-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/17/2019] [Accepted: 01/07/2020] [Indexed: 01/01/2023]
Abstract
Background: Structural variations (SVs) have been reported to play an important role in genetic diversity and trait regulation. Many computer algorithms detecting SVs have recently been developed, but the use of multiple algorithms to detect high-confidence SVs has not been studied. The most suitable sequencing depth for detecting SVs in pear is also not known. Results: In this study, a pipeline to detect SVs using next-generation and long-read sequencing data was constructed. The performances of seven types of SV detection software using next-generation sequencing (NGS) data and two types of software using long-read sequencing data (SVIM and Sniffles), which are based on different algorithms, were compared. Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (> 90%). When the results from multiple SV detection tools were combined, the SVs identified by both MetaSV and IMR/DENOM, which use NGS data, were more accurate than those identified by both SVIM and Sniffles, with mean accuracies of 98.7 and 96.5%, respectively. The software packages using long-read sequencing data required fewer CPU cores and less memory and ran faster than those using NGS data. In addition, according to the performances of assembly-based algorithms using NGS data, we found that a sequencing depth of 50× is appropriate for detecting SVs in the pear genome. Conclusion: This study provides strong evidence that more than one SV detection software package, each based on a different algorithm, should be used to detect SVs with higher confidence, and that long-read sequencing data are better than NGS data for SV detection. The SV detection pipeline that we have established will facilitate the study of diversity in other crops.
Affiliation(s)
- Yueyuan Liu
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Mingyue Zhang
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jieying Sun
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Wenjing Chang
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Manyi Sun
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Shaoling Zhang
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
- Jun Wu
  - Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
|
4
|
Streptococcus agalactiae Strains with Chromosomal Deletions Evade Detection with Molecular Methods. J Clin Microbiol 2019; 57:JCM.02040-18. [PMID: 30760532 DOI: 10.1128/jcm.02040-18] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/19/2018] [Accepted: 02/03/2019] [Indexed: 01/28/2023]
Abstract
Surveillance of circulating microbial populations is critical for monitoring the performance of a molecular diagnostic test. In this study, we characterized 31 isolates of Streptococcus agalactiae (group B Streptococcus [GBS]) from several geographic locations in the United States and Ireland that contain deletions in or adjacent to the region of the chromosome that encodes the hemolysin gene cfb, the region targeted by the Xpert GBS and GBS LB assays. PCR-negative, culture-positive isolates were recognized during verification studies of the Xpert GBS assay in 12 laboratories between 2012 and 2018. Whole-genome sequencing of 15 GBS isolates from 11 laboratories revealed four unique deletions of chromosomal DNA ranging from 181 bp to 49 kb. Prospective surveillance studies demonstrated that the prevalence of GBS isolates containing deletions in the convenience sample was <1% in three geographic locations but 7% in a fourth location. Among the 15 isolates with chromosomal deletions, multiple pulsed-field gel electrophoresis types were identified, one of which appears to be broadly dispersed across the United States.
|
5
|
Genomic Rearrangements in Arabidopsis Considered as Quantitative Traits. Genetics 2017; 205:1425-1441. [PMID: 28179367 PMCID: PMC5378104 DOI: 10.1534/genetics.116.192823] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 11/12/2016] [Accepted: 01/27/2017] [Indexed: 11/18/2022]
Abstract
Structural Rearrangements can have unexpected effects on quantitative phenotypes. Surprisingly, these rearrangements can also be considered as... To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, the evidence for a potential structural variant at a locus is indicated by variation in the counts of short-reads that map anomalously to that locus. These structural variant traits are treated as quantitative traits and mapped genetically, analogously to a gene expression study. Association between a structural variant trait at one locus, and genotypes at a distant locus indicate the origin and target of a transposition. Using ultra-low-coverage (0.3×) population sequence data from 488 recombinant inbred Arabidopsis thaliana genomes, we identified 6502 segregating structural variants. Remarkably, 25% of these were transpositions. While many structural variants cannot be delineated precisely, we validated 83% of 44 predicted transposition breakpoints by polymerase chain reaction. We show that specific structural variants may be causative for quantitative trait loci for germination and resistance to infection by the fungus Albugo laibachii, isolate Nc14. Further we show that the phenotypic heritability attributable to read-mapping anomalies differs from, and, in the case of time to germination and bolting, exceeds that due to standard genetic variation. Genes within structural variants are also more likely to be silenced or dysregulated. This approach complements the prevalent strategy of structural variant discovery in fewer individuals sequenced at high coverage. It is generally applicable to large populations sequenced at low-coverage, and is particularly suited to mapping transpositions.
|
6
|
Doran AG, Wong K, Flint J, Adams DJ, Hunter KW, Keane TM. Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation and homozygous truncating mutations. Genome Biol 2016; 17:167. [PMID: 27480531 PMCID: PMC4968449 DOI: 10.1186/s13059-016-1024-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 02/08/2016] [Accepted: 07/12/2016] [Indexed: 12/16/2022]
Abstract
Background: The Mouse Genomes Project is an ongoing collaborative effort to sequence the genomes of the common laboratory mouse strains. In 2011, the initial analysis of sequence variation across 17 strains found 56.7 M unique single nucleotide polymorphisms (SNPs) and 8.8 M indels. We carry out deep sequencing of 13 additional inbred strains (BUB/BnJ, C57BL/10J, C57BR/cdJ, C58/J, DBA/1J, I/LnJ, KK/HiJ, MOLF/EiJ, NZB/B1NJ, NZW/LacJ, RF/J, SEA/GnJ and ST/bJ), cataloguing molecular variation within and across the strains. These strains include important models for immune response, leukaemia, age-related hearing loss and rheumatoid arthritis. We now have several examples of fully sequenced closely related strains that are divergent for several disease phenotypes. Results: Approximately 27.4 M unique SNPs and 5 M indels are identified across these strains compared to the C57BL/6J reference genome (GRCm38). The amount of variation found in the inbred laboratory mouse genome has increased to 71 M SNPs and 12 M indels. We investigate the genetic basis of highly penetrant cancer susceptibility in RF/J finding private novel missense mutations in DNA damage repair and highly cancer associated genes. We use two highly related strains (DBA/1J and DBA/2J) to investigate the genetic basis of collagen-induced arthritis susceptibility. Conclusions: This paper significantly expands the catalogue of fully sequenced laboratory mouse strains and now contains several examples of highly genetically similar strains with divergent phenotypes. We show how studying private missense mutations can lead to insights into the genetic mechanism for a highly penetrant phenotype.
Affiliation(s)
- Anthony G Doran
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
- Kim Wong
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
- Jonathan Flint
  - The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- David J Adams
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
- Kent W Hunter
  - Laboratory of Cancer Biology and Genetics, NCI, NIH, Bethesda, Maryland, USA
- Thomas M Keane
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
|
7
|
Guan P, Sung WK. Structural variation detection using next-generation sequencing data: A comparative technical review. Methods 2016; 102:36-49. [PMID: 26845461 DOI: 10.1016/j.ymeth.2016.01.020] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Received: 10/18/2015] [Revised: 01/09/2016] [Accepted: 01/31/2016] [Indexed: 12/11/2022]
Abstract
Structural variations (SVs) are mutations in the genome of size at least fifty nucleotides. They contribute to the phenotypic differences among healthy individuals, cause severe diseases and even cancers by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed due to the significant cost reduction of NGS experiments and their ability to unbiasedly detect SVs to the base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. Besides, the noises in the data (both platform-specific sequencing error and artificial chimeric reads) impede the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact the reads mapping and in turn affect the SV calling accuracy. Calling of complex SVs requires specialized SV callers. Apart from accuracy, processing speed of SV caller is another factor deciding its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study are essential for biologists and bioinformaticians to make informed decisions. This paper describes different components in the SV calling pipeline and reviews the techniques used by existing SV callers. Through simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
Affiliation(s)
- Peiyong Guan
  - School of Computing, National University of Singapore, 117543, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, 117543, Singapore; Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore
|
8
|
Yiğiter A, Chen J, An L, Danacioğlu N. An online copy number variant detection method for short sequencing reads. J Appl Stat 2015. [DOI: 10.1080/02664763.2014.1001330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
9
|
Nikolskiy I, Conrad DF, Chun S, Fay JC, Cheverud JM, Lawson HA. Using whole-genome sequences of the LG/J and SM/J inbred mouse strains to prioritize quantitative trait genes and nucleotides. BMC Genomics 2015; 16:415. [PMID: 26016481 PMCID: PMC4445795 DOI: 10.1186/s12864-015-1592-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/28/2014] [Accepted: 04/28/2015] [Indexed: 12/04/2022]
Abstract
Background: The laboratory mouse is the most commonly used model for studying variation in complex traits relevant to human disease. Here we present the whole-genome sequences of two inbred strains, LG/J and SM/J, which are frequently used to study variation in complex traits as diverse as aging, bone-growth, adiposity, maternal behavior, and methamphetamine sensitivity. Results: We identified small nucleotide variants (SNVs) and structural variants (SVs) in the LG/J and SM/J strains relative to the reference genome and discovered novel variants in these two strains by comparing their sequences to other mouse genomes. We find that 39% of the LG/J and SM/J genomes are identical-by-descent (IBD). We characterized amino-acid changing mutations using three algorithms: LRT, PolyPhen-2 and SIFT. We also identified polymorphisms between LG/J and SM/J that fall in regulatory regions and highly informative transcription factor binding sites (TFBS). We intersected these functional predictions with quantitative trait loci (QTL) mapped in advanced intercrosses of these two strains. We find that QTL are both over-represented in non-IBD regions and highly enriched for variants predicted to have a functional impact. Variants in QTL associated with metabolic (231 QTL identified in an F16 generation) and developmental (41 QTL identified in an F34 generation) traits were interrogated and we highlight candidate quantitative trait genes (QTG) and nucleotides (QTN) in a QTL on chr13 associated with variation in basal glucose levels and in a QTL on chr6 associated with variation in tibia length. Conclusions: We show how integrating genomic sequence with QTL reduces the QTL search space and helps researchers prioritize candidate genes and nucleotides for experimental follow-up. Additionally, given the LG/J and SM/J phylogenetic context among inbred strains, these data contribute important information to the genomic landscape of the laboratory mouse.
Affiliation(s)
- Igor Nikolskiy
  - Department of Genetics, Washington University School of Medicine, Campus Box 8108, 660 S Euclid Ave, St Louis, MO, 63110, USA
- Donald F Conrad
  - Department of Genetics, Washington University School of Medicine, Campus Box 8108, 660 S Euclid Ave, St Louis, MO, 63110, USA
- Sung Chun
  - Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Justin C Fay
  - Department of Genetics, Washington University School of Medicine, Campus Box 8108, 660 S Euclid Ave, St Louis, MO, 63110, USA
- Heather A Lawson
  - Department of Genetics, Washington University School of Medicine, Campus Box 8108, 660 S Euclid Ave, St Louis, MO, 63110, USA
|
10
|
Wang W, Wang W, Sun W, Crowley JJ, Szatkiewicz JP. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing. Nucleic Acids Res 2015; 43:e90. [PMID: 25883151 PMCID: PMC4538801 DOI: 10.1093/nar/gkv319] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/07/2014] [Accepted: 03/27/2015] [Indexed: 11/14/2022]
Abstract
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/.
Affiliation(s)
- WeiBo Wang
  - Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599-3175, USA
- Wei Wang
  - Department of Computer Science, University of California, Los Angeles, CA 90095, USA
- Wei Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599-7400, USA
- James J Crowley
  - Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA
- Jin P Szatkiewicz
  - Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA
|
11
|
Boulouis A, Drapier D, Razafimanantsoa H, Wostrikoff K, Tourasse NJ, Pascal K, Girard-Bascou J, Vallon O, Wollman FA, Choquet Y. Spontaneous dominant mutations in Chlamydomonas highlight ongoing evolution by gene diversification. The Plant Cell 2015; 27:984-1001. [PMID: 25804537 PMCID: PMC4558696 DOI: 10.1105/tpc.15.00010] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 01/05/2015] [Revised: 02/10/2015] [Accepted: 03/05/2015] [Indexed: 05/04/2023]
Abstract
We characterized two spontaneous and dominant nuclear mutations in the unicellular alga Chlamydomonas reinhardtii, ncc1 and ncc2 (for nuclear control of chloroplast gene expression), which affect two octotricopeptide repeat (OPR) proteins encoded in a cluster of paralogous genes on chromosome 15. Both mutations cause a single amino acid substitution in one OPR repeat. As a result, the mutated NCC1 and NCC2 proteins now recognize new targets that we identified in the coding sequences of the chloroplast atpA and petA genes, respectively. Interaction of the mutated proteins with these targets leads to transcript degradation; however, in contrast to the ncc1 mutation, the ncc2 mutation requires on-going translation to promote the decay of the petA mRNA. Thus, these mutants reveal a mechanism by which nuclear factors act on chloroplast mRNAs in Chlamydomonas. They illustrate how diversifying selection can allow cells to adapt the nuclear control of organelle gene expression to environmental changes. We discuss these data in the wider context of the evolution of regulation by helical repeat proteins.
Affiliation(s)
- Alix Boulouis
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Dominique Drapier
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Hélène Razafimanantsoa
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Katia Wostrikoff
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Nicolas J Tourasse
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Kevin Pascal
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Jacqueline Girard-Bascou
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Olivier Vallon
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Francis-André Wollman
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
- Yves Choquet
  - Unité Mixte de Recherche 7141, CNRS/UPMC, Institut de Biologie Physico-Chimique, F-75005 Paris, France
|
12
|
Lin K, Smit S, Bonnema G, Sanchez-Perez G, de Ridder D. Making the difference: integrating structural variation detection tools. Brief Bioinform 2014; 16:852-64. [PMID: 25504367 DOI: 10.1093/bib/bbu047] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 08/20/2014] [Indexed: 01/01/2023]
Abstract
From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms detecting such variations (callers) have recently been developed, spurred by the advent of next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, the various callers are increasingly combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
|
13
|
Keane TM, Wong K, Adams DJ, Flint J, Reymond A, Yalcin B. Identification of structural variation in mouse genomes. Front Genet 2014; 5:192. [PMID: 25071822 PMCID: PMC4079067 DOI: 10.3389/fgene.2014.00192] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/14/2014] [Accepted: 06/12/2014] [Indexed: 01/25/2023]
Abstract
Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
Affiliation(s)
- Kim Wong
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- David J Adams
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Alexandre Reymond
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Binnaz Yalcin
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Institute of Genetics and Molecular and Cellular Biology, Illkirch, France
|
14
|
Mace ES, Tai S, Gilding EK, Li Y, Prentis PJ, Bian L, Campbell BC, Hu W, Innes DJ, Han X, Cruickshank A, Dai C, Frère C, Zhang H, Hunt CH, Wang X, Shatte T, Wang M, Su Z, Li J, Lin X, Godwin ID, Jordan DR, Wang J. Whole-genome sequencing reveals untapped genetic potential in Africa's indigenous cereal crop sorghum. Nat Commun 2014; 4:2320. [PMID: 23982223 PMCID: PMC3759062 DOI: 10.1038/ncomms3320] [Citation(s) in RCA: 260] [Impact Index Per Article: 26.0] [Received: 03/05/2013] [Accepted: 07/17/2013] [Indexed: 11/09/2022]
Abstract
Sorghum is a food and feed cereal crop adapted to heat and drought and a staple for 500 million of the world’s poorest people. Its small diploid genome and phenotypic diversity make it an ideal C4 grass model as a complement to C3 rice. Here we present high coverage (16–45×) resequenced genomes of 44 sorghum lines representing the primary gene pool and spanning dimensions of geographic origin, end-use and taxonomic group. We also report the first resequenced genome of S. propinquum, identifying 8 M high-quality SNPs, 1.9 M indels and specific gene loss and gain events in S. bicolor. We observe strong racial structure and a complex domestication history involving at least two distinct domestication events. These assembled genomes enable the leveraging of existing cereal functional genomics data against the novel diversity available in sorghum, providing an unmatched resource for the genetic improvement of sorghum and other grass species. Sorghum is a drought-resistant food and feed cereal crop used by over half a billion of the world’s poorest people. Here the authors present high-coverage resequencing genome data of 44 sorghum lines of varying geographic and taxonomic origin, which include a number of sorghum wild relatives.
Affiliation(s)
- Emma S Mace
- 1] Department of Agriculture, Fisheries and Forestry Queensland (DAFFQ), Warwick, Queensland 4370, Australia [2]
|
15
|
Abstract
Common copy number variations (CNVs) are small regions of genomic variation at the same loci across multiple samples, which can be detected with high resolution using next-generation sequencing (NGS) techniques. Multiple sequencing data samples are often available from genomic studies; examples include sequences from multiple platforms and sequences from multiple individuals. By integrating complementary information from multiple data samples, detection power can potentially be improved. However, most current CNV detection methods process an individual sequence sample, or two samples in an abnormal versus matched normal study; research on detecting common CNVs across multiple samples has been very limited but is much needed. In this paper, we propose a novel method to detect common CNVs from multiple sequencing samples by exploiting the concurrency of genomic variations in read depth signals derived from multiple NGS data sets. We use a penalized sparse regression model to fit multiple read depth profiles, based on which common CNV identification is formulated as a change-point detection problem. Finally, we validate the proposed method on both simulated and real data, showing that it gives both higher detection power and better breakpoint estimation than several published CNV detection methods.
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Hong-Wen Deng
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118 USA
- Yu-Ping Wang
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118 USA
|
16
|
Confidence limits for genome DNA copy number variations in HR-CGH array measurements. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2013.11.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
17
|
Kumar V, Kim K, Joseph C, Kourrich S, Yoo SH, Huang HC, Vitaterna MH, de Villena FPM, Churchill G, Bonci A, Takahashi JS. C57BL/6N mutation in cytoplasmic FMRP interacting protein 2 regulates cocaine response. Science 2014; 342:1508-12. [PMID: 24357318 DOI: 10.1126/science.1245503] [Citation(s) in RCA: 140] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
The inbred mouse C57BL/6J is the reference strain for genome sequence and for most behavioral and physiological phenotypes. However, the International Knockout Mouse Consortium uses an embryonic stem cell line derived from a related C57BL/6N substrain. We found that C57BL/6N has a lower acute and sensitized response to cocaine and methamphetamine. We mapped a single causative locus and identified a nonsynonymous mutation of serine to phenylalanine (S968F) in Cytoplasmic FMRP interacting protein 2 (Cyfip2) as the causative variant. The S968F mutation destabilizes CYFIP2, and deletion of the C57BL/6N mutant allele leads to acute and sensitized cocaine-response phenotypes. We propose that CYFIP2 is a key regulator of cocaine response in mammals and present a framework to use mouse substrains to identify previously unknown genes and alleles regulating behavior.
Affiliation(s)
- Vivek Kumar
- Department of Neuroscience, University of Texas Southwestern Medical Center, Dallas, TX 75390-9111, USA
|
18
|
Zheng C, Miao X, Li Y, Huang Y, Ruan J, Ma X, Wang L, Wu CI, Cai J. Determination of genomic copy number alteration emphasizing a restriction site-based strategy of genome re-sequencing. Bioinformatics 2013; 29:2813-21. [DOI: 10.1093/bioinformatics/btt481] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open
|
19
|
Takada T, Ebata T, Noguchi H, Keane TM, Adams DJ, Narita T, Shin-I T, Fujisawa H, Toyoda A, Abe K, Obata Y, Sakaki Y, Moriwaki K, Fujiyama A, Kohara Y, Shiroishi T. The ancestor of extant Japanese fancy mice contributed to the mosaic genomes of classical inbred strains. Genome Res 2013; 23:1329-38. [PMID: 23604024 PMCID: PMC3730106 DOI: 10.1101/gr.156497.113] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2013] [Accepted: 04/10/2013] [Indexed: 01/07/2023]
Abstract
Commonly used classical inbred mouse strains have mosaic genomes with sequences from different subspecific origins. Their genomes are derived predominantly from the Western European subspecies Mus musculus domesticus, with the remaining sequences derived mostly from the Japanese subspecies Mus musculus molossinus. However, it remains unknown how this intersubspecific genome introgression occurred during the establishment of classical inbred strains. In this study, we resequenced the genomes of two M. m. molossinus-derived inbred strains, MSM/Ms and JF1/Ms. MSM/Ms originated from Japanese wild mice, and the ancestry of JF1/Ms was originally found in Europe and then transferred to Japan. We compared the characteristics of these sequences to those of the C57BL/6J reference sequence and the recent data sets from the resequencing of 17 inbred strains in the Mouse Genome Project (MGP), and the results unequivocally show that genome introgression from M. m. molossinus into M. m. domesticus provided the primary framework for the mosaic genomes of classical inbred strains. Furthermore, the genomes of C57BL/6J and other classical inbred strains have long consecutive segments with extremely high similarity (>99.998%) to the JF1/Ms strain. In the early 20th century, Japanese waltzing mice with a morphological phenotype resembling that of JF1/Ms mice were often crossed with European fancy mice for early studies of "Mendelism," which suggests that the ancestor of the extant JF1/Ms strain provided the origin of the M. m. molossinus genome in classical inbred strains and largely contributed to its intersubspecific genome diversity.
Affiliation(s)
- Toyoyuki Takada
- Mammalian Genetics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Transdisciplinary Research Integration Center, Research Organization of Information and Systems, Minato-ku, Tokyo 105-0001, Japan
- Toshinobu Ebata
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Hideki Noguchi
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Thomas M. Keane
- The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom
- David J. Adams
- The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom
- Takanori Narita
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Tadasu Shin-I
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Hironori Fujisawa
- Transdisciplinary Research Integration Center, Research Organization of Information and Systems, Minato-ku, Tokyo 105-0001, Japan
- The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
- Atsushi Toyoda
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Kuniya Abe
- RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan
- Yuichi Obata
- RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan
- Yoshiyuki Sakaki
- Genome Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa 230-0045, Japan
- Kazuo Moriwaki
- RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan
- Asao Fujiyama
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yuji Kohara
- Genome Biology Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Toshihiko Shiroishi
- Mammalian Genetics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Transdisciplinary Research Integration Center, Research Organization of Information and Systems, Minato-ku, Tokyo 105-0001, Japan
|
20
|
Duan J, Zhang JG, Deng HW, Wang YP. CNV-TV: a robust method to discover copy number variation from short sequencing reads. BMC Bioinformatics 2013; 14:150. [PMID: 23634703 PMCID: PMC3679874 DOI: 10.1186/1471-2105-14-150] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2012] [Accepted: 04/19/2013] [Indexed: 11/29/2022] Open
Abstract
Background Copy number variation (CNV) is an important structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performance of these methods is not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. Results A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. Conclusion The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods.
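The TV-penalized least squares formulation named in this abstract can be illustrated with a small sketch. It minimises 0.5·‖y − x‖² + λ·Σ|x[i+1] − x[i]| over a read depth signal y and reports the positions where the fitted signal jumps. The solver (projected gradient on the dual problem), the function names `tv_denoise_1d` and `change_points`, the toy signal, and the values of λ and the jump threshold are all assumptions made for illustration; they are not taken from CNV-TV itself.

```python
def tv_denoise_1d(y, lam, n_iter=3000, step=0.25):
    """Fit x minimising 0.5*sum((y-x)^2) + lam*sum(|x[i+1]-x[i]|).

    Uses projected gradient on the dual variables z (one per adjacent pair),
    with x = y - D^T z and each z[i] constrained to [-lam, lam].
    """
    n = len(y)
    z = [0.0] * (n - 1)

    def primal(z):
        padded = [0.0] + z + [0.0]
        # x[j] = y[j] + z[j] - z[j-1], with z[-1] = z[n-1] = 0
        return [y[j] + padded[j + 1] - padded[j] for j in range(n)]

    for _ in range(n_iter):
        x = primal(z)
        # Gradient step on the dual, then clip back into the feasible box.
        z = [min(lam, max(-lam, z[i] + step * (x[i + 1] - x[i])))
             for i in range(n - 1)]
    return primal(z)


def change_points(x, min_jump):
    """Indices where a new segment starts, i.e. |x[i] - x[i-1]| > min_jump."""
    return [i for i in range(1, len(x)) if abs(x[i] - x[i - 1]) > min_jump]


# Hypothetical read depth profile: baseline 2, a 10-window gain to 3.
rd = [2.0] * 20 + [3.0] * 10 + [2.0] * 20
fitted = tv_denoise_1d(rd, lam=0.5)
print(change_points(fitted, min_jump=0.5))
```

The penalty shrinks the height of each segment slightly but keeps the fit piecewise constant, which is why thresholding adjacent differences is enough to read off candidate breakpoints in this toy case.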
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
|
21
|
Duan J, Zhang JG, Deng HW, Wang YP. Comparative studies of copy number variation detection methods for next-generation sequencing technologies. PLoS One 2013; 8:e59128. [PMID: 23527109 PMCID: PMC3604020 DOI: 10.1371/journal.pone.0059128] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2012] [Accepted: 02/12/2013] [Indexed: 11/25/2022] Open
Abstract
Copy number variation (CNV) has played an important role in studies of susceptibility or resistance to complex diseases. Traditional methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution of genomic regions. Following the emergence of next generation sequencing (NGS) technologies, CNV detection methods based on the short read data have recently been developed. However, due to the relatively young age of the procedures, their performance is not fully understood. To help investigators choose suitable methods to detect CNVs, comparative studies are needed. We compared six publicly available CNV detection methods: CNV-seq, FREEC, readDepth, CNVnator, SegSeq and event-wise testing (EWT). They are evaluated on both simulated and real data with different experiment settings. The receiver operating characteristic (ROC) curve is employed to demonstrate the detection performance in terms of sensitivity and specificity, box plots are employed to compare their performance in terms of breakpoint and copy number estimation, a Venn diagram is employed to show the consistency among these methods, and the F-score is employed to show the overlapping quality of detected CNVs. The computational demands are also studied. The results of our work provide a comprehensive evaluation of the performance of the selected CNV detection methods, which will help biological investigators choose the best possible method.
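As a rough illustration of how an F-score can summarise the overlap between detected and true CNVs, the sketch below computes base-pair-level precision, recall and F-score for two sets of intervals. The exact definition used in the study is not given in the abstract, so the base-pair overlap definition, the half-open `(start, end)` interval convention, and the function name `overlap_f_score` are assumptions. Intervals within each set are assumed not to overlap one another.

```python
def overlap_f_score(truth, calls):
    """Base-pair precision, recall and F-score for two interval sets.

    Intervals are half-open (start, end) tuples on one chromosome and are
    assumed to be non-overlapping within each set.
    """
    shared = sum(max(0, min(te, ce) - max(ts, cs))
                 for ts, te in truth for cs, ce in calls)
    called = sum(e - s for s, e in calls)
    true_len = sum(e - s for s, e in truth)
    precision = shared / called if called else 0.0
    recall = shared / true_len if true_len else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: two true CNVs, two calls with partial overlap.
truth = [(100, 200), (500, 600)]
calls = [(150, 250), (500, 550)]
precision, recall, f_score = overlap_f_score(truth, calls)
print(precision, recall, f_score)
```

Here 100 of the 150 called bases fall inside true CNVs and 100 of the 200 true bases are recovered, so the score penalises both the shifted first call and the truncated second one.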
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
- Ji-Gang Zhang
- Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
- Hong-Wen Deng
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
- Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
|
22
|
McCallum KJ, Wang JP. Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions. Biostatistics 2013; 14:600-11. [PMID: 23428932 DOI: 10.1093/biostatistics/kxt003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Copy number variations (CNVs) are a significant source of genetic variation and have been found frequently associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly being used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed using inhomogeneous emission distributions based on negative binomial regression to account for the sequencing biases. The model is tested on the whole genome sequencing data and simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to provide a good fit to the data and provides competitive performance compared with methods based on normalization of read counts.
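The idea of an HMM whose emission distribution changes from window to window can be sketched as follows. Each hidden state is an integer copy number, and the read count in window t is modelled as negative binomial with mean `diploid_mean * bias[t] * cn / 2`. In the paper the window-specific means come from a negative binomial regression on sequencing covariates; here the per-window `bias` factor is simply supplied as input, and the state set, dispersion `size`, transition probability `stay`, and all function names are illustrative assumptions rather than the CNVfinder implementation.

```python
import math


def nb_logpmf(k, mean, size):
    """Log pmf of a negative binomial with the given mean and dispersion size."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


def viterbi_copy_number(counts, bias, diploid_mean, size=20.0,
                        states=(1, 2, 3), stay=0.99):
    """Most likely copy number path under window-specific NB emissions."""
    n_states = len(states)
    switch = (1.0 - stay) / (n_states - 1)

    def log_trans(i, j):
        return math.log(stay if i == j else switch)

    def emit(t, cn):
        # Inhomogeneous emission: the mean depends on the window's bias factor.
        return nb_logpmf(counts[t], diploid_mean * bias[t] * cn / 2.0, size)

    score = [emit(0, cn) - math.log(n_states) for cn in states]
    back = []
    for t in range(1, len(counts)):
        new_score, ptr = [], []
        for j, cn in enumerate(states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best] + log_trans(best, j) + emit(t, cn))
            ptr.append(best)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [states[i] for i in reversed(path)]


# Hypothetical counts: a one-copy loss in the middle ten windows.
counts = [100] * 10 + [50] * 10 + [100] * 10
print(viterbi_copy_number(counts, [1.0] * 30, diploid_mean=100.0))
```

The same drop to 50 reads is not called as a loss when the windows carry a bias factor of 0.5, which is the practical point of letting the emission mean vary by window instead of normalising the counts beforehand.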
|
23
|
Wang Z, Hormozdiari F, Yang WY, Halperin E, Eskin E. CNVeM: copy number variation detection using uncertainty of read mapping. J Comput Biol 2013; 20:224-36. [PMID: 23421794 DOI: 10.1089/cmb.2012.0258] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Copy number variations (CNVs) are widely known to be an important mediator for diseases and traits. The development of high-throughput sequencing (HTS) technologies has provided great opportunities to identify CNV regions in mammalian genomes. In a typical experiment, millions of short reads obtained from a genome of interest are mapped to a reference genome. The mapping information can be used to identify CNV regions. One important challenge in analyzing the mapping information is the large fraction of reads that can be mapped to multiple positions. Most existing methods either only consider reads that can be uniquely mapped to the reference genome or randomly place a read to one of its mapping positions. Therefore, these methods have low power to detect CNVs located within repeated sequences. In this study, we propose a probabilistic model, CNVeM, that utilizes the inherent uncertainty of read mapping. We use maximum likelihood to estimate locations and copy numbers of copied regions and implement an expectation-maximization (EM) algorithm. One important contribution of our model is that we can distinguish between regions in the reference genome that differ from each other by as little as 0.1%. As our model aims to predict the copy number of each nucleotide, we can predict the CNV boundaries with high resolution. We apply our method to simulated datasets and achieve higher accuracy compared to CNVnator. Moreover, we apply our method to real data from which we detected known CNVs. To our knowledge, this is the first attempt to predict CNVs at nucleotide resolution and to utilize uncertainty of read mapping.
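A heavily simplified sketch of the EM idea for ambiguously mapped reads is given below. Each read lists the regions it could map to; the E-step splits the read among its candidates in proportion to the current region weights, and the M-step re-estimates the weights from those fractional assignments. This toy version ignores sequence mismatches and per-nucleotide copy numbers, which the abstract says CNVeM models, and the function name `em_read_assignment` and the example data are assumptions for illustration only.

```python
def em_read_assignment(reads, regions, n_iter=200):
    """Estimate the fraction of reads originating from each region.

    reads: list of tuples, each holding the candidate regions of one read.
    regions: list of all region names.
    """
    theta = {r: 1.0 / len(regions) for r in regions}
    for _ in range(n_iter):
        expected = {r: 0.0 for r in regions}
        # E-step: fractional assignment of each read to its candidates.
        for candidates in reads:
            total = sum(theta[r] for r in candidates)
            for r in candidates:
                expected[r] += theta[r] / total
        # M-step: region weights proportional to expected read counts.
        theta = {r: expected[r] / len(reads) for r in regions}
    return theta


# Hypothetical data: 6 reads unique to A, 2 unique to B, 4 mapping to both.
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4
theta = em_read_assignment(reads, ["A", "B"])
print(theta)
```

In this example the ambiguous reads end up shared three to one in favour of region A, matching the ratio of the uniquely mapped reads, instead of being discarded or placed at random.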
Affiliation(s)
- Zhanyong Wang
- Computer Science Department, University of California Los Angeles, Los Angeles, CA 90095-1596, USA
|
24
|
Szatkiewicz JP, Wang W, Sullivan PF, Wang W, Sun W. Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation. Nucleic Acids Res 2012; 41:1519-32. [PMID: 23275535 PMCID: PMC3561969 DOI: 10.1093/nar/gks1363] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open
Abstract
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth-based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth-based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.
Affiliation(s)
- Jin P Szatkiewicz
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599-7264, USA.
|
25
|
Systems genetics in "-omics" era: current and future development. Theory Biosci 2012; 132:1-16. [PMID: 23138757 DOI: 10.1007/s12064-012-0168-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2012] [Accepted: 10/25/2012] [Indexed: 02/06/2023]
Abstract
Systems genetics is an emerging discipline that integrates high-throughput expression profiling technology and systems biology approaches to reveal the molecular mechanisms of complex traits, and it will improve our understanding of gene functions in biochemical pathways and of genetic interactions between biological molecules. With the rapid advances in microarray analysis technologies, bioinformatics is extensively used in studies of gene functions, SNP-SNP genetic interactions, LD block-block interactions, miRNA-mRNA interactions, DNA-protein interactions, protein-protein interactions, and functional mapping for LD blocks. Building on a bioinformatics panel that can integrate "-omics" datasets to extract systems knowledge and useful information for explaining the molecular mechanisms of complex traits, systems genetics aims to enhance our understanding of biological processes. Systems biology has provided systems-level recognition of various biological phenomena and has laid the scientific groundwork for the development of systems genetics. In addition, next-generation sequencing technology and post-genome-wide association studies empower the discovery of new genes and rare variants. The integration of different strategies will help to propose novel hypotheses and refine the theoretical framework of systems genetics, which will contribute to its future development and open up a whole new area of genetics.
|
26
|
Yalcin B, Wong K, Bhomra A, Goodson M, Keane TM, Adams DJ, Flint J. The fine-scale architecture of structural variants in 17 mouse genomes. Genome Biol 2012; 13:R18. [PMID: 22439878 PMCID: PMC3439969 DOI: 10.1186/gb-2012-13-3-r18] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2011] [Revised: 03/14/2012] [Accepted: 03/20/2012] [Indexed: 11/19/2022] Open
Abstract
Background Accurate catalogs of structural variants (SVs) in mammalian genomes are necessary to elucidate the potential mechanisms that drive SV formation and to assess their functional impact. Next generation sequencing methods for SV detection are an advance on array-based methods, but are almost exclusively limited to four basic types: deletions, insertions, inversions and copy number gains. Results By visual inspection of 100 Mbp of genome to which next generation sequence data from 17 inbred mouse strains had been aligned, we identify and interpret 21 paired-end mapping patterns, which we validate by PCR. These paired-end mapping patterns reveal a greater diversity and complexity in SVs than previously recognized. In addition, Sanger-based sequence analysis of 4,176 breakpoints at 261 SV sites reveals additional complexity at approximately a quarter of structural variants analyzed. We find micro-deletions and micro-insertions at SV breakpoints, ranging from 1 to 107 bp, and SNPs that extend breakpoint micro-homology and may catalyze SV formation. Conclusions An integrative approach using experimental analyses to train computational SV calling is essential for the accurate resolution of the architecture of SVs. We find considerable complexity in SV formation; about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain. Computational methods can be adapted to identify most paired-end mapping patterns.
Affiliation(s)
- Binnaz Yalcin
- The Wellcome Trust Centre for Human Genetics, Oxford, UK.
|
27
|
Wong K, Bumpstead S, Van Der Weyden L, Reinholdt LG, Wilming LG, Adams DJ, Keane TM. Sequencing and characterization of the FVB/NJ mouse genome. Genome Biol 2012; 13:R72. [PMID: 22916792 PMCID: PMC3491372 DOI: 10.1186/gb-2012-13-8-r72] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2012] [Accepted: 08/23/2012] [Indexed: 01/13/2023] Open
Abstract
Background The FVB/NJ mouse strain has its origins in a colony of outbred Swiss mice established in 1935 at the National Institutes of Health. Mice derived from this source were selectively bred for sensitivity to histamine diphosphate and the B strain of Friend leukemia virus. This led to the establishment of the FVB/N inbred strain, which was subsequently imported to the Jackson Laboratory and designated FVB/NJ. The FVB/NJ mouse has several distinct characteristics, such as large pronuclear morphology, vigorous reproductive performance, and consistently large litters that make it highly desirable for transgenic strain production and general purpose use. Results Using next-generation sequencing technology, we have sequenced the genome of FVB/NJ to approximately 50-fold coverage, and have generated a comprehensive catalog of single nucleotide polymorphisms, small insertion/deletion polymorphisms, and structural variants, relative to the reference C57BL/6J genome. We have examined a previously identified quantitative trait locus for atherosclerosis susceptibility on chromosome 10 and identify several previously unknown candidate causal variants. Conclusion The sequencing of the FVB/NJ genome and generation of this catalog has increased the number of known variant sites in FVB/NJ by a factor of four, and will help accelerate the identification of the precise molecular variants that are responsible for phenotypes observed in this widely used strain.
|
28
|
Yalcin B, Adams DJ, Flint J, Keane TM. Next-generation sequencing of experimental mouse strains. Mamm Genome 2012; 23:490-8. [PMID: 22772437 PMCID: PMC3463794 DOI: 10.1007/s00335-012-9402-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2012] [Accepted: 05/24/2012] [Indexed: 12/24/2022]
Abstract
Since the turn of the century the complete genome sequence of just one mouse strain, C57BL/6J, has been available. Knowing the sequence of this strain has enabled large-scale forward genetic screens to be performed, the creation of an almost complete set of embryonic stem (ES) cell lines with targeted alleles for protein-coding genes, and the generation of a rich catalog of mouse genomic variation. However, many experiments that use other common laboratory mouse strains have been hindered by a lack of whole-genome sequence data for these strains. The last 5 years have witnessed a revolution in DNA sequencing technologies. Recently, these technologies have been used to expand the repertoire of fully sequenced mouse genomes. In this article we review the main findings of these studies and discuss how the sequence of mouse genomes is helping pave the way from sequence to phenotype. Finally, we discuss the prospects for using de novo assembly techniques to obtain high-quality assembled genome sequences of these laboratory mouse strains, and what advances in sequencing technologies may be required to achieve this goal.
Affiliation(s)
- Binnaz Yalcin
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.
|
29
|
Lee J, Kim B, Yoon J, Lee U. Detection of copy number variation using scale space filtering. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012; 2011:5555-8. [PMID: 22255597 DOI: 10.1109/iembs.2011.6091417] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
This study proposes a novel CNV detection algorithm based on scale space filtering. It convolves the signal with a Gaussian filter controlled by a scale parameter. The range of the scale parameter is adjusted according to the coverage level of the read data. The position of a CNV region is determined through coarse and fine searches over the scales. The results showed that the performance of the proposed method depends less on the coverage level than that of the conventional methods. The results also showed that the proposed method outperforms the conventional methods by 63.29–73.57%.
Affiliation(s)
- Jongkeun Lee
- Department of Computer Engineering, Hallym University, Korea.
|
30
|
Nellåker C, Keane TM, Yalcin B, Wong K, Agam A, Belgard TG, Flint J, Adams DJ, Frankel WN, Ponting CP. The genomic landscape shaped by selection on transposable elements across 18 mouse strains. Genome Biol 2012; 13:R45. [PMID: 22703977 PMCID: PMC3446317 DOI: 10.1186/gb-2012-13-6-r45] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2012] [Revised: 05/25/2012] [Accepted: 06/15/2012] [Indexed: 12/20/2022] Open
Abstract
Background Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Results Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Conclusions Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.
Affiliation(s)
- Christoffer Nellåker
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK.
|
31
|
Seifert M, Gohr A, Strickert M, Grosse I. Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana. PLoS Comput Biol 2012; 8:e1002286. [PMID: 22253580 PMCID: PMC3257270 DOI: 10.1371/journal.pcbi.1002286] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2011] [Accepted: 10/11/2011] [Indexed: 12/19/2022] Open
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM). Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. 
Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
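For orientation, the first-order baseline that this abstract contrasts against can be sketched as a Viterbi decoder over Array-CGH log-ratio measurements. This is a minimal illustration only: it is not the Jstacs implementation and not the parsimonious higher-order model. The three states, the Gaussian emission means, the shared standard deviation and the `stay` probability are all assumed values chosen for the example.

```python
import math


def viterbi(obs, means, sd, stay=0.9):
    """Most likely state path for a first-order HMM with Gaussian emissions.

    Assumptions (hypothetical, for illustration): a uniform start
    distribution, one shared standard deviation, and a single `stay`
    probability with the remainder split evenly over the other states.
    Requires at least two states.
    """
    n = len(means)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def log_emit(x, k):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - means[k]) / sd) ** 2

    def log_trans(j, k):
        return log_stay if j == k else log_move

    score = [math.log(1.0 / n) + log_emit(obs[0], k) for k in range(n)]
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for k in range(n):
            best = max(range(n), key=lambda j: prev[j] + log_trans(j, k))
            ptr.append(best)
            score.append(prev[best] + log_trans(best, k) + log_emit(x, k))
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# States: 0 = loss, 1 = no change, 2 = gain (assumed log-ratio means).
log_ratios = [0.0, 0.1, -0.1, -1.0, -0.9, -1.1, 0.05, 0.0]
path = viterbi(log_ratios, means=[-1.0, 0.0, 1.0], sd=0.3)
print(path)
```

Because each state depends only on its immediate predecessor, a decoder like this captures spatial dependence between adjacent probes only one step back, which is the limitation the higher-order models in the abstract are meant to address.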
Affiliation(s)
- Michael Seifert
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
32
Yalcin B, Wong K, Agam A, Goodson M, Keane TM, Gan X, Nellåker C, Goodstadt L, Nicod J, Bhomra A, Hernandez-Pliego P, Whitley H, Cleak J, Dutton R, Janowitz D, Mott R, Adams DJ, Flint J. Sequence-based characterization of structural variation in the mouse genome. Nature 2011; 477:326-9. [PMID: 21921916 PMCID: PMC3428933 DOI: 10.1038/nature10432] [Citation(s) in RCA: 259] [Impact Index Per Article: 19.9] [Received: 07/05/2011] [Accepted: 08/04/2011] [Indexed: 02/02/2023]
Abstract
Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.
Affiliation(s)
- Binnaz Yalcin
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Kim Wong
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
- Avigail Agam
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK
- Martin Goodson
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Thomas M. Keane
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
- Xiangchao Gan
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Christoffer Nellåker
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK
- Leo Goodstadt
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Jérôme Nicod
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Amarjit Bhomra
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Helen Whitley
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- James Cleak
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Rebekah Dutton
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Deborah Janowitz
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- Richard Mott
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
- David J. Adams
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
- Jonathan Flint
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford, OX3 7BN, UK
33
He D, Hormozdiari F, Furlotte N, Eskin E. Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions. Bioinformatics 2011; 27:1513-20. [PMID: 21505028 DOI: 10.1093/bioinformatics/btr169] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023] Open
Abstract
MOTIVATION Structural variations and in particular copy number variations (CNVs) have dramatic effects on disease and traits. Technologies for identifying CNVs have been an active area of research for over 10 years. The current generation of high-throughput sequencing techniques presents new opportunities for identification of CNVs. Methods that utilize these technologies map sequencing reads to a reference genome and look for signatures which might indicate the presence of a CNV. These methods work well when CNVs lie within unique genomic regions. However, the problem of CNV identification and reconstruction becomes much more challenging when CNVs are in repeat-rich regions, due to the multiple mapping positions of the reads. RESULTS In this study, we propose an efficient algorithm to handle these multi-mapping reads such that the CNVs can be reconstructed with high accuracy even for repeat-rich regions. To our knowledge, this is the first attempt to both identify and reconstruct CNVs in repeat-rich regions. Our experiments show that our method is not only computationally efficient but also accurate.
Affiliation(s)
- Dan He
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
34
Wong K, Keane TM, Stalker J, Adams DJ. Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly. Genome Biol 2010; 11:R128. [PMID: 21194472 PMCID: PMC3046488 DOI: 10.1186/gb-2010-11-12-r128] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.9] [Received: 08/09/2010] [Revised: 11/09/2010] [Accepted: 12/31/2010] [Indexed: 11/10/2022] Open
Abstract
We present a pipeline, SVMerge, to detect structural variants by integrating calls from several existing structural variant callers, which are then validated and the breakpoints refined using local de novo assembly. SVMerge is modular and extensible, allowing new callers to be incorporated as they become available. We applied SVMerge to the analysis of a HapMap trio, demonstrating enhanced structural variant detection, breakpoint refinement, and a lower false discovery rate. SVMerge can be downloaded from http://svmerge.sourceforge.net.
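The idea of integrating calls from several callers can be illustrated with a small interval-merging sketch. This is not SVMerge's actual algorithm or file format: the `Call` record, the caller names and the overlap rule (same chromosome, same variant type, any positional overlap) are assumptions made for the example, and the local de novo assembly step that validates calls and refines breakpoints is not shown.

```python
from collections import namedtuple

# Hypothetical record for one structural variant call.
Call = namedtuple("Call", "chrom start end svtype caller")


def merge_calls(calls):
    """Merge overlapping calls of the same type on the same chromosome.

    Returns a list of dicts, each recording the merged span and the set
    of callers that support it.
    """
    merged = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end))
    for c in ordered:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["chrom"] == c.chrom
            and last["svtype"] == c.svtype
            and c.start <= last["end"]
        ):
            # Overlaps the previous merged call: extend it and note the caller.
            last["end"] = max(last["end"], c.end)
            last["callers"].add(c.caller)
        else:
            merged.append(
                {
                    "chrom": c.chrom,
                    "start": c.start,
                    "end": c.end,
                    "svtype": c.svtype,
                    "callers": {c.caller},
                }
            )
    return merged


calls = [
    Call("chr1", 100, 200, "DEL", "callerA"),
    Call("chr1", 150, 250, "DEL", "callerB"),
    Call("chr1", 400, 500, "DEL", "callerA"),
    Call("chr1", 120, 180, "INS", "callerB"),
]
merged = merge_calls(calls)
for m in merged:
    print(m["chrom"], m["start"], m["end"], m["svtype"], sorted(m["callers"]))
```

Keeping the set of supporting callers per merged call makes it straightforward to filter on agreement between callers before any downstream validation.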
Affiliation(s)
- Kim Wong
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
35
He D, Furlotte N, Eskin E. Detection and reconstruction of tandemly organized de novo copy number variations. BMC Bioinformatics 2010; 11 Suppl 11:S12. [PMID: 21172047 PMCID: PMC3024866 DOI: 10.1186/1471-2105-11-s11-s12] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022] Open
Abstract
Background The characterization of structural variations (SV) such as insertions, deletions and copy number variations is a critical step in the process of understanding the full genetic architecture of organisms. Copy number variations (CNV) have attracted much recent attention due to their effects on gene expression and disease status. Results In this paper, we present a method that utilizes next-generation sequencing technologies (NGS), in order to both detect and reconstruct CNVs. We focus on a special type of CNV, namely tandemly organized de novo CNVs, which have been shown to occur with high frequency in the mouse genome. Conclusions We apply our method to CNV regions randomly inserted into the reference mouse genome and show that our method achieves good performance for both detection and reconstruction of tandemly organized de novo CNVs.
Affiliation(s)
- Dan He
- Dept. of Comp. Sci., Univ. of California Los Angeles, Los Angeles, CA 90095, USA.
36
Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M. Detecting copy number variation with mated short reads. Genome Res 2010; 20:1613-22. [PMID: 20805290 DOI: 10.1101/gr.106344.110] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Indexed: 01/09/2023]
Abstract
The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
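The depth-of-coverage signal mentioned in this abstract can be illustrated with a small sketch that converts per-window read depth into an integer copy number estimate. It is not CNVer's method: CNVer combines coverage with paired-end mapping in its donor graph, which is not shown here. The fixed windows, the use of the median depth as the diploid baseline and the `ploidy` parameter are assumptions made for the example, and no correction for sequencing bias is attempted.

```python
from statistics import median


def estimate_copy_number(window_depths, ploidy=2):
    """Estimate an integer copy number for each window from read depth.

    Assumes (hypothetically) that most windows are at the normal ploidy,
    so the median depth is a usable baseline for that copy number.
    """
    baseline = median(window_depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    # Depth scales roughly linearly with copy number, so rescale and round.
    return [round(ploidy * depth / baseline) for depth in window_depths]


depths = [30, 31, 29, 30, 15, 14, 30, 61, 30]
copy_numbers = estimate_copy_number(depths)
print(copy_numbers)
```

A raw ratio like this is sensitive to exactly the over- and undersampling biases the abstract describes, which is why coverage alone gives only approximate breakpoints.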
Affiliation(s)
- Paul Medvedev
- Department of Computer Science, University of Toronto, Toronto, Ontario M5R 3G4, Canada
37
Cirulli ET, Goldstein DB. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 2010; 11:415-25. [PMID: 20479773 DOI: 10.1038/nrg2779] [Citation(s) in RCA: 827] [Impact Index Per Article: 59.1] [Indexed: 12/12/2022]
Abstract
Although genome-wide association (GWA) studies for common variants have thus far succeeded in explaining only a modest fraction of the genetic components of human common diseases, recent advances in next-generation sequencing technologies could rapidly facilitate substantial progress. This outcome is expected if much of the missing genetic control is due to gene variants that are too rare to be picked up by GWA studies and have relatively large effects on risk. Here, we evaluate the evidence for an important role of rare gene variants of major effect in common diseases and outline discovery strategies for their identification.
Affiliation(s)
- Elizabeth T Cirulli
- Center for Human Genome Variation, Duke University Medical School, Durham, North Carolina 27708, USA