1
Povysil G, Hochreiter S. IBD Sharing between Africans, Neandertals, and Denisovans. Genome Biol Evol 2018; 8:3406-3416. [PMID: 28158547 PMCID: PMC5381509 DOI: 10.1093/gbe/evw234] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Accepted: 09/19/2016] [Indexed: 12/03/2022]
Abstract
Interbreeding between ancestors of humans and other hominins outside of Africa has been studied intensively, while their common history within Africa still lacks proper attention. However, shedding light on human evolution in this period, about which little is known, is essential for understanding subsequent events outside of Africa. We investigate the genetic relationships of humans, Neandertals, and Denisovans by identifying very short DNA segments in the 1000 Genomes Phase 3 data that these hominins share identical by descent (IBD). By focusing on low-frequency and rare variants, we identify very short IBD segments with high confidence. These segments reveal events from a very distant past because shorter IBD segments are presumably older than longer ones. We extracted two types of very old IBD segments that are shared not only among humans, but also with Neandertals and/or Denisovans. The first type contains longer segments that are found primarily in Asians and Europeans, with more segments found in South Asians than in East Asians for both the Neandertal and the Denisovan genome. These longer segments indicate complex admixture events outside of Africa. The second type consists of shorter segments that are shared mainly by Africans and therefore may indicate events involving ancestors of humans and other ancient hominins within Africa. Our results from the autosomes are further supported by an analysis of chromosome X, on which segments that are shared by Africans and match the Neandertal and/or Denisovan genome were even more prominent. Our results indicate that interbreeding with other hominins was a common feature of human evolution, starting long before ancestors of modern humans left Africa.
Affiliation(s)
- Gundula Povysil
  - Institute of Bioinformatics, Johannes Kepler University Linz, Austria
- Sepp Hochreiter
  - Institute of Bioinformatics, Johannes Kepler University Linz, Austria

2
Abstract
Differences between genomes can be due to single nucleotide variants (SNPs), translocations, inversions and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 250 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease or phenotypic traits. While the link between SNPs and disease susceptibility has been well studied, to date there are still very few published CNV genome-wide association studies, probably because CNV analysis remains a more complex task than SNP analysis (both in terms of the bioinformatics workflow and of the uncertainty in CNV calling, which leads to high false positive rates and unknown false negative rates). This chapter aims to explain computational methods for the analysis of CNVs, ranging from study design, data processing and quality control, up to genome-wide association studies with clinical traits.
Affiliation(s)
- Aurélien Macé
  - Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Zoltán Kutalik
  - Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland

3
Wei Z, Shu C, Zhang C, Huang J, Cai H. A short review of variants calling for single-cell-sequencing data with applications. Int J Biochem Cell Biol 2017; 92:218-226. [PMID: 28951246 DOI: 10.1016/j.biocel.2017.09.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/01/2017] [Revised: 09/19/2017] [Accepted: 09/23/2017] [Indexed: 11/16/2022]
Abstract
The field of single-cell sequencing is rapidly expanding, and many techniques have been developed in the past decade. With this technology, biologists can study not only the heterogeneity between two adjacent cells in the same tissue or organ, but also the evolutionary relationships and degenerative processes in a single cell. Calling variants is the main purpose of analyzing single-cell sequencing (SCS) data. Currently, some popular methods developed for bulk-cell sequencing data are applied directly to SCS data. However, SCS requires an extra genome amplification step to obtain enough material for sequencing. The amplification introduces large biases and thus poses a challenge for bulk-cell sequencing methods. To provide guidance both for developing specialized analysis methods and for using existing tools on SCS data, this paper aims to bridge the gap. We first introduce two popular genome amplification methods and compare their capabilities. We then introduce a few popular models for calling single-nucleotide polymorphisms and copy-number variations. Finally, breakthrough applications of SCS are summarized to demonstrate its potential in research on cell evolution.
Affiliation(s)
- Zhuohui Wei
  - School of Computer Science & Engineering, South China University of Technology, Guangzhou, China
- Chang Shu
  - School of Computer Science & Engineering, South China University of Technology, Guangzhou, China
- Changsheng Zhang
  - School of Computer Science & Engineering, South China University of Technology, Guangzhou, China
- Jingying Huang
  - School of Computer Science & Engineering, South China University of Technology, Guangzhou, China
- Hongmin Cai
  - School of Computer Science & Engineering, South China University of Technology, Guangzhou, China

4
Exome Sequencing Landscape Analysis in Ovarian Clear Cell Carcinoma Shed Light on Key Chromosomal Regions and Mutation Gene Networks. THE AMERICAN JOURNAL OF PATHOLOGY 2017; 187:2246-2258. [PMID: 28888422 DOI: 10.1016/j.ajpath.2017.06.012] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Received: 06/02/2017] [Accepted: 06/08/2017] [Indexed: 12/18/2022]
Abstract
Previous studies have reported genome-wide mutation profile analyses in ovarian clear cell carcinomas (OCCCs). This study aims to identify specific novel molecular alterations by combined analyses of somatic mutation and copy number variation. We performed whole exome sequencing of 39 OCCC samples with 16 matching blood tissue samples. Four hundred twenty-six genes had recurrent somatic mutations. Among the 39 samples, ARID1A (62%) and PIK3CA (51%) were frequently mutated, as were genes such as KRAS (10%), PPP2R1A (10%), and PTEN (5%), that have been reported in previous OCCC studies. We also detected mutations in MLL3 (15%), ARID1B (10%), and PIK3R1 (8%), which are associations not previously reported. Gene interaction analysis and functional assessment revealed that mutated genes were clustered into groups pertaining to chromatin remodeling, cell proliferation, DNA repair and cell cycle checkpointing, and cytoskeletal organization. Copy number variation analysis identified frequent amplification in chr8q (64%), chr20q (54%), and chr17q (46%) loci as well as deletion in chr19p (41%), chr13q (28%), chr9q (21%), and chr18q (21%) loci. Integration of the analyses uncovered that frequently mutated or amplified/deleted genes were involved in the KRAS/phosphatidylinositol 3-kinase (82%) and MYC/retinoblastoma (75%) pathways as well as the critical chromatin remodeling complex switch/sucrose nonfermentable (85%). The individual and integrated analyses contribute details about the OCCC genomic landscape, which could lead to enhanced diagnostics and therapeutic options.

5
Cava C, Bertoli G, Castiglioni I. Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC SYSTEMS BIOLOGY 2015; 9:62. [PMID: 26391647 PMCID: PMC4578257 DOI: 10.1186/s12918-015-0211-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/14/2015] [Accepted: 09/15/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics. RESULTS These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses. CONCLUSION This review presents the state of the art on the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and discusses some issues related to the new opportunities and challenges offered by the application of such integrative approaches.
Affiliation(s)
- Claudia Cava
  - Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy
- Gloria Bertoli
  - Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy
- Isabella Castiglioni
  - Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy

6
Klambauer G, Wischenbart M, Mahr M, Unterthiner T, Mayr A, Hochreiter S. Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map. Bioinformatics 2015; 31:3392-4. [PMID: 26088801 DOI: 10.1093/bioinformatics/btv373] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 02/11/2015] [Accepted: 06/11/2015] [Indexed: 01/27/2023]
Abstract
We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and the Connectivity Map databases. Rchemcpp utilizes the best performing similarity functions, i.e. molecule kernels, as measures for structural similarity. Molecule kernels have shown superior performance over other similarity measures and are currently excelling at machine learning challenges. To considerably reduce computational time, and thereby make it feasible as a web service, a novel efficient prefiltering strategy has been developed, which maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial for the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late phases. Rchemcpp was used in the DeepTox pipeline that won the Tox21 Data Challenge and is frequently used by researchers in pharmaceutical companies. AVAILABILITY AND IMPLEMENTATION The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor. CONTACT hochreit@bioinf.jku.at SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Günter Klambauer
  - Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Martin Wischenbart
  - Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Michael Mahr
  - Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Thomas Unterthiner
  - Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Andreas Mayr
  - Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Sepp Hochreiter
  - Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria

7
Multimodality vaccination against clade C SHIV: partial protection against mucosal challenges with a heterologous tier 2 virus. Vaccine 2014; 32:6527-36. [PMID: 25245933 DOI: 10.1016/j.vaccine.2014.08.065] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/10/2014] [Revised: 08/14/2014] [Accepted: 08/27/2014] [Indexed: 12/17/2022]
Abstract
We sought to test whether vaccine-induced immune responses could protect rhesus macaques (RMs) against upfront heterologous challenges with an R5 simian-human immunodeficiency virus, SHIV-2873Nip. This SHIV strain exhibits many properties of transmitted HIV-1, such as tier 2 phenotype (relatively difficult to neutralize), exclusive CCR5 tropism, and gradual disease progression in infected RMs. Since no human AIDS vaccine recipient is likely to encounter an HIV-1 strain that exactly matches the immunogens, we immunized the RMs with recombinant Env proteins heterologous to the challenge virus. For induction of immune responses against Gag, Tat, and Nef, we explored a strategy of immunization with overlapping synthetic peptides (OSP). The immune responses against Gag and Tat were finally boosted with recombinant proteins. The vaccinees and a group of ten control animals were given five low-dose intrarectal (i.r.) challenges with SHIV-2873Nip. All controls and seven out of eight vaccinees became systemically infected; there was no significant difference in viremia levels of vaccinees vs. controls. Prevention of viremia was observed in one vaccinee which showed strong boosting of virus-specific cellular immunity during virus exposures. The protected animal showed no challenge virus-specific neutralizing antibodies in the TZM-bl or A3R5 cell-based assays and had low-level ADCC activity after the virus exposures. Microarray data strongly supported a role for cellular immunity in the protected animal. Our study represents a case of protection against heterologous tier 2 SHIV-C by vaccine-induced, virus-specific cellular immune responses.

8
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]

9
Hochreiter S. HapFABIA: identification of very short segments of identity by descent characterized by rare variants in large sequencing data. Nucleic Acids Res 2013; 41:e202. [PMID: 24174545 PMCID: PMC3905877 DOI: 10.1093/nar/gkt1013] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. 
IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD.
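The idea that shared rare variants tag short IBD segments can be illustrated with a toy sketch. This is not HapFABIA's biclustering, which finds segments shared by several individuals at once; it is a much simpler pairwise heuristic. The function name, the `min_shared` and `max_gap` thresholds, and the sample data are all assumptions made for the example.

```python
from itertools import combinations


def shared_rare_segments(carriers, min_shared=3, max_gap=10_000):
    """Toy pairwise heuristic (hypothetical, not HapFABIA's algorithm).

    carriers: dict mapping sample name -> positions where the sample
              carries a rare allele.
    Returns a dict mapping sample pairs to a list of candidate segments
    (start, end, number_of_shared_rare_variants).
    """
    results = {}
    for a, b in combinations(sorted(carriers), 2):
        # Rare variants carried by both samples, in genomic order.
        shared = sorted(set(carriers[a]) & set(carriers[b]))
        segments, run = [], []
        for pos in shared:
            # Close the current run when the next shared variant is too far away.
            if run and pos - run[-1] > max_gap:
                if len(run) >= min_shared:
                    segments.append((run[0], run[-1], len(run)))
                run = []
            run.append(pos)
        if len(run) >= min_shared:
            segments.append((run[0], run[-1], len(run)))
        if segments:
            results[(a, b)] = segments
    return results


# Made-up example data: s1 and s2 share three nearby rare variants.
example = {
    "s1": [100, 500, 900, 50_000],
    "s2": [100, 500, 900, 70_000],
    "s3": [50_000],
}
candidates = shared_rare_segments(example)
```

The pairwise comparison scales quadratically with the number of samples, which is one reason a real method would not work this way on cohort-scale data.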
Affiliation(s)
- Sepp Hochreiter
  - Institute of Bioinformatics, Johannes Kepler University, Linz, Austria

10
Byrareddy SN, Ayash-Rashkovsky M, Kramer VG, Lee SJ, Correll M, Novembre FJ, Villinger F, Johnson WE, von Gegerfelt A, Felber BK, Ruprecht RM. Live attenuated Rev-independent Nef¯SIV enhances acquisition of heterologous SIVsmE660 in acutely vaccinated rhesus macaques. PLoS One 2013; 8:e75556. [PMID: 24098702 PMCID: PMC3787041 DOI: 10.1371/journal.pone.0075556] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/04/2013] [Accepted: 08/14/2013] [Indexed: 11/29/2022]
Abstract
BACKGROUND Rhesus macaques (RMs) inoculated with live-attenuated Rev-Independent Nef¯ simian immunodeficiency virus (Rev-Ind Nef¯SIV) as adults or neonates controlled viremia to undetectable levels and showed no signs of immunodeficiency over 6-8 years of follow-up. We tested the capacity of this live-attenuated virus to protect RMs against pathogenic, heterologous SIVsmE660 challenges. METHODOLOGY/PRINCIPAL FINDINGS Three groups of four RM were inoculated with Rev-Ind Nef¯SIV and compared. Group 1 was inoculated 8 years prior and again 15 months before low dose intrarectal challenges with SIVsmE660. Group 2 animals were inoculated with Rev-Ind Nef¯SIV at 15 months and Group 3 at 2 weeks prior to the SIVsmE660 challenges, respectively. Group 4 served as unvaccinated controls. All RMs underwent repeated weekly low-dose intrarectal challenges with SIVsmE660. Surprisingly, all RMs with acute live-attenuated virus infection (Group 3) became superinfected with the challenge virus, in contrast to the two other vaccine groups (Groups 1 and 2) (P=0.006 for each) and controls (Group 4) (P=0.022). Gene expression analysis showed significant upregulation of innate immune response-related chemokines and their receptors, most notably CCR5 in Group 3 animals during acute infection with Rev-Ind Nef¯SIV. CONCLUSIONS/SIGNIFICANCE We conclude that although Rev-Ind Nef¯SIV remained apathogenic, acute replication of the vaccine strain was not protective but associated with increased acquisition of heterologous mucosal SIVsmE660 challenges.
Affiliation(s)
- Siddappa N. Byrareddy
  - Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Mila Ayash-Rashkovsky
  - Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America
- Victor G. Kramer
  - Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Sandra J. Lee
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Mick Correll
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Francis J. Novembre
  - Yerkes National Primate Research Center, Emory University, Atlanta, Georgia, United States of America
  - Department of Microbiology and Immunology, Emory University, Atlanta, Georgia, United States of America
- Francois Villinger
  - Yerkes National Primate Research Center, Emory University, Atlanta, Georgia, United States of America
  - Department of Pathology and Laboratory Medicine, Emory Vaccine Center, Emory University, Atlanta, Georgia, United States of America
- Welkin E. Johnson
  - Biology Department, Boston College, Boston, Massachusetts, United States of America
- Agneta von Gegerfelt
  - Human Retrovirus Pathogenesis Section, Vaccine Branch, Center for Cancer Research, Frederick, Maryland, United States of America
- Barbara K. Felber
  - Human Retrovirus Pathogenesis Section, Vaccine Branch, Center for Cancer Research, Frederick, Maryland, United States of America
- Ruth M. Ruprecht
  - Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - Harvard Medical School, Boston, Massachusetts, United States of America

11
Klambauer G, Unterthiner T, Hochreiter S. DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions. Nucleic Acids Res 2013; 41:e198. [PMID: 24049071 PMCID: PMC3834838 DOI: 10.1093/nar/gkt834] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 02/07/2023]
Abstract
Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional and nonrandomized controlled studies such as the HapMap, the ENCODE or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/noninformative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2400 simulated data sets, I/NI value thresholds of 0.025, 0.05 and 0.1 yielded average specificities of 92, 97 and 99% at sensitivities of 76, 61 and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants or quantitative trait loci. The DEXUS R package is publicly available from Bioconductor and the scripts for all experiments are available at http://www.bioinf.jku.at/software/dexus/.
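The mixture model described in this abstract can be sketched in a few lines. This is a minimal illustration, not the DEXUS implementation: the function names, the mean/size parameterization with a single shared `size` (dispersion) parameter, and the example weights and means are assumptions made for the example, and the I/NI value is not computed.

```python
import math


def nb_log_pmf(x, mean, size):
    """Log-probability of count x under a negative binomial distribution
    parameterized by its mean and a size (inverse dispersion) parameter."""
    return (
        math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
        + size * math.log(size / (size + mean))
        + x * math.log(mean / (size + mean))
    )


def mixture_pmf(x, weights, means, size):
    """Probability of count x under a finite mixture of negative binomials,
    one component per (unknown) condition."""
    return sum(w * math.exp(nb_log_pmf(x, m, size)) for w, m in zip(weights, means))


def responsibilities(x, weights, means, size):
    """Posterior probability that count x was generated by each component."""
    joint = [w * math.exp(nb_log_pmf(x, m, size)) for w, m in zip(weights, means)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-condition transcript: 70% of samples with mean 10 reads,
# 30% with mean 100 reads.
weights, means, size = [0.7, 0.3], [10.0, 100.0], 5.0
posterior_for_100_reads = responsibilities(100, weights, means, size)
```

In this toy setting a sample with 100 reads is assigned almost entirely to the second component, which is the sense in which a transcript "requires more than one condition" to be modeled.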
Affiliation(s)
- Günter Klambauer
  - Institute of Bioinformatics, Johannes Kepler University, A-4040 Linz, Austria

12
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet 2013; 4:92. [PMID: 23750167 PMCID: PMC3667386 DOI: 10.3389/fgene.2013.00092] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 02/10/2013] [Accepted: 05/04/2013] [Indexed: 02/03/2023]
Abstract
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Affiliation(s)
- Armand Valsesia
  - Genetics Core, Nestlé Institute of Health Sciences Lausanne, Switzerland

13
Li W, Olivier M. Current analysis platforms and methods for detecting copy number variation. Physiol Genomics 2012; 45:1-16. [PMID: 23132758 DOI: 10.1152/physiolgenomics.00082.2012] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 01/09/2023]
Abstract
Copy number variation (CNV), generated through duplication or deletion events that affect one or more loci, is widespread in human genomes and is often associated with functional consequences that may include changes in gene expression levels or fusion of genes. Genome-wide association studies indicate that some disease phenotypes and physiological pathways might be impacted by CNV in a small number of characterized genomic regions. However, the pervasiveness and full impact of such variation remain unclear. Suitable analytic methods are needed to thoroughly mine human genomes for genomic structural variation, and to explore the interplay between observed CNV and disease phenotypes, but many medical researchers are unfamiliar with the features and nuances of recently developed technologies for detecting CNV. In this article, we evaluate a suite of commonly used and recently developed approaches to uncovering genome-wide CNVs and discuss the relative merits of each.
Affiliation(s)
- Wenli Li
  - Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA

14
Valsesia A, Stevenson BJ, Waterworth D, Mooser V, Vollenweider P, Waeber G, Jongeneel CV, Beckmann JS, Kutalik Z, Bergmann S. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 2012; 13:241. [PMID: 22702538 PMCID: PMC3464625 DOI: 10.1186/1471-2164-13-241] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 11/28/2011] [Accepted: 06/15/2012] [Indexed: 01/11/2023]
Abstract
BACKGROUND Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
Affiliation(s)
- Armand Valsesia
  - Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland

15
Wang Q, Peng P, Qian M, Wan L, Deng M. Hybridization and amplification rate correction for affymetrix SNP arrays. BMC Med Genomics 2012; 5:24. [PMID: 22691279 PMCID: PMC3428662 DOI: 10.1186/1755-8794-5-24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/21/2012] [Accepted: 06/12/2012] [Indexed: 12/16/2022]
Abstract
BACKGROUND Copy number variation (CNV) is essential to understanding the pathology of many complex diseases at the DNA level. CNV studies based on Affymetrix SNP arrays, which are widely used for this purpose, depend heavily on accurate copy number (CN) estimation. Nevertheless, CN estimation may be biased by several factors, including cross-hybridization and training sample batch, as well as genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Since many available algorithms only address one or two of the three factors, a high false discovery rate (FDR) often results when identifying CNV. Therefore, we have developed a new CNV detection pipeline which is based on hybridization and amplification rate correction (CNVhac). METHODS CNVhac first estimates the allelic concentrations (ACs) of target sequences by using sample-independent parameters trained through the physicochemical hybridization law. Then the raw CN at each site is estimated by taking the ratio of the AC to the corresponding average AC from a reference sample set. Finally, a hidden Markov model (HMM) segmentation process is implemented to detect CNV regions. RESULTS Based on public HapMap data, the results show that CNVhac effectively smooths the genomic waves and yields more accurate raw CN estimates than other methods. Moreover, CNVhac alleviates, to a certain extent, the sample dependence of inference and calls CNVs with appreciably low FDRs. CONCLUSION CNVhac is an effective approach to address the common difficulties in SNP array analysis, and the working principles of CNVhac can be easily extended to other platforms.
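The raw CN step in the METHODS can be sketched as a simple ratio. This is not the CNVhac implementation: the function name is hypothetical, and scaling the ratio by a diploid baseline of 2 is an assumption, since the abstract only states that the AC is divided by the average reference AC at the same site.

```python
from statistics import mean


def raw_copy_number(sample_ac, reference_acs, baseline_cn=2.0):
    """Raw CN at one site: sample AC over the mean reference AC.

    baseline_cn=2.0 is an assumed diploid scaling, not taken from the paper.
    """
    ref_mean = mean(reference_acs)
    if ref_mean <= 0:
        raise ValueError("reference allelic concentrations must average above zero")
    return baseline_cn * sample_ac / ref_mean


# Hypothetical allelic concentrations at a single site.
print(raw_copy_number(1.5, [1.0, 1.0, 1.0]))
```

In a full pipeline these per-site values would then be passed to a segmentation step such as the HMM mentioned above.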
Affiliation(s)
- Quan Wang
- Center for Theoretical Biology, Peking University, Beijing 100871, People's Republic of China
|
16
|
Klambauer G, Schwarzbauer K, Mayr A, Clevert DA, Mitterecker A, Bodenhofer U, Hochreiter S. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res 2012; 40:e69. [PMID: 22302147 PMCID: PMC3351174 DOI: 10.1093/nar/gks003] [Citation(s) in RCA: 317] [Impact Index Per Article: 24.4] [Indexed: 11/14/2022]
Abstract
Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
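The idea of assigning an integer copy number from a Poisson mixture can be illustrated with a simplified sketch. It is not the cn.MOPS algorithm or its R API: the rate model (copy number i has expected count lam * i / 2), the small residual rate for copy number 0, and the fixed mixture weights are all assumptions, and the package's Bayesian estimation of these quantities across samples is omitted.

```python
import math


def poisson_log_pmf(k, rate):
    """Log probability of observing k reads under Poisson(rate)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def copy_number_posterior(read_count, lam, weights, cn0_factor=0.05):
    """Posterior over copy numbers 0..len(weights)-1 for one read count.

    lam is the assumed expected read count at copy number 2.
    cn0_factor is an assumed background rate fraction for copy number 0.
    """
    log_terms = []
    for cn, w in enumerate(weights):
        rate = lam * (cn / 2 if cn > 0 else cn0_factor)
        log_terms.append(math.log(w) + poisson_log_pmf(read_count, rate))
    top = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical example: 150 reads where 100 are expected at copy number 2.
posterior = copy_number_posterior(150, 100.0, [0.2] * 5)
print(max(range(len(posterior)), key=posterior.__getitem__))
```

Because the model is evaluated across samples at a fixed position, a position with uniformly high or low coverage shifts lam rather than the copy number calls, which is why the abstract states that read count variation along chromosomes does not affect the method.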
Affiliation(s)
- Günter Klambauer
- Institute of Bioinformatics, Johannes Kepler University, A-4040 Linz, Austria
|