1
Abstract
BACKGROUND The use of high-throughput sequencing data has improved the results of genomic analysis due to the resolution of mapping algorithms. Although several tools for copy-number variation calling in whole genome sequencing have been published, the noisy nature of sequencing data still limits accuracy and concordance among such tools. To assess the performance of the original PennCNV algorithm for array data on whole genome sequencing data, we processed mapping (BAM) files to extract coverage, representing the log R ratio (LRR) of signal intensity, and B allele frequency (BAF). RESULTS We used the high-quality sample NA12878 from the recently reported NIST database and created 10 artificial samples with several CNVs spread along all chromosomes. We compared PennCNV-Seq with other tools on general deletions and duplications, as well as on different numbers of copies and copy-neutral loss of heterozygosity (LOH). CONCLUSION PennCNV-Seq was able to find correct CNVs and can be integrated into existing CNV calling pipelines to accurately report the number of copies in specific genomic regions.
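The two signals named in this abstract can be illustrated with a minimal sketch. It assumes that LRR is approximated as the log2 ratio of observed to expected read depth and that BAF is the fraction of reads carrying the B allele at a SNP. The function names and the choice of expected depth (for example, a genome-wide median) are hypothetical and are not taken from PennCNV-Seq itself.

```python
import math


def log_r_ratio(observed_depth, expected_depth):
    """Approximate LRR as log2(observed / expected) read depth.

    Assumption: expected_depth is some reference depth, such as a
    genome-wide median; the abstract does not specify how it is chosen.
    """
    if observed_depth <= 0 or expected_depth <= 0:
        raise ValueError("depths must be positive")
    return math.log2(observed_depth / expected_depth)


def b_allele_frequency(a_reads, b_reads):
    """Approximate BAF as the fraction of reads supporting the B allele.

    Returns None when no reads cover the site.
    """
    total = a_reads + b_reads
    if total == 0:
        return None
    return b_reads / total


# A heterozygous single-copy deletion halves the depth and leaves one allele.
lrr = log_r_ratio(observed_depth=15, expected_depth=30)
baf = b_allele_frequency(a_reads=15, b_reads=0)
```

Under these assumptions, a one-copy deletion gives a negative LRR with BAF values near 0 or 1, while a normal two-copy heterozygous site gives an LRR near 0 and a BAF near 0.5.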
Journal Article
2
Abstract
High-resolution single-nucleotide polymorphism (SNP) genotyping arrays offer a sensitive and affordable method for genome-wide detection of copy number variants (CNVs). PennCNV is a hidden Markov model (HMM)-based CNV caller for SNP arrays, first released 10 years ago. A typical CNV calling procedure using PennCNV includes preparation of input files, CNV calling, filtering of CNV calls, CNV annotation, and CNV visualization. Here we describe several protocols for CNV calling using PennCNV, together with descriptions of several recent improvements to the software tool.
3
Analysis of copy number variations in Mexican Holstein cattle using axiom genome-wide Bos 1 array. GENOMICS DATA 2015; 7:97-100. [PMID: 26981375 PMCID: PMC4778655 DOI: 10.1016/j.gdata.2015.12.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/25/2015] [Accepted: 12/15/2015] [Indexed: 01/09/2023]
Abstract
Recently, for copy number variation (CNV) analysis, bovine researchers have focused mainly on the use of genome-wide SNP genotyping arrays. One of the highest-density commercially available SNP chips for cattle is the Affymetrix axiom genome-wide Bos 1, which assays 648,315 informative SNPs across the whole bovine genome. Here, we describe the microarray data, quality controls and validation implemented in a study published in Genetics and Molecular Research Journal in 2015 [1]. The raw microarray data have been deposited in the Gene Expression Omnibus under accession GSE54813.
Journal Article
4
Wang Y, Zhang T, Wang C. Detection and analysis of genome-wide copy number variation in the pig genome using an 80 K SNP Beadchip. J Anim Breed Genet 2019; 137:166-176. [PMID: 31506991 DOI: 10.1111/jbg.12435] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/08/2019] [Revised: 08/02/2019] [Accepted: 08/05/2019] [Indexed: 12/23/2022]
Abstract
Copy number variation (CNV) is an important source of genetic variability in human and animal genomes and plays key roles in phenotypic diversity and disease susceptibility. In the present study, we performed a genome-wide analysis for CNV detection using SNP genotyping data of 857 Large White pigs. A total of 312 CNV regions (CNVRs) were detected with the PennCNV algorithm, which covered 57.76 Mb of the pig genome and corresponded to 2.36% of the genome sequence. The length of the CNVRs on autosomes ranged from 1.77 Kb to 1.76 Mb with an average of 185.11 Kb. Of these, 220 completely or partially overlapped with 1,092 annotated genes, which were enriched in a wide variety of biological processes. Comparisons with previously reported pig CNVRs revealed 92 (29.49%) novel CNVRs. Experimentally, 80% of randomly selected CNVRs were validated by quantitative PCR (qPCR). We also performed an association analysis between some of the CNVRs and reproductive traits, with results demonstrating the potential importance of CNVR61 and CNVR283, which were associated with litter size. Notably, the GPER1 gene located in CNVR61 plays a key role in reproduction. Our study is an important complement to the CNV map in the pig genome and provides valuable information for investigating the association between genomic variation and economic traits.
Journal Article
5
Concordance rate between copy number variants detected using either high- or medium-density single nucleotide polymorphism genotype panels and the potential of imputing copy number variants from flanking high density single nucleotide polymorphism haplotypes in cattle. BMC Genomics 2020; 21:205. [PMID: 32131735 PMCID: PMC7057620 DOI: 10.1186/s12864-020-6627-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/05/2019] [Accepted: 02/26/2020] [Indexed: 12/01/2022]
Abstract
Background The trading of individual animal genotype information often involves only the exchange of the called genotypes and not necessarily the additional information required to effectively call structural variants. The main aim here was to determine if it is possible to impute copy number variants (CNVs) using the flanking single nucleotide polymorphism (SNP) haplotype structure in cattle. While this objective was pursued using high-density genotype panels (i.e., 713,162 SNPs), a secondary objective investigated the concordance of CNVs called with this high-density genotype panel compared to CNVs called from a medium-density panel (i.e., 45,677 SNPs in the present study). This is the first study to compare CNVs called from high-density and medium-density SNP genotypes from the same animals. High-density (and medium-density) genotypes were available on 991 Holstein-Friesian, 1015 Charolais, and 1394 Limousin bulls. The concordance between CNVs called from the medium-density and high-density genotypes was calculated separately for each animal. A subset of CNVs which were called from the high-density genotypes was selected for imputation. Imputation was carried out separately for each breed using a set of high-density SNPs flanking the midpoint of each CNV. A CNV was deemed to be imputed correctly when the called copy number matched the imputed copy number. Results For 97.0% of CNVs called from the high-density genotypes, the corresponding genomic position on the medium-density genotypes of the animal did not contain a called CNV. The average accuracy of imputation for CNV deletions was 0.281, with a standard deviation of 0.286. The average accuracy of imputation of the CNV normal state, i.e. the absence of a CNV, was 0.982 with a standard deviation of 0.022. Two CNV duplications were imputed in the Charolais, a single CNV duplication in the Limousins, and a single CNV duplication in the Holstein-Friesians; in all cases the CNV duplications were incorrectly imputed.
Conclusion The vast majority of CNVs called from the high-density genotypes were not detected using the medium-density genotypes. Furthermore, CNVs cannot be accurately predicted from flanking SNP haplotypes, at least based on the imputation algorithms routinely used in cattle, and using the SNPs currently available on the high-density genotype panel.
Journal Article
6
Igoshin AV, Deniskova TE, Yurchenko AA, Yudin NS, Dotsev AV, Selionova MI, Zinovieva NA, Larkin DM. Copy number variants in genomes of local sheep breeds from Russia. Anim Genet 2021; 53:119-132. [PMID: 34904242 DOI: 10.1111/age.13163] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 11/28/2021] [Indexed: 01/21/2023]
Abstract
Copy number variants (CNVs) are genomic structural variations that contribute to many adaptive and economically important traits in livestock. In this study, we detected CNVs in 354 animals from 16 Russian indigenous sheep breeds and analysed their possible functional roles. Our analysis of the entire sample set resulted in 4527 CNVs forming 1450 CNV regions (CNVRs). When constructing CNVRs for individual breeds, a total of 2715 regions ranging from 88 in Groznensk to 337 in Osetin breeds were identified. To make interbreed CNVR frequency comparison possible, we also identified core CNVRs using CNVs with overlapping chromosomal locations found in different breeds. This resulted in 137 interbreed CNVRs with frequency >15% in at least one breed. Functional enrichment analysis of genes affected by CNVRs in individual breeds revealed 12 breeds with significant enrichments in olfactory perception, PRAME family proteins, and immune response. Function of genes affected by interbreed and breed-specific CNVRs revealed candidates related to domestication, adaptation to high altitudes and cold climates, reproduction, parasite resistance, milk and meat qualities, wool traits, fat storage, and fat metabolism. Our work is the first attempt to uncover and characterise the CNV makeup of Russian indigenous sheep breeds. Further experimental and functional validation of CNVRs would help in developing new and improving existing sheep breeds.
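The step of forming CNV regions from CNVs with overlapping chromosomal locations can be sketched as a simple interval merge. This is a generic illustration only: the function name and the tuple layout are hypothetical, and the merging rule (any overlap on the same chromosome joins two CNVs) is an assumption rather than the exact procedure used in the studies listed here.

```python
from collections import defaultdict


def merge_cnvs_into_regions(cnvs):
    """Merge overlapping CNVs into CNV regions (CNVRs).

    cnvs: iterable of (chromosome, start, end) tuples with start <= end.
    Returns a list of (chromosome, start, end) tuples, one per CNVR,
    sorted by chromosome name and then by start position.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current region and start a new one.
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


example = [("1", 100, 200), ("1", 150, 300), ("1", 400, 500), ("2", 10, 20)]
cnvrs = merge_cnvs_into_regions(example)
```

In this example, the first two CNVs on chromosome 1 overlap and collapse into one region, so four CNVs yield three CNVRs. Real pipelines often apply stricter criteria, such as a minimum reciprocal overlap, before merging.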
7
Lepamets M, Auwerx C, Nõukas M, Claringbould A, Porcu E, Kals M, Jürgenson T, Morris AP, Võsa U, Bochud M, Stringhini S, Wijmenga C, Franke L, Peterson H, Vilo J, Lepik K, Mägi R, Kutalik Z. Omics-informed CNV calls reduce false-positive rates and improve power for CNV-trait associations. HGG ADVANCES 2022; 3:100133. [PMID: 36035246 PMCID: PMC9399386 DOI: 10.1016/j.xhgg.2022.100133] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/29/2022]
Abstract
Copy-number variations (CNV) are believed to play an important role in a wide range of complex traits, but discovering such associations remains challenging. While whole-genome sequencing (WGS) is the gold-standard approach for CNV detection, there are several orders of magnitude more samples with available genotyping microarray data. Such array data can be exploited for CNV detection using dedicated software (e.g., PennCNV); however, these calls suffer from elevated false-positive and -negative rates. In this study, we developed a CNV quality score that weights PennCNV calls (pCNVs) based on their likelihood of being true positive. First, we established a measure of pCNV reliability by leveraging evidence from multiple omics data (WGS, transcriptomics, and methylomics) obtained from the same samples. Next, we built a predictor of omics-confirmed pCNVs, termed omics-informed quality score (OQS), using only PennCNV software output parameters. Promisingly, OQS assigned to pCNVs detected in close family members was up to 35% higher than the OQS of pCNVs not carried by other relatives (p < 3.0 × 10⁻⁹⁰), outperforming other scores. Finally, in an association study of four anthropometric traits in 89,516 Estonian Biobank samples, the use of OQS led to a relative increase in the trait variance explained by CNVs of up to 56% compared with published quality filtering methods or scores. Overall, we put forward a flexible framework to improve any CNV detection method leveraging multi-omics evidence, applied it to improve PennCNV calls, and demonstrated its utility by improving the statistical power for downstream association analyses.
research-article
8
Ahmad SF, Singh A, Panda S, Malla WA, Kumar A, Dutt T. Genome-wide elucidation of CNV regions and their association with production and reproduction traits in composite Vrindavani cattle. Gene 2022; 830:146510. [PMID: 35447249 DOI: 10.1016/j.gene.2022.146510] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/25/2021] [Revised: 03/23/2022] [Accepted: 04/14/2022] [Indexed: 11/17/2022]
Abstract
The present study aimed to analyze genome-wide copy number variations (CNVs) in Vrindavani composite cattle, concatenate them into CNV regions (CNVRs), and finally test the association of CNVRs with different production and reproduction traits. Genotypic data, generated on the BovineSNP50 Beadchip (v3) array for 96 Vrindavani animals, were used to elucidate the CNVs at the genome level. Intensity data covering over 53,218 SNP genotypes on the bovine genome were used. The Hidden Markov Model-based algorithm of the PennCNV program was employed to detect, normalize and filter CNVs across the genome. The 252 putative CNVs detected via the PennCNV program in different individuals were concatenated into 71 CNV regions (CNVRs) using the CNVRuler program. Association of CNVRs with important (re)production traits in Vrindavani animals was assessed using linear regression. Five CNVRs were found to be significantly associated with ten important (re)production traits. The genes harbored in these regions provided useful insights into the association of CNVRs with genes and ultimately the variation at phenotype level. Important genes that overlapped with CNVRs included WASHC4, HS6ST3, MBNL2, TOLLIP, PIDD1 and TSPAN4. Furthermore, the CNVRs were found to overlap with important QTLs available in the AnimalQTL database which affect milk yield and composition along with reproduction and immune function traits. The copy number states of three genes were validated using the digital droplet PCR technique. The results from the present study significantly enhance the understanding of CNVs in Vrindavani cattle and should help establish its CNV map. The study will also enable further investigation of the association of these variants with important traits of economic interest, including disease incidence.
9
Gregory MD, Kolachana B, Yao Y, Nash T, Dickinson D, Eisenberg DP, Mervis CB, Berman KF. A method for determining haploid and triploid genotypes and their association with vascular phenotypes in Williams syndrome and 7q11.23 duplication syndrome. BMC MEDICAL GENETICS 2018; 19:53. [PMID: 29614955 PMCID: PMC5883342 DOI: 10.1186/s12881-018-0563-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/19/2017] [Accepted: 03/19/2018] [Indexed: 12/28/2022]
Abstract
Background Williams syndrome ([WS], 7q11.23 hemideletion) and 7q11.23 duplication syndrome (Dup7) show contrasting syndromic symptoms. However, within each group there is considerable interindividual variability in the degree to which these phenotypes are expressed. Though software exists to identify areas of copy number variation (CNV) from commonly-available SNP-chip data, this software does not provide non-diploid genotypes in CNV regions. Here, we describe a method for identifying haploid and triploid genotypes in CNV regions, and then, as a proof-of-concept for applying this information to explain clinical variability, we test for genotype-phenotype associations. Methods Blood samples for 25 individuals with WS and 13 individuals with Dup7 were genotyped with Illumina-HumanOmni5M SNP-chips. PennCNV and in-house code were used to make genotype calls for each SNP in the 7q11.23 locus. We tested for association between the presence of aortic arteriopathy and genotypes of the remaining (haploid in WS) or duplicated (triploid in Dup7) alleles. Results Haploid calls in the 7q11.23 region were made for 99.0% of SNPs in the WS group, and triploid calls for 98.8% of SNPs in those with Dup7. The G allele of SNP rs2528795 in the ELN gene was associated with aortic stenosis in WS participants (p < 0.0049) while the A allele of the same SNP was associated with aortic dilation in Dup7. Conclusions Commonly available SNP-chip information can be used to make haploid and triploid calls in individuals with CNVs and then to relate variability in specific genes to variability in syndromic phenotypes, as demonstrated here using aortic arteriopathy. This work sets the stage for similar genotype-phenotype analyses in CNVs where phenotypes may be more complex and/or where there is less information about genetic mechanisms. 
Research Support, Non-U.S. Gov't
10
Panda S, Kumar A, Gaur GK, Ahmad SF, Chauhan A, Mehrotra A, Dutt T. Genome wide copy number variations using Porcine 60K SNP Beadchip in Landlly pigs. Anim Biotechnol 2023; 34:1891-1899. [PMID: 35369845 DOI: 10.1080/10495398.2022.2056047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
In the present study, Porcine 60K SNP genotype data from 69 Landlly pigs were used to explore copy number variations (CNVs) across the autosomes. A total of 386 CNVs were identified using the Hidden Markov Model (HMM) in PennCNV software, which were subsequently aggregated into 115 CNV regions (CNVRs). Among the detected CNVRs, 58 were gain type, 49 were loss type, and the remaining 8 were both gain and loss types. The identified CNVRs covered 12.5 Mb (0.55%) of the Sus scrofa reference 11.1 genome. Comparison of our results with previous investigations on pigs revealed that approximately 75% of CNVRs were novel, which may be due to differences in genetic background, environment and implementation of artificial selection in Landlly pigs. Functional annotation and pathway analysis showed significant enrichment of 267 well-annotated Sus scrofa genes in CNVRs. These genes were involved in different biological functions such as sensory perception, meat quality traits, back fat thickness and immunity. Additionally, KIT and FUT1 were two major genes detected in CNVRs in our population. This investigation provided a comprehensive overview of CNV distribution in the Indian porcine genome for the first time, which may be useful for further investigating the association of important quantitative traits in Landlly pigs.
Highlights
115 CNVRs were identified in a population of 69 Landlly pigs.
Approximately 75% of detected CNVRs were novel for the Landlly population.
Significant enrichment of 267 well-annotated Sus scrofa genes was observed in these CNVRs.
These genes were involved in different biological functions such as sensory perception, meat quality traits, back fat thickness and immunity.
A comprehensive CNV map of the Indian porcine genome was developed for the first time.
Review
11
Rodriguez S, Al-Ghamdi OA, Guthrie PA, Shihab HA, McArdle W, Gaunt T, Alharbi KK, Day IN. Frequency of KLK3 gene deletions in the general population. Ann Clin Biochem 2016; 54:472-480. [PMID: 27555663 DOI: 10.1177/0004563216666999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Background One of the kallikrein genes (KLK3) encodes prostate-specific antigen, a key biomarker for prostate cancer. A number of factors, both genetic and non-genetic, determine variation of serum prostate-specific antigen concentrations in the population. We have recently found three KLK3 deletions in individuals with very low prostate-specific antigen concentrations, suggesting a link between abnormally reduced KLK3 expression and deletions of KLK3. Here, we aim to determine the frequency of kallikrein gene 3 deletions in the general population. Methods The frequency of KLK3 deletions in the general population was estimated from the 1958 Birth Cohort sample (n = 3815) using amplification ratiometry control system. In silico analyses using PennCNV were carried out in the same cohort and in NBS-WTCCC2 in order to provide an independent estimation of the frequency of KLK3 deletions in the general population. Results Amplification ratiometry control system results from the 1958 cohort indicated a frequency of KLK3 deletions of 0.81% (3.98% following a less stringent calling criterion). From in silico analyses, we found that potential deletions harbouring the KLK3 gene occurred at rates of 2.13% (1958 Cohort, n = 2867) and 0.99% (NBS-WTCCC2, n = 2737), respectively. These results are in good agreement with our in vitro experiments. All deletions found were in heterozygosis. Conclusions We conclude that a number of individuals from the general population present KLK3 deletions in heterozygosis. Further studies are required in order to know if interpretation of low serum prostate-specific antigen concentrations in individuals with KLK3 deletions may offer false-negative assurances with consequences for prostate cancer screening, diagnosis and monitoring.