1
Yan Z, Song K, Wang P, Gun S, Long X. Evaluation of the Genetic Diversity and Population Structure of Four Native Pig Populations in Gansu Province. Int J Mol Sci 2023; 24:17154. [PMID: 38138983 PMCID: PMC10743271 DOI: 10.3390/ijms242417154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 11/30/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023] Open
Abstract
Indigenous pig populations, including Bamei pigs (BM), Hezuo pigs (HZ), Huixian Qingni Black pigs (HX), and Minxian Black pigs (MX) in Gansu Province, live in a particular climate and a relatively closed geographical environment. These local breeds have desirable traits (e.g., cold tolerance, robust disease resistance, and superior meat quality). In the past few years, pig populations in Gansu Province have decreased significantly because of their poor lean meat percentage, high fat content, and slow growth rate. Maintaining the diversity of these four breeds can provide a source of new alleles to be incorporated into commercial breeds, which are more susceptible to disease and less adaptable to changing conditions because of inbreeding. Genomic data analysis is well suited to determining the genetic diversity and population structure of livestock breeds, even in local pig populations. However, the genetic diversity and population structure of the four native pig populations in Gansu Province are still unknown. Thus, we used the "Zhongxin-I" porcine chip for SNP detection in 102 individuals living on four pig conservation farms. A total of 57,466 SNPs were identified among the four pig breeds. The linkage disequilibrium (LD) plot showed that MX had the highest level of LD, followed by BM, HZ, and HX. The observed heterozygosity (Ho) in all four populations was higher than the expected heterozygosity (He). A principal component analysis (PCA) demonstrated that the four local pig populations were isolated. The identity-by-state (IBS) matrix and G matrix heat maps indicated that small numbers of individuals among the four pig breeds had a high genetic distance and weak genetic relationships. The results of the population genetic structure of BM, HZ, HX, and MX pigs showed a slight loss of genetic diversity. Our findings enabled us to better understand the genome characteristics of these four indigenous pig populations, which will provide novel insights for the future germplasm conservation and utilization of these indigenous pig populations.
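The comparison of observed heterozygosity (Ho) with expected heterozygosity (He) reported in this abstract can be illustrated with a minimal sketch for a single biallelic SNP. The genotype encoding ("AA", "AB", "BB"), the function name, and the sample data are hypothetical and are not taken from the study; He is computed here as 1 minus the sum of squared allele frequencies, the usual Hardy-Weinberg expectation.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one biallelic SNP.

    genotypes: list of two-character strings such as "AA", "AB", "BB"
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for g in genotypes if g[0] != g[1]) / n
    # Allele frequencies are estimated from all 2n allele copies.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2).
    he = 1.0 - sum((count / total) ** 2 for count in allele_counts.values())
    return ho, he


# Hypothetical sample of six animals, four of them heterozygous.
sample = ["AA", "AB", "AB", "AB", "BB", "AB"]
ho, he = heterozygosity(sample)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

In this made-up sample Ho exceeds He, the same direction of difference the abstract reports for all four populations; a real analysis would average over many SNPs per population.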
Affiliation(s)
- Zunqiang Yan
  - College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Kelin Song
  - College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Pengfei Wang
  - College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Shuangbao Gun
  - College of Animal Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
- Xi Long
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
2
Wijayanti D, Zhang S, Yang Y, Bai Y, Akhatayeva Z, Pan C, Zhu H, Qu L, Lan X. Goat SMAD family member 1 (SMAD1): mRNA expression, genetic variants, and their associations with litter size. Theriogenology 2022; 193:11-19. [PMID: 36116245 DOI: 10.1016/j.theriogenology.2022.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/18/2022] [Revised: 08/24/2022] [Accepted: 09/02/2022] [Indexed: 12/20/2022]
Abstract
SMAD family member 1 (SMAD1) is phosphorylated and activated by the BMP receptors, which help regulate ovulation rate, cell growth, apoptosis, and development. A previous genome-wide association study revealed that it is associated with fecundity in sheep. However, its effect on litter size has not been investigated in goats. Therefore, this study aimed to determine the level of SMAD1 mRNA expression in various tissues and to identify its polymorphisms and their association with litter size in Shaanbei white cashmere (SBWC) goats. RT-qPCR analysis showed that SMAD1 was expressed in various tissues in female SBWC goats, including the ovary (P < 0.05). Importantly, the mRNA expression level was higher in the ovaries of mothers of multiple lambs than in those of mothers of single lambs (P < 0.05). Moreover, two InDels (18-bp and 7-bp) in intron 1 of SMAD1 were polymorphic among ten potential loci. The 18-bp and 7-bp InDels were significantly correlated with litter size (P = 0.014 and P = 0.0001, respectively). As shown by the chi-squared test, genotypic distributions of 18-bp and 7-bp were significantly distinct between single-lamb (P = 0.02) and multi-lamb mothers (P = 0.002). Our findings confirm that two InDels in SMAD1 were significantly associated with litter size and suggest that they could be used to improve fertility traits in goat breeding strategies.
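The chi-squared comparison of genotypic distributions mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the table has two groups (single-lamb and multi-lamb mothers) and three genotype classes, all counts are hypothetical, and the function name is invented for this example. With 2 degrees of freedom the chi-squared survival function reduces to exp(-x/2), so the sketch needs only the standard library; any other table shape would need a general chi-squared distribution.

```python
import math


def chi_square_2x3(table):
    """Pearson chi-squared test for a 2 x 3 contingency table.

    Returns (statistic, p_value). The p-value uses the closed form
    exp(-x / 2), which is valid only for 2 degrees of freedom.
    Assumes every row and column total is non-zero.
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2 x 3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and genotype.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2.0)


# Hypothetical genotype counts (three genotype classes per group).
counts = [
    [30, 50, 20],  # single-lamb mothers
    [20, 50, 30],  # multi-lamb mothers
]
statistic, p_value = chi_square_2x3(counts)
print(f"chi2 = {statistic:.3f}, P = {p_value:.4f}")
```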
Affiliation(s)
- Dwi Wijayanti
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China; Department of Animal Science, Perjuangan University of Tasikmalaya, Tasikmalaya, West Java, 46115, Indonesia.
- Sihuan Zhang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- Yuta Yang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- Yangyang Bai
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- Zhanerke Akhatayeva
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- Chuanying Pan
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- Haijing Zhu
  - Shaanxi Provincial Engineering and Technology Research Center of Cashmere Goats, Yulin University, Yulin, Shaanxi, 719000, PR China; Life Science Research Center, Yulin University, Yulin, Shaanxi, 719000, PR China.
- Lei Qu
  - Shaanxi Provincial Engineering and Technology Research Center of Cashmere Goats, Yulin University, Yulin, Shaanxi, 719000, PR China; Life Science Research Center, Yulin University, Yulin, Shaanxi, 719000, PR China.
- Xianyong Lan
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
3
Genetic Variations and mRNA Expression of Goat DNAH1 and Their Associations with Litter Size. Cells 2022; 11:1371. [PMID: 35456050 PMCID: PMC9024473 DOI: 10.3390/cells11081371] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Revised: 04/02/2022] [Accepted: 04/07/2022] [Indexed: 12/24/2022] Open
Abstract
Dynein Axonemal Heavy Chain 1 (DNAH1) encodes proteins which provide structural support for the physiological function and motor structure of spermatozoa (hereafter referred to as sperm) and ova. This study found that three single nucleotide polymorphisms (SNPs), a 27-bp insertion/deletion (InDel) mutation, and three exonic copy number variations (CNVs) within DNAH1 were significantly associated with litter size in Shaanbei white cashmere goats (n = 1101). Goats with the wild types of these three SNPs had larger litter sizes than other carriers (p < 0.05). The II genotype of the 27-bp InDel was associated with a larger litter size than the ID genotype (p = 0.000022). For the three CNVs, the gain genotype had larger litter sizes than the loss or medium genotypes (p < 0.01). Individuals with the AA-TT-CC-II-M1-M2-M3 and AA-TT-CC-II-G1-G2-M3 combination genotypes had larger litter sizes than those with the other genotypes. This study also showed that DNAH1 expression was higher in mothers of multiple kids than in mothers of single kids. These three SNPs, the 27-bp InDel, and the three CNVs in DNAH1 could be used as molecular markers for the selection of goat reproductive traits.
4
Zheng J, Deng T, Jiang E, Li J, Wijayanti D, Wang Y, Ding X, Lan X. Genetic variations of bovine PCOS-related DENND1A gene identified in GWAS significantly affect female reproductive traits. Gene 2021; 802:145867. [PMID: 34352299 DOI: 10.1016/j.gene.2021.145867] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/18/2021] [Revised: 07/13/2021] [Accepted: 07/30/2021] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies (GWAS) have identified DENND1A as a potential candidate gene linked to fertility-related phenotypes in dairy cows. However, to date, no studies have examined the association of DENND1A insertions/deletions (indels) with bovine fertility on a large scale. Herein, two indel sites, P4-del-26-bp and P8-ins-15-bp, were identified in 1064 Holstein cows. The minor allele frequencies (MAF) were 0.471 (deletion) and 0.230 (deletion), respectively, and haplotype analysis of the two loci yielded four different haplotypes. It is noteworthy that P4-del-26-bp is associated with the ovarian width (P = 0.0004) and corpus luteum diameter (P = 0.004). Meanwhile, P8-ins-15-bp was found to have a significant association with the ovarian width (P = 0.020), ovarian weight (P = 0.004), the number of mature follicles (P = 0.020), and the diameter of the mature follicles (P = 0.016). Furthermore, the combinatorial analysis showed that the combined genotypes of the two indels were significantly related to several reproductive traits (ovarian width, ovarian weight, etc.). Collectively, our findings indicated that these two novel indels and their combinations are correlated with reproductive traits, and hence they can serve in marker-assisted selection (MAS) in cattle breeding. Nevertheless, further functional experiments are needed to better understand the mechanisms of these indels in cattle reproduction.
Affiliation(s)
- Juanshan Zheng
  - Key Laboratory of Yak Breeding Engineering, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China; Laboratory of Animal Genome and Gene Function, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Tianyu Deng
  - Laboratory of Animal Genome and Gene Function, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Enhui Jiang
  - Laboratory of Animal Genome and Gene Function, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jie Li
  - Laboratory of Animal Genome and Gene Function, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Dwi Wijayanti
  - Laboratory of Animal Genome and Gene Function, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yongsheng Wang
  - College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xuezhi Ding
  - Key Laboratory of Yak Breeding Engineering, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou 730050, China.
- Xianyong Lan
  - Laboratory of Animal Genome and Gene Function, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China.
5
Wang Z, Pan Y, He L, Song X, Chen H, Pan C, Qu L, Zhu H, Lan X. Multiple morphological abnormalities of the sperm flagella (MMAF)-associated genes: The relationships between genetic variation and litter size in goats. Gene 2020; 753:144778. [DOI: 10.1016/j.gene.2020.144778] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/16/2020] [Revised: 04/20/2020] [Accepted: 05/14/2020] [Indexed: 12/12/2022]
6
Vergara-Lope A, Ennis S, Vorechovsky I, Pengelly RJ, Collins A. Heterogeneity in the extent of linkage disequilibrium among exonic, intronic, non-coding RNA and intergenic chromosome regions. Eur J Hum Genet 2019; 27:1436-1444. [PMID: 31053778 DOI: 10.1038/s41431-019-0419-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/13/2018] [Revised: 03/04/2019] [Accepted: 04/16/2019] [Indexed: 11/09/2022] Open
Abstract
Whole-genome sequence data enable construction of high-resolution linkage disequilibrium (LD) maps revealing the LD structure of functional elements within genic and subgenic sequences. The Malecot-Morton model defines LD map distances in linkage disequilibrium units (LDUs), analogous to the centimorgan scale of linkage maps. For whole-genome sequence-derived LD maps, we introduce the ratio of corresponding map lengths kilobases/LDU to describe the extent of LD within genome components. The extent of LD is highly variable across the genome ranging from ~38 kb for intergenic sequences to ~858 kb for centromeric regions. LD is ~16% more extensive in genic, compared with intergenic sequences, reflecting relatively increased selection and/or reduced recombination in genes. The LD profile across 18,268 autosomal genes reveals reduced extent of LD, consistent with elevated recombination, in exonic regions near the 5' end of genes but more extensive LD, compared with intronic sequences, across more centrally located exons. Genes classified as essential and genes linked to Mendelian phenotypes show more extensive LD compared with genes associated with complex traits, perhaps reflecting differences in selective pressure. Significant differences between exonic, intronic and intergenic components demonstrate that fine-scale LD structure provides important insights into genome function, which cannot be revealed by LD analysis of much lower resolution array-based genotyping and conventional linkage maps.
Affiliation(s)
- Alejandra Vergara-Lope
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Sarah Ennis
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Igor Vorechovsky
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Reuben J Pengelly
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK
- Andrew Collins
  - Human Genetics, Faculty of Medicine, University of Southampton, Duthie Building (808), Southampton General Hospital, Tremona Road, Southampton, SO16 6YD, UK.
7
Havrilla JM, Pedersen BS, Layer RM, Quinlan AR. A map of constrained coding regions in the human genome. Nat Genet 2018; 51:88-95. [PMID: 30531870 DOI: 10.1038/s41588-018-0294-6] [Citation(s) in RCA: 154] [Impact Index Per Article: 25.7] [Received: 12/14/2017] [Accepted: 10/29/2018] [Indexed: 12/13/2022]
Abstract
Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.
Affiliation(s)
- James M Havrilla
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Ryan M Layer
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA; Department of Computer Science, University of Colorado, Boulder, CO, USA
- Aaron R Quinlan
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
8
Pengelly RJ, Vergara-Lope A, Alyousfi D, Jabalameli MR, Collins A. Understanding the disease genome: gene essentiality and the interplay of selection, recombination and mutation. Brief Bioinform 2017; 20:267-273. [DOI: 10.1093/bib/bbx110] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/06/2017] [Indexed: 12/24/2022] Open
Affiliation(s)
- Reuben J Pengelly
  - Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Southampton, UK
- Alejandra Vergara-Lope
  - Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Southampton, UK
- Dareen Alyousfi
  - Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Southampton, UK
- M Reza Jabalameli
  - Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Southampton, UK
- Andrew Collins
  - Genetic Epidemiology and Genomic Informatics, Faculty of Medicine, University of Southampton, Southampton, UK
9
Pengelly RJ, Gheyas AA, Kuo R, Mossotto E, Seaby EG, Burt DW, Ennis S, Collins A. Commercial chicken breeds exhibit highly divergent patterns of linkage disequilibrium. Heredity (Edinb) 2016; 117:375-382. [PMID: 27381324 DOI: 10.1038/hdy.2016.47] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/13/2016] [Revised: 05/10/2016] [Accepted: 05/19/2016] [Indexed: 02/06/2023] Open
Abstract
The analysis of linkage disequilibrium (LD) underpins the development of effective genotyping technologies, trait mapping and understanding of biological mechanisms such as those driving recombination and the impact of selection. We apply the Malécot-Morton model of LD to create additive LD maps that describe the high-resolution LD landscape of commercial chickens. We investigated LD in chickens (Gallus gallus) at the highest resolution to date for broiler, white egg and brown egg layer commercial lines. There is minimal concordance between breeds of fine-scale LD patterns (correlation coefficient < 0.21), and even between discrete broiler lines. Regions of LD breakdown, which may align with recombination hot spots, are enriched near CpG islands and transcription start sites (P < 2.2 × 10⁻¹⁶), consistent with recent evidence described in finches, but concordance in hot spot locations between commercial breeds is only marginally greater than random. As in other birds, functional elements in the chicken genome are associated with recombination but, unlike evidence from other bird species, the LD landscape is not stable in the populations studied. The development of optimal genotyping panels for genome-led selection programmes will depend on careful analysis of the LD structure of each line of interest. Further study is required to fully elucidate the mechanisms underlying highly divergent LD patterns found in commercial chickens.
Affiliation(s)
- R J Pengelly
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- A A Gheyas
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- R Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- E Mossotto
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- E G Seaby
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- D W Burt
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- S Ennis
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- A Collins
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
10
Wang MD, Dzama K, Hefer CA, Muchadeyi FC. Genomic population structure and prevalence of copy number variations in South African Nguni cattle. BMC Genomics 2015; 16:894. [PMID: 26531252 PMCID: PMC4632335 DOI: 10.1186/s12864-015-2122-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 06/21/2015] [Accepted: 10/22/2015] [Indexed: 12/21/2022] Open
Abstract
Background: Copy number variations (CNVs) are modifications in DNA structure comprising deletions, duplications, insertions and complex multi-site variants. Although CNVs are proven to be involved in a variety of phenotypic discrepancies, the full extent and consequence of CNVs is yet to be understood. To date, no such genomic characterization has been performed in indigenous South African Nguni cattle. Nguni cattle are recognized for their ability to sustain harsh environmental conditions while exhibiting enhanced resistance to disease and parasites and are thought to comprise up to nine different ecotypes. Methods: Illumina BovineSNP50 Beadchip data were used to investigate genomic population structure and the prevalence of CNVs in 492 South African Nguni cattle. PLINK, ADMIXTURE, R, gPLINK and Haploview software were used for quality control, population structure and haplotype block determination. The PennCNV hidden Markov model identified CNVs and genes contained within and 10 Mb downstream from reported CNVs. PANTHER and Ensembl databases were subsequently used for gene annotation analyses. Results: Population structure analyses on Nguni cattle revealed 5 sub-populations with a possible sub-structure evident at K equal to 8. Four hundred and thirty three CNVs that formed 334 CNVRs ranging from 30 kb to 1 Mb in size are reported. Only 231 of the 492 animals demonstrated CNVRs. Two hundred and eighty nine genes were observed within the CNVRs identified. Of these, 149, 28, 44, 2 and 14 genes were unique to sub-populations A, B, C, D and E, respectively. Gene ontology analyses demonstrated a number of pathways to be represented by the respective genes, including immune response, response to abiotic stress and biological regulation processes. Conclusions: CNVs may explain part of the phenotypic diversity and the enhanced adaptation evident in Nguni cattle. Genes involved in a number of cellular components, biological processes and molecular functions are reported within the CNVRs identified. The significance of such CNVRs and their possible effects need to be ascertained and may hold interesting insight into the functional and adaptive consequences of CNVs in cattle.
Affiliation(s)
- Magretha Diane Wang
  - Department of Animal Sciences, University of Stellenbosch, Private Bag X1, Matieland, Stellenbosch, 7602, South Africa; Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort, 0110, South Africa.
- Kennedy Dzama
  - Department of Animal Sciences, University of Stellenbosch, Private Bag X1, Matieland, Stellenbosch, 7602, South Africa.
- Charles A Hefer
  - Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort, 0110, South Africa.
- Farai C Muchadeyi
  - Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort, 0110, South Africa.
11
Taliun D, Gamper J, Pattaro C. Efficient haplotype block recognition of very long and dense genetic sequences. BMC Bioinformatics 2014; 15:10. [PMID: 24423111 PMCID: PMC3898000 DOI: 10.1186/1471-2105-15-10] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 07/09/2013] [Accepted: 12/18/2013] [Indexed: 11/10/2022] Open
Abstract
Background: New sequencing technologies make it possible to scan very long and dense genetic sequences, obtaining datasets of genetic markers that are an order of magnitude larger than previously available. Such genetic sequences are characterized by common alleles interspersed with multiple rarer alleles. This situation has renewed interest in the identification of haplotypes carrying the rare risk alleles. However, large-scale explorations of the linkage disequilibrium (LD) pattern to identify haplotype blocks are not easy to perform, because traditional algorithms have at least Θ(n²) time and memory complexity. Results: We derived three incremental optimizations of the widely used haplotype block recognition algorithm proposed by Gabriel et al. in 2002. Our most efficient solution, called MIG++, has only Θ(n) memory complexity and, on a genome-wide scale, it omits >80% of the calculations, which makes it an order of magnitude faster than the original algorithm. Unlike existing software, MIG++ analyzes the LD between SNPs at any distance, avoiding restrictions on the maximal block length. The haplotype block partition of the entire HapMap II CEPH dataset was obtained in 457 hours. By replacing the standard likelihood-based D′ variance estimator with an approximated estimator, the runtime was further improved. While producing a coarser partition, the approximate method made it possible to obtain the full-genome haplotype block partition of the entire 1000 Genomes Project CEPH dataset in 44 hours, with no restrictions on allele frequency or long-range correlations. These experiments showed that LD-based haplotype blocks can span more than one million base pairs in both the HapMap II and 1000 Genomes datasets. An application to the North American Rheumatoid Arthritis Consortium (NARAC) dataset shows how MIG++ can support genome-wide haplotype association studies. Conclusions: MIG++ makes it possible to perform LD-based haplotype block recognition on genetic sequences of any length and density. In the new-generation sequencing era, this can help identify haplotypes that carry rare variants of interest. The low computational requirements open up the possibility of including the haplotype block structure in genome-wide association scans, downstream analyses, and visual interfaces for online genome browsers.
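To make the quadratic cost mentioned in this abstract concrete, the following minimal sketch scans every pair of SNPs and computes a pairwise LD value. It is not the Gabriel et al. algorithm and not MIG++: it uses the simpler r² statistic on phased haplotypes instead of D′ confidence intervals, and the function names, the 0/1 allele coding, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(hap_b) / n  # frequency of allele 1 at the second SNP
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic SNP gives a zero denominator; report 0.0 in that case.
    return 0.0 if denom == 0 else d * d / denom


def pairwise_ld(haplotypes_by_snp):
    """Naive all-pairs scan: n SNPs give n * (n - 1) / 2 comparisons."""
    n_snps = len(haplotypes_by_snp)
    return {
        (i, j): r_squared(haplotypes_by_snp[i], haplotypes_by_snp[j])
        for i, j in combinations(range(n_snps), 2)
    }


# Toy data: three SNPs observed on four haplotypes (hypothetical values).
snps = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
ld = pairwise_ld(snps)
print(ld)
```

Because the dictionary holds one entry per SNP pair, both time and memory grow quadratically with the number of SNPs, which is the bottleneck the abstract describes for traditional approaches.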
Affiliation(s)
- Daniel Taliun
  - Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), Bozen-Bolzano, Italy.
12
Abstract
Increasing evidence indicates that genes containing disease causal variation have distinct functional and genomic properties. The importance of understanding these properties is highlighted by efforts to filter lists of variants from next-generation sequencing studies, where the number of potentially deleterious variants, which are in fact unrelated to disease, may be large. Available evidence indicates that the majority of disease genes are 'non-essential' and their products occupy functionally peripheral positions in protein networks. They tend to be intermediate between genes that have core biological functions, particularly low mutation rates and low haplotype diversity, and genes for which high haplotype diversity and high mutation rates are advantageous (such as those involved in sensory perception and some immune system functions). Evidence presented here supports these conclusions through analysis of integrated data sets incorporating the latest mutational profiles, linkage disequilibrium structure and other genomic properties of individual genes. The analysis highlights the contrasting functions of genes predicted as least and most likely to contain disease variation and provides a basis for filtering gene variant lists to exclude the least plausible disease candidates.
13
Pengelly RJ, Gibson J, Andreoletti G, Collins A, Mattocks CJ, Ennis S. A SNP profiling panel for sample tracking in whole-exome sequencing studies. Genome Med 2013; 5:89. [PMID: 24070238 PMCID: PMC3978886 DOI: 10.1186/gm492] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 07/22/2013] [Accepted: 09/16/2013] [Indexed: 12/17/2022] Open
Abstract
Whole-exome sequencing provides a cost-effective means to sequence protein coding regions within the genome, which are significantly enriched for etiological variants. We describe a panel of single nucleotide polymorphisms (SNPs) to facilitate the validation of data provenance in whole-exome sequencing studies. This is particularly significant where multiple processing steps necessitate transfer of sample custody between clinical, laboratory and bioinformatics facilities. SNPs captured by all commonly used exome enrichment kits were identified, and filtered for possible confounding properties. The optimised panel provides a simple, yet powerful, method for the assignment of intrinsic, highly discriminatory identifiers to genetic samples.
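As a rough illustration of why a modest SNP panel can act as a highly discriminatory identifier, the sketch below estimates the probability that two unrelated individuals share the same genotype at every SNP in a panel. It assumes Hardy-Weinberg proportions, independent SNPs, and unrelated samples; the panel size of 20 and the allele frequency of 0.5 are hypothetical and are not taken from the paper, and the function name is invented for this example.

```python
def genotype_match_probability(allele_freqs):
    """Probability that two unrelated individuals match at every SNP.

    allele_freqs: one allele frequency p per biallelic SNP.
    Assumes Hardy-Weinberg proportions and independence between SNPs.
    """
    probability = 1.0
    for p in allele_freqs:
        q = 1.0 - p
        # Genotype frequencies under Hardy-Weinberg.
        hom_ref, het, hom_alt = p * p, 2 * p * q, q * q
        # Two people match at this SNP if both fall in the same genotype class.
        probability *= hom_ref ** 2 + het ** 2 + hom_alt ** 2
    return probability


# Hypothetical panel: 20 independent SNPs, each with allele frequency 0.5.
panel = [0.5] * 20
match_probability = genotype_match_probability(panel)
print(f"chance of a random full-panel match: {match_probability:.3e}")
```

Each SNP with allele frequency 0.5 contributes a per-locus match probability of 0.375, so the chance of a coincidental match shrinks geometrically as SNPs are added.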
Affiliation(s)
- Reuben J Pengelly
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK
- Jane Gibson
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK
- Gaia Andreoletti
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK
- Andrew Collins
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK
- Christopher J Mattocks
  - National Genetics Reference Laboratory (Wessex), Salisbury District Hospital, Salisbury SP2 8BJ, UK
- Sarah Ennis
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK