1. MacLeod IM, Bowman PJ, Vander Jagt CJ, Haile-Mariam M, Kemper KE, Chamberlain AJ, Schrooten C, Hayes BJ, Goddard ME. Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits. BMC Genomics 2016; 17:144. [PMID: 26920147 PMCID: PMC4769584 DOI: 10.1186/s12864-016-2443-6] [Citation(s) in RCA: 200] [Received: 09/01/2015] [Accepted: 02/08/2016]
Abstract
BACKGROUND Dense SNP genotypes are often combined with complex trait phenotypes to map causal variants, study genetic architecture and provide genomic predictions for individuals with genotypes but no phenotype. A single method of analysis that jointly fits all genotypes in a Bayesian mixture model (BayesR) has been shown to competitively address all 3 purposes simultaneously. However, BayesR and other similar methods ignore prior biological knowledge and assume all genotypes are equally likely to affect the trait. While this assumption is reasonable for SNP array genotypes, it is less sensible if genotypes are whole-genome sequence variants which should include causal variants. RESULTS We introduce a new method (BayesRC) based on BayesR that incorporates prior biological information in the analysis by defining classes of variants likely to be enriched for causal mutations. The information can be derived from a range of sources, including variant annotation, candidate gene lists and known causal variants. This information is then incorporated objectively in the analysis based on evidence of enrichment in the data. We demonstrate the increased power of BayesRC compared to BayesR using real dairy cattle genotypes with simulated phenotypes. The genotypes were imputed whole-genome sequence variants in coding regions combined with dense SNP markers. BayesRC increased the power to detect causal variants and increased the accuracy of genomic prediction. The relative improvement for genomic prediction was most apparent in validation populations that were not closely related to the reference population. We also applied BayesRC to real milk production phenotypes in dairy cattle using independent biological priors from gene expression analyses. 
Although current biological knowledge of which genes and variants affect milk production is still very incomplete, our results suggest that the new BayesRC method was equal to or more powerful than BayesR for detecting candidate causal variants and for genomic prediction of milk traits. CONCLUSIONS BayesRC provides a novel and flexible approach to simultaneously improving the accuracy of QTL discovery and genomic prediction by taking advantage of prior biological knowledge. Approaches such as BayesRC will become increasingly useful as biological knowledge accumulates regarding functional regions of the genome for a range of traits and species.
Journal Article
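The BayesRC idea above, where each variant belongs to a prior class and each class has its own mixing proportions over the effect-size components, can be illustrated by sampling effects from such a prior. This is a minimal sketch and not the authors' implementation. The four-component grid, with variances of 0, 10⁻⁴, 10⁻³ and 10⁻² times the genetic variance, is the setup commonly described for BayesR and is an assumption here. `sample_prior_effects`, the class labels and the mixing proportions are all hypothetical. The abstract says the prior information is weighted by evidence of enrichment in the data, so in the real method the per-class proportions would be estimated rather than fixed as they are here.

```python
import random

# Assumed BayesR-style grid: each component's variance as a fraction
# of the total additive genetic variance sigma2_g.
VARIANCE_FRACTIONS = (0.0, 1e-4, 1e-3, 1e-2)


def sample_prior_effects(variant_classes, class_mixing, sigma2_g, seed=0):
    """Draw variant effects from a BayesRC-style mixture prior (sketch).

    variant_classes: prior class label of each variant (e.g. "coding").
    class_mixing: class label -> mixing proportions over the components.
    sigma2_g: total additive genetic variance.
    Returns (effects, components), where components[i] is the index of
    the normal component that variant i was drawn from.
    """
    rng = random.Random(seed)
    n_comp = len(VARIANCE_FRACTIONS)
    effects, components = [], []
    for label in variant_classes:
        # Each class has its own mixing proportions. This is the only
        # difference from a single-class (BayesR-style) prior.
        k = rng.choices(range(n_comp), weights=class_mixing[label])[0]
        sd = (VARIANCE_FRACTIONS[k] * sigma2_g) ** 0.5
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
        components.append(k)
    return effects, components


if __name__ == "__main__":
    # Hypothetical classes: "coding" variants put more prior weight on
    # the non-zero components than "other" variants do.
    mixing = {
        "coding": (0.90, 0.05, 0.03, 0.02),
        "other": (0.99, 0.007, 0.002, 0.001),
    }
    labels = ["coding"] * 1000 + ["other"] * 1000
    effects, comps = sample_prior_effects(labels, mixing, sigma2_g=1.0)
    print("non-zero coding:", sum(c > 0 for c in comps[:1000]))
    print("non-zero other:", sum(c > 0 for c in comps[1000:]))
```

If a class puts all of its weight on the zero-variance component, its variants are assumed to have no effect. Shifting weight towards the larger components expresses a prior belief that the class is enriched for causal variants.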
2. Goddard ME, Kemper KE, MacLeod IM, Chamberlain AJ, Hayes BJ. Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture. Proc Biol Sci 2016; 283:20160569. [PMID: 27440663 DOI: 10.1098/rspb.2016.0569] [Citation(s) in RCA: 83] [Received: 03/10/2016] [Accepted: 06/23/2016]
Abstract
Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
Review
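The review's central recommendation is to fit all SNP effects simultaneously under a prior distribution and use the estimates to predict phenotypes. The simplest prior to illustrate this with is a normal distribution with a common variance, which turns the joint fit into ridge regression (often called SNP-BLUP). This is a hedged, dependency-free sketch under that assumption. Other priors, such as the mixture used in BayesR, change how effects are shrunk but not the overall idea. The shrinkage parameter `lam` (residual variance divided by SNP-effect variance) is taken as known, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular.
    """
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_snp_effects(x, y, lam):
    """Jointly estimate all SNP effects: solve (X'X + lam*I) b = X'y.

    x: one row of genotype codes per animal (e.g. 0/1/2 allele counts),
       one column per SNP.
    y: phenotypes, assumed centred so that no intercept is fitted.
    lam: residual variance / SNP-effect variance under a normal prior.
    """
    p = len(x[0])
    xtx = [[sum(row[j] * row[k] for row in x) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(x, y)) for j in range(p)]
    return solve_linear(xtx, xty)


def predict(genotypes, effects):
    """Genomic prediction: sum of genotype codes times estimated effects."""
    return sum(g * b for g, b in zip(genotypes, effects))


if __name__ == "__main__":
    # Toy data (hypothetical): 3 animals, 2 SNPs, phenotypes built from
    # true effects (2, -1) with no noise.
    x = [[1, 0], [0, 1], [1, 1]]
    y = [2.0, -1.0, 1.0]
    print(fit_snp_effects(x, y, lam=0.0))   # least squares, close to [2, -1]
    print(fit_snp_effects(x, y, lam=10.0))  # same effects shrunk toward 0
```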
3. Xiang R, van den Berg I, MacLeod IM, Hayes BJ, Prowse-Wilkins CP, Wang M, Bolormaa S, Liu Z, Rochfort SJ, Reich CM, Mason BA, Vander Jagt CJ, Daetwyler HD, Lund MS, Chamberlain AJ, Goddard ME. Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits. Proc Natl Acad Sci U S A 2019; 116:19398-19408. [PMID: 31501319 PMCID: PMC6765237 DOI: 10.1073/pnas.1904159116] [Citation(s) in RCA: 81]
Abstract
Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) or with metabolite concentration (metabolic quantitative trait loci [mQTLs]), and variants under histone-modification marks in several tissues, were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In an additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
Research Article
4. Pausch H, MacLeod IM, Fries R, Emmerling R, Bowman PJ, Daetwyler HD, Goddard ME. Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle. Genet Sel Evol 2017; 49:24. [PMID: 28222685 PMCID: PMC5320806 DOI: 10.1186/s12711-017-0301-x] [Citation(s) in RCA: 71] [Received: 11/14/2016] [Accepted: 02/14/2017]
Abstract
Background The availability of dense genotypes and whole-genome sequence variants from various sources offers the opportunity to compile large datasets consisting of tens of thousands of individuals with genotypes at millions of polymorphic sites that may enhance the power of genomic analyses. The imputation of missing genotypes ensures that all individuals have genotypes for a shared set of variants. Results We evaluated the accuracy of imputation from dense genotypes to whole-genome sequence variants in 249 Fleckvieh and 450 Holstein cattle using Minimac and FImpute. The sequence variants of a subset of the animals were reduced to the variants that were included on the Illumina BovineHD genotyping array and subsequently inferred in silico using either within- or multi-breed reference populations. The accuracy of imputation varied considerably across chromosomes and dropped at regions where the bovine genome contains segmental duplications. Depending on the imputation strategy, the correlation between imputed and true genotypes ranged from 0.898 to 0.952. The accuracy of imputation was higher with Minimac than FImpute particularly for variants with a low minor allele frequency. Using a multi-breed reference population increased the accuracy of imputation, particularly when FImpute was used to infer genotypes. When the sequence variants were imputed using Minimac, the true genotypes were more highly correlated with predicted allele dosages than with best-guess genotypes. The computing costs to impute 23,256,743 sequence variants in 6958 animals were ten-fold higher with Minimac than FImpute. Association studies with imputed sequence variants revealed seven quantitative trait loci (QTL) for milk fat percentage. Two causal mutations in the DGAT1 and GHR genes were the most significantly associated variants at two QTL on chromosomes 14 and 20 when Minimac was used to infer genotypes.
Conclusions The population-based imputation of millions of sequence variants in large cohorts is computationally feasible and provides accurate genotypes. However, the accuracy of imputation is low in regions where the genome contains large segmental duplications or the coverage with array-derived single nucleotide polymorphisms is poor. Using a reference population that includes individuals from many breeds increases the accuracy of imputation particularly at low-frequency variants. Considering allele dosages rather than best-guess genotypes as explanatory variables is advantageous to detect causal mutations in association studies with imputed sequence variants.
Journal Article
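The accuracy measure in the abstract above is the correlation between imputed and true genotypes, and the same measure can be applied to allele dosages or to best-guess calls. This is a minimal sketch. The genotype values are invented, and `best_guess` simply rounds a dosage to the nearest of 0, 1 or 2. That rounding is an assumption: imputation software derives best-guess calls from genotype probabilities. `pearson` is a plain implementation of the correlation coefficient, and both function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def best_guess(dosage):
    """Hypothetical best-guess call: nearest genotype code in {0, 1, 2}."""
    return min((0, 1, 2), key=lambda g: abs(g - dosage))


if __name__ == "__main__":
    # Invented example for one variant in eight animals.
    true_genotypes = [0, 1, 2, 1, 0, 2, 1, 0]
    dosages = [0.1, 0.6, 1.8, 1.4, 0.4, 1.9, 0.45, 0.2]
    calls = [best_guess(d) for d in dosages]
    print("dosage r:", round(pearson(true_genotypes, dosages), 3))
    print("best-guess r:", round(pearson(true_genotypes, calls), 3))
```

Which of the two correlations is higher depends on the data. The abstract reports that, across many variants imputed with Minimac, dosages came out ahead.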
5. MacLeod IM, Larkin DM, Lewin HA, Hayes BJ, Goddard ME. Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors. Mol Biol Evol 2013; 30:2209-23. [PMID: 23842528 PMCID: PMC3748359 DOI: 10.1093/molbev/mst125] [Citation(s) in RCA: 67]
Abstract
Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges, both in fully utilizing such large data sets and in ensuring that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and for the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer the best-fit demography as the one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
Research Support, Non-U.S. Gov't
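In a single diploid genome, runs of homozygosity can be summarised as the homozygous stretches between consecutive heterozygous sites. The sketch below shows only that bookkeeping step. It assumes heterozygous positions have already been called and, as the abstract requires, corrected for hidden sequencing errors. It is not the authors' multiloci LD predictor or their demographic fitting. `roh_lengths` and `length_histogram` are hypothetical names, and the positions are invented.

```python
def roh_lengths(het_positions, chrom_length):
    """Lengths (bp) of homozygous stretches between heterozygous sites.

    het_positions: 1-based positions of heterozygous sites on one
        chromosome of one individual.
    chrom_length: chromosome length in bp. The stretches before the
        first and after the last heterozygous site are included.
    """
    pos = sorted(het_positions)
    bounds = [0] + pos + [chrom_length + 1]
    # Homozygous bases strictly between two consecutive boundaries.
    return [b - a - 1 for a, b in zip(bounds, bounds[1:])]


def length_histogram(lengths, bin_width):
    """Count runs per length bin, returned as {bin start: count}."""
    hist = {}
    for length in lengths:
        start = (length // bin_width) * bin_width
        hist[start] = hist.get(start, 0) + 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    lengths = roh_lengths([1_000, 1_500, 9_000], chrom_length=10_000)
    print(lengths)                            # [999, 499, 7499, 1000]
    print(length_histogram(lengths, 1_000))   # {0: 2, 1000: 1, 7000: 1}
```

The observed histogram plays the role of the "distribution of runs of homozygosity" that a demographic model is asked to match.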
6. Xiang R, MacLeod IM, Daetwyler HD, de Jong G, O’Connor E, Schrooten C, Chamberlain AJ, Goddard ME. Genome-wide fine-mapping identifies pleiotropic and functional variants that predict many traits across global cattle populations. Nat Commun 2021; 12:860. [PMID: 33558518 PMCID: PMC7870883 DOI: 10.1038/s41467-021-21001-0] [Citation(s) in RCA: 54] [Received: 02/18/2020] [Accepted: 11/23/2020]
Abstract
The difficulty in finding causative mutations has hampered their use in genomic prediction. Here, we present a methodology to fine-map potentially causal variants genome-wide by integrating the functional, evolutionary and pleiotropic information of variants using GWAS, variant clustering and Bayesian mixture models. Our analysis of 17 million sequence variants in 44,000+ Australian dairy cattle for 34 traits suggests that, on average, one pleiotropic QTL exists in each 50-kb chromosome segment. We selected a set of 80k variants representing potentially causal variants within each chromosome segment to develop a bovine XT-50K genotyping array. The custom array contains many pleiotropic variants with biological functions, including splicing QTLs and variants at conserved sites across 100 vertebrate species. This biology-informed custom array outperformed the standard array in predicting genetic value of multiple traits across populations in independent datasets of 90,000+ dairy cattle from the USA, Australia and New Zealand.
Research Article
7. Beynon SE, Slavov GT, Farré M, Sunduimijid B, Waddams K, Davies B, Haresign W, Kijas J, MacLeod IM, Newbold CJ, Davies L, Larkin DM. Population structure and history of the Welsh sheep breeds determined by whole genome genotyping. BMC Genet 2015; 16:65. [PMID: 26091804 PMCID: PMC4474581 DOI: 10.1186/s12863-015-0216-x] [Citation(s) in RCA: 47] [Received: 11/28/2014] [Accepted: 05/13/2015]
Abstract
Background One of the most economically important areas within the Welsh agricultural sector is sheep farming, contributing around £230 million to the UK economy annually. Phenotypic selection over several centuries has generated a number of native sheep breeds, which are presumably adapted to the diverse and challenging landscape of Wales. Little is known about the history, genetic diversity and relationships of these breeds with other European breeds. We genotyped 353 individuals from 18 native Welsh sheep breeds using the Illumina OvineSNP50 array and characterised the genetic structure of these breeds. Our genotyping data were then combined with, and compared to, those from a set of 74 worldwide breeds, previously collected during the International Sheep Genome Consortium HapMap project. Results Model-based clustering of the Welsh and European breeds indicated shared ancestry. This finding was supported by multidimensional scaling analysis (MDS), which revealed separation of the European, African and Asian breeds. As expected, the commercial Texel and Merino breeds appeared to have extensive co-ancestry with most European breeds. Consistently high levels of haplotype sharing were observed between native Welsh and other European breeds. The Welsh breeds did not, however, form a genetically homogeneous group, with pairwise FST between breeds averaging 0.107 and ranging between 0.020 and 0.201. Four subpopulations were identified within the 18 native breeds, with high homogeneity observed amongst the majority of mountain breeds. Recent effective population sizes estimated from linkage disequilibrium ranged from 88 to 825. Conclusions Welsh breeds are highly diverse with low to moderate effective population sizes and form at least four distinct genetic groups. Our data suggest common ancestry between the native Welsh and European breeds.
These findings provide the basis for future genome-wide association studies and a first step towards developing genomics-assisted breeding strategies in the UK.
Research Support, Non-U.S. Gov't
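Pairwise FST values like those reported above are computed from allele frequencies in each breed. The abstract does not say which estimator was used, so the sketch below swaps in a simple ratio-of-averages form of Nei's heterozygosity-based definition, FST = (HT - HS) / HT. It ignores the sample-size corrections that estimators such as Weir and Cockerham's apply, and it is meant only to show what the statistic measures. `pairwise_fst` is a hypothetical name and the frequencies are invented.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Ratio-of-averages FST between two populations (sketch).

    freqs_a, freqs_b: frequency of the same allele at each SNP in
        population A and population B.
    HS is the mean within-population expected heterozygosity and HT the
    expected heterozygosity of the pooled (equal-weight) population.
    Both are summed over SNPs before the ratio is taken.
    """
    hs_total = ht_total = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        hs_total += (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        p_bar = (pa + pb) / 2
        ht_total += 2 * p_bar * (1 - p_bar)
    if ht_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    return (ht_total - hs_total) / ht_total


if __name__ == "__main__":
    # Invented allele frequencies at four SNPs in two breeds.
    print(round(pairwise_fst([0.1, 0.5, 0.8, 0.3], [0.2, 0.4, 0.6, 0.3]), 4))
```

Identical frequencies give 0, and SNPs fixed for different alleles give 1, which brackets the 0.020 to 0.201 range reported for the Welsh breeds.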
8. Bolormaa S, Chamberlain AJ, Khansefid M, Stothard P, Swan AA, Mason B, Prowse-Wilkins CP, Duijvesteijn N, Moghaddar N, van der Werf JH, Daetwyler HD, MacLeod IM. Accuracy of imputation to whole-genome sequence in sheep. Genet Sel Evol 2019; 51:1. [PMID: 30654735 PMCID: PMC6337865 DOI: 10.1186/s12711-018-0443-5] [Citation(s) in RCA: 36] [Received: 06/19/2018] [Accepted: 12/18/2018]
Abstract
Background The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep. Results The accuracy of imputation from the Ovine Infinium® HD BeadChip SNP (~ 500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of < 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R2) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R2 below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R2 in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R2 ≤ 0.4. 
Conclusions The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R2) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.
Journal Article
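The filtering step described above discards imputed variants whose Minimac3 R2 is below 0.4 and then recomputes the mean empirical accuracy. It is simple to express in code. This sketch represents each variant as a hypothetical `(r2, empirical_accuracy)` pair and uses invented numbers. It does not read Minimac3 output files.

```python
def filter_by_r2(variants, threshold=0.4):
    """Keep variants whose imputation quality R2 is at least `threshold`.

    variants: iterable of (r2, empirical_accuracy) pairs (hypothetical
        representation; R2 as reported by the imputation software).
    """
    return [v for v in variants if v[0] >= threshold]


def mean_accuracy(variants):
    """Mean empirical accuracy of a non-empty list of (r2, accuracy) pairs."""
    return sum(acc for _, acc in variants) / len(variants)


if __name__ == "__main__":
    # Invented values for five variants.
    variants = [(0.95, 0.98), (0.80, 0.93), (0.35, 0.55), (0.20, 0.40),
                (0.99, 0.99)]
    kept = filter_by_r2(variants)
    print(len(kept), "of", len(variants), "variants kept")
    print("mean accuracy before:", round(mean_accuracy(variants), 3))
    print("mean accuracy after:", round(mean_accuracy(kept), 3))
```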
9. MacLeod IM, Hayes BJ, Savin KW, Chamberlain AJ, McPartlan HC, Goddard ME. Power of a genome scan to detect and locate quantitative trait loci in cattle using dense single nucleotide polymorphisms. J Anim Breed Genet 2010; 127:133-42. [PMID: 20433522 DOI: 10.1111/j.1439-0388.2009.00831.x] [Citation(s) in RCA: 32]
Abstract
There is increasing use of dense single nucleotide polymorphisms (SNPs) for whole-genome association studies (WGAS) in livestock to map and identify quantitative trait loci (QTL). These studies rely on linkage disequilibrium (LD) to detect an association between SNP genotypes and phenotypes. The power and precision of these WGAS are unknown, and will depend on the extent of LD in the experimental population. One complication for WGAS in livestock populations is that they typically consist of many paternal half-sib families, and in some cases full-sib families; unless this subtle population stratification is accounted for, many spurious associations may be reported. Our aim was to investigate the power, precision and false discovery rates of WGAS for QTL discovery, with a commercial SNP array, given existing patterns of LD in cattle. We also tested the efficiency of selective genotyping animals. A total of 365 cattle were genotyped for 9232 SNPs. We simulated a QTL effect as well as polygenic and environmental effects for all animals. One QTL was simulated on a randomly chosen SNP and accounted for 5%, 10% or 18% of the total variance. The power to detect a moderate-sized additive QTL (5% of the phenotypic variance) with 365 animals genotyped was 37% (p < 0.001). Most importantly, if pedigree structure was not accounted for, the number of false positives significantly increased above that expected by chance alone. Selective genotyping also resulted in a significant increase in false positives, even when pedigree structure was accounted for.
Research Support, Non-U.S. Gov't
10. Cheruiyot EK, Haile-Mariam M, Cocks BG, MacLeod IM, Xiang R, Pryce JE. New loci and neuronal pathways for resilience to heat stress in cattle. Sci Rep 2021; 11:16619. [PMID: 34404823 PMCID: PMC8371109 DOI: 10.1038/s41598-021-95816-8] [Citation(s) in RCA: 28] [Received: 02/28/2021] [Accepted: 07/30/2021]
Abstract
While understanding the genetic basis of heat tolerance is crucial in the context of global warming's effect on humans, livestock, and wildlife, the specific genetic variants and biological features that confer thermotolerance in animals are still not well characterized. We used dairy cows as a model to study heat tolerance because they are lactating and therefore often prone to thermal stress. The data comprised almost 0.5 million milk records (milk, fat, and protein) of 29,107 Australian Holsteins, each having around 15 million imputed sequence variants. Dairy animals often reduce their milk production when temperature and humidity rise; thus, the phenotypes used to measure an individual's heat tolerance were defined as the rate of milk production decline (slope traits) with a rising temperature-humidity index. With these slope traits, we performed a genome-wide association study (GWAS) using different approaches, including conditional analyses, to correct for the relationship between heat tolerance and level of milk production. The results revealed multiple novel loci for heat tolerance, including 61 potential functional variants at sites highly conserved across 100 vertebrate species. Moreover, specific candidate variants and genes for heat tolerance were related to the neuronal system (ITPR1, ITPR2, and GRIA4) and to neuroactive ligand-receptor interaction (NPFFR2, CALCR, and GHR), providing novel insight that could help in developing genetic and management approaches to combat heat stress.
Research Article
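The heat-tolerance phenotype above is the rate at which milk production declines as the temperature-humidity index (THI) rises. A minimal way to picture it is an ordinary least-squares slope of one animal's test-day yields on THI. The abstract does not give the exact model. A real analysis would likely include fixed effects, THI thresholds or random regression terms, all of which this simplified illustration ignores. `milk_slope` and the records are hypothetical.

```python
def milk_slope(thi, yields):
    """Least-squares slope of yield on THI (yield units per THI unit).

    A negative value means production falls as heat load rises. A value
    closer to zero would indicate a more heat-tolerant animal.
    """
    n = len(thi)
    mx, my = sum(thi) / n, sum(yields) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(thi, yields))
    sxx = sum((x - mx) ** 2 for x in thi)
    return sxy / sxx


if __name__ == "__main__":
    # Invented test-day records for one cow: THI and milk yield (kg/day).
    thi = [60, 65, 70, 75, 80]
    milk = [30.0, 29.6, 28.9, 28.1, 27.0]
    print(round(milk_slope(thi, milk), 3))
```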
11. van den Berg I, Meuwissen THE, MacLeod IM, Goddard ME. Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J Dairy Sci 2019; 102:3155-3174. [PMID: 30738664 DOI: 10.3168/jds.2018-15231] [Citation(s) in RCA: 27] [Received: 06/18/2018] [Accepted: 12/08/2018]
Abstract
Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method based on the prediction accuracies obtained by cross-validation with a small, single-breed reference population predicted the accuracy with a multibreed reference population well, giving a reasonably close estimate, and slightly overestimated the accuracy for a larger reference population of the same breed. This method could be useful for making decisions regarding the size and composition of the reference population.
Journal Article
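The abstract above refers to predicting accuracy from the number of independent chromosome segments (Me). One widely used deterministic formula of this kind is r = sqrt(N h² / (N h² + Me)), where N is the number of reference animals and h² the heritability. Whether this is exactly the formula evaluated in the paper is an assumption. The sketch only shows how such a formula converts reference population size into expected accuracy. `expected_accuracy` is a hypothetical name, and the Me value in the example is invented.

```python
import math


def expected_accuracy(n_ref, h2, m_e):
    """Deterministic expected accuracy of genomic prediction (sketch).

    n_ref: number of phenotyped, genotyped reference animals.
    h2: heritability of the phenotype used in the reference population.
    m_e: number of independent chromosome segments (assumed known).
    """
    nh2 = n_ref * h2
    return math.sqrt(nh2 / (nh2 + m_e))


if __name__ == "__main__":
    # Hypothetical Me. In practice it must be estimated, and the abstract
    # reports that GRM-based estimates led to over-optimistic accuracies.
    for n in (1_000, 5_000, 20_000):
        print(n, round(expected_accuracy(n, h2=0.3, m_e=5_000), 3))
```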
12. Xiang R, Hayes BJ, Vander Jagt CJ, MacLeod IM, Khansefid M, Bowman PJ, Yuan Z, Prowse-Wilkins CP, Reich CM, Mason BA, Garner JB, Marett LC, Chen Y, Bolormaa S, Daetwyler HD, Chamberlain AJ, Goddard ME. Genome variants associated with RNA splicing variations in bovine are extensively shared between tissues. BMC Genomics 2018; 19:521. [PMID: 29973141 PMCID: PMC6032541 DOI: 10.1186/s12864-018-4902-8] [Citation(s) in RCA: 25] [Received: 11/16/2017] [Accepted: 06/27/2018]
Abstract
Background Mammalian phenotypes are shaped by numerous genome variants, many of which may regulate gene transcription or RNA splicing. To identify variants with regulatory functions in cattle, an important economic and model species, we used sequence variants to map a type of expression quantitative trait loci (expression QTLs) that are associated with variations in RNA splicing, i.e., sQTLs. To further the understanding of regulatory variants, sQTLs were compared with two other types of expression QTLs in different tissues: 1) variants associated with variations in gene expression, i.e., geQTLs, and 2) variants associated with variations in exon expression, i.e., eeQTLs. Results Using whole genome and RNA sequence data from four tissues of over 200 cattle, sQTLs identified using exon inclusion ratios were verified by matching their effects on adjacent intron excision ratios. Compared to eeQTLs and geQTLs, sQTLs contained the highest percentage of variants within the intronic regions of genes and the lowest percentage of variants within intergenic regions. Many geQTLs and sQTLs are also detected as eeQTLs. Many expression QTLs, including sQTLs, were significant in all four tissues and had a similar effect in each tissue. To verify such expression QTL sharing between tissues, variants surrounding (±1 Mb) the exon or gene were used to build local genomic relationship matrices (LGRM) and to estimate genetic correlations between tissues. For many exons, the splicing and expression levels were determined by the same cis additive genetic variance in different tissues. Thus, an effective but simple-to-implement meta-analysis combining information from three tissues is introduced to increase power to detect and validate sQTLs. sQTLs and eeQTLs together were more enriched for variants associated with cattle complex traits, compared to geQTLs.
Several putative causal mutations were identified, including an sQTL at Chr6:87392580 within the 5th exon of kappa casein (CSN3) associated with milk production traits. Conclusions Using novel analytical approaches, we report the first identification of numerous bovine sQTLs which are extensively shared between multiple tissue types. The significant overlaps between bovine sQTLs and complex-trait QTL highlight the contribution of regulatory mutations to phenotypic variations.
Journal Article
13. Xiang R, MacLeod IM, Bolormaa S, Goddard ME. Genome-wide comparative analyses of correlated and uncorrelated phenotypes identify major pleiotropic variants in dairy cattle. Sci Rep 2017; 7:9248. [PMID: 28835686 PMCID: PMC5569018 DOI: 10.1038/s41598-017-09788-9] [Citation(s) in RCA: 25] [Received: 04/05/2017] [Accepted: 07/31/2017]
Abstract
While single nucleotide polymorphisms (SNPs) associated with multiple phenotypes have been reported, knowledge of pleiotropy among uncorrelated phenotypes is minimal. Principal components (PCs) and uncorrelated Cholesky-transformed traits (CTs) were constructed from 25 raw traits (RTs) of 2841 dairy bulls. Multi-trait meta-analyses of single-trait genome-wide association studies for RTs, PCs and CTs in bulls were validated in 6821 cows. Most PCs and CTs had substantial estimates of heritability, suggesting that genes affect phenotype via diverse pathways. Phenotypic orthogonalization did not eliminate pleiotropy: 368 significant pleiotropic SNPs (p < 1 × 10⁻⁵) were shared by the meta-analyses of RTs, PCs and CTs, which individually identified 416, 466 and 425 such SNPs, respectively. From this overlap we identified 21 lead SNPs with a 100% validation rate that formed two clusters. One cluster consisted of DGAT1 (chr14:1.8 M+), MGST1 (chr5:93 M+), PAEP (chr11:103 M+) and GPAT4 (chr27:36 M+), affecting protein, milk and fat yield. The other included CSN2 (chr6:87 M+), MUC1 (chr3:15.6 M), GHR (chr20:31.2 M+) and SDC2 (chr14:70 M+), affecting protein and milk yield. Combining beef cattle data, we identified correlated SNPs representing the CAPN1 (chr29:44 M+) and CAST (chr7:96 M+) loci, which affect beef tenderness and showed pleiotropic effects in dairy cattle. Our findings show that SNPs with a large effect on one trait are likely to have small effects on other uncorrelated traits.
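The two decorrelation schemes mentioned above can be sketched as follows. This assumes both are applied to the phenotypic covariance matrix estimated from the data; the study may have used a different covariance estimate or trait order. For a covariance S = LLᵀ, the Cholesky-transformed traits are L⁻¹ applied to the centred phenotypes. The principal components project the phenotypes onto the eigenvectors of S. The function names are illustrative.

```python
import numpy as np

def cholesky_transform(Y):
    """Decorrelate traits (columns of Y) via the Cholesky factor of their covariance."""
    Yc = Y - Y.mean(axis=0)
    S = np.cov(Yc, rowvar=False)
    L = np.linalg.cholesky(S)              # S = L @ L.T, L lower triangular
    return np.linalg.solve(L, Yc.T).T      # rows are L^-1 applied to each animal's traits

def principal_components(Y):
    """Project centred traits onto covariance eigenvectors, largest variance first."""
    Yc = Y - Y.mean(axis=0)
    S = np.cov(Yc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    return Yc @ eigvec[:, order]

# Illustrative correlated data only
rng = np.random.default_rng(0)
Y = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
ct = cholesky_transform(Y)
pc = principal_components(Y)
```

Unlike PCs, Cholesky-transformed traits depend on trait order. The first CT is simply the standardised first raw trait. Each later CT is the part of a trait not explained by the traits before it.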
14. Turner MJ, MacLeod IM, Rothberg AD. Effects of temperature and composition on the viscosity of respiratory gases. J Appl Physiol (1985) 1989; 67:472-7. [PMID: 2503494 DOI: 10.1152/jappl.1989.67.1.472] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
The steady-state sensitivity of resistance pneumotachographs is proportional to viscosity. Dynamic characteristics of pneumotachographs, pressure transducers, and mass spectrometers are also viscosity dependent. We derive linear equations to approximate the viscosities of O2, N2, CO2, H2O, He, N2O, and Ar for temperatures between 20 and 40 degrees C by using published viscosity data and a nonlinear extrapolation equation. We verify the accuracy of the extrapolation equation by comparison with published data. Our linear equations for pure gas viscosities yield standard errors less than 0.35 microP. We also compare a nonlinear equation for calculating the viscosities of mixtures of gases with published measured viscosities of dry air, humid air, and He-O2 and N2-CO2 mixtures. The maximum difference between published and calculated values is 1.3% for 10% CO2 in N2. All other differences are less than 0.38%. For saturated humid air at 35 degrees C, a linear concentration-weighted combination of viscosities differs from our nonlinear equation by 4.9, 2.1, and 1.7% at barometric pressures of 32, 83, and 100 kPa, respectively. By use of our method, the viscosity of normal respiratory gases can be calculated to within 1% of measured values.
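The abstract does not state which nonlinear mixture equation it uses. As an assumption, the sketch below uses Wilke's semi-empirical mixing rule, a widely used nonlinear formula for gas-mixture viscosity. It sets that rule alongside the linear concentration-weighted combination that the abstract compares against. Inputs are mole fractions, pure-gas viscosities in any consistent unit, and molar masses. The function names are hypothetical.

```python
import math

def wilke_mixture_viscosity(x, mu, M):
    """Wilke's mixing rule (assumed here, not necessarily the paper's equation).

    x: mole fractions, mu: pure-gas viscosities, M: molar masses.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        denom = 0.0
        for j in range(n):
            # Interaction factor phi_ij; equals 1 when i == j
            phi = (1 + math.sqrt(mu[i] / mu[j]) * (M[j] / M[i]) ** 0.25) ** 2 \
                / math.sqrt(8 * (1 + M[i] / M[j]))
            denom += x[j] * phi
        total += x[i] * mu[i] / denom
    return total

def linear_mixture_viscosity(x, mu):
    """Concentration-weighted average of pure-gas viscosities."""
    return sum(xi * mi for xi, mi in zip(x, mu))
```

For a single gas, or for a mixture of identical gases, every interaction factor equals 1, so both functions return the pure-gas viscosity. They diverge when the components differ in molar mass and viscosity.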
15. van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, Bolormaa S, Goddard ME. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet Sel Evol 2017; 49:70. [PMID: 28934948 PMCID: PMC5609075 DOI: 10.1186/s12711-017-0347-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/04/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022]
Abstract
Background The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analyses of real data do not always show an increase in accuracy from sequence data compared with high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, tested the effect of dropping a proportion of variants during the analysis, and tested how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. Results With the simulated dataset, prediction accuracy for a breed that was not represented in the training population increased substantially with sequence data compared with HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared with analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies similar to those obtained with the HD SNPs.
Conclusions We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0347-9) contains supplementary material, which is available to authorized users.
16. Xiang R, van den Berg I, MacLeod IM, Daetwyler HD, Goddard ME. Effect direction meta-analysis of GWAS identifies extreme, prevalent and shared pleiotropy in a large mammal. Commun Biol 2020; 3:88. [PMID: 32111961 PMCID: PMC7048789 DOI: 10.1038/s42003-020-0823-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/28/2019] [Accepted: 02/06/2020] [Indexed: 12/17/2022]
Abstract
In genome-wide association studies (GWAS), variants showing consistent effect directions across populations are considered true discoveries. We model this information in an Effect Direction MEta-analysis (EDME) to quantify pleiotropy using GWAS of 34 Cholesky-decorrelated traits in 44,000+ cattle with sequence variants. The agreement in effect direction between independent bull and cow datasets was used to quantify the false discovery rate by effect direction (FDRed) and the number of traits affected by prioritised variants. Variants with multi-trait p < 1 × 10⁻⁶ affected 1 to 22 traits, with an average of 10 traits. EDME assigns pleiotropic variants to each trait, which informs the biology behind complex traits. New pleiotropic loci were identified, including signals from the cattle FTO locus that mirror its bystander effects on human obesity. When validated in the 1000-Bull Genome database, the prioritised pleiotropic variants consistently predicted expected phenotypic differences between dairy and beef cattle. EDME provides robust approaches to control the GWAS FDR and to quantify pleiotropy. Xiang et al. developed an Effect Direction Meta-analysis (EDME) approach to identify true pleiotropy. They used Cholesky transformation to decorrelate the traits and identified many pleiotropic variants that consistently predicted phenotypic differences in cattle.
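A simplified sketch can illustrate how a false discovery rate might be estimated from effect-direction agreement. It rests on two assumptions of our own, not necessarily the exact EDME formula. True associations are assumed to replicate their effect direction in the independent dataset. False discoveries are assumed to agree only by chance, that is, half of the time. Under these assumptions the discordance rate is half the false-discovery proportion, so FDR ≈ 2 × (proportion of discordant directions). As a worked example, suppose 100 prioritised variants include 20 false signals. About 10 of those would disagree in sign, giving 2 × 10/100 = 0.2. The function name is hypothetical.

```python
import numpy as np

def fdr_from_direction(effects_discovery, effects_validation):
    """Simplified FDR estimate from effect-direction agreement (hypothetical helper)."""
    agree = np.sign(effects_discovery) == np.sign(effects_validation)
    discordant = 1.0 - agree.mean()
    # False discoveries agree by chance 50% of the time, so double the discordance
    return min(1.0, 2.0 * discordant)

# Worked example: 80 true signals agree; of 20 false ones, 10 agree and 10 disagree
discovery = np.ones(100)
validation = np.ones(100)
validation[80:90] = -1.0
estimate = fdr_from_direction(discovery, validation)
```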
17. Dorji J, MacLeod IM, Chamberlain AJ, Vander Jagt CJ, Ho PN, Khansefid M, Mason BA, Prowse-Wilkins CP, Marett LC, Wales WJ, Cocks BG, Pryce JE, Daetwyler HD. Mitochondrial protein gene expression and the oxidative phosphorylation pathway associated with feed efficiency and energy balance in dairy cattle. J Dairy Sci 2020; 104:575-587. [PMID: 33162069 DOI: 10.3168/jds.2020-18503] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/10/2020] [Accepted: 08/20/2020] [Indexed: 12/12/2022]
Abstract
Feed efficiency and energy balance are important traits underpinning profitability and environmental sustainability in animal production. They are complex traits, and our understanding of their underlying biology is currently limited. One measure of feed efficiency is residual feed intake (RFI), which is the difference between actual and predicted intake. Variation in RFI among individuals is attributable to the metabolic efficiency of energy utilization. High RFI (H_RFI) animals require more energy per unit of weight gain or milk produced compared with low RFI (L_RFI) animals. Energy balance (EB) is a closely related trait calculated very similarly to RFI. Cellular energy metabolism in mitochondria involves mitochondrial protein (MiP) encoded by both nuclear (NuMiP) and mitochondrial (MtMiP) genomes. We hypothesized that MiP genes are differentially expressed (DE) between H_RFI and L_RFI animal groups and similarly between negative and positive EB groups. Our study aimed to characterize MiP gene expression in white blood cells of H_RFI and L_RFI cows using RNA sequencing to identify genes and biological pathways associated with feed efficiency in dairy cattle. We used the top and bottom 14 cows ranked for RFI and EB out of 109 animals as H_RFI and L_RFI, and positive and negative EB groups, respectively. The gene expression counts across all nuclear and mitochondrial genes for animals in each group were used for differential gene expression analyses, weighted gene correlation network analysis, functional enrichment, and identification of hub genes. Out of 244 DE genes between RFI groups, 38 were MiP genes. The DE genes were enriched for the oxidative phosphorylation (OXPHOS) and ribosome pathways. The DE MiP genes were underexpressed in L_RFI (and negative EB) compared with the H_RFI (and positive EB) groups, suggestive of reduced mitochondrial activity in the L_RFI group. 
None of the MtMiP genes were among the DE MiP genes between the groups, which suggests a non-rate limiting role of MtMiP genes in feed efficiency and warrants further investigation. The role of MiP, particularly the NuMiP and OXPHOS pathways in RFI, was also supported by our gene correlation network analysis and the hub gene identification. We validated the findings in an independent data set. Overall, our study suggested that differences in feed efficiency in dairy cows may be linked to differences in cellular energy demand. This study broadens our knowledge of the biology of feed efficiency in dairy cattle.
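Residual feed intake is defined above as actual minus predicted intake. It is usually obtained as the residual from a regression of intake on energy sinks. The sketch below assumes ordinary least squares with an intercept. The predictors (for example, milk energy and metabolic body weight) and the function names are hypothetical. The group size of 14 out of 109 mirrors the top and bottom groups described above.

```python
import numpy as np

def residual_feed_intake(intake, energy_sinks):
    """RFI as the OLS residual of intake on energy sinks (hypothetical predictors).

    intake: (n,) observed intake; energy_sinks: (n, k) predictor matrix.
    """
    X = np.column_stack([np.ones(len(intake)), energy_sinks])
    beta, *_ = np.linalg.lstsq(X, intake, rcond=None)
    return intake - X @ beta          # actual minus predicted intake

def extreme_groups(values, k):
    """Indices of the k highest and k lowest values."""
    order = np.argsort(values)
    return order[-k:], order[:k]

# Illustrative simulated cows only
rng = np.random.default_rng(2)
sinks = rng.normal(size=(109, 2))
intake = 20 + sinks @ np.array([1.5, 0.8]) + rng.normal(scale=0.5, size=109)
rfi = residual_feed_intake(intake, sinks)
high_rfi, low_rfi = extreme_groups(rfi, 14)
```

Positive RFI values identify cows that ate more than predicted (the less efficient H_RFI group). Negative values identify the L_RFI group.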
18. Wang M, Hancock TP, MacLeod IM, Pryce JE, Cocks BG, Hayes BJ. Putative enhancer sites in the bovine genome are enriched with variants affecting complex traits. Genet Sel Evol 2017; 49:56. [PMID: 28683716 PMCID: PMC5499214 DOI: 10.1186/s12711-017-0331-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/08/2017] [Accepted: 06/26/2017] [Indexed: 12/31/2022]
Abstract
Background Enhancers are non-coding DNA sequences that, when bound by specific proteins, increase the level of gene transcription. Enhancers activate unique gene expression patterns within cells of different types or under different conditions. Enhancers are key contributors to gene regulation, and causative variants that affect quantitative traits in humans and mice have been located in enhancer regions. However, in the bovine genome, enhancers and other regulatory elements are not yet well defined. In this paper, we sought to improve the annotation of bovine enhancer regions by using publicly available mammalian enhancer information. To test whether the identified putative bovine enhancer regions are enriched for functional variants that affect milk production traits, we performed genome-wide association studies using imputed whole-genome sequence data, followed by meta-analysis and enrichment analysis. Results We produced a library of candidate bovine enhancer regions by combining publicly available bovine ChIP-Seq enhancer data with enhancer data identified through sequence homology with human and mouse enhancer databases. Both permutation tests and gene set enrichment analysis showed that imputed whole-genome sequence variants associated with milk production traits in 16,581 dairy cattle were enriched in enhancer regions marked by bovine-liver H3K4me3 and H3K27ac histone modifications. Enhancer regions identified through sequence homology with human and mouse enhancer regions were not as strongly enriched for trait-associated sequence variants as the bovine ChIP-Seq candidate enhancer regions. The enriched bovine ChIP-Seq enhancer regions were located near genes and quantitative trait loci that are associated with pregnancy, growth, disease resistance, meat quality and quantity, and milk quality and quantity traits in dairy and beef cattle.
Conclusions Our results suggest that sequence variants within enhancer regions that are located in bovine non-coding genomic regions contribute to the variation in complex traits. The level of enrichment was higher in bovine-specific enhancer regions that were identified by detecting histone modifications H3K4me3 and H3K27ac in bovine liver tissues than in enhancer regions identified by sequence homology with human and mouse data. These results highlight the need to use bovine-specific experimental data for the identification of enhancer regions. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0331-4) contains supplementary material, which is available to authorized users.
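A permutation test for enrichment can be sketched as follows. It assumes each variant carries two boolean flags: whether it is significantly associated, and whether it lies inside an annotated enhancer. The simple label shuffle used here ignores linkage disequilibrium between nearby variants. Real analyses often preserve that structure, for example by shifting annotations along the genome, so treat this as an illustration only. The names are hypothetical.

```python
import numpy as np

def permutation_enrichment(is_significant, in_region, n_perm=10_000, seed=0):
    """One-sided test: are significant variants over-represented in annotated regions?

    Returns (observed overlap, mean overlap under the null, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    observed = np.sum(is_significant & in_region)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle significance labels across variants, keeping the region flags fixed
        null[b] = np.sum(rng.permutation(is_significant) & in_region)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null.mean(), p_value

# Illustrative data: 50 of 100 significant variants placed inside ~10% of the genome
rng = np.random.default_rng(3)
in_region = rng.random(2000) < 0.1
is_sig = np.zeros(2000, dtype=bool)
is_sig[np.flatnonzero(in_region)[:50]] = True
is_sig[np.flatnonzero(~in_region)[:50]] = True
obs, expected, p = permutation_enrichment(is_sig, in_region, n_perm=2000)
```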
19. Turner MJ, MacLeod IM, Rothberg AD. Calibration of Fleisch and screen pneumotachographs for use with various oxygen concentrations. Med Biol Eng Comput 1990; 28:200-4. [PMID: 2376998 DOI: 10.1007/bf02441780] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
20. Khansefid M, Goddard ME, Haile-Mariam M, Konstantinov KV, Schrooten C, de Jong G, Jewell EG, O’Connor E, Pryce JE, Daetwyler HD, MacLeod IM. Improving Genomic Prediction of Crossbred and Purebred Dairy Cattle. Front Genet 2020; 11:598580. [PMID: 33381150 PMCID: PMC7767986 DOI: 10.3389/fgene.2020.598580] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/25/2020] [Accepted: 11/19/2020] [Indexed: 11/17/2022]
Abstract
This study assessed the accuracy and bias of genomic prediction (GP) in purebred Holstein (H) and Jersey (J) as well as crossbred (H and J) validation cows using different reference sets and prediction strategies. The reference sets were made up of different combinations of 36,695 H and J purebreds and crossbreds. Additionally, the effect of using different sets of marker genotypes on GP was studied (conventional panel: 50k; custom panel enriched with, or close to, causal mutations: XT_50k; and conventional high-density with a limited custom set: pruned HDnGBS). We also compared genomic best linear unbiased prediction (GBLUP) and Bayesian (emBayesR) models, and the traits tested were milk, fat, and protein yields. On average, including crossbred cows in the reference population increased prediction accuracies by 0.01-0.08 and made them less biased (regression coefficient closer to 1 by 0.02-0.16), and the benefit was greater for crossbreds than for purebreds. Using XT_50k instead of 50k genotypes increased prediction accuracy by 0.02 without affecting bias. Although using pruned HDnGBS instead of 50k also increased prediction accuracy by about 0.02, it increased the bias of purebred predictions in emBayesR models. Generally, emBayesR outperformed GBLUP in prediction accuracy when using 50k or pruned HDnGBS genotypes, but the benefits diminished with XT_50k genotypes. Crossbred predictions derived from a joint pure H and J reference were similar in accuracy to crossbred predictions derived separately from the two purebred reference sets and combined in proportion to breed composition. However, the latter approach was less biased by 0.13. Notably, using an equalized breed reference instead of an H-dominated reference, on average, reduced the bias of prediction by 0.16-0.19 and increased accuracy by 0.04 for crossbred and J cows, with little change in H accuracy.
In conclusion, we observed improved genomic predictions for both crossbreds and purebreds by equalizing breed contributions in a mixed-breed reference that included crossbred cows. Furthermore, we demonstrate that, compared with the conventional 50k or high-density panels, our customized set of 50k sequence markers improved or matched prediction accuracy and reduced bias with both GBLUP and Bayesian models.
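The accuracy and bias figures quoted above are commonly computed from the validation set in two ways. Accuracy is the correlation between predictions and observed values. Bias is the slope of the regression of observed values on predictions, where a slope of 1 indicates no bias. The sketch below assumes those definitions. The study may have used corrected phenotypes or additional scaling, and the function name is hypothetical.

```python
import numpy as np

def accuracy_and_bias(observed, predicted):
    """Validation accuracy (correlation) and bias (slope of observed on predicted)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    accuracy = np.corrcoef(observed, predicted)[0, 1]
    # np.cov and np.var both use ddof=1 here, so the ratio is the OLS slope
    slope = np.cov(observed, predicted)[0, 1] / np.var(predicted, ddof=1)
    return accuracy, slope

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
acc_same, slope_same = accuracy_and_bias(obs, obs)
acc_wide, slope_wide = accuracy_and_bias(obs, [2 * v for v in obs])
```

A slope below 1 means the predictions are overdispersed (too spread out). A slope above 1 means they are underdispersed.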
21. Turner MJ, MacLeod IM, Rothberg AD. Effect of airway inertance on linear regression estimates of resistance and compliance in mechanically ventilated infants: a computer model study. Pediatr Pulmonol 1991; 11:147-52. [PMID: 1758732 DOI: 10.1002/ppul.1950110212] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
Respiratory inertance (I) is usually ignored when the resistance (R) and compliance (C) of mechanically ventilated infants are estimated by least-squares linear regression. Values of I that have been reported for these patients can cause impedances whose magnitudes approximate respiratory resistance. We show theoretically that if inertance is neglected, no error is expected in resistance estimates, but a positive bias in compliance can be expected, proportional to the inertance, the compliance, and the sinusoidal frequency at which the measurements are made. To determine the errors in parameter estimates when the pressure waveform is non-sinusoidal, we simulated linear regression based on non-inertive and inertive models. R, C, and I of the simulated lung were varied over the range expected in an infant intensive care unit. The ventilator was simulated as a critically damped second-order system with a square pulse input. The rise time (TR) of the pressure pulse was varied over the range reported in infant ICUs. Simulated measurements confirmed that resistance is correctly estimated if inertance is neglected. The maximum error in compliance estimates (13%) occurred when TR and R were low, and C and I were high. The variation in the error in estimated compliance was consistent with the theory. Coefficients of variation of the parameters, the standard errors, and the R² of the regressions tended to deteriorate with increasing compliance error, but the relationships were not single-valued. These statistics may alert investigators to possible bias in compliance caused by neglected inertance, but cannot be used to correct any bias.
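The sinusoidal case can be checked numerically with a single-compartment equation of motion, P = R·V′ + V/C + I·V″. This model is assumed here for illustration. For a sinusoidal volume, V″ = −ω²V, so a regression that omits inertance absorbs the inertive term into the elastic one. The fit gives 1/Ĉ = 1/C − Iω², so Ĉ = C / (1 − Iω²C). The resistance estimate is unaffected. In this simple sketch the relative bias in compliance is Iω²C / (1 − Iω²C), which increases with inertance, compliance and frequency. The parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_sinusoid(R, C, I, f, amplitude=0.01, cycles=2, n=1000):
    """Pressure for a sinusoidal volume under P = R*V' + V/C + I*V''."""
    w = 2 * np.pi * f
    t = np.linspace(0.0, cycles / f, n, endpoint=False)
    V = amplitude * np.sin(w * t)                 # volume
    Vdot = amplitude * w * np.cos(w * t)          # flow
    Vddot = -amplitude * w**2 * np.sin(w * t)     # volume acceleration
    P = R * Vdot + V / C + I * Vddot
    return P, V, Vdot

def fit_rc(P, V, Vdot):
    """Least-squares fit of P = R*V' + E*V + P0, i.e. inertance neglected."""
    X = np.column_stack([Vdot, V, np.ones_like(V)])
    (R_hat, E_hat, _P0), *_ = np.linalg.lstsq(X, P, rcond=None)
    return R_hat, 1.0 / E_hat                     # compliance = 1 / elastance

# Illustrative values (units: cmH2O, L, s)
R, C, I, f = 60.0, 0.002, 0.05, 1.0
P, V, Vdot = simulate_sinusoid(R, C, I, f)
R_hat, C_hat = fit_rc(P, V, Vdot)
w = 2 * np.pi * f
C_predicted = C / (1 - I * w**2 * C)              # theoretical biased compliance
```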
22. Wang T, Chen YPP, MacLeod IM, Pryce JE, Goddard ME, Hayes BJ. Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping. BMC Genomics 2017; 18:618. [PMID: 28810831 PMCID: PMC5558724 DOI: 10.1186/s12864-017-4030-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/05/2017] [Accepted: 08/07/2017] [Indexed: 11/10/2022]
Abstract
Background Using whole-genome sequence data might improve genomic prediction accuracy compared with high-density SNP arrays, and could lead to the identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov chain Monte Carlo (MCMC) sampling) becomes infeasibly long. Results Here, we applied a new method, HyB_BR (Hybrid BayesR), to genomic prediction in a large dairy cattle population with imputed whole-genome sequence data. HyB_BR implements a mixture model of normal distributions and combines an Expectation-Maximization (EM) algorithm with subsequent MCMC sampling. The imputed whole-genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein kg, fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for a validation set of Holstein and Jersey bulls (multi-breed prediction) and for a validation set of Australian Red bulls (across-breed prediction). HyB_BR reduced compute time more than tenfold compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that, in many cases, the sequence variants HyB_BR identified with a high posterior probability of affecting the milk production or fertility traits were similar to those identified by BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait that had not been detected by previous studies.
Conclusions The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in large reference populations.
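BayesR, on which HyB_BR builds, places each variant effect in one of several normal distributions with increasing variance. The sketch below shows what the mixture does for a single variant by computing its posterior class probabilities in isolation. It assumes the four-component variance set commonly used with BayesR (0, 0.0001, 0.001 and 0.01 times the genetic variance) and a known standard error for the variant's estimate. This is a deliberate simplification. BayesR, like the EM and MCMC steps of HyB_BR, updates each effect conditional on all other fitted effects, which this sketch does not reproduce. The names and values are hypothetical.

```python
import numpy as np

# Effect variances as fractions of the genetic variance (assumed class set)
CLASS_VAR_FRACTIONS = np.array([0.0, 1e-4, 1e-3, 1e-2])

def class_posteriors(b_hat, se, sigma2_g, pi):
    """Posterior probability of each mixture class for one variant in isolation.

    Assumes b_hat ~ N(beta, se^2) and beta ~ N(0, fraction_k * sigma2_g) in class k,
    so that marginally b_hat ~ N(0, fraction_k * sigma2_g + se^2).
    """
    var = CLASS_VAR_FRACTIONS * sigma2_g + se**2
    log_dens = -0.5 * (np.log(2 * np.pi * var) + b_hat**2 / var)
    log_w = np.log(pi) + log_dens
    w = np.exp(log_w - log_w.max())               # stabilise before normalising
    return w / w.sum()

pi = np.array([0.95, 0.03, 0.015, 0.005])         # hypothetical mixing proportions
post_null = class_posteriors(0.0, 0.005, 1.0, pi)
post_big = class_posteriors(0.3, 0.005, 1.0, pi)
```

A near-zero estimate stays in the zero-effect class, while a large estimate is pushed into the largest-variance class. This is how BayesR-type models shrink most variants to zero while allowing a few large effects.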
23. Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, MacLeod IM. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol 2023; 55:9. [PMID: 36721111 PMCID: PMC9887926 DOI: 10.1186/s12711-023-00783-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/27/2022] [Accepted: 01/23/2023] [Indexed: 02/02/2023]
Abstract
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits. However, unlike small variants (< 50 bp), SV have been challenging to identify and genotype accurately at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, and the future directions in this growing field. It seems timely for research communities to pool resources and build global population-scale long-read sequencing consortia for the major livestock species for which the application of genomic tools has become cost-effective.
24. Dorji J, Vander Jagt CJ, Garner JB, Marett LC, Mason BA, Reich CM, Xiang R, Clark EL, Cocks BG, Chamberlain AJ, MacLeod IM, Daetwyler HD. Expression of mitochondrial protein genes encoded by nuclear and mitochondrial genomes correlate with energy metabolism in dairy cattle. BMC Genomics 2020; 21:720. [PMID: 33076826 PMCID: PMC7574280 DOI: 10.1186/s12864-020-07018-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/28/2020] [Accepted: 08/20/2020] [Indexed: 12/21/2022]
Abstract
Background Mutations in the mitochondrial genome have been implicated in mitochondrial disease, which is often characterized by impaired cellular energy metabolism. Cellular energy metabolism in mitochondria involves mitochondrial proteins (MP) encoded by both the nuclear (NuMP) and mitochondrial (MtMP) genomes. The expression of MP genes may be tissue specific, to meet the varying energy demands of different tissues. Currently, the characteristics of MP gene expression in tissues of dairy cattle are not well understood. In this study, we profile the expression of MP genes in 29 adult and six foetal tissues in dairy cattle using RNA sequencing and gene expression analyses, particularly differential gene expression and co-expression network analyses. Results MP genes were differentially expressed (DE; over-expressed or under-expressed) across tissues in cattle. All 29 tissues showed DE NuMP genes in varying proportions of over-expression and under-expression. In contrast, DE of MtMP genes was observed in < 50% of tissues and, notably, the MtMP genes within a tissue were either all over-expressed or all under-expressed. A high proportion of NuMP (up to 60%) and MtMP (up to 100%) genes were over-expressed in tissues with an expected high metabolic demand (heart, skeletal muscles and tongue). They were under-expressed (up to 45% of NuMP and 77% of MtMP genes) in tissues with expected low metabolic rates (leukocytes, thymus and lymph nodes). In these tissues, the expression of all MtMP genes invariably followed the direction of the dominant NuMP gene expression. The NuMP and MtMP genes were highly co-expressed across tissues, and the co-expression of genes within a cluster was non-random and functionally enriched for energy generation pathways. The differential gene expression and co-expression patterns were validated in independent cow and sheep datasets.
Conclusions The results of this study support the concept that MP genes from the mitochondrial and nuclear genomes interact biologically, given their over-expression in tissues with high energy demand and their co-expression across tissues. This highlights the importance of considering MP genes from both genomes in future studies related to mitochondrial functions and traits related to energy metabolism.
25. Fikere M, Barbulescu DM, Malmberg MM, Shi F, Koh JCO, Slater AT, MacLeod IM, Bowman PJ, Salisbury PA, Spangenberg GC, Cogan NOI, Daetwyler HD. Genomic Prediction Using Prior Quantitative Trait Loci Information Reveals a Large Reservoir of Underutilised Blackleg Resistance in Diverse Canola (Brassica napus L.) Lines. The Plant Genome 2018; 11. [PMID: 30025024 DOI: 10.3835/plantgenome2017.11.0100] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 05/05/2023]
Abstract
Genomic prediction is becoming a popular plant breeding method to predict the genetic merit of lines. While some genomic prediction results have been reported in canola, none have been evaluated for blackleg disease. Here, we report genomic prediction for seedling emergence, survival rate, and internal infection using 532 Spring and Winter canola lines. These lines were phenotyped in two replicated blackleg disease nurseries grown at Wickliffe and Green Lake, Victoria, Australia. A transcriptome genotyping-by-sequencing approach revealed 98,054 single nucleotide polymorphisms (SNPs) after quality control. Using cross-validation, we assessed various genomic prediction scenarios based on Genomic Best Linear Unbiased Prediction (GBLUP), BayesR and BayesRC, the last of which can make use of prior quantitative trait loci information. Clustering based on genomic relationships showed that Winter and Spring lines were genetically distinct, indicating limited gene flow between the sets. Genetic correlations within traits between Spring and Winter lines ranged from 0.68 to 0.90 (mean = 0.76). Based on GBLUP in the whole population, moderate to high genomic prediction accuracies were achieved within environments (0.35-0.74), and accuracies were lower across environments (0.28-0.58). Prediction accuracy ranged from 0.30 to 0.69 within the Spring set and from 0.19 to 0.71 within the Winter set. The BayesR model resulted in slightly lower accuracy than GBLUP. The proportion of genetic variance explained by known blackleg quantitative trait loci (QTL) was < 30%. This indicates a large reservoir of genetic variation in blackleg traits that remains to be discovered but can be captured with genomic prediction. However, providing prior information on known QTL in the BayesRC method increased prediction accuracy for survival and internal infection, particularly with Spring lines.
Overall, these promising results indicate that genomic prediction will be a valuable tool to make use of all genetic variation to improve blackleg resistance in canola.