1
Stegemiller MR, Redden RR, Notter DR, Taylor T, Taylor JB, Cockett NE, Heaton MP, Kalbfleisch TS, Murdoch BM. Using whole genome sequence to compare variant callers and breed differences of US sheep. Front Genet 2023; 13:1060882. [PMID: 36685812 PMCID: PMC9846548 DOI: 10.3389/fgene.2022.1060882] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/05/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023]
Abstract
As whole genome sequence (WGS) data sets have become abundant and widely available, so has the need for variant detection and scoring. The aim of this study was to compare the accuracy of commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep consisting of 14 U.S. breeds were filtered and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared to each other and to genotypes from bead arrays. The SNPs from WGS were compared to the bead array data with breed heterozygosity, principal component analysis and identifying breed associated SNPs to analyze genetic diversity. The average sequence read depth was 2.78 reads greater with 6.11% more SNPs being identified in Freebayes compared to GATK-HC. The genotype concordance of the variant callers to bead array data was 96.0% and 95.5% for Freebayes and GATK-HC, respectively. Genotyping with WGS identified 10.5 million SNPs from all 145 sheep. This resulted in an 8% increase in measured heterozygosity and greater breed separation in the principal component analysis compared to the bead array analysis. There were 1,849 SNPs identified in only the Romanov sheep where all 10 rams were homozygous for one allele and the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance of SNPs with bead array data, and either was suitably accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of PCA analysis and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information.
Affiliation(s)
- Morgan R. Stegemiller
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, United States
- Reid R. Redden
- Texas A&M AgriLife Research and Extension, Texas A&M University, San Angelo, TX, United States
- David R. Notter
- School of Animal Sciences, Virginia Tech, Blacksburg, VA, United States
- Todd Taylor
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, United States
- J. Bret Taylor
- United States Sheep Experiment Station, United States Department of Agriculture, Agricultural Research Service, Dubois, ID, United States
- Noelle E. Cockett
- Department of Animal, Dairy and Veterinary Sciences, Utah State University, Logan, UT, United States
- Michael P. Heaton
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, United States
- Theodore S. Kalbfleisch
- Gluck Equine Research Center, College of Agriculture, Food, and Environment, University of Kentucky, Lexington, KY, United States. *Correspondence: Theodore S. Kalbfleisch; Brenda M. Murdoch
- Brenda M. Murdoch
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, United States. *Correspondence: Theodore S. Kalbfleisch; Brenda M. Murdoch
2
Heaton MP, Smith TPL, Freking BA, Workman AM, Bennett GL, Carnahan JK, Kalbfleisch TS. Using sheep genomes from diverse U.S. breeds to identify missense variants in genes affecting fecundity. F1000Res 2017; 6:1303. [PMID: 28928950 PMCID: PMC5590088 DOI: 10.12688/f1000research.12216.1] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Accepted: 07/28/2017] [Indexed: 11/20/2022]
Abstract
Background: Access to sheep genome sequences significantly improves the chances of identifying genes that may influence the health, welfare, and productivity of these animals. Methods: A public, searchable DNA sequence resource for U.S. sheep was created with whole genome sequence (WGS) of 96 rams. The animals shared minimal pedigree relationships and represent nine popular U.S. breeds and a composite line. The genomes are viewable online with the user-friendly Integrated Genome Viewer environment, and may be used to identify and decode gene variants present in U.S. sheep. Results: The genomes had a combined average read depth of 16, and an average WGS genotype scoring rate and accuracy exceeding 99%. The utility of this resource was illustrated by characterizing three genes with 14 known coding variants affecting litter size in global sheep populations: growth and differentiation factor 9 (GDF9), bone morphogenetic protein 15 (BMP15), and bone morphogenetic protein receptor 1B (BMPR1B). In the 96 U.S. rams, nine missense variants encoding 11 protein variants were identified. However, only one was previously reported to affect litter size (GDF9 V371M, Finnsheep). Two missense variants in BMP15 were identified that had not previously been reported: R67Q in Dorset, and L252P in Dorper and White Dorper breeds. Also, two novel missense variants were identified in BMPR1B: M64I in Katahdin, and T345N in Romanov and Finnsheep breeds. Based on the strict conservation of amino acid residues across placental mammals, the four variants encoded by BMP15 and BMPR1B are predicted to interfere with their function. However, preliminary analyses of litter sizes in small samples did not reveal a correlation with variants in BMP15 and BMPR1B with daughters of these rams. Conclusions: Collectively, this report describes a new resource for discovering protein variants in silico and identifies alleles for further testing of their effects on litter size in U.S. breeds.
Affiliation(s)
- Michael P Heaton
- U.S. Meat Animal Research Center (USMARC), Clay Center, NE, 68933, USA
- Timothy P L Smith
- U.S. Meat Animal Research Center (USMARC), Clay Center, NE, 68933, USA
- Bradley A Freking
- U.S. Meat Animal Research Center (USMARC), Clay Center, NE, 68933, USA
- Aspen M Workman
- U.S. Meat Animal Research Center (USMARC), Clay Center, NE, 68933, USA
- Gary L Bennett
- U.S. Meat Animal Research Center (USMARC), Clay Center, NE, 68933, USA
- Jacky K Carnahan
- U.S. Meat Animal Research Center (USMARC), Clay Center, NE, 68933, USA
- Theodore S Kalbfleisch
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, KY, 40202, USA
3
Heaton MP, Leymaster KA, Kalbfleisch TS, Kijas JW, Clarke SM, McEwan J, Maddox JF, Basnayake V, Petrik DT, Simpson B, Smith TPL, Chitko-McKown CG. SNPs for parentage testing and traceability in globally diverse breeds of sheep. PLoS One 2014; 9:e94851. [PMID: 24740156 PMCID: PMC3989260 DOI: 10.1371/journal.pone.0094851] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 11/07/2013] [Accepted: 03/19/2014] [Indexed: 01/02/2023]
Abstract
DNA-based parentage determination accelerates genetic improvement in sheep by increasing pedigree accuracy. Single nucleotide polymorphism (SNP) markers can be used for determining parentage and to provide unique molecular identifiers for tracing sheep products to their source. However, the utility of a particular "parentage SNP" varies by breed depending on its minor allele frequency (MAF) and its sequence context. Our aims were to identify parentage SNPs with exceptional qualities for use in globally diverse breeds and to develop a subset for use in North American sheep. Starting with genotypes from 2,915 sheep and 74 breed groups provided by the International Sheep Genomics Consortium (ISGC), we analyzed 47,693 autosomal SNPs by multiple criteria and selected 163 with desirable properties for parentage testing. On average, each of the 163 SNPs was highly informative (MAF≥0.3) in 48±5 breed groups. Nearby polymorphisms that could otherwise confound genetic testing were identified by whole genome and Sanger sequencing of 166 sheep from 54 breed groups. A genetic test with 109 of the 163 parentage SNPs was developed for matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. The scoring rates and accuracies for these 109 SNPs were greater than 99% in a panel of North American sheep. In a blinded set of 96 families (sire, dam, and non-identical twin lambs), each parent of every lamb was identified without using the other parent's genotype. In 74 ISGC breed groups, the median estimates for probability of a coincidental match between two animals (PI), and the fraction of potential adults excluded from parentage (PE) were 1.1×10⁻³⁹ and 0.999987, respectively, for the 109 SNPs combined. The availability of a well-characterized set of 163 parentage SNPs facilitates the development of high-throughput genetic technologies for implementing accurate and economical parentage testing and traceability in many of the world's sheep breeds.
Affiliation(s)
- Michael P. Heaton
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- * Corresponding author
- Kreg A. Leymaster
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- Theodore S. Kalbfleisch
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, Kentucky, United States of America
- James W. Kijas
- Division of Animal, Food and Health Sciences, CSIRO, Brisbane, Australia
- John McEwan
- AgResearch, Invermay Agricultural Center, Mosgiel, New Zealand
- Dustin T. Petrik
- GeneSeek, a Neogen company, Lincoln, Nebraska, United States of America
- Barry Simpson
- GeneSeek, a Neogen company, Lincoln, Nebraska, United States of America
- Timothy P. L. Smith
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- Carol G. Chitko-McKown
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
4
Kalbfleisch T, Heaton MP. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes. F1000Res 2013; 2:244. [PMID: 25075278 PMCID: PMC4103496 DOI: 10.12688/f1000research.2-244.v2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Accepted: 02/04/2014] [Indexed: 01/20/2023]
Abstract
Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site, 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene function.
Affiliation(s)
- Ted Kalbfleisch
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, KY, 40202, USA
- Intrepid Bioinformatics, Louisville, KY, 40202, USA
- Michael P Heaton
- USDA Meat Animal Research Center, Clay Center, Nebraska, 68933, USA
5
Kalbfleisch T, Heaton MP. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes. F1000Res 2013; 2:244. [PMID: 25075278 PMCID: PMC4103496 DOI: 10.12688/f1000research.2-244.v1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Accepted: 11/05/2013] [Indexed: 05/28/2024]
Abstract
Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site, 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding genome repeat regions and sex chromosomes, approximately 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene function.
Affiliation(s)
- Ted Kalbfleisch
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, KY, 40202, USA
- Intrepid Bioinformatics, Louisville, KY, 40202, USA
- Michael P Heaton
- USDA Meat Animal Research Center, Clay Center, Nebraska, 68933, USA
6
Heaton MP, Kalbfleisch TS, Petrik DT, Simpson B, Kijas JW, Clawson ML, Chitko-McKown CG, Harhay GP, Leymaster KA. Genetic testing for TMEM154 mutations associated with lentivirus susceptibility in sheep. PLoS One 2013; 8:e55490. [PMID: 23408992 PMCID: PMC3569457 DOI: 10.1371/journal.pone.0055490] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/22/2012] [Accepted: 12/23/2012] [Indexed: 11/19/2022]
Abstract
In sheep, small ruminant lentiviruses cause an incurable, progressive, lymphoproliferative disease that affects millions of animals worldwide. Known as ovine progressive pneumonia virus (OPPV) in the U.S., and Visna/Maedi virus (VMV) elsewhere, these viruses reduce an animal’s health, productivity, and lifespan. Genetic variation in the ovine transmembrane protein 154 gene (TMEM154) has been previously associated with OPPV infection in U.S. sheep. Sheep with the ancestral TMEM154 haplotype encoding glutamate (E) at position 35, and either form of an N70I variant, were highly-susceptible compared to sheep homozygous for the K35 missense mutation. Our current overall aim was to characterize TMEM154 in sheep from around the world to develop an efficient genetic test for reduced susceptibility. The average frequency of TMEM154 E35 among 74 breeds was 0.51 and indicated that highly-susceptible alleles were present in most breeds around the world. Analysis of whole genome sequences from an international panel of 75 sheep revealed more than 1,300 previously unreported polymorphisms in a 62 kb region containing TMEM154 and confirmed that the most susceptible haplotypes were distributed worldwide. Novel missense mutations were discovered in the signal peptide (A13V) and the extracellular domains (E31Q, I74F, and I102T) of TMEM154. A matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry (MALDI-TOF MS) assay was developed to detect these and six previously reported missense and two deletion mutations in TMEM154. In blinded trials, the call rate for the eight most common coding polymorphisms was 99.4% for 499 sheep tested and 96.0% of the animals were assigned paired TMEM154 haplotypes (i.e., diplotypes). The widespread distribution of highly-susceptible TMEM154 alleles suggests that genetic testing and selection may improve the health and productivity of infected flocks.
Affiliation(s)
- Michael P. Heaton
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- * Corresponding authors: MPH, TSK
- Theodore S. Kalbfleisch
- Department of Biochemistry and Molecular Biology, School of Medicine, University of Louisville, Louisville, Kentucky, United States of America
- Intrepid Bioinformatics, Louisville, Kentucky, United States of America
- * Corresponding authors: MPH, TSK
- Dustin T. Petrik
- GeneSeek, a Neogen company, Lincoln, Nebraska, United States of America
- Barry Simpson
- GeneSeek, a Neogen company, Lincoln, Nebraska, United States of America
- James W. Kijas
- Division of Animal, Food and Health Sciences, CSIRO, Brisbane, Australia
- Michael L. Clawson
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- Carol G. Chitko-McKown
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- Gregory P. Harhay
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
- Kreg A. Leymaster
- U.S. Meat Animal Research Center (USMARC), Clay Center, Nebraska, United States of America
7
de Andrade CP, de Almeida LL, de Castro LA, Driemeier D, da Silva SC. Development of a real-time polymerase chain reaction assay for single nucleotide polymorphism genotyping codons 136, 154, and 171 of the prnp gene and application to Brazilian sheep herds. J Vet Diagn Invest 2013; 25:120-4. [PMID: 23345274 DOI: 10.1177/1040638712471343] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Scrapie is a transmissible spongiform encephalopathy of sheep and goats and is associated with the deposition of an abnormal isoform of prion protein (PrP(sc)). This isoform presents an altered conformation that leads to its aggregation in the host's central nervous and lymphoreticular systems. A predisposition to the prion-agent infection can be influenced by specific genotypes that are related to polymorphisms in the ovine prnp gene. The most characterized polymorphisms occur at codons 136, 154, and 171, with genotype VRQ being the most susceptible and ARR the most resistant. In the current study, a real-time quantitative polymerase chain reaction (qPCR) technique based on allele-specific TaqMan probes was developed to identify single nucleotide polymorphisms in the prnp gene from Brazilian herds. Specific primers and TaqMan probes were designed for all 3 codons of interest. Samples from a total of 142 animals were analyzed by qPCR, followed by DNA sequencing of the amplicons. All of the genotypes determined by qPCR were in agreement with the data determined by DNA sequencing. In all 3 of the analyzed breeds, the majority of the animals were AA homozygous for the 136 codon. The most frequent genotype for codon 154 was RR, and genotypes QQ and QR were the most frequent for codon 171. The results are discussed in relation to establishing scrapie control measures and breeding programs for Brazilian herds.
Affiliation(s)
- Caroline P de Andrade
- Setor de Patologia Veterinária, Faculdade de Veterinária, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
8
Reduced lentivirus susceptibility in sheep with TMEM154 mutations. PLoS Genet 2012; 8:e1002467. [PMID: 22291605 PMCID: PMC3266874 DOI: 10.1371/journal.pgen.1002467] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 07/12/2011] [Accepted: 11/21/2011] [Indexed: 11/19/2022]
Abstract
Visna/Maedi, or ovine progressive pneumonia (OPP) as it is known in the United States, is an incurable slow-acting disease of sheep caused by persistent lentivirus infection. This disease affects multiple tissues, including those of the respiratory and central nervous systems. Our aim was to identify ovine genetic risk factors for lentivirus infection. Sixty-nine matched pairs of infected cases and uninfected controls were identified among 736 naturally exposed sheep older than five years of age. These pairs were used in a genome-wide association study with 50,614 markers. A single SNP was identified in the ovine transmembrane protein (TMEM154) that exceeded genome-wide significance (unadjusted p-value 3×10⁻⁹). Sanger sequencing of the ovine TMEM154 coding region identified six missense and two frameshift deletion mutations in the predicted signal peptide and extracellular domain. Two TMEM154 haplotypes encoding glutamate (E) at position 35 were associated with infection while a third haplotype with lysine (K) at position 35 was not. Haplotypes encoding full-length E35 isoforms were analyzed together as genetic risk factors in a multi-breed, matched case-control design, with 61 pairs of 4-year-old ewes. The odds of infection for ewes with one copy of a full-length TMEM154 E35 allele were 28 times greater than the odds for those without (p-value<0.0001, 95% CI 5–1,100). In a combined analysis of nine cohorts with 2,705 sheep from Nebraska, Idaho, and Iowa, the relative risk of infection was 2.85 times greater for sheep with a full-length TMEM154 E35 allele (p-value<0.0001, 95% CI 2.36–3.43). Although rare, some sheep were homozygous for TMEM154 deletion mutations and remained uninfected despite a lifetime of significant exposure. Together, these findings indicate that TMEM154 may play a central role in ovine lentivirus infection and removing sheep with the most susceptible genotypes may help eradicate OPP and protect flocks from reinfection.
Ovine lentivirus targets the host immune system and causes persistent retroviral infections affecting millions of sheep worldwide. In primates, lentivirus resistance is attributed to mutant virus coreceptors that are not expressed. In sheep, some animals are resistant to lentivirus infection despite repeated exposure; however, the mechanism of resistance is unknown. We designed a genome-wide association study to test whether sheep might have genetic variation that protects against lentivirus infection. Our results showed that variation in an ovine gene (TMEM154) was associated with infection. Sheep with the ancestral type of this gene were nearly three times more likely to become infected than those with mutant forms. We also discovered two mutant forms predicted to abolish the protein's function. Although the biological function of TMEM154 is unknown, our results indicate that it plays an important role in lentivirus infection in sheep. Producing sheep with the least susceptible form of TMEM154 may help eradicate the ovine disease caused by lentivirus.