1
|
Winfield MO, Allen AM, Burridge AJ, Barker GLA, Benbow HR, Wilkinson PA, Coghill J, Waterfall C, Davassi A, Scopes G, Pirani A, Webster T, Brew F, Bloor C, King J, West C, Griffiths S, King I, Bentley AR, Edwards KJ. High-density SNP genotyping array for hexaploid wheat and its secondary and tertiary gene pool. PLANT BIOTECHNOLOGY JOURNAL 2016; 14:1195-206. [PMID: 26466852 PMCID: PMC4950041 DOI: 10.1111/pbi.12485] [Citation(s) in RCA: 261] [Impact Index Per Article: 29.0] [Received: 07/07/2015] [Revised: 08/21/2015] [Accepted: 09/07/2015] [Indexed: 05/15/2023]
Abstract
In wheat, a lack of genetic diversity between breeding lines has been recognized as a significant block to future yield increases. Species belonging to bread wheat's secondary and tertiary gene pools harbour a much greater level of genetic variability, and are an important source of genes to broaden its genetic base. Introgression of novel genes from progenitors and related species has been widely employed to improve the agronomic characteristics of hexaploid wheat, but this approach has been hampered by a lack of markers that can be used to track introduced chromosome segments. Here, we describe the identification of a large number of single nucleotide polymorphisms that can be used to genotype hexaploid wheat and to identify and track introgressions from a variety of sources. We have validated these markers using an ultra-high-density Axiom® genotyping array to characterize a range of diploid, tetraploid and hexaploid wheat accessions and wheat relatives. To facilitate the use of these, both the markers and the associated sequence and genotype information have been made available through an interactive web site.
|
research-article |
9 |
261 |
2
|
Allen AM, Winfield MO, Burridge AJ, Downie RC, Benbow HR, Barker GLA, Wilkinson PA, Coghill J, Waterfall C, Davassi A, Scopes G, Pirani A, Webster T, Brew F, Bloor C, Griffiths S, Bentley AR, Alda M, Jack P, Phillips AL, Edwards KJ. Characterization of a Wheat Breeders' Array suitable for high-throughput SNP genotyping of global accessions of hexaploid bread wheat (Triticum aestivum). PLANT BIOTECHNOLOGY JOURNAL 2017; 15:390-401. [PMID: 27627182 PMCID: PMC5316916 DOI: 10.1111/pbi.12635] [Citation(s) in RCA: 198] [Impact Index Per Article: 24.8] [Received: 07/27/2016] [Revised: 09/02/2016] [Accepted: 09/09/2016] [Indexed: 05/18/2023]
Abstract
Targeted selection and inbreeding have resulted in a lack of genetic diversity in elite hexaploid bread wheat accessions. Reduced diversity can be a limiting factor in the breeding of high yielding varieties and crucially can mean reduced resilience in the face of changing climate and resource pressures. Recent technological advances have enabled the development of molecular markers for use in the assessment and utilization of genetic diversity in hexaploid wheat. Starting with a large collection of 819 571 previously characterized wheat markers, here we describe the identification of 35 143 single nucleotide polymorphism-based markers, which are highly suited to the genotyping of elite hexaploid wheat accessions. To assess their suitability, the markers have been validated using a commercial high-density Affymetrix Axiom® genotyping array (the Wheat Breeders' Array), in a high-throughput 384 microplate configuration, to characterize a diverse global collection of wheat accessions including landraces and elite lines derived from commercial breeding communities. We demonstrate that the Wheat Breeders' Array is also suitable for generating high-density genetic maps of previously uncharacterized populations and for characterizing novel genetic diversity produced by mutagenesis. To facilitate the use of the array by the wheat community, the markers, the associated sequence and the genotype information have been made available through the interactive web site 'CerealsDB'.
|
research-article |
8 |
198 |
3
|
Translating genotype data of 44,000 biobank participants into clinical pharmacogenetic recommendations: challenges and solutions. Genet Med 2018; 21:1345-1354. [PMID: 30327539 PMCID: PMC6752278 DOI: 10.1038/s41436-018-0337-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 05/23/2018] [Accepted: 10/02/2018] [Indexed: 12/26/2022] Open
Abstract
PURPOSE Biomedical databases combining electronic medical records and phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required that systematically translate the contained information into treatment recommendations based on existing genotype-phenotype associations. METHODS We developed and tested algorithms for translation of preexisting genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by genome sequencing, exome sequencing, and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. RESULTS Our most striking result was that the performance of genotyping arrays is similar to that of genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risks to at least one medication, and thereby the implementation of pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. CONCLUSION We find that microarrays are a cost-effective solution for creating preemptive pharmacogenetic reports, and with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.
|
Research Support, Non-U.S. Gov't |
7 |
66 |
4
|
Whole-exome sequencing to analyze population structure, parental inbreeding, and familial linkage. Proc Natl Acad Sci U S A 2016; 113:6713-8. [PMID: 27247391 DOI: 10.1073/pnas.1606460113] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Indexed: 12/21/2022] Open
Abstract
Principal component analysis (PCA), homozygosity rate estimations, and linkage studies in humans are classically conducted through genome-wide single-nucleotide variant arrays (GWSA). We compared whole-exome sequencing (WES) and GWSA for this purpose. We analyzed 110 subjects originating from different regions of the world, including North Africa and the Middle East, which are poorly covered by public databases and have high consanguinity rates. We tested and applied a number of quality control (QC) filters. Compared with GWSA, we found that WES provided an accurate prediction of population substructure using variants with a minor allele frequency > 2% (correlation = 0.89 with the PCA coordinates obtained by GWSA). WES also yielded highly reliable estimates of homozygosity rates using runs of homozygosity with a 1,000-kb window (correlation = 0.94 with the estimates provided by GWSA). Finally, homozygosity mapping analyses in 15 families including a single offspring with high homozygosity rates showed that WES provided 51% less genome-wide linkage information than GWSA overall but 97% more information for the coding regions. At the genome-wide scale, 76.3% of linked regions were found by both GWSA and WES, 17.7% were found by GWSA only, and 6.0% were found by WES only. For coding regions, the corresponding percentages were 83.5%, 7.4%, and 9.1%, respectively. With appropriate QC filters, WES can be used for PCA and adjustment for population substructure, estimating homozygosity rates in individuals, and powerful linkage analyses, particularly in coding regions.
|
Research Support, Non-U.S. Gov't |
9 |
54 |
5
|
Fatima F, McCallum BD, Pozniak CJ, Hiebert CW, McCartney CA, Fedak G, You FM, Cloutier S. Identification of New Leaf Rust Resistance Loci in Wheat and Wild Relatives by Array-Based SNP Genotyping and Association Genetics. FRONTIERS IN PLANT SCIENCE 2020; 11:583738. [PMID: 33304363 PMCID: PMC7701059 DOI: 10.3389/fpls.2020.583738] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 07/19/2020] [Accepted: 10/15/2020] [Indexed: 05/22/2023]
Abstract
Leaf rust caused by Puccinia triticina is the most widespread rust disease of wheat. As pathogen populations are constantly evolving, identification of novel sources of resistance is necessary to maintain disease resistance and stay ahead of this plant-pathogen evolutionary arms race. The wild genepool of wheat is a rich source of genetic diversity, accounting for 44% of the Lr genes identified. Here we performed a genome-wide association study (GWAS) on a diverse germplasm of 385 accessions, including 27 different Triticum and Aegilops species. Genetic characterization using the wheat 90 K array and subsequent filtering identified a set of 20,501 single nucleotide polymorphic (SNP) markers. Of those, 9,570 were validated using exome capture and mapped onto the Chinese Spring reference sequence v1.0. Phylogenetic analyses illustrated four major clades, clearly separating the wild species from the T. aestivum and T. turgidum species. GWAS was conducted using eight statistical models for infection types against six leaf rust isolates and leaf rust severity rated in field trials for 3-4 years at 2-3 locations in Canada. Functional annotation of genes containing significant quantitative trait nucleotides (QTNs) identified 96 disease-related loci associated with leaf rust resistance. A total of 21 QTNs were in haplotype blocks or within flanking markers of at least 16 known Lr genes. The remaining significant QTNs were considered loci that putatively harbor new Lr resistance genes. Isolation of these candidate genes will contribute to the elucidation of their role in leaf rust resistance and promote their usefulness in marker-assisted selection and introgression.
|
research-article |
5 |
24 |
6
|
Harlemon M, Ajayi O, Kachambwa P, Kim MS, Simonti CN, Quiver MH, Petersen DC, Mittal A, Fernandez PW, Hsing AW, Baichoo S, Agalliu I, Jalloh M, Gueye SM, Snyper NYF, Adusei B, Mensah JE, Abrahams AOD, Adebiyi AO, Orunmuyi AT, Aisuodionoe-Shadrach OI, Nwegbu MM, Joffe M, Chen WC, Irusen H, Neugut AI, Quintana Y, Seutloali M, Fadipe MB, Warren C, Woehrmann MH, Zhang P, Ongaco CM, Mawhinney M, McBride J, Andrews CV, Adams M, Pugh E, Rebbeck TR, Petersen LN, Lachance J. A Custom Genotyping Array Reveals Population-Level Heterogeneity for the Genetic Risks of Prostate Cancer and Other Cancers in Africa. Cancer Res 2020; 80:2956-2966. [PMID: 32393663 PMCID: PMC7335354 DOI: 10.1158/0008-5472.can-19-2165] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/15/2019] [Revised: 10/03/2019] [Accepted: 05/06/2020] [Indexed: 12/25/2022]
Abstract
Although prostate cancer is the leading cause of cancer mortality for African men, the vast majority of known disease associations have been detected in European study cohorts. Furthermore, most genome-wide association studies have used genotyping arrays that are hindered by SNP ascertainment bias. To overcome these disparities in genomic medicine, the Men of African Descent and Carcinoma of the Prostate (MADCaP) Network has developed a genotyping array that is optimized for African populations. The MADCaP Array contains more than 1.5 million markers and an imputation backbone that successfully tags over 94% of common genetic variants in African populations. This array also has a high density of markers in genomic regions associated with cancer susceptibility, including 8q24. We assessed the effectiveness of the MADCaP Array by genotyping 399 prostate cancer cases and 403 controls from seven urban study sites in sub-Saharan Africa. Samples from Ghana and Nigeria clustered together, whereas samples from Senegal and South Africa yielded distinct ancestry clusters. Using the MADCaP array, we identified cancer-associated loci that have large allele frequency differences across African populations. Polygenic risk scores for prostate cancer were higher in Nigeria than in Senegal. In summary, individual and population-level differences in prostate cancer risk were revealed using a novel genotyping array. SIGNIFICANCE: This study presents an Africa-specific genotyping array, which enables investigators to identify novel disease associations and to fine-map genetic loci that are associated with prostate and other cancers.
|
Research Support, N.I.H., Extramural |
5 |
21 |
7
|
Liu JJ, Sniezko RA, Sturrock RN, Chen H. Western white pine SNP discovery and high-throughput genotyping for breeding and conservation applications. BMC PLANT BIOLOGY 2014; 14:380. [PMID: 25547170 PMCID: PMC4302426 DOI: 10.1186/s12870-014-0380-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 03/31/2014] [Accepted: 12/11/2014] [Indexed: 05/10/2023]
Abstract
BACKGROUND Western white pine (WWP, Pinus monticola Douglas ex D. Don) is of high interest in forest breeding and conservation because of its high susceptibility to the invasive disease white pine blister rust (WPBR, caused by the fungus Cronartium ribicola J. C. Fisch). However, WWP lacks genomic resource development and is evolutionarily far away from plants with available draft genome sequences. Here we report a single nucleotide polymorphism (SNP) study by bulked segregation-based RNA-Seq analysis. RESULTS A collection of resistance germplasm was used for construction of cDNA libraries and SNP genotyping. Approximately 36-89 million 2 × 100-bp reads were obtained per library and de-novo assembly generated the first shoot-tip reference transcriptome containing a total of 54,661 unique transcripts. Bioinformatic SNP detection identified >100,000 high quality SNPs in three expressed candidate gene groups: Pinus highly conserved genes (HCGs), differential expressed genes (DEGs) in plant defense response, and resistance gene analogs (RGAs). To estimate efficiency of in-silico SNP discovery, genotyping assay was developed by using Sequenom iPlex and it unveiled SNP success rates from 40.1% to 61.1%. SNP clustering analyses consistently revealed distinct populations, each composed of multiple full-sib seed families by parentage assignment in the WWP germplasm collection. Linkage disequilibrium (LD) analysis identified six genes in significant association with major gene (Cr2) resistance, including three RGAs (two NBS-LRR genes and one receptor-like protein kinase -RLK gene), two HCGs, and one DEG. At least one SNP locus provided an excellent marker for Cr2 selection across P. monticola populations. CONCLUSIONS The WWP shoot tip transcriptome and those validated SNP markers provide novel genomic resources for genetic, evolutionary and ecological studies. 
SNP loci of those candidate genes associated with resistant phenotypes can be used as positional and functional variation sites for further characterization of WWP major gene resistance against C. ribicola. Our results demonstrate that integration of RNA-seq-based transcriptome analysis and high-throughput genotyping is an effective approach for discovery of a large number of nucleotide variations and for identification of functional gene variants associated with adaptive traits in a non-model species.
|
research-article |
11 |
19 |
8
|
Przewieslik-Allen AM, Burridge AJ, Wilkinson PA, Winfield MO, Shaw DS, McAusland L, King J, King IP, Edwards KJ, Barker GLA. Developing a High-Throughput SNP-Based Marker System to Facilitate the Introgression of Traits From Aegilops Species Into Bread Wheat (Triticum aestivum). FRONTIERS IN PLANT SCIENCE 2019; 9:1993. [PMID: 30733728 PMCID: PMC6354564 DOI: 10.3389/fpls.2018.01993] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/11/2018] [Accepted: 12/21/2018] [Indexed: 06/09/2023]
Abstract
The genus Aegilops contains a diverse collection of wild species exhibiting variation in geographical distribution, ecological adaptation, ploidy and genome organization. Aegilops is the most closely related genus to Triticum which includes cultivated wheat, a globally important crop that has a limited gene pool for modern breeding. Aegilops species are a potential future resource for wheat breeding for traits, such as adaptation to different ecological conditions and pest and disease resistance. This study describes the development and application of the first high-throughput genotyping platform specifically designed for screening wheat relative species. The platform was used to screen multiple accessions representing all species in the genus Aegilops. Firstly, the data was demonstrated to be useful for screening diversity and examining relationships within and between Aegilops species. Secondly, markers able to characterize and track introgressions from Aegilops species in hexaploid wheat were identified and validated using two different approaches.
|
research-article |
6 |
15 |
9
|
Brieger K, Zajac GJM, Pandit A, Foerster JR, Li KW, Annis AC, Schmidt EM, Clark CP, McMorrow K, Zhou W, Yang J, Kwong AM, Boughton AP, Wu J, Scheller C, Parikh T, de la Vega A, Brazel DM, Frieser M, Rea-Sandin G, Fritsche LG, Vrieze SI, Abecasis GR. Genes for Good: Engaging the Public in Genetics Research via Social Media. Am J Hum Genet 2019; 105:65-77. [PMID: 31204010 PMCID: PMC6612519 DOI: 10.1016/j.ajhg.2019.05.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/27/2018] [Accepted: 05/08/2019] [Indexed: 01/06/2023] Open
Abstract
The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.
|
research-article |
6 |
11 |
10
|
Fridley BL, Chalise P, Tsai YY, Sun Z, Vierkant RA, Larson MC, Cunningham JM, Iversen ES, Fenstermacher D, Barnholtz-Sloan J, Asmann Y, Risch HA, Schildkraut JM, Phelan CM, Sutphen R, Sellers TA, Goode EL. Germline copy number variation and ovarian cancer survival. Front Genet 2012; 3:142. [PMID: 22891074 PMCID: PMC3413872 DOI: 10.3389/fgene.2012.00142] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 06/04/2012] [Accepted: 07/13/2012] [Indexed: 12/14/2022] Open
Abstract
Copy number variants (CNVs) have been implicated in many complex diseases. We examined whether inherited CNVs were associated with overall survival among women with invasive epithelial ovarian cancer. Germline DNA from 1,056 cases (494 deceased, average of 3.7 years follow-up) was interrogated with the Illumina 610 quad genome-wide array containing, after quality control exclusions, 581,903 single nucleotide polymorphisms (SNPs) and 17,917 CNV probes. Comprehensive analysis capitalized upon the strengths of three complementary approaches to CNV classification. First, to identify small CNVs, single markers were evaluated and, where associated with survival, consecutive markers were combined. Two chromosomal regions were associated with survival using this approach (14q31.3 rs2274736 p = 1.59 × 10⁻⁶, p = 0.001; 22q13.31 rs2285164 p = 4.01 × 10⁻⁵, p = 0.009), but were not significant after multiple testing correction. Second, to identify large CNVs, genome-wide segmentation was conducted to characterize chromosomal gains and losses, and association with survival was evaluated by segment. Four regions were associated with survival (1q21.3 loss p = 0.005, 5p14.1 loss p = 0.004, 9p23 loss p = 0.002, and 15q22.31 gain p = 0.002); however, again, after correcting for multiple testing, no regions were statistically significant, and none were in common with the single marker approach. Finally, to evaluate associations with general amounts of copy number changes across the genome, we estimated CNV burden based on genome-wide numbers of gains and losses; no associations with survival were observed (p > 0.40). Although CNVs that were not well-covered by the Illumina 610 quad array merit investigation, these data suggest no association between inherited CNVs and survival after ovarian cancer.
|
Journal Article |
13 |
10 |
11
|
Hanks SC, Forer L, Schönherr S, LeFaive J, Martins T, Welch R, Gagliano Taliun SA, Braff D, Johnsen JM, Kenny EE, Konkle BA, Laakso M, Loos RFJ, McCarroll S, Pato C, Pato MT, Smith AV, Boehnke M, Scott LJ, Fuchsberger C. Extent to which array genotyping and imputation with large reference panels approximate deep whole-genome sequencing. Am J Hum Genet 2022; 109:1653-1666. [PMID: 35981533 PMCID: PMC9502057 DOI: 10.1016/j.ajhg.2022.07.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/06/2022] [Accepted: 07/20/2022] [Indexed: 01/02/2023] Open
Abstract
Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r² > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.
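As an illustration of the "well imputed" criterion quoted in this abstract, the sketch below treats imputation quality as the squared Pearson correlation between sequenced allele counts (0, 1, 2) and imputed dosages. That reading, the function names, and the toy data are assumptions made for illustration; the abstract does not state how the study computed its r² values.

```python
from statistics import mean


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages."""
    if len(true_genotypes) != len(imputed_dosages) or len(true_genotypes) < 2:
        raise ValueError("need two equally sized samples with at least 2 individuals")
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    if sxx == 0 or syy == 0:
        # Monomorphic in this sample: correlation is undefined, report 0.0.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(genotypes):
    """Minor-allele frequency from diploid allele counts (0, 1 or 2 per individual)."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def is_well_imputed(true_genotypes, imputed_dosages, threshold=0.8):
    """Apply the r2 > 0.8 cut-off quoted in the abstract."""
    return imputation_r2(true_genotypes, imputed_dosages) > threshold


if __name__ == "__main__":
    # Hypothetical toy data for four individuals at one variant.
    truth = [0, 1, 2, 1]
    dosage = [0.1, 0.9, 1.9, 1.2]
    print(minor_allele_frequency(truth), is_well_imputed(truth, dosage))
```

In practice a per-variant value like this would be binned by MAF and ancestry to produce the kind of thresholds reported above.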
|
research-article |
3 |
10 |
12
|
Identifying rare, medically relevant variation via population-based genomic screening in Alabama: opportunities and pitfalls. Genet Med 2020; 23:280-288. [PMID: 32989269 DOI: 10.1038/s41436-020-00976-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/26/2020] [Revised: 09/15/2020] [Accepted: 09/15/2020] [Indexed: 12/20/2022] Open
Abstract
PURPOSE To evaluate the effectiveness and specificity of population-based genomic screening in Alabama. METHODS The Alabama Genomic Health Initiative (AGHI) has enrolled and evaluated 5369 participants for the presence of pathogenic/likely pathogenic (P/LP) variants using the Illumina Global Screening Array (GSA), with validation of all P/LP variants via Sanger sequencing in a CLIA-certified laboratory before return of results. RESULTS Among 131 variants identified by the GSA that were evaluated by Sanger sequencing, 67 (51%) were false positives (FP). For 39 of the 67 FP variants, a benign/likely benign variant was present at or near the targeted P/LP variant. Variants detected within African American individuals were significantly enriched for FPs, likely due to a higher rate of nontargeted alternative alleles close to array-targeted P/LP variants. CONCLUSION In AGHI, we have implemented an array-based process to screen for highly penetrant genetic variants in actionable disease genes. We demonstrate the need for clinical validation of array-identified variants in direct-to-consumer or population testing, especially for diverse populations.
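The proportions in the RESULTS above follow directly from the reported counts. The short sketch below redoes that arithmetic; the function and key names are hypothetical, and only the three counts (131, 67, 39) come from the abstract.

```python
def validation_summary(variants_tested, false_positives, fp_with_nearby_benign):
    """Summarise Sanger validation of array-called variants from raw counts."""
    if not 0 <= fp_with_nearby_benign <= false_positives <= variants_tested:
        raise ValueError("counts must satisfy nearby_benign <= false_positives <= tested")
    return {
        # Array calls that Sanger sequencing confirmed.
        "confirmed": variants_tested - false_positives,
        # Share of array calls that were not confirmed.
        "false_positive_fraction": false_positives / variants_tested,
        # Share of false positives with a benign/likely benign variant at or near the target.
        "fp_with_nearby_benign_fraction": fp_with_nearby_benign / false_positives,
    }


if __name__ == "__main__":
    summary = validation_summary(131, 67, 39)
    for key, value in summary.items():
        print(key, value)
```

With the reported counts, the false-positive fraction rounds to the 51% stated in the abstract.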
|
|
5 |
10 |
13
|
Jackson C, Christie N, Reynolds M, Marais C, Tii-Kuzu Y, Caballero M, Kampman T, Visser EA, Naidoo S, Kain D, Whetten RW, Isik F, Wegrzyn J, Hodge G, Acosta JJ, Myburg AA. A genome-wide SNP genotyping resource for tropical pine tree species. Mol Ecol Resour 2021; 22:695-710. [PMID: 34383377 DOI: 10.1111/1755-0998.13484] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/24/2020] [Revised: 07/10/2021] [Accepted: 07/16/2021] [Indexed: 11/28/2022]
Abstract
We performed gene and genome targeted SNP discovery towards the development of a genome-wide, multi-species genotyping array for tropical pines. Pooled RNA-seq data from shoots of seedlings from five tropical pine species was used to identify transcript-based SNPs resulting in 1.3 million candidate Affymetrix SNP probe sets. In addition, we used a custom 40K probe set to perform capture-seq in pooled DNA from 81 provenances representing the natural ranges of six tropical pine species in Mexico and Central America resulting in 563K candidate SNP probe sets. Altogether, 300K RNA-seq (72%) and 120K capture-seq (28%) derived SNP probe sets were tiled on a 420K screening array that was used to genotype 576 trees representing the 81 provenances and commercial breeding material. Based on the screening array results, 50K SNPs were selected for commercial SNP array production (Axiom 384 format, Thermo Fisher Scientific) including 20K polymorphic SNPs for P. patula, P. tecunumanii, P. oocarpa and P. caribaea, 15K for P. greggii and P. maximinoi, 13K for P. elliottii and 8K for P. pseudostrobus. We included 9.7K ancestry informative SNPs that will be valuable for species and hybrid discrimination. Of the 50K SNP markers, 25% are polymorphic in only one species, while 75% are shared by two or more species. The Pitro50K SNP chip will be useful for population genomics and molecular breeding in this group of pine species that, together with their hybrids, represent the majority of fast-growing tropical and subtropical pine plantations globally.
|
Journal Article |
4 |
8 |
14
|
Liu Z, Sun C, Yan Y, Li G, Li XC, Wu G, Yang N. Design and evaluation of a custom 50K Infinium SNP array for egg-type chickens. Poult Sci 2021; 100:101044. [PMID: 33743497 PMCID: PMC8010521 DOI: 10.1016/j.psj.2021.101044] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/29/2020] [Revised: 12/14/2020] [Accepted: 02/04/2021] [Indexed: 11/28/2022] Open
Abstract
With the development of molecular genetics and high-throughput sequencing technology, genotyping arrays consisting of large numbers of SNP have raised great interest in animal and plant research. However, the application of commercial chicken 600K SNP arrays has varied in different populations of egg-type chickens. Moreover, their genotyping cost is too high for large-scale population applications. Herein, we independently developed a custom Illumina 50K BeadChip, named PhenoixChip-I, for egg-type chickens based on SNP from 479 sequenced individuals in 7 lines. We filtered and selected SNP with stringent criteria, such as high polymorphism, genome coverage, design score, and priorities. Finally, a total of 43,681 effective SNP successfully genotyped were included on our custom array. Approximately 14K SNP were previously reported to be associated with important economic traits in egg-type chickens. Subsequently, we verified the applicability and efficiency of the PhenoixChip-I SNP array from many aspects, including evaluating its use in scientific research (population structure analysis and genome-wide association study) and the poultry breeding industry (genomic selection). The findings in our study will play a crucial role in accelerating the genetic improvement of egg-type chickens.
|
Journal Article |
4 |
8 |
15
|
Zajac GJM, Fritsche LG, Weinstock JS, Dagenais SL, Lyons RH, Brummett CM, Abecasis GR. Estimation of DNA contamination and its sources in genotyped samples. Genet Epidemiol 2019; 43:980-995. [PMID: 31452258 DOI: 10.1002/gepi.22257] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/06/2019] [Revised: 07/11/2019] [Accepted: 08/09/2019] [Indexed: 11/11/2022]
Abstract
Array genotyping is a cost-effective and widely used tool that enables assessment of up to millions of genetic markers in hundreds of thousands of individuals. Genotyping array data are typically highly accurate but sensitive to mixing of DNA samples from multiple individuals before or during genotyping. Contaminated samples can lead to genotyping errors and consequently cause false positive signals or reduce power of association analyses. Here, we propose a new method to identify contaminated samples and the sources of contamination within a genotyping batch. Through analysis of array intensity and genotype data from intentionally mixed samples and 22,366 samples of the Michigan Genomics Initiative, an ongoing biobank-based study, we show that our method can reliably estimate contamination. We also show that identifying sources of contamination can implicate problematic sample processing steps and guide process improvements. Compared to existing methods, our approach can estimate the proportion of contaminating DNA more accurately, eliminate the need for external databases of allele frequencies, and provide contamination estimates that are more robust to the ancestral origin of the contaminating sample.
|
Research Support, N.I.H., Extramural |
6 |
7 |
16
|
Genomic Structural Diversity in Local Goats: Analysis of Copy-Number Variations. Animals (Basel) 2020; 10:ani10061040. [PMID: 32560248 PMCID: PMC7341319 DOI: 10.3390/ani10061040] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/30/2020] [Revised: 06/05/2020] [Accepted: 06/12/2020] [Indexed: 12/14/2022] Open
Abstract
Simple Summary: Copy-number variations (CNVs) are one of the widely dispersed forms of structural variations in mammalian genomes and are known to be present in genomic regions that regulate important physiological functions. In this study, CNV detection was performed starting from genotypic data of 120 individuals, belonging to four Sicilian dairy goat breeds, genotyped with the Illumina GoatSNP50 BeadChip array. Using PennCNV software, a total of 702 CNVs were identified in 107 individuals. These were merged into 75 CNV regions (CNVRs), i.e., regions containing CNVs overlapping by at least 1 base pair. Functional annotation of the CNVRs allowed the identification of 139 genes/loci within the most frequent CNVRs, which are involved in local adaptation, mild behaviour, immune response, reproduction, and olfactory receptors. This study provides insights into the genomic variations within these Italian goat breeds and should be of value for future studies to identify the relationships between this type of genetic variation and phenotypic traits.
Abstract: Copy-number variations (CNVs) are one of the widely dispersed forms of structural variations in mammalian genomes, and are present as deletions, insertions, or duplications. Only a few studies have been conducted in goats on CNVs derived from SNP array data, and many local breeds still remain uncharacterized, e.g., the Sicilian dairy goat breeds. In this study, CNV detection was performed, starting from the genotypic data of 120 individuals, belonging to four local breeds (Argentata dell’Etna, Derivata di Siria, Girgentana, and Messinese), genotyped with the Illumina GoatSNP50 BeadChip array. Overall, 702 CNVs were identified in 107 individuals using PennCNV software based on the hidden Markov model algorithm. These were merged into 75 CNV regions (CNVRs), i.e., regions containing CNVs overlapping by at least 1 base pair, while 85 CNVs remained unique. The part of the genome covered by CNV events was 35.21 Mb (1.2% of the goat genome length). Functional annotation of the CNVRs allowed the identification of 139 genes/loci within the most frequent CNVRs that are involved in local adaptations, such as coat colour (ADAMTS20 and EDNRA), mild behaviour (NR3C2), immune response (EXOC3L4 and TNFAIP2), reproduction (GBP1 and GBP6), and olfactory receptors (OR7E24). This study provides insights into the genomic variations for these Sicilian dairy goat breeds and should be of value for future studies to identify the relationships between this type of genetic variation and phenotypic traits.
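The merging rule described in this abstract (CNVs that overlap by at least 1 base pair are combined into one CNVR, while non-overlapping CNVs stay unique) can be illustrated with a minimal sketch. This is not the authors' pipeline; the function name `merge_cnvs`, the tuple layout, and the use of inclusive coordinates are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end), inclusive coordinates.
Cnv = Tuple[str, int, int]
# Merged region: (chromosome, start, end, number of CNVs merged into it).
Region = Tuple[str, int, int, int]


def merge_cnvs(cnvs: List[Cnv]) -> List[Region]:
    """Merge CNVs on the same chromosome that overlap by at least 1 bp."""
    regions: List[Region] = []
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # With inclusive coordinates, start <= previous end means >= 1 bp shared.
            prev = regions[-1]
            regions[-1] = (chrom, prev[1], max(prev[2], end), prev[3] + 1)
        else:
            regions.append((chrom, start, end, 1))
    return regions


regions = merge_cnvs([
    ("chr1", 200, 300),
    ("chr1", 100, 200),   # shares base 200 with the CNV above
    ("chr1", 301, 400),   # adjacent, but shares no base
    ("chr2", 50, 60),
])
cnvrs = [r for r in regions if r[3] > 1]    # regions built from 2+ CNVs
unique = [r for r in regions if r[3] == 1]  # CNVs that remained unmerged
```

In this toy input, only the first two CNVs share a base, so they form a single CNVR and the other two remain unique, mirroring the CNVR versus unique-CNV distinction in the abstract.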
|
Journal Article |
5 |
6 |
17
|
Arab MM, Marrano A, Abdollahi-Arpanahi R, Leslie CA, Cheng H, Neale DB, Vahdati K. Combining phenotype, genotype, and environment to uncover genetic components underlying water use efficiency in Persian walnut. JOURNAL OF EXPERIMENTAL BOTANY 2020; 71:1107-1127. [PMID: 31639822 DOI: 10.1093/jxb/erz467] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/19/2019] [Accepted: 10/08/2019] [Indexed: 06/10/2023]
Abstract
Walnut production is challenged by climate change and abiotic stresses. Elucidating the genomic basis of adaptation to climate is essential to breeding drought-tolerant cultivars for enhanced productivity in arid and semi-arid regions. Here, we aimed to identify loci potentially involved in water use efficiency (WUE) and adaptation to drought in Persian walnut using a diverse panel of 95 walnut families (950 seedlings) from Iran, which show contrasting levels of water availability in their native habitats. We analyzed associations between phenotypic, genotypic, and environmental variables from data sets of 609 000 high-quality single nucleotide polymorphisms (SNPs), three categories of phenotypic traits [WUE-related traits under drought, their drought stress index, and principal components (PCs)], and 21 climate variables and their combination (first three PCs). Our genotype-phenotype analysis identified 22 significant and 266 suggestive associations, some of which were for multiple traits, suggesting their correlation and a possible common genetic control. Also, genotype-environment association analysis found 115 significant and 265 suggestive SNP loci that displayed potential signals of local adaptation. Several sets of stress-responsive genes were found in the genomic regions significantly associated with the aforementioned traits. Most of the candidate genes identified are involved in abscisic acid signaling, stomatal regulation, transduction of environmental signals, antioxidant defense system, osmotic adjustment, and leaf growth and development. Upon validation, the marker-trait associations identified for drought tolerance-related traits would allow the selection and development of new walnut rootstocks or scion cultivars with superior WUE.
|
Comparative Study |
5 |
5 |
18
|
Kukučková V, Moravčíková N, Curik I, Simčič M, Mészáros G, Kasarda R. Genetic diversity of local cattle. Acta Biochim Pol 2018; 65:421-424. [PMID: 30148506 DOI: 10.18388/abp.2017_2347] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/07/2017] [Revised: 03/01/2018] [Accepted: 06/09/2018] [Indexed: 11/10/2022]
Abstract
The Slovak Pinzgau breed faces the bottleneck effect and the loss of diversity due to unequal use of founders and a significant population decline. Further population size reduction can lead to serious problems. Information obtained here and in other studies from high-throughput genotyping of 179 individuals was used to characterise genetic diversity and differentiation of Slovak Pinzgau, Austrian Pinzgau, Cika and Piedmontese cattle by a Bayesian clustering algorithm. A gene flow network for the clusters estimated from admixture results was produced. The low estimate of genetic differentiation (FST) in Pinzgau cattle populations indicated that differentiation among these populations is low, particularly owing to a common historical origin and high gene flow. Changes in the log marginal likelihood indicated Austrian Pinzgau as the most similar breed to Slovak Pinzgau. All populations except the Piedmontese one displayed two-way gene flow among populations, indicating that Piedmontese cattle were involved in producing the analysed breeds while these breeds were not involved in the creation of Piedmontese. Genetic evaluation represents an important tool in breeding and cattle selection. It is more strategically important than ever to preserve as much of the livestock diversity as possible, to ensure a prompt and proper response to the needs of future generations. Information provided by the fine-scale genetic characterization of this study clearly shows that there is a difference in genetic composition of Slovak and Austrian populations, as well as the Cika and Piedmontese cattle. Despite its population size, the Slovak Pinzgau cattle have the potential to serve as a basic gene reserve of this breed, with European and world-wide importance.
|
Journal Article |
7 |
4 |
19
|
Hiraoka Y, Ferrante SP, Wu GA, Federici CT, Roose ML. Development and Assessment of SNP Genotyping Arrays for Citrus and Its Close Relatives. PLANTS (BASEL, SWITZERLAND) 2024; 13:691. [PMID: 38475537 DOI: 10.3390/plants13050691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 02/13/2024] [Accepted: 02/22/2024] [Indexed: 03/14/2024]
Abstract
Rapid advancements in technologies provide various tools to analyze fruit crop genomes to better understand genetic diversity and relationships and aid in breeding. Genome-wide single nucleotide polymorphism (SNP) genotyping arrays offer highly multiplexed assays at a relatively low cost per data point. We report the development and validation of 1.4M SNP Axiom® Citrus HD Genotyping Array (Citrus 15AX 1 and Citrus 15AX 2) and 58K SNP Axiom® Citrus Genotyping Arrays for Citrus and close relatives. SNPs represented were chosen from a citrus variant discovery panel consisting of 41 diverse whole-genome re-sequenced accessions of Citrus and close relatives, including eight progenitor citrus species. SNPs chosen mainly target putative genic regions of the genome and are accurately called in both Citrus and its closely related genera while providing good coverage of the nuclear and chloroplast genomes. Reproducibility of the arrays was nearly 100%, with a large majority of the SNPs classified as the most stringent class of markers, "PolyHighResolution" (PHR) polymorphisms. Concordance between SNP calls in sequence data and array data averaged 98%. Phylogenies generated with array data were similar to those with comparable sequence data and little affected by 3 to 5% genotyping error. Both arrays are publicly available.
|
|
1 |
|
20
|
Gómez-Palacio A, Morinaga G, Turner PE, Micieli MV, Elnour MAB, Salim B, Surendran SN, Ramasamy R, Powell JR, Soghigian J, Gloria-Soria A. Robustness in population-structure and demographic-inference results derived from the Aedes aegypti genotyping chip and whole-genome sequencing data. G3 (BETHESDA, MD.) 2024; 14:jkae082. [PMID: 38626295 PMCID: PMC11152066 DOI: 10.1093/g3journal/jkae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 03/04/2024] [Accepted: 04/04/2024] [Indexed: 04/18/2024]
Abstract
The mosquito Aedes aegypti is the primary vector of many human arboviruses such as dengue, yellow fever, chikungunya, and Zika, which affect millions of people worldwide. Population genetic studies on this mosquito have been important in understanding its invasion pathways and success as a vector of human disease. The Axiom aegypti1 SNP chip was developed from a sample of geographically diverse A. aegypti populations to facilitate genomic studies on this species. We evaluate the utility of the Axiom aegypti1 SNP chip for population genetics and compare it with a low-depth shotgun sequencing approach using mosquitoes from the native (Africa) and invasive ranges (outside Africa). These analyses indicate that results from the SNP chip are highly reproducible and have a higher sensitivity to capture alternative alleles than a low-coverage whole-genome sequencing approach. Although the SNP chip suffers from ascertainment bias, results from population structure, ancestry, demographic, and phylogenetic analyses using the SNP chip were congruent with those derived from low-coverage whole-genome sequencing, and consistent with previous reports on Africa and outside Africa populations using microsatellites. More importantly, we identified a subset of SNPs that can be reliably used to generate merged databases, opening the door to combined analyses. We conclude that the Axiom aegypti1 SNP chip is a convenient, more accurate, low-cost alternative to low-depth whole-genome sequencing for population genetic studies of A. aegypti that do not rely on full allelic frequency spectra. Whole-genome sequencing and SNP chip data can be easily merged, extending the usefulness of both approaches.
|
research-article |
1 |
|
21
|
Shaikh MA, Al-Rawashdeh HS, Sait ARW. A Review of Artificial Intelligence-Based Down Syndrome Detection Techniques. Life (Basel) 2025; 15:390. [PMID: 40141735 PMCID: PMC11943655 DOI: 10.3390/life15030390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2025] [Revised: 02/19/2025] [Accepted: 02/28/2025] [Indexed: 03/28/2025] Open
Abstract
BACKGROUND: Down syndrome (DS) is one of the most prevalent chromosomal abnormalities affecting global healthcare. Recent advances in artificial intelligence (AI) and machine learning (ML) have enhanced DS diagnostic accuracy. However, there is a lack of thorough evaluations analyzing the overall impact and effectiveness of AI-based DS diagnostic approaches.
OBJECTIVES: This review intends to identify methodologies and technologies used in AI-driven DS diagnostics. It evaluates the performance of AI models in terms of standard evaluation metrics, highlighting their strengths and limitations.
METHODOLOGY: In order to ensure transparency and rigor, the authors followed the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines. They extracted 1175 articles from major academic databases. By applying inclusion and exclusion criteria, a final set of 25 articles was selected.
OUTCOMES: The findings revealed significant advancements in AI-powered DS diagnostics across diverse data modalities. The modalities, including facial images, ultrasound scans, and genetic data, demonstrated strong potential for early DS diagnosis. Despite these advancements, this review outlined the limitations of AI approaches. Small and imbalanced datasets reduce the generalizability of the AI models. The authors present actionable strategies to enhance the clinical adoption of these models.
|
Review |
1 |
|
22
|
Keele JW, McDaneld TG, Kuehn LA. Use of overlapping DNA pools to discern genetic differences despite pooling error. J Anim Sci 2023; 101:skad166. [PMID: 37227930 PMCID: PMC10263113 DOI: 10.1093/jas/skad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 05/22/2023] [Indexed: 05/27/2023] Open
Abstract
Genotyping pools of commercial cattle and individual seedstock animals may reveal hidden relationships between sectors enabling use of commercial data for genetic evaluation. However, commercial data capture may be compromised by inexact pool formation. We aimed to estimate the concordance between distances or genomic covariance among pooling allele frequencies (PAFs) of DNA pools comprised of 100 animals with 0% or 50% overlap of animals in common between pools. Cattle lung samples were collected from a commercial beef processing plant on a single day. Six pools of 100 animals each were constructed so that overlap between pools was 0% or 50%. Two pools of all 200 animals were constructed to estimate PAFs for all 200 animals. Frozen lung tissue (0.01 g) from each animal was weighed into a tube containing a pool; there were two pools of 200 animals each and six pools of 100 animals each. Every contribution of an individual animal was an independent measurement to ensure independence of pooling errors. Lung samples were kept on dry ice during the pooling process to keep them from thawing. The eight pools were then assayed for approximately 100,000 single nucleotide polymorphisms (SNP). PAF for each SNP and pool was based on the relative intensity of the two dyes used to detect the alleles rather than genotype calls, which are not tractable from pooling data. Euclidean distances and genomic relationships among the PAFs for the eight pools were estimated and distances were tested for concordance with pool overlap using permutation-based analysis of distance. Distances among pools were concordant with the planned overlap of animals shared between pools (P = 0.0024); pool overlap accounted for 70% of the variation and pooling error accounted for 30%. Pools containing 100 animals with no overlap were the most distant from one another and pools with 50% overlap were the least distant.
This work shows that we can discern differences in distance between pairs of overlapping DNA pools sharing 0% and 50% of the animals. Genomic correlations among nonoverlapping pools indicated that nonoverlapping pool pairs did not share many related animals because genomic correlations were near zero for these pairs. On the other hand, one pair of nonoverlapping pools likely contained related animals between pools because the correlation was 0.21. Pools sharing 50% overlap ranged in genomic relationship between 0.21 and 0.39 (N = 12).
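As a rough illustration of the quantities named in this abstract, the sketch below computes a pooling allele frequency from two dye intensities and a Euclidean distance between the PAF vectors of two pools. The abstract does not give the exact formula, so the ratio `x / (x + y)` is an assumption (a common way to express relative intensity), and the function names and the toy intensity values are hypothetical.

```python
import math
from typing import Sequence


def pooling_allele_frequency(x: float, y: float) -> float:
    """Assumed PAF: intensity of one dye relative to the sum of both dyes."""
    total = x + y
    if total <= 0:
        raise ValueError("combined dye intensity must be positive")
    return x / total


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two pools' PAF vectors (same SNP order)."""
    if len(a) != len(b):
        raise ValueError("PAF vectors must cover the same SNPs")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Toy (made-up) dye intensities for three SNPs in two pools.
pool_a = [pooling_allele_frequency(x, y) for x, y in [(3.0, 1.0), (1.0, 1.0), (1.0, 3.0)]]
pool_b = [pooling_allele_frequency(x, y) for x, y in [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]]
distance = euclidean_distance(pool_a, pool_b)
```

A pairwise matrix of such distances is the kind of input a permutation-based test of concordance with planned pool overlap would take; the test itself is not sketched here.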
|
research-article |
2 |
|