Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Long N, Gianola D, Rosa GJM, Weigel KA, Avendaño S. Comparison of classification methods for detecting associations between SNPs and chick mortality. Genet Sel Evol 2009;41:18. [PMID: 19284707 PMCID: PMC3225888 DOI: 10.1186/1297-9686-41-18] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/17/2008] [Accepted: 01/23/2009] [Indexed: 11/23/2022] Open

Cited by Other Article(s)

pour AF, Pietrzak M, Sucheston-Campbell LE, Karaesmen E, Dalton LA, Rempała GA. High dimensional model representation of log likelihood ratio: binary classification with SNP data. BMC Med Genomics 2020;13:133. [PMID: 32957998 PMCID: PMC7504683 DOI: 10.1186/s12920-020-00774-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/06/2023] Open

Abstract

BACKGROUND

Developing binary classification rules based on SNP observations has been a major challenge for many modern bioinformatics applications, e.g., predicting risk of future disease events in complex conditions such as cancer. The small-sample, high-dimensional nature of SNP data, the weak effect of each SNP on the outcome, and highly non-linear SNP interactions are key factors complicating the analysis. Additionally, SNPs take a finite number of values and may be best understood as ordinal or categorical variables, yet many algorithms treat them as continuous.

METHODS

We use the theory of high dimensional model representation (HDMR) to build appropriate low dimensional glass-box models, allowing us to account for the effects of feature interactions. We compute the second order HDMR expansion of the log-likelihood ratio to account for the effects of single SNPs and their pairwise interactions. We propose a regression based approach, called linear approximation for block second order HDMR expansion of categorical observations (LABS-HDMR-CO), to approximate the HDMR coefficients. We show how HDMR can be used to detect pairwise SNP interactions, and propose the fixed pattern test (FPT) to identify statistically significant pairwise interactions.

RESULTS

We apply LABS-HDMR-CO and FPT to synthetically generated HAPGEN2 data as well as to two GWAS cancer datasets. In these examples LABS-HDMR-CO achieves higher accuracy than several algorithms used for SNP classification, while also taking pairwise interactions into account. FPT declares very few significant interactions in the small-sample GWAS datasets when bounding the false discovery rate (FDR) by 5%, due to the large number of tests performed. On the other hand, LABS-HDMR-CO utilizes a large number of SNP pairs to improve its prediction accuracy. In the larger HAPGEN2 dataset FPT declares a larger portion of the SNP pairs used by LABS-HDMR-CO as significant.
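To illustrate why a 5% FDR bound leaves few significant pairs when many tests are run, here is a minimal sketch. The abstract does not say which FDR procedure was used; the sketch assumes the Benjamini-Hochberg step-up rule, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule.

    Assumption: BH is used only for illustration; the cited abstract does
    not name the procedure behind its 5% FDR bound.
    """
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# The same p-value survives among 10 tests but not among 1000 tests.
few_tests = [0.001] + [1.0] * 9
many_tests = [0.001] + [1.0] * 999
rejected_few = benjamini_hochberg(few_tests)
rejected_many = benjamini_hochberg(many_tests)
```

With 10 tests the smallest threshold is 0.05 × 1/10 = 0.005, so a p-value of 0.001 is rejected; with 1000 tests the threshold drops to 0.00005 and the same p-value is no longer significant.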

CONCLUSION

LABS-HDMR-CO and FPT are interesting methods to design prediction rules and detect pairwise feature interactions for SNP data. Reliably detecting pairwise SNP interactions and taking advantage of potential interactions to improve prediction accuracy are two different objectives addressed by these methods. While the large number of potential SNP interactions may result in low power of detection, potentially interacting SNP pairs, of which many might be false alarms, can still be used to improve prediction accuracy.

Chang LY, Toghiani S, Aggrey SE, Rekaya R. Increasing accuracy of genomic selection in presence of high density marker panels through the prioritization of relevant polymorphisms. BMC Genet 2019;20:21. [PMID: 30795734 PMCID: PMC6387489 DOI: 10.1186/s12863-019-0720-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/27/2018] [Accepted: 02/04/2019] [Indexed: 12/27/2022] Open

Abstract

BACKGROUND

It has become clear that increasing the density of marker panels, and even using sequence data, did not result in any meaningful increase in the accuracy of genomic selection (GS) using either regression (RM) or variance component (VC) approaches. This is in part due to the limitations of current methods. Association models are heavily over-parameterized and suffer from severe collinearity and lack of statistical power. Even when the variant effects are not directly estimated, as in VC-based approaches, the genomic relationships did not improve after the marker density exceeded a certain threshold. SNP prioritization based on fixation index (F_ST) scores was used to track the majority of significant QTL and to reduce the dimensionality of the association model.

RESULTS

Two populations with average LD between adjacent markers of 0.3 (P1) and 0.7 (P2) were simulated. In both populations, the genomic data consisted of 400 K SNP markers distributed on 10 chromosomes. The density of simulated genomic data mimics roughly 1.2 million SNP markers in the bovine genome. The genomic relationship matrix (G) was calculated for each set of selected SNPs based on their F_ST score and similar numbers of SNPs were selected randomly for comparison. Using all 400 K SNPs, 46% of the off-diagonal elements (OD) were between -0.01 and 0.01. The same portion was 31%, 23% and 16% when 80 K, 40 K and 20 K SNPs were selected based on F_ST scores. For randomly selected 20 K SNP subsets, around 33% of the OD fell within the same range. Genomic similarity computed using SNPs selected based on F_ST scores was always higher than using the same number of SNPs selected randomly. Maximum accuracies of 0.741 and 0.828 were achieved when 20 K and 10 K SNPs were selected based on F_ST scores in P1 and P2, respectively.
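The off-diagonal summary reported in these results can be sketched as follows. This is only an illustration: the abstract does not state how G was computed, so the sketch assumes a VanRaden-style formula with genotypes coded 0/1/2, and the function names and toy genotypes are hypothetical.

```python
from itertools import combinations


def genomic_relationship(genotypes):
    """Build G = ZZ' / (2 * sum_j p_j (1 - p_j)) from 0/1/2 genotype codes.

    Assumption: a VanRaden-style G, chosen for illustration only.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each SNP estimated from the sample.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def off_diagonal_fraction(g, low=-0.01, high=0.01):
    """Share of off-diagonal elements of g that lie within [low, high]."""
    pairs = list(combinations(range(len(g)), 2))
    inside = sum(1 for i, k in pairs if low <= g[i][k] <= high)
    return inside / len(pairs)


# Toy data: 4 individuals, 4 SNPs (made up for the example).
toy_genotypes = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 1], [1, 2, 1, 0]]
toy_g = genomic_relationship(toy_genotypes)
toy_fraction = off_diagonal_fraction(toy_g)
```

A smaller fraction of near-zero off-diagonal elements means more pairs of individuals show a non-negligible estimated relationship, which is the sense in which the abstract compares F_ST-selected and randomly selected SNP subsets.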

CONCLUSIONS

Genomic similarity could be maximized by decreasing the number of selected SNPs, but doing so also decreases the percentage of genetic variation explained by the selected markers. Finding the balance between these two quantities could optimize the accuracy of GS in the presence of high density marker panels.

Computational Biosensors: Molecules, Algorithms, and Detection Platforms. Modeling, Methodologies and Tools for Molecular and Nano-scale Communications 2017. [PMCID: PMC7123247 DOI: 10.1007/978-3-319-50688-3_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Shahinfar S, Page D, Guenther J, Cabrera V, Fricke P, Weigel K. Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms. J Dairy Sci 2014;97:731-42. [DOI: 10.3168/jds.2013-6693] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 02/13/2013] [Accepted: 09/11/2013] [Indexed: 11/19/2022]

Genotyping strategies for genomic selection in small dairy cattle populations. Animal 2013;6:1216-24. [PMID: 23217224 DOI: 10.1017/s1751731112000341] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 01/16/2023] Open

Abstract

This study evaluated different female-selective genotyping strategies to increase the predictive accuracy of genomic breeding values (GBVs) in populations that have a limited number of sires with a large number of progeny. A simulated dairy population was utilized to address the aims of the study. The following selection strategies were used: random selection, two-tailed selection by yield deviations, two-tailed selection by breeding value, top yield deviation selection and top breeding value selection. For comparison, two other strategies, genotyping of sires and pedigree indexes from traditional genetic evaluation, were included in the analysis. Two scenarios were simulated, low heritability (h² = 0.10) and medium heritability (h² = 0.30). GBVs were estimated using the Bayesian Lasso. The accuracy of predicted GBVs using the two-tailed strategies was better than the accuracy obtained using other strategies (0.50 and 0.63 for the two-tailed selection by yield deviations strategy and 0.48 and 0.63 for the two-tailed selection by breeding values strategy in low- and medium-heritability scenarios, respectively, using 1000 genotyped cows). When 996 genotyped bulls were used as the training population, the sire strategy led to accuracies of 0.48 and 0.55 for low- and medium-heritability traits, respectively. The Random strategies required larger training populations to outperform the accuracies of the pedigree index; however, selecting females from the top of the yield deviations or breeding values of the population did not improve accuracy relative to that of the pedigree index. Bias was found for all genotyping strategies considered, although the Top strategies produced the most biased predictions. Strategies that involve genotyping cows can be implemented in breeding programs that have a limited number of sires with a reliable progeny test. The results of this study showed that females that exhibited upper and lower extreme values within the distribution of yield deviations may be included as training population to increase reliability in small reference populations. The strategies that selected only the females that had high estimated breeding values or yield deviations produced suboptimal results.

Investigation of Single Nucleotide Polymorphisms Associated to Familial Combined Hyperlipidemia with Random Forests. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-3-642-35467-0_18] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/18/2023]

Morota G, Valente BD, Rosa GJM, Weigel KA, Gianola D. An assessment of linkage disequilibrium in Holstein cattle using a Bayesian network. J Anim Breed Genet 2012;129:474-87. [PMID: 23148973 DOI: 10.1111/jbg.12002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 04/14/2012] [Accepted: 07/31/2012] [Indexed: 11/30/2022]

Abstract

Linkage disequilibrium (LD) is defined as a non-random association of the distributions of alleles at different loci within a population. This association between loci is valuable in prediction of quantitative traits in animals and plants and in genome-wide association studies. A question that arises is whether standard metrics such as D' and r² reflect complex associations in a genetic system properly. It seems reasonable to take the view that loci associate and interact together as a system or network, as opposed to in a simple pairwise manner. We used a Bayesian network (BN) as a representation of choice for an LD network. A BN is a graphical depiction of a probability distribution and can represent sets of conditional independencies. Moreover, it provides a visual display of the joint distribution of the set of random variables in question. The usefulness of BN for linkage disequilibrium was explored and illustrated using genetic marker loci found to have the strongest effects on milk protein in Holstein cattle based on three strategies for ranking marker effect estimates: posterior means, standardized posterior means and additive genetic variance. Two different algorithms, Tabu search (a local score-based algorithm) and incremental association Markov blanket (a constraint-based algorithm), coupled with the chi-square test, were used for learning the structure of the BN and were compared with the reference r² metric represented as an LD heat map. The BN captured several genetic markers associated as clusters, implying that markers are inter-related in a complicated manner. Further, the BN detected conditionally dependent markers. The results confirm that LD relationships are of a multivariate nature and that r² gives an incomplete description and understanding of LD. Use of an LD Bayesian network enables inferring associations between loci in a systems framework and provides a more accurate picture of LD than that resulting from the use of pairwise metrics.
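For reference, the pairwise r² metric that this abstract uses as its baseline can be sketched as below. The sketch assumes phased haplotypes with alleles coded 0/1 and uses the standard definition r² = D² / (p_A (1 - p_A) p_B (1 - p_B)) with D = p_AB - p_A p_B; the function name and sample data are hypothetical and not taken from the cited study.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    where a and b are the alleles at locus A and locus B coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Alleles always travel together: complete LD.
complete_ld = ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
# Every allele combination equally frequent: no LD.
no_ld = ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Because r² is computed for one pair of loci at a time, it cannot express the conditional dependencies among several markers that the abstract attributes to the Bayesian network.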

Parsimonious classification of binary lacunarity data computed from food surface images using kernel principal component analysis and artificial neural networks. Meat Sci 2011;87:107-14. [DOI: 10.1016/j.meatsci.2010.08.014] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 05/29/2010] [Revised: 08/18/2010] [Accepted: 08/25/2010] [Indexed: 11/21/2022]

Okser S, Lehtimäki T, Elo LL, Mononen N, Peltonen N, Kähönen M, Juonala M, Fan YM, Hernesniemi JA, Laitinen T, Lyytikäinen LP, Rontu R, Eklund C, Hutri-Kähönen N, Taittonen L, Hurme M, Viikari JSA, Raitakari OT, Aittokallio T. Genetic variants and their interactions in the prediction of increased pre-clinical carotid atherosclerosis: the cardiovascular risk in young Finns study. PLoS Genet 2010;6:e1001146. [PMID: 20941391 PMCID: PMC2947986 DOI: 10.1371/journal.pgen.1001146] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 12/18/2009] [Accepted: 09/01/2010] [Indexed: 12/14/2022] Open

Abstract

The relative contribution of genetic risk factors to the progression of subclinical atherosclerosis is poorly understood. It is likely that multiple variants are implicated in the development of atherosclerosis, but the subtle genotypic and phenotypic differences are beyond the reach of the conventional case-control designs and the statistical significance testing procedures being used in most association studies. Our objective here was to investigate whether an alternative approach--in which common disorders are treated as quantitative phenotypes that are continuously distributed over a population--can reveal predictive insights into the early atherosclerosis, as assessed using ultrasound imaging-based quantitative measurement of carotid artery intima-media thickness (IMT). Using our population-based follow-up study of atherosclerosis precursors as a basis for sampling subjects with gradually increasing IMT levels, we searched for such subsets of genetic variants and their interactions that are the most predictive of the various risk classes, rather than using exclusively those variants meeting a stringent level of statistical significance. The area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive value of the variants, and cross-validation was used to assess how well the predictive models will generalize to other subsets of subjects. By means of our predictive modeling framework with machine learning-based SNP selection, we could improve the prediction of the extreme classes of atherosclerosis risk and progression over a 6-year period (average AUC 0.844 and 0.761), compared to that of using conventional cardiovascular risk factors alone (average AUC 0.741 and 0.629), or when combined with the statistically significant variants (average AUC 0.762 and 0.651). The predictive accuracy remained relatively high in an independent validation set of subjects (average decrease of 0.043). 
These results demonstrate that the modeling framework can utilize the "gray zone" of genetic variation in the classification of subjects with different degrees of risk of developing atherosclerosis.
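The AUC used in this abstract to score predictive value can be computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below follows that pairwise definition, counting ties as one half; the function name, scores and labels are hypothetical and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores; labels: 1 for cases, 0 for controls.
    Ties between a case and a control count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four case-control pairs are ordered correctly.
example_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])  # 0.75
```

In a cross-validation setting such as the one the abstract describes, this score would be computed on held-out subjects only and then averaged over folds.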

Affiliation(s)

Sebastian Okser: Biomathematics Research Group, Department of Mathematics, University of Turku, Turku, Finland
Terho Lehtimäki: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Laura L. Elo: Biomathematics Research Group, Department of Mathematics, University of Turku, Turku, Finland; Data Mining and Modeling Group, Turku Centre for Biotechnology, Turku, Finland
Nina Mononen: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Nina Peltonen: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Mika Kähönen: Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Tampere, Finland
Markus Juonala: Department of Medicine, Turku University Central Hospital, Turku, Finland; Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
Yue-Mei Fan: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Jussi A. Hernesniemi: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Tomi Laitinen: Department of Clinical Physiology and Nuclear Medicine, Kuopio University Hospital and University of Eastern Finland, Kuopio, Finland
Leo-Pekka Lyytikäinen: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Riikka Rontu: Department of Clinical Chemistry, Tampere University Hospital and University of Tampere, Tampere, Finland
Carita Eklund: Department of Microbiology and Immunology, University of Tampere, Tampere, Finland
Nina Hutri-Kähönen: Department of Pediatrics, Tampere University Hospital, Tampere, Finland
Leena Taittonen: Department of Pediatrics, University of Oulu, Oulu, Finland
Mikko Hurme: Department of Microbiology and Immunology, University of Tampere, Tampere, Finland
Jorma S. A. Viikari: Department of Medicine, Turku University Central Hospital, Turku, Finland; Department of Medicine, University of Turku, Turku, Finland
Olli T. Raitakari: Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland; Department of Clinical Physiology, Turku University Hospital, Turku, Finland
Tero Aittokallio: Biomathematics Research Group, Department of Mathematics, University of Turku, Turku, Finland; Data Mining and Modeling Group, Turku Centre for Biotechnology, Turku, Finland

Wang G, Yang Y, Ott J. Genome-wide conditional search for epistatic disease-predisposing variants in human association studies. Hum Hered 2010;70:34-41. [PMID: 20413980 PMCID: PMC2912644 DOI: 10.1159/000293722] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/07/2009] [Accepted: 03/01/2010] [Indexed: 11/19/2022] Open

Jannink JL, Lorenz AJ, Iwata H. Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 2010;9:166-77. [PMID: 20156985 DOI: 10.1093/bfgp/elq001] [Citation(s) in RCA: 538] [Impact Index Per Article: 38.4] [Indexed: 01/22/2023] Open