1
|
Sun Y, Wang X, Shang J, Liu JX, Zheng CH, Lei X. Introducing Heuristic Information Into Ant Colony Optimization Algorithm for Identifying Epistasis. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1253-1261. [PMID: 30403637 DOI: 10.1109/tcbb.2018.2879673] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Epistasis learning, which aims to detect associations between multiple single nucleotide polymorphisms (SNPs) and complex diseases, has gained increasing attention in genome-wide association studies. Although much work has been done on mapping the SNPs underlying complex diseases, detecting epistatic interactions remains difficult because heuristic information to expedite the search process is lacking. In this study, a method, EACO, is proposed to detect epistatic interactions based on the ant colony optimization (ACO) algorithm. Its highlights are the introduced heuristic information, the fitness function, and a candidate solution filtration strategy. The heuristic information multi-SURF* is introduced into EACO for identifying epistasis and is incorporated into the ant-decision rules to guide the search in linear time. Two functionally complementary fitness functions, mutual information and the Gini index, are combined to effectively evaluate the associations between SNP combinations and the phenotype. Furthermore, a candidate solution filtration strategy is provided to adaptively retain all optimal solutions, which yields a more accurate search for epistasis. Experiments with EACO, three ACO-based methods (AntEpiSeeker, MACOED, and epiACO), and four commonly used methods (BOOST, SNPRuler, TEAM, and epiMODE) are performed on both simulation data sets and a real data set of age-related macular degeneration. Results indicate that EACO is promising in identifying epistasis.
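The two fitness measures named in this abstract can be illustrated with a minimal sketch. It assumes the textbook definitions of mutual information and of the weighted Gini impurity; the abstract does not say how EACO combines the two scores, so they are computed separately here, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(genotypes, phenotype):
    """I(G; Y) in bits; each element of `genotypes` is a tuple of SNP codes."""
    n = len(phenotype)
    count_g = Counter(genotypes)
    count_y = Counter(phenotype)
    count_gy = Counter(zip(genotypes, phenotype))
    mi = 0.0
    for (g, y), c in count_gy.items():
        # p(g, y) * log2(p(g, y) / (p(g) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_g[g] * count_y[y]))
    return mi

def gini_index(genotypes, phenotype):
    """Phenotype Gini impurity inside each genotype cell, weighted by cell size."""
    n = len(phenotype)
    cells = {}
    for g, y in zip(genotypes, phenotype):
        cells.setdefault(g, Counter())[y] += 1
    total = 0.0
    for counts in cells.values():
        size = sum(counts.values())
        impurity = 1.0 - sum((c / size) ** 2 for c in counts.values())
        total += (size / n) * impurity
    return total

# Toy data (made up): the phenotype is the XOR of two SNPs, a pure interaction.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
status = [a ^ b for a, b in zip(snp_a, snp_b)]
pair = list(zip(snp_a, snp_b))
single = [(a,) for a in snp_a]
```

On this toy data the SNP pair carries all the information about the phenotype while either SNP alone carries none, which is the kind of signal a single-locus scan misses. Higher mutual information and lower Gini impurity both indicate a stronger association.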
|
2
|
Scheinhardt MO, Ziegler A. Location Tests for Biomarker Studies: A Comparison Using Simulations for the Two-sample Case. Methods Inf Med 2018; 52:351-9. [DOI: 10.3414/me12-02-0014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/03/2012] [Accepted: 06/09/2013] [Indexed: 11/09/2022]
Abstract
Summary
Background: Gene, protein, or metabolite expression levels are often non-normally distributed, heavy tailed and contain outliers. Standard statistical approaches may fail as location tests in this situation.
Objectives: In three Monte Carlo simulation studies, we aimed to compare the type I error levels and empirical power of standard location tests and three adaptive tests [O’Gorman, Can J Stat 1997; 25: 269–279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267–293; Szymczak et al., Stat Med 2013; 32: 524–537] for a wide range of distributions.
Methods: We simulated two-sample scenarios using the g-and-k-distribution family to systematically vary tail length and skewness with identical and varying variability between groups.
Results: All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances.
Conclusions: The standard U-test was a powerful and robust location test for most of the simulated scenarios except for very heavy tailed or heavy skewed data, and it is thus to be recommended except for these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. But when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy tailed distributions.
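The simulation design described in this abstract can be sketched for a single test. The block below is a minimal illustration under stated assumptions, not the authors' code: it draws both groups from the same g-and-k distribution (standard quantile-function form with c = 0.8), applies a U-test through the normal approximation without continuity or tie correction, and counts rejections. The parameter values, sample size, and function names are made up for the example.

```python
import random
from math import sqrt, tanh
from statistics import NormalDist

STD_NORMAL = NormalDist()

def g_and_k_sample(rng, n, a=0.0, b=1.0, g=0.0, k=0.0, c=0.8):
    """Push standard normal draws through the g-and-k quantile function."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # g controls skewness, k controls tail length
        out.append(a + b * (1.0 + c * tanh(g * z / 2.0)) * (1.0 + z * z) ** k * z)
    return out

def u_test_p_value(x, y):
    """Two-sided Mann-Whitney U-test, normal approximation, assumes no ties."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

def type_one_error(n_per_group=30, reps=2000, g=0.5, k=0.2, alpha=0.05, seed=1):
    """Share of null replicates (identical distributions) that the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = g_and_k_sample(rng, n_per_group, g=g, k=k)
        y = g_and_k_sample(rng, n_per_group, g=g, k=k)
        if u_test_p_value(x, y) < alpha:
            rejections += 1
    return rejections / reps
```

With identical distributions in both groups, the estimated rate should sit near the nominal level. Giving the two groups different scale parameters `b` would mimic the heterogeneous-variance scenarios mentioned in the results.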
|
3
|
Sun Y, Shang J, Liu JX, Li S, Zheng CH. epiACO - a method for identifying epistasis based on ant Colony optimization algorithm. BioData Min 2017; 10:23. [PMID: 28694848 PMCID: PMC5500974 DOI: 10.1186/s13040-017-0143-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/27/2016] [Accepted: 06/29/2017] [Indexed: 11/23/2022] Open
Abstract
Background: Identifying epistasis or epistatic interactions, which refer to nonlinear interaction effects of single nucleotide polymorphisms (SNPs), is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Although much work has been done on identifying epistatic interactions, their methodological and computational challenges mean that algorithmic development is still ongoing.
Results: In this study, a method, epiACO, is proposed to identify epistatic interactions based on the ant colony optimization algorithm. Highlights of epiACO are the introduced fitness function Svalue, path selection strategies, and a memory-based strategy. The Svalue leverages the advantages of both mutual information and Bayesian networks to effectively and efficiently measure associations between SNP combinations and the phenotype. Two path selection strategies, i.e., a probabilistic path selection strategy and a stochastic path selection strategy, are provided to adaptively guide ant behaviors of exploration and exploitation. The memory-based strategy is designed to retain candidate solutions found in previous iterations and compare them to solutions of the current iteration to generate new candidate solutions, yielding a more accurate way of identifying epistasis.
Conclusions: Experiments of epiACO and its comparison with other recent methods (epiMODE, TEAM, BOOST, SNPRuler, AntEpiSeeker, AntMiner, MACOED, and IACO) are performed on both simulation data sets and a real data set of age-related macular degeneration. Results show that epiACO is promising in identifying epistasis and might be an alternative to existing methods.
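The abstract does not define its two path selection strategies, so the following is only a sketch of how an ant might trade off exploitation and exploration when picking SNPs. It assumes the textbook ACO rule (selection probability proportional to pheromone raised to a power `alpha`) for the exploitative step and a uniform random draw for the exploratory step. The names `select_snps` and `explore_rate` are hypothetical and do not come from epiACO.

```python
import random

def pheromone_probabilities(pheromone, alpha=1.0):
    """Normalize pheromone**alpha into selection probabilities."""
    weights = [t ** alpha for t in pheromone]
    total = sum(weights)
    return [w / total for w in weights]

def select_snps(pheromone, k, rng, explore_rate=0.2, alpha=1.0):
    """Pick k distinct SNP indices for one ant."""
    available = list(range(len(pheromone)))
    chosen = []
    for _ in range(k):
        if rng.random() < explore_rate:
            # Exploration: ignore pheromone and draw uniformly.
            pick = rng.choice(available)
        else:
            # Exploitation: draw in proportion to pheromone levels.
            probs = pheromone_probabilities([pheromone[i] for i in available], alpha)
            pick = rng.choices(available, weights=probs, k=1)[0]
        chosen.append(pick)
        available.remove(pick)
    return chosen

# Example: ten SNPs with equal pheromone, one ant choosing a 3-SNP combination.
ant_path = select_snps([1.0] * 10, 3, random.Random(0))
```

Raising `explore_rate` makes ants wander more widely, while lowering it concentrates the search on SNPs that earlier ants reinforced.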
Affiliation(s)
- Yingxia Sun
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
- Junliang Shang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China; Institute of Network Computing, Qufu Normal University, Rizhao, 276826 China
- Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
- Shengjun Li
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
- Chun-Hou Zheng
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China; College of Electrical Engineering and Automation, Anhui University, Hefei, 230601 China
|
4
|
Exarchos KP, Carpegianni C, Rigas G, Exarchos TP, Vozzi F, Sakellarios A, Marraccini P, Naka K, Michalis L, Parodi O, Fotiadis DI. A Multiscale Approach for Modeling Atherosclerosis Progression. IEEE J Biomed Health Inform 2015; 19:709-19. [DOI: 10.1109/jbhi.2014.2323935] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
|
5
|
Mahachie John JM, Van Lishout F, Gusareva ES, Van Steen K. A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection. BioData Min 2013; 6:9. [PMID: 23618370 PMCID: PMC3668290 DOI: 10.1186/1756-0381-6-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/11/2012] [Accepted: 04/20/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA), which decomposes the variability in the trait amongst the different factors. In this study, we assess, through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects.
Methodology: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling.
Results: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations.
Conclusions: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student’s t-tests as internal tests for association.
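The recommended preprocessing can be sketched as follows. This is an illustration under assumptions rather than the MB-MDR implementation: the rank-based transform maps each value to the standard normal quantile of (rank - 0.5) / n, which is one common variant and not necessarily the one the authors used, and the t statistic is the usual pooled-variance Student's form. Comparing one multilocus genotype cell against the remaining samples is also an assumption made for the example, and the trait values are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def rank_to_normal(values):
    """Map values to normal quantiles of (rank - 0.5) / n; ties share the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    inv = NormalDist().inv_cdf
    return [inv((r - 0.5) / n) for r in ranks]

def student_t(x, y):
    """Pooled-variance two-sample t statistic (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

# Made-up skewed trait: transform first, then compare one cell against the rest.
trait = [10.0, 200.0, 3000.0, 5.0, 12.0, 950.0]
in_cell = [True, False, False, True, True, False]
z = rank_to_normal(trait)
t_stat = student_t([v for v, c in zip(z, in_cell) if c],
                   [v for v, c in zip(z, in_cell) if not c])
```

Because the transform depends only on ranks, extreme values such as 3000 no longer dominate the variance estimate that enters the t statistic.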
|
6
|
Cho P, Gelinas L, Corbett NP, Tebbutt SJ, Turvey SE, Fortuno ES, Kollmann TR. Association of common single-nucleotide polymorphisms in innate immune genes with differences in TLR-induced cytokine production in neonates. Genes Immun 2013; 14:199-211. [DOI: 10.1038/gene.2013.5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/29/2023]
|
7
|
Exarchos KP, Exarchos TP, Bourantas CV, Papafaklis MI, Naka KK, Michalis LK, Parodi O, Fotiadis DI. Prediction of coronary atherosclerosis progression using dynamic Bayesian networks. Annu Int Conf IEEE Eng Med Biol Soc 2013; 2013:3889-3892. [PMID: 24110581 DOI: 10.1109/embc.2013.6610394] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
In this paper we propose a methodology for predicting the progression of atherosclerosis in coronary arteries using dynamic Bayesian networks. The methodology takes into account patient data collected at the baseline study and the same data collected in the follow-up study. Our aim is to analyze all the different sources of information (Demographic, Clinical, Biochemical profile, Inflammatory markers, Treatment characteristics) in order to predict possible manifestations of the disease; subsequently, our purpose is twofold: i) to identify the key factors that dictate the progression of atherosclerosis and ii) based on these factors to build a model which is able to predict the progression of atherosclerosis for a specific patient, providing at the same time information about the underlying mechanism of the disease.
|
8
|
Szymczak S, Scheinhardt MO, Zeller T, Wild PS, Blankenberg S, Ziegler A. Adaptive linear rank tests for eQTL studies. Stat Med 2012; 32:524-37. [PMID: 22933317 DOI: 10.1002/sim.5593] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/03/2011] [Accepted: 08/07/2012] [Indexed: 11/05/2022]
Abstract
Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal-Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies.
Affiliation(s)
- Silke Szymczak
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Maria-Goeppert-Str. 1, 23562 Lübeck, Germany
|
9
|
Yu J, Vexler A, Kim SE, Hutson AD. Two-sample empirical likelihood ratio tests for medians in application to biomarker evaluations. Can J Stat 2011. [DOI: 10.1002/cjs.10108] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
|
10
|
Wild PS, Zeller T, Schillert A, Szymczak S, Sinning CR, Deiseroth A, Schnabel RB, Lubos E, Keller T, Eleftheriadis MS, Bickel C, Rupprecht HJ, Wilde S, Rossmann H, Diemert P, Cupples LA, Perret C, Erdmann J, Stark K, Kleber ME, Epstein SE, Voight BF, Kuulasmaa K, Li M, Schäfer AS, Klopp N, Braund PS, Sager HB, Demissie S, Proust C, König IR, Wichmann HE, Reinhard W, Hoffmann MM, Virtamo J, Burnett MS, Siscovick D, Wiklund PG, Qu L, El Mokthari NE, Thompson JR, Peters A, Smith AV, Yon E, Baumert J, Hengstenberg C, März W, Amouyel P, Devaney J, Schwartz SM, Saarela O, Mehta NN, Rubin D, Silander K, Hall AS, Ferrieres J, Harris TB, Melander O, Kee F, Hakonarson H, Schrezenmeir J, Gudnason V, Elosua R, Arveiler D, Evans A, Rader DJ, Illig T, Schreiber S, Bis JC, Altshuler D, Kavousi M, Witteman JCM, Uitterlinden AG, Hofman A, Folsom AR, Barbalic M, Boerwinkle E, Kathiresan S, Reilly MP, O'Donnell CJ, Samani NJ, Schunkert H, Cambien F, Lackner KJ, Tiret L, Salomaa V, Munzel T, Ziegler A, Blankenberg S. A genome-wide association study identifies LIPA as a susceptibility gene for coronary artery disease. Circ Cardiovasc Genet 2011; 4:403-12. [PMID: 21606135 DOI: 10.1161/circgenetics.110.958728] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Indexed: 01/08/2023]
Abstract
Background: eQTL analyses are important to improve the understanding of genetic association results. We performed a genome-wide association and global gene expression study to identify functionally relevant variants affecting the risk of coronary artery disease (CAD).
Methods and Results: In a genome-wide association analysis of 2078 CAD cases and 2953 control subjects, we identified 950 single-nucleotide polymorphisms (SNPs) that were associated with CAD at P<10^-3. Subsequent in silico and wet-laboratory replication stages and a final meta-analysis of 21 428 CAD cases and 38 361 control subjects revealed a novel association signal at chromosome 10q23.31 within the LIPA (lysosomal acid lipase A) gene (P=3.7×10^-8; odds ratio, 1.1; 95% confidence interval, 1.07 to 1.14). The association of this locus with global gene expression was assessed by genome-wide expression analyses in the monocyte transcriptome of 1494 individuals. The results showed a strong association of this locus with expression of the LIPA transcript (P=1.3×10^-96). An assessment of LIPA SNPs and transcript with cardiovascular phenotypes revealed an association of LIPA transcript levels with impaired endothelial function (P=4.4×10^-3).
Conclusions: The use of data on genetic variants and the addition of data on global monocytic gene expression led to the identification of the novel functional CAD susceptibility locus LIPA, located on chromosome 10q23.31. The respective eSNPs associated with CAD strongly affect LIPA gene expression level, which was related to endothelial dysfunction, a precursor of CAD.
Affiliation(s)
- Philipp S Wild
- Department of Medicine II, University Medical Center Mainz, Germany
|
11
|
Van Steen K. Perspectives on genome-wide multi-stage family-based association studies. Stat Med 2011; 30:2201-21. [DOI: 10.1002/sim.4259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/16/2010] [Accepted: 03/07/2011] [Indexed: 01/03/2023]
|