1
Feronato SG, Silva MLM, Izbicki R, Farias TDJ, Shigunov P, Dallagiovanna B, Passetti F, dos Santos HG. Selecting Genetic Variants and Interactions Associated with Amyotrophic Lateral Sclerosis: A Group LASSO Approach. J Pers Med 2022; 12:jpm12081330. [PMID: 36013279 PMCID: PMC9410070 DOI: 10.3390/jpm12081330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 08/10/2022] [Accepted: 08/12/2022] [Indexed: 11/16/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a multi-system neurodegenerative disease that affects both upper and lower motor neurons, resulting from a combination of genetic, environmental, and lifestyle factors. Usually, the association between single-nucleotide polymorphisms (SNPs) and this disease is tested individually, which leads to the testing of multiple hypotheses. In addition, this classical approach does not support the detection of interaction-dependent SNPs. We applied a two-step procedure to select SNPs and pairwise interactions associated with ALS. SNP data from 276 ALS patients and 268 controls were analyzed by a two-step group LASSO in 2000 iterations. In the first step, we fitted a group LASSO model to a bootstrap sample and a random subset of predictors (25%) from the original data set aiming to screen for important SNPs and, in the second step, we fitted a hierarchical group LASSO model to evaluate pairwise interactions. An in silico analysis was performed on a set of variables, which were prioritized according to their bootstrap selection frequency. We identified seven SNPs (rs16984239, rs10459680, rs1436918, rs1037666, rs4552942, rs10773543, and rs2241493) and two pairwise interactions (rs16984239:rs2118657 and rs16984239:rs3172469) potentially involved in nervous system conservation and function. These results may contribute to the understanding of ALS pathogenesis, its diagnosis, and therapeutic strategy improvement.
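The resampling scheme in this abstract (a bootstrap sample of subjects plus a random 25% subset of predictors per iteration, followed by ranking on selection frequency) can be sketched as below. This is a minimal illustration only: `screen_stub` is a hypothetical stand-in that thresholds a case/control mean difference and is not a group LASSO, the frequency definition (times selected divided by times drawn) is an assumption, and the toy data are invented.

```python
import random


def screen_stub(X, y, cols, threshold=0.5):
    """Hypothetical stand-in selector (NOT a group LASSO).

    Keeps the columns whose mean genotype differs between cases (y == 1)
    and controls (y == 0) by more than `threshold`.
    """
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    if not cases or not controls:
        return []
    selected = []
    for j in cols:
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        if abs(mean_case - mean_ctrl) > threshold:
            selected.append(j)
    return selected


def selection_frequency(X, y, n_iter=200, subset_frac=0.25, seed=0):
    """Fraction of iterations in which each predictor was selected,
    counted only over the iterations where it was drawn (an assumption)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, round(subset_frac * p))
    drawn = [0] * p
    picked = [0] * p
    for _ in range(n_iter):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of subjects
        cols = rng.sample(range(p), k)               # random subset of predictors
        Xb = [X[i] for i in rows]
        yb = [y[i] for i in rows]
        for j in cols:
            drawn[j] += 1
        for j in screen_stub(Xb, yb, cols):
            picked[j] += 1
    return [picked[j] / drawn[j] if drawn[j] else 0.0 for j in range(p)]


# Invented toy data: 40 subjects, 8 SNPs; only SNP 0 tracks case status.
y = [i % 2 for i in range(40)]
X = [[2 * label] + [1] * 7 for label in y]
freq = selection_frequency(X, y)
```

In a fuller version, the stand-in would be replaced by a group LASSO fit, and the top-ranked predictors would feed a second step that evaluates pairwise interactions.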
Affiliation(s)
- Rafael Izbicki
  - Department of Statistics, Universidade Federal de São Carlos, São Carlos 13565-905, Brazil
- Ticiana D. J. Farias
  - Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
  - Division of Biomedical Informatics, Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Patrícia Shigunov
  - Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
- Fabio Passetti
  - Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
2
Zheng C, Huang J, Wood IA, Wu Y. A modified expectation‐maximization algorithm for latent Gaussian graphical model. Can J Stat 2021. [DOI: 10.1002/cjs.11643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Chaowen Zheng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Jingfang Huang
  - Department of Mathematics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Ian A. Wood
  - School of Mathematics and Physics, University of Queensland, St. Lucia, Queensland, Australia
- Yichao Wu
  - Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, Chicago, Illinois, USA
3
Zheng C, Ferrari D, Zhang M, Baird P. Ranking the importance of genetic factors by variable‐selection confidence sets. J R Stat Soc Ser C Appl Stat 2019. [DOI: 10.1111/rssc.12337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/01/2022]
Affiliation(s)
- Davide Ferrari
  - University of Bozen–Bolzano, Bolzano, Italy
  - University of Melbourne, Melbourne, Australia
- Michael Zhang
  - University of Melbourne, Melbourne, Australia
  - Royal Victorian Eye and Ear Hospital, Melbourne, Australia
- Paul Baird
  - University of Melbourne, Melbourne, Australia
  - Royal Victorian Eye and Ear Hospital, Melbourne, Australia
4
Huang Z, Ferrari D, Qian G. Parsimonious and powerful composite likelihood testing for group difference and genotype–phenotype association. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2016.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
5
Ferrari D, Qian G, Hunter T. Parsimonious and Efficient Likelihood Composition by Gibbs Sampling. J Comput Graph Stat 2016. [DOI: 10.1080/10618600.2015.1058799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
6
Fan J, Liu H, Ning Y, Zou H. High dimensional semiparametric latent graphical model for mixed data. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12168] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/20/2023]
Affiliation(s)
- Hui Zou
  - University of Minnesota, Minneapolis, USA
7
Chen Y, Hong C, Ning Y, Su X. Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach. Stat Med 2015; 35:21-40. [PMID: 26303591 DOI: 10.1002/sim.6620] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/21/2013] [Revised: 05/01/2015] [Accepted: 07/23/2015] [Indexed: 11/11/2022]
Abstract
When conducting a meta-analysis of studies with bivariate binary outcomes, challenges arise when the within-study correlation and between-study heterogeneity must be taken into account. In this paper, we propose a marginal beta-binomial model for the meta-analysis of studies with binary outcomes. This model is based on the composite likelihood approach and has several attractive features compared with existing models such as the bivariate generalized linear mixed model (Chu and Cole, 2006) and the Sarmanov beta-binomial model (Chen et al., 2012). The advantages of the proposed marginal model include modeling the probabilities in the original scale, not requiring any transformation of probabilities or any link function, having a closed-form expression for the likelihood function, and imposing no constraints on the correlation parameter. More importantly, because the marginal beta-binomial model is only based on the marginal distributions, it does not suffer from potential misspecification of the joint distribution of bivariate study-specific probabilities. Such misspecification is difficult to detect and can lead to biased inference using current methods. We compare the performance of the marginal beta-binomial model with the bivariate generalized linear mixed model and the Sarmanov beta-binomial model by simulation studies. Interestingly, the results show that the marginal beta-binomial model performs better than the Sarmanov beta-binomial model, whether or not the true model is Sarmanov beta-binomial, and the marginal beta-binomial model is more robust than the bivariate generalized linear mixed model under model misspecifications. Two meta-analyses of diagnostic accuracy studies and a meta-analysis of case-control studies are conducted for illustration.
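The closed-form likelihood mentioned in this abstract can be illustrated with the standard beta-binomial probability mass function. The sketch below assumes a plain shape parameterization (`a`, `b`) per outcome and forms a composite log-likelihood by summing the two marginal log-likelihoods across studies. The function names and the study counts are hypothetical, and the paper's own parameterization and estimation procedure may differ.

```python
from math import exp, lgamma


def log_beta(a, b):
    """log B(a, b) computed through log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def betabinom_logpmf(k, n, a, b):
    """log P(Y = k) for Y ~ BetaBinomial(n, a, b); closed form, no link function."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def composite_loglik(studies, params):
    """Sum of the two marginal beta-binomial log-likelihoods over studies.

    `studies` holds tuples (y1, n1, y2, n2); `params` is ((a1, b1), (a2, b2)).
    Only the margins enter, so no joint distribution is specified.
    """
    (a1, b1), (a2, b2) = params
    total = 0.0
    for y1, n1, y2, n2 in studies:
        total += betabinom_logpmf(y1, n1, a1, b1)
        total += betabinom_logpmf(y2, n2, a2, b2)
    return total


# Invented counts, e.g. (true positives, diseased, true negatives, healthy).
studies = [(18, 20, 45, 50), (30, 40, 70, 80), (9, 12, 28, 30)]
loglik = composite_loglik(studies, ((8.0, 2.0), (9.0, 1.0)))
pmf_total = sum(exp(betabinom_logpmf(k, 10, 2.0, 3.0)) for k in range(11))
```

In practice the shape parameters would be estimated by maximizing `composite_loglik` with a numerical optimizer, with standard errors adjusted for the fact that a composite likelihood is not a full likelihood.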
Affiliation(s)
- Yong Chen
  - Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, 19104, Pennsylvania, U.S.A.
- Chuan Hong
  - Division of Biostatistics, University of Texas School of Public Health, 1200 Pressler St, Houston, 77030, Texas, U.S.A.
- Yang Ning
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, 08544, New Jersey, U.S.A.
- Xiao Su
  - Division of Biostatistics, University of Texas School of Public Health, 1200 Pressler St, Houston, 77030, Texas, U.S.A.
8
Stingo FC, Swartz MD, Vannucci M. A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data. Statistics and Its Interface 2015; 8:137-151. [PMID: 28989562 PMCID: PMC5630184 DOI: 10.4310/sii.2015.v8.n2.a2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the definition of gene-level scores is to weigh the SNPs within the gene according to their rarity, based on genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores giving more importance to the unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
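The idea of weighting SNPs by rarity under Hardy-Weinberg equilibrium can be illustrated as follows. The expected genotype frequencies (p², 2pq, q²) are standard. The scoring rule, which sums the negative log of each observed genotype's expected frequency, is only one possible reading chosen for illustration and is not the scoring formula from the paper. The function names and example frequencies are hypothetical.

```python
from math import log


def hwe_genotype_freqs(q):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    `q` is the minor allele frequency (0 < q < 1). Returns the frequencies
    of carrying 0, 1, or 2 copies of the minor allele.
    """
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)


def gene_score(genotypes, minor_allele_freqs):
    """Illustrative gene-level score for one subject (an assumption).

    Each SNP contributes -log(expected frequency of the observed genotype),
    so genotypes that are rarer under HWE receive more weight.
    """
    score = 0.0
    for g, q in zip(genotypes, minor_allele_freqs):
        score += -log(hwe_genotype_freqs(q)[g])
    return score


# Invented example: three SNPs in one gene with minor allele frequencies.
mafs = [0.10, 0.30, 0.05]
common_carrier = gene_score([0, 0, 0], mafs)  # most common genotype at each SNP
rare_carrier = gene_score([2, 1, 1], mafs)    # rarer genotypes at each SNP
```

Under this rule a subject carrying rare genotypes receives a larger score than a subject carrying the common genotype at every SNP, which mirrors the abstract's emphasis on unusually low frequencies.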
Affiliation(s)
- Francesco C Stingo
  - Department of Biostatistics, MD Anderson Cancer Center, 1400 Pressler St., Houston, TX 77030, USA
- Michael D Swartz
  - Department of Biostatistics, UT School of Public Health, 1200 Pressler St., Houston, TX 77030, USA
- Marina Vannucci
  - Department of Statistics, MS 138, Rice University, 6100 Main St., Houston, TX 77251-1892, USA