1
Kwak J, Shin D. Gene-Nutrient Interactions in Obesity: COBLL1 Genetic Variants Interact with Dietary Fat Intake to Modulate the Incidence of Obesity. Int J Mol Sci 2023; 24:ijms24043758. [PMID: 36835164 PMCID: PMC9959357 DOI: 10.3390/ijms24043758] [Received: 12/16/2022] [Revised: 01/26/2023] [Accepted: 02/02/2023]
Abstract
The COBLL1 gene is associated with leptin, a hormone important for appetite and weight maintenance. Dietary fat is a significant factor in obesity. This study aimed to determine the association between the COBLL1 gene, dietary fat, and the incidence of obesity. Data from the Korean Genome and Epidemiology Study were used, and 3055 Korean adults aged ≥ 40 years were included. Obesity was defined as a body mass index ≥ 25 kg/m². Participants with obesity at baseline were excluded. The effects of COBLL1 rs6717858 genotypes and dietary fat on the incidence of obesity were evaluated using multivariable Cox proportional hazards models. During a mean follow-up of 9.2 years, 627 obesity cases were documented. In men, the hazard ratio (HR) for obesity was higher among CT and CC carriers (minor allele carriers) in the highest tertile of dietary fat intake than among TT carriers in the lowest tertile (Model 1: HR: 1.66, 95% confidence interval [CI]: 1.07-2.58; Model 2: HR: 1.63, 95% CI: 1.04-2.56). In women, the HR for obesity was higher among TT carriers in the highest tertile of dietary fat intake than among TT carriers in the lowest tertile (Model 1: HR: 1.49, 95% CI: 1.08-2.06; Model 2: HR: 1.53, 95% CI: 1.10-2.13). COBLL1 genetic variants and dietary fat intake had different, sex-dependent effects on obesity. These results imply that a low-fat diet may protect against the effects of COBLL1 genetic variants on future obesity risk.
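The joint-exposure comparison described above (rs6717858 carrier group crossed with dietary-fat tertile, with TT carriers in the lowest tertile as the reference) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it computes crude incidence-rate ratios from person-years rather than the multivariable Cox hazard ratios reported in the study (no Model 1 or Model 2 covariates), it derives tertile cut points from the sample with a simple rank rule, and the helper names (`tertile_cutoffs`, `joint_group`, `crude_rate_ratios`) and the participant records are hypothetical.

```python
from collections import defaultdict


def tertile_cutoffs(values):
    """Return two cut points splitting `values` into roughly equal thirds (rank-based rule)."""
    s = sorted(values)
    n = len(s)
    return s[n // 3], s[(2 * n) // 3]


def tertile(value, cuts):
    """Map a value to tertile 1, 2 or 3 given the (lower, upper) cut points."""
    lower, upper = cuts
    if value < lower:
        return 1
    if value < upper:
        return 2
    return 3


def joint_group(genotype, fat_tertile):
    """Cross carrier status with the fat-intake tertile, e.g. 'CT/CC-T3'."""
    carrier = "TT" if genotype == "TT" else "CT/CC"  # minor-allele (C) carriers pooled
    return f"{carrier}-T{fat_tertile}"


def crude_rate_ratios(records, reference="TT-T1"):
    """records: (genotype, fat_intake, years_followed, became_obese) per participant.

    Returns crude incidence-rate ratios of each joint group versus the reference group.
    No covariate adjustment; assumes the reference group has at least one event.
    """
    records = list(records)
    cuts = tertile_cutoffs([fat for _, fat, _, _ in records])
    events = defaultdict(int)
    person_years = defaultdict(float)
    for genotype, fat, years, obese in records:
        group = joint_group(genotype, tertile(fat, cuts))
        events[group] += int(obese)
        person_years[group] += years
    rates = {g: events[g] / person_years[g] for g in person_years}
    return {g: rate / rates[reference] for g, rate in rates.items()}


# Hypothetical participants (not study data): genotype, fat intake, years, incident obesity
example = [
    ("TT", 10, 10.0, True), ("TT", 20, 10.0, False),
    ("CT", 30, 10.0, True), ("TT", 40, 10.0, False),
    ("CC", 50, 5.0, True), ("CT", 60, 5.0, True),
]

if __name__ == "__main__":
    print(crude_rate_ratios(example))
```

With these invented records, CT/CC carriers in the top tertile have a crude rate four times that of the TT, lowest-tertile reference. A real analysis would instead fit a Cox model, separately for men and women as in the study, with the joint categories entered as indicator variables.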
2
Association of Single Nucleotide Polymorphisms in KCNA10 and SLC13A3 Genes with the Susceptibility to Chronic Kidney Disease of Unknown Etiology in Central Indian Patients. Biochem Genet 2023:10.1007/s10528-023-10335-7. [PMID: 36696070 DOI: 10.1007/s10528-023-10335-7] [Received: 12/06/2022] [Accepted: 01/09/2023]
Abstract
The global rise in the prevalence of endemic chronic kidney disease of unknown etiology (CKDu) poses a major health problem. The prevalence of CKDu is also rising in the Indian population. Besides environmental factors, genetic factors play an important role in predisposition to CKDu. In the present study, we analyzed the association of single nucleotide polymorphisms (SNPs) in three genes with susceptibility to CKDu. This was a case-control study of 180 adult subjects (CKD = 60, CKDu = 60, healthy = 60) from central India. We used the KASP genotyping assay to determine genotype and allele frequencies. We used the odds ratio (OR) to assess the association of individual SNPs (rs34970857 of KCNA10, rs6066043 of SLC13A3, and rs2910164 of miR-146a) with CKDu and CKD susceptibility. For rs34970857 of the KCNA10 gene, we noted a significantly increased OR for CKDu versus healthy controls (dominant model, CT + CC versus TT: OR = 3.96, p = 0.004). For rs6066043 of the SLC13A3 gene, we observed a significantly increased OR for CKDu versus healthy controls under the recessive model (GG versus AA + GA: OR = 2.41, p = 0.03) and the homozygous model (GG versus AA: OR = 3.54, p = 0.04). The CC genotype of rs34970857 in the KCNA10 gene and the GG genotype of rs6066043 in the SLC13A3 gene are significantly associated with susceptibility to CKDu.
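The three genetic models above differ only in how genotypes are collapsed into a 2x2 case-control table before the odds ratio is computed. The sketch below assumes counts are supplied as reference homozygote, heterozygote, and alternative homozygote for each group. The genotype counts are invented for illustration (they are not the study's data), the Woolf log-OR confidence interval is one common choice rather than necessarily the authors' method, and the function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = exposed/unexposed cases, c/d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the OR using Woolf's standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def genetic_model_tables(cases, controls):
    """Build (a, b, c, d) tables for three genetic models.

    cases/controls: genotype counts keyed 'ref_hom', 'het', 'alt_hom'
    (e.g. TT / CT / CC for rs34970857, or AA / GA / GG for rs6066043).
    """
    x, y = cases, controls
    return {
        # dominant: any copy of the alternative allele vs reference homozygotes
        "dominant": (x["het"] + x["alt_hom"], x["ref_hom"],
                     y["het"] + y["alt_hom"], y["ref_hom"]),
        # recessive: alternative homozygotes vs all other genotypes
        "recessive": (x["alt_hom"], x["ref_hom"] + x["het"],
                      y["alt_hom"], y["ref_hom"] + y["het"]),
        # homozygous: alternative vs reference homozygotes; heterozygotes excluded
        "homozygous": (x["alt_hom"], x["ref_hom"], y["alt_hom"], y["ref_hom"]),
    }


# Invented counts for illustration only (60 cases, 60 controls)
cases = {"ref_hom": 20, "het": 25, "alt_hom": 15}
controls = {"ref_hom": 30, "het": 22, "alt_hom": 8}

if __name__ == "__main__":
    for model, table in genetic_model_tables(cases, controls).items():
        low, high = woolf_ci(*table)
        print(f"{model}: OR = {odds_ratio(*table):.2f} (95% CI {low:.2f}-{high:.2f})")
```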
3
Nguyen P, Ohnmacht AJ, Galhoz A, Büttner M, Theis F, Menden MP. Artificial intelligence and machine learning in diabetes research [in German: Künstliche Intelligenz und maschinelles Lernen in der Diabetesforschung]. Diabetologe 2021. [DOI: 10.1007/s11428-021-00817-w]
4
Mieth B, Rozier A, Rodriguez JA, Höhne MMC, Görnitz N, Müller KR. DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies. NAR Genom Bioinform 2021; 3:lqab065. [PMID: 34296082 PMCID: PMC8291080 DOI: 10.1093/nargab/lqab065] [Received: 11/24/2020] [Revised: 05/27/2021] [Accepted: 07/08/2021]
Abstract
Deep learning has revolutionized data science in many fields by greatly improving prediction performance compared with conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning models through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing each position in the genome individually, the novel three-step algorithm, called DeepCOMBI, first trains a neural network to classify subjects into their respective phenotypes. Second, it explains the classifier's decisions by applying layer-wise relevance propagation, one of many available explanation techniques. Third, the resulting importance scores are used to select a subset of the most relevant locations for multiple hypothesis testing. The power and precision of DeepCOMBI are investigated on synthetic datasets and on data from a 2007 study; findings in the latter are verified against independent studies published up to 2020. DeepCOMBI is shown to outperform ordinary raw P-value thresholding and other baseline methods. Two novel disease associations (rs10889923 for hypertension, rs4769283 for type 1 diabetes) were identified.
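The select-then-test idea behind the third step can be illustrated with a small sketch. It assumes relevance scores and per-SNP p-values are already available (training the network and running layer-wise relevance propagation are out of scope), and it uses a plain Bonferroni threshold over the k selected SNPs as a stand-in; DeepCOMBI's own threshold calibration may differ. The function names and inputs are hypothetical.

```python
def select_then_test(snp_ids, relevance, p_values, k, alpha=0.05):
    """Keep the k SNPs with the highest relevance scores, then test only those.

    relevance: per-SNP importance scores, assumed precomputed by an explanation
               method (for example, layer-wise relevance propagation).
    p_values:  per-SNP association p-values, assumed precomputed.
    Uses a Bonferroni threshold alpha / k as a simple stand-in correction.
    """
    if not (len(snp_ids) == len(relevance) == len(p_values)):
        raise ValueError("inputs must have the same length")
    ranked = sorted(range(len(snp_ids)), key=lambda i: relevance[i], reverse=True)
    threshold = alpha / k
    return [snp_ids[i] for i in ranked[:k] if p_values[i] <= threshold]


def bonferroni_all(snp_ids, p_values, alpha=0.05):
    """Baseline: raw p-value thresholding with Bonferroni over every SNP."""
    threshold = alpha / len(snp_ids)
    return [s for s, p in zip(snp_ids, p_values) if p <= threshold]


# Hypothetical inputs for five SNPs
ids = ["snp1", "snp2", "snp3", "snp4", "snp5"]
relevance = [0.9, 0.1, 0.7, 0.05, 0.3]
p_values = [1e-4, 1e-6, 0.02, 1e-5, 0.5]

if __name__ == "__main__":
    print("two-stage:", select_then_test(ids, relevance, p_values, k=2))
    print("baseline: ", bonferroni_all(ids, p_values))
```

On these inputs the two-stage procedure reports `snp1` and `snp3`, while Bonferroni over all five SNPs reports `snp1`, `snp2`, and `snp4`. Testing fewer SNPs relaxes the threshold enough for `snp3`, but SNPs with low relevance are never tested, so the results depend heavily on the quality of the relevance scores.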
Affiliation(s)
- Bettina Mieth: Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
- Alexandre Rozier: Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
- Juan Antonio Rodriguez: CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain
- Marina M C Höhne: Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
- Klaus-Robert Müller: Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
5
Fryett JJ, Morris AP, Cordell HJ. Investigation of prediction accuracy and the impact of sample size, ancestry, and tissue in transcriptome-wide association studies. Genet Epidemiol 2020; 44:425-441. [PMID: 32190932 PMCID: PMC8641384 DOI: 10.1002/gepi.22290] [Received: 10/08/2019] [Revised: 02/05/2020] [Accepted: 03/06/2020]
Abstract
In transcriptome-wide association studies (TWAS), gene expression values are predicted from genotype data and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accuracy of six methods (LASSO, Ridge regression, Elastic net, Best Linear Unbiased Predictor, Bayesian Sparse Linear Mixed Model, and Random Forests) by performing cross-validation using data from the Geuvadis Project. We also examine prediction accuracy (a) at different sample sizes, (b) when the ancestry of the training and testing populations differs, and (c) when the tissue used to train the model differs from the tissue to be predicted. We find that, for most genes, expression cannot be predicted accurately, but in general sparse statistical models tend to outperform polygenic models at prediction. Average prediction accuracy is reduced when the training set is smaller or when predicting across ancestries, and is marginally reduced when predicting across tissues. We conclude that using sparse statistical models and developing large reference panels across multiple ethnicities and tissues will lead to better prediction of gene expression, and thus may improve TWAS power.
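The two TWAS stages above (predict expression from genotypes, then relate the prediction to a phenotype) can be sketched as follows. Assumptions: the SNP weights for a gene have already been trained by one of the compared methods, prediction accuracy is measured as the squared Pearson correlation between predicted and observed expression in held-out individuals (a common metric, assumed here), and all SNP identifiers, weights, and values are hypothetical.

```python
import math


def predict_expression(dosages, weights):
    """Predicted expression: sum of weight * allele dosage (0, 1, 2) over weighted SNPs.

    weights: per-SNP effect sizes, assumed already trained (e.g. by a sparse model).
    SNPs missing from `dosages` contribute nothing.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prediction_r2(predicted, observed):
    """Prediction accuracy as the squared correlation in held-out individuals."""
    return pearson_r(predicted, observed) ** 2


# Hypothetical sparse weights for one gene (most SNPs would have weight 0)
weights = {"rsA": 0.5, "rsB": -0.2}

if __name__ == "__main__":
    people = [{"rsA": 0, "rsB": 2}, {"rsA": 1, "rsB": 1}, {"rsA": 2, "rsB": 0}]
    predicted = [predict_expression(d, weights) for d in people]
    observed = [-0.3, 0.4, 1.1]   # invented held-out expression values
    phenotype = [1.2, 1.9, 2.4]   # invented phenotype values
    print("accuracy r^2:", prediction_r2(predicted, observed))
    print("association r:", pearson_r(predicted, phenotype))
```

In practice the association stage is usually a regression with a proper test statistic rather than a bare correlation; the correlation here only shows where predicted expression enters the analysis.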
Affiliation(s)
- James J. Fryett: Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Andrew P. Morris: Division of Musculoskeletal and Dermatological Sciences, University of Manchester, Manchester, UK
- Heather J. Cordell: Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
6
Park C, Jiang N, Park T. Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes. Genomics Inform 2019; 17:e47. [PMID: 31896247 PMCID: PMC6944048 DOI: 10.5808/gi.2019.17.4.e47] [Received: 11/21/2019] [Accepted: 12/09/2019]
Abstract
The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Because prediction models include some heritable traits, such as body mass index, the contribution of SNPs may be underestimated when unmatched case-control samples are used. In this article, we propose a method that uses propensity score matching to match case and control samples, thereby avoiding this underestimation and determining the pure additive contribution of SNPs. To illustrate the proposed method, we used SNP data from the Korea Association Resources project and SNPs reported in the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), the least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we built prediction models with SLR, LASSO, and EN as logistic regression modeling techniques. Prediction accuracy was compared in terms of the area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated as the difference in AUC between models using only demographic variables and models that also included the SNPs. The largest difference among our models showed that adding genetic variants to demographic variables could increase the AUC by 0.107 relative to the corresponding demographic-only model.
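The matching and evaluation steps can be sketched as below. The sketch assumes propensity scores have already been estimated (for example, by logistic regression of case status on demographic covariates such as body mass index) and uses greedy 1:1 nearest-neighbor matching without replacement within a caliper, which is one common variant and not necessarily the authors' exact procedure. AUC is computed as the Mann-Whitney probability that a randomly chosen case outranks a randomly chosen control. Function names, scores, and predictions are hypothetical.

```python
def greedy_match(case_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement on propensity scores.

    case_scores / control_scores: dicts mapping sample id -> propensity score.
    Returns (case_id, control_id) pairs whose score difference is within the caliper.
    """
    available = dict(control_scores)
    pairs = []
    for case_id, score in sorted(case_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        ctrl_id = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[ctrl_id] - score) <= caliper:
            pairs.append((case_id, ctrl_id))
            del available[ctrl_id]  # each control is used at most once
    return pairs


def auc(case_preds, control_preds):
    """AUC as P(case prediction > control prediction), counting ties as 1/2."""
    wins = 0.0
    for p in case_preds:
        for q in control_preds:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(case_preds) * len(control_preds))


if __name__ == "__main__":
    cases = {"c1": 0.30, "c2": 0.60, "c3": 0.90}
    controls = {"k1": 0.31, "k2": 0.58, "k3": 0.10, "k4": 0.65}
    print(greedy_match(cases, controls))
    # Invented predictions on matched samples: demographic-only vs demographic + SNPs
    delta = auc([0.9, 0.8], [0.1, 0.85]) - auc([0.6, 0.4], [0.5, 0.7])
    print("AUC gain from SNPs:", delta)
```

The SNP contribution is then the AUC of the demographic-plus-SNP model minus the AUC of the demographic-only model, evaluated for example on the matched sample.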
Affiliation(s)
- Chanwoo Park: Department of Statistics, Seoul National University, Seoul 08826, Korea
- Nan Jiang: Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park (corresponding author): Department of Statistics, Seoul National University, Seoul 08826, Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
7
Chattopadhyay A, Lu TP. Gene-gene interaction: the curse of dimensionality. Ann Transl Med 2019; 7:813. [PMID: 32042829 DOI: 10.21037/atm.2019.12.87]
Abstract
Genetic variants identified by genome-wide association studies frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for part of this "missingness" is to evaluate gene-gene interactions (epistasis), thereby elucidating their effects on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such exponentially growing numbers of SNPs diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that reduces multi-dimensional genotypes to a one-dimensional binary (high-risk versus low-risk) variable, led to a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have been applied extensively in recent years to tackle the dimensionality issue in whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still risk missing relevant SNPs, and limited interpretability is a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially exploit distributed computing resources in the cloud and bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource in fighting this "curse". PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large datasets.
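The core MDR step mentioned above, and the scale of an exhaustive pairwise scan, can be illustrated with a short sketch. It covers only the pooling of two-locus genotype cells into high- and low-risk groups, using the overall case:control ratio as the threshold; the cross-validation and model-selection stages of full MDR are omitted, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def mdr_high_risk_cells(geno_a, geno_b, status):
    """Label each two-locus genotype cell high risk if its case:control ratio
    exceeds the overall case:control ratio (a core MDR step).

    geno_a, geno_b: per-subject genotype codes (0, 1, 2) at two SNPs.
    status: per-subject 1 for case, 0 for control (both groups must be non-empty).
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    high = set()
    for cell in set(cases) | set(controls):
        # cells seen only in cases are high risk; otherwise compare ratios
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold:
            high.add(cell)
    return high


def mdr_attribute(geno_a, geno_b, high_cells):
    """Collapse the two-locus genotype into one binary attribute (1 = high risk)."""
    return [int((a, b) in high_cells) for a, b in zip(geno_a, geno_b)]


def n_pairwise_tests(n_snps):
    """Number of SNP pairs an exhaustive two-way scan must evaluate."""
    return comb(n_snps, 2)


if __name__ == "__main__":
    a = [0, 0, 1, 1, 2, 2]
    b = [0, 0, 1, 1, 0, 0]
    y = [1, 1, 0, 0, 1, 0]
    high = mdr_high_risk_cells(a, b, y)
    print(high, mdr_attribute(a, b, high))
    print(n_pairwise_tests(1_000_000))
```

`n_pairwise_tests(1_000_000)` returns 499,999,500,000, which illustrates why exhaustive two-way scans over genome-wide SNP panels motivate distributed computing.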
Affiliation(s)
- Amrita Chattopadhyay: Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei
- Tzu-Pin Lu: Institute of Epidemiology and Preventive Medicine, Department of Public Health, National Taiwan University, Taipei
8
An H, Wei CS, Wang O, Wang DH, Xu LW, Lu Q, Ye CY. An ensemble-based likelihood ratio approach for family-based genomic risk prediction. J Zhejiang Univ Sci B 2018; 19:935-947. [PMID: 30507077 PMCID: PMC6305257 DOI: 10.1631/jzus.b1800162] [Received: 03/14/2018] [Revised: 07/12/2018] [Accepted: 07/12/2018]
Abstract
OBJECTIVE As one of the most popular designs in genetic research, the family-based design is well recognized for its advantages, such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is great interest in studying the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction with improved prediction accuracy compared with existing methods based on family history. METHODS In this study, we propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building. RESULTS In simulations, Fam-ELR is robust across various underlying disease models and pedigree structures, and performs better than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and interactions into the model for improved accuracy, especially at the genome-wide level. CONCLUSIONS Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR can incorporate genetic variants with small or moderate marginal effects, and their interactions, into an improved risk prediction model. It is therefore a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases with unknown or poorly understood etiology.
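To make the likelihood-ratio idea concrete, the sketch below computes per-genotype likelihood ratios P(genotype | case) / P(genotype | control), sums their logarithms across markers (treating markers as independent), and averages the score over bootstrap resamples as a simple ensemble. This is a generic illustration under stated assumptions, not Fam-ELR itself: it ignores family structure and the clustered ROC criterion, replaces the tree-assembling procedure with plain bootstrap averaging, and its function names, pseudo-count, and data are hypothetical.

```python
import math
import random
from collections import Counter


def genotype_lr_table(genos, status, pseudo=0.5):
    """Per-genotype likelihood ratio P(g | case) / P(g | control) for one marker.

    A small pseudo-count avoids division by zero for genotypes unseen in one group.
    """
    levels = (0, 1, 2)
    case_counts = Counter(g for g, y in zip(genos, status) if y == 1)
    ctrl_counts = Counter(g for g, y in zip(genos, status) if y == 0)
    n_case = sum(case_counts.values()) + pseudo * len(levels)
    n_ctrl = sum(ctrl_counts.values()) + pseudo * len(levels)
    return {
        g: ((case_counts[g] + pseudo) / n_case) / ((ctrl_counts[g] + pseudo) / n_ctrl)
        for g in levels
    }


def fit_lr_model(X, status):
    """X: one list of genotype codes (0/1/2) per subject, one code per marker."""
    n_markers = len(X[0])
    return [genotype_lr_table([row[j] for row in X], status) for j in range(n_markers)]


def score(model, row):
    """Sum of log likelihood ratios across markers (independence assumed)."""
    return sum(math.log(table[g]) for table, g in zip(model, row))


def ensemble_score(X, status, row, n_boot=25, seed=0):
    """Average the log-LR score over models fit on bootstrap resamples of subjects."""
    rng = random.Random(seed)
    n = len(X)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_lr_model([X[i] for i in idx], [status[i] for i in idx])
        total += score(model, row)
    return total / n_boot


if __name__ == "__main__":
    X = [[2], [2], [2], [1], [0], [0], [0], [1]]   # one hypothetical marker
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    print(ensemble_score(X, y, [2]), ensemble_score(X, y, [0]))
```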
Affiliation(s)
- Hui An: Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Chang-shuai Wei: Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
- Da-hui Wang: Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Liang-wen Xu: Department of Preventive Medicine, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Qing Lu: Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI 48824, USA
- Cheng-yin Ye: Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
9
Lee J, Lee Y, Park B, Won S, Han JS, Heo NJ. Genome-wide association analysis identifies multiple loci associated with kidney disease-related traits in Korean populations. PLoS One 2018; 13:e0194044. [PMID: 29558500 PMCID: PMC5860731 DOI: 10.1371/journal.pone.0194044] [Received: 10/26/2017] [Accepted: 02/25/2018]
Abstract
Chronic kidney disease (CKD) is an important social health problem characterized by a decrease in the kidney glomerular filtration rate (GFR). In this study, we performed genome-wide association analyses of kidney disease-related traits using data from a Korean adult health screening cohort comprising 7,064 participants. The traits analyzed were blood urea nitrogen (BUN), serum creatinine, estimated GFR, and uric acid levels. We detected 2 genetic loci (SLC14A2 and an intergenic region) and 8 single nucleotide polymorphisms (SNPs) associated with BUN; 3 genetic loci (BCAS3, C17orf82, ALDH2) and 6 SNPs associated with serum creatinine; 3 genetic loci (BCAS3, C17orf82/TBX2, LRP2) and 7 SNPs associated with GFR; and 14 genetic loci (3 in ABCG2/PKD2, 2 in SLC2A9, and 3 in intergenic regions on chromosome 4; OTUB1, NRXN2/SLC22A12, CDC42BPG, RPS6KA4, SLC22A9, and MAP4K2 on chromosome 11) and 84 SNPs associated with uric acid levels. A comparison of the loci significantly associated with serum creatinine and GFR showed that rs9895661 in BCAS3 and rs757608 in C17orf82 were associated with both traits. The SNP rs11710227, in an intergenic region on chromosome 3, showed a significant association with BUN and is a new finding. Genetic variations at multiple loci are associated with kidney disease-related traits, and the associations between these traits and genetic variation differ between populations. The significance of the variants identified in this study will need to be confirmed in other populations.
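Per-SNP association testing for a quantitative trait such as BUN or serum creatinine is commonly done by regressing the trait on allele dosage under an additive model. The sketch below shows that step with ordinary least squares and no covariates. The genome-wide threshold of 5e-8 is a widely used convention assumed here, the normal approximation to the t distribution is only reasonable for large samples, and the function names and data are hypothetical; a real analysis would adjust for covariates such as age and sex and use dedicated GWAS software.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold (assumed)


def additive_snp_test(dosages, trait):
    """Regress a quantitative trait on allele dosage (0/1/2) by ordinary least squares.

    Returns (beta, standard_error, p_value). Assumes the SNP is polymorphic and the
    fit is not perfect; the two-sided p-value uses a normal approximation.
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = my - beta * mx
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p


def scan(genotypes, trait, alpha=GENOME_WIDE_ALPHA):
    """genotypes: dict snp_id -> dosage list. Returns (p, snp_id) with p < alpha, sorted by p."""
    pvals = {snp: additive_snp_test(d, trait)[2] for snp, d in genotypes.items()}
    return sorted((p, snp) for snp, p in pvals.items() if p < alpha)


if __name__ == "__main__":
    dosages = [0, 1, 2, 0, 1, 2]
    trait = [1.0, 2.1, 2.9, 1.1, 1.9, 3.1]   # invented trait values
    print(additive_snp_test(dosages, trait))
```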
Affiliation(s)
- Jeonghwan Lee: Department of Internal Medicine, Hallym University Hangang Sacred Heart Hospital, Seoul, Korea
- Young Lee: Veterans Medical Research Institute, Veterans Health Service Medical Center, Seoul, Korea
- Boram Park: Department of Public Health Science, Seoul National University, Seoul, Korea
- Sungho Won: Department of Public Health Science, Seoul National University, Seoul, Korea; Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, Korea; Institute of Health and Environment, Seoul National University, Seoul, Korea
- Jin Suk Han: Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea
- Nam Ju Heo (corresponding author): Division of Nephrology, Department of Internal Medicine, Healthcare System Gangnam Center, Seoul National University Hospital, Seoul, Korea