1
Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024; 22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND The term eGene refers to a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). Identifying eQTLs and eGenes is both theoretically and empirically important in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently transfer useful knowledge acquired from auxiliary samples to target studies.
METHODS We propose a multilocus-based eGene identification method called TLegene, which integrates shared genetic similarity information available from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers that have an explicitly relevant tissue in the GTEx project, learning the genetic effects of variants in TCGA from GTEx. We also apply TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies.
RESULTS We observed substantial genetic effect correlation of cis-variants between TCGA and GTEx for a large number of genes. Furthermore, consistent with the results of our simulations, we found that TLegene was more powerful than existing methods and identified 169 distinct candidate eGenes, many more than the approach that did not consider knowledge transfer between target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the discovered eGenes and also indicated allelic heterogeneity of gene expression. In addition, TLegene identified more eGenes in Geuvadis and revealed that these eGenes were mainly enriched in EBV-transformed lymphocytes.
CONCLUSION Overall, TLegene represents a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.
Affiliation(s)
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
2
Kang HY, Choe EK. Clinical Strategies in Gene Screening Counseling for the Healthy General Population. Korean J Fam Med 2024; 45:61-68. [PMID: 38528647 DOI: 10.4082/kjfm.23.0254] [Received: 11/17/2023] [Accepted: 11/26/2023] [Indexed: 03/27/2024]
Abstract
The burgeoning interest in precision medicine has propelled an increase in the use of genome tests for screening purposes within the healthy population. Gene screening tests aim to pre-emptively identify those individuals who may be genetically predisposed to certain diseases. However, as genetic screening becomes more commonplace, it is essential to acknowledge the unique challenges it poses. A prevalent issue in this regard is the occurrence of false-positive results, which can lead to unnecessary additional tests or treatments, and psychological distress. Additionally, the interpretation of genomic variants is based on current research evidence, and can accordingly change as new research findings emerge, potentially altering the clinical significance of these variants. Conversely, a further prominent concern is false assurance in genetic testing, as genetic tests can yield false-negative results, potentially posing a significant clinical risk. Moreover, the results obtained for the same disease can vary among different genetic testing services, due to differences in the types of variants assessed, the scope of tests, analytical methods, and the algorithms used for predicting diseases. Consequently, whereas genetic testing holds significant promise for the future of medicine, it poses unique challenges. If conducted without a full understanding of its implications, genetic testing may fail to achieve its purpose, potentially hindering effective health management. Therefore, to ensure a comprehensive understanding of the implications of genetic testing within the general population, sufficient discussion and careful consideration should be given to counseling based on gene test results.
Affiliation(s)
- Hae Yeon Kang
- Department of Internal Medicine, Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, Seoul, Korea
- Eun Kyung Choe
- Department of Surgery, Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, Seoul, Korea
3
Lee SB, Choi JE, Hong KW, Jung DH. Genetic Variants Linked to Myocardial Infarction in Individuals with Non-Alcoholic Fatty Liver Disease and Their Potential Interaction with Dietary Patterns. Nutrients 2024; 16:602. [PMID: 38474730 DOI: 10.3390/nu16050602] [Received: 01/25/2024] [Revised: 02/12/2024] [Accepted: 02/20/2024] [Indexed: 03/14/2024]
Abstract
In recent studies, non-alcoholic fatty liver disease (NAFLD) has been associated with a high risk of ischemic heart disease. This study aimed to investigate genetic variants within specific genes associated with myocardial infarction (MI) among patients with NAFLD. We included 57,205 participants from the Korean Genome and Epidemiology Study. The baseline population consisted of 45,400 individuals, with a further 11,805 identified as patients with NAFLD. Genome-wide association studies were conducted for three groups: the entire sample, the healthy population, and patients with NAFLD. We defined p-value < 1 × 10⁻⁵ as nominal significance and p-value < 5 × 10⁻² as statistical significance for the gene-by-nutrient interaction. Among the significant single-nucleotide polymorphisms (SNPs), the lead SNP of each locus was further analyzed. In this cross-sectional study, a total of 1529 participants (2.8%) had experienced MI. Multivariable logistic regression was performed to evaluate the association of 102 SNPs across nine loci. Nine SNPs (rs11891202, rs2278549, rs13146480, rs17293047, rs184257317, rs183081683, rs1887427, rs146939423, and rs76662689) demonstrated an association with MI in the group with NAFLD. Notably, the MI-associated SNP rs13146480, located within the SORCS2 gene, which is known for its role in insulin secretion in islet cells, showed the most significant association with MI (p-value = 2.55 × 10⁻⁷). Our study identifies candidate genetic polymorphisms associated with NAFLD-related MI. These findings may serve as valuable indicators for estimating MI risk and for conducting future investigations into the underlying mechanisms of NAFLD-related MI.
Affiliation(s)
- Sung-Bum Lee
- Department of Family Medicine, Soonchunhyang University Bucheon Hospital, Bucheon 22972, Republic of Korea
- Ja-Eun Choi
- R&D Division, Theragen Health Co., Ltd., Seongnam-si 13493, Republic of Korea
- Kyung-Won Hong
- R&D Division, Theragen Health Co., Ltd., Seongnam-si 13493, Republic of Korea
- Dong-Hyuk Jung
- Department of Family Medicine, Yongin Severance Hospital, Yongin-si 16995, Republic of Korea
4
Seo H, Park JH, Hwang JT, Choi HK, Park SH, Lee J. Epigenetic Profiling of Type 2 Diabetes Mellitus: An Epigenome-Wide Association Study of DNA Methylation in the Korean Genome and Epidemiology Study. Genes (Basel) 2023; 14:2207. [PMID: 38137029 PMCID: PMC10743302 DOI: 10.3390/genes14122207] [Received: 10/31/2023] [Revised: 12/08/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023]
Abstract
Diabetes is characterized by persistently high blood glucose levels and severe complications and affects millions of people worldwide. In this study, we explored the epigenetic landscape of diabetes using data from the Korean Genome and Epidemiology Study (KoGES), specifically the Ansung-Ansan (AS-AS) cohort. Using epigenome-wide association studies, we investigated DNA methylation patterns in patients with type 2 diabetes mellitus (T2DM) and those with normal glucose regulation. Differential methylation analysis revealed 106 differentially methylated probes (DMPs), with the 10 top DMPs prominently associated with TXNIP, PDK4, NBPF20, ARRDC4, UFM1, PFKFB2, C7orf50, and ABCG1, indicating significant changes in methylation. Correlation analysis highlighted the association between the leading DMPs (e.g., cg19693031 and cg26974062 for TXNIP and cg26823705 for NBPF20) and key glycemic markers (fasting plasma glucose and hemoglobin A1c), confirming their relevance in T2DM. Moreover, we identified 62 significantly differentially methylated regions (DMRs) spanning 61 genes. A DMR associated with PDE1C showed hypermethylation, whereas DMRs associated with DIP2C, FLJ90757, PRSS50, and TDRD9 showed hypomethylation. PDE1C and TDRD9 showed a strong positive correlation between the CpG sites included in each DMR, which have previously been implicated in T2DM-related processes. This study contributes to the understanding of epigenetic modifications in T2DM. These valuable insights can be utilized in identifying potential biomarkers and therapeutic targets for effective management and prevention of diabetes.
Affiliation(s)
- Jangho Lee
- Korea Food Research Institute, Wanju-gun 55365, Jeollabuk-do, Republic of Korea; (H.S.); (J.-H.P.); (J.-T.H.); (H.-K.C.); (S.-H.P.)
5
Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023]
Abstract
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations can be reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis is less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
6
Shankar R, Dwivedi AK, Singh V, Jain M. Genome-wide discovery of genetic variations between rice cultivars with contrasting drought stress response and their potential functional relevance. Physiologia Plantarum 2023; 175:e13879. [PMID: 36805564 DOI: 10.1111/ppl.13879] [Received: 09/20/2021] [Revised: 02/14/2023] [Accepted: 02/15/2023] [Indexed: 06/18/2023]
Abstract
Drought stress is a serious threat to rice productivity. Investigating genetic variations between drought-tolerant (DT) and drought-sensitive (DS) rice cultivars may decipher the candidate genes/regulatory regions involved in drought stress tolerance/response. In this study, whole-genome resequencing data of four DS and five DT rice cultivars were analyzed. We identified a total of approximately 4.8 million single nucleotide polymorphisms (SNPs) and 0.54 million insertions/deletions (InDels). The genetic variations (162,638 SNPs and 17,217 InDels) differentiating DS and DT rice cultivars were found to be unevenly distributed throughout the rice genome; however, they were more frequent near the transcription start and stop sites than in the genic regions. The cis-regulatory motifs representing the binding sites of stress-related transcription factors (MYB, HB, bZIP, ERF, ARR, and AREB) harboring the SNPs/InDels in the promoter regions of a few differentially expressed genes (DEGs) were identified. Importantly, many of these DEGs were located within the drought-associated quantitative trait loci. Overall, this study provides a valuable large-scale genotyping resource and facilitates the discovery of candidate genes associated with drought stress tolerance in rice.
Affiliation(s)
- Rama Shankar
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Anuj Kumar Dwivedi
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Vikram Singh
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Mukesh Jain
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
7
CoNet: Efficient Network Regression for Survival Analysis in Transcriptome-Wide Association Studies—With Applications to Studies of Breast Cancer. Genes (Basel) 2023; 14:genes14030586. [PMID: 36980857 PMCID: PMC10048118 DOI: 10.3390/genes14030586] [Received: 12/24/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
Transcriptome-wide association studies (TWASs) aim to detect associations between genetically predicted gene expression and complex diseases or traits by integrating genome-wide association studies (GWASs) and expression quantitative trait loci (eQTL) mapping studies. Most current TWAS methods analyze one gene at a time, ignoring the correlations between multiple genes. Few of the existing TWAS methods focus on survival outcomes. Here, we propose a novel method, namely a COx proportional hazards model for NEtwork regression in TWAS (CoNet), that is applicable for identifying the association between one given network and the survival time. Within a two-stage TWAS framework, CoNet treats the general relationships among the predicted gene expression levels as edges of the network and quantifies them through pointwise mutual information (PMI). Extensive simulation studies illustrate that CoNet not only controls the type I error when testing both the node effect and the edge effect, but also gains more power than currently available methods. In addition, it demonstrates superior performance in a real data application using the breast cancer survival data of the UK Biobank. CoNet effectively accounts for network structure and can simultaneously identify the nodes and edges that are potentially related to survival outcomes in TWAS.
8
Muneeb M, Feng S, Henschel A. Transfer learning for genotype-phenotype prediction using deep learning models. BMC Bioinformatics 2022; 23:511. [PMID: 36447153 PMCID: PMC9710151 DOI: 10.1186/s12859-022-05036-8] [Received: 06/30/2022] [Accepted: 11/05/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND For some understudied populations, the genotype data available for genotype-phenotype prediction is minimal. However, we can use the data of other, larger populations to learn about the disease-causing SNPs and use that knowledge for the genotype-phenotype prediction of small populations. This manuscript illustrates that transfer learning is applicable to genotype data and genotype-phenotype prediction.
RESULTS Using HAPGEN2 and PhenotypeSimulator, we generated eight phenotypes for 500 cases/500 controls (CEU, large population) and 100 cases/100 controls (YRI, small population). We considered 5 (4 phenotypes) and 10 (4 phenotypes) different risk SNPs for each phenotype to evaluate the proposed method. The accuracy improvement with transfer learning across the eight phenotypes ranged from 2 to 14.2 percent. The two-tailed p-value between the classification accuracies for all phenotypes without transfer learning and with transfer learning was 0.0306 for the five-risk-SNP phenotypes and 0.0478 for the ten-risk-SNP phenotypes.
CONCLUSION The proposed pipeline transfers knowledge for the case/control classification of the small population. In addition, we argue that this method can also be used in the realm of endangered species and personalized medicine. If the large population data is extensive compared to the small population data, transfer learning results can be expected to improve significantly. We show that transfer learning can create powerful models for genotype-phenotype prediction in large, well-studied populations and fine-tune these models for populations where data is sparse.
Affiliation(s)
- Muhammad Muneeb
- Department of Electrical Engineering and Computer Science, Khalifa University of Science and Technology, Al Saada St - Zone 1, Abu Dhabi, United Arab Emirates (GRID: grid.440568.b; ISNI: 0000 0004 1762 9729)
- Samuel Feng
- Department of Science and Engineering, Sorbonne University Abu Dhabi, PO Box 38044, Abu Dhabi, United Arab Emirates (GRID: grid.449223.a; ISNI: 0000 0004 1754 9534)
- Andreas Henschel
- Department of Electrical Engineering and Computer Science, Khalifa University of Science and Technology, Al Saada St - Zone 1, Abu Dhabi, United Arab Emirates (GRID: grid.440568.b; ISNI: 0000 0004 1762 9729)
9
Qiao J, Shao Z, Wu Y, Zeng P, Wang T. Detecting associated genes for complex traits shared across East Asian and European populations under the framework of composite null hypothesis testing. J Transl Med 2022; 20:424. [PMID: 36138484 PMCID: PMC9503281 DOI: 10.1186/s12967-022-03637-8] [Received: 05/08/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022]
Abstract
Background Detecting trans-ethnic common associated genetic loci can offer important insights into shared genetic components underlying complex diseases/traits across diverse continental populations. However, effective statistical methods for such a goal are currently lacking.
Methods By leveraging summary statistics available from global-scale genome-wide association studies, we herein proposed a novel genetic overlap detection method called CONTO (COmposite Null hypothesis test for Trans-ethnic genetic Overlap) from the perspective of high-dimensional composite null hypothesis testing. Unlike previous studies, which generally analyzed individual genetic variants, CONTO is a gene-centric method that considers a set of genetic variants located within a gene simultaneously and assesses their joint significance with the trait of interest. Borrowing the principle of the joint significance test (JST), CONTO takes the maximum P value of multiple associations as the significance measurement.
Results Compared to JST, which is often overly conservative, CONTO is improved in two aspects: the construction of a three-component mixture null distribution and the adjustment for trans-ethnic genetic correlation. Consequently, CONTO corrects the conservativeness of JST with well-calibrated P values and is much more powerful, as validated by extensive simulation studies. We applied CONTO to discover common associated genes for 31 complex diseases/traits between the East Asian and European populations, and identified many shared trait-associated genes that had otherwise been missed by JST. We further revealed that population-common genes were generally more evolutionarily conserved than population-specific or null ones.
Conclusion Overall, CONTO represents a powerful method for detecting common associated genes across diverse ancestral groups; our results have important implications for the transferability of GWAS discoveries in one population to others.
Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03637-8.
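The max-P rule that the abstract attributes to the joint significance test can be illustrated with a minimal sketch. This shows only the plain JST idea, not CONTO itself: the three-component mixture null and the genetic-correlation adjustment described above are not reproduced here. The function name and the example P values are hypothetical.

```python
def joint_significance_p(p_values):
    """Plain JST sketch: a gene counts as shared only if it is associated in
    every population, so the largest per-population P value is reported."""
    ps = list(p_values)
    if not ps:
        raise ValueError("at least one P value is required")
    if any(not (0.0 <= p <= 1.0) for p in ps):
        raise ValueError("P values must lie in [0, 1]")
    return max(ps)


# Hypothetical gene-level P values for one gene in two populations.
p_east_asian = 1e-4
p_european = 0.03
p_joint = joint_significance_p([p_east_asian, p_european])
print(p_joint)
```

Because the maximum is driven by the weaker of the two signals, the uncorrected rule tends to be conservative, which is the behavior the abstract says CONTO is designed to correct.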
Affiliation(s)
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuxuan Wu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
10
Roh H. A genome-wide association study of the occurrence of genetic variations in Edwardsiella piscicida, Vibrio harveyi, and Streptococcus parauberis under stressed environments. Journal of Fish Diseases 2022; 45:1373-1388. [PMID: 35735095 PMCID: PMC9541752 DOI: 10.1111/jfd.13668] [Received: 04/13/2022] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
Bacterial mutation and genetic diversity in aquaculture have led to increasing phenotypic variance, which can weaken or invalidate strategies for controlling diseases. However, few studies have monitored the degree of mutation in fish bacterial pathogens caused by environmental pressure within a short period. In this study, transcriptomic sequences from Edwardsiella piscicida, Vibrio harveyi and Streptococcus parauberis under stressed environments were used to investigate the emergence of variants. In detail, a sub-inhibitory concentration of formalin and phenol for E. piscicida, sea water at 30°C for V. harveyi and flounder serum for S. parauberis were used as stressed environments, and significant single-nucleotide polymorphisms (SNPs) and/or mutation sites were investigated after culture in the ordinary liquid media (control) and the stressed environment through a genome-wide association study. As a result, several SNPs or mutations arising during incubation were observed under different environments in E. piscicida and/or V. harveyi in genes relevant to flagella, fimbriae, type 3 secretion systems, and outer and inner membranes, which are directly exposed to external environments. In particular, given that flagella and fimbriae are considered important factors in differentiating the serotypes of some bacterial pathogens, it can be speculated that different environmental pressures are a source of phenotypic or serotypic differentiation from the same origin. On the other hand, S. parauberis did not exhibit notable changes over 4 h when inoculated in the serum from olive flounder. The results presented in this study provide examples of possible molecular evolution in pathogens relevant to the aquaculture industry in response to different environmental pressures.
Affiliation(s)
- HyeongJin Roh
- Pathogens and Disease Transfer, Institute of Marine Research, Bergen, Norway
11
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023]
Abstract
BACKGROUND Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been performed to evaluate the effectiveness of these methods.
RESULTS We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., the sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (namely, the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and in real-data applications to common and rare variant association analyses as well as to expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can scale to biobank-scale data, although some might be relatively slow.
CONCLUSION In conclusion, we hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/.
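The two P value combination rules named in the abstract can be sketched in a few lines. This is a minimal illustration of the commonly published formulas, assuming equal weights by default; it is not the MCA package's code, and the function names and example values are hypothetical. Note that the raw harmonic mean is a test statistic rather than a calibrated P value.

```python
import math


def _prepare(p_values, weights=None):
    """Validate inputs and return P values with weights normalized to sum to 1."""
    ps = list(p_values)
    if not ps or any(not (0.0 < p < 1.0) for p in ps):
        raise ValueError("P values must be non-empty and strictly between 0 and 1")
    if weights is None:
        ws = [1.0 / len(ps)] * len(ps)
    else:
        ws = list(weights)
        if len(ws) != len(ps) or any(w < 0 for w in ws) or sum(ws) <= 0:
            raise ValueError("weights must be non-negative, match p_values, and not all be zero")
        total = sum(ws)
        ws = [w / total for w in ws]
    return ps, ws


def acat_p(p_values, weights=None):
    """Aggregated Cauchy association test: map each P value to a Cauchy
    variate, take the weighted sum, and map it back to a P value."""
    ps, ws = _prepare(p_values, weights)
    t = sum(w * math.tan((0.5 - p) * math.pi) for p, w in zip(ps, ws))
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of the P values (uncalibrated statistic)."""
    ps, ws = _prepare(p_values, weights)
    return 1.0 / sum(w / p for p, w in zip(ps, ws))


# Hypothetical SNP-level P values within one gene.
snp_p = [0.1, 0.2]
print(acat_p(snp_p), harmonic_mean_p(snp_p))
```

Neither rule uses the correlation between SNPs, which is why the abstract describes them as linkage-disequilibrium-free.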
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuchen Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
|
12
|
Zhang M, Qiao J, Zhang S, Zeng P. Exploring the association between birthweight and breast cancer using summary statistics from a perspective of genetic correlation, mediation, and causality. J Transl Med 2022; 20:227. [PMID: 35568861 PMCID: PMC9107660 DOI: 10.1186/s12967-022-03435-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 04/04/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Previous studies demonstrated a positive relationship between birthweight and breast cancer; however, inconsistent, sometimes even controversial, observations also emerged, and the nature of this relationship remains unknown. METHODS Using summary statistics of birthweight and breast cancer, we assessed the fetal/maternal-specific genetic correlation between them via LDSC and prioritized fetal/maternal-specific pleiotropic genes through MAIUP. Relying on summary statistics, we conducted Mendelian randomization (MR) to evaluate the fetal/maternal-specific origin of the causal relationship between birthweight, age of menarche, age at menopause and breast cancer. RESULTS With summary statistics we identified a positive genetic correlation between fetal-specific birthweight and breast cancer (rg = 0.123, P = 0.013) as well as a negative but non-significant correlation between maternal-specific birthweight and breast cancer (rg = -0.068, P = 0.206); we also detected 84 pleiotropic genes shared by fetal-specific birthweight and breast cancer and 49 shared by maternal-specific birthweight and breast cancer. MR-based mediation analysis further revealed that fetal-specific birthweight indirectly influenced breast cancer risk in adulthood via age of menarche or age at menopause. CONCLUSION This study reveals that a shared genetic foundation and causal mediation jointly drive the connection between the two traits, and that fetal/maternal-specific birthweight plays substantially distinct roles in this relationship. However, our work offers little supportive evidence for the fetal origins hypothesis of breast cancer originating in utero.
Affiliation(s)
- Meng Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuo Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
|
13
|
Lu H, Qiao J, Shao Z, Wang T, Huang S, Zeng P. A comprehensive gene-centric pleiotropic association analysis for 14 psychiatric disorders with GWAS summary statistics. BMC Med 2021; 19:314. [PMID: 34895209 PMCID: PMC8667366 DOI: 10.1186/s12916-021-02186-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Recent genome-wide association studies (GWASs) have revealed the polygenic nature of psychiatric disorders and discovered a few single-nucleotide polymorphisms (SNPs) associated with multiple psychiatric disorders. However, the extent and pattern of pleiotropy among distinct psychiatric disorders remain incompletely understood. METHODS We analyzed 14 psychiatric disorders using summary statistics available from the largest GWASs to date. We first applied cross-trait linkage disequilibrium score regression (LDSC) to estimate genetic correlation between disorders. Then, we performed a gene-based pleiotropy analysis by first aggregating a set of SNP-level associations into a single gene-level association signal using MAGMA. From a methodological perspective, we viewed the identification of pleiotropic associations across the entire genome as a high-dimensional problem of composite null hypothesis testing and utilized a novel method called PLACO for pleiotropy mapping. Finally, we performed functional analysis of the identified pleiotropic genes and used Mendelian randomization to detect causal associations between these disorders. RESULTS We confirmed extensive genetic correlation among psychiatric disorders, based on which these disorders can be grouped into three distinct categories. We detected a large number of pleiotropic genes, comprising 5884 associations and 2424 unique genes, and found that differentially expressed pleiotropic genes were significantly enriched in pancreas, liver, heart, and brain, and that the biological processes of these genes were markedly enriched in regulating neurodevelopment, neurogenesis, and neuron differentiation, offering substantial evidence supporting the validity of the identified pleiotropic loci. We further demonstrated that among all the identified pleiotropic genes there were 342 unique ones linked with 6353 drugs through drug-gene interactions, which can be classified into distinct types including inhibitor, agonist, blocker, antagonist, and modulator. We also revealed causal associations among psychiatric disorders, indicating that genetic overlap and causality jointly drove the observed co-existence of these disorders. CONCLUSIONS Our study is among the first large-scale efforts to characterize gene-level pleiotropy among a greatly expanded set of psychiatric disorders and provides important insight into the shared genetic etiology underlying these disorders. The findings would inform psychiatric nosology, identify potential neurobiological mechanisms predisposing to specific clinical presentations, and pave the way to effective drug targets for clinical treatment.
Affiliation(s)
- Haojie Lu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhonghe Shao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
|
14
|
Suitability of GWAS as a Tool to Discover SNPs Associated with Tick Resistance in Cattle: A Review. Pathogens 2021; 10:1604. [PMID: 34959558 PMCID: PMC8707706 DOI: 10.3390/pathogens10121604] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/22/2021] [Accepted: 12/01/2021] [Indexed: 12/22/2022] Open
Abstract
Understanding the biological mechanisms underlying tick resistance in cattle holds the potential to facilitate genetic improvement through selective breeding. Genome wide association studies (GWAS) are popular in research on unraveling genetic determinants underlying complex traits such as tick resistance. To date, various studies have been published on single nucleotide polymorphisms (SNPs) associated with tick resistance in cattle. The discovery of SNPs related to tick resistance has led to the mapping of associated candidate genes. Despite the success of these studies, information on genetic determinants associated with tick resistance in cattle is still limited. This warrants the need for more studies to be conducted. In Africa, the cost of genotyping is still relatively expensive; thus, conducting GWAS is a challenge, as the minimum number of animals recommended cannot be genotyped. These population size and genotype cost challenges may be overcome through the establishment of collaborations. Thus, the current review discusses GWAS as a tool to uncover SNPs associated with tick resistance, by focusing on the study design, association analysis, factors influencing the success of GWAS, and the progress on cattle tick resistance studies.
|
15
|
Monnot S, Desaint H, Mary-Huard T, Moreau L, Schurdi-Levraud V, Boissot N. Deciphering the Genetic Architecture of Plant Virus Resistance by GWAS, State of the Art and Potential Advances. Cells 2021; 10:3080. [PMID: 34831303 PMCID: PMC8625838 DOI: 10.3390/cells10113080] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/01/2021] [Revised: 11/03/2021] [Accepted: 11/04/2021] [Indexed: 01/04/2023] Open
Abstract
Growing virus resistant varieties is a highly effective means to avoid yield loss due to infection by many types of virus. The challenge is to be able to detect resistance donors within plant species diversity and then quickly introduce alleles conferring resistance into elite genetic backgrounds. Until now, mainly monogenic forms of resistance with major effects have been introduced in crops. Polygenic resistance is harder to map and introduce in susceptible genetic backgrounds, but it is likely more durable. Genome wide association studies (GWAS) offer an opportunity to accelerate mapping of both monogenic and polygenic resistance, but have seldom been implemented and described in the plant-virus interaction context. Yet, all of the 48 plant-virus GWAS published so far have successfully mapped QTLs involved in plant virus resistance. In this review, we analyzed general and specific GWAS issues regarding plant virus resistance. We have identified and described several key steps throughout the GWAS pipeline, from diversity panel assembly to GWAS result analyses. Based on the 48 published articles, we analyzed the impact of each key step on the GWAS power and showcase several GWAS methods tailored to all types of viruses.
Affiliation(s)
- Severine Monnot
  - INRAE, Génétique et Amélioration des Fruits et Légumes (GAFL), 84143 Montfavet, France
  - Bayer Crop Science, Chemin de Roque Martine, 13670 Saint-Andiol, France
- Henri Desaint
  - INRAE, Génétique et Amélioration des Fruits et Légumes (GAFL), 84143 Montfavet, France
- Tristan Mary-Huard
  - INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution-Le Moulon, Université Paris-Saclay, Ferme du Moulon, 91190 Gif-sur-Yvette, France
  - Mathématiques et Informatique Appliquées (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, 75231 Paris, France
- Laurence Moreau
  - INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution-Le Moulon, Université Paris-Saclay, Ferme du Moulon, 91190 Gif-sur-Yvette, France
- Nathalie Boissot
  - INRAE, Génétique et Amélioration des Fruits et Légumes (GAFL), 84143 Montfavet, France
|
16
|
Lu H, Wei Y, Jiang Z, Zhang J, Wang T, Huang S, Zeng P. Integrative eQTL-weighted hierarchical Cox models for SNP-set based time-to-event association studies. J Transl Med 2021; 19:418. [PMID: 34627275 PMCID: PMC8502405 DOI: 10.1186/s12967-021-03090-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/09/2021] [Accepted: 09/26/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Integrating functional annotations into SNP-set association studies has proven to be a powerful analysis strategy. Statistical methods for such integration have been developed for continuous and binary phenotypes; however, SNP-set integrative approaches for time-to-event or survival outcomes are lacking. METHODS We propose IEHC, an integrative eQTL (expression quantitative trait loci) hierarchical Cox regression, for SNP-set based survival association analysis by modeling effect sizes of genetic variants as a function of eQTL in a hierarchical manner. Three p-value combination tests are developed to examine the joint effects of eQTL and genetic variants after a novel decorrelated modification of statistics for the two components. An omnibus test (IEHC-ACAT) is further adapted to aggregate the strengths of all available tests. RESULTS Simulations demonstrated that the IEHC joint tests were more powerful if both eQTL and genetic variants contributed to the association signal, while IEHC-ACAT was robust and often outperformed other approaches across various simulation scenarios. When applying IEHC to ten TCGA cancers by incorporating eQTL from relevant tissues of GTEx, we revealed that substantial correlations existed between the two types of effect sizes of genetic variants from TCGA and GTEx, and identified 21 (9 unique) cancer-associated genes which would otherwise be missed by approaches not incorporating eQTL. CONCLUSION IEHC represents a flexible, robust, and powerful approach to integrate functional omics information to enhance the power of identifying association signals for the survival risk of complex human cancers.
Affiliation(s)
- Haojie Lu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yongyue Wei
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, China
- Zhou Jiang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jinhui Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
|
17
|
Gao Y, Zhang J, Zhao H, Guan F, Zeng P. Instrumental Heterogeneity in Sex-Specific Two-Sample Mendelian Randomization: Empirical Results From the Relationship Between Anthropometric Traits and Breast/Prostate Cancer. Front Genet 2021; 12:651332. [PMID: 34178025 PMCID: PMC8220153 DOI: 10.3389/fgene.2021.651332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/09/2021] [Accepted: 04/20/2021] [Indexed: 12/24/2022] Open
Abstract
Background In two-sample Mendelian randomization (MR) studies, sex instrumental heterogeneity is an important problem that needs to be addressed carefully; however, it is often overlooked and may lead to misleading causal inference. Methods We first employed cross-trait linkage disequilibrium score regression (LDSC), Pearson's correlation analysis, and Cochran's Q test to examine sex genetic similarity and heterogeneity in instrumental variables (IVs) of exposures. Simulation was further performed to explore the influence of sex instrumental heterogeneity on causal effect estimation in sex-specific two-sample MR analyses. Furthermore, we chose breast/prostate cancer as outcomes and four anthropometric traits as exposures as an example to illustrate the importance of taking sex heterogeneity of instruments into account in MR studies. Results The simulation definitively demonstrated that sex-combined IVs can lead to biased causal effect estimates in sex-specific two-sample MR studies. In our real applications, both LDSC and Pearson's correlation analyses showed high genetic correlation between sex-combined and sex-specific IVs of the four anthropometric traits, although nearly all the correlation coefficients were larger than zero but less than one. Cochran's Q test also displayed sex heterogeneity for some instruments. When applying sex-specific instruments, significant discrepancies in the magnitude of estimated causal effects were detected for body mass index (BMI) on breast cancer (P = 1.63E-6), for hip circumference (HIP) on breast cancer (P = 1.25E-20), and for waist circumference (WC) on prostate cancer (P = 0.007) compared with those generated with sex-combined instruments. Conclusion Our study reveals that sex instrumental heterogeneity has a non-ignorable impact on sex-specific two-sample MR studies and that the causal effects of anthropometric traits on breast/prostate cancer would be biased if sex-combined IVs are incorrectly employed.
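As a rough illustration of the heterogeneity check named in this abstract, the sketch below computes the generic fixed-effect Cochran's Q statistic for a set of effect estimates of one instrument (for example, a female-specific and a male-specific SNP effect). It is an assumption that this generic form matches the authors' exact procedure; the function name `cochran_q` and the example numbers are hypothetical.

```python
def cochran_q(estimates, std_errors):
    """Fixed-effect Cochran's Q for effect estimates (hypothetical helper).

    Returns (Q, degrees_of_freedom, pooled_estimate). Under homogeneity,
    Q is compared against a chi-square distribution with k - 1 degrees
    of freedom, where k is the number of estimates.
    """
    # Inverse-variance weights: more precise estimates count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    # Weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1, pooled


# Made-up effects of one SNP in two sex-specific analyses.
q_stat, dof, pooled_beta = cochran_q([0.0, 0.2], [0.1, 0.1])
```

A large Q relative to its degrees of freedom suggests that the sex-specific effects of the instrument differ, which is the situation in which the abstract warns against using sex-combined instruments.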
Affiliation(s)
- Yixin Gao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Jinhui Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Huashuo Zhao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Fengjun Guan
  - Department of Pediatrics, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
|
18
|
Muneeb M, Henschel A. Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods. BMC Bioinformatics 2021; 22:198. [PMID: 33874881 PMCID: PMC8056510 DOI: 10.1186/s12859-021-04077-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023] Open
Abstract
Background Genotype–phenotype predictions are of great importance in genetics. These predictions can help to find genetic mutations causing variations in human beings. There are many approaches for finding such associations, which can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are suited to finding the actual SNPs causing variation, whereas machine learning techniques are suited to classifying people into different categories. In this article, we examined the eye-color and type-2 diabetes phenotypes. The proposed technique is a hybrid approach combining components from statistical techniques and machine learning. Results The main dataset for the eye-color phenotype consists of 806 people, of whom 404 have blue-green eyes and 402 have brown eyes. After preprocessing we generated 8 different datasets, containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for machine learning algorithms. We used about 9 classifiers, RandomForest, Extreme Gradient boosting, ANN, LSTM, GRU, BILSTM, 1DCNN, ensembles of ANN, and ensembles of LSTM which gave the best accuracy of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96% respectively. Stacked ensembles of LSTM outperformed other algorithms for 1560 SNPs with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type-2 diabetes consists of 107 people where 30 people are classified as cases and 74 people as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97%. Conclusion Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variant associations with traits and diseases. Given more datasets, machine learning model predictions can be improved. Moreover, the non-linearity in the machine learning model and the combination of SNP mutations while training the model improve the prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
Affiliation(s)
- Muhammad Muneeb
  - Department of Electrical Engineering and Computer Science, Center for Biotechnology Khalifa University, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Andreas Henschel
  - Department of Electrical Engineering and Computer Science, Center for Biotechnology Khalifa University, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
|
19
|
McGuire D, Jiang Y, Liu M, Weissenkampen JD, Eckert S, Yang L, Chen F, Berg A, Vrieze S, Jiang B, Li Q, Liu DJ. Model-based assessment of replicability for genome-wide association meta-analysis. Nat Commun 2021; 12:1964. [PMID: 33785739 PMCID: PMC8009871 DOI: 10.1038/s41467-021-21226-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/17/2019] [Accepted: 01/07/2021] [Indexed: 01/17/2023] Open
Abstract
Genome-wide association meta-analysis (GWAMA) is an effective approach to enlarge sample sizes and empower the discovery of novel associations between genotype and phenotype. Independent replication has been used as a gold-standard for validating genetic associations. However, as current GWAMA often seeks to aggregate all available datasets, it becomes impossible to find a large enough independent dataset to replicate new discoveries. Here we introduce a method, MAMBA (Meta-Analysis Model-based Assessment of replicability), for assessing the "posterior-probability-of-replicability" for identified associations by leveraging the strength and consistency of association signals between contributing studies. We demonstrate using simulations that MAMBA is more powerful and robust than existing methods, and produces more accurate genetic effects estimates. We apply MAMBA to a large-scale meta-analysis of addiction phenotypes with 1.2 million individuals. In addition to accurately identifying replicable common variant associations, MAMBA also pinpoints novel replicable rare variant associations from imputation-based GWAMA and hence greatly expands the set of analyzable variants.
Affiliation(s)
- Daniel McGuire
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Yu Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Mengzhen Liu
  - Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- J Dylan Weissenkampen
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Scott Eckert
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Lina Yang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Fang Chen
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Arthur Berg
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
- Scott Vrieze
  - Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Bibo Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA.
- Qunhua Li
  - Department of Statistics, Penn State University, University Park, PA, USA.
- Dajiang J Liu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA.
|
20
|
Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open
Abstract
Background Clinical and epidemiological studies have suggested that systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbid and that common genetic etiologies can partly explain their coexistence. However, the shared genetic determinants underlying the two diseases remain largely unknown. Methods Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted the pleiotropy-informed association analysis using conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE. Results We found a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) between RA and SLE via LDSC. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively correlated, with an average correlation of 0.579. Functionally, these genes were implicated in multiple auto-immune relevant pathways such as inositol phosphate metabolic process, membrane and glucagon signaling pathway. Conclusion This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding the molecular mechanism underlying the comorbidity of the two diseases.
Affiliation(s)
- Haojie Lu
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Jinhui Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Zhou Jiang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Meng Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ting Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Huashuo Zhao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
|
21
|
Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021; 19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023] Open
Abstract
Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants - by contrast to work in other organisms - address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations and, as such, acquisition of new functions appears to be neither difficult nor rare, however, only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.
Key Words
- APR, ancestral protein resurrection
- ASR, ancestral sequence reconstruction
- Ancestral sequence reconstruction
- CDS, coding sequence
- Evolution
- GR, glucocorticoid receptor
- GWAS, genome wide association study
- Genomics
- InDel, insertion/deletion
- MCMC, Markov Chain Monte Carlo
- ML, maximum likelihood
- MP, maximum parsimony
- MR, mineralocorticoid receptor
- MSA, multiple sequence alignment
- Metabolism
- NJ, neighbor-joining
- Phylogenetics
- Plants
- SFS, site frequency spectrum
Affiliation(s)
- Federico Scossa
- Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
- Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), Rome, Italy
| | - Alisdair R. Fernie
- Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
- Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
| |
|
22
|
Dennis JK, Sealock JM, Straub P, Lee YH, Hucks D, Actkins K, Faucon A, Feng YCA, Ge T, Goleva SB, Niarchou M, Singh K, Morley T, Smoller JW, Ruderfer DM, Mosley JD, Chen G, Davis LK. Clinical laboratory test-wide association scan of polygenic scores identifies biomarkers of complex disease. Genome Med 2021; 13:6. [PMID: 33441150 PMCID: PMC7807864 DOI: 10.1186/s13073-020-00820-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 02/25/2020] [Accepted: 12/08/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations. METHODS A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipids and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Massachusetts General Brigham Biobank (MGBB). RESULTS Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports, and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles including decreased HDL, increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C) in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB. CONCLUSIONS Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.
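The scan described in this abstract, one polygenic score tested against many lab traits, can be illustrated with a minimal sketch. It assumes an unadjusted simple linear regression per lab. The function names `simple_regression` and `labwas` and the toy values are hypothetical, and the sketch is not the authors' LabWAS software, which would also handle covariates and cleaning.

```python
import math

def simple_regression(x, y):
    """Ordinary least squares of y on x; returns (slope, standard error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, used for the slope's standard error.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

def labwas(prs, labs):
    """Regress every lab trait on one polygenic score.

    Returns (lab name, slope, t statistic) tuples, strongest association first.
    """
    results = []
    for name, values in labs.items():
        slope, se = simple_regression(prs, values)
        results.append((name, slope, slope / se))
    return sorted(results, key=lambda r: -abs(r[2]))

# Toy data (invented for illustration): a standardized CAD score and two labs.
prs = [-1.2, -0.5, 0.0, 0.4, 0.9, 1.5]
labs = {
    "HDL": [62.0, 58.0, 55.0, 51.0, 50.0, 44.0],
    "sodium": [140.0, 139.0, 141.0, 140.0, 139.0, 141.0],
}
scan = labwas(prs, labs)
for name, slope, t in scan:
    print(f"{name}: slope={slope:.2f}, t={t:.2f}")
```

In this toy example the score is negatively associated with HDL and essentially unrelated to sodium, mirroring the direction of the HDL finding reported above.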
Affiliation(s)
- Jessica K Dennis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
| | - Julia M Sealock
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Peter Straub
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Younga H Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Donald Hucks
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Ky'Era Actkins
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, TN, 37232, USA
| | - Annika Faucon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Yen-Chen Anne Feng
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Tian Ge
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Slavina B Goleva
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Maria Niarchou
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Kritika Singh
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Theodore Morley
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jordan W Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Douglas M Ruderfer
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jonathan D Mosley
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lea K Davis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Ave, Nashville, TN, 37232, USA.
| |
|
23
|
Chen H, Wang T, Yang J, Huang S, Zeng P. Improved Detection of Potentially Pleiotropic Genes in Coronary Artery Disease and Chronic Kidney Disease Using GWAS Summary Statistics. Front Genet 2020; 11:592461. [PMID: 33343632 PMCID: PMC7744760 DOI: 10.3389/fgene.2020.592461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2020] [Accepted: 11/17/2020] [Indexed: 12/24/2022]
Abstract
The coexistence of coronary artery disease (CAD) and chronic kidney disease (CKD) implies an overlapping genetic foundation. However, the common genetic determinants of the two diseases remain largely unknown. Relying on summary statistics publicly available from large-scale genome-wide association studies (n = 184,305 for CAD and n = 567,460 for CKD), we observed a significant positive genetic correlation between CAD and CKD (rg = 0.173, p = 0.024) via linkage disequilibrium score regression. Next, we implemented gene-based association analysis for each disease through MAGMA (Multi-marker Analysis of GenoMic Annotation) and detected 763 and 827 genes associated with CAD and CKD, respectively (FDR < 0.05). Among those, 72 genes were shared between the two diseases. Furthermore, by integrating the overlapping genetic information between CAD and CKD, we implemented two pleiotropy-informed approaches, cFDR (conditional false discovery rate) and GPA (Genetic analysis incorporating Pleiotropy and Annotation), and identified 169 and 504 shared genes, respectively (FDR < 0.05), of which 121 genes were simultaneously discovered by cFDR and GPA. Importantly, we found 11 potentially new pleiotropic genes related to both CAD and CKD (i.e., ARHGEF19, RSG1, NDST2, CAMK2G, VCL, LRP10, RBM23, USP10, WNT9B, GOSR2, and RPRML). Five of the newly identified pleiotropic genes were further replicated in an additional CAD dataset available from UK Biobank. Our functional enrichment analysis showed that those pleiotropic genes were enriched in diverse relevant pathways, including quaternary ammonium group transmembrane transporter activity and dopamine transport. Overall, this study identifies common genetic architecture shared between CAD and CKD and will help to advance understanding of the molecular mechanisms underlying the comorbidity of the two diseases.
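The gene-level "FDR < 0.05" filter used in this abstract can be sketched as follows. The abstract does not say which FDR procedure was applied, so this example assumes the Benjamini-Hochberg adjustment. The gene labels and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene-based p-values (not taken from the study).
gene_p = {"GENE_A": 0.0004, "GENE_B": 0.012, "GENE_C": 0.03, "GENE_D": 0.6}
q = benjamini_hochberg(list(gene_p.values()))
significant = [gene for gene, qv in zip(gene_p, q) if qv < 0.05]
print(significant)
```

Applying the same filter separately to each disease and intersecting the two gene lists would give a shared-gene count of the kind reported above.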
Affiliation(s)
- Haimiao Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Jinna Yang
- Department of Infectious Diseases, People's Hospital of Zhuji, Shaoxing, China
| | - Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
|
24
|
Xiao L, Yuan Z, Jin S, Wang T, Huang S, Zeng P. Multiple-Tissue Integrative Transcriptome-Wide Association Studies Discovered New Genes Associated With Amyotrophic Lateral Sclerosis. Front Genet 2020; 11:587243. [PMID: 33329728 PMCID: PMC7714931 DOI: 10.3389/fgene.2020.587243] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/25/2020] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies (GWAS) have identified multiple causal genes associated with amyotrophic lateral sclerosis (ALS); however, the genetic architecture of ALS remains completely unknown and a large number of causal genes have yet to be discovered. To fill this gap in part, we implemented an integrative transcriptome-wide association study (TWAS) analysis for ALS to prioritize causal genes, using summary statistics from 80,610 European individuals and 13 GTEx brain tissues as reference transcriptome panels. The summary-level TWAS analysis was first undertaken in each single brain tissue, and then a flexible p-value combination strategy, called summary data-based Cauchy Aggregation TWAS (SCAT), was proposed to pool association signals from the single-tissue TWAS analyses while protecting against highly positive correlation among tests. Extensive simulations demonstrated that SCAT can produce well-calibrated p-values for the control of type I error and was often much more powerful in identifying association signals across various scenarios compared with single-tissue TWAS analysis. Using SCAT, we replicated three ALS-associated genes (i.e., ATXN3, SCFD1, and C9orf72) identified in previous GWASs and discovered five additional genes (i.e., SLC9A8, FAM66D, TRIP11, JUP, and RP11-529H20.6) which were not reported before. Furthermore, we found that the five associations were largely driven by the genes themselves, which might therefore be new genes related to the risk of ALS. However, further investigations are warranted to verify these results and untangle the pathophysiological function of the genes in developing ALS.
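The p-value pooling step can be illustrated with the generic Cauchy combination rule, which transforms each p-value to a Cauchy variate, averages, and converts back. This is a sketch that assumes equal weights across tissues. The function name `cauchy_combine` is hypothetical, and the code is not claimed to match the SCAT implementation in detail.

```python
import math

def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p-value p is mapped to tan((0.5 - p) * pi), a standard Cauchy
    variate under the null; the weighted mean is mapped back to a p-value.
    """
    if weights is None:
        # Assumption: equal weight for every test (e.g. every tissue).
        weights = [1.0 / len(pvalues)] * len(pvalues)
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(stat / sum(weights)) / math.pi

# Toy single-tissue TWAS p-values for one gene (invented for illustration).
tissue_p = [0.01, 0.2, 0.6]
combined = cauchy_combine(tissue_p)
print(f"combined p = {combined:.4f}")
```

A useful property of this rule is that the combined p-value stays approximately valid even when the individual tests are positively correlated, which is the situation the abstract describes for tests across brain tissues.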
Affiliation(s)
- Lishun Xiao
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
| | - Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Siyi Jin
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China
| | - Shuiping Huang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
|
25
|
Ramanan VK, Wang X, Przybelski SA, Raghavan S, Heckman MG, Batzler A, Kosel ML, Hohman TJ, Knopman DS, Graff-Radford J, Lowe VJ, Mielke MM, Jack CR, Petersen RC, Ross OA, Vemuri P. Variants in PPP2R2B and IGF2BP3 are associated with higher tau deposition. Brain Commun 2020; 2:fcaa159. [PMID: 33426524 PMCID: PMC7780444 DOI: 10.1093/braincomms/fcaa159] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/10/2020] [Revised: 07/29/2020] [Accepted: 08/24/2020] [Indexed: 12/18/2022]
Abstract
Tau deposition is a key biological feature of Alzheimer’s disease that is closely related to cognitive impairment. However, it remains poorly understood why certain individuals may be more susceptible to tau deposition while others are more resistant. The recent availability of in vivo assessment of tau burden through positron emission tomography provides an opportunity to test the hypothesis that common genetic variants may influence tau deposition. We performed a genome-wide association study of tau-positron emission tomography on a sample of 754 individuals over age 50 (mean age 72.4 years, 54.6% men, 87.6% cognitively unimpaired) from the population-based Mayo Clinic Study of Aging. Linear regression was performed to test single nucleotide polymorphism associations with AV-1451 (18F-flortaucipir) tau-positron emission tomography burden in an Alzheimer’s-signature composite region of interest, using an additive genetic model and covarying for age, sex and genetic principal components. Genome-wide significant associations with higher tau were identified for rs76752255 (P = 9.91 × 10⁻⁹, β = 0.20) in the tau phosphorylation regulatory gene PPP2R2B (protein phosphatase 2 regulatory subunit B) and for rs117402302 (P = 4.00 × 10⁻⁸, β = 0.19) near IGF2BP3 (insulin-like growth factor 2 mRNA-binding protein 3). The PPP2R2B association remained genome-wide significant after additionally covarying for global amyloid burden and cerebrovascular disease risk, while the IGF2BP3 association was partially attenuated after accounting for amyloid load. In addition to these discoveries, three single nucleotide polymorphisms within MAPT (microtubule-associated protein tau) displayed nominal associations with tau-positron emission tomography burden, and the association of the APOE (apolipoprotein E) ɛ4 allele with tau-positron emission tomography was marginally nonsignificant (P = 0.06, β = 0.07). No associations with tau-positron emission tomography burden were identified for other single nucleotide polymorphisms associated with Alzheimer’s disease clinical diagnosis in prior large case–control studies. Our findings nominate PPP2R2B and IGF2BP3 as novel potential influences on tau pathology which warrant further functional characterization. Our data also support previous literature on the associations of MAPT genetic variation with tau, and more broadly support the inference that tau accumulation may have a genetic architecture distinct from known Alzheimer’s susceptibility genes, which may have implications for improved risk stratification and therapeutic targeting.
Affiliation(s)
- Vijay K Ramanan
- Department of Neurology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Xuewei Wang
- Department of Health Sciences Research, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Scott A Przybelski
- Department of Health Sciences Research, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | | | - Michael G Heckman
- Division of Biomedical Statistics and Informatics, Mayo Clinic-Florida, Jacksonville, FL 32224, USA
| | - Anthony Batzler
- Department of Health Sciences Research, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Matthew L Kosel
- Department of Health Sciences Research, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Timothy J Hohman
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - David S Knopman
- Department of Neurology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | | | - Val J Lowe
- Department of Radiology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Michelle M Mielke
- Department of Neurology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Clifford R Jack
- Department of Radiology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Ronald C Petersen
- Department of Neurology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| | - Owen A Ross
- Department of Neuroscience, Mayo Clinic-Florida, Jacksonville, FL 32224, USA
| | - Prashanthi Vemuri
- Department of Radiology, Mayo Clinic-Minnesota, Rochester, MN 55905, USA
| |
|
26
|
Interactions of Habitual Coffee Consumption by Genetic Polymorphisms with the Risk of Prediabetes and Type 2 Diabetes Combined. Nutrients 2020; 12:nu12082228. [PMID: 32722627 PMCID: PMC7468962 DOI: 10.3390/nu12082228] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/02/2020] [Revised: 07/23/2020] [Accepted: 07/23/2020] [Indexed: 01/15/2023]
Abstract
Habitual coffee consumption and its association with health outcomes may be modified by genetic variation. Adults aged 40 to 69 years who participated in the Korea Association Resource (KARE) study were included in this study. We conducted a genome-wide association study (GWAS) on coffee consumption in 7868 Korean adults, and examined whether the association between coffee consumption and the risk of prediabetes and type 2 diabetes combined was modified by the genetic variations in 4054 adults. In the GWAS for coffee consumption, a total of five single nucleotide polymorphisms (SNPs) located in 12q24.11-13 (rs2074356, rs11066015, rs12229654, rs11065828, and rs79105258) were selected and used to calculate weighted genetic risk scores. Individuals who had a larger number of minor alleles for these five SNPs had higher genetic risk scores. Multivariate logistic regression models were used to estimate the odds ratios (ORs) and 95% confidence intervals (95% CIs) to examine the association. During the 12 years of follow-up, a total of 2468 (60.9%) and 480 (11.8%) participants were diagnosed with prediabetes or type 2 diabetes, respectively. Compared with non-black-coffee consumers, the OR (95% CI) for black-coffee consumers drinking ≥2 cups/day was 0.61 (0.38–0.95; p for trend = 0.023). Similarly, sugared coffee showed an inverse association. We found a potential interaction by the genetic variations related to black-coffee consumption, suggesting a stronger association among individuals with higher genetic risk scores compared to those with lower scores; the ORs (95% CIs) were 0.36 (0.15–0.88) for individuals with 5 to 10 points and 0.87 (0.46–1.66) for those with 0 points. Our study suggests that habitual coffee consumption was related to genetic polymorphisms and modified the risk of prediabetes and type 2 diabetes combined in a sample of the Korean population. The mechanisms between coffee-related genetic variation and the risk of prediabetes and type 2 diabetes combined warrant further investigation.
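A weighted genetic risk score of the kind mentioned in this abstract is a weighted sum of minor-allele counts. The sketch below uses the five SNP identifiers named above, but the per-SNP weights and the example genotype are hypothetical placeholders. The abstract does not report the actual weights or how scores were scaled to points.

```python
def weighted_grs(allele_counts, weights):
    """Sum of (weight x minor-allele count) over the scored SNPs."""
    return sum(weights[snp] * allele_counts[snp] for snp in weights)

# Hypothetical per-allele weights (e.g. GWAS effect sizes); not from the study.
weights = {
    "rs2074356": 0.30,
    "rs11066015": 0.25,
    "rs12229654": 0.20,
    "rs11065828": 0.15,
    "rs79105258": 0.10,
}

# Hypothetical participant: minor-allele count (0, 1 or 2) at each SNP.
participant = {
    "rs2074356": 2,
    "rs11066015": 1,
    "rs12229654": 0,
    "rs11065828": 1,
    "rs79105258": 0,
}

score = weighted_grs(participant, weights)
print(f"weighted GRS = {score:.2f}")
```

With every weight set to 1, the same function reduces to a simple count of minor alleles, which for five SNPs ranges from 0 to 10.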
|
27
|
Kuo TT, Jiang X, Tang H, Wang X, Bath T, Bu D, Wang L, Harmanci A, Zhang S, Zhi D, Sofia HJ, Ohno-Machado L. iDASH secure genome analysis competition 2018: blockchain genomic data access logging, homomorphic encryption on GWAS, and DNA segment searching. BMC Med Genomics 2020; 13:98. [PMID: 32693816 PMCID: PMC7372776 DOI: 10.1186/s12920-020-0715-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/14/2022]
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Xiaoqian Jiang
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
| | - XiaoFeng Wang
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
| | - Tyler Bath
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Diyue Bu
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
| | - Lei Wang
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
| | - Arif Harmanci
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Heidi J Sofia
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA.
- Division of Health Services Research & Development, VA San Diego Healthcare System, San Diego, CA, 92161, USA.
| |
|
28
|
Padilla-Martínez F, Collin F, Kwasniewski M, Kretowski A. Systematic Review of Polygenic Risk Scores for Type 1 and Type 2 Diabetes. Int J Mol Sci 2020; 21:E1703. [PMID: 32131491 PMCID: PMC7084489 DOI: 10.3390/ijms21051703] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/03/2020] [Revised: 02/28/2020] [Accepted: 02/28/2020] [Indexed: 02/07/2023]
Abstract
Recent studies have led to considerable advances in the identification of genetic variants associated with type 1 and type 2 diabetes. One approach for converting genetic data into a predictive measure of disease susceptibility is to add the risk effects of loci into a polygenic risk score. In order to summarize the recent findings, we conducted a systematic review of studies comparing the accuracy of polygenic risk scores developed during the last two decades. We selected 15 risk scores, retrieved from three databases (Scopus, Web of Science and PubMed), for inclusion in this systematic review. We identified three polygenic risk scores that discriminate between type 1 diabetes patients and healthy people, one that discriminates between type 1 and type 2 diabetes, two that discriminate between type 1 and monogenic diabetes and nine polygenic risk scores that discriminate between type 2 diabetes patients and healthy people. Prediction accuracy of polygenic risk scores was assessed by comparing the area under the curve. The actual benefits, potential obstacles and possible solutions for the implementation of polygenic risk scores in clinical practice were also discussed. Developing strategies to establish the clinical validity of polygenic risk scores, by creating a framework for the interpretation of findings and their translation into actual evidence, is the way to demonstrate their utility in medical practice.
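The area under the curve used to compare risk scores can be computed directly from its rank interpretation: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below uses invented scores and a hypothetical helper name `auc`.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise case/control comparisons."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # case correctly ranked above control
            elif case == control:
                wins += 0.5      # ties count as half
    return wins / (len(case_scores) * len(control_scores))

# Toy polygenic risk scores (invented for illustration).
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
print(f"AUC = {auc(cases, controls):.3f}")
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation.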
Affiliation(s)
- Felipe Padilla-Martínez
- Centre for Bioinformatics and Data Analysis, Medical University of Bialystok, 15-276 Bialystok, Poland
- Clinical Research Centre, Medical University of Bialystok, 15-276 Bialystok, Poland
| | - Francois Collin
- Centre for Bioinformatics and Data Analysis, Medical University of Bialystok, 15-276 Bialystok, Poland
| | - Miroslaw Kwasniewski
- Centre for Bioinformatics and Data Analysis, Medical University of Bialystok, 15-276 Bialystok, Poland
| | - Adam Kretowski
- Clinical Research Centre, Medical University of Bialystok, 15-276 Bialystok, Poland
- Department of Endocrinology, Diabetology and Internal Medicine, Medical University of Bialystok, 15-276 Bialystok, Poland
| |
|
29
|
Gaudillo J, Rodriguez JJR, Nazareno A, Baltazar LR, Vilela J, Bulalacao R, Domingo M, Albia J. Machine learning approach to single nucleotide polymorphism-based asthma prediction. PLoS One 2019; 14:e0225574. [PMID: 31800601 PMCID: PMC6892549 DOI: 10.1371/journal.pone.0225574] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/04/2018] [Accepted: 11/07/2019] [Indexed: 12/31/2022]
Abstract
Machine learning (ML) is poised as a transformational approach uniquely positioned to discover the hidden biological interactions for better prediction and diagnosis of complex diseases. In this work, we integrated ML-based models for feature selection and classification to quantify the risk of individual susceptibility to asthma using single nucleotide polymorphisms (SNPs). Random forest (RF) and recursive feature elimination (RFE) algorithms were implemented to identify the SNPs most strongly implicated in asthma. K-nearest neighbor (kNN) and support vector machine (SVM) algorithms were trained to classify samples as asthmatic or non-asthmatic based on the identified SNPs. The feature selection step showed that RF outperformed RFE and the feature importance score derived from RF was consistently high for a subset of SNPs, indicating the robustness of RF in selecting relevant features associated with asthma. Model comparison showed that the integrated RF-SVM model obtained the highest performance, with an accuracy, precision, and sensitivity of 62.5%, 65.3%, and 69%, respectively, when compared to the baseline, RF-kNN, and an external MeanDiff-kNN model. Furthermore, results show that the occurrence of asthma can be predicted with an Area under the Curve (AUC) of 0.62 and 0.64 for RF-SVM and RF-kNN models, respectively. This study demonstrates the integration of ML models to augment traditional methods in predicting genetic predisposition to multifactorial diseases such as asthma.
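The three performance measures reported in this abstract (accuracy, precision, sensitivity) all derive from the confusion matrix of predicted versus true labels. The following sketch shows the definitions on invented labels. It does not reproduce the study's data or models, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and sensitivity for binary labels (1 = asthmatic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
    }

# Toy labels (invented): four asthmatic and four non-asthmatic samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes false positives while sensitivity penalizes false negatives, which is why the abstract can report different values for the two on the same model.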
Affiliation(s)
- Joverlyn Gaudillo
- Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, Philippines
- Computational Interdisciplinary Research Laboratories (CINTERLabs), University of the Philippines Los Baños, Philippines
| | - Jae Joseph Russell Rodriguez
- Genetics and Molecular Biology Division, Institute of Biological Sciences, University of the Philippines Los Baños, Philippines
| | - Allen Nazareno
- Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, Philippines
| | - Lei Rigi Baltazar
- Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, Philippines
- Computational Interdisciplinary Research Laboratories (CINTERLabs), University of the Philippines Los Baños, Philippines
| | - Julianne Vilela
- Philippine Genome Center Program for Agriculture, Office of the Vice Chancellor for Research and Extension, University of the Philippines Los Baños, Philippines
| | - Rommel Bulalacao
- Domingo Artificial Intelligence Research Center, Los Baños, Philippines
| | - Mario Domingo
- Domingo Artificial Intelligence Research Center, Los Baños, Philippines
| | - Jason Albia
- Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, Philippines
- Computational Interdisciplinary Research Laboratories (CINTERLabs), University of the Philippines Los Baños, Philippines
| |
|
30
|
Sanyal N, Lo MT, Kauppi K, Djurovic S, Andreassen OA, Johnson VE, Chen CH. GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies. Bioinformatics 2019; 35:1-11. [PMID: 29931045 DOI: 10.1093/bioinformatics/bty472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/09/2017] [Accepted: 06/12/2018] [Indexed: 01/29/2023] Open
Abstract
Motivation Multiple marker analysis of the genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultra high-dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate these issues simultaneously, we consider the novel approach of using non-local priors in an iterative variable selection framework. Results We develop a variable selection method, named, iterative non-local prior based selection for GWAS, or GWASinlps, that combines, in an iterative variable selection framework, the computational efficiency of the screen-and-select approach based on some association learning and the parsimonious uncertainty quantification provided by the use of non-local priors. The hallmark of our method is the introduction of 'structured screen-and-select' strategy, that considers hierarchical screening, which is not only based on response-predictor associations, but also based on response-response associations and concatenates variable selection within that hierarchy. Extensive simulation studies with single nucleotide polymorphisms having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method compared to several frequentist and Bayesian variable selection methods, in terms of true positive rate, false discovery rate, mean squared error and effect size estimation error. Further, we provide empirical power analysis useful for study design. Finally, a real GWAS data application was considered with human height as phenotype. Availability and implementation An R-package for implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nilotpal Sanyal
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
| | - Min-Tzu Lo
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
| | - Karolina Kauppi
- Department of Radiation Sciences, Umeå University, Umeå, Sweden
| | - Srdjan Djurovic
- Department of Medical Genetics, NORMENT, KG Jebsen Centre, University of Bergen, Bergen, Oslo University Hospital, Oslo, Norway
| | - Ole A Andreassen
- Division of Mental Health and Addiction, NORMENT, KG Jebsen Centre, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Valen E Johnson
- Department of Statistics, Texas A&M University, College Station, TX, USA
| | - Chi-Hua Chen
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
| |
|
31
|
Romagnoni A, Jégou S, Van Steen K, Wainrib G, Hugot JP. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci Rep 2019; 9:10351. [PMID: 31316157 PMCID: PMC6637191 DOI: 10.1038/s41598-019-46649-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 03/11/2019] [Accepted: 07/03/2019] [Indexed: 02/08/2023] Open
Abstract
Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. The impact of quality control (QC), imputation and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. In contrast, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
Affiliation(s)
- Alberto Romagnoni
- Centre de recherche sur l'inflammation UMR 1149, Inserm - Université Paris Diderot, 75018, Paris, France; Data Team, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, 75005, Paris, France
| | | | - Kristel Van Steen
- WELBIO, GIGA-R Medical Genomics - BIO3, University of Liège, Liège, Belgium; Department of Human Genetics, University of Leuven, Leuven, Belgium
| | - Gilles Wainrib
- Data Team, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, 75005, Paris, France; Owkin, 75011, Paris, France
| | - Jean-Pierre Hugot
- Centre de recherche sur l'inflammation UMR 1149, Inserm - Université Paris Diderot, 75018, Paris, France; Hôpital Robert Debré, Assistance Publique-Hôpitaux de Paris, 75019, Paris, France
| | | |
|
32
|
Lan T, Yang B, Zhang X, Wang T, Lu Q. Statistical Methods and Software for Substance Use and Dependence Genetic Research. Curr Genomics 2019; 20:172-183. [PMID: 31929725 PMCID: PMC6935956 DOI: 10.2174/1389202920666190617094930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2019] [Revised: 05/16/2019] [Accepted: 05/24/2019] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Substantial substance use disorders and related health conditions emerged during the mid-20th century and continue to represent a remarkable 21st century global burden of disease. This burden is largely driven by the substance-dependence process, which is a complex process and is influenced by both genetic and environmental factors. During the past few decades, a great deal of progress has been made in identifying genetic variants associated with Substance Use and Dependence (SUD) through linkage, candidate gene association, genome-wide association and sequencing studies. METHODS Various statistical methods and software have been employed in different types of SUD genetic studies, facilitating the identification of new SUD-related variants. CONCLUSION In this article, we review statistical methods and software that are currently available for SUD genetic studies, and discuss their strengths and limitations.
Affiliation(s)
| | | | | | - Tong Wang
- Address correspondence to these authors at the Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA; Tel/Fax: +1-517-353-8623
| | - Qing Lu
- Address correspondence to these authors at the Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA; Tel/Fax: +1-517-353-8623
| |
|
33
|
Brinster R, Köttgen A, Tayo BO, Schumacher M, Sekula P. Control procedures and estimators of the false discovery rate and their application in low-dimensional settings: an empirical investigation. BMC Bioinformatics 2018; 19:78. [PMID: 29499647 PMCID: PMC5833079 DOI: 10.1186/s12859-018-2081-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/03/2017] [Accepted: 02/20/2018] [Indexed: 01/14/2023] Open
Abstract
Background When many (up to millions) of statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling family-wise error rate (FWER) or false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were specifically developed in the context of high-dimensional settings and partially rely on the estimation of the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings such as replication set analyses that might be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study. Results In both application and simulation FWER approaches were less powerful compared to FDR control methods, whether a larger number of hypotheses were tested or not. Most powerful was the q-value method. However, the specificity of this method to maintain true null hypotheses was especially decreased when the number of tested hypotheses was small. In this low-dimensional situation, estimation of the proportion of true null hypotheses was biased. Conclusions The results highlight the importance of a sizeable data set for a reliable estimation of the proportion of true null hypotheses. Consequently, methods relying on this estimation should only be applied in high-dimensional settings. Furthermore, if the focus lies on testing of a small number of hypotheses such as in replication settings, FWER methods rather than FDR methods should be preferred to maintain high specificity. Electronic supplementary material The online version of this article (10.1186/s12859-018-2081-x) contains supplementary material, which is available to authorized users.
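The contrast between FWER and FDR control described in this abstract can be illustrated with a minimal sketch. It uses Bonferroni as the FWER procedure and Benjamini-Hochberg as the FDR procedure; these are standard choices picked here for illustration, not necessarily the exact set of procedures the study compared, and the p-values are made up.

```python
def bonferroni(pvalues, alpha=0.05):
    """FWER control: reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from a small replication set (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_bonf = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))
print(n_bonf, n_bh)
```

On this toy input the FDR procedure rejects more hypotheses than the FWER procedure, which matches the power ordering reported in the abstract. The q-value method additionally estimates the proportion of true null hypotheses, which this sketch does not attempt.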
Affiliation(s)
- Regina Brinster
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120, Heidelberg, Germany; Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, 79104, Freiburg, Germany
| | - Anna Köttgen
- Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Hugstetter Str. 49, 79106, Freiburg, Germany
| | - Bamidele O Tayo
- Department of Public Health Sciences, Loyola University Chicago Stritch School of Medicine, Maywood, IL, USA
| | - Martin Schumacher
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, 79104, Freiburg, Germany
| | - Peggy Sekula
- Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Hugstetter Str. 49, 79106, Freiburg, Germany
| | | |
|
34
|
Zeng P, Wang T, Huang S. Cis-SNPs Set Testing and PrediXcan Analysis for Gene Expression Data using Linear Mixed Models. Sci Rep 2017; 7:15237. [PMID: 29127305 PMCID: PMC5681585 DOI: 10.1038/s41598-017-15055-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/12/2017] [Accepted: 10/19/2017] [Indexed: 12/21/2022] Open
Abstract
Understanding the functional mechanism of SNPs identified in GWAS on complex diseases is currently a challenging task. The studies of expression quantitative trait loci (eQTL) have shown that regulatory variants play a crucial role in the function of associated SNPs. Detecting significant genes (called eGenes) in eQTL studies and analyzing the effect sizes of cis-SNPs can offer important implications on the genetic architecture of associated SNPs and interpretations of the molecular basis of diseases. We applied linear mixed models (LMM) to the gene expression level and constructed likelihood ratio tests (LRT) to test for eGene in the Geuvadis data. We identified about 11% genes as eGenes in the Geuvadis data and found some eGenes were enriched in approximately independent linkage disequilibrium (LD) blocks (e.g. MHC). We further performed PrediXcan analysis for seven diseases in the WTCCC data with weights estimated using LMM and identified 64, 5, 21 and 1 significant genes (p < 0.05 after Bonferroni correction) associated with T1D, CD, RA and T2D. We found most of the significant genes of T1D and RA were also located within the MHC region. Our results provide strong evidence that gene expression plays an intermediate role for the associated variants in GWAS.
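The gene-level test described in this abstract can be sketched in a generic variance-component form. The notation below ($W$, $\alpha$, $\sigma_b^2$, $\sigma_e^2$, $\Lambda$) is an illustrative assumption and is not taken from the paper, which may parameterize the model differently.

```latex
% Generic LMM for one gene: y = expression levels of n samples,
% W = covariates, X = genotypes of the m cis-SNPs of that gene.
y = W\alpha + X\beta + \varepsilon, \qquad
\beta_j \sim N(0, \sigma_b^2), \quad
\varepsilon \sim N(0, \sigma_e^2 I_n)

% eGene test: do the cis-SNPs jointly explain any expression variance?
H_0 : \sigma_b^2 = 0 \quad \text{versus} \quad H_1 : \sigma_b^2 > 0

% Likelihood ratio statistic (full model versus null model)
\Lambda = 2\left[\ell(\hat\alpha, \hat\sigma_b^2, \hat\sigma_e^2)
          - \ell(\hat\alpha_0, 0, \hat\sigma_{e,0}^2)\right]
```

Because $\sigma_b^2 = 0$ lies on the boundary of the parameter space, $\Lambda$ is commonly compared with a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ rather than with $\chi^2_1$ alone. Whether the authors used this reference distribution is not stated in the abstract.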
Affiliation(s)
- Ping Zeng
- Xuzhou Medical University, Department of Epidemiology and Biostatistics, Xuzhou, 221004, China.
- University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48104, USA.
| | - Ting Wang
- Xuzhou Medical University, Department of Epidemiology and Biostatistics, Xuzhou, 221004, China
| | - Shuiping Huang
- Xuzhou Medical University, Department of Epidemiology and Biostatistics, Xuzhou, 221004, China.
| |
|
35
|
Zeng P, Zhou X, Huang S. Prediction of gene expression with cis-SNPs using mixed models and regularization methods. BMC Genomics 2017; 18:368. [PMID: 28490319 PMCID: PMC5425981 DOI: 10.1186/s12864-017-3759-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/20/2016] [Accepted: 05/03/2017] [Indexed: 12/25/2022] Open
Abstract
Background It has been shown that gene expression in human tissues is heritable, thus predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications on the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases. Methods We compared three types of methods for predicting gene expression using only cis-SNPs, including the polygenic model, i.e. linear mixed model (LMM), two sparse models, i.e. Lasso and elastic net (ENET), and the hybrid of LMM and sparse model, i.e. Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods have very different assumptions of underlying genetic architectures. These methods were evaluated using simulations under various scenarios, and were applied to the Geuvadis gene expression data. Results The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, and also did not see there was any clear relationship between the proportion of the predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures since Lasso, ENET and BSLMM outperformed LMM for these genes; and this observation was validated in another gene expression data. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Conclusions Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.
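The different modeling assumptions mentioned in this abstract can be made concrete with the textbook forms of the sparse and polygenic models. The scaling conventions and symbols ($\lambda$, $\alpha$, $\sigma_b^2$) below are one common choice and are assumptions here, not necessarily the parameterization used in the paper.

```latex
% y = expression of one gene in n samples, X = standardized cis-SNP genotypes.

% Lasso (sparse): an L1 penalty sets many SNP effects exactly to zero.
\hat\beta_{\mathrm{Lasso}} = \arg\min_{\beta}\;
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Elastic net (sparse): mixes L1 and L2 penalties, 0 <= alpha <= 1.
\hat\beta_{\mathrm{ENET}} = \arg\min_{\beta}\;
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \lambda\left[\alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2}\lVert \beta \rVert_2^2\right]

% LMM (polygenic): every SNP has a small, normally distributed effect.
y = X\beta + \varepsilon, \qquad
\beta_j \sim N(0, \sigma_b^2), \quad
\varepsilon \sim N(0, \sigma_e^2 I_n)
```

BSLMM sits between these extremes by placing a mixture prior on the effects, so that a few SNPs may carry larger effects on top of a polygenic background. This is consistent with the abstract's observation that it performed robustly across scenarios.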
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, 209 Tongshan Rd, Xuzhou, Jiangsu, 221004, China; Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48104, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48104, USA
| | - Shuiping Huang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, 209 Tongshan Rd, Xuzhou, Jiangsu, 221004, China.
| |
|
36
|
Kao PYP, Leung KH, Chan LWC, Yip SP, Yap MKH. Pathway analysis of complex diseases for GWAS, extending to consider rare variants, multi-omics and interactions. Biochim Biophys Acta Gen Subj 2016; 1861:335-353. [PMID: 27888147 DOI: 10.1016/j.bbagen.2016.11.030] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 06/22/2016] [Revised: 10/17/2016] [Accepted: 11/19/2016] [Indexed: 12/20/2022]
Abstract
BACKGROUND The genome-wide association study (GWAS) is a major method for studying the genetics of complex diseases. Finding all sequence variants to explain fully the aetiology of a disease is difficult because of their small effect sizes. To better explain disease mechanisms, pathway analysis is used to consolidate the effects of multiple variants, and hence increase the power of the study. While pathway analysis has previously been performed within GWAS only, it can now be extended to examining rare variants, other "-omics" and interaction data. SCOPE OF REVIEW 1. Factors to consider in the choice of software for GWAS pathway analysis. 2. Examples of how pathway analysis is used to analyse rare variants, other "-omics" and interaction data. MAJOR CONCLUSIONS To choose appropriate software tools, factors for consideration include covariate compatibility, null hypothesis, whether one- or two-step analysis is required, curation method of gene sets, size of pathways, and size of flanking regions to define gene boundaries. For rare variants, analysis performance depends on consistency between assumed and actual effect distribution of variants. Integration of other "-omics" data and interaction can better explain gene functions. GENERAL SIGNIFICANCE Pathway analysis methods will be more readily used for integration of multiple sources of data, and enable more accurate prediction of phenotypes.
Affiliation(s)
- Patrick Y P Kao
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Kim Hung Leung
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Lawrence W C Chan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Shea Ping Yip
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Maurice K H Yap
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| |
|
37
|
Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet Med 2016; 19:322-329. [PMID: 27513194 PMCID: PMC5506454 DOI: 10.1038/gim.2016.103] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 12/09/2015] [Accepted: 06/20/2016] [Indexed: 12/16/2022] Open
Abstract
Purpose: Using effect estimates from genome-wide association studies (GWAS), we identified a genetic risk score (GRS) that has the strongest association with type 2 diabetes (T2D) status in a population-based cohort and investigated its potential for prospective T2D risk assessment. Methods: By varying the number of single-nucleotide polymorphisms (SNPs) and their respective weights, alternative versions of GRS can be computed. They were tested in 1,181 T2D cases and 9,092 controls of the Estonian Biobank cohort. The best-fitting GRS was chosen for the subsequent analysis of incident T2D (386 cases). Results: The best fit was provided by a novel doubly weighted GRS that captures the effect of 1,000 SNPs. The hazard for incident T2D was 3.45 times (95% CI: 2.31–5.17) higher in the highest GRS quintile compared with the lowest quintile, after adjusting for body mass index and other known predictors. Adding GRS to the prediction model for 5-year T2D risk resulted in continuous net reclassification improvement of 0.324 (95% CI: 0.211–0.444). In addition, a significant effect of the GRS on all-cause and cardiovascular mortality was observed. Conclusion: The proposed GRS would improve the accuracy of T2D risk prediction when added to the currently used set of predictors. Genet Med 19(3), 322–329.
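As a minimal sketch of how a weighted GRS of the kind described above is computed, the function below sums risk-allele dosages multiplied by per-SNP weights. The optional second weight vector (`extra_weights`) is a hypothetical stand-in for the "doubly weighted" idea: the abstract does not define the second weight, so its meaning here is an assumption, and all numbers are made up.

```python
def genetic_risk_score(dosages, effect_weights, extra_weights=None):
    """Weighted GRS for one individual.

    dosages        -- risk-allele counts per SNP (0, 1 or 2)
    effect_weights -- per-SNP effect estimates, e.g. GWAS log odds ratios
    extra_weights  -- hypothetical second per-SNP weight (assumption);
                      defaults to 1.0 for every SNP
    """
    if extra_weights is None:
        extra_weights = [1.0] * len(dosages)
    if not (len(dosages) == len(effect_weights) == len(extra_weights)):
        raise ValueError("all inputs must have one entry per SNP")
    return sum(d * w * v
               for d, w, v in zip(dosages, effect_weights, extra_weights))


# Illustrative values only: three SNPs for a single individual.
dosages = [0, 1, 2]
effect_weights = [0.1, 0.2, 0.3]
single = genetic_risk_score(dosages, effect_weights)
double = genetic_risk_score(dosages, effect_weights, [1.0, 0.5, 0.0])
print(single, double)
```

Varying which SNPs are included and how they are weighted yields the alternative GRS versions that the Methods sentence refers to. Individuals can then be ranked by score and split into quintiles for the hazard comparison.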
|
38
|
Umehara H, Numata S, Tajima A, Nishi A, Nakataki M, Imoto I, Sumitani S, Ohmori T. Calcium Signaling Pathway Is Associated with the Long-Term Clinical Response to Selective Serotonin Reuptake Inhibitors (SSRI) and SSRI with Antipsychotics in Patients with Obsessive-Compulsive Disorder. PLoS One 2016; 11:e0157232. [PMID: 27281126 PMCID: PMC4900663 DOI: 10.1371/journal.pone.0157232] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/11/2016] [Accepted: 05/26/2016] [Indexed: 01/18/2023] Open
Abstract
Background Selective serotonin reuptake inhibitors (SSRI) are established first-line pharmacological treatments for obsessive-compulsive disorder (OCD), while antipsychotics are used as an augmentation strategy for SSRI in OCD patients who have either no response or a partial response to SSRI treatment. The goal of the present study was to identify genetic variants and pathways that are associated with the long-term clinical response of OCD patients to SSRI or SSRI with antipsychotics. Methods We first performed a genome-wide association study of 96 OCD patients to examine genetic variants contributing to the response to SSRI or SSRI with antipsychotics. Subsequently, we conducted pathway-based analyses by using Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) to examine the combined effects of genetic variants on the clinical response in OCD. Results While we failed to detect specific genetic variants associated with clinical responses to SSRI or to SSRI with an atypical antipsychotic at genome-wide levels of significance, we identified 8 enriched pathways for the SSRI treatment response and 5 enriched pathways for the treatment response to SSRI with an antipsychotic medication. Notably, the calcium signaling pathway was identified in both treatment responses. Conclusions Our results provide novel insight into the molecular mechanisms underlying the variability in clinical response to SSRI and SSRI with antipsychotics in OCD patients.
Affiliation(s)
- Hidehiro Umehara
- Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Shusuke Numata
- Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Atsushi Tajima
- Department of Bioinformatics and Genomics, Graduate School of Advanced Preventive Medical Sciences, Kanazawa University, Ishikawa, Japan
- Department of Human Genetics, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Akira Nishi
- Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Masahito Nakataki
- Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Issei Imoto
- Department of Human Genetics, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Satsuki Sumitani
- Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
- Department of support for students with special needs, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| | - Tetsuro Ohmori
- Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, Japan
| |
|
39
|
Zhang Q, Zhao Y, Zhang R, Wei Y, Yi H, Shao F, Chen F. A Comparative Study of Five Association Tests Based on CpG Set for Epigenome-Wide Association Studies. PLoS One 2016; 11:e0156895. [PMID: 27258058 PMCID: PMC4892473 DOI: 10.1371/journal.pone.0156895] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/13/2016] [Accepted: 05/20/2016] [Indexed: 11/19/2022] Open
Abstract
An epigenome-wide association study (EWAS) is a large-scale study of human disease-associated epigenetic variation, specifically variation in DNA methylation. High throughput technologies enable simultaneous epigenetic profiling of DNA methylation at hundreds of thousands of CpGs across the genome. The clustering of correlated DNA methylation at CpGs is reportedly similar to that of linkage-disequilibrium (LD) correlation in genetic single nucleotide polymorphism (SNP) variation. However, current analysis methods, such as the t-test and rank-sum test, may be underpowered to detect differentially methylated markers. We propose to test the association between the outcome (e.g., case or control) and a set of CpG sites jointly. Here, we compared the performance of five CpG set analysis approaches: principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), sequence kernel association test (SKAT), and sliced inverse regression (SIR) with Hotelling's T² test and t-test using Bonferroni correction. The simulation results revealed that the first six methods can control the type I error at the significance level, while the t-test is conservative. SPCA and SKAT performed better than other approaches when the correlation among CpG sites was strong. For illustration, these methods were also applied to a real methylation dataset.
Affiliation(s)
- Qiuyi Zhang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| | - Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| | - Ruyang Zhang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| | - Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| | - Honggang Yi
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| | - Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| | - Feng Chen
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China, 211166
| |
|
40
|
Gasc C, Peyretaillade E, Peyret P. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms. Nucleic Acids Res 2016; 44:4504-18. [PMID: 27105841 PMCID: PMC4889952 DOI: 10.1093/nar/gkw309] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 01/28/2016] [Revised: 04/07/2016] [Accepted: 04/12/2016] [Indexed: 12/25/2022] Open
Abstract
The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology.
Affiliation(s)
- Cyrielle Gasc
- EA 4678 CIDAM, Université d'Auvergne, Clermont-Ferrand, 63001, France
| | | | - Pierre Peyret
- EA 4678 CIDAM, Université d'Auvergne, Clermont-Ferrand, 63001, France
| |
|