Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study. J Biomed Res 2014;29:285-97. [PMID: 26243515 PMCID: PMC4547377 DOI: 10.7555/jbr.29.20140007] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 01/15/2014] [Revised: 06/07/2014] [Accepted: 09/27/2014] [Indexed: 12/19/2022] Open

Cited by Other Article(s)

Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024;22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024] Open

Abstract

BACKGROUND

The term eGene has been applied to define a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). It is both theoretically and empirically important to identify eQTLs and eGenes in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently leverage useful knowledge acquired from auxiliary samples into target studies.

METHODS

We propose a multilocus-based eGene identification method called TLegene, which integrates shared genetic similarity information available from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers that have an explicitly relevant tissue in the GTEx project, and learn the genetic effects of variants in TCGA from GTEx. We also apply TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies.

RESULTS

We observed substantial genetic effect correlation of cis-variants between TCGA and GTEx for a large number of genes. Furthermore, consistent with the results of our simulations, we found that TLegene was more powerful than existing methods and identified 169 distinct candidate eGenes, many more than the approach that did not consider knowledge transfer across target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the associations of the discovered eGenes, and also showed evidence of allelic heterogeneity of gene expression. In addition, TLegene identified more eGenes in Geuvadis and revealed that these eGenes were mainly enriched in EBV-transformed lymphocyte cells.

CONCLUSION

Overall, TLegene represents a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.


Kang HY, Choe EK. Clinical Strategies in Gene Screening Counseling for the Healthy General Population. Korean J Fam Med 2024;45:61-68. [PMID: 38528647 DOI: 10.4082/kjfm.23.0254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 11/26/2023] [Indexed: 03/27/2024] Open

Lee SB, Choi JE, Hong KW, Jung DH. Genetic Variants Linked to Myocardial Infarction in Individuals with Non-Alcoholic Fatty Liver Disease and Their Potential Interaction with Dietary Patterns. Nutrients 2024;16:602. [PMID: 38474730 DOI: 10.3390/nu16050602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 02/12/2024] [Accepted: 02/20/2024] [Indexed: 03/14/2024] Open

Seo H, Park JH, Hwang JT, Choi HK, Park SH, Lee J. Epigenetic Profiling of Type 2 Diabetes Mellitus: An Epigenome-Wide Association Study of DNA Methylation in the Korean Genome and Epidemiology Study. Genes (Basel) 2023;14:2207. [PMID: 38137029 PMCID: PMC10743302 DOI: 10.3390/genes14122207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/08/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] Open

Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023] Open

Abstract

Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations can be reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis is less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.


Shankar R, Dwivedi AK, Singh V, Jain M. Genome-wide discovery of genetic variations between rice cultivars with contrasting drought stress response and their potential functional relevance. Physiol Plant 2023;175:e13879. [PMID: 36805564 DOI: 10.1111/ppl.13879] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2021] [Revised: 02/14/2023] [Accepted: 02/15/2023] [Indexed: 06/18/2023]

CoNet: Efficient Network Regression for Survival Analysis in Transcriptome-Wide Association Studies—With Applications to Studies of Breast Cancer. Genes (Basel) 2023;14:586. [PMID: 36980857 PMCID: PMC10048118 DOI: 10.3390/genes14030586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023] Open

Muneeb M, Feng S, Henschel A. Transfer learning for genotype-phenotype prediction using deep learning models. BMC Bioinformatics 2022;23:511. [PMID: 36447153 PMCID: PMC9710151 DOI: 10.1186/s12859-022-05036-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 11/05/2022] [Indexed: 12/03/2022] Open

Qiao J, Shao Z, Wu Y, Zeng P, Wang T. Detecting associated genes for complex traits shared across East Asian and European populations under the framework of composite null hypothesis testing. J Transl Med 2022;20:424. [PMID: 36138484 PMCID: PMC9503281 DOI: 10.1186/s12967-022-03637-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/08/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022]

Abstract

Background

Detecting trans-ethnic common associated genetic loci can offer important insights into shared genetic components underlying complex diseases/traits across diverse continental populations. However, effective statistical methods for such a goal are currently lacking.

Methods

By leveraging summary statistics available from global-scale genome-wide association studies, we herein proposed a novel genetic overlap detection method called CONTO (COmposite Null hypothesis test for Trans-ethnic genetic Overlap) from the perspective of high-dimensional composite null hypothesis testing. Unlike previous studies, which generally analyzed individual genetic variants, CONTO is a gene-centric method that considers a set of genetic variants located within a gene simultaneously and assesses their joint significance with the trait of interest. Borrowing the principle of the joint significance test (JST), CONTO takes the maximum P value of multiple associations as the significance measurement.

Results

Compared to JST, which is often overly conservative, CONTO is improved in two aspects: the construction of a three-component mixture null distribution and the adjustment for trans-ethnic genetic correlation. Consequently, CONTO corrects the conservativeness of JST with well-calibrated P values and is much more powerful, as validated by extensive simulation studies. We applied CONTO to discover common associated genes for 31 complex diseases/traits between the East Asian and European populations, and identified many shared trait-associated genes that had otherwise been missed by JST. We further revealed that population-common genes were generally more evolutionarily conserved than population-specific or null ones.

Conclusion

Overall, CONTO represents a powerful method for detecting common associated genes across diverse ancestral groups; our results have important implications for the transferability of GWAS discoveries from one population to others.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12967-022-03637-8.
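The joint significance test (JST) that the Methods describe as CONTO's starting point can be sketched in a few lines. This is only the plain max-P rule; it does not implement CONTO's three-component mixture null or its adjustment for trans-ethnic genetic correlation. The function name and the example P values are hypothetical.

```python
def joint_significance_p(p_values):
    """Plain JST: a gene counts as shared only if every population-level
    association is significant, so the combined P value is the largest one."""
    if not p_values:
        raise ValueError("at least one P value is required")
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P value out of range: {p}")
    return max(p_values)


# Hypothetical gene-level P values for one gene in two populations.
p_east_asian = 3e-6
p_european = 2e-4
p_joint = joint_significance_p([p_east_asian, p_european])
```

Because the maximum is compared against a null that assumes both associations are absent, the plain rule tends to be conservative, which is the shortcoming the abstract says CONTO addresses.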


Roh H. A genome-wide association study of the occurrence of genetic variations in Edwardsiella piscicida, Vibrio harveyi, and Streptococcus parauberis under stressed environments. J Fish Dis 2022;45:1373-1388. [PMID: 35735095 PMCID: PMC9541752 DOI: 10.1111/jfd.13668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]

Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022;23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] Open

Abstract

BACKGROUND

Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been previously performed to evaluate the effectiveness of these methods.

RESULTS

We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., the sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (i.e., the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and in real-data applications to common and rare variant association analyses, as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can scale to biobank-scale data, although some might be relatively slow.

CONCLUSION

In conclusion, we hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/ .
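The two linkage-disequilibrium-free P value combination rules named in the Results can be illustrated with a short sketch. It follows the commonly published formulas for the harmonic mean P value and the aggregated Cauchy association test (ACAT), not the MCA package itself; the function names and the equal default weights are assumptions made for illustration.

```python
import math


def _check(p_values, weights):
    """Validate inputs and return weights normalized to sum to 1."""
    if not p_values:
        raise ValueError("at least one P value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and P values must have the same length")
    total = sum(weights)
    return [w / total for w in weights]


def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of the P values (no further calibration)."""
    w = _check(p_values, weights)
    return 1.0 / sum(wi / pi for wi, pi in zip(w, p_values))


def acat_p(p_values, weights=None):
    """ACAT: map each P value to a Cauchy variate, average, map back."""
    w = _check(p_values, weights)
    t = sum(wi * math.tan((0.5 - pi) * math.pi) for wi, pi in zip(w, p_values))
    return 0.5 - math.atan(t) / math.pi
```

Both rules are dominated by the smallest P values, which is consistent with the abstract's finding that they behave well under a sparse genetic architecture. The raw harmonic mean is only approximately a valid P value when it is small; an exact version requires an additional calibration step that this sketch omits.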


Zhang M, Qiao J, Zhang S, Zeng P. Exploring the association between birthweight and breast cancer using summary statistics from a perspective of genetic correlation, mediation, and causality. J Transl Med 2022;20:227. [PMID: 35568861 PMCID: PMC9107660 DOI: 10.1186/s12967-022-03435-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 04/04/2022] [Indexed: 12/24/2022] Open

Lu H, Qiao J, Shao Z, Wang T, Huang S, Zeng P. A comprehensive gene-centric pleiotropic association analysis for 14 psychiatric disorders with GWAS summary statistics. BMC Med 2021;19:314. [PMID: 34895209 PMCID: PMC8667366 DOI: 10.1186/s12916-021-02186-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Recent genome-wide association studies (GWASs) have revealed the polygenic nature of psychiatric disorders and discovered a few single-nucleotide polymorphisms (SNPs) associated with multiple psychiatric disorders. However, the extent and pattern of pleiotropy among distinct psychiatric disorders remain incompletely understood.

METHODS

We analyzed 14 psychiatric disorders using summary statistics available from the largest GWASs to date. We first applied cross-trait linkage disequilibrium score regression (LDSC) to estimate the genetic correlation between disorders. Then, we performed a gene-based pleiotropy analysis by first aggregating a set of SNP-level associations into a single gene-level association signal using MAGMA. From a methodological perspective, we viewed the identification of pleiotropic associations across the entire genome as a high-dimensional problem of composite null hypothesis testing and utilized a novel method called PLACO for pleiotropy mapping. Finally, we performed functional analysis of the identified pleiotropic genes and used Mendelian randomization to detect causal associations between these disorders.

RESULTS

We confirmed extensive genetic correlation among psychiatric disorders, based on which these disorders can be grouped into three diverse categories. We detected a large number of pleiotropic genes including 5884 associations and 2424 unique genes and found that differentially expressed pleiotropic genes were significantly enriched in pancreas, liver, heart, and brain, and that the biological process of these genes was remarkably enriched in regulating neurodevelopment, neurogenesis, and neuron differentiation, offering substantial evidence supporting the validity of identified pleiotropic loci. We further demonstrated that among all the identified pleiotropic genes there were 342 unique ones linked with 6353 drugs with drug-gene interaction which can be classified into distinct types including inhibitor, agonist, blocker, antagonist, and modulator. We also revealed causal associations among psychiatric disorders, indicating that genetic overlap and causality commonly drove the observed co-existence of these disorders.

CONCLUSIONS

Our study is among the first large-scale efforts to characterize gene-level pleiotropy among a greatly expanded set of psychiatric disorders and provides important insight into the shared genetic etiology underlying these disorders. The findings would inform psychiatric nosology, identify potential neurobiological mechanisms predisposing to specific clinical presentations, and pave the way to effective drug targets for clinical treatment.


Suitability of GWAS as a Tool to Discover SNPs Associated with Tick Resistance in Cattle: A Review. Pathogens 2021;10:1604. [PMID: 34959558 PMCID: PMC8707706 DOI: 10.3390/pathogens10121604] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/22/2021] [Accepted: 12/01/2021] [Indexed: 12/22/2022] Open

Monnot S, Desaint H, Mary-Huard T, Moreau L, Schurdi-Levraud V, Boissot N. Deciphering the Genetic Architecture of Plant Virus Resistance by GWAS, State of the Art and Potential Advances. Cells 2021;10:3080. [PMID: 34831303 PMCID: PMC8625838 DOI: 10.3390/cells10113080] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/01/2021] [Revised: 11/03/2021] [Accepted: 11/04/2021] [Indexed: 01/04/2023] Open

Lu H, Wei Y, Jiang Z, Zhang J, Wang T, Huang S, Zeng P. Integrative eQTL-weighted hierarchical Cox models for SNP-set based time-to-event association studies. J Transl Med 2021;19:418. [PMID: 34627275 PMCID: PMC8502405 DOI: 10.1186/s12967-021-03090-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/09/2021] [Accepted: 09/26/2021] [Indexed: 01/23/2023] Open

Gao Y, Zhang J, Zhao H, Guan F, Zeng P. Instrumental Heterogeneity in Sex-Specific Two-Sample Mendelian Randomization: Empirical Results From the Relationship Between Anthropometric Traits and Breast/Prostate Cancer. Front Genet 2021;12:651332. [PMID: 34178025 PMCID: PMC8220153 DOI: 10.3389/fgene.2021.651332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/09/2021] [Accepted: 04/20/2021] [Indexed: 12/24/2022] Open

Abstract

Background

In two-sample Mendelian randomization (MR) studies, sex instrumental heterogeneity is an important problem that needs to be addressed carefully; however, it is often overlooked, which may lead to misleading causal inference.

Methods

We first employed cross-trait linkage disequilibrium score regression (LDSC), Pearson's correlation analysis, and Cochran's Q test to examine sex genetic similarity and heterogeneity in instrumental variables (IVs) of exposures. Simulation was further performed to explore the influence of sex instrumental heterogeneity on causal effect estimation in sex-specific two-sample MR analyses. Furthermore, we chose breast/prostate cancer as outcomes and four anthropometric traits as exposures to illustrate the importance of taking sex heterogeneity of instruments into account in MR studies.

Results

The simulation definitively demonstrated that sex-combined IVs can lead to biased causal effect estimates in sex-specific two-sample MR studies. In our real applications, both LDSC and Pearson's correlation analyses showed high genetic correlation between sex-combined and sex-specific IVs of the four anthropometric traits, while nearly all the correlation coefficients were larger than zero but less than one. The Cochran's Q test also displayed sex heterogeneity for some instruments. When applying sex-specific instruments, significant discrepancies in the magnitude of estimated causal effects were detected for body mass index (BMI) on breast cancer (P = 1.63E-6), for hip circumference (HIP) on breast cancer (P = 1.25E-20), and for waist circumference (WC) on prostate cancer (P = 0.007) compared with those generated with sex-combined instruments.

Conclusion

Our study reveals that the sex instrumental heterogeneity has non-ignorable impact on sex-specific two-sample MR studies and the causal effects of anthropometric traits on breast/prostate cancer would be biased if sex-combined IVs are incorrectly employed.
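Cochran's Q test, which the Methods use to flag sex heterogeneity in individual instruments, can be sketched as follows. This is the textbook inverse-variance form rather than the authors' code; the function names and the example effect sizes are hypothetical, and the P value helper covers only the two-estimate case (one degree of freedom), where it reduces to a two-sided normal tail.

```python
import math


def cochran_q(estimates, std_errors):
    """Return (Q, degrees of freedom, inverse-variance pooled estimate)."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need two or more estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1, pooled


def q_p_value_df1(q):
    """Upper tail of a chi-square with 1 degree of freedom: P(|Z| > sqrt(Q))."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical SNP effects on an exposure, estimated separately in women and men.
q, df, pooled = cochran_q([0.10, 0.30], [0.05, 0.05])
p_het = q_p_value_df1(q)
```

A small P value here suggests that the female-specific and male-specific effects differ, so a sex-combined estimate would be a poor stand-in for either sex-specific instrument.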


Muneeb M, Henschel A. Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods. BMC Bioinformatics 2021;22:198. [PMID: 33874881 PMCID: PMC8056510 DOI: 10.1186/s12859-021-04077-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023] Open

Abstract

Background

Genotype–phenotype predictions are of great importance in genetics. These predictions can help to find genetic mutations causing variations in human beings. There are many approaches for finding such associations, which can be broadly categorized into two classes: statistical techniques and machine learning. Statistical techniques are good for finding the actual SNPs causing variation, whereas machine learning techniques are good when we just want to classify people into different categories. In this article, we examined the eye-color and type 2 diabetes phenotypes. The proposed technique is a hybrid approach that combines parts of statistical techniques with parts of machine learning.

Results

The main dataset for the eye-color phenotype consists of 806 people: 404 have blue-green eyes and 402 have brown eyes. After preprocessing, we generated 8 different datasets containing different numbers of SNPs, using the mutation difference and thresholding at individual SNPs. We calculated three types of mutation at each SNP: no mutation, partial mutation, and full mutation. The data were then transformed for machine learning algorithms. We used about 9 classifiers (RandomForest, Extreme Gradient Boosting, ANN, LSTM, GRU, BILSTM, 1DCNN, ensembles of ANN, and ensembles of LSTM), which gave best accuracies of 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96, respectively. Stacked ensembles of LSTM outperformed the other algorithms for 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes. The main dataset for type 2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model gave an accuracy of 0.97.

Conclusion

Genotype–phenotype predictions are very useful, especially in forensics. These predictions can help to identify SNP variant associations with traits and diseases. Given more datasets, machine learning model predictions can be improved. Moreover, the non-linearity in the machine learning model and the combination of SNP mutations while training the model improve the prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.
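The three mutation types mentioned in the Results (no, partial, and full mutation) resemble the usual 0/1/2 genotype coding that classifiers take as input. The sketch below assumes that interpretation, which the abstract does not spell out: each genotype is a two-letter string and the code counts alleles that differ from a reference allele. All names and example genotypes are hypothetical.

```python
def encode_genotype(genotype, reference_allele):
    """Count non-reference alleles: 0 = no mutation, 1 = partial, 2 = full.

    The mapping to the abstract's three mutation types is an assumption.
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele != reference_allele)


def encode_sample(genotypes, reference_alleles):
    """Encode one person's genotypes across SNPs as a numeric feature vector."""
    if len(genotypes) != len(reference_alleles):
        raise ValueError("one reference allele is required per SNP")
    return [encode_genotype(g, r) for g, r in zip(genotypes, reference_alleles)]


# Hypothetical genotypes for one person at three SNPs.
features = encode_sample(["AA", "AG", "GG"], ["A", "A", "A"])
```

A matrix of such vectors, one row per person, is the kind of numeric input that the classifiers listed in the abstract could be trained on.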


McGuire D, Jiang Y, Liu M, Weissenkampen JD, Eckert S, Yang L, Chen F, Berg A, Vrieze S, Jiang B, Li Q, Liu DJ. Model-based assessment of replicability for genome-wide association meta-analysis. Nat Commun 2021;12:1964. [PMID: 33785739 PMCID: PMC8009871 DOI: 10.1038/s41467-021-21226-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/17/2019] [Accepted: 01/07/2021] [Indexed: 01/17/2023] Open

Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021;12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open

Abstract

Background

Clinical and epidemiological studies have suggested that systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbidities and that common genetic etiologies can partly explain such coexistence. However, the shared genetic determinants underlying the two diseases remain largely unknown.

Methods

Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through the linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted the pleiotropy-informed association using conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE.

Results

We found a significant positive genetic correlation (r_g = 0.404, P = 6.01E-10) between RA and SLE via LDSC. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively dependent, with an average correlation of 0.579. Functionally, these genes were implicated in multiple auto-immune relevant pathways such as inositol phosphate metabolic process, membrane and glucagon signaling pathway.

Conclusion

This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding of molecular mechanism underlying the comorbidity of the two diseases.


Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021;19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023] Open

Dennis JK, Sealock JM, Straub P, Lee YH, Hucks D, Actkins K, Faucon A, Feng YCA, Ge T, Goleva SB, Niarchou M, Singh K, Morley T, Smoller JW, Ruderfer DM, Mosley JD, Chen G, Davis LK. Clinical laboratory test-wide association scan of polygenic scores identifies biomarkers of complex disease. Genome Med 2021;13:6. [PMID: 33441150 PMCID: PMC7807864 DOI: 10.1186/s13073-020-00820-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 02/25/2020] [Accepted: 12/08/2020] [Indexed: 12/27/2022] Open

Abstract

BACKGROUND

Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations.

METHODS

A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipid and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Massachusetts General Brigham Biobank.

RESULTS

Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles including decreased HDL, increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C) in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB.

CONCLUSIONS

Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.
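
As a rough illustration of the lab-wide scan described in this abstract, the minimal sketch below regresses each lab trait on a polygenic score and collects the slopes. It is not the QualityLab or LabWAS software: the function names (`simple_slope`, `labwas_scan`), the lab names, and all values are hypothetical, and the covariate adjustment and significance testing that a real analysis would need are omitted.

```python
from typing import Dict, Sequence


def simple_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x (no covariates)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def labwas_scan(score: Sequence[float],
                labs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Regress every lab trait on the same polygenic score, one at a time."""
    return {name: simple_slope(score, values) for name, values in labs.items()}


# Hypothetical toy data: one polygenic score and two lab traits for 4 people.
toy_score = [-1.0, 0.0, 1.0, 2.0]
toy_labs = {
    "hdl": [52.0, 50.0, 48.0, 46.0],
    "ldl": [100.0, 110.0, 120.0, 130.0],
}

for lab_name, slope in labwas_scan(toy_score, toy_labs).items():
    print(f"{lab_name}: change per unit of score = {slope:.2f}")
```

The sign of each slope gives the direction of association; in practice each test would also need a p-value and a multiple-testing correction across all scanned labs.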

Affiliation(s)

Jessica K Dennis Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
Julia M Sealock Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Peter Straub Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Younga H Lee Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Donald Hucks Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Ky'Era Actkins Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, TN, 37232, USA
Annika Faucon Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Yen-Chen Anne Feng Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
Tian Ge Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Slavina B Goleva Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Maria Niarchou Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Kritika Singh Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Theodore Morley Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Jordan W Smoller Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Douglas M Ruderfer Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Jonathan D Mosley Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Guanhua Chen Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
Lea K Davis Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Ave, Nashville, TN, 37232, USA.

Chen H, Wang T, Yang J, Huang S, Zeng P. Improved Detection of Potentially Pleiotropic Genes in Coronary Artery Disease and Chronic Kidney Disease Using GWAS Summary Statistics. Front Genet 2020;11:592461. [PMID: 33343632 PMCID: PMC7744760 DOI: 10.3389/fgene.2020.592461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2020] [Accepted: 11/17/2020] [Indexed: 12/24/2022] Open

Abstract

The coexistence of coronary artery disease (CAD) and chronic kidney disease (CKD) implies an overlapping genetic foundation. However, the common genetic determination between the two diseases remains largely unknown. Relying on summary statistics publicly available from large scale genome-wide association studies (n = 184,305 for CAD and n = 567,460 for CKD), we observed significant positive genetic correlation between CAD and CKD (r_g = 0.173, p = 0.024) via the linkage disequilibrium score regression. Next, we implemented gene-based association analysis for each disease through MAGMA (Multi-marker Analysis of GenoMic Annotation) and detected 763 and 827 genes associated with CAD or CKD (FDR < 0.05). Among those, 72 genes were shared between the two diseases. Furthermore, by integrating the overlapped genetic information between CAD and CKD, we implemented two pleiotropy-informed informatics approaches including cFDR (conditional false discovery rate) and GPA (Genetic analysis incorporating Pleiotropy and Annotation), and identified 169 and 504 shared genes (FDR < 0.05), of which 121 genes were simultaneously discovered by cFDR and GPA. Importantly, we found 11 potentially new pleiotropic genes related to both CAD and CKD (i.e., ARHGEF19, RSG1, NDST2, CAMK2G, VCL, LRP10, RBM23, USP10, WNT9B, GOSR2, and RPRML). Five of the newly identified pleiotropic genes were further replicated in an additional CAD dataset available from UK Biobank. Our functional enrichment analysis showed that those pleiotropic genes were enriched in diverse relevant pathway processes including quaternary ammonium group transmembrane transporter and dopamine transport. Overall, this study identifies common genetic architectures overlapped between CAD and CKD and will help to advance understanding of the molecular mechanisms underlying the comorbidity of the two diseases.

Xiao L, Yuan Z, Jin S, Wang T, Huang S, Zeng P. Multiple-Tissue Integrative Transcriptome-Wide Association Studies Discovered New Genes Associated With Amyotrophic Lateral Sclerosis. Front Genet 2020;11:587243. [PMID: 33329728 PMCID: PMC7714931 DOI: 10.3389/fgene.2020.587243] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/25/2020] [Accepted: 10/26/2020] [Indexed: 12/12/2022] Open

Ramanan VK, Wang X, Przybelski SA, Raghavan S, Heckman MG, Batzler A, Kosel ML, Hohman TJ, Knopman DS, Graff-Radford J, Lowe VJ, Mielke MM, Jack CR, Petersen RC, Ross OA, Vemuri P. Variants in PPP2R2B and IGF2BP3 are associated with higher tau deposition. Brain Commun 2020;2:fcaa159. [PMID: 33426524 PMCID: PMC7780444 DOI: 10.1093/braincomms/fcaa159] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/10/2020] [Revised: 07/29/2020] [Accepted: 08/24/2020] [Indexed: 12/18/2022] Open

Abstract

Tau deposition is a key biological feature of Alzheimer’s disease that is closely related to cognitive impairment. However, it remains poorly understood why certain individuals may be more susceptible to tau deposition while others are more resistant. The recent availability of in vivo assessment of tau burden through positron emission tomography provides an opportunity to test the hypothesis that common genetic variants may influence tau deposition. We performed a genome-wide association study of tau-positron emission tomography on a sample of 754 individuals over age 50 (mean age 72.4 years, 54.6% men, 87.6% cognitively unimpaired) from the population-based Mayo Clinic Study of Aging. Linear regression was performed to test nucleotide polymorphism associations with AV-1451 (¹⁸F-flortaucipir) tau-positron emission tomography burden in an Alzheimer’s-signature composite region of interest, using an additive genetic model and covarying for age, sex and genetic principal components. Genome-wide significant associations with higher tau were identified for rs76752255 (P = 9.91 × 10⁻⁹, β = 0.20) in the tau phosphorylation regulatory gene PPP2R2B (protein phosphatase 2 regulatory subunit B) and for rs117402302 (P = 4.00 × 10⁻⁸, β = 0.19) near IGF2BP3 (insulin-like growth factor 2 mRNA-binding protein 3). The PPP2R2B association remained genome-wide significant after additionally covarying for global amyloid burden and cerebrovascular disease risk, while the IGF2BP3 association was partially attenuated after accounting for amyloid load. In addition to these discoveries, three single nucleotide polymorphisms within MAPT (microtubule-associated protein tau) displayed nominal associations with tau-positron emission tomography burden, and the association of the APOE (apolipoprotein E) ɛ4 allele with tau-positron emission tomography was marginally nonsignificant (P = 0.06, β = 0.07). No associations with tau-positron emission tomography burden were identified for other single nucleotide polymorphisms associated with Alzheimer’s disease clinical diagnosis in prior large case–control studies. Our findings nominate PPP2R2B and IGF2BP3 as novel potential influences on tau pathology which warrant further functional characterization. Our data are also supportive of previous literature on the associations of MAPT genetic variation with tau, and more broadly support the inference that tau accumulation may have a genetic architecture distinct from known Alzheimer’s susceptibility genes, which may have implications for improved risk stratification and therapeutic targeting.
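
The regression setup this abstract describes (an additive genetic model with covariates) can be sketched as follows. This is only an illustrative, dependency-free least-squares fit, not the authors' pipeline: the helper names (`solve_linear_system`, `fit_additive_model`) and the synthetic data are assumptions made for the example, and standard errors and p-values are not computed.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_additive_model(dosage: Sequence[int],
                       covariates: Sequence[Sequence[float]],
                       phenotype: Sequence[float]) -> List[float]:
    """Least-squares fit of phenotype ~ intercept + dosage + covariates.

    Under the additive model, dosage is the allele count (0, 1 or 2), so the
    returned coefficient at index 1 is the per-allele effect (beta).
    """
    rows = [[1.0, float(g)] + [float(c) for c in cov]
            for g, cov in zip(dosage, covariates)]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic example: 6 people, covariates are (age, sex); values are made up.
toy_dosage = [0, 1, 2, 1, 0, 2]
toy_covariates = [[60, 0], [72, 1], [65, 1], [80, 0], [70, 1], [75, 0]]
toy_phenotype = [1.0 + 0.2 * g + 0.01 * c[0] + 0.05 * c[1]
                 for g, c in zip(toy_dosage, toy_covariates)]

coefficients = fit_additive_model(toy_dosage, toy_covariates, toy_phenotype)
print(f"per-allele effect estimate: {coefficients[1]:.3f}")
```

A genome-wide scan repeats this fit for every variant and, in a real analysis, would also include genetic principal components among the covariates.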

Interactions of Habitual Coffee Consumption by Genetic Polymorphisms with the Risk of Prediabetes and Type 2 Diabetes Combined. Nutrients 2020;12:nu12082228. [PMID: 32722627 PMCID: PMC7468962 DOI: 10.3390/nu12082228] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/02/2020] [Revised: 07/23/2020] [Accepted: 07/23/2020] [Indexed: 01/15/2023] Open

Abstract

Habitual coffee consumption and its association with health outcomes may be modified by genetic variation. Adults aged 40 to 69 years who participated in the Korea Association Resource (KARE) study were included in this study. We conducted a genome-wide association study (GWAS) on coffee consumption in 7868 Korean adults, and examined whether the association between coffee consumption and the risk of prediabetes and type 2 diabetes combined was modified by the genetic variations in 4054 adults. In the GWAS for coffee consumption, a total of five single nucleotide polymorphisms (SNPs) located in 12q24.11-13 (rs2074356, rs11066015, rs12229654, rs11065828, and rs79105258) were selected and used to calculate weighted genetic risk scores. Individuals who had a larger number of minor alleles for these five SNPs had higher genetic risk scores. Multivariate logistic regression models were used to estimate the odds ratios (ORs) and 95% confidence intervals (95% CIs) to examine the association. During the 12 years of follow-up, a total of 2468 (60.9%) and 480 (11.8%) participants were diagnosed with prediabetes or type 2 diabetes, respectively. Compared with non-black-coffee consumers, the OR (95% CI) for black-coffee consumers drinking ≥2 cups/day was 0.61 (0.38–0.95; p for trend = 0.023). Similarly, sugared coffee showed an inverse association. We found a potential interaction by the genetic variations related to black-coffee consumption, suggesting a stronger association among individuals with higher genetic risk scores compared to those with lower scores; the ORs (95% CIs) were 0.36 (0.15–0.88) for individuals with 5 to 10 points and 0.87 (0.46–1.66) for those with 0 points. Our study suggests that habitual coffee consumption was related to genetic polymorphisms and modified the risk of prediabetes and type 2 diabetes combined in a sample of the Korean population. The mechanisms between coffee-related genetic variation and the risk of prediabetes and type 2 diabetes combined warrant further investigation.
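
A weighted genetic risk score of the kind mentioned in this abstract is, in its common form, a weighted sum of minor-allele counts. The sketch below shows that arithmetic only; the function name `weighted_grs` and the weights are hypothetical, and the abstract does not state the weights or scaling the authors actually used.

```python
from typing import Sequence


def weighted_grs(minor_allele_counts: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Weighted genetic risk score: sum over SNPs of weight * allele count."""
    if len(minor_allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    if any(count not in (0, 1, 2) for count in minor_allele_counts):
        raise ValueError("allele counts must be 0, 1 or 2")
    return sum(w * g for g, w in zip(minor_allele_counts, weights))


# Hypothetical per-allele weights for three SNPs (not taken from the study).
example_weights = [0.5, 0.25, 1.0]

# One person carrying 2, 0 and 1 minor alleles at the three SNPs.
print(weighted_grs([2, 0, 1], example_weights))
```

With non-negative weights, carrying more minor alleles can only raise the score, which matches the abstract's statement that individuals with more minor alleles had higher scores.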

Kuo TT, Jiang X, Tang H, Wang X, Bath T, Bu D, Wang L, Harmanci A, Zhang S, Zhi D, Sofia HJ, Ohno-Machado L. iDASH secure genome analysis competition 2018: blockchain genomic data access logging, homomorphic encryption on GWAS, and DNA segment searching. BMC Med Genomics 2020;13:98. [PMID: 32693816 PMCID: PMC7372776 DOI: 10.1186/s12920-020-0715-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/14/2022] Open

Padilla-Martínez F, Collin F, Kwasniewski M, Kretowski A. Systematic Review of Polygenic Risk Scores for Type 1 and Type 2 Diabetes. Int J Mol Sci 2020;21:E1703. [PMID: 32131491 PMCID: PMC7084489 DOI: 10.3390/ijms21051703] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/03/2020] [Revised: 02/28/2020] [Accepted: 02/28/2020] [Indexed: 02/07/2023] Open

Gaudillo J, Rodriguez JJR, Nazareno A, Baltazar LR, Vilela J, Bulalacao R, Domingo M, Albia J. Machine learning approach to single nucleotide polymorphism-based asthma prediction. PLoS One 2019;14:e0225574. [PMID: 31800601 PMCID: PMC6892549 DOI: 10.1371/journal.pone.0225574] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/04/2018] [Accepted: 11/07/2019] [Indexed: 12/31/2022] Open

Sanyal N, Lo MT, Kauppi K, Djurovic S, Andreassen OA, Johnson VE, Chen CH. GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies. Bioinformatics 2019;35:1-11. [PMID: 29931045 DOI: 10.1093/bioinformatics/bty472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/09/2017] [Accepted: 06/12/2018] [Indexed: 01/29/2023] Open

Abstract

Motivation

Multiple marker analysis of genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultra-high dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to a large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate these issues simultaneously, we consider the novel approach of using non-local priors in an iterative variable selection framework.

Results

We develop a variable selection method, named iterative non-local prior based selection for GWAS, or GWASinlps, that combines, in an iterative variable selection framework, the computational efficiency of the screen-and-select approach based on some association learning and the parsimonious uncertainty quantification provided by the use of non-local priors. The hallmark of our method is the introduction of a 'structured screen-and-select' strategy that considers hierarchical screening, which is based not only on response-predictor associations but also on response-response associations, and concatenates variable selection within that hierarchy. Extensive simulation studies with single nucleotide polymorphisms having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method compared to several frequentist and Bayesian variable selection methods, in terms of true positive rate, false discovery rate, mean squared error and effect size estimation error. Further, we provide empirical power analysis useful for study design. Finally, a real GWAS data application was considered with human height as phenotype.

Availability and implementation

An R-package for implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html.

Supplementary information

Supplementary data are available at Bioinformatics online.

Romagnoni A, Jégou S, Van Steen K, Wainrib G, Hugot JP. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci Rep 2019;9:10351. [PMID: 31316157 PMCID: PMC6637191 DOI: 10.1038/s41598-019-46649-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 03/11/2019] [Accepted: 07/03/2019] [Indexed: 02/08/2023] Open

Lan T, Yang B, Zhang X, Wang T, Lu Q. Statistical Methods and Software for Substance Use and Dependence Genetic Research. Curr Genomics 2019;20:172-183. [PMID: 31929725 PMCID: PMC6935956 DOI: 10.2174/1389202920666190617094930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2019] [Revised: 05/16/2019] [Accepted: 05/24/2019] [Indexed: 12/04/2022] Open

Brinster R, Köttgen A, Tayo BO, Schumacher M, Sekula P. Control procedures and estimators of the false discovery rate and their application in low-dimensional settings: an empirical investigation. BMC Bioinformatics 2018;19:78. [PMID: 29499647 PMCID: PMC5833079 DOI: 10.1186/s12859-018-2081-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/03/2017] [Accepted: 02/20/2018] [Indexed: 01/14/2023] Open

Abstract

Background

When many (up to millions of) statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling family-wise error rate (FWER) or false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were specifically developed in the context of high-dimensional settings and partially rely on the estimation of the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings such as replication set analyses that might be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study.

Results

In both application and simulation FWER approaches were less powerful compared to FDR control methods, whether a larger number of hypotheses were tested or not. Most powerful was the q-value method. However, the specificity of this method to maintain true null hypotheses was especially decreased when the number of tested hypotheses was small. In this low-dimensional situation, estimation of the proportion of true null hypotheses was biased.

Conclusions

The results highlight the importance of a sizeable data set for a reliable estimation of the proportion of true null hypotheses. Consequently, methods relying on this estimation should only be applied in high-dimensional settings. Furthermore, if the focus lies on testing of a small number of hypotheses such as in replication settings, FWER methods rather than FDR methods should be preferred to maintain high specificity.
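
To make the FWER versus FDR contrast concrete, here is a minimal sketch using the Bonferroni correction (FWER control) and the Benjamini-Hochberg step-up procedure (FDR control). These two are chosen as familiar representatives and are an assumption of this example; the abstract does not list which specific procedures the study compared, and the q-value method it mentions is not implemented here. The p-values are made up.

```python
from typing import List, Sequence


def bonferroni_reject(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals: Sequence[float],
                              alpha: float = 0.05) -> List[bool]:
    """FDR control: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= (k / m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            largest_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values from a small replication-style analysis (m = 5).
example_pvals = [0.001, 0.012, 0.02, 0.045, 0.30]

print("Bonferroni:        ", bonferroni_reject(example_pvals))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(example_pvals))
```

On these five p-values Bonferroni rejects only the smallest one, while Benjamini-Hochberg rejects the three smallest, which illustrates the power difference the abstract reports for FDR methods and the stricter specificity of FWER methods.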

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2081-x) contains supplementary material, which is available to authorized users.

Zeng P, Wang T, Huang S. Cis-SNPs Set Testing and PrediXcan Analysis for Gene Expression Data using Linear Mixed Models. Sci Rep 2017;7:15237. [PMID: 29127305 PMCID: PMC5681585 DOI: 10.1038/s41598-017-15055-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/12/2017] [Accepted: 10/19/2017] [Indexed: 12/21/2022] Open

Zeng P, Zhou X, Huang S. Prediction of gene expression with cis-SNPs using mixed models and regularization methods. BMC Genomics 2017;18:368. [PMID: 28490319 PMCID: PMC5425981 DOI: 10.1186/s12864-017-3759-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/20/2016] [Accepted: 05/03/2017] [Indexed: 12/25/2022] Open

Abstract

Background

It has been shown that gene expression in human tissues is heritable, thus predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications on the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases.

Methods

We compared three types of methods for predicting gene expression using only cis-SNPs, including the polygenic model, i.e. linear mixed model (LMM), two sparse models, i.e. Lasso and elastic net (ENET), and the hybrid of LMM and sparse model, i.e. Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods have very different assumptions of underlying genetic architectures. These methods were evaluated using simulations under various scenarios, and were applied to the Geuvadis gene expression data.

Results

The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, and also did not see any clear relationship between the proportion of the predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures since Lasso, ENET and BSLMM outperformed LMM for these genes; and this observation was validated in another gene expression dataset. We further showed that the predictive genes were enriched in approximately independent LD blocks.

Conclusions

Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.
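
The R² ≥ 0.05 rule for calling a gene predictive can be illustrated with a small sketch. It assumes R² is the squared Pearson correlation between observed and predicted expression, which is a common convention but is not confirmed by the abstract; the function names and toy values are hypothetical.

```python
from typing import Sequence


def squared_correlation(observed: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    if soo == 0 or spp == 0:
        return 0.0  # no variance, so treat the correlation as absent
    sop = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return sop * sop / (soo * spp)


def is_predictive(observed: Sequence[float],
                  predicted: Sequence[float],
                  threshold: float = 0.05) -> bool:
    """Flag a gene as predictive when its R^2 reaches the threshold."""
    return squared_correlation(observed, predicted) >= threshold


# Toy held-out expression values for one gene and two sets of predictions.
toy_observed = [1.0, 2.0, 3.0, 4.0]
good_prediction = [1.1, 1.9, 3.2, 3.8]
poor_prediction = [1.0, -1.0, -1.0, 1.0]

print(is_predictive(toy_observed, good_prediction))
print(is_predictive(toy_observed, poor_prediction))
```

Raising `threshold` to 0.30 would correspond to the stricter "highly predictive" cutoff quoted in the Results.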

Kao PYP, Leung KH, Chan LWC, Yip SP, Yap MKH. Pathway analysis of complex diseases for GWAS, extending to consider rare variants, multi-omics and interactions. Biochim Biophys Acta Gen Subj 2016;1861:335-353. [PMID: 27888147 DOI: 10.1016/j.bbagen.2016.11.030] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 06/22/2016] [Revised: 10/17/2016] [Accepted: 11/19/2016] [Indexed: 12/20/2022]

Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores. Genet Med 2016;19:322-329. [PMID: 27513194 PMCID: PMC5506454 DOI: 10.1038/gim.2016.103] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 12/09/2015] [Accepted: 06/20/2016] [Indexed: 12/16/2022] Open

Umehara H, Numata S, Tajima A, Nishi A, Nakataki M, Imoto I, Sumitani S, Ohmori T. Calcium Signaling Pathway Is Associated with the Long-Term Clinical Response to Selective Serotonin Reuptake Inhibitors (SSRI) and SSRI with Antipsychotics in Patients with Obsessive-Compulsive Disorder. PLoS One 2016;11:e0157232. [PMID: 27281126 PMCID: PMC4900663 DOI: 10.1371/journal.pone.0157232] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/11/2016] [Accepted: 05/26/2016] [Indexed: 01/18/2023] Open

Zhang Q, Zhao Y, Zhang R, Wei Y, Yi H, Shao F, Chen F. A Comparative Study of Five Association Tests Based on CpG Set for Epigenome-Wide Association Studies. PLoS One 2016;11:e0156895. [PMID: 27258058 PMCID: PMC4892473 DOI: 10.1371/journal.pone.0156895] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/13/2016] [Accepted: 05/20/2016] [Indexed: 11/19/2022] Open

Gasc C, Peyretaillade E, Peyret P. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms. Nucleic Acids Res 2016;44:4504-18. [PMID: 27105841 PMCID: PMC4889952 DOI: 10.1093/nar/gkw309] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 01/28/2016] [Revised: 04/07/2016] [Accepted: 04/12/2016] [Indexed: 12/25/2022] Open