1
Shivakumar M, Miller JE, Dasari VR, Zhang Y, Lee MTM, Carey DJ, Gogoi R, Kim D. Genetic Analysis of Functional Rare Germline Variants across Nine Cancer Types from an Electronic Health Record Linked Biobank. Cancer Epidemiol Biomarkers Prev 2021; 30:1681-1688. [PMID: 34244158 DOI: 10.1158/1055-9965.epi-21-0082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2021] [Revised: 02/15/2021] [Accepted: 06/17/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND Rare variants play an essential role in the etiology of cancer. In this study, we aim to characterize rare germline variants that impact the risk of cancer. METHODS We performed a genome-wide rare variant analysis using germline whole exome sequencing (WES) data derived from the Geisinger MyCode initiative to discover cancer predisposition variants. The case-control association analysis was conducted by binning variants in 5,538 patients with cancer and 7,286 matched controls in a discovery set and 1,991 patients with cancer and 2,504 matched controls in a validation set across nine cancer types. Further, The Cancer Genome Atlas (TCGA) germline data were used to replicate the findings. RESULTS We identified 133 significant pathway-cancer pairs (85 replicated) and 90 significant gene-cancer pairs (12 replicated). In addition, we identified 18 genes and 3 pathways that were associated with survival outcome across cancers (Bonferroni P < 0.05). CONCLUSIONS In this study, we identified potential predisposition genes and pathways based on rare variants in nine cancers. IMPACT This work adds to the knowledge base and progress being made in precision medicine.
Affiliation(s)
- Manu Shivakumar
  - Biomedical & Translational Informatics Institute, Geisinger, Danville, Pennsylvania
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason E Miller
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Yanfei Zhang
  - Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- David J Carey
  - Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania
- Radhika Gogoi
  - Weis Center for Research, Geisinger Clinic, Danville, Pennsylvania
2
Fore R, Boehme J, Li K, Westra J, Tintle N. Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants. Front Genet 2020; 11:591606. [PMID: 33240333 PMCID: PMC7680887 DOI: 10.3389/fgene.2020.591606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022]
Abstract
Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested is also continuing to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first gene-based test, followed by pathway-based) lacks thorough exploration in regard to evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach optimal compared with simply performing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (similar proportion of causal variants in the subsets). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease.
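The two-stage idea described in this abstract (gene-level evidence first, then aggregation across the pathway) can be sketched with a simple combining rule. The abstract does not say which combining statistic the authors used, so the sketch below substitutes Fisher's method purely as an illustration; the function name and the example p-values are hypothetical, and the method assumes the gene-level p-values are independent.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-square distributed with 2k
    degrees of freedom under the null. For even degrees of freedom the
    survival function has a closed form, so only the standard library is used.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2 with 2k df > X) = exp(-X/2) * sum over i < k of (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Hypothetical gene-level p-values for one pathway (illustrative only).
gene_p = {"GENE_A": 0.01, "GENE_B": 0.01}
pathway_p = fisher_combine(list(gene_p.values()))
```

With a single gene the combined value equals the gene-level p-value, which is a quick sanity check on the closed form.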
Affiliation(s)
- Ruby Fore
  - Department of Biostatistics, Brown University, Providence, RI, United States
- Jaden Boehme
  - Department of Mathematics, Oregon State University, Corvallis, OR, United States
- Kevin Li
  - Department of Mathematics, School of Arts and Sciences, Columbia University, New York, NY, United States
- Jason Westra
  - Department of Mathematics and Statistics, Dordt University, Sioux Center, IA, United States
- Nathan Tintle
  - Department of Mathematics and Statistics, Dordt University, Sioux Center, IA, United States
3
Shivakumar M, Miller JE, Dasari VR, Gogoi R, Kim D. Exome-Wide Rare Variant Analysis From the DiscovEHR Study Identifies Novel Candidate Predisposition Genes for Endometrial Cancer. Front Oncol 2019; 9:574. [PMID: 31338326 PMCID: PMC6626914 DOI: 10.3389/fonc.2019.00574] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/27/2019] [Accepted: 06/13/2019] [Indexed: 12/19/2022]
Abstract
Endometrial cancer is the fourth most commonly diagnosed cancer in women. Family history is a known risk factor for endometrial cancer. The incidence of endometrial cancer in a first-degree relative raises the relative risk to between 1.3 and 2.8. It is unclear to what extent, or which other novel germline variants, are at play in endometrial cancer. We aim to address this question by utilizing whole exome sequencing as a means to identify novel, rare variant associations between exonic regions and endometrial cancer. The MyCode community health initiative is an excellent resource for this study, with germline whole exome data for 60,000 patients available in the first phase and a further 30,000 patients independently sequenced in the second phase as part of the DiscovEHR study. We conducted an exome-wide rare variant association analysis using 472 cases and 4,110 controls from the 60,000 patients (discovery cohort), and 261 cases and 1,531 controls from the 30,000 patients (replication cohort). After binning rare germline variants into genes, case-control association tests were performed using the optimal unified approach for rare-variant association, SKAT-O. Seven genes, including RBM12, NDUFB6, ATP6V1A, RECK, SLC35E1, RFX3 (Bonferroni-corrected P < 0.05) and ATP8A1 (suggestive P < 10⁻⁵), and one long non-coding RNA, DLGAP4-AS1 (Bonferroni-corrected P < 0.05), were associated with endometrial cancer. Notably, RECK and ATP8A1 were replicated in the replication cohort (suggestive threshold P < 0.05). Additionally, a pathway-based rare variant analysis, using pathogenic and likely pathogenic variants, identified two significant pathways, pyrimidine metabolism and protein processing in the endoplasmic reticulum (Bonferroni-corrected P < 0.05). In conclusion, our results, obtained from single-source electronic health records (EHR) linked to genomic data, highlight candidate genes and pathways associated with endometrial cancer and indicate the involvement of rare variants in endometrial cancer predisposition, which could help in personalized prognosis and further our understanding of its genetic etiology.
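The binning step mentioned in this abstract (collapsing rare variants into gene-level bins before testing) can be illustrated with a small sketch. It does not implement SKAT-O; it only shows the collapsing and a Bonferroni threshold. The 0.01 minor allele frequency cutoff, the gene names, and the sample IDs are all assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict

def bin_rare_variants(variants, maf_threshold=0.01):
    """Group rare variants by gene and collect the samples carrying any of them."""
    carriers_by_gene = defaultdict(set)
    for v in variants:
        if v["maf"] < maf_threshold:  # keep rare variants only
            carriers_by_gene[v["gene"]].update(v["carriers"])
    return dict(carriers_by_gene)

def carrier_counts(carriers_by_gene, cases, controls):
    """Count case and control carriers for each gene bin."""
    return {
        gene: (len(carriers & cases), len(carriers & controls))
        for gene, carriers in carriers_by_gene.items()
    }

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold when n_tests bins are tested."""
    return alpha / n_tests

# Toy data: gene names, MAFs and sample IDs are invented for illustration.
variants = [
    {"gene": "GENE_A", "maf": 0.002, "carriers": {"s1", "s2"}},
    {"gene": "GENE_A", "maf": 0.004, "carriers": {"s2", "s5"}},
    {"gene": "GENE_B", "maf": 0.200, "carriers": {"s3", "s4", "s6"}},  # common, dropped
    {"gene": "GENE_B", "maf": 0.001, "carriers": {"s6"}},
]
cases = {"s1", "s2", "s3"}
controls = {"s4", "s5", "s6"}

bins = bin_rare_variants(variants)
counts = carrier_counts(bins, cases, controls)
threshold = bonferroni_threshold(len(bins))
```

A real analysis would pass the binned genotypes to an association test together with covariates; the carrier counts here are only a summary of what each bin contains.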
Affiliation(s)
- Manu Shivakumar
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
- Jason E Miller
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Radhika Gogoi
  - Weis Center for Research, Geisinger Clinic, Danville, PA, United States
- Dokyoon Kim
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States
4
Zhang X, Basile AO, Pendergrass SA, Ritchie MD. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico. BMC Bioinformatics 2019; 20:46. [PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/29/2018] [Accepted: 12/26/2018] [Indexed: 11/11/2022]
Abstract
Background The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. Results We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression. Conclusions Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
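How a type I error rate is estimated by simulation can be shown with a much smaller experiment than the one this abstract describes. The sketch below is not the authors' simulation and does not use SKAT or regression; it swaps in a pooled two-proportion z-test on carrier counts, simulates data under the null (same carrier rate in cases and controls), and reports the fraction of replicates rejected at a chosen alpha. The carrier rate, sample sizes, replicate count, and function names are all illustrative assumptions.

```python
import math
import random

def two_proportion_p(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (case_carriers + control_carriers) / (n_cases + n_controls)
    var = pooled * (1.0 - pooled) * (1.0 / n_cases + 1.0 / n_controls)
    if var == 0.0:
        return 1.0  # nobody (or everybody) carries a variant: no evidence
    z = (case_carriers / n_cases - control_carriers / n_controls) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))

def type_one_error(n_cases, n_controls, carrier_rate=0.01, alpha=0.05,
                   n_reps=200, seed=0):
    """Fraction of null replicates with p < alpha (empirical type I error)."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_reps):
        # Under the null, cases and controls share the same carrier rate.
        case_carriers = sum(rng.random() < carrier_rate for _ in range(n_cases))
        control_carriers = sum(rng.random() < carrier_rate for _ in range(n_controls))
        p = two_proportion_p(case_carriers, n_cases, control_carriers, n_controls)
        if p < alpha:
            false_positives += 1
    return false_positives / n_reps

# Same total sample size, balanced versus unbalanced design.
balanced = type_one_error(n_cases=500, n_controls=500)
unbalanced = type_one_error(n_cases=50, n_controls=950)
```

Comparing `balanced` and `unbalanced` against the nominal alpha gives a rough feel for calibration; a serious benchmark would need many more replicates and the actual tests under study.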
Affiliation(s)
- Xinyuan Zhang
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anna O Basile
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Sarah A Pendergrass
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
5
Basile AO, Byrska-Bishop M, Wallace J, Frase AT, Ritchie MD. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants. Bioinformatics 2018; 34:527-529. [PMID: 28968757 PMCID: PMC5860358 DOI: 10.1093/bioinformatics/btx559] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/26/2017] [Accepted: 09/13/2017] [Indexed: 11/27/2022]
Abstract
Motivation BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anna O Basile
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Marta Byrska-Bishop
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- John Wallace
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- Alexander T Frase
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- Marylyn D Ritchie
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
6
Abstract
Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available on comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Affiliation(s)
- Marylyn D. Ritchie
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
7
Verma SS, Josyula N, Verma A, Zhang X, Veturi Y, Dewey FE, Hartzel DN, Lavage DR, Leader J, Ritchie MD, Pendergrass SA. Rare variants in drug target genes contributing to complex diseases, phenome-wide. Sci Rep 2018; 8:4624. [PMID: 29545597 PMCID: PMC5854600 DOI: 10.1038/s41598-018-22834-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/22/2017] [Accepted: 03/01/2018] [Indexed: 12/30/2022]
Abstract
The DrugBank database consists of ~800 genes that are well characterized drug targets. This list of genes is a useful resource for association testing. For example, loss of function (LOF) genetic variation has the potential to mimic the effect of drugs, and high impact variation in these genes can impact downstream traits. Identifying novel associations between genetic variation in these genes and a range of diseases can also uncover new uses for the drugs that target these genes. Phenome Wide Association Studies (PheWAS) have been successful in identifying genetic associations across hundreds of thousands of diseases. We have conducted a novel gene based PheWAS to test the effect of rare variants in DrugBank genes, evaluating associations between these genes and more than 500 quantitative and dichotomous phenotypes. We used whole exome sequencing data from 38,568 samples in Geisinger MyCode Community Health Initiative. We evaluated the results of this study when binning rare variants using various filters based on potential functional impact. We identified multiple novel associations, and the majority of the significant associations were driven by functionally annotated variation. Overall, this study provides a sweeping exploration of rare variant associations within functionally relevant genes across a wide range of diagnoses.
Affiliation(s)
- Shefali Setia Verma
  - Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Navya Josyula
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
- Anurag Verma
  - Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Xinyuan Zhang
  - Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yogasudha Veturi
  - Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Dustin N Hartzel
  - Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Daniel R Lavage
  - Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Joe Leader
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
  - Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
  - Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Sarah A Pendergrass
  - Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
8
Miller JE, Shivakumar MK, Risacher SL, Saykin AJ, Lee S, Nho K, Kim D. Codon bias among synonymous rare variants is associated with Alzheimer's disease imaging biomarker. Pacific Symposium on Biocomputing 2018; 23:365-376. [PMID: 29218897 PMCID: PMC5756629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Alzheimer's disease (AD) is a neurodegenerative disorder with few biomarkers even though it impacts a relatively large portion of the population and is predicted to affect significantly more individuals in the future. Neuroimaging has been used in concert with genetic information to improve our understanding of how AD arises and how it can potentially be diagnosed. Additionally, evidence suggests synonymous variants can have a functional impact on gene regulatory mechanisms, including those related to AD. Some synonymous codons are preferred over others, leading to a codon bias. The bias can arise with respect to codons that are more or less frequently used in the genome. A bias can also result from optimal and non-optimal codons, which have stronger and weaker codon anti-codon interactions, respectively. Although association tests have been utilized before to identify genes associated with AD, it remains unclear how codon bias plays a role and if it can improve rare variant analysis. In this work, rare variants from whole-genome sequencing from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort were binned into genes using BioBin. An association analysis of the genes with an AD-related neuroimaging biomarker was performed using SKAT-O. While using all synonymous variants we did not identify any genome-wide significant associations, using only synonymous variants that affected codon frequency we identified several genes as significantly associated with the imaging phenotype. Additionally, significant associations were found using only rare variants that contain an optimal codon in the minor allele and a non-optimal codon in the major allele. These results suggest that codon bias may play a role in AD and that it can be used to improve detection power in rare variant association analysis.
Affiliation(s)
- Jason E Miller
  - Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
9
Tanner JA, Zhu AZ, Claw KG, Prasad B, Korchina V, Hu J, Doddapaneni H, Muzny DM, Schuetz EG, Lerman C, Thummel KE, Scherer SE, Tyndale RF. Novel CYP2A6 diplotypes identified through next-generation sequencing are associated with in-vitro and in-vivo nicotine metabolism. Pharmacogenet Genomics 2018; 28:7-16. [PMID: 29232328 PMCID: PMC5729933 DOI: 10.1097/fpc.0000000000000317] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]
Abstract
OBJECTIVES Smoking patterns and cessation rates vary widely across smokers and can be influenced by variation in rates of nicotine metabolism [i.e. cytochrome P450 2A6 (CYP2A6), enzyme activity]. There is high heritability of CYP2A6-mediated nicotine metabolism (60-80%) owing to known and unidentified genetic variation in the CYP2A6 gene. We aimed to identify and characterize additional genetic variants at the CYP2A6 gene locus. METHODS A new CYP2A6-specific sequencing method was used to investigate genetic variation in CYP2A6. Novel variants were characterized in a White human liver bank that has been extensively phenotyped for CYP2A6. Linkage and haplotype structure for the novel single nucleotide polymorphisms (SNPs) were assessed. The association between novel five-SNP diplotypes and nicotine metabolism rate was investigated. RESULTS Seven high-frequency (minor allele frequencies ≥6%) noncoding SNPs were identified as important contributors to CYP2A6 phenotypes in a White human liver bank (rs57837628, rs7260629, rs7259706, rs150298687 (also denoted rs4803381), rs56113850, rs28399453, and rs8192733), accounting for two times more variation in in-vitro CYP2A6 activity relative to the four established functional CYP2A6 variants that are frequently tested in Whites (CYP2A6*2, *4, *9, and *12). Two pairs of novel SNPs were in high linkage disequilibrium, allowing us to establish five-SNP diplotypes that were associated with CYP2A6 enzyme activity (rate of nicotine metabolism) in-vitro in the liver bank and in-vivo among smokers. CONCLUSION The novel five-SNP diplotype may be useful to incorporate into CYP2A6 genotype models for personalized prediction of nicotine metabolism rate, cessation success, and response to pharmacotherapies.
Affiliation(s)
- Julie-Anne Tanner
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health (CAMH)
  - Department of Pharmacology and Toxicology
- Andy Z Zhu
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health (CAMH)
  - Department of Pharmacology and Toxicology
- Katrina G Claw
  - Department of Pharmaceutics, University of Washington, Seattle, Washington
- Bhagwat Prasad
  - Department of Pharmaceutics, University of Washington, Seattle, Washington
- Viktoriya Korchina
  - Department of Molecular and Human Genetics, The Baylor College of Medicine Human Genome Sequencing Center, Houston, Texas, USA
- Jianhong Hu
  - Department of Molecular and Human Genetics, The Baylor College of Medicine Human Genome Sequencing Center, Houston, Texas, USA
- HarshaVardhan Doddapaneni
  - Department of Molecular and Human Genetics, The Baylor College of Medicine Human Genome Sequencing Center, Houston, Texas, USA
- Donna M Muzny
  - Department of Molecular and Human Genetics, The Baylor College of Medicine Human Genome Sequencing Center, Houston, Texas, USA
- Erin G Schuetz
  - Pharmaceutical Sciences Department, St Jude Children's Research Hospital, Memphis, Tennessee
- Caryn Lerman
  - Department of Psychiatry, Annenberg School for Communication, and Abramson Cancer Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Kenneth E Thummel
  - Department of Pharmaceutics, University of Washington, Seattle, Washington
- Steven E Scherer
  - Department of Molecular and Human Genetics, The Baylor College of Medicine Human Genome Sequencing Center, Houston, Texas, USA
- Rachel F Tyndale
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health (CAMH)
  - Department of Pharmacology and Toxicology
  - Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
10
Kim D, Basile AO, Bang L, Horgusluoglu E, Lee S, Ritchie MD, Saykin AJ, Nho K. Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer's disease. BMC Med Inform Decis Mak 2017; 17:61. [PMID: 28539126 PMCID: PMC5444041 DOI: 10.1186/s12911-017-0454-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Background Rapid advancement of next generation sequencing technologies such as whole genome sequencing (WGS) has facilitated the search for genetic factors that influence disease risk in the field of human genetics. To identify rare variants associated with human diseases or traits, an efficient genome-wide binning approach is needed. In this study we developed a novel biological knowledge-based binning approach for rare-variant association analysis and then applied the approach to structural neuroimaging endophenotypes related to late-onset Alzheimer’s disease (LOAD). Methods For rare-variant analysis, we used the knowledge-driven binning approach implemented in Bin-KAT, an automated tool that provides 1) binning/collapsing methods for multi-level variant aggregation with a flexible, biologically informed binning strategy and 2) an option of performing unified collapsing and statistical rare variant analyses in one tool. A total of 750 non-Hispanic Caucasian participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort who had both WGS data and magnetic resonance imaging (MRI) scans were used in this study. Mean bilateral cortical thickness of the entorhinal cortex extracted from MRI scans was used as an AD-related neuroimaging endophenotype. SKAT was used for a genome-wide gene- and region-based association analysis of rare variants (minor allele frequency (MAF) < 0.05). Potential confounding factors for entorhinal cortex thickness (age, gender, years of education, intracranial volume (ICV) and MRI field strength) were used as covariates. Significant associations were determined using FDR adjustment for multiple comparisons. Results Our knowledge-driven binning approach identified 16 functional exonic rare variants in FANCC significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In addition, the approach identified 7 evolutionarily conserved regions, which were mapped to FAF1, RFX7, LYPLAL1 and GOLGA3, significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In further analysis, the functional exonic rare variants in FANCC were also significantly associated with hippocampal volume and cerebrospinal fluid (CSF) Aβ1–42 (p-value < 0.05). Conclusions Our novel binning approach identified rare variants in FANCC as well as 7 evolutionarily conserved regions significantly associated with a LOAD-related neuroimaging endophenotype. FANCC (Fanconi anemia complementation group C) has been shown to modulate TLR and p38 MAPK-dependent expression of IL-1β in macrophages. Our results warrant further investigation in a larger independent cohort and demonstrate that the biological knowledge-driven binning approach is a powerful strategy to identify rare variants associated with AD and other complex diseases.
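The FDR adjustment mentioned in this abstract is commonly done with the Benjamini-Hochberg procedure. The abstract does not name the procedure, so treating it as Benjamini-Hochberg is an assumption; the sketch below shows how adjusted p-values are obtained from a list of raw p-values, using invented inputs.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented raw p-values for four bins (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

A bin is then called significant when its adjusted value falls below the chosen FDR level, which matches the "FDR-corrected p-value < 0.05" wording in the abstract.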
Affiliation(s)
- Dokyoon Kim
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Anna O Basile
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Lisa Bang
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Emrin Horgusluoglu
  - Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Seunggeun Lee
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Marylyn D Ritchie
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Andrew J Saykin
  - Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Kwangsik Nho
  - Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA