1. Zhen RR, Qu YJ, Zhang LM, Gu C, Ding MR, Chen L, Peng X, Hu B, An HM. Exploring the potential anti-Alzheimer disease mechanisms of Alpiniae Oxyphyliae Fructus by network pharmacology study and molecular docking. Metab Brain Dis 2022; 38:933-944. [PMID: 36484971 DOI: 10.1007/s11011-022-01137-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 11/29/2022] [Indexed: 12/13/2022]
Abstract
Alpiniae Oxyphyliae Fructus (AOF) (yizhi) is a frequently medicated Chinese herb for Alzheimer disease (AD) treatment. The present study investigated the components and potential mechanisms of AOF through network pharmacology analysis and molecular docking. The results showed that AOF contains at least 20 active ingredients and involves 184 target genes. A total of 301 AD-related genes were obtained from the DisGeNET, GeneCards, GEO, OMIM, and Alzheimer Disease: Genes databases. A total of 41 key targets were identified from the topology analysis of the AOF-AD target network. These key targets are involved in 105 signal pathways, such as the PI3K-Akt, HIF-1, and MAPK pathways, and can regulate gene transcription, cell death, cell proliferation, drug response, and protein phosphorylation. AOF's active ingredients, Chrysin, Isocyperol, Izalpinin, Linolenic acid, CHEMBL489541, Oxyphyllenone A, Oxyphyllenone B, and Oxyphyllol C, show high affinity to targets, including PPARG, ESR1, and AKT1. These findings provide a new basis for AOF application and anti-AD study.
Affiliation(s)
- Rong-Rong Zhen
  - Department of Neurology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Yan-Jie Qu
  - Department of Neurology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Li-Min Zhang
  - Department of Neurology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Chao Gu
  - Department of Neurology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Min-Rui Ding
  - Department of Neurology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Lei Chen
  - Institute of Traditional Chinese Medicine in Oncology, Department of Oncology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Xiao Peng
  - Institute of Traditional Chinese Medicine in Oncology, Department of Oncology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China
- Bing Hu
  - Institute of Traditional Chinese Medicine in Oncology, Department of Oncology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China.
- Hong-Mei An
  - Department of Science & Technology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 200032, Shanghai, People's Republic of China.
2. Genome wide association study identifies novel candidate genes for growth and body conformation traits in goats. Sci Rep 2022; 12:9891. [PMID: 35701479 PMCID: PMC9197946 DOI: 10.1038/s41598-022-14018-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/13/2021] [Accepted: 05/31/2022] [Indexed: 11/08/2022] Open
Abstract
Pakistan is the third-largest country in terms of goat population, with an estimated 78.2 million animals and breeds of distinct characteristics. Punjab province holds 37% of the country's goats and has seven important documented breeds: Beetal, Daira Din Pannah, Nachi, Barbari, Teddi, Pahari and Pothwari. There is a paucity of literature on GWAS for economically important traits, i.e., body weight and morphometric measurements. Therefore, we performed a GWAS using a 50 K SNP chip for growth, in terms of age-adjusted body weight and morphometric measurements, to identify genomic regions influencing these traits among Punjab goat breeds. Blood samples were collected from 879 unrelated animals of the seven goat breeds, along with data on body weight and morphometric measurements including body length, body height, pubic bone length, heart girth and chest length. Genomic DNA was extracted and genotyped using a 50 K SNP bead chip. Genotypic data were tested for association with the phenotypic data using Plink 1.9 software. A linear mixed model was used for the association study. Genes were annotated from the Capra hircus genome using assembly ARS1. We identified a number of highly significant SNPs and respective candidate genes associated with growth and body conformation traits. The functional aspects of these candidate genes suggest a potential role in body growth. Moreover, pleiotropic effects were observed for some SNPs for body weight and conformation traits. The results of the current study contribute to a better understanding of genes influencing growth and body conformation traits in goats.
3. Morciano G, Rimessi A, Patergnani S, Vitto VAM, Danese A, Kahsay A, Palumbo L, Bonora M, Wieckowski MR, Giorgi C, Pinton P. Calcium dysregulation in heart diseases: Targeting calcium channels to achieve a correct calcium homeostasis. Pharmacol Res 2022; 177:106119. [PMID: 35131483 DOI: 10.1016/j.phrs.2022.106119] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/22/2021] [Revised: 02/01/2022] [Accepted: 02/03/2022] [Indexed: 12/16/2022]
Abstract
Intracellular calcium signaling is a universal language source shared by the most part of biological entities inside cells that, all together, give rise to physiological and functional anatomical units, the organ. Although preferentially recognized as signaling between cell life and death processes, in the heart it assumes additional relevance considered the importance of calcium cycling coupled to ATP consumption in excitation-contraction coupling. The concerted action of a plethora of exchangers, channels and pumps inward and outward calcium fluxes where needed, to convert energy and electric impulses in muscle contraction. All this without realizing it, thousands of times, every day. An improper function of those proteins (i.e., variation in expression, mutations onset, dysregulated channeling, differential protein-protein interactions) being part of this signaling network triggers a short circuit with severe acute and chronic pathological consequences reported as arrhythmias, cardiac remodeling, heart failure, reperfusion injury and cardiomyopathies. By acting with chemical, peptide-based and pharmacological modulators of these players, a correction of calcium homeostasis can be achieved accompanied by an amelioration of clinical symptoms. This review will focus on all those defects in calcium homeostasis which occur in the most common cardiac diseases, including myocardial infarction, arrhythmia, hypertrophy, heart failure and cardiomyopathies. This part will be introduced by the state of the art on the proteins involved in calcium homeostasis in cardiomyocytes and followed by the therapeutic treatments that to date, are able to target them and to revert the pathological phenotype.
Affiliation(s)
- Giampaolo Morciano
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy; Maria Cecilia Hospital, GVM Care & Research, 48033 Cotignola, RA, Italy.
- Alessandro Rimessi
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Simone Patergnani
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Veronica A M Vitto
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Alberto Danese
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Asrat Kahsay
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Laura Palumbo
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Massimo Bonora
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Mariusz R Wieckowski
  - Laboratory of Mitochondrial Biology and Metabolism. Nencki Institute of Experimental Biology, Polish Academy of Sciences, 02-093 Warsaw, Poland
- Carlotta Giorgi
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy
- Paolo Pinton
  - Laboratory for Technologies of Advanced Therapies (LTTA), Section of Experimental Medicine, Department of Medical Sciences, University of Ferrara, 44121 Ferrara, Italy; Maria Cecilia Hospital, GVM Care & Research, 48033 Cotignola, RA, Italy.
4. Mieth B, Rozier A, Rodriguez JA, Höhne MMC, Görnitz N, Müller KR. DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies. NAR Genom Bioinform 2021; 3:lqab065. [PMID: 34296082 PMCID: PMC8291080 DOI: 10.1093/nargab/lqab065] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/24/2020] [Revised: 05/27/2021] [Accepted: 07/08/2021] [Indexed: 02/06/2023] Open
Abstract
Deep learning has revolutionized data science in many fields by greatly improving prediction performances in comparison to conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning methodologies through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing each position in the genome individually, the novel three-step algorithm, called DeepCOMBI, first trains a neural network for the classification of subjects into their respective phenotypes. Second, it explains the classifiers’ decisions by applying layer-wise relevance propagation as one example from the pool of explanation techniques. The resulting importance scores are eventually used to determine a subset of the most relevant locations for multiple hypothesis testing in the third step. The performance of DeepCOMBI in terms of power and precision is investigated on generated datasets and a 2007 study. Verification of the latter is achieved by validating all findings with independent studies published up until 2020. DeepCOMBI is shown to outperform ordinary raw P-value thresholding and other baseline methods. Two novel disease associations (rs10889923 for hypertension, rs4769283 for type 1 diabetes) were identified.
Affiliation(s)
- Bettina Mieth
  - Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
- Alexandre Rozier
  - Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
- Juan Antonio Rodriguez
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain
- Marina M C Höhne
  - Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany
5. Wang L, Wang R, Fang J. Author's reply to "A novel seven-gene panel predicts the sensitivity and prognosis of head and neck squamous cell carcinoma treated with platinum-based radio(chemo)therapy". Eur Arch Otorhinolaryngol 2021; 278:3599-3600. [PMID: 34255146 DOI: 10.1007/s00405-021-06985-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2021] [Accepted: 07/05/2021] [Indexed: 11/28/2022]
Affiliation(s)
- Lingwa Wang
  - Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
- Ru Wang
  - Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China.
- Jugao Fang
  - Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China.
6. Kang J, Coates JT, Strawderman RL, Rosenstein BS, Kerns SL. Genomics models in radiotherapy: From mechanistic to machine learning. Med Phys 2020; 47:e203-e217. [PMID: 32418335 PMCID: PMC8725063 DOI: 10.1002/mp.13751] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/17/2019] [Revised: 06/28/2019] [Accepted: 07/17/2019] [Indexed: 12/28/2022] Open
Abstract
Machine learning (ML) provides a broad framework for addressing high-dimensional prediction problems in classification and regression. While ML is often applied for imaging problems in medical physics, there are many efforts to apply these principles to biological data toward questions of radiation biology. Here, we provide a review of radiogenomics modeling frameworks and efforts toward genomically guided radiotherapy. We first discuss medical oncology efforts to develop precision biomarkers. We next discuss similar efforts to create clinical assays for normal tissue or tumor radiosensitivity. We then discuss modeling frameworks for radiosensitivity and the evolution of ML to create predictive models for radiogenomics.
Affiliation(s)
- John Kang
  - Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY 14642, USA
- James T. Coates
  - CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford OX3 7DQ, UK
- Robert L. Strawderman
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA
- Barry S. Rosenstein
  - Department of Radiation Oncology and the Department of Genetics and Genomic Sciences, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sarah L. Kerns
  - Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY 14642, USA
7. Renaux C, Buzdugan L, Kalisch M, Bühlmann P. Hierarchical inference for genome-wide association studies: a view on methodology with software. Comput Stat 2020. [DOI: 10.1007/s00180-019-00939-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/28/2022]
8. An B, Gao X, Chang T, Xia J, Wang X, Miao J, Xu L, Zhang L, Chen Y, Li J, Xu S, Gao H. Genome-wide association studies using binned genotypes. Heredity (Edinb) 2019; 124:288-298. [PMID: 31641238 DOI: 10.1038/s41437-019-0279-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/26/2019] [Revised: 09/25/2019] [Accepted: 09/26/2019] [Indexed: 01/23/2023] Open
Abstract
Linear mixed models (LMM) that test trait association one marker at a time have been the most popular methods for genome-wide association studies. However, this approach has potential pitfalls: over-conservativeness after Bonferroni correction, ignorance of linkage disequilibrium (LD) between neighboring markers, and power reduction due to overfitting SNP effects. Multiple-locus models that can simultaneously estimate and test all markers in the genome are therefore more appropriate. Based on the multiple-locus models, we proposed a bin model that combines markers into bins based on their LD relationships. A bin is treated as a new synthetic marker, and we detect the associations between bins and traits. Since the number of bins can be substantially smaller than the number of markers, a penalized multiple regression method can be adopted by fitting all bins to a single model. We developed an innovative method to bin the neighboring markers and used the least absolute shrinkage and selection operator (LASSO) method. We compared BIN-Lasso with SNP-Lasso and Q + K-LMM in a simulation experiment, and showed that the new method is more powerful, with less Type I error, than the other two methods. We also applied the bin model to a Chinese Simmental beef cattle population for a bone weight association study. The new method identified more significant associations than the classical LMM. The bin model is a new dimension reduction technique that takes advantage of biological information (i.e., LD). The new method will be a significant breakthrough in associative genomics in the big data era.
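The idea of collapsing neighboring markers into LD-based bins can be illustrated with a minimal sketch. This is not the authors' binning method, which the abstract does not specify: the greedy rule, the `threshold` value of 0.8, the use of the mean genotype as the synthetic marker, and the function names `r_squared`, `bin_markers` and `bin_scores` are all assumptions made for illustration.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return (sxy * sxy) / (sxx * syy)


def bin_markers(genotypes, threshold=0.8):
    """Greedy pass over markers in map order (hypothetical rule): a marker joins
    the current bin if its r^2 with the bin's first marker reaches the threshold."""
    bins = []
    for idx, marker in enumerate(genotypes):
        if bins and r_squared(genotypes[bins[-1][0]], marker) >= threshold:
            bins[-1].append(idx)
        else:
            bins.append([idx])
    return bins


def bin_scores(genotypes, bins):
    """One synthetic marker per bin: each individual's mean genotype in the bin."""
    n_individuals = len(genotypes[0])
    return [[mean(genotypes[i][k] for i in b) for k in range(n_individuals)]
            for b in bins]


# Toy data: 4 markers (rows) genotyped in 6 individuals (columns).
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 0, 1, 0, 2, 1],
    [2, 0, 1, 0, 2, 1],
]
bins = bin_markers(genotypes)
synthetic = bin_scores(genotypes, bins)
print(bins, len(synthetic))
```

In this toy input four markers reduce to two synthetic markers, which is the dimension reduction that makes a single penalized regression over all bins feasible.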
Affiliation(s)
- Bingxing An
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Tianpeng Chang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jiangwei Xia
  - Institute of Basic Medical Science, Westlake Institute for Advanced Study, Hangzhou, China
- Xiaoqiao Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Miao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Shizhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
9. Kawabata T, Emoto R, Nishino J, Takahashi K, Matsui S. Two-stage analysis for selecting fixed numbers of features in omics association studies. Stat Med 2019; 38:2956-2971. [PMID: 30931544 DOI: 10.1002/sim.8150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2018] [Revised: 12/31/2018] [Accepted: 02/28/2019] [Indexed: 11/07/2022]
Abstract
One of the main roles of omics-based association studies with high-throughput technologies is to screen out relevant molecular features, such as genetic variants, genes, and proteins, from a large pool of such candidate features based on their associations with the phenotype of interest. Typically, screened features are subject to validation studies using more established or conventional assays, where the number of evaluable features is relatively limited, so that there may exist a fixed number of features measurable by these assays. Such a limitation necessitates narrowing a feature set down to a fixed size, following an initial screening analysis via multiple testing where adjustment for multiplicity is made. We propose a two-stage screening approach to control the false discovery rate (FDR) for a feature set with fixed size that is subject to validation studies, rather than for a feature set from the initial screening analysis. Out of the feature set selected in the first stage with a relaxed FDR level, a fraction of features with the greatest statistical significance is selected first. For the remaining feature set, features are selected based on biological consideration only, without regard to any statistical information, which allows evaluating the FDR level for the finally selected feature set with fixed size. Improvement of the power is discussed in the proposed two-stage screening approach. Simulation experiments based on parametric models and real microarray datasets demonstrated a substantial increase in the number of screened features for biological consideration compared with the standard screening approach, allowing for more extensive and in-depth biological investigations in omics association studies.
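The first-stage screening at a relaxed FDR level can be sketched with the standard Benjamini-Hochberg step-up procedure. This is a swapped-in, textbook technique used only to illustrate the screening step: the abstract does not say which FDR procedure the authors use, and the sketch does not reproduce their evaluation of the FDR for the final fixed-size set. The p-values, the levels 0.05 and 0.2, and the helper `top_k` are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k with p_(k) <= alpha * k / m; reject ranks 1..k.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def top_k(p_values, selected, k):
    """Keep the k screened features with the smallest p-values (hypothetical helper)."""
    return sorted(selected, key=lambda i: p_values[i])[:k]


# Hypothetical p-values for ten candidate features.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

strict = benjamini_hochberg(p, alpha=0.05)   # conventional level
relaxed = benjamini_hochberg(p, alpha=0.2)   # relaxed first-stage level
most_significant = top_k(p, relaxed, 3)      # statistically chosen fraction
print(strict, relaxed, most_significant)
```

With these toy values the relaxed level passes seven features instead of two, which leaves a larger pool from which the remaining slots could be filled on biological grounds.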
Affiliation(s)
- Takanori Kawabata
  - Clinical Research Promotion Unit, Clinical Research Center, Shizuoka Cancer Center, Shizuoka, Japan
- Ryo Emoto
  - Department of Biostatistics, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Jo Nishino
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Kunihiko Takahashi
  - Department of Biostatistics, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Shigeyuki Matsui
  - Department of Biostatistics, Nagoya University Graduate School of Medicine, Nagoya, Japan
10. Ding R, Yang M, Quan J, Li S, Zhuang Z, Zhou S, Zheng E, Hong L, Li Z, Cai G, Huang W, Wu Z, Yang J. Single-Locus and Multi-Locus Genome-Wide Association Studies for Intramuscular Fat in Duroc Pigs. Front Genet 2019; 10:619. [PMID: 31316554 PMCID: PMC6609572 DOI: 10.3389/fgene.2019.00619] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 12/09/2018] [Accepted: 06/13/2019] [Indexed: 12/26/2022] Open
Abstract
Intramuscular fat (IMF) is an important quantitative trait of meat, which affects the associated sensory properties and nutritional value of pork. To gain a better understanding of the genetic determinants of IMF, we used a composite strategy, including single-locus and multi-locus association analyses to perform genome-wide association studies (GWAS) for IMF in 1,490 Duroc boars. We estimated the genomic heritability of IMF to be 0.23 ± 0.04. A total of 30 single nucleotide polymorphisms (SNPs) were found to be significantly associated with IMF. The single-locus mixed linear model (MLM) and multiple-locus methods multi-locus random-SNP-effect mixed linear model (mrMLM), fast multi-locus random-SNP-effect efficient mixed model association (FASTmrEMMA), and integrative sure independence screening expectation maximization Bayesian least absolute shrinkage and selection operator model (ISIS EM-BLASSO) analyses identified 5, 9, 8, and 21 significant SNPs, respectively. Interestingly, a novel quantitative trait locus (QTL) on SSC 7 was found to affect IMF. In addition, 10 candidate genes (BDKRB2, GTF2IRD1, UTRN, TMEM138, DPYD, CASQ2, ZNF518B, S1PR1, GPC6, and GLI1) were found to be associated with IMF based on their potential functional roles in IMF. GO analysis showed that most of the genes were involved in muscle and organ development. A significantly enriched KEGG pathway, the sphingolipid signaling pathway, was reported to be associated with fat deposition and obesity. Identification of novel variants and functional genes will advance our understanding of the genetic mechanisms of IMF and provide specific opportunities for marker-assisted or genomic selection in pigs. In general, such a composite single-locus and multi-locus strategy for GWAS may be useful for understanding the genetic architecture of economic traits in livestock.
Affiliation(s)
- Rongrong Ding
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Ming Yang
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Wens Foodstuffs Group, Co., Ltd., Guangdong, China
- Jianping Quan
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Shaoyun Li
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Zhanwei Zhuang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Shenping Zhou
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Enqin Zheng
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Linjun Hong
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Zicong Li
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
- Gengyuan Cai
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Wens Foodstuffs Group, Co., Ltd., Guangdong, China
- Wen Huang
  - Department of Animal Science, Michigan State University, East Lansing, MI, United States
- Zhenfang Wu
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Wens Foodstuffs Group, Co., Ltd., Guangdong, China
- Jie Yang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangdong, China
11. Kang J, Rancati T, Lee S, Oh JH, Kerns SL, Scott JG, Schwartz R, Kim S, Rosenstein BS. Machine Learning and Radiogenomics: Lessons Learned and Future Directions. Front Oncol 2018; 8:228. [PMID: 29977864 PMCID: PMC6021505 DOI: 10.3389/fonc.2018.00228] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 02/16/2018] [Accepted: 06/04/2018] [Indexed: 12/25/2022] Open
Abstract
Due to the rapid increase in the availability of patient data, there is significant interest in precision medicine that could facilitate the development of a personalized treatment plan for each patient on an individual basis. Radiation oncology is particularly suited for predictive machine learning (ML) models due to the enormous amount of diagnostic data used as input and therapeutic data generated as output. An emerging field in precision radiation oncology that can take advantage of ML approaches is radiogenomics, which is the study of the impact of genomic variations on the sensitivity of normal and tumor tissue to radiation. Currently, patients undergoing radiotherapy are treated using uniform dose constraints specific to the tumor and surrounding normal tissues. This is suboptimal in many ways. First, the dose that can be delivered to the target volume may be insufficient for control but is constrained by the surrounding normal tissue, as dose escalation can lead to significant morbidity and rare. Second, two patients with nearly identical dose distributions can have substantially different acute and late toxicities, resulting in lengthy treatment breaks and suboptimal control, or chronic morbidities leading to poor quality of life. Despite significant advances in radiogenomics, the magnitude of the genetic contribution to radiation response far exceeds our current understanding of individual risk variants. In the field of genomics, ML methods are being used to extract harder-to-detect knowledge, but these methods have yet to fully penetrate radiogenomics. Hence, the goal of this publication is to provide an overview of ML as it applies to radiogenomics. We begin with a brief history of radiogenomics and its relationship to precision medicine. We then introduce ML and compare it to statistical hypothesis testing to reflect on shared lessons and to avoid common pitfalls. Current ML approaches to genome-wide association studies are examined. The application of ML specifically to radiogenomics is next presented. We end with important lessons for the proper integration of ML into radiogenomics.
Affiliation(s)
- John Kang
  - Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY, United States
- Tiziana Rancati
  - Prostate Cancer Program, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Sangkyu Lee
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Sarah L. Kerns
  - Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY, United States
- Jacob G. Scott
  - Department of Translational Hematology and Oncology Research, Cleveland Clinic, Cleveland, OH, United States
  - Department of Radiation Oncology, Cleveland Clinic, Cleveland, OH, United States
- Russell Schwartz
  - Computational Biology Department, Carnegie Mellon School of Computer Science, Pittsburgh, PA, United States
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
- Seyoung Kim
  - Computational Biology Department, Carnegie Mellon School of Computer Science, Pittsburgh, PA, United States
- Barry S. Rosenstein
  - Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
12. Kennedy AE, Ozbek U, Dorak MT. What has GWAS done for HLA and disease associations? Int J Immunogenet 2018; 44:195-211. [PMID: 28877428 DOI: 10.1111/iji.12332] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 03/09/2017] [Revised: 06/16/2017] [Accepted: 07/20/2017] [Indexed: 12/14/2022]
Abstract
The major histocompatibility complex (MHC) is located in chromosome 6p21 and contains crucial regulators of immune response, including human leucocyte antigen (HLA) genes, alongside other genes with nonimmunological roles. More recently, a repertoire of noncoding RNA genes, including expressed pseudogenes, has also been identified. The MHC is the most gene dense and most polymorphic part of the human genome. The region exhibits haplotype-specific linkage disequilibrium patterns, contains the strongest cis- and trans-eQTLs/meQTLs in the genome and is known as a hot spot for disease associations. Another layer of complexity is provided to the region by the extreme structural variation and copy number variations. While the HLA-B gene has the highest number of alleles, the HLA-DR/DQ subregion is structurally most variable and shows the highest number of disease associations. Reliance on a single reference sequence has complicated the design, execution and analysis of GWAS for the MHC region and not infrequently, the MHC region has even been excluded from the analysis of GWAS data. Here, we contrast features of the MHC region with the rest of the genome and highlight its complexities, including its functional polymorphisms beyond those determined by single nucleotide polymorphisms or single amino acid residues. One of the several issues with customary GWAS analysis is that it does not address this additional layer of polymorphisms unique to the MHC region. We highlight alternative approaches that may assist with the analysis of GWAS data from the MHC region and unravel associations with all functional polymorphisms beyond single SNPs. We suggest that despite already showing the highest number of disease associations, the true extent of the involvement of the MHC region in disease genetics may not have been uncovered.
Affiliation(s)
- A E Kennedy
- Center for Research Strategy, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - U Ozbek
- Department of Population Health Science and Policy, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - M T Dorak
- Head of School of Life Sciences, Pharmacy and Chemistry, Kingston University London, Kingston-upon-Thames, UK
|
13
|
Miao J, Wang X, Bao J, Jin S, Chang T, Xia J, Yang L, Zhu B, Xu L, Zhang L, Gao X, Chen Y, Li J, Gao H. Multimarker and rare variants genomewide association studies for bone weight in Simmental cattle. J Anim Breed Genet 2018; 135:159-169. [DOI: 10.1111/jbg.12326] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/18/2017] [Accepted: 03/27/2018] [Indexed: 12/30/2022]
Affiliation(s)
- J. Miao
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
- College of Animal Sciences; Fujian Agriculture and Forestry University; Fujian China
| | - X. Wang
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - J. Bao
- Veterinary Bureau of Wulagai Precinct in Xilin Gol League; Wulagai China
| | - S. Jin
- Veterinary Bureau of Wulagai Precinct in Xilin Gol League; Wulagai China
| | - T. Chang
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - J. Xia
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - L. Yang
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province; Sichuan Agricultural University; Sichuan China
| | - B. Zhu
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - L. Xu
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - L. Zhang
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - X. Gao
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - Y. Chen
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - J. Li
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
| | - H. Gao
- Laboratory of Molecular Biology and Bovine Breeding; Institute of Animal Sciences; Chinese Academy of Agricultural Sciences; Beijing China
|
14
|
Otani T, Noma H, Nishino J, Matsui S. Re-assessment of multiple testing strategies for more efficient genome-wide association studies. Eur J Hum Genet 2018. [PMID: 29523830 DOI: 10.1038/s41431-018-0125-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Although enormous costs have been dedicated to discovering relevant disease-related genetic variants, especially in genome-wide association studies (GWASs), only a small fraction of estimated heritability can be explained by these results. This is the so-called missing heritability problem. The conventional use of overly conservative multiple testing strategies based on controlling the familywise error rate (FWER), in particular with a genome-wide significance threshold of P < 5 × 10⁻⁸, is one of the most important issues from a statistical perspective. To help resolve this problem, we performed comprehensive re-assessments of currently available strategies using recently published, extremely large-scale GWAS data sets of rheumatoid arthritis and schizophrenia (>50,000 subjects). The estimates of statistical power averaged for all disease-related genetic variants of the standard FWER-based strategy were only 0.09% for the rheumatoid arthritis data and 0.04% for the schizophrenia data. To design more efficient strategies, we also conducted an extensive comparison of multiple testing strategies by applying false discovery rate (FDR)-controlling procedures to these data sets and simulations, and found that the FDR-based procedures achieved higher power than the FWER-based strategy, even at a strict FDR level (e.g., FDR = 1%). We also discuss a useful alternative measure, namely "partial power," which is an averaged power for detecting the clinically and biologically meaningful genetic factors with the largest effects. Simulation results suggest that the FDR-based procedures can achieve sufficient partial power (>80%) for detecting these factors (odds ratios of >1.05) with 80,000 subjects, and thus this may be a useful measure for defining realistic objectives of future GWASs.
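The contrast this abstract draws between FWER and FDR control can be illustrated with a minimal sketch. It is not the authors' code: it assumes a Bonferroni correction as the FWER procedure and the Benjamini-Hochberg step-up rule as the FDR procedure, and the function names and toy p-values are hypothetical.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """FWER control: reject only p-values at or below alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, q=0.05):
    """FDR control: find the largest rank k with p_(k) <= k * q / m,
    then reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values for eight variants (illustration only).
toy_pvalues = [0.0001, 0.004, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9]

n_fwer = sum(bonferroni_reject(toy_pvalues))          # 2 rejections
n_fdr = sum(benjamini_hochberg_reject(toy_pvalues))   # 5 rejections
print(n_fwer, n_fdr)
```

On these toy values the FDR rule rejects more hypotheses than the Bonferroni rule at the same nominal level, which is the direction of the power difference the abstract reports. With one million tests, a Bonferroni cutoff at 0.05 works out to 0.05 / 10⁶ = 5 × 10⁻⁸, matching the conventional genome-wide threshold.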
Affiliation(s)
- Takahiro Otani
- Risk Analysis Research Center, The Institute of Statistical Mathematics, Tachikawa, Tokyo, 190-8562, Japan
| | - Hisashi Noma
- Department of Data Science, The Institute of Statistical Mathematics, Tachikawa, Tokyo, 190-8562, Japan.
| | - Jo Nishino
- Department of Biostatistics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 466-8550, Japan
| | - Shigeyuki Matsui
- Department of Biostatistics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 466-8550, Japan
|
15
|
Mieth B, Kloft M, Rodríguez JA, Sonnenburg S, Vobruba R, Morcillo-Suárez C, Farré X, Marigorta UM, Fehr E, Dickhaus T, Blanchard G, Schunk D, Navarro A, Müller KR. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. Sci Rep 2016; 6:36671. [PMID: 27892471 PMCID: PMC5125008 DOI: 10.1038/srep36671] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/31/2016] [Accepted: 10/06/2016] [Indexed: 12/21/2022]
Abstract
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
Affiliation(s)
- Bettina Mieth
- Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
| | - Marius Kloft
- Department of Computer Science, Humboldt University of Berlin, Berlin, 10099, Germany
| | - Juan Antonio Rodríguez
- Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
| | | | - Robin Vobruba
- Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
| | - Carlos Morcillo-Suárez
- Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
| | - Xavier Farré
- Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
| | - Urko M. Marigorta
- School of Biology, Georgia Institute of Technology, Atlanta, 30332, GA, USA
| | - Ernst Fehr
- Department of Economics, Laboratory for Social and Neural Systems Research, University of Zurich, Zurich, 8006, Switzerland
| | - Thorsten Dickhaus
- Institute for Statistics (FB 3), University of Bremen, Bremen, 28359, Germany
| | - Gilles Blanchard
- Department of Mathematics, University of Potsdam, Potsdam, 14476, Germany
| | - Daniel Schunk
- Department of Economics, University of Mainz, Mainz, 55099, Germany
| | - Arcadi Navarro
- Institut de Biología Evolutiva (CSIC-UPF). Departament de Ciències Experimentals i de la Salut. Universitat Pompeu Fabra, Barcelona, 08003, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, 08010, Spain
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, 08003, Spain
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, Berlin, 10587, Germany
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
|
16
|
Buzdugan L, Kalisch M, Navarro A, Schunk D, Fehr E, Bühlmann P. Assessing statistical significance in multivariable genome wide association analysis. Bioinformatics 2016; 32:1990-2000. [PMID: 27153677 PMCID: PMC4920127 DOI: 10.1093/bioinformatics/btw128] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/05/2015] [Accepted: 03/02/2016] [Indexed: 01/29/2023]
Abstract
Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available by all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the ‘spuriously correlated’ SNP merely happens to be correlated with the ‘truly causal’ SNP. In addition, the method offers a data driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Laura Buzdugan
- Seminar for Statistics, Department of Mathematics, ETH Zürich, Zürich 8092, Switzerland
- Department of Economics, University of Zürich, Zürich 8006, Switzerland
| | - Markus Kalisch
- Seminar for Statistics, Department of Mathematics, ETH Zürich, Zürich 8092, Switzerland
| | - Arcadi Navarro
- Institute of Evolutionary Biology (CSIC-UPF), Universitat Pompeu Fabra, Barcelona 08003, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA)
- Center for Genomic Regulation (CRG), Barcelona Biomedical Research Park (PRBB), Barcelona 08003, Spain
| | - Daniel Schunk
- Department of Economics, University of Mainz, Mainz, Germany
| | - Ernst Fehr
- Department of Economics, University of Zürich, Zürich 8006, Switzerland
| | - Peter Bühlmann
- Seminar for Statistics, Department of Mathematics, ETH Zürich, Zürich 8092, Switzerland
|
17
|
Lee S, Breheny P. Strong Rules for Nonconvex Penalties and Their Implications for Efficient Algorithms in High-Dimensional Regression. J Comput Graph Stat 2015. [DOI: 10.1080/10618600.2014.975231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
|
18
|
Yang J, Wang S, Yang Z, Hodgkinson CA, Iarikova P, Ma JZ, Payne TJ, Goldman D, Li MD. The contribution of rare and common variants in 30 genes to risk nicotine dependence. Mol Psychiatry 2015; 20:1467-78. [PMID: 25450229 PMCID: PMC4452458 DOI: 10.1038/mp.2014.156] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 03/31/2014] [Revised: 09/28/2014] [Accepted: 10/08/2014] [Indexed: 01/17/2023]
Abstract
Genetic and functional studies have revealed that both common and rare variants of several nicotinic acetylcholine receptor subunits are associated with nicotine dependence (ND). In this study, we identified variants in 30 candidate genes including nicotinic receptors in 200 sib pairs selected from the Mid-South Tobacco Family population with equal numbers of African Americans (AAs) and European Americans (EAs). We selected 135 of the rare and common variants and genotyped them in the Mid-South Tobacco Case-Control (MSTCC) population, which consists of 3088 AAs and 1430 EAs. None of the genotyped common variants showed significant association with smoking status (smokers vs non-smokers), Fagerström Test for ND scores or indexed cigarettes per day after Bonferroni correction. Rare variants in NRXN1, CHRNA9, CHRNA2, NTRK2, GABBR2, GRIN3A, DNM1, NRXN2, NRXN3 and ARRB2 were significantly associated with smoking status in the MSTCC AA sample, with weighted sum statistic (WSS) P-values ranging from 2.42 × 10⁻³ to 1.31 × 10⁻⁴ after 10⁶ phenotype rearrangements. We also observed a significant excess of rare nonsynonymous variants exclusive to EA smokers in NRXN1, CHRNA9, TAS2R38, GRIN3A, DBH, ANKK1/DRD2, NRXN3 and CDH13 with WSS P-values between 3.5 × 10⁻⁵ and 1 × 10⁻⁶. Variants rs142807401 (A432T) and rs139982841 (A452V) in CHRNA9 and variants V132L, V389L, rs34755188 (R480H) and rs75981117 (N549S) in GRIN3A are of particular interest because they are found in both the AA and EA samples. A significant aggregate contribution of rare and common coding variants in CHRNA9 to the risk for ND (SKAT-C: P=0.0012) was detected by applying the combined sum test in MSTCC EAs. Together, our results indicate that rare variants alone or combined with common variants in a subset of 30 biological candidate genes contribute substantially to the risk of ND.
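The P-values above are reported as coming from phenotype rearrangements, that is, a permutation test. The sketch below shows the general shape of such a test under loud assumptions: it substitutes a plain, unweighted rare-allele count in cases for the weighted sum statistic, and the function names, toy data, and add-one estimator are illustrative choices, not the study's procedure.

```python
import random


def burden_statistic(carrier_counts, is_case):
    """Total number of rare alleles carried by cases (simplified stand-in
    for a weighted sum statistic; no variant weighting is applied)."""
    return sum(c for c, case in zip(carrier_counts, is_case) if case)


def permutation_pvalue(carrier_counts, is_case, n_permutations=1000, seed=0):
    """Shuffle case/control labels and count how often the permuted
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = burden_statistic(carrier_counts, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if burden_statistic(carrier_counts, labels) >= observed:
            hits += 1
    # Add-one estimator: the p-value can never be exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: rare-allele counts per subject, cases listed first.
toy_counts = [2, 1, 1, 1, 0, 0, 0, 0]
toy_status = [True, True, True, True, False, False, False, False]
p_toy = permutation_pvalue(toy_counts, toy_status, n_permutations=500)
print(p_toy)
```

Under the add-one estimator used in this sketch, the smallest attainable value with one million rearrangements is about 1 × 10⁻⁶, so the resolution of a permutation P-value is bounded by the number of rearrangements performed.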
Affiliation(s)
- Jiekun Yang
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA 22903
| | - Shaolin Wang
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA 22903
| | - Zhongli Yang
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA 22903
| | | | | | - Jennie Z. Ma
- Department of Public Health Sciences, University of Virginia, Charlottesville
| | - Thomas J. Payne
- ACT Center for Tobacco Treatment, Education and Research, Department of Otolaryngology and Communicative Sciences, University of Mississippi Medical Center, Jackson, MS 39213
| | - David Goldman
- Laboratory of Neurogenetics, NIAAA, NIH; Bethesda, MD 20852
| | - Ming D. Li
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA 22903
|
19
|
Zheng J, Rao DC, Shi G. An update on genome-wide association studies of hypertension. ACTA ACUST UNITED AC 2015. [DOI: 10.1186/s40535-015-0013-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/07/2023]
|
20
|
Higgins GA, Allyn-Feuer A, Barbour E, Athey BD. A glutamatergic network mediates lithium response in bipolar disorder as defined by epigenome pathway analysis. Pharmacogenomics 2015; 16:1547-63. [PMID: 26343379 DOI: 10.2217/pgs.15.106] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/14/2022]
Abstract
AIM: A regulatory network in the human brain mediating lithium response in bipolar patients was revealed by analysis of functional SNPs from genome-wide association studies (GWAS) and published gene association studies, followed by epigenome mapping. METHODS: An initial set of 23,312 SNPs in linkage disequilibrium with lead SNPs, and sub-threshold GWAS SNPs rescued by pathway analysis, were studied in the same populations. These were assessed using our workflow and annotation by the epigenome roadmap consortium. RESULTS: Twenty-seven percent of 802 SNPs that were associated with lithium response (13 published gene association studies and two GWAS) were shared with 1281 SNPs from 18 GWAS examining psychiatric disorders and adverse events associated with lithium treatment. Nineteen SNPs were annotated as active regulatory elements such as enhancers and promoters in a tissue-specific manner. They were located within noncoding regions of ten genes: ANK3, ARNTL, CACNA1C, CACNG2, CDKN1A, CREB1, GRIA2, GSK3B, NR1D1 and SLC1A2. Following gene set enrichment and pathway analysis, these genes were found to be significantly associated (p = 10⁻²⁷; Fisher exact test) with an AMPA2 glutamate receptor network in human brain. Our workflow results showed concordance with annotation of regulatory elements from the epigenome roadmap. Analysis of cognate mRNA and enhancer RNA exhibited patterns consistent with an integrated pathway in human brain. CONCLUSION: This pharmacoepigenomic regulatory pathway is located in the same brain regions that exhibit tissue volume loss in bipolar disorder. Although in silico analysis requires biological validation, the approach provides value for identification of candidate variants that may be used in pharmacogenomic testing to identify bipolar patients likely to respond to lithium.
Affiliation(s)
- Gerald A Higgins
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Pharmacogenomic Science, Assurex Health, Inc., Mason, OH 45040, USA
| | - Ari Allyn-Feuer
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Edward Barbour
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Brian D Athey
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
21
|
Higgins GA, Allyn-Feuer A, Athey BD. Epigenomic mapping and effect sizes of noncoding variants associated with psychotropic drug response. Pharmacogenomics 2015; 16:1565-83. [PMID: 26340055 DOI: 10.2217/pgs.15.105] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 01/04/2023]
Abstract
AIM: To provide insight into potential regulatory mechanisms of gene expression underlying addiction, analgesia, psychotropic drug response and adverse drug events, genome-wide association studies searching for variants associated with these phenotypes have been undertaken with limited success. We undertook analysis of these results with the aim of applying epigenetic knowledge to aid variant discovery and interpretation. METHODS: We applied conditional imputation to results from 26 genome-wide association studies and three candidate gene-association studies. The analysis workflow included data from chromatin conformation capture, chromatin state annotation, DNase I hypersensitivity, hypomethylation, anatomical localization and biochronicity. We also made use of chromatin state data from the epigenome roadmap, transcription factor-binding data, spatial maps from published Hi-C datasets and 'guilt by association' methods. RESULTS: We identified 31 pharmacoepigenomic SNPs from a total of 2024 variants in linkage disequilibrium with lead SNPs, of which only 6% were coding variants. Interrogation of chromatin state using our workflow and the epigenome roadmap showed agreement on 34 of 35 tissue assignments to regulatory elements including enhancers and promoters. Loop boundary domains were inferred by association with CTCF (CCCTC-binding factor) and cohesin, suggesting proximity to topologically associating domain boundaries and enhancer clusters. Spatial interactions between enhancer-promoter pairs detected both known and previously unknown mechanisms. Addiction and analgesia SNPs were common in relevant populations and exhibited large effect sizes, whereas a SNP located in the promoter of the SLC1A2 gene exhibited a moderate effect size for lithium response in bipolar disorder in patients of European ancestry. SNPs associated with drug-induced organ injury were rare but exhibited the largest effect sizes, consistent with the published literature. CONCLUSION: This work demonstrates that an in silico bioinformatics-based approach using integrative analysis of a diversity of molecular and morphological data types can discover pharmacoepigenomic variants that are suitable candidates for further validation in cell lines, animal models and human clinical trials.
Affiliation(s)
- Gerald A Higgins
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, 1301 Catherine Road, Ann Arbor, MI 48109, USA
- Pharmacogenomic Science, Assurex Health, Inc., Mason, OH, USA
| | - Ari Allyn-Feuer
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, 1301 Catherine Road, Ann Arbor, MI 48109, USA
| | - Brian D Athey
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, 1301 Catherine Road, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI, USA
|
22
|
Okser S, Pahikkala T, Airola A, Salakoski T, Ripatti S, Aittokallio T. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet 2014; 10:e1004754. [PMID: 25393026 PMCID: PMC4230844 DOI: 10.1371/journal.pgen.1004754] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023]
Affiliation(s)
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
| | - Samuli Ripatti
- Hjelt Institute, University of Helsinki, Helsinki, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
| | - Tero Aittokallio
- Turku Centre for Computer Science (TUCS), University of Turku and Åbo Akademi University, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
23
|
Abstract
The cost of next-generation sequencing is now approaching that of the first generation of genome-wide single-nucleotide genotyping panels, but this is still out of reach for large-scale epidemiologic studies with tens of thousands of subjects. Furthermore, the anticipated yield of millions of rare variants poses serious challenges for distinguishing causal from noncausal variants for disease. We explore the merits of using family-based designs for sequencing substudies to identify novel variants and prioritize them for their likelihood of causality. While the sharing of variants within families means that family-based designs may be less efficient for discovery than sequencing of a comparable number of unrelated individuals, the ability to exploit cosegregation of variants with disease within families helps distinguish causal from noncausal ones. We introduce a score test criterion for prioritizing discovered variants in terms of their likelihood of being functional. We compare the relative statistical efficiency of 2-stage versus 1-stage family-based designs by application to the Genetic Analysis Workshop 18 simulated sequence data.
Affiliation(s)
- Zhao Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
| | - Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
|
24
|
Bayesian systems-based genetic association analysis with effect strength estimation and omic wide interpretation: a case study in rheumatoid arthritis. Methods in Molecular Biology (Clifton, N.J.) 2014; 1142:143-76. [PMID: 24706282 DOI: 10.1007/978-1-4939-0404-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Rich dependency structures are often formed in genetic association studies between the phenotypic, clinical, and environmental descriptors. These descriptors may not be standardized, and may encompass various disease definitions and clinical endpoints which are only weakly influenced by various (e.g., genetic) factors. Such loosely defined complex intermediate clinical phenotypes are typically used in follow-up candidate gene association studies, e.g., after genome-wide analysis, to deepen the understanding of the associations and to estimate effect strength. This chapter discusses a solid methodology, which is useful in such a scenario, by using probabilistic graphical models, namely, Bayesian networks in the Bayesian statistical framework. This method offers systematically scalable, comprehensive hierarchical hypotheses about multivariate relevance. We discuss its workflow: from data engineering to semantic publication of the results. We overview the construction, visualization, and interpretation of complex hypotheses related to the structural analysis of relevance. Furthermore, we illustrate the use of a dependency model-based relevance measure, which takes into account the structural properties of the model, for quantifying the effect strength. Finally, we discuss the "interpretational" or translational challenge of a genetic association study, with a focus on the fusion of heterogeneous omic knowledge to reintegrate the results into a genome-wide context.
|
25
|
Test of rare variant association based on affected sib-pairs. Eur J Hum Genet 2014; 23:229-37. [PMID: 24667785 DOI: 10.1038/ejhg.2014.43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/07/2013] [Revised: 11/06/2013] [Accepted: 12/30/2013] [Indexed: 11/08/2022]
Abstract
With the development of sequencing techniques, there is increasing interest in detecting associations between rare variants and complex traits. Quite a few statistical methods to detect associations between rare variants and complex traits have been developed for unrelated individuals. Statistical methods for detecting rare variant associations under family-based designs have not received as much attention as methods for unrelated individuals. Recent studies show that rare disease variants will be enriched in family data and thus family-based designs may improve power to detect rare variant associations. In this article, we propose a novel test of association between the optimally weighted combination of variants and the trait of interest for affected sib-pairs. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Based on the optimal weights, the proposed method is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing methods are. Our simulation results show that, in all the cases, the proposed method is substantially more powerful than existing methods based on unrelated individuals and existing methods based on affected sib-pairs.
|
26
|
Li J, Dan J, Li C, Wu R. A model-free approach for detecting interactions in genetic association studies. Brief Bioinform 2013; 15:1057-68. [PMID: 24273216 DOI: 10.1093/bib/bbt082] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/22/2023]
Abstract
Over the past few decades, genome-wide association studies analyzed by efficient statistical procedures have successfully identified single-nucleotide polymorphisms (SNPs) that are associated with complex traits or human diseases. However, due to the overwhelming number of SNPs, most approaches have focused on an additive genetic model without genome-wide SNP-SNP interactions. In this study, we propose an efficient statistical procedure in a genetic model-free framework for detecting SNPs exhibiting main genetic effects as well as epistatic interactions. Specifically, the association between phenotype and genotype is characterized by an unknown function to be estimated using nonparametric techniques, and a two-stage non-parametric independence screening procedure is proposed to sequentially identify potentially important main genetic effects and interactions. Finally, the subset of genetic predictors implied by two-stage non-parametric independence screening is analyzed by penalized regressions such as LASSO, and a final model is identified. In this framework, no specific genetic model is assumed, and interactions are not restricted to marginally important SNPs. Therefore, SNPs that are involved in genetic regulatory networks but missed by previous studies are expected to be recognized. In simulation studies, we show that the procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as SNP-SNP interactions. A real data analysis further indicates the importance of epistatic interactions in explaining body mass index.
|
27
|
Silver M, Chen P, Li R, Cheng CY, Wong TY, Tai ES, Teo YY, Montana G. Pathways-driven sparse regression identifies pathways and genes associated with high-density lipoprotein cholesterol in two Asian cohorts. PLoS Genet 2013; 9:e1003939. [PMID: 24278029 PMCID: PMC3836716 DOI: 10.1371/journal.pgen.1003939] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 03/05/2013] [Accepted: 09/11/2013] [Indexed: 01/11/2023]
Abstract
Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. 
Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.
Affiliation(s)
- Matt Silver
- Statistics Section, Department of Mathematics, Imperial College, London, United Kingdom
- MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Peng Chen
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Ruoying Li
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Ching-Yu Cheng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Department of Ophthalmology, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Center, Singapore
- Tien-Yin Wong
- Department of Ophthalmology, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Center, Singapore
- E-Shyong Tai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Yik-Ying Teo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore
- Life Sciences Institute, National University of Singapore, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
- Giovanni Montana
- Statistics Section, Department of Mathematics, Imperial College, London, United Kingdom
|
28
|
Evangelou E, Ioannidis JPA. Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet 2013; 14:379-89. [PMID: 23657481 DOI: 10.1038/nrg3472] [Citation(s) in RCA: 382] [Impact Index Per Article: 34.7] [Indexed: 12/15/2022]
Abstract
Meta-analysis of genome-wide association studies (GWASs) has become a popular method for discovering genetic risk variants. Here, we overview both widely applied and newer statistical methods for GWAS meta-analysis, including issues of interpretation and assessment of sources of heterogeneity. We also discuss extensions of these meta-analysis methods to complex data. Where possible, we provide guidelines for researchers who are planning to use these methods. Furthermore, we address special issues that may arise for meta-analysis of sequencing data and rare variants. Finally, we discuss challenges and solutions surrounding the goals of making meta-analysis data publicly available and building powerful consortia.
Affiliation(s)
- Evangelos Evangelou
- Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina 45110, Greece
|
29
|
Benke KS, Wu Y, Fallin DM, Maher B, Palmer LJ. Strategy to control type I error increases power to identify genetic variation using the full biological trajectory. Genet Epidemiol 2013; 37:419-30. [PMID: 23633177 DOI: 10.1002/gepi.21733] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/07/2013] [Revised: 03/21/2013] [Accepted: 04/02/2013] [Indexed: 01/18/2023]
Abstract
Genome-wide association studies have been successful in identifying loci that underlie continuous traits measured at a single time point. To additionally consider continuous traits longitudinally, it is desirable to look at SNP effects at baseline and over time using linear-mixed effects models. Estimation and interpretation of two coefficients in the same model raises concern regarding the optimal control of type I error. To investigate this issue, we calculate type I error and power under an alternative for joint tests, including the two degree of freedom likelihood ratio test, and compare this to single degree of freedom tests for each effect separately at varying alpha levels. We show which joint tests are the optimal way to control the type I error and also illustrate that information can be gained by joint testing in situations where either or both SNP effects are underpowered. We also show that closed form power calculations can approximate simulated power for the case of balanced data, provide reasonable approximations for imbalanced data, but overestimate power for complicated residual error structures. We conclude that a two degree of freedom test is an attractive strategy in a hypothesis-free genome-wide setting and recommend its use for genome-wide studies employing linear-mixed effects models.
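The contrast between a joint two degree of freedom test and two separate single degree of freedom tests can be illustrated with a minimal sketch. It is not the authors' likelihood ratio test: it is a Wald-type approximation that assumes the baseline and slope z statistics are independent, which estimates from a linear mixed-effects model generally are not. The function names and the example z values are hypothetical.

```python
import math


def p_one_df(z):
    """Two-sided p-value for a single 1-df z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_two_df(z_baseline, z_slope):
    """Joint 2-df p-value, ASSUMING the two z statistics are independent.

    The sum of two squared independent standard normals is chi-square
    with 2 df, whose survival function has the closed form exp(-x / 2).
    """
    chi2 = z_baseline ** 2 + z_slope ** 2
    return math.exp(-chi2 / 2.0)


# Hypothetical SNP with two moderate effects (illustrative values only).
z_baseline, z_slope = 2.5, 2.5
print("baseline 1-df p:", p_one_df(z_baseline))
print("slope    1-df p:", p_one_df(z_slope))
print("joint    2-df p:", p_two_df(z_baseline, z_slope))
```

Under these assumptions the joint p-value is smaller than either single-effect p-value, which is the situation the abstract describes when both SNP effects are individually underpowered.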
Affiliation(s)
- K S Benke
- Johns Hopkins Bloomberg School of Public Health, Mental Health Department, Baltimore, Maryland 21205, USA.
|
30
|
Divaris K, Monda KL, North KE, Olshan AF, Reynolds LM, Hsueh WC, Lange EM, Moss K, Barros SP, Weyant RJ, Liu Y, Newman AB, Beck JD, Offenbacher S. Exploring the genetic basis of chronic periodontitis: a genome-wide association study. Hum Mol Genet 2013; 22:2312-24. [PMID: 23459936 PMCID: PMC3652417 DOI: 10.1093/hmg/ddt065] [Citation(s) in RCA: 179] [Impact Index Per Article: 16.3] [Indexed: 01/22/2023]
Abstract
Chronic periodontitis (CP) is a common oral disease that confers substantial systemic inflammatory and microbial burden and is a major cause of tooth loss. Here, we present the results of a genome-wide association study of CP that was carried out in a cohort of 4504 European Americans (EA) participating in the Atherosclerosis Risk in Communities (ARIC) Study (mean age—62 years, moderate CP—43% and severe CP—17%). We detected no genome-wide significant association signals for CP; however, we found suggestive evidence of association (P < 5 × 10⁻⁶) for six loci, including NIN, NPY, WNT5A for severe CP and NCR2, EMR1, 10p15 for moderate CP. Three of these loci had concordant effect size and direction in an independent sample of 656 adult EA participants of the Health, Aging, and Body Composition (Health ABC) Study. Meta-analysis pooled estimates were severe CP (n = 958 versus health: n = 1909)—NPY, rs2521634 [G]: odds ratio (OR) = 1.49 [95% confidence interval (CI) = 1.28–1.73, P = 3.5 × 10⁻⁷]; moderate CP (n = 2293)—NCR2, rs7762544 [G]: OR = 1.40 (95% CI = 1.24–1.59, P = 7.5 × 10⁻⁸), EMR1, rs3826782 [A]: OR = 2.01 (95% CI = 1.52–2.65, P = 8.2 × 10⁻⁷). Canonical pathway analysis indicated significant enrichment of nervous system signaling, cellular immune response and cytokine signaling pathways. A significant interaction of NUAK1 (rs11112872, interaction P = 2.9 × 10⁻⁹) with smoking in ARIC was not replicated in Health ABC, although estimates of heritable variance in severe CP explained by all single nucleotide polymorphisms increased from 18 to 52% with the inclusion of a genome-wide interaction term with smoking. These genome-wide association results provide information on multiple candidate regions and pathways for interrogation in future genetic studies of CP.
Affiliation(s)
- Kimon Divaris
- Department of Pediatric Dentistry, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
|
31
|
Inkster B, Strijbis EM, Vounou M, Kappos L, Radue EW, Matthews PM, Uitdehaag BM, Barkhof F, Polman CH, Montana G, Geurts JJ. Histone deacetylase gene variants predict brain volume changes in multiple sclerosis. Neurobiol Aging 2013; 34:238-47. [DOI: 10.1016/j.neurobiolaging.2012.07.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 09/13/2011] [Revised: 07/05/2012] [Accepted: 07/11/2012] [Indexed: 11/15/2022]
|
32
|
Bacanu SA, Kendler KS. Extracting actionable information from genome scans. Genet Epidemiol 2012; 37:48-59. [PMID: 22996309 DOI: 10.1002/gepi.21682] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/24/2012] [Revised: 08/01/2012] [Accepted: 08/17/2012] [Indexed: 02/02/2023]
Abstract
Genome-wide association studies discovered numerous genetic variants significantly associated with various phenotypes. However, significant signals explain only a small portion of the variation in many traits. One explanation is that missing variation is found in "suggestive signals," i.e., variants with reasonably small P-values. However, it is not clear how to capture this information and use it optimally to design and analyze future studies. We propose to extract the available information from a genome scan by accurately estimating the means of univariate statistics. The means are estimated by: (i) computing the sum of squares (SS) of a genome scan's univariate statistics, (ii) using SS to estimate the expected SS for the means (SSM) of univariate statistics, and (iii) constructing accurate soft threshold (ST) estimators for means of univariate statistics by requiring that the SS of these estimators equals the SSM. When compared to competitors, ST estimators explain a substantially higher fraction of the variability in true means. The accuracy of proposed estimators can be used to design two-tier follow-up studies in which regions close to variants having ST-estimated means above a certain threshold are sequenced at high coverage and the rest of the genome is sequenced at low coverage. This follow-up approach reduces the sequencing burden by at least an order of magnitude when compared to a high coverage sequencing of the whole genome. Finally, we suggest ways in which ST methodology can be used to improve signal detection in future sequencing studies and to perform general statistical model selection.
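Steps (i) to (iii) of the abstract can be sketched as follows. This is an illustrative reading, not the authors' estimator: it assumes the univariate statistics are z-scores with unit variance, so that the expected SS equals the number of statistics plus the SS of the means, and it estimates SSM as max(SS - m, 0). The threshold is then found by bisection so that the SS of the soft-thresholded values matches SSM. All function names and the example z-scores are hypothetical.

```python
from typing import List, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values with |z| <= t become zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def st_estimates(z_scores: List[float], tol: float = 1e-10) -> Tuple[List[float], float]:
    """Return soft-threshold estimates of the means and the threshold used."""
    m = len(z_scores)
    ss = sum(z * z for z in z_scores)        # step (i): SS of the statistics
    ssm = max(ss - m, 0.0)                   # step (ii): ASSUMED estimator of SSM
    # Step (iii): the SS of the shrunk values falls continuously from ss
    # (t = 0) to 0 (t = max |z|), so bisection finds the t that matches ssm.
    lo, hi = 0.0, max(abs(z) for z in z_scores)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        shrunk_ss = sum(soft_threshold(z, mid) ** 2 for z in z_scores)
        if shrunk_ss > ssm:
            lo = mid                         # not shrunk enough yet
        else:
            hi = mid
    t = (lo + hi) / 2.0
    return [soft_threshold(z, t) for z in z_scores], t


# Hypothetical z-scores: four near-null values and two stronger signals.
z_scores = [0.3, -1.2, 0.8, 4.5, -0.5, 3.9]
estimates, threshold = st_estimates(z_scores)
print("threshold:", threshold)
print("estimates:", estimates)
```

A real genome scan has far more statistics, most of them null, so the threshold would typically be larger and most estimates would be set to zero.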
Affiliation(s)
- Silviu-Alin Bacanu
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, USA.
|
33
|
Kohannim O, Hibar DP, Stein JL, Jahanshad N, Hua X, Rajagopalan P, Toga AW, Jack CR, Weiner MW, de Zubicaray GI, McMahon KL, Hansell NK, Martin NG, Wright MJ, Thompson PM. Discovery and Replication of Gene Influences on Brain Structure Using LASSO Regression. Front Neurosci 2012; 6:115. [PMID: 22888310 PMCID: PMC3412288 DOI: 10.3389/fnins.2012.00115] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 03/12/2012] [Accepted: 07/12/2012] [Indexed: 12/12/2022]
Abstract
We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2. We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8 ± 2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.
Affiliation(s)
- Omid Kohannim
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Derrek P. Hibar
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Jason L. Stein
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Neda Jahanshad
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Xue Hua
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Priya Rajagopalan
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Arthur W. Toga
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
- Michael W. Weiner
- Department of Radiology, UC San Francisco, San Francisco, CA, USA
- Department of Medicine, UC San Francisco, San Francisco, CA, USA
- Department of Psychiatry, UC San Francisco, San Francisco, CA, USA
- Department of Veterans Affairs Medical Center, San Francisco, CA, USA
- Katie L. McMahon
- Center for Advanced Imaging, University of Queensland, Brisbane, QLD, Australia
- Paul M. Thompson
- Imaging Genetics Center at the Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, CA, USA
|
34
|
Strijbis EMM, Inkster B, Vounou M, Naegelin Y, Kappos L, Radue EW, Matthews PM, Uitdehaag BMJ, Barkhof F, Polman CH, Montana G, Geurts JJG. Glutamate gene polymorphisms predict brain volumes in multiple sclerosis. Mult Scler 2012; 19:281-8. [DOI: 10.1177/1352458512454345] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/10/2023]
Abstract
Background: Several genetic markers have been associated with multiple sclerosis (MS) susceptibility; however, uncovering the genetic aetiology of the complex phenotypic expression of MS has been more difficult so far. The most common approach in imaging genetics is based on mass-univariate linear modelling (MULM), which faces several limitations. Objective: Here we apply a novel multivariate statistical model, sparse reduced-rank regression (sRRR), to identify possible associations of glutamate related single nucleotide polymorphisms (SNPs) and multiple MRI-derived phenotypes in MS. Methods: Seven phenotypes related to brain and lesion volumes for a total number of 326 relapsing–remitting and secondary-progressive MS patients and a total of 3809 glutamate related and control SNPs were analysed with sRRR, which resulted in a ranking of SNPs in decreasing order of importance (‘selection probability’). Lasso regression and MULM were used as comparative statistical techniques to assess consistency of the most important associations over different statistical models. Results: Five SNPs within the NMDA-receptor-2A-subunit (GRIN2A) domain were identified by sRRR in association with normalized brain volume (NBV), normalized grey matter volume and normalized white matter volume (NWMV). The association between GRIN2A and both NBV and NWMV was confirmed in MULM and Lasso analysis. Conclusions: Using a novel, multivariate regression model confirmed by two other statistical approaches we show associations between GRIN2A SNPs and phenotypic variation in NBV and NWMV in this first exploratory study. Replications in independent datasets are now necessary to validate these findings.
Affiliation(s)
- Eva MM Strijbis
- Department of Neurology, VU University Medical Centre, Amsterdam, The Netherlands
- Department of Anatomy and Neuroscience, Section of Clinical Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
- Becky Inkster
- Department of Mathematics, Statistics Section, Imperial College London, UK
- Centre for Neuroscience, Department of Medicine, Hammersmith Hospital, Imperial College London, UK
- Maria Vounou
- Department of Mathematics, Statistics Section, Imperial College London, UK
- Yvonne Naegelin
- Department of Neurology and Medical Image Analysis Centre, University Hospital, Basel, Switzerland
- Ludwig Kappos
- Department of Neurology and Medical Image Analysis Centre, University Hospital, Basel, Switzerland
- Ernst-Wilhelm Radue
- Department of Neurology and Medical Image Analysis Centre, University Hospital, Basel, Switzerland
- Paul M Matthews
- Centre for Neuroscience, Department of Medicine, Hammersmith Hospital, Imperial College London, UK
- GlaxoSmithKline Clinical Imaging Centre, Hammersmith Hospital, London, UK
- Bernard MJ Uitdehaag
- Department of Neurology, VU University Medical Centre, Amsterdam, The Netherlands
- Department of Epidemiology and Biostatistics, VU University Medical Centre, Amsterdam, The Netherlands
- Frederik Barkhof
- Department of Radiology, VU University Medical Centre, Amsterdam, The Netherlands
- Chris H Polman
- Department of Neurology, VU University Medical Centre, Amsterdam, The Netherlands
- Giovanni Montana
- Department of Mathematics, Statistics Section, Imperial College London, UK
- Jeroen JG Geurts
- Department of Anatomy and Neuroscience, Section of Clinical Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
|
35
|
Regan K, Wang K, Doughty E, Li H, Li J, Lee Y, Kann MG, Lussier YA. Translating Mendelian and complex inheritance of Alzheimer's disease genes for predicting unique personal genome variants. J Am Med Inform Assoc 2012; 19:306-16. [PMID: 22319180 PMCID: PMC3277633 DOI: 10.1136/amiajnl-2011-000656] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/24/2022]
Abstract
Objective: Although trait-associated genes identified as complex versus single-gene inheritance differ substantially in odds ratio, the authors nonetheless posit that their mechanistic concordance can reveal fundamental properties of the genetic architecture, allowing the automated interpretation of unique polymorphisms within a personal genome. Materials and methods: An analytical method, SPADE-gen, spanning three biological scales was developed to demonstrate the mechanistic concordance between Mendelian and complex inheritance of Alzheimer's disease (AD) genes: biological functions (BP), protein interaction modeling, and protein domain implicated in the disease-associated polymorphism. Results: Among Gene Ontology (GO) biological processes (BP) enriched at a false detection rate <5% in 15 AD genes of Mendelian inheritance (Online Mendelian Inheritance in Man) and independently in those of complex inheritance (25 host genes of intragenic AD single-nucleotide polymorphisms confirmed in genome-wide association studies), 16 overlapped (empirical p=0.007) and 45 were similar (empirical p<0.009; information theory). SPAN network modeling extended the canonical pathway of AD (KEGG) with 26 new protein interactions (empirical p<0.0001). Discussion: The study prioritized new AD-associated biological mechanisms and focused the analysis on previously unreported interactions associated with the biological processes of polymorphisms that affect specific protein domains within characterized AD genes and their direct interactors using (1) concordant GO-BP and (2) domain interactions within STRING protein–protein interactions corresponding to the genomic location of the AD polymorphism (eg, EPHA1, APOE, and CD2AP). Conclusion: These results are in line with unique-event polymorphism theory, indicating how disease-associated polymorphisms of Mendelian or complex inheritance relate genetically to those observed as ‘unique personal variants’. They also provide insight for identifying novel targets, for repositioning drugs, and for personal therapeutics.
Affiliation(s)
- Kelly Regan
- Department of Medicine, University of Illinois at Chicago, Chicago, Illinois 60637, USA
|
36
|
Pahikkala T, Okser S, Airola A, Salakoski T, Aittokallio T. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms Mol Biol 2012; 7:11. [PMID: 22551170 PMCID: PMC3606421 DOI: 10.1186/1748-7188-7-11] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/08/2011] [Accepted: 04/23/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND: Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. RESULTS: We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. CONCLUSIONS: Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
Affiliation(s)
- Tapio Pahikkala
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Sebastian Okser
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Antti Airola
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
- Tero Aittokallio
- Turku Centre for Computer Science, Turku, Finland
- Department of Mathematics, University of Turku, Turku, Finland
- Data Mining and Modeling group, Turku Centre for Biotechnology, Turku, Finland
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
37
|
Valdar W, Sabourin J, Nobel A, Holmes CC. Reprioritizing genetic associations in hit regions using LASSO-based resample model averaging. Genet Epidemiol 2012; 36:451-62. [PMID: 22549815 PMCID: PMC3470705 DOI: 10.1002/gepi.21639] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 10/11/2011] [Revised: 03/21/2012] [Accepted: 03/21/2012] [Indexed: 12/13/2022]
Abstract
Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a “hit region” of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association studies data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single locus analysis and to the recently proposed method of Stability Selection. Genet. Epidemiol. 36:451–462, 2012. © 2012 Wiley Periodicals, Inc.
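The idea of estimating, for each SNP, how often it enters a multi-SNP LASSO model across alternative realizations of the data can be sketched in a few lines. This is a simplified illustration, not LLARRMA itself: it uses a fixed penalty instead of local automatic regularization, plain subsampling without multiple imputation, a quantitative rather than case-control trait, and independent simulated SNPs rather than a region under strong LD. All names, the toy data, and the penalty value are hypothetical.

```python
import random


def soft(value, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_cd(cols, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    `cols` holds the design matrix column by column (one list per SNP).
    """
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_ss = [sum(v * v for v in col) / n for col in cols]
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            if col_ss[j] == 0.0:
                continue  # monomorphic SNP in this subsample
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_ss[j] * beta[j]
            new = soft(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for c, r in zip(col, resid)]
                beta[j] = new
    return beta


def inclusion_frequencies(cols, y, lam, n_resamples=50, frac=0.5, seed=0):
    """Fraction of subsamples in which each SNP gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(y), len(cols)
    k = max(2, int(frac * n))
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), k)
        sub_cols = [center([col[i] for i in idx]) for col in cols]
        sub_y = center([y[i] for i in idx])
        beta = lasso_cd(sub_cols, sub_y, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


# Toy data: 6 independent SNPs coded 0/1/2; only SNP 0 affects the trait.
data_rng = random.Random(1)
n_subjects, n_snps = 100, 6
cols = [[(data_rng.random() < 0.3) + (data_rng.random() < 0.3)
         for _ in range(n_subjects)] for _ in range(n_snps)]
y = [2.0 * cols[0][i] + data_rng.gauss(0.0, 1.0) for i in range(n_subjects)]
freqs = inclusion_frequencies(cols, y, lam=0.2)
print(freqs)
```

Ranking SNPs by these frequencies, rather than by a single fitted model, is what makes the sensitivity of model choice to sampling variability visible.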
Affiliation(s)
- William Valdar
- Department of Genetics, and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7265, USA.
|
38
|
Rodin AS, Gogoshin G, Boerwinkle E. Systems biology data analysis methodology in pharmacogenomics. Pharmacogenomics 2012; 12:1349-60. [PMID: 21919609 DOI: 10.2217/pgs.11.76] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023]
Abstract
Pharmacogenetics aims to elucidate the genetic factors underlying the individual's response to pharmacotherapy. Coupled with the recent (and ongoing) progress in high-throughput genotyping, sequencing and other genomic technologies, pharmacogenetics is rapidly transforming into pharmacogenomics, while pursuing the primary goals of identifying and studying the genetic contribution to drug therapy response and adverse effects, and existing drug characterization and new drug discovery. Accomplishment of both of these goals hinges on gaining a better understanding of the underlying biological systems; however, reverse-engineering biological system models from the massive datasets generated by the large-scale genetic epidemiology studies presents a formidable data analysis challenge. In this article, we review the recent progress made in developing such data analysis methodology within the paradigm of systems biology research that broadly aims to gain a 'holistic', or 'mechanistic' understanding of biological systems by attempting to capture the entirety of interactions between the components (genetic and otherwise) of the system.
Affiliation(s)
- Andrei S Rodin
- Human Genetics Center, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA.
|
39
|
A case control association study and cognitive function analysis of neuropilin and tolloid-like 1 gene and schizophrenia in the Japanese population. PLoS One 2011; 6:e28929. [PMID: 22205981 PMCID: PMC3243668 DOI: 10.1371/journal.pone.0028929] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/15/2011] [Accepted: 11/17/2011] [Indexed: 11/19/2022]
Abstract
Background: Using a knock-out mouse model, it was shown that NETO1 is a critical component of the NMDAR complex, and that loss of Neto1 leads to impaired hippocampal long-term potentiation and hippocampal-dependent learning and memory. Moreover, hemizygosity of NETO1 was shown to be associated with autistic-like behavior in humans.
Purpose of the Research: We examined the association between schizophrenia and the neuropilin and tolloid-like 1 gene (NETO1). First, we selected eight single nucleotide polymorphisms (SNPs) within the NETO1 locus, based on the Japanese schizophrenia genome-wide association study (JGWAS) results and previously conducted association studies. These SNPs were genotyped in a replication sample comprising 963 schizophrenic patients and 919 healthy controls. We also examined the effect of associated SNPs on scores in the Continuous Performance Test (CPT) and the Wisconsin Card Sorting Test (WCST), Keio version (107 schizophrenic patients, 104 healthy controls).
Results: There were no significant allele-wise or haplotype-wise associations in the replication analysis after Bonferroni correction. However, in the meta-analysis (JGWAS and replication dataset), three association signals were observed (rs17795324: p = 0.028, rs8098760: p = 0.017, rs17086492: p = 0.003). These SNPs were followed up, but we could not detect an allele-specific effect on cognitive performance as measured by the CPT and WCST.
Major Conclusions: We did not detect evidence for the association of NETO1 with schizophrenia in the Japanese population. Common variants within the NETO1 locus may not increase the genetic risk for schizophrenia in the Japanese population. Additionally, the common variants investigated in the current study did not affect cognitive performance, as measured by the CPT and WCST.
|