1
Sun W, Jon K, Zhu W. Multiple phenotype association tests based on sliced inverse regression. BMC Bioinformatics 2024; 25:144. [PMID: 38575890 PMCID: PMC10996256 DOI: 10.1186/s12859-024-05731-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 03/05/2024] [Indexed: 04/06/2024] Open
Abstract
BACKGROUND Joint analysis of multiple phenotypes in studies of biological systems such as Genome-Wide Association Studies is critical to revealing the functional interactions between various traits and genetic variants, but the growing dimensionality of the data has become a major obstacle to the widespread use of joint analysis. To handle the large number of variables, we consider the sliced inverse regression (SIR) method. Specifically, we propose a novel SIR-based association test that is robust and powerful in testing the association between multiple predictors and multiple outcomes. RESULTS We conduct simulation studies in both low- and high-dimensional settings with various numbers of Single-Nucleotide Polymorphisms and consider the correlation structure of traits. Simulation results show that the proposed method outperforms the existing methods. We also successfully apply our method to a genetic association study of the ADNI dataset. Both the simulation studies and the real data analysis show that the SIR-based association test is valid and achieves higher efficiency than its competitors. CONCLUSION Several scenarios with low- and high-dimensional responses and genotypes are considered in this paper. Our SIR-based method controls the estimated type I error at the pre-specified level α.
Affiliation(s)
- Wenyuan Sun
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin, China
  - Department of Mathematics, College of Science, Yanbian University, Yanji, 133002, Jilin, China
- Kyongson Jon
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin, China
  - Faculty of Mathematics, Kim Il Sung University, Pyongyang, 999093, Democratic People's Republic of Korea
- Wensheng Zhu
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin, China
  - School of Mathematical Sciences, Harbin Normal University, Harbin, 150025, Heilongjiang, China
2
Chi J, Xu M, Sheng X, Zhou Y. Association detection between multiple traits and rare variants based on family data via a nonparametric method. PeerJ 2023; 11:e16040. [PMID: 37780393 PMCID: PMC10541022 DOI: 10.7717/peerj.16040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 08/15/2023] [Indexed: 10/03/2023] Open
Abstract
Background The rapid development of next-generation sequencing technologies allows researchers to analyze complex human diseases at the molecular level. It has been shown that rare variants, alongside common variants, play important roles in human diseases. Thus, effective statistical methods are needed to test for associations between traits (e.g., diseases) and rare variants. More and more rare genetic variants are being detected throughout the human genome, which makes the study of rare variants feasible. Yet complex diseases are usually measured in a variety of forms, such as binary, ordinal, quantitative, or some mixture of them. Therefore, the genetic mapping problem can be framed as association detection between multiple traits and multiple loci, with full consideration of the correlation structure among the traits. Methods In this article, we construct a new non-parametric statistic based on the generalized Kendall's τ theory for family data. The new test statistic has an asymptotic distribution and can be used to study the associations between multiple traits and rare variants, which broadens the ways to identify genetic factors of complex human diseases. Results We apply our method (called Nonp-FAM) to simulated data and GAW17 data, and conduct a comprehensive comparison with some existing methods. Experimental results show that the proposed family-based method is powerful and robust for testing associations between multiple traits and rare variants, even when the data exhibit some population stratification.
Affiliation(s)
- Jinling Chi
  - Department of Statistics, Heilongjiang University, Harbin, China
  - School of Mathematics and Statistics, Xidian University, Xi’an, China
- Meijuan Xu
  - Department of Statistics, Heilongjiang University, Harbin, China
- Xiaona Sheng
  - School of Information Engineering, Harbin University, Harbin, China
- Ying Zhou
  - Department of Statistics, Heilongjiang University, Harbin, China
3
Liu W, Xu Y, Wang A, Huang T, Liu Z. The eigen higher criticism and eigen Berk–Jones tests for multiple trait association studies based on GWAS summary statistics. Genet Epidemiol 2021; 46:89-104. [DOI: 10.1002/gepi.22439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 09/10/2021] [Accepted: 10/21/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Wei Liu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China
- Yuyang Xu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Tao Huang
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  - Institute for Artificial Intelligence, Center for Intelligent Public Health, Peking University, Beijing, China
  - Key Laboratory of Molecular Cardiovascular Diseases, Peking University, Ministry of Education, Beijing, China
- Zhonghua Liu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
4
Cui T, Wang P, Zhu W. Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models. TEST-SPAIN 2021. [DOI: 10.1007/s11749-020-00746-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
5
Feng GJ, Wei XT, Zhang H, Yang XL, Shen H, Tian Q, Deng HW, Zhang L, Pei YF. Identification of pleiotropic loci underlying hip bone mineral density and trunk lean mass. J Hum Genet 2021; 66:251-260. [PMID: 32929176 PMCID: PMC7880826 DOI: 10.1038/s10038-020-00835-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2020] [Revised: 08/15/2020] [Accepted: 08/24/2020] [Indexed: 11/09/2022]
Abstract
Bone mineral density (BMD) and lean body mass (LBM) are each considerably heritable and are also genetically correlated. However, common genetic determinants shared by both traits are largely unknown. In the present study, we performed a bivariate genome-wide association study (GWAS) meta-analysis of hip BMD and trunk lean mass (TLM) in 11,335 subjects from 6 samples, and performed replication in estimated heel BMD and TLM in 215,234 UK Biobank (UKB) participants. We identified 2 loci that nearly attained the genome-wide significance (GWS, p < 5.0 × 10⁻⁸) level in the discovery GWAS meta-analysis and that were successfully replicated in the UKB sample: 11p15.2 (lead SNP rs12800228, discovery p = 2.88 × 10⁻⁷, replication p = 1.95 × 10⁻⁴) and 18q21.32 (rs489693, discovery p = 1.67 × 10⁻⁷, replication p = 1.17 × 10⁻³). These 2 loci may play a pleiotropic role in hip BMD and TLM development. Our findings provide useful insights that further enhance our understanding of the genetic interplay between BMD and LBM.
Affiliation(s)
- Gui-Juan Feng
  - Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, Jiangsu, People's Republic of China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, People's Republic of China
- Xin-Tong Wei
  - Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, Jiangsu, People's Republic of China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, People's Republic of China
- Hong Zhang
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, People's Republic of China
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, Jiangsu, People's Republic of China
- Xiao-Lin Yang
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, People's Republic of China
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, Jiangsu, People's Republic of China
- Hui Shen
  - Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
- Qing Tian
  - Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
- Hong-Wen Deng
  - Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
- Lei Zhang
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, People's Republic of China
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, Jiangsu, People's Republic of China
- Yu-Fang Pei
  - Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, Jiangsu, People's Republic of China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, People's Republic of China
6
Zhang YX, Zhang SS, Ran S, Liu Y, Zhang H, Yang XL, Hai R, Shen H, Tian Q, Deng HW, Zhang L, Pei YF. Three pleiotropic loci associated with bone mineral density and lean body mass. Mol Genet Genomics 2021; 296:55-65. [PMID: 32970232 PMCID: PMC7903521 DOI: 10.1007/s00438-020-01724-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2020] [Accepted: 09/09/2020] [Indexed: 11/26/2022]
Abstract
Both bone mineral density (BMD) and lean body mass (LBM) are important physiological measures with strong genetic determination. In addition, BMD and LBM may share common genetic factors. Aiming to identify pleiotropic genomic loci underlying BMD and LBM, we performed bivariate genome-wide association study meta-analyses of femoral neck bone mineral density and LBM at arms and legs, and replicated the findings in the large-scale UK Biobank cohort sample. Combining the results from the discovery meta-analysis and the replication sample, we identified three genomic loci at the genome-wide significance level (p < 5.0 × 10⁻⁸): 2p23.2 (lead SNP rs4477866, discovery p = 3.47 × 10⁻⁸, replication p = 1.03 × 10⁻⁴), 16q12.2 (rs1421085, discovery p = 2.04 × 10⁻⁹, replication p = 6.47 × 10⁻¹⁴) and 18q21.32 (rs11152213, discovery p = 3.47 × 10⁻⁸, replication p = 6.69 × 10⁻⁶). Our findings not only provide useful insights into lean mass and bone mass development, but also enhance our understanding of the potential genetic correlation between BMD and LBM.
Affiliation(s)
- Yu-Xue Zhang
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, 199 Ren-ai Rd., Jiangsu, Suzhou, 215123, People's Republic of China
  - School of Medical Instruments and Food Engineering, University of Shanghai for Science and Technology, Shanghai, People's Republic of China
- Shan-Shan Zhang
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, Suzhou, People's Republic of China
  - Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, 199 Ren-ai Rd., Jiangsu, Suzhou, 215123, People's Republic of China
- Shu Ran
  - School of Medical Instruments and Food Engineering, University of Shanghai for Science and Technology, Shanghai, People's Republic of China
- Yu Liu
  - School of Medical Instruments and Food Engineering, University of Shanghai for Science and Technology, Shanghai, People's Republic of China
- Hong Zhang
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, 199 Ren-ai Rd., Jiangsu, Suzhou, 215123, People's Republic of China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, Suzhou, People's Republic of China
- Xiao-Lin Yang
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, 199 Ren-ai Rd., Jiangsu, Suzhou, 215123, People's Republic of China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, Suzhou, People's Republic of China
- Rong Hai
  - Inner Mongolia Autonomous Region Center of Health Management Service, Baotou, Inner Mongolia, People's Republic of China
- Hui Shen
  - Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, 1440 Canal St., Suite 2001, New Orleans, LA, 70112, USA
- Qing Tian
  - Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, 1440 Canal St., Suite 2001, New Orleans, LA, 70112, USA
- Hong-Wen Deng
  - Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, 1440 Canal St., Suite 2001, New Orleans, LA, 70112, USA
- Lei Zhang
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, 199 Ren-ai Rd., Jiangsu, Suzhou, 215123, People's Republic of China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, Suzhou, People's Republic of China
- Yu-Fang Pei
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Jiangsu, Suzhou, People's Republic of China
  - Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, 199 Ren-ai Rd., Jiangsu, Suzhou, 215123, People's Republic of China
7
Kwak M. Genome-wide association study using truncated likelihood with incomplete information for stratum specific missingness. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-020-00064-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
8
Multivariate association test for rare variants controlling for cryptic and family relatedness. Can J Stat 2019. [DOI: 10.1002/cjs.11475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
9
Zhang H, Liu D, Zhao J, Bi X. Modeling Hybrid Traits for Comorbidity and Genetic Studies of Alcohol and Nicotine Co-Dependence. Ann Appl Stat 2018; 12:2359-2378. [PMID: 30666272 PMCID: PMC6338437 DOI: 10.1214/18-aoas1156] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
We propose a novel multivariate model for analyzing hybrid traits and identifying genetic factors for comorbid conditions. Comorbidity is a common phenomenon in mental health in which an individual suffers from multiple disorders simultaneously. For example, in the Study of Addiction: Genetics and Environment (SAGE), alcohol and nicotine addiction were recorded through multiple assessments that we refer to as hybrid traits. Statistical inference for studying the genetic basis of hybrid traits has not been well-developed. Recent rank-based methods have been utilized for conducting association analyses of hybrid traits but do not inform the strength or direction of effects. To overcome this limitation, a parametric modeling framework is imperative. Although such parametric frameworks have been proposed in theory, they are neither well-developed nor extensively used in practice due to their reliance on complicated likelihood functions that have high computational complexity. Many existing parametric frameworks tend to instead use pseudo-likelihoods to reduce computational burdens. Here, we develop a model fitting algorithm for the full likelihood. Our extensive simulation studies demonstrate that inference based on the full likelihood can control the type-I error rate, and gains power and improves the effect size estimation when compared with several existing methods for hybrid models. These advantages remain even if the distribution of the latent variables is misspecified. After analyzing the SAGE data, we identify three genetic variants (rs7672861, rs958331, rs879330) that are significantly associated with the comorbidity of alcohol and nicotine addiction at the chromosome-wide level. Moreover, our approach has greater power in this analysis than several existing methods for hybrid traits. Although the analysis of the SAGE data motivated us to develop the model, it can be broadly applied to analyze any hybrid responses.
Affiliation(s)
- Heping Zhang
  - Susan Dwight Bliss Professor, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Dungang Liu
  - Assistant Professor, Department of Operations, Business Analytics and Information Systems, University of Cincinnati Lindner College of Business, Cincinnati, OH 45221
- Jiwei Zhao
  - Assistant Professor, Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY 14214
- Xuan Bi
  - Postdoctoral Associate, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
10
Qi G, Chatterjee N. Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits. PLoS Genet 2018; 14:e1007549. [PMID: 30289880 PMCID: PMC6192650 DOI: 10.1371/journal.pgen.1007549] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/23/2017] [Revised: 10/17/2018] [Accepted: 07/09/2018] [Indexed: 12/31/2022] Open
Abstract
Genome-wide association studies have shown that pleiotropy is a common phenomenon that can potentially be exploited for enhanced detection of susceptibility loci. We propose heritability informed power optimization (HIPO) for conducting powerful pleiotropic analysis using summary-level association statistics. We find optimal linear combinations of association coefficients across traits that are expected to maximize the non-centrality parameter for the underlying test statistics, taking into account estimates of heritability, sample size variations and overlaps across the traits. Simulation studies show that the proposed method has correct type I error, is robust to population stratification, and leads to the desired genome-wide enrichment of association signals. Application of the proposed method to publicly available data for three groups of genetically related traits, lipids (N = 188,577), psychiatric diseases (Ncase = 33,332, Ncontrol = 27,888) and social science traits (N ranging from 161,460 to 298,420 across individual traits), increased the number of genome-wide significant loci by 12%, 200% and 50%, respectively, compared to those found by analysis of individual traits. Evidence of replication is present for many of these loci in subsequent larger studies for individual traits. HIPO can potentially be extended to high-dimensional phenotypes as a way of dimension reduction to maximize power for subsequent genetic association testing.
Affiliation(s)
- Guanghao Qi
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
  - Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
11
Jaimini U, Thirunarayan K, Kalra M, Venkataraman R, Kadariya D, Sheth A. "How Is My Child's Asthma?" Digital Phenotype and Actionable Insights for Pediatric Asthma. JMIR Pediatr Parent 2018; 1:e11988. [PMID: 31008446 PMCID: PMC6469868 DOI: 10.2196/11988] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND In the traditional asthma management protocol, a child meets with a clinician infrequently, once in 3 to 6 months, and is assessed using the Asthma Control Test questionnaire. This information is inadequate for timely determination of asthma control, compliance, precise diagnosis of the cause, and assessing the effectiveness of the treatment plan. The continuous monitoring and improved tracking of the child's symptoms, activities, sleep, and treatment adherence can allow precise determination of asthma triggers and a reliable assessment of medication compliance and effectiveness. Digital phenotyping refers to moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices, in particular, mobile phones. The kHealth kit consists of a mobile app, provided on an Android tablet, that asks timely and contextually relevant questions related to asthma symptoms, medication intake, reduced activity because of symptoms, and nighttime awakenings; a Fitbit to monitor activity and sleep; a Microlife Peak Flow Meter to monitor the peak expiratory flow and forced exhaled volume in 1 second; and a Foobot to monitor indoor air quality. The kHealth cloud stores personal health data and environmental data collected using Web services. The kHealth Dashboard interactively visualizes the collected data. OBJECTIVE The objective of this study was to discuss the usability and feasibility of collecting clinically relevant data to help clinicians diagnose or intervene in a child's care plan by using the kHealth system for continuous and comprehensive monitoring of child's symptoms, activity, sleep pattern, environmental triggers, and compliance. The kHealth system helps in deriving actionable insights to help manage asthma at both the personal and cohort levels. 
The Digital Phenotype Score and Controller Compliance Score introduced in the study are the basis of ongoing work on personalized asthma care and help answer questions such as, "How can I help my child better adhere to care instructions and reduce future exacerbation?" METHODS The Digital Phenotype Score and Controller Compliance Score summarize the child's condition from the data collected using the kHealth kit to provide actionable insights. The Digital Phenotype Score formalizes the asthma control level using data about symptoms, rescue medication usage, activity level, and sleep pattern. The Compliance Score captures how well the child is complying with the treatment protocol. We monitored and analyzed data for 95 children, each recruited for a 1- or 3-month-long study. The Asthma Control Test scores obtained from the medical records of 57 children were used to validate the asthma control levels calculated using the Digital Phenotype Scores. RESULTS At the cohort level, we found asthma was very poorly controlled in 37% (30/82) of the children, not well controlled in 26% (21/82), and well controlled in 38% (31/82). Among the very poorly controlled children (n=30), we found 30% (9/30) were highly compliant toward their controller medication intake (suggesting a re-evaluation for change in medication or dosage), whereas 50% (15/30) were poorly compliant and candidates for a more timely intervention to improve compliance and mitigate their situation. We observed a negative Kendall tau correlation of -0.509 (P<.01) between Asthma Control Test scores and the Digital Phenotype Score. CONCLUSIONS The kHealth kit is suitable for the collection of clinically relevant information from pediatric patients. 
Furthermore, Digital Phenotype Score and Controller Compliance Score, computed based on the continuous digital monitoring, provide the clinician with timely and detailed evidence of a child's asthma-related condition when compared with the Asthma Control Test scores taken infrequently during clinic visits.
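As a rough illustration of the rank correlation reported in the RESULTS above, the following minimal Python sketch computes Kendall's tau for two paired lists of scores. The function name `kendall_tau_b`, the choice of the tau-b variant (which adjusts for ties), and the toy scores are all assumptions made for illustration; the abstract does not state which variant or software the authors used.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pairwise scan)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1  # pair is tied on the first variable
            if dy == 0:
                tied_y += 1  # pair is tied on the second variable
            if dx * dy > 0:
                concordant += 1  # both variables order the pair the same way
            elif dx * dy < 0:
                discordant += 1  # the variables order the pair oppositely
    total_pairs = n * (n - 1) // 2
    denom = sqrt((total_pairs - tied_x) * (total_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical paired scores (NOT data from the study): higher questionnaire
# scores paired with lower digital scores give a negative tau.
questionnaire_scores = [25, 22, 19, 15, 12]
digital_scores = [1, 1, 2, 3, 4]
print(kendall_tau_b(questionnaire_scores, digital_scores))
```

A negative value means the two scores tend to rank children in opposite orders, which is the direction of association the abstract reports.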
Affiliation(s)
- Utkarshani Jaimini
  - Department of Computer Science, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, United States
- Krishnaprasad Thirunarayan
  - Department of Computer Science, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, United States
- Revathy Venkataraman
  - Department of Computer Science, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, United States
- Dipesh Kadariya
  - Department of Computer Science, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, United States
- Amit Sheth
  - Department of Computer Science, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, United States
12
Hackinger S, Zeggini E. Statistical methods to detect pleiotropy in human complex traits. Open Biol 2018; 7:rsob.170125. [PMID: 29093210 PMCID: PMC5717338 DOI: 10.1098/rsob.170125] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 05/22/2017] [Accepted: 09/29/2017] [Indexed: 12/13/2022] Open
Abstract
In recent years pleiotropy, the phenomenon of one genetic locus influencing several traits, has become a widely researched field in human genetics. With the increasing availability of genome-wide association study summary statistics, as well as the establishment of deeply phenotyped sample collections, it is now possible to systematically assess the genetic overlap between multiple traits and diseases. In addition to increasing power to detect associated variants, multi-trait methods can also aid our understanding of how different disorders are aetiologically linked by highlighting relevant biological pathways. A plethora of available tools to perform such analyses exists, each with their own advantages and limitations. In this review, we outline some of the currently available methods to conduct multi-trait analyses. First, we briefly introduce the concept of pleiotropy and outline the current landscape of pleiotropy research in human genetics; second, we describe analytical considerations and analysis methods; finally, we discuss future directions for the field.
13
Wei C, Lu Q. A generalized association test based on U statistics. Bioinformatics 2018; 33:1963-1971. [PMID: 28334117 DOI: 10.1093/bioinformatics/btx103] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/01/2016] [Accepted: 02/15/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease status, multivariate responses and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. Results We here proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian Kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan for Alzheimer's Disease Neuroimaging Initiative data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. Availability and Implementation We developed a C++ package for analysis of WGS data using GSU. The source codes can be downloaded at https://github.com/changshuaiwei/gsu. Contact weichangshuai@gmail.com; qlu@epi.msu.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Changshuai Wei
- Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX 76107
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
| |
|
14
|
Guo X, Zhu J, Fan Q, He M, Wang X, Zhang H. A univariate perspective of multivariate genome-wide association analysis. Genet Epidemiol 2018; 42:470-479. [PMID: 29781551 DOI: 10.1002/gepi.22128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/05/2017] [Revised: 03/26/2018] [Accepted: 03/30/2018] [Indexed: 01/11/2023]
Abstract
Multiple correlated phenotypes are frequently collected in genome-wide association studies (GWASs), and a systematic, simultaneous analysis of multiple phenotypes can integrate the signals from single phenotypes, therefore increasing the power of detecting genetic signals. However, fundamental questions remain open, including the conditions and reasons under which the multivariate analysis is beneficial and how a highly significant signal arises in the multivariate analysis. To understand these issues, we propose to decompose the multivariate model into a series of simple univariate models. This transformation offers a clearer quantitative analysis of the circumstances under which a multivariate approach can be beneficial for the bivariate phenotypes case. A real data analysis is employed to illustrate how to interpret the signals arising from multivariate GWASs.
Affiliation(s)
- Xiaobo Guo
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China.,Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China.,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, University of Melbourne, Melbourne, Victoria, Australia
| | - Junxian Zhu
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China.,Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China
| | - Qiao Fan
- DUKE-National University of Singapore Graduate Medical School, Singapore, Singapore
| | - Mingguang He
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, University of Melbourne, Melbourne, Victoria, Australia.,State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Xueqin Wang
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China.,Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China.,Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
| | - Heping Zhang
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China.,Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China.,Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, United States of America
| |
|
15
|
Jadhav S, Tong X, Lu Q. A functional U-statistic method for association analysis of sequencing data. Genet Epidemiol 2017; 41:636-643. [PMID: 28850771 DOI: 10.1002/gepi.22063] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2017] [Revised: 06/06/2017] [Accepted: 07/10/2017] [Indexed: 11/08/2022]
Abstract
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we applied our method to the sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence.
Affiliation(s)
- Sneha Jadhav
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America
| | - Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
| |
|
16
|
Yang JJ, Williams LK, Buu A. Identifying pleiotropic genes in genome-wide association studies from related subjects using the linear mixed model and Fisher combination function. BMC Bioinformatics 2017; 18:376. [PMID: 28836938 PMCID: PMC5571642 DOI: 10.1186/s12859-017-1791-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/29/2016] [Accepted: 08/15/2017] [Indexed: 11/11/2022] Open
Abstract
Background A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. Results The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. Conclusions This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
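The Fisher combination statistic mentioned in this abstract can be illustrated with a minimal sketch. The function name `fisher_combination` is hypothetical, and the sketch assumes the marginal p-values are independent, in which case the statistic follows a chi-square distribution with 2k degrees of freedom. The abstract's second step, which estimates the null distribution from the correlation between linear mixed model residuals, is not reproduced here.

```python
import math


def fisher_combination(p_values):
    """Combine k marginal p-values with Fisher's method.

    Assumes independent p-values (an assumption of this sketch, not of
    the cited method): T = -2 * sum(log p_i) ~ chi-square with 2k df.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-square survival function for even degrees of freedom 2k:
    # P(X > t) = exp(-t/2) * sum_{i=0}^{k-1} (t/2)^i / i!
    half = statistic / 2.0
    term, tail = 1.0, 0.0
    for i in range(k):
        tail += term
        term *= half / (i + 1)
    return statistic, math.exp(-half) * tail


# Two marginal tests with p = 0.1 each give a combined p of about 0.056.
stat, p_combined = fisher_combination([0.1, 0.1])
```

With correlated phenotypes the independence reference distribution is generally not appropriate, which is why the abstract estimates the null distribution from the residual correlation instead.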
Affiliation(s)
- James J Yang
- School of Nursing, University of Michigan, Ann Arbor, 48104, Michigan, USA.
| | - L Keoki Williams
- Department of Internal Medicine, Henry Ford Health System, Detroit, 48202, Michigan, USA.,The Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, 48202, Michigan, USA
| | - Anne Buu
- Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, 48104, Michigan, USA
| |
|
17
|
Ji S, Ning J, Qin J, Follmann D. Conditional independence test by generalized Kendall's tau with generalized odds ratio. Stat Methods Med Res 2017; 27:3224-3235. [PMID: 29298614 DOI: 10.1177/0962280217695345] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/16/2023]
Abstract
Determining conditional dependence is a challenging but important task in both model building and in applications such as genetic association studies and graphical models. Research on this topic has focused on kernel-based methods or has used categorical conditioning variables because of the challenge of the curse of dimensionality. To overcome this challenge, we propose a class of tests for conditional independence without any restriction on the distribution of the conditioning variables. The proposed test statistic can be treated as a generalized weighted Kendall's tau, in which the generalized odds ratio is utilized as a weight function to account for the distance between different values of the conditioning variables. The test procedure has desirable asymptotic properties and is easy to implement. We evaluate the finite sample performance of the proposed test through simulation studies and illustrate it using two real data examples.
Affiliation(s)
| | - Jing Ning
- 2 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jing Qin
- 3 Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Rockville, MD, USA
| | - Dean Follmann
- 3 Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Rockville, MD, USA
| |
|
18
|
Mägi R, Suleimanov YV, Clarke GM, Kaakinen M, Fischer K, Prokopenko I, Morris AP. SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes. BMC Bioinformatics 2017; 18:25. [PMID: 28077070 PMCID: PMC5225593 DOI: 10.1186/s12859-016-1437-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/21/2016] [Accepted: 12/17/2016] [Indexed: 11/10/2022] Open
Abstract
Background Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite the fact that many diseases and quantitative traits are correlated with each other, and often measured in the same sample of individuals. Multivariate analyses of correlated phenotypes have been demonstrated, by simulation, to increase power to detect association with SNPs, and thus may enable improved detection of novel loci contributing to diseases and quantitative traits. Results We have developed the SCOPA software to enable GWAS analysis of multiple correlated phenotypes. The software implements “reverse regression” methodology, which treats the genotype of an individual at a SNP as the outcome and the phenotypes as predictors in a general linear model. SCOPA can be applied to quantitative traits and categorical phenotypes, and can accommodate imputed genotypes under a dosage model. The accompanying META-SCOPA software enables meta-analysis of association summary statistics from SCOPA across GWAS. Application of SCOPA to two GWAS of high- and low-density lipoprotein cholesterol, triglycerides and body mass index, and subsequent meta-analysis with META-SCOPA, highlighted stronger association signals than univariate phenotype analysis at established lipid and obesity loci. The META-SCOPA meta-analysis also revealed a novel signal of association at genome-wide significance for triglycerides mapping to GPC5 (lead SNP rs71427535, p = 1.1 × 10⁻⁸), which has not been reported in previous large-scale GWAS of lipid traits. Conclusions The SCOPA and META-SCOPA software enable discovery and dissection of multiple phenotype association signals through implementation of a powerful reverse regression approach.
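The "reverse regression" idea described in this abstract, with the genotype as outcome and the phenotypes as predictors, can be sketched with ordinary least squares. This is an illustration only and not the SCOPA implementation: the function names `solve_linear_system` and `reverse_regression` are hypothetical, and the sketch assumes quantitative phenotypes, a genotype coded as a dosage, and no covariates.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def reverse_regression(genotype, phenotypes):
    """Regress genotype dosage on k phenotype columns (with intercept).

    Returns (R^2, F statistic); under the null of no association the
    F statistic is referred to an F(k, n - k - 1) distribution.
    """
    n, k = len(genotype), len(phenotypes[0])
    g_mean = sum(genotype) / n
    g = [v - g_mean for v in genotype]
    means = [sum(row[j] for row in phenotypes) / n for j in range(k)]
    x = [[row[j] - means[j] for j in range(k)] for row in phenotypes]
    # Normal equations on centered data: (X'X) beta = X'g
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xtg = [sum(x[i][a] * g[i] for i in range(n)) for a in range(k)]
    beta = solve_linear_system(xtx, xtg)
    explained = sum(bj * sj for bj, sj in zip(beta, xtg))
    total = sum(v * v for v in g)
    r2 = explained / total
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, f_stat


# Toy data: four individuals, one phenotype, genotype dosages 0/0/1/2.
r2, f_stat = reverse_regression([0, 0, 1, 2], [[0.0], [1.0], [2.0], [3.0]])
```

Because the genotype is the single outcome, one joint test of all phenotype coefficients assesses association with the whole set of phenotypes at once.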
Affiliation(s)
- Reedik Mägi
- Estonian Genome Center, University of Tartu, Tartu, Estonia
| | - Yury V Suleimanov
- Computation-based Science and Technology Research Center, Cyprus Institute, Nicosia, Cyprus.,Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Geraldine M Clarke
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | | | - Krista Fischer
- Estonian Genome Center, University of Tartu, Tartu, Estonia
| | | | - Andrew P Morris
- Estonian Genome Center, University of Tartu, Tartu, Estonia.,Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.,Department of Biostatistics, University of Liverpool, Liverpool, UK
| |
|
19
|
Abstract
For over a decade, genome-wide association studies (GWAS) have been a major tool for detecting genetic variants underlying complex traits. Recent studies have demonstrated that the same variant or gene can be associated with multiple traits, and such associations are termed cross-phenotype (CP) associations. CP association analysis can improve statistical power by searching for variants that contribute to multiple traits, which is often relevant to pleiotropy. In this chapter, we discuss existing statistical methods for analyzing association between a single marker and multivariate phenotypes, we introduce a general approach, CPASSOC, to detect the CP associations, and explain how to conduct the analysis in practice.
|
20
|
Li M, Wei C, Wen Y, Wang T, Lu Q. Detecting Gene-Gene Interactions Associated with Multiple Complex Traits with U-Statistics. Curr Genomics 2016; 17:403-415. [PMID: 28479869 PMCID: PMC5320542 DOI: 10.2174/1389202917666160513100946] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/11/2015] [Revised: 05/26/2015] [Accepted: 06/06/2015] [Indexed: 12/02/2022] Open
Abstract
Many complex diseases, such as psychiatric and behavioral disorders, are commonly characterized through various measurements that reflect physical, behavioral and psychological aspects of diseases. While it remains a great challenge to find a unified measurement to characterize a disease, the available multiple phenotypes can be analyzed jointly in the genetic association study. Simultaneously testing these phenotypes has many advantages, including considering different aspects of the disease in the analysis, and utilizing correlated phenotypes to improve the power of detecting disease-associated variants. Furthermore, complex diseases are likely caused by the interplay of multiple genetic variants through complicated mechanisms. Considering gene-gene interactions in the joint association analysis of complex diseases could further increase our ability to discover genetic variants involving complex disease pathways. In this article, we propose a stepwise U-test for joint association analysis of multiple loci and multiple phenotypes. Through simulations, we demonstrated that testing multiple phenotypes simultaneously could attain higher power than testing one single phenotype at a time, especially when there are shared genes contributing to multiple phenotypes. We also illustrated the proposed method with an application to Nicotine Dependence (ND), using datasets from the Study of Addiction: Genetics and Environment (SAGE). The joint analysis of three ND phenotypes identified two SNPs, rs10508649 and rs2491397, and reached a nominal P-value of 3.79e-13. The association was further replicated in two independent datasets with P-values of 2.37e-05 and 7.46e-05.
Affiliation(s)
- Ming Li
- 1Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405, U.S.A; 2Department of Epidemiology and Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107, U.S.A; 3Department of Statistics, University of Auckland, Auckland 1010, New Zealand; 4Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, P.R. China; 5Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, U.S.A
| | - Changshuai Wei
- 1Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405, U.S.A; 2Department of Epidemiology and Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107, U.S.A; 3Department of Statistics, University of Auckland, Auckland 1010, New Zealand; 4Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, P.R. China; 5Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, U.S.A
| | - Yalu Wen
- 1Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405, U.S.A; 2Department of Epidemiology and Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107, U.S.A; 3Department of Statistics, University of Auckland, Auckland 1010, New Zealand; 4Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, P.R. China; 5Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, U.S.A
| | - Tong Wang
- 1Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405, U.S.A; 2Department of Epidemiology and Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107, U.S.A; 3Department of Statistics, University of Auckland, Auckland 1010, New Zealand; 4Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, P.R. China; 5Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, U.S.A
| | - Qing Lu
- 1Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405, U.S.A; 2Department of Epidemiology and Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107, U.S.A; 3Department of Statistics, University of Auckland, Auckland 1010, New Zealand; 4Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, P.R. China; 5Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, U.S.A
| |
|
21
|
Sun L, Wang C, Hu YQ. Utilizing mutual information for detecting rare and common variants associated with a categorical trait. PeerJ 2016; 4:e2139. [PMID: 27350900 PMCID: PMC4918222 DOI: 10.7717/peerj.2139] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/26/2016] [Accepted: 05/25/2016] [Indexed: 11/20/2022] Open
Abstract
Background. Genome-wide association studies have succeeded in detecting novel common variants which associate with complex diseases. As a result of the fast changes in next generation sequencing technology, a large amount of sequencing data is generated, which offers great opportunities to identify rare variants that could explain a larger proportion of missing heritability. Many effective and powerful methods are proposed, although they are usually limited to continuous, dichotomous or ordinal traits. Notice that traits having nominal categorical features are commonly observed in complex diseases, especially in mental disorders, which motivates the incorporation of the characteristics of the categorical trait into association studies with rare and common variants. Methods. We construct two simple and intuitive nonparametric tests, MIT and aMIT, based on mutual information for detecting association between genetic variants in a gene or region and a categorical trait. MIT and aMIT can gauge the difference among the distributions of rare and common variants across a region given every categorical trait value. If there is little association between variants and a categorical trait, MIT or aMIT approximately equals zero. The larger the difference in distributions, the greater values MIT and aMIT have. Therefore, MIT and aMIT have the potential for detecting functional variants. Results. We checked the validity of proposed statistics and compared them to the existing ones through extensive simulation studies with varied combinations of the numbers of variants of rare causal, rare non-causal, common causal, and common non-causal, deleterious and protective, various minor allele frequencies and different levels of linkage disequilibrium. The results show our methods have higher statistical power than conventional ones, including the likelihood based score test, in most cases: (1) there are multiple genetic variants in a gene or region; (2) both protective and deleterious variants are present; (3) there exist rare and common variants; and (4) more than half of the variants are neutral. The proposed tests are applied to the data from Collaborative Studies on Genetics of Alcoholism, and a competent performance is exhibited therein. Discussion. As a complement to the existing methods mainly focusing on quantitative traits, this study provides the nonparametric tests MIT and aMIT for detecting variants associated with a categorical trait. Furthermore, we plan to investigate the association between rare variants and multiple categorical traits.
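The quantity underlying these tests, mutual information between genotype and a categorical trait, can be sketched for a single variant as follows. The abstract does not give the exact form of MIT or aMIT, so this is only the plain plug-in estimate from observed frequencies; the function name `mutual_information` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(genotypes, trait):
    """Plug-in mutual information (in nats) between two categorical vectors.

    I(G; T) = sum over observed (g, t) of p(g, t) * log(p(g, t) / (p(g) p(t))).
    It is zero when genotype and trait are empirically independent and
    grows as the genotype distribution differs more across trait values.
    """
    n = len(genotypes)
    joint = Counter(zip(genotypes, trait))
    g_counts = Counter(genotypes)
    t_counts = Counter(trait)
    mi = 0.0
    for (g, t), c in joint.items():
        # p(g, t) / (p(g) p(t)) simplifies to c * n / (count_g * count_t)
        mi += (c / n) * math.log(c * n / (g_counts[g] * t_counts[t]))
    return mi


# Genotype fully determines the trait category here, so MI equals log(2).
mi_dependent = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
# Genotype and trait are empirically independent here, so MI equals 0.
mi_independent = mutual_information([0, 0, 1, 1], ["a", "b", "a", "b"])
```

This matches the behaviour stated in the abstract: the value is approximately zero under little association and increases with the difference between the conditional distributions.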
Affiliation(s)
- Leiming Sun
- State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Chan Wang
- State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| | - Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
| |
|
22
|
Wei C, Elston RC, Lu Q. A weighted U statistic for association analyses considering genetic heterogeneity. Stat Med 2016; 35:2802-14. [PMID: 26833871 DOI: 10.1002/sim.6877] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/24/2015] [Revised: 11/11/2015] [Accepted: 12/28/2015] [Indexed: 11/10/2022]
Abstract
Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Changshuai Wei
- Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX, U.S.A
| | - Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, U.S.A
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, U.S.A
| |
|
23
|
ZHOU YING, CHENG YANGYANG, ZHU WENSHENG, ZHOU QIAN. A nonparametric method to test for associations between rare variants and multiple traits. Genet Res (Camb) 2016; 98:e1. [PMID: 27159928 PMCID: PMC6865163 DOI: 10.1017/s0016672315000269] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/03/2015] [Revised: 09/18/2015] [Accepted: 12/08/2015] [Indexed: 11/06/2022] Open
Abstract
More and more rare genetic variants are being detected in the human genome, and it is believed that besides common variants, some rare variants also explain part of the phenotypic variance for human diseases. Due to the importance of rare variants, many statistical methods have been proposed to test for associations between rare variants and human traits. However, in existing studies, most methods only test for associations between multiple loci and one trait; therefore, the joint information of multiple traits has not been considered simultaneously and sufficiently. In this article, we present a study of testing for associations between rare variants and multiple traits, where trait value can be binary, ordinal, quantitative and/or any mixture of them. Based on the method of generalized Kendall’s τ, a nonparametric method called NM-RV is proposed. A new kernel function for U-statistic, which could incorporate the information of each rare variant itself, is also presented and is expected to enhance the power of rare variant analysis. We further consider the asymptotic distribution of the proposed association test statistic. Our simulation work suggests that the proposed method is more powerful and robust than existing methods in testing for associations between rare variants and multiple traits, especially for multivariate ordinal traits.
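A generalized Kendall's τ statistic of the kind this abstract builds on can be sketched as a pairwise U-statistic. The sketch below is not NM-RV: the abstract does not specify its new kernel, so the choices here (a rare-variant burden score as the genetic summary, the sign of the trait difference as the trait kernel, and the function name `generalized_kendall_u`) are assumptions made for illustration.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def generalized_kendall_u(burden, traits):
    """Vector-valued U-statistic, one component per trait.

    For every pair (i, j) with i < j, the genetic difference
    burden[i] - burden[j] is multiplied by the sign of the trait
    difference, and the products are averaged over all pairs. Using
    only the sign lets binary, ordinal and quantitative traits be
    handled in the same way.
    """
    n = len(burden)
    k = len(traits[0])
    n_pairs = n * (n - 1) // 2
    u = [0.0] * k
    for i, j in combinations(range(n), 2):
        g_diff = burden[i] - burden[j]
        for t in range(k):
            u[t] += g_diff * sign(traits[i][t] - traits[j][t])
    return [v / n_pairs for v in u]


# Three individuals, two traits: trait 1 rises with burden, trait 2 falls.
u_vec = generalized_kendall_u([0, 1, 2], [[0, 2], [1, 1], [2, 0]])
```

A test would then compare a quadratic form of this vector with its null distribution, for example the asymptotic distribution the abstract refers to; that step is omitted here.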
Affiliation(s)
- YING ZHOU
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China
| | - YANGYANG CHENG
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
| | - WENSHENG ZHU
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
| | - QIAN ZHOU
- Department of Humanities, Mianyang Vocational and Technical College, Mianyang 621000, China
| |
|
24
|
Zhao J, Zhang H. Modeling Multiple Responses via Bootstrapping Margins with an Application to Genetic Association Testing. STATISTICS AND ITS INTERFACE 2015; 9:47-56. [PMID: 26543519 PMCID: PMC4629876 DOI: 10.4310/sii.2016.v9.n1.a5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
The need for analysis of multiple responses arises from many applications. In behavioral science, for example, comorbidity is a common phenomenon where multiple disorders occur in the same person. The advantage of jointly analyzing multiple correlated responses has been examined and documented. Due to the difficulties of modeling multiple responses, nonparametric tests such as generalized Kendall's Tau have been developed to assess the association between multiple responses and risk factors. These procedures have been applied to genomewide association studies of multiple complex traits. Unfortunately, those nonparametric tests only provide the significance of the association but not the magnitude. We propose a Gaussian copula model with discrete margins for modeling multivariate binary responses. This model separates marginal effects from between-trait correlations. We use a bootstrapping margins approach to constructing Wald's statistic for the association test. Although our derivation is based on the fully parametric Gaussian copula framework for simplicity, the underlying assumptions to apply our method can be weakened. The bootstrapping margins approach only requires the correct specification of the model margins. Our simulation and real data analysis demonstrate that our proposed method not only increases power over some existing association tests, but also provides further insight into genetic association studies of multivariate traits.
Affiliation(s)
- Jiwei Zhao
- Department of Biostatistics, School of Public Health and Health Professions, University at Buffalo, The State University of New York, Buffalo, NY, 14214
| | - Heping Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06511
| |
|
25
|
Kim J, Bai Y, Pan W. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genet Epidemiol 2015; 39:651-63. [PMID: 26493956 DOI: 10.1002/gepi.21931] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 04/17/2015] [Accepted: 08/12/2015] [Indexed: 01/01/2023]
Abstract
We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Yun Bai
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
26
|
Associating Multivariate Quantitative Phenotypes with Genetic Variants in Family Samples with a Novel Kernel Machine Regression Method. Genetics 2015; 201:1329-39. [PMID: 26482791 DOI: 10.1534/genetics.115.178590] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/27/2015] [Accepted: 10/04/2015] [Indexed: 11/18/2022] Open
Abstract
The recent development of sequencing technology allows identification of association between the whole spectrum of genetic variants and complex diseases. Over the past few years, a number of association tests for rare variants have been developed. Jointly testing for association between genetic variants and multiple correlated phenotypes may increase the power to detect causal genes in family-based studies, but familial correlation needs to be appropriately handled to avoid an inflated type I error rate. Here we propose a novel approach for multivariate family data using kernel machine regression (denoted as MF-KM) that is based on a linear mixed-model framework and can be applied to a large range of studies with different types of traits. In our simulation studies, the usual kernel machine test has inflated type I error rates when applied directly to familial data, while our proposed MF-KM method preserves the expected type I error rates. Moreover, the MF-KM method has increased power compared to methods that either analyze each phenotype separately while considering family structure or use only unrelated founders from the families. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.
|
27
|
Guo X, Li Y, Ding X, He M, Wang X, Zhang H. Association Tests of Multiple Phenotypes: ATeMP. PLoS One 2015; 10:e0140348. [PMID: 26479245 PMCID: PMC4610695 DOI: 10.1371/journal.pone.0140348] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2015] [Accepted: 09/24/2015] [Indexed: 11/19/2022] Open
Abstract
Joint analysis of multiple phenotypes has gained growing attention in genome-wide association studies (GWASs), especially for the analysis of multiple intermediate phenotypes which measure the same underlying complex human disorder. One of the multivariate methods, MultiPhen (O’Reilly et al. 2012), employs the proportional odds model to regress a genotype on multiple phenotypes, hence ignoring the phenotypic distributions. Despite the flexibilities of MultiPhen, the properties and performance of MultiPhen are not well understood, especially when the phenotypic distributions are non-normal. In fact, it is well known in the statistical literature that the estimation is attenuated when the explanatory variables contain measurement errors. In this study, we first established an equivalence relationship between MultiPhen and the generalized Kendall tau association test, shedding light on why MultiPhen can perform well for joint association analysis of multiple phenotypes. Through the equivalence, we show that MultiPhen may lose power when the phenotypes are non-normal. To maintain the power, we propose two solutions (ATeMP-rn and ATeMP-or) to improve MultiPhen, and demonstrate their effectiveness through extensive simulation studies and a real case study from the Guangzhou Twin Eye Study.
Affiliation(s)
- Xiaobo Guo
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, GD 510275, China
- SYSU-CMU Shunde International Joint Research Institute, Shunde, GD 528300, China
- Southern China Research Center of Statistical Science, Sun Yat-Sen University, Guangzhou, GD 510275, China
- Yixi Li
- Peking University HSBC Business School, Shenzhen, GD 518055, China
- Xiaohu Ding
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, GD 510080, China
- Mingguang He
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, GD 510080, China
- Xueqin Wang
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, GD 510275, China
- SYSU-CMU Shunde International Joint Research Institute, Shunde, GD 528300, China
- Southern China Research Center of Statistical Science, Sun Yat-Sen University, Guangzhou, GD 510275, China
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, GD 510080, China
- Heping Zhang
- Department of Statistical Science, School of Mathematics & Computational Science, Sun Yat-Sen University, Guangzhou, GD 510275, China
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06520, United States of America
- Southern China Research Center of Statistical Science, Sun Yat-Sen University, Guangzhou, GD 510275, China
|
28
|
Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, Xiong M, Wu CO, Fan R. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 2015; 39:259-75. [PMID: 25809955 PMCID: PMC4443751 DOI: 10.1002/gepi.21895] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2014] [Revised: 01/28/2015] [Accepted: 01/28/2015] [Indexed: 10/23/2022]
Abstract
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case.
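For reference, the three multivariate statistics named in this abstract have standard textbook definitions. The block below states them in generic MANOVA notation, where $\mathbf{H}$ and $\mathbf{E}$ (symbols assumed here, not taken from the paper) denote the hypothesis and error sums-of-squares-and-cross-products matrices. It is only a sketch of the classical quantities and does not reproduce the paper's functional linear model or its F approximations.

$$
\Lambda_{\text{Wilks}} = \frac{\det(\mathbf{E})}{\det(\mathbf{H}+\mathbf{E})}, \qquad
V_{\text{Pillai}} = \operatorname{tr}\!\left[\mathbf{H}(\mathbf{H}+\mathbf{E})^{-1}\right], \qquad
U_{\text{HL}} = \operatorname{tr}\!\left(\mathbf{H}\mathbf{E}^{-1}\right)
$$

A small $\Lambda_{\text{Wilks}}$, or a large $V_{\text{Pillai}}$ or $U_{\text{HL}}$, is evidence against the null hypothesis of no association between the traits and the predictors.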
Affiliation(s)
- Yifan Wang
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Aiyi Liu
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- James L. Mills
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Michael Boehnke
- Department of Biostatistics, School of Public Health, The University of Michigan, Ann Arbor, Michigan, United States of America
- Alexander F. Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Joan E. Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Momiao Xiong
- Human Genetics Center, University of Texas - Houston, Houston, Texas, United States of America
- Colin O. Wu
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
|
29
|
Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet 2015; 96:21-36. [PMID: 25500260 DOI: 10.1016/j.ajhg.2014.11.011] [Citation(s) in RCA: 238] [Impact Index Per Article: 26.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2014] [Accepted: 11/17/2014] [Indexed: 12/14/2022] Open
Abstract
Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; $p < 5.0 \times 10^{-8}$) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence ($p < 5.0 \times 10^{-7}$) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes.
|
30
|
Wang W, Feng Z, Bull SB, Wang Z. A 2-step strategy for detecting pleiotropic effects on multiple longitudinal traits. Front Genet 2014; 5:357. [PMID: 25368629 PMCID: PMC4202779 DOI: 10.3389/fgene.2014.00357] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2014] [Accepted: 09/25/2014] [Indexed: 12/13/2022] Open
Abstract
Genetic pleiotropy refers to the situation in which a single gene influences multiple traits and so it is considered as a major factor that underlies genetic correlation among traits. To identify pleiotropy, an important focus in genome-wide association studies (GWAS) is on finding genetic variants that are simultaneously associated with multiple traits. On the other hand, longitudinal designs are often employed in many complex disease studies, such that, traits are measured repeatedly over time within the same subject. Performing genetic association analysis simultaneously on multiple longitudinal traits for detecting pleiotropic effects is interesting but challenging. In this paper, we propose a 2-step method for simultaneously testing the genetic association with multiple longitudinal traits. In the first step, a mixed effects model is used to analyze each longitudinal trait. We focus on estimation of the random effect that accounts for the subject-specific genetic contribution to the trait; fixed effects of other confounding covariates are also estimated. This first step enables separation of the genetic effect from other confounding effects for each subject and for each longitudinal trait. Then in the second step, we perform a simultaneous association test on multiple estimated random effects arising from multiple longitudinal traits. The proposed method can efficiently detect pleiotropic effects on multiple longitudinal traits and can flexibly handle traits of different data types such as quantitative, binary, or count data. We apply this method to analyze the 16th Genetic Analysis Workshop (GAW16) Framingham Heart Study (FHS) data. A simulation study is also conducted to validate this 2-step method and evaluate its performance.
Affiliation(s)
- Weiqiang Wang
- Department of Mathematics and Statistics, University of Guelph, Guelph, ON, Canada
- Zeny Feng
- Department of Mathematics and Statistics, University of Guelph, Guelph, ON, Canada
- Shelley B Bull
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Prosserman Centre for Health Research, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Zuoheng Wang
- Division of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
31
|
Zhang Y, Xu Z, Shen X, Pan W. Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 2014; 96:309-25. [PMID: 24704269 PMCID: PMC4043944 DOI: 10.1016/j.neuroimage.2014.03.061] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2013] [Revised: 02/14/2014] [Accepted: 03/23/2014] [Indexed: 11/17/2022] Open
Abstract
There is an increasing need to develop and apply powerful statistical tests to detect multiple traits-single locus associations, as arising from neuroimaging genetics and other studies. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI), in addition to genome-wide single nucleotide polymorphisms (SNPs), thousands of neuroimaging and neuropsychological phenotypes as intermediate phenotypes for Alzheimer's disease, have been collected. Although some classic methods like MANOVA and newly proposed methods may be applied, they have their own limitations. For example, MANOVA cannot be applied to binary and other discrete traits. In addition, the relationships among these methods are not well understood. Importantly, since these tests are not data adaptive, depending on the unknown association patterns among multiple traits and between multiple traits and a locus, these tests may or may not be powerful. In this paper we propose a class of data-adaptive weights and the corresponding weighted tests in the general framework of generalized estimation equations (GEE). A highly adaptive test is proposed to select the most powerful one from this class of the weighted tests so that it can maintain high power across a wide range of situations. Our proposed tests are applicable to various types of traits with or without covariates. Importantly, we also analytically show relationships among some existing and our proposed tests, indicating that many existing tests are special cases of our proposed tests. Extensive simulation studies were conducted to compare and contrast the power properties of various existing and our new methods. Finally, we applied the methods to an ADNI dataset to illustrate the performance of the methods. We conclude with the recommendation for the use of the GEE-based Score test and our proposed adaptive test for their high and complementary performance.
Affiliation(s)
- Yiwei Zhang
- Division of Biostatistics, School of Public Health, Minneapolis, MN 55455, USA
- Zhiyuan Xu
- Division of Biostatistics, School of Public Health, Minneapolis, MN 55455, USA
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, Minneapolis, MN 55455, USA
|
32
|
Hsieh TJ, Chang SH, Tai JJ. A family-based robust multivariate association test using maximum statistic. Ann Hum Genet 2014; 78:117-28. [PMID: 24571230 DOI: 10.1111/ahg.12054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2013] [Accepted: 12/18/2013] [Indexed: 11/29/2022]
Abstract
For characterizing the genetic mechanisms of complex diseases familial data with multiple correlated quantitative traits are usually collected in genetic studies. To analyze such data, various multivariate tests have been proposed to investigate the association between the underlying disease genes and the multiple traits. Although these multivariate association tests may have better power performance than the univariate association tests, they suffer from loss of testing power when the genetic models of the putative genes are misspecified. To address the problem, in this paper we aim to develop a family-based robust multivariate association test. We will first establish the optimal multivariate score tests for the recessive, additive, and dominant genetic models. Based on these optimal tests, a maximum-type robust multivariate association test is then obtained. Simulations are conducted to compare the power of our method with that of other existing multivariate methods. The results show that the robust multivariate test does manifest the robustness in power over all plausible genetic models. A practical data set is applied to demonstrate the applicability of our approach. The results suggest that the robust multivariate test is more powerful than the robust univariate test when dealing with multiple quantitative traits.
Affiliation(s)
- Tsung-Jen Hsieh
- Division of Biostatistics, College of Public Health, National Taiwan University, Taipei, Taiwan
|
33
|
Jiang Y, Li N, Zhang H. Identifying Genetic Variants for Addiction via Propensity Score Adjusted Generalized Kendall's Tau. J Am Stat Assoc 2014; 109:905-930. [PMID: 25382885 PMCID: PMC4219655 DOI: 10.1080/01621459.2014.901223] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2012] [Revised: 12/01/2013] [Indexed: 12/18/2022]
Abstract
Identifying replicable genetic variants for addiction has been extremely challenging. Besides the common difficulties with genome-wide association studies (GWAS), environmental factors are known to be critical to addiction, and comorbidity is widely observed. Despite the importance of environmental factors and comorbidity for addiction study, few GWAS analyses adequately considered them due to the limitations of the existing statistical methods. Although parametric methods have been developed to adjust for covariates in association analysis, difficulties arise when the traits are multivariate because there is no ready-to-use model for them. Recent nonparametric development includes U-statistics to measure the phenotype-genotype association weighted by a similarity score of covariates. However, it is not clear how to optimize the similarity score. Therefore, we propose a semiparametric method to measure the association adjusted by covariates. In our approach, the nonparametric U-statistic is adjusted by parametric estimates of propensity scores using the idea of inverse probability weighting. The new measurement is shown to be asymptotically unbiased under our null hypothesis while the previous non-weighted and weighted ones are not. Simulation results show that our test improves power as opposed to the non-weighted and two other weighted U-statistic methods, and it is particularly powerful for detecting gene-environment interactions. Finally, we apply our proposed test to the Study of Addiction: Genetics and Environment (SAGE) to identify genetic variants for addiction. Novel genetic variants are found from our analysis, which warrant further investigation in the future.
Affiliation(s)
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, Oregon 97331-4606
- Ni Li
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
|
34
|
Li Q, Hu J, Ding J, Zheng G. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations. Biostatistics 2013; 15:284-95. [PMID: 24174580 DOI: 10.1093/biostatistics/kxt045] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
A classical approach to combine independent test statistics is Fisher's combination of $p$-values, which follows the $\chi^2$ distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of mis-using the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy of the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.
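As a point of reference for the classical case this abstract starts from, the following is a minimal sketch of Fisher's combination for independent $p$-values only. The function name `fisher_combined_pvalue` is hypothetical, and the sketch does not implement the gamma-distribution corrections for dependent statistics that the paper studies. It uses the fact that $-2\sum_i \ln p_i$ follows a $\chi^2$ distribution with $2k$ degrees of freedom under the null, whose upper tail has a closed form when the degrees of freedom are even.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k independent p-values with Fisher's method.

    Returns (statistic, combined_p). Assumes independence; dependent
    statistics need a different reference distribution.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # Chi-square upper tail with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    stat, p = fisher_combined_pvalue([0.04, 0.10, 0.30])
    print(f"statistic={stat:.3f}, combined p={p:.4f}")
```

With a single input the combined value equals that input, which is a quick sanity check on the tail formula.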
Affiliation(s)
- Qizhai Li
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
|
35
|
Feng Z. A generalized quasi-likelihood scoring approach for simultaneously testing the genetic association of multiple traits. J R Stat Soc Ser C Appl Stat 2013. [DOI: 10.1111/rssc.12038] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
36
|
Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet 2013; 14:483-95. [PMID: 23752797 DOI: 10.1038/nrg3461] [Citation(s) in RCA: 682] [Impact Index Per Article: 62.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies have identified many variants that each affects multiple traits, particularly across autoimmune diseases, cancers and neuropsychiatric disorders, suggesting that pleiotropic effects on human complex traits may be widespread. However, systematic detection of such effects is challenging and requires new methodologies and frameworks for interpreting cross-phenotype results. In this Review, we discuss the evidence for pleiotropy in contemporary genetic mapping studies, new and established analytical approaches to identifying pleiotropic effects, sources of spurious cross-phenotype effects and study design considerations. We also outline the molecular and clinical implications of such findings and discuss future directions of research.
Affiliation(s)
- Nadia Solovieff
- Center for Human Genetics Research, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02114, USA
|
37
|
Zhu W, Zhang H. A nonparametric regression method for multiple longitudinal phenotypes using multivariate adaptive splines. Frontiers of Mathematics in China: Selected Papers from Chinese Universities 2013; 8:731-743. [PMID: 25309585 PMCID: PMC4193387 DOI: 10.1007/s11464-012-0256-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
In genetic studies of complex diseases, particularly mental illnesses, and behavior disorders, two distinct characteristics have emerged in some data sets. First, genetic data sets are collected with a large number of phenotypes that are potentially related to the complex disease under study. Second, each phenotype is collected from the same subject repeatedly over time. In this study, we present a nonparametric regression approach to study multivariate and time-repeated phenotypes together by using the technique of the multivariate adaptive regression splines for analysis of longitudinal data (MASAL), which makes it possible to identify genes, gene-gene and gene-environment, including time, interactions associated with the phenotypes of interest. Furthermore, we propose a permutation test to assess the associations between the phenotypes and selected markers. Through simulation, we demonstrate that our proposed approach has advantages over the existing methods that examine each longitudinal phenotype separately or analyze the summarized values of phenotypes by compressing them into one-time-point phenotypes. Application of the proposed method to the Framingham Heart Study illustrates that the use of multivariate longitudinal phenotypes enhanced the significance of the association test.
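The permutation idea mentioned in this abstract can be illustrated generically. The sketch below is not the MASAL-based procedure of the paper: it assumes a single marker, a single phenotype, and the absolute Pearson correlation as the test statistic, and the names `pearson_r` and `permutation_pvalue` are hypothetical. It shuffles the phenotype values to break any marker-phenotype link and counts how often the permuted statistic is at least as large as the observed one.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotype, phenotype, n_perm=999, seed=0):
    """Two-sided permutation p-value for |Pearson r| (illustrative only)."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = abs(pearson_r(genotype, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype pairing
        if abs(pearson_r(genotype, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    g = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    y = [0.1, -0.2, 0.3, 0.0, 1.1, 0.9, 1.2, 0.8, 2.1, 1.9, 2.2, 1.8]
    print(permutation_pvalue(g, y))
```

The add-one correction bounds the smallest reportable value at 1/(n_perm + 1), so more permutations are needed to resolve very small p-values.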
Affiliation(s)
- Wensheng Zhu
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA
- Heping Zhang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA
|
38
|
Li Q, Li Z, Zheng G, Gao G, Yu K. Rank-based robust tests for quantitative-trait genetic association studies. Genet Epidemiol 2013; 37:358-65. [PMID: 23526350 DOI: 10.1002/gepi.21723] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2012] [Revised: 02/18/2013] [Accepted: 02/20/2013] [Indexed: 11/06/2022]
Abstract
Standard linear regression is commonly used for genetic association studies of quantitative traits. This approach may not be appropriate if the trait, on its original or transformed scales, does not follow a normal distribution. A rank-based nonparametric approach that does not rely on any distributional assumptions can be an attractive alternative. Although several nonparametric tests exist in the literature, their performance in the genetic association setting is not well studied. We evaluate various nonparametric tests for the analysis of quantitative traits and propose a new class of nonparametric tests that have robust performance for traits with various distributions and under different genetic models. We demonstrate the advantage of our proposed methods through simulation study and real data applications.
Affiliation(s)
- Qizhai Li
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
|
39
|
Wu CO, Zheng G, Kwak M. A Joint Regression Analysis for Genetic Association Studies with Outcome Stratified Samples. Biometrics 2013; 69:417-26. [DOI: 10.1111/biom.12012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2011] [Revised: 10/01/2012] [Accepted: 11/01/2012] [Indexed: 11/30/2022]
Affiliation(s)
- Colin O. Wu
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland 20892, U.S.A.
- Gang Zheng
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland 20892, U.S.A.
- Minjung Kwak
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, Maryland 20892, U.S.A.
|
40
|
Liu Z, Guo X, Jiang Y, Zhang H. NCK2 is significantly associated with opiates addiction in African-origin men. ScientificWorldJournal 2013; 2013:748979. [PMID: 23533358 PMCID: PMC3603435 DOI: 10.1155/2013/748979] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2012] [Accepted: 01/18/2013] [Indexed: 11/17/2022] Open
Abstract
Substance dependence is a complex environmental and genetic disorder with significant social and medical concerns. Understanding the etiology of substance dependence is imperative to the development of effective treatment and prevention strategies. To this end, substantial effort has been made to identify genes underlying substance dependence, and in recent years, genome-wide association studies (GWASs) have led to discoveries of numerous genetic variants for complex diseases including substance dependence. Most of the GWAS discoveries were only based on single nucleotide polymorphisms (SNPs) and a single dichotomized outcome. By employing both SNP- and gene-based methods of analysis, we identified a strong (odds ratio = 13.87) and significant (P value = 1.33E-11) association of an SNP in the NCK2 gene on chromosome 2 with opiates addiction in African-origin men. Codependence analysis also identified a genome-wide significant association between NCK2 and comorbidity of substance dependence (P value = 3.65E-08) in African-origin men. Furthermore, we observed that the association between the NCK2 gene (P value = 3.12E-10) and opiates addiction reached the gene-based genome-wide significant level. In summary, our findings provided the first evidence for the involvement of NCK2 in the susceptibility to opiates addiction and further revealed the racial and gender specificities of its impact.
Affiliation(s)
- Zhifa Liu
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06520, USA
- Xiaobo Guo
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06520, USA
- Department of Statistical Science, School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, China
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR 97331, USA
- Heping Zhang
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06520, USA
|
41
|
Guo X, Liu Z, Wang X, Zhang H. Genetic association test for multiple traits at gene level. Genet Epidemiol 2013; 37:122-9. [PMID: 23032486 PMCID: PMC3524409 DOI: 10.1002/gepi.21688] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2012] [Revised: 08/21/2012] [Accepted: 09/07/2012] [Indexed: 01/09/2023]
Abstract
Genome-wide association studies (GWASs) at the gene level are commonly used to understand biological mechanisms underlying complex diseases. In general, one response or outcome is used to present a disease of interest in such studies. In this study, we consider a multiple traits association test from the gene level. We propose and examine a class of test statistics that summarizes the association information between single nucleotide polymorphisms (SNPs) and each of the traits. Our simulation studies demonstrate the advantage of gene-based multiple traits association tests when multiple traits share common genes. Using our proposed tests, we reanalyze the dataset from the Study of Addiction: Genetics and Environment (SAGE). Our result validates previous findings while presenting stronger evidence for consideration of multiple traits.
Affiliation(s)
- Xiaobo Guo
- Department of Biostatistics, Yale University School of Medicine, New Haven, CT, USA
- Department of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China
- Zhifa Liu
- Department of Biostatistics, Yale University School of Medicine, New Haven, CT, USA
- Xueqin Wang
- Department of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Heping Zhang
- Department of Biostatistics, Yale University School of Medicine, New Haven, CT, USA
|
42
|
Maity A, Sullivan PF, Tzeng JY. Multivariate phenotype association analysis by marker-set kernel machine regression. Genet Epidemiol 2012; 36:686-95. [PMID: 22899176 DOI: 10.1002/gepi.21663] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Received: 04/01/2012] [Revised: 05/23/2012] [Accepted: 06/18/2012] [Indexed: 11/06/2022]
Abstract
Genetic studies of complex diseases often collect multiple phenotypes relevant to the disorders. As these phenotypes can be correlated and share common genetic mechanisms, jointly analyzing these traits may bring more power to detect genes influencing individual or multiple phenotypes. Given the advancement brought by the multivariate phenotype approaches and the multimarker kernel machine regression, we construct a multivariate regression based on kernel machine to facilitate the joint evaluation of multimarker effects on multiple phenotypes. The kernel machine serves as a powerful dimension-reduction tool to capture complex effects among markers. The multivariate framework incorporates the potentially correlated multidimensional phenotypic information and accommodates common or different environmental covariates for each trait. We derive the multivariate kernel machine test based on a score-like statistic, and conduct simulations to evaluate the validity and efficacy of the method. We also study the performance of the commonly adapted strategies for kernel machine analysis on multiple phenotypes, including the multiple univariate kernel machine tests with original phenotypes or with their principal components. Our results suggest that none of these approaches has the uniformly best power, and the optimal test depends on the magnitude of the phenotype correlation and the effect patterns. However, the multivariate test remains a reasonable approach when the multiple phenotypes have no or mild correlations, and gives the best power once the correlation becomes stronger or when there exist genes that affect more than one phenotype. We illustrate the utility of the multivariate kernel machine method through the Clinical Antipsychotic Trials of Intervention Effectiveness antibody study.
Affiliation(s)
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, USA
|
43
|
Abstract
Kendall's τ is a non-parametric measure of correlation based on ranks and is used in a wide range of research disciplines. Although methods are available for making inference about Kendall's τ, none has been extended to modeling multiple Kendall's τs arising in longitudinal data analysis. Compounding this problem is the pervasive issue of missing data in such study designs. In this paper, we develop a novel approach to provide inference about Kendall's τ within a longitudinal study setting under both complete and missing data. The proposed approach is illustrated with simulated data and applied to an HIV prevention study.
Affiliation(s)
- Yan Ma
- Hospital for Special Surgery, Department of Public Health, Weill Medical College of Cornell University, New York, NY 10021
|
44
|
Zheng G, Wu CO, Kwak M, Jiang W, Joo J, Lima JAC. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling. Genet Epidemiol 2012; 36:263-73. [PMID: 22460626 DOI: 10.1002/gepi.21619] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 10/12/2011] [Revised: 12/23/2011] [Accepted: 01/02/2012] [Indexed: 11/07/2022]
Abstract
We study the analysis of a joint association between a genetic marker with both binary (case-control) and quantitative (continuous) traits, where the quantitative trait values are only available for the cases due to data sharing and outcome-dependent sampling. Data sharing becomes common in genetic association studies, and the outcome-dependent sampling is the consequence of data sharing, under which a phenotype of interest is not measured for some subgroup. The trend test (or Pearson's test) and F-test are often, respectively, used to analyze the binary and quantitative traits. Because of the outcome-dependent sampling, the usual F-test can be applied using the subgroup with the observed quantitative traits. We propose a modified F-test by also incorporating the genotype frequencies of the subgroup whose traits are not observed. Further, a combination of this modified F-test and Pearson's test is proposed by Fisher's combination of their P-values as a joint analysis. Because of the correlation of the two analyses, we propose to use a Gamma (scaled chi-squared) distribution to fit the asymptotic null distribution for the joint analysis. The proposed modified F-test and the joint analysis can also be applied to test single trait association (either binary or quantitative trait). Through simulations, we identify the situations under which the proposed tests are more powerful than the existing ones. Application to a real dataset of rheumatoid arthritis is presented.
Affiliation(s)
- Gang Zheng
- National Heart, Lung and Blood Institute, 6701 Rockledge Drive, Bethesda, MD 20892, USA.
|
45
|
Shriner D. Moving toward System Genetics through Multiple Trait Analysis in Genome-Wide Association Studies. Front Genet 2012; 3:1. [PMID: 22303408 PMCID: PMC3266611 DOI: 10.3389/fgene.2012.00001] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 11/04/2011] [Accepted: 01/01/2012] [Indexed: 02/05/2023] Open
Abstract
Association studies are a staple of genotype–phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.
Affiliation(s)
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
|
46
|
Zhu W, Jiang Y, Zhang H. Nonparametric Covariate-Adjusted Association Tests Based on the Generalized Kendall's Tau. J Am Stat Assoc 2012; 107:1-11. [PMID: 22745516 PMCID: PMC3381868 DOI: 10.1080/01621459.2011.643707] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 02/04/2023]
Abstract
Identifying the risk factors for comorbidity is important in psychiatric research. Empirically, studies have shown that testing multiple, correlated traits simultaneously is more powerful than testing a single trait at a time in association analysis. Furthermore, for complex diseases, especially mental illnesses and behavioral disorders, the traits are often recorded in different scales such as dichotomous, ordinal and quantitative. In the absence of covariates, nonparametric association tests have been developed for multiple complex traits to study comorbidity. However, genetic studies generally contain measurements of some covariates that may affect the relationship between the risk factors of major interest (such as genes) and the outcomes. While it is relatively easy to adjust these covariates in a parametric model for quantitative traits, it is challenging for multiple complex traits with possibly different scales. In this article, we propose a nonparametric test for multiple complex traits that can adjust for covariate effects. The test aims to achieve an optimal scheme of adjustment by using a maximum statistic calculated from multiple adjusted test statistics. We derive the asymptotic null distribution of the maximum test statistic, and also propose a resampling approach, both of which can be used to assess the significance of our test. Simulations are conducted to compare the type I error and power of the nonparametric adjusted test to the unadjusted test and other existing adjusted tests. The empirical results suggest that our proposed test increases the power through adjustment for covariates when there exist environmental effects, and is more robust to model misspecifications than some existing parametric adjusted tests. We further demonstrate the advantage of our test by analyzing a data set on genetics of alcoholism.
|
47
|
Jiang Y, Zhang H. Propensity score-based nonparametric test revealing genetic variants underlying bipolar disorder. Genet Epidemiol 2011; 35:125-32. [PMID: 21254220 DOI: 10.1002/gepi.20558] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
Abstract
Association analysis has led to the identification of many genetic variants for complex diseases. While assessing the association between genes and a disease, other factors can play an important role. The consequence of not considering covariates (such as population stratification and environmental factors) is well-documented in genetic studies. We introduce a nonparametric test of association that adjusts for covariate effects. Specifically, the adjustment is realized through weights that are constructed from genomic propensity scores that summarize the contribution of all covariates. The benefit of our test is demonstrated through an important data set on bipolar disorder (BD) collected by the Wellcome Trust Case Control Consortium. When compared to other tests, our test identified an unreported region with three single nucleotide polymorphisms (SNPs) on chromosome 16 that show strong evidence of association (P-value < 5 × 10⁻⁷). This region is near the RPGRIP1L gene known to be associated with BD. A haplotype block including these three SNPs was further discovered to be strongly associated with BD. It is also interesting to note that our nonparametric test did not reveal strong signals at two SNPs that were detected by a covariate-adjusted parametric test. This suggests that different methods of covariate adjustment can complement each other. Thus, we recommend using both parametric and nonparametric testing. Additionally, we performed simulation studies to compare our proposed test with the unadjusted test and an adjusted parametric test. Our finding underscores the importance of accommodating and controlling for covariate effects in discovering genetic variants associated with complex disorders.
Affiliation(s)
- Yuan Jiang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut 06520, USA
|
48
|
Abstract
Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among others, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs including twin studies, family studies, linkage analysis, and more recently, genomewide association studies have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.
Affiliation(s)
- Heping Zhang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034
|