151
Abstract
Fourteen years ago, the first article on molecular genetics was published in this journal: Child Development, Molecular Genetics, and What to Do With Genes Once They Are Found (R. Plomin & M. Rutter, 1998). The goal of the article was to outline what developmentalists can do with genes once they are found. These new directions for developmental research are still relevant today. The problem lies with the phrase “once they are found”: It has been much more difficult than expected to identify genes responsible for the heritability of complex traits and common disorders, the so-called missing heritability problem. The present article considers reasons for the missing heritability problem and possible solutions.
152
Xiong Q, Ancona N, Hauser ER, Mukherjee S, Furey TS. Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets. Genome Res 2012; 22:386-97. [PMID: 21940837 PMCID: PMC3266045 DOI: 10.1101/gr.124370.111] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 04/13/2011] [Accepted: 09/19/2011] [Indexed: 12/11/2022]
Abstract
Single variant or single gene analyses generally account for only a small proportion of the phenotypic variation in complex traits. Alternatively, gene set or pathway association analyses are playing an increasingly important role in uncovering genetic architectures of complex traits through the identification of systematic genetic interactions. Two dominant paradigms for gene set analyses are association analyses based on SNP genotypes and those based on gene expression profiles. However, gene-disease association can manifest in many ways, such as alterations of gene expression, genotype, and copy number; thus, an integrative approach combining multiple forms of evidence can more accurately and comprehensively capture pathway associations. We have developed a single statistical framework, Gene Set Association Analysis (GSAA), that simultaneously measures genome-wide patterns of genetic variation and gene expression variation to identify sets of genes enriched for differential expression and/or trait-associated genetic markers. Simulation studies illustrate that joint analyses of genomic data increase the power to detect real associations when compared with gene set methods that use only one genomic data type. The analysis of two human diseases, glioblastoma and Crohn's disease, detected abnormalities in previously identified disease-associated pathways, such as pathways related to PI3K signaling, DNA damage response, and the activation of NFKB. In addition, GSAA predicted novel pathway associations, for example, differential genetic and expression characteristics in genes from the ABC transporter family in glioblastoma and from the HLA system in Crohn's disease. These demonstrate that GSAA can help uncover biological pathways underlying human diseases and complex traits.
Affiliation(s)
- Qing Xiong
- Department of Genetics, Department of Biology, Lineberger Comprehensive Cancer Center, and Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Nicola Ancona
- Institute of Intelligent Systems for Automation, National Research Council, Bari IT 70126, Italy
- Elizabeth R. Hauser
- Center for Human Genetics and Section of Medical Genetics, Department of Medicine, Duke University, Durham, North Carolina 27710, USA
- Sayan Mukherjee
- Departments of Statistical Science, Computer Science, and Mathematics, Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA
- Terrence S. Furey
- Department of Genetics, Department of Biology, Lineberger Comprehensive Cancer Center, and Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
153
Casals F, Idaghdour Y, Hussin J, Awadalla P. Next-generation sequencing approaches for genetic mapping of complex diseases. J Neuroimmunol 2012; 248:10-22. [PMID: 22285396 DOI: 10.1016/j.jneuroim.2011.12.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 09/30/2011] [Revised: 11/30/2011] [Accepted: 12/15/2011] [Indexed: 01/12/2023]
Abstract
The advent of next generation sequencing technologies has opened new possibilities in the analysis of human disease. In this review we present the main next-generation sequencing technologies, with their major contributions and possible applications to the study of the genetic etiology of complex diseases.
Affiliation(s)
- Ferran Casals
- Centre de Recherche du Centre Hospitalier Universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada.
154
Daye ZJ, Li H, Wei Z. A powerful test for multiple rare variants association studies that incorporates sequencing qualities. Nucleic Acids Res 2012; 40:e60. [PMID: 22262732 PMCID: PMC3340416 DOI: 10.1093/nar/gks024] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022] Open
Abstract
Next-generation sequencing data will soon become routinely available for association studies between complex traits and rare variants. Sequencing data, however, are characterized by the presence of sequencing errors at each individual genotype. This makes it especially challenging to perform association studies of rare variants, which, due to their low minor allele frequencies, can be easily perturbed by genotype errors. In this article, we develop the quality-weighted multivariate score association test (qMSAT), a new procedure that allows powerful association tests between complex traits and multiple rare variants in the presence of sequencing errors. Simulation results based on quality scores from real data show that the qMSAT often outperforms current methods that do not use quality information. In particular, the qMSAT can dramatically increase power over existing methods under moderate sample sizes and relatively low coverage. Moreover, in an obesity data study, using the qMSAT, we identified two functional regions (MGLL promoter and MGLL 3′-untranslated region) where rare variants are associated with extreme obesity. Due to the high cost of sequencing data, the qMSAT is especially valuable for large-scale studies involving rare variants, as it can potentially increase power without additional experimental cost. qMSAT is freely available at http://qmsat.sourceforge.net/.
Affiliation(s)
- Z John Daye
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
155
Abstract
Monogenic autoimmune syndromes provide a rare yet powerful glimpse into the fundamental mechanisms of immunologic tolerance. Such syndromes reveal not only the contribution of an individual breakpoint in tolerance but also patterns in the pathogenesis of autoimmunity. Disturbances in innate immunity, a system built for ubiquitous sensing of danger signals, tend to generate systemic autoimmunity. For example, defects in the clearance of self-antigens and chronic stimulation of type 1 interferons lead to the systemic autoimmunity seen in C1q deficiency, SPENCDI, and AGS. In contrast, disturbances of adaptive immunity, which is built for antigen specificity, tend to produce organ-specific autoimmunity. Thus, the loss of lymphocyte homeostasis, whether through defects in apoptosis, suppression, or negative selection, leads to organ-specific autoimmunity in ALPS, IPEX, and APS1. We discuss the unique mechanisms of disease in these prominent syndromes as well as how they contribute to the spectrum of organ-specific or systemic autoimmunity. The continued study of rare variants in autoimmune disease will inform future investigations and treatments directed at rare and common autoimmune diseases alike.
Affiliation(s)
- Mickie H. Cheng
- Diabetes Center; Department of Medicine, Division of Endocrinology and Metabolism, University of California at San Francisco, San Francisco, California 94143
- Mark S. Anderson
- Diabetes Center; Department of Medicine, Division of Endocrinology and Metabolism, University of California at San Francisco, San Francisco, California 94143
156
Tang W, Fu YP, Figueroa JD, Malats N, Garcia-Closas M, Chatterjee N, Kogevinas M, Baris D, Thun M, Hall JL, De Vivo I, Albanes D, Porter-Gill P, Purdue MP, Burdett L, Liu L, Hutchinson A, Myers T, Tardón A, Serra C, Carrato A, Garcia-Closas R, Lloreta J, Johnson A, Schwenn M, Karagas MR, Schned A, Black A, Jacobs EJ, Diver WR, Gapstur SM, Virtamo J, Hunter DJ, Fraumeni JF, Chanock SJ, Silverman DT, Rothman N, Prokunina-Olsson L. Mapping of the UGT1A locus identifies an uncommon coding variant that affects mRNA expression and protects from bladder cancer. Hum Mol Genet 2012; 21:1918-30. [PMID: 22228101 DOI: 10.1093/hmg/ddr619] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 02/05/2023] Open
Abstract
A recent genome-wide association study of bladder cancer identified the UGT1A gene cluster on chromosome 2q37.1 as a novel susceptibility locus. The UGT1A cluster encodes a family of UDP-glucuronosyltransferases (UGTs), which facilitate cellular detoxification and removal of aromatic amines. Bioactivated forms of aromatic amines found in tobacco smoke and industrial chemicals are the main risk factors for bladder cancer. The association within the UGT1A locus was detected by a single nucleotide polymorphism (SNP) rs11892031. Here, we performed detailed resequencing, imputation and genotyping in this region. We clarified the original genetic association detected by rs11892031 and identified an uncommon SNP rs17863783 that explained and strengthened the association in this region (allele frequency 0.014 in 4035 cases and 0.025 in 5284 controls, OR = 0.55, 95% CI = 0.44-0.69, P = 3.3 × 10^-7). Rs17863783 is a synonymous coding variant Val209Val within the functional UGT1A6.1 splicing form, strongly expressed in the liver, kidney and bladder. We found the protective T allele of rs17863783 to be associated with increased mRNA expression of UGT1A6.1 in in-vitro exontrap assays and in human liver tissue samples. We suggest that rs17863783 may protect from bladder cancer by increasing the removal of carcinogens from bladder epithelium by the UGT1A6.1 protein. Our study provides an example of the genetic and functional role of an uncommon protective genetic variant in a complex human disease, such as bladder cancer.
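As a rough check on the figures quoted in this abstract, a crude allelic odds ratio can be computed directly from the two reported allele frequencies. The sketch below is only illustrative: the function name `allele_odds_ratio` is hypothetical, and the published OR may come from an adjusted model rather than from this unadjusted calculation.

```python
def allele_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Crude (unadjusted) allelic odds ratio from allele frequencies."""
    odds_cases = freq_cases / (1.0 - freq_cases)          # odds of carrying the allele in cases
    odds_controls = freq_controls / (1.0 - freq_controls)  # odds of carrying the allele in controls
    return odds_cases / odds_controls


# Allele frequencies for rs17863783 as reported in the abstract.
crude_or = allele_odds_ratio(0.014, 0.025)
print(round(crude_or, 2))  # prints 0.55
```

The crude value agrees with the reported OR of 0.55 to two decimal places; the confidence interval and P value cannot be recovered from frequencies alone.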
Affiliation(s)
- Wei Tang
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
157
Besenbacher S, Mailund T, Schierup MH. Association mapping and disease: evolutionary perspectives. Methods Mol Biol 2012; 856:275-91. [PMID: 22399463 DOI: 10.1007/978-1-61779-585-5_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
Abstract
In this chapter, we give a short introduction to the genetics of complex disease with special emphasis on evolutionary models for disease genes and the effect of different models on the genetic architecture, and finally give a survey of the state-of-the-art of genome-wide association studies.
158
Abstract
The limitations of genome-wide association (GWA) studies that are based on the common disease common variants (CDCV) hypothesis have motivated geneticists to test the hypothesis that rare variants contribute to the variation of common diseases, i.e., common disease/rare variants (CDRV). The newly developed high-throughput sequencing technologies have made the studies of rare variants practicable. Statistical approaches to test associations between a phenotype and rare variants are quickly developing. The central idea of these methods is to test a set of rare variants in a defined region or regions by collapsing or aggregating rare variants, thereby improving the statistical power. In this chapter, we introduce these methods as well as their applications in practice.
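The collapsing idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any specific published test: each individual is reduced to a carrier indicator for the region, and carrier counts in cases and controls are compared with a one-sided Fisher exact test. The function names and the toy genotypes are hypothetical.

```python
from math import comb


def collapse(genotypes):
    """Return 1 for each individual carrying at least one rare allele in the region, else 0."""
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    a, b: carriers and non-carriers among cases; c, d: the same among controls.
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    total = comb(n, n_carriers)
    tail = sum(comb(n_cases, k) * comb(n - n_cases, n_carriers - k)
               for k in range(a, upper + 1))
    return tail / total


# Toy data (made up): rows are individuals, columns are rare variant sites.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

case_carriers = sum(collapse(cases))
control_carriers = sum(collapse(controls))
p = fisher_one_sided(case_carriers, len(cases) - case_carriers,
                     control_carriers, len(controls) - control_carriers)
print(round(p, 4))  # prints 0.2429
```

Collapsing trades per-variant resolution for a single, better-powered comparison per region, which is the motivation the abstract gives for this family of methods.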
Affiliation(s)
- Tao Feng
- Department of Epidemiology and Biostatistics, Case Western Reserve University School of Medicine, Cleveland, OH, USA.
159
Tazearslan C, Cho M, Suh Y. Discovery of functional gene variants associated with human longevity: opportunities and challenges. J Gerontol A Biol Sci Med Sci 2011; 67:376-83. [PMID: 22156437 DOI: 10.1093/gerona/glr200] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
Abstract
Age is a major risk factor for many human diseases. Extremely long-lived individuals, such as centenarians, have managed to ward off age-related diseases and serve as human models to search for the genetic factors that influence longevity. The discovery of evolutionarily conserved pathways with major impact on life span in animal models has provided tantalizing opportunities to test the relevance of these pathways for human longevity. Here we specifically focus on the insulin/insulin-like growth factor-1 signaling as a prime candidate pathway. Coupled with the rapid advances in ultra high-throughput sequencing technologies, it is now feasible to comprehensively analyze all possible sequence variants in candidate genes segregating with a longevity phenotype and to investigate the functional consequences of the associated variants. A better understanding of the functional genes that affect healthy longevity in humans may lead to a rational basis for intervention strategies that can delay or prevent age-related diseases.
Affiliation(s)
- Cagdas Tazearslan
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
160
Yi N, Liu N, Zhi D, Li J. Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects. PLoS Genet 2011; 7:e1002382. [PMID: 22144906 PMCID: PMC3228815 DOI: 10.1371/journal.pgen.1002382] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 06/23/2011] [Accepted: 09/29/2011] [Indexed: 12/19/2022] Open
Abstract
Complex diseases and traits are likely influenced by many common and rare genetic variants and environmental factors. Detecting disease susceptibility variants is a challenging task, especially when their frequencies are low and/or their effects are small or moderate. We propose here a comprehensive hierarchical generalized linear model framework for simultaneously analyzing multiple groups of rare and common variants and relevant covariates. The proposed hierarchical generalized linear models introduce a group effect and a genetic score (i.e., a linear combination of main-effect predictors for genetic variants) for each group of variants, and jointly they estimate the group effects and the weights of the genetic scores. This framework includes various previous methods as special cases, and it can effectively deal with both risk and protective variants in a group and can simultaneously estimate the cumulative contribution of multiple variants and their relative importance. Our computational strategy is based on extending the standard procedure for fitting generalized linear models in the statistical software R to the proposed hierarchical models, leading to the development of stable and flexible tools. The methods are illustrated with sequence data in gene ANGPTL4 from the Dallas Heart Study. The performance of the proposed procedures is further assessed via simulation studies. The methods are implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
Affiliation(s)
- Nengjun Yi
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA.
161
Abstract
A number of studies have been conducted to investigate the predictive value of common genetic variants for complex diseases. To date, these studies have generally shown that common variants have no appreciable added predictive value over classical risk factors. New sequencing technology has enhanced the ability to identify rare variants that may have larger functional effects than common variants. One would expect rare variants to improve the discrimination power for disease risk by permitting more detailed quantification of genetic risk. Using the Genetic Analysis Workshop 17 simulated data sets for unrelated individuals, we evaluate the predictive value of rare variants by comparing prediction models built using the support vector machine algorithm with or without rare variants. Empirical results suggest that rare variants have appreciable effects on disease risk prediction.
Affiliation(s)
- Chengqing Wu
- Department of Epidemiology and Public Health, Yale University, 60 College Street, New Haven, CT 06510, USA.
162
Bueno Filho JS, Morota G, Tran Q, Maenner MJ, Vera-Cala LM, Engelman CD, Meyers KJ. Analysis of human mini-exome sequencing data from Genetic Analysis Workshop 17 using a Bayesian hierarchical mixture model. BMC Proc 2011; 5 Suppl 9:S93. [PMID: 22373180 PMCID: PMC3287935 DOI: 10.1186/1753-6561-5-s9-s93] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022] Open
Abstract
Next-generation sequencing technologies are rapidly changing the field of genetic epidemiology and enabling exploration of the full allele frequency spectrum underlying complex diseases. Although sequencing technologies have shifted our focus toward rare genetic variants, statistical methods traditionally used in genetic association studies are inadequate for estimating effects of low minor allele frequency variants. For our study we use the Genetic Analysis Workshop 17 data from 697 unrelated individuals (genotypes for 24,487 autosomal variants from 3,205 genes). We apply a Bayesian hierarchical mixture model to identify genes associated with a simulated binary phenotype using a transformed genotype design matrix weighted by allele frequencies. A Metropolis-Hastings algorithm is used to jointly sample each indicator variable and additive genetic effect pair from its conditional posterior distribution, and remaining parameters are sampled by Gibbs sampling. This method identified 58 genes with a posterior probability greater than 0.8 for being associated with the phenotype. One of these 58 genes, PIK3C2B, was correctly identified as being associated with affected status based on the simulation process. This project demonstrates the utility of Bayesian hierarchical mixture models using a transformed genotype matrix to detect genes containing rare and common variants associated with a binary phenotype.
Affiliation(s)
- Julio S Bueno Filho
- Department of Dairy Science, University of Wisconsin-Madison, 444 Animal Science Building, 1675 Observatory Drive, Madison, WI 53706-1284, USA; Departamento de Ciências Exatas, Universidade Federal de Lavras, PO Box 3037, Lavras, MG 37200-000, Brazil
- Gota Morota
- Department of Dairy Science, University of Wisconsin-Madison, 444 Animal Science Building, 1675 Observatory Drive, Madison, WI 53706-1284, USA
- Quoc Tran
- Department of Statistics, University of Wisconsin-Madison, 1300 University Avenue, Madison, WI 53706, USA
- Matthew J Maenner
- Department of Population Health Sciences, University of Wisconsin-Madison, 707 WARF Building, 610 North Walnut Street, Madison, WI 53726, USA
- Lina M Vera-Cala
- Department of Population Health Sciences, University of Wisconsin-Madison, 707 WARF Building, 610 North Walnut Street, Madison, WI 53726, USA; Departamento de Salud Pública, Universidad Industrial de Santander, Carrera 32 #29-31 Piso 3, Bucaramanga, Santander 680002, Colombia
- Corinne D Engelman
- Department of Population Health Sciences, University of Wisconsin-Madison, 707 WARF Building, 610 North Walnut Street, Madison, WI 53726, USA
- Kristin J Meyers
- Department of Population Health Sciences, University of Wisconsin-Madison, 707 WARF Building, 610 North Walnut Street, Madison, WI 53726, USA
163
Chen W, Gao X, Wang J, Sun C, Wan W, Zhi D, Liu N, Chen X, Gao G. Evaluation of association tests for rare variants using simulated data sets in the Genetic Analysis Workshop 17 data. BMC Proc 2011; 5 Suppl 9:S86. [PMID: 22373475 PMCID: PMC3287927 DOI: 10.1186/1753-6561-5-s9-s86] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
Abstract
We evaluate four association tests for rare variants—the combined multivariate and collapsing (CMC) method, two weighted-sum methods, and a variable threshold method—by applying them to the simulated data sets of unrelated individuals in the Genetic Analysis Workshop 17 (GAW17) data. The family-wise error rate (FWER) and average power are used as criteria for evaluation. Our results show that when all nonsynonymous SNPs (rare variants and common variants) in a gene are jointly analyzed, the CMC method fails to control the FWER; when only rare variants (single-nucleotide polymorphisms with minor allele frequency less than 0.05) are analyzed, all four methods can control FWER well. All four methods have comparable power, which is low for the analysis of the GAW17 data sets. Three of the methods (not including the CMC method) involve estimation of p-values using permutation procedures that either can be computationally intensive or generate inflated FWERs. We adapt a fast permutation procedure into these three methods. The results show that using the fast permutation procedure can produce FWERs and average powers close to the values obtained from the standard permutation procedure on the GAW17 data sets. The standard permutation procedure is computationally intensive.
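The standard permutation procedure mentioned in this abstract can be sketched generically as follows. This is an assumption-laden illustration, not the fast procedure the authors adapt: the test statistic (difference in mean rare-allele count between cases and controls), the function names, and the toy data are all hypothetical.

```python
import random


def burden_stat(burden, is_case):
    """Difference in mean rare-allele count between cases and controls."""
    cases = [b for b, y in zip(burden, is_case) if y]
    controls = [b for b, y in zip(burden, is_case) if not y]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(burden, is_case, n_perm=2000, seed=0):
    """One-sided empirical p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = burden_stat(burden, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if burden_stat(burden, labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): per-individual rare-allele counts in one gene.
burden = [3, 2, 3, 2, 2, 3, 0, 0, 1, 0, 0, 0]
is_case = [True] * 6 + [False] * 6
p_value = permutation_p_value(burden, is_case)
print(p_value)
```

Because every gene needs its own set of label shuffles, the cost grows with the number of genes times `n_perm`, which is why the abstract describes the standard procedure as computationally intensive.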
Affiliation(s)
- Wenan Chen
- Department of Biostatistics, Virginia Commonwealth University School of Medicine, 830 East Main Street, One Capitol Square, 7th Floor, Richmond, VA 23298-0032, USA
- Xi Gao
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, PO Box 843019, Richmond, VA 23284-3019, USA
- Jiexun Wang
- Department of Biostatistics, Virginia Commonwealth University School of Medicine, 830 East Main Street, One Capitol Square, 7th Floor, Richmond, VA 23298-0032, USA
- Chuanyu Sun
- Department of Biostatistics, Virginia Commonwealth University School of Medicine, 830 East Main Street, One Capitol Square, 7th Floor, Richmond, VA 23298-0032, USA
- Wen Wan
- Department of Biostatistics, Virginia Commonwealth University School of Medicine, 830 East Main Street, One Capitol Square, 7th Floor, Richmond, VA 23298-0032, USA
- Degui Zhi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, 1665 University Boulevard, Birmingham, AL 35294, USA
- Nianjun Liu
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, 1665 University Boulevard, Birmingham, AL 35294, USA
- Xiangning Chen
- Department of Psychiatry, Virginia Commonwealth University School of Medicine, Richmond, VA 23298-0003, USA
- Guimin Gao
- Department of Biostatistics, Virginia Commonwealth University School of Medicine, 830 East Main Street, One Capitol Square, 7th Floor, Richmond, VA 23298-0032, USA
164
Wei W, Visweswaran S, Cooper GF. The application of naive Bayes model averaging to predict Alzheimer's disease from genome-wide data. J Am Med Inform Assoc 2011; 18:370-5. [PMID: 21672907 DOI: 10.1136/amiajnl-2011-000101] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE: Predicting patient outcomes from genome-wide measurements holds significant promise for improving clinical care. The large number of measurements (eg, single nucleotide polymorphisms (SNPs)), however, makes this task computationally challenging. This paper evaluates the performance of an algorithm that predicts patient outcomes from genome-wide data by efficiently model averaging over an exponential number of naive Bayes (NB) models.
DESIGN: This model-averaged naive Bayes (MANB) method was applied to predict late onset Alzheimer's disease in 1411 individuals who each had 312,318 SNP measurements available as genome-wide predictive features. Its performance was compared to that of a naive Bayes algorithm without feature selection (NB) and with feature selection (FSNB).
MEASUREMENT: Performance of each algorithm was measured in terms of area under the ROC curve (AUC), calibration, and run time.
RESULTS: The training time of MANB (16.1 s) was fast like NB (15.6 s), while FSNB (1684.2 s) was considerably slower. Each of the three algorithms required less than 0.1 s to predict the outcome of a test case. MANB had an AUC of 0.72, which is significantly better than the AUC of 0.59 by NB (p<0.00001), but not significantly different from the AUC of 0.71 by FSNB. MANB was better calibrated than NB, and FSNB was even better in calibration. A limitation was that only one dataset and two comparison algorithms were included in this study.
CONCLUSION: MANB performed comparatively well in predicting a clinical outcome from a high-dimensional genome-wide dataset. These results provide support for including MANB in the methods used to predict outcomes from large, genome-wide datasets.
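The AUC used as the main performance measure in this abstract has a simple pairwise interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it that way. The function name `auc` and the toy scores are hypothetical, and this is not the MANB algorithm itself, only the evaluation metric.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities (made up) with true labels: 1 = case, 0 = control.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(example_auc)  # prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.59, 0.71, and 0.72.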
Affiliation(s)
- Wei Wei
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
165
Refaat MM, Lubitz SA, Makino S, Islam Z, Frangiskakis JM, Mehdi H, Gutmann R, Zhang ML, Bloom HL, MacRae CA, Dudley SC, Shalaby AA, Weiss R, McNamara DM, London B, Ellinor PT. Genetic variation in the alternative splicing regulator RBM20 is associated with dilated cardiomyopathy. Heart Rhythm 2011; 9:390-6. [PMID: 22004663 DOI: 10.1016/j.hrthm.2011.10.016] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Received: 08/18/2011] [Accepted: 10/10/2011] [Indexed: 01/12/2023]
Abstract
BACKGROUND: Dilated cardiomyopathy (DCM) is a leading cause of heart failure and death. The etiology of DCM is genetically heterogeneous.
OBJECTIVES: We sought to define the prevalence of mutations in the RNA splicing protein RBM20 in a large cohort with DCM and to determine whether genetic variation in RBM20 is associated with clinical outcomes.
METHODS: Subjects included in the Genetic Risk Assessment of Defibrillator Events (GRADE) study were aged at least 18 years, had an ejection fraction of ≤30%, and an implantable cardioverter-defibrillator (ICD). The coding region and splice junctions of RBM20 were screened in subjects with DCM; 2 common polymorphisms in RBM20, rs942077 and rs35141404, were genotyped in all GRADE subjects.
RESULTS: A total of 1465 subjects were enrolled in the GRADE study, and 283 with DCM were screened for RBM20 mutations. The mean age of subjects with DCM was 58 ± 13 years, 64% were males, and the mean follow-up time was 24.2 ± 17.1 months after ICD placement. RBM20 mutations were identified in 8 subjects with DCM (2.8%). Mutation carriers had a similar survival, transplantation rate, and frequency of ICD therapy compared with nonmutation carriers. Three of 8 subjects with RBM20 mutations (37.5%) had atrial fibrillation (AF), whereas 19 subjects without mutations (7.4%) had AF (P = .02). Among all GRADE subjects, rs35141404 was associated with AF (minor allele odds ratio = 0.62; 95% confidence interval = 0.44-0.86; P = .006). In the subset of GRADE subjects with DCM, rs35141404 was associated with AF (minor allele odds ratio = 0.58; P = .047).
CONCLUSIONS: Mutations in RBM20 were observed in approximately 3% of subjects with DCM. There were no differences in survival, transplantation rate, and frequency of ICD therapy in mutation carriers.
Affiliation(s)
- Marwan M Refaat
- Cardiovascular Institute, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania 15213, USA
166
Biswas S, Lin S. Logistic Bayesian LASSO for identifying association with rare haplotypes and application to age-related macular degeneration. Biometrics 2011; 68:587-97. [PMID: 21955118 DOI: 10.1111/j.1541-0420.2011.01680.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 01/27/2023]
Abstract
Rare variants have been heralded as key to uncovering "missing heritability" in complex diseases. These variants can now be genotyped using next-generation sequencing technologies; nonetheless, rare haplotypes may also result from combinations of common single nucleotide polymorphisms available from genome-wide association studies (GWAS). The National Eye Institute's data on age-related macular degeneration (AMD) is such an example. Studies on AMD had identified potential rare variants; however, due to a lack of appropriate statistical tools, effects of individual rare haplotypes were never studied. Here we develop a method for identifying association with rare haplotypes for case-control design. A logistic regression based retrospective likelihood is formulated and is regularized using logistic Bayesian LASSO (LBL). In particular, we penalize the regression coefficients using appropriate priors to weed out unassociated haplotypes, making it possible for the rare associated ones to stand out. We applied LBL to the AMD data and identified common and rare haplotypes in the complement factor H gene, gaining insights into rare variants' contributions to AMD beyond the current literature. This analysis also demonstrates the richness of GWAS data for mapping rare haplotypes, a potential largely unexplored. Additionally, we conducted simulations to investigate the performance of LBL and compare it with Hapassoc. Our results show that LBL is much more powerful in identifying rare associated haplotypes when the false positive rates for both approaches are kept the same.
Affiliation(s)
- Swati Biswas
- Department of Biostatistics, School of Public Health, University of North Texas Health Science Center, Fort Worth, Texas 76107, USA.
167
Genetics and the environment converge to dysregulate N-glycosylation in multiple sclerosis. Nat Commun 2011; 2:334. [PMID: 21629267 PMCID: PMC3133923 DOI: 10.1038/ncomms1333] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.9] [Received: 02/02/2011] [Accepted: 05/04/2011] [Indexed: 02/06/2023] Open
Abstract
How environmental factors combine with genetic risk at the molecular level to promote complex trait diseases such as multiple sclerosis (MS) is largely unknown. In mice, N-glycan branching by the Golgi enzymes Mgat1 and/or Mgat5 prevents T cell hyperactivity, cytotoxic T-lymphocyte antigen 4 (CTLA-4) endocytosis, spontaneous inflammatory demyelination and neurodegeneration, the latter pathologies characteristic of MS. Here we show that MS risk modulators converge to alter N-glycosylation and/or CTLA-4 surface retention conditional on metabolism and vitamin D3, including genetic variants in interleukin-7 receptor-α (IL7RA*C), interleukin-2 receptor-α (IL2RA*T), MGAT1 (IVAVT−T) and CTLA-4 (Thr17Ala). Downregulation of Mgat1 by IL7RA*C and IL2RA*T is opposed by MGAT1 (IVAVT−T) and vitamin D3, optimizing branching and mitigating MS risk when combined with enhanced CTLA-4 N-glycosylation by CTLA-4 Thr17. Our data suggest a molecular mechanism in MS whereby multiple environmental and genetic inputs lead to dysregulation of a final common pathway, namely N-glycosylation. Complex diseases such as multiple sclerosis have both genetic and environmental components. This study demonstrates that variants of genes implicated in multiple sclerosis, and alterations in cellular metabolism and vitamin D3 levels, alter N-glycosylation, a post-translational modification causal of the disease in mice.
|
168
|
Sul JH, Han B, Eskin E. Increasing power of groupwise association test with likelihood ratio test. J Comput Biol 2011; 18:1611-24. [PMID: 21919745 DOI: 10.1089/cmb.2011.0161] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/10/2023]
Abstract
Sequencing studies have been discovering numerous rare variants, allowing the identification of the effects of rare variants on disease susceptibility. As a method to increase the statistical power of studies on rare variants, several groupwise association tests that group rare variants in genes and detect associations between genes and diseases have been proposed. One major challenge in these methods is to determine which variants are causal in a group, and to overcome this challenge, previous methods used prior information that specifies how likely each variant is to be causal. Another source of information that can be used to determine causal variants is the observed data because case individuals are likely to have more causal variants than control individuals. In this article, we introduce a likelihood ratio test (LRT) that uses both data and prior information to infer which variants are causal and uses this finding to determine whether a group of variants is involved in a disease. We demonstrate through simulations that LRT achieves higher power than previous methods. We also evaluate our method on mutation screening data of the susceptibility gene for ataxia telangiectasia, and show that LRT can detect an association in real data. To increase the computational speed of our method, we show how we can decompose the computation of LRT, and propose an efficient permutation test. With this optimization, we can efficiently compute an LRT statistic and its significance at a genome-wide level. The software for our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.
Affiliation(s)
- Jae Hoon Sul
- Department of Computer Science, University of California, Los Angeles, California, USA
|
169
|
A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet 2011; 89:354-67. [PMID: 21885029 DOI: 10.1016/j.ajhg.2011.07.015] [Citation(s) in RCA: 209] [Impact Index Per Article: 14.9] [Received: 04/28/2011] [Revised: 07/21/2011] [Accepted: 07/26/2011] [Indexed: 12/19/2022]
Abstract
Biological and empirical evidence suggests that rare variants account for a large proportion of the genetic contributions to complex human diseases. Recent technological advances in high-throughput sequencing platforms have made it possible for researchers to generate comprehensive information on rare variants in large samples. We provide a general framework for association testing with rare variants by combining mutation information across multiple variant sites within a gene and relating the enriched genetic information to disease phenotypes through appropriate regression models. Our framework covers all major study designs (i.e., case-control, cross-sectional, cohort and family studies) and all common phenotypes (e.g., binary, quantitative, and age at onset), and it allows arbitrary covariates (e.g., environmental factors and ancestry variables). We derive theoretically optimal procedures for combining rare mutations and construct suitable test statistics for various biological scenarios. The allele-frequency threshold can be fixed or variable. The effects of the combined rare mutations on the phenotype can be in the same direction or different directions. The proposed methods are statistically more powerful and computationally more efficient than existing ones. An application to a deep-resequencing study of drug targets led to a discovery of rare variants associated with total cholesterol. The relevant software is freely available.
|
170
|
Inherited mitochondrial variants are not a major cause of age-related hearing impairment in the European population. Mitochondrion 2011; 11:729-34. [DOI: 10.1016/j.mito.2011.05.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/22/2010] [Revised: 05/13/2011] [Accepted: 05/25/2011] [Indexed: 11/20/2022]
|
171
|
Kulminski AM. Complex phenotypes and phenomenon of genome-wide inter-chromosomal linkage disequilibrium in the human genome. Exp Gerontol 2011; 46:979-86. [PMID: 21907271 DOI: 10.1016/j.exger.2011.08.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/08/2011] [Revised: 07/29/2011] [Accepted: 08/23/2011] [Indexed: 10/17/2022]
Abstract
Studies of non-human species show that loci on non-homologous chromosomes can be in linkage disequilibrium (LD). I focus on the Framingham Heart Study (FHS) participants to explore whether the phenomenon of inter-chromosomal LD can be caused by non-stochastic bio-genetic mechanisms in the human genome and be associated with complex, polygenic phenotypes. This paper documents remarkably strong and extensive LD among SNPs at loci on multiple non-homologous chromosomes genotyped using two independent (Affymetrix 50K and 500K) arrays. The analyses provided compelling evidence that the observed inter-chromosomal LD was unlikely to have been generated by stochasticity, population or family structure, or mis-genotyping. The analyses show that this LD is associated with complex heritable phenotypes characterizing poor health. The inter-chromosomal LD was observed in parental and offspring generations of the FHS participants. These findings suggest that inter-chromosomal LD can be caused by bio-genetic mechanisms possibly associated with favorable or unfavorable epistatic evolution. This phenomenon can challenge our understanding of the role of genes and gene networks in regulating complex, polygenic phenotypes in humans.
|
172
|
Ion channels and schizophrenia: a gene set-based analytic approach to GWAS data for biological hypothesis testing. Hum Genet 2011; 131:373-91. [PMID: 21866342 DOI: 10.1007/s00439-011-1082-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 06/07/2011] [Accepted: 08/08/2011] [Indexed: 01/11/2023]
Abstract
Schizophrenia is a complex genetic disorder. Gene set-based analytic (GSA) methods have been widely applied for exploratory analyses of large, high-throughput datasets, but less commonly employed for biological hypothesis testing. Our primary hypothesis is that variation in ion channel genes contributes to the genetic susceptibility to schizophrenia. We applied Exploratory Visual Analysis (EVA), one GSA application, to analyze European-American (EA) and African-American (AA) schizophrenia genome-wide association study datasets for statistical enrichment of ion channel gene sets, comparing GSA results derived under three SNP-to-gene mapping strategies: (1) GENIC; (2) 500-Kb; (3) 2.5-Mb; and three complementary SNP-to-gene statistical reduction methods: (1) minimum p value (pMIN); (2) a novel method, proportion of SNPs per Gene with p values below a pre-defined α-threshold (PROP); and (3) the truncated product method (TPM). In the EA analyses, ion channel gene set(s) were enriched under all mapping and statistical approaches. In the AA analysis, ion channel gene set(s) were significantly enriched under pMIN for all mapping strategies and under PROP for broader mapping strategies. Less extensive enrichment in the AA sample may reflect true ethnic differences in susceptibility, sampling or case ascertainment differences, or higher dimensionality relative to sample size of the AA data. More consistent findings under broader mapping strategies may reflect enhanced power due to increased SNP inclusion, enhanced capture of effects over extended haplotypes or significant contributions from regulatory regions. While extensive pMIN findings may reflect gene size bias, the extent and significance of PROP and TPM findings suggest that common variation at ion channel genes may capture some of the heritability of schizophrenia.
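The three SNP-to-gene reduction statistics named in this abstract (pMIN, PROP, TPM) can be illustrated with a minimal Python sketch. This is not the EVA implementation: the function names, the default thresholds of 0.05, and the example p values are assumptions made for illustration, and the TPM shown is the plain truncated product of p values at or below a cutoff, without the significance evaluation a real analysis would need.

```python
def p_min(pvals):
    """pMIN: the smallest SNP p value mapped to the gene."""
    return min(pvals)


def prop(pvals, alpha=0.05):
    """PROP: proportion of the gene's SNPs with p value below alpha.

    alpha=0.05 is an assumed default, not a value taken from the paper.
    """
    return sum(p < alpha for p in pvals) / len(pvals)


def tpm(pvals, tau=0.05):
    """Truncated product: product of all p values <= tau (1.0 if none).

    tau=0.05 is an assumed default; assessing the significance of the
    product is deliberately left out of this sketch.
    """
    product = 1.0
    for p in pvals:
        if p <= tau:
            product *= p
    return product


# Hypothetical p values for four SNPs mapped to one gene.
gene_pvals = [0.001, 0.04, 0.2, 0.6]
gene_scores = {
    "pMIN": p_min(gene_pvals),
    "PROP": prop(gene_pvals),
    "TPM": tpm(gene_pvals),
}
```

Because pMIN takes a minimum over all mapped SNPs, it tends to shrink as more SNPs are mapped to a gene, which is the gene size bias the abstract mentions; PROP normalizes by the number of SNPs.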
|
173
|
Torkamani A, Scott-Van Zeeland AA, Topol EJ, Schork NJ. Annotating individual human genomes. Genomics 2011; 98:233-41. [PMID: 21839162 DOI: 10.1016/j.ygeno.2011.07.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 05/25/2011] [Accepted: 07/26/2011] [Indexed: 02/03/2023]
Abstract
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
|
174
|
Integrating Rare-Variant Testing, Function Prediction, and Gene Network in Composite Resequencing-Based Genome-Wide Association Studies (CR-GWAS). G3-GENES GENOMES GENETICS 2011; 1:233-43. [PMID: 22384334 PMCID: PMC3276137 DOI: 10.1534/g3.111.000364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/19/2011] [Accepted: 07/05/2011] [Indexed: 01/08/2023]
Abstract
High-density array-based genome-wide association studies (GWAS) are complemented by exome sequencing and whole-genome resequencing-based association studies. Here we present a composite resequencing-based genome-wide association study (CR-GWAS) strategy that systematically exploits collective biological information and analytical tools for a robust analysis. We showcased the utility of this strategy by using Arabidopsis (Arabidopsis thaliana) resequencing data. Bioinformatic predictions of biological function alteration at each locus were integrated into the process of association testing of both common and rare variants for complex traits with a suite of statistics. Significant signals were then filtered with a priori candidate loci generated from genome database and gene network models to obtain a posteriori candidate loci. A probabilistic gene network (AraNet) that interrogates network neighborhoods of genes was then used to expand the filtering power to examine the significant testing signals. Using this strategy, we confirmed the known true positives and identified several new promising associations. Promising genes (AP1, FCA, FRI, FLC, FLM, SPL5, FY, and DCL2) were shown to control for flowering time through either common variants or rare variants within a diverse set of Arabidopsis accessions. Although many of these candidate genes were cloned earlier with mutational studies, identifying their allele variation contribution to overall phenotypic variation among diverse natural accessions is critical. Our rare allele testing established a greater number of connections than previous analyses in which this issue was not addressed. More importantly, our results demonstrated the potential of integrating various biological, statistical, and bioinformatic tools into complex trait dissection.
|
175
|
Abstract
Exome sequencing is rapidly becoming a fundamental tool for genetics and functional genomics laboratories. This methodology has enabled the discovery of novel pathogenic mutations causing mendelian diseases that had, until now, remained elusive. In this review, we discuss not only how we envisage exome sequencing being applied to a complex disease, such as Parkinson's disease, but also the known caveats of this approach.
Affiliation(s)
- Jose M Bras
- Department of Molecular Neuroscience, Institute of Neurology, University College of London, London, UK.
|
176
|
Génin E, Schumacher M, Roujeau JC, Naldi L, Liss Y, Kazma R, Sekula P, Hovnanian A, Mockenhaupt M. Genome-wide association study of Stevens-Johnson Syndrome and Toxic Epidermal Necrolysis in Europe. Orphanet J Rare Dis 2011; 6:52. [PMID: 21801394 PMCID: PMC3173287 DOI: 10.1186/1750-1172-6-52] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.3] [Received: 01/04/2011] [Accepted: 07/29/2011] [Indexed: 12/21/2022]
Abstract
BACKGROUND Stevens-Johnson syndrome (SJS) and Toxic Epidermal Necrolysis (TEN) are rare but extremely severe cutaneous adverse drug reactions in which drug-specific associations with HLA-B alleles were described. OBJECTIVES To investigate genetic association at a genome-wide level on a large sample of SJS/TEN patients. METHODS We performed a genome wide association study on a sample of 424 European cases and 1,881 controls selected from a Reference Control Panel. RESULTS Six SNPs located in the HLA region showed significant evidence for association (OR range: 1.53-1.74). The haplotype formed by their risk allele was more associated with the disease than any of the single SNPs and was even much stronger in patients exposed to allopurinol (OR(allopurinol) = 7.77, 95%CI = [4.66; 12.98]). The associated haplotype is in linkage disequilibrium with the HLA-B*5801 allele known to be associated with allopurinol induced SJS/TEN in Asian populations. CONCLUSION The involvement of genetic variants located in the HLA region in SJS/TEN is confirmed in European samples, but no other locus reaches genome-wide statistical significance in this sample that is also the largest one collected so far. If some loci outside HLA play a role in SJS/TEN, their effect is thus likely to be very small.
|
177
|
Butte NF, Voruganti VS, Cole SA, Haack K, Comuzzie AG, Muzny DM, Wheeler DA, Chang K, Hawes A, Gibbs RA. Resequencing of IRS2 reveals rare variants for obesity but not fasting glucose homeostasis in Hispanic children. Physiol Genomics 2011; 43:1029-37. [PMID: 21771880 DOI: 10.1152/physiolgenomics.00019.2011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Our objective was to resequence insulin receptor substrate 2 (IRS2) to identify variants associated with obesity- and diabetes-related traits in Hispanic children. Exonic and intronic segments, 5' and 3' flanking regions of IRS2 (∼14.5 kb), were bidirectionally sequenced for single nucleotide polymorphism (SNP) discovery in 934 Hispanic children using 3730XL DNA Sequencers. Additionally, 15 SNPs derived from Illumina HumanOmni1-Quad BeadChips were analyzed. Measured genotype analysis tested associations between SNPs and obesity and diabetes-related traits. Bayesian quantitative trait nucleotide analysis was used to statistically infer the most likely functional polymorphisms. A total of 140 SNPs were identified with minor allele frequencies (MAF) ranging from 0.001 to 0.47. Forty-two of the 70 coding SNPs result in nonsynonymous amino acid substitutions relative to the consensus sequence; 28 SNPs were detected in the promoter, 12 in introns, 28 in the 3'-UTR, and 2 in the 5'-UTR. Two insertion/deletions (indels) were detected. Ten independent rare SNPs (MAF = 0.001-0.009) were associated with obesity-related traits (P = 0.01-0.00002). SNP 10510452_139 in the promoter region was shown to have a high posterior probability (P = 0.77-0.86) of influencing BMI, fat mass, and waist circumference in Hispanic children. SNP 10510452_139 contributed between 2 and 4% of the population variance in body weight and composition. None of the SNPs or indels were associated with diabetes-related traits or accounted for a previously identified quantitative trait locus on chromosome 13 for fasting serum glucose. Rare but not common IRS2 variants may play a role in the regulation of body weight but not an essential role in fasting glucose homeostasis in Hispanic children.
Affiliation(s)
- Nancy F Butte
- Department of Pediatrics, Baylor College of Medicine, US Department of Agriculture/Agricultural Research Service Children's Nutrition Research Center, Houston, TX, USA.
|
178
|
Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol 2011; 35:606-19. [PMID: 21769936 DOI: 10.1002/gepi.20609] [Citation(s) in RCA: 188] [Impact Index Per Article: 13.4] [Received: 11/30/2010] [Revised: 03/23/2011] [Accepted: 06/03/2011] [Indexed: 01/31/2023]
Abstract
In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating association between complex traits and rare variants (RVs). In contrast to association studies for common variants (CVs), due to the low frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent development of several new tests for analyzing RVs, most of which are based on the idea of pooling/collapsing RVs. However, there is a lack of evaluations of, and thus guidance on the use of, existing tests. Here we provide a comprehensive comparison of various statistical tests using simulated data. We consider both independent and correlated rare mutations, and representative tests for both CVs and RVs. As expected, if there are no or few non-causal (i.e. neutral or non-associated) RVs in a locus of interest while the effects of causal RVs on the trait are all (or mostly) in the same direction (i.e. either protective or deleterious, but not both), then the simple pooled association tests (without selecting RVs and their association directions) and a new test called kernel-based adaptive clustering (KBAC) perform similarly and are most powerful; KBAC is more robust than simple pooled association tests in the presence of non-causal RVs; however, as the number of non-causal CVs increases and/or in the presence of opposite association directions, the winners are two methods originally proposed for CVs and a new test called C-alpha test proposed for RVs, each of which can be regarded as testing on a variance component in a random-effects model. Interestingly, several methods based on sequential model selection (i.e. selecting causal RVs and their association directions), including two new methods proposed here, perform robustly and often have statistical power between those of the above two classes.
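The "pooling/collapsing" idea mentioned in this abstract can be sketched in Python as follows. This is a generic illustration only, not KBAC, C-alpha, or any specific test evaluated by the authors: each individual is reduced to a carrier indicator across the rare-variant sites in a locus, and carrier counts in cases and controls are compared with a Pearson chi-square statistic on the resulting 2x2 table. The function names and toy genotypes are hypothetical.

```python
def collapse(genotypes):
    """Return 1 per individual carrying any minor allele at the pooled sites.

    genotypes: list of rows, one per individual, holding minor-allele
    counts (0, 1, or 2) at each rare-variant site in the locus.
    """
    return [int(any(count > 0 for count in row)) for row in genotypes]


def pooled_chi2(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df, no continuity correction) for carrier status."""
    a = sum(collapse(case_genotypes))      # carrier cases
    b = len(case_genotypes) - a            # non-carrier cases
    c = sum(collapse(control_genotypes))   # carrier controls
    d = len(control_genotypes) - c         # non-carrier controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no variation in one margin
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy data: 4 cases and 4 controls, 2 rare sites each.
cases = [[1, 0], [0, 1], [0, 0], [1, 0]]
controls = [[0, 0], [0, 0], [0, 1], [0, 0]]
statistic = pooled_chi2(cases, controls)
```

A collapsed indicator of this kind ignores the direction of each variant's effect, which is consistent with the abstract's observation that simple pooled tests lose power when non-causal variants or opposite association directions are present.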
Affiliation(s)
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455-0392, USA
|
179
|
Tang CSM, Ngan ESW, Tang WK, So MT, Cheng G, Miao XP, Leon TYY, Leung BMC, Hui KJWS, Lui VHC, Chen Y, Chan IHY, Chung PHY, Liu XL, Wong KKY, Sham PC, Cherny SS, Tam PKH, Garcia-Barcelo MM. Mutations in the NRG1 gene are associated with Hirschsprung disease. Hum Genet 2011; 131:67-76. [PMID: 21706185 DOI: 10.1007/s00439-011-1035-4] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 04/05/2011] [Accepted: 06/05/2011] [Indexed: 12/19/2022]
Abstract
Hirschsprung disease (HSCR, congenital colon aganglionosis) is a relatively common complex genetic condition caused by abnormal development of the enteric nervous system (ENS). Through a recent genome-wide association study conducted on Chinese HSCR patients, we identified a new HSCR contributing locus, neuregulin 1 (NRG1; 8p12), a gene known to be involved in the development of the ENS. As genes in which disease-associated common variants are found are to be considered as candidates for the search of deleterious rare variants (RVs) in the coding sequences, we sequenced the NRG1 exons of 358 sporadic HSCR patients and 333 controls. We identified a total of 13 different heterozygous RVs including 8 non-synonymous (A28G, E134K, V266L, H347Y, P356L, V486M, A511T, P608A) and 3 synonymous amino acid substitutions (P24P, T169T, L483L), a frameshift (E239fsX10), and a c.503-4insT insertion. Functional analysis of the most conserved non-synonymous substitutions, H347Y and P356L, showed uneven intracellular distribution and aberrant expression of the mutant proteins. Except for T169T and V486M, all variants were exclusive to HSCR patients. Overall, there was a statistically significant over-representation of NRG1 RVs in HSCR patients (p = 0.008). We show here that not only common, but also rare variants of the NRG1 gene contribute to HSCR. This strengthens the role of NRG1.
|
180
|
Edwards TL, Song Z, Li C. Enriching targeted sequencing experiments for rare disease alleles. Bioinformatics 2011; 27:2112-8. [PMID: 21700677 DOI: 10.1093/bioinformatics/btr324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION Next-generation targeted resequencing of genome-wide association study (GWAS)-associated genomic regions is a common approach for follow-up of indirect association of common alleles. However, it is prohibitively expensive to sequence all the samples from a well-powered GWAS study with sufficient depth of coverage to accurately call rare genotypes. As a result, many studies may use next-generation sequencing for single nucleotide polymorphism (SNP) discovery in a smaller number of samples, with the intent to genotype candidate SNPs with rare alleles captured by resequencing. This approach is reasonable, but may be inefficient for rare alleles if samples are not carefully selected for the resequencing experiment. RESULTS We have developed a probability-based approach, SampleSeq, to select samples for a targeted resequencing experiment that increases the yield of rare disease alleles substantially over random sampling of cases or controls or sampling based on genotypes at associated SNPs from GWAS data. This technique allows for smaller sample sizes for resequencing experiments, or allows the capture of rarer risk alleles. When following up multiple regions, SampleSeq selects subjects with an even representation of all the regions. SampleSeq also can be used to calculate the sample size needed for the resequencing to increase the chance of successful capture of rare alleles of desired frequencies. SOFTWARE http://biostat.mc.vanderbilt.edu/SampleSeq
Affiliation(s)
- Todd L Edwards
- Vanderbilt Epidemiology Center, Division of Epidemiology, Department of Medicine, Vanderbilt University, Nashville, TN 37203, USA
|
181
|
Siu H, Zhu Y, Jin L, Xiong M. Implication of next-generation sequencing on association studies. BMC Genomics 2011; 12:322. [PMID: 21682891 PMCID: PMC3148210 DOI: 10.1186/1471-2164-12-322] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 11/30/2010] [Accepted: 06/17/2011] [Indexed: 01/21/2023]
Abstract
Background Next-generation sequencing technologies can effectively detect the entire spectrum of genomic variation and provide a powerful tool for systematic exploration of the universe of common, low frequency and rare variants in the entire genome. However, the current paradigm for genome-wide association studies (GWAS) is to catalogue and genotype common variants (5% < MAF). The methods and study design for testing the association of low frequency (0.5% < MAF ≤ 5%) and rare variation (MAF ≤ 0.5%) have not been thoroughly investigated. The 1000 Genomes Project represents one such endeavour to characterize the human genetic variation pattern at the MAF = 1% level as a foundation for association studies. In this report, we explore different strategies and study designs for the near future GWAS in the post-era, based on both low coverage pilot data and exon pilot data in the 1000 Genomes Project. Results We investigated the linkage disequilibrium (LD) pattern among common and low frequency SNPs and its implication for association studies. We found that the LD between pairs of low frequency alleles, and between low frequency and common alleles, is much weaker than the LD between pairs of common alleles. We examined various tagging designs with and without statistical imputation approaches and compared their power against de novo resequencing in mapping causal variants under various disease models. We used the low coverage pilot data which contain ~14 M SNPs as a hypothetical genotype-array platform (Pilot 14 M) to interrogate its impact on the selection of tag SNPs, mapping coverage and power of association tests. We found that even after imputation we still observed 45.4% of low frequency SNPs which were untaggable and only 67.7% of the low frequency variation was covered by the Pilot 14 M array. Conclusions This suggests that GWAS based on SNP arrays would be ill-suited for association studies of low frequency variation.
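The frequency classes defined in this abstract (common: MAF > 5%; low frequency: 0.5% < MAF ≤ 5%; rare: MAF ≤ 0.5%) translate directly into a small Python helper. The function name and the example frequencies are hypothetical; only the thresholds come from the abstract.

```python
def maf_class(maf):
    """Classify a minor allele frequency using the abstract's thresholds."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf > 0.05:
        return "common"
    if maf > 0.005:
        return "low frequency"
    return "rare"


# Hypothetical frequencies, including the two boundary values.
examples = [0.2, 0.05, 0.01, 0.005, 0.001]
labels = [maf_class(m) for m in examples]
```

Note that both boundaries are inclusive on the lower class: a MAF of exactly 5% is low frequency and a MAF of exactly 0.5% is rare.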
Affiliation(s)
- Hoicheong Siu
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200433, China
|
182
|
Kulminski AM, Culminskaya I, Ukraintseva SV, Arbeev KG, Arbeeva L, Wu D, Akushevich I, Land KC, Yashin AI. Trade-off in the effects of the apolipoprotein E polymorphism on the ages at onset of CVD and cancer influences human lifespan. Aging Cell 2011; 10:533-41. [PMID: 21332925 DOI: 10.1111/j.1474-9726.2011.00689.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 01/15/2023]
Abstract
Progress in unraveling the genetic origins of healthy aging is tempered, in part, by a lack of replication of effects, which is often considered a signature of false-positive findings. We convincingly demonstrate that the lack of genetic effects on an aging-related trait can be because of trade-offs in the gene action. We focus on the well-studied apolipoprotein E (APOE) e2/3/4 polymorphism and on lifespan and ages at onset of cardiovascular diseases (CVD) and cancer, using data on 3924 participants of the Framingham Heart Study Offspring cohort. Kaplan-Meier estimates show that the e4 allele carriers live shorter lives than the non-e4 allele carriers (log rank = 0.016). The adverse effect was attributed to the poor survival of the e4 homozygotes, whereas the effect of the common e3/4 genotype was insignificant. The e3/4 genotype, however, was antagonistically associated with onsets of those diseases predisposing to an earlier onset of CVD and a later onset of cancer compared to the non-e4 allele genotypes. This trade-off explains the lack of a significant effect of the e3/4 genotype on survival; adjustment for it in the Cox regression model makes the detrimental effect of the e4 allele highly significant (P = 0.002). This trade-off is likely caused by the lipid-metabolism-related (for CVD) and nonrelated (for cancer) mechanisms. An evolutionary rationale suggests that genetic trade-offs should not be an exception in studies of aging-related traits. Deeper insights into biological mechanisms mediating gene action are critical for understanding the genetic regulation of a healthy lifespan and for personalizing medical care.
|
183
|
Rahimov F, Jugessur A, Murray JC. Genetics of nonsyndromic orofacial clefts. Cleft Palate Craniofac J 2011; 49:73-91. [PMID: 21545302 DOI: 10.1597/10-178] [Citation(s) in RCA: 179] [Impact Index Per Article: 12.8] [Indexed: 02/06/2023]
Abstract
With an average worldwide prevalence of approximately 1.2/1000 live births, orofacial clefts are the most common craniofacial birth defects in humans. Like other complex disorders, these birth defects are thought to result from the complex interplay of multiple genes and environmental factors. Significant progress in the identification of underlying genes and pathways has benefited from large populations available for study, increased international collaboration, rapid advances in genotyping technology, and major improvements in analytic approaches. Here we review recent advances in genetic epidemiological approaches to complex traits and their applications to studies of nonsyndromic orofacial clefts. Our main aim is to bring together a discussion of new and previously identified candidate genes to create a more cohesive picture of interacting pathways that shape the human craniofacial region. In future directions, we highlight the need to search for copy number variants that affect gene dosage and rare variants that are possibly associated with a higher disease penetrance. In addition, sequencing of protein-coding regions in candidate genes and screening for genetic variation in noncoding regulatory elements will help advance this important area of research.
Affiliation(s)
- Fedik Rahimov
- Interdisciplinary Ph.D. Program in Genetics, Department of Pediatrics, University of Iowa, Iowa City, Iowa, USA
184
Abstract
Genome-wide association studies (GWAS) have become the primary approach for identifying genes with common variants influencing complex diseases. Despite considerable progress, the common variations identified by GWAS account for only a small fraction of disease heritability and are unlikely to explain the majority of phenotypic variations of common diseases. A potential source of the missing heritability is the contribution of rare variants. Next-generation sequencing technologies will detect millions of novel rare variants, but these technologies have three defining features: identification of a large number of rare variants, a high proportion of sequence errors, and a large proportion of missing data. These features raise challenges for testing the association of rare variants with phenotypes of interest. In this study, we use a genome continuum model and functional principal components as a general principle for developing novel and powerful association analysis methods designed for resequencing data. We use simulations to calculate the type I error rates and the power of nine alternative statistics: two functional principal component analysis (FPCA)-based statistics, the multivariate principal component analysis (MPCA)-based statistic, the weighted sum (WSS), the variable-threshold (VT) method, the generalized T², the collapsing method, the CMC method, and individual tests. We also examined the impact of sequence errors on their type I error rates. Finally, we apply the nine statistics to the published resequencing data set from ANGPTL4 in the Dallas Heart Study. We report that FPCA-based statistics have a higher power to detect association of rare variants and a stronger ability to filter sequence errors than the other seven methods.
Affiliation(s)
- Li Luo
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
185
Capanu M, Concannon P, Haile RW, Bernstein L, Malone KE, Lynch CF, Liang X, Teraoka SN, Diep AT, Thomas DC, Bernstein JL, Begg CB. Assessment of rare BRCA1 and BRCA2 variants of unknown significance using hierarchical modeling. Genet Epidemiol 2011; 35:389-97. [PMID: 21520273 DOI: 10.1002/gepi.20587] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/29/2010] [Revised: 03/18/2011] [Accepted: 03/21/2011] [Indexed: 11/11/2022]
Abstract
Current evidence suggests that the genetic risk of breast cancer may be caused primarily by rare variants. However, while classification of protein-truncating mutations as deleterious is relatively straightforward, distinguishing as deleterious or neutral the large number of rare missense variants is a difficult on-going task. In this article, we present one approach to this problem, hierarchical statistical modeling of data observed in a case-control study of contralateral breast cancer (CBC) in which all the participants were genotyped for variants in BRCA1 and BRCA2. Hierarchical modeling permits leverage of information from observed correlations of characteristics of groups of variants with case-control status to infer with greater precision the risks of individual rare variants. A total of 181 distinct rare missense variants were identified among the 705 cases with CBC and the 1,398 controls with unilateral breast cancer. The model identified three bioinformatic hierarchical covariates, align-GV, align-GD, and SIFT scores, each of which was modestly associated with risk. Collectively, the 11 variants that were classified as adverse on the basis of all the three bioinformatic predictors demonstrated a stronger risk signal. This group included five of six missense variants that were classified as deleterious at the outset by conventional criteria. The remaining six variants can be considered as plausibly deleterious, and deserving of further investigation (BRCA1 R866C; BRCA2 G1529R, D2665G, W2626C, E2663V, and R3052W). Hierarchical modeling is a strategy that has promise for interpreting the evidence from future association studies that involve sequencing of known or suspected cancer genes.
Affiliation(s)
- Marinela Capanu
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
186
Pan W, Shen X. Adaptive tests for association analysis of rare variants. Genet Epidemiol 2011; 35:381-8. [PMID: 21520272 DOI: 10.1002/gepi.20586] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 01/05/2011] [Revised: 03/03/2011] [Accepted: 03/21/2011] [Indexed: 01/30/2023]
Abstract
In anticipation of the availability of next-generation sequencing data, there has been increasing interest in association analysis of rare variants (RVs). Owing to the extremely low frequency of a RV, single variant-based analysis and many existing tests developed for common variants may not be suitable. Hence, it is of interest to develop powerful statistical tests to assess association between complex traits and RVs with sequence data. Recently, a pooled association test based on variable thresholds (VT) was proposed and shown to be more powerful than some existing tests (Price et al. [2010] Am J Hum Genet 86:832-838). In this study, we generalize the VT test of Price et al. in several aspects. We propose a general class of adaptive tests that covers the VT test of Price et al. as a special case. In particular, we show that some of our proposed adaptive tests may substantially improve the power over the pooled association tests, including the VT test of Price et al., especially so in the presence of many neutral RVs and/or of causal RVs with opposite association directions, in which cases most of the existing pooled association tests suffer from significant loss of power. Our proposed tests are also general and flexible with the ability to incorporate weights on RVs and to adjust for covariates.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455–0392, USA.
187
Turner S, Armstrong LL, Bradford Y, Carlson CS, Crawford DC, Crenshaw AT, de Andrade M, Doheny KF, Haines JL, Hayes G, Jarvik G, Jiang L, Kullo IJ, Li R, Ling H, Manolio TA, Matsumoto M, McCarty CA, McDavid AN, Mirel DB, Paschall JE, Pugh EW, Rasmussen LV, Wilke RA, Zuvich RL, Ritchie MD. Quality control procedures for genome-wide association studies. Curr Protoc Hum Genet 2011; Chapter 1:Unit1.19. [PMID: 21234875 DOI: 10.1002/0471142905.hg0119s68] [Citation(s) in RCA: 201] [Impact Index Per Article: 14.4] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.
Affiliation(s)
- Stephen Turner
- Center for Human Genetics Research, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, USA
188
A genome-wide comparison of the functional properties of rare and common genetic variants in humans. Am J Hum Genet 2011; 88:458-68. [PMID: 21457907 DOI: 10.1016/j.ajhg.2011.03.008] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Received: 01/05/2011] [Revised: 03/01/2011] [Accepted: 03/14/2011] [Indexed: 01/31/2023]
Abstract
One of the longest running debates in evolutionary biology concerns the kind of genetic variation that is primarily responsible for phenotypic variation in species. Here, we address this question for humans specifically from the perspective of population allele frequency of variants across the complete genome, including both coding and noncoding regions. We establish simple criteria to assess the likelihood that variants are functional based on their genomic locations and then use whole-genome sequence data from 29 subjects of European origin to assess the relationship between the functional properties of variants and their population allele frequencies. We find that for all criteria used to assess the likelihood that a variant is functional, the rarer variants are significantly more likely to be functional than the more common variants. Strikingly, these patterns disappear when we focus on only those variants in which the major alleles are derived. These analyses indicate that the majority of the genetic variation in terms of phenotypic consequence may result from a mutation-selection balance, as opposed to balancing selection, and have direct relevance to the study of human disease.
189
Holmes MV, Harrison S, Talmud PJ, Hingorani AD, Humphries SE. Utility of genetic determinants of lipids and cardiovascular events in assessing risk. Nat Rev Cardiol 2011; 8:207-21. [PMID: 21321562 DOI: 10.1038/nrcardio.2011.6] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
The prevention of coronary heart disease (CHD) is a major public-health goal, but disease architecture is such that a larger proportion of clinical events occur among the average majority than among the high-risk minority--the prevention paradox. Genetic findings over the past few years have resulted in the reopening of the old debate on whether an individualized or a population-based approach to prevention is preferable. Genetic testing is an attractive tool for CHD risk prediction because it is a low-cost, high-fidelity technology with multiplex capability. Moreover, by contrast with nongenetic markers, genotype is invariant and determined from conception, which eliminates biological variability and makes prediction from early life possible. Mindful of the prevention paradox, this Review examines the potential applications and challenges of using genetic information for predicting CHD, focusing on lipid risk factors and drawing on experience in the evaluation of nongenetic risk factors as screening tests for CHD. Many of the issues we discuss hold true for any late-onset common disease with modifiable risk factors and proven preventative strategies.
Affiliation(s)
- Michael V Holmes
- Genetic Epidemiology Group, Department of Epidemiology and Public Health, University College London, 1-19 Torrington Place, London WC1E 6BT, UK
190
An optimal weighted aggregated association test for identification of rare variants involved in common diseases. Genetics 2011; 188:181-8. [PMID: 21368279 DOI: 10.1534/genetics.110.125070] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]
Abstract
The advent of next generation sequencing technologies allows one to discover nearly all rare variants in a genomic region of interest. This technological development increases the need for an effective statistical method for testing the aggregated effect of rare variants in a gene on disease susceptibility. The idea behind this approach is that if a certain gene is involved in a disease, many rare variants within the gene will disrupt the function of the gene and are associated with the disease. In this article, we present the rare variant weighted aggregate statistic (RWAS), a method that groups rare variants and computes a weighted sum of differences between case and control mutation counts. We show that our method outperforms the groupwise association test of Madsen and Browning in the disease-risk model that assumes that each variant makes an equally small contribution to disease risk. In addition, we can incorporate prior information into our method of which variants are likely causal. By using simulated data and real mutation screening data of the susceptibility gene for ataxia telangiectasia, we demonstrate that prior information has a substantial influence on the statistical power of association studies. Our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.
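The core quantity described in this abstract, a weighted sum of case-control differences in per-variant mutation counts, can be sketched as follows. This is only an illustration and not the published RWAS statistic: the function name, the example counts, and the weights are hypothetical, and whatever normalization or significance assessment the authors use is not reproduced here.

```python
from typing import Sequence


def weighted_count_difference(
    case_counts: Sequence[int],
    control_counts: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Sum over variants of weight * (case count - control count).

    Hypothetical helper for illustration; it omits any scaling or
    significance testing that a real aggregate test would need.
    """
    if not (len(case_counts) == len(control_counts) == len(weights)):
        raise ValueError("all inputs must have one entry per variant")
    return sum(
        w * (a - u) for a, u, w in zip(case_counts, control_counts, weights)
    )


# Made-up mutation counts for four rare variants in one gene.
case_counts = [3, 1, 0, 2]
control_counts = [1, 1, 1, 0]

# Made-up prior weights: larger values mark variants believed more likely causal.
weights = [1.0, 0.5, 0.5, 2.0]

score = weighted_count_difference(case_counts, control_counts, weights)
uniform_score = weighted_count_difference(case_counts, control_counts, [1.0] * 4)
```

Comparing `score` with `uniform_score` shows how prior weights change the aggregate signal, which is the effect the abstract attributes to incorporating prior information.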
191
Le Clerc S, Coulonges C, Delaneau O, Van Manen D, Herbeck JT, Limou S, An P, Martinson JJ, Spadoni JL, Therwath A, Veldink JH, van den Berg LH, Taing L, Labib T, Mellak S, Montes M, Delfraissy JF, Schächter F, Winkler C, Froguel P, Mullins JI, Schuitemaker H, Zagury JF. Screening low-frequency SNPs from genome-wide association study reveals a new risk allele for progression to AIDS. J Acquir Immune Defic Syndr 2011; 56:279-84. [PMID: 21107268 PMCID: PMC3386792 DOI: 10.1097/qai.0b013e318204982b] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Seven genome-wide association studies (GWAS) have been published in AIDS, and only associations in the HLA region on chromosome 6 and CXCR6 have passed genome-wide significance. METHODS: We reanalyzed the data from 3 previously published GWAS, targeting specifically low-frequency SNPs (minor allele frequency <5%). Two groups composed of 365 slow progressors and 147 rapid progressors from Europe and the United States were compared with a control group of 1394 seronegative individuals using Eigenstrat corrections. RESULTS: Of the 8584 SNPs with minor allele frequency <5% in cases and controls (Bonferroni threshold = 5.8 × 10⁻⁶), 4 SNPs showed statistical evidence of association with the slow progressor phenotype. The best result was for HCP5 rs2395029 [P = 8.54 × 10⁻¹⁵, odds ratio (OR) = 3.41] in the HLA locus, in partial linkage disequilibrium with 2 additional chromosome 6 associations in C6orf48 (P = 3.03 × 10⁻¹⁰, OR = 2.9) and NOTCH4 (P = 9.08 × 10⁻⁷, OR = 2.32). The fourth association corresponded to rs2072255 located in RICH2 (P = 3.30 × 10⁻⁶, OR = 0.43) on chromosome 17. Using HCP5 rs2395029 as a covariate, the C6orf48 and NOTCH4 signals disappeared, but the RICH2 signal still remained significant. CONCLUSIONS: Besides the already known chromosome 6 associations, the analysis of low-frequency SNPs brought up a new association in the RICH2 gene. Interestingly, RICH2 interacts with BST-2, known to be a major restriction factor for HIV-1 infection. Our study has thus identified a new candidate gene for AIDS molecular etiology and confirms the interest of singling out low-frequency SNPs to exploit GWAS data.
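The Bonferroni threshold quoted in this abstract can be reproduced with a short calculation. The sketch below assumes a family-wise error rate of 0.05, which the abstract does not state explicitly; the function name is illustrative, and the P values are the four reported above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
threshold = bonferroni_threshold(0.05, 8584)

# P values reported in the abstract for the four associated SNPs.
reported_p = {
    "HCP5 rs2395029": 8.54e-15,
    "C6orf48": 3.03e-10,
    "NOTCH4": 9.08e-7,
    "RICH2 rs2072255": 3.30e-6,
}

# Keep the loci whose P value falls below the corrected threshold.
passing = [name for name, p in reported_p.items() if p < threshold]
```

Under that assumed alpha, 0.05 / 8584 rounds to the 5.8 × 10⁻⁶ given in the abstract, and all four reported P values fall below it.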
Affiliation(s)
- Sigrid Le Clerc
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Université Paris 12, INSERM U955, Créteil, France
- ANRS Genomic Group (French Agency for Research on AIDS and Hepatitis), Paris, France
- Cédric Coulonges
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- ANRS Genomic Group (French Agency for Research on AIDS and Hepatitis), Paris, France
- Olivier Delaneau
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Danielle Van Manen
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, Center for Infectious Diseases and Immunity Amsterdam (CINIMA) Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Joshua T. Herbeck
- University of Washington School of Medicine, Department of Microbiology, Seattle, WA, USA
- Sophie Limou
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Université Paris 12, INSERM U955, Créteil, France
- ANRS Genomic Group (French Agency for Research on AIDS and Hepatitis), Paris, France
- CEA/Institut de Génomique, Centre National de Génotypage, Evry, France
- Ping An
- Laboratory of Genomic Diversity, SAIC-Frederick, Inc., National Cancer Institute-Frederick, Frederick, MD, USA
- Jean-Louis Spadoni
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Amu Therwath
- Laboratoire d’Oncologie Moléculaire, Université Paris 7, Paris, France
- Jan H. Veldink
- Rudolf Magnus Institute of Neuroscience, Department of Neurology, University Medical Center Utrecht, 3584 CX, Utrecht, The Netherlands
- Leonard H. van den Berg
- Rudolf Magnus Institute of Neuroscience, Department of Neurology, University Medical Center Utrecht, 3584 CX, Utrecht, The Netherlands
- Lieng Taing
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Taoufik Labib
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Safa Mellak
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Matthieu Montes
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- François Schächter
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Cheryl Winkler
- Laboratory of Genomic Diversity, SAIC-Frederick, Inc., National Cancer Institute-Frederick, Frederick, MD, USA
- Philippe Froguel
- UMR CNRS 8090, Institut Pasteur de Lille, Lille, France
- Genomic Medicine, Hammersmith Hospital, Imperial College London, London, UK
- James I. Mullins
- University of Washington School of Medicine, Department of Microbiology, Seattle, WA, USA
- Hanneke Schuitemaker
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, Center for Infectious Diseases and Immunity Amsterdam (CINIMA) Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Jean-François Zagury
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- Université Paris 12, INSERM U955, Créteil, France
- ANRS Genomic Group (French Agency for Research on AIDS and Hepatitis), Paris, France
192
Shriner D, Vaughan LK. A unified framework for multi-locus association analysis of both common and rare variants. BMC Genomics 2011; 12:89. [PMID: 21281506 PMCID: PMC3040731 DOI: 10.1186/1471-2164-12-89] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/30/2010] [Accepted: 01/31/2011] [Indexed: 11/10/2022]
Abstract
Background: Common, complex diseases are hypothesized to result from a combination of common and rare genetic variants. We developed a unified framework for the joint association testing of both types of variants. Within the framework, we developed a union-intersection test suitable for genome-wide analysis of single nucleotide polymorphisms (SNPs), candidate gene data, as well as medical sequencing data. The union-intersection test is a composite test of association of genotype frequencies and differential correlation among markers. Results: We demonstrated by computer simulation that the false positive error rate was controlled at the expected level. We also demonstrated scenarios in which the multi-locus test was more powerful than traditional single marker analysis. To illustrate use of the union-intersection test with real data, we analyzed a publicly available data set of 319,813 autosomal SNPs genotyped for 938 cases of Parkinson disease and 863 neurologically normal controls for which no genome-wide significant results were found by traditional single marker analysis. We also analyzed an independent follow-up sample of 183 cases and 248 controls for replication. Conclusions: We identified a single risk haplotype with a directionally consistent effect in both samples in the gene GAK, which is involved in clathrin-mediated membrane trafficking. We also found suggestive evidence that directionally inconsistent marginal effects from single marker analysis appeared to result from risk being driven by different haplotypes in the two samples for the genes SYN3 and NGLY1, which are involved in neurotransmitter release and proteasomal degradation, respectively. These results illustrate the utility of our unified framework for genome-wide association analysis of common, complex diseases.
Affiliation(s)
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, Bethesda, MD 20892, USA.
193
Lee JS, Choi M, Yan X, Lifton RP, Zhao H. On optimal pooling designs to identify rare variants through massive resequencing. Genet Epidemiol 2011; 35:139-47. [PMID: 21254222 PMCID: PMC3176340 DOI: 10.1002/gepi.20561] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/15/2010] [Revised: 09/17/2010] [Accepted: 12/09/2010] [Indexed: 11/18/2022]
Abstract
The advent of next-generation sequencing technologies has facilitated the detection of rare variants. Despite the significant cost reduction, sequencing cost is still high for large-scale studies. In this article, we examine DNA pooling as a cost-effective strategy for rare variant detection. We consider the optimal number of individuals in a DNA pool to detect an allele with a specific minor allele frequency (MAF) under a given coverage depth and detection threshold. We found that the optimal number of individuals in a pool is indifferent to the MAF at the same coverage depth and detection threshold. In addition, when the individual contributions to each pool are equal, the total number of individuals across different pools required in an optimal design to detect a variant with a desired power is similar at different coverage depths. When the contributions are more variable, more individuals tend to be needed for higher coverage depths. Our study provides general guidelines on using DNA pooling for more cost-effective identification of rare variants.
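To make the interplay of pool size, coverage depth, and detection threshold concrete, the sketch below computes the chance of detecting a single variant allele in one pool under a deliberately simplified binomial model. The model is an assumption introduced here, not the authors' design criterion: the pool holds exactly one heterozygous carrier, every diploid individual contributes equally, there is no sequencing error, and reads are sampled independently. The function and parameter names are hypothetical.

```python
from math import comb


def detection_probability(pool_size: int, depth: int, min_reads: int) -> float:
    """P(at least `min_reads` variant reads) for one heterozygous carrier.

    Simplified model (an assumption, not the paper's method): the variant
    allele makes up 1 / (2 * pool_size) of the pooled DNA and each of the
    `depth` reads independently samples it with that probability.
    """
    if pool_size <= 0 or depth < 0 or min_reads < 0:
        raise ValueError("pool_size must be positive; depth and min_reads non-negative")
    p = 1.0 / (2 * pool_size)
    # Probability of seeing fewer than min_reads variant reads.
    miss = sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(min(min_reads, depth + 1))
    )
    return 1.0 - miss


# Hypothetical design: 10 individuals per pool, 100x coverage,
# variant called when at least 2 reads carry it.
p_detect = detection_probability(pool_size=10, depth=100, min_reads=2)

# Doubling the pool halves the variant read fraction at the same depth.
p_detect_larger_pool = detection_probability(pool_size=20, depth=100, min_reads=2)
```

In this toy model a larger pool lowers the per-pool detection probability at fixed depth, while covering more individuals per sequencing run; that trade-off is what an optimal pooling design has to balance.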
Affiliation(s)
- Joon Sang Lee
- Department of Epidemiology and Public Health, Yale University, New Haven, Connecticut, USA.
194
Liu DJ, Leal SM. Replication strategies for rare variant complex trait association studies via next-generation sequencing. Am J Hum Genet 2010; 87:790-801. [PMID: 21129725 DOI: 10.1016/j.ajhg.2010.10.025] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 07/16/2010] [Revised: 10/08/2010] [Accepted: 10/26/2010] [Indexed: 01/09/2023]
Abstract
There is solid evidence that complex traits can be caused by rare variants. Next-generation sequencing technologies are powerful tools for mapping rare variants. Confirmation of significant findings in stage 1 through replication in an independent stage 2 sample is necessary for association studies. For gene-based mapping of rare variants, two replication strategies are possible: (1) variant-based replication, wherein only variants from nucleotide sites uncovered in stage 1 are genotyped and followed up and (2) sequence-based replication, wherein the gene region is sequenced in the replication sample and both known and novel variants are tested. The efficiency of the two strategies is dependent on the proportions of causative variants discovered in stage 1 and sequencing/genotyping errors. With rigorous population genetic and phenotypic models, it is demonstrated that sequence-based replication is consistently more powerful. However, the power gain is small (1) for large-scale studies with thousands of individuals, because a large fraction of causative variant sites can be observed and (2) for small- to medium-scale studies with a few hundred samples, because a large proportion of the locus population attributable risk can be explained by the uncovered variants. Therefore, genotyping can be a temporary solution for replicating genetic studies if stage 1 and 2 samples are drawn from the same population. However, sequence-based replication is advantageous if the stage 1 sample is small or novel variant discovery is also of interest. It is shown that currently attainable levels of sequencing error only minimally affect the comparison, and the advantage of sequence-based replication remains.
Affiliation(s)
- Dajiang J Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
195
Taub MA, Corrada Bravo H, Irizarry RA. Overcoming bias and systematic errors in next generation sequencing data. Genome Med 2010; 2:87. [PMID: 21144010 PMCID: PMC3025429 DOI: 10.1186/gm208] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 12/12/2022]
Abstract
Considerable time and effort have been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.
Affiliation(s)
- Margaret A Taub
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, E3527, Baltimore, MD 21205, USA
- Hector Corrada Bravo
- Department of Computer Science, University of Maryland Institute for Advanced Computer Studies and Center for Bioinformatics and Computational Biology, Biomolecular Sciences Building 296, College Park, MD 20742, USA
- Rafael A Irizarry
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, E3527, Baltimore, MD 21205, USA
196
Affiliation(s)
- Jennifer Asimit
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom;
197
Ku CS, Naidoo N, Teo SM, Pawitan Y. Regions of homozygosity and their impact on complex diseases and traits. Hum Genet 2010; 129:1-15. [PMID: 21104274 DOI: 10.1007/s00439-010-0920-6] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Received: 08/08/2010] [Accepted: 11/04/2010] [Indexed: 12/23/2022]
Abstract
Regions of homozygosity (ROHs) are more abundant in the human genome than previously thought. These regions are without heterozygosity, i.e. all the genetic variations within the regions have two identical alleles. At present there are no standardized criteria for defining ROHs, so different studies use their own criteria in the analysis of homozygosity. Compared to the era of genotyping microsatellite markers, the advent of high-density single nucleotide polymorphism genotyping arrays has provided an unparalleled opportunity to comprehensively detect these regions in the whole genome in different populations. Several studies have identified ROHs which were associated with complex phenotypes such as schizophrenia, late-onset Alzheimer's disease and height. Collectively, these studies have conclusively shown the abundance of ROHs larger than 1 Mb in outbred populations. The homozygosity association approach holds great promise in identifying genetic susceptibility loci harboring recessive variants for complex diseases and traits.
Affiliation(s)
- Chee Seng Ku
- Department of Epidemiology and Public Health, Centre for Molecular Epidemiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
198
Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. Am J Hum Genet 2010; 87:604-17. [PMID: 21070896 DOI: 10.1016/j.ajhg.2010.10.012] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Received: 04/20/2010] [Revised: 08/01/2010] [Accepted: 10/07/2010] [Indexed: 11/23/2022]
Abstract
Next Generation Sequencing Technology has revolutionized our ability to study the contribution of rare genetic variation to heritable traits. However, existing single-marker association tests are underpowered for detecting rare risk variants. A more powerful approach involves pooling methods that combine multiple rare variants from the same gene into a single test statistic. Proposed pooling methods can be limited because they generally assume high-quality genotypes derived from deep-coverage sequencing, which may not be available. In this paper, we consider an intuitive and computationally efficient pooling statistic, the cumulative minor-allele test (CMAT). We assess the performance of the CMAT and other pooling methods on datasets simulated with population genetic models to contain realistic levels of neutral variation. We consider study designs ranging from exon-only to whole-gene analyses that contain noncoding variants. For all study designs, the CMAT achieves power comparable to that of previously proposed methods. We then extend the CMAT to probabilistic genotypes and describe application to low-coverage sequencing and imputation data. We show that augmenting sequence data with imputed samples is a practical method for increasing the power of rare-variant studies. We also provide a method of controlling for confounding variables such as population stratification. Finally, we demonstrate that our method makes it possible to use external imputation templates to analyze rare variants imputed into existing GWAS datasets. As proof of principle, we performed a CMAT analysis of more than 8 million SNPs that we imputed into the GAIN psoriasis dataset by using haplotypes from the 1000 Genomes Project.
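The general idea of pooling, combining several rare variants in a gene into one test statistic, can be illustrated with a simple carrier-based collapsing test. The sketch below is not the CMAT statistic from this abstract: it swaps in a plain Pearson chi-square on a 2×2 carrier table, and the function names and genotype data are made up for illustration.

```python
from typing import List

Genotypes = List[List[int]]  # one row per individual, minor-allele counts per variant


def count_carriers(genotypes: Genotypes) -> int:
    """Number of individuals carrying at least one minor allele in the gene."""
    return sum(1 for row in genotypes if sum(row) > 0)


def collapsing_chi_square(cases: Genotypes, controls: Genotypes) -> float:
    """Pearson chi-square (no continuity correction) on carrier status."""
    a = count_carriers(cases)       # carrier cases
    b = len(cases) - a              # non-carrier cases
    c = count_carriers(controls)    # carrier controls
    d = len(controls) - c           # non-carrier controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no information about association
    return n * (a * d - b * c) ** 2 / denom


# Made-up data: three rare variants, ten cases and ten controls.
cases = [[1, 0, 0]] * 6 + [[0, 0, 0]] * 4
controls = [[0, 1, 0]] * 2 + [[0, 0, 0]] * 8

chi2 = collapsing_chi_square(cases, controls)
```

Because every variant in the gene feeds one statistic, a single test replaces three single-marker tests here; the cost is that neutral variants dilute the signal, which is why the abstract evaluates performance under realistic levels of neutral variation.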
199
King CR, Rathouz PJ, Nicolae DL. An evolutionary framework for association testing in resequencing studies. PLoS Genet 2010; 6:e1001202. [PMID: 21085648 PMCID: PMC2978703 DOI: 10.1371/journal.pgen.1001202] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 05/12/2010] [Accepted: 10/07/2010] [Indexed: 11/17/2022]
Abstract
Sequencing technologies are becoming cheap enough to apply to large numbers of study participants and promise to provide new insights into human phenotypes by bringing to light rare and previously unknown genetic variants. We develop a new framework for the analysis of sequence data that incorporates all of the major features of previously proposed approaches, including those focused on allele counts and allele burden, but is both more general and more powerful. We harness population genetic theory to provide prior information on effect sizes and to create a pooling strategy for information from rare variants. Our method, EMMPAT (Evolutionary Mixed Model for Pooled Association Testing), generates a single test per gene (substantially reducing multiple testing concerns), facilitates graphical summaries, and improves the interpretation of results by allowing calculation of attributable variance. Simulations show that, relative to previously used approaches, our method increases the power to detect genes that affect phenotype when natural selection has kept alleles with large effect sizes rare. We demonstrate our approach on a population-based re-sequencing study of association between serum triglycerides and variation in ANGPTL4.
Affiliation(s)
- C Ryan King
- Department of Health Studies, University of Chicago, Chicago, Illinois, United States of America.
|
200
|
Zhang L, Pei YF, Li J, Papasian CJ, Deng HW. Improved detection of rare genetic variants for diseases. PLoS One 2010; 5:e13857. [PMID: 21079782 PMCID: PMC2975623 DOI: 10.1371/journal.pone.0013857] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/10/2010] [Accepted: 09/30/2010] [Indexed: 11/18/2022]
Abstract
Technological advances have promoted gene-based sequencing studies with the aim of identifying rare mutations responsible for complex diseases. A complication in these types of association studies is that the vast majority of non-synonymous mutations are believed to be neutral with respect to phenotype. It is thus critical to distinguish potential causative variants from neutral variation before performing association tests. In this study, we used existing prediction algorithms to predict functional amino acid substitutions, and incorporated that information into association tests. Using simulations, we comprehensively studied the effects of several influential factors, including the sensitivity and specificity of functional variant predictions, number of variants, and proportion of causative variants, on the performance of association tests. Our results showed that incorporating information regarding functional variants obtained from existing prediction algorithms improves statistical power under certain conditions, particularly when the proportion of causative variants is moderate. The application of the proposed tests to a real sequencing study confirms our conclusions. Our work may help investigators who are planning to pursue gene-based sequencing studies.
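One way to picture how functional predictions can enter an association test is to weight or filter each variant before collapsing. The sketch below is an assumption for illustration only, not the tests proposed in the paper: `functional_prob` stands in for the output of some prediction algorithm, and `weighted_burden` is a hypothetical helper that produces a per-sample score that could then be compared between cases and controls.

```python
def weighted_burden(genotypes, functional_prob, threshold=None):
    """Collapse one gene's variants into a per-sample score.

    genotypes: per-sample lists of minor-allele counts (0, 1, or 2).
    functional_prob: per-variant predicted probability of being functional
        (hypothetical output of a prediction algorithm).
    threshold: if given, variants are kept (weight 1) or dropped (weight 0)
        according to functional_prob >= threshold; otherwise each variant
        is weighted by its predicted probability.
    """
    scores = []
    for sample in genotypes:
        total = 0.0
        for count, prob in zip(sample, functional_prob):
            if threshold is not None:
                weight = 1.0 if prob >= threshold else 0.0
            else:
                weight = prob
            total += weight * count
        scores.append(total)
    return scores

# Toy data (hypothetical): 3 samples, 3 non-synonymous variants.
genotypes = [[1, 0, 1], [0, 1, 0], [0, 0, 2]]
functional_prob = [0.9, 0.1, 0.5]

soft_scores = weighted_burden(genotypes, functional_prob)
hard_scores = weighted_burden(genotypes, functional_prob, threshold=0.5)
```

Filtering with a threshold discards predicted-neutral variants outright, whereas probability weighting keeps them with reduced influence; which behaves better would depend on the sensitivity and specificity of the predictor, the factors the abstract reports studying.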
Affiliation(s)
- Lei Zhang
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, People's Republic of China
- Key Laboratory of Biomedical Information Engineering, School of Life Science and Technology, Ministry of Education and Institute of Molecular Genetics, Xi'an Jiaotong University, Xi'an, Shaanxi, People's Republic of China
- Yu-Fang Pei
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, People's Republic of China
- Key Laboratory of Biomedical Information Engineering, School of Life Science and Technology, Ministry of Education and Institute of Molecular Genetics, Xi'an Jiaotong University, Xi'an, Shaanxi, People's Republic of China
- Jian Li
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- Christopher J. Papasian
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- Hong-Wen Deng
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, People's Republic of China
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- College of Life Sciences and Engineering, Beijing Jiao Tong University, Beijing, People's Republic of China
|