201
The genetic architecture of methotrexate toxicity is similar in Drosophila melanogaster and humans. G3-GENES GENOMES GENETICS 2013; 3:1301-10. [PMID: 23733889 PMCID: PMC3737169 DOI: 10.1534/g3.113.006619] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023]
Abstract
The severity of the toxic side effects of chemotherapy varies among patients, and much of this variation is likely genetically based. Here, we use the model system Drosophila melanogaster to genetically dissect the toxicity of methotrexate (MTX), a drug used primarily to treat childhood acute lymphoblastic leukemia and rheumatoid arthritis. We use the Drosophila Synthetic Population Resource, a panel of recombinant inbred lines derived from a multiparent advanced intercross, and quantify MTX toxicity as a reduction in female fecundity. We identify three quantitative trait loci (QTL) affecting MTX toxicity; two colocalize with the fly orthologs of human genes believed to mediate MTX toxicity and one is a novel MTX toxicity gene with a human ortholog. A fourth suggestive QTL spans a centromere. Local single-marker association scans of candidate gene exons fail to implicate amino acid variants as the causative single-nucleotide polymorphisms, and we therefore hypothesize the causative variation is regulatory. In addition, the effects at our mapped QTL do not conform to a simple biallelic pattern, suggesting multiple causative factors underlie the QTL mapping results. Consistent with this observation, no single single-nucleotide polymorphism located in or near a candidate gene can explain the QTL mapping signal. Overall, our results validate D. melanogaster as a model for uncovering the genetic basis of chemotoxicity and suggest the genetic basis of MTX toxicity is due to a handful of genes each harboring multiple segregating regulatory factors.
202
Jiang Y, Epstein MP, Conneely KN. Assessing the impact of population stratification on association studies of rare variation. Hum Hered 2013; 76:28-35. [PMID: 23921847 DOI: 10.1159/000353270] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 12/21/2012] [Accepted: 05/27/2013] [Indexed: 11/19/2022]
Abstract
AIMS: The study of rare variants, which can potentially explain a great proportion of heritability, has emerged as an important topic in human gene mapping of complex diseases. Although several statistical methods have been developed to increase the power to detect disease-related rare variants, none of these methods address an important issue that often arises in genetic studies: false positives due to population stratification. Using simulations, we investigated the impact of population stratification on false-positive rates of rare-variant association tests. METHODS: We simulated a series of case-control studies assuming various sample sizes and levels of population structure. Using such data, we examined the impact of population stratification on rare-variant collapsing and burden tests of rare variation. We further evaluated the ability of 2 existing methods (principal component analysis and genomic control) to correct for stratification in such rare-variant studies. RESULTS: We found that population stratification can have a significant influence on studies of rare variants, especially when the sample size is large and the population is severely stratified. Our results showed that principal component analysis performed quite well in most situations, while genomic control often yielded conservative results. CONCLUSIONS: Our results imply that researchers need to carefully match cases and controls on ancestry in order to avoid false positives caused by population structure in studies of rare variants, particularly if genome-wide data are not available.
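For readers unfamiliar with genomic control, the following is a minimal sketch of its generic single-marker form, not the authors' rare-variant implementation. It assumes 1-df chi-square test statistics; the function name `genomic_control` and the example statistics are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Return (lambda_gc, corrected statistics) for 1-df chi-square tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention lambda is floored at 1 so that the correction
    # never makes a statistic larger than it was.
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]

# Hypothetical test statistics, made up for illustration only.
stats = [0.2, 0.5, 0.9, 1.4, 3.1, 6.0, 0.7]
lam, corrected = genomic_control(stats)
print(f"lambda_GC = {lam:.3f}")
```

Because every statistic is divided by the same inflation factor, the correction is uniform across tests, which is one reason it can be conservative when stratification affects only some variants.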
Affiliation(s)
- Yunxuan Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Ga., USA
203
Quast C, Cuboni S, Bader D, Altmann A, Weber P, Arloth J, Röh S, Brückl T, Ising M, Kopczak A, Erhardt A, Hausch F, Lucae S, Binder EB. Functional coding variants in SLC6A15, a possible risk gene for major depression. PLoS One 2013; 8:e68645. [PMID: 23874702 PMCID: PMC3712998 DOI: 10.1371/journal.pone.0068645] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/08/2013] [Accepted: 05/30/2013] [Indexed: 11/18/2022]
Abstract
SLC6A15 is a neuron-specific neutral amino acid transporter that belongs to the solute carrier 6 gene family. This gene family is responsible for presynaptic re-uptake of the majority of neurotransmitters. Convergent data from human studies, animal models and pharmacological investigations suggest a possible role of SLC6A15 in major depressive disorder. In this work, we explored potential functional variants in this gene that could influence the activity of the amino acid transporter and thus downstream neuronal function and possibly the risk for stress-related psychiatric disorders. DNA from 400 depressed patients and 400 controls was screened for genetic variants using a pooled targeted re-sequencing approach. Results were verified by individual re-genotyping, and validated non-synonymous coding variants were tested in an independent sample (N = 1934). Nine variants altering the amino acid sequence were then assessed for their functional effects by measuring SLC6A15 transporter activity in a cellular uptake assay. In total, we identified 405 genetic variants, including twelve non-synonymous variants. While none of the non-synonymous coding variants showed significant differences in case-control associations, two rare non-synonymous variants were associated with a significantly increased maximal [³H]proline uptake as compared to the wildtype sequence. Our data suggest that genetic variants in the SLC6A15 locus change the activity of the amino acid transporter and might thus influence its neuronal function and the risk for stress-related psychiatric disorders. As statistically significant association for rare variants might only be achieved in extremely large samples (N > 70,000), functional exploration may shed light on putatively disease-relevant variants.
Affiliation(s)
- Carina Quast
- Max Planck Institute of Psychiatry, Munich, Germany.
204
Hendricks AE, Dupuis J, Logue MW, Myers RH, Lunetta KL. Correction for multiple testing in a gene region. Eur J Hum Genet 2013; 22:414-8. [PMID: 23838599 DOI: 10.1038/ejhg.2013.144] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 05/08/2012] [Revised: 04/27/2013] [Accepted: 05/11/2013] [Indexed: 11/09/2022]
Abstract
Several methods to correct for multiple testing within a gene region have been proposed. These methods are useful for candidate gene studies and for fine mapping gene regions from GWAS. The Bonferroni correction and permutation are common adjustments, but are overly conservative and computationally intensive, respectively. Other options include calculating the effective number of independent single-nucleotide polymorphisms (SNPs) or using theoretical approximations. Here, we compare a theoretical approximation based on extreme tail theory with four methods for calculating the effective number of independent SNPs. We evaluate the type-I error rates of these methods using single SNP association tests over 10 gene regions simulated using 1000 Genomes data. Overall, we find that the effective number of independent SNPs method by Gao et al, as well as extreme tail theory, produces type-I error rates at or close to the chosen significance level. The type-I error rates for the other effective number of independent SNPs methods vary by gene region characteristics. We find Gao et al and extreme tail theory to be efficient alternatives to more computationally intensive approaches to control for multiple testing in gene regions.
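The idea of an effective number of independent SNPs can be sketched as follows. This is an illustration only, not the implementation evaluated in the abstract: it assumes the eigenvalues of the SNP correlation (LD) matrix are already available, and it assumes a principal-component rule that keeps enough eigenvalues to explain 99.5% of the variance, which is how the Gao et al approach is commonly described. The function names and eigenvalues are hypothetical.

```python
def effective_tests(eigenvalues, variance_fraction=0.995):
    """Smallest number of eigenvalues whose sum reaches the given
    fraction of the total variance (threshold is an assumption)."""
    ordered = sorted(eigenvalues, reverse=True)
    target = variance_fraction * sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target:
            return count
    return len(ordered)

def per_test_alpha(alpha, n_tests):
    """Bonferroni-style threshold for a given number of tests."""
    return alpha / n_tests

# Hypothetical eigenvalues of a 6-SNP correlation matrix (they sum to 6).
eigs = [3.2, 1.9, 0.85, 0.03, 0.015, 0.005]
m_eff = effective_tests(eigs)
print("SNPs tested:", len(eigs), "effective tests:", m_eff)
print("Bonferroni threshold:", per_test_alpha(0.05, len(eigs)))
print("Effective-number threshold:", per_test_alpha(0.05, m_eff))
```

When SNPs are in strong linkage disequilibrium, a few eigenvalues dominate, the effective number drops below the SNP count, and the per-test threshold becomes less stringent than plain Bonferroni.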
Affiliation(s)
- Audrey E Hendricks
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Josée Dupuis
- [1] Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA [2] Bioinformatics Program, Boston University, Boston, MA, USA
- Mark W Logue
- [1] Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA [2] Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Richard H Myers
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
- Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
205
Ayers KL, Cordell HJ. Identification of grouped rare and common variants via penalized logistic regression. Genet Epidemiol 2013; 37:592-602. [PMID: 23836590 PMCID: PMC3842118 DOI: 10.1002/gepi.21746] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/20/2012] [Revised: 05/24/2013] [Accepted: 05/24/2013] [Indexed: 11/09/2022]
Abstract
In spite of the success of genome-wide association studies in finding many common variants associated with disease, these variants seem to explain only a small proportion of the estimated heritability. Data collection has turned toward exome and whole genome sequencing, but it is well known that single marker methods frequently used for common variants have low power to detect rare variants associated with disease, even with very large sample sizes. In response, a variety of methods have been developed that attempt to cluster rare variants so that they may gather strength from one another under the premise that there may be multiple causal variants within a gene. Most of these methods group variants by gene or proximity, and test one gene or marker window at a time. We propose a penalized regression method (PeRC) that analyzes all genes at once, allowing grouping of all (rare and common) variants within a gene, along with subgrouping of the rare variants, thus borrowing strength from both rare and common variants within the same gene. The method can incorporate either a burden-based weighting of the rare variants or one in which the weights are data driven. In simulations, our method performs favorably when compared to many previously proposed approaches, including its predecessor, the sparse group lasso [Friedman et al., 2010].
Affiliation(s)
- Kristin L Ayers
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne NE1 3BZ, United Kingdom.
206
O’Connor TD, Kiezun A, Bamshad M, Rich SS, Smith JD, Turner E, Leal SM, Akey JM. Fine-scale patterns of population stratification confound rare variant association tests. PLoS One 2013; 8:e65834. [PMID: 23861739 PMCID: PMC3701690 DOI: 10.1371/journal.pone.0065834] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 01/23/2013] [Accepted: 04/29/2013] [Indexed: 11/18/2022]
Abstract
Advances in next-generation sequencing technology have enabled systematic exploration of the contribution of rare variation to Mendelian and complex diseases. Although it is well known that population stratification can generate spurious associations with common alleles, its impact on rare variant association methods remains poorly understood. Here, we performed exhaustive coalescent simulations with demographic parameters calibrated from exome sequence data to evaluate the performance of nine rare variant association methods in the presence of fine-scale population structure. We find that all methods have an inflated spurious association rate for parameter values that are consistent with levels of differentiation typical of European populations. For example, at a nominal significance level of 5%, some test statistics have a spurious association rate as high as 40%. Finally, we empirically assess the impact of population stratification in a large data set of 4,298 European American exomes. Our results have important implications for the design, analysis, and interpretation of rare variant genome-wide association studies.
Affiliation(s)
- Timothy D. O’Connor
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Adam Kiezun
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Michael Bamshad
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
- Joshua D. Smith
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Emily Turner
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Suzanne M. Leal
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
207
Abstract
Evaluation of: Guerreiro R, Wojtas A, Brás J et al. TREM2 variants in Alzheimer’s disease. N. Engl. J. Med. 368(2), 117–127 (2013); Jonsson T, Stefansson H, Steinberg S et al. Variant of TREM2 associated with the risk of Alzheimer’s disease. N. Engl. J. Med. 368(2), 107–116 (2013). The articles by Guerreiro et al. and Jonsson et al. report the association between a specific exonic variant in the TREM2 gene, which encodes a cell surface receptor involved in immune system regulation, and late-onset Alzheimer’s disease (AD). The observations of these studies are relevant as they further disentangle the genetic causes underlying AD by identifying a disease-associated rare variant in the TREM2 gene exhibiting an effect size comparable to that of the APOE ε4 allele (i.e., increasing the risk approximately twofold), thereby strengthening the link between AD and the immune system. All other AD-associated genes described to date have shown smaller effect sizes, with increases in risk of 10–20%. The two articles also underline the role of rare variants in this complex disease and the importance of sequencing analyses for detecting these, whereas commonly used genome-wide association studies are typically designed to screen the genome and identify common variants.
Affiliation(s)
- Giuseppe Tosto
- The Taub Institute for Research on Alzheimer’s Disease & the Aging Brain, Columbia University, New York, NY, USA
- Christiane Reitz
- Sergievsky Center, Columbia University, New York, NY, USA
- Department of Neurology, Columbia University, New York, NY, USA
208
Goldstein DB, Allen A, Keebler J, Margulies EH, Petrou S, Petrovski S, Sunyaev S. Sequencing studies in human genetics: design and interpretation. Nat Rev Genet 2013; 14:460-70. [PMID: 23752795 DOI: 10.1038/nrg3455] [Citation(s) in RCA: 185] [Impact Index Per Article: 15.4] [Indexed: 12/11/2022]
Abstract
Next-generation sequencing is becoming the primary discovery tool in human genetics. There have been many clear successes in identifying genes that are responsible for Mendelian diseases, and sequencing approaches are now poised to identify the mutations that cause undiagnosed childhood genetic diseases and those that predispose individuals to more common complex diseases. There are, however, growing concerns that the complexity and magnitude of complete sequence data could lead to an explosion of weakly justified claims of association between genetic variants and disease. Here, we provide an overview of the basic workflow in next-generation sequencing studies and emphasize, where possible, measures and considerations that facilitate accurate inferences from human sequencing studies.
Affiliation(s)
- David B Goldstein
- Center for Human Genome Variation, Duke University School of Medicine, 308 Research Drive, Box 91009, LSRC B Wing, Room 330, Durham, North Carolina 27708, USA.
209
Abo-Ismail MK, Kelly MJ, Squires EJ, Swanson KC, Bauck S, Miller SP. Identification of single nucleotide polymorphisms in genes involved in digestive and metabolic processes associated with feed efficiency and performance traits in beef cattle. J Anim Sci 2013; 91:2512-29. [DOI: 10.2527/jas.2012-5756] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Affiliation(s)
- M. K. Abo-Ismail
- Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario, Canada, N1G 2W0
- Department of Animal and Poultry Science, Damanhour University, Damanhour, Egypt
- M. J. Kelly
- Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario, Canada, N1G 2W0
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072, Australia
- E. J. Squires
- Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario, Canada, N1G 2W0
- K. C. Swanson
- Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario, Canada, N1G 2W0
- Animal Sciences Department, North Dakota State University, Fargo 58108-6050
- S. Bauck
- GeneSeek, 4665 Innovation Drive, Suite 120, Lincoln, NE 68521
- S. P. Miller
- Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario, Canada, N1G 2W0
210
Wu G, Zhi D. Pathway-based approaches for sequencing-based genome-wide association studies. Genet Epidemiol 2013; 37:478-94. [PMID: 23650134 DOI: 10.1002/gepi.21728] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 12/29/2012] [Revised: 03/04/2013] [Accepted: 03/29/2013] [Indexed: 01/07/2023]
Abstract
For analyzing complex trait association with sequencing data, most current studies test aggregated effects of variants in a gene or genomic region. Although gene-based tests have insufficient power even for moderately sized samples, pathway-based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods were originally designed for genome-wide association studies and have not been comprehensively evaluated for sequencing data. Moreover, region-based rare variant association methods, although potentially applicable to pathway-based analysis by extending their region definition to gene sets, have never been rigorously tested. In the context of exome-based studies, we use simulated and real datasets to evaluate pathway-based association tests. Our simulation strategy adopts a genome-wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing the evaluation of pathway-based methods with realistic quantifiable assumptions on the underlying genetic architectures. The results show that, although no single pathway-based association method offers superior performance in all simulated scenarios, a modification of the Gene Set Enrichment Analysis approach using statistics from single-marker tests without gene-level collapsing (weighted Kolmogorov-Smirnov [WKS]-Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., sequence kernel association test) to pathway analysis offers similar power, but the results are sensitive to assumptions about genetic architecture. We applied pathway association analysis to an exome-sequencing dataset for chronic obstructive pulmonary disease, and found that the WKS-Variant method confirms previously published associated genes.
Affiliation(s)
- Guodong Wu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
211
Schaid DJ, McDonnell SK, Sinnwell JP, Thibodeau SN. Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genet Epidemiol 2013; 37:409-18. [PMID: 23650101 DOI: 10.1002/gepi.21727] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 01/17/2013] [Revised: 03/11/2013] [Accepted: 04/01/2013] [Indexed: 11/11/2022]
Abstract
Searching for rare genetic variants associated with complex diseases can be facilitated by enriching for diseased carriers of rare variants by sampling cases from pedigrees enriched for disease, possibly with related or unrelated controls. This strategy, however, complicates analyses because of shared genetic ancestry, as well as linkage disequilibrium among genetic markers. To overcome these problems, we developed broad classes of "burden" statistics and kernel statistics, extending commonly used methods for unrelated case-control data to allow for known pedigree relationships, for autosomes and the X chromosome. Furthermore, by replacing pedigree-based genetic correlation matrices with estimates of genetic relationships based on large-scale genomic data, our methods can be used to account for population-structured data. By simulations, we show that the type I error rates of our developed methods are near the asymptotic nominal levels, allowing rapid computation of P-values. Our simulations also show that a linear weighted kernel statistic is generally more powerful than a weighted "burden" statistic. Because the proposed statistics are rapid to compute, they can be readily used for large-scale screening of the association of genomic sequence data with disease status.
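The contrast between a weighted "burden" statistic and a linear weighted kernel statistic can be illustrated with a minimal sketch. It assumes unrelated individuals and omits the pedigree and population-structure corrections that are the abstract's actual contribution, as well as variance scaling and P-value computation. The function names, genotypes, and weights are hypothetical.

```python
def score_per_variant(genotypes, phenotypes):
    """S_j = sum_i g_ij * (y_i - mean(y)) for each variant j.
    genotypes: one row of allele counts per individual."""
    mean_y = sum(phenotypes) / len(phenotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * (y - mean_y) for row, y in zip(genotypes, phenotypes))
        for j in range(n_variants)
    ]

def burden_statistic(scores, weights):
    """Collapse first, then square: signed effects can cancel."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2

def linear_kernel_statistic(scores, weights):
    """Square each weighted score, then sum: signs do not cancel."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))

# Hypothetical data: variant 0 is seen only in cases (y = 1),
# variant 1 only in controls (y = 0).
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
phenotypes = [1, 1, 0, 0]
weights = [1.0, 1.0]

scores = score_per_variant(genotypes, phenotypes)
print("per-variant scores:", scores)
print("burden:", burden_statistic(scores, weights))
print("linear kernel:", linear_kernel_statistic(scores, weights))
```

In this toy example the risk and protective variants cancel in the burden statistic but not in the kernel statistic, which is one intuition for why kernel statistics can be more powerful when effect directions are mixed.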
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota 55905, USA.
212
Screening for rare variants in the coding region of ALS-associated genes at 9p21.2 and 19p13.3. Neurobiol Aging 2013; 34:1518.e5-7. [DOI: 10.1016/j.neurobiolaging.2012.09.018] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 06/28/2012] [Revised: 09/19/2012] [Accepted: 09/26/2012] [Indexed: 11/16/2022]
213
Listgarten J, Lippert C, Kang EY, Xiang J, Kadie CM, Heckerman D. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics 2013; 29:1526-33. [PMID: 23599503 PMCID: PMC3673214 DOI: 10.1093/bioinformatics/btt177] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Indexed: 12/22/2022]
Abstract
MOTIVATION: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power. RESULTS: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15,000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis. AVAILABILITY: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com.
214
Abstract
The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.
215
Liu K, Fast S, Zawistowski M, Tintle NL. A geometric framework for evaluating rare variant tests of association. Genet Epidemiol 2013; 37:345-57. [PMID: 23526307 DOI: 10.1002/gepi.21722] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/24/2012] [Revised: 02/12/2013] [Accepted: 02/13/2013] [Indexed: 11/08/2022]
Abstract
The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
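The geometric picture described above can be made concrete with a small sketch. It treats rare allele counts at each variant site as coordinates of a case vector and a control vector, then computes the two quantities the abstract names: vector lengths and the angle between them. This is only an illustration of the representation, not any of the tests the authors classify; the helper names and counts are hypothetical.

```python
import math

def length(v):
    """Euclidean length of a vector of per-site rare allele counts."""
    return math.sqrt(sum(x * x for x in v))

def angle_degrees(u, v):
    """Angle between two count vectors, in degrees."""
    cos_theta = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    # Guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical rare allele counts at three variant sites.
cases = [3, 0, 4]
controls = [0, 5, 0]

print("case length:", length(cases))
print("control length:", length(controls))
print("angle:", angle_degrees(cases, controls))
```

Here the two vectors have equal length but are orthogonal, so a test that looks only at lengths would see no difference, whereas a joint test of lengths and angles would.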
Affiliation(s)
- Keli Liu
- Department of Statistics, Harvard University, Cambridge, MA, USA
216
Yeo RA, Gangestad SW, Liu J, Ehrlich S, Thoma RJ, Pommy J, Mayer AR, Schulz SC, Wassink TH, Morrow EM, Bustillo JR, Sponheim SR, Ho BC, Calhoun VD. The impact of copy number deletions on general cognitive ability and ventricle size in patients with schizophrenia and healthy control subjects. Biol Psychiatry 2013; 73:540-5. [PMID: 23237311 PMCID: PMC3582736 DOI: 10.1016/j.biopsych.2012.10.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/13/2012] [Revised: 10/03/2012] [Accepted: 10/03/2012] [Indexed: 02/05/2023]
Abstract
BACKGROUND: General cognitive ability is usually lower in individuals with schizophrenia, partly due to genetic influences. However, the specific genetic features related to general cognitive ability are poorly understood. Individual variation in a specific type of mutation, uncommon genetic deletions, has recently been linked with both general cognitive ability and risk for schizophrenia. METHODS: We derived measures of the aggregate number of "uncommon" deletions (i.e., those occurring in 3% or less of our combined samples) and the total number of base pairs affected by these deletions in individuals with schizophrenia (n = 79) and healthy control subjects (n = 110) and related each measure to the first principal component of a large battery of cognitive tests, a common technique for characterizing general cognitive ability. These two measures of mutation load were also evaluated for relationships with total brain gray matter, white matter, and lateral ventricle volume. RESULTS: The groups did not differ on genetic variables. Multivariate general linear models revealed a group (control subjects vs. patients) × uncommon deletion number interaction, such that the latter variable was associated with lower general cognitive ability and larger ventricles in patients but not control subjects. CONCLUSIONS: These data suggest that aggregate uncommon deletion burden moderates central features of the schizophrenia phenotype.
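The two mutation-load measures described in the methods (number of uncommon deletions and total base pairs affected) can be sketched as below. This is a hedged illustration, not the authors' pipeline: it assumes deletions have already been called and matched across subjects by a shared identifier, and that frequency is counted as the fraction of subjects carrying a deletion. The function name `deletion_burden` and all data are hypothetical.

```python
from collections import Counter

def deletion_burden(deletions_by_subject, max_frequency=0.03):
    """Map each subject to (number of uncommon deletions, total bp affected).
    deletions_by_subject: {subject: [(deletion_id, length_bp), ...]}"""
    n_subjects = len(deletions_by_subject)
    # Count how many subjects carry each deletion (once per subject).
    carriers = Counter(
        del_id
        for dels in deletions_by_subject.values()
        for del_id in {d for d, _ in dels}
    )
    uncommon = {d for d, c in carriers.items() if c / n_subjects <= max_frequency}
    burden = {}
    for subject, dels in deletions_by_subject.items():
        kept = [(d, bp) for d, bp in dels if d in uncommon]
        burden[subject] = (len(kept), sum(bp for _, bp in kept))
    return burden

# Hypothetical sample of 40 subjects: delA is carried by everyone,
# delB by one subject (2.5%), delC by two subjects (5%).
data = {f"s{i}": [("delA", 1000)] for i in range(40)}
data["s0"] += [("delB", 2500), ("delC", 700)]
data["s1"].append(("delC", 700))

burden = deletion_burden(data)
print("s0:", burden["s0"])
print("s1:", burden["s1"])
```

Only delB passes the 3% threshold in this toy sample, so subject s0 has a burden of one deletion and 2,500 base pairs while every other subject has zero.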
Affiliation(s)
- Ronald A Yeo
- Department of Psychology, University of New Mexico, Albuquerque, New Mexico, USA.
217
Zhao LP, Huang X. Recursive organizer (ROR): an analytic framework for sequence-based association analysis. Hum Genet 2013; 132:745-59. [PMID: 23494241 DOI: 10.1007/s00439-013-1285-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/10/2012] [Accepted: 03/03/2013] [Indexed: 12/13/2022]
Abstract
The advent of next-generation sequencing technologies affords the ability to sequence thousands of subjects cost-effectively, and is revolutionizing the landscape of genetic research. With the evolving genotyping/sequencing technologies, it is not unrealistic to expect that we will soon obtain a pair of fully phased diploid genome sequences from each subject. Here, in light of this potential, we propose an analytic framework called recursive organizer (ROR), which recursively groups sequence variants, based upon sequence similarities and their empirical disease associations, into fewer and potentially more interpretable super sequence variants (SSVs). As an illustration, we applied ROR to assess an association between HLA-DRB1 and type 1 diabetes (T1D), discovering SSVs of HLA-DRB1 with sequence data from the Wellcome Trust Case Control Consortium. Specifically, ROR reduces 36 observed unique HLA-DRB1 sequences into 8 SSVs that empirically associate with T1D, a fourfold reduction of sequence complexity. Using HLA-DRB1 data from the Type 1 Diabetes Genetics Consortium as cases and data from the Fred Hutchinson Cancer Research Center as controls, we are able to validate associations of these SSVs with T1D. Further, SSVs consist of nine nucleotides, and each associates with its corresponding amino acids. Detailed examination of these selected amino acids reveals their potential functional roles in protein structures and possible implications for the mechanism of T1D.
Affiliation(s)
- Lue Ping Zhao
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Mailstop M2-B500, P.O. Box 19024, Seattle, WA 98109-1024, USA.
|
218
|
Walters RG, Coin LJM, Ruokonen A, de Smith AJ, El-Sayed Moustafa JS, Jacquemont S, Elliott P, Esko T, Hartikainen AL, Laitinen J, Männik K, Martinet D, Meyre D, Nauck M, Schurmann C, Sladek R, Thorleifsson G, Thorsteinsdóttir U, Valsesia A, Waeber G, Zufferey F, Balkau B, Pattou F, Metspalu A, Völzke H, Vollenweider P, Stefansson K, Järvelin MR, Beckmann JS, Froguel P, Blakemore AIF. Rare genomic structural variants in complex disease: lessons from the replication of associations with obesity. PLoS One 2013; 8:e58048. [PMID: 23554873 PMCID: PMC3595275 DOI: 10.1371/journal.pone.0058048] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 04/26/2012] [Accepted: 01/30/2013] [Indexed: 01/19/2023] Open
Abstract
The limited ability of common variants to account for the genetic contribution to complex disease has prompted searches for rare variants of large effect, to partly explain the 'missing heritability'. Analyses of genome-wide genotyping data have identified genomic structural variants (GSVs) as a source of such rare causal variants. Recent studies have reported multiple GSV loci associated with risk of obesity. We attempted to replicate these associations by similar analysis of two familial-obesity case-control cohorts and a population cohort, and detected GSVs at 11 out of 18 loci, at frequencies similar to those previously reported. Based on their reported frequencies and effect sizes (OR≥25), we had sufficient statistical power to detect the large majority (80%) of genuine associations at these loci. However, only one obesity association was replicated. Deletion of a 220 kb region on chromosome 16p11.2 has a carrier population frequency of 2×10⁻⁴ (95% confidence interval [9.6×10⁻⁵ to 3.1×10⁻⁴]); accounts overall for 0.5% [0.19%-0.82%] of severe childhood obesity cases (P = 3.8×10⁻¹⁰; odds ratio = 25.0 [9.9-60.6]); and results in a mean body mass index (BMI) increase of 5.8 kg·m⁻² [1.8-10.3] in adults from the general population. We also attempted replication using BMI as a quantitative trait in our population cohort; associations with BMI at or near nominal significance were detected at two further loci near KIF2B and within FOXP2, but these did not survive correction for multiple testing. These findings emphasise several issues of importance when conducting rare GSV association studies, including the need for careful cohort selection and replication strategy, accurate GSV identification, and appropriate correction for multiple testing and/or control of false discovery rate. Moreover, they highlight the potential difficulty in replicating rare CNV associations across different populations. Nevertheless, we show that such studies are potentially valuable for the identification of variants making an appreciable contribution to complex disease.
Affiliation(s)
- Robin G. Walters
- Department of Genomics of Common Disease, Imperial College London, London, United Kingdom
- Clinical Trial Service Unit and Epidemiological Studies Unit, University of Oxford, Oxford, United Kingdom
- Lachlan J. M. Coin
- Department of Genomics of Common Disease, Imperial College London, London, United Kingdom
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Aimo Ruokonen
- Institute of Diagnostics, Clinical Chemistry, University of Oulu, Oulu, Finland
- Oulu University Hospital, Oulu, Finland
- Adam J. de Smith
- Department of Genomics of Common Disease, Imperial College London, London, United Kingdom
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Sebastien Jacquemont
- Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Paul Elliott
- Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
- MRC Health Protection Agency (HPA) Centre for Environment and Health, Imperial College London, London, United Kingdom
- Tõnu Esko
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Anna-Liisa Hartikainen
- Institute of Clinical Sciences/Obstetrics and Gynecology, University of Oulu, Oulu, Finland
- Katrin Männik
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- The Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Danielle Martinet
- Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- David Meyre
- CNRS 8199-Institute of Biology, Pasteur Institute, Lille, France
- Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
- Matthias Nauck
- Institute of Clinical Chemistry and Laboratory Medicine, Ernst-Moritz-Arndt-University, Greifswald, Germany
- Claudia Schurmann
- Interfaculty Institute for Genetics and Functional Genomics, Ernst-Moritz-Arndt-University, Greifswald, Germany
- Rob Sladek
- McGill University and Genome Quebec Innovation Centre, Montreal, Canada
- Department of Medicine and Human Genetics, McGill University, Montreal, Canada
- Unnur Thorsteinsdóttir
- deCODE Genetics, Reykjavík, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Armand Valsesia
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Gerard Waeber
- Department of Internal Medicine, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Flore Zufferey
- Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Beverley Balkau
- INSERM, CESP Centre for Research in Epidemiology and Population Health, U1018, Villejuif, France
- University Paris Sud 11, UMRS 1018, Villejuif, France
- François Pattou
- INSERM U859, Lille, France
- Université Lille Nord de France, Centre Hospitalier Universitaire Lille, Lille, France
- Andres Metspalu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Henry Völzke
- Institute for Community Medicine, Ernst-Moritz-Arndt-University, Greifswald, Germany
- Peter Vollenweider
- Department of Internal Medicine, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Kári Stefansson
- deCODE Genetics, Reykjavík, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Marjo-Riitta Järvelin
- Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
- MRC Health Protection Agency (HPA) Centre for Environment and Health, Imperial College London, London, United Kingdom
- Institute of Health Sciences, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
- Department of Lifecourse and Services, National Institute for Health and Welfare, Oulu, Finland
- Jacques S. Beckmann
- Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Philippe Froguel
- Department of Genomics of Common Disease, Imperial College London, London, United Kingdom
- CNRS 8199-Institute of Biology, Pasteur Institute, Lille, France
- Alexandra I. F. Blakemore
- Department of Genomics of Common Disease, Imperial College London, London, United Kingdom
- Section of Investigative Medicine, Imperial College London, London, United Kingdom
|
219
|
Okser S, Pahikkala T, Aittokallio T. Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 2013; 6:5. [PMID: 23448398 PMCID: PMC3606427 DOI: 10.1186/1756-0381-6-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 10/05/2012] [Accepted: 02/11/2013] [Indexed: 12/31/2022] Open
Abstract
A central challenge in systems biology and medical genetics is to understand how interactions among genetic loci contribute to complex phenotypic traits and human diseases. While most studies have so far relied on statistical modeling and association testing procedures, machine learning and predictive modeling approaches are increasingly being applied to mining genotype-phenotype relationships, including associations that do not necessarily meet statistical significance at the level of individual variants yet still contribute to the combined predictive power at the level of variant panels. Network-based analysis of genetic variants and their interaction partners is another emerging trend by which to explore how sub-network level features contribute to complex disease processes and related phenotypes. In this review, we describe the basic concepts and algorithms behind machine learning-based genetic feature selection approaches, their potential benefits and limitations in genome-wide settings, and how physical or genetic interaction networks could be used as a priori information for providing improved predictive power and mechanistic insights into the disease networks. These developments are geared toward explaining a part of the missing heritability, and when combined with individual genomic profiling, such systems medicine approaches may also provide a principled means for tailoring personalized treatment strategies in the future.
|
220
|
Kim HL, Schuster SC. Poor Man's 1000 Genome Project: Recent Human Population Expansion Confounds the Detection of Disease Alleles in 7,098 Complete Mitochondrial Genomes. Front Genet 2013; 4:13. [PMID: 23450075 PMCID: PMC3584485 DOI: 10.3389/fgene.2013.00013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/19/2012] [Accepted: 01/28/2013] [Indexed: 01/29/2023] Open
Abstract
Rapid growth of the human population has caused the accumulation of rare genetic variants that may play a role in the origin of genetic diseases. However, it is challenging to identify those rare variants responsible for specific diseases without genetic data from an extraordinarily large population sample. Here we focused on the accumulated data from the human mitochondrial (mt) genome sequences because this data provided 7,098 whole genomes for analysis. In this dataset we identified 6,110 single nucleotide variants (SNVs) and their frequency and determined that the best-fit demographic model for the 7,098 genomes included severe population bottlenecks and exponential expansions of the non-African population. Using this model, we simulated the evolution of mt genomes in order to ascertain the behavior of deleterious mutations. We found that such deleterious mutations barely survived during population expansion. We derived the threshold frequency of a deleterious mutation in separate African, Asian, and European populations and used it to identify pathogenic mutations in our dataset. Although threshold frequency was very low, the proportion of variants showing a lower frequency than that threshold was 82, 83, and 91% of the total variants for the African, Asian, and European populations, respectively. Within these variants, only 18 known pathogenic mutations were detected in the 7,098 genomes. This result showed the difficulty of detecting a pathogenic mutation within an abundance of rare variants in the human population, even with a large number of genomes available for study.
Affiliation(s)
- Hie Lim Kim
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA, USA
|
221
|
Albrechtsen A, Grarup N, Li Y, Sparsø T, Tian G, Cao H, Jiang T, Kim SY, Korneliussen T, Li Q, Nie C, Wu R, Skotte L, Morris AP, Ladenvall C, Cauchi S, Stančáková A, Andersen G, Astrup A, Banasik K, Bennett AJ, Bolund L, Charpentier G, Chen Y, Dekker JM, Doney ASF, Dorkhan M, Forsen T, Frayling TM, Groves CJ, Gui Y, Hallmans G, Hattersley AT, He K, Hitman GA, Holmkvist J, Huang S, Jiang H, Jin X, Justesen JM, Kristiansen K, Kuusisto J, Lajer M, Lantieri O, Li W, Liang H, Liao Q, Liu X, Ma T, Ma X, Manijak MP, Marre M, Mokrosiński J, Morris AD, Mu B, Nielsen AA, Nijpels G, Nilsson P, Palmer CNA, Rayner NW, Renström F, Ribel-Madsen R, Robertson N, Rolandsson O, Rossing P, Schwartz TW, Slagboom PE, Sterner M, Tang M, Tarnow L, Tuomi T, van’t Riet E, van Leeuwen N, Varga TV, Vestmar MA, Walker M, Wang B, Wang Y, Wu H, Xi F, Yengo L, Yu C, Zhang X, Zhang J, Zhang Q, Zhang W, Zheng H, Zhou Y, Altshuler D, ‘t Hart LM, Franks PW, Balkau B, Froguel P, McCarthy MI, Laakso M, Groop L, Christensen C, Brandslund I, Lauritzen T, Witte DR, Linneberg A, Jørgensen T, Hansen T, Wang J, Nielsen R, Pedersen O. Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes. Diabetologia 2013; 56:298-310. [PMID: 23160641 PMCID: PMC3536959 DOI: 10.1007/s00125-012-2756-1] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 05/20/2012] [Accepted: 09/28/2012] [Indexed: 12/13/2022]
Abstract
AIMS/HYPOTHESIS: Human complex metabolic traits are in part regulated by genetic determinants. Here we applied exome sequencing to identify novel associations of coding polymorphisms at minor allele frequencies (MAFs) >1% with common metabolic phenotypes.
METHODS: The study comprised three stages. We performed medium-depth (8×) whole exome sequencing in 1,000 cases with type 2 diabetes, BMI >27.5 kg/m² and hypertension and in 1,000 controls (stage 1). We selected 16,192 polymorphisms nominally associated (p < 0.05) with case-control status, from four selected annotation categories or from loci reported to associate with metabolic traits. These variants were genotyped in 15,989 Danes to search for association with 12 metabolic phenotypes (stage 2). In stage 3, polymorphisms showing potential associations were genotyped in a further 63,896 Europeans.
RESULTS: Exome sequencing identified 70,182 polymorphisms with MAF >1%. In stage 2 we identified 51 potential associations with one or more of eight metabolic phenotypes covered by 45 unique polymorphisms. In meta-analyses of stage 2 and stage 3 results, we demonstrated robust associations for coding polymorphisms in CD300LG (fasting HDL-cholesterol: MAF 3.5%, p = 8.5 × 10⁻¹⁴), COBLL1 (type 2 diabetes: MAF 12.5%, OR 0.88, p = 1.2 × 10⁻¹¹) and MACF1 (type 2 diabetes: MAF 23.4%, OR 1.10, p = 8.2 × 10⁻¹⁰).
CONCLUSIONS/INTERPRETATION: We applied exome sequencing as a basis for finding genetic determinants of metabolic traits and show the existence of low-frequency and common coding polymorphisms with impact on common metabolic traits. Based on our study, coding polymorphisms with MAF above 1% do not seem to have particularly high effect sizes on the measured metabolic traits.
Affiliation(s)
- A. Albrechtsen
- Centre of Bioinformatics, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- N. Grarup
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Y. Li
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- T. Sparsø
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- H. Cao
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- T. Jiang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- S. Y. Kim
- Department of Integrative Biology, University of California, 3060 Valley Life Sciences, Bldg #3140, Berkeley, CA 94720-3140 USA
- T. Korneliussen
- Centre of Bioinformatics, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Q. Li
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- C. Nie
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- R. Wu
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- L. Skotte
- Centre of Bioinformatics, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- A. P. Morris
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- C. Ladenvall
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University and Lund University Diabetes Centre, Malmö, Sweden
- S. Cauchi
- UMR CNRS 8199, Genomic and Metabolic Disease, Lille, France
- A. Stančáková
- Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- G. Andersen
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- A. Astrup
- Department of Human Nutrition, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- K. Banasik
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- A. J. Bennett
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- L. Bolund
- Institute of Human Genetics, Aarhus University, Aarhus, Denmark
- G. Charpentier
- Department of Endocrinology-Diabetology, Corbeil-Essonnes Hospital, Corbeil-Essonnes, France
- Y. Chen
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- J. M. Dekker
- EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- A. S. F. Doney
- Diabetes Research Centre, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee, UK
- Pharmacogenomics Centre, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee, UK
- M. Dorkhan
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University and Lund University Diabetes Centre, Malmö, Sweden
- T. Forsen
- Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland
- Vasa Health Care Center, Vaasa, Finland
- T. M. Frayling
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, University of Exeter, Exeter, UK
- Diabetes Genetics, Institute of Biomedical and Clinical Science, Peninsula Medical School, University of Exeter, Exeter, UK
- C. J. Groves
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Y. Gui
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- G. Hallmans
- Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- A. T. Hattersley
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, University of Exeter, Exeter, UK
- Diabetes Genetics, Institute of Biomedical and Clinical Science, Peninsula Medical School, University of Exeter, Exeter, UK
- K. He
- Chinese PLA General Hospital, Beijing, China
- G. A. Hitman
- Centre for Diabetes, Blizard Institute, Queen Mary University of London, London, UK
- J. Holmkvist
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Vipergen Aps, Copenhagen, Denmark
- S. Huang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China
- H. Jiang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- X. Jin
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- J. M. Justesen
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- K. Kristiansen
- Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- J. Kuusisto
- Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- M. Lajer
- Steno Diabetes Center, Gentofte, Denmark
- O. Lantieri
- Institut inter Regional pour la Santé (IRSA), La Riche, France
- W. Li
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- H. Liang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- Q. Liao
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- X. Liu
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- T. Ma
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- X. Ma
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- M. P. Manijak
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- M. Marre
- Department of Endocrinology, Diabetology and Nutrition, Bichat-Claude Bernard University Hospital, Assistance Publique des Hôpitaux de Paris, Paris, France
- Inserm U695, Université Denis Diderot Paris 7, Paris, France
- J. Mokrosiński
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Laboratory for Molecular Pharmacology, Department of Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- A. D. Morris
- Diabetes Research Centre, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee, UK
- Pharmacogenomics Centre, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee, UK
- B. Mu
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- A. A. Nielsen
- Department of Clinical Biochemistry, Vejle Hospital, Vejle, Denmark
- G. Nijpels
- EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- P. Nilsson
- Department of Clinical Sciences, Medicine, Lund University, Malmö, Sweden
- C. N. A. Palmer
- Diabetes Research Centre, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee, UK
- Pharmacogenomics Centre, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee, UK
- N. W. Rayner
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- F. Renström
- Department of Clinical Sciences, Genetic and Molecular Epidemiology Unit, Skåne University Hospital, Lund University, Malmö, Sweden
- R. Ribel-Madsen
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- N. Robertson
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- O. Rolandsson
- Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- P. Rossing
- Steno Diabetes Center, Gentofte, Denmark
- T. W. Schwartz
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Laboratory for Molecular Pharmacology, Department of Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- P. E. Slagboom
- Section of Molecular Epidemiology, Leiden University Medical Center, Leiden, the Netherlands
- Netherlands Center for Healthy Ageing, Leiden, the Netherlands
- M. Sterner
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University and Lund University Diabetes Centre, Malmö, Sweden
- M. Tang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- L. Tarnow
- Steno Diabetes Center, Gentofte, Denmark
- T. Tuomi
- Department of Medicine, Helsinki University Hospital, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
- E. van’t Riet
- EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands
- N. van Leeuwen
- Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, the Netherlands
- T. V. Varga
- Department of Clinical Sciences, Genetic and Molecular Epidemiology Unit, Skåne University Hospital, Lund University, Malmö, Sweden
- M. A. Vestmar
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Laboratory for Molecular Pharmacology, Department of Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- M. Walker
- Diabetes Research Group, School of Clinical Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- B. Wang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- Y. Wang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- H. Wu
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- F. Xi
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- L. Yengo
- UMR CNRS 8199, Genomic and Metabolic Disease, Lille, France
- C. Yu
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- X. Zhang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- J. Zhang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- Q. Zhang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- W. Zhang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- H. Zheng
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- Y. Zhou
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- D. Altshuler
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
- L. M. ‘t Hart
- Section of Molecular Epidemiology, Leiden University Medical Center, Leiden, the Netherlands
- Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, the Netherlands
- P. W. Franks
- Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- Department of Clinical Sciences, Genetic and Molecular Epidemiology Unit, Skåne University Hospital, Lund University, Malmö, Sweden
- Department of Nutrition, Harvard School of Public Health, Boston, MA USA
- B. Balkau
- Inserm CESP U1018, Villejuif, France
- P. Froguel
- UMR CNRS 8199, Genomic and Metabolic Disease, Lille, France
- Genomic Medicine, Hammersmith Hospital, Imperial College London, London, UK
- M. I. McCarthy
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Oxford National Institute for Health Research Biomedical Research Centre, Churchill Hospital, Oxford, UK
- M. Laakso
- Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- L. Groop
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University and Lund University Diabetes Centre, Malmö, Sweden
- C. Christensen
- Department of Internal Medicine and Endocrinology, Vejle Hospital, Vejle, Denmark
- I. Brandslund
- Department of Clinical Biochemistry, Vejle Hospital, Vejle, Denmark
- Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
- T. Lauritzen
- Department of General Practice, Aarhus University, Aarhus, Denmark
- A. Linneberg
- Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
- T. Jørgensen
- Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
- Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Faculty of Medicine, University of Aalborg, Aalborg, Denmark
- T. Hansen
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
- J. Wang
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, 518083 Shenzhen, China
- Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- R. Nielsen
- Centre of Bioinformatics, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Department of Integrative Biology, University of California, 3060 Valley Life Sciences, Bldg #3140, Berkeley, CA 94720-3140 USA
- Department of Statistics, University of California, Berkeley, CA USA
- O. Pedersen
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, DIKU Building, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Faculty of Health Sciences, Aarhus University, Aarhus, Denmark
- Hagedorn Research Institute, Gentofte, Denmark
- Institute of Biomedical Science, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
222
|
Exome sequencing identifies novel rheumatoid arthritis-susceptible variants in the BTNL2. J Hum Genet 2013; 58:210-5. [DOI: 10.1038/jhg.2013.2] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022]
|
223
|
Zakharov S, Salim A, Thalamuthu A. Comparison of similarity-based tests and pooling strategies for rare variants. BMC Genomics 2013; 14:50. [PMID: 23343094 PMCID: PMC3600007 DOI: 10.1186/1471-2164-14-50] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/09/2012] [Accepted: 01/17/2013] [Indexed: 11/10/2022] Open
Abstract
Background: As several rare genomic variants have been shown to affect common phenotypes, rare variant association analysis has received considerable attention. Several efficient association tests using genotype and phenotype similarity measures have been proposed in the literature. The major advantages of similarity-based tests are their ability to accommodate multiple types of DNA variation within one association test and to account for possible interactions within a region. However, little work has been done to compare the performance of similarity-based tests in rare variant association scenarios, especially when applied with different rare variant pooling strategies. Results: Based on population genetics simulations and analysis of a publicly available sequencing data set, we compared the performance of four similarity-based tests and two rare variant pooling strategies. We showed that the weighting approach outperforms collapsing in the presence of a strong effect from rare variants and in the presence of a moderate effect from common variants, whereas collapsing of rare variants is preferable when common variants have a strong effect. We also demonstrated that the difference in statistical power between the two pooling strategies may be substantial. The results also highlighted the consistently high power of two similarity-based approaches when applied with an appropriate pooling strategy. Conclusions: Population genetics simulations and sequencing data set analysis showed high power of two similarity-based tests and a substantial difference in power between the two pooling strategies.
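The two pooling strategies contrasted in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce any of the four similarity-based tests; the function names, the add-one frequency smoothing, the 0.05 rarity cutoff, and the 1/sqrt(q(1-q)) weights are assumptions chosen for illustration. Genotypes are assumed to be coded as 0, 1, or 2 copies of the minor allele.

```python
from math import sqrt


def allele_frequencies(genotypes):
    """Estimate per-variant allele frequency from rows of 0/1/2 genotypes."""
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Add-one smoothing (an assumption) keeps frequencies away from 0,
    # so the weights below stay finite for unobserved variants.
    return [
        (sum(row[j] for row in genotypes) + 1) / (2 * n + 2)
        for j in range(n_variants)
    ]


def collapsed_score(row, freqs, maf_cutoff=0.05):
    """Collapsing: 1 if the individual carries any rare allele, else 0."""
    return int(any(g > 0 and q < maf_cutoff for g, q in zip(row, freqs)))


def weighted_score(row, freqs):
    """Weighting: allele counts summed with weights 1 / sqrt(q * (1 - q))."""
    return sum(g / sqrt(q * (1 - q)) for g, q in zip(row, freqs))
```

Collapsing discards how many rare alleles a person carries and ignores common variants, while weighting keeps every variant but lets rarer ones contribute more. Either per-individual score would then be passed to an association test.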
Affiliation(s)
- Sergii Zakharov
- Human Genetics, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore.
|
224
|
Chen YC, Carter H, Parla J, Kramer M, Goes FS, Pirooznia M, Zandi PP, McCombie WR, Potash JB, Karchin R. A hybrid likelihood model for sequence-based disease association studies. PLoS Genet 2013; 9:e1003224. [PMID: 23358228 PMCID: PMC3554549 DOI: 10.1371/journal.pgen.1003224] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 07/11/2012] [Accepted: 11/21/2012] [Indexed: 11/18/2022] Open
Abstract
In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitation of classical single-marker association analysis for rare variants has been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is to use burden-style collapsing methods, which combine rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets that are nominally significant at the 0.05 level, of which one, the MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing.
Inexpensive, high-throughput sequencing has transformed the field of case-control association studies. For the first time, it may be possible to identify the genetic underpinnings of complex diseases by sequencing the DNA of hundreds (even thousands) of cases and controls and comparing patterns of DNA sequence variation. However, complex diseases are likely to be caused by many variants, some of which are very rare. Taken one at a time, the association between variant and disease phenotype may not be detectable by current statistical methods. One strategy is to identify regions where important variants occur by “collapsing” variants into groups. Here, we present a new collapsing approach, capable of detecting subtle genetic differences between cases and controls. We show, in extensive simulations and using a benchmark set of genes involved in human triglyceride levels, that the approach is potentially more powerful than existing methods. We apply the new method to an ongoing sequencing study of bipolar cases and controls and identify a set of genes found in neuronal synapses, which may be implicated in bipolar disorder.
Affiliation(s)
- Yun-Ching Chen
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- Hannah Carter
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- Jennifer Parla
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Melissa Kramer
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Fernando S. Goes
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
- Mehdi Pirooznia
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
- Peter P. Zandi
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
- W. Richard McCombie
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- James B. Potash
- Department of Psychiatry, University of Iowa, Iowa City, Iowa, United States of America
- Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
|
225
|
De G, Yip WK, Ionita-Laza I, Laird N. Rare variant analysis for family-based design. PLoS One 2013; 8:e48495. [PMID: 23341868 PMCID: PMC3546113 DOI: 10.1371/journal.pone.0048495] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Received: 11/29/2011] [Accepted: 10/01/2012] [Indexed: 12/21/2022] Open
Abstract
Genome-wide association studies have been able to identify disease associations with many common variants; however, most of the estimated genetic contribution explained by these variants appears to be very modest. Rare variants are thought to have larger effect sizes than common SNPs, but effects of rare variants cannot be tested in the GWAS setting. Here we propose a novel method to test for association of rare variants obtained by sequencing in family-based samples by collapsing the standard family-based association test (FBAT) statistic over a region of interest. We also propose a suitable weighting scheme so that low-frequency SNPs that may be enriched in functional variants can be upweighted compared to common variants. Using simulations, we show that the family-based methods perform on par with the population-based methods in the absence of population stratification. By construction, family-based tests are completely robust to population stratification; we show that our proposed methods remain valid even when population stratification is present.
Affiliation(s)
- Gourab De
- Department of Biostatistics, Harvard University, Boston, MA, USA.
|
226
|
Londin E, Yadav P, Surrey S, Kricka LJ, Fortina P. Use of linkage analysis, genome-wide association studies, and next-generation sequencing in the identification of disease-causing mutations. Methods Mol Biol 2013; 1015:127-46. [PMID: 23824853 DOI: 10.1007/978-1-62703-435-7_8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
Abstract
For the past two decades, linkage analysis and genome-wide analysis have greatly advanced our knowledge of the human genome. But despite these successes the genetic architecture of diseases remains unknown. More recently, the availability of next-generation sequencing has dramatically increased our capability for determining DNA sequences that range from large portions of one individual's genome to targeted regions of many genomes in a cohort of interest. In this review, we highlight the successes and shortcomings that have been achieved using genome-wide association studies (GWAS) to identify the variants contributing to disease. We further review the methods and use of new technologies, based on next-generation sequencing, that are becoming increasingly used to expand our knowledge of the causes of genetic disease.
Affiliation(s)
- Eric Londin
- Computational Medicine Center, Thomas Jefferson University Jefferson Medical College, Philadelphia, PA, USA
|
227
|
McPherson R. From Genome-Wide Association Studies to Functional Genomics: New Insights Into Cardiovascular Disease. Can J Cardiol 2013. [DOI: 10.1016/j.cjca.2012.08.017] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 10/27/2022] Open
|
228
|
Slavov G, Allison G, Bosch M. Advances in the genetic dissection of plant cell walls: tools and resources available in Miscanthus. Front Plant Sci 2013; 4:217. [PMID: 23847628 PMCID: PMC3701120 DOI: 10.3389/fpls.2013.00217] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 03/30/2013] [Accepted: 06/05/2013] [Indexed: 05/19/2023]
Abstract
Tropical C4 grasses from the genus Miscanthus are believed to have great potential as biomass crops. However, Miscanthus species are essentially undomesticated, and genetic, molecular and bioinformatics tools are in very early stages of development. Furthermore, similar to other crops targeted as lignocellulosic feedstocks, the efficient utilization of biomass is hampered by our limited knowledge of the structural organization of the plant cell wall and the underlying genetic components that control this organization. The Institute of Biological, Environmental and Rural Sciences (IBERS) has assembled an extensive collection of germplasm for several species of Miscanthus. In addition, an integrated, multidisciplinary research programme at IBERS aims to inform accelerated breeding for biomass productivity and composition, while also generating fundamental knowledge. Here we review recent advances with respect to the genetic characterization of the cell wall in Miscanthus. First, we present a summary of recent and on-going biochemical studies, including prospects and limitations for the development of powerful phenotyping approaches. Second, we review current knowledge about genetic variation for cell wall characteristics of Miscanthus and illustrate how phenotypic data, combined with high-density arrays of single-nucleotide polymorphisms, are being used in genome-wide association studies to generate testable hypotheses and guide biological discovery. Finally, we provide an overview of the current knowledge about the molecular biology of cell wall biosynthesis in Miscanthus and closely related grasses, discuss the key conceptual and technological bottlenecks, and outline the short-term prospects for progress in this field.
Affiliation(s)
- Gancho Slavov
- *Correspondence: Gancho Slavov, Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Plas Gogerddan, Aberystwyth, Ceredigion, Wales SY23 3EB, UK e-mail:
|
229
|
Altmann A, Quast C, Weber P. Detecting rare variants for psychiatric disorders using next generation sequencing: a methods primer. Curr Psychiatry Rep 2013; 15:333. [PMID: 23250814 DOI: 10.1007/s11920-012-0333-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
Recent advances in massively parallel sequencing (MPS) have had an extensive impact on research in medical genomics. In particular, the analysis of rare variants using MPS promises to lead to a better understanding of complex disorders. Nevertheless, for meaningful studies that address the genetic basis for neuropsychiatric disorders, at least hundreds of patient samples have to be analyzed. This undertaking is still not feasible for single research groups on a whole-genome scale and in individual samples. Thus, researchers increasingly employ strategies for reducing the amount of sequencing efforts, such as target enrichment and non-barcoded sample pooling. This review provides an overview of current technologies, discusses options for reduced experimental designs, and illustrates the successful application of the presented methodologies in a recent study of panic disorder patients. Thereby, it aims to introduce the emerging field of MPS into neuropsychiatric research and might serve as a guide for further studies.
Affiliation(s)
- Andre Altmann
- Department of Neurology & Neurological Sciences, Functional Imaging in Neurodegenerative Disorders Laboratory, Stanford University, Stanford, CA, USA.
|
230
|
Sivley RM, Fish AE, Bush WS. Knowledge-constrained K-medoids Clustering of Regulatory Rare Alleles for Burden Tests. EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS. EVOBIO (CONFERENCE) 2013; 7833:35-42. [PMID: 25541630 PMCID: PMC4274942 DOI: 10.1007/978-3-642-37189-9_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Rarely occurring genetic variants are hypothesized to influence human diseases, but statistically associating these rare variants to disease is challenging due to a lack of statistical power in most feasibly sized datasets. Several statistical tests have been developed to either collapse multiple rare variants from a genomic region into a single variable (presence/absence) or to tally the number of rare alleles within a region, relating the burden of rare alleles to disease risk. Both these approaches, however, rely on user-specification of a genomic region to generate these collapsed or burden variables, usually an entire gene. Recent studies indicate that most risk variants for common diseases are found within regulatory regions, not genes. To capture the effect of rare alleles within non-genic regulatory regions for burden tests, we contrast a simple sliding window approach with a knowledge-guided k-medoids clustering method to group rare variants into statistically powerful, biologically meaningful windows. We apply these methods to detect genomic regions that alter expression of nearby genes.
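The "simple sliding window approach" mentioned in this abstract can be sketched roughly as follows. This is an illustrative Python sketch only, not the authors' code: the function names, the half-open window convention, and the choice to skip empty windows are assumptions, and the knowledge-guided k-medoids method is not shown. Variant positions are assumed to be sorted base-pair coordinates on one chromosome.

```python
def sliding_windows(positions, window_size, step):
    """Group sorted variant positions into overlapping fixed-width windows.

    Returns (start, end, members) tuples for half-open windows [start, end);
    windows that contain no variant are skipped.
    """
    if not positions:
        return []
    windows = []
    start = positions[0]
    last = positions[-1]
    while start <= last:
        end = start + window_size
        members = [p for p in positions if start <= p < end]
        if members:
            windows.append((start, end, members))
        start += step
    return windows


def burden(rare_allele_counts, members):
    """Tally one individual's rare alleles over the variants in a window."""
    return sum(rare_allele_counts.get(p, 0) for p in members)
```

Here `rare_allele_counts` is a hypothetical mapping from variant position to the number of rare alleles one individual carries. The per-window tally would then serve as the burden variable in a downstream test.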
Affiliation(s)
- R Michael Sivley
- Center for Human Genetics Research, Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
|
231
|
Luo L, Zhu Y, Xiong M. Quantitative trait locus analysis for next-generation sequencing with the functional linear models. J Med Genet 2012; 49:513-24. [PMID: 22889854 DOI: 10.1136/jmedgenet-2012-100798] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/04/2022]
Abstract
BACKGROUND: Although the past few years have seen the rapid development of novel statistical methods for association studies of qualitative traits using next generation sequencing (NGS) data, only a few statistics have been proposed for testing the association of rare variants with quantitative traits. The quantitative trait locus (QTL) analysis of rare variants remains challenging. Moving from low dimensional data to high dimensional genomic data demands a change in statistical methods from multivariate data analysis to functional data analysis. METHODS: We propose a functional linear model (FLM) as a general principle for developing novel and powerful QTL analysis methods designed for resequencing data. By simulations we calculated the type I error rates and evaluated the power of the FLM and eight other existing statistical methods, even in the presence of both positive and negative signs of effects. RESULTS: Since the FLM retains all of the genetic information in the data, exploits the merits of both variant-by-variant and collective analysis, and overcomes their limitations, the FLM has a much higher power than other existing statistics in all the scenarios considered. To evaluate its performance further, the FLM was applied to association analysis of six quantitative traits in the Dallas Heart Study, and RNA-seq eQTL analysis with genetic variation in the low coverage resequencing data of the 1000 Genomes Project. Real data analysis showed that the FLM had much smaller p values to identify significantly associated variants than other existing methods. CONCLUSIONS: The FLM is expected to open a new route for QTL analysis.
Affiliation(s)
- Li Luo
- Division of Epidemiology, Biostatistics and Preventive Medicine, University of New Mexico, Albuquerque, NM, USA
|
232
|
Feng Q, Wilke RA, Baye TM. Individualized risk for statin-induced myopathy: current knowledge, emerging challenges and potential solutions. Pharmacogenomics 2012; 13:579-94. [PMID: 22462750 DOI: 10.2217/pgs.12.11] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 01/14/2023] Open
Abstract
Skeletal muscle toxicity is the primary adverse effect of statins. In this review, we summarize current knowledge regarding the genetic and nongenetic determinants of risk for statin induced myopathy. Many genetic factors were initially identified through candidate gene association studies limited to pharmacokinetic (PK) targets. Through genome-wide association studies, it has become clear that SLCO1B1 is among the strongest PK predictors of myopathy risk. Genome-wide association studies have also expanded our understanding of pharmacodynamic candidate genes, including RYR2. It is anticipated that deep resequencing efforts will define new loci with rare variants that also contribute, and sophisticated computational approaches will be needed to characterize gene-gene and gene-environment interactions. Beyond environment, race is a critical covariate, and its influence is only partly explained by geographic differences in the frequency of known pharmacodynamic and PK variants. As such, admixture analyses will be essential for a full understanding of statin-induced myopathy.
Affiliation(s)
- QiPing Feng
- Department of Medicine, Vanderbilt University Medical Center, Oates Institute for Experimental Therapeutics, Nashville, TN, USA
|
233
|
Lescai F, Bonfiglio S, Bacchelli C, Chanudet E, Waters A, Sisodiya SM, Kasperavičiūtė D, Williams J, Harold D, Hardy J, Kleta R, Cirak S, Williams R, Achermann JC, Anderson J, Kelsell D, Vulliamy T, Houlden H, Wood N, Sheerin U, Tonini GP, Mackay D, Hussain K, Sowden J, Kinsler V, Osinska J, Brooks T, Hubank M, Beales P, Stupka E. Characterisation and validation of insertions and deletions in 173 patient exomes. PLoS One 2012; 7:e51292. [PMID: 23251486 PMCID: PMC3522676 DOI: 10.1371/journal.pone.0051292] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/11/2012] [Accepted: 11/01/2012] [Indexed: 01/01/2023] Open
Abstract
Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aiming to unravel the genetic component of rare and complex disorders. While in rare disorders this allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains so far elusive. Despite rapid advances of next-generation sequencing, both the technology and the analysis of the data it produces are in their infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1000 Genomes Project has clearly highlighted the prevalence of rare variants and more complex variants (e.g. insertions, deletions), their role in disease is as yet far from elucidated. We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positives, we performed an extensive validation of the common, rare and private LoF variants identified, which indicated that most of the private and rare variants identified were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low coverage and imputation approaches and for which most study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlying rare and private variation predicted to be discovered through next generation sequencing.
Affiliation(s)
- Francesco Lescai
- UCL Genomics, University College London, London, United Kingdom
- Division of Research Strategy, University College London, London, United Kingdom
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- Silvia Bonfiglio
- Centre for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, Milan, Italy
- Chiara Bacchelli
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Estelle Chanudet
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Aoife Waters
- UCL Institute of Child Health, University College London, London, United Kingdom
- Sanjay M. Sisodiya
- UCL Institute of Neurology, University College London, London, United Kingdom
- Julie Williams
- Department of Psychological Medicine, Cardiff University, Cardiff, United Kingdom
- Denise Harold
- Department of Psychological Medicine, Cardiff University, Cardiff, United Kingdom
- John Hardy
- UCL Institute of Neurology, University College London, London, United Kingdom
- Robert Kleta
- UCL Institute of Child Health, University College London, London, United Kingdom
- Sebahattin Cirak
- UCL Institute of Child Health, University College London, London, United Kingdom
- Richard Williams
- UCL Institute of Child Health, University College London, London, United Kingdom
- John C. Achermann
- UCL Institute of Child Health, University College London, London, United Kingdom
- John Anderson
- UCL Institute of Child Health, University College London, London, United Kingdom
- David Kelsell
- Blizard Institute of Cell and Molecular Science, Barts and The London, London, United Kingdom
- Tom Vulliamy
- Blizard Institute of Cell and Molecular Science, Barts and The London, London, United Kingdom
- Henry Houlden
- UCL Institute of Neurology, University College London, London, United Kingdom
- Nicholas Wood
- UCL Institute of Neurology, University College London, London, United Kingdom
- Una Sheerin
- UCL Institute of Neurology, University College London, London, United Kingdom
- Gian Paolo Tonini
- Translational Oncopathology, National Cancer Research Institute (IST), Genova, Italy
- Donna Mackay
- Institute of Ophthalmology, University College London, London, United Kingdom
- Khalid Hussain
- UCL Institute of Child Health, University College London, London, United Kingdom
- Jane Sowden
- UCL Institute of Child Health, University College London, London, United Kingdom
- Veronica Kinsler
- UCL Institute of Child Health, University College London, London, United Kingdom
- Justyna Osinska
- UCL Genomics, University College London, London, United Kingdom
- Tony Brooks
- UCL Genomics, University College London, London, United Kingdom
- Mike Hubank
- UCL Genomics, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Philip Beales
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Elia Stupka
- UCL Genomics, University College London, London, United Kingdom
- Centre for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, Milan, Italy
- Cancer Institute, University College London, London, United Kingdom
|
234
|
Bassuk AG, Muthuswamy LB, Boland R, Smith TL, Hulstrand AM, Northrup H, Hakeman M, Dierdorff JM, Yung CK, Long A, Brouillette RB, Au KS, Gurnett C, Houston DW, Cornell RA, Manak JR. Copy number variation analysis implicates the cell polarity gene glypican 5 as a human spina bifida candidate gene. Hum Mol Genet 2012; 22:1097-111. [PMID: 23223018 DOI: 10.1093/hmg/dds515] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022] Open
Abstract
Neural tube defects (NTDs) are common birth defects of complex etiology. Family and population-based studies have confirmed a genetic component to NTDs. However, despite more than three decades of research, the genes involved in human NTDs remain largely unknown. We tested the hypothesis that rare copy number variants (CNVs), especially de novo germline CNVs, are a significant risk factor for NTDs. We used array-based comparative genomic hybridization (aCGH) to identify rare CNVs in 128 Caucasian and 61 Hispanic patients with non-syndromic lumbar-sacral myelomeningocele. We also performed aCGH analysis on the parents of affected individuals with rare CNVs where parental DNA was available (42 sets). Among the eight de novo CNVs that we identified, three generated copy number changes of entire genes. One large heterozygous deletion removed 27 genes, including PAX3, a known spina bifida-associated gene. A second CNV altered genes (PGPD8, ZC3H6) for which little is known regarding function or expression. A third heterozygous deletion removed GPC5 and part of GPC6, genes encoding glypicans. Glypicans are proteoglycans that modulate the activity of morphogens such as Sonic Hedgehog (SHH) and bone morphogenetic proteins (BMPs), both of which have been implicated in NTDs. Additionally, glypicans function in the planar cell polarity (PCP) pathway, and several PCP genes have been associated with NTDs. Here, we show that GPC5 orthologs are expressed in the neural tube, and that inhibiting their expression in frog and fish embryos results in NTDs. These results implicate GPC5 as a gene required for normal neural tube development.
Affiliation(s)
- Alexander G Bassuk
- Department of Pediatrics, University of Iowa Carver College of Medicine, University of Iowa, Iowa City, IA 52242, USA
|
235
|
Quast C, Altmann A, Weber P, Arloth J, Bader D, Heck A, Pfister H, Müller-Myhsok B, Erhardt A, Binder EB. Rare variants in TMEM132D in a case-control sample for panic disorder. Am J Med Genet B Neuropsychiatr Genet 2012; 159B:896-907. [PMID: 22911938 DOI: 10.1002/ajmg.b.32096] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 02/08/2012] [Accepted: 08/03/2012] [Indexed: 11/06/2022]
Abstract
Genome-wide association studies have identified common variants associated with common diseases. Most variants, however, explain only a small proportion of the estimated heritability, suggesting that rare variants might contribute to a larger extent to common diseases than assumed to date. Here, we use next-generation sequencing to test whether such variants contribute to the risk for anxiety disorders by re-sequencing 40 kb including all exons of the TMEM132D locus, which we have previously shown to be associated with panic disorder and anxiety severity measures. DNA from 300 patients suffering from anxiety disorders, mostly panic disorder (84.7%), and 300 healthy controls was screened for the presence of genetic variants using next-generation re-sequencing in a pooled approach. Results were verified by individual re-genotyping. We identified 371 variants, of which 247 had not been reported before, including 15 novel non-synonymous variants. The majority (76%) of these variants had a minor allele frequency of less than 5%. While we did not identify additional common variants in TMEM132D associated with panic disorder, we observed an overrepresentation of presumably functional coding variants in healthy controls as compared to cases, as well as a higher rate of private coding variants in cases, with one non-synonymous coding variant present in four patients but not in any of the matched controls nor in over 5,500 individuals of different ethnic origins from publicly available re-sequencing datasets. Our data suggest that not only common but also putatively functional and/or rare variants within TMEM132D might contribute to the risk of developing anxiety disorders.
Affiliation(s)
- Carina Quast
- Max Planck Institute of Psychiatry, Munich, Germany
|
236
|
Abstract
Genetic variation influences the response of an individual to drug treatments. Understanding this variation has the potential to make therapy safer and more effective by determining selection and dosing of drugs for an individual patient. In the context of cancer, tumours may have specific disease-defining mutations, but a patient's germline genetic variation will also affect drug response (both efficacy and toxicity), and here we focus on how to study this variation. Advances in sequencing technologies, statistical genetics analysis methods and clinical trial designs have shown promise for the discovery of variants associated with drug response. We discuss the application of germline genetics analysis methods to cancer pharmacogenomics with a focus on the special considerations for study design.
|
237
|
Barnett IJ, Lee S, Lin X. Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 2012. [PMID: 23184518 DOI: 10.1002/gepi.21699] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
In the increasing number of sequencing studies aimed at identifying rare variants associated with complex traits, the power of the test can be improved by guided sampling procedures. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the presence of causal rare variants and can therefore lead to an increase in power compared to random sampling. Although application of traditional rare variant association tests to these extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, the dichotomization procedure can decrease the power by reducing the information in the phenotypes. To avoid this, we propose a novel statistical method based on the optimal Sequence Kernel Association Test that allows us to test for rare variant effects using continuous phenotypes in the analysis of extreme phenotype samples. The increase in power of this method is demonstrated through simulation of a wide range of scenarios as well as in the triglyceride data of the Dallas Heart Study.
Affiliation(s)
- Ian J Barnett
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
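The enrichment argument in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: the additive model (phenotype = effect × carrier status + standard normal noise), the function names, and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_cohort(n, carrier_freq, effect, seed=0):
    """Return (carrier, phenotype) pairs under an assumed additive model:
    phenotype = effect * carrier + N(0, 1) noise."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        carrier = 1 if rng.random() < carrier_freq else 0
        cohort.append((carrier, effect * carrier + rng.gauss(0.0, 1.0)))
    return cohort


def extreme_sample(cohort, tail_fraction):
    """Keep only individuals in the lower and upper phenotype tails."""
    ranked = sorted(cohort, key=lambda pair: pair[1])
    k = int(len(ranked) * tail_fraction)
    if k == 0:
        raise ValueError("tail_fraction is too small for this cohort size")
    return ranked[:k] + ranked[-k:]


def carrier_rate(sample):
    """Fraction of sampled individuals carrying the rare variant."""
    return sum(carrier for carrier, _ in sample) / len(sample)


# Hypothetical settings: 2% carrier frequency, effect of 2 standard deviations.
cohort = simulate_cohort(n=20000, carrier_freq=0.02, effect=2.0, seed=1)
extremes = extreme_sample(cohort, tail_fraction=0.05)
print("carrier rate, whole cohort:", carrier_rate(cohort))
print("carrier rate, extremes:    ", carrier_rate(extremes))
```

Under these assumptions the carrier rate among the sampled extremes should come out well above the cohort-wide rate, which is the enrichment the abstract describes. The sketch does not implement the continuous-phenotype test the authors propose.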
|
238
|
Shugart YY, Zhu Y, Guo W, Xiong M. Weighted pedigree-based statistics for testing the association of rare variants. BMC Genomics 2012; 13:667. [PMID: 23176082 PMCID: PMC3827928 DOI: 10.1186/1471-2164-13-667] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2012] [Accepted: 11/12/2012] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: With the advent of next-generation sequencing (NGS) technologies, researchers are now generating a deluge of data on high dimensional genomic variations, whose analysis is likely to reveal rare variants involved in the complex etiology of disease. Standing in the way of such discoveries, however, is the fact that statistics for rare variants are currently designed for use with population-based data. In this paper, we introduce a pedigree-based statistic specifically designed to test for rare variants in family-based data. The additional power of pedigree-based statistics stems from the fact that while rare variants related to diseases or traits of interest occur only infrequently in populations, in families with multiple affected individuals, such variants are enriched. Note that while the proposed statistic can be applied with and without statistical weighting, our simulations show that its power increases when weighting (WSS and VT) is applied.
RESULTS: Our working hypothesis was that, since rare variants are concentrated in families with multiple affected individuals, pedigree-based statistics should detect rare variants more powerfully than population-based statistics. To evaluate how well our new pedigree-based statistics perform in association studies, we develop a general framework for sequence-based association studies capable of handling data from pedigrees of various types and also from unrelated individuals. In short, we developed a procedure for transforming population-based statistics into tests for family-based associations. Furthermore, we modify two existing tests, the weighted sum-square test and the variable-threshold test, and apply both to our family-based collapsing methods. We demonstrate that the new family-based tests are more powerful than the corresponding population-based tests and that they generate a reasonable type I error rate. To demonstrate feasibility, we apply the newly developed tests to a pedigree-based GWAS data set from the Framingham Heart Study (FHS). FHS-GWAS data contain approximately 5000 uncommon variants with frequencies less than 0.05. Potential association findings in these data demonstrate the feasibility of the software PB-STAR (note, PB-STAR is now freely available to the public).
CONCLUSION: Our tests show that when analyzing for rare variants, a pedigree-based design is more powerful than a population-based case-control design. We further demonstrate that a pedigree-based statistic's power to detect rare variants increases in direct relation to the proportion of affected individuals within the pedigree.
Affiliation(s)
- Yin Yao Shugart
- Unit of Statistical Genomics, Division of Intramural Division Program, National Institute of Mental Health, National Institute of Health, Bethesda, MD, USA
- Yun Zhu
- Division of Biostatistics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Wei Guo
- Unit of Statistical Genomics, Division of Intramural Division Program, National Institute of Mental Health, National Institute of Health, Bethesda, MD, USA
- Momiao Xiong
- Division of Biostatistics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX 77225, USA
|
239
|
Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and human disease. Nat Biotechnol 2012; 30:1095-106. [PMID: 23138309 PMCID: PMC3703467 DOI: 10.1038/nbt.2422] [Citation(s) in RCA: 340] [Impact Index Per Article: 26.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2012] [Accepted: 10/16/2012] [Indexed: 12/13/2022]
Abstract
Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has primarily focused on protein-coding variants, due to the difficulty of interpreting non-coding mutations. This picture has changed with advances in the systematic annotation of functional non-coding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs, and molecular quantitative trait loci all provide complementary information about non-coding function. These functional maps can help prioritize variants on risk haplotypes, filter mutations encountered in the clinic, and perform systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable dataset integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis, and treatment.
Affiliation(s)
- Lucas D Ward
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
|
240
|
Genetics of coronary artery disease: Genome-wide association studies and beyond. Atherosclerosis 2012; 225:1-10. [DOI: 10.1016/j.atherosclerosis.2012.05.015] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/08/2011] [Revised: 05/15/2012] [Accepted: 05/16/2012] [Indexed: 12/14/2022]
|
241
|
Vrieze SI, Iacono WG, McGue M. Confluence of genes, environment, development, and behavior in a post Genome-Wide Association Study world. Dev Psychopathol 2012; 24:1195-214. [PMID: 23062291 PMCID: PMC3476066 DOI: 10.1017/s0954579412000648] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
This article serves to outline a research paradigm to investigate main effects and interactions of genes, environment, and development on behavior and psychiatric illness. We provide a historical context for candidate gene studies and genome-wide association studies, including benefits, limitations, and expected payoffs. Using substance use and abuse as our driving example, we then turn to the importance of etiological psychological theory in guiding genetic, environmental, and developmental research, as well as the utility of refined phenotypic measures, such as endophenotypes, in the pursuit of etiological understanding and focused tests of genetic and environmental associations. Phenotypic measurement has received considerable attention in the history of psychology and is informed by psychometrics, whereas the environment remains relatively poorly measured and is often confounded with genetic effects (i.e., gene-environment correlation). Genetically informed designs, which are no longer limited to twin and adoption studies thanks to ever-cheaper genotyping, are required to understand environmental influences. Finally, we outline the vast amount of individual difference in structural genomic variation, most of which remains to be leveraged in genetic association tests. Although the genetic data can be massive and burdensome (tens of millions of variants per person), we argue that improved understanding of genomic structure and function will provide investigators with new tools to test specific a priori hypotheses derived from etiological psychological theory, much like current candidate gene research but with less confusion and more payoff than candidate gene research has to date.
Affiliation(s)
- Scott I Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA.
|
242
|
Torkamani A, Pham P, Libiger O, Bansal V, Zhang G, Scott-Van Zeeland AA, Tewhey R, Topol EJ, Schork NJ. Clinical implications of human population differences in genome-wide rates of functional genotypes. Front Genet 2012; 3:211. [PMID: 23125845 PMCID: PMC3485509 DOI: 10.3389/fgene.2012.00211] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2012] [Accepted: 09/26/2012] [Indexed: 12/21/2022] Open
Abstract
There have been a number of recent successes in the use of whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of a patient with an idiopathic condition. This is so because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes made up of other individuals’ genomes of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes. We pursued this exploration in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We took advantage of a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5000 coding and ∼7000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels comprised of genomes from individuals that do not have the same ancestral background as a patient can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
Affiliation(s)
- Ali Torkamani
- The Scripps Translational Science La Jolla, CA, USA ; Scripps Health La Jolla, CA, USA ; Department of Molecular and Experimental Medicine, The Scripps Research Institute La Jolla, CA, USA
|
243
|
Zhang Y, Guan W, Pan W. Adjustment for population stratification via principal components in association analysis of rare variants. Genet Epidemiol 2012; 37:99-109. [PMID: 23065775 DOI: 10.1002/gepi.21691] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2012] [Revised: 09/11/2012] [Accepted: 09/13/2012] [Indexed: 11/07/2022]
Abstract
For unrelated samples, principal component (PC) analysis has been established as a simple and effective approach to adjusting for population stratification in association analysis of common variants (CVs, with minor allele frequencies MAF > 5%). However, it is less clear how it would perform in analysis of low-frequency variants (LFVs, MAF between 1% and 5%), or of rare variants (RVs, MAF < 1%). Furthermore, with next-generation sequencing data, it is unknown whether PCs should be constructed based on CVs, LFVs, or RVs. In this study, we used the 1000 Genomes Project sequence data to explore the construction of PCs and their use in association analysis of LFVs or RVs for unrelated samples. It is shown that a few top PCs based on either CVs or LFVs could separate two continental groups, European and African samples, but those based on only RVs performed less well. When applied to several association tests in simulated data with population stratification, using PCs based on either CVs or LFVs was effective in controlling Type I error rates, while nonadjustment led to inflated Type I error rates. Perhaps the most interesting observation is that, although the PCs based on LFVs could better separate the two continental groups than those based on CVs, the use of the former could lead to overadjustment in the sense of substantial power loss in the absence of population stratification; in contrast, we did not see any problem with the use of the PCs based on CVs in all our examples.
Affiliation(s)
- Yiwei Zhang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455-0392, USA
|
244
|
Rare and low frequency variant stratification in the UK population: description and impact on association tests. PLoS One 2012; 7:e46519. [PMID: 23071581 PMCID: PMC3465327 DOI: 10.1371/journal.pone.0046519] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2012] [Accepted: 09/01/2012] [Indexed: 12/05/2022] Open
Abstract
Although variations in allele frequencies at common SNPs have been extensively studied in different populations, little is known about the stratification of rare variants and its impact on association tests. In this paper, we used Affymetrix 500K genotype data from the WTCCC to investigate if variants in three different frequency categories (below 1%, between 1 and 5%, above 5%) show different stratification patterns in the UK population. We found that these patterns are indeed different. The top principal component extracted from the rare variant category shows poor correlations with any principal component or combination of principal components from the low frequency or common variant categories. These results could suggest that a suitable solution to avoid false positive association due to population stratification would involve adjusting for the respective PCs when testing for variants in different allele frequency categories. However, we found this was not the case both on type 2 diabetes data and on simulated data. Indeed, adjusting rare variant association tests on PCs derived from rare variants does no better to correct for population stratification than adjusting on PCs derived from more common variants. Mixed models perform slightly better for low frequency variants than PC based adjustments but less well for the rarest variants. These results call for the need of new methodological developments specifically devoted to address rare variant stratification issues in association tests.
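The three frequency categories named in the abstract above (below 1%, between 1 and 5%, above 5%) can be written down as a small helper. This is a minimal sketch: the function names, the 0/1/2 genotype coding, and the treatment of the exact 1% and 5% boundaries are assumptions, since the abstract does not specify them.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid genotypes coded as counts (0, 1, 2)
    of one allele. The coding is an assumption made for this sketch."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(allele_freq, 1.0 - allele_freq)


def frequency_category(maf):
    """Assign a variant to one of the three categories used in the abstract.
    Boundary handling (1% and 5% counted as low frequency) is an assumption."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency lies between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low frequency"
    return "common"


# Hypothetical genotypes for one variant in 200 individuals: 3 heterozygotes.
genotypes = [1, 1, 1] + [0] * 197
maf = minor_allele_frequency(genotypes)
print(maf, frequency_category(maf))
```

Binning variants this way is only the first step of the analysis described above; the stratification comparison itself requires principal components computed separately within each category.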
|
245
|
Derkach A, Lawless JF, Sun L. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genet Epidemiol 2012; 37:110-21. [PMID: 23032573 DOI: 10.1002/gepi.21689] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2012] [Revised: 08/23/2012] [Accepted: 09/07/2012] [Indexed: 01/02/2023]
Abstract
Many association tests have been proposed for rare variants, but the choice of a powerful test is uncertain when there is limited information on the underlying genetic model. Proposed methods use either linear statistics, which are powerful when most variants are causal and have the same direction of effect, or quadratic statistics, which are more powerful in other scenarios. To achieve robustness, it is natural to combine the evidence of association from two or more complementary tests. To this end, we consider the minimum-p and Fisher's methods of combining P-values from linear and quadratic statistics. Extensive simulation studies show that both methods are robust across models with varying proportions of causal, deleterious, and protective rare variants, allele frequencies, and effect sizes. When the majority (>75%) of the causal effects are in the same direction (deleterious or protective), Fisher's method consistently outperforms the minimum-p and the individual linear and quadratic tests, as well as the optimal sequence kernel association test, SKAT-O. When the individual test has moderate power, Fisher's test has improved power for 90% of the ~5000 models considered, with >20% relative efficiency gain for 40% of the models. The maximum absolute power loss is 8% for the remaining 10% of the models. An application to the GAW17 quantitative trait Q2 data based on sequence data of the 1000 Genomes Project shows that, compared with linear and quadratic tests, Fisher's test has comparable power for all 13 functional genes and provides the best power for more than half of them.
Affiliation(s)
- Andriy Derkach
- Department of Statistics, University of Toronto, Toronto, Ontario, Canada
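The two combination rules named in the abstract above, Fisher's method and the minimum-p method, can be sketched as follows. The sketch assumes the P-values being combined are independent under the null hypothesis; whether that holds for a given pair of linear and quadratic statistics is not established here, and the authors' own computation may differ. The function names are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p_i), referred to a chi-square
    distribution with 2k degrees of freedom (independence assumed)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, so no external library is needed.
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


def min_p_combine(p_values):
    """Minimum-p rule with the exact correction for k independent tests."""
    if not p_values or any(not (0.0 <= p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return 1.0 - (1.0 - min(p_values)) ** len(p_values)


# Hypothetical P-values from one linear and one quadratic test of a gene.
p_linear, p_quadratic = 0.05, 0.05
statistic, p_fisher = fisher_combine([p_linear, p_quadratic])
print("Fisher statistic:", statistic, "combined P:", p_fisher)
print("minimum-p combined P:", min_p_combine([p_linear, p_quadratic]))
```

With two moderately small P-values pointing the same way, Fisher's method yields a smaller combined P-value than the minimum-p rule, which reflects the kind of scenario where the abstract reports Fisher's method doing well.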
|
246
|
van der Zee HH, Laman JD, Boer J, Prens EP. Hidradenitis suppurativa: viewpoint on clinical phenotyping, pathogenesis and novel treatments. Exp Dermatol 2012; 21:735-9. [PMID: 22882284 DOI: 10.1111/j.1600-0625.2012.01552.x] [Citation(s) in RCA: 145] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/12/2012] [Indexed: 11/28/2022]
Abstract
Hidradenitis suppurativa (HS) is an inflammatory, debilitating follicular skin disease with recurring flare-ups. The painful, deep-seated, inflamed lesions in the inverse areas of the body cause severe discomfort, and hence, serious psycho-social and economic costs. HS is common, but often misdiagnosed and mechanistically poorly understood. Furthermore, HS is notoriously difficult to treat resulting in a high unmet medical need. To provoke debate, rational experimentation and initiate strategic studies, we here present a concise viewpoint on seven topics: the diagnosis of HS, the role of mechanical friction, the critical importance of accurate clinical subgrouping, smoking and obesity, the role of bacteria, and our comprehensive view on HS pathogenesis with a central role for keratin clearance, and novel treatment approaches.
Affiliation(s)
- Hessel H van der Zee
- Department of Dermatology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands.
|
247
|
Brisbin A, Jenkins GD, Ellsworth KA, Wang L, Fridley BL. Localization of association signal from risk and protective variants in sequencing studies. Front Genet 2012; 3:173. [PMID: 22973297 PMCID: PMC3434438 DOI: 10.3389/fgene.2012.00173] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2012] [Accepted: 08/19/2012] [Indexed: 11/13/2022] Open
Abstract
Aggregating information across multiple variants in a gene or region can improve power for rare variant association testing. Power is maximized when the aggregation region contains many causal variants and few neutral variants. In this paper, we present a method for the localization of the association signal in a region using a sliding-window based approach to rare variant association testing in a region. We first introduce a novel method for analysis of rare variants, the Difference in Minor Allele Frequency test (DMAF), which allows combined analysis of common and rare variants, and makes no assumptions about the direction of effects. In whole-region analyses of simulated data with risk and protective variants, DMAF and other methods which pool data across individuals were found to outperform methods which pool data across variants. We then implement a sliding-window version of DMAF, using a step-down permutation approach to control type I error with the testing of multiple windows. In simulations, the sliding-window DMAF improved power to detect a causal sub-region, compared to applying DMAF to the whole region. Sliding-window DMAF was also effective in localizing the causal sub-region. We also applied the DMAF sliding-window approach to test for an association between response to the drug gemcitabine and variants in the gene FKBP5 sequenced in 91 lymphoblastoid cell lines derived from white non-Hispanic individuals. The application of the sliding-window test procedure detected an association in a sub-region spanning an exon and two introns, when rare and common variants were analyzed together.
Affiliation(s)
- Abra Brisbin
- Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
|
248
|
Mägi R, Asimit JL, Day-Williams AG, Zeggini E, Morris AP. Genome-wide association analysis of imputed rare variants: application to seven common complex diseases. Genet Epidemiol 2012; 36:785-96. [PMID: 22951892 PMCID: PMC3569874 DOI: 10.1002/gepi.21675] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2012] [Revised: 07/23/2012] [Accepted: 07/27/2012] [Indexed: 12/21/2022]
Abstract
Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.
Affiliation(s)
- Reedik Mägi
- Estonian Genome Centre, University of Tartu, Tartu, Estonia
|
249
|
Yang IV, Schwartz DA. The next generation of complex lung genetic studies. Am J Respir Crit Care Med 2012; 186:1087-94. [PMID: 22936355 DOI: 10.1164/rccm.201207-1178pp] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Common genetic risk variants identified by genome-wide association studies have explained a small portion of disease heritability in complex diseases. It is becoming apparent that each gene/locus is heterogeneous and that multiple rare independent risk alleles across the population contribute to disease risk. Next-generation sequencing technologies have reached the maturity and low cost necessary to perform whole genome, whole exome, and targeted region sequencing to identify all rare risk alleles across a population, a task that is not possible to achieve by genotyping. Design of whole genome, whole exome, and targeted sequencing projects to identify disease variants for complex lung diseases requires four main steps: library preparation, sequencing, sequence data analysis, and statistical analysis. Although data analysis approaches are still evolving, a number of published studies have successfully identified rare variants associated with complex disease. Despite many challenges that lie ahead in applying these technologies to lung disease, rare variants are likely to be a critical piece of the puzzle that needs to be solved to understand the genetic basis of complex lung disease and to use this information to develop better therapies.
Affiliation(s)
- Ivana V Yang
- Department of Medicine, University of Colorado Denver, 12700 East 19th Avenue, 8611, Aurora, CO 80045, USA.
|
250
|
Abstract
It is widely believed that both common and rare variants contribute to the risks of common diseases or complex traits and the cumulative effects of multiple rare variants can explain a significant proportion of trait variances. Advances in high-throughput DNA sequencing technologies allow us to genotype rare causal variants and investigate the effects of such rare variants on complex traits. We developed an adaptive ridge regression method to analyze the collective effects of multiple variants in the same gene or the same functional unit. Our model focuses on continuous trait and incorporates covariate factors to remove potential confounding effects. The proposed method estimates and tests multiple rare variants collectively but does not depend on the assumption of same direction of each rare variant effect. Compared with the Bayesian hierarchical generalized linear model approach, the state-of-the-art method of rare variant detection, the proposed new method is easy to implement, yet it has higher statistical power. Application of the new method is demonstrated using the well-known data from the Dallas Heart Study.
Affiliation(s)
- Haimao Zhan
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, California, United States of America
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, California, United States of America
|