1
Wang YC, Wu Y, Choi J, Allington G, Zhao S, Khanfar M, Yang K, Fu PY, Wrubel M, Yu X, Mekbib KY, Ocken J, Smith H, Shohfi J, Kahle KT, Lu Q, Jin SC. Computational Genomics in the Era of Precision Medicine: Applications to Variant Analysis and Gene Therapy. J Pers Med 2022; 12:175. [PMID: 35207663 PMCID: PMC8878256 DOI: 10.3390/jpm12020175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Revised: 01/18/2022] [Accepted: 01/24/2022] [Indexed: 02/04/2023]
Abstract
Rapid methodological advances in statistical and computational genomics have enabled researchers to better identify and interpret both rare and common variants responsible for complex human diseases. As we continue to see an expansion of these advances in the field, it is now imperative for researchers to understand the resources and methodologies available for various data types and study designs. In this review, we provide an overview of recent methods for identifying rare and common variants and understanding their roles in disease etiology. Additionally, we discuss the strategy, challenge, and promise of gene therapy. As computational and statistical approaches continue to improve, we will have an opportunity to translate human genetic findings into personalized health care.
Affiliation(s)
- Yung-Chun Wang
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Yuchang Wu
- Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Julie Choi
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Garrett Allington
- Department of Pathology, Yale School of Medicine, New Haven, CT 06510, USA
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA
- Shujuan Zhao
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Mariam Khanfar
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Kuangying Yang
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Po-Ying Fu
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Max Wrubel
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Xiaobing Yu
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Department of Computer Science & Engineering, Washington University, St. Louis, MO 63130, USA
- Kedous Y. Mekbib
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
- Jack Ocken
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
- Hannah Smith
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
- John Shohfi
- Department of Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
- Kristopher T. Kahle
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA 02114, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- Departments of Pediatrics and Neurology, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Qiongshi Lu
- Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sheng Chih Jin
- Department of Genetics, School of Medicine, Washington University, St. Louis, MO 63110, USA
- Department of Pediatrics, School of Medicine, Washington University, St. Louis, MO 63110, USA
2
A powerful nonparametric statistical framework for family-based association analyses. Genetics 2015; 200:69-78. [PMID: 25745024 DOI: 10.1534/genetics.115.175174] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/28/2015] [Accepted: 02/23/2015] [Indexed: 01/04/2023]
Abstract
Family-based study design is commonly used in genetic research. It has many ideal features, including being robust to population stratification (PS). With the advance of high-throughput technologies and ever-decreasing genotyping cost, it has become common for family studies to examine a large number of variants for their associations with disease phenotypes. The yield from the analysis of these family-based genetic data can be enhanced by adopting computationally efficient and powerful statistical methods. We propose a general framework of a family-based U-statistic, referred to as family-U, for family-based association studies. Unlike existing parametric-based methods, the proposed method makes no assumption of the underlying disease models and can be applied to various phenotypes (e.g., binary and quantitative phenotypes) and pedigree structures (e.g., nuclear families and extended pedigrees). By using only within-family information, it can offer robust protection against PS. In the absence of PS, it can also utilize additional information (i.e., between-family information) for power improvement. Through simulations, we demonstrated that family-U attained higher power over a commonly used method, family-based association tests, under various disease scenarios. We further illustrated the new method with an application to large-scale family data from the Framingham Heart Study. By utilizing additional information (i.e., between-family information), family-U confirmed a previous association of CHRNA5 with nicotine dependence.
3
Chung RH, Tsai WY, Martin ER. Family-based association test using both common and rare variants and accounting for directions of effects for sequencing data. PLoS One 2014; 9:e107800. [PMID: 25244564 PMCID: PMC4171487 DOI: 10.1371/journal.pone.0107800] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/29/2014] [Accepted: 08/22/2014] [Indexed: 11/19/2022]
Abstract
Current family-based association tests for sequencing data were mainly developed for identifying rare variants associated with a complex disease. As the disease can be influenced by the joint effects of common and rare variants, common variants with modest effects may not be identified by the methods focusing on rare variants. Moreover, variants can have risk, neutral, or protective effects. Association tests that can effectively select groups of common and rare variants that are likely to be causal and consider the directions of effects have become important. We developed the Ordered Subset - Variable Threshold - Pedigree Disequilibrium Test (OVPDT), a combination of three algorithms, for association analysis in family sequencing data. The ordered subset algorithm is used to select a subset of common variants based on their relative risks, calculated using only parental mating types. The variable threshold algorithm is used to search for an optimal allele frequency threshold such that rare variants below the threshold are more likely to be causal. The PDT statistics from both rare and common variants selected by the two algorithms are combined as the OVPDT statistic. A permutation procedure is used in OVPDT to calculate the p-value. We used simulations to demonstrate that OVPDT has the correct type I error rates under different scenarios and compared the power of OVPDT with two other family-based association tests. The results suggested that OVPDT can have more power than the other tests if both common and rare variants have effects on the disease in a region.
Affiliation(s)
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli, Taiwan
- Wei-Yun Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli, Taiwan
- Eden R. Martin
- Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, United States of America
4
Abstract
The cost of next-generation sequencing is now approaching that of the first generation of genome-wide single-nucleotide genotyping panels, but this is still out of reach for large-scale epidemiologic studies with tens of thousands of subjects. Furthermore, the anticipated yield of millions of rare variants poses serious challenges for distinguishing causal from noncausal variants for disease. We explore the merits of using family-based designs for sequencing substudies to identify novel variants and prioritize them for their likelihood of causality. While the sharing of variants within families means that family-based designs may be less efficient for discovery than sequencing of a comparable number of unrelated individuals, the ability to exploit cosegregation of variants with disease within families helps distinguish causal from noncausal ones. We introduce a score test criterion for prioritizing discovered variants in terms of their likelihood of being functional. We compare the relative statistical efficiency of 2-stage versus 1-stage family-based designs by application to the Genetic Analysis Workshop 18 simulated sequence data.
Affiliation(s)
- Zhao Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
5
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022]
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels, but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Zhao Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Fan Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
6
Lutz SM, Vansteelandt S, Lange C. Testing for direct genetic effects using a screening step in family-based association studies. Front Genet 2013; 4:243. [PMID: 24312120 PMCID: PMC3836057 DOI: 10.3389/fgene.2013.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/28/2013] [Accepted: 10/25/2013] [Indexed: 11/13/2022]
Abstract
In genome wide association studies (GWAS), family-based studies tend to have less power to detect genetic associations than population-based studies, such as case-control studies. This can be an issue when testing if genes in a family-based GWAS have a direct effect on the phenotype of interest over and above their possible indirect effect through a secondary phenotype. When multiple SNPs are tested for a direct effect in the family-based study, a screening step can be used to minimize the burden of multiple comparisons in the causal analysis. We propose a 2-stage screening step that can be incorporated into the family-based association test (FBAT) approach similar to the conditional mean model approach in the Van Steen-algorithm (Van Steen et al., 2005). Simulations demonstrate that the type 1 error is preserved and this method is advantageous when multiple markers are tested. This method is illustrated by an application to the Framingham Heart Study.
Affiliation(s)
- Sharon M Lutz
- Department of Biostatistics, University of Colorado, Aurora, CO, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
7
Attitudes toward Genetic Testing for Hypertension among African American Women and Girls. Nurs Res Pract 2013; 2013:341374. [PMID: 24303212 PMCID: PMC3835880 DOI: 10.1155/2013/341374] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/29/2013] [Revised: 07/29/2013] [Accepted: 09/17/2013] [Indexed: 11/17/2022]
Abstract
Introduction. Although African American (AA) women have the highest prevalence of hypertension and many genetic studies have been conducted to examine this disparity, no published studies have investigated their attitudes toward genetic testing for hypertension. The purpose of the present study was to use the health belief model as a guide to examine attitudes toward perceived barriers and benefits of genetic testing held by AA multigenerational triads and to determine whether they differed by generation, age, education, or income level. Methods. A descriptive correlational research design was used with 183 African American women and girls from Detroit. Correlations between triad membership, age, income, and education level were examined for association with attitudes toward genetic testing. Results. Increasing age and education were associated with significant differences in attitudes regarding benefits (F[2, 160] = 5.19, P = 0.007, d = 0.06) and awareness (F[2, 160] = 6.49, P = 0.002, d = 0.08). No statistically significant differences existed on the three subscales when compared by income levels or triad membership. Conclusions. This highlights the need for increased outreach to younger generations regarding benefits of genetic services. Further research is necessary to determine whether rural and male populations have similar beliefs.
8
Fan R, Lee A, Lu Z, Liu A, Troendle JF, Mills JL. Association analysis of complex diseases using triads, parent-child dyads and singleton monads. BMC Genet 2013; 14:78. [PMID: 24007308 PMCID: PMC3844511 DOI: 10.1186/1471-2156-14-78] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/09/2013] [Accepted: 08/17/2013] [Indexed: 11/16/2022]
Abstract
Background Triad families are routinely used to test association between genetic variants and complex diseases. Triad studies are important and popular since they are robust in terms of being less prone to false positives due to population structure. In practice, one may collect not only complete triads, but also incomplete families such as dyads (affected child with one parent) and singleton monads (affected child without parents). Since there is a lack of convenient algorithms and software to analyze the incomplete data, dyads and monads are usually discarded. This may lead to loss of power and insufficient utilization of genetic information in a study. Results We develop likelihood-based statistical models and likelihood ratio tests to test for association between complex diseases and genetic markers by using combinations of full triads, parent-child dyads, and affected singleton monads for a unified analysis. A likelihood is calculated directly to facilitate the data analysis without imputation and to avoid computational complexity. This makes it easy to implement the models and to explain the results. Conclusion By simulation studies, we show that the proposed models and tests are very robust in terms of accurately controlling type I error evaluations, and are powerful by empirical power evaluations. The methods are applied to test for association between transforming growth factor alpha (TGFA) gene and cleft palate in an Irish study.
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, 6100 Executive Blvd, MSC 7510, Rockville, MD 20852, USA.
9
De G, Yip WK, Ionita-Laza I, Laird N. Rare variant analysis for family-based design. PLoS One 2013; 8:e48495. [PMID: 23341868 PMCID: PMC3546113 DOI: 10.1371/journal.pone.0048495] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 11/29/2011] [Accepted: 10/01/2012] [Indexed: 12/21/2022]
Abstract
Genome-wide association studies have been able to identify disease associations with many common variants; however most of the estimated genetic contribution explained by these variants appears to be very modest. Rare variants are thought to have larger effect sizes compared to common SNPs but effects of rare variants cannot be tested in the GWAS setting. Here we propose a novel method to test for association of rare variants obtained by sequencing in family-based samples by collapsing the standard family-based association test (FBAT) statistic over a region of interest. We also propose a suitable weighting scheme so that low frequency SNPs that may be enriched in functional variants can be upweighted compared to common variants. Using simulations we show that the family-based methods perform at par with the population-based methods under no population stratification. By construction, family-based tests are completely robust to population stratification; we show that our proposed methods remain valid even when population stratification is present.
Affiliation(s)
- Gourab De
- Department of Biostatistics, Harvard University, Boston, MA, USA.
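The collapsing idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes complete parent-offspring trios with an affected child, genotypes coded as minor-allele counts (0, 1, 2), the Mendelian expectation (father + mother) / 2 for the child's genotype, and an empirical variance taken across families. The function name `collapsed_fbat_z` and the caller-supplied `weights` are hypothetical; the abstract does not specify the weighting scheme.

```python
import math


def collapsed_fbat_z(trios, weights):
    """Burden-style family-based Z statistic over a region (illustrative only).

    trios   -- list of (father, mother, child) tuples; each element is a
               sequence of minor-allele counts, one per variant in the region
    weights -- one non-negative weight per variant (assumed, caller-supplied)
    """
    total = 0.0  # sum of per-family collapsed contributions
    var = 0.0    # empirical variance: sum of squared family contributions
    for father, mother, child in trios:
        # Weighted deviation of the child's genotype from its Mendelian
        # expectation given the parents, summed over all variants.
        s = sum(
            w * (c - (f + m) / 2.0)
            for w, f, m, c in zip(weights, father, mother, child)
        )
        total += s
        var += s * s
    # Families with no deviation carry no information; guard against 0/0.
    return total / math.sqrt(var) if var > 0 else 0.0


if __name__ == "__main__":
    example = [
        ((1, 0), (0, 0), (1, 0)),  # rare allele transmitted at variant 1
        ((0, 1), (0, 0), (0, 1)),  # rare allele transmitted at variant 2
        ((1, 0), (0, 0), (0, 0)),  # rare allele not transmitted
    ]
    print(collapsed_fbat_z(example, weights=(1.0, 1.0)))
```

Because each family's contribution is conditioned on the parental genotypes, the statistic uses only within-family information, which is the property the abstract relies on for robustness to population stratification. Giving larger weights to rarer variants would upweight them relative to common ones.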
10
Yu Z, Gillen D, Li CF, Demetriou M. Incorporating parental information into family-based association tests. Biostatistics 2012; 14:556-72. [PMID: 23266418 DOI: 10.1093/biostatistics/kxs048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
Assumptions regarding the true underlying genetic model, or mode of inheritance, are necessary when quantifying genetic associations with disease phenotypes. Here we propose new methods to ascertain the underlying genetic model from parental data in family-based association studies. Specifically, for parental mating-type data, we propose a novel statistic to test whether the underlying genetic model is additive, dominant, or recessive; for parental genotype-phenotype data, we propose three strategies to determine the true mode of inheritance. We illustrate how to incorporate the information gleaned from these strategies into family-based association tests. Because family-based association tests are conducted conditional on parental genotypes, the type I error rate of these procedures is not inflated by the information learned from parental data. This result holds even if such information is weak or when the assumption of Hardy-Weinberg equilibrium is violated. Our simulations demonstrate that incorporating parental data into family-based association tests can improve power under common inheritance models. The application of our proposed methods to a candidate-gene study of type 1 diabetes successfully detects a recessive effect in MGAT5 that would otherwise be missed by conventional family-based association tests.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California at Irvine, Irvine, CA 92697, USA.
11
Wason J, Dudbridge F. A general framework for two-stage analysis of genome-wide association studies and its application to case-control studies. Am J Hum Genet 2012; 90:760-73. [PMID: 22560088 DOI: 10.1016/j.ajhg.2012.03.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 01/10/2012] [Revised: 02/17/2012] [Accepted: 03/09/2012] [Indexed: 02/03/2023]
Abstract
Two-stage analyses of genome-wide association studies have been proposed as a means to improving power for designs including family-based association and gene-environment interaction testing. In these analyses, all markers are first screened via a statistic that may not be robust to an underlying assumption, and the markers thus selected are then analyzed in a second stage with a test that is independent from the first stage and is robust to the assumption in question. We give a general formulation of two-stage designs and show how one can use this formulation both to derive existing methods and to improve upon them, opening up a range of possible further applications. We show how using simple regression models in conjunction with external data such as average trait values can improve the power of genome-wide association studies. We focus on case-control studies and show how it is possible to use allele frequencies derived from an external reference to derive a powerful two-stage analysis. An illustration involving the Wellcome Trust Case-Control Consortium data shows several genome-wide-significant associations, subsequently validated, that were not significant in the standard analysis. We give some analytic properties of the methods and discuss some underlying principles.
12
Van Steen K. Perspectives on genome-wide multi-stage family-based association studies. Stat Med 2011; 30:2201-21. [DOI: 10.1002/sim.4259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/16/2010] [Accepted: 03/07/2011] [Indexed: 01/03/2023]
13
Eosinophilic esophagitis: it is here to stay. Clin Gastroenterol Hepatol 2011; 9:370-2. [PMID: 21277391 DOI: 10.1016/j.cgh.2011.01.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/30/2010] [Revised: 01/07/2011] [Accepted: 01/17/2011] [Indexed: 02/07/2023]
14
Murphy A, Won S, Rogers A, Chu JH, Raby BA, Lange C. On the genome-wide analysis of copy number variants in family-based designs: methods for combining family-based and population-based information for testing dichotomous or quantitative traits, or completely ascertained samples. Genet Epidemiol 2011; 34:582-90. [PMID: 20718041 DOI: 10.1002/gepi.20515] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
We propose a new approach for the analysis of copy number variants (CNVs) for genome-wide association studies in family-based designs. Our new overall association test combines the between-family component and the within-family component of the family-based data so that the new test statistic is fully efficient and, at the same time, maintains robustness against population-admixture and stratification, like classical family-based association tests that are based only on the within-family component. Although all data are incorporated into the test statistic, an adjustment for genetic confounding is not needed, even for the between-family component. The new test statistic is valid for testing either quantitative or dichotomous phenotypes. If external CNV data are available, the approach can also be applied to completely ascertained samples. Similar to the approach by Ionita-Laza et al. ([2008]. Genet Epidemiol 32:273-284), the proposed test statistic does not require a CNV-calling algorithm and is based directly on the CNV probe intensities. We show, via simulation studies, that our methodology increases the power of the FBAT statistic to levels comparable to those of population-based designs. The advantages of the approach in practice are demonstrated by an application to a genome-wide association study for body mass index.
Affiliation(s)
- Amy Murphy
- Channing Laboratory, Brigham and Women's Hospital, Boston, Massachusetts, USA
15
Himes BE, Lasky-Su J, Wu AC, Wilk JB, Hunninghake GM, Klanderman B, Murphy AJ, Lazarus R, Soto-Quiros ME, Avila L, Celedón JC, Lange C, O'Connor GT, Raby BA, Silverman EK, Weiss ST. Asthma-susceptibility variants identified using probands in case-control and family-based analyses. BMC Med Genet 2010; 11:122. [PMID: 20698975 PMCID: PMC2927535 DOI: 10.1186/1471-2350-11-122] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 03/08/2010] [Accepted: 08/10/2010] [Indexed: 01/30/2023]
Abstract
BACKGROUND Asthma is a chronic respiratory disease whose genetic basis has been explored for over two decades, most recently via genome-wide association studies. We sought to find asthma-susceptibility variants by using probands from a single population in both family-based and case-control association designs. METHODS We used probands from the Childhood Asthma Management Program (CAMP) in two primary genome-wide association study designs: (1) probands were combined with publicly available population controls in a case-control design, and (2) probands and their parents were used in a family-based design. We followed a two-stage replication process utilizing three independent populations to validate our primary findings. RESULTS We found that single nucleotide polymorphisms with similar case-control and family-based association results were more likely to replicate in the independent populations, than those with the smallest p-values in either the case-control or family-based design alone. The single nucleotide polymorphism that showed the strongest evidence for association to asthma was rs17572584, which replicated in 2/3 independent populations with an overall p-value among replication populations of 3.5E-05. This variant is near a gene that encodes an enzyme that has been implicated to act coordinately with modulators of Th2 cell differentiation and is expressed in human lung. CONCLUSIONS Our results suggest that using probands from family-based studies in case-control designs, and combining results of both family-based and case-control approaches, may be a way to augment our ability to find SNPs associated with asthma and other complex diseases.
Affiliation(s)
- Blanca E Himes
- Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA, USA
16
Lasky-Su J, Won S, Mick E, Anney RJ, Franke B, Neale B, Biederman J, Smalley SL, Loo SK, Todorov A, Faraone SV, Weiss ST, Lange C. On genome-wide association studies for family-based designs: an integrative analysis approach combining ascertained family samples with unselected controls. Am J Hum Genet 2010; 86:573-80. [PMID: 20346434 DOI: 10.1016/j.ajhg.2010.02.019] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 10/15/2009] [Revised: 01/22/2010] [Accepted: 02/10/2010] [Indexed: 10/19/2022]
Abstract
Large numbers of control individuals with genome-wide genotype data are now available through various databases. These controls are regularly used in case-control genome-wide association studies (GWAS) to increase the statistical power. Controls are often "unselected" for the disease of interest and are not matched to cases in terms of confounding factors, making the studies more vulnerable to confounding as a result of population stratification. In this communication, we demonstrate that family-based designs can integrate unselected controls from other studies into the analysis without compromising the robustness of family-based designs against genetic confounding. The result is a hybrid case-control family-based analysis that achieves higher power levels than population-based studies with the same number of cases and controls. This strategy is widely applicable and works ideally for all situations in which both family and case-control data are available. The approach consists of three steps. First, we perform a standard family-based association test that does not utilize the between-family component. Second, we use the between-family information in conjunction with the genotypes from unselected controls in a Cochran-Armitage trend test. The p values from this step are then calculated by rank ordering the individual Cochran-Armitage trend test statistics for the genotype markers. Third, we generate a combined p value with the association p values from the first two steps. Simulation studies are used to assess the achievable power levels of this method compared to standard analysis approaches. We illustrate the approach by an application to a GWAS of attention deficit hyperactivity disorder parent-offspring trios and publicly available controls.
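The second and third steps described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published procedure. The trend statistic is the textbook Cochran-Armitage chi-square with additive scores (0, 1, 2). The rank-based p-value (rank divided by the number of markers) is one plausible reading of "rank ordering". The abstract does not say how the two p-values are combined, so Fisher's method is used here purely as an assumed stand-in. All function names are hypothetical.

```python
import math


def trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) from genotype counts.

    cases, controls -- counts per genotype class (0, 1, 2 copies of an allele)
    """
    n = [r + s for r, s in zip(cases, controls)]  # column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    t_r = sum(t * r for t, r in zip(scores, cases))
    t_n = sum(t * c for t, c in zip(scores, n))
    t2_n = sum(t * t * c for t, c in zip(scores, n))
    T = N * t_r - R * t_n                         # trend numerator
    var = R * S * (N * t2_n - t_n ** 2) / N       # variance of T under H0
    return T * T / var if var > 0 else 0.0


def rank_pvalues(stats):
    """Assumed rank-based p-values: largest statistic gets 1 / n_markers."""
    order = sorted(range(len(stats)), key=lambda i: -stats[i])
    p = [0.0] * len(stats)
    for rank, i in enumerate(order, start=1):
        p[i] = rank / len(stats)
    return p


def fisher_combine(p1, p2):
    """Fisher's method for two independent p-values (assumed combination).

    -2 * ln(p1 * p2) is chi-square with 4 df under H0, whose upper tail
    has the closed form p1 * p2 * (1 - ln(p1 * p2)).
    """
    prod = p1 * p2
    if prod <= 0.0:
        return 0.0
    return prod * (1.0 - math.log(prod))


if __name__ == "__main__":
    chi2 = trend_chi2(cases=(10, 20, 30), controls=(30, 20, 10))
    print(chi2)
    print(rank_pvalues([1.0, 5.0, 3.0]))
    print(fisher_combine(0.05, 0.05))
```

Fisher's method requires the two p-values to be independent under the null hypothesis. Whether that holds for the within-family and between-family steps depends on details given in the paper rather than in the abstract.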
|
17
|
Lasky-Su J, Murphy A, McQueen MB, Weiss S, Lange C. An omnibus test for family-based association studies with multiple SNPs and multiple phenotypes. Eur J Hum Genet 2010; 18:720-5. [PMID: 20087406 DOI: 10.1038/ejhg.2009.221] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
We propose an omnibus family-based association test (MFBAT) that can be applied to multiple markers and multiple phenotypes and that has only one degree of freedom. The proposed test statistic extends current FBAT methodology to incorporate multiple markers as well as multiple phenotypes. Using simulation studies, power estimates for the proposed methodology are compared with the standard methodologies. On the basis of these simulations, we find that MFBAT substantially outperforms other methods, including haplotypic approaches and doing multiple tests with single single-nucleotide polymorphisms (SNPs) and single phenotypes. The practical relevance of the approach is illustrated by an application to asthma in which SNP/phenotype combinations are identified and reach overall significance that would not have been identified using other approaches. This methodology is directly applicable to cases in which there are multiple SNPs, such as candidate gene studies, cases in which there are multiple phenotypes, such as expression data, and cases in which there are multiple phenotypes and genotypes, such as genome-wide association studies that incorporate expression profiles as phenotypes. This program is available in the PBAT analysis package.
Affiliation(s)
- Jessica Lasky-Su
- Channing Laboratory, Brigham and Women's Hospital, Boston, MA, USA
|
18
|
Association of polymorphisms in the SLIT2 axonal guidance gene with anger in suicide attempters. Mol Psychiatry 2010; 15:10-1. [PMID: 20029409 DOI: 10.1038/mp.2009.70] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
|
19
|
Murphy A, Weiss ST, Lange C. Two-stage testing strategies for genome-wide association studies in family-based designs. Methods Mol Biol 2010; 620:485-496. [PMID: 20652517 DOI: 10.1007/978-1-60761-580-4_17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
The analysis of genome-wide association studies (GWAS) poses statistical hurdles that have to be handled efficiently in order for the study to be successful. The two largest impediments in the analysis phase of the study are the multiple comparisons problem and maintaining robustness against confounding due to population admixture and stratification. For quantitative traits in family-based designs, Van Steen (1) proposed a two-stage testing strategy that can be considered a hybrid approach between family-based and population-based analysis. By including the population-based component in the family-based analysis, the Van Steen algorithm maximizes the statistical power while maintaining the original robustness of family-based association tests (FBATs) (2-4). The Van Steen approach consists of two statistically independent steps, a screening step and a testing step. For all genotyped single nucleotide polymorphisms (SNPs), the screening step examines the evidence for association at a population-based level. Based on support for a potential genetic association from the screening step, the SNPs are prioritized for testing in the next step, where they are analyzed with an FBAT (3). Because the screening step exploits population-based information that is not utilized in the family-based association testing step, the two steps are statistically independent. Therefore, the use of the population-based data for the purposes of screening does not bias the FBAT statistic calculated in the testing step. Depending on the trait type and the ascertainment conditions, Van Steen-type testing strategies can achieve statistical power levels that are comparable to those of population-based studies with the same number of probands. In this chapter, we review the original Van Steen algorithm and its numerous extensions, and discuss its advantages and disadvantages.
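The screen-then-test logic described in the abstract above can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the published algorithm: screening scores and FBAT p values are taken as precomputed inputs, SNPs are prioritized by picking the `top_k` highest screening scores, and the selected SNPs are tested with a Bonferroni correction for `top_k` tests only. The function name `two_stage_test` and all input values are hypothetical.

```python
def two_stage_test(screen_scores, fbat_pvalues, top_k, alpha=0.05):
    """Return indices of SNPs that pass a simplified two-stage procedure.

    screen_scores: population-based evidence per SNP (higher = more promising).
    fbat_pvalues:  family-based test p value per SNP, assumed independent
                   of the screening scores.
    """
    if len(screen_scores) != len(fbat_pvalues):
        raise ValueError("one screening score and one p value per SNP")
    # Stage 1 (screening): rank SNPs by population-based evidence.
    ranked = sorted(range(len(screen_scores)), key=lambda i: -screen_scores[i])
    selected = ranked[:top_k]
    # Stage 2 (testing): correct only for the SNPs carried forward.
    threshold = alpha / top_k
    return sorted(i for i in selected if fbat_pvalues[i] <= threshold)

# Hypothetical inputs for four SNPs.
scores = [0.1, 0.9, 0.5, 0.7]
pvals = [1e-6, 0.004, 0.2, 0.03]
hits = two_stage_test(scores, pvals, top_k=2)
```

In this toy input, SNP 0 has the smallest FBAT p value but is never tested because its screening score is low. That is the trade-off of the design: the multiple-testing burden shrinks from all genotyped SNPs to `top_k`, at the cost of missing signals the screening step fails to prioritize.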
Affiliation(s)
- Amy Murphy
- Channing Laboratory, Center for Genomic Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
|
20
|
Abstract
PURPOSE OF REVIEW Food allergy, a growing clinical and public health problem in the United States and worldwide, is likely determined by multiple environmental and genetic factors. The purpose of this review is to summarize recent advances in food allergy genetic research. RECENT FINDINGS There is compelling evidence that genetic factors may play a role in food allergy. However, the specific genetic loci that may modulate individual risk of food allergy remain to be identified. To date, only a limited number of candidate gene association studies of food allergy have been reported. Polymorphism(s) in nine genes have been associated with the incidence of food allergy or food allergy severity in at least one study. But most of these findings remain to be replicated in independent populations. In contrast, there are considerable advances in genetics of other allergic diseases such as asthma and atopic dermatitis. Although asthma and atopic dermatitis often coexist with food allergy, the relevance of their candidate genes to food allergy remains to be evaluated. SUMMARY Genetics in food allergy is a promising research area but is still in its infancy. More studies are needed to dissect susceptible genes of food allergy. A genome-wide association approach may serve as a powerful tool to identify novel genes related to food allergy. Furthermore, the role of gene-environment interaction, gene-gene interaction, and epigenetics in food allergy remains largely unexplored. Given the complex nature of food allergy, future studies need to integrate environment, genomics, and epigenomics in order to better understand the multifaceted etiology and biological mechanisms of food allergy.
|
21
|
Won S, Wilk JB, Mathias RA, O'Donnell CJ, Silverman EK, Barnes K, O'Connor GT, Weiss ST, Lange C. On the analysis of genome-wide association studies in family-based designs: a universal, robust analysis approach and an application to four genome-wide association studies. PLoS Genet 2009; 5:e1000741. [PMID: 19956679 PMCID: PMC2777973 DOI: 10.1371/journal.pgen.1000741] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 05/18/2009] [Accepted: 10/26/2009] [Indexed: 11/19/2022]
Abstract
For genome-wide association studies in family-based designs, we propose a new, universally applicable approach. The new test statistic exploits all available information about the association, while, by virtue of its design, it maintains the same robustness against population admixture as traditional family-based approaches that are based exclusively on the within-family information. The approach is suitable for the analysis of almost any trait type, e.g. binary, continuous, time-to-onset, multivariate, etc., and combinations of those. We use simulation studies to verify all theoretically derived properties of the approach, estimate its power, and compare it with other standard approaches. We illustrate the practical implications of the new analysis method by an application to a lung-function phenotype, forced expiratory volume in one second (FEV1), in 4 genome-wide association studies.

In genome-wide association studies, the multiple testing problem and confounding due to population stratification have been intractable issues. Family-based designs have considered only the transmission of genotypes from founder to nonfounder to avoid sensitivity to population stratification, which leads to a loss of information. Here we propose a novel analysis approach that combines mutually independent FBAT and screening statistics in a robust way. The proposed method is more powerful than any other, while it preserves the complete robustness of family-based association tests, which on their own achieve a much lower power level. Furthermore, the proposed method is virtually as powerful as population-based approaches/designs, even in the absence of population stratification. By its nature, the proposed method is always robust as long as FBAT is valid, and it achieves optimal efficiency if our linear model for the screening test reasonably explains the observed data in terms of covariance structure and population admixture. We illustrate the practical relevance of the approach by an application in 4 genome-wide association studies.
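One standard way to combine two mutually independent statistics, such as the FBAT and screening statistics mentioned above, is a weighted Z-score (Liptak or Stouffer) combination. The Python sketch below shows that generic technique only. Whether it matches the weighting used in the paper is an assumption, and the function names and weights are hypothetical.

```python
import math

def liptak_combine(z_scores, weights):
    """Weighted Z-score combination of independent standard-normal statistics.

    Under the null the result is again standard normal, provided the inputs
    are independent and the weights are fixed in advance.
    """
    if len(z_scores) != len(weights):
        raise ValueError("one weight per Z score")
    num = sum(w * z for w, z in zip(weights, z_scores))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den

def two_sided_p(z):
    """Two-sided p value of a standard-normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical statistics: within-family (FBAT) and screening components.
z_fbat, z_screen = 2.0, 2.0
z_all = liptak_combine([z_fbat, z_screen], weights=[1.0, 1.0])
p_all = two_sided_p(z_all)
```

With equal weights, two moderate signals of Z = 2 combine to Z of about 2.83, which shows how adding the screening information can raise power. Setting the screening weight to zero recovers the FBAT statistic unchanged, so the fully robust test is a special case.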
Affiliation(s)
- Sungho Won
- Department of Statistics, Chung-Ang University, Seoul, Korea
- Research Center for Data Science, Chung-Ang University, Seoul, Korea
- Jemma B. Wilk
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Rasika A. Mathias
- Genometrics Section, Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, Maryland, United States of America
- Christopher J. O'Donnell
- National Heart, Lung, and Blood Institute and Framingham Heart Study, Bethesda, Maryland, United States of America
- Cardiology Division, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Edwin K. Silverman
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Kathleen Barnes
- Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- George T. O'Connor
- Pulmonary Center, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Scott T. Weiss
- Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Genomic Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Christoph Lange
- Harvard Medical School, Boston, Massachusetts, United States of America
- Center for Genomic Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
22
|
|