1
Singar S, Nagpal R, Arjmandi BH, Akhavan NS. Personalized Nutrition: Tailoring Dietary Recommendations through Genetic Insights. Nutrients 2024; 16:2673. [PMID: 39203810 PMCID: PMC11357412 DOI: 10.3390/nu16162673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/03/2024] Open
Abstract
Personalized nutrition (PN) represents a transformative approach in dietary science, where individual genetic profiles guide tailored dietary recommendations, thereby optimizing health outcomes and managing chronic diseases more effectively. This review synthesizes key aspects of PN, emphasizing the genetic basis of dietary responses, contemporary research, and practical applications. We explore how individual genetic differences influence dietary metabolisms, thus underscoring the importance of nutrigenomics in developing personalized dietary guidelines. Current research in PN highlights significant gene-diet interactions that affect various conditions, including obesity and diabetes, suggesting that dietary interventions could be more precise and beneficial if they are customized to genetic profiles. Moreover, we discuss practical implementations of PN, including technological advancements in genetic testing that enable real-time dietary customization. Looking forward, this review identifies the robust integration of bioinformatics and genomics as critical for advancing PN. We advocate for multidisciplinary research to overcome current challenges, such as data privacy and ethical concerns associated with genetic testing. The future of PN lies in broader adoption across health and wellness sectors, promising significant advancements in public health and personalized medicine.
Affiliation(s)
- Saiful Singar
  - Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA; (S.S.); (R.N.); (B.H.A.)
- Ravinder Nagpal
  - Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA; (S.S.); (R.N.); (B.H.A.)
- Bahram H. Arjmandi
  - Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA; (S.S.); (R.N.); (B.H.A.)
- Neda S. Akhavan
  - Department of Kinesiology and Nutrition Sciences, School of Integrated Health Sciences, University of Nevada, Las Vegas, NV 89154, USA
2
Mbatchou J, McPeek MS. JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. Am J Hum Genet 2024; 111:1750-1769. [PMID: 39025064 PMCID: PMC11339629 DOI: 10.1016/j.ajhg.2024.06.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/19/2024] [Accepted: 06/20/2024] [Indexed: 07/20/2024] Open
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA; Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
3
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
4
Rajabli F, Kunkle BW. Strategies in Aggregation Tests for Rare Variants. Curr Protoc 2023; 3:e931. [PMID: 37988228 DOI: 10.1002/cpz1.931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Genome-wide association studies (GWAS) have successfully identified numerous common variants involved in complex diseases, but only limited heritability is explained by these findings. Advances in high-throughput sequencing technology have made it possible to assess the contribution of rare variants to common diseases. However, the study of rare variants introduces challenges because of their low frequency, and well-established common-variant methods are underpowered to identify rare variants in GWAS. To address this challenge, several new methods have been developed to examine the role of rare variants in complex diseases. These approaches are based on testing the aggregate effect of multiple rare variants in a predefined genetic region. Provided here is an overview of statistical approaches, together with protocols explaining step-by-step analysis of aggregation tests, with hands-on experience using R scripts, in four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. Also explained are the concepts of rare variants, permutation tests, kernel methods, and genetic variant annotation. At the end we discuss the related topics of bioinformatics tools for annotation, family-based designs for rare-variant analysis, population stratification adjustment, and meta-analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Affiliation(s)
- Farid Rajabli
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
- Brian W Kunkle
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
5
Ovsyannikova IG, Haralambieva IH, Schaid DJ, Warner ND, Poland GA, Kennedy RB. Genome-wide determinants of cellular immune responses to mumps vaccine. Vaccine 2023; 41:6579-6588. [PMID: 37778899 DOI: 10.1016/j.vaccine.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 08/03/2023] [Accepted: 09/01/2023] [Indexed: 10/03/2023]
Abstract
BACKGROUND We have previously described genetic polymorphisms in candidate genes that are associated with inter-individual variations in antibody responses to mumps vaccination. To expand upon our previous work, we performed a genome-wide association study (GWAS) to discover host genetic variants associated with mumps vaccine-induced cellular immune responses. METHODS We performed a GWAS of mumps-specific immune response outcomes (11 secreted cytokines/chemokines) in a cohort of 1,406 subjects. RESULTS Among the 11 cytokines/chemokines we studied, four (IFN-γ, IL-2, IL-1β, and TNFα) demonstrated GWAS signals reaching genome-wide significance (p < 5 × 10⁻⁸). A genomic region (encoding Sialic acid-binding immunoglobulin-type lectins/SIGLEC) located on chromosome 19q13 (p < 5 × 10⁻⁸) was associated with both IL-1β and TNFα responses. The SIGLEC5/SIGLEC14 region contained 11 statistically significant single nucleotide polymorphisms (SNPs), including the intronic SIGLEC5 rs872629 (p = 1.3E-11) and rs1106476 (p = 1.32E-11) whose alternate alleles were significantly associated with decreased levels of mumps-specific IL-1β (rs872629, p = 1.77E-09; rs1106476, p = 1.78E-09) and TNFα (rs872629, p = 1.3E-11; rs1106476, p = 1.32E-11) production. CONCLUSIONS Our results suggest that SNPs in the SIGLEC5/SIGLEC14 genes play a role in cellular and inflammatory immune responses to mumps vaccination. These findings motivate further research into the functional roles of SIGLEC genes in the regulation of mumps vaccine-induced immunity.
Affiliation(s)
- Daniel J Schaid
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Nathaniel D Warner
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
6
Chien LC. Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods. Int J Biostat 2023; 0:ijb-2022-0123. [PMID: 37743670 DOI: 10.1515/ijb-2022-0123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 07/28/2023] [Indexed: 09/26/2023]
Abstract
In genome-wide association studies (GWAS), logistic regression is one of the most popular analytical methods for binary traits. Multinomial regression is an extension of binary logistic regression that allows for multiple categories. However, many GWAS methods have been limited in application to binary traits. These methods have often been improperly used to analyze ordinal traits, which causes inappropriate type I error rates and poor statistical power. Owing to the lack of suitable analysis methods, GWAS of ordinal traits is known to be problematic and is gaining attention. In this paper, we develop a general framework for identifying ordinal traits associated with genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE technology to account for complicated correlation structures between family members and ordered categorical traits. We use the retrospective idea of treating the genetic markers as random variables to calculate genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allows for covariate adjustment. We conduct simulation studies to compare the proposed tests with existing models for analyzing ordered categorical data under various configurations. We illustrate the application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.
Affiliation(s)
- Li-Chu Chien
  - Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
7
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have recently been proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanisms. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
8
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in Whole Exome/Genome sequencing association studies: it is able to handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
9
Majumdar S, Basu S, McGue M, Chatterjee S. Simultaneous selection of multiple important single nucleotide polymorphisms in familial genome wide association studies data. Sci Rep 2023; 13:8476. [PMID: 37231056 PMCID: PMC10213008 DOI: 10.1038/s41598-023-35379-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 05/17/2023] [Indexed: 05/27/2023] Open
Abstract
We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially the ones with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains one single model and utilizes a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we perform gene-level analysis of the Minnesota Center for Twin and Family Research (MCTFR) dataset using our method and detect several SNPs that have been implicated in alcohol consumption.
Affiliation(s)
- Subhabrata Majumdar
  - University of Minnesota Twin Cities, Minneapolis, USA
  - AI Risk and Vulnerability Alliance, Seattle, USA
- Saonli Basu
  - University of Minnesota Twin Cities, Minneapolis, USA
- Matt McGue
  - University of Minnesota Twin Cities, Minneapolis, USA
10
Ovsyannikova IG, Haralambieva IH, Schaid DJ, Warner ND, Poland GA, Kennedy RB. Genome-Wide Determinants of Cellular Immune Responses to Mumps Vaccine. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.04.27.23289213. [PMID: 37205333 PMCID: PMC10187346 DOI: 10.1101/2023.04.27.23289213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Background We have previously described genetic polymorphisms in candidate genes that are associated with inter-individual variations in antibody responses to mumps vaccination. To expand upon our previous work, we performed a genome-wide association study (GWAS) to discover host genetic variants associated with mumps vaccine-induced cellular immune responses. Methods We performed a GWAS of mumps-specific immune response outcomes (11 secreted cytokines/chemokines) in a cohort of 1,406 subjects. Results Among the 11 cytokines/chemokines we studied, four (IFN-γ, IL-2, IL-1β, and TNFα) demonstrated GWAS signals reaching genome-wide significance (p < 5 × 10⁻⁸). A genomic region (encoding Sialic acid-binding immunoglobulin-type lectins/SIGLEC) located on chromosome 19q13 (p < 5 × 10⁻⁸) was associated with both IL-1β and TNFα responses. The SIGLEC5/SIGLEC14 region contained 11 statistically significant single nucleotide polymorphisms (SNPs), including the intronic SIGLEC5 rs872629 (p = 1.3E-11) and rs1106476 (p = 1.32E-11) whose alternate alleles were significantly associated with decreased levels of mumps-specific IL-1β (rs872629, p = 1.77E-09; rs1106476, p = 1.78E-09) and TNFα (rs872629, p = 1.3E-11; rs1106476, p = 1.32E-11) production. Conclusions Our results suggest that SNPs in the SIGLEC5/SIGLEC14 genes play a role in cellular and inflammatory immune responses to mumps vaccination. These findings motivate further research into the functional roles of SIGLEC genes in the regulation of mumps vaccine-induced immunity.
11
Mao K, Borel C, Ansar M, Jolly A, Makrythanasis P, Froehlich C, Iwaszkiewicz J, Wang B, Xu X, Li Q, Blanc X, Zhu H, Chen Q, Jin F, Ankamreddy H, Singh S, Zhang H, Wang X, Chen P, Ranza E, Paracha SA, Shah SF, Guida V, Piceci-Sparascio F, Melis D, Dallapiccola B, Digilio MC, Novelli A, Magliozzi M, Fadda MT, Streff H, Machol K, Lewis RA, Zoete V, Squeo GM, Prontera P, Mancano G, Gori G, Mariani M, Selicorni A, Psoni S, Fryssira H, Douzgou S, Marlin S, Biskup S, De Luca A, Merla G, Zhao S, Cox TC, Groves AK, Lupski JR, Zhang Q, Zhang YB, Antonarakis SE. FOXI3 pathogenic variants cause one form of craniofacial microsomia. Nat Commun 2023; 14:2026. [PMID: 37041148 PMCID: PMC10090152 DOI: 10.1038/s41467-023-37703-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/09/2022] [Accepted: 03/28/2023] [Indexed: 04/13/2023] Open
Abstract
Craniofacial microsomia (CFM; also known as Goldenhar syndrome) is a craniofacial developmental disorder of variable expressivity and severity with a recognizable set of abnormalities. These birth defects are associated with structures derived from the first and second pharyngeal arches, can occur unilaterally and include ear dysplasia, microtia, preauricular tags and pits, facial asymmetry and other malformations. The inheritance pattern is controversial, and the molecular etiology of this syndrome is largely unknown. A total of 670 patients with CFM, belonging to unrelated pedigrees of European and Chinese ancestry, are investigated. We identify 18 likely pathogenic variants in FOXI3 in 21 probands (3.1%). Biochemical experiments on transcriptional activity and subcellular localization of the likely pathogenic FOXI3 variants, and knock-in mouse studies, strongly support the involvement of FOXI3 in CFM. Our findings indicate autosomal dominant inheritance with reduced penetrance, and/or autosomal recessive inheritance. The phenotypic expression of the FOXI3 variants is variable. The penetrance of the likely pathogenic variants in the seemingly dominant form is reduced, since a considerable number of such variants in affected individuals were inherited from non-affected parents. Here we provide suggestive evidence that common variation in the FOXI3 allele in trans with the pathogenic variant could modify the phenotypic severity and account for the incomplete penetrance.
Affiliation(s)
- Ke Mao
  - School of Engineering Medicine, Beihang University, Beijing, 100191, China
- Christelle Borel
  - Department of Genetic Medicine and Development, University of Geneva Medical Faculty, Geneva, 1211, Switzerland
- Muhammad Ansar
  - Department of Genetic Medicine and Development, University of Geneva Medical Faculty, Geneva, 1211, Switzerland
  - Jules-Gonin Eye Hospital, Department of Ophthalmology, University of Lausanne, 1004, Lausanne, Switzerland
- Angad Jolly
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Periklis Makrythanasis
  - Department of Genetic Medicine and Development, University of Geneva Medical Faculty, Geneva, 1211, Switzerland
  - Laboratory of Medical Genetics, Medical School, University of Athens, Athens, Greece
  - Biomedical Research Foundation of the Academy of Athens, Athens, Greece
- Justyna Iwaszkiewicz
  - Molecular Modeling Group, Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Bingqing Wang
  - Plastic Surgery Hospital, Chinese Academy of Medical Sciences, Beijing, 100144, China
- Xiaopeng Xu
  - School of Engineering Medicine, Beihang University, Beijing, 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology, Beijing, China
- Qiang Li
  - Department of Plastic Surgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, 221000, China
- Xavier Blanc
  - Medigenome, Swiss Institute of Genomic Medicine, 1207, Geneva, Switzerland
- Hao Zhu
  - School of Engineering Medicine, Beihang University, Beijing, 100191, China
- Qi Chen
  - Plastic Surgery Hospital, Chinese Academy of Medical Sciences, Beijing, 100144, China
- Fujun Jin
  - School of Engineering Medicine, Beihang University, Beijing, 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology, Beijing, China
- Harinarayana Ankamreddy
  - Department of Biotechnology, School of Bioengineering, SRMIST, Kattankulathur, Tamilnadu, 603203, India
- Sunita Singh
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, 77030, USA
- Hongyuan Zhang
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, 77030, USA
- Xiaogang Wang
  - School of Engineering Medicine, Beihang University, Beijing, 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology, Beijing, China
- Peiwei Chen
  - Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Emmanuelle Ranza
  - Medigenome, Swiss Institute of Genomic Medicine, 1207, Geneva, Switzerland
- Sohail Aziz Paracha
  - Anatomy Department, Khyber Medical University Institute of Medical Sciences (KIMS), Kohat, Pakistan
- Syed Fahim Shah
  - Department of Medicine, KMU Institute of Medical Sciences (KIMS), DHQ Hospital KDA, Kohat, Pakistan
- Valentina Guida
  - Medical Genetics Division, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Daniela Melis
  - Department of Medicine, Surgery, and Dentistry, Università degli Studi di Salerno, Salerno, Italy
- Bruno Dallapiccola
  - Medical Genetics and Rare Disease Research Division, Pediatric Cardiology, Medical Genetics Laboratory, Neuropsychiatry, Scientific Rectorate, Bambino Gesù Children Hospital, IRCCS, Rome, Italy
- Antonio Novelli
  - Sezione di Genetica Medica, Ospedale 'Bambino Gesù', Rome, Italy
- Monia Magliozzi
  - Sezione di Genetica Medica, Ospedale 'Bambino Gesù', Rome, Italy
- Maria Teresa Fadda
  - Department of Maxillo-Facial Surgery, Policlinico Umberto I, Rome, Italy
- Haley Streff
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Keren Machol
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Richard A Lewis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Vincent Zoete
  - Molecular Modeling Group, Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Fundamental Oncology, Ludwig Institute for Cancer Research, Lausanne University, Epalinges, 1066, Switzerland
- Gabriella Maria Squeo
  - Laboratory of Regulatory & Functional Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Paolo Prontera
  - Medical Genetics Unit, Hospital Santa Maria della Misericordia, Perugia, Italy
- Giorgia Mancano
  - Medical Genetics Unit, University of Perugia Hospital SM della Misericordia, Perugia, Italy
- Giulia Gori
  - Medical Genetics Unit, Meyer Children's University Hospital, Florence, Italy
- Milena Mariani
  - Pediatric Department, ASST Lariana, Santa Anna General Hospital, Como, Italy
- Angelo Selicorni
  - Pediatric Department, ASST Lariana, Santa Anna General Hospital, Como, Italy
- Stavroula Psoni
  - Laboratory of Medical Genetics, Medical School, University of Athens, Athens, Greece
- Helen Fryssira
  - Laboratory of Medical Genetics, Medical School, University of Athens, Athens, Greece
- Sofia Douzgou
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, UK
  - Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
- Sandrine Marlin
  - Centre de Référence Surdités Génétiques, Hôpital Necker, Institut Imagine, Paris, France
- Saskia Biskup
  - CeGaT GmbH and Praxis für Humangenetik Tuebingen, Tuebingen, 72076, Germany
- Alessandro De Luca
  - Medical Genetics Division, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
- Giuseppe Merla
  - Laboratory of Regulatory & Functional Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Via S. Pansini 5, 80131, Naples, Italy
- Shouqin Zhao
  - Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Timothy C Cox
  - Departments of Oral & Craniofacial Sciences and Pediatrics, University of Missouri-Kansas City, Kansas City, MO, 64108, USA
- Andrew K Groves
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, 77030, USA
- James R Lupski
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Qingguo Zhang
- Plastic Surgery Hospital, Chinese Academy of Medical Sciences, Beijing, 100144, China.
| | - Yong-Biao Zhang
- School of Engineering Medicine, Beihang University, Beijing, 100191, China.
- Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology, Beijing, China.
| | - Stylianos E Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical Faculty, Geneva, 1211, Switzerland.
- Medigenome, Swiss Institute of Genomic Medicine, 1207, Geneva, Switzerland.
- iGE3 Institute of Genetics and Genomes in Geneva, Geneva, Switzerland.
| |
|
12
|
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods that use variant annotations or external controls to improve statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Affiliation(s)
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
| | - Brandon J. Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
| | - Nicholas B. Larson
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
| |
|
13
|
Wang Y, Chen H, Peloso GM, Meigs JB, Beiser AS, Seshadri S, DeStefano AL, Dupuis J. Family history aggregation unit-based tests to detect rare genetic variant associations with application to the Framingham Heart Study. Am J Hum Genet 2022; 109:738-749. [PMID: 35316615 PMCID: PMC9069079 DOI: 10.1016/j.ajhg.2022.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 02/28/2022] [Indexed: 11/15/2022] Open
Abstract
A challenge in standard genetic studies is maintaining good power to detect associations, especially for low-prevalence diseases and rare variants. Traditional methods are most powerful when evaluating associations between variants and traits in balanced study designs. Without accounting for family correlation and an unbalanced case-control ratio, these analyses can result in inflated type I error. One cost-effective way to increase statistical power is to exploit available family history (FH), which contains valuable information about disease heritability. Here, we develop methods that address the aforementioned type I error issues while providing optimal power to analyze aggregates of rare variants by incorporating additional information from FH. With the enhanced power of these methods, which exploit FH and account for relatedness and unbalanced designs, we successfully detect genes with suggestive associations with Alzheimer disease, dementia, and type 2 diabetes by using the exome chip data from the Framingham Heart Study.
Affiliation(s)
- Yanbing Wang
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02215, USA.
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Gina M Peloso
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02215, USA
| | - James B Meigs
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA 02214, USA; Harvard Medical School, Boston, MA 02215, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02115, USA
| | - Alexa S Beiser
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02215, USA; Framingham Heart Study, Framingham, MA 01701, USA; Department of Neurology, Boston University School of Medicine, Boston, MA 02215, USA
| | - Sudha Seshadri
- Framingham Heart Study, Framingham, MA 01701, USA; Department of Neurology, Boston University School of Medicine, Boston, MA 02215, USA; Glenn Biggs Institute for Alzheimer Disease and Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| | - Anita L DeStefano
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02215, USA
| | - Josée Dupuis
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02215, USA
| |
|
14
|
Mascia E, Clarelli F, Zauli A, Guaschino C, Sorosina M, Barizzone N, Basagni C, Santoro S, Ferrè L, Bonfiglio S, Biancolini D, Pozzato M, Guerini FR, Protti A, Liguori M, Moiola L, Vecchio D, Bresolin N, Comi G, Filippi M, Esposito F, D'Alfonso S, Martinelli-Boneschi F. Burden of rare coding variants in an Italian cohort of familial multiple sclerosis. J Neuroimmunol 2022; 362:577760. [PMID: 34922125 DOI: 10.1016/j.jneuroim.2021.577760] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Revised: 10/18/2021] [Accepted: 10/31/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND Multiple sclerosis (MS) is a chronic inflammatory and neurodegenerative demyelinating disease of the central nervous system. It is a complex and heterogeneous disease caused by a combination of genetic and environmental factors, and it can cluster in families. OBJECTIVE To evaluate, at the gene level, the aggregate contribution of predicted damaging low-frequency and rare variants to MS risk in multiplex families. METHODS We performed whole exome sequencing (WES) in 28 multiplex MS families with at least 3 MS cases (81 affected and 42 unaffected relatives) and 38 unrelated healthy controls. A gene-based burden test was then performed, focusing on two sets of candidate genes: i) literature-driven selection and ii) data-driven selection. RESULTS We identified 11 genes enriched with predicted damaging low-frequency and rare variants in MS compared to healthy individuals. Among them, UBR2 and DST were the two genes with the strongest enrichment (p = 5 × 10⁻⁴ and 3 × 10⁻⁴, respectively); interestingly, the association signal in UBR2 is driven by rs62414610, which was present in 25% of analysed families. CONCLUSION Despite limitations, this is one of the first studies evaluating the aggregate contribution of predicted damaging low-frequency and rare variants in MS families using WES data. A replication effort in independent cohorts is warranted to validate our findings and to evaluate the role of identified genes in MS pathogenesis.
Affiliation(s)
- E Mascia
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - F Clarelli
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - A Zauli
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - C Guaschino
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy; Department of Neurology, Sant'Antonio Abate Hospital, Gallarate, Italy
| | - M Sorosina
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - N Barizzone
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases (CAAD), UPO, University of Eastern Piedmont, A. Avogadro, 28100 Novara, Italy
| | - C Basagni
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases (CAAD), UPO, University of Eastern Piedmont, A. Avogadro, 28100 Novara, Italy
| | - S Santoro
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - L Ferrè
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy; Neurology Unit, IRCCS San Raffaele Scientific Institute, Via Olgettina 48, 20132 Milan, Italy
| | - S Bonfiglio
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - D Biancolini
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - M Pozzato
- Neurology Unit and MS Centre, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Via Francesco Sforza 35, 20122 Milan, Italy
| | - F R Guerini
- IRCCS Fondazione Don Carlo Gnocchi, ONLUS, Milan, Italy
| | - A Protti
- Ospedale Niguarda, Department of Neurology, Milan, Italy
| | - M Liguori
- National Research Council, Institute of Biomedical Technologies, Bari Unit, 70126 Bari, Italy
| | - L Moiola
- Neurology Unit, IRCCS San Raffaele Scientific Institute, Via Olgettina 48, 20132 Milan, Italy
| | - D Vecchio
- SCDU Neurology, AOU Maggiore della Carità, 28100 Novara, Italy
| | - N Bresolin
- Department of Pathophysiology and Transplantation (DEPT), Dino Ferrari Centre, Neuroscience Section, University of Milan, Via Francesco Sforza 35, 20122 Milan, Italy
| | - G Comi
- Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - M Filippi
- Neurology Unit, IRCCS San Raffaele Scientific Institute, Via Olgettina 48, 20132 Milan, Italy; Vita-Salute San Raffaele University, Via Olgettina 58, 20132 Milan, Italy; Neuroimaging Research Unit, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 48, 20132 Milan, Italy; Neurophysiology Unit, IRCCS San Raffaele Scientific Institute, San Raffaele Scientific Institute, Via Olgettina 48, 20132 Milan, Italy
| | - F Esposito
- Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology (INSPE), Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy; Neurology Unit, IRCCS San Raffaele Scientific Institute, Via Olgettina 48, 20132 Milan, Italy
| | - S D'Alfonso
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases (CAAD), UPO, University of Eastern Piedmont, A. Avogadro, 28100 Novara, Italy
| | - F Martinelli-Boneschi
- Neurology Unit and MS Centre, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Via Francesco Sforza 35, 20122 Milan, Italy; Department of Pathophysiology and Transplantation (DEPT), Dino Ferrari Centre, Neuroscience Section, University of Milan, Via Francesco Sforza 35, 20122 Milan, Italy.
| |
|
15
|
Barizzone N, Cagliani R, Basagni C, Clarelli F, Mendozzi L, Agliardi C, Forni D, Tosi M, Mascia E, Favero F, Corà D, Corrado L, Sorosina M, Esposito F, Zuccalà M, Vecchio D, Liguori M, Comi C, Comi G, Martinelli V, Filippi M, Leone M, Martinelli-Boneschi F, Caputo D, Sironi M, Guerini FR, D’Alfonso S. An Investigation of the Role of Common and Rare Variants in a Large Italian Multiplex Family of Multiple Sclerosis Patients. Genes (Basel) 2021; 12:1607. [PMID: 34681001 PMCID: PMC8535321 DOI: 10.3390/genes12101607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2021] [Revised: 09/26/2021] [Accepted: 10/01/2021] [Indexed: 12/30/2022] Open
Abstract
Known multiple sclerosis (MS) susceptibility variants can only explain half of the disease's estimated heritability, whereas low-frequency and rare variants may partly account for the missing heritability. Thus, here we sought to determine the occurrence of rare functional variants in a large Italian MS multiplex family with five affected members. For this purpose, we combined linkage analysis and next-generation sequencing (NGS)-based whole exome and whole genome sequencing (WES and WGS, respectively). The genetic burden attributable to known common MS variants was also assessed by weighted genetic risk score (wGRS). We found a significantly higher burden of common variants in the affected family members compared to that observed among sporadic MS patients and healthy controls (HCs). We also identified 34 genes containing at least one low-frequency functional variant shared among all affected family members, showing a significant enrichment in genes involved in specific biological processes (particularly mRNA transport) or neurodegenerative diseases. Altogether, our findings point to a possible pathogenic role of different low-frequency functional MS variants belonging to shared pathways. We propose that these rare variants, together with other known common MS variants, may account for the high number of affected family members within this MS multiplex family.
Affiliation(s)
- Nadia Barizzone
- Department of Health Sciences, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (C.B.); (M.T.); (L.C.); (M.Z.)
| | - Rachele Cagliani
- Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842 Bosisio Parini, Italy; (R.C.); (D.F.); (M.S.)
| | - Chiara Basagni
- Department of Health Sciences, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (C.B.); (M.T.); (L.C.); (M.Z.)
| | - Ferdinando Clarelli
- Laboratory of Genetics of Neurological Complex Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy; (F.C.); (E.M.); (M.S.); (F.E.)
| | - Laura Mendozzi
- IRCCS Fondazione Don Carlo Gnocchi ONLUS, 20148 Milan, Italy; (L.M.); (C.A.); (D.C.); (F.R.G.)
| | - Cristina Agliardi
- IRCCS Fondazione Don Carlo Gnocchi ONLUS, 20148 Milan, Italy; (L.M.); (C.A.); (D.C.); (F.R.G.)
| | - Diego Forni
- Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842 Bosisio Parini, Italy; (R.C.); (D.F.); (M.S.)
| | - Martina Tosi
- Department of Health Sciences, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (C.B.); (M.T.); (L.C.); (M.Z.)
| | - Elisabetta Mascia
- Laboratory of Genetics of Neurological Complex Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy; (F.C.); (E.M.); (M.S.); (F.E.)
| | - Francesco Favero
- Department of Translational Medicine, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (F.F.); (D.C.)
| | - Davide Corà
- Department of Translational Medicine, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (F.F.); (D.C.)
| | - Lucia Corrado
- Department of Health Sciences, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (C.B.); (M.T.); (L.C.); (M.Z.)
| | - Melissa Sorosina
- Laboratory of Genetics of Neurological Complex Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy; (F.C.); (E.M.); (M.S.); (F.E.)
| | - Federica Esposito
- Laboratory of Genetics of Neurological Complex Disorders, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy; (F.C.); (E.M.); (M.S.); (F.E.)
| | - Miriam Zuccalà
- Department of Health Sciences, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (C.B.); (M.T.); (L.C.); (M.Z.)
| | - Domizia Vecchio
- Department of Translational Medicine, IRCAD (Interdisciplinary Research Center of Autoimmune Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (D.V.); (C.C.)
| | - Maria Liguori
- Institute of Biomedical Technologies, Bari Unit, National Research Council, 70126 Bari, Italy;
| | - Cristoforo Comi
- Department of Translational Medicine, IRCAD (Interdisciplinary Research Center of Autoimmune Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (D.V.); (C.C.)
| | - Giancarlo Comi
- Vita-Salute San Raffaele University, 20132 Milan, Italy; (G.C.); (M.F.)
| | - Vittorio Martinelli
- Neurology Unit, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy;
| | - Massimo Filippi
- Vita-Salute San Raffaele University, 20132 Milan, Italy; (G.C.); (M.F.)
- Neurology Unit, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy;
- Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Neurophysiology Service, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Neuroimaging Research Unit, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
| | - Maurizio Leone
- Dipartimento di Emergenza e Area Critica, UO Neurologia, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, 71013 Foggia, Italy;
| | - Filippo Martinelli-Boneschi
- Department of Pathophysiology and Transplantation (DEPT), Dino Ferrari Centre, Neuroscience Section, University of Milan, 20122 Milan, Italy;
- Neurology Unit and MS Centre, Foundation IRCCS Ca’ Granda Ospedale Maggiore Policlinico, 20122 Milan, Italy
| | - Domenico Caputo
- IRCCS Fondazione Don Carlo Gnocchi ONLUS, 20148 Milan, Italy; (L.M.); (C.A.); (D.C.); (F.R.G.)
| | - Manuela Sironi
- Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842 Bosisio Parini, Italy; (R.C.); (D.F.); (M.S.)
| | - Franca Rosa Guerini
- IRCCS Fondazione Don Carlo Gnocchi ONLUS, 20148 Milan, Italy; (L.M.); (C.A.); (D.C.); (F.R.G.)
| | - Sandra D’Alfonso
- Department of Health Sciences, CAAD (Center for Translational Research on Autoimmune and Allergic Diseases), University of Eastern Piedmont, 28100 Novara, Italy; (C.B.); (M.T.); (L.C.); (M.Z.)
| |
|
16
|
Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies. Computational and Mathematical Methods in Medicine 2021; 2021:8812282. [PMID: 33628328 PMCID: PMC7889379 DOI: 10.1155/2021/8812282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 01/02/2021] [Accepted: 01/08/2021] [Indexed: 11/18/2022]
Abstract
In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most of the existing methods mainly focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous phenotypes or binary phenotypes and further can adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to the empirical type I error rates and power rates in comparison with the existing methods.
|
17
|
Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
|
18
|
Schaid DJ, McDonnell SK, FitzGerald LM, DeRycke L, Fogarty Z, Giles GG, MacInnis RJ, Southey MC, Nguyen-Dumont T, Cancel-Tassin G, Cussenot O, Whittemore AS, Sieh W, Ioannidis NM, Hsieh CL, Stanford JL, Schleutker J, Cropp CD, Carpten J, Hoegel J, Eeles R, Kote-Jarai Z, Ackerman MJ, Klein CJ, Mandal D, Cooney KA, Bailey-Wilson JE, Helfand B, Catalona WJ, Wiklund F, Riska S, Bahetti S, Larson MC, Cannon Albright L, Teerlink C, Xu J, Isaacs W, Ostrander EA, Thibodeau SN. Two-stage Study of Familial Prostate Cancer by Whole-exome Sequencing and Custom Capture Identifies 10 Novel Genes Associated with the Risk of Prostate Cancer. Eur Urol 2020; 79:353-361. [PMID: 32800727 PMCID: PMC7881048 DOI: 10.1016/j.eururo.2020.07.038] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/09/2020] [Accepted: 07/31/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND Family history of prostate cancer (PCa) is a well-known risk factor, and both common and rare genetic variants are associated with the disease. OBJECTIVE To detect new genetic variants associated with PCa, capitalizing on the role of family history and more aggressive PCa. DESIGN, SETTING, AND PARTICIPANTS A two-stage design was used. In stage one, whole-exome sequencing was used to identify potential risk alleles among affected men with a strong family history of disease or with more aggressive disease (491 cases and 429 controls). Aggressive disease was based on a sum of scores for Gleason score, node status, metastasis, tumor stage, prostate-specific antigen at diagnosis, systemic recurrence, and time to PCa death. Genes identified in stage one were screened in stage two using a custom-capture design in an independent set of 2917 cases and 1899 controls. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Frequencies of genetic variants (singly or jointly in a gene) were compared between cases and controls. RESULTS AND LIMITATIONS Eleven genes previously reported to be associated with PCa were detected (ATM, BRCA2, HOXB13, FAM111A, EMSY, HNF1B, KLK3, MSMB, PCAT1, PRSS3, and TERT), as well as an additional 10 novel genes (PABPC1, QK1, FAM114A1, MUC6, MYCBP2, RAPGEF4, RNASEH2B, ULK4, XPO7, and THAP3). Of these 10 novel genes, all but PABPC1 and ULK4 were primarily associated with the risk of aggressive PCa. CONCLUSIONS Our approach demonstrates the advantage of gene sequencing in the search for genetic variants associated with PCa and the benefits of sampling patients with a strong family history of disease or an aggressive form of disease. PATIENT SUMMARY Multiple genes are associated with prostate cancer (PCa) among men with a strong family history of this disease or among men with an aggressive form of PCa.
Affiliation(s)
- Daniel J Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA.
| | - Shannon K McDonnell
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Liesel M FitzGerald
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Australia
| | - Lissa DeRycke
- Specialized Services, National Marrow Donor Program, Minneapolis, MN, USA
| | - Zachary Fogarty
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Graham G Giles
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia; Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia; Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Victoria, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Melbourne, Victoria, Australia
| | - Robert J MacInnis
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia; Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
| | - Melissa C Southey
- Cancer Epidemiology Division, Cancer Council Victoria, Melbourne, Victoria, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Melbourne, Victoria, Australia; Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, Victoria, Australia
| | - Tu Nguyen-Dumont
- Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Melbourne, Victoria, Australia; Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, Victoria, Australia
| | | | | | - Alice S Whittemore
- Department of Health Research and Policy, Stanford University, Stanford, CA, USA
| | - Weiva Sieh
- Population Health Science and Policy, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nilah Monnier Ioannidis
- Center for Computational Biology and Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Chih-Lin Hsieh
- Department of Urology, University of Southern California, Los Angeles, CA, USA
| | - Janet L Stanford
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Johanna Schleutker
- Institute of Biomedicine, University of Turku, and Department of Medical Genetics, Genomics, Laboratory Division, Turku University Hospital, Turku, Finland
- Cheryl D Cropp
- Department of Pharmaceutical, Social and Administrative Sciences, McWhorter School of Pharmacy, Samford University, Birmingham, AL, USA
- John Carpten
- Department of Translation Genomics, University of Southern California, Los Angeles, CA, USA
- Josef Hoegel
- Department of Human Genetics, University of Ulm, Ulm, Germany
- Rosalind Eeles
- Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton Surrey, UK
- Zsofia Kote-Jarai
- Division of Genetics and Epidemiology, The Institute of Cancer Research, Sutton Surrey, UK
- Michael J Ackerman
- Division of Heart Rhythm Services, Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA; Division of Pediatric Cardiology, Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN, USA; Windland Smith Rice Sudden Death Genomics Laboratory, Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Diptasri Mandal
- Department of Genetics, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Kathleen A Cooney
- Department of Medicine and Duke Cancer Institute, Duke University School of Medicine, Durham, NC, USA
- Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, Baltimore, MD, USA
- Brian Helfand
- Department of Surgery, North Shore University Health System/University of Chicago, Evanston, IL, USA
- William J Catalona
- Department of Urology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Fredrick Wiklund
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Shaun Riska
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Saurabh Bahetti
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Melissa C Larson
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Lisa Cannon Albright
- Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
- Craig Teerlink
- Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
- Jianfeng Xu
- Northshore University Health System, Evanston, IL, USA
- William Isaacs
- Department of Urology, Johns Hopkins Hospital, Baltimore, MD, USA
- Elaine A Ostrander
- Cancer Genetics and Comparative Genomic Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Stephen N Thibodeau
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
19
Jiang Y, Chiu CY, Yan Q, Chen W, Gorin MB, Conley YP, Lakhal-Chaieb ML, Cook RJ, Amos CI, Wilson AF, Bailey-Wilson JE, McMahon FJ, Vazquez AI, Yuan A, Zhong X, Xiong M, Weeks DE, Fan R. Gene-Based Association Testing of Dichotomous Traits With Generalized Functional Linear Mixed Models Using Extended Pedigrees: Applications to Age-Related Macular Degeneration. J Am Stat Assoc 2020; 116:531-545. [PMID: 34321704 PMCID: PMC8315575 DOI: 10.1080/01621459.2020.1799809] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/03/2017] [Revised: 07/09/2020] [Accepted: 07/17/2020] [Indexed: 10/23/2022]
Abstract
Genetics plays a role in age-related macular degeneration (AMD), a common cause of blindness in the elderly. There is a need for powerful methods for carrying out region-based association tests between a dichotomous trait like AMD and genetic variants on family data. Here, we apply our new generalized functional linear mixed models (GFLMM), developed to test for gene-based association, to a set of AMD families. Using common and rare variants, we observe significant association with two known AMD genes: CFH and ARMS2. Using rare variants, we find suggestive signals in four genes: ASAH1, CLEC6A, TMEM63C, and SGSM1. Intriguingly, ASAH1 is down-regulated in AMD aqueous humor, and ASAH1 deficiency leads to retinal inflammation and increased vulnerability to oxidative stress. These findings were made possible by our GFLMM, which model the effect of a major gene as a fixed mean, the polygenic contributions as a random variation, and the correlation of pedigree members by kinship coefficients. Simulations indicate that the GFLMM likelihood ratio tests (LRTs) accurately control the Type I error rates. The LRTs have similar or higher power than existing retrospective kernel and burden statistics. Our GFLMM-based statistics provide a new tool for conducting family-based genetic studies of complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Affiliation(s)
- Yingda Jiang
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Michael B. Gorin
- Department of Ophthalmology, David Geffen School of Medicine, UCLA Stein Eye Institute, Los Angeles, CA
- Yvette P. Conley
- Department of Health Promotion and Development, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Richard J. Cook
- Department of Statistics and Actuarial Science, Waterloo, ON, Canada
- Alexander F. Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Joan E. Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Francis J. McMahon
- Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, National Institute of Mental Health, NIH, Bethesda, MD
- Ana I. Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI
- Ao Yuan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Xiaogang Zhong
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Momiao Xiong
- Human Genetics Center, University of Texas, Houston, TX
- Daniel E. Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Ruzong Fan
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
20
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Progress in Molecular Biology and Translational Science 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and the future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.
21
Wang Y, Bandyopadhyay D, Shaffer JR, Wu X. Gene-Based Association Mapping for Dental Caries in The GENEVA Consortium. Journal of Dentistry and Dental Medicine 2020; 3:156. [PMID: 34622142 PMCID: PMC8494074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
OBJECTIVE Dental caries is a multifactorial disease with high prevalence in both children and adults. Recent genome-wide association studies (GWASs) have revealed that genetic factors play an important role in caries incidence. However, existing methods are not sufficient to identify caries-associated genes, due to the complex correlation structure of caries GWAS data and the lack of appropriate summarization at the gene level. This paper attempts to address that by analyzing data from the Gene, Environment Association Studies (GENEVA) consortium. METHODS We investigated gene-based genetic associations for dental caries based on genome-wide data derived from the GENEVA database, with adjustment for covariates, linkage disequilibrium among single-nucleotide polymorphisms, and family relations in sampled individuals. RESULTS Several suggestive genes were identified, some of which have previously been found to have potential biological functions in cariogenesis. CONCLUSIONS By comparing the gene sets identified from gene-based and SNP-based association testing methods, we found a non-negligible overlap, which indicates that our gene-based analysis can provide a substantial supplement to the traditional GWAS analysis.
Affiliation(s)
- Yueyao Wang
- Department of Statistics, Virginia Polytechnic Institute & State University, Blacksburg, VA
- John R. Shaffer
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA
- Xiaowei Wu
- Department of Statistics, Virginia Polytechnic Institute & State University, Blacksburg, VA
22
Wu YF, Sytwu HK, Lung FW. Polymorphisms in the Human Aquaporin 4 Gene Are Associated With Schizophrenia in the Southern Chinese Han Population: A Case-Control Study. Front Psychiatry 2020; 11:596. [PMID: 32676041 PMCID: PMC7333661 DOI: 10.3389/fpsyt.2020.00596] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/24/2020] [Accepted: 06/09/2020] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND In psychiatric illness, a pathogenic role of neuroinflammation has been supported by multiple lines of evidence. Astrocytes contribute to the blood-brain barrier (BBB) with formation of the "glymphatic" drainage system of the central nervous system (CNS) through perivascular processes. Found primarily at the end-feet of astrocytes, the aquaporin 4 (AQP4) gene has been suspected to play putative roles in the development of psychiatric disorders as well as the clearance of the glymphatic system. However, there remain many uncertainties because of the limited research on AQP4. The present study is focused on the association between AQP4 gene polymorphisms and schizophrenia (SCZ) in the Southern Chinese Han population. METHODS Two hundred ninety-two patients and 100 healthy controls were enrolled in this study. To study the relationship of AQP4 gene polymorphisms and SCZ, genetic information was drawn from a cohort of 100 healthy controls and 100 matched patients with SCZ of Southern Han Chinese descent. Comparisons of the allele and genotype distributions between control and case groups were made using the χ² test. Two-group comparisons were made to assess the linkage equilibrium and haplotype. RESULTS Three SNPs were found. In comparison to healthy controls, patients had higher T-allele frequencies at rs1058424 and G-allele frequencies at rs3763043 (p = 0.043 and p = 0.045, respectively). Furthermore, there is an association between the decreased risk of SCZ and the AA genotype at both rs1058424 (p = 0.021, OR = 2.04) and rs3763043 (p = 0.018, OR = 2.25). The TCG haplotype (p = 0.036) was associated with a potential risk of SCZ, while the ACA haplotype (p = 0.0007) was associated with a decreased risk of SCZ and retained statistical significance after Bonferroni correction (p = 0.006). CONCLUSIONS An etiological reference for SCZ is provided by the association between AQP4 gene polymorphisms and SCZ in the Southern Han Chinese population.
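The allele-frequency comparison named in the METHODS above can be illustrated with a minimal sketch of a 2×2 allelic χ² test. This is not the authors' code: the function name `allelic_chi_square` and the allele counts are hypothetical, chosen only to show the arithmetic, and the test is the plain Pearson statistic with 1 degree of freedom and no continuity correction.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df, no continuity correction) and odds ratio
    for a 2x2 table of allele counts (rows: cases/controls, cols: alleles)."""
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


# Hypothetical counts (NOT taken from the study): 100 cases and 100 controls,
# two alleles per person, so each row sums to 200.
chi2, p_value, odds_ratio = allelic_chi_square(120, 80, 100, 100)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, OR={odds_ratio:.2f}")
```

With several SNPs and haplotypes tested, each nominal p-value would still need a multiple-testing adjustment such as the Bonferroni correction the abstract mentions.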
Affiliation(s)
- Yung-Fu Wu
- Department of Psychiatry, Beitou Branch, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; Graduate Institute of Medical Science, National Defense Medical Center, Taipei, Taiwan
- Huey-Kang Sytwu
- Department of Microbiology and Immunology, National Defense Medical Center, Taipei, Taiwan
- For-Wey Lung
- Graduate Institute of Medical Science, National Defense Medical Center, Taipei, Taiwan; Department of Psychiatry, Calo Psychiatric Center, Pingtung County, Taiwan
23
Dapas M, Sisk R, Legro RS, Urbanek M, Dunaif A, Hayes MG. Family-Based Quantitative Trait Meta-Analysis Implicates Rare Noncoding Variants in DENND1A in Polycystic Ovary Syndrome. J Clin Endocrinol Metab 2019; 104:3835-3850. [PMID: 31038695 PMCID: PMC6660913 DOI: 10.1210/jc.2018-02496] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 11/19/2018] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
CONTEXT Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5% to 15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date. OBJECTIVE The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis. DESIGN, PATIENTS, AND METHODS We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis. RESULTS We found rare variants in DENND1A (P = 5.31 × 10⁻⁵, adjusted P = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families. CONCLUSIONS Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.
Affiliation(s)
- Matthew Dapas
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Ryan Sisk
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Richard S Legro
- Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, Pennsylvania
- Margrit Urbanek
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Reproductive Science, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Andrea Dunaif
- Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Department of Anthropology, Northwestern University, Evanston, Illinois
24
Schaid DJ, Tong X, Batzler A, Sinnwell JP, Qing J, Biernacka JM. Multivariate generalized linear model for genetic pleiotropy. Biostatistics 2019; 20:111-128. [PMID: 29267957 DOI: 10.1093/biostatistics/kxx067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/27/2017] [Accepted: 11/05/2017] [Indexed: 02/07/2023] Open
Abstract
When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that $k+1$ traits are associated, given that the null of $k$ associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic, Harwick 775, 200 First ST SW, Rochester, MN, USA
- Xingwei Tong
- School of Statistics, Beijing Normal University, Beijing, China
- Anthony Batzler
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jason P Sinnwell
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jiang Qing
- School of Statistics, Beijing Normal University, Beijing, China
25
Saad M, Wijsman EM. Association score testing for rare variants and binary traits in family data with shared controls. Brief Bioinform 2019; 20:245-253. [PMID: 28968627 DOI: 10.1093/bib/bbx107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2017] [Indexed: 11/12/2022] Open
Abstract
Genome-wide association studies have been an important approach used to localize trait loci, with primary focus on common variants. The multiple rare variant-common disease hypothesis may explain the missing heritability remaining after accounting for identified common variants. Advances in sequencing technologies with their decreasing costs, coupled with methodological advances in the context of association studies in large samples, now make the study of rare variants at a genome-wide scale feasible. The resurgence of family-based association designs because of their advantage in studying rare variants has also stimulated more methods development, mainly based on linear mixed models (LMMs). Other tests such as score tests can have advantages over the LMMs, but to date have mainly been proposed for single-marker association tests. In this article, we extend several score tests ($\chi^2_{\text{corrected}}$, WQLS, and SKAT) to the multiple variant association framework. We evaluate and compare their statistical performances relative to the LMM. Moreover, we show that three tests can be cast as the difference between marker allele frequencies (AFs) estimated in each of the groups of affected and unaffected subjects. We show that these tests are flexible, as they can be based on related, unrelated or both related and unrelated subjects. They also make feasible an increasingly common design that only sequences a subset of affected subjects (related or unrelated) and uses for comparison publicly available AFs estimated in a group of healthy subjects. Finally, we show the great impact of linkage disequilibrium on the performance of all these tests.
Affiliation(s)
- Mohamad Saad
- Department of Biostatistics, University of Washington, Seattle, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA; Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Ellen M Wijsman
- Department of Biostatistics, University of Washington, Seattle, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA
26
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
27
Robust Rare-Variant Association Tests for Quantitative Traits in General Pedigrees. Statistics in Biosciences 2018; 10:491-505. [DOI: 10.1007/s12561-017-9197-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
28
Hamvas A, Feng R, Bi Y, Wang F, Bhattacharya S, Mereness J, Kaushal M, Cotten CM, Ballard PL, Mariani TJ. Exome sequencing identifies gene variants and networks associated with extreme respiratory outcomes following preterm birth. BMC Genet 2018; 19:94. [PMID: 30342483 PMCID: PMC6195962 DOI: 10.1186/s12863-018-0679-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/20/2017] [Accepted: 10/01/2018] [Indexed: 12/28/2022] Open
Abstract
Background Previous studies have identified genetic variants associated with bronchopulmonary dysplasia (BPD) in extremely preterm infants. However, findings with genome-wide significance have been rare, and not replicated. We hypothesized that whole exome sequencing (WES) of premature subjects with extremely divergent phenotypic outcomes could facilitate the identification of genetic variants or gene networks contributing to disease risk. Results The Prematurity and Respiratory Outcomes Program (PROP) recruited a cohort of > 765 extremely preterm infants for the identification of markers of respiratory morbidity. We completed WES on 146 PROP subjects (85 affected, 61 unaffected) representing extreme phenotypes of early respiratory morbidity. We tested for association between disease status and individual common variants, screened for rare variants exclusive to either affected or unaffected subjects, and tested the combined association of variants across gene loci. Pathway analysis was performed and disease-related expression patterns were assessed. Marginal association with BPD was observed for numerous common and rare variants. We identified 345 genes with variants unique to BPD-affected preterm subjects, and 292 genes with variants unique to our unaffected preterm subjects. Of these unique variants, 28 (19 in the affected cohort and 9 in the unaffected cohort) replicate a prior WES study of BPD-associated variants. Pathway analysis of sets of variants, informed by disease-related gene expression, implicated protein kinase A, MAPK and Neuregulin/epidermal growth factor receptor signaling. Conclusions We identified novel genes and associated pathways that may play an important role in susceptibility/resilience for the development of lung disease in preterm infants. Electronic supplementary material The online version of this article (10.1186/s12863-018-0679-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Aaron Hamvas
- Department of Pediatrics, Northwestern University, Chicago, IL, USA; Ann and Robert H. Lurie Children's Hospital of Chicago and Northwestern University, Chicago, IL, USA
- Rui Feng
- Department of Biostatistics, University of Pennsylvania, Philadelphia, PA, USA
- Yingtao Bi
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Fan Wang
- Department of Biostatistics, University of Pennsylvania, Philadelphia, PA, USA
- Jared Mereness
- Department of Pediatrics, University of Rochester, Rochester, NY, USA
- Madhurima Kaushal
- Center for Biomedical Informatics, Washington University, St. Louis, MO, USA
- Philip L Ballard
- Department of Pediatrics, University of California, San Francisco, CA, USA
- Thomas J Mariani
- Department of Pediatrics, University of Rochester, Rochester, NY, USA; Division of Neonatology and Pediatric Molecular and Personalized Medicine Program, University of Rochester Medical Center, 601 Elmwood Ave, Box 850, Rochester, NY, 14642, USA
29
Zhan X, Xue L, Zheng H, Plantinga A, Wu MC, Schaid DJ, Zhao N, Chen J. A small-sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol 2018; 42:772-782. [DOI: 10.1002/gepi.22160] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/30/2018] [Revised: 06/27/2018] [Accepted: 07/15/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania
- Lingzhou Xue
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania
- Haotian Zheng
- Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Anna Plantinga
- Department of Biostatistics, University of Washington, Seattle, Washington
- Michael C. Wu
- Department of Biostatistics, University of Washington, Seattle, Washington
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Daniel J. Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota
30
Wu X, Guan T, Liu DJ, León Novelo LG, Bandyopadhyay D. Adaptive-Weight Burden Test for Associations Between Quantitative Traits and Genotype Data With Complex Correlations. Ann Appl Stat 2018; 12:1558-1582. [PMID: 30214655 DOI: 10.1214/17-aoas1121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
High-throughput sequencing has often been used to screen samples from pedigrees or with population structure, producing genotype data with complex correlations rendered from both familial relation and linkage disequilibrium. With such data, it is critical to account for these genotypic correlations when assessing the contribution of variants by gene or pathway. Recognizing the limitations of existing association testing methods, we propose Adaptive-weight Burden Test (ABT), a retrospective, mixed-model test for genetic association of quantitative traits on genotype data with complex correlations. This method makes full use of genotypic correlations across both samples and variants, and adopts "data-driven" weights to improve power. We derive the ABT statistic and its explicit distribution under the null hypothesis, and demonstrate through simulation studies that it is generally more powerful than the fixed-weight burden test and family-based SKAT in various scenarios, controlling for the type I error rate. Further investigation reveals the connection of ABT with kernel tests, as well as the adaptability of its weights to the direction of genetic effects. The application of ABT is illustrated by a whole genome analysis of genes with common and rare variants associated with fasting glucose from the NHLBI "Grand Opportunity" Exome Sequencing Project.
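For orientation, the fixed-weight burden test that the abstract uses as a comparator can be sketched in a few lines. This is not ABT itself and not the authors' code: it collapses each individual's variants into one weighted score and assesses the score-trait correlation by permutation, which assumes unrelated (exchangeable) individuals and therefore ignores exactly the familial correlation that ABT is designed to handle. All names (`burden_scores`, `pearson_r`, `permutation_p`) and the toy data are hypothetical.

```python
import math
import random


def burden_scores(genotypes, weights):
    """Collapse each individual's variant counts into one weighted burden score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_p(scores, trait, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the burden-trait correlation.
    Shuffling trait values is valid only for unrelated individuals."""
    rng = random.Random(seed)
    observed = abs(pearson_r(scores, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(scores, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one rule avoids a p-value of zero


# Toy data (hypothetical): 6 individuals x 3 variants, minor-allele counts.
genotypes = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
weights = [1.0, 2.0, 1.0]                  # fixed, pre-specified weights
trait = [4.8, 5.1, 5.6, 5.0, 6.2, 4.9]     # made-up quantitative trait values

scores = burden_scores(genotypes, weights)
r = pearson_r(scores, trait)
p = permutation_p(scores, trait)
print(scores, round(r, 3), p)
```

The weights here are fixed in advance; the abstract's point is that data-driven weights, together with a model for correlations across samples and variants, can improve power over this baseline.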
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
- Ting Guan
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
- Dajiang J Liu
- Department of Public Health Sciences, Hershey Institute of Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Luis G León Novelo
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA
| | | |
Collapse
|
31
|
Chien LC, Chiu YF. General retrospective mega-analysis framework for rare variant association tests. Genet Epidemiol 2018; 42:621-635. [PMID: 30188589 DOI: 10.1002/gepi.22147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2017] [Revised: 06/05/2018] [Accepted: 06/05/2018] [Indexed: 11/09/2022]
Abstract
Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
- Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, ROC
|
32
|
Baron RV, Stickel JR, Weeks DE. The Mega2R package: R tools for accessing and processing genetic data in common formats. F1000Res 2018; 7:1352. [PMID: 30271589 PMCID: PMC6137409 DOI: 10.12688/f1000research.15949.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 01/28/2019] [Indexed: 02/02/2023]
Abstract
The standalone C++ Mega2 program has been facilitating data-reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time. Currently, Mega2 converts data from several different genetic data formats (including PLINK, VCF, BCF, and IMPUTE2) into the specific data requirements for over 40 commonly-used linkage and association analysis programs (including Mendel, Merlin, Morgan, SHAPEIT, ROADTRIPS, MaCH/minimac3). Recently, Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, like that of the GenABEL R package and the PLINK binary format. Our new Mega2R package now makes it easy to load Mega2 SQLite databases directly into R as data frames. In addition, Mega2R is memory efficient, keeping its genotype data in a compressed format, portions of which are only expanded when needed. Mega2R has functions that ease the process of applying gene-based tests by looping over genes, efficiently pulling out genotypes for variants within the desired boundaries. We have also created several more functions that illustrate how to use the data frames: these permit one to run the pedgene package to carry out gene-based association tests on family data, to run the SKAT package to carry out gene-based association tests, to output the Mega2R data as a VCF file and related files (for phenotype and family data), and to convert the data frames into GenABEL format. The Mega2R package enhances GenABEL since it supports additional input data formats (such as PLINK, VCF, and IMPUTE2) not currently supported by GenABEL. The Mega2 program and the Mega2R R package are both open source and are freely available, along with extensive documentation, from https://watson.hgen.pitt.edu/register for Mega2 and https://CRAN.R-project.org/package=Mega2R for Mega2R.
Affiliation(s)
- Robert V Baron
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Justin R Stickel
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
- Daniel E Weeks
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA; Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, USA
|
33
|
Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method. Genetics 2018; 210:463-476. [PMID: 30104420 DOI: 10.1534/genetics.118.301266] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/04/2018] [Accepted: 07/29/2018] [Indexed: 01/19/2023]
Abstract
The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, which is an important feature for sequencing studies due to the heterogeneous nature of rare variants. In addition, family designs can provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those accounting for the heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as family-based genetic random field (FGRF). Similar to existing family-based association tests, FGRF can utilize within-family and between-family information separately or jointly to test an association. We demonstrate that FGRF has statistical power comparable to that of existing methods when there is no genetic heterogeneity, but can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the advantages of conventional family-based association tests (e.g., being robust to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study, and revealed several genes, including SAMD14, potentially associated with alcohol dependence.
|
34
|
Fernández MV, Budde J, Del-Aguila JL, Ibañez L, Deming Y, Harari O, Norton J, Morris JC, Goate AM, Cruchaga C. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease. Front Neurosci 2018; 12:209. [PMID: 29670507 PMCID: PMC5893779 DOI: 10.3389/fnins.2018.00209] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/26/2017] [Accepted: 03/15/2018] [Indexed: 12/22/2022]
Abstract
Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD.
Affiliation(s)
- Maria V. Fernández
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- John Budde
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Jorge L. Del-Aguila
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Laura Ibañez
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Yuetiva Deming
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Oscar Harari
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Joanne Norton
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- John C. Morris
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, St. Louis, MO, United States
- Alison M. Goate
- Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
|
35
|
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Affiliation(s)
- Karoline Kuchenbaecker
- Wellcome Trust Sanger Institute, Cambridge, UK; University College London, London, UK
- Emil Vincent Rosenbaum Appel
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
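The aggregate testing idea summarized in the abstract above can be illustrated with a minimal burden-style sketch. This is not the method of any cited paper: the function names (`burden_scores`, `score_statistic`, `burden_test`) are hypothetical, the equal default weights are an assumption, and the permutation p-value assumes unrelated, exchangeable individuals, so it would not be appropriate for family data without modification.

```python
import random


def burden_scores(genotypes, weights=None):
    """Collapse each individual's rare-variant genotypes into one score.

    genotypes: list of rows, one per individual, holding minor-allele
    counts (0, 1, or 2) for each variant in the region.
    weights: optional per-variant weights (assumed equal if omitted).
    """
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def score_statistic(scores, trait):
    """Absolute centered cross-product between burden scores and trait."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(trait) / n
    return abs(sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, trait)))


def burden_test(genotypes, trait, n_perm=2000, seed=0):
    """Permutation p-value for a region-level burden statistic.

    The trait is shuffled across individuals, which is only valid when
    individuals are exchangeable (no relatedness or stratification).
    """
    rng = random.Random(seed)
    scores = burden_scores(genotypes)
    observed = score_statistic(scores, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_statistic(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Collapsing to a single score is most effective when the rare variants in the region act in the same direction; variance-component tests such as SKAT are the usual alternative when effect directions are mixed, which is one way the dependence on genetic architecture noted in the abstract shows up in practice.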
|
36
|
Chen F, Szymanski EP, Olivier KN, Liu X, Tettelin H, Holland SM, Duggal P. Whole-Exome Sequencing Identifies the 6q12-q16 Linkage Region and a Candidate Gene, TTK, for Pulmonary Nontuberculous Mycobacterial Disease. Am J Respir Crit Care Med 2017; 196:1599-1604. [PMID: 28777004 DOI: 10.1164/rccm.201612-2479oc] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 01/13/2023]
Abstract
RATIONALE: Pulmonary nontuberculous mycobacterial disease (PNTM) often affects white postmenopausal women, with a tall and lean body habitus and higher rates of scoliosis, pectus excavatum, mitral valve prolapse, and mutations in the CFTR gene. These clinical features and the familial clustering of the disease suggest an underlying genetic mechanism. OBJECTIVES: To map the genes associated with PNTM, whole-exome sequencing was conducted in 12 PNTM families and 57 sporadic cases recruited at the National Institutes of Health Clinical Center during 2001-2013. METHODS: We performed a variant-level and a gene-level parametric linkage analysis on nine PNTM families (16 affected and 20 unaffected) as well as a gene-level association analysis on nine PNTM families and 55 sporadic cases. MEASUREMENTS AND MAIN RESULTS: The genome-wide variant-level linkage analysis using 4,328 independent common variants identified a 20-cM region on chromosome 6q12-6q16 (heterogeneity logarithm of odds score = 3.9), under a recessive disease model with 100% penetrance and a risk allele frequency of 5%. All genes on chromosome 6 were then tested in the gene-level linkage analysis, using the collapsed haplotype pattern method. The TTK protein kinase gene (TTK) on chromosome 6q14.1 was the most significant (heterogeneity logarithm of odds score = 3.38). In addition, the genes MAP2K4, RCOR3, KRT83, IFNLR1, and SLC29A1 were associated with PNTM in our gene-level association analysis. CONCLUSIONS: The TTK gene encodes a protein kinase that is essential for mitotic checkpoints and the DNA damage response. TTK and other genetic loci identified in our study may contribute to the increased susceptibility to NTM infection and its progression to pulmonary disease.
Affiliation(s)
- Fei Chen
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Eva P Szymanski
- Laboratory of Clinical Infectious Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), Bethesda, Maryland
- Kenneth N Olivier
- Cardiovascular and Pulmonary Branch, National Heart, Lung, and Blood Institute, NIH, Bethesda, Maryland
- Xinyue Liu
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland
- Hervé Tettelin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland
- Steven M Holland
- Laboratory of Clinical Infectious Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), Bethesda, Maryland
- Priya Duggal
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
|
37
|
Luo Y, Maity A, Wu MC, Smith C, Duan Q, Li Y, Tzeng JY. On the substructure controls in rare variant analysis: Principal components or variance components? Genet Epidemiol 2017; 42:276-287. [PMID: 29280188 DOI: 10.1002/gepi.22102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/10/2017] [Revised: 10/07/2017] [Accepted: 10/19/2017] [Indexed: 11/09/2022]
Abstract
Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity-based collapsing tests (e.g., SKAT) may suffer more severely from PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal components (PC) or variance components (VC) based PS correction methods. We consider confounding effects caused by PS including stratified populations, admixed populations, and spatially distributed nongenetic risk; we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except for admixture, although the number of sufficient PCs depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require equal or fewer sufficient PCs and often achieve higher power than PCs based on other variant types. (iii) VC-based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct the VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than VC based on other variant types. Given that the best-performing method and the choice of variants depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.
Affiliation(s)
- Yiwen Luo
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Michael C Wu
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Chris Smith
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Qing Duan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yun Li
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, National Cheng-Kung University, Tainan, Taiwan; Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
|
38
|
Chien LC, Bowden DW, Chiu YF. Region-based association tests for sequencing data on survival traits. Genet Epidemiol 2017; 41:511-522. [DOI: 10.1002/gepi.22054] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/09/2016] [Revised: 03/27/2017] [Accepted: 03/27/2017] [Indexed: 11/07/2022]
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan
- Donald W. Bowden
- Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
|
39
|
Lindor NM, Larson MC, DeRycke MS, McDonnell SK, Baheti S, Fogarty ZC, Win AK, Potter JD, Buchanan DD, Clendenning M, Newcomb PA, Casey G, Gallinger S, Le Marchand L, Hopper JL, Jenkins MA, Goode EL, Thibodeau SN. Germline miRNA DNA variants and the risk of colorectal cancer by subtype. Genes Chromosomes Cancer 2017; 56:177-184. [PMID: 27636879 PMCID: PMC5245119 DOI: 10.1002/gcc.22420] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/06/2016] [Revised: 07/28/2016] [Accepted: 08/04/2016] [Indexed: 01/22/2023]
Abstract
MicroRNAs (miRNAs) regulate up to one-third of all protein-coding genes including genes relevant to cancer. Variants within miRNAs have been reported to be associated with prognosis, survival, response to chemotherapy across cancer types, in vitro parameters of cell growth, and altered risks for development of cancer. Five miRNA variants have been reported to be associated with risk for development of colorectal cancer (CRC). In this study, we evaluated germline genetic variation in 1,123 miRNAs in 899 individuals with CRCs categorized by clinical subtypes and in 204 controls. The role of common miRNA variation in CRC was investigated using single variant and miRNA-level association tests. Twenty-nine miRNAs and 30 variants exhibited some marginal association with CRC in at least one subtype of CRC. Previously reported associations were not confirmed (n = 4) or could not be evaluated (n = 1). The variants noted for the CRCs with deficient mismatch repair showed little overlap with the variants noted for CRCs with proficient mismatch repair, consistent with our evolving understanding of the distinct biology underlying these two groups. © 2016 The Authors Genes, Chromosomes & Cancer Published by Wiley Periodicals, Inc.
Affiliation(s)
- Melissa S. DeRycke
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
- Saurabh Baheti
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Aung Ko Win
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- John D. Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
- School of Public Health, University of Washington, Seattle, WA
- Colorectal Oncogenomics Group, Genetic Epidemiology Laboratory, Department of Pathology, University of Melbourne, Parkville, Victoria, Australia
- Daniel D. Buchanan
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- Colorectal Oncogenomics Group, Genetic Epidemiology Laboratory, Department of Pathology, University of Melbourne, Parkville, Victoria, Australia
- Mark Clendenning
- Colorectal Oncogenomics Group, Genetic Epidemiology Laboratory, Department of Pathology, University of Melbourne, Parkville, Victoria, Australia
- Polly A. Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
- Graham Casey
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA
- Steven Gallinger
- Lunenfeld Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
- John L. Hopper
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- Department of Epidemiology and Institute of Health and Environment, School of Public Health, Seoul National University, Seoul, South Korea
- Mark A. Jenkins
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- Ellen L. Goode
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
|
40
|
Qiao D, Lange C, Laird NM, Won S, Hersh CP, Morrow J, Hobbs BD, Lutz SM, Ruczinski I, Beaty TH, Silverman EK, Cho MH. Gene-based segregation method for identifying rare variants in family-based sequencing studies. Genet Epidemiol 2017; 41:309-319. [PMID: 28191685 DOI: 10.1002/gepi.22037] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/08/2016] [Revised: 11/30/2016] [Accepted: 12/09/2016] [Indexed: 11/11/2022]
Abstract
Whole-exome sequencing using family data has identified rare coding variants in Mendelian diseases or complex diseases with Mendelian subtypes, using filters based on variant novelty, functionality, and segregation with the phenotype within families. However, formal statistical approaches are limited. We propose a gene-based segregation test (GESE) that quantifies the uncertainty of the filtering approach. It is constructed using the probability of segregation events under the null hypothesis of Mendelian transmission. This test takes into account different degrees of relatedness in families, the number of functional rare variants in the gene, and their minor allele frequencies in the corresponding population. In addition, a weighted version of this test allows incorporating additional subject phenotypes to improve statistical power. We show via simulations that the GESE and weighted GESE tests maintain appropriate type I error rate, and have greater power than several commonly used region-based methods. We apply our method to whole-exome sequencing data from 49 extended pedigrees with severe, early-onset chronic obstructive pulmonary disease (COPD) in the Boston Early-Onset COPD study (BEOCOPD) and identify several promising candidate genes. Our proposed methods show great potential for identifying rare coding variants of large effect and high penetrance for family-based sequencing data. The proposed tests are implemented in an R package that is available on CRAN (https://cran.r-project.org/web/packages/GESE/).
Affiliation(s)
- Dandi Qiao
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Christoph Lange
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Nan M Laird
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Sungho Won
- Department of Public Health Science, Seoul National University, Seoul, Republic of Korea
- Craig P Hersh
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Jarrett Morrow
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Brian D Hobbs
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Sharon M Lutz
- Department of Biostatistics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, United States of America
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Terri H Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America; University of Washington Center for Mendelian Genomics, Seattle, Washington, United States of America
|
41
|
Zhu H, Wang Z, Wang X, Sha Q. A novel statistical method for rare-variant association studies in general pedigrees. BMC Proc 2016; 10:193-196. [PMID: 27980635 PMCID: PMC5133499 DOI: 10.1186/s12919-016-0029-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to identify rare variants that underlie complex diseases. For any type of study design, the statistical power will be improved if rare variants can be enriched in the samples. Family-based designs, with ascertainment based on phenotype, may enrich the sample for causal rare variants and thus can be more powerful than population-based designs. Therefore, it is important to develop family-based statistical methods that can account for ascertainment. In this paper, we develop a novel statistical method for rare-variant association studies in general pedigrees for quantitative traits. This method uses a retrospective view that treats the traits as fixed and the genotypes as random, which allows us to account for complex and undefined ascertainment of families. We then apply the newly developed method to the Genetic Analysis Workshop 19 data set and compare the power of the new method with two other methods for general pedigrees. The results show that the newly proposed method is more powerful than the other two methods in most of the cases we consider.
Affiliation(s)
- Huanhuan Zhu
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Xuexia Wang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203-5017 USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
42
Abstract
With the advance of sequencing technologies, it has become routine practice to test for association between a quantitative trait and a set of rare variants (RVs). While a number of RV association tests have been proposed, there is a dearth of studies on the robustness of RV association testing for non-normally distributed traits, e.g., due to skewness, which is ubiquitous in cohort studies. By extensive simulations, we demonstrate that commonly used RV tests, including the sequence kernel association test (SKAT) and optimal unified SKAT (SKAT-O), are not robust to heavy-tailed or right-skewed trait distributions and show inflated type I error rates; in contrast, the adaptive sum of powered score (aSPU) test is much more robust. Here we further propose a robust version of the aSPU test, called aSPUr. We conduct extensive simulations to evaluate the power of the tests, finding that for a larger number of RVs, aSPU is often more powerful than SKAT and SKAT-O, owing to its high data-adaptivity. We also compare different tests by conducting association analysis of triglyceride levels using the NHLBI ESP whole-exome sequencing data. The QQ plots for SKAT and SKAT-O were severely inflated (λ = 1.89 and 1.78, respectively), while those for aSPU and aSPUr behaved normally. Given the relatively high robustness to outliers and high power of the aSPU test, we recommend using it as a complement to SKAT and SKAT-O. If there is evidence of an inflated type I error rate from the aSPU test, we would recommend the use of the more robust, but less powerful, aSPUr test.
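The λ values quoted above can be illustrated with a small sketch. It assumes that λ denotes the usual genomic inflation factor, i.e. the median of the observed 1-df chi-square statistics divided by the median expected under the null; the abstract does not say how λ was computed, and the function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom:
# P(Z^2 <= m) = 0.5  <=>  Phi(sqrt(m)) = 0.75, so m is about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Estimate lambda from p-values (assumed to lie strictly in (0, 1]).

    Each p-value is mapped back to the 1-df chi-square statistic that
    would produce it; lambda is the ratio of the observed median
    statistic to the null median.
    """
    norm = NormalDist()
    stats = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


# Uniform p-values (a well-calibrated test) give lambda close to 1;
# p-values skewed toward zero give lambda above 1.
calibrated = [i / 1000 for i in range(1, 1000)]
inflated = [p ** 2 for p in calibrated]
print(genomic_inflation(calibrated), genomic_inflation(inflated))
```

A value near 1 indicates a QQ plot that follows the diagonal, whereas values such as those reported for SKAT and SKAT-O point to systematic inflation.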
43
Wang L, Choi S, Lee S, Park T, Won S. Comparing family-based rare variant association tests for dichotomous phenotypes. BMC Proc 2016; 10:181-186. [PMID: 27980633 PMCID: PMC5133528 DOI: 10.1186/s12919-016-0027-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023] Open
Abstract
Background: It has been repeatedly stressed that family-based samples suffer less from genetic heterogeneity and that association analyses with family-based samples are expected to be powerful for detecting susceptibility loci for rare diseases. Various approaches for rare-variant analysis with family-based samples have been proposed. Methods: In this report, the performance of the existing methods was compared using the simulated data set provided as part of Genetic Analysis Workshop 19 (GAW19). We considered the rare variant transmission disequilibrium test (RV-TDT), the generalized estimating equations-based kernel association (GEE-KM) test, an extended combined multivariate and collapsing test for pedigree data (known as Pedigree Combined Multivariate and Collapsing [PedCMC]), gene-level kernel and burden association tests with disease status for pedigree data (PedGene), and the family-based rare variant association test (FARVAT). Results: The results show that PedGene and FARVAT are usually the most efficient, and the optimal test statistic provided by FARVAT is robust under different disease models. Furthermore, FARVAT was implemented in C++, which makes it computationally faster than the other methods. Conclusions: Considering both statistical and computational efficiency, we conclude that FARVAT is a good choice for rare-variant analysis with extended families.
Affiliation(s)
- Longfei Wang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742 Korea
- Sungkyoung Choi
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742 Korea
- Sungyoung Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742 Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742 Korea; Department of Statistics, Seoul National University, Seoul, 151-742 Korea
- Sungho Won
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742 Korea; Department of Public Health Science, Seoul National University, Seoul, 151-742 Korea; Institute of Health Environment, Seoul National University, Seoul, 151-742 Korea
44
Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. Am J Hum Genet 2016; 99:846-859. [PMID: 27666371 DOI: 10.1016/j.ajhg.2016.08.015] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/13/2016] [Accepted: 08/17/2016] [Indexed: 11/24/2022] Open
Abstract
Recently, multiple studies have performed whole-exome or whole-genome sequencing to identify groups of rare variants associated with complex traits and diseases. They have primarily utilized case-control study designs that often require thousands of individuals to reach acceptable statistical power. Family-based studies can be more powerful because a rare variant can be enriched in an extended pedigree and segregate with the phenotype. Although many methods have been proposed for using family data to discover rare variants involved in a disease, a majority of them focus on a specific pedigree structure and are designed to analyze either binary or continuously measured outcomes. In this article, we propose RareIBD, a general and powerful approach to identifying rare variants involved in disease susceptibility. Our method can be applied to large extended families of arbitrary structure, including pedigrees with only affected individuals. The method accommodates both binary and quantitative traits. A series of simulation experiments suggest that RareIBD is a powerful test that outperforms existing approaches. In addition, our method accounts for individuals in top generations, which are not usually genotyped in extended families. In contrast to available statistical tests, RareIBD generates accurate p values even when genetic data from these individuals are missing. We applied RareIBD, as well as other methods, to two extended family datasets generated by different genotyping technologies and representing different ethnicities. The analysis of real data confirmed that RareIBD is the only method that properly controls type I error.
45
Jadhav S, Vsevolozhskaya OA, Tong X, Lu Q. The impact of genetic structure on sequencing analysis. BMC Proc 2016; 10:171-174. [PMID: 27980631 PMCID: PMC5133514 DOI: 10.1186/s12919-016-0025-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Background: Genome-wide association studies have made substantial progress in identifying common variants associated with human diseases. Despite such success, a large portion of heritability remains unexplained. Evolutionary theory and empirical studies suggest that rare mutations could play an important role in human diseases, which motivates comprehensive investigation of rare variants in sequencing studies. To explore the association of rare variants with human diseases, many statistical approaches have been developed with different ways of modeling genetic structure (ie, linkage disequilibrium). Nevertheless, the appropriate strategy to model genetic structure of sequencing data and its effect on association analysis have not been well studied. Methods: We investigate 3 statistical approaches that use 3 different strategies to model the genetic structure of sequencing data. We proceed by comparing a burden test that assumes independence among sequencing variants, a burden test that considers pairwise linkage disequilibrium (LD), and a functional analysis of variance (FANOVA) test that models genetic data through fitting continuous curves on individuals' genotypes. Results: Through simulations, we find that FANOVA attains better or comparable performance to the 2 burden tests. Overall, the burden test that considers pairwise LD has comparable performance to the burden test that assumes independence between sequencing variants. However, for 1 gene, where the disease-associated variant is located in an LD block, we find that considering pairwise LD could improve the test's performance. Conclusions: The structure of sequencing variants is complex in nature and its patterns vary across the whole genome. In certain cases (eg, a disease-susceptibility variant is in an LD block), ignoring the genetic structure in the association analysis could result in suboptimal performance. Through this study, we show that a functional-based method is promising for modeling the underlying genetic structure of sequencing data, which could lead to better performance.
46
Zhang Q, Guldbrandtsen B, Calus MPL, Lund MS, Sahana G. Comparison of gene-based rare variant association mapping methods for quantitative traits in a bovine population with complex familial relationships. Genet Sel Evol 2016; 48:60. [PMID: 27534618 PMCID: PMC4989328 DOI: 10.1186/s12711-016-0238-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/08/2016] [Accepted: 08/04/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: There is growing interest in the role of rare variants in the variation of complex traits due to increasing evidence that rare variants are associated with quantitative traits. However, association methods that are commonly used for mapping common variants are not effective for mapping rare variants. In addition, livestock populations have large half-sib families and the occurrence of rare variants may be confounded with family structure, which makes it difficult to disentangle their effects from family mean effects. We compared the power of methods that are commonly applied in human genetics to map rare variants in cattle using whole-genome sequence data and simulated phenotypes. We also studied the power of mapping rare variants using linear mixed models (LMM), which are the method of choice to account for both family relationships and population structure in cattle. RESULTS: We observed that the power of the LMM approach was low for mapping rare variants (defined as those with frequencies lower than 0.01) with a moderate effect (5 to 8% of phenotypic variance explained by multiple rare variants that vary from 5 to 21 in number) contributing to a QTL with a sample size of 1000. In contrast, across the scenarios studied, statistical methods that are specialized for mapping rare variants increased power regardless of whether multiple rare variants or a single rare variant underlie a QTL. Different methods for combining rare variants in the test single nucleotide polymorphism set resulted in similar power irrespective of the proportion of total genetic variance explained by the QTL. However, when the QTL variance is very small (only 0.1% of the total genetic variance), these specialized methods for mapping rare variants and LMM generally had no power to map the variants within a gene with sample sizes of 1000 or 5000. CONCLUSIONS: We observed that the methods that combine multiple rare variants within a gene into a meta-variant generally had greater power to map rare variants compared to LMM. Therefore, it is recommended to use rare variant association mapping methods to map rare genetic variants that affect quantitative traits in livestock, such as bovine populations.
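The idea of combining multiple rare variants within a gene into a meta-variant can be sketched as a simple unweighted burden score. This is only an illustration, not one of the specific methods compared in the study: the function name `burden_scores`, the 0/1/2 minor-allele coding, and the unweighted sum are assumptions, while the 0.01 frequency cutoff follows the definition of "rare" given in the abstract.

```python
def burden_scores(genotypes, maf_threshold=0.01):
    """Collapse the rare variants of one gene into a per-individual score.

    genotypes: list of rows, one per individual; each row holds
    minor-allele counts (0, 1 or 2) for the variants in the gene.
    Returns one burden score per individual: the number of rare
    minor alleles that individual carries.
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])

    # Allele frequency of each variant in the sample (2 alleles per individual).
    freqs = [
        sum(row[j] for row in genotypes) / (2 * n_individuals)
        for j in range(n_variants)
    ]

    # Keep only variants that are observed and below the frequency cutoff.
    rare = [j for j, f in enumerate(freqs) if 0 < f < maf_threshold]

    return [sum(row[j] for j in rare) for row in genotypes]


# Toy data: 100 individuals, 3 variants. Variants 0 and 2 are each carried
# by a single individual (frequency 0.005); variant 1 is common.
toy = [[0, 1, 0] for _ in range(100)]
toy[0][0] = 1
toy[1][2] = 1
print(burden_scores(toy)[:3])
```

The resulting score can then be tested against the phenotype like a single predictor, which is what gives collapsing approaches their power advantage when several rare variants in a gene act in the same direction.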
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark; Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, The Netherlands
- Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark
- Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, The Netherlands
- Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark
- Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark
47
Statistical Methods for Testing Genetic Pleiotropy. Genetics 2016; 204:483-497. [PMID: 27527515 DOI: 10.1534/genetics.116.189308] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 03/16/2016] [Accepted: 08/11/2016] [Indexed: 12/28/2022] Open
Abstract
Genetic pleiotropy occurs when a single gene influences more than one trait. Detecting pleiotropy and understanding its causes can improve the biological understanding of a gene in multiple ways, yet current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or no traits are associated with a genetic variant. For the special case of two traits, one can construct this null hypothesis based on the intersection-union (IU) test, which rejects the null hypothesis only if the null hypotheses of no association for both traits are rejected. To allow for more than two traits, we developed a new likelihood-ratio test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that [Formula: see text] traits are associated, given that the null hypothesis that k traits are associated was rejected. This provides a formal testing framework to determine the number of traits associated with a genetic variant, while accounting for correlations among the traits. By simulations, we illustrate the type I error rate and power of our new methods; describe how they are influenced by sample size, the number of traits, and the trait correlations; and apply the new methods to multivariate immune phenotypes in response to smallpox vaccination. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
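The two-trait intersection-union (IU) rule described above can be sketched in a few lines. The sketch assumes that a per-trait p-value for association with the variant is already available; the function name and the default significance level are hypothetical, and this is not the likelihood-ratio test the authors propose for more than two traits.

```python
def intersection_union_test(p_trait1, p_trait2, alpha=0.05):
    """Two-trait IU test of pleiotropy.

    The null hypothesis "at most one trait is associated" is rejected
    only if both per-trait nulls of no association are rejected, which
    is equivalent to comparing the larger p-value with alpha.
    Returns (IU p-value, reject flag).
    """
    p_iu = max(p_trait1, p_trait2)
    return p_iu, p_iu < alpha


# One strongly associated trait is not enough to declare pleiotropy.
print(intersection_union_test(0.001, 0.20))  # -> (0.2, False)
print(intersection_union_test(0.001, 0.01))  # -> (0.01, True)
```

Taking the maximum is what distinguishes this from an omnibus multivariate test, where a single very small p-value can drive the rejection.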
48
Lin WY, Liang YC. Conditioning adaptive combination of P-values method to analyze case-parent trios with or without population controls. Sci Rep 2016; 6:28389. [PMID: 27341039 PMCID: PMC4920030 DOI: 10.1038/srep28389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2016] [Accepted: 06/02/2016] [Indexed: 11/24/2022] Open
Abstract
Detection of rare causal variants can help uncover the etiology of complex diseases. Recruiting case-parent trios is a popular study design in family-based studies. If researchers can obtain data from population controls, utilizing them in trio analyses can improve the power of methods. The transmission disequilibrium test (TDT) is a well-known method to analyze case-parent trio data. It has been extended to rare-variant association testing (abbreviated as "rvTDT"), with the flexibility to incorporate population controls. The rvTDT method is robust to population stratification. However, power loss may occur in the conditioning process. Here we propose a "conditioning adaptive combination of P-values method" (abbreviated as "conADA"), to analyze trios with/without unrelated controls. By first truncating the variants with larger P-values, we decrease the vulnerability of conADA to the inclusion of neutral variants. Moreover, because the test statistic is developed by conditioning on parental genotypes, conADA generates valid statistical inference in the presence of population stratification. With regard to statistical methods for next-generation sequencing data analyses, validity may be hampered by population stratification, whereas power may be affected by the inclusion of neutral variants. We recommend conADA for its robustness to these two factors (population stratification and the inclusion of neutral variants).
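As background for the trio-based methods mentioned above, the classic single-variant TDT can be sketched as follows. This is the standard McNemar-type statistic, not rvTDT or conADA; the function name and the input convention (counts of transmitted and untransmitted copies of the allele of interest from heterozygous parents) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def tdt(transmitted, untransmitted):
    """Classic transmission disequilibrium test for one variant.

    transmitted / untransmitted: how often heterozygous parents did or
    did not pass the allele of interest to an affected child.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / total
    # A 1-df chi-square variable is a squared standard normal, so its
    # upper-tail probability is 2 * (1 - Phi(sqrt(stat))).
    p_value = 2.0 * (1.0 - NormalDist().cdf(sqrt(stat)))
    return stat, p_value


stat, p_value = tdt(60, 40)
print(stat, round(p_value, 4))
```

Because only transmissions within families are compared, the statistic is conditioned on parental genotypes, which is the property that protects this family of tests from population stratification.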
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan; Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yun-Chieh Liang
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
49
Choi S, Lee S, Qiao D, Hardin M, Cho MH, Silverman EK, Park T, Won S. FARVATX: Family-Based Rare Variant Association Test for X-Linked Genes. Genet Epidemiol 2016; 40:475-85. [PMID: 27325607 DOI: 10.1002/gepi.21979] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/30/2015] [Revised: 03/05/2016] [Accepted: 04/04/2016] [Indexed: 11/06/2022]
Abstract
Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease. Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods.
Affiliation(s)
- Sungkyoung Choi
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Sungyoung Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Dandi Qiao
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Megan Hardin
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; Department of Statistics, Seoul National University, Seoul, Korea
- Sungho Won
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; Department of Public Health Science, Seoul National University, Seoul, Korea; Institute of Health and Environment, Seoul National University, Seoul, Korea
50
Wu B, Pankow JS. On Sample Size and Power Calculation for Variant Set-Based Association Tests. Ann Hum Genet 2016; 80:136-43. [PMID: 26831402 PMCID: PMC4761288 DOI: 10.1111/ahg.12147] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/19/2015] [Accepted: 12/07/2015] [Indexed: 01/03/2023]
Abstract
Sample size and power calculations are an important part of designing new sequence-based association studies. The recently developed SEQPower and SPS programs adopted computationally intensive Monte Carlo simulations to empirically estimate power for a series of variant set association (VSA) test methods including the sequence kernel association test (SKAT). It is desirable to develop methods that can quickly and accurately compute power without intensive Monte Carlo simulations. We show that the computed power for SKAT based on the existing analytical approach could be inflated, especially for small significance levels, which are often of primary interest for large-scale whole genome and exome sequencing projects. We propose a new χ²-approximation-based approach to accurately and efficiently compute sample size and power. In addition, we propose and implement a more accurate "exact" method to compute power, which is more efficient than the Monte Carlo approach though it generally involves more computations than the χ² approximation method. The exact approach could produce very accurate results and be used to verify alternative approximation approaches. We implement the proposed methods in publicly available R programs that can be readily adapted when planning sequencing projects.
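To make the contrast between analytic and simulation-based power concrete, the following sketch computes power in closed form for the simplest possible case, a 1-df chi-square test with a known noncentrality parameter. It is not the authors' χ² approximation for SKAT, whose null distribution is a mixture of chi-squares; the function name and the use of a noncentrality parameter as input are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def power_chi2_1df(noncentrality, alpha=0.05):
    """Power of a 1-df chi-square test at level alpha.

    A noncentral 1-df chi-square variable equals (Z + delta)^2 with
    delta = sqrt(noncentrality), so the rejection probability is
    P(|Z + delta| > z_crit), where z_crit is the two-sided normal
    critical value for alpha.
    """
    norm = NormalDist()
    delta = sqrt(noncentrality)
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)


# With no effect the power equals the significance level; it grows with
# the noncentrality and shrinks sharply at exome-wide significance levels.
print(power_chi2_1df(0.0))
print(power_chi2_1df(7.85))
print(power_chi2_1df(7.85, alpha=2.5e-6))
```

The last call illustrates why small significance levels matter: the same effect that gives roughly 80% power at α = 0.05 gives very little power at a typical exome-wide threshold, so any inaccuracy in the tail of the approximation has a large effect on the planned sample size.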
Affiliation(s)
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- James S. Pankow
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA