1. Sun X, Bulekova K, Yang J, Lai M, Pitsillides AN, Liu X, Zhang Y, Guo X, Yong Q, Raffield LM, Rotter JI, Rich SS, Abecasis G, Carson AP, Vasan RS, Bis JC, Psaty BM, Boerwinkle E, Fitzpatrick AL, Satizabal CL, Arking DE, Ding J, Levy D, Liu C. Association analysis of mitochondrial DNA heteroplasmic variants: methods and application. medRxiv 2024:2024.01.12.24301233. [PMID: 38260412; PMCID: PMC10802757; DOI: 10.1101/2024.01.12.24301233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
We rigorously assessed a comprehensive association testing framework for heteroplasmy using both simulated and real-world data. The framework applies a variant allele fraction (VAF) threshold to identify heteroplasmy and uses multiple gene-based tests for association testing. In our simulation studies, the gene-based tests maintained an appropriate type I error rate at α=0.001. Notably, when 5% or more of the heteroplasmic variants within a target region were linked to an outcome, burden-extension tests (including the adaptive burden test, variable threshold burden test, and z-score weighting burden test) outperformed the sequence kernel association test (SKAT) and the original burden test. Applying this framework, we conducted association analyses of whole-blood-derived heteroplasmy in 17,507 individuals of African and European ancestries (31% of African ancestry, mean age 62 years, 58% women) with whole genome sequencing data. We performed cohort- and ancestry-specific association analyses, followed by meta-analysis of the pooled samples and within each ancestry group. Our results suggest that mtDNA-encoded genes/regions are likely to exhibit varying rates in somatic aging, with notably strong associations (p<0.001) between heteroplasmy in the RNR1 and RNR2 genes and advancing age by the original burden test. In contrast, SKAT identified significant associations (p<0.001) between diabetes and the aggregated effects of heteroplasmy in several protein-coding genes. Further research is warranted to validate these findings. In summary, our proposed statistical framework is a valuable tool for association testing of heteroplasmy with disease traits in large human populations.
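The VAF-threshold step mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the 3% and 97% cut-offs, the function names, and the count-based burden score are all assumptions made for the example, since the abstract does not state them.

```python
def is_heteroplasmic(vaf, low=0.03, high=0.97):
    """Flag a site as heteroplasmic when its variant allele fraction (VAF)
    lies within [low, high]. The default cut-offs are assumed for illustration."""
    return low <= vaf <= high


def burden_score(vafs, low=0.03, high=0.97):
    """Count the heteroplasmic sites one individual carries in a gene or region."""
    return sum(1 for v in vafs if is_heteroplasmic(v, low, high))


# Hypothetical VAFs for one individual at five sites in a single mtDNA gene.
example_vafs = [0.0, 0.05, 0.5, 0.99, 1.0]
example_score = burden_score(example_vafs)  # 2: only 0.05 and 0.5 pass the filter
```

A per-gene score of this kind is the sort of quantity a burden-type test would then relate to the outcome.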
Affiliation(s)
- Xianbang Sun
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Katia Bulekova
  - Research Computing Services, Boston University, Boston, MA 02215, USA
- Jian Yang
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Meng Lai
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Xue Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Yuankai Zhang
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Xiuqing Guo
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Qian Yong
  - Longitudinal Studies Section, Translational Gerontology Branch, NIA/NIH, Baltimore, MD 21224, USA
- Laura M. Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Jerome I. Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Stephen S. Rich
  - Department of Public Health Services, Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Goncalo Abecasis
  - TOPMed Informatics Research Center, University of Michigan, Ann Arbor, MI 48109, USA
- April P. Carson
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Ramachandran S. Vasan
  - Sections of Preventive Medicine and Epidemiology, and Cardiovascular Medicine, Boston University School of Medicine, Boston, MA 02118, USA
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
- Joshua C. Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA
- Bruce M. Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA
  - Departments of Epidemiology, and Health Services, University of Washington, Seattle, WA 98101, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Annette L. Fitzpatrick
  - Departments of Family Medicine, Epidemiology, and Global Health, University of Washington, Seattle, WA 98195, USA
- Claudia L. Satizabal
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
  - Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Dan E. Arking
  - McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jun Ding
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Daniel Levy
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Chunyu Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
2. Jiang Z, Chen C, Xu Z, Wang X, Zhang M, Zhang D. SIGNET: transcriptome-wide causal inference for gene regulatory networks. Sci Rep 2023; 13:19371. [PMID: 37938594; PMCID: PMC10632394; DOI: 10.1038/s41598-023-46295-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 10/30/2023] [Indexed: 11/09/2023]
Abstract
Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships, but it constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).
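The idea of using a genotypic variant as an instrumental variable can be illustrated with a single-instrument ratio estimator. This is a generic textbook sketch, not SIGNET's algorithm: the function names and the toy data (a genotype `z`, a confounder `u`, regulator expression `x`, target expression `y`, and a true effect of 0.5) are all hypothetical.

```python
def _centered_cross(u, v):
    """Sum of cross-products of u and v after centering each at its mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def ols_slope(x, y):
    """Ordinary regression slope of y on x; biased when x and y share a confounder."""
    return _centered_cross(x, y) / _centered_cross(x, x)


def iv_ratio(z, x, y):
    """Instrumental-variable (ratio) estimate of the effect of x on y, using z."""
    return _centered_cross(z, y) / _centered_cross(z, x)


# Toy data: z is a genotype (0/1/2 allele counts), u is a confounder that is
# uncorrelated with z by construction and affects both expression levels.
z = [0, 1, 2, 0, 1, 2]
u = [1, 1, 1, -1, -1, -1]
x = [zi + ui for zi, ui in zip(z, u)]            # regulator expression
y = [0.5 * xi + 2 * ui for xi, ui in zip(x, u)]  # target expression, true effect 0.5
```

On this toy data `iv_ratio(z, x, y)` recovers 0.5, whereas `ols_slope(x, y)` gives 1.7 because the confounder inflates the naive slope.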
Affiliation(s)
- Zhongli Jiang
  - Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Zhenyu Xu
  - Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
- Min Zhang
  - Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92617, USA
- Dabao Zhang
  - Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92617, USA
3. Rajabli F, Kunkle BW. Strategies in Aggregation Tests for Rare Variants. Curr Protoc 2023; 3:e931. [PMID: 37988228; DOI: 10.1002/cpz1.931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Genome-wide association studies (GWAS) have successfully identified numerous common variants involved in complex diseases, but these findings explain only limited heritability. Advances in high-throughput sequencing technology have made it possible to assess the contribution of rare variants to common diseases. However, the study of rare variants introduces challenges because of their low frequency: well-established common-variant methods are underpowered to identify rare variants in GWAS. To address this challenge, several new methods have been developed to examine the role of rare variants in complex diseases. These approaches are based on testing the aggregate effect of multiple rare variants in a predefined genetic region. Provided here is an overview of the statistical approaches, together with protocols that explain the step-by-step analysis of aggregation tests, with hands-on experience using R scripts, in four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. Also explained are the concepts of rare variants, permutation tests, kernel methods, and genetic variant annotation. At the end we discuss the related topics of bioinformatics tools for annotation, family-based design of rare-variant analysis, population stratification adjustment, and meta-analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
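The simplest of the four categories, a burden test assessed by permutation, can be sketched as below. The protocols themselves use R scripts that are not reproduced here; this Python sketch is only an illustration, and the function names, the 1% frequency cut-off, the mean-difference statistic, and the toy data are assumptions.

```python
import random


def burden_scores(genotypes, mafs, maf_cutoff=0.01):
    """Collapse rare variants: per sample, sum minor-allele counts over variants
    whose minor allele frequency is below maf_cutoff (an assumed threshold)."""
    rare = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]


def mean_difference(scores, labels):
    """Mean burden score in cases (label 1) minus mean in controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalue(scores, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(scores, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Toy data: 10 cases carrying two rare alleles each, 10 controls carrying none;
# the third variant is common and is therefore excluded from the burden score.
genotypes = [[1, 1, 2]] * 10 + [[0, 0, 2]] * 10
labels = [1] * 10 + [0] * 10
scores = burden_scores(genotypes, mafs=[0.005, 0.002, 0.30])
p_value = permutation_pvalue(scores, labels)
```

Permutation is used here because it needs no distributional assumption about the statistic, at the cost of extra computation.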
Affiliation(s)
- Farid Rajabli
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
- Brian W Kunkle
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
4. Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506; DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have recently been proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
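The practical difference between the burden and variance-component classes can be seen in a small sketch of their score-type statistics. This is a simplified, unweighted illustration under assumed names and toy data; published variance-component tests such as SKAT additionally use variant weights and a reference null distribution, both omitted here.

```python
def variant_scores(genotypes, phenotype):
    """Per-variant score: sum over samples of genotype * (phenotype - mean)."""
    mean_y = sum(phenotype) / len(phenotype)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * (y - mean_y) for row, y in zip(genotypes, phenotype))
        for j in range(n_variants)
    ]


def burden_statistic(genotypes, phenotype):
    """Sum the per-variant scores first, then square: opposite signs cancel."""
    return sum(variant_scores(genotypes, phenotype)) ** 2


def variance_component_statistic(genotypes, phenotype):
    """Square each per-variant score, then sum: direction of effect is ignored."""
    return sum(s * s for s in variant_scores(genotypes, phenotype))


# Toy region with one risk variant (carried by the two cases) and one
# protective variant (carried by the two controls).
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
phenotype = [1, 1, 0, 0]
burden = burden_statistic(genotypes, phenotype)                       # 0.0
variance_component = variance_component_statistic(genotypes, phenotype)  # 2.0
```

When effects point in opposite directions the collapsed statistic vanishes while the variance-component statistic does not, which is one reason no single class is best in every scenario.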
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
5. Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232; PMCID: PMC10522036; DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, no single test fits all needs, as each suffers from specific drawbacks. Selecting the right aggregation test when the underlying genetic model of the disease is unknown remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole exome/genome sequencing association studies; it can handle a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
6. Jiang Z, Chen C, Xu Z, Wang X, Zhang M, Zhang D. SIGNET: Transcriptome-wide Causal Inference for Gene Regulatory Networks. Research Square 2023:rs.3.rs-3180043. [PMID: 37546848; PMCID: PMC10402199; DOI: 10.21203/rs.3.rs-3180043/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships, but it constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).
Affiliation(s)
- Zhongli Jiang
  - Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
- Chen Chen
  - UCB Pharma, Brussels, 1070, Belgium
  - These authors contributed to this project as research assistants when they studied in the Department of Statistics, Purdue University
- Zhenyu Xu
  - Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
  - These authors contributed to this project as research assistants when they studied in the Department of Statistics, Purdue University
- Xiaojian Wang
  - ByteDance, Shanghai, 201107, China
  - These authors contributed to this project as research assistants when they studied in the Department of Statistics, Purdue University
- Min Zhang
  - Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
  - Department of Epidemiology and Biostatistics, University of California, Irvine, 92617, California, United States
- Dabao Zhang
  - Department of Statistics, Purdue University, West Lafayette, 47907, Indiana, United States
7. Wang J, Zhou F, Li C, Yin N, Liu H, Zhuang B, Huang Q, Wen Y. Gene Association Analysis of Quantitative Trait Based on Functional Linear Regression Model with Local Sparse Estimator. Genes (Basel) 2023; 14:genes14040834. [PMID: 37107592; PMCID: PMC10137544; DOI: 10.3390/genes14040834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 03/27/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023]
Abstract
Functional linear regression models have been widely used in the gene association analysis of complex traits. These models retain all the genetic information in the data and take full advantage of spatial information in genetic variation data, which leads to high detection power. However, not all of the significant association signals identified by these high-power methods are real causal SNPs, because noise is easily mistaken for a significant association signal, leading to false associations. In this paper, we develop the sparse functional data association test (SFDAT) for gene-region association analysis, based on a functional linear regression model with local sparse estimation. The evaluation indicators CSR and DL are defined to assess, together with other indicators, the feasibility and performance of the proposed method. Simulation studies show that: (1) SFDAT performs well under both linkage equilibrium and linkage disequilibrium simulation; (2) SFDAT performs successfully for gene regions (including common variants, low-frequency variants, rare variants and mixed variants); (3) with power and type I error rates comparable to OLS and Smooth, SFDAT has a better ability to handle the zero regions. The Oryza sativa data set is analyzed by SFDAT. It is shown that SFDAT can better perform gene association analysis and eliminate false positives in gene localization. This study shows that SFDAT can reduce the interference caused by noise while maintaining high power. SFDAT provides a new method for association analysis between gene regions and quantitative phenotypic traits.
Affiliation(s)
- Jingyu Wang
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Fujie Zhou
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Cheng Li
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ning Yin
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Huiming Liu
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Binxian Zhuang
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Qingyu Huang
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yongxian Wen
  - College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
8. A Whole-Genome Sequencing Study Implicates GRAMD1B in Multiple Sclerosis Susceptibility. Genes (Basel) 2022; 13:genes13122392. [PMID: 36553660; PMCID: PMC9777893; DOI: 10.3390/genes13122392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 12/13/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
While the role of common genetic variants in multiple sclerosis (MS) has been elucidated in large genome-wide association studies, the contribution of rare variants to the disease remains unclear. Herein, a whole-genome sequencing study in four affected and four healthy relatives of a consanguineous Italian family identified a novel missense c.1801T > C (p.S601P) variant in the GRAMD1B gene that is shared within MS cases and resides under a linkage peak (LOD: 2.194). Sequencing GRAMD1B in 91 familial MS cases revealed two additional rare missense and two splice-site variants, two of which (rs755488531 and rs769527838) were not found in 1000 Italian healthy controls. Functional studies demonstrated that GRAMD1B, a gene with unknown function in the central nervous system (CNS), is expressed by several cell types, including astrocytes, microglia and neurons as well as by peripheral monocytes and macrophages. Notably, GRAMD1B was downregulated in vessel-associated astrocytes of active MS lesions in autopsied brains and by inflammatory stimuli in peripheral monocytes, suggesting a possible role in the modulation of inflammatory response and disease pathophysiology.
9. Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986; PMCID: PMC9582646; DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022]
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods that use variant annotations or external controls to improve statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodological investigation.
Affiliation(s)
- Wenan Chen
  - Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- Brandon J. Coombes
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Nicholas B. Larson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
10. Aborageh M, Krawitz P, Fröhlich H. Genetics in Parkinson's disease: From better disease understanding to machine learning based precision medicine. Frontiers in Molecular Medicine 2022; 2:933383. [PMID: 39086979; PMCID: PMC11285583; DOI: 10.3389/fmmed.2022.933383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 08/30/2022] [Indexed: 08/02/2024]
Abstract
Parkinson's Disease (PD) is a neurodegenerative disorder with highly heterogeneous phenotypes. Accordingly, it has been challenging to robustly identify genetic factors associated with disease risk, prognosis and therapy response via genome-wide association studies (GWAS). In this review we first provide an overview of existing statistical methods to detect associations between genetic variants and the disease phenotypes in existing PD GWAS. Secondly, we discuss the potential of machine learning approaches to better quantify disease phenotypes and to move beyond disease understanding towards a better-personalized treatment of the disease.
Affiliation(s)
- Mohamed Aborageh
  - Bonn-Aachen International Center for Information Technology (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Peter Krawitz
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Holger Fröhlich
  - Bonn-Aachen International Center for Information Technology (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
11. Wang T, Ionita-Laza I, Wei Y. Integrated Quantile RAnk Test (iQRAT) for gene-level associations. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1548] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Tianying Wang
  - Center for Statistical Science & Department of Industrial Engineering, Tsinghua University
- Ying Wei
  - Department of Biostatistics, Columbia University
12. Lee JY, Shen PS, Cheng KF. A robust association test with multiple genetic variants and covariates. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0029. [DOI: 10.1515/sagmb-2021-0029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2021] [Accepted: 05/20/2022] [Indexed: 11/15/2022]
Abstract
Due to the advancement of genome sequencing techniques, great strides have been made in exome sequencing, such that association studies between disease and genetic variants have become feasible. Some powerful and well-known association tests have been proposed to test the association between a group of genes and the disease of interest. However, some challenges remain; in particular, many factors can affect testing power, e.g., the sample size, the number of causal and non-causal variants, and the direction of the effects of causal variants. Recently, a powerful test, called T_REM, was derived based on a random effects model. T_REM has the advantage of being less sensitive to the inclusion of non-causal rare variants or low-effect common variants, or to the presence of missing genotypes. However, the testing power of T_REM can be low when a portion of the causal variants has effects in opposite directions. To address this drawback of T_REM, we propose a novel test, called T_ROB, which keeps the advantages of T_REM and is more robust than T_REM in terms of having adequate power when variants have opposite directions of effect. Simulation results show that T_ROB has a stable type I error rate and outperforms T_REM when the proportion of risk variants decreases to a certain level, and its advantage over T_REM increases as the proportion decreases. Furthermore, T_ROB outperforms several other competing tests in most scenarios. The proposed methodology is illustrated using the Shanghai Breast Cancer Study.
Affiliation(s)
- Jen-Yu Lee
  - Department of Statistics, Feng Chia University, Taichung, Taiwan, ROC
- Pao-Sheng Shen
  - Department of Statistics, Tunghai University, Taichung, Taiwan, ROC
- Kuang-Fu Cheng
  - Biostatistics Center, Taipei Medical University, Taipei, Taiwan, ROC
  - Department of Business Administration, Asia University, Taichung, Taiwan, ROC
13. Li MK, Yuan YX, Zhu B, Wang KW, Fung WK, Zhou JY. Gene-Based Methods for Estimating the Degree of the Skewness of X Chromosome Inactivation. Genes (Basel) 2022; 13:genes13050827. [PMID: 35627212; PMCID: PMC9140558; DOI: 10.3390/genes13050827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 05/01/2022] [Accepted: 05/02/2022] [Indexed: 11/16/2022]
Abstract
Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases, and several methods have been proposed to estimate the degree of XCI-S (denoted as γ) for a single locus. However, no method has been available to estimate γ for genes. Therefore, in this paper, we first propose the point estimate and the penalized point estimate of γ for genes, and then derive their confidence intervals based on Fieller’s and the penalized Fieller’s methods, respectively. Further, we consider the constraint γ∈[0, 2] and propose Bayesian methods to obtain point estimates and credible intervals of γ, using a truncated normal prior and a uniform prior, respectively (denoted as GBN and GBU). The simulation results show that the Bayesian methods avoid extreme point estimates (0 or 2), empty sets, noninformative intervals ([0, 2]) and discontinuous intervals. GBN performs best in both point estimation and interval estimation. Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data to demonstrate their practical use. In summary, for practical applications, we recommend using GBN to estimate γ of genes.
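Fieller's method, on which the confidence intervals above are based, gives a confidence set for a ratio of two estimated quantities by solving a quadratic inequality. The sketch below shows the generic ratio form only; it is not the paper's gene-based estimator of γ, and the function name, the normal quantile 1.96, and the input values are assumptions for illustration.

```python
import math


def fieller_set(a, b, var_a, var_b, cov_ab=0.0, z=1.96):
    """Confidence set for the ratio a/b from Fieller's theorem.

    Solves (a - t*b)^2 <= z^2 * (var_a - 2*t*cov_ab + t^2*var_b) for t,
    i.e. A*t^2 - 2*B*t + C <= 0 with the coefficients below.
    Returns (kind, lo, hi).
    """
    A = b * b - z * z * var_b
    B = a * b - z * z * cov_ab
    C = a * a - z * z * var_a
    disc = B * B - A * C
    if A == 0:
        raise ValueError("degenerate case: the quadratic reduces to a line")
    if disc < 0:
        # No real roots: the inequality holds nowhere (A > 0) or everywhere (A < 0).
        if A > 0:
            return ("empty", math.nan, math.nan)
        return ("whole_line", -math.inf, math.inf)
    root = math.sqrt(disc)
    lo, hi = sorted(((B - root) / A, (B + root) / A))
    if A > 0:
        return ("bounded", lo, hi)   # the interval [lo, hi]
    return ("two_rays", lo, hi)      # (-inf, lo] together with [hi, inf)


# Denominator well separated from zero: an ordinary bounded interval.
well_determined = fieller_set(1.0, 1.0, 0.01, 0.01, z=2.0)
# Denominator close to zero relative to its variance: a discontinuous set.
poorly_determined = fieller_set(1.0, 0.1, 0.01, 0.01, z=2.0)
```

The second call shows how a denominator that is not clearly different from zero yields a set made of two rays rather than one interval, which is one way a discontinuous interval of the kind mentioned in the abstract can arise.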
Affiliation(s)
- Meng-Kai Li
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China; (M.-K.L.); (Y.-X.Y.); (B.Z.); (K.-W.W.)
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
| | - Yu-Xin Yuan
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China; (M.-K.L.); (Y.-X.Y.); (B.Z.); (K.-W.W.)
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
| | - Bin Zhu
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China; (M.-K.L.); (Y.-X.Y.); (B.Z.); (K.-W.W.)
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
| | - Kai-Wen Wang
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China; (M.-K.L.); (Y.-X.Y.); (B.Z.); (K.-W.W.)
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
| | - Wing Kam Fung
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China;
| | - Ji-Yuan Zhou
- Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China; (M.-K.L.); (Y.-X.Y.); (B.Z.); (K.-W.W.)
- Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
- Correspondence:
|
14
|
Xiao L, Wu D, Sun Y, Hu D, Dai J, Chen Y, Wang D. Whole-exome sequencing reveals genetic risks of early-onset sporadic dilated cardiomyopathy in the Chinese Han population. SCIENCE CHINA. LIFE SCIENCES 2022; 65:770-780. [PMID: 34302607 DOI: 10.1007/s11427-020-1951-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/06/2021] [Accepted: 06/01/2021] [Indexed: 06/13/2023]
Abstract
To reveal genetic risks of early-onset sporadic dilated cardiomyopathy (DCM) patients in the Chinese Han population, we enlisted 363 DCM cases and 414 healthy controls. Whole-exome sequencing and phenotypic characterization were conducted. In total, we identified 26 loss-of-function (LOF) candidates and 66 pathogenic variants from 33 genes, most of which were novel. The deleterious variants can account for 25.07% (91/363) of all patients. Furthermore, rare missense variants in 21 genes were found to be significantly associated with DCM in burden tests. Other than rare variants, twelve common SNPs were significantly associated with an increased risk of DCM in allele-based genetic model association analysis. Of note, in the cumulative risk model, high-risk subjects had a 3.113-fold higher risk of developing DCM than low-risk subjects. Also, DCM in the high-risk group had a younger age of onset than that in the low-risk group. In terms of cardiac function, the mean left ventricular ejection fraction of patients with the deleterious variants was lower than those without (27.73%±10.02% vs. 30.61%±10.85%, P=0.026). To conclude, we mapped a comprehensive atlas of genetic risks in Chinese patients with DCM that might lead to new insights into the mechanisms and risk stratification for DCM.
Affiliation(s)
- Lei Xiao
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Dongyang Wu
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Yang Sun
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Dong Hu
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Jiaqi Dai
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Yanghui Chen
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Daowen Wang
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China.
- Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiologic Disorders, Huazhong University of Science and Technology, Wuhan, 430030, China.
|
15
|
Li S, Li S, Su S, Zhang H, Shen J, Wen Y. Gene Region Association Analysis of Longitudinal Quantitative Traits Based on a Function-On-Function Regression Model. Front Genet 2022; 13:781740. [PMID: 35265102 PMCID: PMC8899465 DOI: 10.3389/fgene.2022.781740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Accepted: 01/04/2022] [Indexed: 11/13/2022] Open
Abstract
During growth and development, the expression of genes that control quantitative traits turns on or off over time. Studies of longitudinal traits are of great significance in revealing the genetic mechanisms of biological development. With the development of ultra-high-density sequencing technology, association analysis poses tremendous challenges for statistical methods. In this paper, a longitudinal functional data association test (LFDAT) method is proposed based on the function-on-function regression model. LFDAT can simultaneously treat phenotypic traits and marker information as continuum variables and analyze the association between longitudinal quantitative traits and gene regions. Simulation studies showed that: 1) LFDAT performs well in both linkage equilibrium and linkage disequilibrium simulations, 2) LFDAT performs better for gene regions (including common variants, low-frequency variants, rare variants, and mixtures), and 3) LFDAT can accurately identify gene switching during the growth and development stage. Longitudinal data on the projected shoot area of Oryza sativa were analyzed with LFDAT, which showed the advantage of fast computation. Further, an association analysis was conducted between longitudinal traits and gene regions by integrating the small effects of multiple related variants and using the information of the entire gene region. LFDAT provides a feasible method for studying the formation and expression of longitudinal traits.
Affiliation(s)
- Shijing Li
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China; Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Shiqin Li
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Shaoqiang Su
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hui Zhang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China; Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Jiayu Shen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China; Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Yongxian Wen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China; Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
|
16
|
Simulation Research on the Methods of Multi-Gene Region Association Analysis Based on a Functional Linear Model. Genes (Basel) 2022; 13:genes13030455. [PMID: 35328009 PMCID: PMC8954869 DOI: 10.3390/genes13030455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 02/26/2022] [Accepted: 02/27/2022] [Indexed: 11/16/2022] Open
Abstract
Genome-wide association analysis is an important approach to identify genetic variants associated with complex traits. Complex traits are affected not only by single gene loci, but also by the interaction of multiple gene loci. Studies of association between gene regions and quantitative traits are of great significance in revealing the genetic mechanisms of biological development. There have been many studies on single-gene region association analysis, but functional linear models have rarely been applied to multi-gene region association analysis. In this paper, a functional multi-gene region association analysis test method is proposed based on the functional linear model. The test method is studied by computer simulation from three directions: the common multi-gene region method, the multi-gene region weighted method, and the multi-gene region loci weighted method. The following conclusions are obtained through computer simulation: (a) the functional multi-gene region association analysis test method has higher power than the functional single-gene region association analysis test method; (b) the functional multi-gene region weighted method performs better than the common functional multi-gene region method; (c) the functional multi-gene region loci weighted method is the best of the three directions for association analysis; (d) the performance of the Step method and the multi-gene region loci weighted Step method for multi-gene regions is the best in general. The functional multi-gene region association analysis test method can, in theory, provide a feasible approach for the study of complex traits affected by multiple genes.
|
17
|
Liu J, Deng Y, Yu B, Mo B, Luo L, Yang J, Zhang X, Wang Z, Wang Y, Zhu J, Yang H, Fang S, Cheng Z, Li J, Shu Y, Luo G, Xiong W, Wei J, Li Z. Targeted resequencing showing novel common and rare genetic variants increases the risk of asthma in the Chinese Han population. J Clin Lab Anal 2021; 35:e23813. [PMID: 33969541 DOI: 10.1002/jcla.23813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/07/2021] [Revised: 04/16/2021] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND Although studies have identified hundreds of genetic variants associated with asthma risk, a large fraction of heritability remains unexplained, especially in Chinese individuals. METHODS To identify genetic risk factors for asthma in a Han Chinese population, 211 asthma-related genes were first selected based on database searches. The genes were then sequenced for subjects in a Discovery Cohort (284 asthma patients and 205 older healthy controls) using targeted next-generation sequencing. Bioinformatics analysis and statistical association analyses were performed to reveal the associations between rare/common variants and asthma, respectively. The identified common risk variants underwent a validation analysis using a Replication Cohort (664 patients and 650 controls). RESULTS First, we identified 18 potentially functional rare loss-of-function (LOF) variants in 21/284 (7.4%) of the asthma cases. Second, using burden tests, we found that the asthma group had nominally significant (p < 0.05) burdens of rare nonsynonymous variants in 10 genes. Third, 23 common single-nucleotide polymorphisms were associated with the risk of asthma, 7/23 (30.4%) and 9/23 (39.1%) of which were modestly significant (p < 9.1 × 10⁻⁴) in the Replication Cohort and Combined Cohort, respectively. According to our cumulative risk model involving the modestly associated alleles, middle- and high-risk subjects had a 2.0-fold (95% CI: 1.621-2.423, p = 2.624 × 10⁻¹¹) and 6.0-fold (95% CI: 3.623-10.156, p = 7.086 × 10⁻¹²) increased risk of asthma, respectively, compared with low-risk subjects. CONCLUSION This study revealed novel rare and common genetic risk factors for asthma, and provided a cumulative risk model for asthma risk prediction and stratification in Han Chinese individuals.
Affiliation(s)
- Juan Liu
- Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Wuhan Clinical Medical Research Center for Chronic Airway Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
| | - Yanhan Deng
- Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Wuhan Clinical Medical Research Center for Chronic Airway Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
| | - Bo Yu
- Division of Cardiology, Departments of Internal Medicine and Genetic Diagnosis Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Biwen Mo
- Department of Respiratory Medicine, Affiliated Hospital of Guilin Medical University, Guilin, China
| | - Liman Luo
- Department of Pediatrics, The 306 Hospital of People's Liberation Army, Beijing, China
| | - Jingping Yang
- Department of Respiratory and Critical Care Medicine, The Third Affiliated Hospital of Inner Mongolia Medical University, Baotou, China
| | - Xiaoju Zhang
- Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou, China
| | - Zheng Wang
- Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou, China
| | - Yingnan Wang
- Department of Respiratory and Critical Care Medicine, Renmin Hospital of Three Gorges University, Yichang, China
| | - Jing Zhu
- Department of Respiratory and Critical Care Medicine, Renmin Hospital of Three Gorges University, Yichang, China
| | - Hua Yang
- Department of Respiratory Medicine, University Hospital of Hubei University for Nationalities, Enshi, China
| | - Shirong Fang
- Department of Respiratory Medicine, University Hospital of Hubei University for Nationalities, Enshi, China
| | - Zhenshun Cheng
- Department of Respiratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, China
| | - Jingping Li
- Department of Respiratory Medicine, Qianjiang Central Hospital, Qianjiang, China
| | - Ying Shu
- Department of Respiratory Medicine, Qianjiang Central Hospital, Qianjiang, China
| | - Guangwei Luo
- Department of Respiratory Medicine, Wuhan No. 1 Hospital, Wuhan, China
| | - Weining Xiong
- Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Wuhan Clinical Medical Research Center for Chronic Airway Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China; Department of Respiratory Medicine, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Jianghong Wei
- Department of Respiratory Medicine, Affiliated Hospital of Guilin Medical University, Guilin, China
| | - Zongzhe Li
- Division of Cardiology, Departments of Internal Medicine and Genetic Diagnosis Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
18
|
Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for rare-variant association testing. Genet Epidemiol 2021; 45:413-424. [PMID: 33565109 DOI: 10.1002/gepi.22379] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2020] [Revised: 01/08/2021] [Accepted: 01/25/2021] [Indexed: 12/12/2022]
Abstract
Although genome-wide association studies have been widely used to identify associations between complex diseases and genetic variants, standard single-variant analyses often have limited power when applied to rare variants. To overcome this problem, set-based methods have been developed with the aim of boosting power by borrowing strength from multiple rare variants. We propose the adaptive hierarchically structured variable selection (HSVS-A) prior to test for association of rare variants in a set with continuous or dichotomous phenotypes and to estimate the effect of individual rare variants simultaneously. HSVS-A has the flexibility to integrate a pairwise weighting scheme, which adaptively induces desirable correlations among variants of similar significance such that we can borrow information from potentially causal and noncausal rare variants to boost power. Simulation studies show that for both continuous and dichotomous phenotypes, HSVS-A is powerful when there are multiple causal rare variants, with effects in either the same or opposite directions, in the presence of a large number of noncausal variants. We also apply HSVS-A to the Wellcome Trust Case Control Consortium Crohn's disease data to test the association of Crohn's disease with rare variants in pathways. HSVS-A identifies two pathways harboring novel protective rare variants for Crohn's disease.
Affiliation(s)
- Yi Yang
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA; Department of Biostatistics, Columbia University, New York, New York, USA
| | - Saonli Basu
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Lin Zhang
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
|
19
|
Dong G, Wendl MC, Zhang B, Ding L, Huang KL. AeQTL: eQTL analysis using region-based aggregation of rare genomic variants. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2021; 26:172-183. [PMID: 33691015 PMCID: PMC8050802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Concurrently available genomic and transcriptomic data from large cohorts provide opportunities to discover expression quantitative trait loci (eQTLs), that is, genetic variants associated with gene expression changes. However, the statistical power to detect rare variant eQTLs is often limited, and most existing eQTL tools are not compatible with sequence variant file formats. We have developed AeQTL (Aggregated eQTL), a software tool that performs eQTL analysis on variants aggregated according to user-specified regions and is designed to accommodate standard genomic files. AeQTL consistently yielded similar or higher power for identifying rare variant eQTLs than single-variant tests. Using AeQTL, we discovered that aggregated rare germline truncations in cis exomic regions are significantly associated with the expression of BRCA1 and SLC25A39 in breast tumors. In a somatic mutation pan-cancer analysis, aggregated mutations predicted to be missense versus truncations were differentially associated with the expression of cancer driver genes, and somatic truncation eQTLs were further identified as a new multi-omic classifier of oncogenes versus tumor-suppressor genes. AeQTL is easy to use and customize, allowing broad application in discovering rare variants, including coding and noncoding variants, associated with gene expression. AeQTL is implemented in Python and the source code is freely available at https://github.com/Huang-lab/AeQTL under the MIT license.
Affiliation(s)
- Guanlan Dong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Michael C. Wendl
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Li Ding
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Kuan-lin Huang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA (corresponding author)
|
20
|
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020. [DOI: 10.1007/s12041-019-1166-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
21
|
Genetic-Based Hypertension Subtype Identification Using Informative SNPs. Genes (Basel) 2020; 11:genes11111265. [PMID: 33121163 PMCID: PMC7693873 DOI: 10.3390/genes11111265] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022] Open
Abstract
In this work, we proposed a process to select informative genetic variants for identifying clinically meaningful subtypes of hypertensive patients. We studied 575 African American (AA) and 612 Caucasian hypertensive participants enrolled in the Hypertension Genetic Epidemiology Network (HyperGEN) study and analyzed each race-based group separately. All study participants underwent GWAS (Genome-Wide Association Studies) genotyping and echocardiography. We applied a variety of statistical methods and filtering criteria, including generalized linear models, F statistics, burden tests, deleterious variant filtering, and others, to select the most informative hypertension-related genetic variants. We applied an unsupervised learning algorithm, non-negative matrix factorization (NMF), to identify hypertension subtypes with similar genetic characteristics. Kruskal–Wallis tests were used to demonstrate the clinical meaningfulness of genetic-based hypertension subtypes. Two subgroups were identified for both African American and Caucasian HyperGEN participants. In both AAs and Caucasians, indices of cardiac mechanics differed significantly by hypertension subtype. African Americans tend to have more genetic variants than Caucasians; therefore, using genetic information to distinguish disease subtypes in this group is relatively challenging, but we were able to identify two subtypes whose cardiac mechanics have statistically different distributions using the proposed process. The research gives a promising direction for using statistical methods to select genetic information and identify subgroups of diseases, which may inform the development and trial of novel targeted therapies.
|
22
|
Gao C, Sha Q, Zhang S, Zhang K. MF-TOWmuT: Testing an optimally weighted combination of common and rare variants with multiple traits using family data. Genet Epidemiol 2020; 45:64-81. [PMID: 33047835 DOI: 10.1002/gepi.22355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2020] [Revised: 08/03/2020] [Accepted: 08/18/2020] [Indexed: 11/11/2022]
Abstract
With rapid advancements in sequencing technologies and the accumulation of electronic health records, a large number of genetic variants and multiple correlated human complex traits have become available in many genetic association studies. Thus, it becomes necessary and important to develop new methods that can jointly analyze the association between multiple genetic variants and multiple traits. Compared with methods that only use a single marker or trait, the joint analysis of multiple genetic variants and multiple traits is more powerful since such an analysis can fully incorporate the correlation structure of genetic variants and/or traits and their mutual dependence patterns. However, most existing methods that simultaneously analyze multiple genetic variants and multiple traits are only applicable to unrelated samples. We develop a new method called MF-TOWmuT to detect association of multiple phenotypes and multiple genetic variants in a genomic region with family samples. MF-TOWmuT is based on an optimally weighted combination of variants. Our method can be applied to both rare and common variants and both qualitative and quantitative traits. Our simulation results show that (1) the type I error of MF-TOWmuT is preserved; (2) MF-TOWmuT outperforms two existing methods, the Multiple Family-based Quasi-Likelihood Score Test and the Multivariate Family-based Rare Variant Association Test, in terms of power. We also illustrate the usefulness of MF-TOWmuT by analyzing genotypic and phenotypic data from the Genetics of Kidneys in Diabetes study. The R program is available at https://github.com/gaochengPRC/MF-TOWmuT.
Affiliation(s)
- Cheng Gao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
|
23
|
Shafquat A, Crystal RG, Mezey JG. Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes. BMC Bioinformatics 2020; 21:178. [PMID: 32381021 PMCID: PMC7204256 DOI: 10.1186/s12859-020-3387-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/06/2019] [Accepted: 01/24/2020] [Indexed: 12/22/2022] Open
Abstract
Background: Heterogeneity in the definition and measurement of complex diseases in Genome-Wide Association Studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact discovery of disease loci. While well appreciated, almost all analyses of GWAS data consider reported disease phenotype values as is without accounting for potential misclassification. Results: Here, we introduce Phenotype Latent variable Extraction of disease misdiagnosis (PheLEx), a GWAS analysis framework that learns and corrects misclassified phenotypes using structured genotype associations within a dataset. PheLEx consists of a hierarchical Bayesian latent variable model, where inference of differential misclassification is accomplished using filtered genotypes while implementing a full mixed model to account for population structure and genetic relatedness in study populations. Through simulations, we show that the PheLEx framework dramatically improves recovery of the correct disease state when considering realistic allele effect sizes compared to existing methodologies designed for Bayesian recovery of disease phenotypes. We also demonstrate the potential of PheLEx for extracting new potential loci from existing GWAS data by analyzing bipolar disorder and epilepsy phenotypes available from the UK Biobank. From the PheLEx analysis of these data, we identified new candidate disease loci not previously reported for these datasets that have value for supplemental hypothesis generation. Conclusion: PheLEx shows promise in reanalyzing GWAS datasets to provide supplemental candidate loci that are ignored by traditional GWAS analysis methodologies.
Affiliation(s)
- Afrah Shafquat
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
| | - Ronald G Crystal
- Department of Genetic Medicine, Weill Cornell Medicine, New York, NY, USA; Department of Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Jason G Mezey
- Department of Computational Biology, Cornell University, Ithaca, NY, USA; Department of Genetic Medicine, Weill Cornell Medicine, New York, NY, USA
|
24
|
Cai X, Chang LB, Potter J, Song C. Adaptive Fisher method detects dense and sparse signals in association analysis of SNV sets. BMC Med Genomics 2020; 13:46. [PMID: 32241265 PMCID: PMC7118831 DOI: 10.1186/s12920-020-0684-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND With the development of next generation sequencing (NGS) technology and genotype imputation methods, statistical methods have been proposed to test a set of genomic variants together to detect whether any of them is associated with the phenotype or disease. In practice, an unknown proportion of the variants within the set is truly causal or associated with the disease. There is a demand for statistical methods with high power in both dense and sparse scenarios, where the proportion of causal or associated variants is large or small, respectively. RESULTS We propose a new association test, weighted Adaptive Fisher (wAF), that can adapt to both dense and sparse scenarios by adding weights to the Adaptive Fisher (AF) method we developed before. Using simulation, we show that wAF has power comparable to or better than popular methods such as the sequence kernel association tests (SKAT and SKAT-O) and the adaptive SPU (aSPU) test. We apply wAF to a publicly available schizophrenia dataset and successfully detect thirteen genes. Among them, three genes are supported by existing literature; six are plausible as they either relate to other neurological diseases or have relevant biological functions. CONCLUSIONS The proposed wAF method is a powerful disease-variant association test in both dense and sparse scenarios. Both simulation studies and real data analysis indicate the potential of wAF for new biological findings.
Affiliation(s)
- Xiaoyu Cai
- Department of Statistics, The Ohio State University, 1948 Neil Ave., Columbus, OH 43210, US
- Lo-Bin Chang
- Department of Statistics, The Ohio State University, 1948 Neil Ave., Columbus, OH 43210, US
- Jordan Potter
- Department of Mathematics and Statistics, Kenyon College, 201 N College Rd., Gambier, Ohio 43022, US
- Chi Song
- College of Public Health, Division of Biostatistics, The Ohio State University, 1841 Neil Ave., 208E Cunz Hall, Columbus, OH 43210, US
25
Shinohara RT, Shou H, Carone M, Schultz R, Tunc B, Parker D, Martin ML, Verma R. Distance-based analysis of variance for brain connectivity. Biometrics 2020; 76:257-269. [PMID: 31350904 PMCID: PMC7653688 DOI: 10.1111/biom.13123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/14/2018] [Accepted: 07/12/2019] [Indexed: 01/07/2023]
Abstract
The field of neuroimaging dedicated to mapping connections in the brain is increasingly being recognized as key for understanding neurodevelopment and pathology. Networks of these connections are quantitatively represented using complex structures, including matrices, functions, and graphs, which require specialized statistical techniques for estimation and inference about developmental and disorder-related changes. Unfortunately, classical statistical testing procedures are not well suited to high-dimensional testing problems. In the context of global or regional tests for differences in neuroimaging data, traditional analysis of variance (ANOVA) is not directly applicable without first summarizing the data into univariate or low-dimensional features, a process that might mask the salient features of high-dimensional distributions. In this work, we consider a general framework for two-sample testing of complex structures by studying generalized within-group and between-group variances based on distances between complex and potentially high-dimensional observations. We derive an asymptotic approximation to the null distribution of the ANOVA test statistic, and conduct simulation studies with scalar and graph outcomes to study finite sample properties of the test. Finally, we apply our test to our motivating study of structural connectivity in autism spectrum disorder.
Affiliation(s)
- Russell T. Shinohara
- Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Haochang Shou
- Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Marco Carone
- Department of Biostatistics, University of Washington, Seattle, Washington
- Robert Schultz
- Center for Autism Research, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania
- Birkan Tunc
- Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Drew Parker
- Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Melissa Lynne Martin
- Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Ragini Verma
- Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
26
Zhang M, Gelfman S, McCarthy J, Harms MB, Moreno CAM, Goldstein DB, Allen AS. Incorporating external information to improve sparse signal detection in rare-variant gene-set-based analyses. Genet Epidemiol 2020; 44:330-338. [PMID: 32043633 DOI: 10.1002/gepi.22283] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/28/2019] [Revised: 12/17/2019] [Accepted: 01/27/2020] [Indexed: 01/30/2023]
Abstract
Gene-set analyses are used to assess whether there is any evidence of association with disease among a set of biologically related genes. Such an analysis typically treats all genes within the sets similarly, even though there is substantial, external, information concerning the likely importance of each gene within each set. For example, for traits that are under purifying selection, we would expect genes showing extensive genic constraint to be more likely to be trait associated than unconstrained genes. Here we improve gene-set analyses by incorporating such external information into a higher-criticism-based signal detection analysis. We show that when this external information is predictive of whether a gene is associated with disease, our approach can lead to a significant increase in power. Further, our approach is particularly powerful when the signal is sparse, that is when only a small number of genes within the set are associated with the trait. We illustrate our approach with a gene-set analysis of amyotrophic lateral sclerosis (ALS) and implicate a number of gene-sets containing SOD1 and NEK1 as well as showing enrichment of small p values for gene-sets containing known ALS genes. We implement our approach in the R package wHC.
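For orientation, the unweighted higher criticism statistic that such analyses build on can be sketched as follows. This is a minimal illustration of the classical statistic only, not the weighted procedure of the abstract or the wHC package; the function name `higher_criticism` is hypothetical.

```python
import math


def higher_criticism(p_values):
    """Classical (unweighted) higher criticism statistic.

    Sorts the p-values and returns the largest standardized gap between the
    empirical fraction i/n and the i-th smallest p-value. Hypothetical helper
    for illustration; it applies no external weights.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    ordered = sorted(p_values)
    best = float("-inf")
    for i, p in enumerate(ordered, start=1):
        # Skip p-values of exactly 0 or 1, where the denominator vanishes.
        if 0.0 < p < 1.0:
            z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, z)
    return best


if __name__ == "__main__":
    # One very small p-value among otherwise unremarkable ones.
    print(higher_criticism([0.01, 0.5, 0.9]))
```

A large value indicates that a few p-values are smaller than expected under the global null, which is the sparse-signal setting the abstract emphasizes.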
Affiliation(s)
- Mengqi Zhang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina; Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina
- Sahar Gelfman
- Institute of Genomic Medicine, Columbia University, New York City, New York
- Janice McCarthy
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Matthew B Harms
- Institute of Genomic Medicine, Columbia University, New York City, New York; Department of Neurology, Columbia University, New York City, New York; Center for Motor Neuron Biology and Disease, Columbia University, New York City, New York
- Cristiane A M Moreno
- Institute of Genomic Medicine, Columbia University, New York City, New York; Center for Motor Neuron Biology and Disease, Columbia University, New York City, New York
- David B Goldstein
- Institute of Genomic Medicine, Columbia University, New York City, New York
- Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina; Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina
27
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020; 99:9. [PMID: 32089528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The sum of squared score (SSU) and sequence kernel association test (SKAT) are two good alternative tests for genetic association studies in case-control data. Both SSU and SKAT are derived by assuming a dose-response model between the risk of disease and genotypes. However, in practice, the real genetic mode of inheritance is impossible to know. Thus, these two tests might lose power substantially, as shown in simulation results, when the genetic model is misspecified. Here, to make both tests suitable in broad situations, we propose two-phase SSU (tpSSU) and two-phase SKAT (tpSKAT), where the Hardy-Weinberg equilibrium test is adopted to choose the genetic model in the first phase and the SSU and SKAT are constructed corresponding to the selected genetic model in the second phase. We found that both tpSSU and tpSKAT outperformed the original SSU and SKAT in most of our simulation scenarios. By applying tpSSU and tpSKAT to the study of type 2 diabetes data, we successfully identified some genes that have direct effects on obesity. Besides, we also detected the significant chromosomal region 10q21.22 in the GAW16 rheumatoid arthritis dataset, with P < 10^-6. These findings suggest that tpSSU and tpSKAT can be effective in identifying genetic variants for complex diseases in case-control association studies.
Affiliation(s)
- Yuan Xue
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China.
28
An B, Gao X, Chang T, Xia J, Wang X, Miao J, Xu L, Zhang L, Chen Y, Li J, Xu S, Gao H. Genome-wide association studies using binned genotypes. Heredity (Edinb) 2019; 124:288-298. [PMID: 31641238 DOI: 10.1038/s41437-019-0279-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/26/2019] [Revised: 09/25/2019] [Accepted: 09/26/2019] [Indexed: 01/23/2023]
Abstract
Linear mixed models (LMM) that test trait association one marker at a time have been the most popular methods for genome-wide association studies. However, this approach has potential pitfalls: over-conservativeness after Bonferroni correction, ignorance of linkage disequilibrium (LD) between neighboring markers, and power reduction due to overfitting SNP effects. So, multiple locus models that can simultaneously estimate and test all markers in the genome are more appropriate. Based on the multiple locus models, we proposed a bin model that combines markers into bins based on their LD relationships. A bin is treated as a new synthetic marker and we detect the associations between bins and traits. Since the number of bins can be substantially smaller than the number of markers, a penalized multiple regression method can be adopted by fitting all bins to a single model. We developed an innovative method to bin the neighboring markers and used the least absolute shrinkage and selection operator (LASSO) method. We compared BIN-Lasso with SNP-Lasso and Q + K-LMM in a simulation experiment, and showed that the new method is more powerful with less Type I error than the other two methods. We also applied the bin model to a Chinese Simmental beef cattle population for a bone weight association study. The new method identified more significant associations than the classical LMM. The bin model is a new dimension reduction technique that takes advantage of biological information (i.e., LD). The new method will be a significant breakthrough in associative genomics in the big data era.
Affiliation(s)
- Bingxing An
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Tianpeng Chang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jiangwei Xia
- Institute of Basic Medical Science, Westlake Institute for Advanced Study, Hangzhou, China
- Xiaoqiao Wang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Miao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
- Huijiang Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
29
Wei Y, Liu Y, Sun T, Chen W, Ding Y. Gene-based association analysis for bivariate time-to-event data through functional regression with copula models. Biometrics 2019; 76:619-629. [PMID: 31625595 DOI: 10.1111/biom.13165] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/01/2019] [Accepted: 10/08/2019] [Indexed: 11/28/2022]
Abstract
Several gene-based association tests for time-to-event traits have been proposed recently to detect whether a gene region (containing multiple variants), as a set, is associated with the survival outcome. However, for bivariate survival outcomes, to the best of our knowledge, there is no statistical method that can be directly applied for gene-based association analysis. Motivated by a genetic study to discover the gene regions associated with the progression of a bilateral eye disease, age-related macular degeneration (AMD), we implement a novel functional regression (FR) method under the copula framework. Specifically, the effects of variants within a gene region are modeled through a functional linear model, which then contributes to the marginal survival functions within the copula. Generalized score test statistics are derived to test for the association between bivariate survival traits and the genetic region. Extensive simulation studies are conducted to evaluate the type I error control and power performance of the proposed approach, with comparisons to several existing methods for a single survival trait, as well as the marginal Cox FR model using the robust sandwich estimator for bivariate survival traits. Finally, we apply our method to a large AMD study, the Age-related Eye Disease Study, to identify the gene regions that are associated with AMD progression.
Affiliation(s)
- Yue Wei
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Yi Liu
- Department of Biostatistics and Data Sciences, Boehringer Ingelheim, Ridgefield, Connecticut
- Tao Sun
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Wei Chen
- Department of Pediatrics, Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania
- Ying Ding
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
30
Povysil G, Petrovski S, Hostyk J, Aggarwal V, Allen AS, Goldstein DB. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nat Rev Genet 2019; 20:747-759. [PMID: 31605095 DOI: 10.1038/s41576-019-0177-4] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Accepted: 09/06/2019] [Indexed: 12/11/2022]
Abstract
The first phase of genome-wide association studies (GWAS) assessed the role of common variation in human disease. Advances optimizing and economizing high-throughput sequencing have enabled a second phase of association studies that assess the contribution of rare variation to complex disease in all protein-coding genes. Unlike the early microarray-based studies, sequencing-based studies catalogue the full range of genetic variation, including the evolutionarily youngest forms. Although the experience with common variants helped establish relevant standards for genome-wide studies, the analysis of rare variation introduces several challenges that require novel analysis approaches.
Affiliation(s)
- Gundula Povysil
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK; Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Joseph Hostyk
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Vimla Aggarwal
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- David B Goldstein
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA.
31
Zhang J, Wu B, Sha Q, Zhang S, Wang X. A general statistic to test an optimally weighted combination of common and/or rare variants. Genet Epidemiol 2019; 43:966-979. [PMID: 31498476 DOI: 10.1002/gepi.22255] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/30/2019] [Revised: 06/17/2019] [Accepted: 07/30/2019] [Indexed: 11/10/2022]
Abstract
Both genome-wide association study and next-generation sequencing data analyses are widely employed to identify disease-susceptibility common and/or rare genetic variants. Rare variants generally have large effects though they are hard to detect due to their low frequencies. Currently, many existing statistical methods for rare variant association studies employ a weighted combination scheme, which usually puts subjective weights or suboptimal weights based on some ad hoc assumptions (e.g., ignoring dependence between rare variants). In this study, we analytically derived optimal weights for both common and rare variants and proposed a general and novel approach to test association between an optimally weighted combination of variants (G-TOW) in a gene or pathway for a continuous or dichotomous trait while easily adjusting for covariates. Results of the simulation studies show that G-TOW has properly controlled type I error rates and it is the most powerful test among the methods we compared when testing effects of either both rare and common variants or rare variants only. We also illustrate the effectiveness of G-TOW using the Genetic Analysis Workshop 17 (GAW17) data. Additionally, we applied G-TOW and other competitive methods to test disease-associated genes in real data of schizophrenia. G-TOW successfully verified the genes FYN and VPS39, which are reported in existing publications to be associated with schizophrenia. Both of these genes are missed by the weighted sum statistic and the sequence kernel association test. Simulation study and real data analysis indicate that G-TOW is a powerful test.
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, Texas
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, Texas
32
Zhang J, Zhao Z, Guo X, Guo B, Wu B. Powerful statistical method to detect disease-associated genes using publicly available genome-wide association studies summary data. Genet Epidemiol 2019; 43:941-951. [PMID: 31392781 DOI: 10.1002/gepi.22251] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/27/2018] [Revised: 07/14/2019] [Accepted: 07/16/2019] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies (GWAS) have thus far achieved substantial success. In the last decade, a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single marker-based tests, that is, testing one single-nucleotide polymorphism (SNP) at a time. Generally, the basic functional unit of inheritance is a gene, rather than a SNP. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) which can integrate association evidence of multiple genetic variants using only GWAS summary statistics (either p-values or other test statistics). The proposed method could be used to test genetic association for both continuous and binary traits through not only one study but also multiple studies, which would be helpful to overcome the limitation of existing methods that can only be applied to a specific type of data. We conducted thorough simulation studies to verify that the proposed method controls type I errors well, and performs favorably compared to single-marker analysis and other existing methods. We demonstrated the utility of our proposed method through analysis of GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and Global Lipids Consortium, respectively. The proposed method identified some novel trait-associated genes which can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis, and lipid traits.
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, Texas
- Zihan Zhao
- Texas Academy of Mathematics & Science, University of North Texas, Denton, Texas
- Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Denton, Texas
- Bin Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
33
Zeng Z, Vo AH, Mao C, Clare SE, Khan SA, Luo Y. Cancer classification and pathway discovery using non-negative matrix factorization. J Biomed Inform 2019; 96:103247. [PMID: 31271844 PMCID: PMC6697569 DOI: 10.1016/j.jbi.2019.103247] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/06/2019] [Revised: 04/23/2019] [Accepted: 07/01/2019] [Indexed: 02/08/2023]
Abstract
OBJECTIVES: Extracting genetic information from a full range of sequencing data is important for understanding disease. We propose a novel method to effectively explore the landscape of genetic mutations and aggregate them to predict cancer type. DESIGN: We applied non-smooth non-negative matrix factorization (nsNMF) and support vector machine (SVM) to utilize the full range of sequencing data, aiming to better aggregate genetic mutations and improve their power to predict disease type. More specifically, we introduce a novel classifier to distinguish cancer types using somatic mutations obtained from whole-exome sequencing data. Mutations were identified from multiple cancers and scored using SIFT, PP2, and CADD, and collapsed at the individual gene level. nsNMF was then applied to reduce dimensionality and obtain coefficient and basis matrices. A feature matrix was derived from the obtained matrices to train a classifier for cancer type classification with the SVM model. RESULTS: We have demonstrated that the classifier was able to distinguish four cancer types with reasonable accuracy. In five-fold cross-validations using mutation counts as features, the average prediction accuracy was 80% (SEM = 0.1%), significantly outperforming baselines and outperforming models using mutation scores as features. CONCLUSION: Using the factor matrices derived from the nsNMF, we identified multiple genes and pathways that are significantly associated with each cancer type. This study presents a generic and complete pipeline to study the associations between somatic mutations and cancers. The proposed method can be adapted to other studies for disease status classification and pathway discovery.
Affiliation(s)
- Zexian Zeng
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Andy H Vo
- Committee on Developmental Biology and Regenerative Medicine, The University of Chicago, Chicago, IL, USA
- Chengsheng Mao
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Susan E Clare
- Department of Surgery, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA.
- Seema A Khan
- Department of Surgery, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA.
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA.
34
Li CI, Samuels DC, Zhao YY, Shyr Y, Guo Y. Power and sample size calculations for high-throughput sequencing-based experiments. Brief Bioinform 2019; 19:1247-1255. [PMID: 28605403 DOI: 10.1093/bib/bbx061] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/17/2017] [Indexed: 12/22/2022]
Abstract
Power/sample size (power) analysis estimates the likelihood of successfully finding the statistical significance in a data set. There has been a growing recognition of the importance of power analysis in the proper design of experiments. Power analysis is complex, yet necessary for the success of large studies. It is important to design a study that produces statistically accurate and reliable results. Power computation methods have been well established for both microarray-based gene expression studies and genotyping microarray-based genome-wide association studies. High-throughput sequencing (HTS) has greatly enhanced our ability to conduct biomedical studies at the highest possible resolution (per nucleotide). However, the complexity of power computations is much greater for sequencing data than for the simpler genotyping array data. Research on methods of power computations for HTS-based studies has been recently conducted but is not yet well known or widely used. In this article, we describe the power computation methods that are currently available for a range of HTS-based studies, including DNA sequencing, RNA-sequencing, microbiome sequencing and chromatin immunoprecipitation sequencing. Most importantly, we review the methods of power analysis for several types of sequencing data and guide the reader to the relevant methods for each data type.
Affiliation(s)
- Chung-I Li
- Department of Statistics, National Cheng Kung University in Taiwan
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, USA
- Yu Shyr
- Department of Biostatistics, Vanderbilt University, USA
- Yan Guo
- Department of Cancer Biology, Vanderbilt University
35
Bocher O, Marenne G, Saint Pierre A, Ludwig TE, Guey S, Tournier-Lasserve E, Perdry H, Génin E. Rare variant association testing for multicategory phenotype. Genet Epidemiol 2019; 43:646-656. [PMID: 31087445 DOI: 10.1002/gepi.22210] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/26/2018] [Revised: 04/03/2019] [Accepted: 04/17/2019] [Indexed: 01/09/2023]
Abstract
Genetic association studies have provided new insights into the genetic variability of human complex traits with a focus mainly on continuous or binary traits. Methods have been proposed to take into account disease heterogeneity between subgroups of patients when studying common variants but none was specifically designed for rare variants. Because rare variants are expected to have stronger effects and to be more heterogeneously distributed among cases than common ones, subgroup analyses might be particularly attractive in this context. To address this issue, we propose an extension of burden tests by using a multinomial regression model, which enables association tests between rare variants and multicategory phenotypes. We evaluated the type I error and the power of two burden tests, CAST and WSS, by simulating data under different scenarios. In the case of genetic heterogeneity between case subgroups, we showed an advantage of multinomial regression over logistic regression, which considers all the cases against the controls. We replicated these results on real data from Moyamoya disease where the burden tests performed better when cases were stratified according to age-of-onset. We implemented the functions for association tests in the R package "Ravages" available on GitHub.
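As background for the burden tests named above, a CAST-style collapsing test for a plain binary phenotype can be sketched as follows. This is a minimal sketch under stated assumptions: it is not the multinomial extension described in the abstract and not the Ravages implementation, it uses a one-sided Fisher exact test as the 2x2 comparison, and the function names and the genotype encoding (rows are individuals, entries are rare-allele counts) are hypothetical.

```python
from math import comb


def carrier_flags(genotypes):
    """Collapse each individual's rare-variant genotypes to a 0/1 carrier flag."""
    return [int(any(count > 0 for count in row)) for row in genotypes]


def cast_one_sided_p(case_genotypes, control_genotypes):
    """One-sided Fisher exact p-value for an excess of carriers among cases.

    Hypothetical helper: the exact test is one reasonable choice for comparing
    carrier counts between two groups, chosen here for illustration.
    """
    n_cases = len(case_genotypes)
    n_controls = len(control_genotypes)
    case_carriers = sum(carrier_flags(case_genotypes))
    control_carriers = sum(carrier_flags(control_genotypes))

    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, n_cases)

    # Hypergeometric upper tail: P(carriers among cases >= observed count).
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    cases = [[1, 0], [0, 1], [0, 2]]      # every case carries a rare allele
    controls = [[0, 0], [0, 0], [0, 0]]   # no control does
    print(cast_one_sided_p(cases, controls))
```

Collapsing to a carrier flag discards how many rare alleles a person carries, which is the simplification that weighted schemes such as WSS relax.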
Affiliation(s)
- Ozvan Bocher
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- Thomas E Ludwig
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France; CHU Brest, Brest, France
- Stéphanie Guey
- Inserm UMR-S1161, Génétique et Physiopathologie des Maladies Cérébro-vasculaires, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Elisabeth Tournier-Lasserve
- Inserm UMR-S1161, Génétique et Physiopathologie des Maladies Cérébro-vasculaires, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Hervé Perdry
- CESP Inserm, U1018, UFR Médecine, Univ Paris-Sud, Université Paris-Saclay, Villejuif, France
36
Yan Q, Liu N, Forno E, Canino G, Celedón JC, Chen W. An integrative association method for omics data based on a modified Fisher's method with application to childhood asthma. PLoS Genet 2019; 15:e1008142. [PMID: 31063461 PMCID: PMC6524814 DOI: 10.1371/journal.pgen.1008142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/18/2019] [Revised: 05/17/2019] [Accepted: 04/16/2019] [Indexed: 02/07/2023]
Abstract
The development of high-throughput biotechnologies allows the collection of omics data to study the biological mechanisms underlying complex diseases at different levels, such as genomics, epigenomics, and transcriptomics. However, each technology is designed to collect a specific type of omics data. Thus, the association between a disease and one type of omics data is usually tested individually, but this strategy is suboptimal. To better articulate biological processes and increase the consistency of variant identification, omics data from various platforms need to be integrated. In this report, we introduce an approach that uses a modified Fisher's method (denoted as Omnibus-Fisher) to combine separate p-values of association testing for a trait and SNPs, DNA methylation markers, and RNA sequencing, calculated by kernel machine regression into an overall gene-level p-value to account for correlation between omics data. To consider all possible disease models, we extend Omnibus-Fisher to an optimal test by using perturbations. In our simulations, a usual Fisher's method has inflated type I error rates when directly applied to correlated omics data. In contrast, Omnibus-Fisher preserves the expected type I error rates. Moreover, Omnibus-Fisher has increased power compared to its optimal version when the true disease model involves all types of omics data. On the other hand, the optimal Omnibus-Fisher is more powerful than its regular version when only one type of data is causal. Finally, we illustrate our proposed method by analyzing whole-genome genotyping, DNA methylation data, and RNA sequencing data from a study of childhood asthma in Puerto Ricans.
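For reference, the usual Fisher's method that the abstract contrasts with Omnibus-Fisher can be sketched as follows. This is a minimal illustration of the classical test only: it assumes independent p-values, which is exactly the assumption the abstract reports as violated for correlated omics data, and the function name `fisher_combined_p` is hypothetical. It does not implement the authors' modified or perturbation-based versions.

```python
import math


def fisher_combined_p(p_values):
    """Classical Fisher combination of k independent p-values.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the upper
    tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate the finite series term by term to avoid large factorials.
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


if __name__ == "__main__":
    # Three hypothetical per-platform p-values for one gene.
    print(fisher_combined_p([0.04, 0.10, 0.20]))
```

With positively correlated inputs this combined p-value tends to be too small, which matches the inflated type I error the abstract describes for the unmodified method.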
Affiliation(s)
- Qi Yan
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- * E-mail: (QY); (WC)
| | - Nianjun Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN
| | - Erick Forno
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
| | - Glorisa Canino
- Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, PR
| | - Juan C. Celedón
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
| | - Wei Chen
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, PA
- * E-mail: (QY); (WC)
|
37
|
Wang C, Deng S, Sun L, Li L, Hu YQ. A nonparametric test for association with multiple loci in the retrospective case-control study. Stat Methods Med Res 2019; 29:589-602. [PMID: 30987531 DOI: 10.1177/0962280219842892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Genome-wide association studies aim to identify common or rare variants associated with common diseases and to explain more heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. To detect association signals powerfully, it is highly desirable to take account of correlations or linkage disequilibrium (LD) information among multiple SNPs in testing for association. In this article, we propose a test, SLIDE, that depicts the difference of the average multi-locus genotypes between cases and controls, and we derive its variance-covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. Thus SLIDE can borrow strength from an external database in the population of interest with a few thousand to hundreds of thousands of individuals to improve the power for detecting association. Extensive simulations show that SLIDE has apparent superiority over the existing methods, especially in situations involving both common and rare variants and both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in an application to data from the Wellcome Trust Case Control Consortium.
Affiliation(s)
- Chan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China; Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY, USA
| | - Shufang Deng
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
| | - Leiming Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
| | - Liming Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
| | - Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China; Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
|
38
|
Qin H, Zhao J, Zhu X. Identifying Rare Variant Associations in Admixed Populations. Sci Rep 2019; 9:5458. [PMID: 30931973 PMCID: PMC6443736 DOI: 10.1038/s41598-019-41845-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/15/2018] [Accepted: 03/12/2019] [Indexed: 12/27/2022]
Abstract
An admixed population and its ancestral populations bear different burdens of a complex disease. The ancestral populations may have different haplotypes of deleterious alleles and thus ancestry-gene interaction can influence disease risk in the admixed population. Among admixed individuals, deleterious haplotypes and their ancestries are dependent and can provide non-redundant association information. Herein we propose a local ancestry boosted sum test (LABST) for identifying chromosomal blocks that harbor rare variants but have no ancestry switches. For such a stable ancestral block, our LABST exploits ancestry-gene interaction and the number of rare alleles therein. Under the null of no genetic association, the test statistic asymptotically follows a chi-square distribution with one degree of freedom (1-df). Our LABST properly controlled type I error rates under extensive simulations, suggesting that the asymptotic approximation was accurate for the null distribution of the test statistic. In terms of power for identifying rare variant associations, our LABST uniformly outperformed several famed methods under four important modes of disease genetics over a large range of relative risks. In conclusion, exploiting ancestry-gene interaction can boost statistical power for rare variant association mapping in admixed populations.
Affiliation(s)
- Huaizhen Qin
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, 32611, USA
- Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, New Orleans, LA, 70112, USA
| | - Jinying Zhao
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, 32611, USA
| | - Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, 10900 Euclid Avenue, Cleveland, Ohio, 44106, USA.
|
39
|
Chiu CY, Yuan F, Zhang BS, Yuan A, Li X, Fang HB, Lange K, Weeks DE, Wilson AF, Bailey-Wilson JE, Musolf AM, Stambolian D, Lakhal-Chaieb ML, Cook RJ, McMahon FJ, Amos CI, Xiong M, Fan R. Linear mixed models for association analysis of quantitative traits with next-generation sequencing data. Genet Epidemiol 2019; 43:189-206. [PMID: 30537345 PMCID: PMC6375753 DOI: 10.1002/gepi.22177] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/13/2018] [Revised: 08/27/2018] [Accepted: 09/26/2018] [Indexed: 01/01/2023]
Abstract
We develop linear mixed models (LMMs) and functional linear mixed models (FLMMs) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effects of a major gene are modeled as a fixed effect, the contributions of polygenes are modeled as a random effect, and the correlations of pedigree members are modeled via inbreeding/kinship coefficients. F-statistics and χ² likelihood ratio test (LRT) statistics based on the LMMs and FLMMs are constructed to test for association. We show empirically that the F-distributed statistics provide a good control of the type I error rate. The F-test statistics of the LMMs have similar or higher power than the FLMMs, kernel-based famSKAT (family-based sequence kernel association test), and burden test famBT (family-based burden test). The F-statistics of the FLMMs perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMMs control the type I error rate well at the nominal levels α = 0.01 and 0.05. For moderate/large samples, the LRT statistics of the FLMMs control the type I error rates well. The LRT statistics of the LMMs can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.
Affiliation(s)
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Fang Yuan
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, Yunnan, China
| | - Bing-Song Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Ao Yuan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Xin Li
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Hong-Bin Fang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California
| | - Daniel E Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Anthony M Musolf
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Dwight Stambolian
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
| | | | - Richard J Cook
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
| | - Francis J McMahon
- Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, National Institute of Mental Health, NIH, Bethesda, Maryland
| | | | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas
| | - Ruzong Fan
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, Yunnan, China
|
40
|
Almlöf JC, Nystedt S, Leonard D, Eloranta ML, Grosso G, Sjöwall C, Bengtsson AA, Jönsen A, Gunnarsson I, Svenungsson E, Rönnblom L, Sandling JK, Syvänen AC. Whole-genome sequencing identifies complex contributions to genetic risk by variants in genes causing monogenic systemic lupus erythematosus. Hum Genet 2019; 138:141-150. [PMID: 30707351 PMCID: PMC6373277 DOI: 10.1007/s00439-018-01966-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 10/01/2018] [Accepted: 12/13/2018] [Indexed: 01/01/2023]
Abstract
Systemic lupus erythematosus (SLE, OMIM 152700) is a systemic autoimmune disease with a complex etiology. The mode of inheritance of the genetic risk beyond familial SLE cases is currently unknown. Additionally, the contribution of heterozygous variants in genes known to cause monogenic SLE is not fully understood. Whole-genome sequencing of DNA samples from 71 Swedish patients with SLE and their healthy biological parents was performed to investigate the general genetic risk of SLE using known SLE GWAS risk loci identified using the ImmunoChip, variants in genes associated with monogenic SLE, and the mode of inheritance of SLE risk alleles in these families. A random forest model for predicting genetic risk for SLE showed that the SLE risk variants were mainly inherited from one of the parents. In the 71 patients, we detected a significant enrichment of ultra-rare (≤ 0.1%) missense and nonsense mutations in 22 genes known to cause monogenic forms of SLE. We identified one previously reported homozygous nonsense mutation in the C1QC (Complement C1q C Chain) gene, which explains the immunodeficiency and severe SLE phenotype of that patient. We also identified seven ultra-rare, coding heterozygous variants in five genes (C1S, DNASE1L3, DNASE1, IFIH1, and RNASEH2A) involved in monogenic SLE. Our findings indicate a complex contribution to the overall genetic risk of SLE by rare variants in genes associated with monogenic forms of SLE. The rare variants were inherited from the parent other than the one who passed on the more common risk variants, leading to an increased genetic burden for SLE in the child. Higher frequency SLE risk variants are mostly passed from one of the parents to the offspring affected with SLE. In contrast, the other parent, in seven cases, contributed heterozygous rare variants in genes associated with monogenic forms of SLE, suggesting a larger impact of rare variants in SLE than hitherto reported.
Affiliation(s)
- Jonas Carlsson Almlöf
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden.
| | - Sara Nystedt
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
| | - Dag Leonard
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
| | - Maija-Leena Eloranta
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
| | - Giorgia Grosso
- Rheumatology Unit, Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
| | - Christopher Sjöwall
- Division of Neuro and Inflammation Sciences, Department of Clinical and Experimental Medicine, Rheumatology, Linköping University, 581 83, Linköping, Sweden
| | - Anders A Bengtsson
- Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
| | - Andreas Jönsen
- Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
| | - Iva Gunnarsson
- Rheumatology Unit, Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
| | - Elisabet Svenungsson
- Rheumatology Unit, Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
| | - Lars Rönnblom
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
| | - Johanna K Sandling
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
| | - Ann-Christine Syvänen
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
|
41
|
Chen Z, Wang K. Gene-based sequential burden association test. Stat Med 2019; 38:2353-2363. [PMID: 30706509 DOI: 10.1002/sim.8111] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/13/2018] [Revised: 11/29/2018] [Accepted: 01/10/2019] [Indexed: 11/10/2022]
Abstract
Detecting the association between a set of variants and a phenotype of interest is an important first step in genetic and genomic studies. Although this problem has attracted a large amount of attention in the scientific community and several related statistical approaches have been proposed in the literature, powerful and robust statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful and robust association test, which combines information from each individual single-nucleotide polymorphism based on sequential independent burden tests. We compare the proposed approach with some popular tests through a comprehensive simulation study and real data application. Our results show that, in general, the new test is more powerful; the gain in detection power can be substantial in many situations, compared to other methods.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, Indiana
| | - Kai Wang
- Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa
|
42
|
Zhang X, Basile AO, Pendergrass SA, Ritchie MD. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico. BMC Bioinformatics 2019; 20:46. [PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/29/2018] [Accepted: 12/26/2018] [Indexed: 11/11/2022]
Abstract
Background: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses.
Results: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression.
Conclusions: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
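Simulation studies of this kind typically report an empirical type I error rate, that is, the fraction of null-simulation replicates whose p-value falls at or below the nominal level. A minimal sketch of that calculation follows; it is an illustration only, not the authors' simulation code, and the function name is hypothetical.

```python
import math

def empirical_type1_error(null_p_values, alpha):
    """Fraction of null-replicate p-values at or below alpha, with a binomial standard error."""
    n = len(null_p_values)
    if n == 0:
        raise ValueError("at least one replicate is required")
    rate = sum(1 for p in null_p_values if p <= alpha) / n
    # Binomial standard error of the estimated rate.
    std_err = math.sqrt(rate * (1.0 - rate) / n)
    return rate, std_err
```

A test is usually described as well calibrated when the estimated rate lies within a couple of standard errors of the nominal alpha, which is why small alpha levels need many replicates.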
Affiliation(s)
- Xinyuan Zhang
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Anna O Basile
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
| | - Marylyn D Ritchie
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
|
43
|
The impact of a fine-scale population stratification on rare variant association test results. PLoS One 2018; 13:e0207677. [PMID: 30521541 PMCID: PMC6283567 DOI: 10.1371/journal.pone.0207677] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/27/2018] [Accepted: 11/05/2018] [Indexed: 12/28/2022]
Abstract
Population stratification is a well-known confounding factor in both common and rare variant association analyses. Rare variants tend to be more geographically clustered than common variants, because of their more recent origin. However, it is not yet clear if population stratification at a very fine scale (neighboring administrative regions within a country) would lead to statistical bias in rare variant analyses. This problem is important because including convenience controls from external studies is a common procedure for increasing the power to detect genetic associations. We studied through simulation the impact of a fine-scale population structure on different rare variant association strategies, assessing type I error and power. We showed that principal component analysis (PCA) based methods of adjustment for population stratification adequately corrected type I error inflation at the largest geographical scales, but not at the finest scales. We also showed in our simulations that adding controls increased power, but to a considerably lesser extent when controls were drawn from another population.
|
44
|
Zhu B, Mirabello L, Chatterjee N. A subregion-based burden test for simultaneous identification of susceptibility loci and subregions within. Genet Epidemiol 2018; 42:673-683. [PMID: 29931698 PMCID: PMC6185783 DOI: 10.1002/gepi.22134] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/11/2018] [Revised: 04/14/2018] [Accepted: 05/04/2018] [Indexed: 01/08/2023]
Abstract
In rare variant association studies, aggregating rare and/or low frequency variants may increase statistical power for detection of the underlying susceptibility gene or region. However, it is unclear which variants, or classes of variants, in a gene contribute most to the association. We proposed a subregion-based burden test (REBET) to simultaneously select susceptibility genes and identify important underlying subregions. The subregions are predefined by shared common biologic characteristics, such as the protein domain or functional impact. Based on a subset-based approach considering local correlations between combinations of test statistics of subregions, REBET is able to properly control the type I error rate while adjusting for multiple comparisons in a computationally efficient manner. Simulation studies show that REBET can achieve power competitive with alternative methods when rare variants cluster within subregions. In two case studies, REBET is able to identify known disease susceptibility genes and, more importantly, pinpoint previously unreported most-susceptible subregions, which represent protein domains essential for gene function. R package REBET is available at https://dceg.cancer.gov/tools/analysis/rebet.
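The aggregation idea behind burden tests can be sketched as follows: each individual's minor-allele counts across a region are collapsed into one weighted score, and that score is tested against the trait. This is a generic fixed-weight burden test for a quantitative trait, shown only as a sketch under simple assumptions (unrelated individuals, no covariates); it is not REBET, and all names are hypothetical.

```python
import math

def burden_scores(genotypes, weights=None):
    """Collapse each individual's minor-allele counts into one weighted sum."""
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants  # equal weights: a simple allele count
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def burden_z(trait, scores):
    """Score-type Z statistic for trait vs. burden (equals sqrt(n) times Pearson r)."""
    n = len(trait)
    y_bar = sum(trait) / n
    b_bar = sum(scores) / n
    cov = sum((y - y_bar) * (b - b_bar) for y, b in zip(trait, scores))
    var_y = sum((y - y_bar) ** 2 for y in trait) / n  # trait variance under the null
    ss_b = sum((b - b_bar) ** 2 for b in scores)
    return cov / math.sqrt(var_y * ss_b)

def two_sided_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Because all variants enter one sum, effects in opposite directions can cancel, which is the usual motivation for the kernel-based and subregion-based alternatives discussed in the neighbouring entries.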
Affiliation(s)
- Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
|
45
|
Wang X, Boekstegers F, Brinster R. Methods and results from the genome-wide association group at GAW20. BMC Genet 2018; 19:79. [PMID: 30255814 PMCID: PMC6157187 DOI: 10.1186/s12863-018-0649-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
BACKGROUND: This paper summarizes the contributions from the Genome-wide Association Study group (GWAS group) of the GAW20. The GWAS group contributions focused on topics such as association tests, phenotype imputation, and application of empirical kinships. The goals of the GWAS group contributions were varied. A real or a simulated data set based on the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study was employed by different methods. Different outcomes and covariates were considered, and quality control procedures varied throughout the contributions. RESULTS: The consideration of heritability and family structure played a major role in some contributions. The inclusion of family information and adaptive weights based on data were found to improve power in genome-wide association studies. It was proven that gene-level approaches are more powerful than single-marker analysis. Other contributions focused on the comparison between pedigree-based kinship and empirical kinship matrices, and found similar results in heritability estimation, association mapping, and genomic prediction. A new approach for linkage mapping of triglyceride levels was able to identify a novel linkage signal. CONCLUSIONS: This summary paper reports on promising statistical approaches and findings of the members of the GWAS group applied to real and simulated data, which encompass the current topics of epigenetics and pharmacogenomics.
Affiliation(s)
- Xuexia Wang
- University of North Texas, GAB 459, 1155 Union Circle #311430, Denton, TX 76203 USA
| | - Felix Boekstegers
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
| | - Regina Brinster
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
|
46
|
Wu X, Guan T, Liu DJ, León Novelo LG, Bandyopadhyay D. ADAPTIVE-WEIGHT BURDEN TEST FOR ASSOCIATIONS BETWEEN QUANTITATIVE TRAITS AND GENOTYPE DATA WITH COMPLEX CORRELATIONS. Ann Appl Stat 2018; 12:1558-1582. [PMID: 30214655 DOI: 10.1214/17-aoas1121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
High-throughput sequencing has often been used to screen samples from pedigrees or with population structure, producing genotype data with complex correlations rendered from both familial relation and linkage disequilibrium. With such data, it is critical to account for these genotypic correlations when assessing the contribution of variants by gene or pathway. Recognizing the limitations of existing association testing methods, we propose Adaptive-weight Burden Test (ABT), a retrospective, mixed-model test for genetic association of quantitative traits on genotype data with complex correlations. This method makes full use of genotypic correlations across both samples and variants, and adopts "data-driven" weights to improve power. We derive the ABT statistic and its explicit distribution under the null hypothesis, and demonstrate through simulation studies that it is generally more powerful than the fixed-weight burden test and family-based SKAT in various scenarios, controlling for the type I error rate. Further investigation reveals the connection of ABT with kernel tests, as well as the adaptability of its weights to the direction of genetic effects. The application of ABT is illustrated by a whole genome analysis of genes with common and rare variants associated with fasting glucose from the NHLBI "Grand Opportunity" Exome Sequencing Project.
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
| | - Ting Guan
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
| | - Dajiang J Liu
- Department of Public Health Sciences, Hershey Institute of Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
| | - Luis G León Novelo
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA
|
47
|
Chen Z, Liu Q, Wang K. A genetic association test through combining two independent tests. Genomics 2018; 111:1152-1159. [PMID: 30009923 DOI: 10.1016/j.ygeno.2018.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2018] [Revised: 06/25/2018] [Accepted: 07/11/2018] [Indexed: 12/21/2022]
Abstract
Gene- and pathway-based variant association tests are important tools in finding genetic variants that are associated with phenotypes of interest. Although some methods have been proposed in the literature, powerful and robust statistical tests are still desirable in this area. In this study, we propose a statistical test based on decomposing the genotype data into orthogonal parts from which powerful and robust independent p-value combination approaches can be utilized. Through a comprehensive simulation study, we compare the proposed test with some existing popular ones. Our simulation results show that the new test has great performance in terms of controlling type I error rate and statistical power. Real data applications are also conducted to illustrate the performance and usefulness of the proposed test.
Affiliation(s)
- Zhongxue Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, 1025 E. 7th street, Bloomington, IN 47405, USA.
| | - Qingzhong Liu
- Department of Computer Science, Sam Houston State University, 1803 Avenue I, Huntsville, TX 77341, USA
| | - Kai Wang
- Department of Biostatistics, College of Public Health, University of Iowa, 145 N. Riverside Drive, Iowa City, IA 52242, USA
|
48
|
Deng Y, Li Z, Liu J, Wang Z, Cao Y, Mou Y, Fu B, Mo B, Wei J, Cheng Z, Luo L, Li J, Shu Y, Wang X, Luo G, Yang S, Wang Y, Zhu J, Yang J, Wu M, Xu X, Ge R, Chen X, Peng Q, Wei G, Li Y, Yang H, Fang S, Zhang X, Xiong W. Targeted resequencing reveals genetic risks in patients with sporadic idiopathic pulmonary fibrosis. Hum Mutat 2018; 39:1238-1245. [PMID: 29920840 DOI: 10.1002/humu.23566] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/03/2018] [Revised: 06/11/2018] [Accepted: 06/11/2018] [Indexed: 12/19/2022]
Abstract
Idiopathic pulmonary fibrosis (IPF) is a genetically heterogeneous disease with high mortality and poor prognosis. However, a large fraction of its genetic cause remains unexplained, especially in sporadic IPF (∼80% of IPF). By systematically reviewing related literature and potential pathogenic pathways, 92 potentially IPF-related genes were selected and sequenced in genomic DNA from 253 sporadic IPF patients and 125 matched healthy controls using targeted massively parallel next-generation sequencing. The identified risk variants were confirmed by Sanger sequencing. We identified two pathogenic and 10 loss-of-function (LOF) candidate variants, accounting for 4.74% (12 out of 253) of all the IPF cases. In burden tests, rare missense variants in three genes (CSF3R, DSP, and LAMA3) showed a statistically significant relationship with IPF. Four common SNPs (rs3737002, rs2296160, rs1800470, and rs35705950) were statistically associated with increased risk of IPF. In the cumulative risk model, high-risk subjects had a 3.47-fold (95% CI: 2.07-5.81, P = 2.34 × 10⁻⁶) risk of developing IPF compared with low-risk subjects. We drafted a comprehensive map of genetic risks (including both rare and common candidate variants) in patients with IPF, which could provide insights to help in understanding mechanisms, providing genetic diagnosis, and predicting risk for IPF.
Affiliation(s)
- Yanhan Deng
  - Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
- Zongzhe Li
  - Division of Cardiology, Departments of Internal Medicine and Genetic Diagnosis Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Juan Liu
  - Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
- Zheng Wang
  - Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou, China
- Yanyan Cao
  - Division of Cardiology, Departments of Internal Medicine and Genetic Diagnosis Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yong Mou
  - Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
- Bohua Fu
  - Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
- Biwen Mo
  - Department of Respiratory Medicine, Affiliated hospital of Guilin Medical University, Guilin, China
- Jianghong Wei
  - Department of Respiratory Medicine, Affiliated hospital of Guilin Medical University, Guilin, China
- Zhenshun Cheng
  - Department of Respiratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, China
- Liman Luo
  - Department of Pediatrics, The 306 Hospital of People's Liberation Army, Beijing, China
- Jingping Li
  - Department of Respiratory Medicine, Qianjiang Central Hospital, Qianjiang, China
- Ying Shu
  - Department of Respiratory Medicine, Qianjiang Central Hospital, Qianjiang, China
- Xiaomei Wang
  - Department of Geriatrics, Southwest Hospital, Army Medical University, Chongqing, China
- Guangwei Luo
  - Department of Respiratory Medicine, Wuhan No. 1 Hospital, Wuhan, China
- Shuo Yang
  - Department of Respiratory Medicine, Wuhan No. 1 Hospital, Wuhan, China
- Yingnan Wang
  - Department of Respiratory and Critical Care Medicine, Renmin Hospital of Three Gorges University, Yichang, China
- Jing Zhu
  - Department of Respiratory and Critical Care Medicine, Renmin Hospital of Three Gorges University, Yichang, China
- Jingping Yang
  - Department of Respiratory and Critical Care Medicine, The Third Affiliated Hospital of Inner Mongolia Medical University, Baotou, China
- Ming Wu
  - Department of Respiratory and Critical Care Medicine, The Third Affiliated Hospital of Inner Mongolia Medical University, Baotou, China
- Xuyan Xu
  - Department of Respiratory Medicine, Xianning Center Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Xianning, China
- Renying Ge
  - Department of Respiratory Medicine, Xianning Center Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Xianning, China
- Xueqin Chen
  - Department of Respiratory and Critical Care Medicine, Wuhan University Renmin Hospital, Wuhan University, Wuhan, China
- Qingzhen Peng
  - Department of Respiratory Medicine, Xiaogan Central Hospital, Xiaogan, China
- Guang Wei
  - Department of Respiratory Medicine, Xiaogan Central Hospital, Xiaogan, China
- Yaqing Li
  - Department of Respiratory Medicine, Zhejiang Provincial People's Hospital, Hangzhou, China
- Hua Yang
  - Department of Respiratory Medicine, University Hospital of Hubei University for Nationalities, Enshi, China
- Shirong Fang
  - Department of Respiratory Medicine, University Hospital of Hubei University for Nationalities, Enshi, China
- Xiaoju Zhang
  - Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou, China
- Weining Xiong
  - Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Cite of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, China
|
49
|
Novel Methods for Family-Based Genetic Studies. Methods Mol Biol 2018. [PMID: 29876895 DOI: 10.1007/978-1-4939-7868-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
The recent development of microarray and sequencing technology allows identification of disease susceptibility genes. Although genome-wide association studies (GWAS) have successfully identified many genetic markers related to human diseases, traditional statistical methods are not powerful enough to detect rare genetic markers. Rare genetic markers are therefore usually grouped together and tested at the set level. One such method is the sequence kernel association test (SKAT), which has been commonly used in rare genetic marker analysis. In recent publications, SKAT has been extended to family-based rare variant analysis. Here, I present three published statistical approaches for family-based rare variant analysis of: 1. continuous traits, 2. binary traits, and 3. multiple correlated traits.
|
50
|
Belonogova NM, Svishcheva GR, Wilson JF, Campbell H, Axenovich TI. Weighted functional linear regression models for gene-based association analysis. PLoS One 2018; 13:e0190486. [PMID: 29309409 PMCID: PMC5757938 DOI: 10.1371/journal.pone.0190486] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/19/2017] [Accepted: 12/17/2017] [Indexed: 11/19/2022] Open
Abstract
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods in which several differently informative components are combined, weights are introduced to favor the more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have, for the first time, introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10⁻⁶) when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
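As a rough illustration of weights defined by allele frequencies via the beta distribution, the sketch below evaluates a beta density at each variant's minor allele frequency (MAF) and uses it in a simple weighted collapsing score. The shape parameters a = 1 and b = 25 are an assumption (a convention often used in rare-variant testing), not values taken from the abstract, and the function names `beta_weight` and `weighted_burden` are hypothetical. This is not the FREGAT implementation and does not fit a functional linear model.

```python
import math


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at a minor allele frequency.

    The defaults a=1, b=25 are assumed for illustration; they up-weight
    rarer variants.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    # log of the beta function B(a, b), computed via log-gamma for stability
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (
        (a - 1.0) * math.log(maf)
        + (b - 1.0) * math.log(1.0 - maf)
        - log_beta_fn
    )
    return math.exp(log_density)


def weighted_burden(genotypes, mafs, a=1.0, b=25.0):
    """Per-individual weighted sum of allele counts (a collapsing score).

    genotypes: list of rows, one per individual, with one allele count
    per variant, in the same order as mafs.
    """
    weights = [beta_weight(m, a, b) for m in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Toy data: two individuals, two variants (one rare, one common).
scores = weighted_burden([[1, 0], [0, 1]], [0.01, 0.2])
```

With these assumed parameters, a carrier of the rare variant receives a larger score than a carrier of the common one, which is the sense in which frequency-based weights give an advantage to presumably more informative variants.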
Affiliation(s)
- Nadezhda M. Belonogova
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Gulnara R. Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
- James F. Wilson
  - Centre for Global Health Research, Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, Scotland
- Harry Campbell
  - Centre for Global Health Research, Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- Tatiana I. Axenovich
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
|