1
Kezios KL, Hayes-Larson E. Sufficient component cause simulations: an underutilized epidemiologic teaching tool. Frontiers in Epidemiology 2023; 3:1282809. [PMID: 38435670 PMCID: PMC10906966 DOI: 10.3389/fepid.2023.1282809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 10/19/2023] [Indexed: 03/05/2024]
Abstract
Simulation studies are a powerful and important tool in epidemiologic teaching, especially for understanding causal inference. Simulations using the sufficient component cause framework can provide students key insights about causal mechanisms and sources of bias, but are not commonly used. To make them more accessible, we aim to provide an introduction and tutorial on developing and using these simulations, including an overview of translation from directed acyclic graphs and potential outcomes to sufficient component causal models, and a summary of the simulation approach. Using the applied question of the impact of educational attainment on dementia, we offer simple simulation examples and accompanying code to illustrate sufficient component cause-based simulations for four common causal structures (causation, confounding, selection bias, and effect modification) often introduced early in epidemiologic training. We show how sufficient component cause-based simulations illuminate both the causal processes and the mechanisms through which bias occurs, which can help enhance student understanding of these causal structures and the distinctions between them. We conclude with a discussion of considerations for using sufficient component cause-based simulations as a teaching tool.
Affiliation(s)
- Katrina L. Kezios
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, United States
- Eleanor Hayes-Larson
- Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, CA, United States
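The abstract above describes sufficient component cause simulations for structures such as confounding. The following is a minimal sketch of that idea, not the authors' published code: the component names (`u1`, `u2`, `u3`, `v1`, `v2`), their prevalences, the choice of low education as the exposure level, and the three sufficient causes are all assumptions made for illustration. With `p_u1 = 0`, education is in no sufficient cause of dementia, so any crude association comes entirely from the confounder.

```python
import random
from typing import Callable, List, NamedTuple


class Person(NamedTuple):
    low_edu: bool           # observed exposure (low educational attainment)
    c: bool                 # confounder (hypothetical, e.g. childhood adversity)
    dementia: bool          # observed outcome
    dementia_if_low: bool   # potential outcome had education been low
    dementia_if_high: bool  # potential outcome had education been high


def simulate(n=100_000, seed=1, p_c=0.4, p_u1=0.0, p_u2=0.3,
             p_u3=0.05, p_v1=0.5, p_v2=0.2) -> List[Person]:
    """Draw independent component causes, then apply the sufficient causes.

    Assumed sufficient causes (illustrative only):
      low education: (c and v1) or v2
      dementia:      (low_edu and u1) or (c and u2) or u3
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        c, u1, u2, u3, v1, v2 = (
            rng.random() < p for p in (p_c, p_u1, p_u2, p_u3, p_v1, p_v2)
        )
        low_edu = (c and v1) or v2

        def outcome(e: bool) -> bool:
            # Dementia occurs if any sufficient cause is complete.
            return (e and u1) or (c and u2) or u3

        people.append(
            Person(low_edu, c, outcome(low_edu), outcome(True), outcome(False))
        )
    return people


def risk(people, outcome: Callable[[Person], bool],
         keep: Callable[[Person], bool] = lambda p: True) -> float:
    values = [outcome(p) for p in people if keep(p)]
    return sum(values) / len(values)


def crude_rd(people) -> float:
    """Observed risk difference: low versus high education."""
    return (risk(people, lambda p: p.dementia, lambda p: p.low_edu)
            - risk(people, lambda p: p.dementia, lambda p: not p.low_edu))


def causal_rd(people) -> float:
    """True risk difference from the simulated potential outcomes."""
    return (risk(people, lambda p: p.dementia_if_low)
            - risk(people, lambda p: p.dementia_if_high))


def stratum_rd(people, c: bool) -> float:
    """Observed risk difference within one level of the confounder."""
    return crude_rd([p for p in people if p.c == c])


if __name__ == "__main__":
    people = simulate()
    print("causal RD:", causal_rd(people))
    print("crude RD:", crude_rd(people))
    print("RD within c=True:", stratum_rd(people, True))
    print("RD within c=False:", stratum_rd(people, False))
```

Because every person's potential outcomes are stored, the true effect can be read off directly and compared with the crude and stratified estimates. Setting `p_u1` above zero in this sketch makes education a component of a sufficient cause and produces a nonzero causal risk difference.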
2
Li Q, Perera D, Cao C, He J, Bian J, Chen X, Azeem F, Howe A, Au B, Wu J, Yan J, Long Q. Interaction-integrated linear mixed model reveals 3D-genetic basis underlying Autism. Genomics 2023; 115:110575. [PMID: 36758877 DOI: 10.1016/j.ygeno.2023.110575] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/22/2022] [Revised: 01/16/2023] [Accepted: 02/03/2023] [Indexed: 02/10/2023]
Abstract
Genetic interactions play critical roles in genotype-phenotype associations. We developed a novel interaction-integrated linear mixed model (ILMM) that integrates a priori knowledge into linear mixed models. ILMM enables statistical integration of genetic interactions upfront and overcomes the problems of searching for combinations. To demonstrate its utility, with 3D genomic interactions (assessed by Hi-C experiments) as a priori, we applied ILMM to whole-genome sequencing data for Autism Spectrum Disorders (ASD) and brain transcriptome data, revealing the 3D-genetic basis of ASD and 3D-expression quantitative loci (3D-eQTLs) for brain tissues. Notably, we reported a potential mechanism involving distal regulation between FOXP2 and DNMT3A, conferring the risk of ASD.
Affiliation(s)
- Qing Li
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Deshan Perera
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Chen Cao
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Jingni He
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Jiayi Bian
- Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada
- Xingyu Chen
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Feeha Azeem
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada
- Aaron Howe
- Heritage Youth Researcher Summer Program, University of Calgary, Alberta T2N 1N4, Canada
- Billie Au
- Department of Medical Genetics, University of Calgary, Alberta T2N 1N4, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Alberta T2N 1N4, Canada
- Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada
- Jun Yan
- Department of Physiology and Pharmacology, University of Calgary, Alberta T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, Alberta T2N 1N4, Canada
- Quan Long
- Department of Biochemistry and Molecular Biology, University of Calgary, Alberta T2N 1N4, Canada; Department of Medical Genetics, University of Calgary, Alberta T2N 1N4, Canada; Department of Mathematics and Statistics, University of Calgary, Alberta T2N 1N4, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Alberta T2N 1N4, Canada; Hotchkiss Brain Institute, University of Calgary, Alberta T2N 1N4, Canada
3
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 01/07/2023]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
4
Schrodi SJ. Reflections on the Field of Human Genetics: A Call for Increased Disease Genetics Theory. Front Genet 2016; 7:106. [PMID: 27375680 PMCID: PMC4896932 DOI: 10.3389/fgene.2016.00106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/26/2016] [Accepted: 05/25/2016] [Indexed: 12/29/2022]
Abstract
Development of human genetics theoretical models and the integration of those models with experiment and statistical evaluation are critical for scientific progress. This perspective argues that increased effort in disease genetics theory, complementing experimental, and statistical efforts, will escalate the unraveling of molecular etiologies of complex diseases. In particular, the development of new, realistic disease genetics models will help elucidate complex disease pathogenesis, and the predicted patterns in genetic data made by these models will enable the concurrent, more comprehensive statistical testing of multiple aspects of disease genetics predictions, thereby better identifying disease loci. By theoretical human genetics, I intend to encompass all investigations devoted to modeling the heritable architecture underlying disease traits and studies of the resulting principles and dynamics of such models. Hence, the scope of theoretical disease genetics work includes construction and analysis of models describing how disease-predisposing alleles (1) arise, (2) are transmitted across families and populations, and (3) interact with other risk and protective alleles across both the genome and environmental factors to produce disease states. Theoretical work improves insight into viable genetic models of diseases consistent with empirical results from linkage, transmission, and association studies as well as population genetics. Furthermore, understanding the patterns of genetic data expected under realistic disease models will enable more powerful approaches to discover disease-predisposing alleles and additional heritable factors important in common diseases. In spite of the pivotal role of disease genetics theory, such investigation is not particularly vibrant.
Affiliation(s)
- Steven J Schrodi
- Marshfield Clinic Research Foundation, Center for Human Genetics, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
5
Abstract
Although many genome-wide association studies have been performed, the identification of disease polymorphisms remains important. It is now suspected that many rare disease variants induce the association signal of common variants in linkage disequilibrium (LD). Based on recent development of genetic models, the current study provides explanations of the existence of rare variants with high impacts and common variants with low impacts. Disease variants are neither necessary nor sufficient due to gene–gene or gene–environment interactions. A new method was developed based on theoretical aspects to identify both rare and common disease variants by their genotypes. Common disease variants were identified with relatively small odds ratios and relatively small sample sizes, except for specific situations in which the disease variants were in strong LD with a variant with a higher frequency. Rare disease variants with small impacts were difficult to identify without increasing sample sizes; however, the method was reasonably accurate for rare disease variants with high impacts. For rare variants, dominant variants generally showed better Type II error rates than recessive variants; however, the trend was reversed for common variants. Type II error rates increased in gene regions containing more than two disease variants because the more common variant, rather than both disease variants, was usually identified. The proposed method would be useful for identifying common disease variants with small impacts and rare disease variants with large impacts when disease variants have the same effects on disease presentation.
6
Lee WC. Testing for Sufficient-Cause Gene-Environment Interactions Under the Assumptions of Independence and Hardy-Weinberg Equilibrium. Am J Epidemiol 2015; 182:9-16. [PMID: 26025233 DOI: 10.1093/aje/kwv030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/16/2014] [Accepted: 01/26/2015] [Indexed: 12/24/2022]
Abstract
To detect gene-environment interactions, a logistic regression model is typically fitted to a set of case-control data, and the focus is on testing of the cross-product terms (gene × environment) in the model. A significant result is indicative of a gene-environment interaction under a multiplicative model for disease odds. Based on the sufficient-cause model for rates, in this paper we put forward a general approach to testing for sufficient-cause gene-environment interactions in case-control studies. The proposed tests can be tailored to detect a particular type of sufficient-cause gene-environment interaction with greater sensitivity. These tests include testing for autosomal dominant, autosomal recessive, and gene-dosage interactions. The tests can also detect trend interactions (e.g., a larger gene-environment interaction with a higher level of environmental exposure) and threshold interactions (e.g., gene-environment interaction occurs only when environmental exposure reaches a certain threshold level). Two assumptions are necessary for the validity of the tests: 1) the rare-disease assumption and 2) the no-redundancy assumption. Another 2 assumptions are optional but, if imposed correctly, can boost the statistical powers of the tests: 3) the gene-environment independence assumption and 4) the Hardy-Weinberg equilibrium assumption. SAS code (SAS Institute, Inc., Cary, North Carolina) for implementing the methods is provided.
7
Park L, Kim JH. A novel approach for identifying causal models of complex diseases from family data. Genetics 2015; 199:1007-16. [PMID: 25701286 PMCID: PMC4391573 DOI: 10.1534/genetics.114.174102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/31/2014] [Accepted: 02/16/2015] [Indexed: 02/01/2023]
Abstract
Causal models including genetic factors are important for understanding the presentation mechanisms of complex diseases. Familial aggregation and segregation analyses based on polygenic threshold models have been the primary approach to fitting genetic models to the family data of complex diseases. In the current study, an advanced approach to obtaining appropriate causal models for complex diseases based on the sufficient component cause (SCC) model involving combinations of traditional genetics principles was proposed. The probabilities for the entire population, i.e., normal-normal, normal-disease, and disease-disease, were considered for each model for the appropriate handling of common complex diseases. The causal model in the current study included the genetic effects from single genes involving epistasis, complementary gene interactions, gene-environment interactions, and environmental effects. Bayesian inference using a Markov chain Monte Carlo (MCMC) algorithm was used to assess the proportions of each component for a given population lifetime incidence. This approach is flexible, allowing both common and rare variants within a gene and across multiple genes. An application to schizophrenia data confirmed the complexity of the causal factors. An analysis of diabetes data demonstrated that environmental factors and gene-environment interactions are the main causal factors for type II diabetes. The proposed method is effective and useful for identifying causal models, which can accelerate the development of efficient strategies for identifying causal factors of complex diseases.
Affiliation(s)
- Leeyoung Park
- Natural Science Research Institute, Yonsei University, Seoul, Korea 120-749
- Ju H Kim
- Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110-799, Korea; Systems Biomedical Informatics National Core Research Center (SBI-NCRC), Seoul National University College of Medicine, Seoul 110-799, Korea
8
Lee WC. Assessing causal mechanistic interactions: a peril ratio index of synergy based on multiplicativity. PLoS One 2013; 8:e67424. [PMID: 23826299 PMCID: PMC3691192 DOI: 10.1371/journal.pone.0067424] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 03/26/2013] [Accepted: 05/17/2013] [Indexed: 11/18/2022]
Abstract
The assessment of interactions in epidemiology has traditionally been based on risk-ratio, odds-ratio or rate-ratio multiplicativity. However, many epidemiologists fail to recognize that this is mainly for statistical convenience and will often misinterpret a statistically significant interaction as a genuine mechanistic interaction. The author adopts an alternative metric system for risk, the 'peril'. A peril is an exponentiated cumulative rate, or simply, the inverse of a survival (risk complement) or one plus an odds. The author proposes a new index based on multiplicativity of peril ratios, the 'peril ratio index of synergy based on multiplicativity' (PRISM). Under the assumption of no redundancy, PRISM can be used to assess synergisms in the sufficient-cause sense, i.e., causal co-actions or causal mechanistic interactions. It has a less stringent threshold for detecting a synergy than a previous index, the 'relative excess risk due to interaction'. Under the new PRISM criterion, many situations in which the traditional indices show no evidence of interaction in fact correspond to bona fide positive or negative synergisms.
Affiliation(s)
- Wen-Chung Lee
- Research Center for Genes, Environment and Human Health, College of Public Health, National Taiwan University, Taipei, Taiwan.
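The abstract above defines a peril in three equivalent ways: an exponentiated cumulative rate, the inverse of a survival, and one plus an odds. The sketch below checks that equivalence numerically. The function names are hypothetical. The last function is only an assumed illustration of what a multiplicativity comparison of peril ratios could look like. The abstract does not give the PRISM formula, so it should not be read as the paper's definition.

```python
import math


def peril_from_risk(risk: float) -> float:
    """Peril as the inverse of survival, where survival = 1 - risk."""
    if not 0.0 <= risk < 1.0:
        raise ValueError("risk must be in [0, 1)")
    return 1.0 / (1.0 - risk)


def peril_from_odds(odds: float) -> float:
    """Peril as one plus the odds, where odds = risk / (1 - risk)."""
    return 1.0 + odds


def peril_from_cumulative_rate(cumulative_rate: float) -> float:
    """Peril as the exponentiated cumulative rate.

    Uses the standard relation survival = exp(-cumulative rate).
    """
    return math.exp(cumulative_rate)


def peril_ratio_contrast(r00: float, r10: float, r01: float, r11: float) -> float:
    """ASSUMED illustration, not the published PRISM formula.

    Divides the joint peril ratio by the product of the two single-exposure
    peril ratios; a value of 1.0 means the perils are exactly multiplicative.
    r00 is the risk with neither exposure, r11 the risk with both.
    """
    p00, p10, p01, p11 = (peril_from_risk(r) for r in (r00, r10, r01, r11))
    return (p11 / p00) / ((p10 / p00) * (p01 / p00))


if __name__ == "__main__":
    risk = 0.2
    print(peril_from_risk(risk))
    print(peril_from_odds(risk / (1.0 - risk)))
    print(peril_from_cumulative_rate(-math.log(1.0 - risk)))
```

All three conversions agree for the same underlying risk, which is why the abstract can treat them as one quantity.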
9
Abstract
Digenic inheritance (DI) is the simplest form of inheritance for genetically complex diseases. By contrast with the thousands of reports that mutations in single genes cause human diseases, there are only dozens of human disease phenotypes with evidence for DI in some pedigrees. The advent of high-throughput sequencing (HTS) has made it simpler to identify monogenic disease causes and could similarly simplify proving DI because one can simultaneously find mutations in two genes in the same sample. However, through 2012, I could find only one example of human DI in which HTS was used; in that example, HTS found only the second of the two genes. To explore the gap between expectation and reality, I tried to collect all examples of human DI with a narrow definition and characterise them according to the types of evidence collected, and whether there has been replication. Two strong trends are that knowledge of candidate genes and knowledge of protein–protein interactions (PPIs) have been helpful in most published examples of human DI. By contrast, the positional method of genetic linkage analysis has been mostly unsuccessful in identifying genes underlying human DI. Based on the empirical data, I suggest that combining HTS with growing networks of established PPIs may expedite future discoveries of human DI and strengthen the evidence for them.
10
McEachin RC, Cavalcoli JD. Overlap of genetic influences in phenotypes classically categorized as psychiatric vs medical disorders. World J Med Genet 2011; 1:4-10. [DOI: 10.5496/wjmg.v1.i1.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Psychiatric disorders have traditionally been segregated from medical disorders in terms of drugs, treatment, insurance coverage and training of clinicians. This segregation is consistent with the long-standing observation that there are inherent differences between psychiatric disorders (diseases relating to thoughts, feelings and behavior) and medical disorders (diseases relating to physical processes). However, these differences are growing less distinct as we improve our understanding of the roles of epistasis and pleiotropy in medical genetics. Both psychiatric and medical disorders are predisposed in part by genetic variation, and psychiatric disorders tend to be comorbid with medical disorders. One hypothesis on this interaction posits that certain combinations of genetic variants (epistasis) influence psychiatric disorders due to their impact on the brain, but the associated genes are also expressed in other tissues so the same groups of variants influence medical disorders (pleiotropy). The observation that psychiatric and medical disorders may interact is not novel. Equally, both epistasis and pleiotropy are fundamental concepts in medical genetics. However, we are just beginning to understand how genetic variation can influence both psychiatric and medical disorders. In our recent work, we have discovered gene networks significantly associated with psychiatric and substance use disorders. Invariably, these networks are also significantly associated with medical disorders. Recognizing how genetic variation can influence both psychiatric and medical disorders will help us to understand the etiology of the individual and comorbid disease phenotypes, predict and minimize side effects in drug and other treatments, and help to reduce stigma associated with psychiatric disorders.
11
Madsen AM, Hodge SE, Ottman R. Causal models for investigating complex disease: I. A primer. Hum Hered 2011; 72:54-62. [PMID: 21912138 DOI: 10.1159/000330779] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 04/02/2011] [Accepted: 07/11/2011] [Indexed: 12/23/2022]
Abstract
BACKGROUND/AIMS: To illustrate the utility of causal models for research in genetic epidemiology and statistical genetics. Causal models are increasingly applied in risk factor epidemiology, economics, and health policy, but seldom used in statistical genetics or genetic epidemiology. Unlike the statistical models usually used in genetic epidemiology, causal models are explicitly formulated in terms of cause and effect relationships occurring at the individual level. METHODS: We describe two causal models, the sufficient component cause model and the potential outcomes model, and show how key concepts in genetic epidemiology, including penetrance, phenocopies, genetic heterogeneity, etiologic heterogeneity, gene-gene interaction, and gene-environment interaction, can be framed in terms of these causal models. We also illustrate how potential outcomes models can provide insight into the potential for confounding and bias in the measurement of causal effects in genetic studies. RESULTS: Our analysis illustrates how causal models can elucidate the relationships among underlying causal mechanisms and measures obtained from statistical analysis of observed data. CONCLUSION: Causal models can enhance research aimed at identifying causal genes.
Affiliation(s)
- Ann M Madsen
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, USA