1
Lin WY. Searching for gene-gene interactions through variance quantitative trait loci of 29 continuous Taiwan Biobank phenotypes. Front Genet 2024; 15:1357238. [PMID: 38516378 PMCID: PMC10956579 DOI: 10.3389/fgene.2024.1357238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Accepted: 02/27/2024] [Indexed: 03/23/2024] Open
Abstract
Introduction: After the era of genome-wide association studies (GWAS), thousands of genetic variants have been identified to exhibit main effects on human phenotypes. The next critical issue would be to explore the interplay between genes, the so-called "gene-gene interactions" (GxG) or epistasis. An exhaustive search for all single-nucleotide polymorphism (SNP) pairs is not recommended because this will induce a harsh penalty of multiple testing. Limiting the search of epistasis on SNPs reported by previous GWAS may miss essential interactions between SNPs without significant marginal effects. Moreover, most methods are computationally intensive and can be challenging to implement genome-wide. Methods: I here searched for GxG through variance quantitative trait loci (vQTLs) of 29 continuous Taiwan Biobank (TWB) phenotypes. A discovery cohort of 86,536 and a replication cohort of 25,460 TWB individuals were analyzed, respectively. Results: A total of 18 nearly independent vQTLs with linkage disequilibrium measure r² < 0.01 were identified and replicated from nine phenotypes. 15 significant GxG were found with p-values <1.1E-5 (in the discovery cohort) and false discovery rates <2% (in the replication cohort). Among these 15 GxG, 11 were detected for blood traits including red blood cells, hemoglobin, and hematocrit; 2 for total bilirubin; 1 for fasting glucose; and 1 for total cholesterol (TCHO). All GxG were observed for gene pairs on the same chromosome, except for the APOA5 (chromosome 11)-TOMM40 (chromosome 19) interaction for TCHO. Discussion: This study provided a computationally feasible way to search for GxG genome-wide and applied this approach to 29 phenotypes.
Affiliation(s)
- Wan-Yu Lin
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan
- Master of Public Health Degree Program, College of Public Health, National Taiwan University, Taipei, Taiwan
2
Pudjihartono N, Fadason T, Kempa-Liehr AW, O'Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Frontiers in Bioinformatics 2022; 2:927312. [PMID: 36304293 PMCID: PMC9580915 DOI: 10.3389/fbinf.2022.927312] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Received: 04/24/2022] [Accepted: 06/03/2022] [Indexed: 01/14/2023] Open
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called “curse of dimensionality” (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most “informative” features and remove noisy “non-informative,” irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Affiliation(s)
- Tayaza Fadason
- Liggins Institute, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Andreas W. Kempa-Liehr
- Department of Engineering Science, The University of Auckland, Auckland, New Zealand
- *Correspondence: Andreas W. Kempa-Liehr; Justin M. O'Sullivan
- Justin M. O'Sullivan
- Liggins Institute, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Australian Parkinson’s Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
- *Correspondence: Andreas W. Kempa-Liehr; Justin M. O'Sullivan
3
Trinder M, Brunham LR. Polygenic scores for dyslipidemia: the emerging genomic model of plasma lipoprotein trait inheritance. Curr Opin Lipidol 2021; 32:103-111. [PMID: 33395106 DOI: 10.1097/mol.0000000000000737] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
Abstract
PURPOSE OF REVIEW: Contemporary polygenic scores, which summarize the cumulative contribution of millions of common single-nucleotide variants to a phenotypic trait, can have effects comparable to monogenic mutations. This review focuses on the emerging use of 'genome-wide' polygenic scores for plasma lipoproteins to define the etiology of clinical dyslipidemia, modify the severity of monogenic disease, and inform therapeutic options. RECENT FINDINGS: Polygenic scores for low-density lipoprotein cholesterol (LDL-C), triglycerides, and high-density lipoprotein cholesterol are associated with severe hypercholesterolemia, hypertriglyceridemia, or hypoalphalipoproteinemia, respectively. These polygenic scores for LDL-C or triglycerides associate with risk of incident coronary artery disease (CAD) independent of polygenic scores designed specifically for CAD and may identify individuals that benefit most from lipid-lowering medication. Additionally, the severity of hypercholesterolemia and CAD associated with familial hypercholesterolemia, a common monogenic disorder, is modified by these polygenic factors. The current focus of polygenic scores for dyslipidemia is to design predictive polygenic scores for diverse populations and to determine how these polygenic scores could be implemented and standardized for use in the clinic. SUMMARY: Polygenic scores have shown early promise for the management of dyslipidemias, but several challenges need to be addressed before widespread clinical implementation to ensure that potential benefits are robust and reproducible, equitable, and cost-effective.
Affiliation(s)
- Mark Trinder
- Centre for Heart Lung Innovation, University of British Columbia
- Experimental Medicine Program, University of British Columbia
- Liam R Brunham
- Centre for Heart Lung Innovation, University of British Columbia
- Experimental Medicine Program, University of British Columbia
- Department of Medicine, University of British Columbia
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
4
Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits. Biophys Rev 2018; 10:1053-1060. [PMID: 29934864 PMCID: PMC6082306 DOI: 10.1007/s12551-018-0435-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/18/2018] [Accepted: 06/13/2018] [Indexed: 12/31/2022] Open
Abstract
Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing technology nowadays offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
5
Ritchie MD, Van Steen K. The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation. Annals of Translational Medicine 2018; 6:157. [PMID: 29862246 DOI: 10.21037/atm.2018.04.05] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
One of the primary goals in this era of precision medicine is to understand the biology of human diseases and their treatment, such that each individual patient receives the best possible treatment for their disease based on their genetic and environmental exposures. One way to work towards achieving this goal is to identify the environmental exposures and genetic variants that are relevant to each disease in question, as well as the complex interplay between genes and environment. Genome-wide association studies (GWAS) have allowed for a greater understanding of the genetic component of many complex traits. However, these genetic effects are largely small and thus, our ability to use these GWAS findings for precision medicine is limited. As more and more GWAS have been performed, rather than focusing only on common single nucleotide polymorphisms (SNPs) and additive genetic models, many researchers have begun to explore alternative heritable components of complex traits including rare variants, structural variants, epigenetics, and genetic interactions. While genetic interactions are a plausible reality that could explain some of the heritability that has not yet been identified, especially when one considers the identification of genetic interactions in model organisms as well as our understanding of biological complexity, still there are significant challenges and considerations in identifying these genetic interactions. Broadly, these can be summarized in three categories: abundance of methods, practical considerations, and biological interpretation. In this review, we will discuss these important elements in the search for genetic interactions along with some potential solutions. While genetic interactions are theoretically understood to be important for complex human disease, the body of evidence is still building to support this component of the underlying genetic architecture of complex human traits. Our hope is that more sophisticated modeling approaches and more robust computational techniques will enable the community to identify these important genetic interactions and improve our ability to implement precision medicine in the future.
Affiliation(s)
- Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Kristel Van Steen
- WELBIO, GIGA-R Medical Genomics Unit - BIO3, University of Liège, Liège, Belgium; Department of Human Genetics, University of Leuven, Leuven, Belgium
6
Hall MA, Moore JH, Ritchie MD. Embracing Complex Associations in Common Traits: Critical Considerations for Precision Medicine. Trends Genet 2017; 32:470-484. [PMID: 27392675 DOI: 10.1016/j.tig.2016.06.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/04/2016] [Revised: 06/01/2016] [Accepted: 06/02/2016] [Indexed: 10/21/2022]
Abstract
Genome-wide association studies (GWAS) have identified numerous loci associated with human phenotypes. This approach, however, does not consider the richly diverse and complex environment with which humans interact throughout the life course, nor does it allow for interrelationships between genetic loci and across traits. As we move toward making precision medicine a reality, whereby we make predictions about disease risk based on genomic profiles, we need to identify improved predictive models of the relationship between genome and phenome. Methods that embrace pleiotropy (the effect of one locus on more than one trait), and gene-environment (G×E) and gene-gene (G×G) interactions, will further unveil the impact of alterations in biological pathways and identify genes that are only involved with disease in the context of the environment. This valuable information can be used to assess personal risk and choose the most appropriate medical interventions based on the genotype and environment of an individual, the whole premise of precision medicine.
Affiliation(s)
- Molly A Hall
- Institute for Biomedical Informatics, Departments of Genetics and Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, 3535 Market Street, Philadelphia, PA 19104, USA
- Jason H Moore
- Institute for Biomedical Informatics, Departments of Genetics and Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, 3535 Market Street, Philadelphia, PA 19104, USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA; Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, PA, USA.
7
Qian L, Hickey LT, Stahl A, Werner CR, Hayes B, Snowdon RJ, Voss-Fels KP. Exploring and Harnessing Haplotype Diversity to Improve Yield Stability in Crops. Frontiers in Plant Science 2017; 8:1534. [PMID: 28928764 PMCID: PMC5591830 DOI: 10.3389/fpls.2017.01534] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 07/07/2017] [Accepted: 08/22/2017] [Indexed: 05/19/2023]
Abstract
In order to meet future food, feed, fiber, and bioenergy demands, global yields of all major crops need to be increased significantly. At the same time, the increasing frequency of extreme weather events such as heat and drought necessitates improvements in the environmental resilience of modern crop cultivars. Achieving sustainably increased yields implies rapid improvement of quantitative traits with a very complex genetic architecture and strong environmental interaction. The latest advances in genome analysis technologies today provide molecular information at an ultrahigh resolution, revolutionizing crop genomic research, and paving the way for advanced quantitative genetic approaches. These include highly detailed assessment of population structure and genotypic diversity, facilitating the identification of selective sweeps and signatures of directional selection, dissection of genetic variants that underlie important agronomic traits, and genomic selection (GS) strategies that not only consider major-effect genes. Single-nucleotide polymorphism (SNP) markers today represent the genotyping system of choice for crop genetic studies because they occur abundantly in plant genomes and are easy to detect. SNPs are typically biallelic, however, hence their information content compared to multiallelic markers is low, limiting the resolution at which SNP-trait relationships can be delineated. An efficient way to overcome this limitation is to construct haplotypes based on linkage disequilibrium, one of the most important features influencing genetic analyses of crop genomes. Here, we give an overview of the latest advances in genomics-based haplotype analyses in crops, highlighting their importance in the context of polyploidy and genome evolution, linkage drag, and co-selection. We provide examples of how haplotype analyses can complement well-established quantitative genetics frameworks, such as quantitative trait analysis and GS, ultimately providing an effective tool to equip modern crops with environment-tailored characteristics.
Affiliation(s)
- Lunwen Qian
- Collaborative Innovation Center of Grain and Oil Crops in South China, Hunan Agricultural University, Changsha, China
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany
- Lee T. Hickey
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Andreas Stahl
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany
- Christian R. Werner
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany
- Ben Hayes
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Rod J. Snowdon
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany
- Kai P. Voss-Fels
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Giessen, Germany
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
8
Mitra I, Lavillaureix A, Yeh E, Traglia M, Tsang K, Bearden CE, Rauen KA, Weiss LA. Reverse Pathway Genetic Approach Identifies Epistasis in Autism Spectrum Disorders. PLoS Genet 2017; 13:e1006516. [PMID: 28076348 PMCID: PMC5226683 DOI: 10.1371/journal.pgen.1006516] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 09/01/2016] [Accepted: 12/01/2016] [Indexed: 02/08/2023] Open
Abstract
Although gene-gene interaction, or epistasis, plays a large role in complex traits in model organisms, genome-wide by genome-wide searches for two-way interaction have limited power in human studies. We thus used knowledge of a biological pathway in order to identify a contribution of epistasis to autism spectrum disorders (ASDs) in humans, a reverse-pathway genetic approach. Based on previous observation of increased ASD symptoms in Mendelian disorders of the Ras/MAPK pathway (RASopathies), we showed that common SNPs in RASopathy genes show enrichment for association signal in GWAS (P = 0.02). We then screened genome-wide for interactors with RASopathy gene SNPs and showed strong enrichment in ASD-affected individuals (P < 2.2 × 10⁻¹⁶), with a number of pairwise interactions meeting genome-wide criteria for significance. Finally, we utilized quantitative measures of ASD symptoms in RASopathy-affected individuals to perform modifier mapping via GWAS. One top region overlapped between these independent approaches, and we showed dysregulation of a gene in this region, GPR141, in a RASopathy neural cell line. We thus used orthogonal approaches to provide strong evidence for a contribution of epistasis to ASDs, confirm a role for the Ras/MAPK pathway in idiopathic ASDs, and to identify a convergent candidate gene that may interact with the Ras/MAPK pathway.
Affiliation(s)
- Ileena Mitra
- Department of Psychiatry, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Alinoë Lavillaureix
- Department of Psychiatry, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Université Paris Descartes, Sorbonne Paris Cité, Faculty of Medicine, Paris, France
- Erika Yeh
- Department of Psychiatry, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Michela Traglia
- Department of Psychiatry, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Kathryn Tsang
- Department of Psychiatry, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Carrie E. Bearden
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America
- Department of Psychology, University of California Los Angeles, Los Angeles, California, United States of America
- Katherine A. Rauen
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Department of Pediatrics, School of Medicine, University of California San Francisco, San Francisco, California, United States of America
- Lauren A. Weiss
- Department of Psychiatry, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
9
Niel C, Sinoquet C, Dina C, Rocheleau G. A survey about methods dedicated to epistasis detection. Front Genet 2015; 6:285. [PMID: 26442103 PMCID: PMC4564769 DOI: 10.3389/fgene.2015.00285] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 05/13/2015] [Accepted: 08/27/2015] [Indexed: 12/25/2022] Open
Abstract
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection gives rise to analytic challenges since analyzing every SNP combination is at present impractical at a genome-wide scale. In this review, we will present the main strategies recently proposed to detect epistatic interactions, along with their operating principle. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; some are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
Affiliation(s)
- Clément Niel
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, Ecole Polytechnique de l'Université de Nantes, Nantes, France
- Christine Sinoquet
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, University of Nantes, Nantes, France
- Christian Dina
- Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes, Nantes, France
- Ghislain Rocheleau
- European Genomic Institute for Diabetes FR3508, Centre National de la Recherche Scientifique UMR 8199, Lille 2 University, Lille, France