1
|
Kim W, Kim S, Na MH, Kim Y. A modified least angle regression algorithm for interaction selection with heredity. Stat Anal Data Min 2022. [DOI: 10.1002/sam.11577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Woosung Kim
- Samsung Electronics, Suwon‐si, Gyeonggi‐do Korea
| | - Seonghyeon Kim
- Department of Statistics Seoul National University, Seoul Korea
| | - Myung Hwan Na
- Department of Mathematics and Statistics Chonnam National University, Gwangju Korea
| | - Yongdai Kim
- Department of Statistics Seoul National University, Seoul Korea
| |
|
2
|
Yang L, Qu Q, Hao Z, Sha K, Li Z, Li S. Powerful Identification of Large Quantitative Trait Loci Using Genome-wide R/glmnet-Based Regression. J Hered 2022; 113:472-478. [PMID: 35134967 DOI: 10.1093/jhered/esac006] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/11/2021] [Accepted: 02/02/2022] [Indexed: 11/14/2022] Open
Abstract
R/glmnet has been successfully applied to the joint mapping of multiple quantitative trait loci in linkage analysis, along with statistical inference for quantitative trait loci candidates with non-zero genetic effects using R/lm for normally distributed traits, R/glm for discrete traits, and R/coxph for survival times. In this study, we extended R/glmnet to genome-wide association studies by means of parallel computation. A multi-locus genome-wide association study for high-throughput single nucleotide polymorphisms was implemented in the "Multi-Runking" software written within the R workspace. This software detects common and large quantitative trait nucleotides better, and estimates their effects more accurately, than genome-wide mixed model analysis of one single nucleotide polymorphism at a time or linear mixed models with the least absolute shrinkage and selection operator. Its applicability and utility were demonstrated by multi-locus genome-wide association studies of simulated and real traits, including normally distributed traits, binary traits, and survival times.
Affiliation(s)
- Li'ang Yang
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
| | - Qiannan Qu
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
| | - Zhiyu Hao
- College of Animal Science and Technology, Northeast Agricultural University, Harbin 150030, China
| | - Ke Sha
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
| | - Ziyu Li
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
| | - Shuling Li
- College of Life Science, Northeast Agricultural University, Harbin 150030, China
| |
|
3
|
Baison J, Zhou L, Forsberg N, Mörling T, Grahn T, Olsson L, Karlsson B, Wu HX, Mellerowicz EJ, Lundqvist SO, García-Gil MR. Genetic control of tracheid properties in Norway spruce wood. Sci Rep 2020; 10:18089. [PMID: 33093525 PMCID: PMC7581746 DOI: 10.1038/s41598-020-72586-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/22/2019] [Accepted: 09/03/2020] [Indexed: 01/20/2023] Open
Abstract
Genome-wide association study (GWAS) mapping makes it possible to establish the genetic basis of phenotypic trait variation. Our GWAS presents the first such effort in Norway spruce (Picea abies (L.) Karst.) for traits related to wood tracheid characteristics. The study employed an exome capture genotyping approach that generated 178 101 Single Nucleotide Polymorphisms (SNPs) from 40 018 probes within a population of 517 Norway spruce mother trees. We applied a least absolute shrinkage and selection operator (LASSO) based association mapping method using a functional multi-locus mapping approach, with a stability selection probability method as the hypothesis testing approach to determine significant Quantitative Trait Loci (QTLs). The analysis provided 30 significant associations, the majority of which show specific expression in wood-forming tissues or high ubiquitous expression, potentially controlling tracheid dimensions, cell wall thickness and microfibril angle. Among the most promising candidates based on our results and prior information for other species are Picea abies BIG GRAIN 2 (PabBG2), with a predicted function in auxin transport and sensitivity, and MA_373300g0010, encoding a protein similar to wall-associated receptor kinases; both were associated with cell wall thickness. The results demonstrate the feasibility of GWAS to identify novel candidate genes controlling industrially relevant tracheid traits in Norway spruce.
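The combination of LASSO and stability selection mentioned in this abstract can be illustrated with a small, generic sketch. It is not the authors' functional multi-locus method: the coordinate-descent solver, the simulated genotypes, the penalty `lam`, and all function names below are assumptions chosen only to show how selection frequencies over random half-samples are obtained.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; this is what makes LASSO sparse."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(rows, y, lam, n_iter=50):
    """LASSO by cyclic coordinate descent on centred data.

    Minimises (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    X = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    resid = [v - y_mean for v in y]  # residual while every coefficient is zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:  # marker is monomorphic in this subsample
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_frequencies(rows, y, lam, n_subsamples=50, seed=0):
    """Fraction of random half-samples in which each marker gets a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = lasso_cd([rows[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


def simulate(n=60, p=8, seed=1):
    """Toy data: genotypes coded 0/1/2, with only marker 0 affecting the trait."""
    rng = random.Random(seed)
    rows = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    y = [1.5 * r[0] + rng.gauss(0.0, 0.3) for r in rows]
    return rows, y
```

Calling `selection_frequencies(*simulate(), lam=0.3)` returns one frequency per marker; markers whose frequency exceeds an analyst-chosen cut-off would be reported as significant. The cut-off itself is a separate modelling decision that this sketch does not make.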
Affiliation(s)
- J Baison
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
| | - Linghua Zhou
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
| | - Nils Forsberg
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
| | - Tommy Mörling
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
| | - Thomas Grahn
- RISE Bioeconomy, Box 5604, 114 86, Stockholm, Sweden
| | - Lars Olsson
- RISE Bioeconomy, Box 5604, 114 86, Stockholm, Sweden
| | - Bo Karlsson
- Skogforsk, Ekebo 2250, 268 90, Svalov, Sweden
| | - Harry X Wu
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
| | - Ewa J Mellerowicz
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden
| | - Sven-Olof Lundqvist
- RISE Bioeconomy, Box 5604, 114 86, Stockholm, Sweden
- IIC, Rosenlundsgatan 48B, 11863, Stockholm, Sweden
| | - María Rosario García-Gil
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Umeå, Sweden.
| |
|
4
|
Identification of a Set of Genes Improving Survival Prediction in Kidney Renal Clear Cell Carcinoma through Integrative Reanalysis of Transcriptomic Data. Dis Markers 2020; 2020:8824717. [PMID: 33110456 PMCID: PMC7578724 DOI: 10.1155/2020/8824717] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/19/2020] [Revised: 09/19/2020] [Accepted: 09/24/2020] [Indexed: 11/18/2022]
Abstract
Background: With an enormous amount of research concerning kidney cancer being conducted, various treatments have been applied to its cure. However, high recurrence and metastasis rates continue to pose a threat to the survival of patients with kidney renal clear cell carcinoma (KIRC). Methods: Data from The Cancer Genome Atlas were downloaded, and a series of analyses were performed, including differential analysis, Cox analysis, weighted gene coexpression network analysis, least absolute shrinkage and selection operator analysis, multivariate Cox analysis, survival analysis, and receiver operating characteristic curve and functional enrichment analysis. Results: A total of 5,777 differentially expressed genes were identified from the differential analysis. The Cox analysis showed 1,853 significant genes (P < 0.01). Weighted gene coexpression network analysis revealed that 226 genes in the module were related to clinical parameters, including Tumor-Node-Metastasis (TNM) staging. Least absolute shrinkage and selection operator and multivariate Cox analyses suggested that four genes (CDKL2, LRFN1, STAT2, and SOWAHB) had a potential function in predicting the survival time of patients with KIRC. Survival analysis uncovered that a high risk of these four genes was associated with an unfavorable prognosis. Receiver operating characteristic curve analysis further confirmed the accuracy of the risk score model. The analysis of clinicopathological parameters of the four identified genes revealed that they were associated with the progression of KIRC. Conclusion: The gene expression model consisting of CDKL2, LRFN1, STAT2, and SOWAHB is a promising tool for predicting the prognosis of patients with KIRC. The results of this study may provide insights into the diagnosis and treatment of KIRC.
|
5
|
Elfstrand M, Baison J, Lundén K, Zhou L, Vos I, Capador HD, Åslund MS, Chen Z, Chaudhary R, Olson Å, Wu HX, Karlsson B, Stenlid J, García-Gil MR. Association genetics identifies a specifically regulated Norway spruce laccase gene, PaLAC5, linked to Heterobasidion parviporum resistance. Plant Cell Environ 2020; 43:1779-1791. [PMID: 32276288 DOI: 10.1111/pce.13768] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/07/2019] [Revised: 03/21/2020] [Accepted: 03/28/2020] [Indexed: 06/11/2023]
Abstract
It is important to improve the understanding of the interactions between trees and pathogens and to integrate this knowledge about disease resistance into tree breeding programs. The conifer Norway spruce (Picea abies) is an important species for the forest industry in Europe. Its major pathogen is Heterobasidion parviporum, which causes stem and root rot. In this study, we identified 11 Norway spruce QTLs (quantitative trait loci) that correlate with variation in resistance to H. parviporum in a population of 466 trees by association genetics. Individual QTLs explained between 2.1 and 5.2% of the phenotypic variance. The expression of candidate genes associated with the QTLs was analysed in silico and in response to H. parviporum, hypothesizing that (a) candidate genes linked to control of fungal sapwood growth are more commonly expressed in sapwood, and (b) candidate genes associated with induced defences respond to H. parviporum inoculation. The Norway spruce laccase PaLAC5, associated with control of lesion length development, is likely to be involved in the induced defences. Expression analyses showed that PaLAC5 responds specifically and strongly in close proximity to the H. parviporum inoculation. Thus, PaLAC5 may be associated with the lignosuberized boundary zone formation in bark adjacent to the inoculation site.
Affiliation(s)
- Malin Elfstrand
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - John Baison
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Karl Lundén
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Linghua Zhou
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | | | - Hernan Dario Capador
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Matilda Stein Åslund
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Zhiqiang Chen
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Rajiv Chaudhary
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åke Olson
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Harry X Wu
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | | | - Jan Stenlid
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - María Rosario García-Gil
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| |
|
6
|
Elfstrand M, Zhou L, Baison J, Olson Å, Lundén K, Karlsson B, Wu HX, Stenlid J, García‐Gil MR. Genotypic variation in Norway spruce correlates to fungal communities in vegetative buds. Mol Ecol 2020; 29:199-213. [PMID: 31755612 PMCID: PMC7003977 DOI: 10.1111/mec.15314] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/25/2019] [Revised: 10/31/2019] [Accepted: 11/20/2019] [Indexed: 12/19/2022]
Abstract
The taxonomically diverse phyllosphere fungi inhabit leaves of plants. Thus, apart from the fungi's dispersal capacities and environmental factors, the assembly of the phyllosphere community associated with a given host plant depends on factors encoded by the host's genome. The host genetic factors and their influence on the assembly of phyllosphere communities under natural conditions are poorly understood, especially in trees. Recent work indicates that Norway spruce (Picea abies) vegetative buds harbour active fungal communities, but these are hitherto largely uncharacterized. This study combines internal transcribed spacer sequencing of the fungal communities associated with dormant vegetative buds with a genome-wide association study (GWAS) in 478 unrelated Norway spruce trees. The aim was to detect host loci associated with variation in the fungal communities across the population, and to identify loci correlating with the presence of specific, latent, pathogens. The fungal communities were dominated by known Norway spruce phyllosphere endophytes and pathogens. We identified six quantitative trait loci (QTLs) associated with the relative abundance of the dominating taxa (i.e., top 1% most abundant taxa). Three additional QTLs associated with colonization by the spruce needle cast pathogen Lirula macrospora or the cherry spruce rust (Thekopsora areolata) in asymptomatic tissues were detected. The identification of the nine QTLs shows that the genetic variation in Norway spruce influences the fungal community in dormant buds and that mechanisms underlying the assembly of the communities and the colonization of latent pathogens in trees may be uncovered by combining molecular identification of fungi with GWAS.
Affiliation(s)
- Malin Elfstrand
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Linghua Zhou
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - John Baison
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Åke Olson
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Karl Lundén
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | | | - Harry X. Wu
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| | - Jan Stenlid
- Uppsala Biocentre, Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - M. Rosario García‐Gil
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
| |
|
7
|
Baison J, Vidalis A, Zhou L, Chen Z, Li Z, Sillanpää MJ, Bernhardsson C, Scofield D, Forsberg N, Grahn T, Olsson L, Karlsson B, Wu H, Ingvarsson PK, Lundqvist S, Niittylä T, García‐Gil MR. Genome-wide association study identified novel candidate loci affecting wood formation in Norway spruce. Plant J 2019; 100:83-100. [PMID: 31166032 PMCID: PMC6852177 DOI: 10.1111/tpj.14429] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 03/01/2019] [Revised: 04/16/2019] [Accepted: 05/20/2019] [Indexed: 05/26/2023]
Abstract
Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional genome-wide association study (GWAS) of 17 wood traits in Norway spruce using 178 101 single nucleotide polymorphisms (SNPs) generated from exome genotyping of 517 mother trees. The wood traits were defined using functional modelling of wood properties across annual growth rings. We applied a Least Absolute Shrinkage and Selection Operator (LASSO)-based association mapping method using a functional multilocus mapping approach that utilizes latent traits, with a stability selection probability method as the hypothesis testing approach to determine a significant quantitative trait locus. The analysis provided 52 significant SNPs from 39 candidate genes, including genes previously implicated in wood formation and tree growth in spruce and other species. Our study represents a multilocus GWAS for complex wood traits in Norway spruce. The results advance our understanding of the genetics influencing wood traits and identify candidate genes for future functional studies.
Affiliation(s)
- John Baison
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| | - Amaryllis Vidalis
- Section of Population Epigenetics and Epigenomics, Centre of Life and Food Sciences Weihenstephan, Technische Universität München, Lichtenbergstr. 2a, München, 85748, Germany
| | - Linghua Zhou
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| | - Zhi‐Qiang Chen
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| | - Zitong Li
- Ecological Genetics Research Unit, Department of Biosciences, University of Helsinki, P.O. Box 65, FI‐00014, Helsinki, Finland
| | - Mikko J. Sillanpää
- Department of Mathematical Sciences, Biocenter Oulu, University of Oulu, Pentti Kaiteran katu 1, Oulu, Finland
| | - Carolina Bernhardsson
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
- Department of Ecology and Environmental Science, Umeå University, Linnaeus väg 4-6, Umeå, 907 36, Sweden
| | - Douglas Scofield
- Uppsala Multidisciplinary Centre for Advanced Computational Science, Uppsala University, Lägerhyddsvägen 2, Uppsala, 752 37, Sweden
| | - Nils Forsberg
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| | - Thomas Grahn
- RISE Bioeconomy, Drottning Kristinas väg 61, SE‐114 86, Stockholm, Sweden
| | - Lars Olsson
- RISE Bioeconomy, Drottning Kristinas väg 61, SE‐114 86, Stockholm, Sweden
| | | | - Harry Wu
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| | - Pär K. Ingvarsson
- Department of Ecology and Environmental Science, Umeå University, Linnaeus väg 4-6, Umeå, 907 36, Sweden
- Department of Ecology and Genetics: Evolutionary Biology, Uppsala University, Kåbovägen 4, Uppsala, 752 36, Sweden
| | - Sven‐Olof Lundqvist
- RISE Bioeconomy, Drottning Kristinas väg 61, SE‐114 86, Stockholm, Sweden
- IIC, Rosenlundsgatan 48B, SE‐118 63, Stockholm, Sweden
| | - Totte Niittylä
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| | - M Rosario García‐Gil
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Science, Parallellvägen 21, Umeå, 907 36, Sweden
| |
|
8
|
Estimation of parameters on Texas reservoirs using least absolute shrinkage and selection operator. ACTA ACUST UNITED AC 2019. [DOI: 10.1007/s42108-019-00018-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|
9
|
Braz CU, Taylor JF, Bresolin T, Espigolan R, Feitosa FLB, Carvalheiro R, Baldi F, de Albuquerque LG, de Oliveira HN. Sliding window haplotype approaches overcome single SNP analysis limitations in identifying genes for meat tenderness in Nelore cattle. BMC Genet 2019; 20:8. [PMID: 30642245 PMCID: PMC6332854 DOI: 10.1186/s12863-019-0713-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 08/09/2018] [Accepted: 01/02/2019] [Indexed: 12/30/2022] Open
Abstract
Background: Traditional single nucleotide polymorphism (SNP) genome-wide association analysis (GWAA) can be inefficient because single SNPs provide limited genetic information about genomic regions. On the other hand, using haplotypes in the statistical analysis may increase the extent of linkage disequilibrium (LD) between haplotypes and causal variants and may also potentially capture epistatic interactions between variants within a haplotyped locus, providing an increase in the power and robustness of the association studies. We performed GWAA (413,355 SNP markers) using haplotypes based on variable-sized sliding windows and compared the results to a single-SNP GWAA using Warner-Bratzler shear force measured in the longissimus thoracis muscle of 3161 Nelore bulls to ascertain the optimal window size for identifying the genomic regions that influence meat tenderness. Results: The GWAA using single SNPs identified eight variants influencing meat tenderness on BTA 3, 4, 9, 10 and 11. However, thirty-three putative meat tenderness QTL were detected on BTA 1, 3, 4, 5, 8, 9, 10, 11, 15, 17, 18, 24, 25, 26 and 29 using variable-sized sliding haplotype windows. Analyses using sliding window haplotypes of 3, 5, 7, 9 and 11 SNPs identified 57, 61, 42, 39, and 21% of all thirty-three putative QTL regions, respectively; however, the analyses using the 3 and 5 SNP haplotypes cumulatively detected 88% of the putative QTL. The genes associated with variation in meat tenderness participate in myogenesis, neurogenesis, lipid and fatty acid metabolism and skeletal muscle structure or composition processes. Conclusions: GWAA using haplotypes based on variable-sized sliding windows allowed the detection of more QTL than traditional single-SNP GWAA. Analyses using smaller haplotypes (3 and 5 SNPs) detected a higher proportion of the putative QTL.
Electronic supplementary material The online version of this article (10.1186/s12863-019-0713-4) contains supplementary material, which is available to authorized users.
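The sliding-window haplotype idea in this abstract can be sketched as follows. This is an illustration only, assuming phased haplotypes encoded as strings of `0`/`1` alleles; the function names and the encoding are hypothetical and are not taken from the study's software.

```python
from collections import Counter


def sliding_windows(haplotype, size, step=1):
    """Return every window of `size` consecutive SNP alleles from one phased haplotype."""
    return [haplotype[i:i + size] for i in range(0, len(haplotype) - size + 1, step)]


def haplotype_dosages(individuals, size):
    """Count copies (1 or 2) of each window haplotype carried by each individual.

    `individuals` is a list of (maternal, paternal) allele strings of equal length.
    The result has one entry per window; each entry holds one Counter per individual.
    """
    n_windows = len(individuals[0][0]) - size + 1
    result = []
    for start in range(n_windows):
        per_individual = []
        for maternal, paternal in individuals:
            per_individual.append(
                Counter([maternal[start:start + size], paternal[start:start + size]])
            )
        result.append(per_individual)
    return result
```

For example, `sliding_windows("01101", 3)` yields `["011", "110", "101"]`. The per-window dosage counts could then serve as predictors in an association model, in place of single-SNP genotypes.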
Affiliation(s)
- Camila U Braz
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil.
| | - Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Tiago Bresolin
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil
| | - Rafael Espigolan
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil
| | - Fabieli L B Feitosa
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil
| | - Roberto Carvalheiro
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil
| | - Fernando Baldi
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil
| | - Lucia G de Albuquerque
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil
| | - Henrique N de Oliveira
- Animal Science Department, São Paulo State University (Unesp), Jaboticabal, SP, 144884-900, Brazil.
| |
|
10
|
Simon PHG, Sylvestre MP, Tremblay J, Hamet P. Key Considerations and Methods in the Study of Gene-Environment Interactions. Am J Hypertens 2016; 29:891-9. [PMID: 27037711 DOI: 10.1093/ajh/hpw021] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/18/2015] [Accepted: 02/08/2016] [Indexed: 12/16/2022] Open
Abstract
With increased involvement of genetic data in most epidemiological investigations, gene-environment (G × E) interactions now stand as a topic, which must be meticulously assessed and thoroughly understood. The level, mode, and outcomes of interactions between environmental factors and genetic traits have the capacity to modulate disease risk. These must, therefore, be carefully evaluated as they have the potential to offer novel insights on the "missing heritability problem", reaching beyond our current limitations. First, we review a definition of G × E interactions. We then explore how concepts such as the early manifestation of the genetic components of a disease, the heterogeneity of complex traits, the clear definition of epidemiological strata, and the effect of varying physiological conditions can affect our capacity to detect (or miss) G × E interactions. Lastly, we discuss the shortfalls of regression models to study G × E interactions and how other methods such as the ReliefF algorithm, pattern recognition methods, or the LASSO (Least Absolute Shrinkage and Selection Operator) method can enable us to more adequately model G × E interactions. Overall, we present the elements to consider and a path to follow when studying genetic determinants of disease in order to uncover potential G × E interactions.
Affiliation(s)
- Paul H G Simon
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Marie-Pierre Sylvestre
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Johanne Tremblay
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Pavel Hamet
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada.
| |
|
11
|
Sariyar M, Hoffmann I, Binder H. Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data. BMC Bioinformatics 2014; 15:58. [PMID: 24571520 PMCID: PMC3945780 DOI: 10.1186/1471-2105-15-58] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/22/2013] [Accepted: 01/28/2014] [Indexed: 11/23/2022] Open
Abstract
Background: Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to incorporate interactions into such prediction models. In this feasibility study, we present building blocks for evaluating and incorporating interaction terms in high-dimensional time-to-event settings, especially for settings in which it is computationally too expensive to check all possible interactions. Results: We use a boosting technique for estimation of effects and the following building blocks for pre-selecting interactions: (1) resampling, (2) random forests and (3) orthogonalization as a data pre-processing step. In a simulation study, the strategy that uses all building blocks is able to detect true main effects and interactions with high sensitivity in different kinds of scenarios. The main challenge is interactions composed of variables that do not represent main effects, but our findings are also promising in this regard. Results on real world data illustrate that effect sizes of interactions frequently may not be large enough to improve prediction performance, even though the interactions are potentially of biological relevance. Conclusion: Screening interactions through random forests is feasible and useful when one is interested in finding relevant two-way interactions. The other building blocks also contribute considerably to an enhanced pre-selection of interactions. We determined the limits of interaction detection in terms of necessary effect sizes. Our study emphasizes the importance of making full use of existing methods in addition to establishing new ones.
Affiliation(s)
- Murat Sariyar
- Institute of Medical Biostatistics, Epidemiology and Informatics, Medical Center of the Johannes Gutenberg University, Mainz 55131, Germany.
| | | | | |
|
12
|
Li J, Dan J, Li C, Wu R. A model-free approach for detecting interactions in genetic association studies. Brief Bioinform 2013; 15:1057-68. [PMID: 24273216 DOI: 10.1093/bib/bbt082] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/22/2023] Open
Abstract
Over the past few decades, genome-wide association studies analyzed by efficient statistical procedures have successfully identified single-nucleotide polymorphisms (SNPs) that are associated with complex traits or human diseases. However, due to the overwhelming number of SNPs, most approaches have focused on an additive genetic model without genome-wide SNP-SNP interactions. In this study, we propose an efficient statistical procedure in a genetic model-free framework for detecting SNPs exhibiting main genetic effects as well as epistatic interactions. Specifically, the association between phenotype and genotype is characterized by an unknown function to be estimated using nonparametric techniques, and a two-stage non-parametric independence screening procedure is proposed to sequentially identify potentially important main genetic effects and interactions. Finally, the subset of genetic predictors implied by two-stage non-parametric independence screening is analyzed by penalized regressions such as LASSO, and a final model is identified. In this framework, no specific genetic model is assumed and interactions are not restricted to marginally important SNPs. Therefore, SNPs that are involved in genetic regulatory networks but missed by previous studies are expected to be recognized. In simulation studies, we show that the procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as SNP-SNP interactions. A real data analysis further indicates the importance of epistatic interactions in explaining body mass index.
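The screen-first structure described in this abstract can be illustrated with a simplified sketch. It is not the authors' procedure: absolute Pearson correlation stands in here for the nonparametric screening utility, the final penalized regression step is omitted, and the function names and the `keep_main`/`keep_inter` parameters are hypothetical.

```python
from itertools import combinations
from math import sqrt


def abs_corr(u, v):
    """Absolute Pearson correlation, returning 0.0 when either input is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    if s_uu == 0 or s_vv == 0:
        return 0.0
    return abs(s_uv) / sqrt(s_uu * s_vv)


def two_stage_screen(columns, y, keep_main, keep_inter):
    """Stage 1 ranks single predictors; stage 2 ranks pairwise products over all pairs.

    Pairs are not limited to predictors that survived stage 1, so an interaction
    between two marginally unimportant predictors can still be retained.
    """
    main_scores = {j: abs_corr(col, y) for j, col in enumerate(columns)}
    main = sorted(main_scores, key=main_scores.get, reverse=True)[:keep_main]

    inter_scores = {}
    for j, k in combinations(range(len(columns)), 2):
        product = [a * b for a, b in zip(columns[j], columns[k])]
        inter_scores[(j, k)] = abs_corr(product, y)
    inter = sorted(inter_scores, key=inter_scores.get, reverse=True)[:keep_inter]
    return main, inter


# Toy balanced design: predictors 0 and 1 act only through their product.
x0 = [-1, -1, 1, 1, -1, -1, 1, 1]
x1 = [-1, 1, -1, 1, -1, 1, -1, 1]
x2 = [-1, -1, -1, -1, 1, 1, 1, 1]
y = [a * b + 2 * c for a, b, c in zip(x0, x1, x2)]
main, inter = two_stage_screen([x0, x1, x2], y, keep_main=1, keep_inter=1)
```

In this toy design predictors 0 and 1 have zero marginal correlation with `y`, yet their product is the only pair retained. The retained main effects and interactions would then be passed to a penalized regression such as LASSO.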
|
13
|
Yang W, Gu C. A whole-genome simulator capable of modeling high-order epistasis for complex disease. Genet Epidemiol 2013; 37:686-94. [PMID: 24114848 PMCID: PMC4143152 DOI: 10.1002/gepi.21761] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/10/2013] [Revised: 08/09/2013] [Accepted: 08/14/2013] [Indexed: 11/10/2022]
Abstract
Genome-wide association studies (GWAS) have been successful in finding numerous new risk variants for complex diseases, but the results almost exclusively rely on single-marker scans. Methods that can analyze joint effects of many variants in GWAS data are still being developed and trialed. To evaluate the performance of such methods it is essential to have a GWAS data simulator that can rapidly simulate a large number of samples and capture key features of real GWAS data, such as linkage disequilibrium (LD) among single-nucleotide polymorphisms (SNPs) and joint effects of multiple loci (multilocus epistasis). In the current study, we combine techniques for specifying high-order epistasis among risk SNPs with an existing program, GWAsimulator [Li and Li, 2008], to achieve rapid whole-genome simulation with accurate modeling of complex interactions. We considered various approaches to specifying interaction models, including the following: departure from the product of marginal effects for pairwise interactions, product terms in logistic regression models for low-order interactions, and penetrance tables conforming to marginal effect constraints for high-order interactions or prescribing known biological interactions. Methods for conversion among different model specifications are developed using the penetrance table as the fundamental characterization of disease models. The new program, called simGWA, is capable of efficiently generating large samples of GWAS data with high precision. We show that data simulated by simGWA are faithful to template LD structures and conform to prespecified disease models with (or without) interactions.
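One of the conversions mentioned in this abstract, from a logistic regression model with a product term to a penetrance table, can be sketched for two loci as below. The coefficient names and functions are hypothetical illustrations and do not reflect simGWA's actual interface.

```python
import math


def penetrance_table(b0, b1, b2, b12):
    """3 x 3 table of P(disease | g1, g2) for genotypes coded 0, 1, 2.

    Assumes logit(P) = b0 + b1*g1 + b2*g2 + b12*g1*g2.
    """
    def expit(z):
        return 1.0 / (1.0 + math.exp(-z))

    return [
        [expit(b0 + b1 * g1 + b2 * g2 + b12 * g1 * g2) for g2 in range(3)]
        for g1 in range(3)
    ]


def interaction_odds_ratio(table):
    """Departure from the product of marginal odds ratios at genotype (1, 1).

    A value of 1 means no interaction on the odds scale; under the logistic
    model above it equals exp(b12).
    """
    def odds(p):
        return p / (1.0 - p)

    return (odds(table[1][1]) * odds(table[0][0])) / (
        odds(table[1][0]) * odds(table[0][1])
    )
```

With coefficients such as `penetrance_table(-2.0, 0.3, 0.2, 0.5)`, the table can be used to draw case/control status for simulated genotype pairs, and `interaction_odds_ratio` recovers the interaction effect from the table alone.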
Affiliation(s)
- Wei Yang
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO
| | - Charles Gu
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO
- Department of Genetics, Washington University School of Medicine, St. Louis, MO
| |
|