Reference Citation Analysis

For: Weissbrod O, Geiger D, Rosset S. Multikernel linear mixed models for complex phenotype prediction. Genome Res 2016;26:969-979. [PMID: 27302636 PMCID: PMC4937570 DOI: 10.1101/gr.201996.115] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 11/16/2015] [Accepted: 05/02/2016] [Indexed: 12/22/2022]

Cited by Other Article(s)

Wang X, Shi S, Ali Khan MY, Zhang Z, Zhang Y. Improving the accuracy of genomic prediction in dairy cattle using the biologically annotated neural networks framework. J Anim Sci Biotechnol 2024;15:87. [PMID: 38945998 PMCID: PMC11215832 DOI: 10.1186/s40104-024-01044-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 05/05/2024] [Indexed: 07/02/2024]

Abstract

BACKGROUND

Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that use partially connected architectures based on SNP-set annotations. As interpretable neural networks, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are treated as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, their application to genomic prediction has yet to be explored.
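The partially connected SNP-to-SNP-set layer described above can be pictured as a dense weight matrix multiplied by a binary annotation mask. The sketch below is a minimal, deterministic illustration only, assuming SNPs on a single chromosome grouped into non-overlapping 100 kb windows; the helper names (`window_sets`, `connectivity_mask`, `snp_set_layer`) are hypothetical, the ReLU activation is chosen for illustration, and the Bayesian priors and inference of the actual BANNs framework are not shown.

```python
import numpy as np

def window_sets(positions_bp, window_bp=100_000):
    """Assign each SNP on one chromosome to a non-overlapping window of window_bp base pairs."""
    return np.asarray(positions_bp) // window_bp

def connectivity_mask(set_ids):
    """Binary mask of shape (n_snps, n_sets): entry (i, j) is 1 iff SNP i belongs to SNP-set j."""
    _, col = np.unique(set_ids, return_inverse=True)
    mask = np.zeros((len(set_ids), col.max() + 1))
    mask[np.arange(len(set_ids)), col] = 1.0
    return mask

def snp_set_layer(X, W, mask):
    """Partially connected layer: weights outside the mask are zeroed, so each
    hidden unit (one per SNP-set) only sees the SNPs annotated to it."""
    return np.maximum(0.0, X @ (W * mask))  # ReLU, chosen here for illustration

rng = np.random.default_rng(0)
positions = [10, 50_000, 150_000, 250_000, 260_000]
mask = connectivity_mask(window_sets(positions))
X = rng.integers(0, 3, size=(4, len(positions))).astype(float)  # 0/1/2 genotype codes
W = rng.normal(size=mask.shape)
H = snp_set_layer(X, W, mask)
print(mask.sum(axis=0))  # SNPs per window: [2. 1. 2.]
print(H.shape)           # (4, 3): 4 animals x 3 SNP-set units
```

Masking a dense matrix keeps the example short; a real implementation with many SNP-sets would more likely store only the nonzero connections.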

RESULTS

This study extended the BANNs framework to genomic selection and explored optimal SNP-set partitioning strategies using dairy cattle datasets. SNP-sets were partitioned using two strategies, gene annotations and 100 kb windows, denoted BANN_gene and BANN_100kb, respectively. The BANNs models were compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation, using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieved higher genomic prediction accuracy than GBLUP, RF and the Bayesian methods. Specifically, BANN_100kb achieved the highest accuracy, and BANN_gene generally ranked second, ahead of GBLUP, RF, BayesB and BayesCπ across all traits. Averaged across all seven traits, BANN_100kb improved accuracy over GBLUP, RF, BayesB and BayesCπ by 4.86%, 3.95%, 3.84% and 1.92%, and BANN_gene by 3.75%, 2.86%, 2.73% and 0.85%, respectively. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and the Bayesian methods.
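The evaluation protocol above (five replicates of five-fold cross-validation, scored by accuracy and mean square error) can be sketched as follows. This is a minimal sketch under stated assumptions: a ridge regression on centered genotypes stands in for the compared models, accuracy is taken to be the Pearson correlation between predicted and observed phenotypes within each test fold, and the simulated data, function names and penalty value are hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    """Ridge regression on centered genotypes, used here only as a stand-in predictor."""
    mu = X_tr.mean(axis=0)
    Xc = X_tr - mu
    ybar = y_tr.mean()
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - ybar))
    return ybar + (X_te - mu) @ beta

def repeated_kfold(X, y, n_reps=5, k=5, lam=1.0, seed=0):
    """Mean test-fold accuracy (Pearson r) and MSE over n_reps replicates of k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs, mses = [], []
    for _ in range(n_reps):
        folds = np.array_split(rng.permutation(n), k)  # fresh random split per replicate
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            pred = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
            accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
            mses.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(accs)), float(np.mean(mses)), len(accs)

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)  # simulated 0/1/2 genotypes
y = X @ rng.normal(size=50) + rng.normal(size=200)
acc, mse, n_folds = repeated_kfold(X, y)
print(n_folds, round(acc, 3), round(mse, 3))  # n_folds is 25 (5 replicates x 5 folds)
```

Reshuffling the individuals for each replicate is what distinguishes repeated k-fold cross-validation from a single k-fold split; averaging over the 25 folds reduces the variance of the accuracy estimate.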

CONCLUSION

Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.

Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023;110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023]

Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023;39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023]

Hai Y, Zhao W, Meng Q, Liu L, Wen Y. Bayesian linear mixed model with multiple random effects for family-based genetic studies. Front Genet 2023;14:1267704. [PMID: 37928242 PMCID: PMC10620972 DOI: 10.3389/fgene.2023.1267704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/25/2023] [Indexed: 11/07/2023]

Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer learning for classification of Alzheimer's disease based on genome wide data. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2700-2711. [PMID: 37018274 DOI: 10.1109/tcbb.2022.3233869] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]

Fu B, Pazokitoroudi A, Sudarshan M, Liu Z, Subramanian L, Sankararaman S. Fast kernel-based association testing of non-linear genetic effects for biobank-scale data. Nat Commun 2023;14:4936. [PMID: 37582955 PMCID: PMC10427662 DOI: 10.1038/s41467-023-40346-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 07/18/2023] [Indexed: 08/17/2023]

Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CW, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv 2023:2023.04.07.536093. [PMID: 37066144 PMCID: PMC10104234 DOI: 10.1101/2023.04.07.536093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]

Mahmood U, Li X, Fan Y, Chang W, Niu Y, Li J, Qu C, Lu K. Multi-omics revolution to promote plant breeding efficiency. Front Plant Sci 2022;13:1062952. [PMID: 36570904 PMCID: PMC9773847 DOI: 10.3389/fpls.2022.1062952] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/06/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]

Affiliation(s)

Umer Mahmood Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Xiaodong Li Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Yonghai Fan Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Wei Chang Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Yue Niu Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Jiana Li Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
Cunmin Qu Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
Kun Lu Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China

Wang X, Wen Y. A penalized linear mixed model with generalized method of moments estimators for complex phenotype prediction. Bioinformatics 2022;38:5222-5228. [PMID: 36205617 DOI: 10.1093/bioinformatics/btac659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/27/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022]

Liu L, Meng Q, Weng C, Lu Q, Wang T, Wen Y. Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data. PLoS Comput Biol 2022;18:e1010328. [PMID: 35839250 PMCID: PMC9328574 DOI: 10.1371/journal.pcbi.1010328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/16/2021] [Revised: 07/27/2022] [Accepted: 06/27/2022] [Indexed: 11/19/2022]

Abstract

Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes pose tremendous analytical challenges. Deep learning models are state-of-the-art methods for many prediction tasks and offer a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we developed a deep neural network (DNN)-based prediction modeling framework. We first proposed a group-wise feature importance score for feature selection, which efficiently detects genes harboring genetic variants with both linear and non-linear effects. We then designed an explainable transfer-learning-based DNN method that directly incorporates information from feature selection and accurately captures complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we demonstrated that, compared with many existing methods, our proposed method can not only efficiently detect predictive features but also accurately predict disease risk.

Accurate disease risk prediction is an essential step towards precision medicine. Deep learning models have achieved state-of-the-art performance for many prediction tasks. However, they generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their application to the prediction analysis of whole-genome sequencing data. We present here an explainable deep transfer learning model for the analysis of high-dimensional genomic data. Our proposed method detects predictive genes that harbor genetic variants with both linear and non-linear effects via the proposed group-wise feature importance score. It can then efficiently and accurately model disease risk from the detected predictive genes using the proposed transfer-learning-based network architecture. Because our method is built at the gene level, it is much more biologically interpretable. It is also computationally efficient and can be applied to whole-exome sequencing data with millions of potential predictors. Through both simulation studies and the analysis of whole-exome data obtained from the Alzheimer’s Disease Neuroimaging Initiative, we demonstrated that our method can efficiently detect predictive genes and has better prediction performance than many existing methods.

Wang X, Wen Y. A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data. Brief Bioinform 2022;23:6596990. [PMID: 35649346 PMCID: PMC9310531 DOI: 10.1093/bib/bbac193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022]

Liu YH, Zhang M, Scheuring CF, Cilkiz M, Sze SH, Smith CW, Murray SC, Xu W, Zhang HB. Accurate prediction of complex traits for individuals and offspring from parents using a simple, rapid, and efficient method for gene-based breeding in cotton and maize. Plant Sci 2022;316:111153. [PMID: 35151437 DOI: 10.1016/j.plantsci.2021.111153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 12/11/2021] [Indexed: 06/14/2023]

Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022;23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2021] [Indexed: 11/14/2022]

Abstract

Integration of expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising way to reveal the functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in the post-GWAS era. However, how to efficiently incorporate eQTL mapping studies into GWAS to prioritize causal genes remains elusive. Here we propose a novel method, Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV), which models the effects of the cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified mixed-model framework and therefore includes many prior methods and tests as special cases. We further justify MTV from two additional statistical perspectives: mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV offers several pronounced features: it accounts for direct effects of cis-SNPs on phenotypes; it provides a powerful likelihood ratio test for the joint effects of cis-SNPs and genetically regulated gene expression (GReX); it defines two useful quantities that measure the relative genetic contributions of GReX and cis-SNPs to phenotypic variance; and it uses a computationally efficient parameter-expanded expectation-maximization algorithm. In extensive simulations, MTV correctly controlled the type I error in the joint evaluation of the total genetic effect and was more powerful than existing methods in discovering true association signals across various scenarios. We finally applied MTV to 41 complex traits/diseases available from three GWASs and discovered many new associated genes that existing methods had missed. We also found that a small but substantial fraction of phenotypic variation was mediated by GReX.
Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.

Demetci P, Cheng W, Darnell G, Zhou X, Ramachandran S, Crawford L. Multi-scale inference of genetic trait architecture using biologically annotated neural networks. PLoS Genet 2021;17:e1009754. [PMID: 34411094 PMCID: PMC8407593 DOI: 10.1371/journal.pgen.1009754] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/23/2020] [Revised: 08/31/2021] [Accepted: 07/31/2021] [Indexed: 01/01/2023]

Hai Y, Wen Y. A Bayesian linear mixed model for prediction of complex traits. Bioinformatics 2021;36:5415-5423. [PMID: 33331865 PMCID: PMC8016495 DOI: 10.1093/bioinformatics/btaa1023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/26/2020] [Revised: 11/24/2020] [Accepted: 11/27/2020] [Indexed: 11/13/2022]

Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet 2021;30:939-951. [PMID: 33615361 DOI: 10.1093/hmg/ddab056] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/23/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/11/2022]

Abstract

Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS involves constructing an expression prediction model for each gene in turn, using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction, and each such model makes a distinct modeling assumption that is often suitable for a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial in detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean P-value combination strategy. Because each expression prediction model is suited to a particular genetic architecture, aggregating TWAS associations across prediction models, as HMAT does, improves expression prediction and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated P-values after aggregation. Through numerical simulations, we illustrated the advantage of HMAT over commonly used TWAS methods as well as ad hoc P-value combination rules such as Fisher's method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful than the next best method, detecting many new disease-associated genes that were not identified by existing TWAS approaches. In conclusion, HMAT is a flexible and powerful TWAS method with robust performance across a range of genetic architectures underlying gene expression.
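The harmonic mean P-value combination step mentioned above can be sketched in a few lines. This is a minimal sketch assuming equal weights across prediction models by default; it returns the raw weighted harmonic mean, which is only approximately valid (slightly anti-conservative) and mainly for small values, and it omits the further Landau-distribution calibration used for an asymptotically exact harmonic mean P-value. It is not HMAT's implementation, and the function name and input P-values are hypothetical.

```python
import numpy as np

def harmonic_mean_p(pvals, weights=None):
    """Weighted harmonic mean of P-values, sum(w) / sum(w / p), with weights normalized to sum to 1."""
    p = np.asarray(pvals, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(1.0 / np.sum(w / p))

# Three hypothetical TWAS P-values for one gene, one per expression prediction model
print(harmonic_mean_p([0.01, 0.04, 0.5]))  # 3 / 127, about 0.0236
```

Because the harmonic mean is dominated by the smallest P-values, a single well-suited prediction model can drive the combined signal, while the combination remains robust to positive dependence among the tests, unlike Fisher's method, which assumes independence.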

Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets. BMC Bioinformatics 2021;22:50. [PMID: 33546598 PMCID: PMC7866684 DOI: 10.1186/s12859-021-03959-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022]

Li J, Lu Q, Wen Y. Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2020;36:1785-1794. [PMID: 31693075 DOI: 10.1093/bioinformatics/btz822] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/10/2019] [Revised: 10/08/2019] [Accepted: 11/01/2019] [Indexed: 12/11/2022]

Wen Y, Lu Q. Multikernel linear mixed model with adaptive lasso for complex phenotype prediction. Stat Med 2020;39:1311-1327. [PMID: 31985088 DOI: 10.1002/sim.8477] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/21/2019] [Revised: 11/17/2019] [Accepted: 12/24/2019] [Indexed: 12/15/2022]

Wang X, Wen Y. A U-statistics for integrative analysis of multilayer omics data. Bioinformatics 2020;36:2365-2374. [PMID: 31913435 DOI: 10.1093/bioinformatics/btaa004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2019] [Revised: 12/09/2019] [Accepted: 01/02/2020] [Indexed: 11/12/2022]

Liu YH, Xu Y, Zhang M, Cui Y, Sze SH, Smith CW, Xu S, Zhang HB. Accurate prediction of a quantitative trait using the genes controlling the trait for gene-based breeding in cotton. Front Plant Sci 2020;11:583277. [PMID: 33281846 PMCID: PMC7690289 DOI: 10.3389/fpls.2020.583277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/14/2020] [Accepted: 10/15/2020] [Indexed: 05/03/2023]

Zeng P, Hao X, Zhou X. Pleiotropic mapping and annotation selection in genome-wide association studies with penalized Gaussian mixture models. Bioinformatics 2019;34:2797-2807. [PMID: 29635306 DOI: 10.1093/bioinformatics/bty204] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/26/2017] [Accepted: 04/02/2018] [Indexed: 12/11/2022]

Abstract

Motivation

Genome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these loci is associated with multiple traits, a phenomenon known as pleiotropy. Identifying pleiotropic associations can help characterize the genetic relationships among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires statistical methods that jointly model multiple traits together with genome-wide single nucleotide polymorphisms (SNPs).

Results

We develop a joint modeling method, which we refer to as integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association patterns using mixture modeling, and has the potential to reveal causal relationships between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power and, with a sparsity-inducing penalty, can select informative annotations from a large, potentially non-informative set. To scale iMAP inference to association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. Through simulations and comparisons with existing methods, we illustrate the benefits of iMAP in terms of both high association mapping power and accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated by the Roadmap Project.
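The idea of inferring genome-wide SNP association patterns from summary statistics by mixture modeling and expectation maximization can be illustrated with a much simpler model. The sketch below is a deliberately simplified, hypothetical example, not iMAP: it assumes a single trait, no annotations and no penalty, and models each SNP's z-score as either null, N(0, 1), or associated, N(0, 1 + tau2). Function names, starting values and the simulated data are all assumptions.

```python
import numpy as np

def normal_pdf(z, var):
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)

def two_group_em(z, pi=0.1, tau2=1.0, n_iter=500):
    """EM for z ~ (1 - pi) N(0, 1) + pi N(0, 1 + tau2): null SNPs versus associated SNPs."""
    z = np.asarray(z, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior probability that each SNP is associated
        alt = pi * normal_pdf(z, 1.0 + tau2)
        null = (1.0 - pi) * normal_pdf(z, 1.0)
        r = alt / (alt + null)
        # M-step: closed-form updates of the mixing proportion and the extra effect variance
        pi = r.mean()
        tau2 = max(np.sum(r * z**2) / np.sum(r) - 1.0, 1e-8)
    return pi, tau2, r

rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0, 1, 9000), rng.normal(0, 5, 1000)])  # 10% associated, tau2 = 24
pi_hat, tau2_hat, post = two_group_em(z)
print(round(pi_hat, 2), round(tau2_hat, 1))
```

In a fuller model, the mixing proportion would vary with SNP annotations and the null component would become multivariate across traits, which is where the annotation selection and phenotypic correlation described above enter.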

Availability and implementation

iMAP is freely available at http://www.xzlab.org/software.html.

Supplementary information

Supplementary data are available at Bioinformatics online.

Mueller LD, Phillips MA, Barter TT, Greenspan ZS, Rose MR. Genome-wide mapping of gene-phenotype relationships in experimentally evolved populations. Mol Biol Evol 2019;35:2085-2095. [PMID: 29860403 DOI: 10.1093/molbev/msy113] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]

Crawford L, Flaxman SR, Runcie DE, West M. Variable prioritization in nonlinear black box methods: a genetic association case study. Ann Appl Stat 2019;13:958-989. [PMID: 32542104 PMCID: PMC7295151 DOI: 10.1214/18-aoas1222] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022]

Jackknife model averaging prediction methods for complex phenotypes with gene expression levels by integrating external pathway information. Comput Math Methods Med 2019;2019:2807470. [PMID: 31089389 PMCID: PMC6476151 DOI: 10.1155/2019/2807470] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/13/2019] [Accepted: 03/20/2019] [Indexed: 01/03/2023]

Abstract

Motivation

In the past few years, many prediction approaches have been proposed and widely applied to high-dimensional genetic data for disease risk evaluation. However, these approaches typically ignore, during model fitting, the important group structures that naturally exist in genetic data.

Methods

In the present study, we applied a novel model-averaging approach, called jackknife model averaging prediction (JMAP), to high-dimensional genetic risk prediction while incorporating pathway information into the model specification. JMAP selects the optimal weights across candidate models by minimizing a jackknife (leave-one-out) cross-validation criterion. A primary feature distinguishing JMAP from previous approaches is that each model weight may vary between 0 and 1 without the constraint that the weights sum to one. We evaluated the performance of JMAP in extensive simulation studies and compared it with existing methods. Finally, we applied JMAP to four real cancer datasets that are publicly available from TCGA.
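The weight-selection step described above can be sketched as a box-constrained least-squares problem on jackknife predictions. This is a minimal sketch under stated assumptions, not the authors' implementation: each candidate model is assumed to be an ordinary least-squares regression on one hypothetical pathway's columns, leave-one-out predictions use the hat-matrix identity for OLS, and the weights are fitted with SciPy's `lsq_linear` with bounds 0 to 1 and no sum-to-one constraint. The data, groups and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear

def loo_predictions(X, y):
    """Jackknife (leave-one-out) OLS predictions via y_hat_(-i) = y_i - e_i / (1 - h_ii)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    H = X1 @ np.linalg.pinv(X1)          # hat (projection) matrix
    resid = y - H @ y
    return y - resid / (1.0 - np.diag(H))

def jackknife_weights(X, y, groups):
    """One candidate model per column group (e.g. a pathway); weights in [0, 1], not forced to sum to 1."""
    P = np.column_stack([loo_predictions(X[:, g], y) for g in groups])
    # Minimize ||y - P w||^2 subject to 0 <= w_m <= 1 for every candidate model m
    return lsq_linear(P, y, bounds=(0.0, 1.0)).x

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 2.0 + X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)  # only the first "pathway" is predictive
groups = [[0, 1], [2, 3], [4, 5]]
w = jackknife_weights(X, y, groups)
print(np.round(w, 2), round(w.sum(), 2))
```

Using leave-one-out rather than in-sample predictions keeps overfitted candidate models from receiving inflated weights, and dropping the sum-to-one constraint lets the combined prediction rescale candidates that are systematically shrunk or inflated.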

Results

The simulations showed that, compared with other existing approaches (e.g., gsslasso), JMAP performed best or among the best across a range of scenarios. For example, in 14 of the 16 simulation settings with PVE = 0.3, JMAP had on average 0.075 higher prediction accuracy than gsslasso. We further found that, in the simulations, the weights of the true candidate models were much less likely to be zero than those of the null candidate models and were substantially greater in magnitude. In the real data application, JMAP also performed comparably to or better than the other methods for continuous phenotypes. For example, for the COAD, CRC, and PAAD datasets, the average gains in predictive accuracy of JMAP over gsslasso were 0.019, 0.064, and 0.052.

Conclusion

JMAP is a novel model-averaging approach for high-dimensional genetic risk prediction that incorporates useful external group structures into the model specification.

Zhang H, Yin L, Wang M, Yuan X, Liu X. Factors affecting the accuracy of genomic selection for agricultural economic traits in maize, cattle, and pig populations. Front Genet 2019;10:189. [PMID: 30923535 PMCID: PMC6426750 DOI: 10.3389/fgene.2019.00189] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 12/10/2018] [Accepted: 02/21/2019] [Indexed: 11/20/2022]

Abstract

Genomic selection (GS) has proven to be a powerful tool for estimating genetic values in plant and livestock breeding. Newly developed sequencing technologies have dramatically reduced the cost of genotyping and significantly increased the scale of genotype data used for GS. Meanwhile, state-of-the-art statistical methods have been developed to make the best use of high-density marker data. In this study, 14 traits from four data sets of three species (maize, cattle, and pig) were used to evaluate five factors that affect prediction accuracy: marker density (from 1 to ~600 k), statistical method (GBLUP-A, GBLUP-AD, and BayesR), minor allele frequency (MAF), heritability, and genetic architecture. Results indicate that for GBLUP, higher marker density leads to higher prediction accuracy. In contrast, BayesR needs more Markov chain Monte Carlo (MCMC) iterations to reach convergence and produce reliable predictions. BayesR outperforms GBLUP in predicting high- or medium-heritability traits affected by one or several genes with large effects, whereas GBLUP performs similarly to or slightly better than BayesR in predicting low-heritability traits controlled by a large number of genes with minor effects. The prediction accuracy of traits with complex genetic architecture can be improved by increasing marker density. Interestingly, for simple traits controlled by one or several genes with large effects, higher marker density can lower prediction accuracy if the QTN is included, but raises prediction accuracy if the QTN is excluded. The number of markers with low MAF does not significantly affect the prediction accuracy of GBLUP, but it degrades the prediction accuracy of BayesR. Compared with GBLUP-A, GBLUP-AD did not show any advantage in capturing non-additive variance for traits with high heritability.
The factors affecting prediction accuracy are discussed in this study, and the results indicate that combining either GBLUP or BayesR with a moderate density (~25 k) of favorably polymorphic single nucleotide polymorphisms (SNPs) would always produce good and stable prediction accuracy at acceptable breeding and computational costs.
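For readers unfamiliar with GBLUP, the additive model (GBLUP-A) compared above can be sketched with a genomic relationship matrix and a single linear solve. This is a minimal sketch under stated assumptions: the relationship matrix follows VanRaden's first method, heritability is treated as known, and the intercept is a plain training mean rather than its generalized least squares estimate. BayesR, which requires MCMC sampling, is not shown; the simulated data and function names are hypothetical.

```python
import numpy as np

def vanraden_G(M):
    """Genomic relationship matrix (VanRaden method 1) from an n x m matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0                   # allele frequencies
    Z = M - 2.0 * p                            # centered genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(M, y, train, test, h2):
    """BLUP of test-set genomic values from training phenotypes, assuming h2 is known."""
    G = vanraden_G(M)
    lam = (1.0 - h2) / h2                      # sigma_e^2 / sigma_g^2
    mu = y[train].mean()                       # simplification: plain mean, not the GLS estimate
    A = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(A, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

rng = np.random.default_rng(4)
M = rng.binomial(2, 0.5, size=(400, 100)).astype(float)
g = (M - M.mean(axis=0)) @ rng.normal(size=100)            # simulated additive genetic values
y = 10.0 + g + rng.normal(scale=0.5 * g.std(), size=400)   # residual variance = 0.25 var(g), so h2 = 0.8
train, test = np.arange(300), np.arange(300, 400)
pred = gblup_predict(M, y, train, test, h2=0.8)
print(np.corrcoef(pred, y[test])[0, 1])                    # predictive ability on the test set
```

Because every marker enters G with the same implicit prior variance, GBLUP suits traits controlled by many small-effect genes, which is consistent with the pattern reported above for low-heritability polygenic traits.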

Weissbrod O, Rothschild D, Barkan E, Segal E. Host genetics and microbiome associations through the lens of genome wide association studies. Curr Opin Microbiol 2018;44:9-19. [PMID: 29909175 DOI: 10.1016/j.mib.2018.05.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/25/2018] [Revised: 03/15/2018] [Accepted: 05/25/2018] [Indexed: 12/22/2022]

Márquez-Luna C, Loh PR, Price AL. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol 2017;41:811-823. [PMID: 29110330 PMCID: PMC5726434 DOI: 10.1002/gepi.22083] [Citation(s) in RCA: 183] [Impact Index Per Article: 26.1] [Received: 04/12/2017] [Revised: 08/16/2017] [Accepted: 08/30/2017] [Indexed: 01/04/2023]

Mangin B, Bonnafous F, Blanchet N, Boniface MC, Bret-Mestries E, Carrère S, Cottret L, Legrand L, Marage G, Pegot-Espagnet P, Munos S, Pouilly N, Vear F, Vincourt P, Langlade NB. Genomic Prediction of Sunflower Hybrids Oil Content. Front Plant Sci 2017;8:1633. [PMID: 28983306 DOI: 10.3389/fpls.2017.01633] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/28/2017] [Accepted: 09/06/2017] [Indexed: 05/26/2023]

Abstract

Prediction of hybrid performance using incomplete factorial mating designs is widely used in breeding programs that include different heterotic groups. Predictions based on the general combining ability (GCA) of the parents are accurate only if the genetic variance due to specific combining ability is small and both parents have phenotyped descendants. Genomic selection (GS) can predict performance using a model trained on phenotyped and genotyped hybrids that do not necessarily include all hybrid parents, so GS could overcome the problem of unknown parental GCA. Here, we compared the accuracy of classical GCA-based and genomic predictions of sunflower seed oil content using several GS models. Our study involved 452 sunflower hybrids from an incomplete factorial design of 36 female and 36 male lines. Re-sequencing of the parental lines allowed us to identify 468,194 non-redundant SNPs and to infer the hybrid genotypes. Oil content was observed in a multi-environment trial (MET) over 3 years, comprising nine different environments. We compared the GCA-based model with several GS models: models with female and male genomic kinships, with and without an additional female-by-male interaction genomic kinship; models using functional knowledge in the form of SNPs in genes of oil metabolic pathways; and models with epistasis. When both parents had descendants in the training set, predictive ability was high even for GCA-based prediction, with an average MET value of 0.782. GS performed slightly better (+0.2%). Neither the female-by-male interaction, nor functional knowledge of oil metabolism, nor epistasis modeling improved GS accuracy. GS greatly improved predictive ability when one or both parents were untested in the training set, raising predictive ability in the MET from 0.575 (GCA-based) to 0.635, a 10.4% increase. 
In this scenario, GS using only SNPs in oil metabolic pathways did not improve on whole-genome GS, but it still increased predictive ability by 6.4% relative to GCA-based prediction. Our results show that GS substantially improves breeding efficiency compared with classical GCA modeling when one or both parents are not well characterized. This finding could therefore accelerate breeding by reducing phenotyping effort and more effectively targeting the most promising crosses.
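Why GCA-based prediction fails for untested parents, while GS does not, can be sketched as follows. This is a simplified illustration, not the authors' model: each parent's GCA is estimated as the mean phenotypic deviation of its hybrids from the overall mean (a least-squares fit would be more appropriate for an unbalanced incomplete factorial), and hybrid genotypes are inferred on the assumption that the parental lines are fully homozygous (coded 0 or 2). The function names are hypothetical.

```python
# Illustrative sketch (hypothetical names, simplified estimators).
from collections import defaultdict


def fit_gca(records):
    """Estimate GCA effects from (female, male, phenotype) records.

    Simplification: each parent's GCA is the mean deviation of its hybrids
    from the overall mean, not a least-squares fit of mu + GCA_f + GCA_m.
    """
    mu = sum(y for _, _, y in records) / len(records)
    dev_f, dev_m = defaultdict(list), defaultdict(list)
    for female, male, y in records:
        dev_f[female].append(y - mu)
        dev_m[male].append(y - mu)
    gca_f = {k: sum(v) / len(v) for k, v in dev_f.items()}
    gca_m = {k: sum(v) / len(v) for k, v in dev_m.items()}
    return mu, gca_f, gca_m


def predict_gca(model, female, male):
    """Predict a hybrid as mu + GCA_female + GCA_male, or None if a parent is untested."""
    mu, gca_f, gca_m = model
    if female not in gca_f or male not in gca_m:
        return None  # no phenotyped descendants, so no GCA estimate
    return mu + gca_f[female] + gca_m[male]


def hybrid_genotype(female_geno, male_geno):
    """Infer F1 allele counts (0/1/2) from fully homozygous parents coded 0 or 2."""
    if any(g not in (0, 2) for g in female_geno + male_geno):
        raise ValueError("parental genotypes must be homozygous (0 or 2)")
    return [(f + m) // 2 for f, m in zip(female_geno, male_geno)]
```

A cross involving a parent with no phenotyped descendants gets `None` from `predict_gca`, whereas its inferred hybrid genotype can still enter a genomic model such as GBLUP.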

Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models. Nat Commun 2017;8:456. [PMID: 28878256 PMCID: PMC5587666 DOI: 10.1038/s41467-017-00470-2] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/26/2016] [Accepted: 06/30/2017] [Indexed: 01/03/2023]

Zeng P, Zhou X, Huang S. Prediction of gene expression with cis-SNPs using mixed models and regularization methods. BMC Genomics 2017;18:368. [PMID: 28490319 PMCID: PMC5425981 DOI: 10.1186/s12864-017-3759-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/20/2016] [Accepted: 05/03/2017] [Indexed: 12/25/2022]

Abstract

Background

Gene expression in human tissues has been shown to be heritable, which makes it possible to predict gene expression from SNPs alone. Such predictions have important implications for understanding the genetic architecture of individual functionally associated SNPs and for interpreting the molecular basis of human diseases.

Methods

We compared three types of methods for predicting gene expression from cis-SNPs only: the polygenic linear mixed model (LMM); two sparse models, Lasso and elastic net (ENET); and a hybrid of the LMM and sparse models, the Bayesian sparse linear mixed model (BSLMM). These methods make very different assumptions about the underlying genetic architecture. They were evaluated in simulations under various scenarios and applied to the Geuvadis gene expression data.
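The contrast between polygenic and sparse assumptions can be made concrete with a coordinate-descent sketch of the elastic-net objective. With `alpha = 1` it is the Lasso; with `alpha = 0` it reduces to ridge regression, whose solution equals the BLUP of an LMM with i.i.d. normal SNP effects when `n * lam = sigma_e^2 / sigma_b^2`. This is a pure-Python illustration, not the software used in the study: the function names are hypothetical, inputs are assumed centred with no intercept, a fixed number of sweeps replaces a convergence test, and BSLMM (which requires MCMC or variational inference) is not shown.

```python
# Coordinate-descent sketch of the elastic-net family (hypothetical names).


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(x, y, lam, alpha, n_iter=200):
    """Minimise (1/(2n))||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).

    alpha = 1: Lasso; alpha = 0: ridge; 0 < alpha < 1: elastic net.
    Assumes y and every column of x are already centred (no intercept).
    Runs a fixed number of sweeps (no convergence test).
    """
    n, p = len(x), len(x[0])
    b = [0.0] * p
    resid = list(y)  # y - X b with b = 0
    col_sq = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam * (1 - alpha)
            if denom == 0:
                continue  # constant (e.g. monomorphic) SNP with no ridge term
            # inner product of column j with the partial residual (b_j added back)
            rho = sum(x[i][j] * (resid[i] + x[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * alpha) / denom
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i][j] * delta
                b[j] = new_bj
    return b
```

The soft-thresholding step is what lets the Lasso and elastic net set small coefficients exactly to zero (a sparse architecture), while the pure ridge update (`alpha = 0`) only rescales every coefficient (a polygenic architecture).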

Results

The simulations showed that each of the four prediction methods (Lasso, ENET, LMM, and BSLMM) performed best when its own modeling assumptions were satisfied, but BSLMM performed robustly across a range of scenarios. In terms of R² in the Geuvadis data, the four methods performed quite similarly. We observed no clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across chromosomes, nor any clear relationship between the proportion of predictive genes and the proportion of genes on each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g., R² ≥ 0.30) may have sparse genetic architectures, since Lasso, ENET, and BSLMM outperformed LMM for these genes; this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks.
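The R² thresholds used above can be applied as in the following sketch. It assumes that R² is the squared Pearson correlation between observed and predicted expression in held-out samples, which is one common convention; the abstract does not state the exact definition. The cutoffs 0.05 and 0.30 come from the text, and the function names are hypothetical.

```python
# Sketch: squared-correlation R^2 and the abstract's gene categories.


def r_squared(observed, predicted):
    """Squared Pearson correlation (ignores sign and scale of the predictions)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (q - mp) for o, q in zip(observed, predicted))
    vo = sum((o - mo) ** 2 for o in observed)
    vp = sum((q - mp) ** 2 for q in predicted)
    if vo == 0 or vp == 0:
        return 0.0  # correlation undefined for constant vectors
    return cov * cov / (vo * vp)


def classify_genes(r2_by_gene, predictive=0.05, highly=0.30):
    """Label each gene; 'highly predictive' genes also meet the predictive cutoff."""
    labels = {}
    for gene, r2 in r2_by_gene.items():
        if r2 >= highly:
            labels[gene] = "highly predictive"
        elif r2 >= predictive:
            labels[gene] = "predictive"
        else:
            labels[gene] = "not predictive"
    return labels
```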

Conclusions

Gene expression can be predicted from cis-SNPs alone using well-developed prediction models, and the predictive genes were enriched in some approximately independent LD blocks. Predicting gene expression can shed light on the functional interpretation of SNPs identified in GWASs.

Mangin B, Bonnafous F, Blanchet N, Boniface MC, Bret-Mestries E, Carrère S, Cottret L, Legrand L, Marage G, Pegot-Espagnet P, Munos S, Pouilly N, Vear F, Vincourt P, Langlade NB. Genomic Prediction of Sunflower Hybrids Oil Content. Front Plant Sci 2017;8:1633. [PMID: 28983306 PMCID: PMC5613134 DOI: 10.3389/fpls.2017.01633] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/28/2017] [Accepted: 09/06/2017] [Indexed: 05/18/2023]
