1. Gianola D, Fernando RL. A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits. Genetics 2020; 214:305-331. [PMID: 31879318 PMCID: PMC7017027 DOI: 10.1534/genetics.119.302934] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/05/2019] [Accepted: 12/20/2019] [Indexed: 12/21/2022]
Abstract
A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on a possibly huge number of molecular markers, with a Gaussian residual distribution assumed. Each of the T × 1 vectors of regression coefficients (one vector per marker; T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when T = 1. The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed from a scale-mixture-of-normals representation of the prior. MBL is demonstrated in a bivariate context with two publicly available data sets, using a bivariate genomic best linear unbiased prediction model (GBLUP) to benchmark the results. In the first data set, wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially: "short" vectors are shrunk more strongly toward the origin than in GBLUP, whereas "long" vectors are shrunk less. A predictive comparison was also carried out in wheat, where MBL was compared with bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions than either a Bayesian bivariate GBLUP or the single-trait Bayesian LASSO.
MBL has been implemented in the Julia package JWAS and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
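The differential shrinkage of "short" and "long" effect vectors can be illustrated with a deliberately simplified sketch. It assumes an orthonormal marker design, an identity scale matrix (Σ = I), and looks only at the posterior mode: under a Laplace-type prior whose log-density is proportional to the Euclidean norm of a marker's effect vector, the mode is a group soft-threshold, whereas a Gaussian (GBLUP-like) prior scales every vector by the same factor. The function names and the penalty value `lam` are hypothetical, and this is not the MCMC scheme used in MBL.

```python
import numpy as np

def laplace_mode(b_hat, lam):
    """Posterior mode of one marker's T-vector of effects under a penalty
    lam * ||b|| (Laplace-type prior; Sigma = I and orthonormal design assumed).
    This is a group soft-threshold: vectors shorter than lam become exactly zero."""
    norm = np.linalg.norm(b_hat)
    if norm <= lam:
        return np.zeros_like(b_hat)
    return (1.0 - lam / norm) * b_hat

def gaussian_mode(b_hat, lam):
    """Posterior mode under a Gaussian (ridge/GBLUP-like) prior with penalty
    (lam / 2) * ||b||^2: every vector is scaled by the same factor 1 / (1 + lam)."""
    return b_hat / (1.0 + lam)

# Hypothetical least-squares estimates for two markers, two traits each
short = np.array([0.3, 0.2])
long_ = np.array([3.0, 2.0])
lam = 0.5

for name, b in [("short", short), ("long", long_)]:
    kept_laplace = np.linalg.norm(laplace_mode(b, lam)) / np.linalg.norm(b)
    kept_gauss = np.linalg.norm(gaussian_mode(b, lam)) / np.linalg.norm(b)
    print(f"{name}: fraction of length kept, Laplace={kept_laplace:.3f}, Gaussian={kept_gauss:.3f}")
```

With these values the short vector is set to zero under the Laplace-type mode, while the long vector keeps about 86% of its length, compared with two thirds under the Gaussian prior. The two penalties are not on a common scale, so the point is the pattern rather than the specific numbers.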
Affiliation(s)
- Daniel Gianola
- Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin 53706
- Department of Dairy Science, University of Wisconsin-Madison, Wisconsin 53706
- Department of Animal Science, Iowa State University, Ames, Iowa 50011
- Department of Plant Sciences, Technical University of Munich (TUM), TUM School of Life Sciences, Freising, 85354 Germany
- Rohan L Fernando
- Department of Animal Science, Iowa State University, Ames, Iowa 50011
2. López de Maturana E, Alonso L, Alarcón P, Martín-Antoniano IA, Pineda S, Piorno L, Calle ML, Malats N. Challenges in the Integration of Omics and Non-Omics Data. Genes (Basel) 2019; 10:238. [PMID: 30897838 PMCID: PMC6471713 DOI: 10.3390/genes10030238] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 01/02/2019] [Revised: 03/05/2019] [Accepted: 03/14/2019] [Indexed: 11/16/2022]
Abstract
Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented in clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation in health-related traits, and their joint modeling with omics data is crucial to increasing the algorithms’ predictive ability. Only a small number of published studies have performed a “real” integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensional fashion. The integrative strategies used in the identified papers adopted three modeling methods: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies for OnO data integration.
Affiliation(s)
- Evangelina López de Maturana
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Lola Alonso
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Pablo Alarcón
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Isabel Adoración Martín-Antoniano
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Silvia Pineda
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Lucas Piorno
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- M Luz Calle
- Biosciences Department, University of Vic-Central University of Catalonia, Carrer de la Laura 13, 08570 Vic, Spain.
- Núria Malats
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
3. de Maturana EL, Rava M, Anumudu C, Sáez O, Alonso D, Malats N. Bladder Cancer Genetic Susceptibility. A Systematic Review. Bladder Cancer 2018; 4:215-226. [PMID: 29732392 PMCID: PMC5929300 DOI: 10.3233/blc-170159] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 01/31/2023]
Abstract
Background: The candidate variant/gene approach to exploring bladder cancer (BC) genetic susceptibility has been applied in many studies, with significant findings reported. However, results are not always conclusive owing to a lack of replication in subsequent studies. Objectives: To identify all epidemiological investigations of genetic associations with BC risk, to quantify the likely magnitude of the associations by applying meta-analysis methodology, and to assess whether there is potential for publication/reporting bias. Methods: We catalogued all genetic association studies of BC risk published since 2000. Furthermore, we meta-analysed all polymorphisms with data available from at least three independent case-control studies of subjects of Caucasian origin analysed under the same mode of inheritance. Results: The genetic susceptibility of BC is characterized by 28 variants, most of them contributed by GWAS. Most of the significant variants associated with BC risk are located in genes belonging to the chemical carcinogenesis, DNA repair, and cell cycle pathways. Functional analyses also supported a causal relationship for GSTM1-null, NAT2-slow, APOBEC-rs1014971, CCNE1-rs8102137, SLC14A1-rs10775480, PSCA-rs2294008, UGT1A-rs1189203, and TP63-rs35592567. Conclusions: Genetic susceptibility to BC is still poorly defined, with GWAS contributing most of the strongest evidence. The systematic review did not provide evidence of further genetic associations. The potential public health translation of existing knowledge on genetic susceptibility to BC is still limited.
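The pooling step described in the Methods can be sketched as a fixed-effect, inverse-variance meta-analysis of per-study log odds ratios, with Cochran's Q as a heterogeneity statistic. The abstract does not state which pooling model was used, so fixed-effect weighting is an assumption here; the function names and input values are hypothetical.

```python
import math

def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of study log odds ratios.
    Returns (pooled log OR, its standard error, Cochran's Q)."""
    if len(log_ors) != len(ses) or len(log_ors) < 1:
        raise ValueError("need matching, non-empty lists")
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return pooled, pooled_se, q

def or_with_ci(pooled, pooled_se, z=1.96):
    """Back-transform a pooled log OR to an odds ratio with an approximate 95% CI."""
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))

# Hypothetical estimates for one variant from three case-control studies
log_ors = [math.log(1.30), math.log(1.15), math.log(1.42)]
ses = [0.10, 0.08, 0.15]
pooled, se, q = fixed_effect_meta(log_ors, ses)
print(or_with_ci(pooled, se), "Q =", round(q, 3), "df =", len(log_ors) - 1)
```

Under substantial heterogeneity (large Q relative to its degrees of freedom), a random-effects model would usually be preferred; that variant is not shown.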
Affiliation(s)
- Marta Rava
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Spain
- Chiaka Anumudu
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Spain
- Olga Sáez
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Spain
- Dolores Alonso
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Spain
- Núria Malats
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Spain
4. López de Maturana E, Malats N. Genetic Testing, Genetic Variation, and Genetic Susceptibility. Bladder Cancer 2018. [DOI: 10.1016/b978-0-12-809939-1.00033-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
5. A fast algorithm for Bayesian multi-locus model in genome-wide association studies. Mol Genet Genomics 2017; 292:923-934. [DOI: 10.1007/s00438-017-1322-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/21/2016] [Accepted: 04/18/2017] [Indexed: 12/27/2022]
6. McGeachie MJ, Clemmer GL, Croteau-Chonka DC, Castaldi PJ, Cho MH, Sordillo JE, Lasky-Su JA, Raby BA, Tantisira KG, Weiss ST. Whole genome prediction and heritability of childhood asthma phenotypes. Immun Inflamm Dis 2016; 4:487-496. [PMID: 27980782 PMCID: PMC5134727 DOI: 10.1002/iid3.133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/13/2016] [Revised: 09/01/2016] [Accepted: 09/04/2016] [Indexed: 01/19/2023]
Abstract
Introduction: While whole-genome prediction (WGP) methods have recently been used successfully to predict complex genetic diseases, they have not yet been applied to asthma and related phenotypes. Longitudinal patterns of lung function differ between asthmatics, but these phenotypes have not been assessed for heritability or predictive ability. Herein, we assess the heritability and genetic predictability of asthma-related phenotypes. Methods: We applied several WGP methods to a well-phenotyped cohort of 832 children with mild-to-moderate asthma from CAMP. We assessed narrow-sense heritability and predictability for airway hyperresponsiveness, serum immunoglobulin E, blood eosinophil count, pre- and post-bronchodilator forced expiratory volume in 1 sec (FEV1), bronchodilator response, steroid responsiveness, and longitudinal patterns of lung function (normal growth, reduced growth, early decline, and their combinations). Prediction accuracy was evaluated by splitting the cohort into training and testing sets. Results: Longitudinal lung function phenotypes showed significant narrow-sense heritability (reduced growth, 95%; normal growth with early decline, 55%). The same phenotypes also showed significant polygenic prediction (areas under the curve [AUCs] of 56% to 62%). Adding demographic covariates to the models improved prediction by 4–8%, with the AUC for reduced growth increasing from 62% to 66%. Prediction with a genomic relatedness matrix improved when the available SNPs were filtered on chromatin evidence, and this result extended across cohorts. Conclusions: Longitudinal reduced lung function growth displayed extremely high heritability. All phenotypes with significant heritability showed significant polygenic prediction. SNP prioritization improved prediction across cohorts. WGP methods show promise for predicting heritable asthma-related traits.
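The AUCs reported above summarize how well genomic scores rank cases above non-cases in a testing set. A minimal sketch of that metric, using the Mann-Whitney pairwise definition (ties count one half), is shown below; the function name, scores, and labels are hypothetical and this is not the authors' code.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case (label 1) scores higher than a randomly chosen control (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case correctly ranked above control
            elif p == n:
                wins += 0.5      # tie counts one half
    return wins / (len(pos) * len(neg))

# Hypothetical predicted genetic risk scores in a testing set
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in the number of cases and controls, which is adequate for testing sets of a few hundred children; a rank-based formulation scales better for larger sets.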
Affiliation(s)
- Michael J McGeachie
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- George L Clemmer
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Damien C Croteau-Chonka
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Peter J Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Joanne E Sordillo
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Jessica A Lasky-Su
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Benjamin A Raby
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Kelan G Tantisira
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Scott T Weiss
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
7. Cross-Validation Without Doing Cross-Validation in Genome-Enabled Prediction. G3 (Bethesda) 2016; 6:3107-3128. [PMID: 27489209 PMCID: PMC5068934 DOI: 10.1534/g3.116.033381] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
Cross-validation is an essential component of genome-enabled prediction of complex traits. We develop formulae for computing the predictions that would be obtained if one or several cases were removed from training to become members of a testing set, while running the model only once, with all observations. Prediction methods to which the developments apply include least squares, best linear unbiased prediction (BLUP) of markers or genomic BLUP, reproducing kernel Hilbert spaces (RKHS) regression with single or multiple kernel matrices, and any member of the suite of linear regression methods known as the “Bayesian alphabet.” The approach used for Bayesian models is based on importance sampling of posterior draws. Proof of concept is provided by applying the formulae to a wheat data set of 599 inbred lines genotyped for 1279 markers, with grain yield as the target trait. The data set was used to evaluate predictive mean-squared error, the impact of alternative layouts on maximum likelihood estimates of regularization parameters, model complexity, and residual degrees of freedom under various strengths of regularization, as well as two forms of importance sampling. Our results will facilitate extensive cross-validation without model retraining for most machines employed in genome-assisted prediction of quantitative traits.
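For the simplest case covered by the abstract, a linear smoother with a fixed regularization parameter, the idea can be checked numerically: the leave-one-out residual equals the full-data residual divided by 1 − h_ii, where h_ii is the i-th diagonal element of the hat matrix. The sketch below assumes ridge regression (marker BLUP with a fixed variance ratio, here called `lam`) on centered marker codes with no intercept; the function names and simulated data are hypothetical, and the importance-sampling variant for Bayesian models is not shown.

```python
import numpy as np

def ridge_hat(X, lam):
    """Hat matrix H = X (X'X + lam I)^{-1} X' of ridge regression with fixed lam."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def loo_residuals_fast(X, y, lam):
    """Leave-one-out residuals from a single fit: e_i / (1 - h_ii)."""
    H = ridge_hat(X, lam)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))

def loo_residuals_refit(X, y, lam):
    """Brute-force check: refit n times, each time leaving one case out."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xk, yk = X[keep], y[keep]
        beta = np.linalg.solve(Xk.T @ Xk + lam * np.eye(p), Xk.T @ yk)
        out[i] = y[i] - X[i] @ beta
    return out

# Hypothetical data: 40 lines, 25 markers coded 0/1/2, then column-centered
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(40, 25)).astype(float)
X -= X.mean(axis=0)
y = X @ rng.normal(0.0, 0.2, size=25) + rng.normal(0.0, 1.0, size=40)

fast = loo_residuals_fast(X, y, lam=5.0)
slow = loo_residuals_refit(X, y, lam=5.0)
print("max |difference|:", np.max(np.abs(fast - slow)))
print("LOO predictive MSE:", np.mean(fast ** 2))
```

When markers far outnumber lines, as in the wheat data, the same hat matrix can be computed in its n × n kernel form, H = K(K + λI)⁻¹ with K = XX′, which is the GBLUP-style parameterization.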
8. López de Maturana E, Picornell A, Masson-Lecomte A, Kogevinas M, Márquez M, Carrato A, Tardón A, Lloreta J, García-Closas M, Silverman D, Rothman N, Chanock S, Real FX, Goddard ME, Malats N. Prediction of non-muscle invasive bladder cancer outcomes assessed by innovative multimarker prognostic models. BMC Cancer 2016; 16:351. [PMID: 27259534 PMCID: PMC4893282 DOI: 10.1186/s12885-016-2361-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/03/2015] [Accepted: 05/12/2016] [Indexed: 01/28/2023]
Abstract
Background: We adapted Bayesian statistical learning strategies to the field of prognosis to investigate whether genome-wide common SNPs improve the predictive ability of clinico-pathological prognosticators, and applied them to patients with non-muscle invasive bladder cancer (NMIBC). Methods: Adapted Bayesian sequential threshold models combined with LASSO were applied to account for the time-to-event and censored nature of the data. We studied 822 NMIBC patients followed up for >10 years. The study outcomes were time to first recurrence and time to progression. The predictive ability of models including up to 171,304 SNPs and/or 6 clinico-pathological prognosticators was evaluated using the AUC-ROC and the coefficient of determination. Results: Clinico-pathological prognosticators explained a larger proportion of the phenotypic variance of time to first recurrence (3.1 %) and time to progression (5.4 %) than SNPs did (1 % and 0.01 %, respectively). Adding SNPs to the clinico-pathological model slightly improved the prediction of time to first recurrence (up to 4 %). Using both clinico-pathological prognosticators and SNPs did not improve the prediction of time to progression. Heritability (ĥ2) of both outcomes was <1 % in NMIBC. Conclusions: We adapted a Bayesian statistical learning method to handle a large number of parameters in prognostic studies. Common SNPs played a limited role in predicting NMIBC outcomes, with very low heritability for both outcomes. We report for the first time a heritability estimate for a disease outcome. Our method can be extended to other disease models. Electronic supplementary material: The online version of this article (doi:10.1186/s12885-016-2361-7) contains supplementary material, which is available to authorized users.
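The time-to-event outcomes were handled with sequential threshold models. One common formulation, assumed here rather than taken from the paper, is a discrete-time probit hazard: in follow-up interval k, a patient still at risk has event probability Φ(γ_k + η), where γ_k is an interval-specific threshold and η is the linear predictor built from clinico-pathological variables and SNP effects. A censored patient contributes only the probability of remaining event-free through the intervals observed. The function below computes one patient's log-likelihood contribution; its name, arguments, and sign convention are hypothetical, and the LASSO prior on SNP effects is not shown.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sequential_probit_loglik(gammas, eta, last_interval, event):
    """Log-likelihood of one patient under an assumed discrete-time probit hazard.

    gammas        -- thresholds gamma_1..gamma_K, one per follow-up interval
    eta           -- linear predictor (covariates plus marker effects)
    last_interval -- 1-based index of the interval with the event, or of the
                     last interval completed event-free if censored
    event         -- True if the event occurred in last_interval
    """
    if not 1 <= last_interval <= len(gammas):
        raise ValueError("last_interval out of range")
    loglik = 0.0
    for k in range(last_interval):
        hazard = norm_cdf(gammas[k] + eta)              # P(event in k | at risk)
        is_event_interval = event and k == last_interval - 1
        # event interval contributes log(hazard); every other interval log(1 - hazard)
        loglik += math.log(hazard) if is_event_interval else math.log1p(-hazard)
    return loglik

# Hypothetical: three yearly intervals, recurrence observed in year 2
print(sequential_probit_loglik([-1.0, -0.8, -0.9], eta=0.3, last_interval=2, event=True))
```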
Affiliation(s)
- E López de Maturana
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, 28029 Madrid, Spain
- A Picornell
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, 28029 Madrid, Spain
- A Masson-Lecomte
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, 28029 Madrid, Spain
- M Kogevinas
- Centre for Research in Environmental Epidemiology (CREAL), Parc de Salut Mar, Barcelona, Spain; CIBERESP, Madrid, Spain
- M Márquez
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, 28029 Madrid, Spain
- A Carrato
- Servicio de Oncología, Hospital Universitario Ramón y Cajal, Madrid, and Servicio de Oncología, Hospital Universitario de Elche, Elche, Spain
- A Tardón
- Department of Preventive Medicine, Universidad de Oviedo, Oviedo, Spain; CIBERESP, Madrid, Spain
- J Lloreta
- Parc de Salut Mar and Department of Pathology, Hospital del Mar - IMAS, Barcelona, Spain
- M García-Closas
- Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
- D Silverman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, USA
- N Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, USA
- S Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, USA
- F X Real
- Epithelial Carcinogenesis Group, Spanish National Cancer Research Centre (CNIO), Madrid, and Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- M E Goddard
- Biosciences Research Division, Department of Environment and Primary Industries, Agribio, and Department of Food and Agricultural Systems, University of Melbourne, Melbourne, Australia
- N Malats
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, 28029 Madrid, Spain