1
Anguita-Ruiz A, Amine I, Stratakis N, Maitre L, Julvez J, Urquiza J, Luo C, Nieuwenhuijsen M, Thomsen C, Grazuleviciene R, Heude B, McEachan R, Vafeiadi M, Chatzi L, Wright J, Yang TC, Slama R, Siroux V, Vrijheid M, Basagaña X. Beyond the single-outcome approach: A comparison of outcome-wide analysis methods for exposome research. Environment International 2023; 182:108344. [PMID: 38016387 DOI: 10.1016/j.envint.2023.108344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/16/2023] [Accepted: 11/20/2023] [Indexed: 11/30/2023]
Abstract
Outcome-wide analysis can offer several benefits, including increased power to detect weak signals and the ability to identify exposures with multiple effects on health, which may be good targets for preventive measures. Recently, advanced multivariate statistical techniques for outcome-wide analysis have been developed, but they have rarely been applied to exposome analysis. In this work, we provide an overview of a selection of methods that are well suited to outcome-wide exposome analysis and are implemented in the R statistical software. Our work brings together six different methods presenting innovative solutions for typical problems arising from outcome-wide approaches in the context of the exposome, including dependencies among outcomes, high dimensionality, mixed-type outcomes, missing data records, and confounding effects. The identified methods can be grouped into four main categories: regularized multivariate regression techniques, multi-task learning approaches, dimensionality reduction approaches, and Bayesian extensions of the multivariate regression framework. Here, we compare the techniques, presenting the main rationale, strengths, and limitations of each, and provide code and guidelines for their application to exposome data. Additionally, we apply all selected methods to a real exposome dataset from the Human Early-Life Exposome (HELIX) project, demonstrating their suitability for exposome research. Although the choice of the best method will always depend on the challenges to be faced in each application, for an exposome-like analysis we find dimensionality reduction and Bayesian methods such as reduced rank regression (RRR) or multivariate Bayesian shrinkage priors (MBSP) particularly useful, given their ability to deal with critical issues such as collinearity, high dimensionality, missing data, or quantification of uncertainty.
Affiliation(s)
- Augusto Anguita-Ruiz
  - ISGlobal, 08003 Barcelona, Spain; CIBEROBN (CIBER Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III, 28029 Madrid, Spain
- Ines Amine
  - University Grenoble Alpes, Inserm U 1209, CNRS UMR 5309, Team of Environmental Epidemiology Applied to the Development and Respiratory Health, Institute for Advanced Biosciences, 38000 Grenoble, France
- Lea Maitre
  - ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Jordi Julvez
  - ISGlobal, 08003 Barcelona, Spain; CIBEROBN (CIBER Physiopathology of Obesity and Nutrition), Instituto de Salud Carlos III, 28029 Madrid, Spain; Epidemiology and Environmental Health Joint Research Unit, Foundation for the Promotion of Health and Biomedical Research in the Valencian Region, FISABIO-Public Health, FISABIO-Universitat Jaume I-Universitat de València, Av. Catalunya 21, 46020 Valencia, Spain; Institut d'Investigació Sanitària Pere Virgili (IISPV), Clinical and Epidemiological Neuroscience Group (NeuroÈpia), 43204 Reus (Tarragona), Catalonia, Spain
- Chongliang Luo
  - Division of Public Health Sciences, Washington University School of Medicine in St. Louis, 600 S Taylor Ave, St. Louis, MO 63110, USA
- Mark Nieuwenhuijsen
  - ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Cathrine Thomsen
  - Department of Food Safety, Norwegian Institute of Public Health (NIPH), Oslo, Norway
- Regina Grazuleviciene
  - Department of Environmental Science, Vytautas Magnus University, 44248 Kaunas, Lithuania
- Barbara Heude
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAE, Center for Research in Epidemiology and StatisticS (CRESS), F-75004 Paris, France
- Rosemary McEachan
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Marina Vafeiadi
  - Department of Social Medicine, School of Medicine, University of Crete, Heraklion, Crete, Greece
- Leda Chatzi
  - Department of Social Medicine, School of Medicine, University of Crete, Heraklion, Crete, Greece
- John Wright
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Tiffany C Yang
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Rémy Slama
  - University Grenoble Alpes, Inserm U 1209, CNRS UMR 5309, Team of Environmental Epidemiology Applied to the Development and Respiratory Health, Institute for Advanced Biosciences, 38000 Grenoble, France
- Valérie Siroux
  - University Grenoble Alpes, Inserm U 1209, CNRS UMR 5309, Team of Environmental Epidemiology Applied to the Development and Respiratory Health, Institute for Advanced Biosciences, 38000 Grenoble, France
- Martine Vrijheid
  - ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
- Xavier Basagaña
  - ISGlobal, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), 28029 Madrid, Spain
3
Ma W, Chen LS, Özbek U, Han SW, Lin C, Paulovich AG, Zhong H, Wang P. Integrative Proteo-genomic Analysis to Construct CNA-protein Regulatory Map in Breast and Ovarian Tumors. Mol Cell Proteomics 2019; 18:S66-S81. [PMID: 31281117 PMCID: PMC6692778 DOI: 10.1074/mcp.ra118.001229] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/21/2018] [Revised: 07/01/2019] [Indexed: 12/16/2022] Open
Abstract
Recent developments in high-throughput proteomics and genomics profiling enable one to study the regulation of protein activities by genome alterations in a systematic manner. In this article, we propose a new statistical method, ProMAP, to systematically characterize the regulatory relationships between proteins and DNA copy number alterations (CNA) in breast and ovarian tumors based on proteogenomic data from the CPTAC-TCGA studies. Because of the dynamic nature of mass spectrometry instruments, proteomics data from labeled mass spectrometry experiments usually have non-ignorable batch effects. Moreover, mass spectrometry-based proteomic data often possess high percentages of missing values and non-ignorable missing-data patterns. Thus, we use a linear mixed effects model to account for the batch structure and explicitly incorporate the abundance-dependent missing-data mechanism of proteomic data in ProMAP. In addition, we employ a multivariate regression framework to characterize the multiple-to-multiple regulatory relationships between CNA and proteins. Further, we use proper statistical regularization to facilitate the detection of master genetic regulators, which affect the activities of many proteins and often play important roles in genetic regulatory networks. Improved performance of ProMAP over existing methods was illustrated through extensive simulation studies and real data examples. Applying ProMAP to the CPTAC-TCGA breast and ovarian cancer data sets, we identified many genome regions, including a few novel ones, whose CNA were associated with protein and/or phosphoprotein abundances. For example, in breast tumors, a small region in 8p11.21 was recognized as the second biggest hub in the CNA-phosphoprotein regulatory map, and further investigation of the regulatory targets suggests the potential role of 8p11.21 CNA in perturbing oxygen binding and transport activities in tumor cells. This and other findings from our analyses help to characterize the impacts of CNAs on protein activity landscapes and cast light on the genetic regulation mechanisms underlying these tumors.
Affiliation(s)
- Weiping Ma
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029
- Lin S. Chen
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637
- Umut Özbek
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York 10029
- Sung Won Han
  - School of Industrial Management Engineering, Korea University, 145, Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea
- Chenwei Lin
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109–1024
- Amanda G. Paulovich
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109–1024
- Hua Zhong
  - Division of Biostatistics, Department of Population Health, New York University, New York, New York 10016
- Pei Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029
4
Lukowski SW, Lloyd-Jones LR, Holloway A, Kirsten H, Hemani G, Yang J, Small K, Zhao J, Metspalu A, Dermitzakis ET, Gibson G, Spector TD, Thiery J, Scholz M, Montgomery GW, Esko T, Visscher PM, Powell JE. Genetic correlations reveal the shared genetic architecture of transcription in human peripheral blood. Nat Commun 2017; 8:483. [PMID: 28883458 PMCID: PMC5589780 DOI: 10.1038/s41467-017-00473-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/07/2016] [Accepted: 06/30/2017] [Indexed: 01/29/2023] Open
Abstract
Transcript co-expression is regulated by a combination of shared genetic and environmental factors. Here, we estimate the proportion of co-expression that is due to shared genetic variance. To do so, we estimated the genetic correlations between each pairwise combination of 2469 transcripts that are highly heritable and expressed in whole blood in 1748 unrelated individuals of European ancestry. We identify 556 pairs with a significant genetic correlation of which 77% are located on different chromosomes, and report 934 expression quantitative trait loci, identified in an independent cohort, with significant effects on both transcripts in a genetically correlated pair. We show significant enrichment for transcription factor control and physical proximity through chromatin interactions as possible mechanisms of shared genetic control. Finally, we construct networks of interconnected transcripts and identify their underlying biological functions. Using genetic correlations to investigate transcriptional co-regulation provides valuable insight into the nature of the underlying genetic architecture of gene regulation. Covariance of gene expression pairs is due to a combination of shared genetic and environmental factors. Here the authors estimate the genetic correlation between highly heritable pairs and identify transcription factor control and chromatin interactions as possible mechanisms of correlation.
Affiliation(s)
- Samuel W Lukowski
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, 4072, Australia
- Luke R Lloyd-Jones
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, 4072, Australia; Centre for Neurogenetics and Statistical Genomics, Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Alexander Holloway
  - Centre for Neurogenetics and Statistical Genomics, Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Holger Kirsten
  - Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, 04107, Germany; LIFE Leipzig Research Center for Civilization Diseases, University of Leipzig, Leipzig, 04103, Germany
- Gibran Hemani
  - MRC Integrative Epidemiology Unit (IEU) at the University of Bristol, Bristol, BS8 2BN, UK; School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK
- Jian Yang
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, 4072, Australia; Centre for Neurogenetics and Statistical Genomics, Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Kerrin Small
  - Department of Twin Research and Genetic Epidemiology, King's College London, London, SE1 7EH, UK
- Jing Zhao
  - School of Biology and Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Andres Metspalu
  - Estonian Genome Center, University of Tartu, Tartu, 51010, Estonia
- Emmanouil T Dermitzakis
  - Department of Genetic Medicine and Development, University of Geneva, Geneva, CH-1211, Switzerland
- Greg Gibson
  - School of Biology and Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Timothy D Spector
  - Department of Twin Research and Genetic Epidemiology, King's College London, London, SE1 7EH, UK
- Joachim Thiery
  - LIFE Leipzig Research Center for Civilization Diseases, University of Leipzig, Leipzig, 04103, Germany; Institute of Laboratory Medicine, Clinical Chemistry and Molecular Diagnostics, University of Leipzig, Leipzig, 04103, Germany
- Markus Scholz
  - Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, 04107, Germany; LIFE Leipzig Research Center for Civilization Diseases, University of Leipzig, Leipzig, 04103, Germany
- Grant W Montgomery
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, 4072, Australia; QIMR Berghofer Medical Research Institute, 300 Herston Road, Brisbane, QLD, 4006, Australia
- Tonu Esko
  - Estonian Genome Center, University of Tartu, Tartu, 51010, Estonia
- Peter M Visscher
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, 4072, Australia; Centre for Neurogenetics and Statistical Genomics, Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Joseph E Powell
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, 4072, Australia; Centre for Neurogenetics and Statistical Genomics, Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia