1
Banzato E, Chiogna M, Djordjilović V, Risso D. A Bartlett-type correction for likelihood ratio tests with application to testing equality of Gaussian graphical models. Stat Probab Lett 2023; 193:109732. [PMID: 38584807 PMCID: PMC10997343 DOI: 10.1016/j.spl.2022.109732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
This work defines a new correction for the likelihood ratio test for a two-sample problem within the multivariate normal context. This correction applies to decomposable graphical models, where testing equality of distributions can be decomposed into lower dimensional problems.
Affiliation(s)
- Erika Banzato
  - Department of Statistical Sciences, University of Padua, via C. Battisti 241, Padua, Italy
- Monica Chiogna
  - Department of Statistical Sciences, University of Bologna, Via Belle Arti, 41, Bologna, Italy
- Vera Djordjilović
  - Department of Economics, Ca’ Foscari University of Venice, Cannaregio 873, Venice, Italy
- Davide Risso
  - Department of Statistical Sciences, University of Padua, via C. Battisti 241, Padua, Italy
2
Page CM, Nøst TH, Djordjilović V, Thoresen M, Frigessi A, Sandanger TM, Veierød MB. Pre-diagnostic DNA methylation in blood leucocytes in cutaneous melanoma; a nested case–control study within the Norwegian Women and Cancer cohort. Sci Rep 2022; 12:14200. [PMID: 35987900 PMCID: PMC9392730 DOI: 10.1038/s41598-022-18585-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 08/16/2022] [Indexed: 12/03/2022] Open
Abstract
The prognosis of cutaneous melanoma depends on early detection, and good biomarkers for melanoma risk may provide a valuable tool to detect melanoma development at a pre-clinical stage. By studying the epigenetic profile in pre-diagnostic blood samples of melanoma cases and cancer-free controls, we aimed to identify DNA methylation sites conferring melanoma risk. DNA methylation was measured at 775,528 CpG sites using the Illumina EPIC array in whole blood in incident melanoma cases (n = 183) and matched cancer-free controls (n = 183) in the Norwegian Women and Cancer cohort. Phenotypic information and ultraviolet radiation exposure were obtained from questionnaires. An epigenome-wide association study (EWAS) of future melanoma cases and controls was carried out with conditional logistic regression, with correction for multiple testing using the false discovery rate (FDR). We extended the analysis by including a public data set on melanoma (GSE120878), and combining these different data sets using a version of covariate-modulated FDR (AdaPT). The analysis of future melanoma cases and controls did not identify any genome-wide significant CpG sites (0.85 ≤ padj ≤ 0.99). In the restricted AdaPT analysis, 7 CpG sites were suggestive at the FDR level of 0.15. These CpG sites may potentially be used as pre-diagnostic biomarkers of melanoma risk.
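The abstract above refers to FDR-based correction for multiple testing without naming the procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, which is a common choice but is not confirmed by the abstract; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here purely as an example of an FDR
    adjustment; the abstract does not state which procedure was applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, not taken from the study.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A CpG site would be declared significant at FDR level q when its adjusted p-value is at most q.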
3
Djordjilović V, Chiogna M. Searching for a source of difference in graphical models. J Multivar Anal 2022. [DOI: 10.1016/j.jmva.2022.104973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
4
Abstract
Mediation analysis in high‐dimensional settings often involves identifying potential mediators among a large number of measured variables. For this purpose, a two‐step familywise error rate procedure called ScreenMin has been recently proposed. In ScreenMin, variables are first screened and only those that pass the screening are tested. The proposed data‐independent threshold for selection has been shown to guarantee asymptotic control of the familywise error rate. In this work, we investigate the impact of the threshold on the finite‐sample familywise error rate. We derive a power-maximizing threshold and show that it is well approximated by an adaptive threshold of Wang et al. (2016, arXiv preprint arXiv:1610.03330). We illustrate the investigated procedures on a case‐control study examining the effect of fish intake on the risk of colorectal adenoma. We also apply our procedure in the context of replicability analysis to identify single nucleotide polymorphisms (SNPs) associated with crop yield in two distinct environments.
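The screen-then-test idea described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, none of which are spelled out in the abstract: each candidate mediator comes with two p-values, screening keeps candidates whose smaller p-value is at most a fixed threshold, and a kept candidate is rejected when its larger p-value is at most alpha divided by the number of kept candidates. The function name `screen_min` and all inputs are hypothetical.

```python
def screen_min(p1, p2, threshold, alpha=0.05):
    """Two-step screen-then-test sketch; returns indices of rejected hypotheses.

    Assumptions (not stated in the abstract): screening uses the minimum of
    the two p-values, testing uses the maximum, and the selected set is
    Bonferroni-corrected.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same length")
    # Step 1: screening on the smaller of the two p-values.
    selected = [i for i, (a, b) in enumerate(zip(p1, p2)) if min(a, b) <= threshold]
    if not selected:
        return []
    # Step 2: test only the selected candidates on the larger p-value.
    cutoff = alpha / len(selected)
    return [i for i in selected if max(p1[i], p2[i]) <= cutoff]


# Hypothetical p-values for four candidate mediators.
rejected = screen_min(
    p1=[0.001, 0.2, 0.0005, 0.03],
    p2=[0.004, 0.0001, 0.6, 0.5],
    threshold=0.01,
)
```

Because only the screened candidates enter the Bonferroni correction, the choice of `threshold` trades off how many candidates are tested against how strict the per-test cutoff becomes, which is the trade-off the abstract studies.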
Affiliation(s)
- Vera Djordjilović
  - Department of Economics, Ca' Foscari University of Venice, Dorsoduro, Venice, Italy
- Jesse Hemerik
  - Biometris, Wageningen University & Research, Wageningen, The Netherlands
- Magne Thoresen
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Blindern, Oslo, Norway
5
Salviato E, Djordjilović V, Hariprakash JM, Tagliaferri I, Pal K, Ferrari F. Leveraging three-dimensional chromatin architecture for effective reconstruction of enhancer-target gene regulatory interactions. Nucleic Acids Res 2021; 49:e97. [PMID: 34197622 PMCID: PMC8464068 DOI: 10.1093/nar/gkab547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/16/2021] [Revised: 06/07/2021] [Accepted: 06/17/2021] [Indexed: 12/23/2022] Open
Abstract
A growing body of evidence in the literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer-target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains that define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work (i) we develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) we demonstrate that the incorporation of chromatin 3D architecture information improves ETG pairing accuracy; and (iii) we use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole genome sequencing in cancer and undiagnosed genetic diseases research.
Affiliation(s)
- Elisa Salviato
  - IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Vera Djordjilović
  - Department of Economics, Ca’ Foscari University of Venice, Venice 30100, Italy
- Koustav Pal
  - IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Francesco Ferrari
  - IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
  - Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza”, National Research Council, Pavia 27100, Italy
6
Djordjilović V, Chiogna M, Romualdi C. Simulating gene silencing through intervention analysis. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
7
Page CM, Djordjilović V, Nøst TH, Ghiasvand R, Sandanger TM, Frigessi A, Thoresen M, Veierød MB. Lifetime Ultraviolet Radiation Exposure and DNA Methylation in Blood Leukocytes: The Norwegian Women and Cancer Study. Sci Rep 2020; 10:4521. [PMID: 32161338 PMCID: PMC7066249 DOI: 10.1038/s41598-020-61430-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/12/2019] [Accepted: 02/26/2020] [Indexed: 12/04/2022] Open
Abstract
Ultraviolet radiation (UVR) exposure is a leading cause of skin cancers and a ubiquitous environmental exposure. However, the molecular mechanisms relating UVR exposure to melanoma are not fully understood. We aimed to investigate whether lifetime UVR exposure could be robustly associated with DNA methylation (DNAm). We assessed DNAm in whole blood in three data sets (n = 183, 191, and 125) from the Norwegian Women and Cancer cohort, using Illumina platforms. We studied genome-wide DNAm, targeted analyses of CpG sites indicated in the literature, global methylation, and accelerated aging. Lifetime history of UVR exposure (residential ambient UVR, sunburns, sunbathing vacations and indoor tanning) was collected by questionnaires. We used one data set for discovery and the other two for replication. One CpG site (cg01884057) showed a genome-wide significant association with cumulative UVR exposure (pnominal = 3.96e-08), but this was not replicated in either of the two replication sets (pnominal ≥ 0.42). Two CpG sites (cg05860019, cg00033666) showed suggestive associations with the other UVR exposures. We performed extensive analyses of the association between long-term UVR exposure and DNAm. There was no indication of a robust effect of past UVR exposure on DNAm.
Affiliation(s)
- Christian M Page
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Vera Djordjilović
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Therese H Nøst
  - Department of Community Medicine, UiT - the Arctic University of Norway, Tromsø, Norway
- Reza Ghiasvand
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
  - Department of Research, Cancer Registry of Norway, Oslo, Norway
- Torkjel M Sandanger
  - Department of Community Medicine, UiT - the Arctic University of Norway, Tromsø, Norway
- Arnoldo Frigessi
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Magne Thoresen
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Marit B Veierød
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
8
Salviato E, Djordjilović V, Chiogna M, Romualdi C. SourceSet: A graphical model approach to identify primary genes in perturbed biological pathways. PLoS Comput Biol 2019; 15:e1007357. [PMID: 31652275 PMCID: PMC6834292 DOI: 10.1371/journal.pcbi.1007357] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/17/2018] [Revised: 11/06/2019] [Accepted: 08/23/2019] [Indexed: 11/24/2022] Open
Abstract
Topological gene-set analysis has emerged as a powerful means for omic data interpretation. Although numerous methods for identifying dysregulated genes have been proposed, few of them aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. Here, we propose a new method, called SourceSet, able to distinguish between the primary and the secondary dysregulation within a Gaussian graphical model context. The proposed method compares gene expression profiles in the control and in the perturbed condition and detects the differences in both the mean and the covariance parameters with a series of likelihood ratio tests. The resulting evidence is used to infer the primary and the secondary set, i.e. the genes responsible for the primary dysregulation, and the genes affected by the perturbation through network propagation. The proposed method demonstrates high specificity and sensitivity in different simulated scenarios and on several real biological case studies. In order to fit into the more traditional pathway analysis framework, the SourceSet R package also extends the analysis from a single to multiple pathways and provides several graphical outputs, including Cytoscape visualization to browse the results.
The rapid increase in omic studies has created a need to understand the biological implications of their results. Gene-set analysis has emerged as a powerful means for gaining such understanding, evolving in the last decade from the classical enrichment analysis to the more powerful topological approaches. Although numerous methods for identifying dysregulated genes have been proposed, few of them aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. This distinction is crucial for network medicine, where the prioritization of the effect of biological perturbations may help in the molecular understanding of drug treatments and diseases. Here we propose a new method, called SourceSet, able to distinguish between primary and secondary dysregulation within a graphical model context, demonstrating a high specificity and sensitivity in different simulated scenarios and on real biological case studies.
Affiliation(s)
- Elisa Salviato
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, Italy
- Monica Chiogna
  - Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Chiara Romualdi
  - Department of Biology, University of Padova, Padova, Italy
9
Djordjilović V, Page CM, Gran JM, Nøst TH, Sandanger TM, Veierød MB, Thoresen M. Global test for high-dimensional mediation: Testing groups of potential mediators. Stat Med 2019; 38:3346-3360. [PMID: 31074092 DOI: 10.1002/sim.8199] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/03/2018] [Revised: 04/18/2019] [Accepted: 04/22/2019] [Indexed: 11/08/2022]
Abstract
We address the problem of testing whether a possibly high-dimensional vector may act as a mediator between some exposure variable and the outcome of interest. We propose a global test for mediation, which combines a global test with the intersection-union principle. We discuss theoretical properties of our approach and conduct simulation studies that demonstrate that it performs as well as or better than its competitor. We also propose a multiple testing procedure, ScreenMin, that provides asymptotic control of either familywise error rate or false discovery rate when multiple groups of potential mediators are tested simultaneously. We apply our approach to data from a large Norwegian cohort study, where we look at the hypothesis that smoking increases the risk of lung cancer by modifying the level of DNA methylation.
Affiliation(s)
- Vera Djordjilović
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Christian M Page
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
  - Center for Fertility and Health, Division of Mental and Physical Health, Norwegian Institute of Public Health, Oslo, Norway
- Jon Michael Gran
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- Therese H Nøst
  - Department of Community Medicine, The Arctic University of Norway, Tromsø, Norway
- Torkjel M Sandanger
  - Department of Community Medicine, The Arctic University of Norway, Tromsø, Norway
- Marit B Veierød
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Magne Thoresen
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
10
Djordjilović V, Chiogna M, Vomlel J. An empirical comparison of popular structure learning algorithms with a view to gene network inference. Int J Approx Reason 2017. [DOI: 10.1016/j.ijar.2016.12.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
11
Djordjilović V, Chiogna M, Massa MS, Romualdi C. Graphical modeling for gene set analysis: A critical appraisal. Biom J 2015; 57:852-66. [PMID: 26149206 DOI: 10.1002/bimj.201300287] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/13/2013] [Revised: 03/13/2015] [Accepted: 03/17/2015] [Indexed: 11/08/2022]
Abstract
Current demand for understanding the behavior of groups of related genes, combined with the greater availability of data, has led to an increased focus on statistical methods in gene set analysis. In this paper, we aim to perform a critical appraisal of the methodology based on graphical models developed in Massa et al. (2010) that uses pathway signaling networks as a starting point to develop statistically sound procedures for gene set analysis. We pay attention to the potential of the methodology with respect to the organizational aspects of dealing with such complex but highly informative starting structures, that is, pathways. We focus on three themes: the translation of a biological pathway into a graph suitable for modeling, the role of shrinkage when there are more genes than samples, and the evaluation of how well the statistical models correspond to biological expectations. To study the impact of shrinkage, two simulation studies are run. To evaluate the correspondence with biological expectations, we use data from a network with known behavior, which offer the possibility of carrying out a realistic check of how the model responds to changes in the experimental conditions.
Affiliation(s)
- Vera Djordjilović
  - Department of Statistical Sciences, University of Padua, via Cesare Battisti 241, 35121 Padova, Italy
- Monica Chiogna
  - Department of Statistical Sciences, University of Padua, via Cesare Battisti 241, 35121 Padova, Italy
- M Sofia Massa
  - Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom
- Chiara Romualdi
  - Department of Biology, University of Padua, Via Ugo Bassi 58/B, 35121 Padova, Italy