1
Zhang J, Li C, Wang J. A stochastic block Ising model for multi-layer networks with inter-layer dependence. Biometrics 2023; 79:3564-3573. [PMID: 37284764 DOI: 10.1111/biom.13885] [Received: 12/21/2022] [Accepted: 05/26/2023] [Indexed: 06/08/2023]
Abstract
Community detection, which aims to find groups of nodes with similar characteristics, has attracted tremendous interest in network analysis. Various methods have been developed to detect homogeneous communities in multi-layer networks, where inter-layer dependence is a widely acknowledged but severely under-investigated issue. In this paper, we propose a novel stochastic block Ising model (SBIM) that incorporates inter-layer dependence to help with community detection in multi-layer networks. The community structure is modeled by the stochastic block model (SBM), and the inter-layer dependence is incorporated via the popular Ising model. Furthermore, we develop an efficient variational EM algorithm to tackle the resulting optimization task and establish the asymptotic consistency of the proposed method. Extensive simulated examples and a real example on gene co-expression multi-layer network data are also provided to demonstrate the advantage of the proposed method.
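This is a minimal sketch of the Ising ingredient, built on assumptions the abstract does not state. For one node pair, the edge indicators across L layers are coded as spins x_l in {-1, +1}, meaning edge present or absent. The hypothetical layer-specific fields h and inter-layer couplings J would, in an SBIM-style model, depend on the communities of the two nodes. The code enumerates all 2^L configurations to get exact probabilities, which is only feasible for a few layers. It is not the authors' variational EM algorithm.

```python
import itertools
import math

def ising_log_potential(x, h, J):
    """Unnormalized log-probability: sum_l h[l]*x[l] + sum_{l<m} J[l][m]*x[l]*x[m]."""
    L = len(x)
    value = sum(h[l] * x[l] for l in range(L))
    for l in range(L):
        for m in range(l + 1, L):
            value += J[l][m] * x[l] * x[m]
    return value

def ising_distribution(h, J):
    """Exact probabilities of all 2**L spin configurations (only feasible for small L)."""
    L = len(h)
    configs = list(itertools.product((-1, 1), repeat=L))
    weights = [math.exp(ising_log_potential(x, h, J)) for x in configs]
    z = sum(weights)  # partition function
    return {x: w / z for x, w in zip(configs, weights)}

# Hypothetical parameters for one node pair observed in three layers:
# only layers 0 and 1 are coupled, with a positive (attractive) coupling.
h = [0.0, 0.0, 0.0]
J = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
probs = ising_distribution(h, J)
# Probability that the pair's edge status agrees in layers 0 and 1.
agree = sum(p for x, p in probs.items() if x[0] == x[1])
print(round(agree, 4))  # 0.8808
```

With J set to zero, layers 0 and 1 agree with probability 0.5. The positive coupling raises this to e^2/(1 + e^2), which is how an Ising term lets one layer's edges inform another's.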
Affiliation(s)
- Jingnan Zhang
- International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Chengye Li
- School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Junhui Wang
- Department of Statistics, The Chinese University of Hong Kong, New Territories, Hong Kong
2
Norberg A, Susi H, Sallinen S, Baran P, Clark NJ, Laine AL. Direct and indirect viral associations predict coexistence in wild plant virus communities. Curr Biol 2023; 33:1665-1676.e4. [PMID: 37019108 DOI: 10.1016/j.cub.2023.03.022] [Received: 11/02/2022] [Revised: 01/17/2023] [Accepted: 03/08/2023] [Indexed: 04/07/2023]
Abstract
Viruses are a vastly underestimated component of biodiversity that occur as diverse communities across hierarchical scales, from the landscape level to individual hosts. The integration of community ecology with disease biology is a powerful, novel approach that can yield unprecedented insights into the abiotic and biotic drivers of pathogen community assembly. Here, we sampled wild plant populations to characterize and analyze the diversity and co-occurrence structure of within-host virus communities and their predictors. Our results show that these virus communities are characterized by diverse, non-random coinfections. Using a novel graphical network modeling framework, we demonstrate how environmental heterogeneity influences the network of virus taxa and how the virus co-occurrence patterns can be attributed to non-random, direct statistical virus-virus associations. Moreover, we show that environmental heterogeneity changed virus association networks, especially through their indirect effects. Our results highlight a previously underestimated mechanism by which environmental variability can influence disease risk by changing associations between viruses that are conditional on their environment.
Affiliation(s)
- Anna Norberg
- Department of Evolutionary Biology and Environmental Studies, University of Zürich, 8057 Zürich, Switzerland; Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, 7034 Trondheim, Norway.
- Hanna Susi
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, PO Box 65 00014, Helsinki, Finland
- Suvi Sallinen
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, PO Box 65 00014, Helsinki, Finland
- Pezhman Baran
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, PO Box 65 00014, Helsinki, Finland
- Nicholas J Clark
- School of Veterinary Science, Faculty of Science, University of Queensland, Gatton, QLD 4343, Australia
- Anna-Liisa Laine
- Department of Evolutionary Biology and Environmental Studies, University of Zürich, 8057 Zürich, Switzerland; Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, PO Box 65 00014, Helsinki, Finland
3
Barshes NR, Clark NJ, Bidare D, Dudenhoeffer JH, Mindru C, Rodriguez-Barradas MC. Polymicrobial Foot Infection Patterns Are Common and Associated With Treatment Failure. Open Forum Infect Dis 2022; 9:ofac475. [PMID: 36267251 PMCID: PMC9578153 DOI: 10.1093/ofid/ofac475] [Received: 06/01/2022] [Accepted: 09/13/2022] [Indexed: 08/09/2023]
Abstract
BACKGROUND That foot infections are predominantly polymicrobial has long been recognized, but it is not clear whether the various species co-occur randomly or in patterns. We sought nonrandom species co-occurrence patterns that might help better predict prognosis or guide antimicrobial selection. METHODS We analyzed tissue (bone, skin, and other soft tissue), fluid, and swab specimens collected from initial foot infection episodes during a 10-year period using a hospital registry. Nonrandom co-occurrence of microbial species was identified using simple pairwise co-occurrence rates adjusted for multiple comparisons, Markov and conditional random fields, and factor analysis. A historical cohort was used to validate pattern occurrence and identify clinical significance. RESULTS In total, 156 unique species were identified among the 727 specimens obtained from initial foot infection episodes in 694 patients. Multiple analyses suggested that Staphylococcus aureus is negatively associated with other staphylococci. Another pattern noted was the co-occurrence of alpha-hemolytic Streptococcus, Enterococcus faecalis, Klebsiella, Proteus, Enterobacter, or Escherichia coli with the absence of both Bacteroides and Corynebacterium. Patients in a historical cohort with this latter pattern had significantly higher risk-adjusted rates of treatment failure. CONCLUSIONS Several nonrandom microbial co-occurrence patterns are frequently seen in foot infection specimens. One particular pattern with many Proteobacteria species may denote a higher risk of treatment failure. Staphylococcus aureus rarely co-occurs with other staphylococci.
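The simplest analysis listed in METHODS is pairwise co-occurrence adjusted for multiple comparisons, and it can be sketched as below. The abstract does not name the test or the adjustment. The two-sided Fisher exact test and the Benjamini-Hochberg procedure used here are therefore assumptions, and the specimens are hypothetical toy data.

```python
import math
from itertools import combinations

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k given the margins.
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def pairwise_cooccurrence(specimens, species):
    """Test every species pair; specimens is a list of sets of detected species."""
    pairs, pvals = [], []
    for s, t in combinations(species, 2):
        a = sum(1 for x in specimens if s in x and t in x)
        b = sum(1 for x in specimens if s in x and t not in x)
        c = sum(1 for x in specimens if s not in x and t in x)
        d = len(specimens) - a - b - c
        pairs.append((s, t))
        pvals.append(fisher_exact_two_sided(a, b, c, d))
    return list(zip(pairs, pvals, benjamini_hochberg(pvals)))

# Hypothetical toy specimens (not data from the study).
specimens = [{"A"}, {"A"}, {"A"}, {"A", "C"}, {"B", "C"}, {"B", "C"}, {"B"}, {"B", "C"}]
for pair, p, q in pairwise_cooccurrence(specimens, ["A", "B", "C"]):
    print(pair, round(p, 3), round(q, 3))
```

Each species pair yields a 2x2 presence/absence table. The adjusted values (q) are the ones that would be compared against a false discovery rate threshold.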
Affiliation(s)
- Neal R Barshes
- Correspondence: Neal R. Barshes, MD, MPH, Baylor College of Medicine/Michael E. DeBakey Veterans Affairs Medical Center, 2002 Holcombe Boulevard (OCL 112), Houston, TX 77030
- Nicholas J Clark
- School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia
- Deeksha Bidare
- Baylor College of Medicine, One Baylor Plaza, Houston, Texas, USA
- J H Dudenhoeffer
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, USA
- Cezarina Mindru
- Michael E. DeBakey Veterans Affairs Medical Center, Houston, Texas, USA
- Infectious Disease Section, Department of Medicine, One Baylor Plaza, Houston, Texas, USA
- Maria C Rodriguez-Barradas
- Michael E. DeBakey Veterans Affairs Medical Center, Houston, Texas, USA
- Infectious Disease Section, Department of Medicine, One Baylor Plaza, Houston, Texas, USA
4
Park S, Lee ER, Zhao H. Low-rank regression models for multiple binary responses and their applications to cancer cell-line encyclopedia data. J Am Stat Assoc 2022; 119:202-216. [PMID: 38481466 PMCID: PMC10928550 DOI: 10.1080/01621459.2022.2105704] [Received: 06/29/2021] [Accepted: 07/16/2022] [Indexed: 10/16/2022]
Abstract
In this paper, we study high-dimensional multivariate logistic regression models in which a common set of covariates is used to predict multiple binary outcomes simultaneously. Our work is primarily motivated by many biomedical studies with correlated multiple responses, such as the cancer cell-line encyclopedia project. We assume that the underlying regression coefficient matrix is simultaneously low-rank and row-wise sparse. We propose an intuitively appealing selection and estimation framework based on the marginal model likelihood, and we develop an efficient computational algorithm for inference. We establish a novel high-dimensional theory for this nonlinear multivariate regression. Our theory is general, allowing for potential correlations between the binary responses. We propose a new type of nuclear norm penalty using the smoothly clipped absolute deviation (SCAD), filling a gap in the related non-convex penalization literature. We theoretically demonstrate that, when the underlying coefficient matrix is low-rank and row-wise sparse, the proposed estimator improves estimation accuracy by considering multiple responses jointly. In particular, we establish non-asymptotic error bounds and both rank and row support consistency of the proposed method. Moreover, we develop a consistent rule to simultaneously select the rank and row dimension of the coefficient matrix. Furthermore, we extend the proposed methods and theory to a joint Ising model, which accounts for the dependence relationships. In our analysis of both simulated data and the cancer cell line encyclopedia data, the proposed methods predict responses better than existing methods.
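The abstract does not spell out its SCAD-based nuclear norm penalty. One natural reading, assumed here, applies the SCAD penalty of Fan and Li (2001) to each singular value of the coefficient matrix in place of the singular value itself. The sketch shows the penalty and the univariate SCAD thresholding rule that would shrink singular values in a proximal step. The default a = 3.7 is the conventional choice, not a value taken from the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001); requires a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # L1-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic transition
    return lam * lam * (a + 1) / 2  # constant: large values are not penalized further

def scad_nuclear_penalty(singular_values, lam, a=3.7):
    """Assumed form: sum of SCAD penalties over the singular values of a matrix."""
    return sum(scad_penalty(s, lam, a) for s in singular_values)

def scad_threshold(z, lam, a=3.7):
    """Minimizer over theta of 0.5 * (z - theta)**2 + p_lam(|theta|)."""
    sign = 1.0 if z >= 0 else -1.0
    t = abs(z)
    if t <= 2 * lam:
        return sign * max(t - lam, 0.0)  # soft thresholding
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # partial shrinkage
    return z  # no shrinkage

# Hypothetical singular values of a coefficient matrix.
sigma = [5.0, 1.5, 0.4]
print([scad_threshold(s, lam=1.0) for s in sigma])  # [5.0, 0.5, 0.0]
print(scad_nuclear_penalty(sigma, lam=1.0))
```

Small singular values are set to zero, which reduces the rank, and moderate ones are shrunk. Large ones are left unchanged. That last behavior is the bias reduction that distinguishes SCAD from the convex nuclear norm, which soft-thresholds every singular value.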
Affiliation(s)
- Seyoung Park
- Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Eun Ryung Lee
- Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, 06511, USA
5
Tao J, Li B, Xue L. An additive graphical model for discrete data. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2119983] [Indexed: 10/17/2022]
Affiliation(s)
- Jun Tao
- Department of Statistics, The Pennsylvania State University
- Bing Li
- Department of Statistics, The Pennsylvania State University
- Lingzhou Xue
- Department of Statistics, The Pennsylvania State University
6
Zhou J, Hoen AG, Mcritchie S, Pathmasiri W, Viles WD, Nguyen QP, Madan JC, Dade E, Karagas MR, Gui J. Information enhanced model selection for Gaussian graphical model with application to metabolomic data. Biostatistics 2022; 23:926-948. [PMID: 33720330 PMCID: PMC9608647 DOI: 10.1093/biostatistics/kxab006] [Received: 03/13/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 11/12/2022]
Abstract
In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to learn the structure of association networks using Gaussian graphical models combined with prior knowledge. Our strategy includes two parts. In the first part, we propose a model selection criterion called the structural Bayesian information criterion, in which the prior structure is modeled and incorporated into the Bayesian information criterion. It is shown that the popular extended Bayesian information criterion is a special case of the structural Bayesian information criterion. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven, and the prior structure is embedded into the candidate models automatically. Theoretical investigation shows that, under some mild conditions, the structural Bayesian information criterion is a consistent model selection criterion for high-dimensional Gaussian graphical models. Simulation studies validate the superiority of the proposed algorithm over existing ones and show its robustness to model misspecification. Application to relative concentration data from infant feces collected from subjects enrolled in a large molecular epidemiological cohort study validates that metabolic pathway involvement is a statistically significant factor for the conditional dependence between metabolites. Furthermore, new relationships among metabolites are discovered that cannot be identified by conventional methods of pathway analysis. Some of them have been widely recognized in the biological literature.
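The abstract states that the extended Bayesian information criterion (EBIC) is a special case of the structural criterion, but it does not give the structural prior term. The sketch therefore computes only the standard EBIC for a Gaussian graphical model, following Foygel and Drton (2010):

EBIC_gamma(E) = -2 l_n(Theta_E) + |E| log n + 4 gamma |E| log p

Here l_n(Theta) = (n/2)(log det Theta - tr(S Theta)), up to an additive constant. The sample covariance and candidate precision matrices are hypothetical, and the pure-Python Cholesky factorization is used only to keep the example dependency-free.

```python
import math

def log_det_pd(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(d)
            else:
                chol[i][j] = (a[i][j] - s) / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def gaussian_loglik(theta, s, n):
    """(n/2) * (log det Theta - tr(S Theta)), dropping the additive constant."""
    p = len(theta)
    trace = sum(s[i][j] * theta[j][i] for i in range(p) for j in range(p))
    return 0.5 * n * (log_det_pd(theta) - trace)

def num_edges(theta, tol=1e-10):
    """Number of nonzero off-diagonal entries in the upper triangle of Theta."""
    p = len(theta)
    return sum(1 for i in range(p) for j in range(i + 1, p) if abs(theta[i][j]) > tol)

def ebic(theta, s, n, gamma=0.5):
    """Extended BIC of a fitted precision matrix; gamma = 0 recovers the ordinary BIC."""
    p = len(theta)
    e = num_edges(theta)
    return -2.0 * gaussian_loglik(theta, s, n) + e * math.log(n) + 4.0 * gamma * e * math.log(p)

# Hypothetical sample covariance for p = 3 variables and n = 50 observations.
S = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]
empty = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
one_edge = [[1.1, -0.33, 0.0], [-0.33, 1.1, 0.0], [0.0, 0.0, 1.0]]
print(ebic(empty, S, 50), ebic(one_edge, S, 50))
```

Setting gamma = 0 gives the ordinary BIC. Larger values of gamma penalize each edge more heavily as p grows.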
Affiliation(s)
- Jie Zhou
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Anne G Hoen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Susan Mcritchie
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Wimal Pathmasiri
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Weston D Viles
- Department of Mathematics and Statistics, University of Southern Maine, 96 Falmouth St, Portland, ME 04103, USA
- Quang P Nguyen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Juliette C Madan
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Erika Dade
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Margaret R Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
7
Zhang J, Li Y. High-Dimensional Gaussian Graphical Regression Models with Covariates. J Am Stat Assoc 2022; 118:2088-2100. [PMID: 38143787 PMCID: PMC10746132 DOI: 10.1080/01621459.2022.2034632] [Received: 03/05/2021] [Accepted: 01/20/2022] [Indexed: 10/19/2022]
Abstract
Though Gaussian graphical models have been widely used in many scientific fields, relatively limited progress has been made in linking graph structures to external covariates. We propose a Gaussian graphical regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can determine how genetic variants and clinical conditions modulate subject-level network structures, and recover both population-level and subject-level gene networks. Our framework encourages sparsity of covariate effects on both the mean and the precision matrix. In particular, for the precision matrix, we stipulate simultaneous sparsity, i.e., group sparsity on effective covariates and element-wise sparsity on their effects on network edges. We establish variable selection consistency first for the case with known mean parameters and then for the more challenging case with unknown means depending on external covariates, and in both cases we establish the ℓ2 convergence rates and the selection consistency of the estimated precision parameters. The utility and efficacy of our proposed method are demonstrated through simulation studies and an application to a co-expression QTL study with brain cancer patients.
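The sketch below illustrates a covariate-dependent graph under an assumed linear parameterization, Omega(u) = B_0 + sum_h u_h B_h. An edge (j, k) is present for a subject with covariates u when Omega_jk(u) is nonzero. In this reading, group sparsity corresponds to an entire B_h being zero (an ineffective covariate), and element-wise sparsity corresponds to zero entries within a nonzero B_h. This hypothetical form was chosen for clarity and is not necessarily the paper's estimator. Any Omega(u) it produces must also be checked for positive definiteness.

```python
import math

def precision_at(u, b0, bs):
    """Omega(u) = B0 + sum_h u[h] * B_h for a covariate vector u (assumed linear form)."""
    p = len(b0)
    omega = [row[:] for row in b0]  # copy the population-level component
    for u_h, b_h in zip(u, bs):
        for i in range(p):
            for j in range(p):
                omega[i][j] += u_h * b_h[i][j]
    return omega

def edge_set(omega, tol=1e-10):
    """Edges (j, k), j < k, with a nonzero off-diagonal precision entry."""
    p = len(omega)
    return {(j, k) for j in range(p) for k in range(j + 1, p) if abs(omega[j][k]) > tol}

def is_positive_definite(a):
    """Attempt a Cholesky factorization; failure means Omega(u) is not a valid precision matrix."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    return False
                chol[i][i] = math.sqrt(d)
            else:
                chol[i][j] = (a[i][j] - s) / chol[j][j]
    return True

# Hypothetical p = 3 genes and two covariates (e.g. a 0/1 genetic variant and age).
b0 = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]            # population-level edge (0, 1)
b_variant = [[0.0, 0.0, 0.0], [0.0, 0.0, -0.4], [0.0, -0.4, 0.0]]   # variant adds edge (1, 2)
b_age = [[0.0] * 3 for _ in range(3)]                               # inactive covariate (group-sparse)
bs = [b_variant, b_age]

for u in ([0.0, 1.5], [1.0, 0.0]):
    omega = precision_at(u, b0, bs)
    print(u, sorted(edge_set(omega)), is_positive_definite(omega))
```

The population-level edge (0, 1) appears for every subject. Edge (1, 2) appears only for carriers of the hypothetical variant.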
Affiliation(s)
- Jingfei Zhang
- Department of Management Science, University of Miami, Coral Gables, FL 33146
- Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
8
Xie S, McDonnell E, Wang Y. Conditional Gaussian graphical model for estimating personalized disease symptom networks. Stat Med 2022; 41:543-553. [PMID: 34866214 PMCID: PMC8792223 DOI: 10.1002/sim.9274] [Received: 11/03/2020] [Revised: 10/13/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
The co-occurrence of symptoms may result from direct interactions between the symptoms, which can therefore be treated as a system. In addition, subject-specific risk factors (eg, genetic variants, age) can also exert external influence on the system. In this work, we develop a covariate-dependent conditional Gaussian graphical model to obtain personalized symptom networks. The strengths of network connections are modeled as a function of covariates to capture the heterogeneity among individuals and subgroups of individuals. We assess the performance of our proposed method by simulation studies and an application to a large natural history study of Huntington's disease to investigate the networks of symptoms in multiple clinical domains (motor, cognitive, psychiatric) and identify important brain imaging biomarkers that are associated with the connections. We show that symptoms in the same clinical domain interact with each other more often than with symptoms in other domains, and that the psychiatric subnetwork is the densest. We validate the findings using the subjects' symptom measurements at follow-up visits.
Affiliation(s)
- Shanghong Xie
- School of Statistics and Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, China
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, U.S.A
- Erin McDonnell
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, U.S.A
- Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, U.S.A
- Department of Psychiatry, Columbia University Medical Center, New York, NY, U.S.A
9
Wang Z, Kaseb AO, Amin HM, Hassan MM, Wang W, Morris JS. Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer. J Am Stat Assoc 2022; 117:533-546. [PMID: 36090952 PMCID: PMC9454401 DOI: 10.1080/01621459.2021.2000866] [Received: 02/18/2019] [Revised: 07/13/2021] [Accepted: 10/24/2021] [Indexed: 10/19/2022]
Abstract
It is well established that interpatient heterogeneity in cancer may significantly affect genomic data analyses and, in particular, network topologies. Most existing graphical model methods estimate a single population-level graph for genomic or proteomic networks. In many investigations, these networks depend on patient-specific indicators that characterize the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessments of how the network varies with patient-specific prognostic scores, or comparisons of tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. We evaluate our model's performance through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity. In an application to a dataset of proteomic measurements on plasma samples from patients with hepatocellular carcinoma (HCC), we ascertain how blood protein networks vary with disease severity, as measured by HepatoScore, a novel biomarker signature measuring disease severity. Our case study shows that network connectivity increases with HepatoScore, and that a set of hub genes as well as important gene connections are identified at different HepatoScore values, which may provide important biological insights for the development of precision therapies for HCC.
Affiliation(s)
- Zeya Wang
- Department of Statistics, Rice University; Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Veerabhadran Baladandayuthapani; Department of Biostatistics, University of Michigan
- Ahmed O Kaseb
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center
- Hesham M Amin
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center
- Manal M Hassan
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center
- Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
10
Ni Y, Stingo FC, Baladandayuthapani V. Bayesian Covariate-Dependent Gaussian Graphical Models with Varying Structure. J Mach Learn Res 2022; 23. Available from: https://www.jmlr.org/papers/v23/21-0102.html. [PMID: 37799290 PMCID: PMC10552903] [Indexed: 10/07/2023]
Abstract
We introduce Bayesian Gaussian graphical models with covariates (GGMx), a class of multivariate Gaussian distributions with a covariate-dependent sparse precision matrix. We propose a general construction of a functional mapping from the covariate space to the cone of sparse positive definite matrices, which encompasses many existing graphical models for heterogeneous settings. Our methodology is based on a novel mixture prior for precision matrices with a non-local component that admits attractive theoretical and empirical properties. The flexible formulation of GGMx allows both the strength and the sparsity pattern of the precision matrix (hence the graph structure) to change with the covariates. Posterior inference is carried out with a carefully designed Markov chain Monte Carlo algorithm, which ensures the positive definiteness of sparse precision matrices at any given covariate values. Extensive simulations and a case study in cancer genomics demonstrate the utility of the proposed model.
Affiliation(s)
- Yang Ni
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Francesco C Stingo
- Department of Statistics, Computer Science, Applications "G. Parenti", The University of Florence, Florence, Italy
11
Abstract
Graphical models are powerful tools that are regularly used to investigate complex dependence structures in high-throughput biomedical datasets. They allow for a holistic, systems-level view of the various biological processes, supporting intuitive and rigorous understanding and interpretation. In the context of large networks, Bayesian approaches are particularly suitable because they encourage sparsity of the graphs, incorporate prior information, and, most importantly, account for uncertainty in the graph structure. These features are particularly important in applications with limited sample sizes, including genomics and imaging studies. In this paper, we review several recently developed techniques for the analysis of large networks under non-standard settings, including, but not limited to, multiple graphs for data observed from multiple related subgroups, graphical regression approaches used for the analysis of networks that change with covariates, and other complex sampling and structural settings. We also illustrate the practical utility of some of these methods using examples in cancer genomics and neuroimaging.
12
Zhu Y, Shen X, Jiang H, Wong WH. Collaborative Multilabel Classification. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1961783] [Indexed: 10/20/2022]
Affiliation(s)
- Yunzhang Zhu
- Department of Statistics, The Ohio State University, Columbus, OH
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN
- Hui Jiang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI
- Wing Hung Wong
- Department of Statistics and Biomedical Data Science, Stanford University, Stanford, CA
13
Affiliation(s)
- Yubai Yuan
- Department of Statistics, University of California, Irvine
- Annie Qu
- Department of Statistics, University of California, Irvine
14
Glidden CK, Coon CAC, Beechler BR, McNulty C, Ezenwa VO, Jolles AE. Co-infection best predicts respiratory viral infection in a wild host. J Anim Ecol 2021; 90:602-614. [PMID: 33232513 DOI: 10.1111/1365-2656.13391] [Received: 06/01/2020] [Accepted: 11/02/2020] [Indexed: 11/29/2022]
Abstract
The dynamics of directly transmitted pathogens in natural populations are likely to result from the combined effects of host traits, pathogen biology, and interactions among pathogens within a host. Discovering how these factors work in concert to shape variation in pathogen dynamics in natural host-multi-pathogen systems is fundamental to understanding population health. Here, we describe temporal variation in incidence and then elucidate the effects of host traits, season, and pathogen co-occurrence on host infection risk using one of the most comprehensive studies of co-infection in a wild population: a suite of seven directly transmitted viral and bacterial respiratory infections from a 4-year study of 200 free-ranging African buffalo Syncerus caffer. Upper respiratory infections were common throughout the study: five of the seven pathogens appeared to be consistently circulating throughout our study population. One pathogen exhibited clear outbreak dynamics in our final study year, and another was rarely detected. Co-infection was also common in this system: the strongest indicator of pathogen occurrence for respiratory viruses was in fact the presence of other viral respiratory infections. Host traits had minimal effects on the odds of pathogen occurrence but did modify pathogen-pathogen associations. In contrast, only season predicted bacterial pathogen occurrence. Though a combination of environmental, behavioural, and physiological factors work together to shape disease dynamics, we found that pathogen associations best determined infection risk. Our study demonstrates that, in the absence of very fine-scale data, the intricate changes among these factors are best represented by co-infection.
Affiliation(s)
- Caroline K Glidden
- Department of Integrative Biology, Oregon State University, Corvallis, OR, USA
- Courtney A C Coon
- Department of Veterinary Tropical Diseases, University of Pretoria, Pretoria, South Africa
- Brianna R Beechler
- College of Veterinary Medicine, Oregon State University, Corvallis, OR, USA
- Chase McNulty
- College of Veterinary Medicine, Oregon State University, Corvallis, OR, USA
- Vanessa O Ezenwa
- Odum School of Ecology and Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA, USA
- Anna E Jolles
- Department of Integrative Biology, Oregon State University, Corvallis, OR, USA; College of Veterinary Medicine, Oregon State University, Corvallis, OR, USA
15
Zhou J, Viles WD, Lu B, Li Z, Madan JC, Karagas MR, Gui J, Hoen AG. Identification of microbial interaction network: zero-inflated latent Ising model based approach. BioData Min 2020; 13:16. [PMID: 33042226 PMCID: PMC7542390 DOI: 10.1186/s13040-020-00226-7] [Received: 04/02/2020] [Accepted: 09/22/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Throughout their lifespans, humans continually interact with the microbial world, including the organisms that live in and on the human body. Research in this domain has revealed extensive links between the human-associated microbiota and health. In particular, the microbiota of the human gut plays essential roles in digestion, nutrient metabolism, immune maturation and homeostasis, neurological signaling, and endocrine regulation. Microbial interaction networks are frequently estimated from data and are an indispensable tool for representing and understanding the conditional correlations between microbes. In this high-dimensional setting, zero inflation and the unit-sum constraint of relative abundance data pose challenges to the reliable estimation of microbial interaction networks. METHODS AND RESULTS To identify the microbial interaction network, the zero-inflated latent Ising (ZILI) model is proposed, which assumes that the distribution of relative abundance depends only on a finite number of latent states and provides a novel way to resolve the issues induced by the unit-sum and zero-inflation constraints. A two-step algorithm is proposed for model selection in ZILI. ZILI is evaluated through simulated data and subsequently applied to an infant gut microbiota dataset from the New Hampshire Birth Cohort Study. The results are compared with those from the Gaussian graphical model (GGM) and the dichotomous Ising model (DIS). Provided that ZILI is the true data-generating model, the simulation studies show that the two-step algorithm can identify the graphical structure effectively and is robust to a range of parameter settings. For the infant gut microbiota dataset, the final estimated networks from GGM and ZILI overlap substantially, with ZILI tending to select sparser networks than GGM. From the shared subnetwork, a hub taxon, Lachnospiraceae, is identified, whose involvement in human disease development has recently been reported in the literature.
CONCLUSIONS Constraints induced by the relative abundance of microbiota, such as zero inflation and unit sum, render conditional correlation analysis with conventional methods such as GGM unreliable. The proposed ZILI model, based on optimal categorization, provides an alternative yet elegant way to deal with these difficulties. The results from ZILI have reasonable biological interpretations. This model can also be used to study microbial interactions in other body sites.
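The abstract benchmarks ZILI against a dichotomous Ising model (DIS). The sketch covers only that baseline, not ZILI's latent-state categorization or its two-step algorithm. Relative abundances are dichotomized into presence/absence; the zero threshold is an assumption. The log pseudo-likelihood of a {0, 1} Ising model is then computed from the node conditionals P(x_j = 1 | x_-j) = 1 / (1 + exp(-(theta_j + sum_k theta_jk x_k))). All parameter values and abundances are hypothetical.

```python
import math

def dichotomize(abundances, threshold=0.0):
    """Map each relative abundance to presence (1) or absence (0)."""
    return [[1 if a > threshold else 0 for a in row] for row in abundances]

def conditional_prob(x, j, theta_main, theta_pair):
    """P(x_j = 1 | all other x_k) in a {0,1} Ising model with symmetric theta_pair."""
    eta = theta_main[j] + sum(theta_pair[j][k] * x[k] for k in range(len(x)) if k != j)
    return 1.0 / (1.0 + math.exp(-eta))

def log_pseudo_likelihood(samples, theta_main, theta_pair):
    """Sum over samples and taxa of log P(x_j | x_-j)."""
    total = 0.0
    for x in samples:
        for j in range(len(x)):
            pj = conditional_prob(x, j, theta_main, theta_pair)
            total += math.log(pj if x[j] == 1 else 1.0 - pj)
    return total

# Hypothetical relative abundances: 3 samples x 3 taxa, each row sums to 1.
rel = [[0.6, 0.4, 0.0],
       [0.0, 0.7, 0.3],
       [0.5, 0.0, 0.5]]
x = dichotomize(rel)
theta_main = [-0.5, 0.0, 0.2]
theta_pair = [[0.0, 0.8, 0.0],
              [0.8, 0.0, -0.3],
              [0.0, -0.3, 0.0]]
print(x)  # [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(log_pseudo_likelihood(x, theta_main, theta_pair))
```

Pseudo-likelihood approaches maximize this quantity, typically with an L1 penalty, instead of the full likelihood. The full likelihood's normalizing constant becomes intractable when there are many taxa.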
Affiliation(s)
- Jie Zhou
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
- Weston D. Viles
- Department of Mathematics and Statistics, University of Southern Maine, Portland, ME USA
- Boran Lu
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
- Zhigang Li
- Department of Biostatistics, University of Florida, Gainesville, FL USA
- Juliette C. Madan
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
- Margaret R. Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
- Anne G. Hoen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH USA
16
Clark NJ, Owada K, Ruberanziza E, Ortu G, Umulisa I, Bayisenge U, Mbonigaba JB, Mucaca JB, Lancaster W, Fenwick A, Soares Magalhães RJ, Mbituyumuremyi A. Parasite associations predict infection risk: incorporating co-infections in predictive models for neglected tropical diseases. Parasit Vectors 2020; 13:138. [PMID: 32178706 PMCID: PMC7077138 DOI: 10.1186/s13071-020-04016-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/20/2019] [Accepted: 03/10/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND Schistosomiasis and soil-transmitted helminth infections are among the world's most prevalent neglected tropical diseases. Infection by more than one parasite (co-infection) is common and can contribute to clinical morbidity in children. Geostatistical analyses of parasite infection data are key to developing mass drug administration strategies, yet most methods ignore co-infections when estimating risk. Infection status for multiple parasites can serve as a useful proxy for data-poor individual-level or environmental risk factors while avoiding regression dilution bias. Conditional random fields (CRFs) are a multivariate graphical network method that opens new doors in parasite risk mapping by (i) predicting co-infections with high accuracy; (ii) isolating associations among parasites; and (iii) quantifying how these associations change across landscapes. METHODS We built a spatial CRF to estimate infection risks for Ascaris lumbricoides, Trichuris trichiura, hookworms (Ancylostoma duodenale and Necator americanus) and Schistosoma mansoni using data from a national survey of Rwandan schoolchildren. We used an ensemble learning approach to generate spatial predictions by simulating from the CRF's posterior distribution with a multivariate boosted regression tree that captured non-linear relationships between predictors and covariance in infection risks. This CRF ensemble was compared against single-parasite gradient boosted machines to assess each model's performance and prediction uncertainty. RESULTS Parasite co-infections were common, with 19.57% of children infected with at least two parasites. The CRF ensemble achieved higher predictive power than single-parasite models, improving estimates of co-infection prevalence at the individual level and classifying schools into World Health Organization treatment categories with greater accuracy. The CRF uncovered important environmental and demographic predictors of parasite infection probabilities. Yet even after accounting for demographic and environmental risk factors, the presence or absence of other parasites was a strong predictor of individual-level infection risk. Spatial predictions delineated high-risk regions in need of anthelminthic treatment interventions, including areas with higher-than-expected co-infection prevalence. CONCLUSIONS Monitoring studies routinely screen for multiple parasites, yet statistical models generally ignore these multivariate data when assessing risk factors and designing treatment guidelines. Multivariate approaches can be instrumental in the global effort to reduce and eventually eliminate neglected helminth infections in developing countries.
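The finding that other parasites' presence or absence predicts infection risk can be made concrete with a pairwise binary Markov random field, the building block of a CRF. In this sketch, the log-odds of infection with parasite j is an intercept plus the association weights of the other parasites that are currently present. The parameter values are hypothetical. The covariate terms and boosted regression trees of the actual model are omitted, so this is not the authors' implementation.

```python
import numpy as np

def conditional_infection_prob(j, status, intercepts, interactions):
    """P(y_j = 1 | all other y) in a pairwise binary MRF (logistic form).

    status: 0/1 array of infection status for every parasite (entry j is ignored).
    intercepts: per-parasite baseline log-odds (hypothetical values).
    interactions: symmetric matrix of pairwise association weights.
    """
    others = np.delete(np.arange(len(status)), j)
    log_odds = intercepts[j] + interactions[j, others] @ status[others]
    return 1.0 / (1.0 + np.exp(-log_odds))

# Hypothetical parameters, ordered as Ascaris, Trichuris, hookworm, S. mansoni.
intercepts = np.array([-1.5, -1.0, -2.0, -2.5])
interactions = np.array([
    [0.0, 1.2, 0.3, 0.0],
    [1.2, 0.0, 0.4, 0.1],
    [0.3, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
])

no_coinfection = np.array([0, 0, 0, 0])
with_trichuris = np.array([0, 1, 0, 0])
# Ascaris risk without and with a Trichuris co-infection.
print(conditional_infection_prob(0, no_coinfection, intercepts, interactions))
print(conditional_infection_prob(0, with_trichuris, intercepts, interactions))
```

A positive weight between two parasites raises each one's conditional infection probability when the other is present, even after the intercept has absorbed shared risk factors.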
Affiliation(s)
- Nicholas J. Clark
- UQ Spatial Epidemiology Laboratory, School of Veterinary Science, The University of Queensland, Gatton, QLD 4343 Australia
- Kei Owada
- UQ Spatial Epidemiology Laboratory, School of Veterinary Science, The University of Queensland, Gatton, QLD 4343 Australia
- Children Health and Environment Program, Child Health Research Centre, The University of Queensland, South Brisbane, QLD 4101 Australia
- Eugene Ruberanziza
- Neglected Tropical Diseases and Other Parasitic Diseases Unit, Malaria and Other Parasitic Diseases Division, Rwanda Biomedical Center, Kigali, Rwanda
- Giuseppina Ortu
- Schistosomiasis Control Initiative (SCI), Department of Infectious Diseases Epidemiology, Imperial College, London, UK
- Irenee Umulisa
- Neglected Tropical Diseases and Other Parasitic Diseases Unit, Malaria and Other Parasitic Diseases Division, Rwanda Biomedical Center, Kigali, Rwanda
- Ursin Bayisenge
- Neglected Tropical Diseases and Other Parasitic Diseases Unit, Malaria and Other Parasitic Diseases Division, Rwanda Biomedical Center, Kigali, Rwanda
- Jean Bosco Mbonigaba
- Neglected Tropical Diseases and Other Parasitic Diseases Unit, Malaria and Other Parasitic Diseases Division, Rwanda Biomedical Center, Kigali, Rwanda
- Jean Bosco Mucaca
- Microbiology Unit, National Reference Laboratory (NRL) Division, Rwanda Biomedical Center, Ministry of Health, Kigali, Rwanda
- Alan Fenwick
- Schistosomiasis Control Initiative (SCI), Department of Infectious Diseases Epidemiology, Imperial College, London, UK
- Ricardo J. Soares Magalhães
- UQ Spatial Epidemiology Laboratory, School of Veterinary Science, The University of Queensland, Gatton, QLD 4343 Australia
- Children Health and Environment Program, Child Health Research Centre, The University of Queensland, South Brisbane, QLD 4101 Australia
- Aimable Mbituyumuremyi
- Malaria and Other Parasitic Diseases Division, Rwanda Biomedical Center, Ministry of Health, Kigali, Rwanda
17
Xie S, Li X, McColgan P, Scahill RI, Zeng D, Wang Y. Identifying disease-associated biomarker network features through conditional graphical model. Biometrics 2019; 76:995-1006. [PMID: 31850527 DOI: 10.1111/biom.13201] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/17/2019] [Revised: 07/25/2019] [Accepted: 12/04/2019] [Indexed: 01/28/2023]
Abstract
Biomarkers are often organized into networks in which the strengths of network connections vary across subjects depending on subject-specific covariates (eg, genetic variants). Variation in network connections, used as subject-specific feature variables, has been found to predict disease clinical outcomes. In this work, we develop a two-stage method that estimates biomarker networks accounting for heterogeneity among subjects and evaluates the networks' association with disease clinical outcomes. In the first stage, we propose a conditional Gaussian graphical model whose mean and precision matrix depend on covariates. This yields covariate-dependent networks whose connection strengths vary across subjects, while the network structure is assumed to be homogeneous. In the second stage, we evaluate the clinical utility of the network measures (connection strengths) estimated in the first stage. The second-stage analysis quantifies the relative predictive power of between-region network measures for clinical impairment in the context of regional biomarkers and existing disease risk factors. We assess the performance of the proposed method through extensive simulation studies and an application to a Huntington's disease (HD) study. That application investigates how the HD causal gene affects the rate of change in motor symptoms through its effect on subcortical and cortical gray matter atrophy connections. We show that cortical network connections and subcortical volumes, but not subcortical connections, are predictive of clinical motor function deterioration. We validate these findings in an independent HD study. Lastly, highly similar patterns in the gray matter connections and in a previous white matter connectivity study suggest a shared biological mechanism for HD. They also support the hypothesis that white matter loss is a direct result of neuronal loss rather than of myelin loss or dysmyelination.
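In a Gaussian graphical model, connection strengths are read off the precision matrix Ω as partial correlations ρ_jk = -ω_jk / sqrt(ω_jj ω_kk). The sketch below assumes, purely for illustration, that the precision matrix is linear in one scalar covariate, Ω(x) = Ω0 + x Ω1. Both matrices share the same zero pattern, so the network structure is homogeneous while the strengths vary by subject. The matrices and the linear form are hypothetical, and this is not the estimation procedure used in the paper.

```python
import numpy as np

def partial_correlations(precision):
    """Partial correlation matrix from a precision matrix: -w_jk / sqrt(w_jj * w_kk)."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Hypothetical baseline and covariate-effect matrices. The (1, 3) entry is zero
# in both, so every subject shares the same graph: edges 1-2 and 2-3 only.
omega0 = np.array([[2.0, -0.6, 0.0],
                   [-0.6, 2.0, -0.4],
                   [0.0, -0.4, 2.0]])
omega1 = np.array([[0.0, -0.5, 0.0],
                   [-0.5, 0.0, 0.2],
                   [0.0, 0.2, 0.0]])

def subject_precision(x):
    """Assumed linear covariate dependence: Omega(x) = Omega0 + x * Omega1."""
    omega = omega0 + x * omega1
    # A valid precision matrix must be positive definite.
    if np.any(np.linalg.eigvalsh(omega) <= 0):
        raise ValueError("Omega(x) is not positive definite for this covariate value")
    return omega

for x in (0.0, 1.0):  # e.g. a subject without and with a genetic variant
    print(x, np.round(partial_correlations(subject_precision(x)), 3))
```

With these values the 1-2 connection strengthens from 0.3 to 0.55 and the 2-3 connection weakens from 0.2 to 0.1, while the 1-3 entry stays zero. The second-stage analysis would use such subject-specific strengths as features.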
Affiliation(s)
- Shanghong Xie
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York
- Xiang Li
- Statistics and Decision Sciences, Janssen Research & Development, LLC, Raritan, New Jersey
- Peter McColgan
- Huntington's Disease Centre, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
- Rachael I Scahill
- Huntington's Disease Centre, Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
- Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina
- Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York; Department of Psychiatry, Columbia University Medical Center, New York
18
Lee JW, Moen EL, Punshon T, Hoen AG, Stewart D, Li H, Karagas MR, Gui J. An Integrated Gaussian Graphical Model to evaluate the impact of exposures on metabolic networks. Comput Biol Med 2019; 114:103417. [PMID: 31521894 PMCID: PMC6817396 DOI: 10.1016/j.compbiomed.2019.103417] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/03/2019] [Revised: 08/25/2019] [Accepted: 08/26/2019] [Indexed: 02/07/2023]
Abstract
Examining the effects of exogenous exposures on complex metabolic processes poses the unique challenge of identifying interactions among a large number of metabolites. Recent progress in quantifying the metabolome through mass spectrometry (MS) and nuclear magnetic resonance (NMR) has given rise to high-dimensional biomedical data on specific metabolites, which can be leveraged to study their effects in humans. These metabolic interactions can be evaluated using probabilistic graphical models (PGMs), which define conditional dependence and independence between components within and between heterogeneous biomedical datasets. This method allows for the detection and recovery of valuable but latent information that cannot easily be detected by other existing methods. Here, we develop a PGM method, referred to as an "Integrated Gaussian Graphical Model (IGGM)", to incorporate exposure concentrations of seven trace elements (arsenic (As), lead (Pb), mercury (Hg), cadmium (Cd), zinc (Zn), selenium (Se) and copper (Cu)) into metabolic networks. We first conducted a simulation study demonstrating that integrating trace elements into metabolomics data can improve the accuracy of detecting latent interactions among metabolites impacted by exposure in the network. We tested how parameters such as sample size and the number of metabolites neighboring a chosen trace element affect the accuracy of detecting metabolite interactions. We then applied this method to measurements of cord blood plasma metabolites and placental trace elements collected from newborns in the New Hampshire Birth Cohort Study (NHBCS). We found that our approach can identify latent interactions among metabolites that are related to trace element concentrations. Application to similarly structured data may contribute to our understanding of the complex interplay of exposure-related metabolic interactions that are important for human health.
Affiliation(s)
- Jai Woo Lee
- Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH, USA
- Erika L Moen
- Department of Biomedical Data Science, Geisel School of Medicine, Lebanon, NH, USA
- Tracy Punshon
- Department of Biological Sciences, Dartmouth College, Hanover, NH, USA
- Anne G Hoen
- Department of Biomedical Data Science, Geisel School of Medicine, Lebanon, NH, USA; Department of Epidemiology, Geisel School of Medicine, Lebanon, NH, USA
- Delisha Stewart
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hongzhe Li
- Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Lebanon, NH, USA
19
Petralia F, Wang L, Peng J, Yan A, Zhu J, Wang P. A new method for constructing tumor specific gene co-expression networks based on samples with tumor purity heterogeneity. Bioinformatics 2019; 34:i528-i536. [PMID: 29949994 PMCID: PMC6022554 DOI: 10.1093/bioinformatics/bty280] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
Motivation Tumor tissue samples often contain an unknown fraction of stromal cells. This problem, widely known as tumor purity heterogeneity (TPH), has recently been recognized as a severe issue in omics studies. Specifically, if TPH is ignored when inferring co-expression networks, edges are likely to be estimated among genes whose mean expression shifts between non-tumor and tumor cells, rather than among gene pairs that actually interact in tumor cells. To address this issue, we propose Tumor Specific Net (TSNet), a new method that constructs tumor-cell-specific gene/protein co-expression networks from gene/protein expression profiles of tumor tissues. TSNet treats the observed expression profile as a mixture of expressions from different cell types and explicitly models the tumor purity percentage in each tumor sample. Results Using extensive synthetic data experiments, we demonstrate that TSNet outperforms a standard graphical model that does not account for TPH. We then apply TSNet to estimate tumor-specific gene co-expression networks based on TCGA ovarian cancer RNAseq data. We identify novel co-expression modules and hub structures specific to tumor cells. Availability and implementation R code can be found at https://github.com/petraf01/TSNet. Supplementary information Supplementary data are available at Bioinformatics online.
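The mean-shift problem described in the Motivation can be reproduced with a toy mixture model. In the sketch below, the expression levels, noise scales and uniform purity distribution are illustrative assumptions. Two genes are generated independently within each cell type. Bulk expression is the purity-weighted mix π·tumor + (1 - π)·stroma. When both genes shift in mean between cell types, variation in purity alone produces a strong correlation. Without a shift, the correlation stays near zero.

```python
import numpy as np

def bulk_expression_correlation(stroma_mean, tumor_mean=10.0, n_samples=2000, seed=1):
    """Correlation of two bulk-expression genes that are independent within each cell type."""
    rng = np.random.default_rng(seed)
    purity = rng.uniform(0.2, 1.0, size=n_samples)  # tumor fraction per sample (assumed)

    def mixed():
        # Independent cell-type-specific expression for one gene.
        tumor = rng.normal(tumor_mean, 0.1, size=n_samples)
        stroma = rng.normal(stroma_mean, 0.1, size=n_samples)
        return purity * tumor + (1.0 - purity) * stroma

    gene_a, gene_b = mixed(), mixed()
    return np.corrcoef(gene_a, gene_b)[0, 1]

print(bulk_expression_correlation(stroma_mean=0.0))   # mean shift: spurious edge
print(bulk_expression_correlation(stroma_mean=10.0))  # no mean shift: near zero
```

A graphical model fitted to such bulk data would report an edge between the two genes. Modeling each sample's purity explicitly, as TSNet does, is intended to remove that artifact.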
Affiliation(s)
- Francesca Petralia
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Li Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Sema4, a Mount Sinai Venture, Stamford, CT, USA
- Jie Peng
- Department of Statistics, University of California, Davis, Davis, CA, USA
- Arthur Yan
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jun Zhu
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Sema4, a Mount Sinai Venture, Stamford, CT, USA
- Pei Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
20
Ballout N, Viallon V. Structure estimation of binary graphical models on stratified data: Application to the description of injury tables for victims of road accidents. Stat Med 2019; 38:2680-2703. [PMID: 30873639 DOI: 10.1002/sim.8138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/22/2018] [Revised: 12/15/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022]
Abstract
Graphical models are used in many applications, such as medical diagnostics and computer security. Increasingly often, such models have to be estimated on several predefined strata of the whole population. For instance, in epidemiology and clinical research, strata are often defined by age, gender, treatment, or disease type. In this article, we propose new approaches dedicated to estimating binary graphical models on such strata. These approaches combine well-known methods developed for a single binary graphical model with penalties that encourage structured sparsity, which have recently been shown to be appropriate for stratified data. Empirical comparisons on synthetic data show that our approaches generally outperform their competitors. We present an application of the approach to study associations among the injuries suffered by victims of road accidents according to road user type.
Affiliation(s)
- Nadim Ballout
- IFSTTAR, UMRESTTE, Université Claude Bernard Lyon 1, Lyon, France
- Vivian Viallon
- Nutritional Methodology and Biostatistics Group, International Agency for Research on Cancer, Lyon, France
21
22
Abstract
We consider the problem of modeling conditional independence structures in heterogeneous data in the presence of additional subject-level covariates, a problem termed Graphical Regression. We propose a novel specification of a conditional (in)dependence function of covariates that allows the structure of a directed graph to vary flexibly with the covariates; imposes sparsity in both edge and covariate selection; produces both subject-specific and predictive graphs; and is computationally tractable. We provide theoretical justification for our modeling approach in terms of graphical model selection consistency. We demonstrate the performance of our method through rigorous simulation studies. We illustrate our approach in a cancer genomics-based precision medicine paradigm, wherein we explore gene regulatory networks in multiple myeloma, taking prognostic clinical factors into account to obtain both population-level and subject-level gene regulatory networks.
Affiliation(s)
- Yang Ni
- Department of Statistics and Data Sciences, The University of Texas at Austin
- Department of Statistics, Rice University
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
- Department of Statistics, Computer Science, Applications "G. Parenti", The University of Florence
23
Li X, Xie S, McColgan P, Tabrizi SJ, Scahill RI, Zeng D, Wang Y. Learning Subject-Specific Directed Acyclic Graphs With Mixed Effects Structural Equation Models From Observational Data. Front Genet 2018; 9:430. [PMID: 30333854 PMCID: PMC6176748 DOI: 10.3389/fgene.2018.00430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/26/2018] [Accepted: 09/11/2018] [Indexed: 11/13/2022]
Abstract
The identification of causal relationships between random variables from large-scale observational data using directed acyclic graphs (DAGs) is highly challenging. We propose a new mixed-effects structural equation model (mSEM) framework to estimate subject-specific DAGs, in which the joint distribution of the random variables in the DAG is represented as a set of structural causal equations with mixed effects. The directed edges between nodes depend on observed exogenous covariates for each individual and on unobserved latent variables. The strength of each connection is decomposed into a fixed-effect term representing the average causal effect given the covariates and a random-effect term representing the latent causal effect due to unobserved pathways. The advantage of such a decomposition is that it captures essential asymmetric structural information and heterogeneity between DAGs, which allows causal structure to be identified from observational data. In addition, by pooling information across subject-specific DAGs, we can identify causal structure with high probability and estimate subject-specific networks with high precision. We propose a penalized likelihood-based approach to handle the high dimensionality of the DAG model. We propose a fast, iterative computational algorithm, DAG-MM, to estimate the parameters of mSEM and achieve the desired sparsity by hard-thresholding the edges. We theoretically prove the identifiability of mSEM. Using simulations and an application to protein signaling data, we show substantially improved performance compared with existing methods, as well as results consistent with a network estimated from interventional data. Lastly, we identify gray matter atrophy networks across brain regions of patients with Huntington's disease and corroborate our findings using white matter connectivity data collected in an independent study.
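The fixed-plus-random decomposition of edge strengths and the hard-thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAG-MM algorithm itself. The assumptions are a known node ordering, one scalar covariate, Gaussian random effects and an arbitrary threshold, all hypothetical. Each subject's weight matrix is strictly lower triangular, which keeps the graph acyclic. Data are drawn from the linear SEM X = W X + ε, i.e. X = (I - W)^{-1} ε.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4  # number of nodes; causal order 0 -> 1 -> 2 -> 3 is assumed known

# Hypothetical fixed effects: weight of edge k -> j (k < j) is beta0 + beta1 * z.
beta0 = np.tril(rng.normal(0.0, 0.8, size=(p, p)), k=-1)
beta1 = np.tril(rng.normal(0.0, 0.3, size=(p, p)), k=-1)

def subject_weights(z, random_sd=0.2, threshold=0.25):
    """Fixed effect given covariate z plus a subject random effect, then hard-thresholded."""
    random_effect = np.tril(rng.normal(0.0, random_sd, size=(p, p)), k=-1)
    w = beta0 + beta1 * z + random_effect
    w[np.abs(w) < threshold] = 0.0  # hard-thresholding removes weak edges
    return w

def sample_subject(w, n=1):
    """Draw n observations (columns) from the linear SEM X = W X + eps."""
    eps = rng.normal(size=(p, n))
    return np.linalg.solve(np.eye(p) - w, eps)

w = subject_weights(z=1.0)
x = sample_subject(w, n=5)
print(np.round(w, 2))
print(x.shape)
```

Because I - W is unit lower triangular, it is always invertible. Each subject's DAG therefore yields a valid joint distribution, while the covariate term and the random effect let edge strengths differ between subjects.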
Affiliation(s)
- Xiang Li
- Statistics and Decision Sciences, Janssen Research and Development, LLC, Raritan, NJ, United States
- Shanghong Xie
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, United States
- Peter McColgan
- National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Sarah J. Tabrizi
- National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Rachael I. Scahill
- National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
- Yuanjia Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, United States
- Departments of Psychiatry, Columbia University Medical Center, New York, NY, United States
24
25
Characterizing functional consequences of DNA copy number alterations in breast and ovarian tumors by spaceMap. J Genet Genomics 2018; 45:361-371. [PMID: 30057342 DOI: 10.1016/j.jgg.2018.07.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/15/2018] [Revised: 07/09/2018] [Accepted: 07/09/2018] [Indexed: 01/18/2023]
Abstract
We propose a novel conditional graphical model, spaceMap, to construct gene regulatory networks from multiple types of high-dimensional omic profiles. A motivating application is to characterize how DNA copy number alterations (CNAs) perturb downstream protein levels in tumors. Through a penalized multivariate regression framework, spaceMap jointly models high-dimensional protein levels as responses and high-dimensional CNAs as predictors. In this setup, spaceMap infers an undirected network among proteins together with a directed network encoding how CNAs perturb the protein network. spaceMap can be applied to learn other types of regulatory relationships from high-dimensional molecular profiles, especially those exhibiting hub structures. Simulation studies show that spaceMap has greater power to detect regulatory relationships than competing methods. Additionally, spaceMap includes a network analysis toolkit for biological interpretation of the inferred networks. We apply spaceMap to the CNA, gene expression and proteomics data sets from the CPTAC-TCGA breast (n=77) and ovarian (n=174) cancer studies. Each cancer exhibits disruption of 'ion transmembrane transport' and 'regulation from RNA polymerase II promoter' by CNA events unique to that cancer. Moreover, using protein levels as the response yields a more functionally enriched network than using RNA expression in both cancer types. The network results also help pinpoint crucial cancer genes and provide insights into the functional consequences of important CNAs in breast and ovarian cancers. The R package spaceMap, including vignettes and documentation, is hosted at https://topherconley.github.io/spacemap.
26
Clark NJ, Wells K, Lindberg O. Unravelling changing interspecific interactions across environmental gradients using Markov random fields. Ecology 2018; 99:1277-1283. [DOI: 10.1002/ecy.2221] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/16/2018] [Revised: 02/20/2018] [Accepted: 03/02/2018] [Indexed: 11/09/2022]
Affiliation(s)
- Nicholas J. Clark
- School of Veterinary Science, University of Queensland, Gatton, Queensland 4343, Australia
- Konstans Wells
- Environmental Futures Research Institute, School of Environment, Griffith University, Brisbane, Queensland 4111, Australia
- Oscar Lindberg
- Department of Mathematics and Statistics, University of Turku, 20500 Turku, Finland
27
Teisseyre P. CCnet: Joint multi-label classification and feature selection using classifier chains and elastic net regularization. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
28
Bura E, Duarte S, Forzani L. Sufficient Reductions in Regressions With Exponential Family Inverse Predictors. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1093944] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/14/2023]
Affiliation(s)
- Efstathia Bura
- Department of Statistics, The George Washington University, Washington, DC, USA
- Sabrina Duarte
- Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santa Fe, Argentina
- Liliana Forzani
- Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santa Fe, Argentina
29
30
Teisseyre P, Kłopotek RA, Mielniczuk J. Random Subspace Method for high-dimensional regression with the R package regRSM. Comput Stat 2016. [DOI: 10.1007/s00180-016-0658-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]