1. Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024]
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Affiliation(s)
- Karamveer
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
2. Zhang S, Pyne S, Pietrzak S, Halberg S, McCalla SG, Siahpirani AF, Sridharan R, Roy S. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat Commun 2023; 14:3064. [PMID: 37244909 PMCID: PMC10224950 DOI: 10.1038/s41467-023-38637-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/25/2022] [Accepted: 05/10/2023] [Indexed: 05/29/2023]
Abstract
Cell type-specific gene expression patterns are outputs of transcriptional gene regulatory networks (GRNs) that connect transcription factors and signaling proteins to target genes. Single-cell technologies such as single cell RNA-sequencing (scRNA-seq) and single cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), can examine cell-type specific gene regulation at unprecedented detail. However, current approaches to infer cell type-specific GRNs are limited in their ability to integrate scRNA-seq and scATAC-seq measurements and to model network dynamics on a cell lineage. To address this challenge, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework to infer the GRN for each cell type on a lineage from scRNA-seq and scATAC-seq data. Using simulated and real datasets, we show that scMTNI is a broadly applicable framework for linear and branching lineages that accurately infers GRN dynamics and identifies key regulators of fate transitions for diverse processes such as cellular reprogramming and differentiation.
Affiliation(s)
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Saptarshi Pyne
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Stefan Pietrzak
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Spencer Halberg
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Sunnie Grace McCalla
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Alireza Fotuhi Siahpirani
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Rupa Sridharan
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
3. McCalla SG, Fotuhi Siahpirani A, Li J, Pyne S, Stone M, Periyasamy V, Shin J, Roy S. Identifying strengths and weaknesses of methods for computational network inference from single-cell RNA-seq data. G3 (Bethesda, Md.) 2023; 13:jkad004. [PMID: 36626328 PMCID: PMC9997554 DOI: 10.1093/g3journal/jkad004] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/09/2022] [Revised: 11/09/2022] [Accepted: 12/16/2022] [Indexed: 01/11/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) offers unparalleled insight into the transcriptional programs of different cellular states by measuring the transcriptome of thousands of individual cells. An emerging problem in the analysis of scRNA-seq is the inference of transcriptional gene regulatory networks and a number of methods with different learning frameworks have been developed to address this problem. Here, we present an expanded benchmarking study of eleven recent network inference methods on seven published scRNA-seq datasets in human, mouse, and yeast considering different types of gold standard networks and evaluation metrics. We evaluate methods based on their computing requirements as well as on their ability to recover the network structure. We find that, while most methods have a modest recovery of experimentally derived interactions based on global metrics such as Area Under the Precision Recall curve, methods are able to capture targets of regulators that are relevant to the system under study. Among the top performing methods that use only expression were SCENIC, PIDC, MERLIN or Correlation. Addition of prior biological knowledge and the estimation of transcription factor activities resulted in the best overall performance with the Inferelator and MERLIN methods that use prior knowledge outperforming methods that use expression alone. We found that imputation for network inference did not improve network inference accuracy and could be detrimental. Comparisons of inferred networks for comparable bulk conditions showed that the networks inferred from scRNA-seq datasets are often better or at par with the networks inferred from bulk datasets. Our analysis should be beneficial in selecting methods for network inference. At the same time, this highlights the need for improved methods and better gold standards for regulatory network inference from scRNAseq datasets.
Affiliation(s)
- Sunnie Grace McCalla
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jiaxin Li
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Saptarshi Pyne
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Matthew Stone
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
- Viswesh Periyasamy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
4. Mei H, Jia R, Qiao G, Lin Z, Ma S. Human disease clinical treatment network for the elderly: analysis of the medicare inpatient length of stay and readmission data. Biometrics 2023; 79:404-416. [PMID: 34411297 DOI: 10.1111/biom.13549] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2021] [Revised: 06/24/2021] [Accepted: 08/11/2021] [Indexed: 11/30/2022]
Abstract
Clinical treatment outcomes are the quality and cost targets that health-care providers aim to improve. Most existing outcome analysis focuses on a single disease or all diseases combined. Motivated by the success of molecular and phenotypic human disease networks (HDNs), this article develops a clinical treatment network that describes the interconnections among diseases in terms of inpatient length of stay (LOS) and readmission. Here one node represents one disease, and two nodes are linked with an edge if their LOS and number of readmissions are conditionally dependent. This is the very first HDN that jointly analyzes multiple clinical treatment outcomes at the pan-disease level. To accommodate the unique data characteristics, we propose a modeling approach based on two-part generalized linear models and estimation based on penalized integrative analysis. Analysis is conducted on the Medicare inpatient data of 100,000 randomly selected subjects for the period of January 2010 to December 2018. The resulted network has 1008 edges for 106 nodes. We analyze key network properties including connectivity, module/hub, and temporal variation. The findings are biomedically sensible. For example, high connectivity and hub conditions, such as disorders of lipid metabolism and essential hypertension, are identified. There are also findings that are less/not investigated in the literature. Overall, this study can provide additional insight into diseases' properties and their interconnections and assist more efficient disease management and health-care resources allocation.
Affiliation(s)
- Hao Mei
  - Department of Biostatistics, Yale University, New Haven, Connecticut, USA
  - Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, USA
- Ruofan Jia
  - The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, Fujian, China
- Guanzhong Qiao
  - Department of Orthopaedic, The First Hospital of Tsinghua University, Beijing, China
- Zhenqiu Lin
  - Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, USA
- Shuangge Ma
  - Department of Biostatistics, Yale University, New Haven, Connecticut, USA
5. Yu S, Drton M, Shojaie A. Directed Graphical Models and Causal Discovery for Zero-Inflated Data. Proceedings of Machine Learning Research 2023; 213:27-67. [PMID: 39027359 PMCID: PMC11257027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
With advances in technology, gene expression measurements from single cells can be used to gain refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their 0/1 indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell gene expression data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.
Affiliation(s)
- Shiqing Yu
  - Department of Statistics, University of Washington, Seattle, Washington, 98195, U.S.A
- Mathias Drton
  - Department of Mathematics, Technical University of Munich, 85748 Garching bei München, Germany
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington, 98195, U.S.A
6. Oh J, Chang C, Long Q. Accounting for technical noise in Bayesian graphical models of single-cell RNA-sequencing data. Biostatistics 2022; 24:161-176. [PMID: 34520533 DOI: 10.1093/biostatistics/kxab011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2020] [Revised: 03/16/2021] [Accepted: 03/17/2021] [Indexed: 12/16/2022]
Abstract
Single-cell RNA-sequencing (scRNAseq) data contain a high level of noise, especially in the form of zero-inflation, that is, the presence of an excessively large number of zeros. This is largely due to dropout events and amplification biases that occur in the preparation stage of single-cell experiments. Recent scRNAseq experiments have been augmented with unique molecular identifiers (UMI) and External RNA Control Consortium (ERCC) molecules which can be used to account for zero-inflation. However, most of the current methods on graphical models are developed under the assumption of the multivariate Gaussian distribution or its variants, and thus they are not able to adequately account for an excessively large number of zeros in scRNAseq data. In this article, we propose a single-cell latent graphical model (scLGM), a Bayesian hierarchical model for estimating the conditional dependency network among genes using scRNAseq data. Taking advantage of UMI and ERCC data, scLGM explicitly models the two sources of zero-inflation. Our simulation study and real data analysis demonstrate that the proposed approach outperforms several existing methods.
Affiliation(s)
- Jihwan Oh
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
- Changgee Chang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
7. Chung HC, Gaynanova I, Ni Y. Phylogenetically informed Bayesian truncated copula graphical models for microbial association networks. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yang Ni
  - Department of Statistics, Texas A&M University
8. Wu Q, Luo X. Estimating heterogeneous gene regulatory networks from zero-inflated single-cell expression data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Qiuyu Wu
  - Institute of Statistics and Big Data, Renmin University of China
- Xiangyu Luo
  - Institute of Statistics and Big Data, Renmin University of China
9. Liu J, Wang H, Sun W, Liu Y. Prioritizing Autism Risk Genes using Personalized Graphical Models Estimated from Single Cell RNA-seq Data. J Am Stat Assoc 2022; 117:38-51. [PMID: 35529781 PMCID: PMC9070996 DOI: 10.1080/01621459.2021.1933495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Hundreds of autism risk genes have been reported recently, mainly based on genetic studies where these risk genes have more de novo mutations in autism subjects than healthy controls. However, as a complex disease, autism is likely associated with more risk genes and many of them may not be identifiable through de novo mutations. We hypothesize that more autism risk genes can be identified through their connections with known autism risk genes in personalized gene-gene interaction graphs. We estimate such personalized graphs using single cell RNA sequencing (scRNA-seq) while appropriately modeling the cell dependence and possible zero-inflation in the scRNA-seq data. The sample size, which is the number of cells per individual, ranges from 891 to 1,241 in our case study using scRNA-seq data in autism subjects and controls. We consider 1,500 genes in our analysis. Since the number of genes is larger or comparable to the sample size, we perform penalized estimation. We score each gene's relevance by applying a simple graph kernel smoothing method to each personalized graph. The molecular functions of the top-scored genes are related to autism diseases. For example, a candidate gene RYR2 that encodes protein ryanodine receptor 2 is involved in neurotransmission, a process that is impaired in ASD patients. While our method provides a systemic and unbiased approach to prioritize autism risk genes, the relevance of these genes needs to be further validated in functional studies.
Affiliation(s)
- Jianyu Liu
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill
- Haodong Wang
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill
- Wei Sun
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Yufeng Liu
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill; Department of Genetics, Department of Biostatistics, Carolina Center for Genome Science, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill
10. Matchado MS, Lauber M, Reitmeier S, Kacprowski T, Baumbach J, Haller D, List M. Network analysis methods for studying microbial communities: A mini review. Comput Struct Biotechnol J 2021; 19:2687-2698. [PMID: 34093985 PMCID: PMC8131268 DOI: 10.1016/j.csbj.2021.05.001] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 02/01/2021] [Revised: 05/01/2021] [Accepted: 05/01/2021] [Indexed: 12/20/2022]
Abstract
Microorganisms including bacteria, fungi, viruses, protists and archaea live as communities in complex and contiguous environments. They engage in numerous inter- and intra- kingdom interactions which can be inferred from microbiome profiling data. In particular, network-based approaches have proven helpful in deciphering complex microbial interaction patterns. Here we give an overview of state-of-the-art methods to infer intra-kingdom interactions ranging from simple correlation- to complex conditional dependence-based methods. We highlight common biases encountered in microbial profiles and discuss mitigation strategies employed by different tools and their trade-off with increased computational complexity. Finally, we discuss current limitations that motivate further method development to infer inter-kingdom interactions and to robustly and comprehensively characterize microbial environments in the future.
Affiliation(s)
- Monica Steffi Matchado
  - Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Michael Lauber
  - Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Sandra Reitmeier
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
  - Chair of Nutrition and Immunology, Technical University of Munich, 85354 Freising, Germany
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Brunswick, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), 38106 Brunswick, Germany
- Jan Baumbach
  - Institute of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
  - Chair of Computational Systems Biology, University of Hamburg, 22607 Hamburg, Germany
- Dirk Haller
  - ZIEL - Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
  - Chair of Nutrition and Immunology, Technical University of Munich, 85354 Freising, Germany
- Markus List
  - Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
11. Kuksin M, Morel D, Aglave M, Danlos FX, Marabelle A, Zinovyev A, Gautheret D, Verlingue L. Applications of single-cell and bulk RNA sequencing in onco-immunology. Eur J Cancer 2021; 149:193-210. [PMID: 33866228 DOI: 10.1016/j.ejca.2021.03.005] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Received: 12/22/2020] [Revised: 02/26/2021] [Accepted: 03/04/2021] [Indexed: 02/08/2023]
Abstract
The rising interest for precise characterization of the tumour immune contexture has recently brought forward the high potential of RNA sequencing (RNA-seq) in identifying molecular mechanisms engaged in the response to immunotherapy. In this review, we provide an overview of the major principles of single-cell and conventional (bulk) RNA-seq applied to onco-immunology. We describe standard preprocessing and statistical analyses of data obtained from such techniques and highlight some computational challenges relative to the sequencing of individual cells. We notably provide examples of gene expression analyses such as differential expression analysis, dimensionality reduction, clustering and enrichment analysis. Additionally, we used public data sets to exemplify how deconvolution algorithms can identify and quantify multiple immune subpopulations from either bulk or single-cell RNA-seq. We give examples of machine and deep learning models used to predict patient outcomes and treatment effect from high-dimensional data. Finally, we balance the strengths and weaknesses of single-cell and bulk RNA-seq regarding their applications in the clinic.
Affiliation(s)
- Maria Kuksin
  - ENS de Lyon, 15 Parvis René Descartes, 69007, Lyon, France; Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France
- Daphné Morel
  - Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France; Département de Radiothérapie, Gustave Roussy Cancer Campus, Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France; INSERM UMR1030, Molecular Radiotherapy and Therapeutic Innovations, Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France
- Marine Aglave
  - INSERM US23, CNRS UMS 3655, Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France
- Aurélien Marabelle
  - Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France; INSERM U1015, Gustave Roussy, Université Paris Saclay, France
- Andrei Zinovyev
  - Institut Curie, PSL Research University, F-75005, Paris, France; INSERM, U900, F-75005, Paris, France; MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006, Paris, France; Laboratory of Advanced Methods for High-dimensional Data Analysis, Lobachevsky University, 603000, Nizhny Novgorod, Russia
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France; IHU PRISM, Gustave Roussy Cancer Campus, Gustave Roussy, 114 Rue Edouard Vaillant, 94800, Villejuif, France; Université Paris-Saclay, France
- Loïc Verlingue
  - Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France; INSERM UMR1030, Molecular Radiotherapy and Therapeutic Innovations, Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France; Institut Curie, PSL Research University, F-75005, Paris, France; Université Paris-Saclay, France
12. Mei H, Jia R, Qiao G, Lin Z, Ma S. Human disease clinical treatment network for the elderly: The analysis of medicare inpatient length of stay data. Stat Med 2021; 40:2083-2099. [PMID: 33527492 DOI: 10.1002/sim.8893] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2020] [Revised: 12/03/2020] [Accepted: 01/09/2021] [Indexed: 12/14/2022]
Abstract
Disease clinical treatment measures, such as inpatient length of stay (LOS), have been examined for most if not all diseases. Such analysis has important implications for the management and planning of health care, financial, and human resources. In addition, clinical treatment measures can also informatively reflect intrinsic disease properties such as severity. The existing studies mostly focus on either a single disease (or a few pre-selected and closely related diseases) or all diseases combined. In this study, we take a new and innovative perspective, examine the interconnections in length of stay (LOS) among diseases, and construct the very first disease clinical treatment network on LOS. To accommodate uniquely challenging data distributions, a new conditional network construction approach is developed. Based on the constructed network, the analysis of important network properties is conducted. The Medicare data on 100 000 randomly selected subjects for the period of January 2008 to December 2018 is analyzed. The network structure and key properties are found to have sensible biomedical interpretations. Being the very first of its kind, this study can be informative to disease clinical management, advance our understanding of disease interconnections, and foster complex network analysis.
Affiliation(s)
- Hao Mei
  - Department of Biostatistics, Yale University, New Haven, Connecticut, USA
  - Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, USA
- Ruofan Jia
  - The Wang Yanan Institute for Studies in Economics, Xiamen University, Fujian, China
- Guanzhong Qiao
  - Department of Orthopaedic, The First Hospital of Tsinghua University, Beijing, China
- Zhenqiu Lin
  - Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, USA
- Shuangge Ma
  - Department of Biostatistics, Yale University, New Haven, Connecticut, USA
13. Joint Microbial and Metabolomic Network Estimation with the Censored Gaussian Graphical Model. Statistics in Biosciences 2021; 13:351-372. [PMID: 34178165 PMCID: PMC8223740 DOI: 10.1007/s12561-020-09294-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Joint analysis of microbiome and metabolomic data represents an imperative objective as the field moves beyond basic microbiome association studies and turns towards mechanistic and translational investigations. We present a censored Gaussian graphical model framework, where the metabolomic data are treated as continuous and the microbiome data as censored at zero, to identify direct interactions (defined as conditional dependence relationships) between microbial species and metabolites. Simulated examples show that our method metaMint performs favorably compared to the existing ones. metaMint also provides interpretable microbe-metabolite interactions when applied to a bacterial vaginosis data set. R implementation of metaMint is available on GitHub.
14. Ha MJ, Kim J, Galloway-Peña J, Do KA, Peterson CB. Compositional zero-inflated network estimation for microbiome data. BMC Bioinformatics 2020; 21:581. [PMID: 33371887 PMCID: PMC7768662 DOI: 10.1186/s12859-020-03911-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/22/2020] [Accepted: 11/25/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND: The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since the abundances in each sample are constrained to have a fixed sum and there is incomplete overlap in microbial populations across subjects, the data are both compositional and zero-inflated.
RESULTS: We propose the COmpositional Zero-Inflated Network Estimation (COZINE) method for inference of microbial networks which addresses these critical aspects of the data while maintaining computational scalability. COZINE relies on the multivariate Hurdle model to infer a sparse set of conditional dependencies which reflect not only relationships among the continuous values, but also among binary indicators of presence or absence and between the binary and continuous representations of the data. Our simulation results show that the proposed method is better able to capture various types of microbial relationships than existing approaches. We demonstrate the utility of the method with an application to understanding the oral microbiome network in a cohort of leukemic patients.
CONCLUSIONS: Our proposed method addresses important challenges in microbiome network estimation, and can be effectively applied to discover various types of dependence relationships in microbial communities. The procedure we have developed, which we refer to as COZINE, is available online at https://github.com/MinJinHa/COZINE.
Affiliation(s)
- Min Jin Ha
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX, USA
- Junghi Kim
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Sp, MD, USA
- Jessica Galloway-Peña
  - Department of Veterinary Pathobiology, Texas A&M University, College Station, TX, USA
- Kim-Anh Do
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX, USA
- Christine B Peterson
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX, USA
15. McDavid A, Gottardo R, Simon N, Drton M. Graphical Models for Zero-Inflated Single Cell Gene Expression. Ann Appl Stat 2019; 13:848-873. [PMID: 31388390 PMCID: PMC6684253 DOI: 10.1214/18-aoas1213] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Abstract
Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene co-regulatory networks from such data, we propose a multivariate Hurdle model. It is comprised of a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods; or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
Affiliation(s)
- Andrew McDavid
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center; Rochester, New York
- Raphael Gottardo
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
  - Department of Statistics, University of Washington; Seattle, Washington
- Noah Simon
  - Department of Biostatistics, University of Washington; Seattle, Washington
- Mathias Drton
  - Department of Statistics, University of Washington; Seattle, Washington
  - Department of Mathematical Sciences, University of Copenhagen; Denmark