1
|
Zhou F, He K, Ni Y. Individualized causal discovery with latent trajectory embedded Bayesian networks. Biometrics 2023; 79:3191-3202. [PMID: 36807295 DOI: 10.1111/biom.13843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Revised: 12/08/2022] [Accepted: 02/06/2023] [Indexed: 02/20/2023]
Abstract
Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while causal Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of causal effect heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as inferring observation-specific gene regulatory networks from observational data.
Collapse
Affiliation(s)
- Fangting Zhou
- Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Department of Statistics, Texas A&M University, College Station, Texas, USA
| | - Kejun He
- Institute of Statistics and Big Data, Renmin University of China, Beijing, China
| | - Yang Ni
- Department of Statistics, Texas A&M University, College Station, Texas, USA
| |
Collapse
|
2
|
Das P, Peterson CB, Ni Y, Reuben A, Zhang J, Zhang J, Do KA, Baladandayuthapani V. Bayesian hierarchical quantile regression with application to characterizing the immune architecture of lung cancer. Biometrics 2023; 79:2474-2488. [PMID: 36239535 PMCID: PMC10102253 DOI: 10.1111/biom.13774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Accepted: 08/18/2022] [Indexed: 12/14/2022]
Abstract
The successful development and implementation of precision immuno-oncology therapies requires a deeper understanding of the immune architecture at a patient level. T-cell receptor (TCR) repertoire sequencing is a relatively new technology that enables monitoring of T-cells, a subset of immune cells that play a central role in modulating immune response. These immunologic relationships are complex and are governed by various distributional aspects of an individual patient's tumor profile. We propose Bayesian QUANTIle regression for hierarchical COvariates (QUANTICO) that allows simultaneous modeling of hierarchical relationships between multilevel covariates, conducts explicit variable selection, estimates quantile and patient-specific coefficient effects, to induce individualized inference. We show QUANTICO outperforms existing approaches in multiple simulation scenarios. We demonstrate the utility of QUANTICO to investigate the effect of TCR variables on immune response in a cohort of lung cancer patients. At population level, our analyses reveal the mechanistic role of T-cell proportion on the immune cell abundance, with tumor mutation burden as an important factor modulating this relationship. At a patient level, we find several outlier patients based on their quantile-specific coefficient functions, who have higher mutational rates and different smoking history.
Collapse
Affiliation(s)
- Priyam Das
- Dept. of Biomedical Informatics, Harvard Medical School
| | | | - Yang Ni
- Dept. of Statistics, Texas A&M University
| | - Alexandre Reuben
- Dept. of Thoracic Head and Neck Medical Oncology, University of Texas MD Anderson Cancer Center
| | - Jiexin Zhang
- Dept. of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center
| | - Jianjun Zhang
- Dept. of Thoracic Head and Neck Medical Oncology, University of Texas MD Anderson Cancer Center
| | - Kim-Anh Do
- Dept. of Biostatistics, University of Texas MD Anderson Cancer Center
| | | |
Collapse
|
3
|
Pedone M, Amedei A, Stingo FC. Subject-specific Dirichlet-multinomial regression for multi-district microbiota data analysis. Ann Appl Stat 2023. [DOI: 10.1214/22-aoas1641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Affiliation(s)
- Matteo Pedone
- Department of Statistics, Computer Science, Applications, University of Florence
| | - Amedeo Amedei
- Department of Clinical and Experimental Medicine, University of Florence
| | - Francesco C. Stingo
- Department of Statistics, Computer Science, Applications, University of Florence
| |
Collapse
|
4
|
Kidd B, Wang K, Xu Y, Ni Y. Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2023; 28:484-495. [PMID: 36541002 PMCID: PMC9782716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Federated learning is becoming increasingly more popular as the concern of privacy breaches rises across disciplines including the biological and biomedical fields. The main idea is to train models locally on each server using data that are only available to that server and aggregate the model (not data) information at the global level. While federated learning has made significant advancements for machine learning methods such as deep neural networks, to the best of our knowledge, its development in sparse Bayesian models is still lacking. Sparse Bayesian models are highly interpretable with natural uncertain quantification, a desirable property for many scientific problems. However, without a federated learning algorithm, their applicability to sensitive biological/biomedical data from multiple sources is limited. Therefore, to fill this gap in the literature, we propose a new Bayesian federated learning framework that is capable of pooling information from different data sources without breaching privacy. The proposed method is conceptually simple to understand and implement, accommodates sampling heterogeneity (i.e., non-iid observations) across data sources, and allows for principled uncertainty quantification. We illustrate the proposed framework with three concrete sparse Bayesian models, namely, sparse regression, Markov random field, and directed graphical models. The application of these three models is demonstrated through three real data examples including a multi-hospital COVID-19 study, breast cancer protein-protein interaction networks, and gene regulatory networks.
Collapse
Affiliation(s)
- Brian Kidd
- Department of Statistics, Texas A&M University, College Station, Texas 77843, USA
| | - Kunbo Wang
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Yanxun Xu
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Yang Ni
- Department of Statistics, Texas A&M University, College Station, Texas 77843, USA,Correspondence:
| |
Collapse
|
5
|
Crook OM, Chung CW, Deane CM. Challenges and Opportunities for Bayesian Statistics in Proteomics. J Proteome Res 2022; 21:849-864. [PMID: 35258980 PMCID: PMC8982455 DOI: 10.1021/acs.jproteome.1c00859] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2021] [Indexed: 12/27/2022]
Abstract
Proteomics is a data-rich science with complex experimental designs and an intricate measurement process. To obtain insights from the large data sets produced, statistical methods, including machine learning, are routinely applied. For a quantity of interest, many of these approaches only produce a point estimate, such as a mean, leaving little room for more nuanced interpretations. By contrast, Bayesian statistics allows quantification of uncertainty through the use of probability distributions. These probability distributions enable scientists to ask complex questions of their proteomics data. Bayesian statistics also offers a modular framework for data analysis by making dependencies between data and parameters explicit. Hence, specifying complex hierarchies of parameter dependencies is straightforward in the Bayesian framework. This allows us to use a statistical methodology which equals, rather than neglects, the sophistication of experimental design and instrumentation present in proteomics. Here, we review Bayesian methods applied to proteomics, demonstrating their potential power, alongside the challenges posed by adopting this new statistical framework. To illustrate our review, we give a walk-through of the development of a Bayesian model for dynamic organic orthogonal phase-separation (OOPS) data.
Collapse
Affiliation(s)
- Oliver M. Crook
- Department
of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Chun-wa Chung
- Structural
and Biophysical Sciences, GlaxoSmithKline
R&D, Stevenage SG1 2NY, United Kingdom
| | - Charlotte M. Deane
- Department
of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| |
Collapse
|
6
|
Abstract
Graphical models are powerful tools that are regularly used to investigate complex dependence structures in high-throughput biomedical datasets. They allow for holistic, systems-level view of the various biological processes, for intuitive and rigorous understanding and interpretations. In the context of large networks, Bayesian approaches are particularly suitable because it encourages sparsity of the graphs, incorporate prior information, and most importantly account for uncertainty in the graph structure. These features are particularly important in applications with limited sample size, including genomics and imaging studies. In this paper, we review several recently developed techniques for the analysis of large networks under non-standard settings, including but not limited to, multiple graphs for data observed from multiple related subgroups, graphical regression approaches used for the analysis of networks that change with covariates, and other complex sampling and structural settings. We also illustrate the practical utility of some of these methods using examples in cancer genomics and neuroimaging.
Collapse
|
7
|
Chu J, Sun NA, Hu W, Chen X, Yi N, Shen Y. The Application of Bayesian Methods in Cancer Prognosis and Prediction. Cancer Genomics Proteomics 2022; 19:1-11. [PMID: 34949654 PMCID: PMC8717957 DOI: 10.21873/cgp.20298] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2021] [Revised: 11/24/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
With the development of high-throughput biological techniques, high-dimensional omics data have emerged. These molecular data provide a solid foundation for precision medicine and prognostic prediction of cancer. Bayesian methods contribute to constructing prognostic models with complex relationships in omics and improving performance by introducing different prior distribution, which is suitable for modelling the high-dimensional data involved. Using different omics, several Bayesian hierarchical approaches have been proposed for variable selection and model construction. In particular, the Bayesian methods of multi-omics integration have also been consistently proposed in recent years. Compared with single-omics, multi-omics integration modelling will contribute to improving predictive performance, gaining insights into the underlying mechanisms of tumour occurrence and development, and the discovery of more reliable biomarkers. In this work, we present a review of current proposed Bayesian approaches in prognostic prediction modelling in cancer.
Collapse
Affiliation(s)
- Jiadong Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - N A Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Wei Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Xuanli Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, U.S.A
| | - Yueping Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China;
| |
Collapse
|
8
|
Vitorino R, Choudhury M, Guedes S, Ferreira R, Thongboonkerd V, Sharma L, Amado F, Srivastava S. Peptidomics and proteogenomics: background, challenges and future needs. Expert Rev Proteomics 2021; 18:643-659. [PMID: 34517741 DOI: 10.1080/14789450.2021.1980388] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
INTRODUCTION With available genomic data and related information, it is becoming possible to better highlight mutations or genomic alterations associated with a particular disease or disorder. The advent of high-throughput sequencing technologies has greatly advanced diagnostics, prognostics, and drug development. AREAS COVERED Peptidomics and proteogenomics are the two post-genomic technologies that enable the simultaneous study of peptides and proteins/transcripts/genes. Both technologies add a remarkably large amount of data to the pool of information on various peptides associated with gene mutations or genome remodeling. Literature search was performed in the PubMed database and is up to date. EXPERT OPINION This article lists various techniques used for peptidomic and proteogenomic analyses. It also explains various bioinformatics workflows developed to understand differentially expressed peptides/proteins and their role in disease pathogenesis. Their role in deciphering disease pathways, cancer research, and biomarker discovery using biofluids is highlighted. Finally, the challenges and future requirements to overcome the current limitations for their effective clinical use are also discussed.
Collapse
Affiliation(s)
- Rui Vitorino
- Faculdade de Medicina da Universidade do Porto, Porto, Portugal.,iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal.,Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Manisha Choudhury
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
| | - Sofia Guedes
- Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Rita Ferreira
- Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Visith Thongboonkerd
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | | | - Francisco Amado
- Laqv/requimte, Department of Chemistry, University of Aveiro, Aveiro, Portugal
| | - Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Powai, India
| |
Collapse
|
9
|
Heuclin B, Mortier F, Trottier C, Denis M. Bayesian varying coefficient model with selection: An application to functional mapping. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]
Affiliation(s)
- Benjamin Heuclin
- IMAG, Univ Montpellier, CNRS Montpellier France
- CIRAD, UMR AGAP Montpellier France
| | - Frédéric Mortier
- Forêts et Sociétés Cirad Montpellier France
- Forêts et Sociétés Univ Montpellier, Cirad Montpellier France
| | - Catherine Trottier
- IMAG, Univ Montpellier, CNRS Montpellier France
- Univ Paul‐Valéry Montpellier 3 Montpellier France
| | - Marie Denis
- CIRAD, UMR AGAP Montpellier France
- AGAP Univ Montpellier, CIRAD, INRAE, Institut Agro Montpellier France
| |
Collapse
|
10
|
Münch MM, van de Wiel MA, Richardson S, Leday GGR. Drug sensitivity prediction with normal inverse Gaussian shrinkage informed by external data. Biom J 2020; 63:289-304. [PMID: 33155717 PMCID: PMC7891636 DOI: 10.1002/bimj.201900371] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2019] [Revised: 04/30/2020] [Accepted: 06/03/2020] [Indexed: 11/09/2022]
Abstract
In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These types of problems entail modelling multivariate drug responses on high-dimensional molecular feature sets in typically >1000 cell lines. The dimensions of the problem require specialised models and estimation methods. In addition, external information on both the drugs and the features is often available. We propose to model the drug responses through a linear regression with shrinkage enforced through a normal inverse Gaussian prior. We let the prior depend on the external information, and estimate the model and external information dependence in an empirical-variational Bayes framework. We demonstrate the usefulness of this model in both a simulated setting and in the publicly available Genomics of Drug Sensitivity in Cancer data.
Collapse
Affiliation(s)
- Magnus M Münch
- Department of Epidemiology & Biostatistics, Amsterdam UMC, VU University, Amsterdam, The Netherlands.,Mathematical Institute, Leiden University, Leiden, The Netherlands.,MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
| | - Mark A van de Wiel
- Department of Epidemiology & Biostatistics, Amsterdam UMC, VU University, Amsterdam, The Netherlands.,MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
| | - Sylvia Richardson
- MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
| | - Gwenaël G R Leday
- MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
| |
Collapse
|
11
|
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Proteins are crucial for every cellular activity and unraveling their sequence and structure is a crucial step to fully understand their biology. Early methods of protein sequencing were mainly based on the use of enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and with the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques, along with the various software are currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with potential and shortcomings using databases for interpretation of protein sequence data. EXPERT OPINION As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de-novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as, unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy need to be addressed. Ideally, it should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which then would be able to provide improved identification.
Collapse
Affiliation(s)
- Rui Vitorino
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED , Aveiro, Portugal.,iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal.,Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto , Porto, Portugal
| | - Sofia Guedes
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED , Aveiro, Portugal
| | - Fabio Trindade
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto , Porto, Portugal
| | - Inês Correia
- iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal
| | - Gabriela Moura
- iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal
| | - Paulo Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering , Brazil
| | - Manuel A S Santos
- iBiMED, Department of Medical Sciences, University of Aveiro , Aveiro, Portugal
| | - Francisco Amado
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED , Aveiro, Portugal
| |
Collapse
|
12
|
Eloyan A, Yue MS, Khachatryan D. Tumor heterogeneity estimation for radiomics in cancer. Stat Med 2020; 39:4704-4723. [PMID: 32964647 DOI: 10.1002/sim.8749] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 07/15/2020] [Accepted: 08/18/2020] [Indexed: 12/12/2022]
Abstract
Radiomics is an emerging field of medical image analysis research where quantitative measurements are obtained from radiological images that can be utilized to predict patient outcomes and inform treatment decisions. Cancer patients routinely undergo radiological evaluations when images of various modalities including computed tomography, positron emission tomography, and magnetic resonance images are collected for diagnosis and for evaluation of disease progression. Tumor characteristics, often referred to as measures of tumor heterogeneity, can be computed using these clinical images and used as predictors of disease progression and patient survival. Several approaches for quantifying tumor heterogeneity have been proposed, including intensity histogram-based measures, shape and volume-based features, and texture analysis. Taking into account the topology of the tumors we propose a statistical framework for estimating tumor heterogeneity using clustering based on Markov random field theory. We model the voxel intensities using a Gaussian mixture model using a Gibbs prior to incorporate voxel neighborhood information. We propose a novel approach to choosing the number of mixture components. Subsequently, we show that the proposed procedure outperforms the existing approaches when predicting lung cancer survival.
Collapse
Affiliation(s)
- Ani Eloyan
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA
| | - Mun Sang Yue
- Department of Biostatistics, Gilead Sciences Inc., Foster City, California, USA
| | - Davit Khachatryan
- Division of Mathematics and Science, Babson College, Babson Park, Massachusetts, USA
| |
Collapse
|