1
Whitfield-Cargile CM, Chung HC, Coleman MC, Cohen ND, Chamoun-Emanuelli AM, Ivanov I, Goldsby JS, Davidson LA, Gaynanova I, Ni Y, Chapkin RS. Integrated analysis of gut metabolome, microbiome, and exfoliome data in an equine model of intestinal injury. MICROBIOME 2024; 12:74. [PMID: 38622632 PMCID: PMC11017594 DOI: 10.1186/s40168-024-01785-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/29/2024] [Indexed: 04/17/2024]
Abstract
BACKGROUND The equine gastrointestinal (GI) microbiome has been described in the context of various diseases. The observed changes, however, have not been linked to host function and therefore it remains unclear how specific changes in the microbiome alter cellular and molecular pathways within the GI tract. Further, non-invasive techniques to examine the host gene expression profile of the GI mucosa have been described in horses but not evaluated in response to interventions. Therefore, the objectives of our study were to (1) profile gene expression and metabolomic changes in an equine model of non-steroidal anti-inflammatory drug (NSAID)-induced intestinal inflammation and (2) apply computational data integration methods to examine host-microbiota interactions. METHODS Twenty horses were randomly assigned to 1 of 2 groups (n = 10): control (placebo paste) or NSAID (phenylbutazone 4.4 mg/kg orally once daily for 9 days). Fecal samples were collected on days 0 and 10 and analyzed with respect to microbiota (16S rDNA gene sequencing), metabolomic (untargeted metabolites), and host exfoliated cell transcriptomic (exfoliome) changes. Data were analyzed and integrated using a variety of computational techniques, and underlying regulatory mechanisms were inferred from features that were commonly identified by all computational approaches. RESULTS Phenylbutazone induced alterations in the microbiota, metabolome, and host transcriptome. Data integration identified correlation of specific bacterial genera with expression of several genes and metabolites that were linked to oxidative stress. Concomitant microbiota and metabolite changes resulted in the initiation of endoplasmic reticulum stress and unfolded protein response within the intestinal mucosa. CONCLUSIONS Results of integrative analysis identified an important role for oxidative stress, and subsequent cell signaling responses, in a large animal model of GI inflammation. 
The computational approaches for combining non-invasive platforms for unbiased assessment of host GI responses (e.g., exfoliomics) with metabolomic and microbiota changes have broad application for the field of gastroenterology.
Affiliation(s)
- C M Whitfield-Cargile
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine & Biomedical Sciences, Texas A&M University, College Station, TX, USA.
- H C Chung
- Department of Statistics, College of Arts & Sciences, Texas A&M University, College Station, TX, USA
- Mathematics & Statistics Department, College of Science, University of North Carolina Charlotte, Charlotte, NC, USA
- M C Coleman
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine & Biomedical Sciences, Texas A&M University, College Station, TX, USA
- N D Cohen
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine & Biomedical Sciences, Texas A&M University, College Station, TX, USA
- A M Chamoun-Emanuelli
- Department of Large Animal Clinical Sciences, College of Veterinary Medicine & Biomedical Sciences, Texas A&M University, College Station, TX, USA
- I Ivanov
- Department of Veterinary Physiology and Pharmacology, College of Veterinary Medicine & Biomedical Sciences, Texas A&M University, College Station, TX, USA
- J S Goldsby
- Program in Integrative Nutrition & Complex Diseases, College of Agriculture & Life Sciences, Texas A&M University, College Station, TX, USA
- L A Davidson
- Program in Integrative Nutrition & Complex Diseases, College of Agriculture & Life Sciences, Texas A&M University, College Station, TX, USA
- I Gaynanova
- Department of Statistics, College of Arts & Sciences, Texas A&M University, College Station, TX, USA
- Y Ni
- Department of Statistics, College of Arts & Sciences, Texas A&M University, College Station, TX, USA
- R S Chapkin
- Program in Integrative Nutrition & Complex Diseases, College of Agriculture & Life Sciences, Texas A&M University, College Station, TX, USA
2
Zhang Y, Gaynanova I. Joint association and classification analysis of multi-view data. Biometrics 2022; 78:1614-1625. [PMID: 34343342 DOI: 10.1111/biom.13536] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/25/2020] [Revised: 07/20/2021] [Accepted: 07/28/2021] [Indexed: 12/30/2022]
Abstract
Multi-view data, which are matched sets of measurements on the same subjects, have become increasingly common with advances in multi-omics technology. Often, it is of interest to find associations between the views that are related to the intrinsic class memberships. Existing association methods cannot directly incorporate class information, while existing classification methods do not take into account between-view associations. In this work, we propose a framework for Joint Association and Classification Analysis of multi-view data (JACA). Our goal is not merely to improve the misclassification rates, but to provide a latent representation of high-dimensional data that is both relevant for subtype discrimination and coherent across the views. We motivate the methodology by establishing a connection between canonical correlation analysis and discriminant analysis. We also establish the estimation consistency of JACA in high-dimensional settings. A distinct advantage of JACA is that it can be applied to multi-view data with a block-missing structure, that is, to cases where a subset of views or class labels is missing for some subjects. The application of JACA to quantify the associations between RNAseq and miRNA views with respect to consensus molecular subtypes in colorectal cancer data from The Cancer Genome Atlas project leads to improved misclassification rates and stronger associations compared to existing methods.
Affiliation(s)
- Yunfeng Zhang
- Department of Statistics, Texas A&M University, College Station, Texas, USA
- Irina Gaynanova
- Department of Statistics, Texas A&M University, College Station, Texas, USA
3
Anzarmou Y, Mkhadri A, Oualkacha K. Sparse overlapped linear discriminant analysis. TEST-SPAIN 2022. [DOI: 10.1007/s11749-022-00839-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
4
Safo SE, Min EJ, Haine L. Sparse linear discriminant analysis for multiview structured data. Biometrics 2022; 78:612-623. [PMID: 33739448 PMCID: PMC8906173 DOI: 10.1111/biom.13458] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/05/2020] [Revised: 02/15/2021] [Accepted: 03/04/2021] [Indexed: 11/28/2022]
Abstract
Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two-step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerotic cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.
Affiliation(s)
- Sandra E. Safo
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
- Eun Jeong Min
- Department of Medical Life Sciences, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Lillian Haine
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
5
Hilafu H, Safo SE. Sparse sliced inverse regression for high dimensional data analysis. BMC Bioinformatics 2022; 23:168. [PMID: 35525975 PMCID: PMC9080177 DOI: 10.1186/s12859-022-04700-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Accepted: 04/21/2022] [Indexed: 11/19/2022]
Abstract
BACKGROUND Dimension reduction and variable selection play a critical role in the analysis of contemporary high-dimensional data. The semi-parametric multi-index model often serves as a reasonable model for analysis of such high-dimensional data. The sliced inverse regression (SIR) method, which can be formulated as a generalized eigenvalue decomposition problem, offers a model-free estimation approach for the indices in the semi-parametric multi-index model. Obtaining sparse estimates of the eigenvectors that constitute the basis matrix that is used to construct the indices is desirable to facilitate variable selection, which in turn facilitates interpretability and model parsimony. RESULTS To this end, we propose a group-Dantzig selector type formulation that induces row-sparsity in the sliced inverse regression dimension reduction vectors. Extensive simulation studies are carried out to assess the performance of the proposed method and to compare it with other state-of-the-art methods in the literature. CONCLUSION The proposed method is shown to yield competitive estimation, prediction, and variable selection performance. Three real data applications, including a metabolomics depression study, are presented to demonstrate the method's effectiveness in practice.
Affiliation(s)
- Haileab Hilafu
- Department of Business Analytics and Statistics, University of Tennessee, Knoxville, TN 37996 USA
- Sandra E. Safo
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455 USA
6
Gaynanova I. Erratum for Prediction and estimation consistency of sparse multi-class penalized optimal scoring. BERNOULLI 2022. [DOI: 10.3150/21-bej1359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Irina Gaynanova
- Department of Statistics, Texas A&M University, MS 3143, College Station, TX 77843, USA
7
Nam JH, Kim D, Chung D. Sparse Linear Discriminant Analysis using the Prior-Knowledge-Guided Block Covariance Matrix. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS: AN INTERNATIONAL JOURNAL SPONSORED BY THE CHEMOMETRICS SOCIETY 2020; 206:104142. [PMID: 32968333 PMCID: PMC7505231 DOI: 10.1016/j.chemolab.2020.104142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
There are two key challenges when using linear discriminant analysis in the high-dimensional setting: singularity of the covariance matrix and difficulty of interpreting the resulting classifier. Although several methods have been proposed to address these problems, they focused only on identifying a parsimonious set of variables maximizing classification accuracy. Moreover, most methods did not appropriately consider the dependency between variables or the efficacy of the selected variables. To address these limitations, here we propose a new approach that directly estimates the sparse discriminant vector without needing to estimate the whole inverse covariance matrix, by formulating a quadratic optimization problem. Furthermore, this approach allows external information to be integrated to guide the structure of the covariance matrix. We evaluated the proposed model with simulation studies. We then applied it to a transcriptomic study that aims to identify genomic markers predictive of the response to cancer immunotherapy, where the covariance matrix was constructed based on the prior knowledge available in the pathway database.
Affiliation(s)
- Jin Hyun Nam
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29412, United States of America
- School of Pharmacy, Sungkyunkwan University, Suwon, Republic of Korea
- Donguk Kim
- Department of Statistics, Sungkyunkwan University, Seoul, Republic of Korea
- Dongjun Chung
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States of America
8
Ahn J, Chung HC, Jeon Y. Trace Ratio Optimization for High-Dimensional Multi-Class Discrimination. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2020.1807352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Jeongyoun Ahn
- Department of Statistics, University of Georgia, Athens, GA
- Yongho Jeon
- Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
9
A consistent variable selection method in high-dimensional canonical discriminant analysis. J MULTIVARIATE ANAL 2020. [DOI: 10.1016/j.jmva.2019.104561] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
10
Li Y, Hong HG, Li Y. Multiclass linear discriminant analysis with ultrahigh-dimensional features. Biometrics 2019; 75:1086-1097. [PMID: 31009070 PMCID: PMC6810714 DOI: 10.1111/biom.13065] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/11/2018] [Accepted: 03/25/2019] [Indexed: 11/29/2022]
Abstract
Within the framework of Fisher's discriminant analysis, we propose a multiclass classification method which embeds variable screening for ultrahigh-dimensional predictors. Leveraging interfeature correlations, we show that the proposed linear classifier recovers informative features with probability tending to one and can asymptotically achieve a zero misclassification rate. We evaluate the finite sample performance of the method via extensive simulations and use this method to classify posttransplantation rejection types based on patients' gene expressions.
Affiliation(s)
- Yanming Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Hyokyoung G Hong
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan
- Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
11
Jung S, Ahn J, Jeon Y. Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1568014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Affiliation(s)
- Sungkyu Jung
- Department of Statistics, Seoul National University, Seoul, South Korea
- Jeongyoun Ahn
- Department of Statistics, University of Georgia, Athens, GA
- Yongho Jeon
- Department of Applied Statistics, Yonsei University, Seoul, South Korea
12
Gaynanova I, Wang T. Sparse quadratic classification rules via linear dimension reduction. J MULTIVARIATE ANAL 2019; 169:278-299. [PMID: 31105355 PMCID: PMC6516858 DOI: 10.1016/j.jmva.2018.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
Abstract
We consider the problem of high-dimensional classification between two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, we propose to perform simultaneous variable selection and linear dimension reduction on the original data, with the subsequent application of quadratic discriminant analysis on the reduced space. In contrast to quadratic discriminant analysis, the proposed framework doesn't require the estimation of precision matrices; it scales linearly with the number of measurements, making it especially attractive for use on high-dimensional datasets. We support the methodology with theoretical guarantees on variable selection consistency and with empirical comparisons against competing approaches. We apply the method to gene expression data of breast cancer patients, and confirm the crucial importance of the ESR1 gene in differentiating estrogen receptor status.
Affiliation(s)
- Irina Gaynanova
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843, USA
- Tianying Wang
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843, USA
13
Li Y, Lei J. Sparse subspace linear discriminant analysis. STATISTICS-ABINGDON 2018. [DOI: 10.1080/02331888.2018.1469020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- Yanfang Li
- School of Mathematical Sciences, Peking University, Beijing, People's Republic of China
- Jing Lei
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
14
Safo SE, Long Q. Sparse linear discriminant analysis in structured covariates space. Stat Anal Data Min 2018. [DOI: 10.1002/sam.11376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Affiliation(s)
- Sandra E. Safo
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
15
Davenport ER, Goodrich JK, Bell JT, Spector TD, Ley RE, Clark AG. ABO antigen and secretor statuses are not associated with gut microbiota composition in 1,500 twins. BMC Genomics 2016; 17:941. [PMID: 27871240 PMCID: PMC5117602 DOI: 10.1186/s12864-016-3290-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 06/03/2016] [Accepted: 11/15/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND Host genetics is one of several factors known to shape human gut microbiome composition; however, the physiological processes underlying the heritability are largely unknown. Inter-individual differences in host factors secreted into the gut lumen may lead to variation in microbiome composition. One such factor is the ABO antigen. This molecule is not only expressed on the surface of red blood cells, but is also secreted from mucosal surfaces in individuals with an intact FUT2 gene (secretors). Previous studies report differences in microbiome composition across ABO and secretor genotypes. However, due to methodological limitations, the specific bacterial taxa involved remain unknown. RESULTS Here, we sought to determine the relationship of the microbiota to ABO blood group and secretor status in a large panel of 1503 individuals from a cohort of twins from the United Kingdom. Contrary to previous reports, robust associations between either ABO or secretor phenotypes and gut microbiome composition were not detected. Overall community structure, diversity, and the relative abundances of individual taxa were not significantly associated with ABO or secretor status. Additionally, joint-modeling approaches were unsuccessful in identifying combinations of taxa that were predictive of ABO or secretor status. CONCLUSIONS Despite previous reports, the taxonomic composition of the microbiota does not appear to be strongly associated with ABO or secretor status in 1503 individuals from the United Kingdom. These results highlight the importance of replicating microbiome-associated traits in large, well-powered cohorts to ensure results are robust.
Affiliation(s)
- Emily R Davenport
- Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY, USA.
- Julia K Goodrich
- Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY, USA
- Jordana T Bell
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Tim D Spector
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Ruth E Ley
- Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY, USA; Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Andrew G Clark
- Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY, USA