1. Zheng C, Wang M, Yamada R, Okada D. Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity. Comput Struct Biotechnol J 2023; 21:4988-5002. [PMID: 37867964 PMCID: PMC10589751 DOI: 10.1016/j.csbj.2023.09.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 09/22/2023] [Accepted: 09/28/2023] [Indexed: 10/24/2023] Open
Abstract
Gene sets are functional units of living cells. Few studies have investigated the complex relations among gene sets, and how those relations change across biological conditions remains largely undocumented. In this study, we adopted and modified a classical k-nearest neighbor-based association function to detect inter-gene-set similarities. Based on this method, we built multiplex networks of gene sets for the first time; these networks contain layers of gene sets corresponding to different populations of cells. The context-based multiplex networks can capture meaningful biological variation and differ considerably from knowledge-based networks of gene sets built on Jaccard similarity, as demonstrated in this study. Furthermore, at the scale of individual gene sets, the structural coefficients of gene sets (multiplex PageRank centrality, clustering coefficient, and participation coefficient) disclose the diversity of gene sets from the perspective of structural properties and make it easier to identify unique gene sets. In gene set enrichment analysis (GSEA), each gene set is treated independently, and its contextual and relational attributes are ignored. The structural coefficients of gene sets can supplement GSEA with information about the overall picture of gene sets, promoting the constructive reorganization of the enriched terms and helping researchers better prioritize and select gene sets.
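The knowledge-based baseline mentioned in this abstract, a gene-set network built on Jaccard similarity, can be illustrated with a minimal sketch. The gene-set names, the toy members, and the `threshold` parameter are hypothetical and chosen only for illustration; this is not the authors' code, and it does not implement their k-nearest neighbor-based measure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_edges(gene_sets, threshold=0.0):
    """Return {(name1, name2): similarity} for pairs with similarity > threshold."""
    edges = {}
    # Sort by name so that the pair keys are deterministic.
    for (n1, s1), (n2, s2) in combinations(sorted(gene_sets.items()), 2):
        sim = jaccard(set(s1), set(s2))
        if sim > threshold:
            edges[(n1, n2)] = sim
    return edges


# Hypothetical toy gene sets, for illustration only.
toy_sets = {
    "SET_A": {"TP53", "MDM2", "CDKN1A"},
    "SET_B": {"TP53", "CDKN1A", "BAX"},
    "SET_C": {"GAPDH", "PKM"},
}
edges = jaccard_edges(toy_sets)
```

In this toy example only SET_A and SET_B share genes, so they form the single edge of the network. Such a network depends only on set membership and not on expression data, which is the sense in which the abstract calls it knowledge-based.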
Affiliation(s)
- Cheng Zheng
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan
- Man Wang
- Department of Signal Transduction, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 5650871, Osaka, Japan
- Ryo Yamada
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan
- Daigo Okada
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan

2. Li Y, Porta-Pardo E, Tokheim C, Bailey MH, Yaron TM, Stathias V, Geffen Y, Imbach KJ, Cao S, Anand S, Akiyama Y, Liu W, Wyczalkowski MA, Song Y, Storrs EP, Wendl MC, Zhang W, Sibai M, Ruiz-Serra V, Liang WW, Terekhanova NV, Rodrigues FM, Clauser KR, Heiman DI, Zhang Q, Aguet F, Calinawan AP, Dhanasekaran SM, Birger C, Satpathy S, Zhou DC, Wang LB, Baral J, Johnson JL, Huntsman EM, Pugliese P, Colaprico A, Iavarone A, Chheda MG, Ricketts CJ, Fenyö D, Payne SH, Rodriguez H, Robles AI, Gillette MA, Kumar-Sinha C, Lazar AJ, Cantley LC, Getz G, Ding L. Pan-cancer proteogenomics connects oncogenic drivers to functional states. Cell 2023; 186:3921-3944.e25. [PMID: 37582357 DOI: 10.1016/j.cell.2023.07.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/05/2022] [Revised: 12/30/2022] [Accepted: 07/10/2023] [Indexed: 08/17/2023]
Abstract
Cancer driver events refer to key genetic aberrations that drive oncogenesis; however, their exact molecular mechanisms remain insufficiently understood. Here, our multi-omics pan-cancer analysis uncovers insights into the impacts of cancer drivers by identifying their significant cis-effects and distal trans-effects quantified at the RNA, protein, and phosphoprotein levels. Salient observations include the association of point mutations and copy-number alterations with the rewiring of protein interaction networks, and notably, most cancer genes converge toward similar molecular states denoted by sequence-based kinase activity profiles. A correlation between predicted neoantigen burden and measured T cell infiltration suggests potential vulnerabilities for immunotherapies. Patterns of cancer hallmarks vary by polygenic protein abundance ranging from uniform to heterogeneous. Overall, our work demonstrates the value of comprehensive proteogenomics in understanding the functional states of oncogenic drivers and their links to cancer development, surpassing the limitations of studying individual cancer types.
Affiliation(s)
- Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Eduard Porta-Pardo
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Collin Tokheim
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Matthew H Bailey
- Department of Biology and Simmons Center for Cancer Research, Brigham Young University, Provo, UT 84602, USA
- Tomer M Yaron
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Vasileios Stathias
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Yifat Geffen
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02115, USA
- Kathleen J Imbach
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Shankara Anand
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Yo Akiyama
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Wenke Liu
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Yizhe Song
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Erik P Storrs
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Michael C Wendl
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Mathematics, Washington University in St. Louis, St. Louis, MO 63130, USA
- Wubing Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Mustafa Sibai
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Victoria Ruiz-Serra
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Wen-Wei Liang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Nadezhda V Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Fernanda Martins Rodrigues
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Karl R Clauser
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- David I Heiman
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Qing Zhang
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Francois Aguet
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Anna P Calinawan
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Saravana M Dhanasekaran
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Chet Birger
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Shankha Satpathy
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Daniel Cui Zhou
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Liang-Bo Wang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Jessika Baral
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Jared L Johnson
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Emily M Huntsman
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Pietro Pugliese
- Department of Science and Technology, University of Sannio, 82100 Benevento, Italy
- Antonio Colaprico
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Antonio Iavarone
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Department of Neurological Surgery, Department of Biochemistry and Molecular Biology, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Milan G Chheda
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Neurology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Christopher J Ricketts
- Urologic Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- David Fenyö
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Samuel H Payne
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA
- Ana I Robles
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA
- Michael A Gillette
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02115, USA
- Chandan Kumar-Sinha
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Alexander J Lazar
- Departments of Pathology & Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Lewis C Cantley
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Gad Getz
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA
- Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63130, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63130, USA

3. Migliozzi S, Oh YT, Hasanain M, Garofano L, D'Angelo F, Najac RD, Picca A, Bielle F, Di Stefano AL, Lerond J, Sarkaria JN, Ceccarelli M, Sanson M, Lasorella A, Iavarone A. Integrative multi-omics networks identify PKCδ and DNA-PK as master kinases of glioblastoma subtypes and guide targeted cancer therapy. Nat Cancer 2023; 4:181-202. [PMID: 36732634 PMCID: PMC9970878 DOI: 10.1038/s43018-022-00510-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 03/10/2022] [Accepted: 12/21/2022] [Indexed: 02/04/2023]
Abstract
Despite producing a panoply of potential cancer-specific targets, the proteogenomic characterization of human tumors has yet to demonstrate value for precision cancer medicine. Integrative multi-omics using a machine-learning network identified master kinases responsible for effecting phenotypic hallmarks of functional glioblastoma subtypes. In subtype-matched patient-derived models, we validated PKCδ and DNA-PK as master kinases of glycolytic/plurimetabolic and proliferative/progenitor subtypes, respectively, and qualified the kinases as potent and actionable glioblastoma subtype-specific therapeutic targets. Glioblastoma subtypes were associated with clinical and radiomics features, orthogonally validated by proteomics, phospho-proteomics, metabolomics, lipidomics and acetylomics analyses, and recapitulated in pediatric glioma, breast and lung squamous cell carcinoma, including subtype specificity of PKCδ and DNA-PK activity. We developed a probabilistic classification tool that performs optimally with RNA from frozen and paraffin-embedded tissues, which can be used to evaluate the association of therapeutic response with glioblastoma subtypes and to inform patient selection in prospective clinical trials.
Affiliation(s)
- Simona Migliozzi
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA
- Young Taek Oh
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA
- Mohammad Hasanain
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA
- Luciano Garofano
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA
- Fulvio D'Angelo
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA
- Ryan D Najac
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA
- Alberto Picca
- AP-HP, Hôpital de la Pitié-Salpêtrière, Service de Neurologie 2, Paris, France; Sorbonne Université, INSERM Unité 1127, CNRS UMR 7225, Paris Brain Institute, Equipe labellissée LNCC, Paris, France
- Franck Bielle
- Sorbonne Université, INSERM Unité 1127, CNRS UMR 7225, Paris Brain Institute, Equipe labellissée LNCC, Paris, France; Department of Neuropathology, Pitié-Salpêtrière-Charles Foix, AP-HP, Paris, France
- Anna Luisa Di Stefano
- Sorbonne Université, INSERM Unité 1127, CNRS UMR 7225, Paris Brain Institute, Equipe labellissée LNCC, Paris, France; Department of Neurology, Foch Hospital, Suresnes, Paris, France; Neurosurgery Unit, Spedali Riuniti, Livorno, Italy
- Julie Lerond
- Sorbonne Université, INSERM Unité 1127, CNRS UMR 7225, Paris Brain Institute, Equipe labellissée LNCC, Paris, France
- Jann N Sarkaria
- Department of Radiation Oncology, Mayo Clinic, Rochester, MN, USA
- Michele Ceccarelli
- Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Napoli, Italy; BIOGEM Institute of Molecular Biology and Genetics, Via Camporeale, Ariano Irpino, Italy
- Marc Sanson
- AP-HP, Hôpital de la Pitié-Salpêtrière, Service de Neurologie 2, Paris, France; Sorbonne Université, INSERM Unité 1127, CNRS UMR 7225, Paris Brain Institute, Equipe labellissée LNCC, Paris, France; Onconeurotek Tumor Bank, Paris Brain Institute ICM, Paris, France
- Anna Lasorella
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA; Department of Pediatrics, Columbia University Medical Center, New York, NY, USA; Department of Biochemistry and Molecular Biology, University of Miami, Miller School of Medicine, Miami, FL, USA
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA; Department of Neurology, Columbia University Medical Center, New York, NY, USA; Department of Neurological Surgery, University of Miami, Miller School of Medicine, Miami, FL, USA

4. Balestra C, Maj C, Müller E, Mayr A. Redundancy-aware unsupervised ranking based on game theory: Ranking pathways in collections of gene sets. PLoS One 2023; 18:e0282699. [PMID: 36893181 PMCID: PMC9997904 DOI: 10.1371/journal.pone.0282699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 02/13/2023] [Indexed: 03/10/2023] Open
Abstract
In Genetics, gene sets are grouped in collections concerning their biological function. This often leads to high-dimensional, overlapping, and redundant families of sets, thus precluding a straightforward interpretation of their biological meaning. In Data Mining, it is often argued that techniques to reduce the dimensionality of data could increase the maneuverability and consequently the interpretability of large data. In the past years, moreover, we witnessed an increasing consciousness of the importance of understanding data and interpretable models in the machine learning and bioinformatics communities. On the one hand, there exist techniques aiming to aggregate overlapping gene sets to create larger pathways. While these methods could partly solve the problem of the collections' large size, modifying biological pathways is hardly justifiable in this biological context. On the other hand, the representation methods to increase interpretability of collections of gene sets that have been proposed so far have proved to be insufficient. Inspired by this Bioinformatics context, we propose a method to rank sets within a family of sets based on the distribution of the singletons and their size. We obtain sets' importance scores by computing Shapley values; by making use of microarray games, we do not incur the typical exponential computational complexity. Moreover, we address the challenge of constructing redundancy-aware rankings where, in our case, redundancy is a quantity proportional to the size of intersections among the sets in the collections. We use the obtained rankings to reduce the dimension of the families, therefore showing lower redundancy among sets while still preserving a high coverage of their elements.
We finally evaluate our approach for collections of gene sets and apply Gene Sets Enrichment Analysis techniques to the now smaller collections: As expected, the unsupervised nature of the proposed rankings allows for unremarkable differences in the number of significant gene sets for specific phenotypic traits. In contrast, the number of performed statistical tests can be drastically reduced. The proposed rankings show a practical utility in bioinformatics to increase interpretability of the collections of gene sets and a step forward to include redundancy-awareness into Shapley values computations.
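The remark that microarray games avoid exponential cost can be made concrete with a small sketch. It rests on an assumption that goes beyond the abstract: the gene sets are treated as the players, and each gene contributes a unanimity game over the sets that contain it. Under that reading, the Shapley value of a set reduces to a sum of 1/(number of sets containing the gene) over its genes, divided by the total number of genes. The function name `shapley_scores` and the toy collection are hypothetical, and the redundancy-aware adjustment described in the abstract is not included.

```python
from collections import defaultdict
from fractions import Fraction


def shapley_scores(gene_sets):
    """Closed-form Shapley values for a microarray-style game (assumed setup).

    Each gene adds 1 / (n_genes * number_of_sets_containing_it) to the score
    of every set that contains it, so no enumeration of coalitions is needed.
    """
    membership = defaultdict(set)  # gene -> names of the sets containing it
    for name, genes in gene_sets.items():
        for gene in genes:
            membership[gene].add(name)
    n_genes = len(membership)
    scores = {name: Fraction(0) for name in gene_sets}
    for owners in membership.values():
        share = Fraction(1, len(owners) * n_genes)
        for name in owners:
            scores[name] += share
    return scores


# Hypothetical toy collection, for illustration only.
toy = {"P1": {"a", "b", "c"}, "P2": {"c", "d"}, "P3": {"d"}}
scores = shapley_scores(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The scores sum to 1, as expected from the efficiency property of Shapley values, and the cost is linear in the total size of the collection. Exact fractions are used here only to keep the toy arithmetic transparent.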
Affiliation(s)
- Chiara Balestra
- Department of Computer Science, TU Dortmund, Dortmund, Germany
- Department of Medical Biometry, Informatics and Epidemiology (IMBIE), University Hospital Bonn, Bonn, Germany
- Carlo Maj
- Institute for Genomic Statistics and Bioinformatics IGSB, University Hospital Bonn, Bonn, Germany
- Centre for Human Genetics, University of Marburg, Marburg, Germany
- Emmanuel Müller
- Department of Computer Science, TU Dortmund, Dortmund, Germany
- Andreas Mayr
- Department of Medical Biometry, Informatics and Epidemiology (IMBIE), University Hospital Bonn, Bonn, Germany

5. Wieder C, Lai RPJ, Ebbels TMD. Single sample pathway analysis in metabolomics: performance evaluation and application. BMC Bioinformatics 2022; 23:481. [PMID: 36376837 PMCID: PMC9664704 DOI: 10.1186/s12859-022-05005-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/22/2022] [Accepted: 10/25/2022] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND: Single sample pathway analysis (ssPA) transforms molecular level omics data to the pathway level, enabling the discovery of patient-specific pathway signatures. Compared to conventional pathway analysis, ssPA overcomes the limitations by enabling multi-group comparisons, alongside facilitating numerous downstream analyses such as pathway-based machine learning. While in transcriptomics ssPA is a widely used technique, there is little literature evaluating its suitability for metabolomics. Here we provide a benchmark of established ssPA methods (ssGSEA, GSVA, SVD (PLAGE), and z-score) alongside the evaluation of two novel methods we propose: ssClustPA and kPCA, using semi-synthetic metabolomics data. We then demonstrate how ssPA can facilitate pathway-based interpretation of metabolomics data by performing a case-study on inflammatory bowel disease mass spectrometry data, using clustering to determine subtype-specific pathway signatures.
RESULTS: While GSEA-based and z-score methods outperformed the others in terms of recall, clustering/dimensionality reduction-based methods provided higher precision at moderate-to-high effect sizes. A case study applying ssPA to inflammatory bowel disease data demonstrates how these methods yield a much richer depth of interpretation than conventional approaches, for example by clustering pathway scores to visualise a pathway-based patient subtype-specific correlation network. We also developed the sspa python package (freely available at https://pypi.org/project/sspa/), providing implementations of all the methods benchmarked in this study.
CONCLUSION: This work underscores the value ssPA methods can add to metabolomic studies and provides a useful reference for those wishing to apply ssPA methods to metabolomics data.
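As a rough illustration of what a single-sample pathway score looks like, the sketch below implements a z-score style method in plain Python: each metabolite is standardized across samples, and a pathway score for a sample is the sum of its member z-scores divided by the square root of the number of members. The data layout, the function name `zscore_pathway_scores`, and the toy values are assumptions made for illustration. This is not the `sspa` package API, and the package's own implementation may differ in details.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_pathway_scores(data, pathways):
    """Per-sample pathway scores from standardized metabolite abundances.

    data: {sample: {metabolite: abundance}}, with at least two samples.
    pathways: {pathway: set of metabolite names}.
    """
    samples = list(data)
    # Use only metabolites measured in every sample.
    metabolites = set.intersection(*(set(v) for v in data.values()))
    z = {s: {} for s in samples}
    for m in metabolites:
        values = [data[s][m] for s in samples]
        mu, sd = mean(values), stdev(values)
        for s in samples:
            # A constant metabolite carries no signal, so give it z = 0.
            z[s][m] = (data[s][m] - mu) / sd if sd > 0 else 0.0
    scores = {s: {} for s in samples}
    for s in samples:
        for p, members in pathways.items():
            present = [m for m in members if m in metabolites]
            if present:
                scores[s][p] = sum(z[s][m] for m in present) / sqrt(len(present))
    return scores


# Hypothetical toy data, for illustration only.
toy_data = {
    "s1": {"m1": 1.0, "m2": 2.0},
    "s2": {"m1": 2.0, "m2": 4.0},
    "s3": {"m1": 3.0, "m2": 6.0},
}
toy_pathways = {"P": {"m1", "m2"}}
pathway_scores = zscore_pathway_scores(toy_data, toy_pathways)
```

The resulting sample-by-pathway matrix can then feed downstream steps of the kind the abstract mentions, such as clustering or pathway-based machine learning.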
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- Rachel P J Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, UK
- Timothy M D Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK

6. Pannaraj PS, da Costa-Martins AG, Cerini C, Li F, Wong SS, Singh Y, Urbanski AH, Gonzalez-Dias P, Yang J, Webby RJ, Nakaya HI, Aldrovandi GM. Molecular alterations in human milk in simulated maternal nasal mucosal infection with live attenuated influenza vaccination. Mucosal Immunol 2022; 15:1040-1047. [PMID: 35739193 PMCID: PMC9225800 DOI: 10.1038/s41385-022-00537-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/21/2022] [Revised: 05/24/2022] [Accepted: 05/31/2022] [Indexed: 02/04/2023]
Abstract
Breastfeeding protects against mucosal infections in infants. The underlying mechanisms through which immunity develops in human milk following maternal infection with mucosal pathogens are not well understood. We simulated nasal mucosal influenza infection through live attenuated influenza vaccination (LAIV) and compared immune responses in milk to inactivated influenza vaccination (IIV). Transcriptomic analysis was performed on RNA extracted from human milk cells to evaluate differentially expressed genes and pathways on days 1 and 7 post-vaccination. Both LAIV and IIV vaccines induced influenza-specific IgA that persisted for at least 6 months. Regulation of type I interferon production, toll-like receptor, and pattern recognition receptor signaling pathways were highly upregulated in milk on day 1 following LAIV but not IIV at any time point. Upregulation of innate immunity in human milk may provide timely protection against mucosal infections until antigen-specific immunity develops in the human milk-fed infant.
Affiliation(s)
- Pia S Pannaraj
- Division of Infectious Diseases, Department of Pediatrics, Children's Hospital Los Angeles, Los Angeles, CA, USA
- Department of Molecular Microbiology and Immunology, University of Southern California, Los Angeles, CA, USA
- André Guilherme da Costa-Martins
- School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo, Brazil
- Scientific Platform Pasteur-University of São Paulo, São Paulo, Brazil
- Chiara Cerini
- Division of Infectious Diseases, Department of Pediatrics, Children's Hospital Los Angeles, Los Angeles, CA, USA
- Fan Li
- Division of Infectious Diseases, Department of Pediatrics, Children's Hospital Los Angeles, Los Angeles, CA, USA
- Sook-San Wong
- Department of Infectious Diseases, St Jude Children's Research Hospital, Memphis, TN, USA
- School of Public Health, The University of Hong Kong, Pok Fu Lam, Hong Kong
- Youvika Singh
- School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo, Brazil
- Scientific Platform Pasteur-University of São Paulo, São Paulo, Brazil
- Alysson H Urbanski
- School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo, Brazil
- Patrícia Gonzalez-Dias
- Hospital Israelita Albert Einstein, São Paulo, Brazil
- Department of Clinical Sciences, Liverpool School of Tropical Medicine, Liverpool, United Kingdom
- Juliana Yang
- School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo, Brazil
- Richard J Webby
- Department of Infectious Diseases, St Jude Children's Research Hospital, Memphis, TN, USA
- Helder I Nakaya
- Scientific Platform Pasteur-University of São Paulo, São Paulo, Brazil
- Hospital Israelita Albert Einstein, São Paulo, Brazil
- Grace M Aldrovandi
- Department of Pediatrics, University of California Los Angeles, Los Angeles, CA, USA

7. de Steenhuijsen Piters WAA, Watson RL, de Koff EM, Hasrat R, Arp K, Chu MLJN, de Groot PCM, van Houten MA, Sanders EAM, Bogaert D. Early-life viral infections are associated with disadvantageous immune and microbiota profiles and recurrent respiratory infections. Nat Microbiol 2022; 7:224-237. [PMID: 35058634 DOI: 10.1038/s41564-021-01043-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 06/30/2021] [Accepted: 12/06/2021] [Indexed: 12/17/2022]
Abstract
The respiratory tract is populated by a specialized microbial ecosystem, which is seeded during and directly following birth. Perturbed development of the respiratory microbial community in early-life has been associated with higher susceptibility to respiratory tract infections (RTIs). Given a consistent gap in time between first signs of aberrant microbial maturation and the observation of the first RTIs, we hypothesized that early-life host-microbe cross-talk plays a role in this process. We therefore investigated viral presence, gene expression profiles and nasopharyngeal microbiota from birth until 12 months of age in 114 healthy infants. We show that the strongest dynamics in gene expression profiles occurred within the first days of life, mostly involving Toll-like receptor (TLR) and inflammasome signalling. These gene expression dynamics coincided with rapid microbial niche differentiation. Early asymptomatic viral infection co-occurred with stronger interferon activity, which was related to specific microbiota dynamics following, including early enrichment of Moraxella and Haemophilus spp. These microbial trajectories were in turn related to a higher number of subsequent (viral) RTIs over the first year of life. Using a multi-omic approach, we found evidence for species-specific host-microbe interactions related to consecutive susceptibility to RTIs. Although further work will be needed to confirm causality of our findings, together these data indicate that early-life viral encounters could impact subsequent host-microbe cross-talk, which is linked to later-life infections.
Affiliation(s)
- Wouter A A de Steenhuijsen Piters
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands
| | - Rebecca L Watson
- Centre for Inflammation Research, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
| | - Emma M de Koff
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands
- Spaarne Gasthuis Academy, Hoofddorp and Haarlem, the Netherlands
| | - Raiza Hasrat
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands
| | - Kayleigh Arp
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands
| | - Mei Ling J N Chu
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands
| | - Pieter C M de Groot
- Department of Obstetrics and Gynaecology, Spaarne Gasthuis, Hoofddorp and Haarlem, the Netherlands
| | - Marlies A van Houten
- Spaarne Gasthuis Academy, Hoofddorp and Haarlem, the Netherlands
- Department of Paediatrics, Spaarne Gasthuis, Hoofddorp and Haarlem, the Netherlands
| | - Elisabeth A M Sanders
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands
| | - Debby Bogaert
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children's Hospital/University Medical Center Utrecht, Utrecht, the Netherlands.
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands.
- Centre for Inflammation Research, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK.
| |
Collapse
|
8
|
Taus P, Pospisilova S, Plevova K. Identification of Clinically Relevant Subgroups of Chronic Lymphocytic Leukemia Through Discovery of Abnormal Molecular Pathways. Front Genet 2021; 12:627964. [PMID: 34262590 PMCID: PMC8273263 DOI: 10.3389/fgene.2021.627964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Accepted: 05/04/2021] [Indexed: 11/13/2022] Open
Abstract
Chronic lymphocytic leukemia (CLL) is the most common form of adult leukemia in the Western world with a highly variable clinical course. Its striking genetic heterogeneity is not yet fully understood. Although the CLL genetic landscape has been well-described, patient stratification based on mutation profiles remains elusive mainly due to the heterogeneity of data. Here we attempted to decrease the heterogeneity of somatic mutation data by mapping mutated genes in the respective biological processes. From the sequencing data gathered by the International Cancer Genome Consortium for 506 CLL patients, we generated pathway mutation scores, applied ensemble clustering on them, and extracted abnormal molecular pathways with a machine learning approach. We identified four clusters differing in pathway mutational profiles and time to first treatment. Interestingly, common CLL drivers such as ATM or TP53 were associated with particular subtypes, while others like NOTCH1 or SF3B1 were not. This study provides an important step in understanding mutational patterns in CLL.
Affiliation(s)
- Petr Taus
  - Central European Institute of Technology, Masaryk University, Brno, Czechia
- Sarka Pospisilova
  - Central European Institute of Technology, Masaryk University, Brno, Czechia
  - Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czechia
  - Faculty of Medicine, Masaryk University, Brno, Czechia
- Karla Plevova
  - Central European Institute of Technology, Masaryk University, Brno, Czechia
  - Department of Internal Medicine - Hematology and Oncology, University Hospital Brno, Brno, Czechia
  - Faculty of Medicine, Masaryk University, Brno, Czechia
|
9
|
Li W, Shih A, Freudenberg-Hua Y, Fury W, Yang Y. Beyond standard pipeline and p < 0.05 in pathway enrichment analyses. Comput Biol Chem 2021; 92:107455. [PMID: 33774420 PMCID: PMC9179938 DOI: 10.1016/j.compbiolchem.2021.107455] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/30/2020] [Revised: 12/18/2020] [Accepted: 02/07/2021] [Indexed: 10/22/2022]
Abstract
A standard pathway/gene-set enrichment analysis, the over-representation analysis, is based on four values: the size of two gene-sets, size of their overlap, and size of the gene universe from which the gene-sets are chosen. The standard result of such an analysis is based on the p-value of a statistical test. We supplement this standard pipeline by six cautions: (1) any p-value threshold to distinguish enriched gene-sets from not-enriched ones is to certain degree arbitrary; (2) genes in a gene-set may be correlated, which potentially overcount the gene-set size; (3) any attempt to impose multiple testing correction will increase the false negative rate; (4) gene-sets in a gene-set database may be correlated, potentially overcount the factor for multiple testing correction; (5) the discrete nature of the data make it possible that a minimum change in counts may lead to a quantum change in the p-value threshold-based conclusion; (6) the two gene-sets may not be chosen from the universe of all human genes, but in fact from a subset of that universe, or even two different subsets of all genes. Careful reconsideration of these issues can have an impact on an enrichment analysis conclusion. Part of our cautions mirror the call from statistician that reaching conclusion from data is not a simple matter of p-value smaller than 0.05, but a thoughtful process with due diligences.
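As a rough illustration of the four-value test described in the abstract above, over-representation analysis is commonly computed as an upper-tail hypergeometric probability. The sketch below is not taken from Li et al.; the function name `ora_p_value` and its parameter names are assumptions chosen for this example, and it relies only on the Python standard library.

```python
from math import comb


def ora_p_value(universe_size, set_a_size, set_b_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the overlap between a fixed gene set of size set_a_size and a
    random gene set of size set_b_size, both drawn from a universe of
    universe_size genes. All names here are illustrative assumptions.
    """
    if min(universe_size, set_a_size, set_b_size, overlap) < 0:
        raise ValueError("all sizes must be non-negative")
    if max(set_a_size, set_b_size) > universe_size:
        raise ValueError("a gene set cannot be larger than the universe")
    largest_overlap = min(set_a_size, set_b_size)
    if overlap > largest_overlap:
        raise ValueError("overlap cannot exceed the smaller gene set")

    # Number of ways to draw set B from the universe.
    total = comb(universe_size, set_b_size)
    # Number of draws that share at least `overlap` genes with set A.
    tail = sum(
        comb(set_a_size, i) * comb(universe_size - set_a_size, set_b_size - i)
        for i in range(overlap, largest_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy case: two 5-gene sets in a 10-gene universe, fully overlapping.
    print(ora_p_value(10, 5, 5, 5))  # 1/252, about 0.003968
```

Because the counts are discrete, changing the overlap by a single gene can move this value across a fixed threshold, which is the situation caution (5) in the abstract refers to.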
Affiliation(s)
- Wentian Li
  - The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Andrew Shih
  - The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Yun Freudenberg-Hua
  - Litwin-Zucker Center for the study of Alzheimer's Disease, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
  - Division of Geriatric Psychiatry, Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA
- Wen Fury
  - Regeneron Pharmaceutical Inc., Tarrytown, NY, USA
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China
|
10
|
Zhang C, Gao L, Wang B, Gao Y. Improving Single-Cell RNA-seq Clustering by Integrating Pathways. Brief Bioinform 2021; 22:6262246. [PMID: 33940590 DOI: 10.1093/bib/bbab147] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/30/2020] [Revised: 03/21/2021] [Accepted: 03/26/2021] [Indexed: 01/03/2023] Open
Abstract
Single-cell clustering is an important part of analyzing single-cell RNA-sequencing data. However, the accuracy and robustness of existing methods are disturbed by noise. One promising approach for addressing this challenge is integrating pathway information, which can alleviate noise and improve performance. In this work, we studied the impact on accuracy and robustness of existing single-cell clustering methods by integrating pathways. We collected 10 state-of-the-art single-cell clustering methods, 26 scRNA-seq datasets and four pathway databases, combined the AUCell method and the similarity network fusion to integrate pathway data and scRNA-seq data, and introduced three accuracy indicators, three noise generation strategies and robustness indicators. Experiments on this framework showed that integrating pathways can significantly improve the accuracy and robustness of most single-cell clustering methods.
Affiliation(s)
- Chenxing Zhang
  - Computer Science and Technology at Xidian University, Xi'an 710071, China
- Lin Gao
  - School of Computer Science and Technology at Xidian University, Xi'an 710071, China
- Bingbo Wang
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Yong Gao
  - Computer Science at the University of British Columbia Okanagan (UBC Okanagan), Canada
|
11
|
Wang S, Flynn ER, Altman RB. Gaussian Embedding for Large-scale Gene Set Analysis. Nat Mach Intell 2020; 2:387-395. [PMID: 32968711 PMCID: PMC7505077 DOI: 10.1038/s42256-020-0193-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/21/2019] [Accepted: 05/15/2020] [Indexed: 02/08/2023]
Abstract
Gene sets, including protein complexes and signaling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein-protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumors, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a previously unknown clinical prognostic and predictive subnetwork around NEFM in sarcoma, which we validate in independent cohorts.
Affiliation(s)
- Sheng Wang
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Emily R. Flynn
  - Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA
- Russ B. Altman
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
|
12
|
Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D. The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling. Front Genet 2019; 10:1203. [PMID: 31824580 PMCID: PMC6883970 DOI: 10.3389/fgene.2019.01203] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 08/23/2019] [Accepted: 10/30/2019] [Indexed: 02/04/2023] Open
Abstract
Pathway-centric approaches are widely used to interpret and contextualize -omics data. However, databases contain different representations of the same biological pathway, which may lead to different results of statistical enrichment analysis and predictive models in the context of precision medicine. We have performed an in-depth benchmarking of the impact of pathway database choice on statistical enrichment analysis and predictive modeling. We analyzed five cancer datasets using three major pathway databases and developed an approach to merge several databases into a single integrative one: MPath. Our results show that equivalent pathways from different databases yield disparate results in statistical enrichment analysis. Moreover, we observed a significant dataset-dependent impact on the performance of machine learning models on different prediction tasks. In some cases, MPath significantly improved prediction performance and also reduced the variance of prediction performances. Furthermore, MPath yielded more consistent and biologically plausible results in statistical enrichment analyses. In summary, this benchmarking study demonstrates that pathway database choice can influence the results of statistical enrichment analysis and predictive modeling. Therefore, we recommend the use of multiple pathway databases or integrative ones.
Affiliation(s)
- Sarah Mubeen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Charles Tapley Hoyt
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- André Gemünd
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Holger Fröhlich
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
|
13
|
Koninckx PR, Ussia A, Adamyan L, Wattiez A, Gomel V, Martin DC. Correction: Heterogeneity of endometriosis lesions requires individualisation of diagnosis and treatment and a different approach to research and evidence based medicine. Facts Views Vis Obgyn 2019; 11:263. [PMID: 32175528 PMCID: PMC7053565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Abstract
Statistical significance is used to analyse research findings and is together with biased free trials the cornerstone of evidence based medicine. However traditional statistics are based on the assumption that the population investigated is homogeneous without smaller hidden subgroups. The clinical, inflammatory, immunological, biochemical, histochemical and genetic-epigenetic heterogeneity of similar looking endometriosis lesions is a challenge for research and for diagnosis and treatment of endometriosis. The conclusions obtained by statistical testing of the entire group are not necessarily valid for subgroups. The importance is illustrated by the fact that a treatment with a beneficial effect in 80% of women but with exactly the same but opposite effect, worsening the disease in 20%, remains statistically highly significant. Since traditional statistics are unable to detect hidden subgroups, new approaches are mandatory. For diagnosis and treatment it is suggested to visualise individual data and to pay specific attention to the extremes of an analysis. For research it is important to integrate clinical, biochemical and histochemical data with molecular biological pathways and genetic-epigenetic analysis of the lesions.
Affiliation(s)
- PR Koninckx
  - Latifa Hospital, Dubai, United Arab Emirates
  - Professor emeritus OBGYN, KULeuven Belgium, University of Oxford-Hon Consultant, UK, University Cattolica, Roma, Moscow State Univ.
  - Gruppo Italo Belga, Villa Del Rosario Rome Italy
- A Ussia
  - Professor emeritus OBGYN, KULeuven Belgium, University of Oxford-Hon Consultant, UK, University Cattolica, Roma, Moscow State Univ.
  - Consultant Università Cattolica, Roma Italy
- L Adamyan
  - Department of Operative Gynecology, Federal State Budget Institution V. I. Kulakov Research Centre for Obstetrics, Gynecology, and Perinatology, Ministry of Health of the Russian Federation, Moscow, Russia
  - Department of Reproductive Medicine and Surgery, Moscow State University of Medicine and Dentistry, Moscow, Russia
- A Wattiez
  - Latifa Hospital, Dubai, United Arab Emirates
  - Professor Department of obstetrics and gynaecology, University of Strasbourg
- V Gomel
  - Professor emeritus Department of Obstetrics and Gynecology, University of British Columbia and Women’s Hospital, Vancouver, BC, Canada
- DC Martin
  - Professor emeritus School of Medicine, University of Tennessee Health Science Centre, Memphis Tennessee, USA
  - Institutional Review Board, Virginia Commonwealth University, Richmond, Virginia, USA
|
14
|
Savage SR, Shi Z, Liao Y, Zhang B. Graph Algorithms for Condensing and Consolidating Gene Set Analysis Results. Mol Cell Proteomics 2019; 18:S141-S152. [PMID: 31142576 PMCID: PMC6692773 DOI: 10.1074/mcp.tir118.001263] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/05/2018] [Revised: 03/22/2019] [Indexed: 01/04/2023] Open
Abstract
Gene set analysis plays a critical role in the functional interpretation of omics data. Although this is typically done for one omics experiment at a time, there is an increasing need to combine gene set analysis results from multiple experiments performed on the same or different omics platforms, such as in multi-omics studies. Integrating results from multiple experiments is challenging, and annotation redundancy between gene sets further obscures clear conclusions. We propose to use a weighted set cover algorithm to reduce redundancy of gene sets identified in a single experiment. Next, we use affinity propagation to consolidate similar gene sets identified from multiple experiments into clusters and to automatically determine the most representative gene set for each cluster. Using three examples from over representation analysis and gene set enrichment analysis, we showed that weighted set cover outperformed a previously published set cover method and reduced the number of gene sets by 52-77%. Focusing on overlapping genes between the list of input genes and the enriched gene sets in over-representation analysis and leading-edge genes in gene set enrichment analysis further reduced the number of gene sets. A use case combining enrichment analysis results from RNA-Seq and proteomics data comparing basal and luminal A breast cancer samples highlighted the known difference in proliferation and DNA damage response. Finally, we used these algorithms for a pan-cancer survival analysis. Our analysis clearly revealed prognosis-related pathways common to multiple cancer types or specific to individual cancer types, as well as pathways associated with prognosis in different directions in different cancer types. We implemented these two algorithms in an R package, Sumer, which generates tables and static and interactive plots for exploration and publication. Sumer is publicly available at https://github.com/bzhanglab/sumer.
Affiliation(s)
- Sara R Savage
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Zhiao Shi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Yuxing Liao
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
|
15
|
Soul J, Hardingham TE, Boot-Handford RP, Schwartz JM. SkeletalVis: an exploration and meta-analysis data portal of cross-species skeletal transcriptomics data. Bioinformatics 2019; 35:2283-2290. [PMID: 30481257 PMCID: PMC6596879 DOI: 10.1093/bioinformatics/bty947] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/31/2018] [Revised: 10/24/2018] [Accepted: 11/26/2018] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION: Skeletal diseases are prevalent in society, but improved molecular understanding is required to formulate new therapeutic strategies. Large and increasing quantities of available skeletal transcriptomics experiments give the potential for mechanistic insight of both fundamental skeletal biology and skeletal disease. However, no current repository provides access to processed, readily interpretable analysis of this data. To address this, we have developed SkeletalVis, an exploration portal for skeletal gene expression experiments. RESULTS: The SkeletalVis data portal provides an exploration and comparison platform for analysed skeletal transcriptomics data. It currently hosts 287 analysed experiments with 739 perturbation responses with comprehensive downstream analysis. We demonstrate its utility in identifying both known and novel relationships between skeletal expression signatures. SkeletalVis provides users with a platform to explore the wealth of available expression data, develop consensus signatures and the ability to compare gene signatures from new experiments to the analysed data to facilitate meta-analysis. AVAILABILITY AND IMPLEMENTATION: The SkeletalVis data portal is freely accessible at http://phenome.manchester.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jamie Soul
  - Division of Evolution & Genomic Sciences, University of Manchester, Manchester, UK
  - Wellcome Centre for Cell-Matrix Research, Division of Cell-Matrix Biology and Regenerative Medicine, Faculty of Biology Medicine and Health, University of Manchester, Manchester, UK
- Tim E Hardingham
  - Wellcome Centre for Cell-Matrix Research, Division of Cell-Matrix Biology and Regenerative Medicine, Faculty of Biology Medicine and Health, University of Manchester, Manchester, UK
- Ray P Boot-Handford
  - Wellcome Centre for Cell-Matrix Research, Division of Cell-Matrix Biology and Regenerative Medicine, Faculty of Biology Medicine and Health, University of Manchester, Manchester, UK
- Jean-Marc Schwartz
  - Division of Evolution & Genomic Sciences, University of Manchester, Manchester, UK
|
16
|
Using Pathway Covering to Explore Connections among Metabolites. Metabolites 2019; 9:88. [PMID: 31052521 PMCID: PMC6571860 DOI: 10.3390/metabo9050088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/16/2019] [Revised: 04/24/2019] [Accepted: 04/26/2019] [Indexed: 11/17/2022] Open
Abstract
Interpreting changes in metabolite abundance in response to experimental treatments or disease states remains a major challenge in metabolomics. Pathway Covering is a new algorithm that takes a list of metabolites (compounds) and determines a minimum-cost set of metabolic pathways in an organism that includes (covers) all the metabolites in the list. We used five functions for assigning costs to pathways, including assigning a constant for all pathways, which yields a solution with the smallest pathway count; two methods that penalize large pathways; one that prefers pathways based on the pathway's assigned function, and one that loosely corresponds to metabolic flux. The pathway covering set computed by the algorithm can be displayed as a multi-pathway diagram ("pathway collage") that highlights the covered metabolites. We investigated the pathway covering algorithm by using several datasets from the Metabolomics Workbench. The algorithm is best applied to a list of metabolites with significant statistics and fold-changes with a specified direction of change for each metabolite. The pathway covering algorithm is now available within the Pathway Tools software and BioCyc website.
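To make the minimum-cost covering idea in the abstract above concrete, here is a toy sketch. It is not the Pathway Tools implementation: it enumerates every combination of pathways, so it is only practical for a handful of pathways, and the names `min_cost_cover`, `constant_cost`, and `size_penalty`, as well as the example pathways and metabolites, are hypothetical.

```python
from itertools import combinations


def min_cost_cover(targets, pathways, cost):
    """Return (pathway_names, total_cost) for a cheapest cover of targets.

    pathways maps a pathway name to the set of metabolites it contains.
    cost(name, members) returns a positive number for one pathway.
    Exhaustive search; returns (None, inf) when no cover exists.
    """
    targets = set(targets)
    names = sorted(pathways)
    best, best_cost = None, float("inf")
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(pathways[p] for p in combo))
            if targets <= covered:
                total = sum(cost(p, pathways[p]) for p in combo)
                if total < best_cost:
                    best, best_cost = combo, total
    return best, best_cost


def constant_cost(name, members):
    # Every pathway costs the same, so the cheapest cover has the fewest pathways.
    return 1


def size_penalty(name, members):
    # Large pathways are penalized by charging the square of their size.
    return len(members) ** 2


if __name__ == "__main__":
    pathways = {
        "P1": {"a", "b", "c", "d"},
        "P2": {"a", "b"},
        "P3": {"c", "d"},
        "P4": {"d", "e"},
    }
    metabolites = {"a", "b", "c", "d", "e"}
    print(min_cost_cover(metabolites, pathways, constant_cost))  # (('P1', 'P4'), 2)
    print(min_cost_cover(metabolites, pathways, size_penalty))   # (('P2', 'P3', 'P4'), 12)
```

The two calls illustrate how the choice of cost function changes the answer: a constant cost prefers the large pathway P1, while a size penalty prefers three small pathways.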
|
17
|
Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D. The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling. Front Genet 2019. [PMID: 31824580 DOI: 10.3389/fgene.2019.01203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/06/2023] Open
|
18
|
Stoney R, Robertson DL, Nenadic G, Schwartz JM. Mapping biological process relationships and disease perturbations within a pathway network. NPJ Syst Biol Appl 2018; 4:22. [PMID: 29900005 PMCID: PMC5995814 DOI: 10.1038/s41540-018-0055-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/02/2017] [Revised: 04/17/2018] [Accepted: 04/24/2018] [Indexed: 01/07/2023] Open
Abstract
Molecular interaction networks are routinely used to map the organization of cellular function. Edges represent interactions between genes, proteins, or metabolites. However, in living cells, molecular interactions are dynamic, necessitating context-dependent models. Contextual information can be integrated into molecular interaction networks through the inclusion of additional molecular data, but there are concerns about completeness and relevance of this data. We developed an approach for representing the organization of human cellular processes using pathways as the nodes in a network. Pathways represent spatial and temporal sets of context-dependent interactions, generating a high-level network when linked together, which incorporates contextual information without the need for molecular interaction data. Analysis of the pathway network revealed linked communities representing functional relationships, comparable to those found in molecular networks, including metabolism, signaling, immunity, and the cell cycle. We mapped a range of diseases onto this network and find that pathways associated with diseases tend to be functionally connected, highlighting the perturbed functions that result in disease phenotypes. We demonstrated that disease pathways cluster within the network. We then examined the distribution of cancer pathways and showed that cancer pathways tend to localize within the signaling, DNA processes and immune modules, although some cancer-associated nodes are found in other network regions. Altogether, we generated a high-confidence functional network, which avoids some of the shortcomings faced by conventional molecular models. Our representation provides an intuitive functional interpretation of cellular organization, which relies only on high-quality pathway and Gene Ontology data. The network is available at https://data.mendeley.com/datasets/3pbwkxjxg9/1.
Affiliation(s)
- Ruth Stoney
  - School of Computer Science, University of Manchester, M13 9PT, Manchester, UK
- David L Robertson
  - MRC-University of Glasgow Centre for Virus Research, Garscube Campus, Glasgow, G61 1QH UK
- Goran Nenadic
  - School of Computer Science, University of Manchester, M13 9PT, Manchester, UK
- Jean-Marc Schwartz
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PT UK
|