1
Sinha S, McLaren E, Mullick M, Singh S, Boland BS, Ghosh P. FORWARD: A Learning Framework for Logical Network Perturbations to Prioritize Targets for Drug Development. bioRxiv 2024:2024.07.16.602603. [PMID: 39071297 PMCID: PMC11275938 DOI: 10.1101/2024.07.16.602603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Despite advances in artificial intelligence (AI), target-based drug development remains a costly, complex and imprecise process. We introduce F.O.R.W.A.R.D [Framework for Outcome-based Research and Drug Development], a network-based target prioritization approach, and test its utility in the challenging therapeutic area of Inflammatory Bowel Diseases (IBD), which is a chronic condition of multifactorial origin. F.O.R.W.A.R.D leverages real-world outcomes, using a machine-learning classifier trained on transcriptomic data from seven prospective randomized clinical trials involving four drugs. It establishes a molecular signature of remission as the therapeutic goal and computes, by integrating principles of network connectivity, the likelihood that a drug's action on its target(s) will induce the remission-associated genes. Benchmarking F.O.R.W.A.R.D against 210 completed clinical trials on 52 targets showed a predictive accuracy of 100%. The success of F.O.R.W.A.R.D was achieved despite differences in targets, mechanisms, and trial designs. F.O.R.W.A.R.D-driven in-silico phase '0' trials revealed its potential to inform trial design, justify re-trialing failed drugs, and guide early terminations. With its extendable applications to other therapeutic areas and its iterative refinement with emerging trials, F.O.R.W.A.R.D holds the promise to transform drug discovery by generating foresight from hindsight and impacting research and development as well as human-in-the-loop clinical decision-making.
2
Candia J, Ferrucci L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS One 2024; 19:e0302696. [PMID: 38753612 PMCID: PMC11098418 DOI: 10.1371/journal.pone.0302696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024]
Abstract
Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively utilized for RNA-seq data analysis. Here, we quantitatively assessed the performance of a variety of GSEA modalities and provide guidance in the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We utilized a new consensus metric, the Enrichment Evidence Score (EES), which showed a remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. 
This work fills the existing gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
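Editorial illustration: the classic/unweighted gene-set permutation approach mentioned in this abstract can be sketched roughly as follows. This is a simplified sketch under stated assumptions, not the GSEA software or the authors' benchmark code. The function names `enrichment_score` and `gene_set_permutation_pvalue`, the one-sided p-value rule, and the default of 1000 permutations are all hypothetical choices made for illustration.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted Kolmogorov-Smirnov-like running-sum statistic.

    Walks down the ranked list, stepping up by 1/n_hit on genes in the set
    and down by 1/n_miss otherwise, and returns the signed maximum deviation
    from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size (illustrative)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_genes, size))
        null_es = enrichment_score(ranked_genes, null_set)
        # Count null scores at least as extreme on the same side as the observed score.
        if (observed >= 0 and null_es >= observed) or (observed < 0 and null_es <= observed):
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

Gene-set permutation only reshuffles set membership over a fixed ranking, so each permutation is cheap. Phenotype permutation would require recomputing the ranking for every shuffle.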
Affiliation(s)
- Julián Candia
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
- Luigi Ferrucci
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
3
Yang H, Shi Y, Lin A, Qi C, Liu Z, Cheng Q, Miao K, Zhang J, Luo P. PESSA: A web tool for pathway enrichment score-based survival analysis in cancer. PLoS Comput Biol 2024; 20:e1012024. [PMID: 38717988 PMCID: PMC11078417 DOI: 10.1371/journal.pcbi.1012024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 03/26/2024] [Indexed: 05/12/2024]
Abstract
The activation levels of biologically significant gene sets are emerging tumor molecular markers and play an irreplaceable role in the tumor research field; however, web-based tools for prognostic analyses using it as a tumor molecular marker remain scarce. We developed a web-based tool PESSA for survival analysis using gene set activation levels. All data analyses were implemented via R. Activation levels of The Molecular Signatures Database (MSigDB) gene sets were assessed using the single sample gene set enrichment analysis (ssGSEA) method based on data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), The European Genome-phenome Archive (EGA) and supplementary tables of articles. PESSA was used to perform median and optimal cut-off dichotomous grouping of ssGSEA scores for each dataset, relying on the survival and survminer packages for survival analysis and visualisation. PESSA is an open-access web tool for visualizing the results of tumor prognostic analyses using gene set activation levels. A total of 238 datasets from the GEO, TCGA, EGA, and supplementary tables of articles; covering 51 cancer types and 13 survival outcome types; and 13,434 tumor-related gene sets are obtained from MSigDB for pre-grouping. Users can obtain the results, including Kaplan-Meier analyses based on the median and optimal cut-off values and accompanying visualization plots and the Cox regression analyses of dichotomous and continuous variables, by selecting the gene set markers of interest. PESSA (https://smuonco.shinyapps.io/PESSA/ OR http://robinl-lab.com/PESSA) is a large-scale web-based tumor survival analysis tool covering a large amount of data that creatively uses predefined gene set activation levels as molecular markers of tumors.
Affiliation(s)
- Hong Yang
  - Department of Oncology, Zhujiang Hospital, Southern Medical University, Haizhu District, Guangzhou, Guangdong, China
  - The First School of Clinical Medicine, Southern Medical University, Baiyun District, Guangzhou, Guangdong, China
- Ying Shi
  - Department of Oncology, Zhujiang Hospital, Southern Medical University, Haizhu District, Guangzhou, Guangdong, China
  - The Second School of Clinical Medicine, Southern Medical University, Baiyun District, Guangzhou, Guangdong, China
- Anqi Lin
  - Department of Oncology, Zhujiang Hospital, Southern Medical University, Haizhu District, Guangzhou, Guangdong, China
- Chang Qi
  - Institute of Logic and Computation, TU Wien, Austria
- Zaoqu Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Quan Cheng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Kai Miao
  - Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
  - MoE Frontiers Science Center for Precision Oncology, University of Macau, Macau SAR, China
- Jian Zhang
  - Department of Oncology, Zhujiang Hospital, Southern Medical University, Haizhu District, Guangzhou, Guangdong, China
- Peng Luo
  - Department of Oncology, Zhujiang Hospital, Southern Medical University, Haizhu District, Guangzhou, Guangdong, China
4
Dorbani I, Armengaud J, Carlin F, Duport C. Proteome of spores from biological indicators in sterilization processes: Bacillus pumilus and Bacillus atrophaeus. Proteomics 2024; 24:e2300293. [PMID: 38059874 DOI: 10.1002/pmic.202300293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 12/08/2023]
Abstract
Bacillus atrophaeus and Bacillus pumilus spores are widely used as biological indicators to assess the effectiveness of decontamination procedures. Spores are intricate, multi-layered cellular structures primarily composed of proteins, which significantly contribute to their extreme resistance. Therefore, conducting a comprehensive proteome analysis of spores is crucial to identify the specific proteins conferring spore resistance. Here, we employed a high-throughput shotgun proteomic approach to compare the spore proteomes of B. atrophaeus DSM675 and B. pumilus DSM492, identifying 1312 and 1264 proteins, respectively. While the overall number of proteins found in both strains is roughly equivalent, a closer examination of a subset of 54 spore-specific proteins revealed noteworthy distinctions. Among these 54 proteins, 23 were exclusively detected in one strain, while others were shared between both. Notably, of the 31 proteins detected in both strains, 10 exhibited differential abundance levels, including key coat layer morphogenetic proteins. The exploration of these 54 proteins, considering their presence, absence, and differential abundance, provides a unique molecular signature that may elucidate the differences in sensitivity/resistance profiles between the two strains.
Affiliation(s)
- Imed Dorbani
  - INRAE, Avignon Université, UMR SQPOV, Avignon, France
  - Claranor SA, Avignon, France
- Jean Armengaud
  - Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, Bagnols-sur-Cèze, France
5
O'Leary MF, Jackman SR, Bowtell JL. Shatavari supplementation in postmenopausal women alters the skeletal muscle proteome and pathways involved in training adaptation. Eur J Nutr 2024; 63:869-879. [PMID: 38214710 PMCID: PMC10948523 DOI: 10.1007/s00394-023-03310-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 12/10/2023] [Indexed: 01/13/2024]
Abstract
PURPOSE: Shatavari is an understudied, widely available herbal supplement. It contains steroidal saponins and phytoestrogens. We previously showed that six weeks of shatavari supplementation improved handgrip strength and increased markers of myosin contractile function. Mechanistic insights into shatavari's actions are limited. Therefore, we performed proteomics on vastus lateralis (VL) samples that remained from our original study.
METHODS: In a randomised double-blind trial, women (68.5 ± 6 years) ingested either placebo or shatavari (equivalent to 26,500 mg/d fresh weight) for six weeks. Tandem mass tag global proteomic analysis of VL samples was conducted (N = 7 shatavari, N = 5 placebo). Data were normalized to total peptides and scaled using a reference sample. Data were filtered using a 5% FDR. For each protein, the pre to post supplementation difference was expressed as log2 fold change. Welch's t tests with Benjamini-Hochberg corrections were performed for each protein. Pathway enrichment (PADOG, CAMERA) was interrogated in Reactome (v85).
RESULTS: No individual protein was significantly different between supplementation conditions. Both PADOG and CAMERA indicated that pathways related to (1) integrin/MAPK signalling; (2) metabolism/insulin secretion; (3) cell proliferation/senescence/DNA repair/cell death; (4) haemostasis/platelets/fibrin; (5) signal transduction; (6) neutrophil degranulation; and (7) chemical synapse function were significantly upregulated. CAMERA indicated pathways related to translation/amino acid metabolism, viral infection, and muscle contraction were downregulated.
CONCLUSION: Our analyses indicate that shatavari may support muscle adaptation responses to exercise. These data provide useful signposts for future investigation of shatavari's utility in conserving and enhancing musculoskeletal function in older age.
TRIAL REGISTRATION: NCT05025917 30/08/21, retrospectively registered.
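Editorial illustration: two of the per-protein steps named in the methods above, the log2 fold change and the Benjamini-Hochberg correction, can be sketched as below. This is a minimal sketch, not the authors' pipeline. The function names are hypothetical, the p-values are assumed to come from Welch's t tests computed elsewhere, and abundances are assumed to be strictly positive.

```python
import math


def log2_fold_change(pre, post):
    """Pre-to-post change as log2(post / pre); both values must be positive."""
    return math.log2(post / pre)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With thousands of proteins and small groups such as N = 7 versus N = 5, adjusted p-values can all exceed the threshold even when raw p-values look small. That is consistent with a result where no individual protein is significant but pathway-level methods still detect coordinated shifts.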
Affiliation(s)
- Mary F O'Leary
  - Department of Public Health and Sport Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Sarah R Jackman
  - Department of Public Health and Sport Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Joanna L Bowtell
  - Department of Public Health and Sport Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
6
Peng C, Chen Q, Tan S, Shen X, Jiang C. Generalized reporter score-based enrichment analysis for omics data. Brief Bioinform 2024; 25:bbae116. [PMID: 38546324 PMCID: PMC10976918 DOI: 10.1093/bib/bbae116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 01/25/2024] [Accepted: 03/01/2024] [Indexed: 06/15/2024]
Abstract
Enrichment analysis contextualizes biological features in pathways to facilitate a systematic understanding of high-dimensional data and is widely used in biomedical research. The emerging reporter score-based analysis (RSA) method shows more promising sensitivity, as it relies on P-values instead of raw values of features. However, RSA cannot be directly applied to multi-group and longitudinal experimental designs and is often misused due to the lack of a proper tool. Here, we propose the Generalized Reporter Score-based Analysis (GRSA) method for multi-group and longitudinal omics data. A comparison with other popular enrichment analysis methods demonstrated that GRSA had increased sensitivity across multiple benchmark datasets. We applied GRSA to microbiome, transcriptome and metabolome data and discovered new biological insights in omics studies. Finally, we demonstrated the application of GRSA beyond functional enrichment using a taxonomy database. We implemented GRSA in an R package, ReporterScore, integrating with a powerful visualization module and updatable pathway databases, which is available on the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/ReporterScore). We believe that the ReporterScore package will be a valuable asset for broad biomedical research fields.
Affiliation(s)
- Chen Peng
  - MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
- Qiong Chen
  - MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
- Shangjin Tan
  - BGI Research, Wuhan, Hubei 430074, China
  - BGI Research, Shenzhen, Guangdong 518083, China
- Xiaotao Shen
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Chao Jiang
  - MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
  - Center for Life Sciences, Shaoxing Institute, Zhejiang University, Shaoxing, Zhejiang 321000, China
7
Hui TX, Kasim S, Aziz IA, Fudzee MFM, Haron NS, Sutikno T, Hassan R, Mahdin H, Sen SC. Robustness evaluations of pathway activity inference methods on gene expression data. BMC Bioinformatics 2024; 25:23. [PMID: 38216898 PMCID: PMC10785356 DOI: 10.1186/s12859-024-05632-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 01/02/2024] [Indexed: 01/14/2024]
Abstract
BACKGROUND: With the exponential growth of high-throughput technologies, multiple pathway analysis methods have been proposed to estimate pathway activities from gene expression profiles. These pathway activity inference methods can be divided into two main categories: non-Topology-Based (non-TB) and Pathway Topology-Based (PTB) methods. Although some review and survey articles discussed the topic from different aspects, there is a lack of systematic assessment and comparisons on the robustness of these approaches.
RESULTS: Thus, this study presents comprehensive robustness evaluations of seven widely used pathway activity inference methods using six cancer datasets based on two assessments. The first assessment seeks to investigate the robustness of pathway activity in pathway activity inference methods, while the second assessment aims to assess the robustness of risk-active pathways and genes predicted by these methods. The mean reproducibility power and total number of identified informative pathways and genes were evaluated. Based on the first assessment, the mean reproducibility power of pathway activity inference methods generally decreased as the number of pathway selections increased. Entropy-based Directed Random Walk (e-DRW) distinctly outperformed other methods in exhibiting the greatest reproducibility power across all cancer datasets. On the other hand, the second assessment shows that no methods provide satisfactory results across datasets.
CONCLUSION: However, PTB methods generally appear to perform better in producing greater reproducibility power and identifying potential cancer markers compared to non-TB methods.
Affiliation(s)
- Tay Xin Hui
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Shahreen Kasim
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Izzatdin Abdul Aziz
  - Computer and Information Sciences Department (CISD), Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Malaysia
- Mohd Farhan Md Fudzee
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Nazleeni Samiha Haron
  - Computer and Information Sciences Department (CISD), Universiti Teknologi PETRONAS (UTP), 32610, Seri Iskandar, Malaysia
- Tole Sutikno
  - Department of Electrical Engineering, Universitas Ahmad Dahlan (UAD), 55166, Yogyakarta, Indonesia
- Rohayanti Hassan
  - Faculty of Electrical Engineering, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
- Hairulnizam Mahdin
  - Soft Computing and Data Mining Center, Faculty of Computer Sciences and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), 83000, Batu Pahat, Malaysia
- Seah Choon Sen
  - Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
8
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. bioRxiv 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
  - National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Fabien Jourdan
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Rachel Pj Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
9
Tsai HHD, House JS, Wright FA, Chiu WA, Rusyn I. A tiered testing strategy based on in vitro phenotypic and transcriptomic data for selecting representative petroleum UVCBs for toxicity evaluation in vivo. Toxicol Sci 2023; 193:219-233. [PMID: 37079747 PMCID: PMC10230285 DOI: 10.1093/toxsci/kfad041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023]
Abstract
Hazard evaluation of substances of "unknown or variable composition, complex reaction products and biological materials" (UVCBs) remains a major challenge in regulatory science because their chemical composition is difficult to ascertain. Petroleum substances are representative UVCBs and human cell-based data have been previously used to substantiate their groupings for regulatory submissions. We hypothesized that a combination of phenotypic and transcriptomic data could be integrated to make decisions as to selection of group-representative worst-case petroleum UVCBs for subsequent toxicity evaluation in vivo. We used data obtained from 141 substances from 16 manufacturing categories previously tested in 6 human cell types (induced pluripotent stem cell [iPSC]-derived hepatocytes, cardiomyocytes, neurons, and endothelial cells, and MCF7 and A375 cell lines). Benchmark doses for gene-substance combinations were calculated, and both transcriptomic and phenotype-derived points of departure (PODs) were obtained. Correlation analysis and machine learning were used to assess associations between phenotypic and transcriptional PODs and to determine the most informative cell types and assays, thus representing a cost-effective integrated testing strategy. We found that 2 cell types (iPSC-derived hepatocytes and cardiomyocytes) contributed the most informative and protective PODs and may be used to inform selection of representative petroleum UVCBs for further toxicity evaluation in vivo. Overall, although the use of new approach methodologies to prioritize UVCBs has not been widely adopted, our study proposes a tiered testing strategy based on iPSC-derived hepatocytes and cardiomyocytes to inform selection of representative worst-case petroleum UVCBs from each manufacturing category for further toxicity evaluation in vivo.
Affiliation(s)
- Han-Hsuan Doris Tsai
  - Interdisciplinary Faculty of Toxicology, College Station, Texas 77843, USA
  - Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, Texas 77843, USA
- John S House
  - National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
- Fred A Wright
  - Interdisciplinary Faculty of Toxicology, College Station, Texas 77843, USA
  - Department of Statistics and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27603, USA
  - Department of Biological Sciences and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27603, USA
- Weihsueh A Chiu
  - Interdisciplinary Faculty of Toxicology, College Station, Texas 77843, USA
  - Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, Texas 77843, USA
- Ivan Rusyn
  - Interdisciplinary Faculty of Toxicology, College Station, Texas 77843, USA
  - Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, Texas 77843, USA
10
Reardon AJF, Farmahin R, Williams A, Meier MJ, Addicks GC, Yauk CL, Matteo G, Atlas E, Harrill J, Everett LJ, Shah I, Judson R, Ramaiahgari S, Ferguson SS, Barton-Maclaren TS. From vision toward best practices: Evaluating in vitro transcriptomic points of departure for application in risk assessment using a uniform workflow. Front Toxicol 2023; 5:1194895. [PMID: 37288009 PMCID: PMC10242042 DOI: 10.3389/ftox.2023.1194895] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/27/2023] [Accepted: 05/03/2023] [Indexed: 06/09/2023]
Abstract
The growing number of chemicals in the current consumer and industrial markets presents a major challenge for regulatory programs faced with the need to assess the potential risks they pose to human and ecological health. The increasing demand for hazard and risk assessment of chemicals currently exceeds the capacity to produce the toxicity data necessary for regulatory decision making, and the applied data is commonly generated using traditional approaches with animal models that have limited context in terms of human relevance. This scenario provides the opportunity to implement novel, more efficient strategies for risk assessment purposes. This study aims to increase confidence in the implementation of new approach methods in a risk assessment context by using a parallel analysis to identify data gaps in current experimental designs, reveal the limitations of common approaches deriving transcriptomic points of departure, and demonstrate the strengths in using high-throughput transcriptomics (HTTr) to derive practical endpoints. A uniform workflow was applied across six curated gene expression datasets from concentration-response studies containing 117 diverse chemicals, three cell types, and a range of exposure durations, to determine tPODs based on gene expression profiles. After benchmark concentration modeling, a range of approaches was used to determine consistent and reliable tPODs. High-throughput toxicokinetics were employed to translate in vitro tPODs (µM) to human-relevant administered equivalent doses (AEDs, mg/kg-bw/day). The tPODs from most chemicals had AEDs that were lower (i.e., more conservative) than apical PODs in the US EPA CompTox chemical dashboard, suggesting in vitro tPODs would be protective of potential effects on human health. An assessment of multiple data points for single chemicals revealed that longer exposure duration and varied cell culture systems (e.g., 3D vs. 2D) lead to a decreased tPOD value that indicated increased chemical potency. Seven chemicals were flagged as outliers when comparing the ratio of tPOD to traditional POD, thus indicating they require further assessment to better understand their hazard potential. Our findings build confidence in the use of tPODs but also reveal data gaps that must be addressed prior to their adoption to support risk assessment applications.
Collapse
Affiliation(s)
- Anthony J. F. Reardon
- Existing Substances Risk Assessment Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
- Reza Farmahin
- Existing Substances Risk Assessment Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
- Andrew Williams
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
- Matthew J. Meier
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
- Gregory C. Addicks
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
- Carole L. Yauk
- Department of Biology, University of Ottawa, Ottawa, ON, Canada
- Geronimo Matteo
- Department of Biology, University of Ottawa, Ottawa, ON, Canada
- Ella Atlas
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
- Department of Biochemistry, University of Ottawa, Ottawa, ON, Canada
- Joshua Harrill
- Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States
- Logan J. Everett
- Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States
- Imran Shah
- Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States
- Richard Judson
- Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Durham, NC, United States
- Sreenivasa Ramaiahgari
- Division of Translational Toxicology, Mechanistic Toxicology Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, NC, United States
- Stephen S. Ferguson
- Division of Translational Toxicology, Mechanistic Toxicology Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, NC, United States
- Tara S. Barton-Maclaren
- Existing Substances Risk Assessment Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, Canada
|
11
|
Engler Hart C, Ence D, Healey D, Domingo-Fernández D. On the correspondence between the transcriptomic response of a compound and its effects on its targets. BMC Bioinformatics 2023; 24:207. [PMID: 37208587 DOI: 10.1186/s12859-023-05337-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 05/14/2023] [Indexed: 05/21/2023] Open
Abstract
Better understanding the transcriptomic response produced by a compound perturbing its targets can shed light on the underlying biological processes regulated by the compound. However, establishing the relationship between the induced transcriptomic response and the target of a compound is non-trivial, partly because targets are rarely differentially expressed. Therefore, connecting both modalities requires orthogonal information (e.g., pathway or functional information). Here, we present a comprehensive study aimed at exploring this relationship by leveraging thousands of transcriptomic experiments and target data for over 2000 compounds. Firstly, we confirm that compound-target information does not correlate as expected with the transcriptomic signatures induced by a compound. However, we reveal how the concordance between both modalities increases by connecting pathway and target information. Additionally, we investigate whether compounds that target the same proteins induce a similar transcriptomic response and conversely, whether compounds with similar transcriptomic responses share the same target proteins. While our findings suggest that this is generally not the case, we did observe that compounds with similar transcriptomic profiles are more likely to share at least one protein target and common therapeutic applications. Finally, we demonstrate how to exploit the relationship between both modalities for mechanism of action deconvolution by presenting a case scenario involving a few compound pairs with high similarity.
|
12
|
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet 2023; 39:308-319. [PMID: 36750393 DOI: 10.1016/j.tig.2023.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/25/2022] [Revised: 11/24/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023]
Abstract
Pathway enrichment analysis is indispensable for interpreting omics datasets and generating hypotheses. However, the foundations of enrichment analysis remain elusive to many biologists. Here, we discuss best practices in interpreting different types of omics data using pathway enrichment analysis and highlight the importance of considering intrinsic features of various types of omics data. We further explain major components that influence the outcomes of a pathway enrichment analysis, including defining background sets and choosing reference annotation databases. To improve reproducibility, we describe how to standardize reporting methodological details in publications. This article aims to serve as a primer for biologists to leverage the wealth of omics resources and motivate bioinformatics tool developers to enhance the power of pathway enrichment analysis.
Affiliation(s)
- Kangmei Zhao
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
- Seung Yon Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
|
13
|
Lu Y, Pang Z, Xia J. Comprehensive investigation of pathway enrichment methods for functional interpretation of LC-MS global metabolomics data. Brief Bioinform 2023; 24:bbac553. [PMID: 36572652 PMCID: PMC9851290 DOI: 10.1093/bib/bbac553] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 09/23/2022] [Revised: 10/31/2022] [Accepted: 11/15/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Global or untargeted metabolomics is widely used to comprehensively investigate metabolic profiles under various pathophysiological conditions such as inflammations, infections, responses to exposures or interactions with microbial communities. However, biological interpretation of global metabolomics data remains a daunting task. Recent years have seen growing applications of pathway enrichment analysis based on putative annotations of liquid chromatography coupled with mass spectrometry (LC-MS) peaks for functional interpretation of LC-MS-based global metabolomics data. However, due to intricate peak-metabolite and metabolite-pathway relationships, considerable variations are observed among results obtained using different approaches. There is an urgent need to benchmark these approaches to inform the best practices. RESULTS We have conducted a benchmark study of common peak annotation approaches and pathway enrichment methods in current metabolomics studies. Representative approaches, including three peak annotation methods and four enrichment methods, were selected and benchmarked under different scenarios. Based on the results, we have provided a set of recommendations regarding peak annotation, ranking metrics and feature selection. The overall better performance was obtained for the mummichog approach. We have observed that a ~30% annotation rate is sufficient to achieve high recall (~90% based on mummichog), and using semi-annotated data improves functional interpretation. Based on the current platforms and enrichment methods, we further propose an identifiability index to indicate the possibility of a pathway being reliably identified. Finally, we evaluated all methods using 11 COVID-19 and 8 inflammatory bowel diseases (IBD) global metabolomics datasets.
Affiliation(s)
- Yao Lu
- Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Zhiqiang Pang
- Institute of Parasitology, McGill University, Quebec, Canada
- Jianguo Xia
- Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Institute of Parasitology, McGill University, Quebec, Canada
|
14
|
Chicco D, Shiradkar R. Ten quick tips for computational analysis of medical images. PLoS Comput Biol 2023; 19:e1010778. [PMID: 36602952 PMCID: PMC9815662 DOI: 10.1371/journal.pcbi.1010778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023] Open
Abstract
Medical imaging is a great asset for modern medicine, since it allows physicians to spatially interrogate a disease site, resulting in precise intervention for diagnosis and treatment, and to observe particular aspects of patients' conditions that otherwise would not be noticeable. Computational analysis of medical images, moreover, can allow the discovery of disease patterns and correlations among cohorts of patients with the same disease, thus suggesting common causes or providing useful information for better therapies and cures. Machine learning and deep learning applied to medical images, in particular, have produced new, unprecedented results that can pave the way to advanced frontiers of medical discoveries. While computational analysis of medical images has become easier, however, it has also become easier to make mistakes or generate inflated or misleading results, hindering reproducibility and deployment. In this article, we provide ten quick tips to perform computational analysis of medical images avoiding common mistakes and pitfalls that we noticed in multiple studies in the past. We believe our ten guidelines, if put into practice, can help the computational-medical imaging community to perform better scientific research that eventually can have a positive impact on the lives of patients worldwide.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Rakesh Shiradkar
- Department of Biomedical Engineering, Emory University, Atlanta, Georgia, United States of America
|
15
|
Chicco D, Jurman G. A brief survey of tools for genomic regions enrichment analysis. FRONTIERS IN BIOINFORMATICS 2022; 2:968327. [PMID: 36388843 PMCID: PMC9645122 DOI: 10.3389/fbinf.2022.968327] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/13/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022] Open
Abstract
Functional enrichment analysis or pathway enrichment analysis (PEA) is a bioinformatics technique which identifies the most over-represented biological pathways in a list of genes compared to those that would be associated with them by chance. These biological functions are found on bioinformatics annotated databases such as The Gene Ontology or KEGG; the more abundant pathways are identified through statistical techniques such as Fisher’s exact test. All PEA tools require a list of genes as input. A few tools, however, read lists of genomic regions as input rather than lists of genes, and first associate these chromosome regions with their corresponding genes. These tools perform a procedure called genomic regions enrichment analysis, which can be useful for detecting the biological pathways related to a set of chromosome regions. In this brief survey, we analyze six tools for genomic regions enrichment analysis (BEHST, g:Profiler g:GOSt, GREAT, LOLA, Poly-Enrich, and ReactomePA), outlining and comparing their main features. Our comparison results indicate that the inclusion of data for regulatory elements, such as ChIP-seq, is common among these tools and could therefore improve the enrichment analysis results.
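The over-representation test mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution; the function name, parameter names, and the toy counts are illustrative assumptions and are not taken from any of the surveyed tools.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact test for pathway over-representation.

    Returns the probability of observing at least `n_overlap` pathway genes
    when `n_selected` genes are drawn without replacement from a background
    of `n_background` genes, `n_pathway` of which belong to the pathway.
    All names here are hypothetical, chosen for illustration only.
    """
    max_overlap = min(n_pathway, n_selected)
    # Number of ways to draw the selected gene list from the background.
    total = comb(n_background, n_selected)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy usage with made-up counts: 20,000 background genes, a 100-gene pathway,
# 200 selected genes, 10 of which fall in the pathway.
p = enrichment_p_value(20000, 100, 200, 10)
print(f"enrichment p-value: {p:.3g}")
```

In practice the p-values from many pathways would also need multiple-testing correction, and the choice of background set strongly affects the result; both are left out of this sketch.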
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Giuseppe Jurman
- Data Science for Health Unit, Fondazione Bruno Kessler, Trento, Italy
|
16
|
Abstract
Pathway enrichment analysis (PEA) is a computational biology method that identifies biological functions that are overrepresented in a group of genes more than would be expected by chance and ranks these functions by relevance. The relative abundance of genes pertinent to specific pathways is measured through statistical methods, and associated functional pathways are retrieved from online bioinformatics databases. In the last decade, along with the spread of the internet, higher availability of computational resources made PEA software tools easy to access and to use for bioinformatics practitioners worldwide. Although it became easier to use these tools, it also became easier to make mistakes that could generate inflated or misleading results, especially for beginners and inexperienced computational biologists. With this article, we propose nine quick tips to avoid common mistakes and to carry out a complete, sound, thorough PEA, which can produce relevant and robust results. We describe our nine guidelines in a simple way, so that they can be understood and used by anyone, including students and beginners. Some tips explain what to do before starting a PEA, others are suggestions of how to correctly generate meaningful results, and some final guidelines indicate some useful steps to properly interpret PEA results. Our nine tips can help users perform better pathway enrichment analyses and eventually contribute to a better understanding of current biology.
|