1
Nguyen H, Pham VD, Nguyen H, Tran B, Petereit J, Nguyen T. CCPA: cloud-based, self-learning modules for consensus pathway analysis using GO, KEGG and Reactome. Brief Bioinform 2024; 25:bbae222. [PMID: 39041916 PMCID: PMC11264295 DOI: 10.1093/bib/bbae222] [Received: 11/22/2023] [Revised: 03/15/2024] [Accepted: 04/25/2024] [Indexed: 07/24/2024] Open
Abstract
This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' (https://github.com/NIGMS/NIGMS-Sandbox). The module delivers learning materials on Cloud-based Consensus Pathway Analysis in an interactive format that uses appropriate cloud resources for data access and analyses. Pathway analysis is important because it allows us to gain insights into biological mechanisms underlying conditions. However, the availability of many pathway analysis methods, the requirement of coding skills, and the focus of current tools on only a few species all make it very difficult for biomedical researchers to self-learn and perform pathway analysis efficiently. Furthermore, there is a lack of tools that allow researchers to compare analysis results obtained from different experiments and different analysis methods to find consensus results. To address these challenges, we have designed a cloud-based, self-learning module that provides consensus results among established, state-of-the-art pathway analysis techniques to provide students and researchers with necessary training and example materials. The training module consists of five Jupyter Notebooks that provide complete tutorials for the following tasks: (i) process expression data, (ii) perform differential analysis, visualize and compare the results obtained from four differential analysis methods (limma, t-test, edgeR, DESeq2), (iii) process three pathway databases (GO, KEGG and Reactome), (iv) perform pathway analysis using eight methods (ORA, CAMERA, KS test, Wilcoxon test, FGSEA, GSA, SAFE and PADOG) and (v) combine results of multiple analyses. We also provide examples, source code, explanations and instructional videos for trainees to complete each Jupyter Notebook. The module supports the analysis for many model (e.g. human, mouse, fruit fly, zebrafish) and non-model species. The module is publicly available at https://github.com/NIGMS/Consensus-Pathway-Analysis-in-the-Cloud. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement.
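Task (v), combining the results of multiple analyses, can be illustrated with a small sketch. The abstract does not state how the module merges results, so the example below assumes Fisher's method for combining per-pathway p-values; the function name `fisher_combine`, the `results` dictionary and all p-values are hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combine p-values with Fisher's method (an assumed choice, not
    necessarily the one used by the CCPA module)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) for extremely small inputs.
    clipped = [min(max(p, 1e-300), 1.0) for p in p_values]
    # Fisher statistic X = -2 * sum(ln p); we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in clipped)
    k = len(clipped)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)

# Hypothetical per-method p-values for two pathways.
results = {
    "ORA":   {"pathway_A": 0.01, "pathway_B": 0.40},
    "FGSEA": {"pathway_A": 0.03, "pathway_B": 0.55},
    "KS":    {"pathway_A": 0.02, "pathway_B": 0.30},
}
consensus = {
    pathway: fisher_combine([method[pathway] for method in results.values()])
    for pathway in results["ORA"]
}
```

Fisher's method assumes independent tests, which does not strictly hold when several methods analyze the same expression data, so the combined values are best read as a ranking aid.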
Affiliation(s)
- Ha Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, AL 36849, USA
- Van-Dung Pham
  - Department of Computer Science and Software Engineering, Auburn University, AL 36849, USA
- Hung Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, AL 36849, USA
- Bang Tran
  - Department of Computer Science, California State University, Sacramento, CA 95819, USA
- Juli Petereit
  - Nevada Bioinformatics Center, University of Nevada, Reno, NV 89557, USA
- Tin Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, AL 36849, USA
2
Li X, Zan X, Liu T, Dong X, Zhang H, Li Q, Bao Z, Lin J. Integrated edge information and pathway topology for drug-disease associations. iScience 2024; 27:110025. [PMID: 38974972 PMCID: PMC11226970 DOI: 10.1016/j.isci.2024.110025] [Received: 01/31/2024] [Revised: 04/06/2024] [Accepted: 05/15/2024] [Indexed: 07/09/2024] Open
Abstract
Drug repurposing is a promising approach to find new therapeutic indications for approved drugs. Many computational approaches have been proposed to prioritize candidate anticancer drugs at the gene or pathway level. However, these methods neglect the changes in gene interactions at the edge level. To address this limitation, we develop a computational drug repurposing method (iEdgePathDDA) based on edge information and pathway topology. First, we identify drug-induced and disease-related edges (the changes in gene interactions) within pathways by using the Pearson correlation coefficient. Next, we calculate the inhibition score between drug-induced edges and disease-related edges. Finally, we prioritize drug candidates according to the inhibition score on all disease-related edges. Case studies show that our approach successfully identifies new drug-disease pairs based on the CTD database. Compared to the state-of-the-art approaches, the results demonstrate that our method achieves superior performance in terms of five metrics across colorectal, breast, and lung cancer datasets.
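The edge-level idea can be sketched as follows. The abstract only says that edges are changes in gene interactions measured with the Pearson correlation coefficient; it does not define the inhibition score. The `inhibition_score` function below is therefore a hypothetical stand-in (positive when a drug reverses a disease-related change), and all gene names and expression values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat the correlation as absent
    return cov / math.sqrt(var_x * var_y)

def edge_change(case, control, gene_a, gene_b):
    """Change in co-expression of one gene pair between two conditions."""
    return (pearson(case[gene_a], case[gene_b])
            - pearson(control[gene_a], control[gene_b]))

def inhibition_score(disease_edges, drug_edges):
    """Hypothetical score: mean of -(disease change * drug change) over
    shared edges. This is NOT the published iEdgePathDDA formula."""
    shared = [edge for edge in disease_edges if edge in drug_edges]
    if not shared:
        return 0.0
    return sum(-disease_edges[e] * drug_edges[e] for e in shared) / len(shared)

# Toy expression profiles (one list of sample values per gene).
tumor = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8]}
normal = {"G1": [1, 2, 3, 4], "G2": [8, 6, 4, 2]}
treated = {"G1": [1, 2, 3, 4], "G2": [8, 6, 4, 2]}
untreated = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8]}

disease = {("G1", "G2"): edge_change(tumor, normal, "G1", "G2")}
drug = {("G1", "G2"): edge_change(treated, untreated, "G1", "G2")}
score = inhibition_score(disease, drug)
```

In this toy case the disease strengthens the G1-G2 correlation and the drug weakens it again, so the score is positive.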
Affiliation(s)
- Xianbin Li
  - School of Computer and Big Data Science, Jiujiang University, Jiujiang, Jiangxi 332000, China
  - Department of Digital Media Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
- Xiangzhen Zan
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, Guangdong 520000, China
- Tao Liu
  - School of Computer and Big Data Science, Jiujiang University, Jiujiang, Jiangxi 332000, China
- Xiwei Dong
  - School of Computer and Big Data Science, Jiujiang University, Jiujiang, Jiangxi 332000, China
- Haqi Zhang
  - Department of Digital Media Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
- Qizhang Li
  - Innovative Drug R&D Center, School of Life Sciences, Huaibei Normal University, Huaibei, Anhui 235000, China
- Zhenshen Bao
  - College of Information Engineering, Taizhou University, Taizhou 225300, Jiangsu, China
- Jie Lin
  - Department of Pharmacy, the Third Affiliated Hospital of Wenzhou Medical University, Wenzhou 325200, Zhejiang Province, China
3
Chambers BA, Basili D, Word L, Baker N, Middleton A, Judson RS, Shah I. Searching for LINCS to Stress: Using Text Mining to Automate Reference Chemical Curation. Chem Res Toxicol 2024; 37:878-893. [PMID: 38736322 DOI: 10.1021/acs.chemrestox.3c00335] [Indexed: 05/14/2024]
Abstract
Adaptive stress response pathways (SRPs) restore cellular homeostasis following perturbation but may activate terminal outcomes like apoptosis, autophagy, or cellular senescence if disruption exceeds critical thresholds. Because SRPs hold the key to vital cellular tipping points, they are targeted for therapeutic interventions and assessed as biomarkers of toxicity. Hence, we are developing a public database of chemicals that perturb SRPs to enable new data-driven tools to improve public health. Here, we report on the automated text-mining pipeline we used to build and curate the first version of this database. We started with 100 reference SRP chemicals gathered from published biomarker studies to bootstrap the database. Second, we used information retrieval to find co-occurrences of reference chemicals with SRP terms in PubMed abstracts and determined pairwise mutual information thresholds to filter biologically relevant relationships. Third, we applied these thresholds to find 1206 putative SRP perturbagens within thousands of substances in the Library of Integrated Network-Based Cellular Signatures (LINCS). To assign SRP activity to LINCS chemicals, domain experts had to manually review at least three publications for each of 1206 chemicals out of 181,805 total abstracts. To accomplish this efficiently, we implemented a machine learning approach to predict SRP classifications from texts to prioritize abstracts. In 5-fold cross-validation testing with a corpus derived from the 100 reference chemicals, artificial neural networks performed the best (F1-macro = 0.678) and prioritized 2479/181,805 abstracts for expert review, which resulted in 457 chemicals annotated with SRP activities. An independent analysis of enriched mechanisms of action and chemical use class supported the text-mined chemical associations (p < 0.05): heat shock inducers were linked with HSP90 and DNA damage inducers to topoisomerase inhibition. 
This database will enable novel applications of LINCS data to evaluate SRP activities and to further develop tools for biomedical information extraction from the literature.
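The co-occurrence filtering step can be sketched with pointwise mutual information (PMI) computed from abstract counts. This assumes the "pairwise mutual information" in the abstract is the standard PMI; the function names, the toy abstracts and the threshold of 0.0 are all hypothetical.

```python
import math

def pmi(n_both, n_chemical, n_term, n_total):
    """Pointwise mutual information (base 2) from document counts."""
    if min(n_both, n_chemical, n_term) == 0:
        return float("-inf")  # the pair never co-occurs
    # log2( P(chem, term) / (P(chem) * P(term)) ), written with counts.
    return math.log2((n_both * n_total) / (n_chemical * n_term))

def cooccurrence_pmi(abstracts, chemical, term):
    """Count case-insensitive substring matches and return the PMI."""
    texts = [a.lower() for a in abstracts]
    has_chem = [chemical.lower() in t for t in texts]
    has_term = [term.lower() in t for t in texts]
    n_both = sum(c and t for c, t in zip(has_chem, has_term))
    return pmi(n_both, sum(has_chem), sum(has_term), len(texts))

# Made-up one-line "abstracts" for illustration only.
toy_abstracts = [
    "Geldanamycin induces the heat shock response",
    "Geldanamycin inhibits HSP90 and triggers heat shock",
    "Etoposide causes DNA damage",
    "Heat shock proteins in yeast",
]
threshold = 0.0  # hypothetical cutoff
pairs = [("geldanamycin", "heat shock"), ("etoposide", "heat shock")]
kept = [p for p in pairs if cooccurrence_pmi(toy_abstracts, *p) > threshold]
```

In practice the threshold would be tuned on the reference chemicals, as the abstract describes, and simple substring matching would be replaced by proper entity recognition.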
Affiliation(s)
- Bryant A Chambers
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Danilo Basili
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Laura Word
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Nancy Baker
  - Leidos, Research Triangle Park, North Carolina 27711, United States
- Alistair Middleton
  - Unilever, Safety and Environmental Assurance Centre (SEAC), Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, U.K.
- Richard S Judson
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Imran Shah
  - Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
4
Candia J, Ferrucci L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS One 2024; 19:e0302696. [PMID: 38753612 PMCID: PMC11098418 DOI: 10.1371/journal.pone.0302696] [Received: 10/31/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024] Open
Abstract
Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively utilized for RNA-seq data analysis. Here, we quantitatively assessed the performance of a variety of GSEA modalities and provide guidance in the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We utilized a new consensus metric, the Enrichment Evidence Score (EES), which showed a remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. 
This work fills the existing gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
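The classic/unweighted gene-set permutation approach mentioned above can be sketched in a few lines. This is a simplified illustration, not the GSEA software: it computes the unweighted Kolmogorov-Smirnov running-sum statistic and a two-sided empirical p-value from random gene sets of the same size. The function names, the toy gene list and the permutation count are assumptions.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-like running-sum statistic over a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a hit, down on a miss; both walks sum to zero overall.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best

def gene_set_permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of matching size."""
    rng = random.Random(seed)
    size = sum(gene in gene_set for gene in ranked_genes)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    extreme = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, null_set)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

# Toy ranking: genes ordered from most up- to most down-regulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
top_es = enrichment_score(ranked, {"g1", "g2"})
bottom_es = enrichment_score(ranked, {"g7", "g8"})
p_value = gene_set_permutation_p(ranked, {"g1", "g2"}, n_perm=200)
```

Gene-set permutation only reshuffles set membership, so it needs no access to sample-level data, which is one reason it is cheaper than phenotype permutation.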
Affiliation(s)
- Julián Candia
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
- Luigi Ferrucci
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
5
Geistlinger L, Mirzayi C, Zohra F, Azhar R, Elsafoury S, Grieve C, Wokaty J, Gamboa-Tuz SD, Sengupta P, Hecht I, Ravikrishnan A, Gonçalves RS, Franzosa E, Raman K, Carey V, Dowd JB, Jones HE, Davis S, Segata N, Huttenhower C, Waldron L. BugSigDB captures patterns of differential abundance across a broad range of host-associated microbial signatures. Nat Biotechnol 2024; 42:790-802. [PMID: 37697152 PMCID: PMC11098749 DOI: 10.1038/s41587-023-01872-y] [Received: 10/24/2022] [Accepted: 06/20/2023] [Indexed: 09/13/2023]
Abstract
The literature of human and other host-associated microbiome studies is expanding rapidly, but systematic comparisons among published results of host-associated microbiome signatures of differential abundance remain difficult. We present BugSigDB, a community-editable database of manually curated microbial signatures from published differential abundance studies accompanied by information on study geography, health outcomes, host body site and experimental, epidemiological and statistical methods using controlled vocabulary. The initial release of the database contains >2,500 manually curated signatures from >600 published studies on three host species, enabling high-throughput analysis of signature similarity, taxon enrichment, co-occurrence and coexclusion and consensus signatures. These data allow assessment of microbiome differential abundance within and across experimental conditions, environments or body sites. Database-wide analysis reveals experimental conditions with the highest level of consistency in signatures reported by independent studies and identifies commonalities among disease-associated signatures, including frequent introgression of oral pathobionts into the gut.
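Signature similarity between studies can be illustrated with a set-overlap measure. The abstract does not name the measure used, so the sketch below assumes the Jaccard index; the study labels and their taxon lists are invented for illustration and are not taken from BugSigDB.

```python
from itertools import combinations

def jaccard(signature_a, signature_b):
    """Jaccard index of two taxon signatures (size of overlap / union)."""
    a, b = set(signature_a), set(signature_b)
    if not a and not b:
        return 0.0  # two empty signatures share nothing by convention
    return len(a & b) / len(a | b)

# Hypothetical signatures: taxa reported as increased in each study.
signatures = {
    "study_1": {"Fusobacterium", "Porphyromonas", "Escherichia"},
    "study_2": {"Fusobacterium", "Porphyromonas", "Streptococcus"},
    "study_3": {"Bacteroides", "Faecalibacterium"},
}

# Pairwise similarity for every pair of studies.
similarity = {
    (s1, s2): jaccard(signatures[s1], signatures[s2])
    for s1, s2 in combinations(sorted(signatures), 2)
}
```

Pairs with a high index share many taxa, which is the kind of consistency across independent studies that the database-wide analysis looks for.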
Affiliation(s)
- Ludwig Geistlinger
  - Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Chloe Mirzayi
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Fatima Zohra
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Rimsha Azhar
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Shaimaa Elsafoury
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Clare Grieve
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Jennifer Wokaty
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Samuel David Gamboa-Tuz
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Pratyay Sengupta
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
- Aarthi Ravikrishnan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Rafael S Gonçalves
  - Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Eric Franzosa
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  - Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Karthik Raman
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
- Vincent Carey
  - Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Jennifer B Dowd
  - Leverhulme Centre for Demographic Science, University of Oxford, Oxford, UK
- Heidi E Jones
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Sean Davis
  - Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz School of Medicine, Denver, CO, USA
- Nicola Segata
  - Department CIBIO, University of Trento, Trento, Italy
  - Istituto Europeo di Oncologia (IEO) IRCSS, Milan, Italy
- Curtis Huttenhower
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  - Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Levi Waldron
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
  - Department CIBIO, University of Trento, Trento, Italy
6
Taiwo M, Huang E, Pathak V, Bellar A, Welch N, Dasarathy J, Streem D, McClain CJ, Mitchell MC, Barton BA, Szabo G, Dasarathy S, Schaefer EA, Luther J, Day LZ, Ouyang X, Suyavaran A, Mehal WZ, Jacobs JM, Goodman RP, Rotroff DM, Nagy LE. Proteomics identifies complement protein signatures in patients with alcohol-associated hepatitis. JCI Insight 2024; 9:e174127. [PMID: 38573776 PMCID: PMC11141929 DOI: 10.1172/jci.insight.174127] [Received: 07/24/2023] [Accepted: 03/27/2024] [Indexed: 04/06/2024] Open
Abstract
Diagnostic challenges continue to impede development of effective therapies for successful management of alcohol-associated hepatitis (AH), creating an unmet need to identify noninvasive biomarkers for AH. In murine models, complement contributes to ethanol-induced liver injury. Therefore, we hypothesized that complement proteins could be rational diagnostic/prognostic biomarkers in AH. Here, we performed a comparative analysis of data derived from human hepatic and serum proteome to identify and characterize complement protein signatures in severe AH (sAH). The quantity of multiple complement proteins was perturbed in liver and serum proteome of patients with sAH. Multiple complement proteins differentiated patients with sAH from those with alcohol cirrhosis (AC) or alcohol use disorder (AUD) and healthy controls (HCs). Serum collectin 11 and C1q binding protein were strongly associated with sAH and exhibited good discriminatory performance among patients with sAH, AC, or AUD and HCs. Furthermore, complement component receptor 1-like protein was negatively associated with pro-inflammatory cytokines. Additionally, lower serum MBL associated serine protease 1 and coagulation factor II independently predicted 90-day mortality. In summary, meta-analysis of proteomic profiles from liver and circulation revealed complement protein signatures of sAH, highlighting a complex perturbation of complement and identifying potential diagnostic and prognostic biomarkers for patients with sAH.
Affiliation(s)
- Vai Pathak
  - Department of Quantitative Health Sciences, and
- Nicole Welch
  - Department of Inflammation and Immunity
  - Department of Gastroenterology and Hepatology, Cleveland Clinic, Cleveland, Ohio, USA
- Jaividhya Dasarathy
  - Department of Family Medicine, Metro Health Medical Center, Cleveland, Ohio, USA
- David Streem
  - Department of Psychiatry and Psychology, Cleveland Clinic Lutheran Hospital, Cleveland, Ohio, USA
- Craig J. McClain
  - Department of Medicine, University of Louisville, Louisville, Kentucky, USA
- Mack C. Mitchell
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Bruce A. Barton
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, Massachusetts, USA
- Gyongyi Szabo
  - Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Srinivasan Dasarathy
  - Department of Inflammation and Immunity
  - Department of Gastroenterology and Hepatology, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Esperance A. Schaefer
  - Alcohol Liver Center, Division of Gastroenterology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Jay Luther
  - Alcohol Liver Center, Division of Gastroenterology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Le Z. Day
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
- Xinshou Ouyang
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Arumugam Suyavaran
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Wajahat Z. Mehal
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Jon M. Jacobs
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
- Russell P. Goodman
  - Alcohol Liver Center, Division of Gastroenterology, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Endocrine Unit, Division of Gastroenterology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Daniel M. Rotroff
  - Department of Quantitative Health Sciences, and
  - Endocrine and Metabolism Institute and
  - Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, Ohio, USA
- Laura E. Nagy
  - Department of Inflammation and Immunity
  - Department of Gastroenterology and Hepatology, Cleveland Clinic, Cleveland, Ohio, USA
  - See Supplemental Acknowledgments for information on the AlcHepNet Consortium
7
Baker BH, Freije S, MacDonald JW, Bammler TK, Benson C, Carroll KN, Enquobahrie DA, Karr CJ, LeWinn KZ, Zhao Q, Bush NR, Sathyanarayana S, Paquette AG. Placental transcriptomic signatures of prenatal and preconceptional maternal stress. Mol Psychiatry 2024; 29:1179-1191. [PMID: 38212375 PMCID: PMC11176062 DOI: 10.1038/s41380-023-02403-6] [Received: 07/21/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/13/2024]
Abstract
Prenatal exposure to maternal psychological stress is associated with increased risk for adverse birth and child health outcomes. Accumulating evidence suggests that preconceptional maternal stress may also be transmitted intergenerationally to negatively impact offspring. However, understanding of mechanisms linking these exposures to offspring outcomes, particularly those related to placenta, is limited. Using RNA sequencing, we identified placental transcriptomic signatures associated with maternal prenatal stressful life events (SLEs) and childhood traumatic events (CTEs) in 1,029 mother-child pairs in two birth cohorts from Washington state and Memphis, Tennessee. We evaluated individual gene-SLE/CTE associations and performed an ensemble of gene set enrichment analyses combining across 11 popular enrichment methods. Higher number of prenatal SLEs was significantly (FDR < 0.05) associated with increased expression of ADGRG6, a placental tissue-specific gene critical in placental remodeling, and decreased expression of RAB11FIP3, an endocytosis and endocytic recycling gene, and SMYD5, a histone methyltransferase. Prenatal SLEs and maternal CTEs were associated with gene sets related to several biological pathways, including upregulation of protein processing in the endoplasmic reticulum, protein secretion, and ubiquitin mediated proteolysis, and downregulation of ribosome, epithelial mesenchymal transition, DNA repair, MYC targets, and amino acid-related pathways. The directional associations in these pathways corroborate prior non-transcriptomic mechanistic studies of psychological stress and mental health disorders, and have previously been implicated in pregnancy complications and adverse birth outcomes. Accordingly, our findings suggest that maternal exposure to psychosocial stressors during pregnancy as well as the mother's childhood may disrupt placental function, which may ultimately contribute to adverse pregnancy, birth, and child health outcomes.
Affiliation(s)
- Brennan H Baker
  - University of Washington, Seattle, WA, USA
  - Seattle Children's Research Institute, Seattle, WA, USA
- Ciara Benson
  - Global Alliance to Prevent Preterm Birth and Stillbirth (GAPPS), Lynnwood, WA, USA
- Kaja Z LeWinn
  - University of California San Francisco, San Francisco, CA, USA
- Qi Zhao
  - University of Tennessee Health Sciences Center, Memphis, TN, USA
- Nicole R Bush
  - University of California San Francisco, San Francisco, CA, USA
- Sheela Sathyanarayana
  - University of Washington, Seattle, WA, USA
  - Seattle Children's Research Institute, Seattle, WA, USA
- Alison G Paquette
  - University of Washington, Seattle, WA, USA
  - Seattle Children's Research Institute, Seattle, WA, USA
8
Peng C, Chen Q, Tan S, Shen X, Jiang C. Generalized reporter score-based enrichment analysis for omics data. Brief Bioinform 2024; 25:bbae116. [PMID: 38546324 PMCID: PMC10976918 DOI: 10.1093/bib/bbae116] [Received: 12/05/2023] [Revised: 01/25/2024] [Accepted: 03/01/2024] [Indexed: 06/15/2024] Open
Abstract
Enrichment analysis contextualizes biological features in pathways to facilitate a systematic understanding of high-dimensional data and is widely used in biomedical research. The emerging reporter score-based analysis (RSA) method shows more promising sensitivity, as it relies on P-values instead of raw values of features. However, RSA cannot be directly applied to multi-group and longitudinal experimental designs and is often misused due to the lack of a proper tool. Here, we propose the Generalized Reporter Score-based Analysis (GRSA) method for multi-group and longitudinal omics data. A comparison with other popular enrichment analysis methods demonstrated that GRSA had increased sensitivity across multiple benchmark datasets. We applied GRSA to microbiome, transcriptome and metabolome data and discovered new biological insights in omics studies. Finally, we demonstrated the application of GRSA beyond functional enrichment using a taxonomy database. We implemented GRSA in an R package, ReporterScore, integrating with a powerful visualization module and updatable pathway databases, which is available on the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/ReporterScore). We believe that the ReporterScore package will be a valuable asset for broad biomedical research fields.
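The reliance on P-values rather than raw feature values can be made concrete with a sketch of the classic two-group reporter score: each feature's P-value is mapped to a Z-score, the Z-scores of a pathway's members are summed and scaled, and the result is standardized against random feature sets of the same size. This is a simplified illustration under those assumptions; it does not reproduce GRSA's multi-group or longitudinal extensions and does not use the ReporterScore package API. Function names and toy data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

def p_to_z(p):
    """Map a P-value to a Z-score with the inverse normal CDF."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return NormalDist().inv_cdf(1.0 - p)

def reporter_score(feature_p, pathway, n_background=1000, seed=0):
    """Background-corrected reporter score of one pathway."""
    z = {feature: p_to_z(p) for feature, p in feature_p.items()}
    members = [f for f in dict.fromkeys(pathway) if f in z]
    k = len(members)
    if k == 0:
        raise ValueError("pathway has no measured features")
    raw = sum(z[f] for f in members) / math.sqrt(k)
    # Null distribution: same statistic on random feature sets of size k.
    rng = random.Random(seed)
    all_z = list(z.values())
    background = [
        sum(rng.sample(all_z, k)) / math.sqrt(k) for _ in range(n_background)
    ]
    mu, sigma = mean(background), pstdev(background)
    return raw if sigma == 0 else (raw - mu) / sigma

# Toy data: 20 features, four of them strongly significant.
toy_p = {f"K{i}": 0.5 for i in range(20)}
for feature in ("K0", "K1", "K2", "K3"):
    toy_p[feature] = 0.001
score = reporter_score(toy_p, ["K0", "K1", "K2", "K3"])
```

Because only P-values enter the calculation, the same scoring works for any feature type for which a per-feature test is available.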
Collapse
Affiliation(s)
- Chen Peng
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
| | - Qiong Chen
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
| | - Shangjin Tan
- BGI Research, Wuhan, Hubei 430074, China
- BGI Research, Shenzhen, Guangdong 518083, China
| | - Xiaotao Shen
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Chao Jiang
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
- Center for Life Sciences, Shaoxing Institute, Zhejiang University, Shaoxing, Zhejiang 321000, China
| |
|
9
|
Chang LY, Lee MZ, Wu Y, Lee WK, Ma CL, Chang JM, Chen CW, Huang TC, Lee CH, Lee JC, Tseng YY, Lin CY. Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles. Nucleic Acids Res 2024; 52:e17. [PMID: 38096046 PMCID: PMC10853793 DOI: 10.1093/nar/gkad1187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 11/17/2023] [Accepted: 11/29/2023] [Indexed: 02/10/2024] Open
Abstract
Pathway analysis, including nontopology-based (non-TB) and topology-based (TB) methods, is widely used to interpret the biological phenomena underlying differences in expression data between two phenotypes. By considering dependencies and interactions between genes, TB methods usually perform better than non-TB methods in identifying pathways that include closely relevant or directly causative genes for a given phenotype. However, most TB methods may be limited by incomplete pathway data used as the reference network or by difficulties in selecting appropriate reference networks for different research topics. Here, we propose a gene set correlation enrichment analysis method, Gscore, based on an expression dataset-derived coexpression network to examine whether a differentially expressed gene (DEG) list (or each of its DEGs) is associated with a known gene set. Gscore is better able to identify target pathways in 89 human disease expression datasets than eight other state-of-the-art methods and offers insight into how disease-wide and pathway-wide associations reflect clinical outcomes. When applied to RNA-seq data from COVID-19-related cells and patient samples, Gscore provided a means for studying how DEGs are implicated in COVID-19-related pathways. In summary, Gscore offers a powerful analytical approach for annotating individual DEGs, DEG lists, and genome-wide expression profiles based on existing biological knowledge.
Affiliation(s)
- Lan-Yun Chang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Meng-Zhan Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Yujia Wu
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Wen-Kai Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Chia-Liang Ma
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Jun-Mao Chang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Ciao-Wen Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Tzu-Chun Huang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
| | - Chia-Hwa Lee
- School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Taipei Medical University, New Taipei City 235, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei 110, Taiwan
- Ph.D. Program in Medical Biotechnology, College of Medical Science and Technology, Taipei Medical University, New Taipei City 235, Taiwan
| | - Jih-Chin Lee
- Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 110, Taiwan
| | - Yu-Yao Tseng
- Department of Food Science, Nutrition, and Nutraceutical Biotechnology, Shih Chien University, Taipei 104, Taiwan
| | - Chun-Yu Lin
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Cancer and Immunology Research Center, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
- Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- School of Dentistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
| |
|
10
|
Liu Y, Lian G, Chen T. A novel multi-omics data analysis of dose-dependent and temporal changes in regulatory pathways due to chemical perturbation: a case study on caffeine. Toxicol Mech Methods 2024; 34:164-175. [PMID: 37794615 DOI: 10.1080/15376516.2023.2265462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023]
Abstract
Comprehensive analysis of multi-omics data can reveal alterations in regulatory pathways induced by cellular exposure to chemicals by characterizing biological processes at the molecular level. Data-driven omics analysis, conducted in a dose-dependent or dynamic manner, can facilitate comprehending toxicity mechanisms. This study introduces a novel multi-omics data analysis designed to concurrently examine dose-dependent and temporal patterns of cellular responses to chemical perturbations. This analysis, encompassing preliminary exploration, pattern deconstruction, and network reconstruction of multi-omics data, provides a comprehensive perspective on the dynamic behaviors of cells exposed to varying levels of chemical stimuli. Importantly, this analysis is adaptable to any number of omics layers, including site-specific phosphoproteomics. We implemented this analysis on multi-omics data obtained from HepG2 cells exposed to a range of caffeine doses over varying durations and identified six response patterns, along with their associated biomolecules and pathways. Our study demonstrates the effectiveness of the proposed multi-omics data analysis in capturing multidimensional patterns of cellular response to chemical perturbation, enhancing understanding of pathway regulation for chemical risk assessment.
Affiliation(s)
- Yufan Liu
- School of Chemistry and Chemical Engineering, University of Surrey, Guildford, UK
| | - Guoping Lian
- School of Chemistry and Chemical Engineering, University of Surrey, Guildford, UK
- Unilever R&D Colworth, Bedford, UK
| | - Tao Chen
- School of Chemistry and Chemical Engineering, University of Surrey, Guildford, UK
| |
|
11
|
Buzzao D, Castresana-Aguirre M, Guala D, Sonnhammer ELL. Benchmarking enrichment analysis methods with the disease pathway network. Brief Bioinform 2024; 25:bbae069. [PMID: 38436561 PMCID: PMC10939300 DOI: 10.1093/bib/bbae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/10/2024] [Accepted: 02/03/2024] [Indexed: 03/05/2024] Open
Abstract
Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.
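The idea of scoring a method on both sensitivity and specificity can be sketched in Python as follows. This is a minimal illustration: the arithmetic mean used to combine the two values, the function names and the pathway IDs are assumptions for this example, and the benchmark's actual scoring formula is not given in the abstract.

```python
def sensitivity_specificity(significant, true_pathways, all_pathways):
    """Return (sensitivity, specificity) for one enrichment result.

    significant:   pathways the method reported as enriched
    true_pathways: pathways accepted as relevant to the disease
    all_pathways:  every pathway that was tested
    """
    significant = set(significant)
    positives = set(true_pathways)
    negatives = set(all_pathways) - positives
    if not positives or not negatives:
        raise ValueError("need at least one true and one non-true pathway")
    tp = len(significant & positives)   # relevant and reported
    fn = len(positives - significant)   # relevant but missed
    fp = len(significant & negatives)   # reported but not relevant
    tn = len(negatives - significant)   # correctly not reported
    return tp / (tp + fn), tn / (tn + fp)


def balanced_score(sensitivity, specificity):
    # Assumed combination: plain arithmetic mean of the two rates.
    return (sensitivity + specificity) / 2


# Hypothetical example with ten tested pathways.
all_pathways = [f"P{i}" for i in range(1, 11)]
true_pathways = ["P1", "P2", "P3", "P4"]
significant = ["P1", "P2", "P5"]
sens, spec = sensitivity_specificity(significant, true_pathways, all_pathways)
score = balanced_score(sens, spec)
```

Linking related pathways, as the Disease Pathway Network does, would enlarge `true_pathways` beyond a single target pathway, which is what makes the sensitivity estimate less brittle.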
Affiliation(s)
- Davide Buzzao
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
| | | | - Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
| |
|
12
|
Somers J, Fenner M, Kong G, Thirumalaisamy D, Yashar WM, Thapa K, Kinali M, Nikolova O, Babur Ö, Demir E. A framework for considering prior information in network-based approaches to omics data analysis. Proteomics 2023; 23:e2200402. [PMID: 37986684 DOI: 10.1002/pmic.202200402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 11/22/2023]
Abstract
For decades, molecular biologists have been uncovering the mechanics of biological systems. Efforts to bring their findings together have led to the development of multiple databases and information systems that capture and present pathway information in a computable network format. Concurrently, the advent of modern omics technologies has empowered researchers to systematically profile cellular processes across different modalities. Numerous algorithms, methodologies, and tools have been developed to use prior knowledge networks (PKNs) in the analysis of omics datasets. Interestingly, it has been repeatedly demonstrated that the source of prior knowledge can greatly impact the results of a given analysis. For these methods to be successful it is paramount that their selection of PKNs is amenable to the data type and the computational task they aim to accomplish. Here we present a five-level framework that broadly describes network models in terms of their scope, level of detail, and ability to inform causal predictions. To contextualize this framework, we review a handful of network-based omics analysis methods at each level, while also describing the computational tasks they aim to accomplish.
Affiliation(s)
- Julia Somers
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| | - Madeleine Fenner
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| | - Garth Kong
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
| | - Dharani Thirumalaisamy
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| | - William M Yashar
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
| | - Kisan Thapa
- Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
| | - Meric Kinali
- Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
| | - Olga Nikolova
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
| | - Özgün Babur
- Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
| | - Emek Demir
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
| |
|
13
|
Holubekova V, Loderer D, Grendar M, Mikolajcik P, Kolkova Z, Turyova E, Kudelova E, Kalman M, Marcinek J, Miklusica J, Laca L, Lasabova Z. Differential gene expression of immunity and inflammation genes in colorectal cancer using targeted RNA sequencing. Front Oncol 2023; 13:1206482. [PMID: 37869102 PMCID: PMC10586664 DOI: 10.3389/fonc.2023.1206482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 08/24/2023] [Indexed: 10/24/2023] Open
Abstract
Introduction: Colorectal cancer (CRC) is a heterogeneous disease caused by molecular changes, such as driver mutations and gene methylations, and influenced by a tumor microenvironment (TME) pervaded with immune cells that have both pro- and anti-tumor effects. Studying the interactions between the immune system (IS) and the TME is important for developing effective immunotherapeutic strategies for CRC. In our study, we analyzed the expression profiles of inflammatory and immune-relevant genes to identify aberrant signaling pathways involved in carcinogenesis, the metastatic potential of tumors, and association with Kirsten rat sarcoma virus (KRAS) gene mutation. Methods: A total of 91 patients were enrolled in the study. Using NGS, differential gene expression analysis of 11 tumor samples and 11 matching non-tumor controls was carried out by applying a targeted RNA panel for inflammation and immunity genes containing 475 target genes. The obtained data were evaluated using the CLC Genomics Workbench and R library. The significantly differentially expressed genes (DEGs) were analyzed in Reactome GSA software, and some selected DEGs were used for real-time PCR validation. Results: After prioritization, the most significant differences in gene expression were shown by the genes TNFRSF4, IRF7, IL6R, NR3C1, EIF2AK2, MIF, CCL5, TNFSF10, CCL20, CXCL11, RIPK2, and BLNK. Validation analyses on 91 samples showed a correlation between RNA-seq data and qPCR for TNFSF10, RIPK2, and BLNK gene expression. The top differentially regulated signaling pathways between the studied groups (cancer vs. control, metastatic vs. primary CRC, and KRAS-positive vs. KRAS-negative CRC) belong to the immune system, signal transduction, disease, gene expression, DNA repair, and programmed cell death.
Conclusion: The analyzed data suggest changes at multiple levels of CRC carcinogenesis, including surface receptors of epithelial or immune cells, their signal transduction pathways, programmed cell death modifications, alterations in DNA repair machinery, and cell cycle control leading to uncontrolled proliferation. This study indicates only basic molecular pathways that enabled the formation of metastatic cancer stem cells and may contribute to clarifying the function of the IS in the TME of CRC. A precise identification of signaling pathways responsible for CRC may help in the selection of personalized pharmacological treatment.
Affiliation(s)
- Veronika Holubekova
- Laboratory of Genomics and Prenatal Diagnostics, Biomedical Center in Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Martin, Slovakia
| | - Dusan Loderer
- Laboratory of Genomics and Prenatal Diagnostics, Biomedical Center in Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Martin, Slovakia
| | - Marian Grendar
- Laboratory of Bioinformatics and Biostatistics, Biomedical Center in Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Martin, Slovakia
| | - Peter Mikolajcik
- Clinic of Surgery and Transplant Center, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin University Hospital, Martin, Slovakia
| | - Zuzana Kolkova
- Laboratory of Genomics and Prenatal Diagnostics, Biomedical Center in Martin, Jessenius Faculty of Medicine, Comenius University in Bratislava, Martin, Slovakia
| | - Eva Turyova
- Department of Molecular Biology and Genomics, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin, Slovakia
| | - Eva Kudelova
- Clinic of Surgery and Transplant Center, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin University Hospital, Martin, Slovakia
| | - Michal Kalman
- Department of Pathological Anatomy, Jessenius Faculty of Medicine, Comenius University in Bratislava, Martin University Hospital, Martin, Slovakia
| | - Juraj Marcinek
- Department of Pathological Anatomy, Jessenius Faculty of Medicine, Comenius University in Bratislava, Martin University Hospital, Martin, Slovakia
| | - Juraj Miklusica
- Clinic of Surgery and Transplant Center, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin University Hospital, Martin, Slovakia
| | - Ludovit Laca
- Clinic of Surgery and Transplant Center, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin University Hospital, Martin, Slovakia
| | - Zora Lasabova
- Department of Molecular Biology and Genomics, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin, Slovakia
| |
|
14
|
Takeuchi F, Liang YQ, Shimizu-Furusawa H, Isono M, Ang MY, Mori K, Mori T, Kakazu E, Yoshio S, Kato N. Gene-regulation modules in nonalcoholic fatty liver disease revealed by single-nucleus ATAC-seq. Life Sci Alliance 2023; 6:e202301988. [PMID: 37491046 PMCID: PMC10368228 DOI: 10.26508/lsa.202301988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 07/14/2023] [Accepted: 07/14/2023] [Indexed: 07/27/2023] Open
Abstract
We investigated the progression of nonalcoholic fatty liver disease from fatty liver to steatohepatitis using single-nucleus and bulk ATAC-seq on the livers of rats fed a high-fat diet (HFD). Rats fed HFD for 4 wk developed fatty liver, and those fed HFD for 8 wk further progressed to steatohepatitis. We observed an increase in the proportion of inflammatory macrophages, consistent with the pathological progression. Utilizing machine learning, we divided global gene regulation into modules, wherein transcription factors within a module could regulate genes within the same module, reaffirming known regulatory relationships between transcription factors and biological processes. We identified core genes (central to co-expression and protein-protein interaction) for the biological processes discovered. Notably, a large part of the core genes overlapped with genes previously implicated in nonalcoholic fatty liver disease. Single-nucleus ATAC-seq, combined with data-driven statistical analysis, offers insight into in vivo global gene regulation as a combination of modules and assists in identifying core genes of relevant biological processes.
Affiliation(s)
- Fumihiko Takeuchi
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Medical Genomics Center, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Systems Genomics Laboratory, Baker Heart and Diabetes Institute, Melbourne, Australia
| | - Yi-Qiang Liang
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
| | - Hana Shimizu-Furusawa
- Department of Hygiene and Public Health, School of Medicine, Teikyo University, Tokyo, Japan
| | - Masato Isono
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
| | - Mia Yang Ang
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Department of Clinical Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Kotaro Mori
- Medical Genomics Center, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
| | - Taizo Mori
- Department of Liver Diseases, The Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine, Chiba, Japan
| | - Eiji Kakazu
- Department of Liver Diseases, The Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine, Chiba, Japan
| | - Sachiyo Yoshio
- Department of Liver Diseases, The Research Center for Hepatitis and Immunology, National Center for Global Health and Medicine, Chiba, Japan
| | - Norihiro Kato
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Medical Genomics Center, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Department of Clinical Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| |
|
15
|
Hakobyan S, Stepanyan A, Nersisyan L, Binder H, Arakelyan A. PSF toolkit: an R package for pathway curation and topology-aware analysis. Front Genet 2023; 14:1264656. [PMID: 37680201 PMCID: PMC10482229 DOI: 10.3389/fgene.2023.1264656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 08/09/2023] [Indexed: 09/09/2023] Open
Abstract
Most high throughput genomic data analysis pipelines currently rely on over-representation or gene set enrichment analysis (ORA/GSEA) approaches for functional analysis. In contrast, topology-based pathway analysis methods, which offer a more biologically informed perspective by incorporating interaction and topology information, have remained underutilized and inaccessible due to various limiting factors. These methods heavily rely on the quality of pathway topologies and often utilize predefined topologies from databases without assessing their correctness. To address these issues and make topology-aware pathway analysis more accessible and flexible, we introduce the PSF (Pathway Signal Flow) toolkit R package. Our toolkit integrates pathway curation and topology-based analysis, providing interactive and command-line tools that facilitate pathway importation, correction, and modification from diverse sources. This enables users to perform topology-based pathway signal flow analysis in both interactive and command-line modes. To showcase the toolkit's usability, we curated 36 KEGG signaling pathways and conducted several use-case studies, comparing our method with ORA and the topology-based signaling pathway impact analysis (SPIA) method. The results demonstrate that the algorithm can effectively identify ORA enriched pathways while providing more detailed branch-level information. Moreover, in contrast to the SPIA method, it offers the advantage of being cut-off free and less susceptible to the variability caused by selection thresholds. By combining pathway curation and topology-based analysis, the PSF toolkit enhances the quality, flexibility, and accessibility of topology-aware pathway analysis. Researchers can now easily import pathways from various sources, correct and modify them as needed, and perform detailed topology-based pathway signal flow analysis. 
In summary, our PSF toolkit offers an integrated solution that addresses the limitations of current topology-based pathway analysis methods. By providing interactive and command-line tools for pathway curation and topology-based analysis, we empower researchers to conduct comprehensive pathway analyses across a wide range of applications.
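For readers unfamiliar with the over-representation analysis (ORA) baseline that the abstract compares against, here is a minimal Python sketch of the standard hypergeometric test behind it. It does not reproduce the PSF algorithm or the PSF toolkit API. The function name `ora_p_value` and the toy numbers are assumptions for this example.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes in the background
    n_pathway:  number of background genes annotated to the pathway
    n_selected: number of genes in the input list (e.g. DEGs)
    n_overlap:  number of input genes that fall in the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_selected - x)
        for x in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, a 5-gene pathway,
# a 5-gene input list, and all 5 input genes inside the pathway.
p_all_overlap = ora_p_value(20, 5, 5, 5)
p_no_overlap = ora_p_value(20, 5, 5, 0)
```

Because the input list is defined by a significance cut-off, the ORA result depends on that threshold, which is the sensitivity to selection thresholds that the abstract contrasts with a cut-off-free approach.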
Affiliation(s)
- Siras Hakobyan
- Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia
- Armenian Bioinformatics Institute (ABI), Yerevan, Armenia
| | | | | | - Hans Binder
- Armenian Bioinformatics Institute, Yerevan, Armenia
- Interdisciplinary Centre for Bioinformatics, University of Leipzig, Leipzig, Germany
| | - Arsen Arakelyan
- Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia
- Russian-Armenian University, Yerevan, Armenia
| |
|
16
|
Xu S, Leng Y, Feng G, Zhang C, Chen M. A gene pathway enrichment method based on improved TF-IDF algorithm. Biochem Biophys Rep 2023; 34:101421. [PMID: 36923007 PMCID: PMC10009669 DOI: 10.1016/j.bbrep.2023.101421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 12/20/2022] [Accepted: 01/03/2023] [Indexed: 03/08/2023] Open
Abstract
Gene pathway enrichment analysis is a widely used method to analyze whether a gene set is statistically enriched in a certain biological pathway network. Current gene pathway enrichment methods commonly consider the local importance of genes in pathways without considering the interactions between genes. In this paper, we propose a gene pathway enrichment method (GIGSEA) based on an improved TF-IDF algorithm. This method employs gene interaction data to calculate the influence of genes based on their local importance in a pathway as well as their global specificity. Computational experiments show that, compared with the traditional gene set enrichment analysis method, the proposed method can find more specific enriched pathways related to the phenotype with higher efficiency.
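The combination of local importance and global specificity can be illustrated with plain TF-IDF weighting of genes across pathways. The Python sketch below is a minimal version under stated assumptions: it uses the textbook TF-IDF formula and gives every gene an equal share within its pathway, whereas the paper's improved variant, which also uses gene interaction data, is not specified in the abstract. The function name `gene_tfidf` and the toy pathways are hypothetical.

```python
from math import log


def gene_tfidf(pathways):
    """Weight each (pathway, gene) pair by textbook TF-IDF.

    pathways: dict mapping pathway name -> set of gene symbols
    Returns a dict mapping (pathway name, gene) -> weight.
    """
    n_pathways = len(pathways)
    # Document frequency: in how many pathways does each gene occur?
    df = {}
    for genes in pathways.values():
        for gene in genes:
            df[gene] = df.get(gene, 0) + 1
    weights = {}
    for name, genes in pathways.items():
        for gene in genes:
            tf = 1.0 / len(genes)             # local importance (assumed equal share)
            idf = log(n_pathways / df[gene])  # global specificity
            weights[(name, gene)] = tf * idf
    return weights


# Hypothetical toy pathways: TP53 occurs everywhere, MDM2 in only one.
pathways = {
    "A": {"TP53", "MDM2"},
    "B": {"TP53", "AKT1"},
    "C": {"TP53", "MTOR", "AKT1"},
}
weights = gene_tfidf(pathways)
```

A gene present in every pathway receives zero weight, while a gene unique to one pathway receives the highest weight there, so pathway-specific genes dominate the enrichment signal.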
Affiliation(s)
- Shutan Xu
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Fisheries Information, Ministry of Agriculture, Shanghai, 201306, China
| | - Yinhui Leng
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
| | - Guofu Feng
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
| | - Chenjing Zhang
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
| | - Ming Chen
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Fisheries Information, Ministry of Agriculture, Shanghai, 201306, China
| |
|
17
|
Hicks EM, Seah C, Cote A, Marchese S, Brennand KJ, Nestler EJ, Girgenti MJ, Huckins LM. Integrating genetics and transcriptomics to study major depressive disorder: a conceptual framework, bioinformatic approaches, and recent findings. Transl Psychiatry 2023; 13:129. [PMID: 37076454 PMCID: PMC10115809 DOI: 10.1038/s41398-023-02412-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/09/2022] [Revised: 03/17/2023] [Accepted: 03/24/2023] [Indexed: 04/21/2023] Open
Abstract
Major depressive disorder (MDD) is a complex and heterogeneous psychiatric syndrome with genetic and environmental influences. In addition to neuroanatomical and circuit-level disturbances, dysregulation of the brain transcriptome is a key phenotypic signature of MDD. Postmortem brain gene expression data are uniquely valuable resources for identifying this signature and key genomic drivers in human depression; however, the scarcity of brain tissue limits our capacity to observe the dynamic transcriptional landscape of MDD. It is therefore crucial to explore and integrate depression and stress transcriptomic data from numerous, complementary perspectives to construct a richer understanding of the pathophysiology of depression. In this review, we discuss multiple approaches for exploring the brain transcriptome reflecting dynamic stages of MDD: predisposition, onset, and illness. We next highlight bioinformatic approaches for hypothesis-free, genome-wide analyses of genomic and transcriptomic data and their integration. Last, we summarize the findings of recent genetic and transcriptomic studies within this conceptual framework.
Affiliation(s)
- Emily M Hicks
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
| | - Carina Seah
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
| | - Alanna Cote
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
| | - Shelby Marchese
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
| | - Kristen J Brennand
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT, 06511, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA
| | - Eric J Nestler
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
| | - Matthew J Girgenti
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA
| | - Laura M Huckins
- Pamela Sklar Division of Psychiatric Genomics, Departments of Psychiatry and of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, 06511, USA
| |
|
18
|
Angel-Velez D, Meese T, Hedia M, Fernandez-Montoro A, De Coster T, Pascottini OB, Van Nieuwerburgh F, Govaere J, Van Soom A, Pavani K, Smits K. Transcriptomics Reveal Molecular Differences in Equine Oocytes Vitrified before and after In Vitro Maturation. Int J Mol Sci 2023; 24:ijms24086915. [PMID: 37108081 PMCID: PMC10138936 DOI: 10.3390/ijms24086915] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Revised: 03/27/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023] Open
Abstract
In the last decade, in vitro embryo production in horses has become an established clinical practice, but blastocyst rates from vitrified equine oocytes remain low. Cryopreservation impairs the oocyte developmental potential, which may be reflected in the messenger RNA (mRNA) profile. Therefore, this study aimed to compare the transcriptome profiles of metaphase II equine oocytes vitrified before and after in vitro maturation. To do so, three groups were analyzed with RNA sequencing: (1) fresh in vitro matured oocytes as a control (FR), (2) oocytes vitrified after in vitro maturation (VMAT), and (3) oocytes vitrified immature, warmed, and in vitro matured (VIM). In comparison with fresh oocytes, VIM resulted in 46 differentially expressed (DE) genes (14 upregulated and 32 downregulated), while VMAT showed 36 DE genes (18 in each category). A comparison of VIM vs. VMAT resulted in 44 DE genes (20 upregulated and 24 downregulated). Pathway analyses highlighted cytoskeleton, spindle formation, and calcium and cation ion transport and homeostasis as the main affected pathways in vitrified oocytes. The vitrification of in vitro matured oocytes presented subtle advantages in terms of the mRNA profile over the vitrification of immature oocytes. Therefore, this study provides a new perspective for understanding the impact of vitrification on equine oocytes and can be the basis for further improvements in the efficiency of equine oocyte vitrification.
Affiliation(s)
- Daniel Angel-Velez
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
  - Research Group in Animal Sciences-INCA-CES, Universidad CES, Medellin 050021, Colombia
- Tim Meese
  - Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Science, Ghent University, 9000 Ghent, Belgium
- Mohamed Hedia
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
  - Department of Theriogenology, Faculty of Veterinary Medicine, Cairo University, Giza 12211, Egypt
- Andrea Fernandez-Montoro
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Tine De Coster
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Osvaldo Bogado Pascottini
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Filip Van Nieuwerburgh
  - Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Science, Ghent University, 9000 Ghent, Belgium
- Jan Govaere
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Ann Van Soom
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Krishna Pavani
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
  - Department for Reproductive Medicine, Ghent University Hospital, Corneel Heymanslaan 10, 9000 Gent, Belgium
- Katrien Smits
  - Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
19
Sosa F, Uh K, Drum JN, Stoecklein KS, Davenport KM, Sofia Ortega M, Lee K, Hansen PJ. Disruption of CSF2RA in the bovine preimplantation embryo reduces development and affects embryonic gene expression in utero. Reproduction and Fertility 2023; 4:RAF-23-0001. [PMID: 37000631 PMCID: PMC10160533 DOI: 10.1530/raf-23-0001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 03/31/2023] [Indexed: 04/01/2023]
Abstract
The hypothesis that CSF2 plays a role in the preimplantation development of the bovine embryo was tested by evaluating consequences of inactivation of CSF2RA (the functional receptor in the embryo) for development of embryos in utero. CRISPR/Cas9 was used to alter sequences on exon 5 and intron 5 of CSF2RA. Control embryos were injected with Cas9 mRNA only. Embryos > 16 cells at day 5 after insemination were transferred to synchronized recipient females in groups of 7 to 24. Embryos were flushed from the uterus two days later. The proportion of recovered embryos that developed to the blastocyst stage was lower for knockout embryos (39%) than for control embryos (63%). RNA sequencing of individual morulae and blastocysts indicated a total of 27 (morula) or 15 (blastocyst) differentially expressed genes (false discovery rate < 0.05). Gene set enrichment analysis indicated that the knockout affected genes playing roles in several functions including cell signaling and glycosylation. It was concluded that signaling through CSF2RA is not obligatory for development of the bovine preimplantation embryo to the blastocyst stage but that CSF2 signaling does enhance the likelihood that the embryo can become a blastocyst and results in specific changes in gene expression.
Affiliation(s)
- Froylan Sosa
  - Department of Animal Sciences, D.H. Barron Reproductive and Perinatal Biology Research Program, and Genetics Institute, University of Florida, Gainesville, Florida, USA
- Kyungjun Uh
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Jéssica N Drum
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Katy S Stoecklein
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- M Sofia Ortega
  - Department of Animal & Dairy Sciences, University of Wisconsin, Madison, Wisconsin, USA
- Kiho Lee
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, USA
- Peter J Hansen
  - Department of Animal Sciences, D.H. Barron Reproductive and Perinatal Biology Research Program, and Genetics Institute, University of Florida, Gainesville, Florida, USA
20
Lu Y, Pang Z, Xia J. Comprehensive investigation of pathway enrichment methods for functional interpretation of LC-MS global metabolomics data. Brief Bioinform 2023; 24:bbac553. [PMID: 36572652 PMCID: PMC9851290 DOI: 10.1093/bib/bbac553] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 09/23/2022] [Revised: 10/31/2022] [Accepted: 11/15/2022] [Indexed: 12/28/2022]
Abstract
BACKGROUND: Global or untargeted metabolomics is widely used to comprehensively investigate metabolic profiles under various pathophysiological conditions such as inflammations, infections, responses to exposures or interactions with microbial communities. However, biological interpretation of global metabolomics data remains a daunting task. Recent years have seen growing applications of pathway enrichment analysis based on putative annotations of liquid chromatography coupled with mass spectrometry (LC-MS) peaks for functional interpretation of LC-MS-based global metabolomics data. However, due to intricate peak-metabolite and metabolite-pathway relationships, considerable variations are observed among results obtained using different approaches. There is an urgent need to benchmark these approaches to inform the best practices. RESULTS: We have conducted a benchmark study of common peak annotation approaches and pathway enrichment methods in current metabolomics studies. Representative approaches, including three peak annotation methods and four enrichment methods, were selected and benchmarked under different scenarios. Based on the results, we have provided a set of recommendations regarding peak annotation, ranking metrics and feature selection. The best overall performance was obtained with the mummichog approach. We have observed that a ~30% annotation rate is sufficient to achieve high recall (~90% based on mummichog), and using semi-annotated data improves functional interpretation. Based on the current platforms and enrichment methods, we further propose an identifiability index to indicate the possibility of a pathway being reliably identified. Finally, we evaluated all methods using 11 COVID-19 and 8 inflammatory bowel disease (IBD) global metabolomics datasets.
Affiliation(s)
- Yao Lu
  - Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Zhiqiang Pang
  - Institute of Parasitology, McGill University, Quebec, Canada
- Jianguo Xia
  - Department of Microbiology and Immunology, McGill University, Quebec, Canada
  - Institute of Parasitology, McGill University, Quebec, Canada
21
Ai H, Meng F, Ai Y. PathwayKO: An integrated platform for deciphering the systems-level signaling pathways. Front Immunol 2023; 14:1103392. [PMID: 37033947 PMCID: PMC10080220 DOI: 10.3389/fimmu.2023.1103392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Accepted: 03/01/2023] [Indexed: 04/11/2023]
Abstract
Systems characterization of immune landscapes in health, disease and clinical intervention cases is a priority in modern medicine. High-throughput transcriptomes accumulated from gene-knockout (KO) experiments are crucial for deciphering target KO signaling pathways that are impaired by KO genes at the systems-level. There is a demand for integrative platforms. This article describes the PathwayKO platform, which has integrated state-of-the-art methods of pathway enrichment analysis, statistics analysis, and visualizing analysis to conduct cutting-edge integrative pathway analysis in a pipeline fashion and decipher target KO signaling pathways at the systems-level. We focus on describing the methodology, principles and application features of PathwayKO. First, we demonstrate that the PathwayKO platform can be utilized to comprehensively analyze real-world mouse KO transcriptomes (GSE22873 and GSE24327), which reveal systemic mechanisms underlying the innate immune responses triggered by non-infectious extensive hepatectomy (2 hours after 85% liver resection surgery) and infectious CASP-model sepsis (12 hours after CASP-model surgery). Strikingly, our results indicate that both cases hit the same core set of 21 KO MyD88-associated signaling pathways, including the Toll-like receptor signaling pathway, the NFκB signaling pathway, the MAPK signaling pathway, and the PD-L1 expression and PD-1 checkpoint pathway in cancer, alongside the pathways of bacterial, viral and parasitic infections. These findings suggest common fundamental mechanisms between these immune responses and offer informative cues that warrant future experimental validation. Such mechanisms in mice may serve as models for humans and ultimately guide formulating the research paradigms and composite strategies to reduce the high mortality rates of patients in intensive care units who have undergone successful traumatic surgical treatments. 
Second, we demonstrate that the PathwayKO platform model-based assessments can effectively evaluate the performance difference of pathway analysis methods when benchmarked with a collection of proper transcriptomes. Together, such advances in methods for deciphering biological insights at the systems-level may benefit the fields of bioinformatics, systems immunology and beyond.
Affiliation(s)
- Hannan Ai
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Department of Electrical and Computer Engineering, The Grainger College of Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
  - National Center for Quality Supervision and Inspection of Automatic Equipment, National Center for Testing and Evaluation of Robots (Guangzhou), CRAT, SINOMACH-IT, Guangzhou, China
  - *Correspondence: Hannan Ai; Yuncan Ai
- Fanmei Meng
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yuncan Ai
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - The Second Affiliated Hospital, Guangdong Provincial Key Laboratory of Allergy & Clinical Immunology, Center for Inflammation, Immunity & Immune-mediated Disease, Sino-French Hoffmann Institute, Guangzhou Medical University, Guangzhou, Guangdong, China
  - *Correspondence: Hannan Ai; Yuncan Ai
22
Cousins H, Hall T, Guo Y, Tso L, Tzeng KTH, Cong L, Altman RB. Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings, with drug-repurposing applications in COVID-19. Bioinformatics 2023; 39:btac735. [PMID: 36394254 PMCID: PMC9805577 DOI: 10.1093/bioinformatics/btac735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 09/27/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022]
Abstract
MOTIVATION: Gene set analysis methods rely on knowledge-based representations of genetic interactions in the form of both gene set collections and protein-protein interaction (PPI) networks. However, explicit representations of genetic interactions often fail to capture complex interdependencies among genes, limiting the analytic power of such methods. RESULTS: We propose an extension of gene set enrichment analysis to a latent embedding space reflecting PPI network topology, called gene set proximity analysis (GSPA). Compared with existing methods, GSPA provides improved ability to identify disease-associated pathways in disease-matched gene expression datasets, while improving reproducibility of enrichment statistics for similar gene sets. GSPA is statistically straightforward, reducing to a version of traditional gene set enrichment analysis through a single user-defined parameter. We apply our method to identify novel drug associations with SARS-CoV-2 viral entry. Finally, we validate our drug association predictions through retrospective clinical analysis of claims data from 8 million patients, supporting a role for gabapentin as a risk factor and metformin as a protective factor for severe COVID-19. AVAILABILITY AND IMPLEMENTATION: GSPA is available for download as a command-line Python package at https://github.com/henrycousins/gspa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Henry Cousins
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Taryn Hall
  - Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Yinglong Guo
  - Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Luke Tso
  - Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Kathy T H Tzeng
  - Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Le Cong
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Russ B Altman
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
23
Maghsoudi Z, Nguyen H, Tavakkoli A, Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief Bioinform 2022; 23:6761962. [PMID: 36252928 PMCID: PMC9677478 DOI: 10.1093/bib/bbac435] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/28/2022] [Revised: 08/26/2022] [Accepted: 09/08/2022] [Indexed: 02/07/2023]
Abstract
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to better interpretability of its results and its higher statistical power compared with the gene-level statistics. A plethora of pathway analysis methods that utilize multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality, and a thorough discussion of the strengths and drawbacks of each technique will be provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.
Affiliation(s)
- Zeynab Maghsoudi
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Ha Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Alireza Tavakkoli
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Tin Nguyen
  - Corresponding author: Tin Nguyen, Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA. Tel.: +1-775-784-6619
24
Liu H, Yuan M, Mitra R, Zhou X, Long M, Lei W, Zhou S, Huang YE, Hou F, Eischen CM, Jiang W. CTpathway: a CrossTalk-based pathway enrichment analysis method for cancer research. Genome Med 2022; 14:118. [PMID: 36229842 PMCID: PMC9563764 DOI: 10.1186/s13073-022-01119-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/12/2022] [Accepted: 09/26/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Pathway enrichment analysis (PEA) is a common method for exploring functions of hundreds of genes and identifying disease-risk pathways. Moreover, different pathways exert their functions through crosstalk. However, existing PEA methods do not sufficiently integrate essential pathway features, including pathway crosstalk, molecular interactions, and network topologies, resulting in many risk pathways that remain uninvestigated. METHODS: To overcome these limitations, we develop a new crosstalk-based PEA method, CTpathway, based on a global pathway crosstalk map (GPCM) with >440,000 edges by combining pathways from eight resources, transcription factor-gene regulations, and large-scale protein-protein interactions. Integrating gene differential expression and crosstalk effects in GPCM, we assign a risk score to genes in the GPCM and identify risk pathways enriched with the risk genes. RESULTS: Analysis of >8300 expression profiles covering ten cancer tissues and blood samples indicates that CTpathway outperforms the current state-of-the-art methods in identifying risk pathways with higher accuracy, reproducibility, and speed. CTpathway recapitulates known risk pathways and exclusively identifies several previously unreported critical pathways for individual cancer types. CTpathway also outperforms other methods in identifying risk pathways across all cancer stages, including early-stage cancer with a small number of differentially expressed genes. Moreover, the robust design of CTpathway enables researchers to analyze both bulk and single-cell RNA-seq profiles to predict both cancer tissue and cell type-specific risk pathways with higher accuracy. CONCLUSIONS: Collectively, CTpathway is a fast, accurate, and stable pathway enrichment analysis method for cancer research that can be used to identify cancer risk pathways. The CTpathway interactive web server can be accessed at http://www.jianglab.cn/CTpathway/. The stand-alone program can be accessed at https://github.com/Bioccjw/CTpathway.
Affiliation(s)
- Haizhou Liu
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Mengqin Yuan
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Ramkrishna Mitra
  - Department of Pharmacology, Physiology, and Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th St., Philadelphia, PA, 19107, USA
- Xu Zhou
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Min Long
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Wanyue Lei
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Shunheng Zhou
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Yu-E Huang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Fei Hou
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
- Christine M Eischen
  - Department of Pharmacology, Physiology, and Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th St., Philadelphia, PA, 19107, USA
- Wei Jiang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29, Jiangjun Avenue, Nanjing, 211106, Jiangsu Province, China
25
Makrooni MA, O’Shea D, Geeleher P, Seoighe C. Random-effects meta-analysis of effect sizes as a unified framework for gene set analysis. PLoS Comput Biol 2022; 18:e1010278. [PMID: 36197939 PMCID: PMC9576052 DOI: 10.1371/journal.pcbi.1010278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 10/17/2022] [Accepted: 09/18/2022] [Indexed: 11/06/2022]
Abstract
Gene set analysis (GSA) remains a common step in genome-scale studies because it can reveal insights that are not apparent from results obtained for individual genes. Many different computational tools are applied for GSA, which may be sensitive to different types of signals; however, most methods implicitly test whether there are differences in the distribution of the effect of some experimental condition between genes in gene sets of interest. We have developed a unifying framework for GSA that first fits effect size distributions, and then tests for differences in these distributions between gene sets. These differences can be in the proportions of genes that are perturbed or in the sign or size of the effects. Inspired by statistical meta-analysis, we take into account the uncertainty in effect size estimates by reducing the influence of genes with greater uncertainty on the estimation of distribution parameters. We demonstrate, using simulation and by application to real data, that this approach provides significant gains in performance over existing methods. Furthermore, the statistical tests carried out are defined in terms of effect sizes, rather than the results of prior statistical tests measuring these changes, which leads to improved interpretability and greater robustness to variation in sample sizes. The role of gene set analysis is to identify groups of genes that are perturbed in a genomics experiment. There are many tools available for this task and they do not all test for the same types of changes. Here we propose a new way to carry out gene set analysis that involves first working out the distribution of the group effect in the gene set and then comparing this distribution to the equivalent distribution in other genes. Tests performed by existing tools for gene set analysis can be related to different comparisons in these distributions of group effects. 
A unified framework for gene set analysis provides for more explicit null hypotheses against which to test sets of genes for different types of responses to the experimental conditions. These results are more interpretable, because the group effect distributions can be compared visually, providing an indication of how the experimental effect differs between the gene sets.
Affiliation(s)
- Mohammad A. Makrooni
  - School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Dónal O’Shea
  - School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Paul Geeleher
  - Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Cathal Seoighe
  - School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
26
Grassi M, Tarantino B. SEMgsa: topology-based pathway enrichment analysis with structural equation models. BMC Bioinformatics 2022; 23:344. [PMID: 35978279 PMCID: PMC9385099 DOI: 10.1186/s12859-022-04884-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 08/09/2022] [Indexed: 11/25/2022]
Abstract
Background: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler methods that only consider pathway membership. Among the proposed software tools, there is a need to combine high statistical power with a user-friendly framework, and this makes it difficult to choose the best method for a particular experimental environment. Results: We propose SEMgsa, a topology-based algorithm developed within the framework of structural equation models. SEMgsa combines the SEM p values regarding node-specific group effect estimates in terms of activation or inhibition, after statistically controlling biological relations among genes within pathways. We used SEMgsa to identify biologically relevant results in a Coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) together with a frontotemporal dementia (FTD) DNA methylation dataset (GEO accession: GSE53740) and compared its performance with some existing methods. SEMgsa is highly sensitive to the pathways designed for the specific disease, showing low p values (< 0.001) and ranking in high positions, outperforming existing software tools. Three pathway dysregulation mechanisms were used to generate simulated expression data and evaluate the performance of methods in terms of type I error followed by their statistical power. Simulation results confirm best overall performance of SEMgsa. Conclusions: SEMgsa is a novel yet powerful method for identifying enrichment with regard to gene expression data. It takes into account topological information and exploits pathway perturbation statistics to reveal biological information. SEMgsa is implemented in the R package SEMgraph, easily available at https://CRAN.R-project.org/package=SEMgraph. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04884-8.
Affiliation(s)
- Mario Grassi
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Barbara Tarantino
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
27
Ai H, Li B, Meng F, Ai Y. CASP-Model Sepsis Triggers Systemic Innate Immune Responses Revealed by the Systems-Level Signaling Pathways. Front Immunol 2022; 13:907646. [PMID: 35774781 PMCID: PMC9238352 DOI: 10.3389/fimmu.2022.907646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 04/28/2022] [Indexed: 12/05/2022]
Abstract
Colon ascendens stent peritonitis (CASP) surgery induces a leakage of intestinal contents which may cause polymicrobial sepsis related to post-operative failure of remote multi-organs (including kidney, liver, lung and heart) and possible death from systemic syndromes. Mechanisms underlying such phenomena remain unclear. This article aims to elucidate the mechanisms underlying the CASP-model sepsis by analyzing real-world GEO data (GSE24327_A, B and C) generated from mouse spleen 12 hours after a CASP-surgery in septic MyD88-deficient and wildtype mice, compared with untreated wildtype mice. Firstly, we identify and characterize 21 KO MyD88-associated signaling pathways, on which true key regulators (including ligands, receptors, adaptors, transducers, transcriptional factors and cytokines) are marked, which were coordinately, significantly, and differentially expressed at the systems-level, thus providing massive potential biomarkers that warrant experimental validations in the future. Secondly, we observe the full range of polymicrobial (viral, bacterial, and parasitic) sepsis triggered by the CASP-surgery by comparing the coordinated up- or down-regulations of true regulators among the experimental treatments borne by the three datasets under study. Finally, we discuss the observed phenomena of “systemic syndrome”, “cytokine storm” and “KO MyD88 attenuation”, as well as the proposed hypothesis of “spleen-mediated immune-cell infiltration”. Together, our results provide novel insights into a better understanding of innate immune responses triggered by the CASP-model sepsis in both wildtype and MyD88-deficient mice at the systems-level in a broader vision. This may serve as a model for humans and ultimately guide formulating the research paradigms and composite strategies for the early diagnosis and prevention of sepsis.
Affiliation(s)
- Hannan Ai
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Department of Electrical and Computer Engineering, The Grainger College of Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- National Center for Quality Supervision and Inspection of Automatic Equipment, National Center for Testing and Evaluation of Robots (Guangzhou), CRAT, SINOMACH-IT, Guangzhou, China
- *Correspondence: Hannan Ai, ; Yuncan Ai,
| | - Bizhou Li
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Fanmei Meng
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Yuncan Ai
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- The Second Affiliated Hospital, Guangdong Provincial Key Laboratory of Allergy & Clinical Immunology, Center for Inflammation, Immunity & Immune-mediated Disease, Sino-French Hoffmann Institute, Guangzhou Medical University, Guangzhou, China
- *Correspondence: Hannan Ai; Yuncan Ai
|
28
|
Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernández D. On the influence of several factors on pathway enrichment analysis. Brief Bioinform 2022; 23:bbac143. [PMID: 35453140 PMCID: PMC9116215 DOI: 10.1093/bib/bbac143] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/17/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023] Open
Abstract
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview on the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges, which originate from the outlined factors.
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Fraunhofer Center for Machine Learning, Germany
| | - Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
| | - Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Fraunhofer Center for Machine Learning, Germany
- Enveda Biosciences, Boulder, CO, 80301, USA
|
29
|
Ke X, Wu H, Chen YX, Guo Y, Yao S, Guo MR, Duan YY, Wang NN, Shi W, Wang C, Dong SS, Kang H, Dai Z, Yang TL. Individualized pathway activity algorithm identifies oncogenic pathways in pan-cancer analysis. EBioMedicine 2022; 79:104014. [PMID: 35487057 PMCID: PMC9117264 DOI: 10.1016/j.ebiom.2022.104014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 04/04/2022] [Accepted: 04/05/2022] [Indexed: 02/07/2023] Open
Abstract
Background: Accumulating evidence has shown that dysregulation of biological pathways contributes to the initiation and progression of malignant tumours. Several methods for pathway activity measurement have been proposed, but they are restricted to making comparisons between groups or are sensitive to experimental batch effects. Methods: We introduced a novel method for individualized pathway activity measurement (IPAM) that is based on the ranking of gene expression levels in individual samples. Taking advantage of IPAM, we calculated the pathway activity of 318 pathways from the KEGG database in the 10528 tumour/normal samples of 33 cancer types from TCGA to identify characteristic dysregulated pathways among different cancer types. Findings: IPAM precisely quantified the level of activity of each pathway in pan-cancer analysis and exhibited better performance in cancer classification and prognosis prediction than five widely used tools. The average ROC-AUC of a cancer diagnostic model using tumour-educated platelets (TEPs) reached 92.84%, suggesting the potential of our algorithm in early diagnosis of cancer. We identified several pathways significantly deregulated and associated with patient survival in a large fraction of cancer types, such as tyrosine metabolism, fatty acid degradation, cell cycle, p53 signalling pathway and DNA replication. We also confirmed the dominant role of metabolic pathways in cancer pathway dysregulation and identified the driving factors of specific pathway dysregulation, such as PPARA for branched-chain amino acid metabolism and NR1I2, NR1I3 for fatty acid metabolism. Interpretation: Our study will provide novel clues for understanding the pathological mechanisms of cancer, ultimately paving the way for personalized medicine of cancer. Funding: A full list of funding can be found in the Acknowledgements section.
Affiliation(s)
- Xin Ke
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Hao Wu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Yi-Xiao Chen
- National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710004, PR China
| | - Yan Guo
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Shi Yao
- National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710004, PR China
| | - Ming-Rui Guo
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Yuan-Yuan Duan
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Nai-Ning Wang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Wei Shi
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Chen Wang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Shan-Shan Dong
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
| | - Huafeng Kang
- Department of Oncology, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710004, PR China
| | - Zhijun Dai
- Department of Breast Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, PR China
| | - Tie-Lin Yang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics and Genomics Center, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China; National and Local Joint Engineering Research Center of Biodiagnosis and Biotherapy, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi 710004, PR China.
|
30
|
Network- and enrichment-based inference of phenotypes and targets from large-scale disease maps. NPJ Syst Biol Appl 2022; 8:13. [PMID: 35473910 PMCID: PMC9042890 DOI: 10.1038/s41540-022-00222-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/05/2021] [Accepted: 03/22/2022] [Indexed: 01/09/2023] Open
Abstract
Complex diseases are inherently multifaceted, and the associated data are often heterogeneous, making linking interactions across genes, metabolites, RNA, proteins, cellular functions, and clinically relevant phenotypes a high-priority challenge. Disease maps have emerged as knowledge bases that capture molecular interactions, disease-related processes, and disease phenotypes with standardized representations in large-scale molecular interaction maps. Various tools are available for disease map analysis, but an intuitive solution to perform in silico experiments on the maps in a wide range of contexts and analyze high-dimensional data is currently missing. To this end, we introduce a two-dimensional enrichment analysis (2DEA) approach to infer downstream and upstream elements through the statistical association of network topology parameters and fold changes from molecular perturbations. We implemented our approach in a plugin suite for the MINERVA platform, providing an environment where experimental data can be mapped onto a disease map and predict potential regulatory interactions through an intuitive graphical user interface. We show several workflows using this approach and analyze two RNA-seq datasets in the Atlas of Inflammation Resolution (AIR) to identify enriched downstream processes and upstream transcription factors. Our work improves the usability of disease maps and increases their functionality by facilitating multi-omics data integration and exploration.
|
31
|
Yue Z, Slominski R, Bharti S, Chen JY. PAGER Web APP: An Interactive, Online Gene Set and Network Interpretation Tool for Functional Genomics. Front Genet 2022; 13:820361. [PMID: 35495152 PMCID: PMC9039620 DOI: 10.3389/fgene.2022.820361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Accepted: 03/17/2022] [Indexed: 12/30/2022] Open
Abstract
Functional genomics studies have helped researchers annotate differentially expressed gene lists, extract gene expression signatures, and identify biological pathways from omics profiling experiments conducted on biological samples. The current geneset, network, and pathway analysis (GNPA) web servers, e.g., DAVID, EnrichR, WebGestaltR, or PAGER, do not allow automated integrative functional genomic downstream analysis. In this study, we developed a new web-based interactive application, “PAGER Web APP”, which supports online R scripting of integrative GNPA. In a case study of melanoma drug resistance, we showed that the new PAGER Web APP enabled us to discover highly relevant pathways and network modules, leading to novel biological insights. We also compared PAGER Web APP’s pathway analysis results retrieved among PAGER, EnrichR, and WebGestaltR to show its advantages in integrative GNPA. The interactive online web APP is publicly accessible from the link, https://aimed-lab.shinyapps.io/PAGERwebapp/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Radomir Slominski
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Graduate Biomedical Sciences Program, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Samuel Bharti
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
| | - Jake Y. Chen
- Informatics Institute in the School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- *Correspondence: Jake Y. Chen
|
32
|
Ogris C, Castresana-Aguirre M, Sonnhammer ELL. PathwAX II: Network-based pathway analysis with interactive visualization of network crosstalk. Bioinformatics 2022; 38:2659-2660. [PMID: 35266519 PMCID: PMC9048662 DOI: 10.1093/bioinformatics/btac153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 02/03/2022] [Accepted: 03/09/2022] [Indexed: 11/28/2022] Open
Abstract
Motivation: Pathway annotation tools are indispensable for the interpretation of a wide range of experiments in life sciences. Network-based algorithms have recently been developed which are more sensitive than traditional overlap-based algorithms, but there is still a lack of good online tools for network-based pathway analysis. Results: We present PathwAX II, a pathway analysis web tool based on network crosstalk analysis using the BinoX algorithm. It offers several new features compared with the first version, including interactive graphical network visualization of the crosstalk between a query gene set and an enriched pathway, and the addition of Reactome pathways. Availability and implementation: PathwAX II is available at http://pathwax.sbc.su.se. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christoph Ogris
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden; Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Miguel Castresana-Aguirre
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
|
33
|
Lycopene Supplementation to Serum-Free Maturation Medium Improves In Vitro Bovine Embryo Development and Quality and Modulates Embryonic Transcriptomic Profile. Antioxidants (Basel) 2022; 11:antiox11020344. [PMID: 35204226 PMCID: PMC8868338 DOI: 10.3390/antiox11020344] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 02/02/2022] [Accepted: 02/08/2022] [Indexed: 02/08/2023] Open
Abstract
Bovine embryos are typically cultured at reduced oxygen tension to lower the impact of oxidative stress on embryo development. However, oocyte in vitro maturation (IVM) is performed at atmospheric oxygen tension since low oxygen during maturation has a negative impact on oocyte developmental competence. Lycopene, a carotenoid, acts as a powerful antioxidant and may protect the oocyte against oxidative stress during maturation at atmospheric oxygen conditions. Here, we assessed the effect of adding 0.2 μM lycopene (antioxidant), 5 μM menadione (pro-oxidant), and their combination on the generation of reactive oxygen species (ROS) in matured oocytes and the subsequent development, quality, and transcriptome of the blastocysts in a bovine in vitro model. ROS fluorescent intensity in matured oocytes was significantly lower in the lycopene group, and the resulting embryos showed a significantly higher blastocyst rate on day 8 and a lower apoptotic cell ratio than all other groups. Transcriptomic analysis disclosed a total of 296 differentially expressed genes (Benjamini–Hochberg-adjusted p < 0.05 and ≥ 1-log2-fold change) between the lycopene and control groups, where pathways associated with cellular function, metabolism, DNA repair, and anti-apoptosis were upregulated in the lycopene group. Lycopene supplementation to serum-free maturation medium neutralized excess ROS during maturation, enhanced blastocyst development and quality, and modulated the transcriptomic landscape.
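The gene-selection rule quoted in this abstract (Benjamini-Hochberg-adjusted p < 0.05 together with at least a 1-log2-fold change) can be illustrated with a short sketch. It is not the authors' pipeline: the function names, the toy records, and the reading of the fold-change criterion as an absolute value are assumptions made for illustration only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(records, alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes with BH-adjusted p < alpha and |log2FC| >= min_abs_log2fc.

    records: list of (gene, raw_pvalue, log2_fold_change) tuples.
    Using the absolute fold change is an assumption of this sketch.
    """
    adjusted = bh_adjust([p for _, p, _ in records])
    return [gene for (gene, _, lfc), q in zip(records, adjusted)
            if q < alpha and abs(lfc) >= min_abs_log2fc]


# Hypothetical toy input; gene names and values are made up.
toy = [("geneA", 0.001, 1.8), ("geneB", 0.010, -1.2),
       ("geneC", 0.020, 0.4), ("geneD", 0.600, 2.5)]
print(select_degs(toy))
```

In the toy input, geneC passes the adjusted p-value cut-off but is dropped by the fold-change filter, while geneD has a large fold change but is not significant.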
|
34
|
Abstract
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. In this chapter we provide an overview of the methods and tools used to create networks from microarray data and describe multiple methods for analyzing a single network or a group of networks. The described methods range from topological metrics and functional group identification to data integration strategies, topological pathway analysis, and graphical models.
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
- BioMediTech Institute, Tampere University, Tampere, Finland.
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland.
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
|
35
|
Marczyk M, Macioszek A, Tobiasz J, Polanska J, Zyla J. Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies. Front Genet 2021; 12:767358. [PMID: 34956320 PMCID: PMC8696167 DOI: 10.3389/fgene.2021.767358] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Accepted: 11/10/2021] [Indexed: 11/13/2022] Open
Abstract
A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, there are other methods of integration that could be applied here. Also, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of incorporation of different integrations and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs to phenotype was calculated using modified McNemar's test. Results for SNPs mapped to the same gene were integrated using Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. Results of GSA analysis on GWAS were compared to the one performed on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we highlighted CERNO, and MAGENTA with Stouffer as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer were used with LD, sensitivity and reproducibility were also better. Using any integration method was beneficial in comparison with a minimum p-value method in specific combinations. 
The correlation between GSA results from genomic and transcriptomic level was the highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others to select the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis and encourage the usage of these techniques.
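The three gene-level integration strategies named in this abstract (Fisher, Stouffer, and the minimum p-value) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the study's implementation: the function names and the example p-values are hypothetical, p-values are assumed to lie strictly between 0 and 1, and Fisher and Stouffer are shown in their textbook form, which treats SNPs as independent (the dependency that the LD correction in the study is meant to address is not modelled here).

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the Fisher statistic
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combine(pvalues):
    """Stouffer's method: sum of z-scores divided by sqrt(k), unweighted."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def min_p(pvalues):
    """Minimum p-value method: keep only the most significant SNP."""
    return min(pvalues)


# Hypothetical SNP-level p-values mapped to a single gene.
snp_pvalues = [0.01, 0.02, 0.30]
for name, fn in [("fisher", fisher_combine),
                 ("stouffer", stouffer_combine),
                 ("min_p", min_p)]:
    print(name, fn(snp_pvalues))
```

With a single SNP per gene all three functions return that SNP's p-value, so the methods only differ for genes covered by several SNPs.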
Affiliation(s)
- Michal Marczyk
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland; Yale Cancer Center, Yale School of Medicine, New Haven, CT, United States
| | - Agnieszka Macioszek
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
| | - Joanna Tobiasz
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
| | - Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
| | - Joanna Zyla
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
|
36
|
Wang G, Kitaoka T, Crawford A, Mao Q, Hesketh A, Guppy FM, Ash GI, Liu J, Gerstein MB, Pitsiladis YP. Cross-platform transcriptomic profiling of the response to recombinant human erythropoietin. Sci Rep 2021; 11:21705. [PMID: 34737331 PMCID: PMC8568984 DOI: 10.1038/s41598-021-00608-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/09/2021] [Accepted: 10/11/2021] [Indexed: 11/08/2022] Open
Abstract
RNA-seq has matured and become an important tool for studying RNA biology. Here we compared two RNA-seq (MGI DNBSEQ and Illumina NextSeq 500) and two microarray platforms (GeneChip Human Transcriptome Array 2.0 and Illumina Expression BeadChip) in healthy individuals administered recombinant human erythropoietin for transcriptome-wide quantification of differential gene expression. The results show that total RNA DNB-seq generated a multitude of target genes compared to other platforms. Pathway enrichment analyses revealed that these genes relate not only to erythropoiesis and oxygen transport but also to a wide range of other functions, such as tissue protection and immune regulation. This study provides a knowledge base of genes relevant to EPO biology through cross-platform comparisons and validation.
Affiliation(s)
- Guan Wang
- School of Sport and Health Sciences, University of Brighton, Brighton, UK.
- Centre for Regenerative Medicine and Devices, University of Brighton, Brighton, UK.
| | - Andrew Hesketh
- School of Applied Sciences, University of Brighton, Brighton, UK
| | - Fergus M Guppy
- School of Applied Sciences, University of Brighton, Brighton, UK
- Centre for Stress and Age-Related Disease, University of Brighton, Brighton, UK
| | - Garrett I Ash
- Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Center for Medical Informatics, Yale University, New Haven, CT, USA
| | - Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Department of Computer Science, Yale University, New Haven, CT, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
| | - Yannis P Pitsiladis
- School of Sport and Health Sciences, University of Brighton, Brighton, UK.
- Centre for Stress and Age-Related Disease, University of Brighton, Brighton, UK.
|
37
|
Aranciaga N, Morton JD, Maes E, Gathercole JL, Berg DK. Proteomic determinants of uterine receptivity for pregnancy in early and mid-postpartum dairy cows†. Biol Reprod 2021; 105:1458-1473. [PMID: 34647570 DOI: 10.1093/biolre/ioab190] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/17/2021] [Revised: 08/03/2021] [Accepted: 10/13/2021] [Indexed: 11/14/2022] Open
Abstract
Dairy cow subfertility is a worldwide issue arising from multiple factors. It manifests in >30% early pregnancy losses in seasonal pasture-grazed herds, especially when cows are inseminated in the early post-partum period. Most losses occur before implantation, when embryo growth depends on factors present in maternal tract fluids. Here we examined the proteomic composition of early and mid-postpartum uterine luminal fluid in crossbred lactating dairy cows to identify molecular determinants of fertility. We also explored changes in uterine luminal fluid from first to third estrus cycles postpartum in individual cows, linking those changes with divergent embryo development. For this, we flushed uteri of 87 cows at day 7 of pregnancy at first and third estrus postpartum, recovering and grading their embryos. Out of 1563 proteins detected, 472 had not been previously reported in this fluid, and 408 were predicted to be actively secreted by bioinformatic analysis. The abundance of 18 proteins with roles in immune regulation and metabolic function (e.g. cystatin B, pyruvate kinase M2) was associated with contrasting embryo quality. Matched-paired pathway analysis indicated that, from first to third estrus postpartum, upregulation of metabolic (e.g. creatine and carbohydrate) and immune (e.g. complement regulation, antiviral defense) processes were related to poorer quality embryos in the third estrus cycle postpartum. Conversely, upregulated signal transduction and protein trafficking appeared related to improved embryo quality in third estrus. These results advance the characterization of the molecular environment of bovine uterine luminal fluid and may aid understanding fertility issues in other mammals, including humans.
Affiliation(s)
- Nicolas Aranciaga
- Proteins and Metabolites Team, Agresearch, Christchurch, New Zealand; Faculty of Agriculture and Life Sciences, Lincoln University, Christchurch, New Zealand; Animal Biotechnology Team, Agresearch, Hamilton, New Zealand
| | - James D Morton
- Faculty of Agriculture and Life Sciences, Lincoln University, Christchurch, New Zealand
| | - Evelyne Maes
- Proteins and Metabolites Team, Agresearch, Christchurch, New Zealand
| | - Debra K Berg
- Animal Biotechnology Team, Agresearch, Hamilton, New Zealand
|
38
|
Li X, Zhang B, Yu K, Bao Z, Zhang W, Bai Y. Identifying cancer specific signaling pathways based on the dysregulation between genes. Comput Biol Chem 2021; 95:107586. [PMID: 34619555 DOI: 10.1016/j.compbiolchem.2021.107586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 08/10/2021] [Accepted: 09/26/2021] [Indexed: 11/26/2022]
Abstract
A large collection of studies has shown that the occurrence of cancer is related to the dysfunction of pathways. Identification of cancer-related pathways could help researchers better understand the mechanisms of complex diseases. However, most current signaling pathway analysis methods take no account of the gene interaction variations within pathways. Furthermore, some pathways are connected with two or more cancer types, while others are likely to be cancer-type specific; identifying cancer-type specific pathways helps to interpret the different mechanisms of different cancer types. In this study, we first proposed a pathway analysis method named Pathway Analysis of Intergenic Regulation (PAIGR) to identify pathways with dysregulation between genes and compared the performance of this method with four existing methods on four colorectal cancer (CRC) datasets. The results showed that PAIGR could find cancer-related pathways more accurately. Moreover, in order to explore the relationship between the identified pathways and the cancer type, we constructed a pathway interaction network, in which nodes and edges represented pathways and interactions between pathways, respectively. Highly connected pathways were considered to play a central role in an extensive range of biological processes, while sparsely connected pathways were considered to have a certain specificity. Our results showed that pathways identified by PAIGR had a low nodal degree (i.e., a small number of interactions), which suggested that most of these pathways were cancer-type specific.
Affiliation(s)
- Xiaohan Li
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China.
- Bing Zhang
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China.
- Kequan Yu
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China.
- Zhenshen Bao
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China.
- Weizhong Zhang
- Department of Ophthalmology, First Affiliated Hospital of Nanjing Medical University, Nanjing, China.
- Yunfei Bai
- State Key Lab of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China.
|
39
|
Ramos M, Geistlinger L, Oh S, Schiffer L, Azhar R, Kodali H, de Bruijn I, Gao J, Carey VJ, Morgan M, Waldron L. Multiomic Integration of Public Oncology Databases in Bioconductor. JCO Clin Cancer Inform 2021; 4:958-971. [PMID: 33119407 PMCID: PMC7608653 DOI: 10.1200/cci.19.00119] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Indexed: 01/04/2023]
Abstract
PURPOSE Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.
Affiliation(s)
- Marcel Ramos
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY; Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Ludwig Geistlinger
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Sehyun Oh
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Lucas Schiffer
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY; Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA
- Rimsha Azhar
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY; Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY
- Hanish Kodali
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Ino de Bruijn
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY
- Jianjiong Gao
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY; Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Vincent J Carey
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Martin Morgan
- Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
|
40
|
Rondel FM, Hosseini R, Sahoo B, Knyazev S, Mandric I, Stewart F, Măndoiu II, Pasaniuc B, Porozov Y, Zelikovsky A. Pipeline for Analyzing Activity of Metabolic Pathways in Planktonic Communities Using Metatranscriptomic Data. J Comput Biol 2021; 28:842-855. [PMID: 34264744 PMCID: PMC8575064 DOI: 10.1089/cmb.2021.0053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
In this article, we present our novel pipeline for analysis of metabolic activity using a microbial community's metatranscriptome sequence data set for validation. Our method is based on the expectation-maximization (EM) algorithm and provides enzyme expression and pathway activity levels. Further expanding our analysis, we consider individual enzymatic activity and compute enzyme participation coefficients to approximate the metabolic pathway activity more accurately. We apply our EM pathways pipeline to a metatranscriptomic data set of a plankton community from surface waters of the Northern Gulf of Mexico. The data set consists of RNA-seq data and respective environmental parameters, which were sampled at two depths, six times a day over multiple 24-hour cycles. Furthermore, we discuss microbial dependence on the day-night cycle within our findings based on a three-way correlation of the enzyme expression during antipodal times (midnight and noon). We show that the enzyme participation levels strongly affect the metabolic activity estimates: that is, marginal and multiple linear regression of enzymatic and metabolic pathway activity correlated significantly with the recorded environmental parameters. Our analysis statistically validates that EM-based methods produce meaningful results, as our method confirms statistically significant dependence of metabolic pathway activity on environmental parameters such as salinity, temperature, brightness, and a few others.
Affiliation(s)
- Roya Hosseini
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Bikram Sahoo
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Igor Mandric
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Frank Stewart
- Department of Microbiology and Immunology, Montana State University, Bozeman, Montana, USA
- Ion I. Măndoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, Connecticut, USA
- Bogdan Pasaniuc
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Yuri Porozov
- World-Class Research Center “Digital biodesign and personalized healthcare,” I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Department of Computational Biology, Sirius University of Science and Technology, Sochi, Russia
- Alexander Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- World-Class Research Center “Digital biodesign and personalized healthcare,” I.M. Sechenov First Moscow State Medical University, Moscow, Russia
|
41
|
Nguyen H, Tran D, Galazka JM, Costes SV, Beheshti A, Petereit J, Draghici S, Nguyen T. CPA: a web-based platform for consensus pathway analysis and interactive visualization. Nucleic Acids Res 2021; 49:W114-W124. [PMID: 34037798 PMCID: PMC8262702 DOI: 10.1093/nar/gkab421] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 04/16/2021] [Accepted: 05/05/2021] [Indexed: 01/06/2023]
Abstract
In molecular biology and genetics, there is a large gap between the ease of data collection and our ability to extract knowledge from these data. Contributing to this gap is the fact that living organisms are complex systems whose emerging phenotypes are the results of multiple complex interactions taking place on various pathways. This demands powerful yet user-friendly pathway analysis tools to translate the now abundant high-throughput data into a better understanding of the underlying biological phenomena. Here we introduce Consensus Pathway Analysis (CPA), a web-based platform that allows researchers to (i) perform pathway analysis using eight established methods (GSEA, GSA, FGSEA, PADOG, Impact Analysis, ORA/Webgestalt, KS-test, Wilcox-test), (ii) perform meta-analysis of multiple datasets, (iii) combine methods and datasets to accurately identify the impacted pathways underlying the studied condition and (iv) interactively explore impacted pathways, and browse relationships between pathways and genes. The platform supports three types of input: (i) a list of differentially expressed genes, (ii) genes and fold changes and (iii) an expression matrix. It also allows users to import data from NCBI GEO. The CPA platform currently supports the analysis of multiple organisms using KEGG and Gene Ontology, and it is freely available at http://cpa.tinnguyen-lab.com.
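One common way to turn several per-method p-values for a pathway into a single consensus value is Fisher's method, sketched below in Python. This is an illustrative assumption: the abstract does not say which combination rule CPA uses, and the function name and input values are hypothetical. Fisher's method also assumes independent p-values, which does not strictly hold when several methods are run on the same dataset.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For an even number of
    degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, which is what is evaluated here.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) for p-values reported as exactly zero.
    clipped = [min(max(p, 1e-300), 1.0) for p in p_values]
    half_stat = -sum(math.log(p) for p in clipped)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(clipped)):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for one pathway from three analysis methods.
combined = fisher_combine([0.04, 0.10, 0.02])
print(f"combined p-value: {combined:.4g}")
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the closed form.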
Affiliation(s)
- Hung Nguyen
- University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
- Duc Tran
- University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
- Jonathan M Galazka
- NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Sylvain V Costes
- NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Afshin Beheshti
- KBR, NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Juli Petereit
- University of Nevada Reno, Nevada Bioinformatics Center, Reno, NV 89557, USA
- Sorin Draghici
- Wayne State University, Department of Computer Science, Detroit, MI 48202, USA
- Tin Nguyen
- University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
|
42
|
Bu D, Luo H, Huo P, Wang Z, Zhang S, He Z, Wu Y, Zhao L, Liu J, Guo J, Fang S, Cao W, Yi L, Zhao Y, Kong L. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res 2021; 49:W317-W325. [PMID: 34086934 PMCID: PMC8265193 DOI: 10.1093/nar/gkab447] [Citation(s) in RCA: 726] [Impact Index Per Article: 242.0] [Received: 03/24/2021] [Revised: 04/24/2021] [Accepted: 05/09/2021] [Indexed: 12/20/2022]
Abstract
Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. ORA (overrepresentation analysis), FCS (functional class scoring), and PT (pathway topology) approaches are three generations of GSE methods along the timeline of development. Previous versions of KOBAS provided services based on just the ORA method. Here we present version 3.0 of KOBAS, which is named KOBAS-i (short for KOBAS intelligent version). It introduces a novel machine learning-based method we published earlier, CGPS, which incorporates seven FCS tools and two PT tools into a single ensemble score and intelligently prioritizes the relevant biological pathways. In addition, KOBAS has expanded the downstream exploratory visualization for selecting and understanding the enriched results. The tool constructs a novel view, cirFunMap, which presents different enriched terms and their correlations in a landscape. Finally, based on the previous version's framework, KOBAS increased the number of supported species from 1327 to 5944. For an easier local run, it also provides a prebuilt Docker image that requires no installation, as a supplement to the source code version. KOBAS can be freely accessed at http://kobas.cbi.pku.edu.cn, and a mirror site is available at http://bioinfo.org/kobas.
Affiliation(s)
- Peipei Huo
- Chinese Academy of Sciences, LuoYang Branch of Institute of Computing Technology, Luoyang, 471000, China
- Zhihao Wang
- Chinese Academy of Sciences, LuoYang Branch of Institute of Computing Technology, Luoyang, 471000, China
- Shan Zhang
- Chinese Academy of Sciences, LuoYang Branch of Institute of Computing Technology, Luoyang, 471000, China
- Zihao He
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Yang Wu
- Pervasive Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Lianhe Zhao
- Pervasive Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jingjia Liu
- Cancer Center, Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences, Zhejiang 315000, China
- Jincheng Guo
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Shuangsang Fang
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Wanchen Cao
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Lan Yi
- Pervasive Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Yi Zhao
- Correspondence may also be addressed to Yi Zhao. Tel: +86 010 62600822;
- Lei Kong
- To whom correspondence should be addressed. Tel: +86 010 62755206;
|
43
|
Xie C, Jauhari S, Mora A. Popularity and performance of bioinformatics software: the case of gene set analysis. BMC Bioinformatics 2021; 22:191. [PMID: 33858350 PMCID: PMC8050894 DOI: 10.1186/s12859-021-04124-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/23/2020] [Accepted: 04/08/2021] [Indexed: 11/22/2022]
Abstract
Background Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. The following paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since its inception. "Popularity" is estimated according to each paper's citation counts, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, as well as the consolidated results from the existing benchmark studies. Results Regarding popularity, data is collected into an online open database ("GSARefDB") which allows browsing bibliographic and method-descriptive information from 503 GSA paper references; regarding performance, we introduce a repository of jupyter workflows and shiny apps for automated benchmarking of GSA methods (“GSA-BenchmarKING”). After comparing popularity versus performance, results show discrepancies between the most popular and the best performing GSA methods. Conclusions The above-mentioned results call our attention towards the nature of the tool selection procedures followed by researchers and raise doubts regarding the quality of the functional interpretation of biological datasets in current biomedical studies. Suggestions for the future of the functional interpretation field are made, including strategies for education and discussion of GSA tools, better validation and benchmarking practices, reproducibility, and functional re-analysis of previously reported data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04124-5.
Affiliation(s)
- Chengshu Xie
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China
- Shaurya Jauhari
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China
- Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China.
|
44
|
Abstract
Perturbation of the normal function of cell signaling pathways often leads to disease. Precise identification and investigation of perturbed signaling pathways helps in understanding disease mechanisms. Pathway analysis methods have been developed to identify perturbed signaling pathways under given conditions. Among these methods, those that consider pathway topology in their analysis are referred to as topology-based methods. Most topology-based methods use simple graph-based models to incorporate topology, which have some limitations. We describe a new Pathway Analysis method using Petri net (PAPet) that uses a Petri net to model the signaling pathways and then propose an algorithm to measure the perturbation on a given pathway under a given condition. Modeling with Petri nets has some advantages and could overcome the shortcomings of the simple graph-based models. We illustrate the capabilities of the proposed method using sensitivity, prioritization, mean reciprocal rank, and false-positive rate metrics on 36 real datasets from various diseases. A comparison of PAPet with five pathway analysis methods (FoPA, PADOG, GSEA, CePa and SPIA) shows that PAPet offers the best compromise among all metrics. In addition, applying the methods to gene expression profiles from normal and pancreatic ductal adenocarcinoma (PDAC) samples shows that PAPet achieves the best rank in finding the pathways previously reported for PDAC. The PAPet method is available at https://github.com/fmansoori/PAPET.
|
45
|
Ietswaart R, Gyori BM, Bachman JA, Sorger PK, Churchman LS. GeneWalk identifies relevant gene functions for a biological context using network representation learning. Genome Biol 2021; 22:55. [PMID: 33526072 PMCID: PMC7852222 DOI: 10.1186/s13059-021-02264-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/15/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022]
Abstract
A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk (github.com/churchmanlab/genewalk), which identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.
Affiliation(s)
- Robert Ietswaart
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- John A Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Peter K Sorger
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- L Stirling Churchman
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA.
|
46
|
Griss J, Viteri G, Sidiropoulos K, Nguyen V, Fabregat A, Hermjakob H. ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis. Mol Cell Proteomics 2020; 19:2115-2125. [PMID: 32907876 PMCID: PMC7710148 DOI: 10.1074/mcp.tir120.002155] [Citation(s) in RCA: 128] [Impact Index Per Article: 32.0] [Received: 06/02/2020] [Revised: 07/28/2020] [Indexed: 01/27/2023]
Abstract
Pathway analyses are key methods to analyze 'omics experiments. Nevertheless, integrating data from different 'omics technologies and different species still requires considerable bioinformatics knowledge. Here we present the novel ReactomeGSA resource for comparative pathway analyses of multi-omics datasets. ReactomeGSA can be used through Reactome's existing web interface and the novel ReactomeGSA R Bioconductor package with explicit support for scRNA-seq data. Data from different species is automatically mapped to a common pathway space. Public data from ExpressionAtlas and Single Cell ExpressionAtlas can be directly integrated in the analysis. ReactomeGSA greatly reduces the technical barrier for multi-omics, cross-species, comparative pathway analyses. We used ReactomeGSA to characterize the role of B cells in anti-tumor immunity. We compared B cell rich and poor human cancer samples from five of the Cancer Genome Atlas (TCGA) transcriptomics and two of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteomics studies. B cell-rich lung adenocarcinoma samples lacked the otherwise present activation through NFkappaB. This may be linked to the presence of a specific subset of tumor associated IgG+ plasma cells that lack NFkappaB activation in scRNA-seq data from human melanoma. This showcases how ReactomeGSA can derive novel biomedical insights by integrating large multi-omics datasets.
Affiliation(s)
- Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom; Department of Dermatology, Medical University of Vienna, Vienna, Austria.
- Guilherme Viteri
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Konstantinos Sidiropoulos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Vy Nguyen
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Antonio Fabregat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom.
|
47
|
Kim W, Yoon SM, Kim S. A semi-automatic cell type annotation method for single-cell RNA sequencing dataset. Genomics Inform 2020; 18:e26. [PMID: 33017870 PMCID: PMC7560448 DOI: 10.5808/gi.2020.18.3.e26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2020] [Accepted: 03/27/2020] [Indexed: 11/21/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has been widely applied to provide insights into the cell-by-cell expression differences in a given bulk sample. Accordingly, numerous analysis methods have been developed. As it involves simultaneous analyses of many cells and genes, the efficiency of the methods is crucial. The conventional cell type annotation method is laborious and subjective. Here we propose a semi-automatic method that calculates a normalized score for each cell type based on a user-supplied cell type–specific marker gene list. The method was applied to publicly available scRNA-seq data from a mouse cardiac non-myocyte cell pool. Annotating the 35 t-distributed stochastic neighbor embedding clusters into 12 cell types was straightforward, and its accuracy was evaluated by constructing a co-expression network for each cell type. Gene Ontology analysis was congruent with the annotated cell type, and the corollary regulatory network analysis showed upstream transcription factors that are well supported by evidence in the literature. The source code is available as an R script upon request.
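The idea of scoring clusters against a user-supplied marker gene list can be sketched in Python as follows. The scoring rule is an assumption made for illustration (the mean z-score of a cell type's marker genes, with z-scores taken across clusters); the abstract does not give the authors' normalization, whose code is an R script, and the function name, cluster names and expression values below are hypothetical.

```python
from statistics import mean, pstdev


def annotate_clusters(cluster_expr, markers):
    """Assign each cluster the cell type with the highest normalized score.

    cluster_expr: {cluster: {gene: mean expression in that cluster}}
    markers:      {cell_type: [marker genes]} (each list must be non-empty)
    Assumption (not from the paper): the score of a cell type in a cluster
    is the mean z-score of its marker genes, z-scored across clusters.
    """
    clusters = list(cluster_expr)
    genes = {g for gene_list in markers.values() for g in gene_list}
    z = {c: {} for c in clusters}
    for g in genes:
        values = [cluster_expr[c].get(g, 0.0) for c in clusters]
        mu, sd = mean(values), pstdev(values)
        for c, v in zip(clusters, values):
            # A gene with no variation across clusters carries no signal.
            z[c][g] = (v - mu) / sd if sd > 0 else 0.0
    labels = {}
    for c in clusters:
        scores = {t: mean(z[c][g] for g in gs) for t, gs in markers.items()}
        labels[c] = max(scores, key=scores.get)
    return labels


# Hypothetical marker lists and cluster-level mean expression values.
toy_markers = {
    "fibroblast": ["Col1a1", "Pdgfra"],
    "endothelial": ["Pecam1", "Kdr"],
}
toy_expr = {
    "cluster0": {"Col1a1": 9.0, "Pdgfra": 7.0, "Pecam1": 1.0, "Kdr": 0.5},
    "cluster1": {"Col1a1": 0.5, "Pdgfra": 1.0, "Pecam1": 8.0, "Kdr": 6.0},
}

labels = annotate_clusters(toy_expr, toy_markers)
print(labels)
```

The method is semi-automatic in the sense that the marker lists still come from the user; only the scoring and assignment are mechanical.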
Affiliation(s)
- Wan Kim
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Korea
- Sung Min Yoon
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Korea
- Sangsoo Kim
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Korea
|
48
|
Bao Z, Zhang B, Li L, Ge Q, Gu W, Bai Y. Identifying disease-associated signaling pathways through a novel effector gene analysis. PeerJ 2020; 8:e9695. [PMID: 32864216 PMCID: PMC7430270 DOI: 10.7717/peerj.9695] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/19/2020] [Accepted: 07/20/2020] [Indexed: 12/21/2022]
Abstract
Background Signaling pathway analysis methods are commonly used to explain biological behaviors of disease cells. Effector genes typically decide functional attributes (associated with biological behaviors of disease cells) according to the abnormal signals they receive. The signals that the effector genes receive can be quite different in normal vs. disease conditions. However, most current signaling pathway analysis methods do not take these signal variations into consideration. Methods In this study, we developed a novel signaling pathway analysis method called the signaling pathway functional attributes analysis (SPFA) method. This method analyzes the variations in the signals that effector genes receive between two conditions (normal and disease) in different signaling pathways. Results We compared the SPFA method to seven other methods across 33 Gene Expression Omnibus datasets using three measurements: the median rank of target pathways, the median p-value of target pathways, and the percentages of significant pathways. The results confirmed that SPFA was the top-ranking method in terms of median rank of target pathways and the fourth best method in terms of median p-value of target pathways. SPFA’s percentage of significant pathways was modest, indicating a good false positive rate and false negative rate. Overall, SPFA was comparable to the other methods. Our results also suggested that the signal variations calculated by SPFA could help identify abnormal functional attributes and parts of pathways. The SPFA R code and functions can be accessed at https://github.com/ZhenshenBao/SPFA.
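The three benchmark measurements named in the Results can be computed along the following lines. This Python sketch rests on assumptions that the abstract does not spell out: each dataset has exactly one target pathway, ranks come from sorting pathways by ascending p-value, and the percentage of significant pathways is pooled over all datasets at a threshold of 0.05. The function name and all data are hypothetical.

```python
from statistics import median


def benchmark_summary(results, targets, alpha=0.05):
    """Summarize one pathway analysis method over several datasets.

    results: {dataset: {pathway: p_value}}
    targets: {dataset: name of the target pathway for that dataset}
    Returns the median rank and median p-value of the target pathways and
    the percentage of all tested pathways with p < alpha.
    """
    ranks, target_ps = [], []
    n_sig = n_total = 0
    for dataset, pvals in results.items():
        ordered = sorted(pvals, key=pvals.get)  # most significant first
        target = targets[dataset]
        ranks.append(ordered.index(target) + 1)
        target_ps.append(pvals[target])
        n_sig += sum(p < alpha for p in pvals.values())
        n_total += len(pvals)
    return {
        "median_rank": median(ranks),
        "median_p": median(target_ps),
        "pct_significant": 100.0 * n_sig / n_total,
    }


# Hypothetical p-values for three pathways in three datasets.
toy_results = {
    "d1": {"A": 0.001, "B": 0.2, "C": 0.6},
    "d2": {"A": 0.3, "B": 0.01, "C": 0.04},
    "d3": {"A": 0.5, "B": 0.02, "C": 0.7},
}
toy_targets = {"d1": "A", "d2": "C", "d3": "B"}

summary = benchmark_summary(toy_results, toy_targets)
print(summary)
```

A low median rank and a low median p-value for the target pathways indicate that a method places the expected pathway near the top of its list, while the percentage of significant pathways gives a rough view of how liberal the method is overall.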
Affiliation(s)
- Zhenshen Bao
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Bing Zhang
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Li Li
- Department of Respiratory Medicine, Zhongda Hospital, School of Medicine, Southeast University, Nanjing, Jiangsu, China
- Qinyu Ge
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Wanjun Gu
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Yunfei Bai
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
|
49
|
Saberian N, Shafi A, Peyvandipour A, Draghici S. MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature. Sci Rep 2020; 10:12365. [PMID: 32703994 PMCID: PMC7378213 DOI: 10.1038/s41598-020-68649-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/13/2019] [Accepted: 06/17/2020] [Indexed: 11/09/2022]
Abstract
In spite of the efforts in developing and maintaining accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curation of the biomedical literature in an effort to extract this information is a challenging task due to: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing the variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts the variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic abilities to predict the patients' clinical outcome on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer and prostate cancer. We compare these panels with other variant-driven gene panels obtained from Clinvar, Mastermind and others from literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
Affiliation(s)
- Nafiseh Saberian
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Adib Shafi
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Azam Peyvandipour
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, USA.
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA.
|
50
|
Soubeyrand S, Nikpay M, Lau P, Turner A, Hoang HD, Alain T, McPherson R. CARMAL Is a Long Non-coding RNA Locus That Regulates MFGE8 Expression. Front Genet 2020; 11:631. [PMID: 32625236 PMCID: PMC7311772 DOI: 10.3389/fgene.2020.00631] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2020] [Accepted: 05/26/2020] [Indexed: 12/27/2022]
Abstract
Genome-wide association studies have identified several genetic loci linked to coronary artery disease (CAD), most of them located in non-protein-coding regions of the genome. One such locus is the CAD Associated Region between MFGE8 and ABHD2 (CARMA), a ∼18 kb haplotype that was recently shown to regulate vicinal protein-coding genes. Here, we further investigate the region by examining a long non-coding RNA gene locus (CARMAL/RP11-326A19.4/AC013565) abutting the CARMA region. Expression-genotype correlation analyses of public databases indicate that CARMAL levels are influenced by CAD-associated variants, suggesting that it might have cardioprotective functions. We found CARMAL to be stably expressed at relatively low levels and enriched in the cytosol. CARMAL function was investigated by several gene targeting approaches in HEK293T cells: inactive CRISPR fusion proteins, antisense, overexpression and inactivation by CRISPR-mediated knock-out. Modest increases in CARMAL (3–4×) obtained via CRISPRa using distinct single-guide RNAs did not result in consistent transcriptome effects. By contrast, CARMAL deletion or reduced CARMAL expression via CRISPRi increased MFGE8 levels, suggesting that CARMAL contributes to reducing MFGE8 expression under basal conditions. While future investigations are required to clarify the mechanism(s) by which CARMAL acts on MFGE8, integrative bioinformatic analyses of the transcriptome of CARMAL-deleted cells suggest that this locus may also be involved in leucine metabolism, splicing, transcriptional regulation and Shwachman-Bodian-Diamond syndrome protein function.
Affiliation(s)
- Sébastien Soubeyrand
  - Atherogenomics Laboratory, University of Ottawa Heart Institute, Ottawa, ON, Canada
- Majid Nikpay
  - Ruddy Canadian Cardiovascular Genetics Centre, University of Ottawa Heart Institute, Ottawa, ON, Canada
- Paulina Lau
  - Atherogenomics Laboratory, University of Ottawa Heart Institute, Ottawa, ON, Canada
- Adam Turner
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States
- Huy-Dung Hoang
  - Children Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, ON, Canada
- Tommy Alain
  - Children Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, ON, Canada
- Ruth McPherson
  - Atherogenomics Laboratory, University of Ottawa Heart Institute, Ottawa, ON, Canada
  - Ruddy Canadian Cardiovascular Genetics Centre, University of Ottawa Heart Institute, Ottawa, ON, Canada
  - Department of Medicine, University of Ottawa Heart Institute, Ottawa, ON, Canada