1
Geistlinger L, Mirzayi C, Zohra F, Azhar R, Elsafoury S, Grieve C, Wokaty J, Gamboa-Tuz SD, Sengupta P, Hecht I, Ravikrishnan A, Gonçalves RS, Franzosa E, Raman K, Carey V, Dowd JB, Jones HE, Davis S, Segata N, Huttenhower C, Waldron L. BugSigDB captures patterns of differential abundance across a broad range of host-associated microbial signatures. Nat Biotechnol 2024; 42:790-802. [PMID: 37697152 PMCID: PMC11098749 DOI: 10.1038/s41587-023-01872-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 06/20/2023] [Indexed: 09/13/2023]
Abstract
The literature of human and other host-associated microbiome studies is expanding rapidly, but systematic comparisons among published results of host-associated microbiome signatures of differential abundance remain difficult. We present BugSigDB, a community-editable database of manually curated microbial signatures from published differential abundance studies accompanied by information on study geography, health outcomes, host body site and experimental, epidemiological and statistical methods using controlled vocabulary. The initial release of the database contains >2,500 manually curated signatures from >600 published studies on three host species, enabling high-throughput analysis of signature similarity, taxon enrichment, co-occurrence and coexclusion and consensus signatures. These data allow assessment of microbiome differential abundance within and across experimental conditions, environments or body sites. Database-wide analysis reveals experimental conditions with the highest level of consistency in signatures reported by independent studies and identifies commonalities among disease-associated signatures, including frequent introgression of oral pathobionts into the gut.
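As a rough illustration of what "signature similarity" can mean for sets of taxa, the sketch below computes a Jaccard index between two signatures. The abstract does not say which similarity measure is used, so the choice of Jaccard, the function name `jaccard`, and the example taxa are all assumptions made for illustration, not BugSigDB records or its API.

```python
def jaccard(sig_a, sig_b):
    """Jaccard index of two taxon sets: |A & B| / |A | B|."""
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        # Convention chosen here: two empty signatures share nothing.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical signatures (illustrative taxa only).
sig_1 = {"Fusobacterium", "Prevotella", "Streptococcus"}
sig_2 = {"Fusobacterium", "Streptococcus", "Veillonella", "Bacteroides"}

# Two shared taxa out of five distinct taxa overall.
similarity = jaccard(sig_1, sig_2)
print(similarity)
```

A set-based measure like this ignores effect direction and taxonomic relatedness; a real analysis would likely treat increased and decreased signatures separately.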
Affiliation(s)
- Ludwig Geistlinger
  - Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Chloe Mirzayi
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Fatima Zohra
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Rimsha Azhar
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Shaimaa Elsafoury
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Clare Grieve
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Jennifer Wokaty
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Samuel David Gamboa-Tuz
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Pratyay Sengupta
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
- Aarthi Ravikrishnan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Rafael S Gonçalves
  - Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Eric Franzosa
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  - Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Karthik Raman
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology (IIT) Madras, Chennai, India
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
- Vincent Carey
  - Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Jennifer B Dowd
  - Leverhulme Centre for Demographic Science, University of Oxford, Oxford, UK
- Heidi E Jones
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Sean Davis
  - Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz School of Medicine, Denver, CO, USA
- Nicola Segata
  - Department CIBIO, University of Trento, Trento, Italy
  - Istituto Europeo di Oncologia (IEO) IRCSS, Milan, Italy
- Curtis Huttenhower
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  - Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Levi Waldron
  - Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
  - Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
  - Department CIBIO, University of Trento, Trento, Italy
2
Isik FB, Knight HM, Rajkumar AP. Extracellular vesicle microRNA-mediated transcriptional regulation may contribute to dementia with Lewy bodies molecular pathology. Acta Neuropsychiatr 2024; 36:29-38. [PMID: 37339939 DOI: 10.1017/neu.2023.27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2023]
Abstract
OBJECTIVE Dementia with Lewy bodies (DLB) is the second most common dementia. Advancing our limited understanding of its molecular pathogenesis is essential for identifying novel biomarkers and therapeutic targets for DLB. DLB is an α-synucleinopathy, and small extracellular vesicles (SEV) from people with DLB can transmit α-synuclein oligomerisation between cells. Post-mortem DLB brains and serum SEV from those with DLB share common miRNA signatures, and their functional implications are uncertain. Hence, we aimed to investigate potential targets of DLB-associated SEV miRNA and to analyse their functional implications. METHODS We identified potential targets of six previously reported differentially expressed miRNA genes in serum SEV of people with DLB (MIR26A1, MIR320C2, MIR320D2, MIR548BA, MIR556, and MIR4722) using miRBase and miRDB databases. We analysed functional implications of these targets using EnrichR gene set enrichment analysis and analysed their protein interactions using Reactome pathway analysis. RESULTS These SEV miRNA may regulate 4278 genes that were significantly enriched among the genes involved in neuronal development, cell-to-cell communication, vesicle-mediated transport, apoptosis, regulation of cell cycle, post-translational protein modifications, and autophagy lysosomal pathway, after Benjamini-Hochberg false discovery rate correction at 5%. The miRNA target genes and their protein interactions were significantly associated with several neuropsychiatric disorders and with multiple signal transduction, transcriptional regulation, and cytokine signalling pathways. CONCLUSION Our findings provide in-silico evidence that potential targets of DLB-associated SEV miRNAs may contribute to Lewy pathology by transcriptional regulation. Experimental validation of these dysfunctional pathways is warranted and could lead to novel therapeutic avenues for DLB.
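The Benjamini-Hochberg false discovery rate correction at 5% mentioned above can be sketched as follows. This is a minimal illustration of the standard step-up procedure, not the authors' pipeline (they report using EnrichR); the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four gene sets (illustrative only).
example_p = [0.039, 0.001, 0.216, 0.008]
print(benjamini_hochberg(example_p))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.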
Affiliation(s)
- Fatma Busra Isik
  - School of Life Science, Queen's Medical Centre, University of Nottingham, Nottingham, UK
- Helen Miranda Knight
  - School of Life Science, Queen's Medical Centre, University of Nottingham, Nottingham, UK
- Anto P Rajkumar
  - Institute of Mental Health, Mental Health and Clinical Neurosciences Academic Unit, University of Nottingham, Nottingham, UK
  - Mental Health Services for Older People, Nottinghamshire Healthcare NHS Foundation Trust, Nottingham, UK
3
Marcoux P, Hwang JW, Desterke C, Imeri J, Bennaceur-Griscelli A, Turhan AG. Modeling RET-Rearranged Non-Small Cell Lung Cancer (NSCLC): Generation of Lung Progenitor Cells (LPCs) from Patient-Derived Induced Pluripotent Stem Cells (iPSCs). Cells 2023; 12:2847. [PMID: 38132167 PMCID: PMC10742233 DOI: 10.3390/cells12242847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 12/03/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
REarranged during Transfection (RET) oncogenic rearrangements can occur in 1-2% of lung adenocarcinomas. While RET-driven NSCLC models have been developed using various approaches, no model based on patient-derived induced pluripotent stem cells (iPSCs) has yet been described. Patient-derived iPSCs hold great promise for disease modeling and drug screening. However, generating iPSCs with specific oncogenic drivers, like RET rearrangements, presents challenges due to reprogramming efficiency and genotypic variability within tumors. To address this issue, we aimed to generate lung progenitor cells (LPCs) from patient-derived iPSCs carrying the mutation RET C634Y, commonly associated with medullary thyroid carcinoma. Additionally, we established a RET C634Y knock-in iPSC model to validate the effect of this oncogenic mutation during LPC differentiation. We successfully generated LPCs from RET C634Y iPSCs using a 16-day protocol and detected an overexpression of cancer-associated markers as compared to control iPSCs. Transcriptomic analysis revealed a distinct signature of NSCLC tumor repression, suggesting a multilineage lung dedifferentiation, along with an upregulated signature associated with the RET C634Y mutation, potentially linked to poor NSCLC prognosis. These findings were validated using the RET C634Y knock-in iPSC model, highlighting key cancerous targets such as PROM2 and C1QTNF6, known to be associated with poor prognostic outcomes. Furthermore, the LPCs derived from RET C634Y iPSCs exhibited a positive response to the RET inhibitor pralsetinib, evidenced by the downregulation of the cancer markers. This study provides a novel patient-derived off-the-shelf iPSC model of RET-driven NSCLC, paving the way for exploring the molecular mechanisms involved in RET-driven NSCLC to study disease progression and to uncover potential therapeutic targets.
Affiliation(s)
- Paul Marcoux
  - INSERM UMR-S-1310, Université Paris Saclay, 94800 Villejuif, France
  - Faculty of Medicine, Paris-Saclay University, 94270 Le Kremlin Bicetre, France
- Jin Wook Hwang
  - INSERM UMR-S-1310, Université Paris Saclay, 94800 Villejuif, France
  - Faculty of Medicine, Paris-Saclay University, 94270 Le Kremlin Bicetre, France
- Christophe Desterke
  - INSERM UMR-S-1310, Université Paris Saclay, 94800 Villejuif, France
  - Faculty of Medicine, Paris-Saclay University, 94270 Le Kremlin Bicetre, France
- Jusuf Imeri
  - INSERM UMR-S-1310, Université Paris Saclay, 94800 Villejuif, France
  - Faculty of Medicine, Paris-Saclay University, 94270 Le Kremlin Bicetre, France
- Annelise Bennaceur-Griscelli
  - INSERM UMR-S-1310, Université Paris Saclay, 94800 Villejuif, France
  - Faculty of Medicine, Paris-Saclay University, 94270 Le Kremlin Bicetre, France
  - APHP Paris Saclay, Department of Hematology, Hôpital Bicêtre, 94270 Le Kremlin Bicetre, France
  - Center for IPSC Therapies, CITHERA, INSERM UMS-45, Genopole Campus, 91100 Evry, France
  - APHP Paris Saclay, Department of Hematology, Hôpital Paul Brousse, 94800 Villejuif, France
- Ali G. Turhan
  - INSERM UMR-S-1310, Université Paris Saclay, 94800 Villejuif, France
  - Faculty of Medicine, Paris-Saclay University, 94270 Le Kremlin Bicetre, France
  - APHP Paris Saclay, Department of Hematology, Hôpital Bicêtre, 94270 Le Kremlin Bicetre, France
  - Center for IPSC Therapies, CITHERA, INSERM UMS-45, Genopole Campus, 91100 Evry, France
  - APHP Paris Saclay, Department of Hematology, Hôpital Paul Brousse, 94800 Villejuif, France
4
Bayraktar A, Li X, Kim W, Zhang C, Turkez H, Shoaie S, Mardinoglu A. Drug repositioning targeting glutaminase reveals drug candidates for the treatment of Alzheimer's disease patients. J Transl Med 2023; 21:332. [PMID: 37210557 DOI: 10.1186/s12967-023-04192-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/26/2023] [Accepted: 05/11/2023] [Indexed: 05/22/2023] [Open Access]
Abstract
BACKGROUND Despite numerous clinical trials and decades of endeavour, there is still no effective cure for Alzheimer's disease. Computational drug repositioning approaches may be employed for the development of new treatment strategies for Alzheimer's patients since an extensive amount of omics data has been generated during pre-clinical and clinical studies. However, targeting the most critical pathophysiological mechanisms and determining drugs with proper pharmacodynamics and good efficacy are equally crucial in drug repurposing and often imbalanced in Alzheimer's studies. METHODS Here, we investigated central co-expressed genes upregulated in Alzheimer's disease to determine a proper therapeutic target. We backed our reasoning by checking the target gene's estimated non-essentiality for survival in multiple human tissues. We screened transcriptome profiles of various human cell lines perturbed by drug induction (for 6798 compounds) and gene knockout using data available in the Connectivity Map database. Then, we applied a profile-based drug repositioning approach to discover drugs targeting the target gene based on the correlations between these transcriptome profiles. We evaluated the bioavailability, functional enrichment profiles and drug-protein interactions of these repurposed agents and evidenced their cellular viability and efficacy in glial cell culture by experimental assays and Western blotting. Finally, we evaluated their pharmacokinetics to anticipate to which degree their efficacy can be improved. RESULTS We identified glutaminase as a promising drug target. Glutaminase overexpression may fuel the glutamate excitotoxicity in neurons, leading to mitochondrial dysfunction and other neurodegeneration hallmark processes. The computational drug repurposing revealed eight drugs: mitoxantrone, bortezomib, parbendazole, crizotinib, withaferin-a, SA-25547 and two unstudied compounds. We demonstrated that the proposed drugs could effectively suppress glutaminase and reduce glutamate production in the diseased brain through multiple neurodegeneration-associated mechanisms, including cytoskeleton and proteostasis. We also estimated the human blood-brain barrier permeability of parbendazole and SA-25547 using the SwissADME tool. CONCLUSIONS This study method effectively identified an Alzheimer's disease marker and compounds targeting the marker and interconnected biological processes by use of multiple computational approaches. Our results highlight the importance of synaptic glutamate signalling in Alzheimer's disease progression. We suggest repurposable drugs (like parbendazole) with well-evidenced activities that we linked to glutamate synthesis hereby and novel molecules (SA-25547) with estimated mechanisms for the treatment of Alzheimer's patients.
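The profile-based repositioning step, which ranks drugs by how well their induced transcriptome profiles correlate with a gene-knockout profile, might look roughly like the sketch below. The abstract does not specify the correlation measure or data layout, so the use of Pearson correlation, the names `pearson` and `rank_drugs`, and the toy profiles are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def rank_drugs(knockout_profile, drug_profiles):
    """Sort drugs by similarity of their profile to the knockout profile,
    most similar first."""
    scores = {name: pearson(profile, knockout_profile)
              for name, profile in drug_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression changes over five genes (illustrative only).
knockout = [-2.0, -1.0, 0.0, 1.0, 2.0]
drugs = {
    "drug_A": [-4.0, -2.0, 0.0, 2.0, 4.0],   # mimics the knockout
    "drug_B": [2.0, 1.0, 0.0, -1.0, -2.0],   # opposes the knockout
    "drug_C": [1.0, -1.0, 0.0, -1.0, 1.0],   # unrelated pattern
}
ranked = rank_drugs(knockout, drugs)
print(ranked)
```

Under this assumption, a drug whose profile correlates strongly with the knockout profile is a candidate inhibitor of the knocked-out gene's product.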
Affiliation(s)
- Abdulahad Bayraktar
  - Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King's College London, London, SE1 9RT, UK
- Xiangyu Li
  - Bash Biotech Inc, 600 West Broadway, Suite 700, San Diego, CA, 92101, USA
  - Science for Life Laboratory, KTH-Royal Institute of Technology, SE-17121, Stockholm, Sweden
- Woonghee Kim
  - Science for Life Laboratory, KTH-Royal Institute of Technology, SE-17121, Stockholm, Sweden
- Cheng Zhang
  - Science for Life Laboratory, KTH-Royal Institute of Technology, SE-17121, Stockholm, Sweden
- Hasan Turkez
  - Department of Medical Biology, Faculty of Medicine, Atatürk University, Erzurum, Turkey
- Saeed Shoaie
  - Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King's College London, London, SE1 9RT, UK
- Adil Mardinoglu
  - Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King's College London, London, SE1 9RT, UK
  - Science for Life Laboratory, KTH-Royal Institute of Technology, SE-17121, Stockholm, Sweden
5
Garg T, Weiss CR, Sheth RA. Techniques for Profiling the Cellular Immune Response and Their Implications for Interventional Oncology. Cancers (Basel) 2022; 14:3628. [PMID: 35892890 PMCID: PMC9332307 DOI: 10.3390/cancers14153628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 12/07/2022] [Open Access]
Abstract
In recent years there has been increased interest in using the immune contexture of the primary tumors to predict the patient's prognosis. The tumor microenvironment of patients with cancers consists of different types of lymphocytes, tumor-infiltrating leukocytes, dendritic cells, and others. Different technologies can be used for the evaluation of the tumor microenvironment, all of which require a tissue or cell sample. Image-guided tissue sampling is a cornerstone in the diagnosis, stratification, and longitudinal evaluation of therapeutic efficacy for cancer patients receiving immunotherapies. Therefore, interventional radiologists (IRs) play an essential role in the evaluation of patients treated with systemically administered immunotherapies. This review provides a detailed description of different technologies used for immune assessment and analysis of the data collected from the use of these technologies. The detailed approach provided herein is intended to provide the reader with the knowledge necessary to not only interpret studies containing such data but also design and apply these tools for clinical practice and future research studies.
Affiliation(s)
- Tushar Garg
  - Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Clifford R. Weiss
  - Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rahul A. Sheth
  - Department of Interventional Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
6
Hephzibah Cathryn R, Udhaya Kumar S, Younes S, Zayed H, George Priya Doss C. A review of bioinformatics tools and web servers in different microarray platforms used in cancer research. Advances in Protein Chemistry and Structural Biology 2022; 131:85-164. [PMID: 35871897 DOI: 10.1016/bs.apcsb.2022.05.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 01/26/2023]
Abstract
Over the past decade, conventional lab work strategies have gradually shifted from being limited to a laboratory setting towards a bioinformatics era to help manage and process the vast amounts of data generated by omics technologies. The present work outlines the latest contributions of bioinformatics in analyzing microarray data and their application to cancer. We dissect different microarray platforms and their use in gene expression in cancer models. We highlight how computational advances empowered the microarray technology in gene expression analysis. The study on protein-protein interaction databases classified into primary, derived, meta-database, and prediction databases describes the strategies to curate and predict novel interaction networks in silico. In addition, we summarize the areas of bioinformatics where neural graph networks are currently being used, such as protein functions, protein interaction prediction, and in silico drug discovery and development. We also discuss the role of deep learning as a potential tool in the prognosis, diagnosis, and treatment of cancer. Integrating these resources efficiently, practically, and ethically is likely to be the most challenging task for the healthcare industry over the next decade; however, we believe that it is achievable in the long term.
Affiliation(s)
- R Hephzibah Cathryn
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- S Udhaya Kumar
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- Salma Younes
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, QU Health, Doha, Qatar
- Hatem Zayed
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, QU Health, Doha, Qatar
- C George Priya Doss
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
7
Li L, Liu R, Peng C, Chen X, Li J. Pharmacogenomics for the efficacy and side effects of antihistamines. Exp Dermatol 2022; 31:993-1004. [PMID: 35538735 DOI: 10.1111/exd.14602] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/23/2022] [Revised: 05/01/2022] [Accepted: 05/09/2022] [Indexed: 11/27/2022]
Abstract
Antihistamines, especially H1 antihistamines, are widely used in the treatment of allergic diseases such as urticaria and allergic rhinitis, mainly for reversing elevated histamine and anti-allergic effects. Antihistamines are generally safe, but some patients experience adverse reactions, such as cardiotoxicity, central inhibition, and anticholinergic effects. There are also individual differences in antihistamine efficacy in clinical practice. The concept of individualized medicine has been deeply rooted in people's minds since it was put forward. Pharmacogenomics is the study of the role of inheritance in individual variations in drug response. In recent decades, pharmacogenomics has been developing rapidly, which provides new ideas for individualized medicine. Polymorphisms in the genes encoding metabolic enzymes, transporters, and target receptors have been shown to affect the efficacy of antihistamines. In addition, recent evidence suggests that gene polymorphisms influence urticaria susceptibility and antihistamine therapy. Here, we summarize current reports in this area, aiming to contribute to future research in antihistamines and clinical guidance for antihistamines use in individualized medicine.
Affiliation(s)
- Liqiao Li
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China
  - Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Runqiu Liu
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China
  - Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Cong Peng
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China
  - Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Xiang Chen
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China
  - Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Jie Li
  - Department of Dermatology, Xiangya Hospital, Central South University, Changsha, China
  - Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
8
Kan J, Hu Y, Ge Y, Zhang W, Lu S, Zhao C, Zhang R, Liu Y. Declined expressions of vast mitochondria-related genes represented by CYCS and transcription factor ESRRA in skeletal muscle aging. Bioengineered 2021; 12:3485-3502. [PMID: 34229541 PMCID: PMC8806411 DOI: 10.1080/21655979.2021.1948951] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/19/2021] [Revised: 06/22/2021] [Accepted: 06/23/2021] [Indexed: 11/23/2022] [Open Access]
Abstract
Age-related skeletal muscle deterioration (sarcopenia) has a significant effect on the elderly's health and quality of life, but the molecular and gene regulatory mechanisms remain largely unknown. It is necessary to identify candidate genes related to skeletal muscle aging and prospective therapeutic targets for effective treatments. Age-line-related genes (ALRGs) and age-line-related transcripts (ALRTs) were investigated using the gene expression profiles GSE47881 and GSE118825 from the Gene Expression Omnibus (GEO) database. Protein-protein interaction (PPI) networks were constructed with Cytoscape to identify the key molecules, and Gene Set Enrichment Analysis (GSEA) was used to clarify the potential molecular functions. Two hub molecules were finally obtained and verified with quantitative real-time PCR (qRT-PCR). The results showed that the expression of mitochondrial genes involved in mitochondrial electron transport, complex assembly of the respiratory chain, tricarboxylic acid cycle, oxidative phosphorylation, and ATP synthesis was down-regulated in skeletal muscle with aging. We further identified a primary hub gene, CYCS (Cytochrome C), and a key transcription factor, ESRRA (Estrogen-related Receptor Alpha), as closely associated with skeletal muscle aging. PCR analysis confirmed that the expression of CYCS and ESRRA in gastrocnemius muscles of mice of different ages differed significantly and decreased gradually with age. In conclusion, the main cause of skeletal muscle aging may be the systematically reduced expression of mitochondrial functional genes. CYCS and ESRRA may play significant roles in the progression of skeletal muscle aging and serve as potential biomarkers for future diagnosis and treatment.
Affiliation(s)
- Jingbao Kan
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yifang Hu
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yaoqi Ge
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- WenSong Zhang
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Shan Lu
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Cuiping Zhao
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Rihua Zhang
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yun Liu
  - Department of Geriatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
9
Zhu D, Ma N, Chen L, Huang J, Zhong X. Verification of the role of spiperone in the treatment of COPD through bioinformatics analysis. Int Immunopharmacol 2021; 101:108308. [PMID: 34741870 DOI: 10.1016/j.intimp.2021.108308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Revised: 10/13/2021] [Accepted: 10/22/2021] [Indexed: 11/17/2022]
Abstract
BACKGROUND This study investigates the influence of spiperone on the regulation of hydrolase activity pathway in chronic obstructive pulmonary disease (COPD). PATIENTS AND METHODS Differentially expressed genes (DEGs) were identified with the limma package from microarray dataset GSE20257 and analysed by gene set enrichment analysis (GSEA) to identify COPD-related pathways. Drugs related to the regulation of hydrolase activity pathway were predicted by Connectivity Map (CMap) analysis. Western blotting and reverse transcription quantitative polymerase chain reaction (RT-qPCR) were used to investigate the effect of spiperone on the regulation of hydrolase activity pathway in vitro. RESULTS A total of 378 DEGs were identified by the limma package. GSEA suggested that the regulation of hydrolase activity pathway was involved in the development of COPD. CMap analysis of the hub genes of the regulation of hydrolase activity pathway showed that the most significant compound was spiperone. In vitro experiments verified that cigarette smoke extract (CSE) increased the expression of fibronectin 1 (FN1) and epidermal growth factor (EGF) and decreased the expression of chemokine (C-X3-C motif) ligand 1 (CX3CL1), chemokine (C-C motif) ligand 20 (CCL20), complement component 3 (C3) and slit homolog 2 (SLIT2) in BEAS-2B cells and U937 cells. Spiperone reversed the effect of CSE in BEAS-2B cells and U937 cells. CONCLUSION The regulation of hydrolase activity pathway was involved in the occurrence of COPD, and spiperone is a potential drug for the treatment of COPD that acts on this pathway. This study provides new insights into the potential pathogenesis and treatment of COPD.
Affiliation(s)
- Donglan Zhu
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Guangxi Medical University, No 6 Shuangyong Road, Nanning, Guangxi 530021, China
- Nan Ma
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Guangxi Medical University, No 6 Shuangyong Road, Nanning, Guangxi 530021, China
- Lin Chen
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Guangxi Medical University, No 6 Shuangyong Road, Nanning, Guangxi 530021, China
- Jinfu Huang
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Guangxi Medical University, No 6 Shuangyong Road, Nanning, Guangxi 530021, China
- Xiaoning Zhong
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Guangxi Medical University, No 6 Shuangyong Road, Nanning, Guangxi 530021, China.
10
|
Li L, Jing Q, Yan S, Liu X, Sun Y, Zhu D, Wang D, Hao C, Xue D. Amadis: A Comprehensive Database for Association Between Microbiota and Disease. Front Physiol 2021; 12:697059. [PMID: 34335304 PMCID: PMC8317061 DOI: 10.3389/fphys.2021.697059] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/18/2021] [Accepted: 06/22/2021] [Indexed: 12/18/2022] Open
Abstract
The human gastrointestinal tract represents a symbiotic bioreactor that can mediate the interaction of the human host. The deployment and integration of multi-omics technologies have depicted a more complete image of the functions performed by microbial organisms. In addition, a large amount of data has been generated in a short time. However, researchers struggling to keep track of these mountains of information need a way to conveniently gain a comprehensive understanding of the relationship between microbiota and human diseases. To tackle this issue, we developed Amadis (http://gift2disease.net/GIFTED), a manually curated database that provides experimentally supported microbiota-disease associations and a dynamic network construction method. The current version of the Amadis database documents 20167 associations between 221 human diseases and 774 gut microbes across 17 species, curated from more than 1000 articles. By using the curated data, users can freely select and combine modules to obtain a specific microbe-based human disease network. Additionally, Amadis provides a user-friendly interface for browsing, searching and downloading. We hope it can serve as a useful and valuable resource for researchers exploring the associations between gastrointestinal microbiota and human diseases.
Affiliation(s)
- Long Li
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Qingxu Jing
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Sen Yan
  - Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Xuxu Liu
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yuanyuan Sun
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Defu Zhu
  - Family Medicine General Practice Clinic, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Dawei Wang
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenjun Hao
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Dongbo Xue
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China; Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
11
|
Uzunangelov V, Wong CK, Stuart JM. Accurate cancer phenotype prediction with AKLIMATE, a stacked kernel learner integrating multimodal genomic data and pathway knowledge. PLoS Comput Biol 2021; 17:e1008878. [PMID: 33861732 PMCID: PMC8081343 DOI: 10.1371/journal.pcbi.1008878] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2020] [Revised: 04/28/2021] [Accepted: 03/15/2021] [Indexed: 02/03/2023] Open
Abstract
Advancements in sequencing have led to the proliferation of multi-omic profiles of human cells under different conditions and perturbations. In addition, many databases have amassed information about pathways and gene "signatures" (patterns of gene expression associated with specific cellular and phenotypic contexts). An important current challenge in systems biology is to leverage such knowledge about gene coordination to maximize the predictive power and generalization of models applied to high-throughput datasets. However, few such integrative approaches exist that also provide interpretable results quantifying the importance of individual genes and pathways to model accuracy. We introduce AKLIMATE, a first kernel-based stacked learner that seamlessly incorporates multi-omics feature data with prior information in the form of pathways for either regression or classification tasks. AKLIMATE uses a novel multiple-kernel learning framework where individual kernels capture the prediction propensities recorded in random forests, each built from a specific pathway gene set that integrates all omics data for its member genes. AKLIMATE has comparable or improved performance relative to state-of-the-art methods on diverse phenotype learning tasks, including predicting microsatellite instability in endometrial and colorectal cancer, survival in breast cancer, and cell line response to gene knockdowns. We show how AKLIMATE is able to connect feature data across data platforms through their common pathways to identify examples of several known and novel contributors to cancer and synthetic lethality.
Affiliation(s)
- Vladislav Uzunangelov
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, United States of America
- Christopher K. Wong
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, United States of America
- Joshua M. Stuart
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, United States of America
12
|
Nguyen T, Zhang T, Fox G, Zeng S, Cao N, Pan C, Chen JY. Linking clinotypes to phenotypes and genotypes from laboratory test results in comprehensive physical exams. BMC Med Inform Decis Mak 2021; 21:51. [PMID: 33627109 PMCID: PMC7903607 DOI: 10.1186/s12911-021-01387-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In this work, we aimed to demonstrate how to use lab test results and other clinical information to support precision medicine research and clinical decisions on complex diseases, with the support of electronic medical record facilities. We defined "clinotypes" as clinical information that could be observed and measured objectively using biomedical instruments. Starting from well-known 'omic' problem definitions, we defined problems using clinotype information, including stratifying patients (identifying sub-cohorts of interest for future studies), mining significant associations between clinotypes and specific phenotypes (diseases), and discovering potential linkages between clinotype and genomic information. We solved these problems by integrating public omic databases and applying advanced machine learning and visual analytic techniques to two-year health exam records from a large population of healthy southern Chinese individuals (n = 91,354). When developing the solution, we carefully addressed missing information, class imbalance and non-uniform data annotation. RESULTS We organized the techniques and solutions that address the problems and issues above into the CPA framework (Clinotype Prediction and Association-finding). At the data preprocessing step, we handled the missing value issue with a prediction accuracy of 0.760. We curated 12,635 clinotype-gene associations. We found 147 associations between 147 chronic disease phenotypes and clinotypes, which improved the disease predictive performance to an average AUC of 0.967. We mined 182 significant clinotype-clinotype associations among 69 clinotypes. CONCLUSIONS Our results showed strong potential connectivity between omics information and clinical lab test information. The results further emphasized the need to use and integrate clinical information, especially lab test results, in future PheWAS and omic studies. Furthermore, they showed that clinotype information could initiate an alternative research direction and serve as an independent field of data to support the well-known 'phenome' and 'genome' research.
Affiliation(s)
- Thanh Nguyen
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Tongbin Zhang
  - School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
  - Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Zhejiang, China
- Geoffrey Fox
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Sisi Zeng
  - School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
- Ni Cao
  - School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
- Chuandi Pan
  - School of First Clinical Medical Sciences - School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
  - Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Zhejiang, China
- Jake Y Chen
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
13
|
Zhang J, Yan S, Li R, Wang G, Kang S, Wang Y, Hou W, Wang C, Tian W. CRMarker: A manually curated comprehensive resource of cancer RNA markers. Int J Biol Macromol 2021; 174:263-269. [PMID: 33529633 DOI: 10.1016/j.ijbiomac.2021.01.186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2020] [Revised: 01/06/2021] [Accepted: 01/28/2021] [Indexed: 01/08/2023]
Abstract
Biomolecular markers have extremely important value for cancer research and treatment. However, as far as we know, there are still no searchable and predictable resources focusing on multiple classes of RNA molecular markers in cancers. Herein, we developed CRMarker, a manually curated comprehensive repository of cancer RNA markers. In the current release, CRMarker v1.1 consists of 5489 "known" cancer RNA markers based on 8756 valid publications in PubMed, including 2878 mRNAs (genes), 1314 miRNAs, 1097 lncRNAs and 200 circRNAs, and involving two functional molecules (diagnosis and prognosis), 21 organisms and 154 cancers. The search results provided by the database are comprehensive, including 11 items such as RNA molecule expression and risk level, type of tissue or sample, cancer subtype, reference type, etc. Moreover, CRMarker also provides more than 18,000 potential cancer RNA markers, which are predicted based on "guilt-by-association" analysis of the above-mentioned "known" RNA markers and three molecular interaction networks, and survival analysis of 18 gene expression data sets with survival data. CRMarker v1.1 has a friendly interface and is freely available online at http://crmarker.hnnu.edu.cn/. We aim to build a comprehensive platform that is convenient for cancer researchers and clinicians to inquire and retrieve.
Affiliation(s)
- Jifeng Zhang
  - School of Biological Engineering, Huainan Normal University, Huainan 232001, PR China; Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, PR China; Key Laboratory of Industrial Dust Prevention and Control & Occupational Health and Safety, Ministry of Education, Huainan, PR China; Anhui Shanhe Pharmaceutical Excipients Co., Ltd., Huainan, PR China.
- Shoubao Yan
  - School of Biological Engineering, Huainan Normal University, Huainan 232001, PR China
- Ruoyu Li
  - School of Computer Science, Huainan Normal University, Huainan, PR China
- Gangyuan Wang
  - School of Biological Engineering, Huainan Normal University, Huainan 232001, PR China
- Siyong Kang
  - School of Computer Science, Huainan Normal University, Huainan, PR China
- Ying Wang
  - School of Biological Engineering, Huainan Normal University, Huainan 232001, PR China
- Wenmin Hou
  - School of Biological Engineering, Huainan Normal University, Huainan 232001, PR China
- Chenrun Wang
  - School of Biological Engineering, Huainan Normal University, Huainan 232001, PR China
- Weidong Tian
  - Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, PR China.
14
|
Vittrant B, Leclercq M, Martin-Magniette ML, Collins C, Bergeron A, Fradet Y, Droit A. Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer. Front Genet 2020; 11:550894. [PMID: 33324443 PMCID: PMC7723980 DOI: 10.3389/fgene.2020.550894] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/05/2020] [Accepted: 10/29/2020] [Indexed: 01/31/2023] Open
Abstract
Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, the clinical risk-stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage and prostate specific antigen (PSA) levels. Transcriptomic data have the potential to enable the development of more precise approaches to predict evolution of the disease. However, high quality RNA sequencing (RNA-seq) datasets with clinical data and follow-up long enough to allow discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effect and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-Seq datasets totaling 171 PCa patients. Data were re-analyzed using a single pipeline to ensure uniformity. Using a machine learning approach, a total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we have identified a signature composed of only three genes (JUN, HES4, PPDPF) predicting BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently in use to predict PCa evolution. This score is in the range of the studies that predicted BCR in a single cohort with a higher number of patients. We showed that it is possible to merge and analyze different small and heterogeneous datasets together to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility of combining several small datasets into one larger dataset to identify a predictive genomic signature that would benefit PCa patients.
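The abstract above reports both accuracy and balanced error rate (BER). As a reading aid, here is a minimal sketch of how the two differ, using the standard definition of BER as the mean of the per-class error rates. The labels are hypothetical toy data, not the study's data, and the function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates, so each class counts equally."""
    per_class_errors = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class_errors.append(wrong / len(idx))
    return sum(per_class_errors) / len(per_class_errors)


# Hypothetical toy labels (assumption): 1 = recurrence, 0 = no recurrence.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)             # 9 of 10 correct
ber = balanced_error_rate(y_true, y_pred)  # (0/8 + 1/2) / 2
print(acc, ber)
```

In this toy case accuracy is 0.9 while BER is 0.25, because half of the minority class is misclassified. That is why BER is informative alongside accuracy when classes are imbalanced.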
Affiliation(s)
- Benjamin Vittrant
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
- Mickael Leclercq
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
- Marie-Laure Martin-Magniette
  - Universities of Paris Saclay, Paris, Evry, CNRS, INRAE, Institute of Plant Sciences Paris Saclay (IPS2), 91192, Gif sur Yvette, France; UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Colin Collins
  - Vancouver Prostate Cancer Centre, Vancouver, BC, Canada; Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Alain Bergeron
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Chirurgie, Oncology Axis, Université Laval, Québec, QC, Canada
- Yves Fradet
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Chirurgie, Oncology Axis, Université Laval, Québec, QC, Canada
- Arnaud Droit
  - Centre de Recherche du CHU de Québec - Université Laval, Québec, QC, Canada; Département de Médecine Moléculaire, Université Laval, QC, Canada
15
|
Duan Y, Evans DS, Miller RA, Schork NJ, Cummings S, Girke T. signatureSearch: environment for gene expression signature searching and functional interpretation. Nucleic Acids Res 2020; 48:e124. [PMID: 33068417 PMCID: PMC7708038 DOI: 10.1093/nar/gkaa878] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/29/2020] [Revised: 08/19/2020] [Accepted: 09/25/2020] [Indexed: 12/14/2022] Open
Abstract
signatureSearch is an R/Bioconductor package that integrates a suite of existing and novel algorithms into an analysis environment for gene expression signature (GES) searching combined with functional enrichment analysis (FEA) and visualization methods to facilitate the interpretation of the search results. In a typical GES search (GESS), a query GES is searched against a database of GESs obtained from large numbers of measurements, such as different genetic backgrounds, disease states and drug perturbations. Database matches sharing correlated signatures with the query indicate related cellular responses frequently governed by connected mechanisms, such as drugs mimicking the expression responses of a disease. To identify which processes are predominantly modulated in the GESS results, we developed specialized FEA methods combined with drug-target network visualization tools. The provided analysis tools are useful for studying the effects of genetic, chemical and environmental perturbations on biological systems, as well as searching single cell GES databases to identify novel network connections or cell types. The signatureSearch software is unique in that it provides access to an integrated environment for GESS/FEA routines that includes several novel search and enrichment methods, efficient data structures, access to pre-built GES databases, and support for custom databases.
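The idea of searching a query signature against a database of signatures can be sketched generically as ranking database entries by their correlation with the query. The sketch below is not the signatureSearch API (which is an R/Bioconductor package) and does not reproduce its algorithms. It uses plain Pearson correlation as a stand-in similarity, and all signature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def search_signatures(query, database):
    """Rank database signatures from most to least correlated with the query."""
    scores = {name: pearson(query, sig) for name, sig in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical signatures (assumption): one value per gene, same gene order.
query = [2.0, -1.0, 0.5, -2.0]
database = {
    "perturbation_mimic": [1.8, -0.9, 0.4, -2.1],
    "perturbation_reverse": [-2.0, 1.0, -0.5, 2.0],
    "perturbation_unrelated": [0.1, 0.2, -0.1, 0.0],
}

hits = search_signatures(query, database)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Entries at the top of the ranking resemble the query response, and entries at the bottom reverse it. Both ends can be of interest, for example when looking for perturbations that mimic or counteract a disease signature.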
Affiliation(s)
- Yuzhu Duan
  - Institute for Integrative Genome Biology, 1207F Genomics Building, University of California, Riverside, CA 92521, USA
- Daniel S Evans
  - California Pacific Medical Center Research Institute, 550 16th Street, 2nd floor, San Francisco, CA 94158, USA
- Richard A Miller
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Nicholas J Schork
  - Department of Quantitative Medicine and Systems Biology, The Translational Genomics Research Institute, 445 N. Fifth Street Phoenix, AZ 85004, USA
- Steven R Cummings
  - California Pacific Medical Center Research Institute, 550 16th Street, 2nd floor, San Francisco, CA 94158, USA
- Thomas Girke
  - Institute for Integrative Genome Biology, 1207F Genomics Building, University of California, Riverside, CA 92521, USA
16
|
Seo MK, Paik S, Kim S. An Improved, Assay Platform Agnostic, Absolute Single Sample Breast Cancer Subtype Classifier. Cancers (Basel) 2020; 12:E3506. [PMID: 33255759 PMCID: PMC7761033 DOI: 10.3390/cancers12123506] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2020] [Revised: 11/18/2020] [Accepted: 11/24/2020] [Indexed: 11/26/2022] Open
Abstract
While intrinsic molecular subtypes provide important biological classification of breast cancer, the subtype assignment of individuals is influenced by assay technology and study cohort composition. We sought to develop a platform-independent absolute single-sample subtype classifier based on a minimal number of genes. Pairwise ratios for subtype-specific differentially expressed genes from un-normalized expression data from 432 breast cancer (BC) samples of The Cancer Genome Atlas (TCGA) were used as inputs for machine learning. The subtype classifier with the fewest number of genes and maximal classification power was selected during cross-validation. The final model was evaluated on 5816 samples from 10 independent studies profiled with four different assay platforms. Upon cross-validation within the TCGA cohort, a random forest classifier (MiniABS) with 11 genes achieved the best accuracy of 88.2%. Applying MiniABS to five validation sets of RNA-seq and microarray data showed an average accuracy of 85.15% (vs. 77.72% for Absolute Intrinsic Molecular Subtype (AIMS)). Only MiniABS could be applied to five low-throughput datasets, showing an average accuracy of 87.93%. The MiniABS can absolutely subtype BC using the raw expression levels of only 11 genes, regardless of assay platform, with higher accuracy than existing methods.
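The abstract above describes using pairwise ratios of gene expression values from un-normalized data as classifier inputs. A minimal sketch of such a feature construction follows. The gene names and values are hypothetical, and the sketch covers only the feature step, not the MiniABS model itself. One property of ratios that the sketch illustrates is that they are unchanged when a whole sample is multiplied by a constant; whether this is the full reason for the reported platform independence is an assumption here.

```python
import math
from itertools import combinations


def pairwise_ratio_features(expr):
    """Build one feature per gene pair: expression of gene a divided by gene b.

    `expr` maps gene name to a strictly positive expression value for a
    single sample, so no cohort-level normalization is needed.
    """
    genes = sorted(expr)
    return {f"{a}/{b}": expr[a] / expr[b] for a, b in combinations(genes, 2)}


# Hypothetical single-sample expression values (assumption).
sample = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 4.0}

# The same sample measured on a hypothetical platform with a 10x larger scale.
scaled = {gene: 10.0 * value for gene, value in sample.items()}

features = pairwise_ratio_features(sample)
scaled_features = pairwise_ratio_features(scaled)
print(features)
```

Because each feature depends only on values within one sample, a classifier trained on such features can in principle be applied to a single new sample without reference to a cohort.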
Affiliation(s)
- Mi-kyoung Seo
  - Department of Biomedical Systems Informatics, Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul 03722, Korea
- Soonmyung Paik
  - Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul 03722, Korea
- Sangwoo Kim
  - Department of Biomedical Systems Informatics, Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul 03722, Korea
17
|
Tian K, Wang A, Wang J, Li W, Shen W, Li Y, Luo Z, Liu Y, Zhou Y. Transcriptome Analysis Identifies SenZfp536, a Sense LncRNA that Suppresses Self-renewal of Cortical Neural Progenitors. Neurosci Bull 2020; 37:183-200. [PMID: 33196962 DOI: 10.1007/s12264-020-00607-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/28/2019] [Accepted: 08/12/2020] [Indexed: 11/28/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) regulate transcription to control development and homeostasis in a variety of tissues and organs. However, their roles in the development of the cerebral cortex have not been well elucidated. Here, a bioinformatics pipeline was applied to delineate the dynamic expression and potential cis-regulating effects of mouse lncRNAs using transcriptome data from 8 embryonic time points and sub-regions of the developing cerebral cortex. We further characterized a sense lncRNA, SenZfp536, which is transcribed downstream of and partially overlaps with the protein-coding gene Zfp536. Both SenZfp536 and Zfp536 were predominantly expressed in the proliferative zone of the developing cortex. Zfp536 was cis-regulated by SenZfp536, which facilitates looping between the promoter of Zfp536 and the genomic region that transcribes SenZfp536. Surprisingly, knocking down or activating the expression of SenZfp536 increased or compromised the proliferation of cortical neural progenitor cells (NPCs), respectively. Finally, overexpressing Zfp536 in cortical NPCs reversed the enhanced proliferation of cortical NPCs caused by SenZfp536 knockdown. The study deepens our understanding of how lncRNAs regulate the propagation of cortical NPCs through cis-regulatory mechanisms.
Affiliation(s)
- Kuan Tian
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Andi Wang
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Junbao Wang
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Wei Li
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Wenchen Shen
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Yamu Li
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Zhiyuan Luo
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China
- Ying Liu
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China.
- Yan Zhou
  - College of Life Sciences, Renmin Hospital of Wuhan University, Wuhan University, Wuhan, 430072, China; Frontier Science Center for Immunology and Metabolism, Medical Research Institute, School of Medicine, Wuhan University, Wuhan, 430071, China; Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China.
18
|
The transcription factor C/EBPβ orchestrates dendritic cell maturation and functionality under homeostatic and malignant conditions. Proc Natl Acad Sci U S A 2020; 117:26328-26339. [PMID: 33020261 DOI: 10.1073/pnas.2008883117] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022] Open
Abstract
Dendritic cell (DC) maturation is a prerequisite for the induction of adaptive immune responses against pathogens and cancer. Transcription factor (TF) networks control differential aspects of early DC progenitor versus late-stage DC cell fate decisions. Here, we identified the TF C/EBPβ as a key regulator for DC maturation and immunogenic functionality under homeostatic and lymphoma-transformed conditions. Upon cell-specific deletion of C/EBPβ in CD11c+MHCIIhi DCs, gene expression profiles of splenic C/EBPβ-/- DCs showed a down-regulation of E2F cell cycle target genes and associated proliferation signaling pathways, whereas maturation signatures were enriched. Total splenic DC cell numbers were modestly increased but differentiation into cDC1 and cDC2 subsets was unaltered. The splenic CD11c+MHCIIhiCD64+ DC compartment was also increased, suggesting that C/EBPβ deficiency favors the expansion of monocytic-derived DCs. Expression of C/EBPβ could be mimicked in LAP/LAP* isoform knockin DCs, whereas the short isoform LIP supported a differentiation program similar to deletion of the full-length TF. In accordance with E2F1 being a negative regulator of DC maturation, C/EBPβ-/- bone marrow-derived DCs matured much faster, enabling them to activate and polarize T cells more strongly. In contrast to a homeostatic condition, lymphoma-exposed DCs exhibited an up-regulation of the E2F transcriptional pathways and an impaired maturation. Pharmacological blockade of C/EBPβ/mTOR signaling in human DCs abrogated their protumorigenic function in primary B cell lymphoma cocultures. Thus, C/EBPβ plays a unique role in DC maturation and immunostimulatory functionality and emerges as a key factor of the tumor microenvironment that promotes lymphomagenesis.
19
|
Hernaez M, Blatti C, Gevaert O. Comparison of single and module-based methods for modeling gene regulatory networks. Bioinformatics 2020; 36:558-567. [PMID: 31287491 DOI: 10.1093/bioinformatics/btz549] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/05/2018] [Revised: 06/11/2019] [Accepted: 07/06/2019] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION Gene regulatory networks describe the regulatory relationships among genes, and developing methods for reverse engineering these networks is an ongoing challenge in computational biology. The majority of the initially proposed methods for gene regulatory network discovery create a network of genes and then mine it in order to uncover previously unknown regulatory processes. More recent approaches have focused on inferring modules of co-regulated genes, linking these modules with regulatory genes and then mining them to discover new molecular biology. RESULTS In this work we analyze module-based network approaches to build gene regulatory networks, and compare their performance to single gene network approaches. In the process, we propose a novel approach to estimate gene regulatory networks drawing from the module-based methods. We show that generating modules of co-expressed genes which are predicted by a sparse set of regulators using a variational Bayes method, and then building a bipartite graph on the generated modules using sparse regression, yields more informative networks than previous single and module-based network approaches as measured by: (i) the rate of enriched gene sets, (ii) a network topology assessment, (iii) ChIP-Seq evidence and (iv) the KnowEnG Knowledge Network collection of previously characterized gene-gene interactions. AVAILABILITY AND IMPLEMENTATION The code is written in R and can be downloaded from https://github.com/mikelhernaez/linker. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikel Hernaez
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Olivier Gevaert
- The Stanford Center of Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
20
Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front Genet 2020; 11:654. [PMID: 32695141 PMCID: PMC7339292 DOI: 10.3389/fgene.2020.00654] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 02/01/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. There are many gene set analysis methods available. These methods rely on various assumptions and have different requirements, strengths and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
21
Cheng Q, Li J, Fan F, Cao H, Dai ZY, Wang ZY, Feng SS. Identification and Analysis of Glioblastoma Biomarkers Based on Single Cell Sequencing. Front Bioeng Biotechnol 2020; 8:167. [PMID: 32195242 PMCID: PMC7066068 DOI: 10.3389/fbioe.2020.00167] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/19/2019] [Accepted: 02/19/2020] [Indexed: 12/16/2022]
Abstract
Glioblastoma (GBM) is one of the most common and aggressive primary adult brain tumors. Tumor heterogeneity, which is determined by both heterogeneous GBM cells and a complex tumor microenvironment, poses a great challenge to the treatment of GBM. Single-cell RNA sequencing (scRNA-seq) enables the transcriptomes of a great number of individual cells to be assayed in an unbiased manner and has been applied in head and neck cancer, breast cancer, blood disease, and so on. In this study, based on the scRNA-seq results of infiltrating neoplastic cells in GBM, computational methods were applied to screen core biomarkers that can distinguish GBM tumor from the pericarcinomatous environment. The gene expression profiles of GBM from 2343 tumor cells and 1246 periphery cells were analyzed by maximum relevance minimum redundancy (mRMR). Upon further analysis of the feature lists yielded by the mRMR method, 31 important genes were extracted that may be essential biomarkers for GBM tumor cells. In addition, an optimal classification model using a support vector machine (SVM) algorithm as the classifier was built. Our results provide insights into GBM mechanisms and may be useful for GBM diagnosis and therapy.
Affiliation(s)
- Quan Cheng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China; Department of Clinical Pharmacology, Xiangya Hospital, Central South University, Changsha, China
- Jing Li
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Fan Fan
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Hui Cao
- Department of Psychiatry, The Second People's Hospital of Hunan University of Chinese Medicine, Changsha, China
- Zi-Yu Dai
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Ze-Yu Wang
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Song-Shan Feng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
22
Schwede M, Waldron L, Mok SC, Wei W, Basunia A, Merritt MA, Mitsiades CS, Parmigiani G, Harrington DP, Quackenbush J, Birrer MJ, Culhane AC. The Impact of Stroma Admixture on Molecular Subtypes and Prognostic Gene Signatures in Serous Ovarian Cancer. Cancer Epidemiol Biomarkers Prev 2020; 29:509-519. [PMID: 31871106 DOI: 10.1158/1055-9965.epi-18-1359] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/19/2018] [Revised: 04/26/2019] [Accepted: 12/06/2019] [Indexed: 12/18/2022]
Abstract
BACKGROUND Recent efforts to improve outcomes for high-grade serous ovarian cancer, a leading cause of cancer death in women, have focused on identifying molecular subtypes and prognostic gene signatures, but existing subtypes have poor cross-study robustness. We tested the contribution of cell admixture in published ovarian cancer molecular subtypes and prognostic gene signatures. METHODS Gene signatures of tumor and stroma were developed using paired microdissected tissue from two independent studies. Stromal genes were investigated in two molecular subtype classifications and 61 published gene signatures. Prognostic performance of gene signatures of stromal admixture was evaluated in 2,527 ovarian tumors (16 studies). Computational simulations of increasing stromal cell proportion were performed by mixing gene-expression profiles of paired microdissected ovarian tumor and stroma. RESULTS Recently described ovarian cancer molecular subtypes are strongly associated with the cell admixture. Tumors were classified as different molecular subtypes in simulations where the percentage of stromal cells increased. Stromal gene expression in bulk tumors was associated with overall survival (hazard ratio, 1.17; 95% confidence interval, 1.11-1.23), and in one data set, increased stroma was associated with anatomic sampling location. Five published prognostic gene signatures were no longer prognostic in a multivariate model that adjusted for stromal content. CONCLUSIONS Cell admixture affects the interpretation and reproduction of ovarian cancer molecular subtypes and gene signatures derived from bulk tissue. Elucidating the role of stroma in the tumor microenvironment and in prognosis is important. IMPACT Single-cell analyses may be required to refine the molecular subtypes of high-grade serous ovarian cancer.
Affiliation(s)
- Matthew Schwede
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts
- Levi Waldron
- Biostatistics, CUNY Graduate School of Public Health and Health Policy, New York, New York
- Samuel C Mok
- Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Wei Wei
- Pfizer, Andover, Massachusetts
- Azfar Basunia
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Giovanni Parmigiani
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- David P Harrington
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- John Quackenbush
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Michael J Birrer
- Division of Hematology-Oncology, University of Alabama at Birmingham, Birmingham, Alabama
- Aedín C Culhane
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
23
Grau M, Lenz G, Lenz P. Dissection of gene expression datasets into clinically relevant interaction signatures via high-dimensional correlation maximization. Nat Commun 2019; 10:5417. [PMID: 31780653 PMCID: PMC6883077 DOI: 10.1038/s41467-019-12713-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/23/2018] [Accepted: 09/20/2019] [Indexed: 12/12/2022]
Abstract
Gene expression is controlled by many simultaneous interactions, frequently measured collectively in biology and medicine by high-throughput technologies. It is a highly challenging task to infer from these data the generating effects and cooperating genes. Here, we present an unsupervised hypothesis-generating learning concept termed signal dissection by correlation maximization (SDCM) that dissects large high-dimensional datasets into signatures. Each signature captures a particular signal pattern that was consistently observed for multiple genes and samples, likely caused by the same underlying interaction. A key difference to other methods is our flexible nonlinear signal superposition model, combined with a precise regression technique. Analyzing gene expression of diffuse large B-cell lymphoma, our method discovers previously unidentified signatures that reveal significant differences in patient survival. These signatures are more predictive than those from various methods used for comparison and robustly validate across technological platforms. This implies highly specific extraction of clinically relevant gene interactions. Identification of clinically relevant gene expression signatures for cancer stratification remains challenging. Here, the authors introduce a flexible nonlinear signal superposition model that enables dissection of large gene expression data sets into signatures and extraction of gene interactions.
Affiliation(s)
- Michael Grau
- Department of Medicine A, Albert-Schweitzer Campus 1, University Hospital Münster, 48149, Münster, Germany; Cluster of Excellence EXC 1003, Cells in Motion, University of Münster, 48149, Münster, Germany
- Georg Lenz
- Department of Medicine A, Albert-Schweitzer Campus 1, University Hospital Münster, 48149, Münster, Germany; Cluster of Excellence EXC 1003, Cells in Motion, University of Münster, 48149, Münster, Germany
- Peter Lenz
- Department of Physics, Renthof 5, University of Marburg, 35032, Marburg, Germany; LOEWE Center for Synthetic Microbiology, 35032, Marburg, Germany
24
Chen YA, Tripathi LP, Fujiwara T, Kameyama T, Itoh MN, Mizuguchi K. The TargetMine Data Warehouse: Enhancement and Updates. Front Genet 2019; 10:934. [PMID: 31649722 PMCID: PMC6794636 DOI: 10.3389/fgene.2019.00934] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/18/2019] [Accepted: 09/05/2019] [Indexed: 12/01/2022]
Abstract
Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.
Affiliation(s)
- Yi-An Chen
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Lokesh P Tripathi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Takeshi Fujiwara
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Tatsuya Kameyama
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Mari N Itoh
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Kenji Mizuguchi
- Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
25
Yue Z, Zheng Q, Neylon MT, Yoo M, Shin J, Zhao Z, Tan AC, Chen JY. PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology. Nucleic Acids Res 2018; 46:D668-D676. [PMID: 29126216 PMCID: PMC5753198 DOI: 10.1093/nar/gkx1040] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/16/2017] [Accepted: 11/03/2017] [Indexed: 12/14/2022]
Abstract
Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug-gene, miRNA-gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
- Qi Zheng
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006, China
- Michael T Neylon
- Indiana University School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Minjae Yoo
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jimin Shin
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Zhiying Zhao
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA; School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Aik Choon Tan
- Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jake Y Chen
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
26
Powers RK, Goodspeed A, Pielke-Lombardo H, Tan AC, Costello JC. GSEA-InContext: identifying novel and common patterns in expression experiments. Bioinformatics 2018; 34:i555-i564. [PMID: 29950010 PMCID: PMC6022535 DOI: 10.1093/bioinformatics/bty271] [Citation(s) in RCA: 139] [Impact Index Per Article: 27.8] [Indexed: 12/20/2022]
Abstract
MOTIVATION Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate pathway-level changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment. RESULTS We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis. AVAILABILITY AND IMPLEMENTATION GSEA-InContext, implemented in Python, supplementary results and the background expression compendium are available at: https://github.com/CostelloLab/GSEA-InContext.
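The competitive null described in this abstract (comparing a gene set's score with scores of randomly drawn gene sets of the same size) can be illustrated with a minimal sketch. This is not the GSEA or GSEA-InContext implementation: it swaps in the mean gene score for the GSEA running-sum enrichment statistic, and the function name `competitive_p_value`, its parameters, and the toy data are hypothetical.

```python
import random


def competitive_p_value(scores, gene_set, n_perm=1000, seed=0):
    """Empirical p-value of a gene set under a competitive null.

    Simplification (assumption): the statistic is the mean gene score,
    not the GSEA running-sum enrichment score.
    """
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in gene_set if g in scores]
    k = len(members)
    if k == 0:
        raise ValueError("gene set has no genes present in the experiment")
    observed = sum(scores[g] for g in members) / k

    # Null distribution: random gene sets of the same size, drawn from
    # the genes measured in the input experiment.
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if sum(scores[g] for g in sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy experiment: 100 genes scored 0..99.
toy_scores = {f"g{i}": float(i) for i in range(100)}
top_set = [f"g{i}" for i in range(95, 100)]
p_top = competitive_p_value(toy_scores, top_set)
```

GSEA-InContext, as the abstract describes it, replaces the uniformly random background with one informed by a compendium of background experiments; that step is not reproduced here.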
Affiliation(s)
- Rani K Powers
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Andrew Goodspeed
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Harrison Pielke-Lombardo
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Aik-Choon Tan
- Department of Medical Oncology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- James C Costello
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
27
Cheng L, Yang H, Zhao H, Pei X, Shi H, Sun J, Zhang Y, Wang Z, Zhou M. MetSigDis: a manually curated resource for the metabolic signatures of diseases. Brief Bioinform 2019; 20:203-209. [PMID: 28968812 DOI: 10.1093/bib/bbx103] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 06/12/2017] [Indexed: 12/18/2022]
Abstract
Complex diseases cannot be understood on the basis of a single gene, mRNA transcript or protein alone, but rather through the effect of their interactions. The combined consequences at the molecular level can be captured by alterations in metabolites. With the rapid development of biomedical instruments and analytical platforms, a large number of metabolite signatures of complex diseases have been identified and documented in the literature. The difficulty biologists face in navigating the large number of papers that report such metabolic signatures calls for an automated data repository. Therefore, we developed MetSigDis, which aims to provide a comprehensive resource of metabolite alterations in various diseases. MetSigDis is freely available at http://www.bio-annotation.cn/MetSigDis/. By reviewing hundreds of publications, we collected 6849 curated relationships between 2420 metabolites and 129 diseases across eight species, including Homo sapiens and model organisms. All of these relationships were used to construct a metabolite disease network (MDN). This network displayed scale-free characteristics according to the degree distribution (power-law distribution with R² = 0.909), and the subnetwork of the MDN for diseases of interest and their related metabolites can be visualized on the Web. The common alterations of metabolites reflect the metabolic similarity of diseases, which is measured using the Jaccard index. We observed that diseases that are similar on the basis of metabolites tend to share semantic associations in the Disease Ontology. A human disease network was then built, in which a node represents a disease and an edge indicates similarity between a pair of diseases. The network validated the observation that diseases linked on the basis of metabolites tend to share more genes.
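The Jaccard index mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. The following is a minimal sketch of how disease similarity could be computed from shared metabolites; the disease and metabolite names are made-up toy data, and the helper names `jaccard` and `pairwise_similarity` are hypothetical, not part of MetSigDis.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here (assumption): two empty sets have similarity 0.
        return 0.0
    return len(a & b) / len(union)


def pairwise_similarity(table):
    """Jaccard index for every unordered pair of keys in a name -> set mapping."""
    names = sorted(table)
    return {
        (x, y): jaccard(table[x], table[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy data: metabolites reported as altered in each disease.
disease_metabolites = {
    "disease_A": {"glucose", "lactate", "alanine"},
    "disease_B": {"glucose", "lactate", "citrate"},
    "disease_C": {"urea"},
}

similarity = pairwise_similarity(disease_metabolites)
```

In a disease network of the kind the abstract describes, an edge would be drawn between two diseases when this value exceeds some threshold; the abstract does not state the threshold used.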
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University
- Haixiu Yang
- College of Bioinformatics Science and Technology, Harbin Medical University
- Hengqiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University
- Xiaoya Pei
- College of Bioinformatics Science and Technology, Harbin Medical University
- Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University
- Jie Sun
- College of Bioinformatics Science and Technology, Harbin Medical University
- Yunpeng Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University
- Zhenzhen Wang
- College of Bioinformatics Science and Technology, Harbin Medical University
- Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University
28
Yue Z, Willey CD, Hjelmeland AB, Chen JY. BEERE: a web server for biomedical entity expansion, ranking and explorations. Nucleic Acids Res 2019; 47:W578-W586. [PMID: 31114876 PMCID: PMC6602520 DOI: 10.1093/nar/gkz428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/14/2019] [Revised: 05/04/2019] [Accepted: 05/20/2019] [Indexed: 12/02/2022]
Abstract
BEERE (Biomedical Entity Expansion, Ranking and Explorations) is a new web-based data analysis tool to help biomedical researchers characterize any input list of genes/proteins, biomedical terms or their combinations, i.e. 'biomedical entities', in the context of existing literature. Specifically, BEERE first aims to help users examine the credibility of known entity-to-entity associative or semantic relationships supported by database or literature references from the user input of a gene/term list. Then, it helps users uncover the relative importance of each entity (a gene or a term) within the user input by computing the ranking scores of all entities. Finally, it helps users hypothesize new gene functions or genotype-phenotype associations through an interactive visual interface of the constructed global entity relationship network. The output from BEERE includes: a list of the original entities matched with known relationships in databases; any expanded entities that may be generated from the analysis; the ranks and ranking scores reported with statistical significance for each entity; and an interactive graphical display of the gene or term network with data provenance annotations that link to external data sources. The web server is free and open to all users with no login requirement and can be accessed at http://discovery.informatics.uab.edu/beere/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
- Christopher D Willey
- Department of Radiation Oncology, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
- Anita B Hjelmeland
- Department of Cell, Developmental and Integrative Biology, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
- Jake Y Chen
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
29
Raman P, Zimmerman S, Rathi KS, de Torrenté L, Sarmady M, Wu C, Leipzig J, Taylor DM, Tozeren A, Mar JC. A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data. Cancer Genet 2019; 235-236:1-12. [PMID: 31296308 DOI: 10.1016/j.cancergen.2019.04.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 03/19/2019] [Accepted: 04/09/2019] [Indexed: 01/29/2023]
Abstract
Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly-variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index, D-index, 25th-75th percentile split, median-split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from the Cancer Genome Atlas. The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-Seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
Affiliation(s)
- Pichai Raman
- School of Biomedical Engineering, Sciences and Health Systems, Drexel University, Philadelphia, PA, United States; Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, United States; Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, Philadelphia, PA, United States.
- Samuel Zimmerman
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States.
- Komal S Rathi
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, United States; Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, Philadelphia, PA, United States.
- Laurence de Torrenté
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States.
- Mahdi Sarmady
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, United States; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Chao Wu
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, United States.
- Jeremy Leipzig
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, United States; College of Computing and Informatics, Drexel University, Philadelphia, PA, United States.
- Deanne M Taylor
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, United States; The Department of Pediatrics, The University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
- Aydin Tozeren
- School of Biomedical Engineering, Sciences and Health Systems, Drexel University, Philadelphia, PA, United States.
- Jessica C Mar
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States; Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, United States; Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia.
30
Yue Z, Neylon MT, Nguyen T, Ratliff T, Chen JY. "Super Gene Set" Causal Relationship Discovery from Functional Genomics Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1991-1998. [PMID: 30040650 PMCID: PMC6380687 DOI: 10.1109/tcbb.2018.2858755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
In this article, we present a computational framework to identify "causal relationships" among super gene sets. By "causal relationships," we refer to both stimulatory and inhibitory regulatory relationships, whether through direct or indirect mechanisms. By super gene sets, we refer to "pathways, annotated lists, and gene signatures," or PAGs. To identify causal relationships among PAGs, we extend the previous work on identifying PAG-to-PAG regulatory relationships by further requiring them to be significantly enriched with gene-to-gene co-expression pairs across the two PAGs involved. This is achieved by developing a quantitative metric based on PAG-to-PAG Co-expressions (PPC), which we use to infer the likelihood that PAG-to-PAG relationships under examination are causal, either stimulatory or inhibitory. Since true causal relationships are unknown, we approximate the overall performance of inferring causal relationships with the performance of recalling known r-type PAG-to-PAG relationships from causal PAG-to-PAG inference, using a functional genomics benchmark dataset from the GEO database. We report the area-under-curve (AUC) performance for both precision and recall being 0.81. By applying our framework to a myeloid-derived suppressor cells (MDSC) dataset, we further demonstrate that this framework is effective in helping build multi-scale biomolecular systems models with new insights on regulatory and causal links for downstream biological interpretations.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, US.
- Michael T. Neylon
- School of Informatics and Computing, Indiana University, Indianapolis, IN 46202, US.
- Thanh Nguyen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, US.
- Timothy Ratliff
- Purdue University Center for Cancer Research, West Lafayette, IN 47906, US.
- Jake Y. Chen
- Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35233, US.
31
Jose V, Fumagalli D, Rothé F, Majjaj S, Loi S, Michiels S, Sotiriou C. Feasibility of developing reliable gene expression modules from FFPE derived RNA profiled on Affymetrix arrays. PLoS One 2018; 13:e0203346. [PMID: 30169535 PMCID: PMC6118369 DOI: 10.1371/journal.pone.0203346] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/28/2017] [Accepted: 08/20/2018] [Indexed: 11/19/2022] Open
Abstract
The reliability of differential gene expression analysis on formalin-fixed, paraffin-embedded (FFPE) expression profiles generated using Affymetrix arrays is questionable because of the wide range of percent-present values reported in studies that profiled FFPE samples with this technology. Moreover, the validity of gene modules derived from external datasets in FFPE microarray expression profiles is unknown. By generating matched gene expression profiles using RNAs derived from fresh-frozen (FF) and FFPE preserved breast tumors with Affymetrix arrays and FF/FFPE RNA specific amplification-and-labeling kits, the reliability of differential expression analysis and the validity of gene modules derived from external datasets were investigated. Specifically, the reliability of differential expression analysis was investigated by developing de-novo ER/HER2 pathway gene modules from the matched datasets and validating them on external FF/FFPE gene expression datasets using ROC analysis. Spearman's rank correlation coefficient of module scores between matched FFPE/frozen datasets was used to measure the reliability of gene modules derived from external datasets in FFPE expression profiles. Independent of the array/amplification-kit/sample preservation method used, de-novo ER/HER2 gene modules derived from all matched datasets showed similar prediction performance in the independent validation (AUC range in FFPE dataset; ER: 0.93-0.95, HER2: 0.85-0.91), except for the de-novo ER/HER2 gene module derived from the FFPE dataset using the 3'IVT kit (AUC range in FFPE dataset; ER: 0.79-0.81, HER2: 0.78). Among the external gene modules considered, roughly 50% showed high concordance between expression profiles derived from matching FF and FFPE RNA. The remaining gene modules, which were discordant between FF and FFPE expression profiles, showed high concordance within matching FF datasets and within matching FFPE datasets independently, implying that microarrays still require improved amplification-and-sample-preparation protocols for deriving 100% concordant expression profiles from matching FF and FFPE RNA.
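The concordance measure named in this abstract, Spearman's rank correlation of module scores between matched FF and FFPE profiles, can be illustrated with a minimal sketch. The abstract does not say how a module score is computed, so the mean expression of the module genes is used here purely as an assumption, and the gene labels and expression values are invented toy data, not results from the study.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; raises if either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def module_score(profile, module_genes):
    """Assumed scoring rule: mean expression of the module's genes."""
    return mean(profile[g] for g in module_genes)

# Hypothetical module and matched toy profiles (one dict per tumor sample).
module = ["GENE_A", "GENE_B", "GENE_C"]
ff_profiles = [
    {"GENE_A": 9.0, "GENE_B": 8.0, "GENE_C": 10.0},
    {"GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 3.0},
    {"GENE_A": 6.0, "GENE_B": 7.0, "GENE_C": 8.0},
    {"GENE_A": 2.0, "GENE_B": 3.0, "GENE_C": 1.0},
]
ffpe_profiles = [
    {"GENE_A": 7.5, "GENE_B": 7.0, "GENE_C": 8.0},
    {"GENE_A": 4.5, "GENE_B": 4.0, "GENE_C": 3.5},
    {"GENE_A": 5.0, "GENE_B": 6.0, "GENE_C": 5.5},
    {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 2.5},
]

ff_scores = [module_score(p, module) for p in ff_profiles]
ffpe_scores = [module_score(p, module) for p in ffpe_profiles]
rho = spearman(ff_scores, ffpe_scores)
print(f"Spearman rho between FF and FFPE module scores: {rho:.3f}")
```

Because the measure depends only on ranks, a module can be fully concordant even when FFPE expression values are systematically compressed relative to FF values, as in the toy data above.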
Affiliation(s)
- Vinu Jose
- Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium
- Debora Fumagalli
- Breast International Group, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium
- Françoise Rothé
- Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium
- Samira Majjaj
- Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium
- Sherene Loi
- Division of Research and Cancer Medicine, Peter MacCallum Cancer Centre, University of Melbourne, Melbourne, Australia
- Stefan Michiels
- Service de Biostatistique et D’Epidémiologie, Gustave Roussy, CESP, U1018, Université Paris-Sud, Faculté de Médecine, Université Paris-Saclay, Villejuif, France
- Christos Sotiriou
- Breast Cancer Translational Research Laboratory, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium
- Department of Medicine, Medical Oncology Clinic, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium
32
Wang C, Xu Y, Wang X, Zhang L, Wei S, Ye Q, Zhu Y, Yin H, Nainwal M, Tanon-Reyes L, Cheng F, Yin T, Ye N. GEsture: an online hand-drawing tool for gene expression pattern search. PeerJ 2018; 6:e4927. [PMID: 29942676 PMCID: PMC6015481 DOI: 10.7717/peerj.4927] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/19/2018] [Accepted: 05/18/2018] [Indexed: 01/21/2023] Open
Abstract
Gene expression profiling data provide useful information for the investigation of biological function and process. However, identifying a specific expression pattern in extensive time series gene expression data is not an easy task. Clustering, a popular method, is often used to group genes with similar expression; however, genes with a 'desirable' or 'user-defined' pattern cannot be efficiently detected by clustering methods. To address these limitations, we developed an online tool called GEsture. Users can draw or graph a curve with a mouse instead of entering abstract clustering parameters. Taking a gene expression curve as input, GEsture searches time series datasets for genes showing similar, opposite and time-delayed expression patterns. We present three examples that illustrate the capacity of GEsture for gene hunting according to users' requirements. GEsture also provides visualization tools (such as expression pattern figures, heat maps and correlation networks) to display the search results. The outputs may help researchers understand the targets, function and biological processes of the genes involved.
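The three kinds of match described in this abstract (similar, opposite and time-delayed patterns relative to a drawn curve) can be sketched as follows. The abstract does not state which similarity measure GEsture uses, so Pearson correlation, the 0.9 threshold, the `max_lag` parameter and the function names below are all assumptions made for illustration, and the time series are toy data.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def classify(query, series, threshold=0.9, max_lag=2):
    """Label a gene's time series relative to the drawn query curve.

    Returns "similar", "opposite", "delayed:<lag>" or None.
    Both inputs must cover the same time points.
    """
    r = pearson(query, series)
    if r >= threshold:
        return "similar"
    if r <= -threshold:
        return "opposite"
    # A gene delayed by `lag` steps should match the query once the
    # query's tail and the gene's head are trimmed by `lag` points.
    for lag in range(1, max_lag + 1):
        if pearson(query[:-lag], series[lag:]) >= threshold:
            return f"delayed:{lag}"
    return None

# Toy query curve, as if sampled from a hand-drawn line.
query = [0, 1, 2, 3, 2, 1, 0, 0]
genes = {
    "gene_similar": [0.1, 1.1, 2.0, 3.2, 2.1, 0.9, 0.0, 0.1],
    "gene_opposite": [3, 2, 1, 0, 1, 2, 3, 3],
    "gene_delayed": [0, 0, 1, 2, 3, 2, 1, 0],
    "gene_flat": [1, 1, 1, 1, 1, 1, 1, 1],
}

hits = {name: classify(query, series) for name, series in genes.items()}
for name, label in hits.items():
    print(name, label)
```

Checking the undelayed correlation first means a gene is reported as delayed only when it does not already match the query at lag zero.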
Affiliation(s)
- Chunyan Wang
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Yiqing Xu
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Xuelin Wang
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Li Zhang
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Suyun Wei
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Qiaolin Ye
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Youxiang Zhu
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
- Hengfu Yin
- Key Laboratory of Forest Genetics and Breeding, Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Hangzhou, Zhejiang, China
- Manoj Nainwal
- Department of Computer Science, Nantong University, Nantong, Jiangsu, China
- Luis Tanon-Reyes
- Department of Cell Biology, Microbiology and Molecular Biology, University of South Florida, Tampa, United States of America
- Feng Cheng
- Department of Pharmaceutical Science, College of Pharmacy, University of South Florida, Tampa, United States of America
- Tongming Yin
- College of Forest Resources and Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Ning Ye
- College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu, China
33
Mohammed A, Biegert G, Adamec J, Helikar T. CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data. Oncotarget 2017; 9:2565-2573. [PMID: 29416792 PMCID: PMC5788660 DOI: 10.18632/oncotarget.23511] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/28/2017] [Accepted: 12/09/2017] [Indexed: 11/25/2022] Open
Abstract
Accurate identification of cancer biomarkers and classification of cancer type and subtype from High Throughput Sequencing (HTS) data is a challenging problem because it requires manual processing of raw HTS data from various sequencing platforms, quality control, and normalization, all of which are tedious and time-consuming. Machine learning techniques for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. To date, considerable research effort has been devoted to cancer biomarker identification and cancer class prediction. However, currently available tools and pipelines lack flexibility in data preprocessing and in running multiple feature selection methods and learning algorithms. A freely available and easy-to-use program is therefore in strong demand among researchers. Here, we propose CancerDiscover, an integrative open-source software pipeline that allows users to automatically and efficiently process large high-throughput raw datasets, normalize them, and select the best-performing features from multiple feature selection algorithms. Additionally, the integrative pipeline lets users apply different feature thresholds to identify cancer biomarkers and build various training models to distinguish different types and subtypes of cancer. The open-source software is available at https://github.com/HelikarLab/CancerDiscover and is free for use under the GPL3 license.
Affiliation(s)
- Akram Mohammed
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Greyson Biegert
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Jiri Adamec
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Tomáš Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
34
Lemieux S, Sargeant T, Laperrière D, Ismail H, Boucher G, Rozendaal M, Lavallée VP, Ashton-Beaucage D, Wilhelm B, Hébert J, Hilton DJ, Mader S, Sauvageau G. MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets. Nucleic Acids Res 2017; 45:e122. [PMID: 28472340 PMCID: PMC5570030 DOI: 10.1093/nar/gkx338] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/25/2016] [Accepted: 04/21/2017] [Indexed: 01/22/2023] Open
Abstract
Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets.
Affiliation(s)
- Sebastien Lemieux
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada; Computer science and operation research, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Tobias Sargeant
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Division of Molecular Medicine, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria 3050, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- David Laperrière
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Houssam Ismail
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Geneviève Boucher
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Marieke Rozendaal
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Vincent-Philippe Lavallée
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Dariel Ashton-Beaucage
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Brian Wilhelm
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada
- Josée Hébert
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Division of Hematology, Maisonneuve-Rosemont Hospital, Montréal, QC H1T 2M4, Canada; Leukemia Cell Bank of Quebec, Maisonneuve-Rosemont Hospital, Montréal, QC H1T 2M4, Canada; Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Douglas J Hilton
- Division of Molecular Medicine, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria 3050, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia
- Sylvie Mader
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada; Department of Biochemistry, Université de Montréal, Montréal, QC H3C 3J7, Canada, and Centre de Recherche du Centre Hospitalier Universitaire de l'Université de Montréal, Montréal, QC H2X 0A9, Canada
- Guy Sauvageau
- The Leucegene project, Université de Montréal, Montréal, QC H3C 3J7, Canada; Institute for Research in Immunology and Cancer (IRIC), Université de Montréal, Montréal, QC H3C 3J7, Canada; Division of Hematology, Maisonneuve-Rosemont Hospital, Montréal, QC H1T 2M4, Canada; Leukemia Cell Bank of Quebec, Maisonneuve-Rosemont Hospital, Montréal, QC H1T 2M4, Canada; Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC H3C 3J7, Canada
35
Sensitivity to PI3K and AKT inhibitors is mediated by divergent molecular mechanisms in subtypes of DLBCL. Blood 2017; 130:310-322. [DOI: 10.1182/blood-2016-12-758599] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 12/20/2016] [Accepted: 02/10/2017] [Indexed: 01/14/2023] Open
Abstract
Key Points
PI3Kα/δ inhibition induces cytotoxicity in ABC DLBCLs through downregulation of NF-κB signaling. Inhibition of AKT induces cytotoxicity by downregulation of MYC in PTEN-deficient DLBCL models in vivo and in vitro.
36
Xu W, Cao Y, Xie Z, He H, He S, Hong H, Bo X, Li F. NFPscanner: a webtool for knowledge-based deciphering of biomedical networks. BMC Bioinformatics 2017; 18:262. [PMID: 28521733 PMCID: PMC5437514 DOI: 10.1186/s12859-017-1673-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2016] [Accepted: 05/03/2017] [Indexed: 12/05/2022] Open
Abstract
Background: Many biological pathways have been created to represent different types of knowledge, such as genetic interactions, metabolic reactions, and gene-regulating and physical-binding relationships. Biologists are using a wide range of omics data to elaborately construct various context-specific differential molecular networks. However, they cannot easily gain insight into unfamiliar gene networks with the tools that are currently available for pathways resource and network analysis. They would benefit from the development of a standardized tool to compare functions of multiple biological networks quantitatively and promptly.
Results: To address this challenge, we developed NFPscanner, a web server for deciphering gene networks with pathway associations. Adapted from a recently reported knowledge-based framework called network fingerprint, NFPscanner integrates the annotated pathways of 7 databases, 4 algorithms, and 2 graphical visualization modules into a webtool. It implements 3 types of network analysis:
- Fingerprint: Deciphering gene networks and highlighting inherent pathway modules
- Alignment: Discovering functional associations by finding optimized node mapping between 2 gene networks
- Enrichment: Calculating and visualizing gene ontology (GO) and pathway enrichment for genes in networks
Users can upload gene networks to NFPscanner through the web interface and then interactively explore the networks’ functions.
Conclusions: NFPscanner is open-source software for non-commercial use, freely accessible at http://biotech.bmi.ac.cn/nfs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1673-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wenjian Xu
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
- Yang Cao
- Tianjin Institute of Health & Environmental Medicine, 1 Dali Road, Heping District, Tianjin, 300050, China
- Ziwei Xie
- Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, 430074, Hubei, China
- Haochen He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
- Song He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
- Hao Hong
- Department of Biomedical Engineering, National University of Defense Technology, 109 Deya Road, Kaifu District, Changsha, 410073, Hunan, China
- Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China.
- Fei Li
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China.
37
Paquet ER, Lesurf R, Tofigh A, Dumeaux V, Hallett MT. Detecting gene signature activation in breast cancer in an absolute, single-patient manner. Breast Cancer Res 2017; 19:32. [PMID: 28327201 PMCID: PMC5361722 DOI: 10.1186/s13058-017-0824-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/16/2016] [Accepted: 03/02/2017] [Indexed: 01/20/2023] Open
Abstract
Background: The ability to reliably identify the state (activated, repressed, or latent) of any molecular process in the tumor of a patient from an individual whole-genome gene expression profile obtained from microarray or RNA sequencing (RNA-seq) promises important clinical utility. Unfortunately, all previous bioinformatics tools are only applicable in large and diverse panels of patients, or are limited to a single specific pathway/process (e.g. proliferation).
Methods: Using a panel of 4510 whole-genome gene expression profiles from 10 different studies we built and selected models predicting the activation status of a compendium of 1733 different biological processes. Using a second independent validation dataset of 742 patients we validated the final list of 1733 models to be included in a de novo tool entitled absolute inference of patient signatures (AIPS). We also evaluated the prognostic significance of the 1733 individual models to predict outcome in all and in specific breast cancer subtypes.
Results: We described the development of the de novo tool entitled AIPS that can identify the activation status of a panel of 1733 different biological processes from an individual breast cancer microarray or RNA-seq profile without recourse to a broad cohort of patients. We demonstrated that AIPS is stable compared to previous tools, as the inferred pathway state is not affected by the composition of a dataset. We also showed that pathway states inferred by AIPS are in agreement with previous tools but use far fewer genes. We determined that several AIPS-defined pathways are prognostic across and within molecularly and clinically defined subtypes (two-sided log-rank test false discovery rate (FDR) <5%). Interestingly, 74.5% (1291/1733) of the models are able to distinguish patients with luminal A cancer from those with luminal B cancer (Fisher’s exact test FDR <5%).
Conclusion: AIPS represents the first tool that would allow an individual breast cancer patient to obtain a thorough knowledge of the molecular processes active in their tumor from only one individual gene expression (N-of-1) profile.
Electronic supplementary material: The online version of this article (doi:10.1186/s13058-017-0824-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- E R Paquet
- Centre for Bioinformatics, McGill University, Montreal, Quebec, H3G 0B1, Canada; The Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, H3A 1A3, Canada
- R Lesurf
- Centre for Bioinformatics, McGill University, Montreal, Quebec, H3G 0B1, Canada; The Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, H3A 1A3, Canada
- A Tofigh
- Centre for Bioinformatics, McGill University, Montreal, Quebec, H3G 0B1, Canada; The Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, H3A 1A3, Canada; School of Computer Science, McGill University, Montreal, Quebec, H3A 0E9, Canada
- V Dumeaux
- Centre for Bioinformatics, McGill University, Montreal, Quebec, H3G 0B1, Canada; The Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, H3A 1A3, Canada; School of Computer Science, McGill University, Montreal, Quebec, H3A 0E9, Canada
- M T Hallett
- Centre for Bioinformatics, McGill University, Montreal, Quebec, H3G 0B1, Canada; The Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, H3A 1A3, Canada; School of Computer Science, McGill University, Montreal, Quebec, H3A 0E9, Canada
38
Rahmati S, Abovsky M, Pastrello C, Jurisica I. pathDIP: an annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis. Nucleic Acids Res 2016; 45:D419-D426. [PMID: 27899558 PMCID: PMC5210562 DOI: 10.1093/nar/gkw1082] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 08/15/2016] [Revised: 09/30/2016] [Accepted: 10/25/2016] [Indexed: 01/06/2023] Open
Abstract
Molecular pathway data are essential in current computational and systems biology research. While there are many primary and integrated pathway databases, several challenges remain, including low proteome coverage (57%), low overlap across different databases, unavailability of direct information about underlying physical connectivity of pathway members, and high fraction of protein-coding genes without any pathway annotations, i.e. ‘pathway orphans’. In order to address all these challenges, we developed pathDIP, which integrates data from 20 source pathway databases, ‘core pathways’, with physical protein–protein interactions to predict biologically relevant protein–pathway associations, referred to as ‘extended pathways’. Cross-validation determined 71% recovery rate of our predictions. Data integration and predictions increase coverage of pathway annotations for protein-coding genes to 86%, and provide novel annotations for 5732 pathway orphans. PathDIP (http://ophid.utoronto.ca/pathdip) annotates 17 070 protein-coding genes with 4678 pathways, and provides multiple query, analysis and output options.
Affiliation(s)
- Sara Rahmati
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada
- Mark Abovsky
- Princess Margaret Cancer Centre, University Health Network, 101 College Street, TMDT, Room 11-314, Toronto, ON M5G 1L7, Canada
- Chiara Pastrello
- Princess Margaret Cancer Centre, University Health Network, 101 College Street, TMDT, Room 11-314, Toronto, ON M5G 1L7, Canada
- Igor Jurisica
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Centre, University Health Network, 101 College Street, TMDT, Room 11-314, Toronto, ON M5G 1L7, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
39
Samatov TR, Galatenko VV, Block A, Shkurnikov MY, Tonevitsky AG, Schumacher U. Novel biomarkers in cancer: The whole is greater than the sum of its parts. Semin Cancer Biol 2016; 45:50-57. [PMID: 27639751 DOI: 10.1016/j.semcancer.2016.09.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/05/2016] [Accepted: 09/08/2016] [Indexed: 02/07/2023]
Abstract
The major issues hampering progress in the treatment of cancer patients are distant metastases and drug resistance to chemotherapy. Metastasis formation is a very complex process, and looking at gene signatures alone is not enough to gain deep insight into it. This paper reviews traditional and novel approaches to identifying gene signature biomarkers, and discusses intratumoural fluid pressure both as a novel way of creating predictive markers and as an obstacle to cancer therapy. Finally, recently developed in vitro systems that predict the response of individual patient-derived cancer explants to chemotherapy are discussed.
Affiliation(s)
- Timur R Samatov
- SRC Bioclinicum, Ugreshskaya str 2/85, 115088, Moscow, Russia; Moscow State University of Mechanical Engineering, Bolshaya Semenovskaya str 38, 107023, Moscow, Russia
- Vladimir V Galatenko
- SRC Bioclinicum, Ugreshskaya str 2/85, 115088, Moscow, Russia; Lomonosov Moscow State University, Leninskie Gory, 119991, Moscow, Russia; National Research University Higher School of Economics, Kochnovsky Pass 3, 125319 Moscow, Russia
- Andreas Block
- Department of Oncology and Hematology, University Cancer Center, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246, Hamburg, Germany
- Maxim Yu Shkurnikov
- P. Hertsen Moscow Oncology Research Institute, National Center of Medical Radiological Research, 3 Second Botkinsky Lane, Moscow, 125284, Russia
- Alexander G Tonevitsky
- Lomonosov Moscow State University, Leninskie Gory, 119991, Moscow, Russia; P. Hertsen Moscow Oncology Research Institute, National Center of Medical Radiological Research, 3 Second Botkinsky Lane, Moscow, 125284, Russia
- Udo Schumacher
- Department of Anatomy and Experimental Morphology, University Cancer Center, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246, Hamburg, Germany.
40
Gätjen M, Brand F, Grau M, Gerlach K, Kettritz R, Westermann J, Anagnostopoulos I, Lenz P, Lenz G, Höpken UE, Rehm A. Splenic Marginal Zone Granulocytes Acquire an Accentuated Neutrophil B-Cell Helper Phenotype in Chronic Lymphocytic Leukemia. Cancer Res 2016; 76:5253-65. [PMID: 27488528 DOI: 10.1158/0008-5472.can-15-3486] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/21/2015] [Accepted: 07/06/2016] [Indexed: 11/16/2022]
Abstract
Recruitment of tumor-associated macrophages and neutrophils (TAM and TAN) to solid tumors contributes to immunosuppression in the tumor microenvironment; however, their contributions to lymphoid neoplasms are less clear. In human chronic lymphocytic leukemia (CLL), tumor B cells lodge in lymph nodes where interactions with the microenvironment occur. Tumor cell homing stimulates proliferation, such that engagement of the B-cell receptor is important for malignant progression. In the Eμ-Tcl1 murine model of CLL, we identified gene expression signatures indicative of a skewed polarization in the phenotype of monocytes and neutrophils. Selective ablation of either of these cell populations in mice delayed leukemia growth. Despite tumor infiltration of these immune cells, a systemic inflammation was not detected. Notably, in progressive CLL, splenic neutrophils were observed to differentiate toward a B-cell helper phenotype, a process promoted by the induction of leukemia-associated IL10 and TGFβ. Our results suggest that targeting aberrant neutrophil differentiation and restoring myeloid cell homeostasis could limit the formation of survival niches for CLL cells. Cancer Res; 76(18); 5253-65. ©2016 AACR.
Affiliation(s)
- Marcel Gätjen
- Department of Hematology, Oncology and Tumorimmunology, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany
| | - Franziska Brand
- Department of Tumor Genetics and Immunogenetics, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany
| | - Michael Grau
- Department of Physics, Philipps-University Marburg, Marburg, Germany. Cluster of Excellence EXC 1003, Cells in Motion, Münster, Germany
| | - Kerstin Gerlach
- Department of Hematology, Oncology and Tumorimmunology, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany
| | - Ralph Kettritz
- Department of Nephrology and Intensive Care Medicine, Experimental and Clinical Research Center, Charité-University Medicine Berlin, Berlin, Germany
| | - Jörg Westermann
- Department of Hematology, Oncology and Tumorimmunology, Charité-University Medicine Berlin, Berlin, Germany
| | | | - Peter Lenz
- Department of Physics, Philipps-University Marburg, Marburg, Germany
| | - Georg Lenz
- Cluster of Excellence EXC 1003, Cells in Motion, Münster, Germany. Translational Oncology, Department of Medicine A, University Hospital Münster, Münster, Germany
| | - Uta E Höpken
- Department of Tumor Genetics and Immunogenetics, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany.
| | - Armin Rehm
- Department of Hematology, Oncology and Tumorimmunology, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany. Department of Hematology, Oncology and Tumorimmunology, Charité-University Medicine Berlin, Berlin, Germany.
|
41
|
Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016; 2016:baw100. [PMID: 27374120 PMCID: PMC4930834 DOI: 10.1093/database/baw100] [Citation(s) in RCA: 889] [Impact Index Per Article: 111.1] [Received: 02/13/2016] [Revised: 05/15/2016] [Accepted: 05/31/2016] [Indexed: 12/18/2022]
Abstract
Genomics, epigenomics, transcriptomics, proteomics and metabolomics efforts rapidly generate a plethora of data on the activity and levels of biomolecules within mammalian cells. At the same time, curation projects that organize knowledge from the biomedical literature into online databases are expanding. Hence, there is a wealth of information about genes, proteins and their associations, with an urgent need for data integration to achieve better knowledge extraction and data reuse. For this purpose, we developed the Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins from over 70 major online resources. We extracted, abstracted and organized data into ∼72 million functional associations between genes/proteins and their attributes. Such attributes could be physical relationships with other biomolecules, expression in cell lines and tissues, genetic associations with knockout mouse or human phenotypes, or changes in expression after drug treatment. We stored these associations in a relational database along with rich metadata for the genes/proteins, their attributes and the original resources. The freely available Harmonizome web portal provides a graphical user interface, a web service and a mobile app for querying, browsing and downloading all of the collected data. To demonstrate the utility of the Harmonizome, we computed and visualized gene-gene and attribute-attribute similarity networks, and through unsupervised clustering, identified many unexpected relationships by combining pairs of datasets such as the association between kinase perturbations and disease signatures. We also applied supervised machine learning methods to predict novel substrates for kinases, endogenous ligands for G-protein coupled receptors, mouse phenotypes for knockout genes, and classified unannotated transmembrane proteins for likelihood of being ion channels. 
The Harmonizome is a comprehensive resource of knowledge about genes and proteins, and as such, it enables researchers to discover novel relationships between biological entities, as well as form novel data-driven hypotheses for experimental validation. Database URL: http://amp.pharm.mssm.edu/Harmonizome.
Affiliation(s)
- Andrew D Rouillard
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Gregory W Gundersen
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nicolas F Fernandez
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Zichen Wang
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Caroline D Monteiro
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Michael G McDermott
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Avi Ma'ayan
- Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
42
|
Lesurf R, Aure M, Mørk H, Vitelli V, Lundgren S, Børresen-Dale AL, Kristensen V, Wärnberg F, Hallett M, Sørlie T, Sauer T, Geisler J, Hofvind S, Borgen E, Børresen-Dale AL, Engebråten O, Fodstad Ø, Garred Ø, Geitvik G, Kåresen R, Naume B, Mælandsmo G, Russnes H, Schlichting E, Sørlie T, Lingjærde O, Kristensen V, Sahlberg K, Skjerven H, Fritzman B. Molecular Features of Subtype-Specific Progression from Ductal Carcinoma In Situ to Invasive Breast Cancer. Cell Rep 2016; 16:1166-1179. [DOI: 10.1016/j.celrep.2016.06.051] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 11/30/2015] [Revised: 05/03/2016] [Accepted: 06/10/2016] [Indexed: 12/21/2022]
|
43
|
Stover DG, Coloff JL, Barry WT, Brugge JS, Winer EP, Selfors LM. The Role of Proliferation in Determining Response to Neoadjuvant Chemotherapy in Breast Cancer: A Gene Expression-Based Meta-Analysis. Clin Cancer Res 2016; 22:6039-6050. [PMID: 27330058 DOI: 10.1158/1078-0432.ccr-16-0471] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 02/25/2016] [Revised: 05/26/2016] [Accepted: 06/03/2016] [Indexed: 12/31/2022]
Abstract
PURPOSE: To provide further insight into the role of proliferation and other cellular processes in chemosensitivity and resistance, we evaluated the association of a diverse set of gene expression signatures with response to neoadjuvant chemotherapy (NAC) in breast cancer. EXPERIMENTAL DESIGN: Expression data from primary breast cancer biopsies for 1,419 patients in 17 studies prior to NAC were identified and aggregated using common normalization procedures. Clinicopathologic characteristics, including response to NAC, were collected. Scores for 125 previously published breast cancer-related gene expression signatures were calculated for each tumor. RESULTS: Within each receptor-based subgroup or PAM50 subtype, breast tumors with high proliferation signature scores were significantly more likely to achieve pathologic complete response to NAC. To distinguish "proliferation-associated" from "proliferation-independent" signatures, we used correlation and linear modeling approaches. Most signatures associated with response to NAC were proliferation associated: 90.5% (38/42) in ER+/HER2- and 63.3% (38/60) in triple-negative breast cancer (TNBC). Proliferation-independent signatures predictive of response to NAC in ER+/HER2- breast cancer were related to immune activity, while those in TNBC comprised a diverse set of signatures, including immune, DNA damage, signaling pathways (PI3K, AKT, Ras, and EGFR), and "stemness" phenotypes. CONCLUSIONS: Proliferation differences account for the vast majority of predictive capacity of gene expression signatures in neoadjuvant chemosensitivity for ER+/HER2- breast cancers and, to a lesser extent, TNBCs. Immune activation signatures are proliferation-independent predictors of pathologic complete response in ER+/HER2- breast cancers. In TNBCs, significant proliferation-independent signatures include gene sets that represent a diverse set of cellular processes. Clin Cancer Res; 22(24); 6039-50. ©2016 AACR.
Affiliation(s)
- Daniel G Stover
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Cell Biology, Harvard Medical School, Boston, Massachusetts
| | - Jonathan L Coloff
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts
| | - William T Barry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Joan S Brugge
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts
| | - Eric P Winer
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.
| | - Laura M Selfors
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts.
|
44
|
Ternès N, Rotolo F, Michiels S. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Stat Med 2016; 35:2561-73. [PMID: 26970107 DOI: 10.1002/sim.6927] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 11/14/2014] [Revised: 02/11/2016] [Accepted: 02/13/2016] [Indexed: 01/15/2023]
Abstract
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross-validated log-likelihood (max-cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness-of-fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without a large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited increase of the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.
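The idea of trading goodness-of-fit against parsimony when choosing λ can be illustrated with a minimal sketch. The exact form of the penalization term is not given in the abstract, so the linear penalty per selected biomarker used here is an assumption, as are the function name `pick_lambda`, the parameter `penalty_per_feature`, and the made-up grid values. The cross-validated log-likelihoods are taken as precomputed inputs rather than fitted by this code.

```python
from typing import Iterable, Tuple

# Each candidate is (lambda value, cross-validated log-likelihood, number of selected biomarkers).
Candidate = Tuple[float, float, int]


def pick_lambda(candidates: Iterable[Candidate], penalty_per_feature: float = 0.0) -> float:
    """Return the lambda maximizing cvl - penalty_per_feature * n_selected.

    With penalty_per_feature = 0 this reduces to the usual max-cvl choice.
    The linear penalty is a hypothetical stand-in for the penalization term
    mentioned in the abstract, not the published formula.
    """
    best = max(candidates, key=lambda c: c[1] - penalty_per_feature * c[2])
    return best[0]


# Illustrative (invented) grid: smaller lambda fits better but selects more biomarkers.
grid = [(0.01, -100.0, 40), (0.05, -101.0, 12), (0.10, -104.0, 3)]
unpenalized = pick_lambda(grid)
moderate = pick_lambda(grid, penalty_per_feature=0.1)
strict = pick_lambda(grid, penalty_per_feature=0.5)
```

Raising the penalty moves the choice toward larger λ and therefore toward sparser models, which is the qualitative behavior the abstract describes.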
Affiliation(s)
- Nils Ternès
- Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM, F-94805, Villejuif, France; Gustave Roussy, Service de biostatistique et d'épidémiologie, F-94805, Villejuif, France
| | - Federico Rotolo
- Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM, F-94805, Villejuif, France; Gustave Roussy, Service de biostatistique et d'épidémiologie, F-94805, Villejuif, France
| | - Stefan Michiels
- Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM, F-94805, Villejuif, France; Gustave Roussy, Service de biostatistique et d'épidémiologie, F-94805, Villejuif, France
|
45
|
Resistance to everolimus driven by epigenetic regulation of MYC in ER+ breast cancers. Oncotarget 2016; 6:2407-20. [PMID: 25537515 PMCID: PMC4385860 DOI: 10.18632/oncotarget.2964] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 10/29/2015] [Accepted: 12/10/2015] [Indexed: 11/25/2022]
Abstract
Acquired resistance to PI3K/mTOR/Akt pathway inhibitors is often associated with compensatory feedback loops involving the activation of oncogenes. Here, we have generated everolimus resistance in ER+ breast cancer cells and in long-term estrogen deprived (LTED) models that mimic progression on anti-estrogens. This allowed us to uncover MYC as a driver of mTOR inhibitor resistance. We demonstrate that both everolimus resistance and acute treatment with everolimus can lead to the upregulation of MYC mRNA, protein expression and, consequently, the enrichment of MYC signatures as revealed by RNA sequencing data. Depletion of MYC resulted in resensitization to everolimus, confirming its functional importance in this setting. Furthermore, ChIP assays demonstrate that MYC upregulation in the everolimus resistant lines is mediated by increased association of the BRD4 transcription factor with the MYC gene. Finally, JQ1, a BRD4 inhibitor, combined with everolimus exhibited increased tumor growth inhibition in 3D Matrigel models and an in vivo xenograft model. These data suggest that MYC plays an important role in mediating resistance to everolimus in ER+ and ER+/LTED models. Furthermore, given the regulation of MYC by BRD4 in this setting, these data have implications for increased therapeutic potential of combining epigenetic agents with mTOR inhibitors to effectively downregulate otherwise difficult to target transcription factors such as MYC.
|
46
|
Abstract
This chapter introduces methods to synthesize experimental results from independent high-throughput genomic experiments, with a focus on adaptation of traditional methods from systematic review of clinical trials and epidemiological studies. First, it reviews methods for identifying, acquiring, and preparing individual patient data for meta-analysis. It then reviews methodology for synthesizing results across studies and assessing heterogeneity, first through outlining of methods and then through a step-by-step case study in identifying genes associated with survival in high-grade serous ovarian cancer.
|
47
|
Yue Z, Kshirsagar MM, Nguyen T, Suphavilai C, Neylon MT, Zhu L, Ratliff T, Chen JY. PAGER: constructing PAGs and new PAG-PAG relationships for network biology. Bioinformatics 2015; 31:i250-7. [PMID: 26072489 PMCID: PMC4553834 DOI: 10.1093/bioinformatics/btv265] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/02/2023]
Abstract
In this article, we describe a new database framework to perform integrative “gene-set, network, and pathway analysis” (GNPA). In this framework, we integrated heterogeneous data on pathways, annotated lists, and gene-sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the PAGER database are organized into P-type, A-type and G-type PAGs with a three-letter-code standard naming convention. The PAGER database currently compiles 44 313 genes from 5 species including human, 38 663 PAGs, 324 830 gene–gene relationships and two types of 3 174 323 PAG–PAG regulatory relationships: co-membership based and regulatory relationship based. To help users assess each PAG’s biological relevance, we developed a cohesion measure called Cohesion Coefficient (CoCo), which is capable of disambiguating between biologically significant PAGs and random PAGs with an area-under-curve performance of 0.98. The PAGER database was set up to help users search and retrieve PAGs from its online web interface. PAGER enables advanced users to build PAG–PAG regulatory networks that provide complementary biological insights not found in gene set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability. The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/. Contact: jakechen@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongliang Yue
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Madhura M Kshirsagar
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Thanh Nguyen
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Chayaporn Suphavilai
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Michael T Neylon
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Liugen Zhu
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Timothy Ratliff
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
| | - Jake Y Chen
- Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
|
48
|
Verfaillie A, Imrichova H, Janky R, Aerts S. iRegulon and i-cisTarget: Reconstructing Regulatory Networks Using Motif and Track Enrichment. Curr Protoc Bioinformatics 2015; 52:2.16.1-2.16.39. [PMID: 26678384 DOI: 10.1002/0471250953.bi0216s52] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]
Abstract
Gene expression profiling is often used to identify genes that are co-expressed in a biological process or disease. Downstream analyses of co-expressed gene sets using bioinformatics methods can reveal candidate transcription factors (TF) that co-regulate these genes, based on the presence of shared TF binding sites. Drawing gene regulatory networks that connect TFs to their predicted target genes can uncover gene modules that implement a particular function. Here, we describe several protocols to analyze any set of co-expressed genes using iRegulon and i-cisTarget. These tools perform regulatory sequence analysis (motif discovery) and integrate and mine large collections of existing regulatory data, such as ChIP-Seq, DHS-seq, and FAIRE-seq (track discovery). While iRegulon focuses on sets of co-expressed genes, i-cisTarget also analyses genomic regions as input. The following protocols describe how to install and use these tools, how to interpret the obtained results, and will thus help to create meaningful regulatory networks.
Affiliation(s)
- Annelien Verfaillie
- Laboratory of Computational Biology, Center for Human Genetics, KU Leuven, Belgium
| | - Hana Imrichova
- Laboratory of Computational Biology, Center for Human Genetics, KU Leuven, Belgium
| | - Rekins Janky
- Laboratory of Computational Biology, Center for Human Genetics, KU Leuven, Belgium
| | - Stein Aerts
- Laboratory of Computational Biology, Center for Human Genetics, KU Leuven, Belgium
|
49
|
Wang M, Zhao Y, Zhang B. Efficient Test and Visualization of Multi-Set Intersections. Sci Rep 2015; 5:16923. [PMID: 26603754 PMCID: PMC4658477 DOI: 10.1038/srep16923] [Citation(s) in RCA: 219] [Impact Index Per Article: 24.3] [Received: 07/06/2015] [Accepted: 10/22/2015] [Indexed: 01/10/2023]
Abstract
Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines.
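For orientation, the familiar two-set case of an exact intersection test can be sketched in a few lines. This is not the SuperExactTest algorithm, which addresses three or more sets; it is only the classical hypergeometric upper tail for two sets drawn from a common universe. The function name `overlap_pvalue` and its parameters are illustrative assumptions.

```python
from math import comb


def overlap_pvalue(universe: int, size_a: int, size_b: int, observed: int) -> float:
    """Exact probability that two random sets overlap in at least `observed` items.

    Set A (size_a items) is held fixed and set B (size_b items) is drawn
    uniformly without replacement from a universe of `universe` items, so the
    overlap follows a hypergeometric distribution.
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)  # number of equally likely choices of B
    # comb() returns 0 for impossible configurations, so no explicit lower bound is needed.
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(max(observed, 0), max_overlap + 1)
    )
    return tail / total


# Two gene sets of 5 genes each from a universe of 10 genes.
p_any = overlap_pvalue(10, 5, 5, 0)   # overlap of at least 0 is certain
p_full = overlap_pvalue(10, 5, 5, 5)  # complete overlap
```

Integer arithmetic with `math.comb` keeps the computation exact until the final division, at the cost of speed for very large universes.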
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, NY 10029, USA
| | - Yongzhong Zhao
- Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, NY 10029, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, NY 10029, USA
|
50
|
Glass K, Girvan M. Finding New Order in Biological Functions from the Network Structure of Gene Annotations. PLoS Comput Biol 2015; 11:e1004565. [PMID: 26588252 PMCID: PMC4654495 DOI: 10.1371/journal.pcbi.1004565] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/06/2015] [Accepted: 09/23/2015] [Indexed: 11/19/2022]
Abstract
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.
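Connecting functional terms through the genes annotated to them can be sketched as follows. The abstract does not specify how term-term links are weighted, so the Jaccard index used here is an assumption, as are the function name `term_similarity`, the `min_shared` threshold, and the toy term and gene identifiers.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def term_similarity(
    term_genes: Dict[str, Set[str]], min_shared: int = 1
) -> Dict[Tuple[str, str], float]:
    """Link pairs of terms that share annotated genes.

    Returns a mapping from (term_a, term_b) to the Jaccard index of their gene
    sets, keeping only pairs with at least `min_shared` genes in common.
    """
    edges: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(term_genes), 2):
        shared = term_genes[a] & term_genes[b]
        if len(shared) >= min_shared:
            union = term_genes[a] | term_genes[b]
            edges[(a, b)] = len(shared) / len(union)
    return edges


# Toy annotation table (hypothetical identifiers, not real GO terms).
toy = {
    "term_x": {"g1", "g2"},
    "term_y": {"g2", "g3"},
    "term_z": {"g4"},
}
edges = term_similarity(toy)
```

The resulting edge list can be passed to any graph clustering routine to group terms, independently of the hierarchy defined by the GO DAG.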
Affiliation(s)
- Kimberly Glass
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Physics Department, University of Maryland, College Park, Maryland, United States of America
| | - Michelle Girvan
- Physics Department, University of Maryland, College Park, Maryland, United States of America
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
|