2
Reed ER, Monti S. Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data. Nucleic Acids Res 2021; 49:e98. [PMID: 34226941 PMCID: PMC8464061 DOI: 10.1093/nar/gkab552] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/17/2020] [Revised: 06/07/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
As high-throughput genomics assays become more efficient and cost-effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a 'taxonomy-like' structure. K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics, and other '-omics', data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes, associated with better prognosis in breast cancer tissue bulk expression data.
Affiliation(s)
- Eric R Reed
  - Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA 02118, USA
  - Bioinformatics Program, College of Engineering, Boston University, Boston, MA 02118, USA
- Stefano Monti
  - Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA 02118, USA
  - Bioinformatics Program, College of Engineering, Boston University, Boston, MA 02118, USA
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
3
Blatti C, Emad A, Berry MJ, Gatzke L, Epstein M, Lanier D, Rizal P, Ge J, Liao X, Sobh O, Lambert M, Post CS, Xiao J, Groves P, Epstein AT, Chen X, Srinivasan S, Lehnert E, Kalari KR, Wang L, Weinshilboum RM, Song JS, Jongeneel CV, Han J, Ravaioli U, Sobh N, Bushell CB, Sinha S. Knowledge-guided analysis of "omics" data using the KnowEnG cloud platform. PLoS Biol 2020; 18:e3000583. [PMID: 31971940 PMCID: PMC6977717 DOI: 10.1371/journal.pbio.3000583] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/29/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022]
Abstract
We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.
Affiliation(s)
- Charles Blatti
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Amin Emad
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
- Matthew J. Berry
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Lisa Gatzke
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Milt Epstein
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Daniel Lanier
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Pramod Rizal
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jing Ge
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xiaoxia Liao
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Omar Sobh
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Mike Lambert
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Corey S. Post
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jinfeng Xiao
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Peter Groves
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Aidan T. Epstein
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xi Chen
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Subhashini Srinivasan
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Erik Lehnert
  - Seven Bridges Genomics, Charlestown, Massachusetts, United States of America
- Krishna R. Kalari
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
- Richard M. Weinshilboum
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
- Jun S. Song
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- C. Victor Jongeneel
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jiawei Han
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Umberto Ravaioli
  - Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Nahil Sobh
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Colleen B. Bushell
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
4
Li S, Teng Y, Yuan MJ, Ma TT, Ma J, Gao XJ. A seven long-noncoding RNA signature predicts prognosis of lung squamous cell carcinoma. Biomark Med 2019; 14:53-63. [PMID: 31729251 DOI: 10.2217/bmm-2019-0282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Aim: This study profiled differentially expressed long noncoding RNAs (lncRNAs) in lung squamous cell carcinoma (LSCC) to predict LSCC overall survival (OS) using The Cancer Genome Atlas data. Materials & methods: The RNA-seq and clinical dataset of 475 LSCC patients was retrieved from The Cancer Genome Atlas database and statistically analyzed. Results: There were 67 upregulated and 32 downregulated lncRNAs in LSCCs and 12 lncRNAs associated with OS. The seven-lncRNA signature was associated with poor OS and RP11-150O12.6 and CTA-384D8.35 were associated with better OS (p < 0.001). The seven lncRNAs-mRNA interaction network analysis showed their association with 187 protein-coding genes for cancer development, cell migration, adhesion, proliferation, apoptosis, angiogenesis and the MAPK signaling pathways. Conclusion: This seven-lncRNA signature is useful to predict LSCC OS.
Affiliation(s)
- Shuai Li
  - Department of Cardiology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, PR China
- Yue Teng
  - Department of Radiology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu 215006, PR China
- Min-Jie Yuan
  - Department of Cardiology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, PR China
- Ting-Ting Ma
  - Department of Radiology, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin's Clinical Research Center for Cancer, The Key Laboratory of Cancer Prevention & Therapy, Tianjin 300060, PR China
- Jian Ma
  - Department of Cardiology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, PR China
- Xu-Jie Gao
  - Department of Radiology, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin's Clinical Research Center for Cancer, The Key Laboratory of Cancer Prevention & Therapy, Tianjin 300060, PR China