1
Zhao X, Singhal A, Park S, Kong J, Bachelder R, Ideker T. Cancer Mutations Converge on a Collection of Protein Assemblies to Predict Resistance to Replication Stress. Cancer Discov 2024; 14:508-523. [PMID: 38236062 PMCID: PMC10905674 DOI: 10.1158/2159-8290.cd-23-0641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/25/2023] [Accepted: 12/21/2023] [Indexed: 01/19/2024]
Abstract
Rapid proliferation is a hallmark of cancer associated with sensitivity to therapeutics that cause DNA replication stress (RS). Many tumors exhibit drug resistance, however, via molecular pathways that are incompletely understood. Here, we develop an ensemble of predictive models that elucidate how cancer mutations impact the response to common RS-inducing (RSi) agents. The models implement recent advances in deep learning to facilitate multidrug prediction and mechanistic interpretation. Initial studies in tumor cells identify 41 molecular assemblies that integrate alterations in hundreds of genes for accurate drug response prediction. These cover roles in transcription, repair, cell-cycle checkpoints, and growth signaling, of which 30 are shown by loss-of-function genetic screens to regulate drug sensitivity or replication restart. The model translates to cisplatin-treated cervical cancer patients, highlighting an RTK-JAK-STAT assembly governing resistance. This study defines a compendium of mechanisms by which mutations affect therapeutic responses, with implications for precision medicine. SIGNIFICANCE Zhao and colleagues use recent advances in machine learning to study the effects of tumor mutations on the response to common therapeutics that cause RS. The resulting predictive models integrate numerous genetic alterations distributed across a constellation of molecular assemblies, facilitating a quantitative and interpretable assessment of drug response. This article is featured in Selected Articles from This Issue, p. 384.
Affiliation(s)
- Xiaoyu Zhao
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Akshat Singhal
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California
- Sungjoon Park
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- JungHo Kong
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Moores Cancer Center, School of Medicine, University of California, San Diego, La Jolla, California
- Robin Bachelder
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Trey Ideker
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California
- Moores Cancer Center, School of Medicine, University of California, San Diego, La Jolla, California
- Department of Bioengineering, University of California, San Diego, La Jolla, California

2
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024]
Abstract
BACKGROUND Modeling of gene regulatory networks (GRNs) is limited by a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic, as it disconnects GRN inference and TFA estimation and cannot account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
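The abstract describes SupirFactor as "structure primed": prior regulatory knowledge shapes the autoencoder. The sketch below shows one generic way to impose such structure, assuming a 0/1 prior network matrix that masks the encoder weights so that each latent TF node reads only from its putative target genes. The function name `masked_encoder`, the ReLU activation, and the toy prior are illustrative assumptions, not SupirFactor's published architecture or code.

```python
import numpy as np


def masked_encoder(x, weights, prior_mask):
    """Map expression to TF-level latent activities through a prior-masked layer.

    x          : (n_cells, n_genes) expression matrix
    weights    : (n_genes, n_tfs) learnable weights (hypothetical)
    prior_mask : (n_genes, n_tfs) 0/1 matrix, 1 where the prior links a gene to a TF
    """
    # Zero every weight the prior network does not support, so each
    # latent TF node can only read expression from its putative targets.
    effective = weights * prior_mask
    # ReLU keeps latent activities non-negative (an illustrative choice).
    return np.maximum(x @ effective, 0.0)


rng = np.random.default_rng(0)
n_cells, n_genes, n_tfs = 5, 4, 2
x = rng.random((n_cells, n_genes))
weights = rng.normal(size=(n_genes, n_tfs))
# Toy prior: genes 0-1 are targets of TF 0; genes 2-3 are targets of TF 1.
prior_mask = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

latent = masked_encoder(x, weights, prior_mask)
print(latent.shape)  # (5, 2)

# Perturbing a TF 1 target leaves TF 0's latent activity unchanged.
x_perturbed = x.copy()
x_perturbed[:, 2] += 10.0
print(np.allclose(masked_encoder(x_perturbed, weights, prior_mask)[:, 0], latent[:, 0]))  # True
```

The mask is what keeps each latent node tied to a named TF: without it, every latent unit could mix all genes and lose its biological label.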
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York, NY, 10003, USA.
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
- Department of Biology, NYU, New York, NY, 10008, USA.
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA.
- Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA.
- Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York, NY, 10003, USA
- Department of Biology, NYU, New York, NY, 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
- Department of Biology, NYU, New York, NY, 10008, USA.
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA.
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA.
- Center For Data Science, NYU, New York, NY, 10008, USA.
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA.

3
Shan Q, Zhang C, Li Y, Li Q, Zhang Y, Li X, Shi J, Hu F. SLC7A11, a potential immunotherapeutic target in lung adenocarcinoma. Sci Rep 2023; 13:18302. [PMID: 37880315 PMCID: PMC10600206 DOI: 10.1038/s41598-023-45284-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023]
Abstract
SLC7A11 has significant translational value in cancer treatment. However, few studies have examined whether SLC7A11 affects the immune status of lung adenocarcinoma (LUAD). Information on SLC7A11 expression and its impact on prognosis was obtained from The Cancer Genome Atlas and Gene Expression Omnibus databases. The differentially expressed genes (DEGs) were analysed by GO and KEGG enrichment analysis. GSEA was performed on the SLC7A11-high and SLC7A11-low groups. The relationship between SLC7A11 and tumour immunity, immune checkpoints, and immune cell infiltration was studied using R. We analysed the correlation between SLC7A11 and chemotactic factors (CFs) and chemokine receptors using the TISIDB database. SLC7A11 is overexpressed in many tumours, including LUAD. The 5-year overall survival of patients in the SLC7A11-high group was lower than that of the SLC7A11-low group. KEGG analysis found that the DEGs were enriched in ferroptosis signaling pathways. GSEA found that survival-related signaling pathways were enriched in the SLC7A11-low group. The SLC7A11-low group had higher immune scores and immune checkpoint expression. SLC7A11 was negatively correlated with many immune cells (CD8+ T cells, immature dendritic cells), CFs and chemokine receptors (such as CCL17/19/22/23, CXCL9/10/11/14, CCR4/6, CX3CR1, CXCR3), and MHCs (major histocompatibility complex molecules). SLC7A11 may regulate tumour immunity and could be a potential therapeutic target for LUAD.
Affiliation(s)
- Qingqing Shan
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China
- Chi Zhang
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China
- Yangke Li
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China.
- Qunying Li
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China.
- Yifan Zhang
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China
- Xue Li
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China
- Junqing Shi
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China
- Fengying Hu
- Department of Respiration, Chengdu First People's Hospital, No. 18, Wangxiang North Road, High-Tech Zone, Chengdu, 610041, Sichuan Province, People's Republic of China

4
Roper B, Mathews JC, Nadeem S, Park JH. Vis-SPLIT: Interactive Hierarchical Modeling for mRNA Expression Classification. IEEE Vis Conf 2023; 2023:106-110. [PMID: 38881685 PMCID: PMC11179685 DOI: 10.1109/vis54172.2023.00030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2024]
Abstract
We propose an interactive visual analytics tool, Vis-SPLIT, for partitioning a population of individuals into groups with similar gene signatures. Vis-SPLIT allows users to interactively explore a dataset and exploit visual separations to build a classification model for specific cancers. The visualization components reveal gene expression and correlation to assist specific partitioning decisions, while also providing overviews for the decision model and clustered genetic signatures. We demonstrate the effectiveness of our framework through a case study and evaluate its usability with domain experts. Our results show that Vis-SPLIT can classify patients based on their genetic signatures to effectively gain insights into RNA sequencing data, as compared to an existing classification system.

5
Utriainen M, Morris JH. clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape. BMC Bioinformatics 2023; 24:134. [PMID: 37020209 PMCID: PMC10074866 DOI: 10.1186/s12859-023-05225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 03/11/2023] [Indexed: 04/07/2023]
Abstract
BACKGROUND Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity. RESULTS The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein-protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. 
We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes. CONCLUSIONS clusterMaker2 represents a significant advance over the previously published version, and most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms should be welcome to the large population of Cytoscape users, particularly the new dimensionality reduction and fuzzy clustering techniques.
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA.

6
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. bioRxiv 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited by a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets using expression as a proxy for the upstream independent features, complicating the validation of modeling frameworks and of the predictions they produce. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g., TF activity (TFA), is unknown because it cannot feasibly be measured experimentally, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher-order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single-cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA, and elucidates functional regulatory pathways through aggregation of TFs.
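The abstract contrasts SupirFactor with linear frameworks that model TFA as a latent feature. One common linear formulation, assumed here for illustration, treats expression as X ~ P A, where P (genes x TFs) is a signed prior network and A (TFs x samples) is the unknown activity, and estimates A by least squares. Because each TF's activity enters additively, this model cannot represent saturation or TF-TF interactions, which is the limitation the abstract points to. The function `estimate_tfa` and the toy matrices are hypothetical and are not taken from SupirFactor.

```python
import numpy as np


def estimate_tfa(expression, prior):
    """Least-squares estimate of latent TF activity under X ~ P @ A.

    expression : (n_genes, n_samples) matrix X
    prior      : (n_genes, n_tfs) signed prior network P (+1 activation, -1 repression, 0 none)
    returns    : (n_tfs, n_samples) matrix A minimizing the Frobenius norm of P @ A - X
    """
    activity, *_ = np.linalg.lstsq(prior, expression, rcond=None)
    return activity


# Toy prior: 3 genes, 2 TFs; gene 1 is activated by both TFs, gene 2 is repressed by TF 1.
prior = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, -1.0]])
true_activity = np.array([[1.0, 2.0, 0.5],
                          [0.0, 1.0, 3.0]])
expression = prior @ true_activity  # noiseless, purely linear expression

estimated = estimate_tfa(expression, prior)
print(np.allclose(estimated, true_activity))  # True
```

Recovery is exact here only because the toy data were generated by the same linear model; with real expression data, any non-linear regulation ends up in the residual.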
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
- Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
- Center For Data Science, NYU, New York, NY 10008, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA

7
Cousins H, Hall T, Guo Y, Tso L, Tzeng KTH, Cong L, Altman RB. Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings, with drug-repurposing applications in COVID-19. Bioinformatics 2023; 39:btac735. [PMID: 36394254 PMCID: PMC9805577 DOI: 10.1093/bioinformatics/btac735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 09/27/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022]
Abstract
MOTIVATION Gene set analysis methods rely on knowledge-based representations of genetic interactions in the form of both gene set collections and protein-protein interaction (PPI) networks. However, explicit representations of genetic interactions often fail to capture complex interdependencies among genes, limiting the analytic power of such methods. RESULTS We propose an extension of gene set enrichment analysis to a latent embedding space reflecting PPI network topology, called gene set proximity analysis (GSPA). Compared with existing methods, GSPA provides improved ability to identify disease-associated pathways in disease-matched gene expression datasets, while improving reproducibility of enrichment statistics for similar gene sets. GSPA is statistically straightforward, reducing to a version of traditional gene set enrichment analysis through a single user-defined parameter. We apply our method to identify novel drug associations with SARS-CoV-2 viral entry. Finally, we validate our drug association predictions through retrospective clinical analysis of claims data from 8 million patients, supporting a role for gabapentin as a risk factor and metformin as a protective factor for severe COVID-19. AVAILABILITY AND IMPLEMENTATION GSPA is available for download as a command-line Python package at https://github.com/henrycousins/gspa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
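The abstract notes that GSPA reduces to a version of traditional gene set enrichment analysis through a single user-defined parameter. For orientation, the sketch below computes the running-sum enrichment score used by classical GSEA (a weighted Kolmogorov-Smirnov-like walk down a ranked gene list). It is not GSPA's embedding-based method, it omits permutation-based significance testing, and the gene names and scores are hypothetical.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Classical GSEA-style running-sum enrichment score.

    ranked_genes : gene names sorted by decreasing score
    scores       : per-gene statistics in the same order
    gene_set     : set of gene names to test
    p            : weight exponent; p=0 gives an unweighted Kolmogorov-Smirnov-like walk
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    n_hit = int(in_set.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the ranked list")
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit_total = weights[in_set].sum()
    if hit_total == 0:
        raise ValueError("all gene-set members have zero weight")
    # Hits step up in proportion to their weight; misses step down uniformly.
    steps = np.where(in_set, weights / hit_total, -1.0 / n_miss)
    running_sum = np.cumsum(steps)
    # The score is the running sum's largest deviation from zero, keeping its sign.
    return float(running_sum[np.argmax(np.abs(running_sum))])


ranked = ["A", "B", "C", "D", "E"]
scores = [5.0, 4.0, 3.0, 2.0, 1.0]
print(round(enrichment_score(ranked, scores, {"A", "B"}), 6))  # 1.0 (set at the top)
print(round(enrichment_score(ranked, scores, {"D", "E"}), 6))  # -1.0 (set at the bottom)
```

Setting `p=0` ignores the magnitudes of the per-gene statistics and uses only rank positions.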
Affiliation(s)
- Henry Cousins
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Taryn Hall
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Yinglong Guo
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Luke Tso
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Kathy T H Tzeng
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Le Cong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Russ B Altman
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA

8
Wang L, Nie R, Zhang J, Cai J. scCapsNet-mask: an updated version of scCapsNet with extended applicability in functional analysis related to scRNA-seq data. BMC Bioinformatics 2022; 23:539. [PMID: 36510124 PMCID: PMC9743530 DOI: 10.1186/s12859-022-05098-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2022] [Accepted: 12/03/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND With the rapid accumulation of scRNA-seq data, a growing number of automatic cell type identification methods have been developed, especially methods based on deep learning. Although these methods have reached relatively high prediction accuracy, several issues remain. One is interpretability. Another is how to handle non-standard test samples that were not encountered during training. RESULTS Here we introduce scCapsNet-mask, an updated version of scCapsNet that provides a reasonable solution to the issues of interpretability and non-standard test samples. First, scCapsNet-mask uses a mask to simplify model interpretation relative to the original scCapsNet. The results show that scCapsNet-mask can constrain the coupling coefficients and establish a one-to-one correspondence between the primary capsules and type capsules. Second, scCapsNet-mask handles non-standard samples more reasonably. In one example, scCapsNet-mask was trained on committed cells and then tested on less differentiated cells as non-standard samples. It not only estimated the lineage bias of the less differentiated cells but also distinguished developmental stages more accurately than traditional machine learning models, allowing a pseudo-temporal order of cells to be established for each lineage. Along this pseudo-temporal order, lineage-specific genes show gradually increasing expression and stem cell-associated genes show gradually decreasing expression. In another example, scCapsNet-mask was trained on scRNA-seq data and then used to assign cell types in spatial transcriptomics data that may contain non-standard samples such as doublets. The results show that scCapsNet-mask not only restored the spatial map but also identified several doublets as non-standard samples. CONCLUSIONS scCapsNet-mask offers a suitable solution to the challenges of interpretability and non-standard test samples.
By adding a mask, it offers automatic processing and easier interpretation than the original scCapsNet. In addition, scCapsNet-mask reflects the composition of non-standard test samples more accurately than traditional machine learning methods. Its applicability therefore extends to functional analyses such as fate bias prediction in less differentiated cells and cell type assignment in spatial transcriptomics.
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- China National Center for Bioinformation, Beijing, 100101 China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
- Rui Nie
- China National Center for Bioinformation, Beijing, 100101 China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing, 100875 China
- Jun Cai
- China National Center for Bioinformation, Beijing, 100101 China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101 China
- University of Chinese Academy of Sciences, Beijing, 100049 China

9
Shah I, Bundy J, Chambers B, Everett LJ, Haggard D, Harrill J, Judson RS, Nyffeler J, Patlewicz G. Navigating Transcriptomic Connectivity Mapping Workflows to Link Chemicals with Bioactivities. Chem Res Toxicol 2022; 35:1929-1949. [PMID: 36301716 PMCID: PMC10483698 DOI: 10.1021/acs.chemrestox.2c00245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Abstract
Screening new compounds for potential bioactivities against cellular targets is vital for drug discovery and chemical safety. Transcriptomics offers an efficient approach for assessing global gene expression changes, but interpreting chemical mechanisms from these data is often challenging. Connectivity mapping is a potential data-driven avenue for linking chemicals to mechanisms based on the observation that many biological processes are associated with unique gene expression signatures (gene signatures). However, mining the effects of a chemical on gene signatures for biological mechanisms is challenging because transcriptomic data contain thousands of noisy genes. New connectivity mapping approaches seeking to distinguish signal from noise continue to be developed, spurred by the promise of discovering chemical mechanisms, new drugs, and disease targets from burgeoning transcriptomic data. Here, we analyze these approaches in terms of diverse transcriptomic technologies, public databases, gene signatures, pattern-matching algorithms, and statistical evaluation criteria. To navigate the complexity of connectivity mapping, we propose a harmonized scheme to coherently organize and compare published workflows. We first standardize concepts underlying transcriptomic profiles and gene signatures based on various transcriptomic technologies such as microarrays, RNA-Seq, and L1000 and discuss widely used data sources such as Gene Expression Omnibus, ArrayExpress, and MSigDB. Next, we generalize connectivity mapping as a pattern-matching task for finding similarity between a query (e.g., the transcriptomic profile for a new chemical) and a reference (e.g., the gene signature of a known target). Published pattern-matching approaches fall into two main categories: vector-based approaches, which use metrics such as correlation and the Jaccard index, and aggregation-based approaches, which use parametric and nonparametric statistics (e.g., gene set enrichment analysis).
The statistical methods for evaluating the performance of different approaches are described, along with comparisons reported in the literature on benchmark transcriptomic data sets. Lastly, we review connectivity mapping applications in toxicology and offer guidance on evaluating chemical-induced toxicity with concentration-response transcriptomic data. In addition to serving as a high-level guide and tutorial for understanding and implementing connectivity mapping workflows, we hope this review will stimulate new algorithms for evaluating chemical safety and drug discovery using transcriptomic data.
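The review frames connectivity mapping as pattern matching between a query and a reference, with vector-based approaches using metrics such as correlation and the Jaccard index. The sketch below implements those two vector-based scores, assuming profiles are plain dictionaries of per-gene changes and that "up-regulated" simply means a positive value. The function names, genes, and values are hypothetical, and real workflows add noise handling and the statistical evaluation the review describes.

```python
import numpy as np


def correlation_score(query, reference):
    """Vector-based connectivity: Pearson correlation over genes shared by both profiles.

    query, reference : dict mapping gene name -> expression change (e.g., log fold change)
    """
    shared = sorted(set(query) & set(reference))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    q = np.array([query[g] for g in shared], dtype=float)
    r = np.array([reference[g] for g in shared], dtype=float)
    return float(np.corrcoef(q, r)[0, 1])


def jaccard_score(query_genes, reference_genes):
    """Vector-based connectivity: Jaccard index of two gene sets (e.g., up-regulated genes)."""
    union = query_genes | reference_genes
    return len(query_genes & reference_genes) / len(union) if union else 0.0


# Hypothetical query profile (new chemical) and reference signature (known target).
query = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
reference = {"G1": 4.0, "G2": -2.0, "G3": 1.0, "G4": 3.0}
print(round(correlation_score(query, reference), 6))  # 1.0 (G4 is not shared, so it is ignored)

query_up = {g for g, v in query.items() if v > 0}          # G1, G3
reference_up = {g for g, v in reference.items() if v > 0}  # G1, G3, G4
print(round(jaccard_score(query_up, reference_up), 6))     # 0.666667
```

The correlation uses the magnitudes of the changes, while the Jaccard index uses only set membership after thresholding, which is one reason the two can rank the same references differently.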
Affiliation(s)
- Imran Shah
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Joseph Bundy
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Bryant Chambers
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Logan J. Everett
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Derik Haggard
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Joshua Harrill
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Richard S. Judson
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
| | - Johanna Nyffeler
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
- Oak Ridge Institute for Science and Education (ORISE) Postdoctoral Fellow, Oak Ridge, Tennessee, 37831, US
| | - Grace Patlewicz
- Center for Computational Toxicology and Exposure, Office of Research and Development, US. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA
10
Zamani M, Cheng YH, Charbonier F, Gupta VK, Mayer AT, Trevino AE, Quertermous T, Chaudhuri O, Cahan P, Huang NF. Single-Cell Transcriptomic Census of Endothelial Changes Induced by Matrix Stiffness and the Association with Atherosclerosis. Advanced Functional Materials 2022; 32:2203069. [PMID: 36816792 PMCID: PMC9937733 DOI: 10.1002/adfm.202203069] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/17/2022] [Indexed: 05/28/2023]
Abstract
Vascular endothelial cell (EC) plasticity plays a critical role in the progression of atherosclerosis by giving rise to mesenchymal phenotypes in the plaque lesion. Despite the evidence for arterial stiffening as a major contributor to atherosclerosis, the complex interplay among atherogenic stimuli in vivo has hindered attempts to determine the effects of extracellular matrix (ECM) stiffness on endothelial-mesenchymal transition (EndMT). To study the regulatory effects of ECM stiffness on EndMT, an in vitro model is developed in which human coronary artery ECs are cultured on physiological or pathological stiffness substrates. Using single-cell RNA sequencing, cell clusters with mesenchymal transcriptional features are found to be more prevalent on pathological than on physiological substrates. Trajectory inference analyses reveal a novel mesenchymal-to-endothelial reverse transition, which is blocked by pathological stiffness substrates, in addition to the expected EndMT trajectory. ECs pushed to a mesenchymal character by pathological stiffness substrates are enriched in transcriptional signatures of atherosclerotic ECs from human and murine plaques. This study characterizes at single-cell resolution the transcriptional programs that underpin EC plasticity in both physiological and pathological milieus, and thus serves as a valuable resource for more precisely defining EndMT and the transcriptional programs contributing to atherosclerosis.
Affiliation(s)
- Maedeh Zamani
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA 94305, USA
- Stanford Cardiovascular Institute, Stanford University, Stanford, CA 94305, USA
- Yu-Hao Cheng
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Frank Charbonier
- Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
- Vivek Kumar Gupta
- Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
- Thomas Quertermous
- Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA 94305, USA
- Ovijit Chaudhuri
- Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
- Patrick Cahan
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Ngan F Huang
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA 94305, USA
- Stanford Cardiovascular Institute, Stanford University, Stanford, CA 94305, USA
- Department of Chemical Engineering, Stanford University, Stanford, CA 94305, USA
- Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304, USA
11
Tian L, Peng Y, Yang K, Cao J, Du X, Liang Z, Shi J, Zhang J. The ERα-NRF2 signalling axis promotes bicalutamide resistance in prostate cancer. Cell Commun Signal 2022; 20:178. [PMID: 36376959 PMCID: PMC9661764 DOI: 10.1186/s12964-022-00979-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2022] [Accepted: 09/27/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND Bicalutamide is a nonsteroidal antiandrogen widely used as a first-line clinical treatment for advanced prostate cancer (PCa). Although patients initially show effective responses to bicalutamide treatment, resistance to bicalutamide frequently occurs and leads to the development of castration-resistant PCa (CRPC). This research investigated the roles of the oestrogen receptor α (ERα)-nuclear factor E2-related factor 2 (NRF2) signalling pathway in bicalutamide resistance in PCa cells. METHODS We performed bioinformatic analysis and immunohistochemical staining on normal and cancerous prostate tissue to evaluate ERα and NRF2 expression and their correlation. Gene expression and localization in PCa cell lines were further investigated using real-time reverse transcription PCR/Western blotting and immunofluorescence staining. We treated PCa cells with the ER inhibitor tamoxifen and performed luciferase reporter assays and chromatin immunoprecipitation (ChIP) assays to understand ERα-dependent NRF2 expression. Overexpression and knockdown of ERα and NRF2 were used to explore the potential role of the ERα-NRF2 signalling axis in bicalutamide resistance in PCa cells. RESULTS We found that the expression of ERα and NRF2 was positively correlated and was higher in human CRPC tissues than in primary PCa tissues. Treatment with oestrogen or bicalutamide increased the expression of ERα and NRF2 as well as NRF2 target genes in PCa cell lines. These effects were blocked by pretreatment with tamoxifen. ChIP assays demonstrated that ERα directly binds to the oestrogen response element (ERE) in the NRF2 promoter. This binding led to increased transcriptional activity of NRF2 in a luciferase reporter assay. Activation of the ERα-NRF2 signalling axis increased the expression of bicalutamide resistance-related genes. 
Inhibition of this signalling axis by knockdown of ERα or NRF2 downregulated the expression of bicalutamide resistance-related genes and inhibited the proliferation and migration of PCa cells. CONCLUSIONS We demonstrated the transcriptional interaction between ERα and NRF2 in CRPC tissues and cell lines by showing the direct binding of ERα to the ERE in the NRF2 promoter under oestrogen treatment. Activation of the ERα-NRF2 signalling axis contributes to bicalutamide resistance in PCa cells, suggesting that the ERα-NRF2 signalling axis is a potential therapeutic target for CRPC. Video Abstract.
Affiliation(s)
- Lei Tian
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071 China
- Yanfei Peng
- School of Integrative Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617 China
- Kuo Yang
- Department of Urology, The Second Hospital of Tianjin Medical University, Tianjin, 300211 China
- Jiasong Cao
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071 China
- Xiaoling Du
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071 China
- Zhixian Liang
- School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, 999077 China
- Jiandang Shi
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071 China
- Ju Zhang
- Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071 China
12
Dsouza KB, Li AY, Bhargava VK, Libbrecht MW. Latent Representation of the Human Pan-Celltype Epigenome Through a Deep Recurrent Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2313-2323. [PMID: 34043510 DOI: 10.1109/tcbb.2021.3084147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. Recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder that captures long-term dependencies in epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types while yielding smoother representations along the genomic axis due to their sequential nature.
13
Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022; 38:2287-2296. [PMID: 35157023 PMCID: PMC10060719 DOI: 10.1093/bioinformatics/btac080] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/29/2021] [Revised: 01/01/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Accurate diagnostic classification and biological interpretation are important in biology and medicine, which are data-rich sciences. Thus, integration of different data types is necessary for high predictive accuracy of clinical phenotypes, and more comprehensive analyses for predicting the prognosis of complex diseases are required. RESULTS Here, we propose a novel multi-task attention learning algorithm for multi-omics data, termed MOMA, which captures important biological processes for high diagnostic performance and interpretability. MOMA vectorizes features and modules using a geometric approach and focuses on important modules in multi-omics data via an attention mechanism. Experiments using public data on Alzheimer's disease and cancer with various classification tasks demonstrated the superior performance of this approach. The utility of MOMA was also verified by a comparison experiment in which the attention mechanism was turned on or off, and by biological analysis. AVAILABILITY AND IMPLEMENTATION The source codes are available at https://github.com/dmcb-gist/MOMA. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
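The abstract says MOMA "focuses on important modules" through an attention mechanism. As a rough illustration of what attention over modules means in general, here is a minimal sketch of generic softmax attention pooling, assuming each module has already been embedded as a fixed-length vector and is scored against a single query vector. This is not MOMA's geometric vectorization or its exact attention formulation; the function names, the dot-product scoring, and the example vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(module_vectors, query):
    """Score each module by a dot product with the query, normalize the scores
    with softmax, and return (weights, weighted sum of module vectors)."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in module_vectors]
    weights = softmax(scores)
    dim = len(module_vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, module_vectors)) for i in range(dim)]
    return weights, pooled


# Hypothetical example: two 2-dimensional module embeddings and a query
# vector that favours the first module.
modules = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = attention_pool(modules, query=[2.0, 0.0])
print([round(w, 3) for w in weights])  # [0.881, 0.119]
```

The attention weights sum to one, so they can be read directly as the relative importance assigned to each module, which is the interpretability property such mechanisms are typically used for.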
Affiliation(s)
- Sehwan Moon
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
14
Wang X, Wang H, Liu D, Wang N, He D, Wu Z, Zhu X, Wen X, Li X, Li J, Wang Z. Deep learning using bulk RNA-seq data expands cell landscape identification in tumor microenvironment. Oncoimmunology 2022; 11:2043662. [PMID: 35251771 PMCID: PMC8890395 DOI: 10.1080/2162402x.2022.2043662] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Affiliation(s)
- Xin Wang
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Hongjiu Wang
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Dan Liu
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Na Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Danni He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zheyu Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xu Zhu
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- Xiaoling Wen
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- Xuhua Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- Jin Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- Zhenzhen Wang
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
15
Wang L, Miao X, Nie R, Zhang Z, Zhang J, Cai J. MultiCapsNet: A General Framework for Data Integration and Interpretable Classification. Front Genet 2021; 12:767602. [PMID: 34899854 PMCID: PMC8652257 DOI: 10.3389/fgene.2021.767602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Accepted: 10/25/2021] [Indexed: 12/16/2022]
Abstract
Recent progress in experimental biology has generated large amounts of data with different formats and lengths. Deep learning is an ideal tool for dealing with complex datasets, but its inherent "black box" nature calls for greater interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, can only deal with numerical features rather than the modular features often encountered in biology. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet, which offers easy data integration and high model interpretability. To demonstrate the ability of this model as an interpretable classifier for modular inputs, we test MultiCapsNet on three datasets with different data types and application scenarios. First, on the labeled variant call dataset, MultiCapsNet shows classification performance similar to that of a neural network model, and provides importance scores for data sources directly, without the extra importance-determination step required by the neural network model. The importance scores generated by these two models are highly correlated. Second, on a single-cell RNA sequencing (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interactions (PPI) and protein-DNA interactions (PDI). The classification accuracy of MultiCapsNet is comparable to that of the neural network and random forest models. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to cell type classification. Third, we compared MultiCapsNet with SCENIC. The results show several cell type-relevant TFs identified by both methods, further demonstrating the validity and interpretability of MultiCapsNet.
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xuexia Miao
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Rui Nie
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhang Zhang
- School of Systems Science, Beijing Normal University, Beijing, China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing, China
- Jun Cai
- China National Center for Bioinformation, Beijing, China
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
16
Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S. A Survey of Autoencoder Algorithms to Pave the Diagnosis of Rare Diseases. Int J Mol Sci 2021; 22:10891. [PMID: 34639231 PMCID: PMC8509321 DOI: 10.3390/ijms221910891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/04/2021] [Accepted: 10/07/2021] [Indexed: 12/28/2022]
Abstract
Rare diseases (RDs) concern a broad range of disorders and can have various origins. For a long time, the scientific community was unaware of RDs. Impressive progress has already been made for certain RDs; however, owing to a lack of sufficient knowledge, many patients remain undiagnosed. Nowadays, advances in high-throughput sequencing technologies, such as whole-genome sequencing, single-cell sequencing and others, have boosted the understanding of RDs. To extract biological meaning from the data generated by these methods, different analysis techniques have been proposed, including machine learning algorithms. These methods have recently proven valuable in the medical field. Among such approaches, unsupervised learning methods via neural networks, including autoencoders (AEs) and variational autoencoders (VAEs), have shown promising performance in applications to various types of data and in different contexts, from cancer to healthy patient tissues. In this review, we discuss how AEs and VAEs have been used in biomedical settings. Specifically, we discuss their current applications and the improvements achieved in the diagnosis and survival of patients. We focus on applications in the field of RDs, and we discuss how the use of AEs and VAEs would enhance RD understanding and diagnosis.
Affiliation(s)
- David Pratella
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
- Samira Ait-El-Mkadem Saadi
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Sylvie Bannwarth
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Véronique Paquis-Fluckinger
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Silvia Bottini
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
17
Mowbray M, Savage T, Wu C, Song Z, Cho BA, Del Rio-Chanona EA, Zhang D. Machine learning for biochemical engineering: A review. Biochem Eng J 2021. [DOI: 10.1016/j.bej.2021.108054] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/19/2022]
18
Liñares-Blanco J, Pazos A, Fernandez-Lozano C. Machine learning analysis of TCGA cancer data. PeerJ Comput Sci 2021; 7:e584. [PMID: 34322589 PMCID: PMC8293929 DOI: 10.7717/peerj-cs.584] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/03/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have allowed the use of omic data for the training of these algorithms. To survey the state of the art, this review covers the main works that have used ML with TCGA data. First, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we turn to the main objective of this study: the identification and discussion of works that have used TCGA data to train different ML approaches. After a review of more than 100 papers, it has been possible to classify them according to the following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions drawn in this work is a high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks. It is worth emphasizing the increase in integrative models for multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. Notably, a large number of works use gene expression data, which has been found to be the preferred data type among researchers when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour.
For this reason, a greater number of works have focused on the BRCA cohort, while specific works for survival, for example, were centred on the GBM cohort, due to its large number of events. Throughout this review, it will be possible to go in depth into the works and the methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.
Affiliation(s)
- Jose Liñares-Blanco
- CITIC-Research Center of Information and Communication Technologies, University of A Coruña, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, A Coruña, Spain
- Alejandro Pazos
- CITIC-Research Center of Information and Communication Technologies, University of A Coruña, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
- CITIC-Research Center of Information and Communication Technologies, University of A Coruña, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
19
Zhao Y, Dong Y, Sun Y, Cheng C. AutoEncoder-Based Computational Framework for Tumor Microenvironment Decomposition and Biomarker Identification in Metastatic Melanoma. Front Genet 2021; 12:665065. [PMID: 34122516 PMCID: PMC8191580 DOI: 10.3389/fgene.2021.665065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2021] [Accepted: 04/12/2021] [Indexed: 11/13/2022]
Abstract
Melanoma is one of the most aggressive cancer types, and its prognosis is determined by both tumor cell-intrinsic and -extrinsic features as well as their interactions. In this study, we performed systematic and unbiased analysis using The Cancer Genome Atlas (TCGA) melanoma RNA-seq data and identified two gene signatures that captured the intrinsic and extrinsic features, respectively. Specifically, we selected genes that best reflected the expression signals from tumor cells and infiltrating immune cells. Then, we applied an AutoEncoder-based method to decompose the expression of these genes into a small number of representative nodes. Many of these nodes were found to be significantly associated with patient prognosis. From them, we selected the two most prognostic nodes and defined a tumor-intrinsic (TI) signature and a tumor-extrinsic (TE) signature. Pathway analysis confirmed that the TE signature recapitulated cytotoxic immune cell-related pathways while the TI signature reflected MYC pathway activity. We leveraged these two signatures to investigate six independent melanoma microarray datasets and found that they were able to predict the prognosis of patients under standard care. Furthermore, we showed that the TE signature was also positively associated with patients' response to immunotherapies, including tumor vaccine therapy and checkpoint blockade immunotherapy. This study developed a novel computational framework to capture the tumor-intrinsic and -extrinsic features and identified robust prognostic and predictive biomarkers in melanoma.
Affiliation(s)
- Yanding Zhao
- Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, United States
- Yadong Dong
- Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, United States
- Yongqi Sun
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Chao Cheng
- Department of Medicine, Baylor College of Medicine, Houston, TX, United States
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, United States
20
Mostavi M, Chiu YC, Chen Y, Huang Y. CancerSiamese: one-shot learning for predicting primary and metastatic tumor types unseen during model training. BMC Bioinformatics 2021; 22:244. [PMID: 33980137 PMCID: PMC8117642 DOI: 10.1186/s12859-021-04157-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/09/2020] [Accepted: 04/27/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND State-of-the-art deep learning-based cancer type prediction can only predict cancer types whose samples are available during training, where sample sizes are commonly large. In this paper, we consider how to utilize the existing training samples to predict cancer types unseen during training. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types and propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function. RESULTS We trained CancerSiamese for cancer type prediction for primary and metastatic tumors using samples from the Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was utilized to facilitate the training of the CancerSiamese models. CancerSiamese was tested for different N-way predictions and yielded an average accuracy improvement of 8% and 4% over the benchmark 1-Nearest Neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied the guided gradient saliency map and feature selection to CancerSiamese to examine 100 and 200 top marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer-related functions between primary and metastatic tumors. CONCLUSION This work demonstrated, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. Thus, it could inspire new and ingenious applications of one-shot and few-shot learning solutions for improving cancer diagnosis and prognosis, and our understanding of cancer.
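The abstract benchmarks CancerSiamese against a 1-Nearest Neighbor (1-NN) classifier in N-way one-shot prediction: each candidate type contributes a single support profile, and a query is assigned to the type of its most similar support sample. A minimal sketch of such a baseline evaluation follows, assuming Euclidean distance on expression vectors and one held-out query per type in each episode. The function names and toy data are hypothetical, and this is not the authors' evaluation code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn_predict(query, support):
    """Return the label whose single support example is closest to the query."""
    return min(support, key=lambda label: euclidean(query, support[label]))


def n_way_accuracy(samples_by_type, n_way, episodes, seed=0):
    """Average 1-NN accuracy over random N-way one-shot episodes.

    samples_by_type maps a type label to a list of expression vectors; each
    type needs at least two samples (one support example, one query).
    """
    rng = random.Random(seed)
    labels = list(samples_by_type)
    correct = 0
    for _ in range(episodes):
        # Draw N types, then one support and one held-out query per type.
        support, queries = {}, []
        for label in rng.sample(labels, n_way):
            s, q = rng.sample(samples_by_type[label], 2)
            support[label] = s
            queries.append((q, label))
        for q, true_label in queries:
            correct += one_nn_predict(q, support) == true_label
    return correct / (episodes * n_way)


# Hypothetical toy data: three well-separated "types" in a 2-gene space.
toy = {
    "A": [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
    "B": [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]],
    "C": [[10.0, 0.0], [10.1, 0.0], [10.0, 0.1]],
}
print(n_way_accuracy(toy, n_way=3, episodes=20))  # 1.0 on this separable toy set
```

In a Siamese setup, the fixed `euclidean` distance would be replaced by a similarity learned from pairs of profiles, which is the component the abstract reports improving on the 1-NN benchmark.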
Affiliation(s)
- Milad Mostavi
- Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Yu-Chiao Chiu
- Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yidong Chen
- Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yufei Huang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
21
Chiu YC, Chen HIH, Gorthi A, Mostavi M, Zheng S, Huang Y, Chen Y. Deep learning of pharmacogenomics resources: moving towards precision oncology. Brief Bioinform 2020; 21:2066-2083. [PMID: 31813953 PMCID: PMC7711267 DOI: 10.1093/bib/bbz144] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/05/2019] [Revised: 08/22/2019] [Accepted: 10/18/2019] [Indexed: 12/13/2022]
Abstract
The recent accumulation of cancer genomic data provides an opportunity to understand how a tumor's genomic characteristics can affect its responses to drugs. This field, called pharmacogenomics, is a key area in the development of precision oncology. Deep learning (DL) methodology has emerged as a powerful technique to characterize and learn from rapidly accumulating pharmacogenomics data. We introduce the fundamentals and typical model architectures of DL. We review the use of DL in the classification of cancers and cancer subtypes (diagnosis and treatment stratification of patients), prediction of drug response and drug synergy for individual tumors (treatment prioritization for a patient), drug repositioning and discovery, and the study of the mechanism/mode of action of treatments. For each topic, we summarize current genomics and pharmacogenomics data resources such as pan-cancer genomics data for cancer cell lines (CCLs) and tumors, and systematic pharmacologic screens of CCLs. By revisiting the published literature, including our in-house analyses, we demonstrate the unprecedented capability of DL, enabled by the rapid accumulation of data resources, to decipher complex drug response patterns, thus potentially improving cancer medicine. Overall, this review provides an in-depth summary of state-of-the-art DL methods and up-to-date pharmacogenomics resources, and outlines future opportunities and challenges in realizing the goal of precision oncology.
Affiliation(s)
- Yu-Chiao Chiu
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Hung-I Harry Chen
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX 78249, USA
- Aparna Gorthi
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Milad Mostavi
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX 78249, USA
- Siyuan Zheng
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, TX 78249, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Yidong Chen
  - Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
22
Ramirez R, Chiu YC, Hererra A, Mostavi M, Ramirez J, Chen Y, Huang Y, Jin YF. Classification of Cancer Types Using Graph Convolutional Neural Networks. Front Phys 2020; 8:203. [PMID: 33437754 PMCID: PMC7799442 DOI: 10.3389/fphy.2020.00203] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 05/04/2023]
Abstract
BACKGROUND Cancer has been a leading cause of death in the United States, with significant health care costs. Accurate prediction of cancers at an early stage and understanding the genomic mechanisms that drive cancer development are vital to improving treatment outcomes and survival rates, and thus have significant social and economic impacts. Attempts have been made to classify cancer types with machine learning techniques during the past two decades and, more recently, with deep learning approaches. RESULTS In this paper, we established four graph convolutional neural network (GCNN) models that use unstructured gene expressions as inputs to classify tumor and non-tumor samples into their designated 33 cancer types or as normal. The four GCNN models, based on a co-expression graph, a co-expression+singleton graph, a protein-protein interaction (PPI) graph, and a PPI+singleton graph, have been designed and implemented. They were trained and tested on a combined set of 10,340 cancer samples and 731 normal tissue samples from The Cancer Genome Atlas (TCGA) dataset. The established GCNN models achieved excellent prediction accuracies (89.9-94.7%) among 34 classes (33 cancer types and a normal group). In silico gene-perturbation experiments were performed on all four models. The co-expression GCNN model was further interpreted to identify a total of 428 marker genes that drive the classification of the 33 cancer types and normal. The concordance of differential expression of these markers between the represented cancer type and others is confirmed. Successful classification of cancer types and a normal group regardless of the normal tissues' origin suggested that the identified markers are cancer-specific rather than tissue-specific. CONCLUSION Novel GCNN models have been established to predict cancer types or normal tissue based on gene expression profiles.
We demonstrated with the TCGA dataset that these models can produce accurate classification (above 94%) using cancer-specific marker genes. The models and the source code are publicly available and can be readily adapted to the diagnosis of cancer and other diseases by the data-driven modeling research community.
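To make the idea of a co-expression graph plus graph convolution concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code: the correlation threshold and helper names are hypothetical, and it swaps in the widely used first-order propagation rule ReLU(D^-1/2 (A+I) D^-1/2 X W), whereas the cited GCNN models may use a different (for example spectral) graph convolution.

```python
import numpy as np


def coexpression_adjacency(expr, threshold=0.6):
    """Gene-gene adjacency from a (samples x genes) expression matrix.

    Two genes are linked when |Pearson r| across samples exceeds
    `threshold` (a hypothetical cut-off chosen for illustration).
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added inside the layer
    return adj


def gcn_layer(adj, features, weight):
    """One first-order graph convolution: ReLU(D^-1/2 (A+I) D^-1/2 X W).

    adj: (genes x genes), features: (genes x in_dim), weight: (in_dim x out_dim).
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(a_norm @ features @ weight, 0.0)


# Toy data: 4 samples x 3 genes; genes 0 and 1 are perfectly correlated.
expr = np.array([[1.0, 2.0, 5.0],
                 [2.0, 4.0, 1.0],
                 [3.0, 6.0, 4.0],
                 [4.0, 8.0, 2.0]])
adj = coexpression_adjacency(expr)
# One sample's expression is the node feature (one scalar per gene).
out = gcn_layer(adj, expr[0].reshape(-1, 1), np.ones((1, 2)))
```

In this setup each sample is classified on a fixed gene graph, so the graph is built once from the training cohort and only the node features change from sample to sample.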
Affiliation(s)
- Ricardo Ramirez
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA
- Yu-Chiao Chiu
  - Greehey Children’s Cancer Research Institute, The University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Allen Hererra
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA
- Milad Mostavi
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA
- Joshua Ramirez
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA
- Yidong Chen
  - Greehey Children’s Cancer Research Institute, The University of Texas Health San Antonio, San Antonio, TX, 78229, USA
  - Department of Population Health Sciences, The University of Texas Health San Antonio, San Antonio, Texas 78229, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA
  - Department of Population Health Sciences, The University of Texas Health San Antonio, San Antonio, Texas 78229, USA
- Yu-Fang Jin
  - Department of Electrical and Computer Engineering, the University of Texas at San Antonio, San Antonio, Texas 78249, USA
23
Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling. Computers 2020. [DOI: 10.3390/computers9020037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have proposed novel solutions that have been applied in a variety of fields. Over the same period, the success of deep learning techniques has opened the way to their application to difficult problems where human skill cannot provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques have not been designed to tackle “complex data”, that is, high-dimensional data belonging to datasets with huge cardinality and describing complex problems. Specifically, they often require critical parameters to be set manually, or exploit complex architectures and/or training phases that make their computational load impractical. In this paper, after grouping the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.
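For reference, the KNN-imputation idea used as the comparison baseline can be sketched in a few lines. This is a generic KNN imputer on numeric rows, with `None` marking a gap; the distance definition, `k`, and the helper name are assumptions for illustration, not the baseline configuration used in the paper (which works on genome sequences).

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows' values.

    Distance is the root-mean-square difference over the columns that
    both rows have observed; rows missing the target column are skipped.
    Returns a new list of lists and leaves `rows` unchanged.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no donor row exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [[1.0, 2.0, None],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 5.0],
        [10.0, 10.0, 100.0]]
print(knn_impute(data, k=2)[0])  # [1.0, 2.0, 4.0]
```

The cost of comparing every incomplete row against every other row grows quadratically with dataset size, which is one reason such baselines become impractical on the large, high-dimensional data the abstract calls "complex data".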
24
Mostavi M, Chiu YC, Huang Y, Chen Y. Convolutional neural network models for cancer type prediction based on gene expression. BMC Med Genomics 2020; 13:44. [PMID: 32241303 PMCID: PMC7119277 DOI: 10.1186/s12920-020-0677-2] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 12/12/2022]
Abstract
BACKGROUND Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers. RESULTS In this paper, we introduced several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on gene expression profiles from a combined set of 10,340 samples of 33 cancer types and 713 matched normal tissues of The Cancer Genome Atlas (TCGA). Our models achieved excellent prediction accuracies (93.9-95.0%) among 34 classes (33 cancers and normal). Furthermore, we interpreted one of the models, the 1D-CNN model, with a guided saliency technique and identified a total of 2090 cancer markers (108 per class on average). The concordance of differential expression of these markers between the cancer type they represent and others is confirmed. In breast cancer, for instance, our model identified well-known markers, such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to the prediction of breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code can be found at https://github.com/chenlabgccri/CancerTypePrediction. CONCLUSIONS Here we present novel CNN designs for accurate and simultaneous cancer/normal and cancer type prediction based on gene expression profiles, and a unique model interpretation scheme to elucidate the biological relevance of cancer marker genes after eliminating the effects of tissue of origin.
The proposed model has a light set of hyperparameters to train and thus can be easily adapted to facilitate cancer diagnosis in the future.
Affiliation(s)
- Milad Mostavi
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Yu-Chiao Chiu
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yidong Chen
  - Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
  - Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
25
López-García G, Jerez JM, Franco L, Veredas FJ. Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data. PLoS One 2020; 15:e0230536. [PMID: 32214348 PMCID: PMC7098575 DOI: 10.1371/journal.pone.0230536] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 11/29/2019] [Accepted: 03/02/2020] [Indexed: 12/17/2022]
Abstract
Precision medicine in oncology aims at obtaining data from heterogeneous sources to have a precise estimation of a given patient’s state and prognosis. With the purpose of advancing toward a personalized medicine framework, accurate diagnoses allow the prescription of more effective treatments adapted to the specificities of each individual case. In recent years, next-generation sequencing has driven cancer research by providing physicians with an overwhelming amount of gene-expression data from RNA-seq high-throughput platforms. In this scenario, data mining and machine learning techniques have contributed widely to gene-expression data analysis by supplying computational models to support decision-making on real-world data. Nevertheless, existing public gene-expression databases are characterized by an unfavorable imbalance between the huge number of genes (on the order of tens of thousands) and the small number of samples (on the order of a few hundred) available. Although diverse feature selection and extraction strategies have traditionally been applied to overcome the resulting over-fitting issues, the efficacy of standard machine learning pipelines is far from satisfactory for the prediction of relevant clinical outcomes such as follow-up end-points or patient survival. Using the public Pan-Cancer dataset, in this study we pre-train convolutional neural network architectures for survival prediction on a subset composed of thousands of gene-expression samples from thirty-one tumor types. The resulting architectures are subsequently fine-tuned to predict lung cancer progression-free interval. The application of convolutional networks to gene-expression data has many limitations, derived from the unstructured nature of these data. In this work we propose a methodology to rearrange RNA-seq data by transforming RNA-seq samples into gene-expression images, from which convolutional networks can extract high-level features.
As an additional objective, we investigate whether leveraging the information extracted from other tumor-type samples contributes to the extraction of high-level features that improve lung cancer progression prediction, compared to other machine learning approaches.
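The step of turning a 1-D RNA-seq profile into a 2-D "image" that a convolutional network can consume can be sketched as follows. The abstract does not say how genes are arranged, so this is an assumption: genes are optionally permuted by a hypothetical `gene_order`, zero-padded to the next perfect square, and filled row by row.

```python
import math

import numpy as np


def expression_to_image(profile, gene_order=None):
    """Turn a 1-D expression profile into a square 2-D array.

    `gene_order` (hypothetical) is an optional permutation deciding which
    gene lands in which pixel; the profile is then zero-padded to the next
    perfect square and filled row by row.
    """
    values = np.asarray(profile, dtype=float)
    if gene_order is not None:
        values = values[np.asarray(gene_order)]
    side = math.isqrt(values.size)
    if side * side < values.size:
        side += 1
    padded = np.zeros(side * side)
    padded[: values.size] = values
    return padded.reshape(side, side)


image = expression_to_image([1.0, 2.0, 3.0, 4.0, 5.0])
print(image.shape)  # (3, 3)
```

The ordering matters: a convolution kernel only sees neighbouring pixels, so any arrangement that places related genes next to each other (rather than the arbitrary order used here) is what gives the image useful local structure.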
Affiliation(s)
- Guillermo López-García
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
- José M. Jerez
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
- Leonardo Franco
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
- Francisco J. Veredas
  - Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, ETSI Informática, Málaga, Spain
26
Dwivedi SK, Tjärnberg A, Tegnér J, Gustafsson M. Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder. Nat Commun 2020; 11:856. [PMID: 32051402 PMCID: PMC7016183 DOI: 10.1038/s41467-020-14666-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/11/2019] [Accepted: 01/22/2020] [Indexed: 01/05/2023]
Abstract
Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet biological networks, which commonly define such modules, are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without prior knowledge of a biological network, instead training a deep autoencoder on large transcriptional data. We hypothesize that modules could be discovered within the autoencoder representations. We find a statistically significant enrichment of genome-wide association study (GWAS)-relevant genes in the last layer, and to a successively lesser degree in the middle and first layers, respectively. In contrast, we find an opposite gradient, where a modular protein-protein interaction signal is strongest in the first layer and then vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach is sufficient to discover groups of disease-related genes.
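The abstract reports a statistically significant enrichment of GWAS-relevant genes but does not name the test. A one-sided hypergeometric test is one common way to score whether a gene set (for example, genes strongly tied to one autoencoder node) contains more annotated genes than chance; the sketch below assumes that test and uses hypothetical argument names, so it should not be read as the authors' procedure.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, module_size, overlap):
    """One-sided hypergeometric P(X >= overlap).

    population: total genes considered; annotated: genes carrying the label
    (e.g. GWAS-relevant); module_size: genes in the tested module;
    overlap: annotated genes observed in the module.
    """
    total = comb(population, module_size)
    upper = min(module_size, annotated)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, module_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# 3 of 3 module genes are annotated, out of 3 annotated genes among 10.
print(hypergeom_enrichment_p(10, 3, 3, 3))  # 1/120, about 0.00833
```

Exact integer arithmetic with `math.comb` avoids overflow for moderate gene counts; for genome-scale backgrounds a log-space or library implementation would be the more practical choice.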
Affiliation(s)
- Sanjiv K Dwivedi
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Andreas Tjärnberg
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
  - Department of Biology, Center For Genomics and Systems Biology, New York University, New York, NY, 10008, USA
  - Center for Developmental Genetics, Department of Biology, New York University, New York, NY, USA
- Jesper Tegnér
  - Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
  - Unit of Computational Medicine, Department of Medicine, Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Solna, Sweden
- Mika Gustafsson
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
27
Palazzo M, Beauseroy P, Yankilevich P. A pan-cancer somatic mutation embedding using autoencoders. BMC Bioinformatics 2019; 20:655. [PMID: 31829157 PMCID: PMC6907172 DOI: 10.1186/s12859-019-3298-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/08/2019] [Accepted: 11/27/2019] [Indexed: 02/08/2023]
Abstract
Background Next-generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data allows researchers to study the complexity of cancer with machine learning methods. The large available repositories of high-dimensional tumor samples characterised by germline and somatic mutation data require advanced computational modelling for data interpretation. In this work, we propose to analyze this complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. Results Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better lower-dimensional representations of large somatic mutation data from 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes. Conclusions The learned latent space maps the original samples to a much lower dimension while keeping the biological signals from the original tumor samples. This pipeline and the resulting embedding allow easier exploration of the heterogeneity within and across tumor types and accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
Affiliation(s)
- Martin Palazzo
  - Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA)-CONICET-Partner Institute of the Max Planck Society, Godoy Cruz 2390, Buenos Aires, C1425FQD, Argentina
  - Institut Charles Delaunay, Universite de Technologie de Troyes, 12 Rue Marie Curie, Troyes, 10300, France
  - Universidad Tecnologica Nacional, Facultad Regional Buenos Aires, Av. Medrano 951, Buenos Aires, C1179AAQ, Argentina
- Pierre Beauseroy
  - Institut Charles Delaunay, Universite de Technologie de Troyes, 12 Rue Marie Curie, Troyes, 10300, France
- Patricio Yankilevich
  - Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA)-CONICET-Partner Institute of the Max Planck Society, Godoy Cruz 2390, Buenos Aires, C1425FQD, Argentina
28
Liang Z, Cao J, Tian L, Shen Y, Yang X, Lin Q, Zhang R, Liu H, Du X, Shi J, Zhang J. Aromatase-induced endogenous estrogen promotes tumour metastasis through estrogen receptor-α/matrix metalloproteinase 12 axis activation in castration-resistant prostate cancer. Cancer Lett 2019; 467:72-84. [PMID: 31499120 DOI: 10.1016/j.canlet.2019.09.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/30/2019] [Revised: 08/17/2019] [Accepted: 09/04/2019] [Indexed: 01/09/2023]
Abstract
Castration-resistant prostate cancer (CRPC) following androgen deprivation therapy remains a major obstacle in advanced prostate cancer management. Aromatase catalyzes the conversion of androgens to estrogen, yet the role of aromatase-generated endogenous estrogen in CRPC is poorly understood. In this study, we assessed the expression and function of aromatase in CRPC. We found that aromatase expression was significantly increased in CRPC tissues and cell lines. In some prostate cancer cell lines, aromatase was predominantly expressed in CD44+ subsets. Bicalutamide treatment significantly increased aromatase expression, and CYP19A1 expression positively correlated with estrogen responses and epithelial-mesenchymal transition. Aromatase knockdown in PC3 cells reduced invasiveness and decreased metastasis-related gene expression. The aromatase inhibitor letrozole attenuated tumour metastasis in castrated PC3-xenograft mice. Mechanistically, aromatase-induced endogenous estrogen promoted estrogen receptor-α (ERα) binding to the estrogen response element (ERE) in the matrix metalloproteinase 12 (MMP12) promoter. MMP12 co-localized with CD44 on the cell membrane, and MMP12 knockdown significantly reduced estradiol-induced PC3 invasion. Taken together, our findings indicated that increased endogenous estrogen, catalysed by elevated aromatase levels, enhanced MMP12 expression via ERα, participated in CRPC progression, and promoted tumour metastasis. Thus, aromatase represents a potential novel therapeutic target for CRPC.
Affiliation(s)
- Zhixian Liang
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Jiasong Cao
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Lei Tian
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Yongmei Shen
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Xu Yang
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Qimei Lin
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Ran Zhang
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Haitao Liu
  - Shanghai First People's Hospital, Shanghai Jiaotong University, Shanghai, 200080, China
- Xiaoling Du
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Jiandang Shi
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
- Ju Zhang
  - Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
29
Tobore I, Li J, Yuhang L, Al-Handarish Y, Kandwal A, Nie Z, Wang L. Deep Learning Intervention for Health Care Challenges: Some Biomedical Domain Considerations. JMIR Mhealth Uhealth 2019; 7:e11966. [PMID: 31376272 PMCID: PMC6696854 DOI: 10.2196/11966] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 08/17/2018] [Revised: 04/14/2019] [Accepted: 06/12/2019] [Indexed: 01/10/2023]
Abstract
The use of deep learning (DL) for the analysis and diagnosis of biomedical and health care problems has received unprecedented attention in the last decade. The technique has recorded a number of achievements in unearthing meaningful features and accomplishing tasks that were hitherto difficult to solve by other methods and human experts. Currently, biological and medical devices, treatments, and applications are capable of generating large volumes of data in the form of images, sounds, text, graphs, and signals, giving rise to the concept of big data. DL is an emerging approach to data representation and analysis in the wake of big data. DL is a type of machine learning algorithm that has deeper (or more) hidden layers of similar function cascaded into the network and can extract meaning from medical big data. Achieving personalized health care delivery, a current driver of transformation, will be possible with the use of mobile health (mHealth). DL can provide the analysis for the deluge of data generated by mHealth apps. This paper reviews the fundamentals of DL methods and presents a general view of the trends in DL by capturing literature from PubMed and Institute of Electrical and Electronics Engineers database publications that implement different variants of DL. We highlight the implementation of DL in health care, which we categorize into biological systems, electronic health records, medical images, and physiological signals. In addition, we discuss some inherent challenges of DL affecting the biomedical and health domain, as well as prospective research directions that focus on improving health management by promoting the application of physiological signals and modern internet technology.
Affiliation(s)
- Igbe Tobore
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Graduate University, Chinese Academy of Sciences, Beijing, China
- Jingzhen Li
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Liu Yuhang
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yousef Al-Handarish
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Abhishek Kandwal
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zedong Nie
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Lei Wang
  - Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
30
Mercatelli D, Ray F, Giorgi FM. Pan-Cancer and Single-Cell Modeling of Genomic Alterations Through Gene Expression. Front Genet 2019; 10:671. [PMID: 31379928 PMCID: PMC6657420 DOI: 10.3389/fgene.2019.00671] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/01/2019] [Accepted: 06/27/2019] [Indexed: 12/27/2022]
Abstract
Cancer is a disease often characterized by the presence of multiple genomic alterations, which trigger altered transcriptional patterns and gene expression, which in turn sustain the processes of tumorigenesis, tumor progression, and tumor maintenance. The links between genomic alterations and gene expression profiles can be utilized as the basis to build specific molecular tumorigenic relationships. In this study, we perform pan-cancer predictions of the presence of single somatic mutations and copy number variations using machine learning approaches on gene expression profiles. We show that gene expression can be used to predict genomic alterations in every tumor type, where some alterations are more predictable than others. We propose gene aggregation as a tool to improve the accuracy of alteration prediction models from gene expression profiles. Ultimately, we show how this principle can be beneficial in intrinsically noisy datasets, such as those based on single-cell sequencing.
Affiliation(s)
- Daniele Mercatelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Forest Ray
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, United States
- Federico M. Giorgi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
31
Wang K, Liu X, Guo Y, Wu Z, Zhi D, Ruan J, Zhao Z. The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: systems biology on diverse data types. BMC Syst Biol 2018; 12:125. [PMID: 30577731 PMCID: PMC6302362 DOI: 10.1186/s12918-018-0648-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
From June 10 to 12, 2018, the International Conference on Intelligent Biology and Medicine (ICIBM 2018) was held in Los Angeles, California, USA. The conference included 11 scientific sessions, four tutorials, one poster session, four keynote talks and four eminent scholar talks that covered a wide range of topics in 3D genome structure analysis and visualization, next-generation sequencing analysis, computational drug discovery, medical informatics, cancer genomics and systems biology. Systems biology was a main theme of ICIBM 2018, with exciting advances presented in many areas of systems biology, covering diverse data types such as gene regulation, circular RNA expression, single-cell RNA-Seq, inter-chromosomal interactions, metabolomics, proteomics and phosphoproteomics. Here, we describe ten high-quality papers to be published in BMC Systems Biology.
Affiliation(s)
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Xiaoming Liu
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - College of Public Health, University of South Florida, Tampa, FL, 33612, USA
- Yan Guo
  - Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, 87131, USA
- Zhijin Wu
  - Department of Biostatistics, Brown University, Providence, RI, 02912, USA
- Degui Zhi
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Jianhua Ruan
  - Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA