Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chen HIH, Chiu YC, Zhang T, Zhang S, Huang Y, Chen Y. GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization. BMC Syst Biol 2018;12:142. [PMID: 30577835 PMCID: PMC6302374 DOI: 10.1186/s12918-018-0642-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 01/06/2023]

Cited by Other Article(s)

Zhao X, Singhal A, Park S, Kong J, Bachelder R, Ideker T. Cancer Mutations Converge on a Collection of Protein Assemblies to Predict Resistance to Replication Stress. Cancer Discov 2024;14:508-523. [PMID: 38236062 PMCID: PMC10905674 DOI: 10.1158/2159-8290.cd-23-0641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/25/2023] [Accepted: 12/21/2023] [Indexed: 01/19/2024]

Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024;25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024] Open

Shan Q, Zhang C, Li Y, Li Q, Zhang Y, Li X, Shi J, Hu F. SLC7A11, a potential immunotherapeutic target in lung adenocarcinoma. Sci Rep 2023;13:18302. [PMID: 37880315 PMCID: PMC10600206 DOI: 10.1038/s41598-023-45284-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023] Open

Roper B, Mathews JC, Nadeem S, Park JH. Vis-SPLIT: Interactive Hierarchical Modeling for mRNA Expression Classification. IEEE VISUALIZATION CONFERENCE : VIS. IEEE CONFERENCE ON VISUALIZATION 2023;2023:106-110. [PMID: 38881685 PMCID: PMC11179685 DOI: 10.1109/vis54172.2023.00030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2024]

Utriainen M, Morris JH. clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape. BMC Bioinformatics 2023;24:134. [PMID: 37020209 PMCID: PMC10074866 DOI: 10.1186/s12859-023-05225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 03/11/2023] [Indexed: 04/07/2023] Open

Abstract

BACKGROUND

Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity.

RESULTS

The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein-protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes.

CONCLUSIONS

clusterMaker2 represents a significant advance over the previously published version, and most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms should be welcome to the large population of Cytoscape users, particularly the new dimensionality reduction and fuzzy clustering techniques.

Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]

Cousins H, Hall T, Guo Y, Tso L, Tzeng KTH, Cong L, Altman RB. Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings, with drug-repurposing applications in COVID-19. Bioinformatics 2023;39:btac735. [PMID: 36394254 PMCID: PMC9805577 DOI: 10.1093/bioinformatics/btac735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 09/27/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022] Open

Wang L, Nie R, Zhang J, Cai J. scCapsNet-mask: an updated version of scCapsNet with extended applicability in functional analysis related to scRNA-seq data. BMC Bioinformatics 2022;23:539. [PMID: 36510124 PMCID: PMC9743530 DOI: 10.1186/s12859-022-05098-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2022] [Accepted: 12/03/2022] [Indexed: 12/14/2022] Open

Abstract

BACKGROUND

With the rapid accumulation of scRNA-seq data, more and more automatic cell type identification methods have been developed, especially those based on deep learning. Although these methods have reached relatively high prediction accuracy, many issues still exist. One is the interpretability. The second is how to deal with the non-standard test samples that are not encountered in the training process.

RESULTS

Here we introduce scCapsNet-mask, an updated version of scCapsNet. The scCapsNet-mask provides a reasonable solution to the issues of interpretability and non-standard test samples. Firstly, the scCapsNet-mask utilizes a mask to ease the task of model interpretation in the original scCapsNet. The results show that scCapsNet-mask could constrain the coupling coefficients, and make a one-to-one correspondence between the primary capsules and type capsules. Secondly, the scCapsNet-mask can process non-standard samples more reasonably. In one example, the scCapsNet-mask was trained on the committed cells, and then tested on less differentiated cells as the non-standard samples. It could not only estimate the lineage bias of less differentiated cells, but also distinguish the development stages more accurately than traditional machine learning models. Therefore, the pseudo-temporal order of cells for each lineage could be established. Following these pseudo-temporal order, lineage specific genes exhibit a gradual increase expression pattern and stem cell associated genes exhibit a gradual decrease expression pattern. In another example, the scCapsNet-mask was trained on scRNA-seq data, and then used to assign cell type in spatial transcriptomics that may contain non-standard sample of doublets. The results show that the scCapsNet-mask not only restored the spatial map but also identified several non-standard samples of doublet.

CONCLUSIONS

The scCapsNet-mask offers a suitable solution to the challenge of interpretability and non-standard test samples. By adding a mask, it has the advantages of automatic processing and easy interpretation compared with the original scCapsNet. In addition, the scCapsNet-mask could more accurately reflect the composition of non-standard test samples than traditional machine learning methods. Therefore, it can extend its applicability in functional analysis, such as fate bias prediction in less differentiated cells and cell type assignment in spatial transcriptomics.

Shah I, Bundy J, Chambers B, Everett LJ, Haggard D, Harrill J, Judson RS, Nyffeler J, Patlewicz G. Navigating Transcriptomic Connectivity Mapping Workflows to Link Chemicals with Bioactivities. Chem Res Toxicol 2022;35:1929-1949. [PMID: 36301716 PMCID: PMC10483698 DOI: 10.1021/acs.chemrestox.2c00245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]

Abstract

Screening new compounds for potential bioactivities against cellular targets is vital for drug discovery and chemical safety. Transcriptomics offers an efficient approach for assessing global gene expression changes, but interpreting chemical mechanisms from these data is often challenging. Connectivity mapping is a potential data-driven avenue for linking chemicals to mechanisms based on the observation that many biological processes are associated with unique gene expression signatures (gene signatures). However, mining the effects of a chemical on gene signatures for biological mechanisms is challenging because transcriptomic data contain thousands of noisy genes. New connectivity mapping approaches seeking to distinguish signal from noise continue to be developed, spurred by the promise of discovering chemical mechanisms, new drugs, and disease targets from burgeoning transcriptomic data. Here, we analyze these approaches in terms of diverse transcriptomic technologies, public databases, gene signatures, pattern-matching algorithms, and statistical evaluation criteria. To navigate the complexity of connectivity mapping, we propose a harmonized scheme to coherently organize and compare published workflows. We first standardize concepts underlying transcriptomic profiles and gene signatures based on various transcriptomic technologies such as microarrays, RNA-Seq, and L1000 and discuss the widely used data sources such as Gene Expression Omnibus, ArrayExpress, and MSigDB. Next, we generalize connectivity mapping as a pattern-matching task for finding similarity between a query (e.g., transcriptomic profile for new chemical) and a reference (e.g., gene signature of known target). Published pattern-matching approaches fall into two main categories: vector-based use metrics like correlation, Jaccard index, etc., and aggregation-based use parametric and nonparametric statistics (e.g., gene set enrichment analysis). 
The statistical methods for evaluating the performance of different approaches are described, along with comparisons reported in the literature on benchmark transcriptomic data sets. Lastly, we review connectivity mapping applications in toxicology and offer guidance on evaluating chemical-induced toxicity with concentration-response transcriptomic data. In addition to serving as a high-level guide and tutorial for understanding and implementing connectivity mapping workflows, we hope this review will stimulate new algorithms for evaluating chemical safety and drug discovery using transcriptomic data.
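
As an illustration of the two vector-based metrics named in the abstract above (Jaccard index and correlation), the following is a minimal Python sketch and not the review's own code. The function names, the placeholder gene labels, and the choice of Pearson correlation are assumptions made for this example.

```python
import math


def jaccard_index(query_genes, reference_genes):
    """Overlap between two gene signatures treated as sets (0.0 to 1.0)."""
    q, r = set(query_genes), set(reference_genes)
    union = q | r
    if not union:
        return 0.0  # two empty signatures share nothing measurable
    return len(q & r) / len(union)


def pearson_correlation(query_profile, reference_profile):
    """Pearson correlation between two equal-length expression vectors."""
    if len(query_profile) != len(reference_profile) or len(query_profile) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(query_profile)
    mean_q = sum(query_profile) / n
    mean_r = sum(reference_profile) / n
    cov = sum((a - mean_q) * (b - mean_r)
              for a, b in zip(query_profile, reference_profile))
    sd_q = math.sqrt(sum((a - mean_q) ** 2 for a in query_profile))
    sd_r = math.sqrt(sum((b - mean_r) ** 2 for b in reference_profile))
    if sd_q == 0 or sd_r == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_q * sd_r)


# Hypothetical query (new chemical) and reference (known target) signatures.
query_signature = ["GENE_A", "GENE_B", "GENE_C"]
reference_signature = ["GENE_B", "GENE_C", "GENE_D"]
similarity = jaccard_index(query_signature, reference_signature)
```

In this framing a higher score means the query profile looks more like the reference signature; the aggregation-based methods mentioned in the abstract (for example, gene set enrichment analysis) are not covered by this sketch.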

Zamani M, Cheng YH, Charbonier F, Gupta VK, Mayer AT, Trevino AE, Quertermous T, Chaudhuri O, Cahan P, Huang NF. Single-Cell Transcriptomic Census of Endothelial Changes Induced by Matrix Stiffness and the Association with Atherosclerosis. ADVANCED FUNCTIONAL MATERIALS 2022;32:2203069. [PMID: 36816792 PMCID: PMC9937733 DOI: 10.1002/adfm.202203069] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/17/2022] [Indexed: 05/28/2023]

Tian L, Peng Y, Yang K, Cao J, Du X, Liang Z, Shi J, Zhang J. The ERα-NRF2 signalling axis promotes bicalutamide resistance in prostate cancer. Cell Commun Signal 2022;20:178. [PMID: 36376959 PMCID: PMC9661764 DOI: 10.1186/s12964-022-00979-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2022] [Accepted: 09/27/2022] [Indexed: 11/16/2022] Open

Abstract

BACKGROUND

Bicalutamide is a nonsteroidal antiandrogen widely used as a first-line clinical treatment for advanced prostate cancer (PCa). Although patients initially show effective responses to bicalutamide treatment, resistance to bicalutamide frequently occurs and leads to the development of castration-resistant PCa (CRPC). This research investigated the roles of the oestrogen receptor α (ERα)-nuclear factor E2-related factor 2 (NRF2) signalling pathway in bicalutamide resistance in PCa cells.

METHODS

We performed bioinformatic analysis and immunohistochemical staining on normal and cancerous prostate tissue to evaluate ERα and NRF2 expression and their correlation. Gene expression and localization in PCa cell lines were further investigated using real-time reverse transcription PCR/Western blotting and immunofluorescence staining. We treated PCa cells with the ER inhibitor tamoxifen and performed luciferase reporter assays and chromatin immunoprecipitation (ChIP) assays to understand ERα-dependent NRF2 expression. Overexpression and knockdown of ERα and NRF2 were used to explore the potential role of the ERα-NRF2 signalling axis in bicalutamide resistance in PCa cells.

RESULTS

We found that the expression of ERα and NRF2 was positively correlated and was higher in human CRPC tissues than in primary PCa tissues. Treatment with oestrogen or bicalutamide increased the expression of ERα and NRF2 as well as NRF2 target genes in PCa cell lines. These effects were blocked by pretreatment with tamoxifen. ChIP assays demonstrated that ERα directly binds to the oestrogen response element (ERE) in the NRF2 promoter. This binding led to increased transcriptional activity of NRF2 in a luciferase reporter assay. Activation of the ERα-NRF2 signalling axis increased the expression of bicalutamide resistance-related genes. Inhibition of this signalling axis by knockdown of ERα or NRF2 downregulated the expression of bicalutamide resistance-related genes and inhibited the proliferation and migration of PCa cells.

CONCLUSIONS

We demonstrated the transcriptional interaction between ERα and NRF2 in CRPC tissues and cell lines by showing the direct binding of ERα to the ERE in the NRF2 promoter under oestrogen treatment. Activation of the ERα-NRF2 signalling axis contributes to bicalutamide resistance in PCa cells, suggesting that the ERα-NRF2 signalling axis is a potential therapeutic target for CRPC. Video Abstract.

Dsouza KB, Li AY, Bhargava VK, Libbrecht MW. Latent Representation of the Human Pan-Celltype Epigenome Through a Deep Recurrent Neural Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:2313-2323. [PMID: 34043510 DOI: 10.1109/tcbb.2021.3084147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]

Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022;38:2287-2296. [PMID: 35157023 PMCID: PMC10060719 DOI: 10.1093/bioinformatics/btac080] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/29/2021] [Revised: 01/01/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023] Open

Wang X, Wang H, Liu D, Wang N, He D, Wu Z, Zhu X, Wen X, Li X, Li J, Wang Z. Deep learning using bulk RNA-seq data expands cell landscape identification in tumor microenvironment. Oncoimmunology 2022;11:2043662. [PMID: 35251771 PMCID: PMC8890395 DOI: 10.1080/2162402x.2022.2043662] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023] Open

Affiliation(s)

Xin Wang: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China; College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China; The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Hongjiu Wang: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China; College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
Dan Liu: The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Na Wang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
Danni He: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
Zheyu Wu: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
Xu Zhu: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
Xiaoling Wen: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
Xuhua Li: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
Jin Li: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
Zhenzhen Wang: Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China; College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China

Wang L, Miao X, Nie R, Zhang Z, Zhang J, Cai J. MultiCapsNet: A General Framework for Data Integration and Interpretable Classification. Front Genet 2021;12:767602. [PMID: 34899854 PMCID: PMC8652257 DOI: 10.3389/fgene.2021.767602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Accepted: 10/25/2021] [Indexed: 12/16/2022] Open

Abstract

The latest progresses of experimental biology have generated a large number of data with different formats and lengths. Deep learning is an ideal tool to deal with complex datasets, but its inherent “black box” nature needs more interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, could only deal with numerical features instead of modular features often encountered in the biological field. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet, which possesses the merits such as easy data integration and high model interpretability. To demonstrate the ability of this model as an interpretable classifier to deal with modular inputs, we test MultiCapsNet on three datasets with different data type and application scenarios. Firstly, on the labeled variant call dataset, MultiCapsNet shows a similar classification performance with neural network model, and provides importance scores for data sources directly without an extra importance determination step required by the neural network model. The importance scores generated by these two models are highly correlated. Secondly, on single cell RNA sequence (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interaction (PPI), and protein-DNA interaction (PDI). The classification accuracy of MultiCapsNet is comparable to the neural network and random forest model. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to classification of cell type. Thirdly, we made a comparison between MultiCapsNet and SCENIC. The results show several cell type relevant TFs identified by both methods, further proving the validity and interpretability of the MultiCapsNet.

Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S. A Survey of Autoencoder Algorithms to Pave the Diagnosis of Rare Diseases. Int J Mol Sci 2021;22:10891. [PMID: 34639231 PMCID: PMC8509321 DOI: 10.3390/ijms221910891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/04/2021] [Accepted: 10/07/2021] [Indexed: 12/28/2022] Open

Mowbray M, Savage T, Wu C, Song Z, Cho BA, Del Rio-Chanona EA, Zhang D. Machine learning for biochemical engineering: A review. Biochem Eng J 2021. [DOI: 10.1016/j.bej.2021.108054] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/19/2022]

Liñares-Blanco J, Pazos A, Fernandez-Lozano C. Machine learning analysis of TCGA cancer data. PeerJ Comput Sci 2021;7:e584. [PMID: 34322589 PMCID: PMC8293929 DOI: 10.7717/peerj-cs.584] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/03/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]

Abstract

In recent years, machine learning (ML) researchers have changed their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have allowed the use of omic data for the training of these algorithms. In order to study the state of the art, this review is provided to cover the main works that have used ML with TCGA data. Firstly, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we begin with the main objective of this study, the identification and discussion of those works that have used the TCGA data for the training of different ML approaches. After a review of more than 100 different papers, it has been possible to make a classification according to following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions drawn in this work shows a high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe the rise in the use of deep artificial neural networks. It is worth emphasizing, the increase of integrative models of multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by both protein coding regions, regulatory elements and the surrounding environment. It is notable that a large number of works make use of genetic expression data, which has been found to be the preferred method by researchers when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour. 
That is the reason for which a greater number of works have focused on the BRCA cohort, while specific works for survival, for example, were centred on the GBM cohort, due to its large number of events. Throughout this review, it will be possible to go in depth into the works and the methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.

Zhao Y, Dong Y, Sun Y, Cheng C. AutoEncoder-Based Computational Framework for Tumor Microenvironment Decomposition and Biomarker Identification in Metastatic Melanoma. Front Genet 2021;12:665065. [PMID: 34122516 PMCID: PMC8191580 DOI: 10.3389/fgene.2021.665065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2021] [Accepted: 04/12/2021] [Indexed: 11/13/2022] Open

Mostavi M, Chiu YC, Chen Y, Huang Y. CancerSiamese: one-shot learning for predicting primary and metastatic tumor types unseen during model training. BMC Bioinformatics 2021;22:244. [PMID: 33980137 PMCID: PMC8117642 DOI: 10.1186/s12859-021-04157-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/09/2020] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open

Abstract

BACKGROUND

The state-of-the-art deep learning based cancer type prediction can only predict cancer types whose samples are available during the training where the sample size is commonly large. In this paper, we consider how to utilize the existing training samples to predict cancer types unseen during the training. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types and propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function.

RESULTS

We trained CancerSiamese for cancer type prediction for primary and metastatic tumors using samples from the Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was utilized to facilitate the training of the CancerSiamese models. CancerSiamese was tested for different N-way predictions and yielded an average accuracy improvement of 8% and 4% over the benchmark 1-Nearest Neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied the guided gradient saliency map and feature selection to CancerSiamese to examine 100 and 200 top marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer related functions between primary and metastatic tumors.

CONCLUSION

This work demonstrated, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. Thus, it could inspire new and ingenious applications of one-shot and few-shot learning solutions for improving cancer diagnosis, prognostic, and our understanding of cancer.
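
The 1-NN benchmark mentioned in the abstract above can be pictured with a short Python sketch. It assumes a one-shot, N-way setting with exactly one support profile per cancer type and Euclidean distance; the abstract does not state the distance metric, and the function name, labels, and toy vectors here are hypothetical.

```python
import math


def one_nn_predict(query_profile, support_set):
    """Return the label of the support profile closest to the query.

    support_set maps a class label to a single expression vector,
    i.e. one example per class (an assumed one-shot setup).
    """
    if not support_set:
        raise ValueError("support_set must contain at least one class")
    best_label, best_distance = None, math.inf
    for label, profile in support_set.items():
        distance = math.dist(query_profile, profile)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Toy 3-way example with made-up three-gene expression profiles.
support = {
    "type_1": [0.0, 0.0, 0.0],
    "type_2": [5.0, 5.0, 5.0],
    "type_3": [10.0, 0.0, 10.0],
}
predicted = one_nn_predict([4.5, 5.5, 4.0], support)
```

A learned similarity function, as described for CancerSiamese, would take the place of the fixed distance used in this sketch.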

Chiu YC, Chen HIH, Gorthi A, Mostavi M, Zheng S, Huang Y, Chen Y. Deep learning of pharmacogenomics resources: moving towards precision oncology. Brief Bioinform 2020;21:2066-2083. [PMID: 31813953 PMCID: PMC7711267 DOI: 10.1093/bib/bbz144] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/05/2019] [Revised: 08/22/2019] [Accepted: 10/18/2019] [Indexed: 12/13/2022] Open

Ramirez R, Chiu YC, Hererra A, Mostavi M, Ramirez J, Chen Y, Huang Y, Jin YF. Classification of Cancer Types Using Graph Convolutional Neural Networks. FRONTIERS IN PHYSICS 2020;8:203. [PMID: 33437754 PMCID: PMC7799442 DOI: 10.3389/fphy.2020.00203] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 05/04/2023]

Abstract

BACKGROUND

Cancer has been a leading cause of death in the United States with significant health care costs. Accurate prediction of cancers at an early stage and understanding the genomic mechanisms that drive cancer development are vital to the improvement of treatment outcomes and survival rates, thus resulting in significant social and economic impacts. Attempts have been made to classify cancer types with machine learning techniques during the past two decades and deep learning approaches more recently.

RESULTS

In this paper, we established four models with graph convolutional neural network (GCNN) that use unstructured gene expressions as inputs to classify different tumor and non-tumor samples into their designated 33 cancer types or as normal. Four GCNN models based on a co-expression graph, co-expression+singleton graph, protein-protein interaction (PPI) graph, and PPI+singleton graph have been designed and implemented. They were trained and tested on combined 10,340 cancer samples and 731 normal tissue samples from The Cancer Genome Atlas (TCGA) dataset. The established GCNN models achieved excellent prediction accuracies (89.9-94.7%) among 34 classes (33 cancer types and a normal group). In silico gene-perturbation experiments were performed on four models based on co-expression graph, co-expression+singleton, PPI graph, and PPI+singleton graphs. The co-expression GCNN model was further interpreted to identify a total of 428 markers genes that drive the classification of 33 cancer types and normal. The concordance of differential expressions of these markers between the represented cancer type and others are confirmed. Successful classification of cancer types and a normal group regardless of normal tissues' origin suggested that the identified markers are cancer-specific rather than tissue-specific.

CONCLUSION

Novel GCNN models have been established to predict cancer types or normal tissue based on gene expression profiles. Results on the TCGA dataset demonstrate that these models can produce accurate classification (above 94%) using cancer-specific marker genes. The models and the source code are publicly available and can be readily adapted to the diagnosis of cancer and other diseases by the data-driven modeling research community.
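As a rough illustration of the co-expression and co-expression+singleton graphs named in this abstract, the sketch below links two genes when the absolute Pearson correlation of their expression profiles exceeds a cutoff. The 0.6 threshold, the toy gene names, the function names, and the reading of "singleton" as a gene left without any edge are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_graph(expr, threshold=0.6):
    """Build an undirected co-expression graph.

    expr maps a gene name to its expression values across samples.
    The threshold is a hypothetical cutoff, not a value from the abstract.
    Returns (edges, singletons): edges is a set of sorted gene pairs,
    singletons lists genes that ended up with no edge.
    """
    genes = sorted(expr)
    edges = set()
    for i, gene in enumerate(genes):
        for other in genes[i + 1:]:
            if abs(pearson(expr[gene], expr[other])) > threshold:
                edges.add((gene, other))
    connected = {gene for edge in edges for gene in edge}
    singletons = [gene for gene in genes if gene not in connected]
    return edges, singletons


# Toy expression profiles (made-up values, four samples per gene).
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
edges, singletons = coexpression_graph(toy)
print(edges, singletons)
```

Under this reading, a co-expression-only model would use just the connected genes, while a "+singleton" variant would also keep the isolated genes as nodes without neighbors.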


Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling. COMPUTERS 2020. [DOI: 10.3390/computers9020037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Mostavi M, Chiu YC, Huang Y, Chen Y. Convolutional neural network models for cancer type prediction based on gene expression. BMC Med Genomics 2020;13:44. [PMID: 32241303 PMCID: PMC7119277 DOI: 10.1186/s12920-020-0677-2] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open

Abstract

BACKGROUND

Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers.

RESULTS

In this paper, we introduced several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on gene expression profiles from a combined 10,340 samples of 33 cancer types and 713 matched normal tissues of The Cancer Genome Atlas (TCGA). Our models achieved excellent prediction accuracies (93.9-95.0%) among 34 classes (33 cancers and normal). Furthermore, we interpreted one of the models, the 1D-CNN model, with a guided saliency technique and identified a total of 2090 cancer markers (108 per class on average). The concordance of differential expression of these markers between the cancer type they represent and others is confirmed. In breast cancer, for instance, our model identified well-known markers, such as GATA3 and ESR1. Finally, we extended the 1D-CNN model for the prediction of breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code can be found at https://github.com/chenlabgccri/CancerTypePrediction.

CONCLUSIONS

Here we present novel CNN designs for accurate and simultaneous cancer/normal and cancer type prediction based on gene expression profiles, and a unique model interpretation scheme to elucidate the biological relevance of cancer marker genes after eliminating the effects of tissue of origin. The proposed model has light hyperparameters to be trained and thus can be easily adapted to facilitate cancer diagnosis in the future.


López-García G, Jerez JM, Franco L, Veredas FJ. Transfer learning with convolutional neural networks for cancer survival prediction using gene-expression data. PLoS One 2020;15:e0230536. [PMID: 32214348 PMCID: PMC7098575 DOI: 10.1371/journal.pone.0230536] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2019] [Accepted: 03/02/2020] [Indexed: 12/17/2022] Open

Abstract

Precision medicine in oncology aims at obtaining data from heterogeneous sources to have a precise estimation of a given patient’s state and prognosis. With the purpose of advancing toward a personalized medicine framework, accurate diagnoses allow prescription of more effective treatments adapted to the specificities of each individual case. In recent years, next-generation sequencing has impelled cancer research by providing physicians with an overwhelming amount of gene-expression data from RNA-seq high-throughput platforms. In this scenario, data mining and machine learning techniques have widely contributed to gene-expression data analysis by supplying computational models to support decision-making on real-world data. Nevertheless, existing public gene-expression databases are characterized by an unfavorable imbalance between the huge number of genes (in the order of tens of thousands) and the small number of samples (in the order of a few hundred) available. Although diverse feature selection and extraction strategies have traditionally been applied to overcome the resulting over-fitting issues, the efficacy of standard machine learning pipelines is far from satisfactory for the prediction of relevant clinical outcomes like follow-up end-points or patient’s survival. Using the public Pan-Cancer dataset, in this study we pre-train convolutional neural network architectures for survival prediction on a subset composed of thousands of gene-expression samples from thirty-one tumor types. The resulting architectures are subsequently fine-tuned to predict lung cancer progression-free interval. The application of convolutional networks to gene-expression data has many limitations, derived from the unstructured nature of these data. In this work we propose a methodology to rearrange RNA-seq data by transforming RNA-seq samples into gene-expression images, from which convolutional networks can extract high-level features. As an additional objective, we investigate whether leveraging the information extracted from other tumor-type samples contributes to the extraction of high-level features that improve lung cancer progression prediction, compared to other machine learning approaches.
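To make the idea of turning an RNA-seq sample into a gene-expression image concrete, here is a minimal sketch that lays one sample's expression vector out as a square grid. The abstract does not say how genes are positioned in the image, so the fixed row-major order, the zero padding, and the function name `sample_to_image` are placeholder assumptions rather than the authors' method.

```python
from math import isqrt


def sample_to_image(values, pad=0.0):
    """Arrange one sample's expression values into a square 2D grid.

    Genes are placed row by row in the order given (an assumed layout),
    and the tail of the grid is filled with `pad` when the number of
    genes is not a perfect square.
    """
    n = len(values)
    if n == 0:
        return []
    side = isqrt(n - 1) + 1  # smallest side with side * side >= n
    padded = list(values) + [pad] * (side * side - n)
    return [padded[row * side:(row + 1) * side] for row in range(side)]


# Five made-up expression values become a 3 x 3 image with four padded cells.
image = sample_to_image([0.5, 1.2, 0.0, 3.4, 2.2])
for row in image:
    print(row)
```

A layout that places related genes near each other would give the convolution filters more meaningful local neighborhoods than this arbitrary ordering; the sketch only shows the reshaping step.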


Dwivedi SK, Tjärnberg A, Tegnér J, Gustafsson M. Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder. Nat Commun 2020;11:856. [PMID: 32051402 PMCID: PMC7016183 DOI: 10.1038/s41467-020-14666-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2019] [Accepted: 01/22/2020] [Indexed: 01/05/2023] Open

Palazzo M, Beauseroy P, Yankilevich P. A pan-cancer somatic mutation embedding using autoencoders. BMC Bioinformatics 2019;20:655. [PMID: 31829157 PMCID: PMC6907172 DOI: 10.1186/s12859-019-3298-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2019] [Accepted: 11/27/2019] [Indexed: 02/08/2023] Open

Liang Z, Cao J, Tian L, Shen Y, Yang X, Lin Q, Zhang R, Liu H, Du X, Shi J, Zhang J. Aromatase-induced endogenous estrogen promotes tumour metastasis through estrogen receptor-α/matrix metalloproteinase 12 axis activation in castration-resistant prostate cancer. Cancer Lett 2019;467:72-84. [PMID: 31499120 DOI: 10.1016/j.canlet.2019.09.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2019] [Revised: 08/17/2019] [Accepted: 09/04/2019] [Indexed: 01/09/2023]

Affiliation(s)

Zhixian Liang Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Jiasong Cao Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Lei Tian Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Yongmei Shen Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Xu Yang Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Qimei Lin Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Ran Zhang Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Haitao Liu Shanghai First People's Hospital Shanghai Jiaotong University, Shanghai, 200080, China
Xiaoling Du Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Jiandang Shi Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China
Ju Zhang Department of Biochemistry and Molecular Biology, College of Life Sciences, Bioactive Materials Key Lab of the Ministry of Education, Nankai University, Tianjin, 300071, China


Tobore I, Li J, Yuhang L, Al-Handarish Y, Kandwal A, Nie Z, Wang L. Deep Learning Intervention for Health Care Challenges: Some Biomedical Domain Considerations. JMIR Mhealth Uhealth 2019;7:e11966. [PMID: 31376272 PMCID: PMC6696854 DOI: 10.2196/11966] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2018] [Revised: 04/14/2019] [Accepted: 06/12/2019] [Indexed: 01/10/2023] Open

Mercatelli D, Ray F, Giorgi FM. Pan-Cancer and Single-Cell Modeling of Genomic Alterations Through Gene Expression. Front Genet 2019;10:671. [PMID: 31379928 PMCID: PMC6657420 DOI: 10.3389/fgene.2019.00671] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2019] [Accepted: 06/27/2019] [Indexed: 12/27/2022] Open

Wang K, Liu X, Guo Y, Wu Z, Zhi D, Ruan J, Zhao Z. The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: systems biology on diverse data types. BMC Syst Biol 2018;12:125. [PMID: 30577731 PMCID: PMC6302362 DOI: 10.1186/s12918-018-0648-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]