1
Tran TO, Vo TH, Le NQK. Omics-based deep learning approaches for lung cancer decision-making and therapeutics development. Brief Funct Genomics 2024; 23:181-192. [PMID: 37519050 DOI: 10.1093/bfgp/elad031] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/16/2023] [Revised: 07/04/2023] [Accepted: 07/13/2023] [Indexed: 08/01/2023] Open
Abstract
Lung cancer has been the most common and the leading cause of cancer deaths globally. Besides clinicopathological observations and traditional molecular tests, the advent of robust and scalable techniques for nucleic acid analysis has revolutionized biological research and medicinal practice in lung cancer treatment. In response to the demands for minimally invasive procedures and technology development over the past decade, many types of multi-omics data at various genome levels have been generated. As omics data grow, artificial intelligence models, particularly deep learning, are prominent in developing more rapid and effective methods to potentially improve lung cancer patient diagnosis, prognosis and treatment strategy. This decade has seen genome-based deep learning models thriving in various lung cancer tasks, including cancer prediction, subtype classification, prognosis estimation, cancer molecular signatures identification, treatment response prediction and biomarker development. In this study, we summarized available data sources for deep-learning-based lung cancer mining and provided an update on recent deep learning models in lung cancer genomics. Subsequently, we reviewed the current issues and discussed future research directions of deep-learning-based lung cancer genomics research.
Affiliation(s)
- Thi-Oanh Tran
- International Ph.D. Program in Cell Therapy and Regenerative Medicine, College of Medicine, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- Hematology and Blood Transfusion Center, Bach Mai Hospital, No 78 Giai Phong Street, Hanoi, Viet Nam
- Thanh Hoa Vo
- Department of Science, School of Science and Computing, South East Technological University, Waterford X91 K0EK, Ireland
- Pharmaceutical and Molecular Biotechnology Research Center (PMBRC), South East Technological University, Waterford X91 K0EK, Ireland
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan
2
Zhou M, He X, Zhang J, Mei C, Zhong B, Ou C. tRNA-derived small RNAs in human cancers: roles, mechanisms, and clinical application. Mol Cancer 2024; 23:76. [PMID: 38622694 PMCID: PMC11020452 DOI: 10.1186/s12943-024-01992-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/02/2024] [Indexed: 04/17/2024] Open
Abstract
Transfer RNA (tRNA)-derived small RNAs (tsRNAs) are a new type of non-coding RNA (ncRNA) produced by the specific cleavage of precursor or mature tRNAs. tsRNAs are involved in various basic biological processes such as epigenetic, transcriptional, post-transcriptional, and translation regulation, thereby affecting the occurrence and development of various human diseases, including cancers. Recent studies have shown that tsRNAs play an important role in tumorigenesis by regulating biological behaviors such as malignant proliferation, invasion and metastasis, angiogenesis, immune response, tumor resistance, and tumor metabolism reprogramming. These may be new potential targets for tumor treatment. Furthermore, tsRNAs can exist abundantly and stably in various bodily fluids (e.g., blood, serum, and urine), either free or encapsulated in extracellular vesicles, thereby affecting intercellular communication in the tumor microenvironment (TME). Meanwhile, their abnormal expression is closely related to the clinicopathological features of tumor patients, such as tumor staging, lymph node metastasis, and poor prognosis; thus, tsRNAs can serve as a novel type of liquid biopsy biomarker. This review summarizes the discovery, production, and expression of tsRNAs and analyzes their molecular mechanisms in tumor development and potential applications in tumor therapy, which may provide new strategies for early diagnosis and targeted therapy of tumors.
Affiliation(s)
- Manli Zhou
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Department of Clinical Laboratory, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Xiaoyun He
- Departments of Ultrasound Imaging, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Jing Zhang
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Cheng Mei
- Department of Blood Transfusion, Xiangya Hospital, Clinical Transfusion Research Center, Central South University, Changsha, Hunan, 410008, China.
- Baiyun Zhong
- Department of Clinical Laboratory, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
- Chunlin Ou
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
3
Qin S, Sun S, Wang Y, Li C, Fu L, Wu M, Yan J, Li W, Lv J, Chen L. Immune, metabolic landscapes of prognostic signatures for lung adenocarcinoma based on a novel deep learning framework. Sci Rep 2024; 14:527. [PMID: 38177198 PMCID: PMC10767103 DOI: 10.1038/s41598-023-51108-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/30/2023] [Indexed: 01/06/2024] Open
Abstract
Lung adenocarcinoma (LUAD) is a malignant tumor with high lethality, and the aim of this study was to identify promising biomarkers for LUAD. Using the TCGA-LUAD dataset as a discovery cohort, a novel joint framework, VAEjMLP, based on a variational autoencoder (VAE) and a multilayer perceptron (MLP), was proposed. The Shapley Additive Explanations (SHAP) method was introduced to evaluate the contribution of feature genes to the classification decision, which helped us to develop a biologically meaningful biomarker potential scoring algorithm. Nineteen potential biomarkers for LUAD were identified, which were involved in the regulation of immune and metabolic functions in LUAD. A prognostic risk model for LUAD was constructed from the biomarkers HLA-DRB1, SCGB1A1, and HLA-DRB5, screened by Cox regression analysis, dividing the patients into high-risk and low-risk groups. The prognostic risk model was validated with external datasets. The low-risk group was characterized by enrichment of immune pathways and higher immune infiltration compared to the high-risk group. In contrast, the high-risk group showed increased metabolic pathway activity. There were significant differences between the high- and low-risk groups in metabolic reprogramming of aerobic glycolysis, amino acids, and lipids, as well as in angiogenic activity, epithelial-mesenchymal transition, tumorigenic cytokines, and inflammatory response. Furthermore, high-risk patients were more sensitive to Afatinib, Gefitinib, and Gemcitabine as predicted by the pRRophetic algorithm. This study provides prognostic signatures capable of revealing the immune and metabolic landscapes for LUAD, and may shed light on the identification of other cancer biomarkers.
Affiliation(s)
- Shimei Qin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Shibin Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Chao Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Lei Fu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Ming Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Jinxing Yan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Junjie Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China.
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China.
4
Çalışkan M, Tazaki K. AI/ML advances in non-small cell lung cancer biomarker discovery. Front Oncol 2023; 13:1260374. [PMID: 38148837 PMCID: PMC10750392 DOI: 10.3389/fonc.2023.1260374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/16/2023] [Indexed: 12/28/2023] Open
Abstract
Lung cancer is the leading cause of cancer deaths among both men and women, representing approximately 25% of cancer fatalities each year. The treatment landscape for non-small cell lung cancer (NSCLC) is rapidly evolving due to the progress made in biomarker-driven targeted therapies. While advancements in targeted treatments have improved survival rates for NSCLC patients with actionable biomarkers, long-term survival remains low, with an overall 5-year relative survival rate below 20%. Artificial intelligence/machine learning (AI/ML) algorithms have shown promise in biomarker discovery, yet NSCLC-specific studies capturing the clinical challenges targeted and emerging patterns identified using AI/ML approaches are lacking. Here, we employed a text-mining approach and identified 215 studies that reported potential biomarkers of NSCLC using AI/ML algorithms. We catalogued these studies with respect to BEST (Biomarkers, EndpointS, and other Tools) biomarker sub-types and summarized emerging patterns and trends in AI/ML-driven NSCLC biomarker discovery. We anticipate that our comprehensive review will contribute to the current understanding of AI/ML advances in NSCLC biomarker research and provide an important catalogue that may facilitate clinical adoption of AI/ML-derived biomarkers.
Affiliation(s)
- Minal Çalışkan
- Translational Science Department, Precision Medicine Function, Daiichi Sankyo, Inc., Basking Ridge, NJ, United States
- Koichi Tazaki
- Translational Science Department I, Precision Medicine Function, Daiichi Sankyo, Tokyo, Japan
5
Ellen JG, Jacob E, Nikolaou N, Markuzon N. Autoencoder-based multimodal prediction of non-small cell lung cancer survival. Sci Rep 2023; 13:15761. [PMID: 37737469 PMCID: PMC10517020 DOI: 10.1038/s41598-023-42365-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/09/2023] [Indexed: 09/23/2023] Open
Abstract
The ability to accurately predict non-small cell lung cancer (NSCLC) patient survival is crucial for informing physician decision-making, and the increasing availability of multi-omics data offers the promise of enhancing prognosis predictions. We present a multimodal integration approach that leverages microRNA, mRNA, DNA methylation, long non-coding RNA (lncRNA) and clinical data to predict NSCLC survival and identify patient subtypes, utilizing denoising autoencoders for data compression and integration. Survival performance for patients with lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) was compared across modality combinations and data integration methods. Using The Cancer Genome Atlas data, our results demonstrate that survival prediction models combining multiple modalities outperform single modality models. The highest performance was achieved with a combination of only two modalities, lncRNA and clinical, at concordance indices (C-indices) of 0.69 ± 0.03 for LUAD and 0.62 ± 0.03 for LUSC. Models utilizing all five modalities achieved mean C-indices of 0.67 ± 0.04 and 0.63 ± 0.02 for LUAD and LUSC, respectively, while the best individual modality performance reached C-indices of 0.64 ± 0.03 for LUAD and 0.59 ± 0.03 for LUSC. Analysis of biological differences revealed two distinct survival subtypes with over 900 differentially expressed transcripts.
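The concordance index (C-index) quoted in this abstract can be illustrated with a short sketch. The code below is not taken from the cited paper; it is a minimal version of Harrell's C-index under simplifying assumptions (pairs with tied survival times are skipped, ties in risk count as half), and the function name and toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event (events[i] == 1). It is concordant when that
    subject also has the higher predicted risk. Risk ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event flag (1 = death observed), predicted risk.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a sense of scale for the 0.62 to 0.69 range reported above.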
Affiliation(s)
- Jacob G Ellen
- Institute of Health Informatics, University College London, London, UK.
- Etai Jacob
- AstraZeneca, Oncology Data Science, Waltham, MA, USA
6
Yan T, Yan Z, Liu L, Zhang X, Chen G, Xu F, Li Y, Zhang L, Peng M, Wang L, Li D, Zhao D. Survival prediction for patients with glioblastoma multiforme using a Cox proportional hazards denoising autoencoder network. Front Comput Neurosci 2023; 16:916511. [PMID: 36704230 PMCID: PMC9871481 DOI: 10.3389/fncom.2022.916511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 12/13/2022] [Indexed: 01/11/2023] Open
Abstract
Objectives This study aimed to establish and validate a prognostic model based on magnetic resonance imaging and clinical features to predict the survival time of patients with glioblastoma multiforme (GBM). Methods In this study, a convolutional denoising autoencoder (DAE) network combined with the loss function of the Cox proportional hazard regression model was used to extract features for survival prediction. In addition, the Kaplan-Meier curve, the Schoenfeld residual analysis, the time-dependent receiver operating characteristic curve, the nomogram, and the calibration curve were used to assess the survival prediction ability. Results The concordance index (C-index) of the survival prediction model, which combines the DAE and the Cox proportional hazard regression model, reached 0.78 in the training set, 0.75 in the validation set, and 0.74 in the test set. Patients were divided into high- and low-risk groups based on the median prognostic index (PI). The Kaplan-Meier curve was used for survival analysis (p < 2e-16 in the training set, p = 3e-04 in the validation set, and p = 0.007 in the test set), which showed that the survival probability of different groups was significantly different, and the PI of the network played an influential role in the prediction of survival probability. In the residual verification of the PI, the fitting curve of the scatter plot was roughly parallel to the x-axis, and the p-value of the test was 0.11, proving that the PI and survival time were independent of each other and the survival prediction ability of the PI was less affected than survival time. The areas under the curve of the training set were 0.843, 0.871, 0.903, and 0.941; those of the validation set were 0.687, 0.895, 1.000, and 0.967; and those of the test set were 0.757, 0.852, 0.683, and 0.898.
Conclusion The survival prediction model, which combines the DAE and the Cox proportional hazard regression model, can effectively predict the prognosis of patients with GBM.
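The abstract describes training a network with "the loss function of the Cox proportional hazard regression model". As a rough illustration only, and not the authors' implementation, the negative Cox partial log-likelihood can be sketched with NumPy as below. The function name `cox_partial_nll` is hypothetical, the risk scores stand in for the network's output, and ties in survival time get no special handling.

```python
import numpy as np


def cox_partial_nll(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-hazard per subject (e.g. a network output)
    time  : follow-up time per subject
    event : 1 if the event was observed, 0 if censored
    Assumption: tied times are not treated specially.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Sort by descending time so the risk set of subject i is subjects 0..i.
    order = np.argsort(-time)
    risk, event = risk[order], event[order]

    # Running log-sum-exp of the risks gives log(sum over the risk set).
    log_risk_set = np.logaddexp.accumulate(risk)

    # Only subjects with an observed event contribute a term.
    n_events = max(int(event.sum()), 1)
    return -np.sum((risk - log_risk_set)[event]) / n_events


# Hypothetical example: assigning higher risk to earlier deaths lowers the loss.
loss_good = cox_partial_nll([2.0, 1.0, 0.0], [1, 2, 3], [1, 1, 1])
loss_bad = cox_partial_nll([0.0, 1.0, 2.0], [1, 2, 3], [1, 1, 1])
```

In a setup like the one described, a term of this kind would be minimized jointly with, or after, the autoencoder's reconstruction loss; how the two are weighted is not stated in the abstract.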
Affiliation(s)
- Ting Yan
- Key Laboratory of Cellular Physiology of the Ministry of Education, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi, China
- Zhenpeng Yan
- Key Laboratory of Cellular Physiology of the Ministry of Education, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi, China
- Lili Liu
- Key Laboratory of Cellular Physiology of the Ministry of Education, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi, China
- Xiaoyu Zhang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Guohui Chen
- Key Laboratory of Cellular Physiology of the Ministry of Education, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi, China
- Feng Xu
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Ying Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- Lijuan Zhang
- Shanxi Provincial People's Hospital, Taiyuan, China
- Meilan Peng
- Key Laboratory of Cellular Physiology of the Ministry of Education, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi, China
- Lu Wang
- Key Laboratory of Cellular Physiology of the Ministry of Education, Department of Pathology, Shanxi Medical University, Taiyuan, Shanxi, China
- Dandan Li
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, China
- *Correspondence: Dandan Li
- Dong Zhao
- Department of Stomatology, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- *Correspondence: Dong Zhao
7
Wang J, Zhong F, Xiao F, Dong X, Long Y, Gan T, Li T, Liao M. CT radiomics model combined with clinical and radiographic features for discriminating peripheral small cell lung cancer from peripheral lung adenocarcinoma. Front Oncol 2023; 13:1157891. [PMID: 37020864 PMCID: PMC10069670 DOI: 10.3389/fonc.2023.1157891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 03/06/2023] [Indexed: 04/07/2023] Open
Abstract
Purpose Exploring a non-invasive method to accurately differentiate peripheral small cell lung cancer (PSCLC) and peripheral lung adenocarcinoma (PADC) could improve clinical decision-making and prognosis. Methods This retrospective study reviewed the clinicopathological and imaging data of lung cancer patients between October 2017 and March 2022. A total of 240 patients were enrolled in this study, including 80 cases diagnosed with PSCLC and 160 with PADC. All patients were randomized in a seven-to-three ratio into the training and validation datasets (170 vs. 70, respectively). Least absolute shrinkage and selection operator regression was employed to select radiomics features, and univariate analysis followed by multivariate logistic regression was used to select significant clinical and radiographic factors, generating four models: clinical, radiomics, clinical-radiographic, and clinical-radiographic-radiomics (comprehensive). The DeLong test was used to compare areas under the receiver operating characteristic curves (AUCs) of the models. Results Five clinical-radiographic features and twenty-three selected radiomics features differed significantly in the identification of PSCLC and PADC. The clinical, radiomics, clinical-radiographic and comprehensive models demonstrated AUCs of 0.8960, 0.8356, 0.9396, and 0.9671 in the validation set, with the comprehensive model having better discernment than the clinical model (P=0.036), the radiomics model (P=0.006) and the clinical-radiographic model (P=0.049). Conclusions The proposed model combining clinical data, radiographic characteristics and radiomics features could accurately distinguish PSCLC from PADC, thus providing a potential non-invasive method to help clinicians improve treatment decisions.
Affiliation(s)
- Jingting Wang
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Feiyang Zhong
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Department of Radiology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Feng Xiao
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Xinyang Dong
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Yun Long
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Tian Gan
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Ting Li
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Meiyan Liao
- Department of Radiology, Zhongnan Hospital of Wuhan University, Wuhan, China
- *Correspondence: Meiyan Liao
8
Combining metabolome and clinical indicators with machine learning provides some promising diagnostic markers to precisely detect smear-positive/negative pulmonary tuberculosis. BMC Infect Dis 2022; 22:707. [PMID: 36008772 PMCID: PMC9403968 DOI: 10.1186/s12879-022-07694-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/07/2022] [Accepted: 08/22/2022] [Indexed: 11/30/2022] Open
Abstract
Background Tuberculosis (TB) had been the leading lethal infectious disease worldwide for a long time (2014–2019) until the COVID-19 global pandemic, and it is still one of the top 10 death causes worldwide. One important reason why there are so many TB patients and death cases in the world is because of the difficulties in precise diagnosis of TB using common detection methods, especially for some smear-negative pulmonary tuberculosis (SNPT) cases. The rapid development of metabolome and machine learning offers a great opportunity for precision diagnosis of TB. However, the metabolite biomarkers for the precision diagnosis of smear-positive and smear-negative pulmonary tuberculosis (SPPT/SNPT) remain to be uncovered. In this study, we combined metabolomics and clinical indicators with machine learning to screen out newly diagnostic biomarkers for the precise identification of SPPT and SNPT patients. Methods Untargeted plasma metabolomic profiling was performed for 27 SPPT patients, 37 SNPT patients and controls. The orthogonal partial least squares-discriminant analysis (OPLS-DA) was then conducted to screen differential metabolites among the three groups. Metabolite enriched pathways, random forest (RF), support vector machines (SVM) and multilayer perceptron neural network (MLP) were performed using Metaboanalyst 5.0, “caret” R package, “e1071” R package and “Tensorflow” Python package, respectively. Results Metabolomic analysis revealed significant enrichment of fatty acid and amino acid metabolites in the plasma of SPPT and SNPT patients, where SPPT samples showed a more serious dysfunction in fatty acid and amino acid metabolisms. Further RF analysis revealed four optimized diagnostic biomarker combinations including ten features (two lipid/lipid-like molecules and seven organic acids/derivatives, and one clinical indicator) for the identification of SPPT, SNPT patients and controls with high accuracy (83–93%), which were further verified by SVM and MLP. 
Among them, MLP displayed the best classification performance on simultaneously precise identification of the three groups (94.74%), suggesting the advantage of MLP over RF/SVM to some extent. Conclusions Our findings reveal plasma metabolomic characteristics of SPPT and SNPT patients, provide some novel promising diagnostic markers for precision diagnosis of various types of TB, and show the potential of machine learning in screening out biomarkers from big data. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-022-07694-8.
9
Madhumita, Paul S. Capturing the latent space of an Autoencoder for multi-omics integration and cancer subtyping. Comput Biol Med 2022; 148:105832. [PMID: 35834966 DOI: 10.1016/j.compbiomed.2022.105832] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/16/2022] [Revised: 06/15/2022] [Accepted: 07/03/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND AND OBJECTIVE The motivation behind cancer subtyping is to identify subgroups of cancer patients with distinguishable phenotypes of clinical importance. It can assist in advancement of subtype-targeted based treatments. Subtype identification is a complicated task, therefore requires multi-omics data integration to identify the precise patients' subgroup. Over the years, several computational attempts have been made to identify the cancer subtypes accurately using integrative multi-omics analysis. Some studies have used Autoencoders (AE) to capture multi-omics feature integration in lower dimensions for identifying subtypes in specific types of cancer. However, capturing the highly informative latent space by learning the deep architectures of AE to attain a satisfactory generalized performance is required. Therefore, in this study, a novel AE-assisted cancer subtyping framework is presented that utilizes the compressed latent space of a Sparse AE neural network for multi-omics clustering. METHODS The proposed framework first performs a supervised feature selection based on the survival status of the patients. The selected features from each of the omic data are passed to the AE. The information embedded in the latent space of the trained AE neural networks are then used for cancer subtyping using Spectral clustering. The AE architecture designed in this study exhaustively searches the best compression for multi-omics data by varying the number of neurons in the hidden layers and penalizing activations within the layers. RESULTS AND CONCLUSION The proposed framework is applied to five different multi-omics cancer datasets taken from The Cancer Genome Atlas. It is observed that for getting a robust information bottleneck, a compression of 10-20% of the input features along with an L1 regularization penalty of 0.01 or 0.001 performs well for most of the cancer datasets. 
Clustering performed on this latent representation generates clusters with better silhouette scores and significantly varying survival patterns. For further biological assessment, differential expression analysis is performed between the identified subtypes of Glioblastoma multiforme (GBM), followed by enrichment analysis of the differentially expressed biomarkers. Several pathways and disease ontology terms coherent to GBM are found to be significantly associated. Varying responses of the identified GBM subtypes towards the drug Temozolomide is also tested to demonstrate its clinical importance. Hence, the study shows that AE-assisted multi-omics integration can be used for the prediction of clinically significant cancer subtypes.
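The two settings highlighted in this abstract, a bottleneck of 10-20% of the input features and an L1 penalty of 0.01 or 0.001 on activations, can be made concrete with a small sketch. This is an illustration under stated assumptions rather than the paper's architecture: a single ReLU encoder layer, a linear decoder, and a penalty scaled by the mean absolute activation are all choices made here, and the function names are hypothetical. The spectral clustering step on the latent codes is not shown.

```python
import numpy as np


def bottleneck_size(n_features, fraction=0.15):
    """Latent width as a fraction of the input width (10-20% per the abstract)."""
    return max(1, int(round(n_features * fraction)))


def sparse_ae_loss(x, w_enc, w_dec, l1=0.01):
    """Reconstruction error plus an L1 penalty on the latent activations.

    Assumptions: one ReLU encoder layer, a linear decoder, no bias terms,
    and a penalty proportional to the mean absolute activation.
    Returns the scalar loss and the latent codes.
    """
    z = np.maximum(x @ w_enc, 0.0)        # latent codes (ReLU encoder)
    x_hat = z @ w_dec                      # linear reconstruction
    mse = np.mean((x - x_hat) ** 2)        # reconstruction term
    penalty = l1 * np.mean(np.abs(z))      # sparsity term on activations
    return mse + penalty, z


# Hypothetical toy input: 2 samples, 4 features, compressed to 2 latent units.
x_toy = np.ones((2, 4))
w_enc_toy = np.eye(4)[:, :2]               # keeps the first two features
w_dec_toy = w_enc_toy.T
loss_toy, z_toy = sparse_ae_loss(x_toy, w_enc_toy, w_dec_toy, l1=0.01)
```

In the workflow the abstract outlines, latent codes such as `z_toy` (from a trained network, on real omics features) would then be passed to a clustering step to propose subtypes.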
Affiliation(s)
- Madhumita
- Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India.
- Sushmita Paul
- Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India; School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India.
10
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
- CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
- CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
- CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
- CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
- CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
11
Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S. A Survey of Autoencoder Algorithms to Pave the Diagnosis of Rare Diseases. Int J Mol Sci 2021; 22:10891. [PMID: 34639231 PMCID: PMC8509321 DOI: 10.3390/ijms221910891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/04/2021] [Accepted: 10/07/2021] [Indexed: 12/28/2022] Open
Abstract
Rare diseases (RDs) concern a broad range of disorders and can result from various origins. For a long time, the scientific community was unaware of RDs. Impressive progress has already been made for certain RDs; however, due to the lack of sufficient knowledge, many patients are not diagnosed. Nowadays, the advances in high-throughput sequencing technologies such as whole genome sequencing, single-cell and others, have boosted the understanding of RDs. To extract biological meaning using the data generated by these methods, different analysis techniques have been proposed, including machine learning algorithms. These methods have recently proven to be valuable in the medical field. Among such approaches, unsupervised learning methods via neural networks including autoencoders (AEs) or variational autoencoders (VAEs) have shown promising performances with applications on various type of data and in different contexts, from cancer to healthy patient tissues. In this review, we discuss how AEs and VAEs have been used in biomedical settings. Specifically, we discuss their current applications and the improvements achieved in diagnostic and survival of patients. We focus on the applications in the field of RDs, and we discuss how the employment of AEs and VAEs would enhance RD understanding and diagnosis.
Affiliation(s)
- David Pratella
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
- Samira Ait-El-Mkadem Saadi
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Sylvie Bannwarth
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Véronique Paquis-Fluckinger
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Silvia Bottini
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
12
Chen Q, Zhang X, Shi J, Yan M, Zhou T. Origins and evolving functionalities of tRNA-derived small RNAs. Trends Biochem Sci 2021; 46:790-804. [PMID: 34053843 PMCID: PMC8448906 DOI: 10.1016/j.tibs.2021.05.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 12/02/2020] [Revised: 04/22/2021] [Accepted: 05/03/2021] [Indexed: 12/14/2022]
Abstract
Transfer RNA (tRNA)-derived small RNAs (tsRNAs) are among the most ancient small RNAs in all domains of life and are generated by the cleavage of tRNAs. Emerging studies have begun to reveal the versatile roles of tsRNAs in fundamental biological processes, including gene silencing, ribosome biogenesis, retrotransposition, and epigenetic inheritance, which are rooted in tsRNA sequence conservation, RNA modifications, and protein-binding abilities. We summarize the mechanisms of tsRNA biogenesis and the impact of RNA modifications, and propose how thinking of tsRNA functionality from an evolutionary perspective urges the expansion of tsRNA research into a wider spectrum, including cross-tissue/cross-species regulation and harnessing of the 'tsRNA code' for precision medicine.
Affiliation(s)
- Qi Chen
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA, USA.
- Xudong Zhang
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA, USA
- Junchao Shi
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA, USA
- Menghong Yan
- Institutes of Biomedical Sciences, Shanghai Medical College of Fudan University, Shanghai, China; Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV, USA
- Tong Zhou
- Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV, USA.