1
Bahrambanan F, Alizamir M, Moradveisi K, Heddam S, Kim S, Kim S, Soleimani M, Afshar S, Taherkhani A. The development of an efficient artificial intelligence-based classification approach for colorectal cancer response to radiochemotherapy: deep learning vs. machine learning. Sci Rep 2025; 15:62. [PMID: 39748016 PMCID: PMC11696929 DOI: 10.1038/s41598-024-84023-w] [Received: 06/11/2024] [Accepted: 12/19/2024] [Indexed: 01/04/2025] Open
Abstract
Colorectal cancer (CRC) is a cancer that affects the colon and rectum. It typically begins as a small abnormal growth known as a polyp, which may be benign or cancerous. Because colorectal cancer is the second deadliest cancer after lung cancer, early detection can be highly beneficial. The standard treatment for locally advanced colorectal cancer, widely accepted around the world, is chemoradiotherapy. In this study, seven artificial intelligence models (decision tree, K-nearest neighbors, AdaBoost, random forest, gradient boosting, multi-layer perceptron, and convolutional neural network) were implemented to distinguish patients who respond to radiochemotherapy from those who do not. To find potential predictors (genes), three feature selection strategies were employed: mutual information, F-classif, and Chi-Square. Based on these feature selection methods, four scenarios were developed in which five, ten, twenty, and thirty features were selected to design a more accurate classification paradigm. The results show that random forest, gradient boosting, decision tree, and K-nearest neighbors were the most accurate models, with an accuracy of 93.8%. Among the feature selection methods, mutual information and F-classif gave the best results, while Chi-Square gave the worst. The suggested artificial intelligence models can therefore be applied as a robust approach for classifying colorectal cancer response to radiochemotherapy in medical studies.
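As an illustration of the filter-style feature selection described in this abstract, the following minimal sketch ranks features by mutual information with the response label and keeps the top k. It is not the authors' implementation: the abstract does not say how scores were computed, the gene names and values are made up, and the features are assumed to be already discretised (0 = low, 1 = high expression).

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (p_x[x] * p_y[y]))
    return mi


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest mutual information."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = responder, 0 = non-responder
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "gene_a": [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the label exactly
    "gene_b": [1, 1, 1, 0, 0, 0, 0, 1],  # partly informative
    "gene_c": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}
top2 = select_top_k(features, labels, k=2)
print(top2)
```

The selected feature names would then be passed to whichever classifier is being evaluated; repeating the selection for k = 5, 10, 20, and 30 mirrors the four scenarios mentioned in the abstract.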
Affiliation(s)
- Fatemeh Bahrambanan
  - Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Meysam Alizamir
  - Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
  - School of Engineering & Technology, Duy Tan University, Da Nang, Vietnam
- Kayhan Moradveisi
  - Civil Engineering Department, University of Kurdistan, Sanandaj, Iran
- Salim Heddam
  - Faculty of Science, Agronomy Department, Hydraulics Division, University 20 Août 1955, Route El Hadaik BP 26, 21000, Skikda, Algeria
- Sungwon Kim
  - Department of Railroad Construction and Safety Engineering, Dongyang University, Yeongju, 36040, Republic of Korea
- Seunghyun Kim
  - Department of Biology, University of California San Diego, San Diego, CA, 92093, USA
- Meysam Soleimani
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Hamadan University of Medical Sciences, Hamadan, Iran
- Saeid Afshar
  - Department of Molecular Medicine and Genetics, Medical School, Hamadan University of Medical Sciences, Hamadan, Iran
- Amir Taherkhani
  - Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
2
Liu Z, Park T. DMOIT: denoised multi-omics integration approach based on transformer multi-head self-attention mechanism. Front Genet 2024; 15:1488683. [PMID: 39720180 PMCID: PMC11666520 DOI: 10.3389/fgene.2024.1488683] [Received: 08/30/2024] [Accepted: 11/25/2024] [Indexed: 12/26/2024] Open
Abstract
Multi-omics data integration has become increasingly crucial for a deeper understanding of the complexity of biological systems. However, effectively integrating and analyzing multi-omics data remains challenging due to their heterogeneity and high dimensionality. Existing methods often struggle with noise, redundant features, and the complex interactions between different omics layers, leading to suboptimal performance. Additionally, they have difficulty adequately capturing intra-omics interactions because of simplistic concatenation techniques, and they risk losing critical inter-omics interaction information when using hierarchical attention layers. To address these challenges, we propose a novel Denoised Multi-Omics Integration approach that leverages the Transformer multi-head self-attention mechanism (DMOIT). DMOIT consists of three key modules: a generative adversarial imputation network for handling missing values, a sampling-based robust feature selection module to reduce noise and redundant features, and a multi-head self-attention (MHSA) based feature extractor with a novel architecture that enhances the capture of intra-omics interactions. We validated model performance using cancer datasets from The Cancer Genome Atlas (TCGA), conducting two tasks: survival time classification across different cancer types and estrogen receptor status classification for breast cancer. Our results show that DMOIT outperforms traditional machine learning methods and the state-of-the-art integration method MoGCN in terms of accuracy and weighted F1 score. Furthermore, we compared DMOIT with various alternative MHSA-based architectures to further validate our approach. Our results show that DMOIT consistently outperforms these models across various cancer types and different omics combinations. The strong performance and robustness of DMOIT demonstrate its potential as a valuable tool for integrating multi-omics data across various applications.
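The multi-head self-attention (MHSA) mechanism named in this abstract can be illustrated with a minimal sketch. The version below is plain scaled dot-product attention split across heads, written in standard-library Python; it is not DMOIT's architecture. As a simplifying assumption, the learned query, key, value, and output projections are replaced by the identity, and the function names and toy embeddings are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(q[0])
    out = []
    for q_i in q:
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) for k_j in k]
        weights = softmax(scores)
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x, num_heads):
    """Slice each token vector into num_heads parts, attend per part, concatenate.

    Assumption: Q, K, V, and output projections are the identity (no learned weights).
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = [tok[h * d_head:(h + 1) * d_head] for tok in x]
        heads.append(attention(part, part, part))
    # Concatenate the per-head outputs back into one vector per token
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(x))]


# Hypothetical embeddings for three feature tokens, model dimension 4
tokens = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 1.0, 0.0, 0.5],
]
out = multi_head_self_attention(tokens, num_heads=2)
print(out)
```

Each output row is a weighted average of the input rows within each head, so tokens that are similar under the dot product contribute more to one another's representation.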
Affiliation(s)
- Zhe Liu
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Department of Statistics, Seoul National University, Seoul, Republic of Korea
3
Wang Z, Geng A, Duan H, Cui F, Zou Q, Zhang Z. A comprehensive review of approaches for spatial domain recognition of spatial transcriptomes. Brief Funct Genomics 2024; 23:702-712. [PMID: 39426802 DOI: 10.1093/bfgp/elae040] [Received: 07/29/2024] [Revised: 10/02/2024] [Accepted: 10/07/2024] [Indexed: 10/21/2024] Open
Abstract
In current bioinformatics research, spatial transcriptomics (ST), a rapidly evolving technology, is receiving increasingly widespread attention from researchers. Spatial domains are regions where gene expression and histology are spatially consistent, and detecting them helps researchers better understand the organization and functional distribution of tissues. Spatial domain recognition is a fundamental step in ST data interpretation and also a major challenge in ST analysis. Therefore, developing more accurate, efficient, and general spatial domain recognition methods has become an important and urgent research direction. This article aims to review the current status and progress of spatial domain recognition research, explore the advantages and limitations of existing methods, and provide suggestions and directions for future tool development.
Affiliation(s)
- Ziyi Wang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Aoyun Geng
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Hao Duan
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
4
Beltrán JF, Herrera-Belén L, Yáñez AJ, Jimenez L. Prediction of viral oncoproteins through the combination of generative adversarial networks and machine learning techniques. Sci Rep 2024; 14:27108. [PMID: 39511292 PMCID: PMC11543823 DOI: 10.1038/s41598-024-77028-y] [Received: 07/07/2024] [Accepted: 10/18/2024] [Indexed: 11/15/2024] Open
Abstract
Viral oncoproteins play crucial roles in transforming normal cells into cancer cells, representing a significant factor in the etiology of various cancers. Traditionally, identifying these oncoproteins is both time-consuming and costly. With advancements in computational biology, bioinformatics tools based on machine learning have emerged as effective methods for predicting biological activities. Here, for the first time, we propose an innovative approach that combines Generative Adversarial Networks (GANs) with supervised learning methods to enhance the accuracy and generalizability of viral oncoprotein prediction. Our methodology evaluated multiple machine learning models, including Random Forest, Multilayer Perceptron, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Support Vector Machine. In ten-fold cross-validation on our training dataset, the GAN-enhanced Random Forest model demonstrated superior performance metrics: 0.976 accuracy, 0.976 F1 score, 0.977 precision, 0.976 sensitivity, and 1.0 AUC. During independent testing, this model achieved 0.982 accuracy, 0.982 F1 score, 0.982 precision, 0.982 sensitivity, and 1.0 AUC. These results underpin our new tool, VirOncoTarget, which is accessible via a web application. We anticipate that VirOncoTarget will be a valuable resource for researchers, enabling rapid and reliable viral oncoprotein prediction and advancing our understanding of their role in cancer biology.
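For readers unfamiliar with the evaluation terms used in this abstract, the sketch below shows how accuracy, precision, sensitivity, and F1 follow from binary confusion counts, and how ten-fold cross-validation partitions a dataset. It is an illustrative assumption rather than the authors' pipeline: the abstract does not state the averaging scheme or whether folds were shuffled or stratified, and the labels here are made up.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall), and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds, without shuffling."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Made-up labels and predictions, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(25, 10))
print(metrics)
print(len(folds))
```

In a cross-validation run, a model would be fitted on each `train` list and scored on the matching `test` list, and the per-fold metrics would then be averaged.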
Affiliation(s)
- Jorge F Beltrán
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Lisandra Herrera-Belén
  - Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad Santo Tomas, Temuco, Chile
- Alejandro J Yáñez
  - Departamento de Investigación y Desarrollo, Greenvolution SpA, Puerto Varas, Chile
  - Interdisciplinary Center for Aquaculture Research (INCAR), Concepcion, Chile
- Luis Jimenez
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
5
Yu Z, Liu F, Li Y. scTCA: a hybrid Transformer-CNN architecture for imputation and denoising of scDNA-seq data. Brief Bioinform 2024; 25:bbae577. [PMID: 39523623 PMCID: PMC11551055 DOI: 10.1093/bib/bbae577] [Received: 06/19/2024] [Revised: 10/05/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Single-cell DNA sequencing (scDNA-seq) has been widely used to unmask tumor copy number alterations (CNAs) at single-cell resolution. Although arm-level CNAs can be accurately detected from single-cell read counts, it is difficult to precisely identify focal CNAs because the read counts are characterized by high dimensionality, high sparsity, and a low signal-to-noise ratio. This creates a pressing need to reconstruct high-quality scDNA-seq data. We develop a new method called scTCA for imputation and denoising of single-cell read counts, thus aiding downstream analysis of both arm-level and focal CNAs. scTCA employs hybrid Transformer-CNN architectures to identify local and non-local correlations between genes for precise recovery of the read counts. Unlike conventional Transformers, the Transformer block in scTCA is a two-stage attention module containing a stepwise self-attention layer and a window Transformer, and it can efficiently handle high-dimensional read count data. We showcase the superior performance of scTCA through comparison with state-of-the-art methods on both synthetic and real datasets. The results indicate that it is highly effective in imputation and denoising of scDNA-seq data.
Affiliation(s)
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, 750021 Ningxia, China
  - Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Ningxia University, 750021 Ningxia, China
- Furui Liu
  - School of Information Engineering, Ningxia University, 750021 Ningxia, China
- Yang Li
  - School of Information Engineering, Ningxia University, 750021 Ningxia, China
6
Zhao J, Ching WK, Wong CW, Cheng X. BANMF-S: a blockwise accelerated non-negative matrix factorization framework with structural network constraints for single cell imputation. Brief Bioinform 2024; 25:bbae432. [PMID: 39242194 PMCID: PMC11379494 DOI: 10.1093/bib/bbae432] [Received: 06/09/2024] [Revised: 07/23/2024] [Accepted: 08/19/2024] [Indexed: 09/09/2024] Open
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) enables transcriptome profiling of hundreds to tens of thousands of cells at an unprecedented individual-cell level and provides new insights into cell heterogeneity. However, its advantages are hampered by dropout events. To address this problem, we propose a Blockwise Accelerated Non-negative Matrix Factorization framework with Structural network constraints (BANMF-S) to impute those technical zeros. RESULTS: BANMF-S constructs a gene-gene similarity network to integrate prior information from the external PPI network by the Triadic Closure Principle, and a cell-cell similarity network to capture the neighborhood structure and temporal information through a Minimum-Spanning Tree. By jointly employing these two networks as regularizations, BANMF-S encourages the coherence of similar gene and cell pairs in the latent space, enhancing the potential to recover the underlying features. In addition, BANMF-S adopts a blockwise strategy that solves the traditional NMF problem with a distributed stochastic gradient descent method run in parallel to accelerate the optimization. Numerical experiments on simulated and real datasets verify that BANMF-S can improve the accuracy of downstream clustering and pseudo-trajectory inference, and that its performance is superior to seven state-of-the-art algorithms. AVAILABILITY: All data used in this work are downloaded from publicly available data sources, and their corresponding accession numbers or source URLs are provided in Supplementary File Section 5.1 Dataset Information. The source code is publicly available in the GitHub repository https://github.com/jiayingzhao/BANMF-S.
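The core idea of NMF-based imputation can be sketched in a few lines. The example below is a plain non-negative matrix factorization fitted by projected stochastic gradient descent on the nonzero entries only; it is not BANMF-S. The gene-gene and cell-cell network regularisers and the blockwise parallel scheme are omitted, every zero is treated as a dropout purely for illustration, and the matrix, function names, and hyperparameters are hypothetical.

```python
import random


def nmf_sgd(x, rank, epochs=500, lr=0.005, seed=0):
    """Fit x ~ W @ H with W, H >= 0 by projected SGD over the nonzero entries of x."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() for _ in range(n_cols)] for _ in range(rank)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols) if x[i][j] != 0]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = x[i][j] - sum(w[i][r] * h[r][j] for r in range(rank))
            for r in range(rank):
                w_ir, h_rj = w[i][r], h[r][j]
                # Gradient step on the squared error, then clip at zero (projection)
                w[i][r] = max(0.0, w_ir + lr * err * h_rj)
                h[r][j] = max(0.0, h_rj + lr * err * w_ir)
    return w, h


def predict(w, h, i, j):
    """Reconstructed value for entry (i, j)."""
    return sum(w[i][r] * h[r][j] for r in range(len(h)))


def observed_mse(x, w, h):
    """Mean squared reconstruction error over the nonzero entries of x."""
    cells = [(i, j) for i in range(len(x)) for j in range(len(x[0])) if x[i][j] != 0]
    return sum((x[i][j] - predict(w, h, i, j)) ** 2 for i, j in cells) / len(cells)


# Made-up count matrix (genes x cells); zeros stand in for dropouts
counts = [
    [1, 2, 0, 3],
    [2, 4, 2, 6],
    [3, 0, 3, 9],
]
w, h = nmf_sgd(counts, rank=2)
# Keep observed counts, fill zeros with the low-rank reconstruction
imputed = [[counts[i][j] if counts[i][j] != 0 else predict(w, h, i, j)
            for j in range(4)] for i in range(3)]
print(imputed)
```

Fitting only the nonzero entries is what lets the low-rank model propose values for the zeros; in practice a method must also decide which zeros are technical and which are biological, which this sketch does not attempt.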
Affiliation(s)
- Jiaying Zhao
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Wai-Ki Ching
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Chi-Wing Wong
  - Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
- Xiaoqing Cheng
  - School of Mathematics and Statistics, Xi'an Jiaotong University, No. 28 Xianning West Road, Xi'an, Shaanxi 710049, China
7
Wang T, Yan Z, Zhang Y, Lou Z, Zheng X, Mai D, Wang Y, Shang X, Xiao B, Peng J, Chen J. postGWAS: A web server for deciphering the causality post the genome-wide association studies. Comput Biol Med 2024; 171:108108. [PMID: 38359659 DOI: 10.1016/j.compbiomed.2024.108108] [Received: 12/01/2023] [Revised: 01/23/2024] [Accepted: 02/04/2024] [Indexed: 02/17/2024]
Abstract
While genome-wide association studies (GWAS) have unequivocally identified a vast number of disease susceptibility variants, a majority of them are situated in non-coding regions and are in high linkage disequilibrium (LD). To pave the way for translating GWAS signals into clinical drug targets, it is essential to identify the underlying causal variants and, in turn, the causal genes. To this end, a myriad of post-GWAS methods have been devised, each grounded in distinct principles including fine-mapping, colocalization, and transcriptome-wide association study (TWAS) techniques. Yet no platform currently exists that seamlessly integrates these diverse post-GWAS methodologies. In this work, we present a user-friendly web server for post-GWAS analysis that integrates 9 distinct methods with 12 models, categorized by fine-mapping, colocalization, and TWAS. The server mainly helps users decipher the causality obscured by complex GWAS signals, including causal variants and causal genes, without the burden of computational skills and complex environment configuration. It provides a convenient platform for post-GWAS analysis and result visualization, facilitating the understanding and interpretation of genome-wide association studies. The postGWAS server is available at http://g2g.biographml.com/.
Affiliation(s)
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Zhihao Yan
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yiming Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhuofei Lou
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Xiaozhu Zheng
  - Department of Anesthesiology, The People's Hospital of Yubei District, Chongqing, 401120, China
- DuoDuo Mai
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Bing Xiao
  - School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jing Chen
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
8
Sun Y, Guo J, Liu Y, Wang N, Xu Y, Wu F, Xiao J, Li Y, Wang X, Hu Y, Zhou Y. METnet: A novel deep learning model predicting MET dysregulation in non-small-cell lung cancer on computed tomography images. Comput Biol Med 2024; 171:108136. [PMID: 38367451 DOI: 10.1016/j.compbiomed.2024.108136] [Received: 12/04/2023] [Revised: 01/24/2024] [Accepted: 02/12/2024] [Indexed: 02/19/2024]
Abstract
BACKGROUND: Mesenchymal epithelial transformation (MET) is a key molecular target for the diagnosis and treatment of non-small cell lung cancer (NSCLC). The corresponding molecularly targeted therapeutics have been approved by the Food and Drug Administration (FDA) and have achieved promising results. However, current detection of MET dysregulation requires biopsy and gene sequencing, which is invasive and time-consuming, and tumor samples are difficult to obtain. METHODS: To address these problems, we developed a noninvasive and convenient deep learning (DL) model based on computed tomography (CT) imaging data for prediction of MET dysregulation. We introduced the unsupervised algorithm RK-net for automated image processing and utilized the MedSAM large model to achieve automated tissue segmentation. Based on the processed CT images, we developed a DL model (METnet), which is built on grouped convolutional blocks. We evaluated the performance of the model on the internal test dataset using the area under the receiver operating characteristic curve (AUROC) and accuracy. We conducted subgroup analysis on the basis of clinical data of the lung cancer patients and compared the performance of the model in different subgroups. RESULTS: The model demonstrated good discriminative ability on the internal test dataset. The accuracy of METnet was 0.746, with an AUROC of 0.793 (95% CI 0.714-0.871). The subgroup analysis revealed that the model exhibited similar performance across different subgroups. CONCLUSIONS: METnet realizes prediction of MET dysregulation in NSCLC, holding promise for guiding precise tumor diagnosis and treatment at the molecular level.
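The AUROC reported in this abstract has a simple probabilistic reading: it is the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way by direct pairwise comparison. It illustrates the metric only; the labels and scores are made up, and nothing here reflects how the authors computed their value or its confidence interval.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = dysregulated) and model scores, for illustration only
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
value = auroc(y_true, scores)
print(round(value, 3))
```

The pairwise form runs in time proportional to the number of positive-negative pairs, which is fine for a small test set; rank-based formulas give the same value more efficiently on large ones.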
Affiliation(s)
- Yige Sun
  - Department of Radiology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150010, Heilongjiang, P.R. China; Genomics Research Center (Key Laboratory of Gut Microbiota and Pharmacogenomics of Heilongjiang Province, State-Province Key Laboratory of Biomedicine-Pharmaceutics of China), College of Pharmacy, Harbin Medical University, Harbin, 150081, China
- Jirui Guo
  - Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Yang Liu
  - Department of Radiology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150010, Heilongjiang, P.R. China
- Nan Wang
  - Beidahuang Industry Group General Hospital, Harbin, 150088, China
- Yanwei Xu
  - Beidahuang Group Neuropsychiatric Hospital, Jiamusi, 154000, China
- Fei Wu
  - The Second Affiliated Hospital of Harbin Medical University, Harbin Medical University, 150001, Harbin, Heilongjiang, China
- Jianxin Xiao
  - Department of Radiology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150010, Heilongjiang, P.R. China
- Yingpu Li
  - Department of Oncological Surgery, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang Province, 150000, China
- Xinxin Wang
  - Department of Radiology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150010, Heilongjiang, P.R. China
- Yang Hu
  - Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Yang Zhou
  - Department of Radiology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150010, Heilongjiang, P.R. China