1
Chen B, Sun X, Huang H, Feng C, Chen W, Wu D. An integrated machine learning framework for developing and validating a diagnostic model of major depressive disorder based on interstitial cystitis-related genes. J Affect Disord 2024; 359:22-32. [PMID: 38754597 DOI: 10.1016/j.jad.2024.05.061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 03/24/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024]
Abstract
BACKGROUND Major depressive disorder (MDD) and interstitial cystitis (IC) are two highly debilitating conditions that often coexist and reinforce each other, significantly exacerbating patients' suffering. However, the molecular underpinnings linking these disorders remain poorly understood. METHODS Transcriptomic data from GEO datasets of MDD and IC patients were systematically analyzed to develop and validate our model. Following removal of batch effects, differentially expressed genes (DEGs) between each disease group and its control group were identified. DEGs shared by the two conditions then underwent functional enrichment analyses. Additionally, immune infiltration was quantified through ssGSEA. A diagnostic model for MDD was constructed by exploring 113 combinations of 12 machine learning algorithms with 10-fold cross-validation on the training sets, followed by external validation on test sets. Finally, the "Enrichr" platform was utilized to identify potential drugs for MDD. RESULTS In total, 21 key genes closely associated with both MDD and IC were identified, predominantly involved in immune processes based on enrichment analyses. Immune infiltration analysis revealed distinct profiles of immune cell infiltration in MDD and IC compared to healthy controls. From these genes, a robust 11-gene (ABCD2, ATP8B4, TNNT1, AKR1C3, SLC26A8, S100A12, PTX3, FAM3B, ITGA2B, OLFM4, BCL7A) diagnostic signature was constructed, which exhibited superior performance over existing MDD diagnostic models in both training and testing cohorts. Additionally, epigallocatechin gallate and 10 other drugs emerged as potential therapeutic candidates for MDD. CONCLUSION Our work developed a diagnostic model for MDD employing a combination of bioinformatic techniques and machine learning methods, focusing on shared genes between MDD and IC.
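The 10-fold cross-validation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `fit_and_score` callback are all hypothetical, and the abstract does not say whether folds were stratified.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: plain (unstratified) folds built from a shuffled index list.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Average the score returned by a hypothetical fit_and_score(train, test) callback."""
    scores = [fit_and_score(train, test) for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

Each sample is held out exactly once, so the averaged score estimates performance on data the model did not see during fitting. External validation on separate test sets, as the abstract describes, remains a distinct step.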
Affiliation(s)
- Bohong Chen: Department of Urology, The First Affiliated Hospital of Xi'an Jiaotong University, 710061 Xi'an, Shaanxi, China
- Xinyue Sun: Department of Neurology, The First Affiliated Hospital of Xi'an Jiaotong University, 710061 Xi'an, Shaanxi, China
- Haoxiang Huang: Department of Urology, The First Affiliated Hospital of Xi'an Jiaotong University, 710061 Xi'an, Shaanxi, China
- Cong Feng: Department of Urology, The First Affiliated Hospital of Xi'an Jiaotong University, 710061 Xi'an, Shaanxi, China
- Wei Chen: Department of Urology, The First Affiliated Hospital of Xi'an Jiaotong University, 710061 Xi'an, Shaanxi, China
- Dapeng Wu: Department of Urology, The First Affiliated Hospital of Xi'an Jiaotong University, 710061 Xi'an, Shaanxi, China
2
Ma Y, Zhang B, Liu Z, Liu Y, Wang J, Li X, Feng F, Ni Y, Li S. IAS-FET: An intelligent assistant system and an online platform for enhancing successful rate of in-vitro fertilization embryo transfer technology based on clinical features. Comput Methods Programs Biomed 2024; 245:108050. [PMID: 38301430 DOI: 10.1016/j.cmpb.2024.108050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/20/2024] [Accepted: 01/23/2024] [Indexed: 02/03/2024]
Abstract
BACKGROUND Among all assisted reproductive technology (ART) methods, in vitro fertilization-embryo transfer (IVF-ET) holds a prominent position as a key solution for overcoming infertility. However, its success rate hovers at a modest 30% to 70%. Adding to the challenge is the absence of effective models and clinical tools capable of predicting the outcome of IVF-ET before embryo formation. Our study aims to fill this gap by predicting IVF-ET outcomes and ultimately enhancing the success rate of the procedure. METHODS In this retrospective study, infertile patients who received assisted pregnancy treatment at Gansu Provincial Maternity and Child-care Hospital in China from 2016 to 2020 were enrolled. Patients' clinical information was analyzed with a cascade XGBoost method to build an intelligent assistant system for predicting the outcome of IVF-ET, called IAS-FET. The cascade XGBoost model was trained using clinical information from 2292 couples and externally tested using clinical information from 573 couples. In addition, IAS-FET suggested several schemes to help patients adjust their physical condition and improve their chances of success with ART. RESULTS IAS-FET predicted the outcome of IVF-ET with an area under the curve (AUC) of 0.8759 on the external test set. In addition, IAS-FET can provide several schemes to improve the success rate of IVF-ET. The tool is freely available online at http://www.cppdd.cn/ART. CONCLUSIONS The results suggest that personal clinical features significantly influence the success of ART. The proposed IAS-FET system, based on the top 27 factors, could be a promising tool to predict the outcome of ART and propose a plan for the patient's physical adjustment. With the help of IAS-FET, patients can take informed steps towards increasing their chances of a successful outcome on their journey to parenthood.
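The AUC reported above can be read as the probability that a randomly chosen successful case receives a higher predicted score than a randomly chosen unsuccessful one. The sketch below computes it that way from labels and scores. The function name `roc_auc` and the toy data in the usage note are hypothetical, and this is not the IAS-FET code.

```python
def roc_auc(labels, scores, positive=1):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])` ranks three of the four positive-negative pairs correctly. The pairwise loop is quadratic, so a rank-based formula would be preferable for large cohorts.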
Affiliation(s)
- Ying Ma: Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
- Bowen Zhang: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430073, China
- Zhaoqing Liu: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Yujie Liu: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Jiarui Wang: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xingxuan Li: School of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, Gansu 730030, China
- Fan Feng: Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
- Yali Ni: Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
- Shuyan Li: School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
3
Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, namely Rm-LR, which leverages a long-range-based deep learning approach to accurately predict multiple types of RNA modifications using RNA sequences only. Rm-LR incorporates two large-scale RNA language pre-trained models to capture discriminative sequential information and learn local important features, which are subsequently integrated through a bilinear attention network. Rm-LR supports a total of ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms the state-of-the-art methods in terms of predictive capability on benchmark datasets. Experimental results show the effectiveness and superiority of Rm-LR in the prediction of various RNA modifications, demonstrating the strong adaptability and robustness of our proposed model. We demonstrate that RNA language pre-trained models can learn dense biological sequential representations from a large-scale long-range RNA corpus while also enhancing the interpretability of the models. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
Affiliation(s)
- Sirui Liang: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yanxi Zhao: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Junru Jin: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Jianbo Qiao: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Ding Wang: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yu Wang: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Leyi Wei: School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
4
Alkady W, ElBahnasy K, Gad W. A diagnostic model for COVID-19 based on proteomics analysis. Comput Biol Med 2023; 162:107109. [PMID: 37276752 PMCID: PMC10232940 DOI: 10.1016/j.compbiomed.2023.107109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 05/21/2023] [Accepted: 05/30/2023] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE Early diagnosis of Coronavirus Disease 2019 (COVID-19) can help save patients' lives before the disease turns severe. This can be achieved through an effective and correct treatment protocol. In this paper, a prediction model is proposed to detect infected cases and determine the severity level of the disease. METHODS The proposed model uses proteins and metabolites as features for each patient, which are then analyzed using feature selection methods such as Principal Component Analysis (PCA), Information Gain (IG), and Analysis of Variance (ANOVA) to select the most significant features. The model employs three classifiers, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF), to predict and classify the severity level of the COVID-19 infection. The proposed model is evaluated using four performance measures: accuracy, sensitivity, specificity, and precision. RESULTS The experimental results show that the accuracy of the proposed model can reach 80% using the RF classifier with PCA. PCA selects 22 proteins and 10 metabolites, while ANOVA selects 9 proteins and 5 metabolites. The accuracy reaches 92% with the RF classifier and ANOVA. Finally, the accuracy reaches 93% using the RF classifier with only ten features: 7 proteins and 3 metabolites. Moreover, the selected features are related to the immune and respiratory systems. CONCLUSION The proposed model uses three classifiers and shows promising results by selecting the important features and maximizing the prediction accuracy.
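The four performance measures named in the abstract follow directly from the confusion-matrix counts. The sketch below assumes a binary labelling (for instance infected versus not infected); the paper also classifies severity levels, which would need per-class or averaged variants. The names `BinaryMetrics` and `binary_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and precision for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominators) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return BinaryMetrics(
        accuracy=ratio(tp + tn, len(pairs)),
        sensitivity=ratio(tp, tp + fn),  # true positive rate (recall)
        specificity=ratio(tn, tn + fp),  # true negative rate
        precision=ratio(tp, tp + fp),    # positive predictive value
    )
```

Reporting sensitivity and specificity alongside accuracy matters when classes are imbalanced, because accuracy alone can look high for a classifier that rarely predicts the minority class.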
Affiliation(s)
- Walaa Alkady: Bioinformatics Program, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Khaled ElBahnasy: Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
- Walaa Gad: Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
5
Chen Y, Liu Z, Yu Q, Sun X, Wang S, Zhu Q, Yang J, Jiang R. Investigation of Underlying Biological Association and Targets between Rejection of Renal Transplant and Renal Cancer. Int J Genomics 2023; 2023:5542233. [PMID: 37261105 PMCID: PMC10229252 DOI: 10.1155/2023/5542233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 05/05/2023] [Accepted: 05/10/2023] [Indexed: 06/02/2023]
Abstract
Background Post-renal transplant patients have a high likelihood of developing renal cancer. However, the biological mechanisms behind the development of renal cancer in post-kidney transplant patients remain to be elucidated. Therefore, this study aimed to investigate the biological mechanism behind the development of renal cell carcinoma in post-renal transplant patients. Methods Next-generation sequencing data and corresponding clinical information of patients with clear cell renal cell carcinoma (ccRCC) were obtained from The Cancer Genome Atlas Program (TCGA) database. Microarray data of kidney transplant patients with or without rejection were obtained from the Gene Expression Omnibus (GEO) database. Statistical analysis was conducted in R. Results We identified 55 upregulated genes in the transplant patients with rejection from the GEO datasets (GSE48581, GSE36059, and GSE98320). Furthermore, bioinformatics analyses showed that all of these genes were upregulated in ccRCC tissue. Moreover, a prognosis model was constructed based on four rejection-related genes: PLAC8, CSTA, AIM2, and LYZ. The prognosis model showed excellent performance in prognosis prediction in a ccRCC cohort. In addition, the machine learning algorithms identified 19 rejection-related genes, including PLAC8, involved in ccRCC occurrence. Finally, PLAC8 was selected for further study of its clinical and biological roles. Conclusion Our study provides novel insight into the transition from the rejection of renal transplant to renal cancer. PLAC8 could be a potential biomarker for ccRCC diagnosis and prognosis in post-kidney transplant patients.
Affiliation(s)
- Yinwei Chen: Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Zhanpeng Liu: Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Qian Yu: College of Pediatrics, Nanjing Medical University, Nanjing, China
- Xu Sun: Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Shuai Wang: Department of Orthopedics, Huai'an No. 1 People's Hospital, Huai'an, China
- Qingyi Zhu: Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Jian Yang: Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Rongjiang Jiang: Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
6
Vahabzadeh V, Moattar MH. Robust microarray data feature selection using a correntropy based distance metric learning approach. Comput Biol Med 2023; 161:107056. [PMID: 37235945 DOI: 10.1016/j.compbiomed.2023.107056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 04/18/2023] [Accepted: 05/20/2023] [Indexed: 05/28/2023]
Abstract
Classification of high-dimensional microarray data is a challenge in bioinformatics and genetic data processing. One of the challenging issues of feature selection is the presence of outliers. The Euclidean distance metric is sensitive to outliers. In this study, a distance metric learning based feature selection approach that uses the correntropy function as the discrimination metric is proposed. For this purpose, the metric learning problem is formulated as an optimization problem and solved using the Lagrange method. The output of the approach signifies the most important and robust features. After feature selection, different classification methods such as SVM, decision trees, and NN classifiers are used to investigate the classification accuracy of the proposed method as well as precision, recall, and F-measure. Experiments are carried out on 13 high-dimensional datasets and show that the proposed method outperforms the previous models in terms of accuracy and robustness.
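The contrast between Euclidean distance and a correntropy-based measure can be made concrete with a small sketch. It assumes the commonly used Gaussian-kernel form of correntropy with bandwidth `sigma` and the correntropy-induced metric derived from it. The abstract does not give the authors' exact formulation, so the function names and the default bandwidth are illustrative assumptions.

```python
import math


def euclidean(x, y):
    """Ordinary Euclidean distance; grows without bound with any single large deviation."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correntropy(x, y, sigma=1.0):
    """Sample correntropy with an unnormalised Gaussian kernel (assumed form)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for a, b in zip(x, y)) / len(x)


def correntropy_induced_metric(x, y, sigma=1.0):
    """sqrt(kernel(0) - correntropy); bounded in [0, 1] for this kernel."""
    return math.sqrt(max(0.0, 1.0 - correntropy(x, y, sigma)))
```

With one grossly corrupted coordinate, the Euclidean distance is dominated by the outlier, whereas each coordinate contributes at most 1/n to the correntropy term, so the induced metric stays bounded. That saturation is the robustness property the abstract appeals to.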
Affiliation(s)
- Venus Vahabzadeh: Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
7
Zhang Y, Sun H, Lian X, Tang J, Zhu F. ANPELA: Significantly Enhanced Quantification Tool for Cytometry-Based Single-Cell Proteomics. Adv Sci (Weinh) 2023; 10:e2207061. [PMID: 36950745 DOI: 10.1002/advs.202207061] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/30/2022] [Revised: 02/13/2023] [Indexed: 05/27/2023]
Abstract
ANPELA is widely used for quantifying traditional bulk proteomic data. Recently, there has been a clear shift from bulk proteomics to single-cell proteomics (SCP), for which powerful cytometry techniques demonstrate a remarkable capacity to capture cellular heterogeneity that is completely overlooked by traditional bulk profiling. However, in-depth and high-quality quantification of SCP data is still challenging and is severely affected by the large number of quantification workflows and the extreme dependence of performance on the studied datasets. In other words, the proper selection of well-performing workflow(s) for any studied dataset is elusive, and a significantly enhanced and accelerated tool is urgently needed to address this issue. However, no such tool has been developed yet. Herein, ANPELA is therefore updated to version 2.0 (https://idrblab.org/anpela/), which is unique in providing the most comprehensive set of quantification alternatives (>1000 workflows) among all existing tools, enabling systematic performance evaluation from multiple perspectives based on machine learning, and identifying the optimal workflow(s) using overall performance ranking together with parallel computation. Extensive validation on different benchmark datasets and representative application scenarios suggests the great application potential of ANPELA in current SCP research for gaining more accurate and reliable biological insights.
Affiliation(s)
- Ying Zhang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Huaicheng Sun: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Xichen Lian: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Jing Tang: Department of Bioinformatics, Chongqing Medical University, Chongqing, 400016, China
- Feng Zhu: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
8
Walakira A, Skubic C, Nadižar N, Rozman D, Režen T, Mraz M, Moškon M. Integrative computational modeling to unravel novel potential biomarkers in hepatocellular carcinoma. Comput Biol Med 2023; 159:106957. [PMID: 37116239 DOI: 10.1016/j.compbiomed.2023.106957] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 03/17/2023] [Accepted: 04/16/2023] [Indexed: 04/30/2023]
Abstract
Hepatocellular carcinoma (HCC) is a major health problem around the world. The management of this disease is complicated by the lack of noninvasive diagnostic tools and the few treatment options available. Better clinical outcomes can be achieved if HCC is detected early, but unfortunately, clinical signs appear only when the disease is in its late stages. We aim to identify novel genes that can be targeted for the diagnosis and therapy of HCC. We performed a meta-analysis of transcriptomics data to identify differentially expressed genes and applied network analysis to identify hub genes. Fatty acid metabolism, complement and coagulation cascades, chemical carcinogenesis, and retinol metabolism were identified as key pathways in HCC. Furthermore, we integrated transcriptomics data into a reference human genome-scale metabolic model to identify key reactions and subsystems relevant in HCC. We conclude that fatty acid activation, purine metabolism, and vitamin D and E metabolism are key processes in the development of HCC and therefore need to be further explored for the development of new therapies. We provide the first evidence that the GABRP, HBG1 and DAK (TKFC) genes are important in HCC in humans and warrant further studies.
Affiliation(s)
- Andrew Walakira: Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Cene Skubic: Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Nejc Nadižar: Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Damjana Rozman: Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Tadeja Režen: Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Miha Mraz: Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Miha Moškon: Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
9
Nie X, Qin D, Zhou X, Duo H, Hao Y, Li B, Liang G. Clustering ensemble in scRNA-seq data analysis: Methods, applications and challenges. Comput Biol Med 2023; 159:106939. [PMID: 37075602 DOI: 10.1016/j.compbiomed.2023.106939] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/20/2023] [Revised: 03/31/2023] [Accepted: 04/14/2023] [Indexed: 04/21/2023]
Abstract
With the rapid development of single-cell RNA-sequencing techniques, various computational methods and tools have been proposed to analyze these high-throughput data, accelerating the discovery of potential biological information. As one of the core steps of single-cell transcriptome data analysis, clustering plays a crucial role in identifying cell types and interpreting cellular heterogeneity. However, the results generated by different clustering methods differ, and these unstable partitions can affect the accuracy of the analysis to a certain extent. To overcome this challenge and obtain more accurate results, clustering ensembles are now frequently applied to cluster analysis of single-cell transcriptome datasets, and the results they generate are generally more reliable than those from most single clustering partitions. In this review, we summarize applications and challenges of the clustering ensemble method in single-cell transcriptome data analysis, and provide constructive thoughts and references for researchers in this field.
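One common way to build a clustering ensemble is evidence accumulation: count how often each pair of cells is co-clustered across the base partitions, then group cells whose co-association exceeds a threshold. The sketch below illustrates that idea only. The function names, the 0.5 threshold, and the connected-components consensus step are assumptions, not a method taken from the review.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of cells shares a cluster."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same cells")
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(1 for p in partitions if p[i] == p[j])
        matrix[i][j] = matrix[j][i] = agree / len(partitions)
    return matrix


def consensus_labels(partitions, threshold=0.5):
    """Link cells co-clustered in more than `threshold` of partitions (union-find)."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]
```

Because the co-association matrix depends only on which cells share a cluster, the arbitrary label names used by each base method do not need to be aligned before combining them.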
Affiliation(s)
- Xiner Nie: Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China; College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Dan Qin: Department of Biology, College of Science, Northeastern University, Boston, MA, 02115, USA
- Xinyi Zhou: College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Hongrui Duo: College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Youjin Hao: College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Bo Li: College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Guizhao Liang: Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China
10
Chatterjee D, Rahman MM, Saha AK, Siam MKS, Sharif Shohan MU. Transcriptomic analysis of esophageal cancer reveals hub genes and networks involved in cancer progression. Comput Biol Med 2023; 159:106944. [PMID: 37075603 DOI: 10.1016/j.compbiomed.2023.106944] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/02/2022] [Revised: 04/09/2023] [Accepted: 04/14/2023] [Indexed: 04/21/2023]
Abstract
Esophageal carcinoma (ESCA) has a 5-year survival rate of less than 20%. The study aimed to identify new predictive biomarkers for ESCA through transcriptomics meta-analysis, in order to address ineffective cancer therapy, the lack of efficient diagnostic tools, and costly screening, and to contribute to more efficient cancer screening and treatment. Nine GEO datasets of three kinds of esophageal carcinoma were analyzed, and 20 differentially expressed genes were detected in carcinogenic pathways. Network analysis revealed four hub genes, namely RAR Related Orphan Receptor A (RORA), lysine acetyltransferase 2B (KAT2B), Cell Division Cycle 25B (CDC25B), and Epithelial Cell Transforming 2 (ECT2). Overexpression of RORA, KAT2B, and ECT2 was associated with poor prognosis. These hub genes modulate immune cell infiltration. Although this research needs laboratory confirmation, we identified candidate biomarkers in ESCA that may aid in diagnosis and treatment.
Affiliation(s)
- Dipankor Chatterjee: Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Md Mostafijur Rahman: Department of Microbiology, Jashore University of Science and Technology, Bangladesh
- Anik Kumar Saha: Institute of Food Science and Technology, Bangladesh Council of Scientific and Industrial Research, Dhaka, Bangladesh
11
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda: DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- João Rafael Almeida: DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sara Duarte-Pereira: DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Raquel M Silva: Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal
12
Liu Y, Ma J, Wang X, Liu P, Cai C, Han Y, Zeng S, Feng Z, Shen H. Lipophagy-related gene RAB7A is involved in immune regulation and malignant progression in hepatocellular carcinoma. Comput Biol Med 2023; 158:106862. [PMID: 37044053 DOI: 10.1016/j.compbiomed.2023.106862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 02/05/2023] [Accepted: 03/30/2023] [Indexed: 04/08/2023]
Abstract
BACKGROUND RAB7A (RAS-related in Brain 7A) is an important member of the RAS oncogene family. However, the correlation between RAB7A and the development and immune infiltration of hepatocellular carcinoma (HCC) has rarely been studied. Here, we studied the role of RAB7A in HCC through bioinformatics analysis, real-world cohort validation, and in vitro experimental exploration. MATERIALS AND METHODS The RAB7A expression level was analyzed through TCGA, HPA and TISIDB databases. TIMER and TISCH were used to analyze the correlation between RAB7A and tumor immune microenvironment. The expression of RAB7A was detected through real-time PCR and western blotting. The cell proliferation was detected by EdU and CCK8. Wound-healing and transwell assays were used to test the invasion and migration ability. Cell cycle distribution and reactive oxygen species (ROS) content were analyzed by flow cytometry. Identification of epithelial-mesenchymal transition (EMT) was performed by immunofluorescence double staining. Immunohistochemistry (IHC) was used to evaluate the correlation between RAB7A and immune checkpoints. RESULTS RAB7A is upregulated in most of the tumor types, and the upregulation of RAB7A is associated with a poorer prognosis in many cancers. The results showed that RAB7A was significantly positively correlated with the infiltration of macrophages and cancer-associated fibroblasts (CAFs), but negatively correlated with M2-type macrophages in most tumors. The single-cell atlas also revealed the distribution and proportion of RAB7A in immune cells of HCC. The in vitro experiments suggested that RAB7A was increased in HCC tissue and cell lines. The knockdown of RAB7A inhibited the activation of the PIK3CA-AKT pathway and suppressed the expression of CDK4, CDK6 and CCNA2. Knockdown of RAB7A induced G0/G1 arrest and ROS accumulation in HCC. In addition, overexpression of RAB7A enhanced migration and invasion by inducing EMT. 
The real-world cohort showed that the expression level of RAB7A was positively correlated with the expression levels of TGFBR1 and PD-L1. CONCLUSIONS RAB7A may serve as a potential tumor prognostic and immune infiltration-related biomarker, predicting immunotherapy efficacy in certain cancer types, especially in HCC. Besides, RAB7A was a multi-pathway target involved in the malignant progression of HCC.
Affiliation(s)
- Yongting Liu
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Jiayao Ma
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Xinwen Wang
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Ping Liu
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Changjing Cai
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Ying Han
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Shan Zeng
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Ziyang Feng
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
| | - Hong Shen
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China.
|
13
|
Nalepa J, Kotowski K, Machura B, Adamski S, Bozek O, Eksner B, Kokoszka B, Pekala T, Radom M, Strzelczak M, Zarudzki L, Krason A, Arcadu F, Tessier J. Deep learning automates bidimensional and volumetric tumor burden measurement from MRI in pre- and post-operative glioblastoma patients. Comput Biol Med 2023; 154:106603. [PMID: 36738710 DOI: 10.1016/j.compbiomed.2023.106603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 01/11/2023] [Accepted: 01/22/2023] [Indexed: 02/05/2023]
Abstract
Tumor burden assessment by magnetic resonance imaging (MRI) is central to the evaluation of treatment response for glioblastoma. This assessment is, however, complex to perform and associated with high variability due to the high heterogeneity and complexity of the disease. In this work, we tackle this issue and propose a deep learning pipeline for the fully automated end-to-end analysis of glioblastoma patients. Our approach simultaneously identifies tumor sub-regions, including the enhancing tumor, peritumoral edema and surgical cavity in the first step, and then calculates the volumetric and bidimensional measurements that follow the current Response Assessment in Neuro-Oncology (RANO) criteria. Also, we introduce a rigorous manual annotation process which was followed to delineate the tumor sub-regions by the human experts, and to capture their segmentation confidences that are later used while training deep learning models. The results of our extensive experimental study performed over 760 pre-operative and 504 post-operative adult patients with glioma obtained from the public database (acquired at 19 sites in years 2021-2020) and from a clinical treatment trial (47 and 69 sites for pre-/post-operative patients, 2009-2011) and backed up with thorough quantitative, qualitative and statistical analysis revealed that our pipeline performs accurate segmentation of pre- and post-operative MRIs in a fraction of the manual delineation time (up to 20 times faster than humans). Volumetric measurements were in strong agreement with experts with the Intraclass Correlation Coefficient (ICC): 0.959, 0.703, 0.960 for ET, ED, and cavity. Similarly, automated RANO compared favorably with experienced readers (ICC: 0.681 and 0.866) producing consistent and accurate results. Additionally, we showed that RANO measurements are not always sufficient to quantify tumor burden. 
The high performance of the automated tumor burden measurement highlights the potential of the tool for considerably improving and simplifying radiological evaluation of glioblastoma in clinical trials and clinical practice.
Affiliation(s)
- Jakub Nalepa
- Graylight Imaging, Gliwice, Poland; Department of Algorithmics and Software, Silesian University of Technology, Gliwice, Poland.
| | | | | | | | - Oskar Bozek
- Department of Radiodiagnostics and Invasive Radiology, School of Medicine in Katowice, Medical University of Silesia in Katowice, Katowice, Poland
| | - Bartosz Eksner
- Department of Radiology and Nuclear Medicine, ZSM Chorzów, Chorzów, Poland
| | - Bartosz Kokoszka
- Department of Radiodiagnostics, Interventional Radiology and Nuclear Medicine, University Clinical Centre, Katowice, Poland
| | - Tomasz Pekala
- Department of Radiodiagnostics, Interventional Radiology and Nuclear Medicine, University Clinical Centre, Katowice, Poland
| | - Mateusz Radom
- Department of Radiology and Diagnostic Imaging, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice Branch, Gliwice, Poland
| | - Marek Strzelczak
- Department of Radiology and Diagnostic Imaging, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice Branch, Gliwice, Poland
| | - Lukasz Zarudzki
- Department of Radiology and Diagnostic Imaging, Maria Skłodowska-Curie National Research Institute of Oncology, Gliwice Branch, Gliwice, Poland
| | - Agata Krason
- Roche Pharmaceutical Research & Early Development, Early Clinical Development Oncology, Roche Innovation Center Basel, Basel, Switzerland
| | - Filippo Arcadu
- Roche Pharmaceutical Research & Early Development, Early Clinical Development Informatics, Roche Innovation Center Basel, Basel, Switzerland
| | - Jean Tessier
- Roche Pharmaceutical Research & Early Development, Early Clinical Development Oncology, Roche Innovation Center Basel, Basel, Switzerland
|
14
|
Baran Y, Doğan B. scMAGS: Marker gene selection from scRNA-seq data for spatial transcriptomics studies. Comput Biol Med 2023; 155:106634. [PMID: 36774895 DOI: 10.1016/j.compbiomed.2023.106634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 01/28/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided unprecedented opportunities for exploring gene expression and thus uncovering regulatory relationships between genes at the single-cell level. However, scRNA-seq relies on isolating cells from tissues, so the spatial context of the regulatory processes is lost. A recent technological innovation, spatial transcriptomics, allows for the measurement of gene expression while preserving spatial information. An initial step in spatial transcriptomic analysis is to identify the cell type, which requires a careful selection of cell-specific marker genes. For this purpose, scRNA-seq data is currently used to select, from among all genes, a limited number of marker genes that distinguish cell types from each other. This study proposes scMAGS (single-cell MArker Gene Selection), a novel method for marker gene selection from scRNA-seq data for spatial transcriptomics studies. scMAGS uses a filtering step in which the candidate genes are identified before the marker gene selection step. For the selection of marker genes, cluster validity indices are utilized: the Silhouette index, or the Calinski-Harabasz index for large datasets. Experimental results showed that, in comparison to the existing methods, scMAGS is scalable, fast, and accurate. Even for large datasets with millions of cells, scMAGS could find the required number of marker genes in a reasonable amount of time with fewer memory requirements. scMAGS is made freely available at https://github.com/doganlab/scmags and can be installed from the Python Package Index (PyPI) software repository with the command pip install scmags.
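To make the idea of scoring genes with a cluster validity index concrete, the following is a minimal sketch of the Silhouette index computed on one gene's expression values grouped by cell-type label. It is not the scMAGS implementation: the function name `silhouette_score_1d`, the toy genes, and the ranking step are all assumptions made for illustration, and the real tool may filter and score genes quite differently.

```python
from statistics import mean


def silhouette_score_1d(values, labels):
    """Mean silhouette coefficient of 1-D values grouped by labels (hypothetical helper)."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a distance of 0 to the sum)
        a = sum(abs(v - w) for w in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(abs(v - w) for w in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (assumed): geneA separates the two cell types, geneB does not.
labels = ["T", "T", "T", "B", "B", "B"]
genes = {
    "geneA": [5.0, 5.2, 4.9, 0.1, 0.0, 0.2],
    "geneB": [1.0, 3.0, 2.0, 1.5, 2.5, 2.0],
}
ranked = sorted(genes, key=lambda g: silhouette_score_1d(genes[g], labels), reverse=True)
print(ranked)
```

A gene whose expression cleanly separates the cell types scores close to 1, while a gene with overlapping expression scores near or below 0, which is why such an index can serve as a ranking criterion for marker candidates.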
Affiliation(s)
- Yusuf Baran
- Department of Biomedical Engineering, Inonu University, Malatya, Turkey
| | - Berat Doğan
- Department of Biomedical Engineering, Inonu University, Malatya, Turkey.
|
15
|
Yagin FH, Cicek İB, Alkhateeb A, Yagin B, Colak C, Azzeh M, Akbulut S. Explainable artificial intelligence model for identifying COVID-19 gene biomarkers. Comput Biol Med 2023; 154:106619. [PMID: 36738712 PMCID: PMC9889119 DOI: 10.1016/j.compbiomed.2023.106619] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/17/2022] [Revised: 01/11/2023] [Accepted: 01/28/2023] [Indexed: 02/04/2023]
Abstract
AIM COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques to COVID-19 metagenomic next-generation sequencing (mNGS) samples. METHODS The data set used in the study contains the expression levels of 15,979 genes for 234 patients, of whom 141 (60.3%) were COVID-19 negative and 93 (39.7%) were COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine - Synthetic Minority Oversampling Technique (SVM-SMOTE) method was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) was applied to determine COVID-19-associated biomarker candidate genes and improve the final model's interpretability. RESULTS For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to SHAP, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The results of LIME showed that a high level of IFI27 gene expression in particular increased the probability of the positive class. CONCLUSIONS The proposed model (XGBoost) was able to predict COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain the biomarker prediction for COVID-19 and provide clinicians with an intuitive understanding and interpretability of the impact of risk factors in the model.
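The class imbalance step can be illustrated with the interpolation at the core of SMOTE-style oversampling. The sketch below is plain SMOTE-like interpolation between minority-class neighbours, not the SVM-SMOTE variant named in the abstract (which additionally uses an SVM to focus on samples near the decision boundary). The function name `smote_like_oversample`, the toy points, and the parameter defaults are assumptions; only the class counts 141 and 93 come from the abstract.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself),
        # ordered by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority-class feature vectors (assumed values, two features each).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]

# Balancing 93 positive samples against 141 negative ones needs 141 - 93 = 48 new points.
new_points = smote_like_oversample(minority, n_new=141 - 93)
print(len(new_points))
```

Each synthetic point lies on the segment between an existing minority sample and one of its nearest minority neighbours, so the oversampled class stays inside the region already occupied by real samples.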
Affiliation(s)
- Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - İpek Balikci Cicek
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Abedalrhman Alkhateeb
- Software Engineering Department, King Hussein School for Computing Sciences, Amman, Jordan.
| | - Burak Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Mohammad Azzeh
- Data Science Department, King Hussein School for Computing Sciences, Amman, Jordan.
| | - Sami Akbulut
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey; Inonu University, Faculty of Medicine, Department of Surgery, 44280, Malatya, Turkey; Inonu University, Faculty of Medicine, Department of Public Health, 44280, Malatya, Turkey.
|
16
|
Yang Y, Cao Y, Han X, Ma X, Li R, Wang R, Xiao L, Xie L. Revealing EXPH5 as a potential diagnostic gene biomarker of the late stage of COPD based on machine learning analysis. Comput Biol Med 2023; 154:106621. [PMID: 36746116 DOI: 10.1016/j.compbiomed.2023.106621] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/16/2022] [Revised: 01/19/2023] [Accepted: 01/28/2023] [Indexed: 02/01/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) is a chronic lung disease characterized by persistent airflow obstruction and was the third leading cause of death in China. The incidence of COPD is steadily increasing, and it has become a severe disease globally. Accordingly, there is an urgent need to explore how to diagnose and treat COPD in a timely manner. This study aims to find key genes for diagnosing COPD as early as possible to prevent its progression, and to analyze immune cell infiltration in early-stage versus late-stage COPD. Two GEO datasets were merged for the analyses. A total of 157 DEGs were used for GSEA to find the pathways that differ between early-stage and late-stage COPD. The gene EXPH5 stood out from the screen, performed with the least absolute shrinkage and selection operator (LASSO) and SVM-RFE, as the most likely candidate diagnostic biomarker indicating late-stage COPD. ROC curves of EXPH5 were used to represent its discriminatory ability through the area under the curve, which is the gold standard for evaluating the accuracy of diagnosis and survival rate. The CIBERSORT algorithm was used to assess the distribution of tissue-infiltrating immune cells between the two COPD stages. The diagnostic biomarker EXPH5 was positively correlated with resting NK cells, resting mast cells, and eosinophils, and negatively correlated with gamma delta T cells and M1 macrophages, which underscores the relationship between the gene and immune cell infiltration. To make the results more reliable, we further analyzed EXPH5 expression in single-cell transcriptome data and showed again that EXPH5 was significantly downregulated in the late stage of COPD, especially in the main lung cell types AT1 and AT2. In summary, our study identified EXPH5 as a marker gene, which adds to the knowledge for clinical diagnosis and pharmaceutical design of COPD.
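The area under the ROC curve used to assess a single-gene biomarker can be computed directly as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The sketch below shows that rank-based calculation; the function name `roc_auc` and the expression values are invented for illustration and are not data from the study. Because the abstract reports EXPH5 as downregulated in late-stage COPD, the example uses negated expression as the score.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical expression values: label 1 = late stage, 0 = early stage.
expression = [2.1, 1.8, 2.5, 0.9, 1.0, 1.9]
late_stage = [0, 0, 0, 1, 1, 1]

# Lower expression should indicate late stage, so negate it to obtain a score.
auc = roc_auc([-e for e in expression], late_stage)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a marker that ranks cases no better than chance, and 1.0 to a marker that separates the two groups perfectly.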
Affiliation(s)
- Yuwei Yang
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
| | - Yan Cao
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
| | - Xiaobo Han
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
| | - Xihui Ma
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
| | - Rui Li
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Hebei North University, Zhangjiakou, 075000, China.
| | - Rentao Wang
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
| | - Li Xiao
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
| | - Lixin Xie
- College of Pulmonary & Critical Care Medicine, Chinese PLA General Hospital, Beijing, 100091, China; Beijing Key Laboratory of OTIR, Beijing, 100091, China.
|
17
|
Rather AA, Chachoo MA. Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping. Comput Biol Med 2023; 155:106640. [PMID: 36774889 DOI: 10.1016/j.compbiomed.2023.106640] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/07/2022] [Revised: 01/08/2023] [Accepted: 02/05/2023] [Indexed: 02/10/2023]
Abstract
Deciphering the information hidden in gene expression assays to identify disease subtypes is of significant importance in precision medicine. However, computational limitations thwart this process due to the intricacy of biological networks and the curse of dimensionality of gene expression data. Clustering therefore often becomes the first choice of exploratory data analysis for identifying natural structures and intrinsic patterns in the data. However, the sparse and high-dimensional nature of omics data prevents conventional clustering algorithms from discovering subtypes that are clinically relevant and statistically significant. Hence, coupling non-linear dimensionality reduction techniques with clustering often becomes imperative to improve the clustering results. In this study, we present a robust pipeline to discover disease subtypes with clinical relevance. Specifically, we focus on discovering patient sub-groups whose residual life patterns are remarkably different from those of other sub-groups. This is significant because, by refining prognosis, subtyping can reduce uncertainty in approximating patients' expected outcomes. The methodology presented is based on robust correlation estimation, UMAP (a non-linear dimensionality reduction method), and Mapper (a tool from topology). Notably, we suggest a method for improving the robustness of the correlation matrix of gene expression data in order to improve the clustering results. The performance of the model is evaluated by applying it to five cancer datasets obtained through TCGA, and comparisons are performed with the state-of-the-art methods NEMO, RSC-OTRI and SNF with regard to the log-rank test and Restricted Life Expectancy Difference. For example, in the GBM dataset, the minimum separation between any two discovered subtypes is 221 days, which is significantly higher than with the other methodologies. 
We also compared the results without using the robust correlation based estimate and observed that robust correlation improves separability between survival curves significantly. From the results we infer that our methodology performs better compared to other methodologies with regard to separating survival curves of patient sub-groups despite using single omics profiles of patients compared to multiple omics profiles of SNF and NEMO. Pathway over-representation analysis is performed on the final clustering results to investigate the biological underpinnings characterizing each subtype.
Affiliation(s)
- Arif Ahmad Rather
- Department of Computer Sciences, University of Kashmir, Srinagar, JK, India.
|
18
|
Somadder PD, Hossain MA, Ahsan A, Sultana T, Soikot SH, Rahman MM, Ibrahim SM, Ahmed K, Bui FM. Drug Repurposing and Systems Biology approaches of Enzastaurin can target potential biomarkers and critical pathways in Colorectal Cancer. Comput Biol Med 2023; 155:106630. [PMID: 36774894 DOI: 10.1016/j.compbiomed.2023.106630] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/11/2022] [Revised: 01/28/2023] [Accepted: 02/04/2023] [Indexed: 02/10/2023]
Abstract
Colorectal cancer (CRC) is a severe health concern that results from a cocktail of genetic, epigenetic, and environmental abnormalities. It is the second most lethal malignancy in the world and the third most common malignant tumor, yet treatment is unavailable. The goal of the current study was to use bioinformatics and systems biology techniques to determine the pharmacological mechanism underlying putative important genes and linked pathways in early-onset CRC. Computer-aided methods were used to uncover similar biological targets and signaling pathways associated with CRC, along with bioinformatics and network pharmacology techniques to assess the effects of enzastaurin on CRC. The KEGG and gene ontology (GO) pathway analysis revealed several significant terms and pathways, including positive regulation of protein phosphorylation, negative regulation of the apoptotic process, nucleus, nucleoplasm, protein tyrosine kinase activity, PI3K-Akt signaling pathway, pathways in cancer, focal adhesion, HIF-1 signaling pathway, and Rap1 signaling pathway. Subsequently, the hub protein module identified from the protein-protein interaction (PPI) network, together with molecular docking and molecular dynamics simulation, indicated that enzastaurin showed strong binding interaction with two hub proteins, CASP3 (-8.6 kcal/mol) and MCL1 (-8.6 kcal/mol), which were more strongly implicated in CRC management than the other five hub proteins. Moreover, the pharmacokinetic features of enzastaurin revealed that it is an effective therapeutic agent with minimal adverse effects. Enzastaurin may inhibit the potential biological targets that are thought to be responsible for the advancement of CRC, and this study suggests a potential novel therapeutic target for CRC.
Affiliation(s)
- Pratul Dipta Somadder
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Md Arju Hossain
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Asif Ahsan
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Tayeba Sultana
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Sadat Hossain Soikot
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Md Masuder Rahman
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Sobhy M Ibrahim
- Department of Biochemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia.
| | - Kawsar Ahmed
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada; Group of Biophotomatiχ, Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh.
| | - Francis M Bui
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada.
|
19
|
Zafari N, Bathaei P, Velayati M, Khojasteh-Leylakoohi F, Khazaei M, Fiuji H, Nassiri M, Hassanian SM, Ferns GA, Nazari E, Avan A. Integrated analysis of multi-omics data for the discovery of biomarkers and therapeutic targets for colorectal cancer. Comput Biol Med 2023; 155:106639. [PMID: 36805214 DOI: 10.1016/j.compbiomed.2023.106639] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/30/2022] [Revised: 01/14/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]
Abstract
The considerable burden of colorectal cancer and its rising trend in young adults emphasize the necessity of understanding its underlying mechanisms, providing new diagnostic and prognostic markers, and improving therapeutic approaches. Precision medicine is a new trend all over the world, and the identification of novel biomarkers and therapeutic targets is a step forward in this direction. In this context, multi-omics data and integrated analysis are being investigated to develop personalized medicine in the management of colorectal cancer. Given the large amount of data produced by multi-omics approaches, data integration and analysis are a great challenge. In this Review, we summarize how statistical and machine learning techniques are applied to analyze multi-omics data and how they contribute to the discovery of useful diagnostic and prognostic biomarkers and therapeutic targets. Moreover, we discuss the importance of these biomarkers and therapeutic targets in the future clinical management of colorectal cancer. Taken together, integrated analysis of multi-omics data has great potential for finding novel diagnostic and prognostic biomarkers and therapeutic targets; however, there are still challenges to overcome in future studies.
Affiliation(s)
- Nima Zafari
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Parsa Bathaei
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Mahla Velayati
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Fatemeh Khojasteh-Leylakoohi
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Majid Khazaei
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Hamid Fiuji
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Mohammadreza Nassiri
- Recombinant Proteins Research Group, The Research Institute of Biotechnology, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Seyed Mahdi Hassanian
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Gordon A Ferns
- Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex, BN1 9PH, UK
| | - Elham Nazari
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
| | - Amir Avan
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
|
20
|
He H, Duo H, Hao Y, Zhang X, Zhou X, Zeng Y, Li Y, Li B. Computational drug repurposing by exploiting large-scale gene expression data: Strategy, methods and applications. Comput Biol Med 2023; 155:106671. [PMID: 36805225 DOI: 10.1016/j.compbiomed.2023.106671] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/25/2022] [Revised: 02/05/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023]
Abstract
De novo drug development is an extremely complex, time-consuming and costly task. Urgent needs for therapies for various diseases have greatly accelerated the search for more effective drug development methods. Drug repurposing provides a new and effective perspective on disease treatment. Rapidly growing large-scale transcriptome data paint a detailed picture of gene expression during disease onset and have thus received wide attention in the field of computational drug repurposing. However, how to efficiently mine transcriptome data and identify new indications for old drugs remains a critical challenge. This review discusses the irreplaceable role of transcriptome data in computational drug repurposing and summarizes some representative databases, tools and strategies. More importantly, it proposes a practical guideline by establishing the correspondence between three gene expression data types and five strategies, which would help researchers adopt appropriate strategies to deeply mine large-scale transcriptome data and discover more effective therapies.
Affiliation(s)
- Hao He
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Institutes of Brain Science, Fudan University, Shanghai, 200032, PR China
| | - Hongrui Duo
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Xiaoxi Zhang
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Xinyi Zhou
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Yujie Zeng
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Yinghong Li
- The Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, PR China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China.
|
21
|
Cheng N, Liu J, Chen C, Zheng T, Li C, Huang J. Prediction of lung cancer metastasis by gene expression. Comput Biol Med 2023; 153:106490. [PMID: 36638618 DOI: 10.1016/j.compbiomed.2022.106490] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/18/2022] [Revised: 12/14/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
Tumor metastasis is the main cause of death in cancer patients. Early prediction of tumor metastasis can allow for timely intervention. At present, research on tumor metastasis mainly focuses on manual diagnosis by imaging or diagnosis by computational methods. With the deterioration of the tumor, gene expression levels in blood change greatly. It is feasible to measure the transcripts of key genes to predict whether cancer will metastasize. Therefore, in this paper, we obtained gene expression data from 226 patients from TCGA. These data included 239,322 transcripts. Background screening and LASSO analysis were used to select 31 transcripts as features. Finally, a deep neural network (DNN) was used to determine whether or not lung cancer would metastasize. We compared our methods with several other methods and found that our method achieved the best precision. In addition, in a previous study, we identified 7 genes that play a vital role in lung cancer. We added those gene transcripts into the DNN and found that the AUC and AUPR of the model were increased.
Affiliation(s)
- Nitao Cheng
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
- Junliang Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Chen Chen
- Department of Biological Repositories, Zhongnan Hospital of Wuhan University, China
- Tang Zheng
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
- Changsheng Li
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
- Jingyu Huang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
22
Zhang J, Jiang H, Shi T. ASE-Net: A tumor segmentation method based on image pseudo enhancement and adaptive-scale attention supervision module. Comput Biol Med 2023; 152:106363. [PMID: 36516579 DOI: 10.1016/j.compbiomed.2022.106363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Revised: 11/08/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
Fluorine-18 (18F) fluorodeoxyglucose positron emission tomography combined with computed tomography (PET/CT) is the preferred imaging method for the diagnosis and treatment of many cancers. However, factors such as low-contrast organ and tissue images and the original scale of tumors pose huge obstacles to accurate tumor segmentation. In this work, we propose a novel model, ASE-Net, for multimodality tumor segmentation. First, we propose a pseudo-enhanced CT image generation method based on metabolic intensity to generate pseudo-enhanced CT images as additional input, which reduces the learning of the network in the spatial position of PET/CT and increases the discriminability of the corresponding structural positions of the high and low metabolic regions. Second, unlike previous networks that directly segment tumors of all scales, we propose an Adaptive-Scale Attention Supervision Module at the skip connections; after the results of all paths are combined, tumors of different scales are given different receptive fields. Finally, Dual Path Block is used as the backbone of our network to leverage residual learning for feature reuse and dense connections for exploring new features. Experimental results on two clinical PET/CT datasets demonstrate the effectiveness of the proposed network, which achieves Dice Similarity Coefficients of 78.56% and 72.57%, respectively, and outperforms state-of-the-art network models for both large and small tumors. The proposed model will help pathologists formulate more accurate diagnoses by providing reference opinions during diagnosis, consequently improving patient survival rate.
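For readers unfamiliar with the metric reported in this abstract, the Dice Similarity Coefficient can be sketched as follows. This is a minimal illustration only, not the authors' evaluation code: the function name `dice_coefficient`, the flat 0/1 mask representation, and the toy masks are all assumptions made for the example.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 values (an assumed, simplified
    representation; real PET/CT masks would be 3D arrays).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical toy masks: 3 predicted voxels, 3 true voxels, 2 shared.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(predicted, reference))
```

A value of 1 means the predicted and reference masks coincide exactly, and 0 means they share no voxels; the percentages in the abstract are this ratio multiplied by 100.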
Affiliation(s)
- Junzhi Zhang
- Software College, Northeastern University, No. 195, Chuangxin Road, Hunnan District, Shenyang, 110169, Liaoning, China
- Huiyan Jiang
- Software College, Northeastern University, No. 195, Chuangxin Road, Hunnan District, Shenyang, 110169, Liaoning, China; Key Laboratory of Intelligent Computing in Biomedical Image, Ministry of Education, Northeastern University, No. 195, Chuangxin Road, Hunnan District, Shenyang, 110169, Liaoning, China.
- Tianyu Shi
- Software College, Northeastern University, No. 195, Chuangxin Road, Hunnan District, Shenyang, 110169, Liaoning, China
23
Huang P, Yan L, Li Z, Zhao S, Feng Y, Zeng J, Chen L, Huang A, Chen Y, Lei S, Huang X, Deng Y, Xie D, Guan H, Peng W, Yu L, Chen B. Potential shared gene signatures and molecular mechanisms between atherosclerosis and depression: Evidence from transcriptome data. Comput Biol Med 2023; 152:106450. [PMID: 36565484 DOI: 10.1016/j.compbiomed.2022.106450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 12/09/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Atherosclerosis and depression contribute to each other; however, mechanisms linking them at the genetic level remain unexplored. This study aimed to identify shared gene signatures and related pathways between these comorbidities. METHODS Atherosclerosis-related datasets were downloaded from the Gene Expression Omnibus database. Differential and weighted gene co-expression network analyses were employed to identify atherosclerosis-related genes. Depression-related genes were downloaded from the DisGeNET database, and the overlaps between atherosclerosis-related genes and depression-related genes were characterized as crosstalk genes. Functional enrichment analysis and protein-protein interaction network analysis were performed on these gene sets. Subsequently, the Boruta algorithm and the Recursive Feature Elimination algorithm were applied to identify feature-selection genes. A support vector machine was constructed to measure the accuracy of calculations, and two external validation sets were included to verify the results. RESULTS Based on two atherosclerosis-related datasets (GSE28829 and GSE43292), 165 genes were determined as atherosclerosis-related genes. Meanwhile, 1478 depression-related genes were obtained. After intersecting, 24 crosstalk genes were identified, and two pathways, "lipid and atherosclerosis" and "tryptophan metabolism," were revealed as mutual pathways according to the enrichment analysis results. Through the protein-protein interaction network, Molecular Complex Detection plugin, and cytoHubba plugin, PTPRC and MMP9 were identified as the hub genes. Moreover, SLC22A3, CASP1, AMPD3, and PIK3CG were recognized as feature-selection genes. Based on two external validation sets, CASP1 and MMP9 were finally determined as the critical crosstalk genes. CONCLUSIONS "Lipid and atherosclerosis" and "tryptophan metabolism" were possibly the pathways of atherosclerosis secondary to depression and depression due to atherosclerosis, respectively. CASP1 and MMP9 were revealed as the most pivotal candidates linking atherosclerosis and depression by mediating these two pathways. Further experimentation is needed to confirm these conclusions.
Affiliation(s)
- Peiying Huang
- The Second Clinical Medical School of Guangzhou University of Chinese Medicine, Guangzhou, China
- Li Yan
- Department of Neurosurgery of Shenyang Second Hospital of Traditional Chinese Medicine, Shenyang, China
- Zhishang Li
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Shuai Zhao
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Yuchao Feng
- Guangdong Provincial Key Laboratory of Research on Emergency in Traditional Chinese Medicine, Clinical Research Team of Prevention and Treatment of Cardiac Emergencies with Traditional Chinese Medicine, Guangzhou, China
- Jing Zeng
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Li Chen
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Afang Huang
- Departments of Laboratory Medicine of Foshan Forth People's Hospital, Foshan, China
- Yan Chen
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Sisi Lei
- The Second Clinical Medical School of Guangzhou University of Chinese Medicine, Guangzhou, China
- Xiaoyan Huang
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Yi Deng
- Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China
- Dan Xie
- The Second Clinical Medical School of Guangzhou University of Chinese Medicine, Guangzhou, China
- Hansu Guan
- The Third Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
- Weihang Peng
- The Second Clinical Medical School of Guangzhou University of Chinese Medicine, Guangzhou, China
- Liyuan Yu
- The Second Clinical Medical School of Guangzhou University of Chinese Medicine, Guangzhou, China
- Bojun Chen
- The Second Clinical Medical School of Guangzhou University of Chinese Medicine, Guangzhou, China; Emergency Department of Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou, China; Guangdong Provincial Key Laboratory of Research on Emergency in Traditional Chinese Medicine, Clinical Research Team of Prevention and Treatment of Cardiac Emergencies with Traditional Chinese Medicine, Guangzhou, China.
24
Xiang J, Wang X, Wang X, Zhang J, Yang S, Yang W, Han X, Liu Y. Automatic diagnosis and grading of Prostate Cancer with weakly supervised learning on whole slide images. Comput Biol Med 2023; 152:106340. [PMID: 36481762 DOI: 10.1016/j.compbiomed.2022.106340] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/02/2022] [Revised: 11/02/2022] [Accepted: 11/16/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND The workflow of prostate cancer diagnosis and grading is cumbersome and the results suffer from substantial inter-observer variability. Recent trials have shown potential in using machine learning to develop automated systems to address this challenge. Most automated deep learning systems for prostate cancer Gleason grading have focused on supervised learning, which requires demanding fine-grained pixel-level annotations. METHODS A weakly supervised deep learning model with slide-level labels is presented in this study for the diagnosis and grading of prostate cancer from whole slide images (WSIs). WSIs are first cropped into small patches and then processed with a deep learning model to extract patch-level features. A graph convolution network (GCN) is used to aggregate the features for classification. Throughout the training process, noisy labels are progressively filtered out to reduce inter-observer variations in clinical reports. Finally, multi-center independent test cohorts with 6,174 slides are collected to evaluate the prostate cancer diagnosis and grading performance of our model. RESULTS The cancer diagnosis (2-level classification) results on two external test sets (n = 4,675 and n = 844) show an area under the receiver operating characteristic curve (AUC) of 0.985 and 0.986. The Gleason grading (6-level classification) results reach 0.931 quadratic weighted kappa on the internal test set (n = 531). The model generalizes well on the external test dataset (n = 844), with 0.801 quadratic weighted kappa against the independently set reference standard. The model enables pathologically meaningful interpretability by visualizing the most attended lesions, which are highly consistent with expert annotations. CONCLUSION The proposed model incorporates a graph network in weakly supervised learning with only slide-level reports. A robust learning strategy is also employed to correct the label noise. It is highly accurate (>0.985 AUC for diagnosis) and also interpretable with intuitive heatmap visualization. It can be unified with a digital pathology pipeline to deliver prostate cancer metrics for a pathology report.
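The grading results in this abstract are reported as quadratic weighted kappa, which penalizes disagreements by the squared distance between grade levels. The sketch below shows the standard definition in plain Python; it is an illustration under stated assumptions, not the authors' code. The function name `quadratic_weighted_kappa`, the integer-coded grades, and the toy ratings are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or n_classes < 2:
        raise ValueError("need two equal-length, non-empty ratings and >= 2 classes")
    # Observed confusion matrix: observed[i][j] counts cases rated i by A and j by B.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    # Convention assumed here: if both raters use a single identical class,
    # the expected disagreement is zero and agreement is reported as 1.0.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades from a model and a reference standard (6 levels, 0..5).
model_grades = [0, 1, 2, 3, 4, 5, 2, 3]
reference_grades = [0, 1, 2, 3, 4, 5, 3, 2]
print(quadratic_weighted_kappa(model_grades, reference_grades, n_classes=6))
```

Because the weights grow with the square of the grade difference, confusing adjacent grades lowers the score far less than confusing distant ones, which is why this metric suits ordinal grading tasks.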
Affiliation(s)
- Xiyue Wang
- College of Computer Science, Sichuan University, Chengdu, China
- Xinran Wang
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Sen Yang
- AI Lab, Tencent, Shenzhen, China
- Wei Yang
- AI Lab, Tencent, Shenzhen, China
- Xiao Han
- AI Lab, Tencent, Shenzhen, China
- Yueping Liu
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China.
25
Liu C, Zhou Y, Zhou Y, Tang X, Tang L, Wang J. Identification of crucial genes for predicting the risk of atherosclerosis with system lupus erythematosus based on comprehensive bioinformatics analysis and machine learning. Comput Biol Med 2023; 152:106388. [PMID: 36470144 DOI: 10.1016/j.compbiomed.2022.106388] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/05/2022] [Revised: 11/22/2022] [Accepted: 11/28/2022] [Indexed: 12/02/2022]
Abstract
BACKGROUND Systemic lupus erythematosus (SLE) has become a major public health problem over the years, and atherosclerosis (AS) is one of the main complications of SLE associated with serious cardiovascular consequences in this patient population. The present study aimed to identify potential biomarkers for SLE patients with AS. METHODS Five microarray datasets (GSE50772, GSE81622, GSE100927, GSE28829, GSE37356) were downloaded from the NCBI Gene Expression Omnibus database. The Limma package was used to identify differentially expressed genes (DEGs) in AS. Weighted gene coexpression network analysis (WGCNA) was used to identify significant module genes associated with SLE. Functional enrichment analysis, protein-protein interaction (PPI) network construction, and machine learning algorithms (least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE), and random forest) were applied to identify hub genes. Subsequently, we generated a nomogram and receiver operating characteristic (ROC) curve for predicting the risk of AS in SLE patients. Finally, immune cell infiltrations were analyzed, and Consensus Cluster Analysis was conducted based on Single Sample Gene Set Enrichment Analysis (ssGSEA) scores. RESULTS Five hub genes (SPI1, MMP9, C1QA, CX3CR1, and MNDA) were identified and used to establish a nomogram that yielded a high predictive performance (area under the curve 0.900-0.981). Dysregulated immune cell infiltrations were found in AS, with positive correlations with the five hub genes. Consensus clustering showed that the optimal number of subtypes was 3. Compared to subtypes A and B, subtype C presented higher expression of the five hub genes, immune cell infiltration levels and immune checkpoint expression. CONCLUSION Our study systematically identified five candidate hub genes (SPI1, MMP9, C1QA, CX3CR1, MNDA) and established a nomogram that could predict the risk of AS with SLE using various bioinformatic analyses and machine learning algorithms. Our findings provide a foothold for future studies on potential crucial genes for AS in SLE patients. Additionally, the dysregulated immune cell proportions and immune checkpoint expressions in AS with SLE were identified.
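Several abstracts in this list, including the one above, summarize classifier performance as the area under the ROC curve. As a hedged illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for the example; this is not the nomogram or analysis pipeline from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: sequence of 0/1 class labels; scores: predicted risk per case.
    Equivalent to the normalized Mann-Whitney U statistic.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for four cases (two controls, two patients).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large cohorts.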
Affiliation(s)
- Chunjiang Liu
- Department of General Surgery, Division of Vascular Surgery, Shaoxing People's Hospital (Shaoxing Hospital of Zhejiang University), Shaoxing, 312000, China
- Yufei Zhou
- Department of Cardiology, Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Yue Zhou
- Department of General Surgery, Division of Vascular Surgery, Shaoxing People's Hospital (Shaoxing Hospital of Zhejiang University), Shaoxing, 312000, China
- Xiaoqi Tang
- Department of General Surgery, Division of Vascular Surgery, Shaoxing People's Hospital (Shaoxing Hospital of Zhejiang University), Shaoxing, 312000, China
- Liming Tang
- Department of General Surgery, Division of Vascular Surgery, Shaoxing People's Hospital (Shaoxing Hospital of Zhejiang University), Shaoxing, 312000, China.
- Jiajia Wang
- Department of Rheumatology, Shaoxing People's Hospital (Shaoxing Hospital of Zhejiang University), Shaoxing, 312000, China.
26
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning and plays a fundamental role in achieving accurate results. With the progress of machine learning, protein representation methods have gradually attracted attention and have consequently developed rapidly. Therefore, in this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to the information extraction methods and information sources, these representation methods are generally divided into structure-based and sequence-based representation methods. Each primary class can be further divided into specific subcategories. The specific representation methods covered include both traditional and recent approaches. This review contains a comprehensive assessment of the various methods, which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
27
Zhou Y, Zhang Y, Li F, Lian X, Zhu Q, Zhu F, Qiu Y. SISPRO: signature identification for spatial proteomics. J Mol Biol 2023. [DOI: 10.1016/j.jmb.2022.167944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
28
Alshawaqfeh M, Rababah S, Hayajneh A, Gharaibeh A, Serpedin E. MetaAnalyst: a user-friendly tool for metagenomic biomarker detection and phenotype classification. BMC Med Res Methodol 2022; 22:336. [PMID: 36577938 PMCID: PMC9795700 DOI: 10.1186/s12874-022-01812-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Many metagenomic studies have linked the imbalance in microbial abundance profiles to a wide range of diseases. These studies suggest utilizing the microbial abundance profiles as potential markers for metagenomic-associated conditions. Because biomarkers are essential for understanding disease progression and developing possible therapies, various computational tools have been proposed for metagenomic biomarker detection. However, most existing tools require prior scripting knowledge and lack user-friendly interfaces, so installing, configuring, and running them takes considerable time and effort. Besides, there is no available all-in-one solution for running and comparing various metagenomic biomarker detection tools simultaneously. In addition, most of these tools just present the suggested biomarkers without any statistical evaluation of their quality. RESULTS To overcome these limitations, this work presents MetaAnalyst, a software package with a simple graphical user interface (GUI) that (i) automates the installation and configuration of 28 state-of-the-art tools, (ii) supports flexible study design to enable studying the dataset under different scenarios smoothly, (iii) runs and evaluates several algorithms simultaneously, (iv) supports different input formats and provides the user with several preprocessing capabilities, (v) provides a variety of metrics to evaluate the quality of the suggested markers, and (vi) presents the outcomes in the form of publication-quality plots with various formatting capabilities as well as Excel sheets. CONCLUSIONS The utility of this tool has been verified by studying a metagenomic dataset under four scenarios. The executable file for MetaAnalyst, along with its user manual, is available at https://github.com/mshawaqfeh/MetaAnalyst .
Affiliation(s)
- Mustafa Alshawaqfeh
- School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan (GRID grid.440896.7; ISNI 0000 0004 0418 154X)
- Salahelden Rababah
- School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan (GRID grid.440896.7; ISNI 0000 0004 0418 154X); Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, NY, USA (GRID grid.264260.4; ISNI 0000 0001 2164 4508)
- Abdullah Hayajneh
- Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, USA (GRID grid.264756.4; ISNI 0000 0004 4687 2082)
- Ammar Gharaibeh
- School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan (GRID grid.440896.7; ISNI 0000 0004 0418 154X)
- Erchin Serpedin
- Electrical and Computer Engineering Department, Texas A&M University, College Station, TX, USA (GRID grid.264756.4; ISNI 0000 0004 4687 2082)
29
Mou M, Pan Z, Lu M, Sun H, Wang Y, Luo Y, Zhu F. Application of Machine Learning in Spatial Proteomics. J Chem Inf Model 2022; 62:5875-5895. [PMID: 36378082 DOI: 10.1021/acs.jcim.2c01161] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/16/2022]
Abstract
Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Numerous lines of evidence indicate that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow the reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can facilitate an insightful understanding of cell biology as well as future research in the medical and drug discovery communities.
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
30
Tang H, Sun L, Huang J, Yang Z, Li C, Zhou X. The mechanism and biomarker function of Cavin-2 in lung ischemia-reperfusion injury. Comput Biol Med 2022; 151:106234. [PMID: 36335812 DOI: 10.1016/j.compbiomed.2022.106234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 10/01/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND Lung ischemia-reperfusion injury (LIRI) is one of the most predominant complications of ischemic lung disease. Cavin-2 has emerged as a regulator of a variety of cellular processes, including endocytosis, lipid homeostasis, signal transduction and tumorigenesis, but the function of Cavin-2 in LIRI is unknown. The purpose of this study was to determine the predictive potential of Cavin-2 in protecting against lung ischemia-reperfusion injury and its corresponding mechanisms. METHODS We found a strong relationship between Cavin-2 and multiple immune-related genes using a deep learning method. To reveal the mechanism of Cavin-2 in LIRI, a LIRI SD rat model was constructed to detect the expression of Cavin-2 in the lung tissue of SD rats after LIRI, and the expression of Cavin-2 in lung cell lines was also detected. The expression of IL-6, IL-10 and MDA in cells after Cavin-2 over-expression or knockdown was examined under hypoxic conditions. The expression levels of p-AKT, p-STAT3 and p-ERK1/2 were measured in Cavin-2 over-expressing cells under hypoxic-ischemic conditions, and the corresponding blockers of AKT, STAT3 and ERK1/2 were then given to verify whether they play a protective role in LIRI. RESULTS After hypoxia, the expression of Cavin-2 in rat lung tissues was significantly increased. Cellular activity and IL-10 in Cavin-2 over-expressing cells were significantly higher than in the control group, while IL-6 and MDA were significantly lower; these results were reversed in Cavin-2 knockdown cells. Meanwhile, the phosphorylation levels of AKT, STAT3, and ERK1/2 were significantly increased in Cavin-2 over-expressing cells after hypoxia. When AKT-, STAT3-, and ERK1/2-specific blockers were given, the protective effect against LIRI was lost. CONCLUSIONS Cavin-2 shows biomarker potential in protecting the lung from ischemia-reperfusion injury through the survivor activating factor enhancement (SAFE) and reperfusion injury salvage kinase (RISK) pathways.
Affiliation(s)
- Hexiao Tang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Linao Sun
- Tianjin Medical University, Tianjin, China
- Jingyu Huang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Zetian Yang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Changsheng Li
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China.
- Xuefeng Zhou
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China.
31
iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022; 208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022] Open
32
Cheng X, Tan Y, Li H, Huang J, Zhao D, Zhang Z, Yi M, Zhu L, Hui S, Yang J, Peng W. Fecal 16S rRNA sequencing and multi-compartment metabolomics revealed gut microbiota and metabolites interactions in APP/PS1 mice. Comput Biol Med 2022; 151:106312. [PMID: 36417828 DOI: 10.1016/j.compbiomed.2022.106312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 10/27/2022] [Accepted: 11/13/2022] [Indexed: 11/18/2022]
Abstract
BACKGROUND Alzheimer's disease is a significant public health issue. Recent studies have shown that the gut microbiota plays a vital role in the onset and development of Alzheimer's disease. However, the potential role of the gut microbiota and the associated metabolic characteristics require further elucidation. METHODS The gut microbial compositions of APP/PS1 mice were analyzed using 16S rRNA gene sequencing. Metabolomics was used to characterize changes in metabolic profiles in feces, serum, and cortex. A multi-omics approach investigated the potential associations between gut microbes and metabolites. RESULTS The gut microbiota composition was markedly different between APP/PS1 mice and normal mice. Metabolomic analysis identified 253 fecal metabolites, 16 serum metabolites, and 123 cortical metabolites that were differentially abundant in APP/PS1 that may be potential biomarkers of AD. Nearly half of these metabolites were lipids. A combined analysis of the three sample types showed a correlation between fecal fatty acids and glycerolipids, serum glycerophospholipids, and cortical fatty acids. Furthermore, our study showed that Marinifilaceae and Akkermansiaceae were closely related to these lipids and lipid-like molecules, particularly fatty acids and glycerophospholipids. CONCLUSION Our study highlighted the interactions between the gut microbiome and the fecal, serum, and cortical metabolomes. This interaction provides a new direction for further exploring the link between gut microbiota composition and metabolism in Alzheimer's disease.
Affiliation(s)
- Xin Cheng
- Department of Integrated Traditional Chinese & Western Medicine, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; National Clinical Research Center for Mental Disorder, Changsha, 410011, China
| | - Yejun Tan
- School of Mathematics, University of Minnesota Twin Cities, Minneapolis, 55455, MN, USA
| | - Hongli Li
- Department of Integrated Traditional Chinese & Western Medicine, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; National Clinical Research Center for Mental Disorder, Changsha, 410011, China
| | - Jianhua Huang
- Hunan Academy of Chinese Medicine, Changsha, 410013, China
| | - Di Zhao
- Hunan Academy of Chinese Medicine, Changsha, 410013, China
| | - Zheyu Zhang
- Department of Integrated Traditional Chinese & Western Medicine, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; National Clinical Research Center for Mental Disorder, Changsha, 410011, China
| | - Min Yi
- Department of Integrated Traditional Chinese & Western Medicine, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; National Clinical Research Center for Mental Disorder, Changsha, 410011, China
| | - Lemei Zhu
- Academician Workstation, Changsha Medical University, Changsha, 410219, China
| | - Shan Hui
- Department of Geratology, Hunan Provincial People's Hospital, The First Affiliated Hospital of Hunan Normal University, Changsha, 410005, China
| | - Jingjing Yang
- Teaching and Research Section of Clinical Nursing, Xiangya Hospital, Central South University, Changsha, 410008, China
| | - Weijun Peng
- Department of Integrated Traditional Chinese & Western Medicine, The Second Xiangya Hospital, Central South University, Changsha, 410011, China; National Clinical Research Center for Mental Disorder, Changsha, 410011, China.
| |
|
33
|
Yang Q, Li B, Wang P, Xie J, Feng Y, Liu Z, Zhu F. LargeMetabo: an out-of-the-box tool for processing and analyzing large-scale metabolomic data. Brief Bioinform 2022; 23:6768054. [PMID: 36274234 DOI: 10.1093/bib/bbac455] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 06/04/2022] [Revised: 09/06/2022] [Accepted: 09/24/2022] [Indexed: 12/14/2022] Open
Abstract
Large-scale metabolomics is a powerful technique that has attracted widespread attention in biomedical studies focused on identifying biomarkers and interpreting the mechanisms of complex diseases. Despite a rapid increase in the number of large-scale metabolomic studies, the analysis of metabolomic data remains a key challenge. Specifically, diverse unwanted variations and batch effects in processing many samples have a substantial impact on identifying true biological markers, and it is a daunting challenge to annotate a plethora of peaks as metabolites in untargeted mass spectrometry-based metabolomics. Therefore, the development of an out-of-the-box tool is urgently needed to realize data integration and to accurately annotate metabolites with enhanced functions. In this study, the LargeMetabo package based on R code was developed for processing and analyzing large-scale metabolomic data. This package is unique because it is capable of (1) integrating multiple analytical experiments to effectively boost the power of statistical analysis; (2) selecting the appropriate biomarker identification method by intelligent assessment for large-scale metabolic data and (3) providing metabolite annotation and enrichment analysis based on an enhanced metabolite database. The LargeMetabo package can facilitate flexibility and reproducibility in large-scale metabolomics. The package is freely available from https://github.com/LargeMetabo/LargeMetabo.
Affiliation(s)
- Qingxia Yang
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.,College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, Chongqing 401331, China
| | - Panpan Wang
- College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
| | - Jicheng Xie
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Yuhao Feng
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Ziqiang Liu
- Department of Bioinformatics, Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| |
|
34
|
Rong Z, Liu Z, Song J, Cao L, Yu Y, Qiu M, Hou Y. MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data. Comput Biol Med 2022; 150:106085. [PMID: 36162197 DOI: 10.1016/j.compbiomed.2022.106085] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/01/2022] [Revised: 07/30/2022] [Accepted: 09/03/2022] [Indexed: 11/03/2022]
Abstract
The discovery of cancer subtypes based on unsupervised clustering helps provide a precise diagnosis, guide treatment, and improve patients' prognoses. Compared with single-omics data, multi-omics data can improve clustering performance because they provide a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning-based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations on multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved an outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides comparable or even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.
Affiliation(s)
- Zhiwei Rong
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Zhilin Liu
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Jiali Song
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Lei Cao
- Department of Epidemiology and Biostatistics Harbin, Harbin Medical University School of Public Health, Harbin, 150000, Heilongjiang, China
| | - Yipe Yu
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
| | - Mantang Qiu
- Department of Thoracic Surgery Beijing, Peking University People's Hospital, Beijing, 100000, China.
| | - Yan Hou
- Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China; Peking University Clinical Research Center, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China.
| |
|
35
|
Jin Q, Li W, Yu W, Zeng M, Liu J, Xu P. Analysis and identification of potential type II helper T cell (Th2)-Related key genes and therapeutic agents for COVID-19. Comput Biol Med 2022; 150:106134. [PMID: 36201886 PMCID: PMC9528635 DOI: 10.1016/j.compbiomed.2022.106134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Revised: 08/30/2022] [Accepted: 09/18/2022] [Indexed: 11/19/2022]
Abstract
The COVID-19 pandemic poses a severe threat to public health. However, so far, there are no effective drugs for COVID-19. Transcriptomic changes and key genes related to Th2 cells in COVID-19 have not been reported. These genes play an important role in host interactions with SARS-CoV-2 and may serve as promising targets. We analyzed five COVID-19-associated GEO datasets (GSE157103, GSE152641, GSE171110, GSE152418, and GSE179627) using the xCell algorithm and weighted gene co-expression network analysis (WGCNA). Genes from the five modules most closely correlated with COVID-19 and Th2 cell enrichment levels (purple, blue, pink, tan and turquoise) were intersected with differentially expressed genes (DEGs), yielding 648 shared genes. GO and KEGG pathway enrichment analyses revealed that they were enriched in cell proliferation, differentiation, and immune responses after virus infection. The most significantly enriched pathway involved the regulation of viral life cycle. Three key genes, namely CCNB1, BUB1, and UBE2C, may clarify the pathogenesis of COVID-19 associated with Th2 cells. Using the cMAP database, 11 drug candidates were identified that could down-regulate the three key genes, and molecular docking showed that they had strong binding energies against the three key genes. BUB1, CCNB1 and UBE2C were identified as key genes for COVID-19 and could be promising therapeutic targets.
Affiliation(s)
- Qiying Jin
- Institute of Tropical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, PR China
| | - Wanxi Li
- Institute of Tropical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, PR China
| | - Wendi Yu
- Institute of Tropical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, PR China
| | - Maosen Zeng
- Institute of Tropical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, PR China
| | - Jinyuan Liu
- Basic Medical College, Guangzhou University of Chinese Medicine, Guangzhou, PR China
| | - Peiping Xu
- Institute of Tropical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, PR China.
| |
|
36
|
Zhou K, Cai C, He Y, Chen Z. Potential prognostic biomarkers of sudden cardiac death discovered by machine learning. Comput Biol Med 2022; 150:106154. [PMID: 36208596 DOI: 10.1016/j.compbiomed.2022.106154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 09/16/2022] [Accepted: 09/24/2022] [Indexed: 11/24/2022]
Abstract
OBJECTIVE Sudden cardiac death (SCD) is a serious public health burden. This study aims to find prognostic biomarkers of SCD using machine learning. METHODS Myocardial samples from 21 accidental death donors and 82 sudden death donors were compared to identify differential genes. Enriched active genes were found using the PPI network. GSEA was used to analyze differences in function and pathway between the control and experimental groups. Diseases related to the active genes were identified through DO enrichment. Prognostic biomarkers for SCD were identified via two machine learning algorithms. The CIBERSORT method was used to compare the immune microenvironment changes in patients with SCD. RESULTS SCD was mainly associated with heart and kidney diseases caused by atherosclerosis. DEFA1B, BGN, SERPINE1, CCL2 and HBB were identified as prognostic biomarkers for SCD by machine learning. Immune infiltration plays an important role in the process of SCD. CONCLUSION We discovered 5 prognostic biomarkers for SCD. Immune microenvironment changes were also found in SCD. Moreover, atherosclerosis might be an important risk factor for SCD.
Affiliation(s)
- Kena Zhou
- Gastroenterology Department of Ningbo No.9 Hospital, Ningbo, Zhejiang, 315000, China
| | - Congbo Cai
- Emergency Department of Yinzhou No.2 Hospital, Ningbo, Zhejiang, 315000, China
| | - Yi He
- Gastroenterology Department of Ningbo No.9 Hospital, Ningbo, Zhejiang, 315000, China
| | - Zhihua Chen
- Emergency Department of Ningbo No.1 Hospital, Ningbo, Zhejiang, 315000, China.
| |
|
37
|
Li F, Yin J, Lu M, Mou M, Li Z, Zeng Z, Tan Y, Wang S, Chu X, Dai H, Hou T, Zeng S, Chen Y, Zhu F. DrugMAP: molecular atlas and pharma-information of all drugs. Nucleic Acids Res 2022; 51:D1288-D1299. [PMID: 36243961 PMCID: PMC9825453 DOI: 10.1093/nar/gkac813] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 08/01/2022] [Revised: 08/30/2022] [Accepted: 10/12/2022] [Indexed: 02/06/2023] Open
Abstract
The efficacy and safety of drugs are widely known to be determined by their interactions with multiple molecules of pharmacological importance, and it is therefore essential to systematically depict the molecular atlas and pharma-information of studied drugs. However, our understanding of such information is neither comprehensive nor precise, which necessitates the construction of a new database providing a network containing a large number of drugs and their interacting molecules. Here, a new database describing the molecular atlas and pharma-information of drugs (DrugMAP) was therefore constructed. It provides a comprehensive list of interacting molecules for >30 000 drugs/drug candidates, gives the differential expression patterns for >5000 interacting molecules among different disease sites, ADME (absorption, distribution, metabolism and excretion)-relevant organs and physiological tissues, and weaves a comprehensive and precise network containing >200 000 interactions among drugs and molecules. With the great efforts made to clarify the complex mechanism underlying drug pharmacokinetics and pharmacodynamics and rapidly emerging interests in artificial intelligence (AI)-based network analyses, DrugMAP is expected to become an indispensable supplement to existing databases to facilitate drug discovery. It is now fully and freely accessible at: https://idrblab.org/drugmap/.
Affiliation(s)
| | | | - Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba–Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba–Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Ying Tan
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
| | - Shanshan Wang
- Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
| | - Xinyi Chu
- Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
| | - Haibin Dai
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Su Zeng
- Correspondence may also be addressed to Su Zeng.
| | - Yuzong Chen
- Correspondence may also be addressed to Yuzong Chen.
| | - Feng Zhu
- To whom correspondence should be addressed.
| |
|
38
|
Amahong K, Zhang W, Zhou Y, Zhang S, Yin J, Li F, Xu H, Yan T, Yue Z, Liu Y, Hou T, Qiu Y, Tao L, Han L, Zhu F. CovInter: interaction data between coronavirus RNAs and host proteins. Nucleic Acids Res 2022; 51:D546-D556. [PMID: 36200814 PMCID: PMC9825556 DOI: 10.1093/nar/gkac834] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/05/2022] [Revised: 09/07/2022] [Accepted: 09/16/2022] [Indexed: 01/29/2023] Open
Abstract
Coronavirus has brought about three massive outbreaks in the past two decades. Each step of its life cycle invariably depends on the interactions among virus and host molecules. The interaction between virus RNA and host protein (IVRHP) is unique compared to other virus-host molecular interactions and represents not only an attempt by viruses to promote their translation/replication, but also the host's endeavor to combat viral pathogenicity. In other words, there is an urgent need to develop a database for providing such IVRHP data. In this study, a new database was therefore constructed to describe the interactions between coronavirus RNAs and host proteins (CovInter). This database is unique in (a) unambiguously characterizing the interactions between virus RNA and host protein, (b) comprehensively providing experimentally validated biological function for hundreds of host proteins key in viral infection and (c) systematically quantifying the differential expression patterns (before and after infection) of these key proteins. Given the devastating and persistent threat of coronaviruses, CovInter is highly expected to fill the gap in the whole process of the 'molecular arms race' between viruses and their hosts, which will then aid in the discovery of new antiviral therapies. It's now free and publicly accessible at: https://idrblab.org/covinter/.
Affiliation(s)
| | | | | | - Song Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Hongquan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Tianci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Zixuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Yuhong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Lin Tao
- Correspondence may also be addressed to Lin Tao.
| | - Lianyi Han
- Correspondence may also be addressed to Lianyi Han.
| | - Feng Zhu
- To whom correspondence should be addressed. Tel: +86 189 8946 6518; Fax: +86 571 8820 8444;
| |
|
39
|
Chen M, Xu C, Xu Z, He W, Zhang H, Su J, Song Q. Uncovering the dynamic effects of DEX treatment on lung cancer by integrating bioinformatic inference and multiscale modeling of scRNA-seq and proteomics data. Comput Biol Med 2022; 149:105999. [PMID: 35998480 PMCID: PMC9717711 DOI: 10.1016/j.compbiomed.2022.105999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2022] [Revised: 06/16/2022] [Accepted: 08/14/2022] [Indexed: 11/18/2022]
Abstract
Lung cancer is one of the leading causes of cancer-related death, with a five-year survival rate of 18%. It is a priority for us to understand the underlying mechanisms affecting lung cancer therapeutics' implementation and effectiveness. In this study, we combine the power of Bioinformatics and Systems Biology to comprehensively uncover functional and signaling pathways of drug treatment using bioinformatics inference and multiscale modeling of both scRNA-seq data and proteomics data. Based on a time series of lung adenocarcinoma-derived A549 cells after DEX treatment, we first identified the differentially expressed genes (DEGs) in those lung cancer cells. Through interrogation of the regulatory network of those DEGs, we identified key hub genes, including TGFβ, MYC, and SMAD3, that varied under DEX treatment. Further gene set enrichment analysis revealed the TGFβ signaling pathway as the top enriched term. Genes involved in the TGFβ pathway and their crosstalk with the ERBB pathway showed a strong association with survival prognosis in clinical lung cancer samples. On the basis of biological validation and literature-based curation, a multiscale model of tumor regulation centered on both TGFβ-induced and ERBB-amplified signaling pathways was developed to characterize the dynamic effects of DEX therapy on lung cancer cells. Our simulation results matched well with available data of SMAD2, FOXO3, TGFβ1, and TGFβR1 over the time course. Moreover, we provided predictions of different doses to illustrate the trend and therapeutic potential of DEX treatment. The innovative and cross-disciplinary approach can be further applied to other computational studies in tumorigenesis and oncotherapy. We released the approach as a user-friendly tool named BIMM (Bioinformatic Inference and Multiscale Modeling), with all the key features available at https://github.com/chenm19/BIMM.
Affiliation(s)
- Minghan Chen
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA
| | - Chunrui Xu
- Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, USA
| | - Ziang Xu
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA; Department of Chemistry, Wake Forest University, Winston-Salem, NC, USA
| | - Wei He
- Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, USA
| | - Haorui Zhang
- Department of Mathematics and Statistics, Wake Forest University, Winston-Salem, NC, USA
| | - Jing Su
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Qianqian Song
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston Salem, NC, USA; Department of Cancer Biology, Wake Forest School of Medicine, Winston Salem, NC, USA.
| |
|
40
|
Beura S, Kundu P, Das AK, Ghosh A. Metagenome-scale community metabolic modelling for understanding the role of gut microbiota in human health. Comput Biol Med 2022; 149:105997. [DOI: 10.1016/j.compbiomed.2022.105997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 07/03/2022] [Accepted: 08/14/2022] [Indexed: 11/03/2022]
|
41
|
Gao S, Zhang H, Lai L, Zhang J, Li Y, Miao Z, Rahman SU, Zhang H, Qian A, Zhang W. S100A10 might be a novel prognostic biomarker for head and neck squamous cell carcinoma based on bioinformatics analysis. Comput Biol Med 2022; 149:106000. [DOI: 10.1016/j.compbiomed.2022.106000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 06/29/2022] [Accepted: 08/14/2022] [Indexed: 12/09/2022]
|
42
|
Identification of crucial hub genes and potential molecular mechanisms in breast cancer by integrated bioinformatics analysis and experimental validation. Comput Biol Med 2022; 149:106036. [DOI: 10.1016/j.compbiomed.2022.106036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 08/14/2022] [Accepted: 08/20/2022] [Indexed: 11/24/2022]
|
43
|
Iqbal N, Kumar P. Integrated COVID-19 Predictor: Differential expression analysis to reveal potential biomarkers and prediction of coronavirus using RNA-Seq profile data. Comput Biol Med 2022; 147:105684. [PMID: 35687925 PMCID: PMC9162937 DOI: 10.1016/j.compbiomed.2022.105684] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/05/2022] [Revised: 05/27/2022] [Accepted: 05/30/2022] [Indexed: 02/01/2023]
Abstract
Background The world has been battling the ongoing COVID-19 pandemic, spread by the SARS-CoV-2 virus, for the last two years. Viral disease prediction has long been a matter of interest in virology and the study of disease transmission. Objective In this study, we aimed to implement genome association studies using RNA-Seq data of COVID-19, reveal highly expressed gene biomarkers, and predict COVID-19 with machine learning models to help combat this pandemic. Method We collected RNA-Seq gene count data for both healthy (Control) and non-healthy (Treated) COVID-19 cases. In this experiment, a sequence of bioinformatics strategies and statistical techniques, such as fold-change and adjusted p-value, were applied to identify differentially expressed genes (DEGs). We filtered biomarker sets of high DEGs, moderate DEGs, and low DEGs using DESeq2, Limma Trend, and Limma Voom methods based on intersection and union operations and applied machine learning techniques to predict COVID-19. Result Through experimental analysis, 67 potential biomarkers were extracted, comprising 49 up-regulated and 18 down-regulated genes, using statistical techniques and a set-theory consensus strategy. We trained the machine learning models on 12 different biomarker sets and found that the SVM model performed better than the other classifiers with 99.07% classification accuracy for moderate DEGs. Conclusion Our study revealed that the differentially expressed genes of the moderate DEGs biomarker set, identified with |log2FC| ≥ 2 and adjusted p-value < 0.05, work well as input features for a machine learning model using a kernel-based SVM technique to predict COVID-19.
|
44
|
Hu Y, Tang C, Zhu W, Ye H, Lin Y, Wang R, Zhou T, Wen S, Yang J, Fang C. Identification of chromosomal instability-associated genes as hepatocellular carcinoma progression-related biomarkers to guide clinical diagnosis, prognosis and therapy. Comput Biol Med 2022; 148:105896. [PMID: 35868048 DOI: 10.1016/j.compbiomed.2022.105896] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/10/2022] [Revised: 06/21/2022] [Accepted: 07/16/2022] [Indexed: 11/03/2022]
Abstract
Hepatocellular carcinoma (HCC) is a type of cancer characterized by high heterogeneity and a complex multistep progression process. Significantly altered biomarkers for HCC need to be identified. Differentially expressed genes and weighted gene co-expression network analyses were used to identify progression-related biomarkers. LASSO-Cox regression and random forest algorithms were used to construct the progression-related prognosis (PRP) score. Three chromosomal instability-associated genes (KIF20A, TOP2A, and TTK) were identified as progression-related biomarkers. The robustness of the PRP score was validated using four independent cohorts. Immune status was assessed using single-sample gene set enrichment analysis (ssGSEA). Comprehensive analysis showed that patients with high PRP scores had wider genomic alterations, more malignant phenotypes, and were in a state of immunosuppression. The diagnostic models constructed via logistic regression based on the three genes showed satisfactory performance in distinguishing HCC from cirrhotic tissues or dysplastic nodules. The nomogram combining PRP scores with clinical factors performed better in predicting prognosis than the tumor node metastasis (TNM) classification system. We further confirmed that KIF20A, TOP2A, and TTK were more highly expressed in HCC tissues than in cirrhotic tissues. Downregulation of all three genes aggravated chromosomal instability in HCC and suppressed HCC cell viability both in vitro and in vivo. Overall, our study highlights the important roles of chromosomal instability-associated genes during the progression of HCC and their potential clinical diagnostic and prognostic value, and provides promising new ideas for developing therapeutic strategies to improve the outcomes of HCC patients.
Affiliation(s)
- Yueyang Hu
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Chuanyu Tang
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Wen Zhu
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Hanjie Ye
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Yuxing Lin
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Ruixuan Wang
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Tianjun Zhou
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Sai Wen
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Jian Yang
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China
- Chihua Fang
- Department of Hepatobiliary Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Institute of Digital Intelligence, Zhujiang Hospital, Southern Medical University, Guangzhou, 510280, China; Guangdong Provincial Clinical and Engineering Center of Digital Medicine, Guangzhou, 510280, China.
|
45
|
An ensemble framework for microarray data classification based on feature subspace partitioning. Comput Biol Med 2022; 148:105820. [PMID: 35872409 DOI: 10.1016/j.compbiomed.2022.105820] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/20/2022] [Revised: 06/05/2022] [Accepted: 07/03/2022] [Indexed: 12/14/2022]
Abstract
Feature selection is exposed to the curse of dimensionality, a risk that is exacerbated with high-dimensional data such as microarrays. Moreover, because of the low-instance/high-feature (LIHF) property of microarray data, considerable processing time is needed to compute and compare features when choosing the best subset, which has led to many efforts to mitigate the LIHF property of such genomic medicine data. Motivated by the promising results of ensemble models in machine learning problems, this paper presents a novel framework, named feature-level aggregation-based ensemble based on overlapped feature subspace partitioning (FLAE-OFSP), for microarray data classification. The proposed ensemble has three main steps: several subsets are generated by the proposed partitioning approach, a feature selection algorithm (i.e., a feature ranker) is applied to each subset, and the results are combined into a single ranked list using six defined aggregation functions. Evaluation of the framework on seven microarray datasets using four measures (stability, classification accuracy, runtime, and Modscore) shows a substantial runtime improvement, with high-quality results on the other measures compared with individual methods.
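The three steps named in this abstract (partition, rank per subset, aggregate) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not the FLAE-OFSP method itself: the contiguous partitioning with a fixed overlap, the hypothetical `mean_gap` ranker (absolute difference of class means for a two-class problem), and the mean of normalised ranks, which stands in for the six aggregation functions that the abstract does not specify.

```python
from statistics import mean


def overlapping_partitions(n_features, n_subsets, overlap):
    """Split feature indices into contiguous chunks, then extend each chunk
    with the first `overlap` indices of the next chunk (wrapping around).
    This partitioning scheme is an illustrative assumption."""
    size = -(-n_features // n_subsets)  # ceiling division
    idx = list(range(n_features))
    chunks = [idx[i:i + size] for i in range(0, n_features, size)]
    subsets = []
    for k, chunk in enumerate(chunks):
        extra = chunks[(k + 1) % len(chunks)][:overlap]
        # dict.fromkeys removes duplicates while keeping order
        subsets.append(list(dict.fromkeys(chunk + extra)))
    return subsets


def mean_gap(X, y, j):
    """Hypothetical feature ranker: |mean(class 0) - mean(class 1)| for feature j.
    Assumes both classes are present in y."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    return abs(mean(a) - mean(b))


def ensemble_rank(X, y, n_subsets=2, overlap=1):
    """Rank features inside each subset, then aggregate the normalised ranks
    of every feature by their mean (one possible aggregation function)."""
    n_features = len(X[0])
    votes = {j: [] for j in range(n_features)}
    for subset in overlapping_partitions(n_features, n_subsets, overlap):
        ordered = sorted(subset, key=lambda j: mean_gap(X, y, j))  # worst first
        for pos, j in enumerate(ordered):
            votes[j].append((pos + 1) / len(ordered))  # best feature scores 1.0
    aggregated = {j: mean(v) for j, v in votes.items()}
    return sorted(aggregated, key=aggregated.get, reverse=True)
```

Because each ranker only sees its own subset, the ranks it produces are local; the overlap gives some features more than one vote, which is what makes aggregation meaningful in this toy version.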
|
46
|
Zhou J, Cao W, Wang L, Pan Z, Fu Y. Application of artificial intelligence in the diagnosis and prognostic prediction of ovarian cancer. Comput Biol Med 2022; 146:105608. [PMID: 35584585 DOI: 10.1016/j.compbiomed.2022.105608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/14/2022] [Revised: 05/08/2022] [Accepted: 05/09/2022] [Indexed: 11/03/2022]
Abstract
In recent years, the wide application of artificial intelligence (AI) has dramatically improved the work efficiency of clinicians and reduced their workload. This review provides a glance at the latest advances in AI-assisted diagnosis and prognostic prediction of ovarian cancer (OC). We performed an advanced search in PubMed and IEEE/IET Electronic Library, and included 39 articles in this review. A comprehensive and objective criterion was built to assess the reliability and quality of all studies from four aspects: the size of datasets for model development, research design, the division of training sets and test sets, and the type of quantitative performance indicators. This review analyzed the construction of AI models, including data pre-processing methods, feature selection techniques, AI classifiers, or algorithms. Additionally, we compared the performance of these models built on different datasets, which may support researchers for further iteration and development of AI. Finally, we discussed the challenges and future directions for AI application in medicine.
Affiliation(s)
- Jingyang Zhou
- Queen Mary School, Medical Department, Nanchang University, Nanchang, 330031, Jiangxi Province, PR China
- Weiwei Cao
- Queen Mary School, Medical Department, Nanchang University, Nanchang, 330031, Jiangxi Province, PR China
- Lan Wang
- Queen Mary School, Medical Department, Nanchang University, Nanchang, 330031, Jiangxi Province, PR China
- Zezheng Pan
- Faculty of Basic Medical Science, Nanchang University, Nanchang, 330006, Jiangxi Province, PR China
- Ying Fu
- The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi Province, PR China.
|
47
|
Plancade S, Berland M, Blein-Nicolas M, Langella O, Bassignani A, Juste C. A combined test for feature selection on sparse metaproteomics data-an alternative to missing value imputation. PeerJ 2022; 10:e13525. [PMID: 35769140 PMCID: PMC9235818 DOI: 10.7717/peerj.13525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 05/11/2022] [Indexed: 01/18/2023] Open
Abstract
One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely "at random" or "not at random". To circumvent these limitations in the context of feature selection in a multi-class comparison, we propose a univariate selection method that combines a test of association between missingness and classes, and a test for difference of observed intensities between classes. This approach implicitly handles both missingness mechanisms. We performed a quantitative and qualitative comparison of our procedure with imputation-based feature selection methods on two experimental data sets, as well as simulated data with various scenarios regarding the missingness mechanisms and the nature of the difference of expression (differential intensity or differential presence). Whereas we observed similar performances in terms of prediction on the experimental data set, the feature ranking and selection from various imputation-based methods were strongly divergent. We showed that the combined test reaches a compromise by correlating reasonably with other methods, and remains efficient in all simulated scenarios unlike imputation-based feature selection methods.
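The idea of combining a missingness test with an intensity test can be sketched for a single feature and two classes. This is a hedged illustration, not the authors' procedure: the abstract does not say which tests are used or how they are combined, so the sketch assumes permutation tests for both parts and Fisher's method for the combination (which itself assumes the two p-values are independent). Missing values are encoded as `None`, and all function names are hypothetical.

```python
import math
import random


def perm_pvalue(values, labels, stat, n_perm=2000, seed=0):
    """One-sided permutation p-value for a statistic where larger = more extreme."""
    rng = random.Random(seed)
    observed = stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def missing_gap(values, labels):
    """Absolute difference in the proportion of missing values between classes 0 and 1."""
    rates = []
    for cls in (0, 1):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        rates.append(sum(v is None for v in group) / len(group))
    return abs(rates[0] - rates[1])


def intensity_gap(values, labels):
    """Absolute difference in mean observed intensity; 0.0 if a class has no observed value."""
    means = []
    for cls in (0, 1):
        obs = [v for v, lab in zip(values, labels) if lab == cls and v is not None]
        if not obs:
            return 0.0
        means.append(sum(obs) / len(obs))
    return abs(means[0] - means[1])


def fisher_combine(p1, p2):
    """Fisher's method for two p-values. With 4 degrees of freedom the
    chi-square survival function has the closed form prod * (1 - ln(prod))."""
    prod = p1 * p2
    return prod * (1.0 - math.log(prod))


def combined_test(values, labels):
    p_missing = perm_pvalue(values, labels, missing_gap)
    p_intensity = perm_pvalue(values, labels, intensity_gap)
    return fisher_combine(p_missing, p_intensity)
```

In this toy version a feature that is entirely absent in one class (differential presence) yields a small combined p-value through the missingness part alone, even though the intensity part is uninformative.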
Affiliation(s)
- Sandra Plancade
- UR875 MIAT, Université fédérale de Toulouse, INRAE, Castanet-Tolosan, France
- Magali Berland
- Université Paris-Saclay, INRAE, MGP, Jouy en Josas, France
- Mélisande Blein-Nicolas
- Université Paris-Saclay, CNRS, INRAE, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Olivier Langella
- Université Paris-Saclay, CNRS, INRAE, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Ariane Bassignani
- Université Paris-Saclay, INRAE, MGP, Jouy en Josas, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Catherine Juste
- Micalis Institute, Université Paris-Saclay, INRAE, AgroParis Tech, Jouy-en-Josas, France
|
48
|
Understanding the mutational frequency in SARS-CoV-2 proteome using structural features. Comput Biol Med 2022; 147:105708. [PMID: 35714506 PMCID: PMC9173821 DOI: 10.1016/j.compbiomed.2022.105708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 04/26/2022] [Accepted: 06/04/2022] [Indexed: 01/18/2023]
Abstract
The prolonged transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus in the human population has led to demographic divergence and the emergence of several location-specific clusters of viral strains. Although the effect of mutation(s) on the severity and survival of the virus is still unclear, it is evident that certain sites in the viral proteome are more or less prone to mutations. In fact, millions of SARS-CoV-2 sequences collected all over the world have provided us with a unique opportunity to understand viral protein mutations and develop novel computational approaches to predict mutational patterns. In this study, we classified the mutation sites into low and high mutability classes based on the number of viral isolates containing mutations at each site. The physicochemical features and structural analysis of the SARS-CoV-2 proteins showed that features including residue type, surface accessibility, residue bulkiness, stability, and sequence conservation at the mutation site were able to distinguish the low and high mutability sites. We further developed machine learning models using the above-mentioned features to predict low and high mutability sites at different selection thresholds (ranging from 5% to 30% of the topmost and bottommost mutated sites) and observed an improvement in performance as the selection threshold was reduced (prediction accuracy ranging from 65% to 77%). The analysis will be useful for early detection of SARS-CoV-2 variants of concern and can also be applied to other existing and emerging viruses to help prevent another pandemic.
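The labelling step described in this abstract, taking the topmost and bottommost fraction of sites by isolate count, can be sketched as follows. This is an illustrative assumption about how such a threshold might be applied, not the authors' code: the function name, the dictionary input, and the handling of ties (sort order) are all hypothetical.

```python
def label_mutability(site_counts, fraction=0.10):
    """Label the bottom `fraction` of sites (by isolate count) as "low" and the
    top `fraction` as "high"; sites in between are left unlabelled.

    site_counts: dict mapping a site identifier to the number of isolates
                 carrying a mutation at that site (hypothetical input format).
    fraction:    selection threshold; the abstract reports values of 5-30%.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(site_counts) < 2:
        raise ValueError("need at least two sites")
    ranked = sorted(site_counts, key=site_counts.get)  # least mutated first
    k = max(1, int(len(ranked) * fraction))
    labels = {site: "low" for site in ranked[:k]}
    labels.update({site: "high" for site in ranked[-k:]})
    return labels
```

A smaller `fraction` keeps only the most extreme sites, which is consistent with the abstract's observation that prediction becomes easier as the selection threshold is reduced.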
|
49
|
An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput Biol Med 2022; 147:105675. [PMID: 35687926 DOI: 10.1016/j.compbiomed.2022.105675] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/16/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
In this paper, an enhanced binary version of the Rat Swarm Optimizer (RSO) is proposed to deal with Feature Selection (FS) problems. FS is an important data reduction step in data mining which finds the most representative features from the entire data. Many FS-based swarm intelligence algorithms have been used to tackle FS. However, the door is still open for further investigations since no FS method gives cutting-edge results for all cases. In this paper, a recent swarm intelligence metaheuristic method called RSO which is inspired by the social and hunting behavior of a group of rats is enhanced and explored for FS problems. The binary enhanced RSO is built based on three successive modifications: i) an S-shape transfer function is used to develop binary RSO algorithms; ii) the local search paradigm of particle swarm optimization is used with the iterative loop of RSO to boost its local exploitation; iii) three crossover mechanisms are used and controlled by a switch probability to improve the diversity. Based on these enhancements, three versions of RSO are produced, referred to as Binary RSO (BRSO), Binary Enhanced RSO (BERSO), and Binary Enhanced RSO with Crossover operators (BERSOC). To assess the performance of these versions, a benchmark of 24 datasets from various domains is used. The proposed methods are assessed concerning the fitness value, number of selected features, classification accuracy, specificity, sensitivity, and computational time. The best performance is achieved by BERSOC followed by BERSO and then BRSO. These proposed versions are comparatively assessed against 25 well-regarded metaheuristic methods and five filter-based approaches. The obtained results underline their superiority by producing new best results for some datasets.
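The first modification listed in this abstract, an S-shape transfer function, is commonly realised in binary metaheuristics by passing each continuous position component through a sigmoid and comparing the result with a uniform random number. The sketch below shows that generic scheme only; the exact transfer function and update rule used in BRSO are not given in the abstract, so the function names and the choice of the plain logistic sigmoid are assumptions.

```python
import math
import random


def s_shape(x):
    """Logistic sigmoid, written in a numerically stable form for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask:
    feature i is selected when a uniform draw falls below s_shape(position[i])."""
    return [1 if rng.random() < s_shape(x) else 0 for x in position]


def selected_features(mask):
    """Indices of the features kept by a binary mask."""
    return [i for i, bit in enumerate(mask) if bit == 1]
```

A call such as `binarize(position, random.Random(0))` gives a reproducible mask; strongly positive components are almost always selected and strongly negative ones almost never, while components near zero are selected about half the time.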
|
50
|
Feng G, Yao H, Li C, Liu R, Huang R, Fan X, Ge R, Miao Q. ME-ACP: Multi-view neural networks with ensemble model for identification of anticancer peptides. Comput Biol Med 2022; 145:105459. [DOI: 10.1016/j.compbiomed.2022.105459] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/15/2022] [Revised: 03/22/2022] [Accepted: 03/24/2022] [Indexed: 12/26/2022]
|