1
Yu Z, Yu J, Wang H, Zhang S, Zhao L, Shi S. PhosAF: An integrated deep learning architecture for predicting protein phosphorylation sites with AlphaFold2 predicted structures. Anal Biochem 2024; 690:115510. [PMID: 38513769 DOI: 10.1016/j.ab.2024.115510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 03/14/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024]
Abstract
Phosphorylation is indispensable in comprehending biological processes, while biological experimental methods for identifying phosphorylation sites are tedious and arduous. With the rapid growth of biotechnology, deep learning methods have made significant progress in site prediction tasks. Nevertheless, most existing predictors only consider protein sequence information, which limits the capture of protein spatial information. Building upon the latest advancement in protein structure prediction by AlphaFold2, a novel integrated deep learning architecture, PhosAF, is developed to predict phosphorylation sites in human proteins by integrating CMA-Net and MFC-Net, which considers sequence and structure information predicted by AlphaFold2. Here, the CMA-Net module is composed of multiple convolutional neural network layers, with multi-head attention appended to capture the local and long-term dependencies of sequence features. Meanwhile, the MFC-Net module, composed of deep neural network layers, is used to capture the complex representations of evolutionary and structure features. Furthermore, different features are combined to predict the final phosphorylation sites. In addition, we put forward a new strategy to construct reliable negative samples via protein secondary structures. Experimental results on independent test data and a case study indicate that our model PhosAF surpasses the current most advanced methods in phosphorylation site prediction.
Affiliation(s)
- Ziyuan Yu
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Jialin Yu
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Hongmei Wang
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Shuai Zhang
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
- Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China.
2
Korek M, Uhrig RG, Marzec M. Strigolactone insensitivity affects differential shoot and root transcriptome in barley. J Appl Genet 2024:10.1007/s13353-024-00885-w. [PMID: 38877382 DOI: 10.1007/s13353-024-00885-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 05/24/2024] [Accepted: 06/06/2024] [Indexed: 06/16/2024]
Abstract
Strigolactones (SLs) are plant hormones that play a crucial role in regulating various aspects of plant architecture, such as shoot and root branching. However, the knowledge of SL-responsive genes and transcription factors (TFs) that control the shaping of plant architecture remains elusive. Here, transcriptomic analysis was conducted using the SL-insensitive barley mutant hvd14.d (carrying a mutation in the SL receptor DWARF14, HvD14) and its wild type (WT) to unravel the differences in gene expression separately in root and shoot tissues. This approach enabled us to select more than six thousand SL-dependent genes that were exclusive to each studied organ or not tissue-specific. The data obtained, along with in silico analyses, identified several TFs that exhibited changed expression between the analyzed genotypes and that recognized binding sites in promoters of other identified differentially expressed genes (DEGs). In total, 28 TFs that recognize motifs over-represented in DEG promoters were identified. Moreover, nearly half of the identified TFs were connected in a single network of known and predicted interactions, highlighting the complexity and multidimensionality of SL-related signalling in barley. Finally, SL control of the expression of one of the identified TFs was demonstrated to be HvD14- and dose-dependent. The obtained results bring us closer to understanding the signalling pathways regulating SL-dependent plant development.
Affiliation(s)
- Magdalena Korek
- Faculty of Natural Sciences, Institute of Biology, Biotechnology and Environmental Protection, University of Silesia in Katowice, Jagiellonska 28, 40-032, Katowice, Poland
- R Glen Uhrig
- Department of Biological Sciences, University of Alberta, 11455 Saskatchewan Drive, Edmonton, AB, T6G 2E9, Canada
- Marek Marzec
- Faculty of Natural Sciences, Institute of Biology, Biotechnology and Environmental Protection, University of Silesia in Katowice, Jagiellonska 28, 40-032, Katowice, Poland.
3
Ahmadian M, Bodalal Z, van der Hulst HJ, Vens C, Karssemakers LHE, Bogveradze N, Castagnoli F, Landolfi F, Hong EK, Gennaro N, Pizzi AD, Beets-Tan RGH, van den Brekel MWM, Castelijns JA. Overcoming data scarcity in radiomics/radiogenomics using synthetic radiomic features. Comput Biol Med 2024; 174:108389. [PMID: 38593640 DOI: 10.1016/j.compbiomed.2024.108389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 03/11/2024] [Accepted: 03/25/2024] [Indexed: 04/11/2024]
Abstract
PURPOSE To evaluate the potential of synthetic radiomic data generation in addressing data scarcity in radiomics/radiogenomics models. METHODS This study was conducted on a retrospectively collected cohort of 386 colorectal cancer patients (n = 2570 lesions) for whom matched contrast-enhanced CT images and gene TP53 mutational status were available. The full cohort data was divided into a training cohort (n = 2055 lesions) and an independent and fixed test set (n = 515 lesions). Differently sized training sets were subsampled from the training cohort to measure the impact of sample size on model performance and assess the added value of synthetic radiomic augmentation at different sizes. Five different tabular synthetic data generation models were used to generate synthetic radiomic data based on "real-world" radiomics data extracted from this cohort. The quality and reproducibility of the generated synthetic radiomic data were assessed. Synthetic radiomics were then combined with "real-world" radiomic training data to evaluate their impact on the predictive model's performance. RESULTS A prediction model was generated using only "real-world" radiomic data, revealing the impact of data scarcity in this particular data set through a lack of predictive performance at low training sample numbers (n = 200, 400, 1000 lesions with average AUC = 0.52, 0.53, and 0.56 respectively, compared to 0.64 when using 2055 training lesions). Synthetic tabular data generation models created reproducible synthetic radiomic data with properties highly similar to "real-world" data (for n = 1000 lesions, average Chi-square = 0.932, average basic statistical correlation = 0.844). The integration of synthetic radiomic data consistently enhanced the performance of predictive models trained with small sample size sets (AUC enhanced by 9.6%, 11.3%, and 16.7% for models trained on n_samples = 200, 400, and 1000 lesions, respectively). In contrast, synthetic data generated from randomised/noisy radiomic data failed to enhance predictive performance, underlining the requirement of true signal data to do so. CONCLUSION Synthetic radiomic data, when combined with real radiomics, could enhance the performance of predictive models. Tabular synthetic data generation might help to overcome limitations in medical AI stemming from data scarcity.
Affiliation(s)
- Milad Ahmadian
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands.
- Zuhir Bodalal
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands
- Hedda J van der Hulst
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands
- Conchita Vens
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; School of Cancer Science, University of Glasgow, Glasgow, Scotland, UK
- Luc H E Karssemakers
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands
- Nino Bogveradze
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Department of Radiology, American Hospital Tbilisi, Tbilisi, Georgia
- Francesca Castagnoli
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, Royal Marsden Hospital, London, UK; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK
- Federica Landolfi
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Radiology Unit, Sant'Andrea Hospital, Sapienza University of Rome, Rome, Italy
- Eun Kyoung Hong
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Seoul National University Hospital, Seoul, South Korea
- Nicolo Gennaro
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, Northwestern University, Chicago, USA
- Andrea Delli Pizzi
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; ITAB - Institute for Advanced Biomedical Technologies, G. d'Annunzio University, Chieti, Italy; Department of Innovative Technologies in Medicine and Dentistry, G. D'Annunzio University, Chieti, Italy
- Regina G H Beets-Tan
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Michiel W M van den Brekel
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands.
- Jonas A Castelijns
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands
4
Ramazi S, Tabatabaei SAH, Khalili E, Nia AG, Motarjem K. Analysis and review of techniques and tools based on machine learning and deep learning for prediction of lysine malonylation sites in protein sequences. Database (Oxford) 2024; 2024:baad094. [PMID: 38245002 PMCID: PMC10799748 DOI: 10.1093/database/baad094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 11/30/2023] [Accepted: 12/20/2023] [Indexed: 01/22/2024]
Abstract
Post-translational modifications are crucial molecular regulatory mechanisms used to regulate diverse cellular processes. Malonylation of proteins, a reversible post-translational modification of lysine (K) residues, is linked to a variety of biological functions, such as cellular regulation and pathogenesis. This modification plays a crucial role in metabolic pathways, mitochondrial functions, fatty acid oxidation and other life processes. However, accurately identifying malonylation sites is crucial to understand the molecular mechanism of malonylation, and the experimental identification can be a challenging and costly task. Recently, approaches based on machine learning (ML) have been suggested to address this issue. It has been demonstrated that these procedures improve accuracy while lowering costs and time constraints. However, these approaches also have specific shortcomings, including inappropriate feature extraction from protein sequences, high-dimensional features and inefficient underlying classifiers. As a result, there is an urgent need for effective predictors and calculation methods. In this study, we provide a comprehensive analysis and review of existing prediction models, tools and benchmark datasets for predicting malonylation sites in protein sequences, followed by a comparison study. The review consists of the specifications of benchmark datasets, explanation of features and encoding methods, descriptions of the prediction approaches and their embedded ML or deep learning models and the description and comparison of the existing tools in this domain. To evaluate and compare the prediction capability of the tools, a new set of data has been extracted from the most recently updated database and the tools have been assessed on the extracted data. Finally, a hybrid architecture consisting of several classifiers, including classical ML models and a deep learning model, has been proposed to ensemble the prediction results. This approach demonstrates better performance than all prediction tools included in this study (the source codes of the models presented in this manuscript are available in https://github.com/Malonylation). Database URL: https://github.com/A-Golshan/Malonylation.
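One common way to ensemble the outputs of several classifiers is soft voting, sketched below. The abstract does not state which combination rule the authors used, so this is an assumption for illustration only; the function name `soft_vote`, the threshold, and the example probabilities are hypothetical.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average per-classifier probabilities for one candidate site.

    `probabilities` holds one predicted probability of malonylation per
    base classifier (hypothetical inputs, not results from the paper).
    Returns the mean probability and the thresholded label.
    """
    if not probabilities:
        raise ValueError("need at least one classifier output")
    mean_prob = sum(probabilities) / len(probabilities)
    return mean_prob, mean_prob >= threshold


# Three hypothetical base classifiers scoring the same lysine site.
mean_prob, is_site = soft_vote([0.62, 0.48, 0.71])
print(f"mean probability = {mean_prob:.3f}, predicted site = {is_site}")
```

Averaging probabilities lets a confident classifier outweigh an uncertain one, which simple majority voting on hard labels would not do.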
Affiliation(s)
- Seyed Amir Hossein Tabatabaei
- Department of Computer Science, Faculty of Mathematical Sciences, University of Guilan, Namjoo St. Postal, Rasht 41938-33697, Iran
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Jalal AleAhmad, Tehran 14117-13116, Iran
- Elham Khalili
- Department of Plant Sciences, Faculty of Science, Tarbiat Modares University, Jalal AleAhmad, Tehran 14117-13116, Iran
- Amirhossein Golshan Nia
- Department of Mathematics and Computer Science, Amirkabir University of Technology, No. 350, Hafez Ave, Tehran 15916-34311, Iran
- Kiomars Motarjem
- Department of Statistics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal AleAhmad, Tehran 14117-13116, Iran
5
Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]
Abstract
Post-translational modifications (PTMs) have key roles in extending the functional diversity of proteins and, as a result, regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation modification is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize this body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. At first, we comprehensively review all related databases and introduce all steps regarding dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). Additionally, it is shown that there are basically two main approaches for p-site prediction by ML: conventional and end-to-end deep learning methods, both of which are given an overview. Moreover, this review introduces the most important feature extraction techniques, which have mostly been used in p-site prediction. Finally, we create three test sets from new proteins related to the released version of the database of protein post-translational modifications (dbPTM) in 2022 based on general and human species. Evaluating online p-site prediction tools on newly added proteins introduced in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations. In other words, the actual performance of these online p-site prediction tools on unseen proteins is notably lower than the results reported in their respective research papers.
Affiliation(s)
- Farzaneh Esmaili
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Mahdi Pourmirzaei
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran.
- Seyedehsamaneh Shojaeilangari
- Biomedical Engineering Group, Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST), Tehran 33535-111, Iran
- Elham Yavari
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
6
Pourmirzaei M, Ramazi S, Esmaili F, Shojaeilangari S, Allahvardi A. Machine learning-based approaches for ubiquitination site prediction in human proteins. BMC Bioinformatics 2023; 24:449. [PMID: 38017391 PMCID: PMC10683244 DOI: 10.1186/s12859-023-05581-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 11/23/2023] [Indexed: 11/30/2023] Open
Abstract
Protein ubiquitination is a critical post-translational modification (PTM) involved in numerous cellular processes. Identifying ubiquitination sites (Ubi-sites) on proteins offers valuable insights into their function and regulatory mechanisms. Because traditional approaches for Ubi-site detection are costly and time-consuming, there has been a growing interest in leveraging artificial intelligence for computer-aided Ubi-site prediction. In this study, we collected experimentally verified Ubi-sites of human proteins from the dbPTM database, then applied a comprehensive set of state-of-the-art computational methods, along with standard evaluation metrics and a proper validation strategy, to Ubi-site prediction. We demonstrated the effectiveness of our framework by comparing ten machine learning (ML) based approaches in three different categories: feature-based conventional ML methods, end-to-end sequence-based deep learning (DL) techniques, and hybrid feature-based DL models. Our results revealed that DL approaches outperformed the classical ML methods, achieving a 0.902 F1-score, 0.8198 accuracy, 0.8786 precision, and 0.9147 recall as the best performance for a DL model using both raw amino acid sequences and hand-crafted features. Interestingly, our experimental results showed that the performance of DL methods had a positive correlation with the length of amino acid fragments, suggesting that utilizing the entire sequence can lead to more accurate predictions in future research endeavors. Additionally, we developed a meticulously curated benchmark for Ubi-site prediction in human proteins. This benchmark serves as a valuable resource for future studies, enabling fair and accurate comparisons between different methods. Overall, our work highlights the potential of ML, particularly DL techniques, in predicting Ubi-sites and furthering our knowledge of protein regulation through ubiquitination in cells.
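The F1-score quoted in this abstract is, by definition, the harmonic mean of precision and recall. The minimal sketch below (the function name `f1_score` is illustrative, not taken from the paper) recomputes it from the reported precision and recall. The result is about 0.896, close to but not identical to the reported 0.902; one possible explanation, which is an assumption since the abstract does not say, is that the metrics were averaged over folds or runs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the best DL model.
reported_precision = 0.8786
reported_recall = 0.9147

f1_from_pr = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from precision and recall: {f1_from_pr:.4f}")
```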
Affiliation(s)
- Mahdi Pourmirzaei
- Department of Information Technology, Tarbiat Modares University, 14115-111, Tehran, Iran
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, 14115-111, Tehran, Iran
- Farzaneh Esmaili
- Department of Information Technology, Tarbiat Modares University, 14115-111, Tehran, Iran
- Seyedehsamaneh Shojaeilangari
- Biomedical Engineering Group, Department of Electrical and Information Technology, Iranian Research Organization for Science and Technology (IROST), 33535111, Tehran, Iran.
- Abdollah Allahvardi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, 14115-111, Tehran, Iran
7
Qian W, Yang Z. Identification of cell-type-specific genes in multimodal single-cell data using deep neural network algorithm. Comput Biol Med 2023; 166:107498. [PMID: 37738895 DOI: 10.1016/j.compbiomed.2023.107498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 08/15/2023] [Accepted: 09/15/2023] [Indexed: 09/24/2023]
Abstract
The emergence of single-cell RNA sequencing (scRNA-seq) technology makes it possible to measure DNA, RNA, and protein in a single cell. Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) is a powerful multimodal single-cell research innovation, allowing researchers to capture RNA and surface protein expression on the same cells. Currently, identification of cell-type-specific genes in CITE-seq data is still challenging. In this study, we obtained a set of CITE-seq datasets from the Kaggle database, which included the sequencing dataset of seven cell types during bone marrow stem cell differentiation. We used Student's t-test to analyze these transcribed RNAs and picked out 133 significantly differentially expressed genes (DEGs) among all cell types. Functional enrichment revealed that these DEGs were strongly associated with blood-related diseases, providing important insights into the cellular heterogeneity within bone marrow stem cells. The relation between RNA and protein levels was modeled with a deep neural network (DNN), which achieved a high prediction score of 0.867. Based on their coefficients in the DNN model, three genes (LGALS1, CENPV, TRIM24) were identified as cell-type-specific genes in erythrocyte progenitors. Our work provides a novel perspective regarding the differentiation of stem cells in the bone marrow and valuable insights for further research in this field.
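The Student's t-test mentioned in this abstract compares the mean expression of a gene between two groups of cells under an equal-variance assumption. The sketch below computes the pooled-variance t statistic with the Python standard library only. How the authors grouped the seven cell types is not stated, so the two-group comparison, the function name `student_t`, and the expression values are all hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns the t statistic and its degrees of freedom. Each group needs
    at least two values so that a sample variance can be computed.
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, dof


# Hypothetical normalized expression of one gene in two cell groups.
cells_of_type = [5.1, 4.9, 5.3, 5.0]
other_cells = [3.9, 4.1, 4.0, 4.2]

t_stat, dof = student_t(cells_of_type, other_cells)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A p-value would then come from the t distribution with `dof` degrees of freedom, and when many genes are tested a multiple-testing correction is usually applied.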
Affiliation(s)
- Weiye Qian
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, PR China
- Zhiyuan Yang
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, PR China.
8
Kita K, Fujimori T, Suzuki Y, Kanie Y, Takenaka S, Kaito T, Taki T, Ukon Y, Furuya M, Saiwai H, Nakajima N, Sugiura T, Ishiguro H, Kamatani T, Tsukazaki H, Sakai Y, Takami H, Tateiwa D, Hashimoto K, Wataya T, Nishigaki D, Sato J, Hoshiyama M, Tomiyama N, Okada S, Kido S. Bimodal artificial intelligence using TabNet for differentiating spinal cord tumors-Integration of patient background information and images. iScience 2023; 26:107900. [PMID: 37766987 PMCID: PMC10520519 DOI: 10.1016/j.isci.2023.107900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/18/2023] [Accepted: 09/08/2023] [Indexed: 09/29/2023] Open
Abstract
We proposed a bimodal artificial intelligence that integrates patient information with images to diagnose spinal cord tumors. Our model combines TabNet, a state-of-the-art deep learning model for tabular data for patient information, and a convolutional neural network for images. As training data, we collected 259 spinal tumor patients (158 for schwannoma and 101 for meningioma). We compared the performance of the image-only unimodal model, table-only unimodal model, bimodal model using a gradient-boosting decision tree, and bimodal model using TabNet. Our proposed bimodal model using TabNet performed best (area under the receiver-operating characteristic curve [AUROC]: 0.91) in the training data and significantly outperformed the physicians' performance. In the external validation using 62 cases from the other two facilities, our bimodal model showed an AUROC of 0.92, proving the robustness of the model. The bimodal analysis using TabNet was effective for differentiating spinal tumors.
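The AUROC values reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the label encoding (1 for one tumor type, 0 for the other), the function name `auroc`, and the scores are hypothetical and not taken from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for five cases (1 = positive class).
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(f"AUROC = {auroc(example_labels, example_scores):.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a rank-based formula would be preferable for large datasets.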
Affiliation(s)
- Kosuke Kita
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
- Takahito Fujimori
- Osaka University Graduate School of Medicine Department of Orthopaedic Surgery, Suita, Osaka, Japan
- Yuki Suzuki
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
- Yuya Kanie
- Osaka University Graduate School of Medicine Department of Orthopaedic Surgery, Suita, Osaka, Japan
- Shota Takenaka
- Osaka University Graduate School of Medicine Department of Orthopaedic Surgery, Suita, Osaka, Japan
- Takashi Kaito
- Osaka University Graduate School of Medicine Department of Orthopaedic Surgery, Suita, Osaka, Japan
- Takuyu Taki
- Department of Neurosurgery, Iseikai Hospital, Osaka, Osaka, Japan
- Yuichiro Ukon
- Osaka University Graduate School of Medicine Department of Orthopaedic Surgery, Suita, Osaka, Japan
- Hirokazu Saiwai
- Department of Orthopedic Surgery, Graduate School of Medical Sciences, Kyusyu University, Higashi, Fukuoka, Japan
- Nozomu Nakajima
- Japanese Red Cross Society Himeji Hospital, Himeji, Hyogo, Japan
- Tsuyoshi Sugiura
- General Incorporated Foundation Sumitomo Hospital, Osaka, Osaka, Japan
- Hiroyuki Ishiguro
- National Hospital Organization Osaka National Hospital, Osaka, Osaka, Japan
- Haruna Takami
- Osaka International Cancer Institute, Osaka, Osaka, Japan
- Tomohiro Wataya
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
- Daiki Nishigaki
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
- Junya Sato
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
- Noriyuki Tomiyama
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
- Seiji Okada
- Osaka University Graduate School of Medicine Department of Orthopaedic Surgery, Suita, Osaka, Japan
- Shoji Kido
- Osaka University School of Medicine Graduate School of Medicine Diagnostic and Interventional Radiology, Suita, Osaka, Japan
9
Dong J, Wang K, He J, Guo Q, Min H, Tang D, Zhang Z, Zhang C, Zheng F, Li Y, Xu H, Wang G, Luan S, Yin L, Zhang X, Dai Y. Machine learning-based intradialytic hypotension prediction of patients undergoing hemodialysis: A multicenter retrospective study. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 240:107698. [PMID: 37429246 DOI: 10.1016/j.cmpb.2023.107698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 05/22/2023] [Accepted: 06/24/2023] [Indexed: 07/12/2023]
Abstract
BACKGROUND AND OBJECTIVE Intradialytic hypotension (IDH) is closely associated with adverse clinical outcomes in HD patients. An IDH predictor model is important for IDH risk screening and clinical decision-making. In this study, we used machine learning (ML) to develop an IDH model for risk prediction in HD patients. METHODS 62,227 dialysis sessions were randomly partitioned into training data (70%), test data (20%), and validation data (10%). The IDH-A model, based on twenty-seven variables, was constructed for risk prediction for the next HD treatment. The IDH-B model, based on ten variables from 64,870 dialysis sessions, was developed for risk assessment before each HD treatment. Light Gradient Boosting Machine (LightGBM), Linear Discriminant Analysis, support vector machines, XGBoost, TabNet, and multilayer perceptron were used to develop the predictor model. RESULTS In the IDH-A model, we identified the LightGBM method as the best-performing and interpretable model, with a C-statistic of 0.82 under the Fall30Nadir90 definition, which was higher than those obtained using the other models (P<0.01). Under the other IDH standards of Nadir90, Nadir100, Fall20, Fall30, and Fall20Nadir90, the LightGBM method achieved C-statistics ranging from 0.77 to 0.89. As a complementary application, the LightGBM method in the IDH-B model achieved a C-statistic of 0.68 under the Fall30Nadir90 definition and 0.69 to 0.78 under the other five IDH standards, which were also higher than those of the other methods. CONCLUSION Using ML, we identified the LightGBM method as a well-performing and interpretable model. We identified the top variables as the high-risk factors for IDH incidence in HD patients. The IDH-A and IDH-B models can usefully complement each other for risk prediction and further facilitate timely intervention when applied in different clinical settings.
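The random 70/20/10 partition of dialysis sessions described in the METHODS can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `partition`, the seed, and the choice to give rounding leftovers to the validation set are assumptions.

```python
import random


def partition(n_items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle item indices and split them into train, test, validation.

    Train and test sizes are rounded down; the validation set receives
    whatever remains, so the three parts always cover every item once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n_items * fractions[0])
    n_test = int(n_items * fractions[1])
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation


# 62,227 sessions, as reported in the abstract.
train, test, validation = partition(62_227)
print(len(train), len(test), len(validation))
```

The abstract describes partitioning at the level of sessions; whether sessions from the same patient were kept in the same subset is not stated, and a patient-level split would be needed to rule out leakage between subsets.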
Affiliation(s)
- Jingjing Dong
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China; Institute of Nephrology and Blood Purification, the First Affiliated Hospital of Jinan University, Jinan University, Guangzhou 510630, China
- Kang Wang
- Department of Nephrology, the Second Affiliated Hospital of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Jingquan He
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Qi Guo
- Shenzhen Yuchen Medical Technology Co., Ltd., Shenzhen 518020, China
- Haodi Min
- Shenzhen Yuchen Medical Technology Co., Ltd., Shenzhen 518020, China
- Donge Tang
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Zeyu Zhang
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China; Institute of Nephrology and Blood Purification, the First Affiliated Hospital of Jinan University, Jinan University, Guangzhou 510630, China
- Cantong Zhang
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Fengping Zheng
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Yixi Li
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China; Institute of Nephrology and Blood Purification, the First Affiliated Hospital of Jinan University, Jinan University, Guangzhou 510630, China
- Huixuan Xu
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Gang Wang
- Department of Nephrology, University of Chinese Academy of Sciences Shenzhen Hospital (Guangming), Shenzhen 518020, China
- Shaodong Luan
- Departments of Nephrology, Shenzhen Longhua District Central Hospital, Shenzhen 518020, China
- Lianghong Yin
- Institute of Nephrology and Blood Purification, the First Affiliated Hospital of Jinan University, Jinan University, Guangzhou 510630, China
- Xinzhou Zhang
- Department of Nephrology, the Second Affiliated Hospital of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
- Yong Dai
- Clinical Medical Research Center, the Second Clinical Medical College of Jinan University, Shenzhen People's Hospital, Jinan University, Shenzhen 518020, China
|
10
|
Chen Y, Li H, Dou H, Wen H, Dong Y. Prediction and Visual Analysis of Food Safety Risk Based on TabNet-GRA. Foods 2023; 12:3113. [PMID: 37628112 PMCID: PMC10453234 DOI: 10.3390/foods12163113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 08/11/2023] [Accepted: 08/13/2023] [Indexed: 08/27/2023]
Abstract
Food safety risk prediction is crucial for timely hazard detection and effective control. This study proposes a novel risk prediction method for food safety called TabNet-GRA, which combines a specialized deep learning architecture for tabular data (TabNet) with a grey relational analysis (GRA) to predict food safety risk. Initially, this study employed a GRA to derive comprehensive risk values from fused detection data. Subsequently, a food safety risk prediction model was constructed based on TabNet, and training was performed using the detection data as inputs and the comprehensive risk values calculated via the GRA as the expected outputs. Comparative experiments with six typical models demonstrated the superior fitting ability of the TabNet-based prediction model. Moreover, a food safety risk prediction and visualization system (FSRvis system) was designed and implemented based on TabNet-GRA to facilitate risk prediction and visual analysis. A case study in which our method was applied to a dataset of cooked meat products from a Chinese province further validated the effectiveness of the TabNet-GRA method and the FSRvis system. The method can be applied to targeted risk assessment, hazard identification, and early warning systems to strengthen decision making and safeguard public health by proactively addressing food safety risks.
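For readers unfamiliar with grey relational analysis, the following is a minimal sketch of the textbook (Deng) formulation: each candidate series is compared with a reference series, a grey relational coefficient is computed per indicator, and the coefficients are averaged into a grade. The function name `grey_relational_grades`, the distinguishing coefficient `rho=0.5`, and the toy data are assumptions; the abstract does not specify how the authors normalised the detection data or derived the comprehensive risk values.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Return one grey relational grade per candidate series.

    reference  -- list of (already normalised) reference values
    candidates -- list of series, each the same length as reference
    rho        -- distinguishing coefficient, conventionally 0.5
    """
    # Absolute difference between each candidate and the reference, per indicator.
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in candidates]
    flat = [d for row in deltas for d in row]
    if not flat:
        return []
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate is identical to the reference.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


if __name__ == "__main__":
    # Hypothetical normalised indicators for three samples against an ideal reference.
    reference = [1.0, 1.0, 1.0]
    samples = [[1.0, 1.0, 1.0], [0.5, 0.8, 0.9], [0.0, 0.0, 0.0]]
    print(grey_relational_grades(reference, samples))
```

A grade close to 1 means the sample closely tracks the reference series. Whether a high grade corresponds to high or low risk depends on how the reference is chosen, which this sketch leaves open.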
Affiliation(s)
- Yi Chen
- Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
- Hanqiang Li
- Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
- Haifeng Dou
- Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
- Hong Wen
- Hubei Provincial Institute for Food Supervision and Test, Wuhan 430075, China
- Yu Dong
- School of Computer Science, University of Technology Sydney, Sydney, NSW 2008, Australia
|
11
|
Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023; 12:82819. [PMID: 36651724 PMCID: PMC9848389 DOI: 10.7554/elife.82819] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023]
Abstract
Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and proteins with known properties based on lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on one particular model, the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how Transformer models have quickly proven to be a very promising way to unravel information hidden in the sequences of amino acids.
Affiliation(s)
- Abel Chandra
- Department of Computing Science, Umeå University, Umeå, Sweden
- Laura Tünnermann
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Tommy Löfstedt
- Department of Computing Science, Umeå University, Umeå, Sweden
- Regina Gratz
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Department of Forest Ecology and Management, Swedish University of Agricultural Sciences, Umeå, Sweden
|
12
|
Zeng Y, Liu D, Wang Y. Identification of phosphorylation site using S-padding strategy based convolutional neural network. Health Inf Sci Syst 2022; 10:29. [PMID: 36124094 PMCID: PMC9481819 DOI: 10.1007/s13755-022-00196-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 10/14/2022]
Abstract
Purpose: Abnormal phosphorylation has been shown to be associated with a variety of human diseases, and the identification of phosphorylation sites is a research hotspot in healthcare. Deep learning studies of phosphorylation site prediction often introduce many kinds of information, and the use of complex models limits the scenarios in which they can be applied. Methods: An enhanced deep learning method with an S-padding strategy based on a convolutional neural network is proposed in this paper. The S-padding strategy forms a three-dimensional matrix with extension information from the original amino acid sequences, and a corresponding 2D-CNN model is designed to abstract the comprehensive features of the phosphorylation site region in protein sequences. Results: Fivefold cross-validation experiments show that, on the human dataset, the proposed method achieves an accuracy of 89.68% on serine/threonine sites and 88.16% on tyrosine sites. Furthermore, phosphorylation site prediction on different organisms attains accuracy, sensitivity, and specificity above 0.85, indicating potential for the phosphorylation site prediction task. Comparison with existing models shows that the proposed method performs better on both accuracy and AUC, and that its performance can improve further with sufficient training data. Conclusion: This method enables proteome-wide predictions via models trained on a large amount of phosphorylation data, further exploiting the potential of protein phosphorylation site identification and helping to provide insights into phosphorylation mechanisms.
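The accuracy, sensitivity, and specificity reported in this abstract are standard confusion-matrix quantities. The sketch below shows how they are conventionally computed for binary site labels (1 = phosphorylation site, 0 = non-site). The function name `confusion_metrics` and the toy labels are assumptions for illustration and are unrelated to the authors' data or code.

```python
def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # recall on true sites
        "specificity": ratio(tn, tn + fp),  # recall on non-sites
    }


if __name__ == "__main__":
    # Toy example: three true sites and two non-sites.
    print(confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Reporting sensitivity and specificity alongside accuracy matters for site prediction because non-sites usually far outnumber sites, so accuracy alone can look high for a model that misses most true sites.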
Affiliation(s)
- Yanjiao Zeng
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006 Guangdong China
- Dongning Liu
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006 Guangdong China
- Yang Wang
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006 Guangdong China
|
13
|
Small Tweaks, Major Changes: Post-Translational Modifications That Occur within M2 Macrophages in the Tumor Microenvironment. Cancers (Basel) 2022; 14:5532. [PMID: 36428622 PMCID: PMC9688270 DOI: 10.3390/cancers14225532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 10/21/2022] [Accepted: 11/07/2022] [Indexed: 11/12/2022]
Abstract
Most proteins undergo post-translational modifications (PTMs), whether during or after protein biosynthesis. Because PTMs can alter the physical and chemical properties and functions of proteins, they are crucial. By fostering the proliferation, migration, and invasion of the cancer cells with which they communicate in the tumor microenvironment (TME), M2 macrophages have emerged as key cellular players in the TME. Furthermore, growing evidence illustrates that PTMs can occur in M2 macrophages as well, possibly helping to shape their multifaceted characteristics and physiological behaviors in the TME. Hence, there is a need to review the PTMs that have been reported to occur within M2 macrophages. Although several reviews are available regarding the roles of M2 macrophages, most of them overlook PTMs occurring within M2 macrophages. This review therefore focuses on advances in PTMs that have been reported to take place within M2 macrophages, mainly in the TME, to better understand the behavior of M2 macrophages in the tumor microenvironment. We also briefly cover, at the end of the review, advances in developing inhibitors that target PTMs and the application of artificial intelligence (AI) in the prediction and analysis of PTMs.
|