1
Geng G, Wang L, Xu Y, Wang T, Ma W, Duan H, Zhang J, Mao A. MGDDI: A multi-scale graph neural networks for drug-drug interaction prediction. Methods 2024; 228:22-29. [PMID: 38754712 DOI: 10.1016/j.ymeth.2024.05.010] [Received: 01/28/2024] [Revised: 05/09/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024]
Abstract
Drug-drug interaction (DDI) prediction is crucial for identifying interactions within drug combinations, especially adverse effects due to physicochemical incompatibility. While current methods have made strides in predicting adverse drug interactions, limitations persist. Most methods rely on handcrafted features, restricting their applicability. They predominantly extract information from individual drugs, neglecting the importance of interaction details between drug pairs. To address these issues, we propose MGDDI, a graph neural network-based model for predicting potential adverse drug interactions. Notably, we use a multiscale graph neural network (MGNN) to learn drug molecule representations, addressing substructure size variations and preventing gradient issues. For capturing interaction details between drug pairs, we integrate a substructure interaction learning module based on attention mechanisms. Our experimental results demonstrate MGDDI's superiority in predicting adverse drug interactions, offering a solution to current methodological limitations.
Affiliation(s)
- Guannan Geng
  - Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Lizhuang Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Yanwei Xu
  - Beidahuang Group Neuropsychiatric Hospital, Jiamusi, China; Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Tianshuo Wang
  - School of Software, Shandong University, Jinan, China
- Wei Ma
  - Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Jiahui Zhang
  - Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Anqiong Mao
  - The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Department of Anesthesiology, Luzhou, China
2
Zhang Y, Yang Y, Ren L, Ning L, Zou Q, Luo N, Zhang Y, Liu R. RDscan: Extracting RNA-disease relationship from the literature based on pre-training model. Methods 2024; 228:48-54. [PMID: 38789016 DOI: 10.1016/j.ymeth.2024.05.012] [Received: 04/06/2024] [Revised: 05/02/2024] [Accepted: 05/16/2024] [Indexed: 05/26/2024]
Abstract
With the rapid advancements in molecular biology and genomics, a multitude of connections between RNA and diseases has been unveiled, making the efficient and accurate extraction of RNA-disease (RD) relationships from extensive biomedical literature crucial for advancing research in this field. This study introduces RDscan, a novel text mining method based on the pre-training and fine-tuning strategy, which automatically extracts RD-related information from a vast corpus of literature using pre-trained biomedical large language models (LLMs). Initially, we constructed a dedicated RD corpus by manually curating the literature, comprising 2,082 positive and 2,000 negative sentences, alongside an independent test dataset (500 positive and 500 negative sentences) for training and evaluating RDscan. Subsequently, by fine-tuning the Bioformer and BioBERT pre-trained models, RDscan demonstrated exceptional performance in text classification and named entity recognition (NER) tasks. In 5-fold cross-validation, RDscan significantly outperformed traditional machine learning methods (Support Vector Machine, Logistic Regression and Random Forest). In addition, we have developed an accessible webserver that assists users in extracting RD relationships from text. In summary, RDscan represents the first text mining tool specifically designed for RD relationship extraction, and is poised to emerge as an invaluable tool for researchers dedicated to exploring the intricate interactions between RNA and diseases. The RDscan webserver is freely available at https://cellknowledge.com.cn/RDscan/.
Affiliation(s)
- Yang Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China; School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Yu Yang
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Liping Ren
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Lin Ning
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
- Nanchao Luo
  - School of Computer Science and Technology, Aba Teachers College, WenChuan, Sichuan, 623002, China
- Yinghui Zhang
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Ruijun Liu
  - School of Software, Beihang University, Beijing 100191, China
3
Jin YT, Tan Y, Gan ZH, Hao YD, Wang TY, Lin H, Tang B. Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors. Methods 2024; 229:125-132. [PMID: 38964595 DOI: 10.1016/j.ymeth.2024.06.012] [Received: 05/02/2024] [Revised: 06/01/2024] [Accepted: 06/27/2024] [Indexed: 07/06/2024]
Abstract
DNase I hypersensitive sites (DHSs) are chromatin regions highly sensitive to the DNase I enzyme. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet-lab experiments exist for DHS identification, they are often labor-intensive. Therefore, there is a strong need to develop computational methods for this purpose. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs. The F-score was applied to filter the features. After comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was selected for the final model construction. The model achieved an overall prediction accuracy of 0.859 with an AUC value of 0.837. We hope that this model can assist scholars conducting DNase research in identifying these sites.
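The F-score filtering step mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes the widely used two-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances). The function names `f_score` and `rank_features` are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of a single feature, given its values in the positive and
    negative samples (assumed definition; the paper's may differ)."""
    overall = mean(pos + neg)
    # Between-class separation: how far each class mean sits from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class spread: sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else 0.0


def rank_features(X, y):
    """Return (feature indices sorted by descending F-score, list of scores).

    X is a list of feature rows and y a list of 0/1 labels; each class
    needs at least two samples so the sample variance is defined.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

A filter of this kind would typically keep the top-ranked features and pass them to the classifier; how many features the authors retained is not stated in the abstract.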
Affiliation(s)
- Yan-Ting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China
- Yang Tan
  - School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China
- Zhong-Hua Gan
  - Department of Pathology, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, 646000, Sichuan, China
- Yu-Duo Hao
  - School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China
- Tian-Yu Wang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China
- Bo Tang
  - Department of Pathology, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, 646000, Sichuan, China
4
Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024]
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Liangzhen Jiang
  - College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
  - Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
- Kaixiang Yang
  - College of Software Engineering, Sichuan University, Chengdu 610041, China
- Shulin Shang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
5
Ye DX, Yu JW, Li R, Hao YD, Wang TY, Yang H, Ding H. The Prediction of Recombination Hotspot Based on Automated Machine Learning. J Mol Biol 2024:168653. [PMID: 38871176 DOI: 10.1016/j.jmb.2024.168653] [Received: 02/10/2024] [Revised: 05/12/2024] [Accepted: 06/06/2024] [Indexed: 06/15/2024]
Abstract
Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods has encountered challenges related to insufficient feature extraction and limited generalization capabilities. This paper focuses on recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing this task. To address these deficiencies, an automated machine learning approach was used to construct a recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, examined the impact of individual features on the results, and investigated the reasons behind the misclassification of samples. Finally, an application for recombination hotspot prediction was established to give researchers easy access to the necessary information and tools. The research outcomes of this paper underscore the enormous potential of automated machine learning methods in gene sequence prediction.
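To make the "TF-IDF-Kmer" idea in this abstract concrete, here is a minimal sketch that treats each DNA sequence as a document and each overlapping k-mer as a word. The abstract does not specify the weighting, so the plain formulation tf = count / total k-mers and idf = log(N / document frequency) is an assumption, as are the function names and the default `k=3`.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Split a sequence into its overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_kmer(sequences, k=3):
    """Return one {k-mer: TF-IDF weight} dict per input sequence.

    Assumed weighting (not confirmed by the paper):
    tf = k-mer count / total k-mers in the sequence,
    idf = ln(number of sequences / number of sequences containing the k-mer).
    """
    docs = [Counter(kmers(s.upper(), k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: in how many sequences each k-mer occurs at least once.
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())
    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vectors.append({
            km: (c / total) * math.log(n_docs / doc_freq[km])
            for km, c in counts.items()
        })
    return vectors
```

With this weighting, a k-mer that occurs in every sequence gets weight 0, so only k-mers that distinguish sequences contribute to the feature vector.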
Affiliation(s)
- Dong-Xin Ye
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jun-Wen Yu
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rui Li
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu-Duo Hao
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Tian-Yu Wang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Hui Ding
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
6
Ahmed Z, Shahzadi K, Jin Y, Li R, Momanyi BM, Zulfiqar H, Ning L, Lin H. Identification of RNA‐dependent liquid‐liquid phase separation proteins using an artificial intelligence strategy. Proteomics 2024:e2400044. [PMID: 38824664 DOI: 10.1002/pmic.202400044] [Received: 01/26/2024] [Revised: 05/03/2024] [Accepted: 05/21/2024] [Indexed: 06/04/2024]
Abstract
RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
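Two of the sequence encodings named in this abstract, amino acid composition (AAC) and composition of k-spaced amino acid pairs (CKSAAP), are simple enough to sketch directly. The code below follows the standard definitions of these descriptors; the function names, the normalisation choices, and the handling of non-standard residues are assumptions rather than details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: 20 relative frequencies.

    Non-standard residues (e.g. X) count toward the sequence length
    but have no slot of their own (an assumption of this sketch).
    """
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def cksaap(seq, k=0):
    """Composition of k-spaced amino acid pairs: 400 relative frequencies.

    A pair is two residues separated by exactly k positions; k=0 gives
    adjacent dipeptides. Pairs with non-standard residues are skipped.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_windows = len(seq) - k - 1
    for i in range(n_windows):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    if n_windows <= 0:
        return [0.0] * len(pairs)
    return [counts[p] / n_windows for p in pairs]
```

Concatenating `aac(seq)` with `cksaap(seq, k)` for a few values of `k` yields a fixed-length vector per protein, which is the kind of input a random forest classifier expects.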
Affiliation(s)
- Zahoor Ahmed
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Kiran Shahzadi
  - Department of Biotechnology, Women University of Azad Jammu and Kashmir Bagh, Bagh, Azad Kashmir, Pakistan
- Yanting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Rui Li
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Biffon Manyura Momanyi
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hasan Zulfiqar
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Lin Ning
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Hao Lin
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
7
Liu L, Jia R, Hou R, Huang C. Prediction of cell-type-specific cohesin-mediated chromatin loops based on chromatin state. Methods 2024; 226:151-160. [PMID: 38670416 DOI: 10.1016/j.ymeth.2024.04.014] [Received: 03/02/2024] [Revised: 04/02/2024] [Accepted: 04/18/2024] [Indexed: 04/28/2024]
Abstract
Chromatin loops are crucial for the regulation of gene transcription. Cohesin is a chromatin-associated protein that mediates chromatin interactions through loop extrusion. Cohesin-mediated chromatin interactions have strong cell-type specificity, posing a challenge for predicting chromatin loops. Existing computational methods perform poorly in predicting cell-type-specific chromatin loops. To address this issue, we propose a random forest model to predict cell-type-specific cohesin-mediated chromatin loops based on chromatin states identified by ChromHMM and the occupancy of related factors. Our results show that chromatin state is responsible for the cell-type specificity of loops. Using only chromatin states as features, the model achieved high accuracy in predicting cell-type-specific loops between two cell types and can be applied to different cell types. Furthermore, when chromatin states are combined with the occurrence frequency of CTCF, RAD21, YY1, and H3K27ac ChIP-seq peaks, more accurate prediction can be achieved. Our feature extraction method provides novel insights into predicting cell-type-specific chromatin loops and reveals the relationship between chromatin state and chromatin loop formation.
Affiliation(s)
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Ranran Jia
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- Rui Hou
  - College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010051, China
- Chengbing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba 623002, China
8
Sun SL, Zhou BW, Liu SZ, Xiu YH, Bilal A, Long HX. Prediction of miRNAs and diseases association based on sparse autoencoder and MLP. Front Genet 2024; 15:1369811. [PMID: 38873111 PMCID: PMC11169787 DOI: 10.3389/fgene.2024.1369811] [Received: 01/13/2024] [Accepted: 05/07/2024] [Indexed: 06/15/2024]
Abstract
Introduction: MicroRNAs (miRNAs) are small non-coding RNA molecules that have multiple important regulatory roles within cells. As research on miRNAs deepens, a growing number of studies show that the abnormal expression of miRNAs is closely related to various diseases. The relationship between miRNAs and diseases is crucial for discovering the pathogenesis of diseases and exploring new treatment methods. Methods: Therefore, we propose a new sparse autoencoder and MLP method (SPALP) to predict the association between miRNAs and diseases. In this study, we adopt advanced deep learning technologies, including a sparse autoencoder and a multi-layer perceptron (MLP), to improve the accuracy of predicting miRNA-disease associations. Firstly, the SPALP model uses a sparse autoencoder to perform feature learning and extract the initial features of miRNAs and diseases separately, obtaining the latent features of miRNAs and diseases. Then, the latent features are combined with miRNA functional similarity data and disease semantic similarity data to construct comprehensive miRNA-disease datasets. Subsequently, the MLP model predicts the unknown associations between miRNAs and diseases. Results: To verify the performance of our model, we set up several comparative experiments. The experimental results show that, compared with traditional methods and other deep learning prediction methods, our method significantly improves the accuracy of predicting miRNA-disease associations, with 94.61% accuracy and an AUC value of 0.9859. Finally, we conducted a case study of the SPALP model. We predicted the top 30 miRNAs that might be related to five diseases of the elderly (Lupus Erythematosus, Acute Myeloid Leukemia, Cardiovascular, Stroke, and Diabetes Mellitus) and validated that 27, 29, 29, 30, and 30 of the top 30, respectively, are indeed associated.
Discussion: The SPALP approach introduced in this study is adept at forecasting the links between miRNAs and diseases, addressing the complexities of analyzing extensive bioinformatics datasets and enriching the understanding of the contribution of miRNAs to disease progression.
Affiliation(s)
- Si-Lin Sun
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Bing-Wei Zhou
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Sheng-Zheng Liu
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Yu-Han Xiu
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Anas Bilal
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
- Hai-Xia Long
  - Department of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
9
Cheng N, Wang L, Liu Y, Song B, Ding C. HANSynergy: Heterogeneous Graph Attention Network for Drug Synergy Prediction. J Chem Inf Model 2024; 64:4334-4347. [PMID: 38709204 PMCID: PMC11135324 DOI: 10.1021/acs.jcim.4c00003] [Received: 01/01/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/07/2024]
Abstract
Drug synergy therapy is a promising strategy for cancer treatment. However, the extensive variety of available drugs and the time-intensive process of determining effective drug combinations through clinical trials pose significant challenges. It requires a reliable method for the rapid and precise selection of drug synergies. In response, various computational strategies have been developed for predicting drug synergies, yet the exploitation of heterogeneous biological network features remains underexplored. In this study, we construct a heterogeneous graph that encompasses diverse biological entities and interactions, utilizing rich data sets from sources, such as DrugCombDB, PubChem, UniProt, and cancer cell line encyclopedia (CCLE). We initialize node feature representations and introduce a novel virtual node to enhance drug representation. Our proposed method, the heterogeneous graph attention network for drug-drug synergy prediction (HANSynergy), has been experimentally validated to demonstrate that the heterogeneous graph attention network can extract key node features, efficiently harness the diversity of information, and further enhance network functionality through the incorporation of a multihead attention mechanism. In the comparative experiment, the highest accuracy (Acc) and area under the curve (AUC) are 0.877 and 0.947, respectively, in DrugCombDB_early data set, demonstrating the superiority of HANSynergy over the competing methods. Moreover, protein-protein interactions are important in understanding the mechanism of action of drugs. The heterogeneous attention mechanism facilitates protein-protein interaction analysis. By analyzing the changes of attention weight before and after heterogeneous network training, we investigated proteins that may be associated with drug combinations. Additionally, case studies align our findings with existing research, underscoring the potential of HANSynergy in drug synergy prediction. 
This advancement not only contributes to the burgeoning field of drug synergy prediction but also holds the potential to provide valuable insights and uncover new drug synergies for combating cancer.
Affiliation(s)
- Ning Cheng
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
- Li Wang
  - Degree Programs in Systems and information Engineering, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Yiping Liu
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Bosheng Song
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Changsong Ding
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
  - Big Data Analysis Laboratory of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
10
Cui Y, Liu H, Ming Y, Zhang Z, Liu L, Liu R. Prediction of strand-specific and cell-type-specific G-quadruplexes based on high-resolution CUT&Tag data. Brief Funct Genomics 2024; 23:265-275. [PMID: 37357985 DOI: 10.1093/bfgp/elad024] [Received: 03/17/2023] [Revised: 05/20/2023] [Accepted: 06/01/2023] [Indexed: 06/27/2023]
Abstract
G-quadruplex (G4), a non-classical deoxyribonucleic acid structure, is widely distributed in the genome and involved in various biological processes. In vivo, high-throughput sequencing has indicated that G4s are significantly enriched at functional regions in a cell-type-specific manner. Therefore, computational prediction of G4s is needed as an alternative to time-consuming and laborious experimental methods. Recently, G4 CUT&Tag has been developed to generate higher-resolution sequencing data than ChIP-seq, which provides more accurate training samples for model construction. In this paper, we present a new dataset construction method based on G4 CUT&Tag sequencing data and an XGBoost prediction model based on the boosting machine learning method. The results show that our model performs well within and across cell types. Furthermore, sequence analysis indicates that the formation of the G4 structure is greatly affected by the flanking sequences, and the GC content of the G4 flanking sequences is higher than that of non-G4 sequences. Moreover, we also identified G4 motifs in the high-resolution dataset, among which we found several motifs for known transcription factors (TFs), such as SP2 and BPC. These TFs may directly or indirectly affect the formation of the G4 structure.
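The flanking-sequence GC analysis described in this abstract comes down to a simple calculation, sketched below. The coordinate convention (0-based, half-open), the default flank length of 50 bases, and the function names are illustrative assumptions; the abstract does not state the window size the authors used.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flank_gc(genome_seq, start, end, flank=50):
    """GC content of the upstream and downstream flanks of a region.

    The region is genome_seq[start:end] (0-based, half-open). Flanks are
    truncated at the sequence boundaries. The flank length is a
    hypothetical default, not a value from the paper.
    """
    upstream = genome_seq[max(0, start - flank):start]
    downstream = genome_seq[end:end + flank]
    return gc_content(upstream), gc_content(downstream)
```

Comparing the distribution of these values for G4 regions against matched non-G4 regions is one way to reproduce the kind of comparison the abstract reports.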
Affiliation(s)
- Yizhi Cui
  - School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, Zhejiang, China
- Hongzhi Liu
  - School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Yutong Ming
  - School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Zheng Zhang
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, 36830, Alabama, USA
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, Zhejiang, China
- Ruijun Liu
  - School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
11
He T, Gao Z, Lin L, Zhang X, Zou Q. Prognostic signature analysis and survival prediction of esophageal cancer based on N6-methyladenosine associated lncRNAs. Brief Funct Genomics 2024; 23:239-248. [PMID: 37465899 DOI: 10.1093/bfgp/elad028] [Received: 04/22/2022] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/20/2023]
Abstract
Esophageal cancer (ESCA) has a poor prognosis. Long non-coding RNA (lncRNA) affects cell proliferation. However, the prognostic function of N6-methyladenosine (m6A)-associated lncRNAs (m6A-lncRNAs) in ESCA remains unknown. Univariate Cox analysis was applied to investigate prognosis-related m6A-lncRNAs, based on which the samples were clustered. Wilcoxon rank and Chi-square tests were used to compare the clinical traits, survival, pathway activity and immune infiltration in different clusters; overall survival, clinical traits (N stage), tumor-invasive immune cells and pathway activity were found to be significantly different. Through the least absolute shrinkage and selection operator and proportional hazard (Lasso-Cox) model, five m6A-lncRNAs were selected to construct the prognostic signature (m6A-lncSig) and risk score. To investigate the link between risk score and clinical traits or immunological microenvironments, the Chi-square test and Spearman correlation analysis were used. Risk score was found to be associated with N stage, tumor stage, different clusters, macrophages M2, B cells naive and T cells CD4 memory resting. Risk score and tumor stage were found to be independent prognostic variables, and the constructed nomogram model had high accuracy in predicting prognosis. The obtained m6A-lncSig could serve as a potential prognostic biomarker for ESCA patients. This study offers a theoretical foundation for clinical diagnosis and prognosis of ESCA.
Affiliation(s)
- Ting He
  - School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
- Zhipeng Gao
  - Beidahuang Industry Group General Hospital, Harbin 150000, China
- Ling Lin
  - Yucai School Attached to Sichuan Chengdu No. 7 High School, Chengdu 610503, China
- Xu Zhang
  - School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
  - School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
Collapse
|
12
|
Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics (Oxford, England) 2024; 40:btae306. [PMID: 38715444 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024]
Abstract
MOTIVATION Exploring potential associations between diseases can help in understanding pathological mechanisms of diseases and facilitating the discovery of candidate biomarkers and drug targets, thereby promoting disease diagnosis and treatment. Some computational methods have been proposed for measuring disease similarity. However, these methods describe diseases without considering their latent multi-molecule regulation and valuable supervision signals, resulting in limited biological interpretability and limited efficiency in capturing association patterns. RESULTS In this study, we propose a new computational method named DiSMVC. Unlike existing predictors, DiSMVC uses a supervised graph collaborative framework to measure disease similarity. Multiple bio-entity associations related to genes and miRNAs are integrated via cross-view graph contrastive learning to extract informative disease representations, and association pattern joint learning is then applied to compute disease similarity by incorporating phenotype-annotated disease associations. The experimental results show that DiSMVC can draw discriminative characteristics for disease pairs and outperforms other state-of-the-art methods. As a result, DiSMVC is a promising method for predicting disease associations with molecular interpretability. AVAILABILITY AND IMPLEMENTATION Datasets and source code are available at https://github.com/Biohang/DiSMVC.
Affiliation(s)
- Hang Wei
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
- Shuai Wu
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
- Yina Jiang
  - Department of Basic Medicine, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
- Bin Liu
  - Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
|
13
|
Ma X, Li Z, Du Z, Xu Y, Chen Y, Zhuo L, Fu X, Liu R. Advancing cancer driver gene detection via Schur complement graph augmentation and independent subspace feature extraction. Comput Biol Med 2024; 174:108484. [PMID: 38643595 DOI: 10.1016/j.compbiomed.2024.108484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 03/18/2024] [Accepted: 04/15/2024] [Indexed: 04/23/2024]
Abstract
Accurately identifying cancer driver genes (CDGs) is crucial for guiding cancer treatment and has recently received great attention from researchers. However, the high complexity and heterogeneity of cancer gene regulatory networks limit the prediction accuracy of existing deep learning models. To address this, we introduce a model called SCIS-CDG that utilizes Schur complement graph augmentation and independent subspace feature extraction techniques to effectively predict potential CDGs. Firstly, a random Schur complement strategy is adopted to generate two augmented views of the gene network within a graph contrastive learning framework. Rapid randomization of the random Schur complement strategy enhances the model's generalization and its ability to handle complex networks effectively. Upholding the Schur complement principle in expectation promotes the preservation of the original gene network's vital structure in the augmented views. Subsequently, we employ feature extraction using multiple independent subspaces, each trained with independent weights to reduce inter-subspace dependence and improve the model's expressiveness. Concurrently, we introduce a feature expansion component based on the structure of the gene network to address issues arising from the limited dimensionality of node features. Moreover, it can alleviate the challenges posed by the heterogeneity of cancer gene networks to some extent. Finally, we integrate a learnable attention weight mechanism into the graph neural network (GNN) encoder, utilizing feature expansion to optimize the significance of various feature levels in the prediction task. Following extensive experimental validation, the SCIS-CDG model has exhibited high efficiency in identifying known CDGs and uncovering potential unknown CDGs in external datasets. In particular, its performance improves significantly on that of previous conventional GNN models.
The code and data are publicly available at: https://github.com/mxqmxqmxq/SCIS-CDG.
Affiliation(s)
- Xinqian Ma
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
- Zhen Li
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou 558000, China; Institute of Computational Science and Technology, Guangzhou University, 510000, Guangzhou, China
- Zhenya Du
  - Guangzhou Xinhua University, 510520, Guangzhou, China
- Yan Xu
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
- Yifan Chen
  - College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan, 410004, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, 410012, Changsha, China
- Ruijun Liu
  - School of Software, Beihang University, Beijing, China
|
14
|
Chen W, Zhang Y, Wu W, Yang H, Huang W. Machine learning-based predictive model for abdominal diseases using physical examination datasets. Comput Biol Med 2024; 173:108249. [PMID: 38531251 DOI: 10.1016/j.compbiomed.2024.108249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 02/21/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024]
Abstract
Abdominal ultrasound is a key non-invasive imaging method for diagnosing liver, kidney, and gallbladder diseases. Despite its clinical significance, not all individuals can undergo abdominal ultrasonography during routine health check-ups due to limitations in equipment, cost, and time. This study aims to use basic physical examination data to predict the risk of diseases of the liver, kidney, and gallbladder that can be diagnosed via abdominal ultrasound. Using basic physical examination data (gender, age, height, weight, BMI, pulse, systolic blood pressure (SBP), diastolic blood pressure (DBP), high-density lipoprotein (HDL), low-density lipoprotein (LDL), total cholesterol, triglycerides, fasting blood glucose (FBG), and uric acid), we established seven single-label predictive models and one multi-label predictive model. These models were specifically designed to predict a range of abdominal diseases. The single-label models, utilizing the XGBoost algorithm, targeted diseases such as fatty liver (with an Area Under the Curve (AUC) of 0.9344), liver deposits (AUC: 0.8221), liver cysts (AUC: 0.7928), gallbladder polyps (AUC: 0.7508), kidney stones (AUC: 0.7853), kidney cysts (AUC: 0.8241), and kidney crystals (AUC: 0.7536). Furthermore, a comprehensive multi-label model, capable of predicting multiple conditions simultaneously, was established by FCN and achieved an AUC of 0.6344. We conducted interpretability analysis on these models to make them easier to understand and apply in clinical settings. The insights gained from this analysis are crucial for the development of targeted disease prevention strategies. This study represents a significant advancement in utilizing physical examination data to predict ultrasound results, offering a novel approach to early diagnosis and prevention of abdominal diseases.
Affiliation(s)
- Wei Chen
  - Zhejiang Academy of Traditional Chinese Medicine Culture, Zhejiang Chinese Medical University, Hangzhou, China; Four Provincial Marginal Traditional Chinese Medicine Hospitals (Quzhou Traditional Chinese Medicine Hospital) Affiliated to Zhejiang University of Traditional Chinese Medicine, Quzhou, China
- YuJie Zhang
  - Zhejiang Academy of Traditional Chinese Medicine Culture, Zhejiang Chinese Medical University, Hangzhou, China
- Weili Wu
  - Four Provincial Marginal Traditional Chinese Medicine Hospitals (Quzhou Traditional Chinese Medicine Hospital) Affiliated to Zhejiang University of Traditional Chinese Medicine, Quzhou, China
- Hui Yang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Wenxiu Huang
  - Zhejiang Academy of Traditional Chinese Medicine Culture, Zhejiang Chinese Medical University, Hangzhou, China
|
15
|
Gu ZF, Hao YD, Wang TY, Cai PL, Zhang Y, Deng KJ, Lin H, Lv H. Prediction of blood-brain barrier penetrating peptides based on data augmentation with Augur. BMC Biol 2024; 22:86. [PMID: 38637801 PMCID: PMC11027412 DOI: 10.1186/s12915-024-01883-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/05/2024] [Indexed: 04/20/2024]
Abstract
BACKGROUND The blood-brain barrier serves as a critical interface between the bloodstream and brain tissue, mainly composed of pericytes, neurons, endothelial cells, and tightly connected basal membranes. It plays a pivotal role in safeguarding the brain from harmful substances, thus protecting the integrity of the nervous system and preserving overall brain homeostasis. However, this remarkable selective transmission also poses a formidable challenge in the treatment of central nervous system diseases, hindering the delivery of large-molecule drugs into the brain. In response to this challenge, many researchers have devoted themselves to developing drug delivery systems capable of breaching the blood-brain barrier. Among these, blood-brain barrier penetrating peptides have emerged as promising candidates. These peptides have the advantages of high biosafety, ease of synthesis, and exceptional penetration efficiency, making them an effective drug delivery solution. While previous studies have developed a few prediction models for blood-brain barrier penetrating peptides, their performance has often been hampered by limited positive data. RESULTS In this study, we present Augur, a novel prediction model using borderline-SMOTE-based data augmentation and machine learning. We extract highly interpretable physicochemical properties of blood-brain barrier penetrating peptides while solving the issues of small sample size and imbalance of positive and negative samples. Experimental results demonstrate the superior prediction performance of Augur with an AUC value of 0.932 on the training set and 0.931 on the independent test set. CONCLUSIONS This newly developed Augur model demonstrates superior performance in predicting blood-brain barrier penetrating peptides, offering valuable insights for drug development targeting neurological disorders. This breakthrough may enhance the efficiency of peptide-based drug discovery and pave the way for innovative treatment strategies for central nervous system diseases.
Affiliation(s)
- Zhi-Feng Gu
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Yu-Duo Hao
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Tian-Yu Wang
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Pei-Ling Cai
  - School of Basic Medical Sciences, Chengdu University, Chengdu, 610106, PR China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, PR China
- Ke-Jun Deng
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Hao Lin
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Hao Lv
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
|
16
|
Zhang ZY, Zhang Z, Ye X, Sakurai T, Lin H. A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens. Int J Biol Macromol 2024; 265:130659. [PMID: 38462114 DOI: 10.1016/j.ijbiomac.2024.130659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/19/2024] [Accepted: 03/04/2024] [Indexed: 03/12/2024]
Abstract
Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. The conventional detection of lncRNA subcellular location usually uses in situ detection techniques, which are resource intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low level of conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the BERT pre-trained language algorithm, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-average area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.
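For readers unfamiliar with the metric, a micro-average AUROC can be sketched as follows: one-vs-rest indicators and predicted scores are pooled across all classes, and a single AUROC is computed on the pooled pairs. This is a minimal pure-Python sketch assuming a single-label multiclass setting; the three classes and the scores are made up, and the evaluation code actually used in the study may differ.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_auroc(true_classes, score_rows, n_classes):
    """Pool one-vs-rest indicators and scores over all classes, then compute one AUROC."""
    flat_labels, flat_scores = [], []
    for cls, row in zip(true_classes, score_rows):
        for k in range(n_classes):
            flat_labels.append(1 if k == cls else 0)
            flat_scores.append(row[k])
    return auroc(flat_labels, flat_scores)


# Hypothetical example: three samples, three location classes (indices 0..2).
true_classes = [0, 1, 2]
score_rows = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
]
result = micro_auroc(true_classes, score_rows, n_classes=3)
```

The pairwise comparison is quadratic in the number of pooled examples, which is fine for illustration; a rank-based implementation scales better on real datasets.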
Affiliation(s)
- Zhao-Yue Zhang
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Zheng Zhang
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
  - Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
|
17
|
Chen M, Sun M, Su X, Tiwari P, Ding Y. Fuzzy kernel evidence Random Forest for identifying pseudouridine sites. Brief Bioinform 2024; 25:bbae169. [PMID: 38622357 PMCID: PMC11018548 DOI: 10.1093/bib/bbae169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/27/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024]
Abstract
Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. The resulting method, called PseU-FKeERF, demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selects four RNA feature coding schemes with relatively good performance for feature combination, and then feeds them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a reference for disease control and related drug development in the future.
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Mingai Sun
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Xi Su
  - Foshan Women and Children Hospital, Foshan 528000, China
- Prayag Tiwari
  - School of Information Technology, Halmstad University, Sweden
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
|
18
|
Ju H, Cui Y, Su Q, Juan L, Manavalan B. CODENET: A deep learning model for COVID-19 detection. Comput Biol Med 2024; 171:108229. [PMID: 38447500 DOI: 10.1016/j.compbiomed.2024.108229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/20/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024]
Abstract
Conventional COVID-19 testing methods have some flaws: they are expensive and time-consuming. Chest X-ray (CXR) diagnostic approaches can alleviate these flaws to some extent. However, there is no accurate and practical automatic diagnostic framework with good interpretability. The application of artificial intelligence (AI) technology to medical radiography can help to accurately detect the disease, reduce the burden on healthcare organizations, and provide good interpretability. Therefore, this study proposes CodeNet, a new deep convolutional neural network (CNN) based on CXR for COVID-19 diagnosis. This method uses contrastive learning to make full use of latent image data to enhance the model's ability to extract features and generalize across different data domains. On the evaluation dataset, the proposed method achieves an accuracy as high as 94.20%, outperforming several other existing methods used for comparison. Ablation studies validate the efficacy of the proposed method, while interpretability analysis shows that the method can effectively guide clinical professionals. This work demonstrates the superior detection performance of a CNN using contrastive learning techniques on CXR images, paving the way for computer vision and artificial intelligence technologies to leverage massive medical data for disease diagnosis.
Affiliation(s)
- Hong Ju
  - Heilongjiang Agricultural Engineering Vocational College, China
- Yanyan Cui
  - Beidahuang Industry Group General Hospital, Harbin, China
- Qiaosen Su
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Liran Juan
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
|
19
|
Qi R, Zhang Z, Wu J, Dou L, Xu L, Cheng Y. A new method for handling heterogeneous data in bioinformatics. Comput Biol Med 2024; 170:107937. [PMID: 38217975 DOI: 10.1016/j.compbiomed.2024.107937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 12/25/2023] [Accepted: 01/01/2024] [Indexed: 01/15/2024]
Abstract
Heterogeneous data, especially mixtures of numerical and categorical data, are widespread in bioinformatics. Most existing works focus on defining new distance metrics rather than learning discriminative metrics for mixed data. Here, we create a new support vector heterogeneous metric learning framework for mixed data. A heterogeneous sample pair kernel is defined for mixed data, and metric learning is then converted to a sample pair classification problem. The suggested approach can be solved effectively with conventional support vector machine solvers. Empirical assessments on mixed data benchmarks and cancer datasets confirm the strong performance of the proposed modeling technique.
Affiliation(s)
- Ren Qi
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Zehua Zhang
  - Scientific and Technological Innovation Center, Beijing, China
- Jin Wu
  - School of Management, Shenzhen Polytechnic University, Shenzhen, China
- Lijun Dou
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, China
- Yue Cheng
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, China
|
20
|
Gu X, Liu J, Yu Y, Xiao P, Ding Y. MFD-GDrug: multimodal feature fusion-based deep learning for GPCR-drug interaction prediction. Methods 2024; 223:75-82. [PMID: 38286333 DOI: 10.1016/j.ymeth.2024.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 01/14/2024] [Accepted: 01/26/2024] [Indexed: 01/31/2024]
Abstract
The accurate identification of drug-protein interactions (DPIs) is crucial in drug development, especially concerning G protein-coupled receptors (GPCRs), which are vital targets in drug discovery. However, experimental validation of GPCR-drug pairings is costly, prompting the need for accurate predictive methods. To address this, we propose MFD-GDrug, a multimodal deep learning model. Leveraging the ESM pretrained model, we extract protein features and employ a CNN for protein feature representation. For drugs, we integrated multimodal features of drug molecular structures, including three-dimensional features derived from Mol2vec and the topological information of drug graph structures extracted through Graph Convolutional Neural Networks (GCN). By combining structural characterizations and pretrained embeddings, our model effectively captures GPCR-drug interactions. Our tests on leading GPCR-drug interaction datasets show that MFD-GDrug outperforms other methods, demonstrating superior predictive accuracy.
Affiliation(s)
- Xingyue Gu
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yue Yu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Pengfeng Xiao
  - State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
|
21
|
Zhang HQ, Liu SH, Li R, Yu JW, Ye DX, Yuan SS, Lin H, Huang CB, Tang H. MIBPred: Ensemble Learning-Based Metal Ion-Binding Protein Classifier. ACS Omega 2024; 9:8439-8447. [PMID: 38405489 PMCID: PMC10882704 DOI: 10.1021/acsomega.3c09587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 02/27/2024]
Abstract
In biological organisms, metal ion-binding proteins participate in numerous metabolic activities and are closely associated with various diseases. To accurately predict whether a protein binds to metal ions and the type of metal ion-binding protein, this study proposed a classifier named MIBPred. The classifier incorporated advanced Word2Vec technology from the field of natural language processing to extract semantic features of the protein sequence language and combined them with position-specific score matrix (PSSM) features. Furthermore, an ensemble learning model was employed for the metal ion-binding protein classification task. In the model, we independently trained XGBoost, LightGBM, and CatBoost algorithms and integrated the output results through an SVM voting mechanism. This innovative combination has led to a significant breakthrough in the predictive performance of our model. As a result, we achieved accuracies of 95.13% and 85.19%, respectively, in predicting metal ion-binding proteins and their types. Our research not only confirms the effectiveness of Word2Vec technology in extracting semantic information from protein sequences but also highlights the outstanding performance of the MIBPred classifier in the problem of metal ion-binding protein types. This study provides a reliable tool and method for the in-depth exploration of the structure and function of metal ion-binding proteins.
Affiliation(s)
- Hong-Qi Zhang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shang-Hua Liu
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rui Li
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jun-Wen Yu
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dong-Xin Ye
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba 623002, China
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
  - Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou 646000, China
|
22
|
Zhang J, Wang R, Wei L. MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins. J Chem Inf Model 2024; 64:1050-1065. [PMID: 38301174 DOI: 10.1021/acs.jcim.3c01471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.
Affiliation(s)
- Jiashuo Zhang
  - School of Software, Shandong University, Jinan 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan 250101, China
- Leyi Wei
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
|
23
|
Niu M, Wang C, Zhang Z, Zou Q. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation. BMC Biol 2024; 22:24. [PMID: 38281919 PMCID: PMC10823650 DOI: 10.1186/s12915-024-01826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024] Open
Abstract
BACKGROUND Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, based on the graph Markov neural network algorithm (GMNN) constructed in our previous work GMNN2CD, we further considered the multisource biological data that affect the association between circRNAs and diseases, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify its prediction results. RESULTS CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, reasonably abstracts multiomics data, and models them as a heterogeneous biomolecular association network, which can reflect the complex relationships between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In addition, a quantitative real-time PCR (RT-qPCR) experiment on human HCC tissue samples found that five circRNAs were significantly differentially expressed, which showed that CircDA can predict diseases related to new circRNAs. CONCLUSIONS This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server (http://server.malab.cn/CircDA) is provided, and the code is open-sourced (https://github.com/nmt315320/CircDA.git) for the convenience of algorithm improvement.
Affiliation(s)
- Mengting Niu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China
- Zhanguo Zhang
  - Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1095 Jiefang Avenue, Wuhan, 430030, China.
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 4 Block 2 North Jianshe Road, Chengdu, 610054, China.
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
|
24
|
Jiang J, Pei H, Li J, Li M, Zou Q, Lv Z. FEOpti-ACVP: identification of novel anti-coronavirus peptide sequences based on feature engineering and optimization. Brief Bioinform 2024; 25:bbae037. [PMID: 38366802 PMCID: PMC10939380 DOI: 10.1093/bib/bbae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/27/2023] [Accepted: 01/17/2024] [Indexed: 02/18/2024] Open
Abstract
Anti-coronavirus peptides (ACVPs) represent a relatively novel approach to inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors have shown promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. In the next step, several machine learning approaches were tested to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods used for ACVP prediction, and it has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/.
Affiliation(s)
- Jici Jiang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
|
25
|
Pang Y, Liu B. DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model. BMC Biol 2024; 22:3. [PMID: 38166858 PMCID: PMC10762911 DOI: 10.1186/s12915-023-01803-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open
Abstract
Intrinsically disordered proteins and regions (IDPs/IDRs) are functionally important proteins and regions that exist as highly dynamic conformations under natural physiological conditions. IDPs/IDRs exhibit a broad range of molecular functions, and their functions involve binding interactions with partners and remaining native structural flexibility. The rapid increase in the number of proteins in sequence databases and the diversity of disordered functions challenge existing computational methods for predicting protein intrinsic disorder and disordered functions. A disordered region interacts with different partners to perform multiple functions, and these disordered functions exhibit different dependencies and correlations. In this study, we introduce DisoFLAG, a computational method that leverages a graph-based interaction protein language model (GiPLM) for jointly predicting disorder and its multiple potential functions. GiPLM integrates protein semantic information based on pre-trained protein language models into graph-based interaction units to enhance the correlation of the semantic representation of multiple disordered functions. The DisoFLAG predictor takes amino acid sequences as the only inputs and provides predictions of intrinsic disorder and six disordered functions for proteins, including protein-binding, DNA-binding, RNA-binding, ion-binding, lipid-binding, and flexible linker. We evaluated the predictive performance of DisoFLAG following the Critical Assessment of protein Intrinsic Disorder (CAID) experiments, and the results demonstrated that DisoFLAG offers accurate and comprehensive predictions of disordered functions, extending the current coverage of computationally predicted disordered function categories. The standalone package and web server of DisoFLAG have been established to provide accurate prediction tools for intrinsic disorders and their associated functions.
Affiliation(s)
- Yihe Pang
  - School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China.
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Beijing, Haidian District, 100081, China.
|
26
|
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but because they generally contain redundant data, it is necessary to screen them and select the important ones. RESULT Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to prevent the drawbacks of data explosion and excessive reliance on feature sorting results. The comparative experimental results on 27 feature-sorting datasets show that IIFS can find more accurate and important features than existing methods. The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we wrap this method and deploy it on a web server for access at http://112.124.26.17:8005/.
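For orientation, the following is a minimal sketch of classic incremental feature selection (IFS), the baseline that IIFS is described as improving, and not the IIFS search strategy itself, which the abstract does not specify. The `evaluate` callback, the feature names, and the toy scorer are hypothetical stand-ins for a real cross-validated classifier score.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Classic IFS: add ranked features one at a time and keep the best-scoring prefix.

    ranked_features: feature names sorted from most to least important.
    evaluate: callable mapping a list of features to a score (higher is better).
    """
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical per-feature gains; a real scorer would train and validate a model.
GAIN = {"f1": 0.30, "f2": 0.25, "f3": 0.00, "f4": 0.02}


def toy_score(subset):
    # Reward useful features, penalize subset size to mimic redundancy cost.
    return sum(GAIN[f] for f in subset) - 0.05 * len(subset)


subset, score = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
print(subset, round(score, 2))
```

Because this baseline only ever evaluates prefixes of the ranking, its result depends entirely on the sort order, which is the reliance on feature sorting results that the abstract says IIFS is designed to reduce.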
Affiliation(s)
- Chaolu Meng
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
- Ye Yuan
  - Beidahuang Industry Group General Hospital, Harbin, 150001, China
- Haiyan Zhao
  - College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
- Yue Pei
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
- Zhi Li
  - Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China.
|
27
|
Xing W, Zhang J, Li C, Huo Y, Dong G. iAMP-Attenpred: a novel antimicrobial peptide predictor based on BERT feature extraction method and CNN-BiLSTM-Attention combination model. Brief Bioinform 2023; 25:bbad443. [PMID: 38055840 PMCID: PMC10699745 DOI: 10.1093/bib/bbad443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/31/2023] [Accepted: 11/11/2023] [Indexed: 12/08/2023] Open
Abstract
As a kind of small molecule protein that can fight against various microorganisms in nature, antimicrobial peptides (AMPs) play an indispensable role in maintaining the health of organisms and fortifying defenses against diseases. Nevertheless, experimental approaches for AMP identification still demand substantial human and material resources. Alternatively, computational approaches can help researchers predict AMPs effectively and promptly. In this study, we present a novel AMP predictor called iAMP-Attenpred. As far as we know, this is the first work that not only employs the popular BERT model from the field of natural language processing (NLP) for AMP feature encoding, but also utilizes the idea of combining multiple models to discover AMPs. First, we treat each amino acid from preprocessed AMP and non-AMP sequences as a word, and then input it into a pre-trained BERT model for feature extraction. Next, the features obtained from BERT are fed to a composite model composed of a one-dimensional CNN, a BiLSTM and an attention mechanism to obtain more discriminative features. Finally, a flatten layer and several fully connected layers are utilized for the final classification of AMPs. Experimental results reveal that, compared with the existing predictors, our iAMP-Attenpred predictor achieves better performance indicators, such as accuracy and precision. This further demonstrates that using the BERT approach to capture effective feature information of peptide sequences and combining multiple deep learning models are effective and meaningful for predicting AMPs.
Affiliation(s)
- Wenxuan Xing
  - School of Computer Science and Engineering, Northeastern University, No.195 Chuangxin Road, Hunnan District, Shenyang 110170, China
- Jie Zhang
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, No.29 Erdos East Street, Saihan District, Hohhot 010011, China
- Chen Li
  - School of Computer Science and Engineering, Northeastern University, No.195 Chuangxin Road, Hunnan District, Shenyang 110170, China
- Yujia Huo
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, No.29 Erdos East Street, Saihan District, Hohhot 010011, China
- Gaifang Dong
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, No.29 Erdos East Street, Saihan District, Hohhot 010011, China
|
28
|
Wan H, Zhang Y, Huang S. Prediction of thermophilic protein using 2-D general series correlation pseudo amino acid features. Methods 2023; 218:141-148. [PMID: 37604248 DOI: 10.1016/j.ymeth.2023.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 07/08/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023] Open
Abstract
The demand for thermophilic proteins has been increasing in protein engineering recently. Many machine-learning methods for identifying thermophilic proteins have emerged during this period. However, most machine learning-based thermophilic protein identification studies have only focused on accuracy. The relationship between the features' meaning and the proteins' physicochemical properties has yet to be studied in depth. In this article, we focused on the relationship between the features and the thermal stability of thermophilic proteins. This method used 2-D general series correlation pseudo amino acid (SC-PseAAC-General) features and achieved an accuracy of 82.76% using the J48 classifier. In addition, this research found higher frequencies of glutamic acid in thermophilic proteins, which help thermophilic proteins maintain their thermal stability by forming hydrogen bonds and salt bridges that prevent denaturation at high temperatures.
Affiliation(s)
- Hao Wan
  - College of Life Science, Qingdao University, Qingdao 266071, China.
- Yanan Zhang
  - College of Life Science, Qingdao University, Qingdao 266071, China
- Shibo Huang
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
|
29
|
Xu X, Gao L, Yu L. GOLF-Net: Global and local association fusion network for COVID-19 lung infection segmentation. Comput Biol Med 2023; 164:107361. [PMID: 37595522 DOI: 10.1016/j.compbiomed.2023.107361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 07/27/2023] [Accepted: 08/12/2023] [Indexed: 08/20/2023]
Abstract
The global spread of the Corona Virus Disease 2019 (COVID-19) has caused significant health hazards, leading researchers to explore new methods for detecting lung infections that can supplement molecular diagnosis. Computed tomography (CT) has emerged as a promising tool, although accurately segmenting infected areas in COVID-19 CT scans, especially given the limited available data, remains a challenge for deep learning models. To address this issue, we propose a novel segmentation network, the GlObal and Local association Fusion Network (GOLF-Net), that combines global and local features from Convolutional Neural Networks and Transformers, respectively. Our network leverages attention mechanisms to enhance the correlation and representation of local features, improving the accuracy of infected area segmentation. Additionally, we implement transfer learning to pretrain our network parameters, providing a robust solution to the issue of limited COVID-19 CT data. Our experimental results demonstrate that the segmentation performance of our network exceeds that of most existing models, with a Dice coefficient of 95.09% and an IoU of 92.58%.
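The two segmentation metrics quoted above have standard definitions: Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B| for a predicted mask A and a ground-truth mask B. The sketch below computes both on flattened binary masks; the function name and the toy masks are illustrative assumptions, and it says nothing about how the authors aggregated scores across CT slices.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty: treated here as a perfect match by convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy masks: three predicted pixels, three true pixels, two of them shared.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, truth)
print(round(dice, 4), iou)
```

For a single pair of masks the two scores are tied by IoU = Dice / (2 - Dice), so Dice is never smaller than IoU.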
Affiliation(s)
- Xinyu Xu
  - School of Computer Science and Technology, Xidian University, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, China.
|
30
|
Fan R, Ding Y, Zou Q, Yuan L. Multi-view local hyperplane nearest neighbor model based on independence criterion for identifying vesicular transport proteins. Int J Biol Macromol 2023; 247:125774. [PMID: 37437677 DOI: 10.1016/j.ijbiomac.2023.125774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 06/30/2023] [Accepted: 07/07/2023] [Indexed: 07/14/2023]
Abstract
Vesicular transport proteins participate in various biological processes and play a significant role in the movement of substances within cells. These proteins are associated with numerous human diseases, making their identification particularly important. In this study, we developed a novel strategy for accurately identifying vesicular transport proteins. The strategy is built on a new multi-view classifier, the graph-regularized k-local hyperplane distance nearest neighbor model (HSIC-GHKNN), which combines the Hilbert-Schmidt independence criterion (HSIC)-based multi-view learning method with a local hyperplane distance nearest-neighbor classifier. We first extracted protein evolution information using two feature extraction methods, pseudo-position-specific scoring matrix (PsePSSM) and AATP, and addressed dataset imbalance using the Edited Nearest Neighbors (ENN) algorithm. Subsequently, we employed a local hyperplane distance nearest-neighbor classifier for each view identification and added an HSIC term to maintain independence between views. We then assessed the performance of our identification strategy and analyzed the PsePSSM and AATP feature sets to determine the influencing factors of the classification results. The experimental results demonstrate that the accuracy and Matthews correlation coefficient of our strategy on the independent test set are 85.8% and 0.548, respectively. Our approach outperformed existing methods in most evaluation metrics. In addition, the proposed multi-view classification model can easily be applied to similar identification tasks.
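Accuracy and the Matthews correlation coefficient (MCC) are both functions of the binary confusion matrix, and MCC is commonly reported on imbalanced data because it uses all four cells. The sketch below uses the standard formulas; the counts in the example are invented for illustration and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # When any marginal is zero the MCC is undefined; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, chosen only to exercise the formulas.
acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(round(acc, 3), round(mcc, 3))
```

A classifier that labels every sample as the majority class can still reach high accuracy, but its MCC falls to 0 under this convention, which is why the two numbers are usually read together.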
Affiliation(s)
- Rui Fan
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China.
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China.
- Lei Yuan
  - Department of Hepatobiliary Surgery, Quzhou People's Hospital, Quzhou, Zhejiang 324000, China.
|
31
|
Wu D, Fang X, Luan K, Xu Q, Lin S, Sun S, Yang J, Dong B, Manavalan B, Liao Z. Identification of SH2 domain-containing proteins and motifs prediction by a deep learning method. Comput Biol Med 2023; 162:107065. [PMID: 37267826 DOI: 10.1016/j.compbiomed.2023.107065] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2023] [Revised: 04/30/2023] [Accepted: 05/27/2023] [Indexed: 06/04/2023]
Abstract
The Src Homology 2 (SH2) domain plays an important role in the signal transmission mechanism in organisms. It mediates protein-protein interactions based on the binding between phosphotyrosine and motifs in the SH2 domain. In this study, we designed a method to distinguish SH2 domain-containing proteins from non-SH2 domain-containing proteins using deep learning. First, we collected SH2 and non-SH2 domain-containing protein sequences from multiple species. After data preprocessing, we built six deep learning models through DeepBIO and compared their performance. Second, we selected the model with the strongest overall performance, trained and tested it separately again, and analyzed the results visually. We found that a 288-dimensional (288D) feature could effectively distinguish the two types of proteins. Finally, motif analysis discovered the specific motif YKIR and revealed its function in signal transduction. In summary, we successfully identified SH2 domain and non-SH2 domain proteins with a deep learning method and obtained the best-performing 288D features. In addition, we found a new motif, YKIR, in the SH2 domain and analyzed its function, which helps to further understand the signaling mechanisms within the organism.
Affiliation(s)
- Duanzhi Wu
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Xin Fang
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China; Laboratory of Non-communicable Chronic Disease Control, Fujian Provincial Center for Disease Control and Prevention, Fuzhou, 350012, China
- Kai Luan
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Qijin Xu
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Shiqi Lin
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Shiying Sun
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Jiaying Yang
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China; Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Bingying Dong
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China; Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
- Zhijun Liao
  - School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China; Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China.
|
32
|
Deng Y, Ma S, Li J, Zheng B, Lv Z. Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides. Int J Mol Sci 2023; 24:10854. [PMID: 37446031 DOI: 10.3390/ijms241310854] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Revised: 06/17/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
Anticancer peptides (ACPs) represent a promising new therapeutic approach in cancer treatment. They can target cancer cells without affecting healthy tissues or altering normal physiological functions. Machine learning algorithms have increasingly been utilized for predicting peptide sequences with potential ACP effects. This study analyzed four benchmark datasets based on a well-established random forest (RF) algorithm. The peptide sequences were converted into 566 physicochemical features extracted from the amino acid index (AAindex) library, which were then subjected to feature selection using four methods: light gradient-boosting machine (LGBM), analysis of variance (ANOVA), chi-squared test (Chi2), and mutual information (MI). By presenting and merging the selected features using Venn diagrams, we identified 19 key amino acid physicochemical properties that can be used to predict the likelihood of a peptide sequence functioning as an ACP. The results were quantified by performance evaluation metrics to determine the accuracy of predictions. This study aims to enhance the efficiency of designing peptide sequences for cancer treatment.
Affiliation(s)
- Yiting Deng
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Shuhan Ma
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Bowen Zheng
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
|
33
|
Lin Y, Sun M, Zhang J, Li M, Yang K, Wu C, Zulfiqar H, Lai H. Computational identification of promoters in Klebsiella aerogenes by using support vector machine. Front Microbiol 2023; 14:1200678. [PMID: 37250059 PMCID: PMC10215528 DOI: 10.3389/fmicb.2023.1200678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 04/18/2023] [Indexed: 05/31/2023] Open
Abstract
Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate the process of gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and position-correlation scoring function (PCSF). Numerical features were obtained and then optimized using mRMR in combination with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, these optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. Results of 10-fold CV showed that the model could yield an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will help the study of promoters and gene regulation in K. aerogenes.
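As a rough illustration of sequence encoding, the sketch below computes plain k-tuple nucleotide composition, the frequency vector that PseKNC starts from. It is a simplification: PseKNC additionally appends correlation terms derived from physicochemical properties, and PCSF is a separate position-aware score; neither is modeled here. The function name and example sequence are hypothetical.

```python
from itertools import product


def kmer_composition(sequence, k=2):
    """Normalized frequency of every DNA k-mer, ordered lexicographically over ACGT."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / windows for kmer in kmers]


# A made-up 8-nt sequence; real inputs would be promoter and non-promoter regions.
features = kmer_composition("ACGTACGT", k=2)
print(len(features), round(features[1], 4))  # vector length and the AC frequency
```

A vector like this, one per sequence, is the kind of numerical input that a feature selector and an SVM classifier would then consume.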
Affiliation(s)
- Yan Lin
  - Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
- Meili Sun
  - Beidahuang Industry Group General Hospital, Harbin, China
- Junjie Zhang
  - Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
- Mingyan Li
  - Chifeng Product Quality Inspection and Testing Centre, Chifeng, China
- Keli Yang
  - Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
- Chengyan Wu
  - Baotou Teacher’s College, Inner Mongolia University of Science and Technology, Baotou, China
- Hasan Zulfiqar
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Hongyan Lai
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
|
34
|
Wang Y, Zhang Y, Wang J, Xie F, Zheng D, Zou X, Guo M, Ding Y, Wan J, Han K. Prediction of drug-target interactions via neural tangent kernel extraction feature matrix factorization model. Comput Biol Med 2023; 159:106955. [PMID: 37094465 DOI: 10.1016/j.compbiomed.2023.106955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 04/04/2023] [Accepted: 04/16/2023] [Indexed: 04/26/2023]
Abstract
Drug discovery is a complex and lengthy process that often requires years of research and development. Therefore, drug research and development require a lot of investment and resource support, as well as professional knowledge, technology, skills, and other elements. Predicting drug-target interactions (DTIs) is an important part of drug development. If machine learning is used to predict DTIs, the cost and time of drug development can be significantly reduced. Currently, machine learning methods are widely used to predict DTIs. In this study, we propose a neighborhood regularized logistic matrix factorization method based on features extracted from a neural tangent kernel (NTK) to predict DTIs. First, the potential feature matrix of drugs and targets is extracted from the NTK model, then the corresponding Laplacian matrix is constructed according to the feature matrix. Next, the Laplacian matrix of the drugs and targets is used as the condition for matrix factorization to obtain two low-dimensional matrices. Finally, the matrix of the predicted DTIs is obtained by multiplying these two low-dimensional matrices. For the four gold standard datasets, the present method is significantly better than the other methods it is compared with, indicating that the automatic feature extraction method using the deep learning model is competitive compared with the manual feature selection method.
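Two of the steps named above can be sketched generically: building a graph Laplacian from a similarity matrix, and scoring drug-target pairs from two low-dimensional factor matrices. The sketch assumes the unnormalized Laplacian L = D - S and a plain sigmoid over the inner product of the factors; the abstract does not give the exact formulation, and the NTK feature extraction and the training of the factors are not shown. All names and numbers are hypothetical.

```python
import math


def graph_laplacian(similarity):
    """Unnormalized Laplacian L = D - S of a symmetric similarity matrix (list of rows)."""
    n = len(similarity)
    return [
        [(sum(similarity[i]) if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


def predict_interactions(drug_factors, target_factors):
    """Score every drug-target pair as sigmoid(u_i . v_j) from the latent factors."""
    return [
        [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, v)))) for v in target_factors]
        for u in drug_factors
    ]


# Toy similarity between two drugs and toy 2-dimensional latent factors.
laplacian = graph_laplacian([[1.0, 0.5], [0.5, 1.0]])
scores = predict_interactions(
    drug_factors=[[1.0, 0.0], [0.0, 1.0]],
    target_factors=[[2.0, 0.0], [0.0, -2.0], [0.0, 0.0]],
)
print(laplacian)
print([[round(s, 3) for s in row] for row in scores])
```

In neighborhood-regularized factorization the Laplacian typically enters the training objective as a penalty that keeps the latent vectors of similar drugs, or similar targets, close together.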
Collapse
Affiliation(s)
- Yu Wang
  - School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Yu Zhang
  - School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Jianchun Wang
  - School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Fang Xie
  - School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Dequan Zheng
  - School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
- Xiang Zou
  - Pharmaceutical Engineering Technology Research Center, Harbin University of Commerce, Harbin, 150076, China
- Mian Guo
  - Department of Neurosurgery, The Second Affiliated Hospital of Harbin Medical University, 150086, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Jie Wan
  - Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, 150001, China
- Ke Han
  - School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China; Pharmaceutical Engineering Technology Research Center, Harbin University of Commerce, Harbin, 150076, China
35
Zheng L, Liu L, Zhu W, Ding Y, Wu F. Predicting enhancer-promoter interaction based on epigenomic signals. Front Genet 2023; 14:1133775. [PMID: 37144127 PMCID: PMC10151517 DOI: 10.3389/fgene.2023.1133775] [Received: 12/29/2022] [Accepted: 04/04/2023] [Indexed: 05/06/2023] Open
Abstract
Introduction: The physical interactions between enhancers and promoters are often involved in gene transcriptional regulation. Highly tissue-specific enhancer-promoter interactions (EPIs) are responsible for the differential expression of genes. Experimental methods for measuring EPIs are time-consuming and labor-intensive. An alternative approach, machine learning, has been widely used to predict EPIs. However, most existing machine learning methods require a large number of functional genomic and epigenomic features as input, which limits their application to different cell lines. Methods: In this paper, we developed a random forest model, HARD (H3K27ac, ATAC-seq, RAD21, and Distance), to predict EPIs using only four types of features. Results: Independent tests on a benchmark dataset showed that HARD outperforms other models while using the fewest features. Discussion: Our results revealed that chromatin accessibility and the binding of cohesin are important for cell-line-specific EPIs. Furthermore, we trained the HARD model on the GM12878 cell line and tested it on the HeLa cell line. The cross-cell-line prediction also performs well, suggesting that the model has the potential to be applied to other cell lines.
Affiliation(s)
- Leqiong Zheng
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Wen Zhu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Yijie Ding
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Fangxiang Wu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
36
Jiang J, Li J, Li J, Pei H, Li M, Zou Q, Lv Z. A Machine Learning Method to Identify Umami Peptide Sequences by Using Multiplicative LSTM Embedded Features. Foods 2023; 12:1498. [PMID: 37048319 PMCID: PMC10094688 DOI: 10.3390/foods12071498] [Received: 02/26/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 04/05/2023] Open
Abstract
Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet testing for the identification of umami peptides is a time-consuming and expensive process. Here, we report iUmami-DRLF, which uses a logistic regression (LR) method based solely on features extracted from peptide sequences by a pre-trained deep learning model, unified representation (UniRep, based on a multiplicative LSTM). The findings demonstrate that deep representation learning significantly enhanced the capability and predictive precision of models in identifying umami peptides based solely on peptide sequence information. Newly validated taste sequences were also used to test iUmami-DRLF and other predictors, and the results indicate that iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food to meet the need for an umami-flavored diet.
Affiliation(s)
- Jici Jiang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Junxian Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
  - Wu Yuzhang Honors College, Sichuan University, Chengdu 610065, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
37
Li Y, Ma D, Chen D, Chen Y. ACP-GBDT: An improved anticancer peptide identification method with gradient boosting decision tree. Front Genet 2023; 14:1165765. [PMID: 37065496 PMCID: PMC10090421 DOI: 10.3389/fgene.2023.1165765] [Received: 02/14/2023] [Accepted: 03/09/2023] [Indexed: 03/31/2023] Open
Abstract
Cancer is one of the most dangerous diseases in the world, killing millions of people every year. In recent years, drugs composed of anticancer peptides have been used to treat cancer with few side effects. Therefore, identifying anticancer peptides has become a focus of research. In this study, an improved anticancer peptide predictor named ACP-GBDT, based on a gradient boosting decision tree (GBDT) and sequence information, is proposed. To encode the peptide sequences in the anticancer peptide dataset, ACP-GBDT uses a merged feature composed of AAIndex and SVMProt-188D. A GBDT is adopted to train the prediction model in ACP-GBDT. Independent testing and ten-fold cross-validation show that ACP-GBDT can effectively distinguish anticancer peptides from non-anticancer ones. Comparison results on the benchmark dataset show that ACP-GBDT is simpler and more effective than other existing anticancer peptide prediction methods.
Affiliation(s)
- Yanjuan Li
  - College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Di Ma
  - College of Computer, Hangzhou Dianzi University, Hangzhou, China
- Dong Chen
  - College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Yu Chen
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Dong Chen; Yu Chen
38
Sun Y. A systematic pan-cancer analysis reveals the clinical prognosis and immunotherapy value of C-X3-C motif ligand 1 (CX3CL1). Front Genet 2023; 14:1183795. [PMID: 37153002 PMCID: PMC10157490 DOI: 10.3389/fgene.2023.1183795] [Received: 03/10/2023] [Accepted: 04/10/2023] [Indexed: 05/09/2023] Open
Abstract
It is now widely known that C-X3-C motif ligand 1 (CX3CL1) plays an essential part in regulating pro-inflammatory cell migration across a wide range of inflammatory disorders, including a number of malignancies. However, there has been no comprehensive study of the correlation between CX3CL1 and cancers on the basis of clinical features. To investigate the potential function of CX3CL1 in clinical prognosis and immunotherapy, I evaluated the expression of CX3CL1 in numerous cancer types, together with its methylation levels and genetic alterations. I found that CX3CL1 was differentially expressed in numerous cancer types, which indicates that CX3CL1 may play a role in tumor progression. Furthermore, CX3CL1 varied in methylation levels and gene alterations in most cancers according to The Cancer Genome Atlas (TCGA). CX3CL1 was robustly associated with clinical characteristics and pathological stages, suggesting that it is related to the degree of tumor malignancy and the physical function of patients. As determined by the Kaplan-Meier method of estimating survival, high CX3CL1 expression was associated with either favorable or unfavorable outcomes depending on the type of cancer, which suggests a correlation between CX3CL1 and tumor prognosis. Significant positive correlations of CX3CL1 expression with CD4+ T cells, M1 macrophages, and activated mast cells were established in the majority of TCGA malignancies, which indicates that CX3CL1 plays an important role in the tumor immune microenvironment. Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses suggested that the chemokine signaling pathway may shed light on how CX3CL1 exerts its function. In conclusion, this study comprehensively summarizes the potential role of CX3CL1 in clinical prognosis and immunotherapy, suggesting that CX3CL1 may represent a promising pharmacological treatment target for tumors.