1
Ke S, Huang Y, Wang D, Jiang Q, Luo Z, Li B, Yan D, Zhou J. BreCML: identifying breast cancer cell state in scRNA-seq via machine learning. Front Med (Lausanne) 2024; 11:1482726. [PMID: 39574916; PMCID: PMC11579858; DOI: 10.3389/fmed.2024.1482726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Accepted: 10/15/2024] [Indexed: 11/24/2024]
Abstract
Breast cancer is a prevalent malignancy and one of the leading causes of cancer-related mortality among women worldwide. This disease typically manifests through the abnormal proliferation and dissemination of malignant cells within breast tissue. Current diagnostic and therapeutic strategies face significant challenges in accurately identifying and localizing specific subtypes of breast cancer. In this study, we developed a novel machine learning-based predictor, BreCML, designed to accurately classify subpopulations of breast cancer cells and their associated marker genes. BreCML exhibits outstanding predictive performance, achieving an accuracy of 98.92% on the training dataset. Utilizing the XGBoost algorithm, BreCML demonstrates superior accuracy (98.67%), precision (99.15%), recall (99.49%), and F1-score (99.79%) on the test dataset. Through the application of machine learning and feature selection techniques, BreCML successfully identified new key genes. This predictor not only serves as a powerful tool for assessing breast cancer cellular status but also offers a rapid and efficient means to uncover potential biomarkers, providing critical insights for precision medicine and therapeutic strategies.
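The four metrics quoted in this abstract are all derived from the same confusion counts. The sketch below shows the standard definitions for a binary task; the counts are hypothetical and are not taken from the paper, and the function name is illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)
```

Because F1 is a harmonic mean, it always lies between precision and recall, which gives a quick consistency check for any reported set of values.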
Affiliation(s)
- Shanbao Ke
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Yuxuan Huang
  - Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, China
- Dong Wang
  - Pudong Institute for Health Development, Shanghai, China
- Qiang Jiang
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Zhangyang Luo
  - Pudong Institute for Health Development, Shanghai, China
- Baiyu Li
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Danfang Yan
  - Department of Radiation Oncology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- Jianwei Zhou
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
2
Yu S, Liu L, Wang H, Yan S, Zheng S, Ning J, Luo R, Fu X, Deng X. AtML: An Arabidopsis thaliana root cell identity recognition tool for medicinal ingredient accumulation. Methods 2024; 231:61-69. [PMID: 39293728; DOI: 10.1016/j.ymeth.2024.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 08/05/2024] [Accepted: 09/12/2024] [Indexed: 09/20/2024]
Abstract
Arabidopsis thaliana synthesizes various medicinal compounds, and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, facilitating the analysis of synthesis and accumulation patterns of medicinal compounds in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data are rapidly advancing in Arabidopsis, challenges remain in precisely annotating cell identity due to the lack of marker genes for certain cell types. In this work, we trained a machine learning system, AtML, using sequencing datasets from six cell subpopulations, comprising a total of 6000 cells, to predict Arabidopsis root cell stages and identify biomarkers through complete model interpretability. Performance testing using an external dataset revealed that AtML achieved 96.50% accuracy and 96.51% recall. Through the interpretability provided by AtML, our model identified 160 important marker genes, contributing to the understanding of cell type annotations. In conclusion, we trained AtML to efficiently identify Arabidopsis root cell stages, providing a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
Affiliation(s)
- Shicong Yu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Lijia Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shuqin Zheng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Jing Ning
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Ruxian Luo
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xiangzheng Fu
  - Research Institute of Hunan University in Chongqing, Chongqing 401120, China
- Xiaoshu Deng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
  - Chongqing Academy of Chinese Materia Medica, Chongqing 400065, China
3
Lei L, Li J, Liu Z, Zhang D, Liu Z, Wang Q, Gao Y, Mo B, Li J. Identification of diagnostic markers pyrodeath-related genes in non-alcoholic fatty liver disease based on machine learning and experiment validation. Sci Rep 2024; 14:25541. [PMID: 39462099; PMCID: PMC11513955; DOI: 10.1038/s41598-024-77409-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2024] [Accepted: 10/22/2024] [Indexed: 10/28/2024]
Abstract
Non-alcoholic fatty liver disease (NAFLD) poses a global health challenge. While pyroptosis is implicated in various diseases, its specific involvement in NAFLD remains unclear. Thus, our study aims to elucidate the role and mechanisms of pyroptosis in NAFLD. Utilizing data from the Gene Expression Omnibus (GEO) database, we analyzed the expression levels of pyroptosis-related genes (PRGs) in NAFLD and normal tissues using R. We investigated protein interactions, correlations, and functional enrichment of these genes. Key genes were identified using multiple machine learning techniques. Immune infiltration analyses were conducted to discern differences in immune cell populations between NAFLD patients and controls. Key gene expression was validated using a cell model. Analysis of GEO datasets, comprising 206 NAFLD samples and 10 controls, revealed two key PRGs (TIRAP and GSDMD). Combining these genes yielded an area under the curve (AUC) of 0.996 for diagnosing NAFLD. In an external dataset, the AUC for the two key genes was 0.825. Nomogram, decision curve, and calibration curve analyses further validated their diagnostic efficacy. These genes were implicated in multiple pathways associated with NAFLD progression. Immune infiltration analysis showed significantly lower numbers of various immune cell types in NAFLD patient samples compared to controls. Single sample gene set enrichment analysis (ssGSEA) was employed to assess the immune microenvironment. Finally, the expression of the two key genes was validated in a cell model of NAFLD using qRT-PCR. We developed a prognostic model for NAFLD based on two PRGs, demonstrating robust predictive efficacy. Our findings enhance the understanding of pyroptosis in NAFLD and suggest potential avenues for therapeutic exploration.
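The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The following is a minimal sketch of that pairwise definition; the labels and scores are hypothetical and the function name is illustrative, not the authors' pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = disease sample, 0 = control.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

Under this definition every control takes part in an equal share of the pairs, so when there are few controls each one has a large influence on the estimate.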
Affiliation(s)
- Liping Lei
  - Department of Geriatric Medicine, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Jixue Li
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Zirui Liu
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Dongdong Zhang
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Zihan Liu
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Qing Wang
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Yi Gao
  - Department of Gastrointestinal Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
- Biwen Mo
  - Department of Respiratory and Critical Care Medicine, The Second Affiliated Hospital of Guilin Medical University, Guilin, 541002, Guangxi, China
- Jiangfa Li
  - Division of Hepatobiliary Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, 541001, Guangxi, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor, Guangxi Medical University, Ministry of Education, Nanning, 530021, Guangxi, China
  - Guangxi Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor, Nanning, 530021, Guangxi, China
4
Li J, Wang Y, Wu Z, Zhong M, Feng G, Liu Z, Zeng Y, Wei Z, Mueller S, He S, Ouyang G, Yuan G. Identification of diagnostic markers and molecular clusters of cuproptosis-related genes in alcohol-related liver disease based on machine learning and experimental validation. Heliyon 2024; 10:e37612. [PMID: 39315155; PMCID: PMC11417179; DOI: 10.1016/j.heliyon.2024.e37612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 07/15/2024] [Accepted: 09/06/2024] [Indexed: 09/25/2024]
Abstract
Background and aims: Alcohol-related liver disease (ALD) is a worldwide burden. Cuproptosis has been shown to play a key role in the development of several diseases. However, the role and mechanisms of cuproptosis in ALD remain unclear. Methods: The RNA-sequencing data of ALD liver samples were downloaded from the Gene Expression Omnibus (GEO) database. Bioinformatic analyses were performed in R. We then identified key genes through multiple machine learning methods. Immune infiltration analyses were used to identify different immune cells in ALD patients and controls. The expression levels of key genes were further verified. Results: We identified three key cuproptosis-related genes (CRGs) (DPYD, SLC31A1, and DBT) through an in-depth analysis of two GEO datasets, including 28 ALD samples and eight control samples. The area under the curve (AUC) value of these three genes combined in determining ALD was 1.0. In the two external datasets, the three key genes had AUC values of 1.0 and 0.917, respectively. Nomogram, decision curve, and calibration curve analyses also confirmed these genes' ability to predict the diagnosis. These three key genes were found to be involved in multiple pathways associated with ALD progression. We confirmed the mRNA expression of these three key genes in mouse ALD liver samples. Regarding immune cell infiltration, the numbers of B cells, CD8(+) T cells, NK cells, T-helper cells, and Th1 cells were significantly lower in ALD patient samples than in control liver samples. Single sample gene set enrichment analysis (ssGSEA) was then used to estimate the immune microenvironment of different CRG clusters and CRG-related gene clusters. In addition, we calculated CRG scores through principal component analysis (PCA) and used Sankey plots to represent the correlation between CRG clusters, gene clusters, and CRG scores. Finally, the three key genes were confirmed in mouse ALD liver samples and liver cells treated with ethanol. Conclusions: We first established a prognostic model for ALD based on three CRGs, and robust prediction efficacy was confirmed. Our investigation contributes to a comprehensive understanding of the role of cuproptosis in ALD, presenting promising avenues for the exploration of therapeutic strategies.
Affiliation(s)
- Jiangfa Li
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Yong Wang
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Zhan Wu
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Mingbei Zhong
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Gangping Feng
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Zhipeng Liu
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Yonglian Zeng
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Zaiwa Wei
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Sebastian Mueller
  - Center for Alcohol Research, University Hospital Heidelberg, Heidelberg, Germany
- Songqing He
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Guoqing Ouyang
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
- Guandou Yuan
  - Division of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China
  - Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning, Guangxi 530021, China
  - Guangxi Key Laboratory of Immunology and Metabolism for Liver Diseases, Nanning, Guangxi 530021, China
5
Liu L, Huang Y, Zheng Y, Liao Y, Ma S, Wang Q. ScnML models single-cell transcriptome to predict spinal cord neuronal cell status. Front Genet 2024; 15:1413484. [PMID: 38894722; PMCID: PMC11183327; DOI: 10.3389/fgene.2024.1413484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
Injuries to the spinal cord nervous system often result in permanent loss of sensory, motor, and autonomic functions. Accurately identifying the cellular state of spinal cord nerves is extremely important and could facilitate the development of new therapeutic and rehabilitative strategies. Existing experimental techniques for identifying the development of spinal cord nerves are both labor-intensive and costly. In this study, we developed a machine learning predictor, ScnML, for predicting subpopulations of spinal cord nerve cells as well as identifying marker genes. ScnML achieved an accuracy of 94.33% on the training dataset. Based on XGBoost, ScnML achieved an accuracy, precision, recall, and F1-measure of 94.08%, 94.24%, 94.26%, and 94.24%, respectively, on the test dataset. Importantly, ScnML identified new significant genes through model interpretation and biological landscape analysis. ScnML can be a powerful tool for predicting the status of spinal cord neuronal cells, revealing potential specific biomarkers quickly and efficiently, and providing crucial insights for precision medicine and rehabilitation.
Affiliation(s)
- Lijia Liu
  - School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Yuxuan Huang
  - Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, Jiangsu, China
- Yuan Zheng
  - Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Yihan Liao
  - Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Siyuan Ma
  - School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Qian Wang
  - Department of Neurology, The First Hospital of Tsinghua University, Beijing, China
6
Cheng N, Gao Y, Ju S, Kong X, Lyu J, Hou L, Jin L, Shen B. Serum analysis based on SERS combined with 2D convolutional neural network and Gramian angular field for breast cancer screening. Spectrochim Acta A Mol Biomol Spectrosc 2024; 312:124054. [PMID: 38382221; DOI: 10.1016/j.saa.2024.124054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/08/2024] [Accepted: 02/17/2024] [Indexed: 02/23/2024]
Abstract
Breast cancer is a significant cause of death among women worldwide. It is crucial to quickly and accurately diagnose breast cancer in order to reduce mortality rates. While traditional diagnostic techniques for medical imaging and pathology samples have been commonly used in breast cancer screening, they still have certain limitations. Surface-enhanced Raman spectroscopy (SERS) is a fast, highly sensitive and user-friendly method that is often combined with deep learning techniques like convolutional neural networks. This combination helps identify unique molecular spectral features, also known as "fingerprints", in biological samples such as serum. Ultimately, this approach is able to accurately screen for cancer. The Gramian angular field (GAF) algorithm can convert one-dimensional (1D) time series into two-dimensional (2D) images. These images can be used for data visualization, pattern recognition and machine learning tasks. In this study, 640 serum SERS spectra from breast cancer patients and healthy volunteers were converted into 2D spectral images by the GAF technique. These images were then used to train and test a two-dimensional convolutional neural network-GAF (2D-CNN-GAF) model for breast cancer classification. We compared the performance of the 2D-CNN-GAF model with other methods, including one-dimensional convolutional neural network (1D-CNN), support vector machine (SVM), K-nearest neighbor (KNN) and principal component analysis-linear discriminant analysis (PCA-LDA), using various evaluation metrics such as accuracy, precision, sensitivity, F1-score, receiver operating characteristic (ROC) curve and area under curve (AUC) value. The results showed that the 2D-CNN-GAF model outperformed the traditional models, achieving an AUC value of 0.9884, an accuracy of 98.13%, sensitivity of 98.65% and specificity of 97.67% for breast cancer classification.
In this study, we used conventional nano-silver sol as the SERS-enhanced substrate and a portable laser Raman spectrometer to obtain the serum SERS data. The 2D-CNN-GAF model demonstrated accurate and automatic classification of breast cancer patients and healthy volunteers. The method does not require augmentation and preprocessing of spectral data, simplifying the processing steps of spectral data. This method has great potential for accurate breast cancer screening and also provides a useful reference in more types of cancer classification and automatic screening.
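The conversion of a 1D signal into a 2D image by a Gramian angular field can be sketched as follows. This is a minimal illustration of the summation variant (GASF), which is an assumption here because the abstract does not say which variant was used; the input values are hypothetical and no preprocessing from the paper is reproduced.

```python
import math


def gramian_angular_summation_field(series):
    """Map a 1D sequence to a 2D matrix with entries cos(phi_i + phi_j).

    The sequence is first rescaled to [-1, 1]; each rescaled value x is
    then encoded as the angle phi = arccos(x).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point values just outside [-1, 1].
    scaled = [max(-1.0, min(1.0, x)) for x in scaled]
    phi = [math.acos(x) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical three-point "spectrum"; a real spectrum has many more points.
spectrum = [0.0, 0.5, 1.0]
image = gramian_angular_summation_field(spectrum)
for row in image:
    print([round(v, 3) for v in row])
```

The result is a symmetric n-by-n matrix for an input of length n, so a spectrum with many points yields a correspondingly large image that a 2D convolutional network can consume.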
Affiliation(s)
- Nuo Cheng
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
- Yan Gao
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
  - Chinese Academy of Science, Shenzhen Institutes of Advanced and Technology, Shenzhen 518000, PR China
- Shaowei Ju
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
- Xiangwei Kong
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
- Jiugong Lyu
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
  - School of Biological Engineering, Dalian University of Technology, Dalian 116024, PR China
- Lijie Hou
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
- Lihong Jin
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
- Bingjun Shen
  - School of Life Science and Technology, Changchun University of Science and Technology, Changchun 130022, PR China
7
Fu X, Yuan Y, Qiu H, Suo H, Song Y, Li A, Zhang Y, Xiao C, Li Y, Dou L, Zhang Z, Cui F. AGF-PPIS: A protein-protein interaction site predictor based on an attention mechanism and graph convolutional networks. Methods 2024; 222:142-151. [PMID: 38242383; DOI: 10.1016/j.ymeth.2024.01.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/21/2023] [Revised: 01/04/2024] [Accepted: 01/13/2024] [Indexed: 01/21/2024]
Abstract
Protein-protein interactions play an important role in various biological processes. Interaction among proteins has a wide range of applications. Therefore, the correct identification of protein-protein interaction sites is crucial. In this paper, we propose a novel predictor for protein-protein interaction sites, AGF-PPIS, where we utilize a multi-head self-attention mechanism (introducing a graph structure), graph convolutional network, and feed-forward neural network. We use the Euclidean distances between protein residues to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS achieves superior performance over comparative methods in terms of seven different evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which demonstrates the validity of the proposed AGF-PPIS model. The source code and usage instructions for AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.
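Building a residue graph from pairwise Euclidean distances can be sketched as below. This is only an illustration of the general idea: the 14 Å cutoff, the use of one 3D coordinate per residue, and the function name are assumptions made for the example and are not taken from the AGF-PPIS paper or repository.

```python
import math


def residue_graph(coords, cutoff=14.0):
    """Return a symmetric 0/1 adjacency matrix over residues.

    Two residues are connected when the Euclidean distance between their
    coordinates is at most `cutoff` (an assumed threshold, in angstroms).
    Self-loops are left out; add them on the diagonal if a model needs them.
    """
    n = len(coords)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Hypothetical coordinates for three residues, in angstroms.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
adjacency = residue_graph(coords)
for row in adjacency:
    print(row)
```

A graph convolutional layer would then aggregate per-residue features along the edges of this matrix.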
Affiliation(s)
- Xiuhao Fu
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Ye Yuan
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Haoye Qiu
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Haodong Suo
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yingying Song
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Anqi Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yupeng Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Cuilin Xiao
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yazi Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, USA
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
8
Wang H, Lin YN, Yan S, Hong JP, Tan JR, Chen YQ, Cao YS, Fang W. NRTPredictor: identifying rice root cell state in single-cell RNA-seq via ensemble learning. Plant Methods 2023; 19:119. [PMID: 37925413; PMCID: PMC10625708; DOI: 10.1186/s13007-023-01092-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 10/15/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) measurements of gene expression show great promise for studying the cellular heterogeneity of rice roots. Precisely annotating cell identity remains a major unresolved problem in plant scRNA-seq analysis because of the inherent high dimensionality and sparsity of the data. RESULTS: To address this challenge, we present NRTPredictor, an ensemble-learning system, to predict rice root cell stage and mine biomarkers through complete model interpretability. The performance of NRTPredictor was evaluated using a test dataset, with 98.01% accuracy and 95.45% recall. With the power of interpretability provided by NRTPredictor, our model recognizes 110 marker genes partially involved in phenylpropanoid biosynthesis. Expression patterns of rice root could be mapped by the above-mentioned candidate genes, showing the superiority of NRTPredictor. Integrated analysis of scRNA-seq and bulk RNA-seq data revealed aberrant expression of epidermis cell subpopulations under flooding, Pi, and salt stresses. CONCLUSION: Taken together, our results demonstrate that NRTPredictor is a useful tool for automated prediction of rice root cell stage and provides a valuable resource for deciphering rice root cellular heterogeneity and the molecular mechanisms of flooding, Pi, and salt stresses. Based on the proposed model, a free webserver has been established, which is available at https://www.cgris.net/nrtp.
Affiliation(s)
- Hao Wang
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yu-Nan Lin
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Shen Yan
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jing-Peng Hong
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jia-Rui Tan
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yan-Qing Chen
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yong-Sheng Cao
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Wei Fang
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
9
Kırboğa KK, Abbasi S, Küçüksille EU. Explainability and white box in drug discovery. Chem Biol Drug Des 2023; 102:217-233. [PMID: 37105727; DOI: 10.1111/cbdd.14262] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/12/2022] [Revised: 03/24/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Recently, artificial intelligence (AI) techniques have been increasingly used to overcome the challenges in drug discovery. Although traditional AI techniques generally have high accuracy rates, their decision processes and patterns can be difficult to explain. This makes it hard to understand and interpret the outputs of algorithms used in drug discovery. Explainable AI (XAI) techniques make the causes and consequences of the decision process easier to understand, which can help improve the drug discovery process and support the right decisions. XAI emerged as a process and method that securely captures the results and outputs of machine learning (ML) and deep learning (DL) algorithms. Techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) have made the drug targeting phase clearer and more understandable. XAI methods are expected to reduce time and cost in future computational drug discovery studies. This review provides a comprehensive overview of XAI-based drug discovery and development prediction. It also covers XAI mechanisms that increase confidence in AI and modeling methods. The limitations and future directions of XAI in drug discovery are also discussed.
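SHAP attributes a prediction to input features using Shapley values. The sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over all feature orderings, with absent features replaced by a baseline value. It illustrates the underlying quantity only; it is not the SHAP library, which relies on faster approximations, and the model and inputs are invented for the example.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every ordering of the features.

    Features not yet "switched on" take their value from `baseline`.
    The cost grows factorially, so this is only practical for a few features.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            value = predict(current)
            totals[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(x):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


phi = shapley_values(toy_model, instance=[1.0, 2.0, 3.0],
                     baseline=[0.0, 0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split equally between the two features that produce it.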
Affiliation(s)
- Kevser Kübra Kırboğa
- Bioengineering Department, Bilecik Seyh Edebali University, Bilecik, Turkey
- Informatics Institute, Istanbul Technical University, Maslak, Turkey
| | - Sumra Abbasi
- Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
| | - Ecir Uğur Küçüksille
- Department of Computer Engineering, Süleyman Demirel University, Isparta, Turkey
| |
|
10
|
Zhang L, Bai T, Wu H. sgRNA-2wPSM: Identify sgRNAs on-target activity by combining two-window-based position specific mismatch and synthetic minority oversampling technique. Comput Biol Med 2023; 155:106489. [PMID: 36841059 DOI: 10.1016/j.compbiomed.2022.106489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 12/27/2022] [Indexed: 12/30/2022]
Abstract
MOTIVATION: Predicting the on-target activity of sgRNAs is a critical step in the CRISPR-Cas9 system. Because of its importance to RNA function research and genome editing applications, several computational methods have been introduced that treat it as a binary classification task or a regression task. Among these methods, sgRNA-PSM is a state-of-the-art method. In this work, we improved this method by proposing a new feature extraction method called two-window-based PSM, which divides the DNA sequences into two non-overlapping segments so as to extract different patterns in the two segments. The two-window-based PSM features were fed into Support Vector Machines (SVMs), yielding a new method called sgRNA-2wPSM. Furthermore, a new oversampling method called SCORE-SVM-SMOTE, based on the SVM-SMOTE algorithm, was proposed to solve the imbalanced training set problem. Results on the benchmark datasets indicated that sgRNA-2wPSM is superior to other methods.
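The oversampling idea mentioned in this abstract can be illustrated with a minimal sketch of plain SMOTE interpolation. This is not the SCORE-SVM-SMOTE variant the authors propose (its scoring step is not described here), and the function name `smote_samples` and its parameters are assumptions made for illustration only.

```python
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority-class points by plain SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class: the four corners of the unit square
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_samples(minority_points, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies.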
Affiliation(s)
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China.
| | - Tao Bai
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; School of Mathematics & Computer Science, Yanan University, Shaanxi, 716000, China.
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
| |
|
11
|
Wang H, Zhang Z, Li H, Li J, Li H, Liu M, Liang P, Xi Q, Xing Y, Yang L, Zuo Y. A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery. Cell Biosci 2023; 13:41. [PMID: 36849879 PMCID: PMC9972636 DOI: 10.1186/s13578-023-00991-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/09/2022] [Accepted: 02/15/2023] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND: The placenta, as a unique exchange organ between mother and fetus, is essential for successful human pregnancy and fetal health. Preeclampsia (PE) caused by placental dysfunction contributes to both maternal and infant morbidity and mortality. Accurate identification of PE patients plays a vital role in the formulation of treatment plans. However, the traditional clinical methods for diagnosing PE have a high misdiagnosis rate. RESULTS: Here, we first designed a computational biology method that used single-cell transcriptome (scRNA-seq) data from healthy pregnancy (38 wk) and early-onset PE (28-32 wk) to identify pathological cell subpopulations and predict PE risk. Based on machine learning methods and feature selection techniques, we observed that the Tuning ReliefF (TURF) score combined with XGBoost (TURF_XGB) achieved optimal performance, with 92.61% accuracy and 92.46% recall for classifying nine cell subpopulations of healthy placentas. Biological landscapes of placenta heterogeneity could be mapped by the 110 marker genes screened by TURF_XGB, which revealed the superiority of the TURF feature mining. Moreover, we processed the PE dataset with LASSO to obtain 497 biomarkers. Integration analysis of the above two gene sets revealed that dendritic cells were closely associated with early-onset PE, and C1QB and C1QC might drive preeclampsia by mediating inflammation. In addition, an ensemble model-based risk stratification card was developed to classify preeclampsia patients, and its area under the receiver operating characteristic curve (AUC) could reach 0.99. For broader accessibility, we designed an online web server (http://bioinfor.imu.edu.cn/placenta). CONCLUSION: Single-cell transcriptome-based preeclampsia risk assessment using an ensemble machine learning framework is a valuable asset for clinical decision-making. C1QB and C1QC may be involved in the development and progression of early-onset PE by affecting the complement and coagulation cascades pathway that mediates inflammation, which has important implications for better understanding the pathogenesis of PE.
Affiliation(s)
- Hao Wang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China
| | - Zhaoyue Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Haicheng Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China
| | - Jinzhao Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Hanshuang Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Mingzhu Liu
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China
| | - Pengfei Liang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Qilemuge Xi
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Yongqiang Xing
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China.
| |
|
12
|
Chen L, Yu L, Gao L. Potent antibiotic design via guided search from antibacterial activity evaluations. Bioinformatics 2023; 39:btad059. [PMID: 36707990 PMCID: PMC9897189 DOI: 10.1093/bioinformatics/btad059] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 09/26/2022] [Revised: 01/14/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION The emergence of drug-resistant bacteria makes the discovery of new antibiotics an urgent issue, but finding new molecules with the desired antibacterial activity is an extremely difficult task. To address this challenge, we established a framework, MDAGS (Molecular Design via Attribute-Guided Search), to optimize and generate potent antibiotic molecules. RESULTS By designing the antibacterial activity latent space and guiding the optimization of functional compounds based on this space, the model MDAGS can generate novel compounds with desirable antibacterial activity without the need for extensive expensive and time-consuming evaluations. Compared with existing antibiotics, candidate antibacterial compounds generated by MDAGS always possessed significantly better antibacterial activity and ensured high similarity. Furthermore, although without explicit constraints on similarity to known antibiotics, these candidate antibacterial compounds all exhibited the highest structural similarity to antibiotics of expected function in the DrugBank database query. Overall, our approach provides a viable solution to the problem of bacterial drug resistance. AVAILABILITY AND IMPLEMENTATION Code of the model and datasets can be downloaded from GitHub (https://github.com/LiangYu-Xidian/MDAGS). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lu Chen
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
| |
|
13
|
Zhang H, Chi M, Su D, Xiong Y, Wei H, Yu Y, Zuo Y, Yang L. A random forest-based metabolic risk model to assess the prognosis and metabolism-related drug targets in ovarian cancer. Comput Biol Med 2023; 153:106432. [PMID: 36608460 DOI: 10.1016/j.compbiomed.2022.106432] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/04/2022] [Revised: 11/13/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
As one of the most common gynecologic malignant tumors, ovarian cancer is usually diagnosed at an advanced and incurable stage because of its early asymptomatic onset. Increasing research into tumor biology has demonstrated that abnormal cellular metabolism precedes tumorigenesis, and it has therefore become an area of active research. Cellular metabolism is of great significance in cancer diagnostic and prognostic studies. In this study, we integrated The Cancer Genome Atlas dataset with multiple Gene Expression Omnibus ovarian cancer datasets, identified 17 metabolic pathways with prognostic value using the random forest algorithm, constructed a metabolic risk scoring model based on metabolic pathway enrichment scores, and classified patients with ovarian cancer into two subtypes. Then, we systematically investigated the differences between the subtypes in terms of prognosis, differential gene expression, immune signature enrichment, Hallmark signature enrichment, and somatic mutations. We also successfully predicted differences in sensitivity to immunotherapy and chemotherapy drugs in patients with different metabolic risk subtypes. Moreover, we identified 5 drug targets associated with high metabolic risk and low metabolic risk ovarian cancer phenotypes through weighted correlation network analysis and investigated their roles in the genesis of ovarian cancer. Finally, we developed an XGBoost classifier for predicting metabolic risk types in patients with ovarian cancer, which produced a good predictive effect. These findings will provide valuable information for prognostic prediction and personalized medical treatment of patients with ovarian cancer.
Affiliation(s)
- Haoxin Zhang
- Department of Gastrointestinal Oncology, Harbin Medical University Cancer Hospital, Harbin, 150081, China
| | - Meng Chi
- Department of Anesthesiology, Harbin Medical University Cancer Hospital, Harbin, 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yao Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd, Hohhot, 010010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
| |
|
14
|
Wan H, Liu Q, Ju Y. Utilize a few features to classify presynaptic and postsynaptic neurotoxins. Comput Biol Med 2023; 152:106380. [PMID: 36473343 DOI: 10.1016/j.compbiomed.2022.106380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 10/21/2022] [Accepted: 11/28/2022] [Indexed: 12/02/2022]
Abstract
Neurotoxins are a class of proteins that have a significant damaging effect on nerve tissue. Neurotoxins are classified into presynaptic neurotoxins and postsynaptic neurotoxins, and accurate identification of neurotoxins plays a key role in drug development. In this study, 90 presynaptic neurotoxins and 165 postsynaptic neurotoxins were classified. The features of the presynaptic and postsynaptic neurotoxin sequences were extracted using the AutoProp feature extraction method, and feature selection was performed using the maximum relevance maximum distance (MRMD) program. Finally, only two features were retained to achieve 84.7% classification accuracy. Moreover, it was found that the two retained features were present in the conserved sites and motifs of presynaptic neurotoxins and could represent the critical structures of presynaptic neurotoxins. This method demonstrates that using a few key features to classify proteins can effectively identify critical protein structures.
Affiliation(s)
- Hao Wan
- Institute of Advanced Cross-field Science, College of Life Science, Qingdao University, Qingdao, China
| | - Qing Liu
- Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China.
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China.
| |
|
15
|
Yuan SS, Gao D, Xie XQ, Ma CY, Su W, Zhang ZY, Zheng Y, Ding H. IBPred: A sequence-based predictor for identifying ion binding protein in phage. Comput Struct Biotechnol J 2022; 20:4942-4951. [PMID: 36147670 PMCID: PMC9474292 DOI: 10.1016/j.csbj.2022.08.053] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/16/2022] Open
Abstract
Ion binding proteins (IBPs) can selectively and non-covalently interact with ions. IBPs in phages also play an important role in biological processes. Therefore, accurate identification of IBPs is necessary for understanding their biological functions and the molecular mechanisms that involve binding to ions. Since molecular biology experimental methods for identifying IBPs are still labor-intensive and costly, it is helpful to develop computational methods to identify IBPs quickly and efficiently. In this work, a random forest (RF)-based model was constructed to quickly identify IBPs. Based on the protein sequence information and residues' physicochemical properties, the dipeptide composition combined with the physicochemical correlation between two residues was proposed for feature extraction. A feature selection technique called analysis of variance (ANOVA) was used to exclude redundant information. By comparing with other classification methods, we demonstrated that our method could identify IBPs accurately. Based on the model, a Python package named IBPred was built, and its source code can be accessed at https://github.com/ShishiYuan/IBPred.
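As an illustration of the dipeptide composition feature named in this abstract, here is a minimal sketch in Python. It covers only the plain 400-dimensional dipeptide frequency vector, not the physicochemical correlation terms the authors combine it with, and the function name `dipeptide_composition` is an assumption rather than part of the IBPred package.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence.

    Overlapping residue pairs are counted; pairs that contain a
    non-standard residue (for example X) are skipped.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


features = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
```

A vector like this would then be passed through feature selection (ANOVA in the abstract) before training a classifier.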
Affiliation(s)
- Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dong Gao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xue-Qin Xie
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Cai-Yi Ma
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Su
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhao-Yue Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
| | - Yan Zheng
- Baotou Medical College, Baotou 014040, China
| | - Hui Ding
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
16
|
Li H, Pang Y, Liu B, Yu L. MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning. Front Pharmacol 2022; 13:856417. [PMID: 35350759 PMCID: PMC8957949 DOI: 10.3389/fphar.2022.856417] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 02/14/2022] [Indexed: 01/13/2023] Open
Abstract
Intrinsically disordered regions (IDRs) without stable structure are important for protein structures and functions. Some IDRs can bind molecular fragments and thereby undergo a transition from disordered to ordered; these are called molecular recognition features (MoRFs). There are five main functions of MoRFs: molecular recognition assembler (MoR_assembler), molecular recognition chaperone (MoR_chaperone), molecular recognition display sites (MoR_display_sites), molecular recognition effector (MoR_effector), and molecular recognition scavenger (MoR_scavenger). Research on the functions of molecular recognition features is important for pharmaceutical research and for understanding disease pathogenesis. However, the existing computational methods can only predict the MoRFs in proteins and fail to distinguish their different functions. In this paper, we treat MoRF function prediction as a multi-label learning task and solve it with the Binary Relevance (BR) strategy. Finally, we use Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF) as basic models to construct MoRF-FUNCpred through ensemble learning. Experimental results show that MoRF-FUNCpred performs well for MoRF function prediction. To the best of our knowledge, MoRF-FUNCpred is the first predictor for predicting the functions of MoRFs. Availability and Implementation: The stand-alone package of MoRF-FUNCpred can be accessed from https://github.com/LiangYu-Xidian/MoRF-FUNCpred.
Affiliation(s)
- Haozheng Li
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Yihe Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
17
|
Sun Y, Li H, Zheng L, Li J, Hong Y, Liang P, Kwok LY, Zuo Y, Zhang W, Zhang H. iProbiotics: a machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences. Brief Bioinform 2021; 23:6444315. [PMID: 34849572 DOI: 10.1093/bib/bbab477] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/12/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
Lactic acid bacteria consortia are commonly present in food, and some of these bacteria possess probiotic properties. However, discovery and experimental validation of probiotics require extensive time and effort. Therefore, it is of great interest to develop effective screening methods for identifying probiotics. Advances in sequencing technology have generated massive genomic data, enabling us to create a machine learning-based platform for such purpose in this work. This study first selected a comprehensive probiotics genome dataset from the probiotic database (PROBIO) and literature surveys. Then, k-mer (from 2 to 8) compositional analysis was performed, revealing diverse oligonucleotide composition in strain genomes and apparently more probiotic (P-) features in probiotic genomes than non-probiotic genomes. To reduce noise and improve computational efficiency, 87 376 k-mers were refined by an incremental feature selection (IFS) method, and the model achieved the maximum accuracy level at 184 core features, with a high prediction accuracy (97.77%) and area under the curve (98.00%). Functional genomic analysis using annotations from gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Rapid Annotation using Subsystem Technology (RAST) databases, as well as analysis of genes associated with host gastrointestinal survival/settlement, carbohydrate utilization, drug resistance and virulence factors, revealed that the distribution of P-features was biased toward genes/pathways related to probiotic function. Our results suggest that the role of probiotics is not determined by a single gene, but by a combination of k-mer genomic components, providing new insights into the identification and underlying mechanisms of probiotics. This work created a novel and free online bioinformatic tool, iProbiotics, which would facilitate rapid screening for probiotics.
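The k-mer compositional analysis described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the iProbiotics implementation: the helper names `kmer_frequencies` and `feature_space_size` are hypothetical, and only the forward strand is counted. Note that the number of possible DNA k-mers for k from 2 to 8 sums to 87 376, which matches the feature count quoted above.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def kmer_frequencies(genome, k):
    """Return relative frequencies of overlapping k-mers in a DNA string.

    k-mers containing ambiguous bases (anything outside A, C, G, T)
    are discarded before normalisation.
    """
    seq = genome.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {kmer: n for kmer, n in counts.items() if set(kmer) <= NUCLEOTIDES}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def feature_space_size(k_min=2, k_max=8):
    """Number of distinct DNA k-mers over all k in [k_min, k_max]."""
    return sum(4 ** k for k in range(k_min, k_max + 1))


freqs = kmer_frequencies("ACGTAC", 2)  # 2-mers: AC, CG, GT, TA, AC
size = feature_space_size()
```

An incremental feature selection step, as in the abstract, would rank these features and add them one at a time until accuracy stops improving.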
Affiliation(s)
- Yu Sun
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Haicheng Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Jinzhao Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Yan Hong
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Lai-Yu Kwok
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University, Hohhot 010070, China
| | - Wenyi Zhang
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
| | - Heping Zhang
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, China
| |
|
18
|
Xu H, Zhao B, Zhong W, Teng P, Qiao H. Identification of miRNA Signature Associated With Erectile Dysfunction in Type 2 Diabetes Mellitus by Support Vector Machine-Recursive Feature Elimination. Front Genet 2021; 12:762136. [PMID: 34707644 PMCID: PMC8542849 DOI: 10.3389/fgene.2021.762136] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/21/2021] [Accepted: 09/22/2021] [Indexed: 01/10/2023] Open
Abstract
Diabetes mellitus erectile dysfunction (DMED) is one of the most common complications of diabetes mellitus (DM) and seriously affects the self-esteem and quality of life of diabetics. MicroRNAs (miRNAs) are endogenous non-coding RNAs whose expression levels can affect multiple cellular processes. Many studies have demonstrated that miRNAs play a role in the occurrence and development of DMED. However, the exact mechanism of this process is unclear. Hence, we applied miRNA sequencing to blood samples from 10 DMED patients and 10 DM controls to study the mechanisms of miRNA interactions in DMED patients. Firstly, we identified four characteristic miRNAs as a signature using the SVM-RFE method (hsa-let-7e-5p, hsa-miR-30d-5p, hsa-miR-199b-5p, and hsa-miR-342-3p), called DMEDSig-4. Subsequently, we correlated DMEDSig-4 with clinical factors and further verified the ability of these miRNAs to classify samples. Finally, we functionally verified the relationship between DMEDSig-4 and DMED by pathway enrichment analysis of the miRNAs and their target genes. In brief, our study found four key miRNAs, which may be key influencing factors of DMED. Meanwhile, DMEDSig-4 could help in the development of new therapies for DMED.
Affiliation(s)
- Haibo Xu
- The Second Affiliated Hospital of Harbin Medical University, Harbin, China; The First Hospital of Qiqihar, Qiqihar, China
| | - Baoyin Zhao
- The First Hospital of Qiqihar, Qiqihar, China
| | - Wei Zhong
- The First Hospital of Qiqihar, Qiqihar, China
| | - Peng Teng
- The First Hospital of Qiqihar, Qiqihar, China
| | - Hong Qiao
- The Second Affiliated Hospital of Harbin Medical University, Harbin, China
| |
|