1
Chen Y, Du Z, Ren X, Pan C, Zhu Y, Li Z, Meng T, Yao X. mRNA-CLA: An interpretable deep learning approach for predicting mRNA subcellular localization. Methods 2024; 227:17-26. [PMID: 38705502 DOI: 10.1016/j.ymeth.2024.04.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 03/30/2024] [Accepted: 04/28/2024] [Indexed: 05/07/2024] [Open Access]
Abstract
Messenger RNA (mRNA) is vital for post-transcriptional gene regulation, acting as the direct template for protein synthesis. However, the methods available for predicting mRNA subcellular localization still need improvement; notably, few existing algorithms can annotate mRNA sequences with multiple localizations. In this work, we propose mRNA-CLA, a multi-label subcellular localization prediction framework for mRNA that uses a deep learning approach with a multi-head self-attention mechanism. The framework employs a multi-scale convolutional layer to extract sequence features across different regions and applies a self-attention mechanism designed explicitly for each sequence. Paired with Position Weight Matrices (PWMs) derived from the convolutional neural network layers, our model offers interpretable analysis. In particular, we perform a base-level analysis of mRNA sequences from diverse subcellular localizations to determine the nucleotide specificity at each site. Our evaluations demonstrate that the mRNA-CLA model substantially outperforms existing methods and tools.
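The abstract does not spell out how the PWMs are derived from the convolutional layers. A common approach in filter interpretation is to collect the input windows on which a first-layer filter fires strongly and tally nucleotide frequencies at each position. The sketch below assumes that approach; `pwm_from_filter`, the ReLU activation, and the default threshold of half the maximum activation are illustrative choices, not details taken from mRNA-CLA.

```python
from typing import Dict, List

NUCLEOTIDES = "ACGU"  # mRNA alphabet (U instead of T)

# A width-k filter is represented as one weight per nucleotide per position.
Filter = List[Dict[str, float]]


def filter_activation(window: str, weights: Filter) -> float:
    """ReLU of the filter's dot product with a one-hot encoded window."""
    return max(0.0, sum(weights[i][base] for i, base in enumerate(window)))


def pwm_from_filter(seqs: List[str], weights: Filter, frac: float = 0.5) -> List[Dict[str, float]]:
    """Tally nucleotide frequencies over windows whose activation >= frac * max activation."""
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    k = len(weights)
    windows = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]
    scores = [filter_activation(w, weights) for w in windows]
    best = max(scores, default=0.0)
    if best <= 0.0:
        raise ValueError("filter never activates on these sequences")
    selected = [w for w, sc in zip(windows, scores) if sc >= frac * best]
    pwm = []
    for pos in range(k):
        counts = {b: 0 for b in NUCLEOTIDES}
        for w in selected:
            counts[w[pos]] += 1
        pwm.append({b: counts[b] / len(selected) for b in NUCLEOTIDES})
    return pwm
```

For a width-3 filter that rewards only `AUG`, e.g. `[{b: (1.0 if b == t else 0.0) for b in NUCLEOTIDES} for t in "AUG"]`, every window that passes the threshold is `AUG`, so each PWM row puts all of its probability on one nucleotide.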
Affiliation(s)
- Yifan Chen
- Institute of Artificial Intelligence Application, College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
- Zhenya Du
- Guangzhou Xinhua University, 510520, Guangzhou, China
- Xuanbai Ren
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Chu Pan
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Yangbin Zhu
- Manufacturing and Electronic Engineering, Wenzhou University of Technology, 325027, Wenzhou, China.
- Zhen Li
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, 510006, China.
- Tao Meng
- Institute of Artificial Intelligence Application, College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao.
2
Liu W, Teng Z, Li Z, Chen J. CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Interdiscip Sci 2024:10.1007/s12539-024-00633-y. [PMID: 38778003 DOI: 10.1007/s12539-024-00633-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 04/07/2024] [Accepted: 04/09/2024] [Indexed: 05/25/2024]
Abstract
Gene regulatory network (GRN) inference based on single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization remain unsatisfactory, owing to high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for gene regulatory network inference using scRNA-seq data. CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and the observed topology into a low-dimensional vector space. The trained vectors are then used to calculate distances between genes and thereby predict gene-gene interactions. In the overall framework, FastICA is used to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multi-stacked GraphSAGE layers as the encoder and an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets with four related ground-truth networks, and the results show that CVGAE achieves better performance than the comparative methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of training set to test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
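As a rough illustration of the encoder and distance-based scoring described above, the sketch below implements one GraphSAGE layer with a mean aggregator (concatenate a gene's vector with the mean of its neighbours' vectors, apply a weight matrix and ReLU, then L2-normalise) and a hypothetical distance-to-score function `edge_score`. It omits FastICA, layer stacking, training, and CVGAE's improved decoder, none of which the abstract specifies in enough detail to reproduce.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(h: Dict[str, Vector], adj: Dict[str, List[str]], W: Matrix) -> Dict[str, Vector]:
    """GraphSAGE layer with a mean aggregator:
    h_v' = normalize(ReLU(W [h_v || mean_{u in N(v)} h_u])).
    W has shape out_dim x (2 * in_dim)."""
    dim = len(next(iter(h.values())))
    out = {}
    for v, hv in h.items():
        nbrs = adj.get(v, [])
        mean = [sum(h[u][d] for u in nbrs) / len(nbrs) for d in range(dim)] if nbrs else [0.0] * dim
        z = [max(0.0, t) for t in matvec(W, hv + mean)]  # hv + mean is list concatenation
        norm = math.sqrt(sum(t * t for t in z)) or 1.0  # avoid dividing by zero
        out[v] = [t / norm for t in z]
    return out


def edge_score(a: Vector, b: Vector) -> float:
    """Hypothetical distance-based decoder: identical embeddings score 1, distant ones approach 0."""
    return math.exp(-math.dist(a, b))
```

With a weight matrix that copies only the neighbour mean, each gene's new vector is the average of its neighbours, which is how the layer mixes observed topology into the expression-based features.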
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China.
- Zhijie Teng
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zejun Li
- School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 412002, China
- Jing Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
3
Stock M, Popp N, Fiorentino J, Scialdone A. Topological benchmarking of algorithms to infer gene regulatory networks from single-cell RNA-seq data. Bioinformatics 2024; 40:btae267. [PMID: 38627250 PMCID: PMC11096270 DOI: 10.1093/bioinformatics/btae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 02/28/2024] [Accepted: 04/16/2024] [Indexed: 05/18/2024] [Open Access]
Abstract
MOTIVATION In recent years, many algorithms for inferring gene regulatory networks from single-cell transcriptomic data have been published. Several studies have evaluated their accuracy in estimating the presence of an interaction between pairs of genes. However, these benchmarking analyses do not quantify the algorithms' ability to capture structural properties of networks, which are fundamental, e.g., for studying the robustness of a gene network to external perturbations. Here, we devise a three-step benchmarking pipeline called STREAMLINE that quantifies the ability of algorithms to capture topological properties of networks and identify hubs. RESULTS To this end, we use data simulated from different types of networks as well as experimental data from three different organisms. We apply our benchmarking pipeline to four inference algorithms and provide guidance on which algorithm should be used depending on the global network property of interest. AVAILABILITY AND IMPLEMENTATION STREAMLINE is available at https://github.com/ScialdoneLab/STREAMLINE. The data generated in this study are available at https://doi.org/10.5281/zenodo.10710444.
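STREAMLINE's exact metrics are not listed in the abstract. As a hedged illustration of what "identifying hubs" can mean in such a benchmark, the sketch below ranks nodes by degree in a ground-truth network and in an inferred network and reports the fraction of inferred top-k hubs that are also true top-k hubs; `hub_precision` and the alphabetical tie-breaking are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def degrees(edges: Iterable[Edge]) -> Counter:
    """Node degrees of an undirected network (duplicate and reversed edges counted once)."""
    deg: Counter = Counter()
    for a, b in {tuple(sorted(e)) for e in edges}:
        deg[a] += 1
        deg[b] += 1
    return deg


def top_hubs(edges: Iterable[Edge], k: int) -> Set[str]:
    """The k highest-degree nodes; ties are broken alphabetically for reproducibility."""
    deg = degrees(edges)
    ranked = sorted(deg, key=lambda n: (-deg[n], n))
    return set(ranked[:k])


def hub_precision(true_edges: Iterable[Edge], inferred_edges: Iterable[Edge], k: int) -> float:
    """Fraction of the inferred top-k hubs that are also top-k hubs of the true network."""
    return len(top_hubs(true_edges, k) & top_hubs(inferred_edges, k)) / k
```

A precision of 1.0 at small k means the inferred network recovers the most connected regulators even if many individual edges are wrong, which is the kind of property an edge-level accuracy score would not reveal.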
Affiliation(s)
- Marco Stock
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich 85354, Germany
- Niclas Popp
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Jonathan Fiorentino
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
4
Xu M, Abdullah NA, Md Sabri AQ. A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data. Comput Biol Chem 2024; 108:107997. [PMID: 38154318 DOI: 10.1016/j.compbiolchem.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/03/2023] [Accepted: 12/03/2023] [Indexed: 12/30/2023]
Abstract
This work focuses on data sampling in cancer-gene association prediction. Currently, researchers use machine learning methods to predict genes that are more likely to produce cancer-causing mutations. Several methods have been proposed to improve the performance of such models, one of which is to improve the quality of the training data. Existing methods focus mainly on screening and selecting positive data, i.e. cancer driver genes. This paper proposes a method for screening genes with low cancer relevance, based on gene network data and graph theory algorithms, to improve negative sample selection. Genes with low cancer correlation are used as negative training samples. Experiments verify that training the cancer gene classification model on negative samples screened by this method improves prediction performance. The main advantage of this method is that it can easily be combined with other methods that focus on enhancing the quality of positive training samples. Combining this method with three state-of-the-art cancer gene prediction methods yields significant improvement.
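The abstract names gene network data and graph theory algorithms but not the specific screening criterion. One plausible criterion, assumed here purely for illustration, is network distance: genes that lie many hops away from every known cancer gene (or are unreachable from them) are treated as low-cancer-related negatives. The sketch uses multi-source breadth-first search; `screen_negatives` and the `min_hops=3` cut-off are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def hops_from(seeds: Set[str], adj: Dict[str, List[str]]) -> Dict[str, int]:
    """Multi-source BFS: hop distance from each reachable gene to its nearest seed gene."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for u in adj.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def screen_negatives(genes: Iterable[str], cancer_genes: Set[str],
                     adj: Dict[str, List[str]], min_hops: int = 3) -> List[str]:
    """Genes at least `min_hops` away from every known cancer gene, or unreachable from them."""
    dist = hops_from(cancer_genes, adj)
    return sorted(g for g in genes
                  if g not in cancer_genes and dist.get(g, float("inf")) >= min_hops)
```

Raising `min_hops` trades quantity for purity: fewer negatives survive, but each is less likely to be an undiscovered cancer gene.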
Affiliation(s)
- Mingzhe Xu
- Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia; School of Energy and Intelligence Engineering, Henan University of Animal Husbandry and Economy, #6 North Longzihu Rd, Zhengzhou 450000, China.
- Nor Aniza Abdullah
- Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.
- Aznul Qalid Md Sabri
- Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.
5
Mei P, Zhao YH. Dynamic network link prediction with node representation learning from graph convolutional networks. Sci Rep 2024; 14:538. [PMID: 38177652 PMCID: PMC10766634 DOI: 10.1038/s41598-023-50977-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024] [Open Access]
Abstract
Dynamic network link prediction is extensively applicable in various scenarios, and it has progressively emerged as a focal point in data mining research. The comprehensive and accurate extraction of node information, as well as a deeper understanding of the temporal evolution pattern, are particularly crucial in the investigation of link prediction in dynamic networks. To address this issue, this paper introduces a node representation learning framework based on Graph Convolutional Networks (GCN), referred to as GCN_MA. This framework effectively combines GCN, Recurrent Neural Networks (RNN), and multi-head attention to achieve comprehensive and accurate representations of node embedding vectors. It aggregates network structural features and node features through GCN and incorporates an RNN with multi-head attention mechanisms to capture the temporal evolution patterns of dynamic networks from both global and local perspectives. Additionally, a node representation algorithm based on the node aggregation effect (NRNAE) is proposed, which synthesizes information including node aggregation and temporal evolution to comprehensively represent the structural characteristics of the network. The effectiveness of the proposed method for link prediction is validated through experiments conducted on six distinct datasets. The experimental outcomes demonstrate that the proposed approach yields satisfactory results in comparison to state-of-the-art baseline methods.
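GCN_MA aggregates structural and node features with a GCN before its recurrent and attention stages. The sketch below shows only that GCN step for a single network snapshot, using the widely used Kipf-Welling propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the abstract does not confirm this exact normalisation, and the RNN, multi-head attention, and NRNAE components are not modelled.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def gcn_layer(adj: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) for a symmetric, non-negative adjacency A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]  # self-loops keep degree >= 1
    a_norm = [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, H), W)]
```

For two connected nodes with features 1 and 3 and an identity weight, both outputs become 2, the symmetric-normalised average over each node and its neighbour; in a dynamic setting, one such output per snapshot would be passed on to the temporal model.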
Affiliation(s)
- Peng Mei
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Yu Hong Zhao
- School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
6
Chatterjee D, Mou SI, Sultana T, Hosen MI, Faruk MO. Identification and validation of prognostic signature genes of bladder cancer by integrating methylation and transcriptomic analysis. Sci Rep 2024; 14:368. [PMID: 38172584 PMCID: PMC10764961 DOI: 10.1038/s41598-023-50740-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 12/24/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
Bladder Urothelial Carcinoma (BLCA), a frequent malignant tumor of the genitourinary system, has a poor prognosis. This study focused on identifying and validating prognostic biomarkers using methylation, transcriptomic, and clinical data from The Cancer Genome Atlas Bladder Urothelial Carcinoma (TCGA BLCA) cohort. Altered, differentially methylated hallmark pathway genes were subjected to clustering analysis to observe changes in the transcriptional landscape of BLCA patients; this identified two patient subtypes in the TCGA BLCA population, of which Subtype 2 was associated with the worst prognosis (p-value = 0.00032). Differential expression and enrichment analysis showed that Subtype 2 was enriched in immune-responsive and cancer-progressive pathways, whereas Subtype 1 was enriched in biosynthetic pathways. Subsequently, regression and network analyses identified Epidermal Growth Factor Receptor (EGFR), Fos-related antigen 1 (FOSL1), Nuclear Factor Erythroid 2 (NFE2), ADP-ribosylation factor-like protein 4D (ARL4D), SH3 domain containing ring finger 2 (SH3RF2), and Cadherin 3 (CDH3) as the most significant prognostic gene markers. These genes were used to construct a risk model that separated the BLCA patients into high- and low-risk groups. The risk model was also validated in an external dataset by survival analysis between the high- and low-risk groups (p-value < 0.001), which showed that the high-risk group was significantly associated with poor prognosis compared with the low-risk group. Single-cell analyses revealed elevated levels of these genes in the tumor microenvironment and their association with immune response. High-grade patients also tended to have higher expression of these genes than low-grade patients.
In conclusion, this research developed a six-gene signature that is pertinent to the prediction of overall survival (OS) and might contribute to the advancement of precision medicine in the management of bladder cancer.
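Prognostic signatures of this kind are often turned into a risk score by summing each gene's expression weighted by its regression coefficient and then splitting patients at the median score. The abstract does not state the coefficients or the cut-off, so the sketch below assumes a median split, and the coefficients in its usage example are made up; it illustrates the general construction, not the published six-gene model.

```python
from statistics import median
from typing import Dict, List, Tuple


def risk_scores(expr: Dict[str, Dict[str, float]], coefs: Dict[str, float]) -> Dict[str, float]:
    """Linear risk score per patient: sum over signature genes of coef * expression."""
    return {patient: sum(coefs[g] * values[g] for g in coefs) for patient, values in expr.items()}


def split_by_median(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Patients above the median score are 'high risk'; the rest are 'low risk'."""
    cut = median(scores.values())
    high = sorted(p for p, s in scores.items() if s > cut)
    low = sorted(p for p, s in scores.items() if s <= cut)
    return high, low


# Usage with hypothetical coefficients and expression values (not from the study):
coefs = {"EGFR": 0.5, "CDH3": 0.2}
expr = {"p1": {"EGFR": 2.0, "CDH3": 1.0},
        "p2": {"EGFR": 0.0, "CDH3": 0.0},
        "p3": {"EGFR": 4.0, "CDH3": 5.0}}
high, low = split_by_median(risk_scores(expr, coefs))
```

Survival curves for the resulting `high` and `low` groups would then be compared, for example with a log-rank test, to check whether the score separates prognosis.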
Affiliation(s)
- Dipankor Chatterjee
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, 1000, Bangladesh
- Sadia Islam Mou
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, 1000, Bangladesh
- Tamanna Sultana
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, 1000, Bangladesh
- Md Ismail Hosen
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, 1000, Bangladesh
- Md Omar Faruk
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, 1000, Bangladesh.
7
Xu L, Fu X, Zhuo L, Zhou Z, Liao X, Tian S, Kang R, Chen Y. SGAE-MDA: Exploring the MiRNA-disease associations in herbal medicines based on semi-supervised graph autoencoder. Methods 2024; 221:73-81. [PMID: 38123109 DOI: 10.1016/j.ymeth.2023.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/28/2023] [Accepted: 12/12/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
Research indicates that miRNAs present in herbal medicines are crucial for identifying disease markers, advancing gene therapy, and facilitating drug delivery, among other applications. These miRNAs remain stable in the extracellular environment, making them viable tools for disease diagnosis. They can withstand the digestive processes of the gastrointestinal tract, positioning them as potential carriers for specific oral drug delivery. By engineering plants to generate effective, non-toxic miRNA interference sequences, it is possible to broaden their applicability, including to the treatment of diseases such as hepatitis C. Consequently, investigating miRNA-disease associations (MDAs) within herbal medicines holds considerable promise for diagnosing and treating miRNA-related diseases. In our research, we propose the SGAE-MDA model, which combines a graph autoencoder (GAE) with a semi-supervised approach to uncover potential MDAs in herbal medicines more effectively. Within the GAE framework, SGAE-MDA integrates the inherent feature vectors of miRNA and disease nodes with the regulatory data in the miRNA-disease network. Additionally, the proposed semi-supervised learning approach randomly masks part of the miRNA-disease network structure and then reconstructs it within the GAE framework, which effectively reduces interference from network noise. In comparisons with other leading deep learning models, the proposed SGAE-MDA model consistently showed superior performance. Our code and dataset are available at: https://github.com/22n9n23/SGAE-MDA.
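The masking step can be sketched as follows: shuffle the known miRNA-disease edges with a seeded random generator, hide a fraction of them from the encoder, and keep the hidden edges as reconstruction targets for the decoder. `mask_edges` and the 30% ratio in the usage example are assumptions for illustration; the abstract does not give SGAE-MDA's masking ratio or sampling scheme.

```python
import random
from typing import List, Tuple

Edge = Tuple[str, str]  # (miRNA, disease)


def mask_edges(edges: List[Edge], mask_ratio: float, seed: int = 0) -> Tuple[List[Edge], List[Edge]]:
    """Randomly hide a fraction of edges.

    Returns (visible, masked): the encoder only sees `visible`, and the decoder
    is trained to reconstruct `masked`.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_mask = int(round(mask_ratio * len(edges)))
    return shuffled[n_mask:], shuffled[:n_mask]


# Usage with hypothetical miRNA and disease identifiers:
edges = [(f"m{i}", f"d{i % 3}") for i in range(10)]
visible, masked = mask_edges(edges, 0.3, seed=42)
```

Re-drawing the mask each epoch (a new seed per epoch) is a common variant that exposes the model to different missing structures.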
Affiliation(s)
- Lei Xu
- Wenzhou University of Technology, Wenzhou, China
- Xiangzheng Fu
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, China; College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Linlin Zhuo
- Wenzhou University of Technology, Wenzhou, China
- Xuefeng Liao
- Wenzhou University of Technology, Wenzhou, China.
- Sha Tian
- Department of Internal Medicine, College of Integrated Chinese and Western Medicine, Hunan University of Chinese Medicine, Changsha, Hunan, China.
- Ruofei Kang
- Xuhui Excellent Health Information Technology Co., Ltd., China
- Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China.
8
Liao Q, Fu X, Zhuo L, Chen H. An efficient model for predicting human diseases through miRNA based on multiple-types of contrastive learning. Front Microbiol 2023; 14:1325001. [PMID: 38163075 PMCID: PMC10755968 DOI: 10.3389/fmicb.2023.1325001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 11/16/2023] [Indexed: 01/03/2024] [Open Access]
Abstract
Multiple studies have demonstrated that microRNAs (miRNAs) can be deeply involved in the regulatory mechanisms of the human microbiota, thereby inducing disease. Developing effective methods to infer potential associations between miRNAs and diseases can aid early diagnosis and treatment. Recent methods use machine learning or deep learning to predict miRNA-disease associations (MDAs), achieving state-of-the-art performance. However, the problem of sparse node neighborhoods caused by a lack of data has not been well solved. To this end, we propose a new model named MTCL-MDA, which integrates multiple types of contrastive learning strategies into a graph collaborative filtering model to predict potential MDAs. The model adopts a topology-based contrastive learning strategy, which alleviates the damage to model performance caused by sparse neighborhoods. In addition, the model adopts a semantics-based contrastive learning strategy, which not only reduces the noise introduced by topology-based contrastive learning but also enhances the semantic information of nodes. Experimental results show that our model outperforms existing models on all evaluation metrics. Case analysis shows that our model can more accurately identify potential MDAs, which is of great significance for the screening and diagnosis of real-life diseases. Our data and code are publicly available at: https://github.com/Lqingquan/MTCL-MDA.
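Contrastive strategies like those described above are commonly trained with an InfoNCE objective, in which each node's embedding in one augmented view should be most similar to its own embedding in the other view, with the remaining nodes acting as negatives. The sketch below computes InfoNCE with cosine similarity and a temperature `tau`; the abstract does not state MTCL-MDA's exact loss or temperature, so both are assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(view1: List[Vector], view2: List[Vector], tau: float = 0.2) -> float:
    """Mean InfoNCE loss: node i in view1 is pulled towards node i in view2,
    while every other node j in view2 serves as a negative."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / n
```

When the two views agree the loss is close to zero, and when nodes are matched to the wrong counterparts it grows roughly by 1/tau, which is why a small temperature sharpens the contrast between positives and negatives.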
Affiliation(s)
- Qingquan Liao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Hao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
9
Li Z, Zhang Y, Bai Y, Xie X, Zeng L. IMC-MDA: Prediction of miRNA-disease association based on induction matrix completion. Math Biosci Eng 2023; 20:10659-10674. [PMID: 37322953 DOI: 10.3934/mbe.2023471] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
To comprehend the etiology and pathogenesis of many illnesses, it is essential to identify disease-associated microRNAs (miRNAs). However, current computational approaches face several challenges, such as the lack of "negative samples" (confirmed unrelated miRNA-disease pairs) and poor performance in predicting miRNAs related to "isolated diseases" (illnesses with no known associated miRNAs), which calls for novel computational methods. In this study, an inductive matrix completion model, referred to as IMC-MDA, was designed to predict associations between diseases and miRNAs. In IMC-MDA, the predicted score for each miRNA-disease pair is calculated by combining the known miRNA-disease associations with integrated disease similarities and miRNA similarities. Under leave-one-out cross-validation (LOOCV), IMC-MDA achieved an AUC of 0.8034, outperforming previous methods. Furthermore, experiments validated the predicted disease-related miRNAs for three major human diseases: colon cancer, kidney cancer, and lung cancer.
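In standard inductive matrix completion, a pair is scored as x^T W H^T y, where x and y are side-feature vectors (here, miRNA and disease similarity profiles) and W and H are learned low-rank projections. The sketch below shows that scoring function together with a pairwise AUC of the kind reported under LOOCV. Training W and H is omitted, and treating the similarity profiles as the side features is an assumption about IMC-MDA rather than a documented detail.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape d x r


def project(x: Vector, F: Matrix) -> Vector:
    """Compute x^T F for a length-d vector x and a d x r matrix F."""
    return [sum(x[k] * F[k][c] for k in range(len(x))) for c in range(len(F[0]))]


def imc_score(x_mirna: Vector, y_disease: Vector, W: Matrix, H: Matrix) -> float:
    """Inductive matrix completion score x^T W H^T y, computed as <x^T W, y^T H>."""
    u = project(x_mirna, W)
    v = project(y_disease, H)
    return sum(a * b for a, b in zip(u, v))


def auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Probability that a positive pair outscores a negative pair (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))
```

Because the score depends only on feature vectors, a disease with no known miRNAs can still be scored from its similarity profile, which is what makes the inductive formulation relevant to "isolated diseases".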
Affiliation(s)
- Zejun Li
- School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Yuxiang Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, 450001, China
- Yuting Bai
- College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
- Xiaohui Xie
- School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Lijun Zeng
- School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
10
Liao Q, Ye Y, Li Z, Chen H, Zhuo L. Prediction of miRNA-disease associations in microbes based on graph convolutional networks and autoencoders. Front Microbiol 2023; 14:1170559. [PMID: 37187536 PMCID: PMC10175670 DOI: 10.3389/fmicb.2023.1170559] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/21/2023] [Accepted: 03/21/2023] [Indexed: 05/17/2023] [Open Access]
Abstract
MicroRNAs (miRNAs) are short RNA molecules that regulate gene expression by targeting and inhibiting the expression of specific RNAs. Because miRNAs affect many diseases in microbial ecology, it is necessary to predict miRNA-disease associations at the microbial level. To this end, we propose a novel model, termed GCNA-MDA, which integrates a dual autoencoder and a graph convolutional network (GCN) to predict miRNA-disease associations. The proposed method uses autoencoders to extract robust representations of miRNAs and diseases, while exploiting the GCN to capture the topological information of miRNA-disease networks. To alleviate the impact of insufficient information in the original data, association similarity and feature similarity data are combined to calculate a more complete initial basic vector for each node. Experimental results on the benchmark datasets demonstrate that, compared with existing representative methods, the proposed method achieves superior performance, with precision reaching 0.8982. These results indicate that the proposed method can serve as a tool for exploring miRNA-disease associations in microbial environments.
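One widely used association similarity in miRNA-disease work is the Gaussian interaction profile (GIP) kernel, computed from the rows of the known association matrix. The sketch below assumes GIP as the association similarity and fills gaps in a feature similarity matrix with GIP values wherever the feature similarity is zero; whether GCNA-MDA combines the two in this way is not stated in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]


def gip_similarity(assoc: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """Gaussian interaction profile kernel over the rows of a binary association matrix:
    K(i, j) = exp(-gamma * ||a_i - a_j||^2), with gamma = gamma_prime / mean(||a_i||^2)."""
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j])))
             for j in range(n)] for i in range(n)]


def integrate(feature_sim: Matrix, gip: Matrix) -> Matrix:
    """Keep the feature similarity where it is known (non-zero); otherwise fall back to GIP."""
    return [[f if f != 0 else g for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(feature_sim, gip)]
```

Rows of the association matrix are the interaction profiles of individual miRNAs (or, using the transpose, diseases), so two miRNAs linked to the same diseases receive a GIP similarity of 1.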
Affiliation(s)
- Qingquan Liao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yuxiang Ye
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Zihang Li
- School of Computing and Data Science, Xiamen University Malaysia, Sepang, Selangor, Malaysia
- Hao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- *Correspondence: Hao Chen
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Linlin Zhuo
11
Li J, Zhuo L, Lian X, Pan S, Xu L. DPB-NBFnet: Using neural Bellman-Ford networks to predict DNA-protein binding. Front Pharmacol 2022; 13:1018294. [DOI: 10.3389/fphar.2022.1018294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 09/28/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
DNA is the hereditary material that plays an essential role in microorganisms and almost all other organisms. Meanwhile, proteins are a vital component and the principal executors of microbial activity. Studying the binding between DNA and proteins is therefore highly significant from a microbiological point of view. In addition, binding affinity prediction benefits drug design research. However, existing experimental methods for identifying DNA-protein binding are extremely expensive and time-consuming. To address this problem, many deep learning methods (including graph neural networks) have been developed to predict DNA-protein interactions. Our work shares this motivation: we apply the recent Neural Bellman-Ford Networks (NBFnets) to build pair representations of DNA and protein and predict the existence of DNA-protein binding (DPB). NBFnet is a graph neural network model that uses the Bellman-Ford algorithm to obtain pair representations and has been shown to achieve state-of-the-art performance on the link prediction problem. After building the pair representations, we designed a feed-forward neural network that outputs a 2-D vector as the predicted value for positive or negative samples. We conducted our experiments on 100 ENCODE datasets. Our experiments indicate that the performance of DPB-NBFnet is competitive with the baseline models. We also performed parameter tuning with different architectures to explore the structure of our framework.
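NBFnet generalises the Bellman-Ford recurrence: each node's representation starts from an indicator of the source node, and each iteration aggregates messages along incoming edges. The scalar sketch below exposes the semiring operations `plus` and `times` as parameters; with (min, +) it reduces to classical shortest-path Bellman-Ford, and with (+, *) it sums path weights. NBFnet replaces these operators with learned functions over vectors, which this example does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source node, target node, edge weight)


def generalized_bellman_ford(
    nodes: List[str],
    edges: List[Edge],
    source: str,
    plus: Callable[[float, float], float],
    times: Callable[[float, float], float],
    zero: float,
    one: float,
    iterations: int,
) -> Dict[str, float]:
    """Semiring Bellman-Ford from `source`.

    h0(v) = one if v == source else zero              (indicator)
    h(v)  = h0(v) plus SUM over edges (u, v, w) of h_prev(u) times w
    """
    indicator = {v: (one if v == source else zero) for v in nodes}
    h = dict(indicator)
    for _ in range(iterations):
        new_h = dict(indicator)
        for u, v, w in edges:
            new_h[v] = plus(new_h[v], times(h[u], w))
        h = new_h
    return h


# Usage: shortest-path distances from "A" with the (min, +) semiring.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0), ("C", "D", 1.0)]
dist = generalized_bellman_ford(nodes, edges, "A", min, lambda a, b: a + b,
                                float("inf"), 0.0, iterations=3)
```

Here `dist` holds the representation of every (A, v) pair at once, which mirrors how NBFnet produces all pair representations for one source node in a single pass.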
12
Identification of key candidate genes for IgA nephropathy using machine learning and statistics based bioinformatics models. Sci Rep 2022; 12:13963. [PMID: 35978028 PMCID: PMC9385868 DOI: 10.1038/s41598-022-18273-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/17/2022] [Accepted: 08/08/2022] [Indexed: 11/08/2022] [Open Access]
Abstract
Immunoglobulin A nephropathy (IgAN) is a kidney disease caused by the accumulation of IgA deposits in the kidneys, which causes inflammation and damage to the kidney tissues. Various bioinformatics approaches are widely used to predict novel candidate genes and pathways associated with IgAN. However, the molecular mechanisms and causes of IgAN development and progression still need to be explored more clearly. Therefore, the present study aimed to identify key candidate genes for IgAN using machine learning (ML) and statistics-based bioinformatics models. First, differentially expressed genes (DEGs) were identified using limma, and enrichment analysis was then performed on the DEGs using DAVID. A protein-protein interaction (PPI) network was constructed using STRING, and Cytoscape was used to determine hub genes based on connectivity, as well as hub modules based on MCODE scores together with their associated genes from the DEGs. Furthermore, ML-based algorithms, namely support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), and partial least squares discriminant analysis (PLS-DA), were applied to identify the discriminative genes of IgAN from the DEGs. Finally, the key candidate genes (FOS, JUN, EGR1, FOSB, and DUSP1) were identified as the genes overlapping among the selected hub genes, hub module genes, and discriminative genes from SVM, LASSO, and PLS-DA; these genes can be used for the diagnosis and treatment of IgAN.
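The final selection step, keeping only the genes shared by the hub list, the hub-module list, and the SVM, LASSO, and PLS-DA selections, amounts to a set intersection. The sketch below also picks hubs by degree in a PPI edge list, standing in for Cytoscape's connectivity ranking; `hub_genes`, `key_candidates`, and the gene names `G1` to `G6` in the test are placeholders, not data or code from the study.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def hub_genes(ppi_edges: Iterable[Tuple[str, str]], top_n: int) -> Set[str]:
    """Top-n genes by degree in an undirected PPI network (ties broken alphabetically)."""
    deg: Counter = Counter()
    for a, b in {tuple(sorted(e)) for e in ppi_edges}:
        deg[a] += 1
        deg[b] += 1
    ranked = sorted(deg.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:top_n]}


def key_candidates(hubs: Iterable[str], module_genes: Iterable[str],
                   *discriminative_sets: Iterable[str]) -> List[str]:
    """Genes present in the hub list, the module list, and every discriminative list."""
    common = set(hubs) & set(module_genes)
    for genes in discriminative_sets:
        common &= set(genes)
    return sorted(common)
```

Requiring agreement across all five lists favours precision over recall: a gene missed by any single method is dropped, which keeps the final candidate list short.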