1
Chen SD, Chu CY, Wang CB, Yang Y, Xu ZY, Qu YL, Man Y. Integrated-omics profiling unveils the disparities of host defense to ECM scaffolds during wound healing in aged individuals. Biomaterials 2024; 311:122685. [PMID: 38944969 DOI: 10.1016/j.biomaterials.2024.122685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 06/11/2024] [Accepted: 06/23/2024] [Indexed: 07/02/2024]
Abstract
Extracellular matrix (ECM) scaffold membranes have shown promising potential to improve wound-healing outcomes by creating a regenerative surrounding microenvironment. However, compared with its application in younger individuals, the same scaffold membrane performed unsatisfactorily in promoting re-epithelialization and collagen deposition in aged mice. To comprehensively explore the mechanisms underlying this age-related disparity, we conducted an integrated analysis combining single-cell RNA sequencing (scRNA-Seq) with spatial transcriptomics, and identified six functionally and spatially distinct macrophage groups and lymphocytes surrounding the ECM scaffolds. Through intergroup comparative analysis and cell-cell communication analysis, we found that the dysfunction of Spp1+ macrophages in aged mice impeded the activation of the type Ⅱ immune response, thus inhibiting the repair ability of epidermal cells and fibroblasts around the ECM scaffolds. These findings contribute to a deeper understanding of biomaterial applications in varied physiological contexts, thereby paving the way for the development of precision-based biomaterials tailored specifically for aged individuals in future therapeutic strategies.
Affiliation(s)
- Shuai-Dong Chen
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Department of Oral Implantology, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, Sichuan, China
- Chen-Yu Chu
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Department of Oral Implantology, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, Sichuan, China
- Chen-Bing Wang
- College & Hospital of Stomatology, Anhui Medical University, Key Lab. of Oral Diseases Research of Anhui Province, Hefei, 230032, China
- Yang Yang
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Department of Oral Implantology, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, Sichuan, China
- Zhao-Yu Xu
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Department of Oral Implantology, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, Sichuan, China
- Yi-Li Qu
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Department of Prosthodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, Sichuan, China
- Yi Man
- State Key Laboratory of Oral Diseases & National Center for Stomatology & National Clinical Research Center for Oral Diseases & Department of Oral Implantology, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, Sichuan, China.
2
Hu D, Guan R, Liang K, Yu H, Quan H, Zhao Y, Liu X, He K. scEGG: an exogenous gene-guided clustering method for single-cell transcriptomic data. Brief Bioinform 2024; 25:bbae483. [PMID: 39344711 DOI: 10.1093/bib/bbae483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 09/05/2024] [Accepted: 09/12/2024] [Indexed: 10/01/2024] Open
Abstract
In recent years, there has been significant advancement in the field of single-cell data analysis, particularly in the development of clustering methods. Despite these advancements, most algorithms continue to focus primarily on analyzing the provided single-cell matrix data. However, within medical contexts, single-cell data often encompasses a wealth of exogenous information, such as gene networks. Overlooking this aspect could result in information loss and produce clustering outcomes lacking significant clinical relevance. To address this limitation, we introduce an innovative deep clustering method for single-cell data that leverages exogenous gene information to generate discriminative cell representations. Specifically, an attention-enhanced graph autoencoder has been developed to efficiently capture topological signal patterns among cells. Concurrently, a random walk on an exogenous protein-protein interaction network enabled the acquisition of the gene's embeddings. Ultimately, the clustering process entailed integrating and reconstructing gene-cell cooperative embeddings, which yielded a discriminative representation. Extensive experiments have demonstrated the effectiveness of the proposed method. This research provides enhanced insights into the characteristics of cells, thus laying the foundation for the early diagnosis and treatment of diseases. The datasets and code can be publicly accessed in the repository at https://github.com/DayuHuu/scEGG.
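The abstract mentions a random walk on an exogenous protein-protein interaction network as the source of gene embeddings. As a rough illustration of the walk-generation step only, the sketch below samples uniform random walks over a small graph. It is not the scEGG implementation (see the authors' repository for that); the function name `random_walks`, its parameters, and the toy gene names and edges are all hypothetical.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks over a graph given as an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # isolated node: stop the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy interaction network (hypothetical genes and edges, illustration only).
toy_ppi = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A"],
    "GENE_C": ["GENE_A", "GENE_D"],
    "GENE_D": ["GENE_C"],
}
walks = random_walks(toy_ppi, walk_length=5, walks_per_node=2)
```

Walks of this kind are typically passed to a skip-gram style model to learn node embeddings; whether scEGG does exactly that is not stated in the abstract.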
Affiliation(s)
- Dayu Hu
- School of Computer, National University of Defense Technology, No. 109 Deya Road, 410073 Changsha, Hunan, China
- Renxiang Guan
- School of Computer, National University of Defense Technology, No. 109 Deya Road, 410073 Changsha, Hunan, China
- Ke Liang
- School of Computer, National University of Defense Technology, No. 109 Deya Road, 410073 Changsha, Hunan, China
- Hao Yu
- School of Computer, National University of Defense Technology, No. 109 Deya Road, 410073 Changsha, Hunan, China
- Hao Quan
- College of Medicine and Biological Information Engineering, Northeastern University, No.195 Chuangxin Road, 110169 Shenyang, Liaoning, China
- Yawei Zhao
- Medical Big Data Research Center, Chinese PLA General Hospital, No. 28 Fuxing Road, 100853 Beijing, China
- Xinwang Liu
- School of Computer, National University of Defense Technology, No. 109 Deya Road, 410073 Changsha, Hunan, China
- Kunlun He
- Medical Big Data Research Center, Chinese PLA General Hospital, No. 28 Fuxing Road, 100853 Beijing, China
3
Wang XF, Yu CQ, You ZH, Wang Y, Huang L, Qiao Y, Wang L, Li ZW. BEROLECMI: a novel prediction method to infer circRNA-miRNA interaction from the role definition of molecular attributes and biological networks. BMC Bioinformatics 2024; 25:264. [PMID: 39127625 DOI: 10.1186/s12859-024-05891-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 08/01/2024] [Indexed: 08/12/2024] Open
Abstract
Circular RNA (CircRNA)-microRNA (miRNA) interaction (CMI) is an important model for the regulation of biological processes by non-coding RNA (ncRNA), which provides a new perspective for the study of human complex diseases. However, the existing CMI prediction models mainly rely on the nearest neighbor structure in the biological network, ignoring the molecular network topology, so it is difficult to improve the prediction performance. In this paper, we proposed a new CMI prediction method, BEROLECMI, which uses molecular sequence attributes, molecular self-similarity, and biological network topology to define the specific role feature representation for molecules to infer the new CMI. BEROLECMI effectively makes up for the lack of network topology in the CMI prediction model and achieves the highest prediction performance in three commonly used data sets. In the case study, 14 of the 15 pairs of unknown CMIs were correctly predicted.
Affiliation(s)
- Xin-Fei Wang
- School of Information Engineering, Xijing University, Xi'an, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi'an, China.
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
- School of Artificial Intelligence, Jilin University, Changchun, China.
- Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Qiao
- College of Agriculture and Forestry, Longdong University, Qingyang, China
- Lei Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Guangxi Academy of Sciences, Nanning, China
- Zheng-Wei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
4
Ding X, Zhang L, Fan M, Li L. TME-NET: an interpretable deep neural network for predicting pan-cancer immune checkpoint inhibitor responses. Brief Bioinform 2024; 25:bbae410. [PMID: 39167797 PMCID: PMC11337220 DOI: 10.1093/bib/bbae410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 07/17/2024] [Accepted: 08/02/2024] [Indexed: 08/23/2024] Open
Abstract
Immunotherapy with immune checkpoint inhibitors (ICIs) is increasingly used to treat various tumor types. Determining patient responses to ICIs presents a significant clinical challenge. Although components of the tumor microenvironment (TME) are used to predict patient outcomes, comprehensive assessments of the TME are frequently overlooked. Using a top-down approach, the TME was divided into five layers: outcome, immune role, cell, cellular component, and gene. Using this structure, a neural network called TME-NET was developed to predict responses to ICIs. Model parameter weights and cell ablation studies were used to investigate the influence of TME components. The model was developed and evaluated using a pan-cancer cohort of 948 patients across four cancer types, with Area Under the Curve (AUC) and accuracy as performance metrics. Results show that TME-NET surpasses established models such as support vector machine and k-nearest neighbors in AUC and accuracy. Visualization of model parameter weights showed that at the cellular layer, Th1 cells enhance immune responses, whereas myeloid-derived suppressor cells and M2 macrophages show strong immunosuppressive effects. Cell ablation studies further confirmed the impact of these cells. At the gene layer, the transcription factors STAT4 in Th1 cells and IRF4 in M2 macrophages significantly affect TME dynamics. Additionally, the cytokine-encoding genes IFNG from Th1 cells and ARG1 from M2 macrophages are crucial for modulating immune responses within the TME. Survival data from immunotherapy cohorts confirmed the prognostic ability of these markers, with p-values <0.01. In summary, TME-NET performs well in predicting immunotherapy responses and offers interpretable insights into the immunotherapy process. It can be customized at https://immbal.shinyapps.io/TME-NET.
Affiliation(s)
- Xiaobao Ding
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- Institute of Big Data and Artificial Intelligence in Medicine, School of Electronics and Information Engineering, Taizhou University, Taizhou 318000, Zhejiang, China
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
- Lin Zhang
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
5
Wang Y, Kong X, Bi X, Cui L, Yu H, Wu H. ResDeepSurv: A Survival Model for Deep Neural Networks Based on Residual Blocks and Self-attention Mechanism. Interdiscip Sci 2024; 16:405-417. [PMID: 38489147 DOI: 10.1007/s12539-024-00617-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 01/30/2024] [Accepted: 02/01/2024] [Indexed: 03/17/2024]
Abstract
Survival analysis, as a widely used method for analyzing and predicting the timing of event occurrence, plays a crucial role in the medicine field. Medical professionals utilize survival models to gain insight into the effects of patient covariates on the disease, and the correlation with the effectiveness of different treatment strategies. This knowledge is essential for the development of treatment plans and the enhancement of treatment approaches. Conventional survival models, such as the Cox proportional hazards model, require a significant amount of feature engineering or prior knowledge to facilitate personalized modeling. To address these limitations, we propose a novel residual-based self-attention deep neural network for survival modeling, called ResDeepSurv, which combines the benefits of neural networks and the Cox proportional hazards regression model. The model proposed in our study simulates the distribution of survival time and the correlation between covariates and outcomes, but does not impose strict assumptions on the basic distribution of survival data. This approach effectively accounts for both linear and nonlinear risk functions in survival data analysis. The performance of our model in analyzing survival data with various risk functions is on par with or even superior to that of other existing survival analysis methods. Furthermore, we validate the superior performance of our model in comparison to currently existing methods by evaluating multiple publicly available clinical datasets. Through this study, we prove the effectiveness of our proposed model in survival analysis, providing a promising alternative to traditional approaches. The application of deep learning techniques and the ability to capture complex relationships between covariates and survival outcomes without relying on extensive feature engineering make our model a valuable tool for personalized medicine and decision-making in clinical practice.
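The abstract says the model combines a neural network with the Cox proportional hazards model. Networks of this kind are commonly trained by minimizing the negative log Cox partial likelihood over the network's risk scores; the abstract does not state the exact objective, so the sketch below is an assumption about the general approach rather than the authors' loss. The function name and arguments are hypothetical, and tied event times are handled naively (no tie correction).

```python
import math


def neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log Cox partial likelihood for per-sample risk scores.

    risk_scores: model outputs (log relative hazards), one per sample
    times:       observed follow-up times
    events:      1 if the event was observed, 0 if the sample was censored
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored samples only appear inside risk sets
        # Risk set: every sample still under observation at time t_i.
        log_denominator = math.log(
            sum(math.exp(risk_scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        )
        total += risk_scores[i] - log_denominator
    return -total


# With identical scores, the loss depends only on the risk-set sizes (3, 2, 1),
# so it equals log(3) + log(2) + log(1) = log(6).
loss = neg_log_partial_likelihood([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 1, 1])
```

In a deep survival model the risk scores would come from the network's output layer, and the same expression would be written with differentiable tensor operations.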
Affiliation(s)
- Yuchen Wang
- School of Software, Shandong University, Jinan, 250101, China
- Xianchun Kong
- Department of Pediatric Surgery, Heze Municipal Hospital, Heze, 274000, China
- Xiao Bi
- School of Mathematics, Shandong University, Jinan, 250100, China
- Lizhen Cui
- School of Software, Shandong University, Jinan, 250101, China
- Hong Yu
- School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Hao Wu
- School of Software, Shandong University, Jinan, 250101, China.
6
Armingol E, Baghdassarian HM, Lewis NE. The diversification of methods for studying cell-cell interactions and communication. Nat Rev Genet 2024; 25:381-400. [PMID: 38238518 PMCID: PMC11139546 DOI: 10.1038/s41576-023-00685-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 12/01/2023] [Indexed: 05/20/2024]
Abstract
No cell lives in a vacuum, and the molecular interactions between cells define most phenotypes. Transcriptomics provides rich information to infer cell-cell interactions and communication, thus accelerating the discovery of the roles of cells within their communities. Such research relies heavily on algorithms that infer which cells are interacting and the ligands and receptors involved. Specific pressures on different research niches are driving the evolution of next-generation computational tools, enabling new conceptual opportunities and technological advances. More sophisticated algorithms now account for the heterogeneity and spatial organization of cells, multiple ligand types and intracellular signalling events, and enable the use of larger and more complex datasets, including single-cell and spatial transcriptomics. Similarly, new high-throughput experimental methods are increasing the number and resolution of interactions that can be analysed simultaneously. Here, we explore recent progress in cell-cell interaction research and highlight the diversification of the next generation of tools, which have yielded a rich ecosystem of tools for different applications and are enabling invaluable discoveries.
Affiliation(s)
- Erick Armingol
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA.
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA.
- Hratch M Baghdassarian
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA
- Nathan E Lewis
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
7
Liu W, Teng Z, Li Z, Chen J. CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Interdiscip Sci 2024:10.1007/s12539-024-00633-y. [PMID: 38778003 DOI: 10.1007/s12539-024-00633-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 04/07/2024] [Accepted: 04/09/2024] [Indexed: 05/25/2024]
Abstract
Gene regulatory network (GRN) inference based on single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization are unsatisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data. CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and observed topology into a low-dimensional vector space. The trained vectors are used to calculate the distance between genes and then to predict interactions between genes. In the overall framework, FastICA is used to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multi-stacked GraphSAGE layers as an encoder and an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets containing four related ground-truth networks, and the results show that CVGAE achieves better performance than the compared methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China.
- Zhijie Teng
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zejun Li
- School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 412002, China
- Jing Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
8
Zhou L, Wang X, Peng L, Chen M, Wen H. SEnSCA: Identifying possible ligand-receptor interactions and its application in cell-cell communication inference. J Cell Mol Med 2024; 28:e18372. [PMID: 38747737 PMCID: PMC11095317 DOI: 10.1111/jcmm.18372] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/06/2024] [Revised: 04/10/2024] [Accepted: 04/18/2024] [Indexed: 05/18/2024] Open
Abstract
Multicellular organisms depend closely on the coordination of cellular activities, which relies heavily on communication across diverse cell types. Cell-cell communication (CCC) is often mediated via ligand-receptor interactions (LRIs). Existing CCC inference methods are limited to known LRIs. To address this problem, we developed a comprehensive CCC analysis tool, SEnSCA, by integrating single-cell RNA sequencing and proteome data. SEnSCA consists mainly of potential LRI acquisition and CCC strength evaluation. To acquire potential LRIs, it first extracts LRI features and reduces the feature dimension, then constructs negative LRI samples through K-means clustering, and finally acquires potential LRIs with a Stacking ensemble comprising a support vector machine, 1D convolutional neural networks and a multi-head attention mechanism. During CCC strength evaluation, SEnSCA conducts LRI filtering and then infers CCC by combining the three-point estimation approach and single-cell RNA sequencing data. SEnSCA achieved better precision, recall, accuracy, F1 score, AUC and AUPR under most conditions when predicting possible LRIs. To better illustrate the inferred CCC network, SEnSCA provides three visualization options: heatmap, bubble diagram and network diagram. Its application to human melanoma tissue demonstrated its reliability in CCC detection. In summary, SEnSCA offers a useful CCC inference tool and is freely available at https://github.com/plhhnu/SEnSCA.
Affiliation(s)
- Liqian Zhou
- School of Life Sciences and Chemistry, Hunan University of Technology, Hunan, China
- Xiwen Wang
- School of Life Sciences and Chemistry, Hunan University of Technology, Hunan, China
- Lihong Peng
- School of Life Sciences and Chemistry, Hunan University of Technology, Hunan, China
- Min Chen
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Hong Wen
- School of Computer Science, Hunan University of Technology, Hunan, China
9
Du Q, An Q, Zhang J, Liu C, Hu Q. Unravelling immune microenvironment features underlying tumor progression in the single-cell era. Cancer Cell Int 2024; 24:143. [PMID: 38649887 PMCID: PMC11036673 DOI: 10.1186/s12935-024-03335-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/18/2024] [Indexed: 04/25/2024] Open
Abstract
The relationship between immune cells and tumor occurrence and progression remains unclear. Profiling alterations in the tumor immune microenvironment (TIME) at high resolution is crucial to identify factors influencing cancer progression and enhance the effectiveness of immunotherapy. However, traditional sequencing methods, including bulk RNA sequencing, mask to varying degrees the cellular heterogeneity and immunophenotypic changes observed in early and late-stage tumors. Single-cell RNA sequencing (scRNA-seq) has provided significant and precise TIME landscapes. Consequently, this review highlights the cellular and molecular changes in the TIME during tumorigenesis and progression, as elucidated by recent scRNA-seq studies. Specifically, we summarize the cellular heterogeneity of the TIME at different stages, including early, late, and metastatic stages. Moreover, we outline the related variations that may promote tumor occurrence and metastasis in the single-cell era. The widespread application of scRNA-seq to the TIME will comprehensively redefine the understanding of tumor biology and furnish more effective immunotherapy strategies.
Affiliation(s)
- Qilian Du
- Department of Oncology, Renmin Hospital of Wuhan University, Wuhan, 430060, China
- Qi An
- Department of Oncology, Renmin Hospital of Wuhan University, Wuhan, 430060, China
- Jiajun Zhang
- Department of Oncology, Renmin Hospital of Wuhan University, Wuhan, 430060, China
- Chao Liu
- Department of Radiation Oncology, Peking University First Hospital, Beijing, 100034, China.
- Qinyong Hu
- Department of Oncology, Renmin Hospital of Wuhan University, Wuhan, 430060, China.
10
Zhang T, Wu Z, Li L, Ren J, Zhang Z, Wang G. CPPLS-MLP: a method for constructing cell-cell communication networks and identifying related highly variable genes based on single-cell sequencing and spatial transcriptomics data. Brief Bioinform 2024; 25:bbae198. [PMID: 38678387 PMCID: PMC11056015 DOI: 10.1093/bib/bbae198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/08/2024] [Accepted: 04/10/2024] [Indexed: 04/30/2024] Open
Abstract
In the growth and development of multicellular organisms, the immune processes of the immune system and the maintenance of the organism's internal environment, cell communication plays a crucial role. It exerts a significant influence on regulating internal cellular states such as gene expression and cell functionality. Currently, the mainstream methods for studying intercellular communication are focused on exploring the ligand-receptor-transcription factor and ligand-receptor-subunit scales. However, there is relatively limited research on the association between intercellular communication and highly variable genes (HVGs). As some HVGs are closely related to cell communication, accurately identifying these HVGs can enhance the accuracy of constructing cell communication networks. The rapid development of single-cell sequencing (scRNA-seq) and spatial transcriptomics technologies provides a data foundation for exploring the relationship between intercellular communication and HVGs. Therefore, we propose CPPLS-MLP, which can identify HVGs closely related to intercellular communication and further analyze the impact of Multiple Input Multiple Output cellular communication on the differential expression of these HVGs. By comparing with the commonly used method CCPLS for constructing intercellular communication networks, we validated the superior performance of our method in identifying cell-type-specific HVGs and effectively analyzing the influence of neighboring cell types on HVG expression regulation. Source codes for the CPPLS_MLP R, python packages and the related scripts are available at 'CPPLS_MLP Github [https://github.com/wuzhenao/CPPLS-MLP]'.
Affiliation(s)
- Tianjiao Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Zhenao Wu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Liangyu Li
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Jixiang Ren
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Ziheng Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
11
Li R, Chen X, Yang X. Navigating the landscapes of spatial transcriptomics: How computational methods guide the way. WILEY INTERDISCIPLINARY REVIEWS. RNA 2024; 15:e1839. [PMID: 38527900 DOI: 10.1002/wrna.1839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 02/24/2024] [Accepted: 03/04/2024] [Indexed: 03/27/2024]
Abstract
Spatially resolved transcriptomics has been dramatically transforming biological and medical research in various fields. It enables transcriptome profiling at single-cell, multi-cellular, or sub-cellular resolution, while retaining the information of geometric localizations of cells in complex tissues. The coupling of cell spatial information and its molecular characteristics generates a novel multi-modal high-throughput data source, which poses new challenges for the development of analytical methods for data-mining. Spatial transcriptomic data are often highly complex, noisy, and biased, presenting a series of difficulties, many unresolved, for data analysis and generation of biological insights. In addition, to keep pace with the ever-evolving spatial transcriptomic experimental technologies, the existing analytical theories and tools need to be updated and reformed accordingly. In this review, we provide an overview and discussion of the current computational approaches for mining of spatial transcriptomics data. Future directions and perspectives of methodology design are proposed to stimulate further discussions and advances in new analytical models and algorithms. This article is categorized under: RNA Methods > RNA Analyses in Cells; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Export and Localization > RNA Localization.
Affiliation(s)
- Runze Li
- MOE Key Laboratory of Bioinformatics, Center for Synthetic & Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
- Xu Chen
- MOE Key Laboratory of Bioinformatics, Center for Synthetic & Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
- Xuerui Yang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic & Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
12
Zhou L, Peng X, Zeng L, Peng L. Finding potential lncRNA-disease associations using a boosting-based ensemble learning model. Front Genet 2024; 15:1356205. [PMID: 38495672 PMCID: PMC10940470 DOI: 10.3389/fgene.2024.1356205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/01/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Long non-coding RNAs (lncRNAs) have been used clinically as potential prognostic biomarkers of various types of cancer. Identifying associations between lncRNAs and diseases helps capture potential biomarkers and design efficient therapeutic options for diseases. Wet-lab experiments for identifying these associations are costly and laborious. Methods: We developed LDA-SABC, a novel boosting-based framework for lncRNA-disease association (LDA) prediction. LDA-SABC extracts LDA features based on singular value decomposition (SVD) and classifies lncRNA-disease pairs (LDPs) by incorporating LightGBM and AdaBoost into a convolutional neural network. Results: The performance of LDA-SABC was evaluated under five-fold cross validations (CVs) on lncRNAs, diseases, and LDPs. It clearly outperformed four other classical LDA inference methods (SDLDA, LDNFSGB, LDASR, and IPCAF) in terms of precision, recall, accuracy, F1 score, AUC, and AUPR. Given the accurate LDA prediction performance of LDA-SABC, we used it to find potential lncRNA biomarkers for lung cancer. The results indicated that 7SK and HULC could be related to non-small-cell lung cancer (NSCLC) and lung adenocarcinoma (LUAD), respectively. Conclusion: We hope that our proposed LDA-SABC method can help improve LDA identification.
Affiliation(s)
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
| | - Xinhuai Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
| | - Lijun Zeng
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
| | - Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
|
13
|
Peng L, Gao P, Xiong W, Li Z, Chen X. Identifying potential ligand-receptor interactions based on gradient boosted neural network and interpretable boosting machine for intercellular communication analysis. Comput Biol Med 2024; 171:108110. [PMID: 38367445 DOI: 10.1016/j.compbiomed.2024.108110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/24/2024] [Accepted: 02/04/2024] [Indexed: 02/19/2024]
Abstract
Cell-cell communication is essential to many key biological processes. Intercellular communication is generally mediated by ligand-receptor interactions (LRIs). Thus, building a comprehensive and high-quality LRI resource can significantly improve intercellular communication analysis. Meanwhile, owing to the lack of a "gold standard" dataset, it remains a challenge to evaluate LRI-mediated intercellular communication results. Here, we introduce CellGiQ, a high-confidence LRI prediction framework for intercellular communication analysis. High-confidence LRIs are first inferred by LRI feature extraction with BioTriangle, LRI selection using LightGBM, and LRI classification based on an ensemble of a gradient boosted neural network and an interpretable boosting machine. Subsequently, known and identified high-confidence LRIs are filtered by combining single-cell RNA sequencing (scRNA-seq) data and further applied to intercellular communication inference through a quartile scoring strategy. To validate the predictions, CellGiQ employed several evaluation strategies: using AUC and AUPR, it surpassed six competing LRI prediction models on four LRI datasets; through Venn diagrams and molecular docking, its predicted LRIs were validated by five other popular intercellular communication inference methods; based on the overlapping LRIs, it achieved a high Jaccard index with six other state-of-the-art intercellular communication prediction tools within human HNSCC tissues; by comparison with classical models and literature retrieval, its inferred HNSCC-related intercellular communication results were further validated. The novelty of this study lies in identifying high-confidence LRIs based on machine learning and in designing several LRI validation strategies, providing a reference for computational LRI prediction. CellGiQ provides an open-source and useful tool to decompose LRI-mediated intercellular communication at single-cell resolution. CellGiQ is freely available at https://github.com/plhhnu/CellGiQ.
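The overlap-based comparison mentioned in this abstract relies on the Jaccard index. A minimal sketch follows, assuming each tool's output has been reduced to a set of (ligand, receptor) pairs; the pair names are placeholders, not results from the paper, and the function is not part of CellGiQ.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of hashable items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical ligand-receptor pairs reported by two tools (illustrative only).
tool_a = {("L1", "R1"), ("L2", "R2"), ("L3", "R3")}
tool_b = {("L2", "R2"), ("L3", "R3"), ("L4", "R4")}
score = jaccard_index(tool_a, tool_b)  # 2 shared pairs out of 4 distinct pairs
```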
Affiliation(s)
- Lihong Peng
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Pengfei Gao
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Wei Xiong
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Zejun Li
- School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China.
| | - Xing Chen
- School of Science, Jiangnan University, Wuxi, 214122, Jiangsu, China.
|
14
|
Wang W, Han P, Li Z, Nie R, Wang K, Wang L, Liao H. LMGATCDA: Graph Neural Network With Labeling Trick for Predicting circRNA-Disease Associations. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:289-300. [PMID: 38231821 DOI: 10.1109/tcbb.2024.3355093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Previous studies have proven that circular RNAs (circRNAs) are inextricably connected to the etiology and pathophysiology of complex diseases. Since conventional biological experiments are frequently small-scale, expensive, and time-consuming, it is essential to establish an efficient and reasonable computation-based method to identify disease-related circRNAs. In this article, we propose a novel ensemble model, LMGATCDA, for predicting probable circRNA-disease associations based on multi-source similarity information. In particular, LMGATCDA first incorporates information on circRNA functional similarity, disease semantic similarity, and the Gaussian interaction profile (GIP) kernel similarity as explicit features, along with node-labeling of the three-hop subgraphs extracted from each linked target node as graph structural features. After that, the fused features are used as input, and further implied features are extracted by graph sampling aggregation (GraphSAGE) and multi-hop attention graph neural network (MAGNA). Finally, the prediction scores are obtained through a fully connected layer. With five-fold cross-validation, LMGATCDA demonstrated excellent competitiveness against gold standard data, reaching 95.37% accuracy and 91.31% recall with an AUC of 94.25% on the circR2Disease benchmark dataset. Collectively, the noteworthy findings from these case studies support our conclusion that the LMGATCDA model can provide reliable circRNA-disease associations for clinical research while helping to mitigate experimental uncertainties in wet-lab investigations.
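The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract is commonly defined as K(i, j) = exp(-γ ‖IP(i) − IP(j)‖²), with γ normalised by the mean squared norm of the interaction profiles. The sketch below follows that common definition; the function name, the `gamma_prime` parameter, and the toy matrix are assumptions for illustration and are not taken from the LMGATCDA implementation.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel over the rows of a binary association matrix (list of lists)."""
    n = len(profiles)
    # Bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy circRNA-by-disease association matrix (rows are circRNAs; illustrative only).
assoc = [[1, 0, 1],
         [1, 0, 1],
         [0, 1, 0]]
K = gip_kernel(assoc)
```

Rows with identical interaction profiles receive similarity 1, and similarity decays as the profiles diverge.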
|
15
|
Shi X, Yue C, Quan M, Li Y, Nashwan Sam H. A semi-supervised ensemble clustering algorithm for discovering relationships between different diseases by extracting cell-to-cell biological communications. J Cancer Res Clin Oncol 2024; 150:3. [PMID: 38168012 DOI: 10.1007/s00432-023-05559-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 11/01/2023] [Indexed: 01/05/2024]
Abstract
INTRODUCTION In recent decades, many theories have been proposed about the cause of hereditary diseases such as cancer. However, most studies state genetic and environmental factors as the most important parameters. It has been shown that gene expression data carry valuable information about hereditary diseases and that their analysis can identify the relationships between these diseases. OBJECTIVE Identification of damaged genes from various diseases can be done through the discovery of cell-to-cell biological communications. Also, extraction of intercellular communications can identify relationships between different diseases. For example, some gene disorders cause damage to the same cells in both breast and blood cancers. Hence, the purpose is to discover cell-to-cell biological communications in gene expression data. METHODOLOGY The identification of cell-to-cell biological communications for various cancer diseases has been widely performed by clustering algorithms. However, this field remains open due to the abundance of unprocessed gene expression data. Accordingly, this paper focuses on the development of a semi-supervised ensemble clustering algorithm that can discover relationships between different diseases through the extraction of cell-to-cell biological communications. The proposed clustering framework includes a stratified feature sampling mechanism and a novel similarity metric to deal with high-dimensional data and improve the diversity of primary partitions. RESULTS The performance of the proposed clustering algorithm is verified with several datasets from the UCI machine learning repository and then applied to the FANTOM5 dataset to extract cell-to-cell biological communications. The version of this dataset used here contains 108 cells and 86,427 promoters from 702 samples. The strength of communication between two similar cells from different diseases indicates the relationship of those diseases. 
Here, the strength of communication is determined by promoter, so we found the highest cell-to-cell biological communication between "basophils" and "ciliary.epithelial.cells" with 62,809 promoters. CONCLUSION The maximum cell-to-cell biological similarity in each cluster can be used to detect the relationship between different diseases such as cancer.
Affiliation(s)
- Xiuchao Shi
- College of Environment and Life Sciences, Weinan Normal University, Weinan, 714099, Shaanxi, China.
| | - Chunxiao Yue
- Weinan Junior Middle School, Weinan, 714000, Shaanxi, China
| | - Meiping Quan
- College of Environment and Life Sciences, Weinan Normal University, Weinan, 714099, Shaanxi, China
| | - Yalin Li
- College of Environment and Life Sciences, Weinan Normal University, Weinan, 714099, Shaanxi, China
| | - Hiba Nashwan Sam
- Department of Radiology and Sonar Techniques, Al-Noor University College, Nineveh, Iraq
|
16
|
Wang J, Zhang L, Sun J, Yang X, Wu W, Chen W, Zhao Q. Predicting drug-induced liver injury using graph attention mechanism and molecular fingerprints. Methods 2024; 221:18-26. [PMID: 38040204 DOI: 10.1016/j.ymeth.2023.11.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is a significant issue in drug development and clinical treatment due to its potential to cause liver dysfunction or damage, which, in severe cases, can lead to liver failure or even fatality. DILI has numerous pathogenic factors, many of which remain incompletely understood. Consequently, it is imperative to devise methodologies and tools for anticipatory assessment of DILI risk in the initial phases of drug development. In this study, we present DMFPGA, a novel deep learning predictive model designed to predict DILI. To provide a comprehensive description of molecular properties, we employ a multi-head graph attention mechanism to extract features from the molecular graphs, representing characteristics at the level of compound nodes. Additionally, we combine multiple fingerprints of molecules to capture features at the molecular level of compounds. The fusion of molecular fingerprints and graph features can more fully express the properties of compounds. Subsequently, we employ a fully connected neural network to classify compounds as either DILI-positive or DILI-negative. To rigorously evaluate DMFPGA's performance, we conduct a 5-fold cross-validation experiment. The obtained results demonstrate the superiority of our method over four existing state-of-the-art computational approaches, exhibiting an average AUC of 0.935 and an average ACC of 0.934. We believe that DMFPGA is helpful for early-stage DILI prediction and assessment in drug development.
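The AUC and ACC figures reported in this abstract are standard binary-classification measures. As a minimal sketch (not the DMFPGA evaluation code), AUC can be computed as the probability that a randomly chosen positive compound is scored above a randomly chosen negative one, and ACC as the fraction of correct calls at a fixed cutoff. The 0.5 cutoff and the toy scores are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


# Hypothetical DILI labels (1 = positive) and model scores (illustrative only).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
```

The pairwise loop is quadratic in the number of examples; it is written for clarity rather than speed.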
Affiliation(s)
- Jifeng Wang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
| | - Li Zhang
- School of Life Science, Liaoning University, Shenyang 110036, China
| | - Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi 276000, China
| | - Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
| | - Wei Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China.
|
17
|
Peng L, Huang L, Su Q, Tian G, Chen M, Han G. LDA-VGHB: identifying potential lncRNA-disease associations with singular value decomposition, variational graph auto-encoder and heterogeneous Newton boosting machine. Brief Bioinform 2023; 25:bbad466. [PMID: 38127089 PMCID: PMC10734633 DOI: 10.1093/bib/bbad466] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/05/2023] [Revised: 10/05/2023] [Accepted: 11/25/2023] [Indexed: 12/23/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) participate in various biological processes and have close linkages with diseases. In vivo and in vitro experiments have validated many associations between lncRNAs and diseases. However, biological experiments are time-consuming and expensive. Here, we introduce LDA-VGHB, an lncRNA-disease association (LDA) identification framework that incorporates feature extraction based on singular value decomposition and a variational graph autoencoder, and LDA classification based on a heterogeneous Newton boosting machine. LDA-VGHB was compared with four classical LDA prediction methods (i.e. SDLDA, LDNFSGB, IPCARF and LDASR) and four popular boosting models (XGBoost, AdaBoost, CatBoost and LightGBM) under 5-fold cross-validations on lncRNAs, diseases, lncRNA-disease pairs and independent lncRNAs and independent diseases, respectively. It greatly outperformed the other methods under four different cross-validations on the lncRNADisease and MNDR databases. We further investigated potential lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and inferred the top 20 lncRNAs associated with them among all their unobserved lncRNAs. The results showed that most of the predicted top 20 lncRNAs have been verified by biomedical experiments provided by the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases as well as publications. We found that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could associate with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively. The results need further biological experimental validation. We anticipate that LDA-VGHB is capable of identifying possible lncRNAs for complex diseases. LDA-VGHB is publicly available at https://github.com/plhhnu/LDA-VGHB.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, 412007, Hunan, China
- College of Life Sciences and Chemistry, Hunan University of Technology, 412007, Hunan, China
| | - Liangliang Huang
- School of Computer Science, Hunan University of Technology, 412007, Hunan, China
| | - Qiongli Su
- Department of Pharmacy, the Affiliated Zhuzhou Hospital Xiangya Medical College CSU, 412007, Hunan, China
| | - Geng Tian
- Geneis (Beijing) Co. Ltd, China, 100102, Beijing, China
| | - Min Chen
- School of Computer Science, Hunan Institute of Technology, 421002, No. 18 Henghua Road, Zhuhui District, Hengyang, Hunan, China
| | - Guosheng Han
- School of Mathematics and Computational Science, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
|
18
|
Chu J. Exploration of the molecular mechanism of intercellular communication in paediatric neuroblastoma by single-cell sequencing. Sci Rep 2023; 13:20406. [PMID: 37990103 PMCID: PMC10663476 DOI: 10.1038/s41598-023-47796-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 11/18/2023] [Indexed: 11/23/2023] Open
Abstract
Neuroblastoma (NB) is an embryonic tumour that originates in the sympathetic nervous system and occurs most often in infants and children under 2 years of age. Moreover, it is the most common extracranial solid tumour in children. Increasing studies suggest that intercellular communication within the tumour microenvironment is closely related to tumour development. This study aimed to construct a prognosis-related intercellular communication-associated genes model by single-cell sequencing and transcriptome sequencing to predict the prognosis of patients with NB for precise management. Single-cell data from patients with NB were downloaded from the gene expression omnibus database for comprehensive analysis. Furthermore, prognosis-related genes were screened in the TARGET database based on epithelial cell marker genes through a combination of Cox regression and Lasso regression analyses, using GSE62564 and GSE85047 for external validation. The patients' risk scores were calculated, followed by immune infiltration analysis, drug sensitivity analysis, and enrichment analysis of risk scores, which were conducted for the prognostic model. I used the Lasso regression feature selection algorithm to screen characteristic genes in NB and developed a 21-gene prognostic model. The risk scores were highly correlated with multiple immune cells and common anti-tumour drugs. Furthermore, the risk score was identified as an independent prognostic factor for NB. In this study, I constructed and validated a prognostic signature based on epithelial marker genes, which may provide useful information on the development and prognosis of NB.
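In Lasso-Cox prognostic models of the kind described here, a patient's risk score is conventionally the sum of each selected gene's expression weighted by its regression coefficient. The sketch below shows that convention only; the gene names, coefficients, and the median split into high- and low-risk groups are assumptions for illustration and do not come from the paper's 21-gene model.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}


# Hypothetical two-gene signature and three patients (illustrative values only).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 5.0},
    "P3": {"GENE_A": 4.0, "GENE_B": 0.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify_by_median(scores)
```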
Affiliation(s)
- Jing Chu
- Department of Pathology, Anhui Provincial Children's Hospital, 39 Wangjiang East Road, Hefei, 230051, Anhui, China.
|
19
|
Fu X, Chen Y, Tian S. DlncRNALoc: A discrete wavelet transform-based model for predicting lncRNA subcellular localization. Math Biosci Eng 2023; 20:20648-20667. [PMID: 38124569 DOI: 10.3934/mbe.2023913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
The prediction of long non-coding RNA (lncRNA) subcellular localization is essential to the understanding of its function and involvement in cellular regulation. Traditional biological experimental methods are costly and time-consuming, making computational methods the preferred approach for predicting lncRNA subcellular localization (LSL). However, existing computational methods have limitations due to the structural characteristics of lncRNAs and the uneven distribution of data across subcellular compartments. We propose a discrete wavelet transform (DWT)-based model for predicting LSL, called DlncRNALoc. We construct a physicochemical property matrix of 2-tuple bases based on lncRNA sequences, and we introduce a DWT lncRNA feature extraction method. We use the Synthetic Minority Over-sampling Technique (SMOTE) for oversampling and the local Fisher discriminant analysis (LFDA) algorithm to optimize feature information. The optimized feature vectors are fed into a support vector machine (SVM) to construct a predictive model. DlncRNALoc was evaluated by five-fold cross-validation on three benchmark datasets. Extensive experiments have demonstrated the superiority and effectiveness of the DlncRNALoc model in predicting LSL.
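To illustrate what a discrete wavelet transform does to a numeric sequence, the following sketch applies one level of the orthonormal Haar DWT, splitting a signal into low-frequency (approximation) and high-frequency (detail) coefficients. The Haar wavelet is an assumption chosen for simplicity; the abstract does not state which wavelet DlncRNALoc uses, and the input values are placeholders rather than real physicochemical properties.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical property values along a short sequence (illustrative only).
values = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_dwt(values)
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input values, which gives a quick sanity check on an implementation.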
Affiliation(s)
- Xiangzheng Fu
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, China
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
| | - Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
| | - Sha Tian
- Department of Internal Medicine, College of Integrated Chinese and Western Medicine, Hunan University of Chinese Medicine, Changsha, Hunan, China
|
20
|
Peng L, He X, Peng X, Li Z, Zhang L. STGNNks: Identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering. Comput Biol Med 2023; 166:107440. [PMID: 37738898 DOI: 10.1016/j.compbiomed.2023.107440] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/12/2023] [Revised: 08/15/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023]
Abstract
BACKGROUND Spatial transcriptomics technologies fully utilize spatial location information, tissue morphological features, and transcriptional profiles. Integrating these data can greatly advance our understanding of cell biology in the morphological background. METHODS We developed an innovative spatial clustering method called STGNNks by combining graph neural network, denoising auto-encoder, and k-sums clustering. First, spatially resolved transcriptomics data are preprocessed and a hybrid adjacency matrix is constructed. Next, gene expressions and spatial context are integrated to learn spots' embedding features by a deep graph infomax-based graph convolutional network. Third, the learned features are mapped to a low-dimensional space through a zero-inflated negative binomial (ZINB)-based denoising auto-encoder. Fourth, a k-sums clustering algorithm is developed to identify spatial domains by combining k-means clustering and the ratio-cut clustering algorithms. Finally, it implements spatial trajectory inference, spatially variable gene identification, and differentially expressed gene detection based on the pseudo-space-time method on six 10x Genomics Visium datasets. RESULTS We compared our proposed STGNNks method with five other spatial clustering methods, CCST, Seurat, stLearn, Scanpy and SEDR. For the first time, four internal indicators in the area of machine learning, that is, the silhouette coefficient, the Davies-Bouldin index, the Calinski-Harabasz index, and the S_Dbw index, were used to compare the clustering performance of STGNNks with that of CCST, Seurat, stLearn, Scanpy and SEDR on five spatial transcriptomics datasets without labels (i.e., Adult Mouse Brain (FFPE), Adult Mouse Kidney (FFPE), Human Breast Cancer (Block A Section 2), Human Breast Cancer (FFPE), and Human Lymph Node). 
In addition, two external indicators, the adjusted Rand index (ARI) and normalized mutual information (NMI), were applied to evaluate the performance of the above six methods on Human Breast Cancer (Block A Section 1) with real labels. The comparison experiments showed that STGNNks obtained the smallest Davies-Bouldin and S_Dbw values and the largest silhouette coefficient, Calinski-Harabasz, ARI and NMI values, significantly outperforming the above five spatial transcriptomics analysis algorithms. Furthermore, we detected the top six spatially variable genes and the top five differentially expressed genes in each cluster on the above five unlabeled datasets. The pseudo-space-time tree plot with hierarchical layout demonstrated a flow of Human Breast Cancer (Block A Section 1) progress in three clades branching from three invasive ductal carcinoma regions to multiple ductal carcinoma in situ sub-clusters. CONCLUSION We anticipate that STGNNks can efficiently improve spatial transcriptomics data analysis and further boost the diagnosis and therapy of related diseases. The codes are publicly available at https://github.com/plhhnu/STGNNks.
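Of the external indicators named in this abstract, the adjusted Rand index (ARI) can be computed directly from the contingency table of true and predicted labels. The sketch below uses the standard pair-counting formula; it is an illustration, not code from the STGNNks repository, and the toy labels are placeholders.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts; 1.0 means identical partitions, about 0 means chance level."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two label lists of equal length with at least 2 items")
    # Pairs of spots that fall in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions form a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition under different cluster names, and a partition that disagrees.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_diff = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI is invariant to how clusters are named, which is why the first call scores a perfect match even though the label values are swapped.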
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China; College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Xianzhi He
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Xinhuai Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China.
| | - Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China.
|
21
|
Peng L, Yuan R, Han C, Han G, Tan J, Wang Z, Chen M, Chen X. CellEnBoost: A Boosting-Based Ligand-Receptor Interaction Identification Model for Cell-to-Cell Communication Inference. IEEE Trans Nanobioscience 2023; 22:705-715. [PMID: 37216267 DOI: 10.1109/tnb.2023.3278685] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Indexed: 05/24/2023]
Abstract
Cell-to-cell communication (CCC) plays important roles in multicellular organisms. Identifying communication among cancer cells and between cancer cells and normal cells in the tumor microenvironment helps in understanding cancer genesis, development and metastasis. CCC is usually mediated by Ligand-Receptor Interactions (LRIs). In this manuscript, we developed a Boosting-based LRI identification model (CellEnBoost) for CCC inference. First, potential LRIs are predicted by data collection, feature extraction, dimensional reduction, and classification based on an ensemble of Light gradient boosting machine and AdaBoost combined with a convolutional neural network. Next, the predicted LRIs and known LRIs are filtered. Third, the filtered LRIs are applied to CCC elucidation by combining CCC strength measurement and single-cell RNA sequencing data. Finally, CCC inference results are visualized using heatmap view, Circos plot view, and network view. The experimental results show that CellEnBoost obtained the best AUCs and AUPRs on the four collected LRI datasets. A case study in human head and neck squamous cell carcinoma (HNSCC) tissues demonstrates that fibroblasts were more likely to communicate with HNSCC cells, which is in accord with the results from iTALK. We anticipate that this work can contribute to the diagnosis and treatment of cancers.
|
22
|
Peng L, Huang L, Tian G, Wu Y, Li G, Cao J, Wang P, Li Z, Duan L. Predicting potential microbe-disease associations with graph attention autoencoder, positive-unlabeled learning, and deep neural network. Front Microbiol 2023; 14:1244527. [PMID: 37789848 PMCID: PMC10543759 DOI: 10.3389/fmicb.2023.1244527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 08/16/2023] [Indexed: 10/05/2023] Open
Abstract
Background Microbes have dense linkages with human diseases. Balanced microorganisms protect the human body against physiological disorders while unbalanced ones may cause diseases. Thus, identification of potential associations between microbes and diseases can contribute to the diagnosis and therapy of various complex diseases. Biological experiments for microbe-disease association (MDA) prediction are expensive, time-consuming, and labor-intensive. Methods We developed a computational MDA prediction method called GPUDMDA by combining graph attention autoencoder, positive-unlabeled learning, and deep neural network. First, GPUDMDA computes disease similarity and microbe similarity matrices by integrating their functional similarity and Gaussian association profile kernel similarity, respectively. Next, it learns the feature representation of each microbe-disease pair using graph attention autoencoder based on the obtained disease similarity and microbe similarity matrices. Third, it selects a few reliable negative MDAs based on positive-unlabeled learning. Finally, it takes the learned MDA features and the selected negative MDAs as inputs and uses a deep neural network to predict potential MDAs. Results GPUDMDA was compared with four state-of-the-art MDA identification models (i.e., MNNMDA, GATMDA, LRLSHMDA, and NTSHMDA) on the HMDAD and Disbiome databases under five-fold cross validations on microbes, diseases, and microbe-disease pairs. Under the three five-fold cross validations, GPUDMDA computed the best AUCs of 0.7121, 0.9454, and 0.9501 on the HMDAD database and 0.8372, 0.8908, and 0.8948 on the Disbiome database, respectively, outperforming the other four MDA prediction methods. Asthma is the most common chronic respiratory condition and affects ~339 million people worldwide. Inflammatory bowel disease is a class of chronic intestinal diseases found worldwide that affect the gut, gastrointestinal tract, and extraintestinal organs of patients. 
In particular, inflammatory bowel disease severely affects the growth and development of children. We used the proposed GPUDMDA method and found that Enterobacter hormaechei had potential associations with both asthma and inflammatory bowel disease, which need further biological experimental validation. Conclusion The proposed GPUDMDA demonstrated powerful MDA prediction ability. We anticipate that GPUDMDA helps screen the therapeutic clues for microbe-related diseases.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
| | - Liangliang Huang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Geng Tian
- Geneis (Beijing) Co. Ltd., Beijing, China
| | - Yan Wu
- Geneis (Beijing) Co. Ltd., Beijing, China
| | - Guang Li
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
| | - Jianying Cao
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
| | - Peng Wang
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
| | - Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
| | - Lian Duan
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
|
23
|
He B, Sun H, Bao M, Li H, He J, Tian G, Wang B. A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing. Sci Rep 2023; 13:15356. [PMID: 37717102 PMCID: PMC10505149 DOI: 10.1038/s41598-023-42465-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 09/11/2023] [Indexed: 09/18/2023] Open
Abstract
Carcinoma of unknown primary (CUP) is a type of metastatic cancer with tissue-of-origin (TOO) unidentifiable by traditional methods. CUP patients typically have poor prognosis but therapy targeting the original cancer tissue can significantly improve patients' prognosis. Thus, it's critical to develop accurate computational methods to infer cancer TOO. While qPCR or microarray-based methods are effective in inferring TOO for most cancer types, the overall prediction accuracy is yet to be improved. In this study, we propose a cross-cohort computational framework to trace TOO of 32 cancer types based on RNA sequencing (RNA-seq). Specifically, we employed logistic regression models to select 80 genes for each cancer type to create a combined 1356-gene set, based on transcriptomic data from 9911 tissue samples covering the 32 cancer types with known TOO from the Cancer Genome Atlas (TCGA). The selected genes are enriched in both tissue-specific and tissue-general functions. The cross-validation accuracy of our framework reaches 97.50% across all cancer types. Furthermore, we tested the performance of our model on the TCGA metastatic dataset and International Cancer Genome Consortium (ICGC) dataset, achieving an accuracy of 91.09% and 82.67%, respectively, despite the differences in experiment procedures and pipelines. In conclusion, we developed an accurate yet robust computational framework for identifying TOO, which holds promise for clinical applications. Our code is available at http://github.com/wangbo00129/classifybysklearn .
Affiliation(s)
- Binsheng He
  - School of Pharmacy, Changsha Medical University, Changsha, 410219, People's Republic of China
  - Academician Workstation, Changsha Medical University, Changsha, 410219, People's Republic of China
- Hongmei Sun
  - Department of Medical Oncology, The Cancer Hospital of Jia Mu Si, Jiamusi, People's Republic of China
- Meihua Bao
  - Academician Workstation, Changsha Medical University, Changsha, 410219, People's Republic of China
- Haigang Li
  - Academician Workstation, Changsha Medical University, Changsha, 410219, People's Republic of China
- Jianjun He
  - School of Pharmacy, Changsha Medical University, Changsha, 410219, People's Republic of China
  - Academician Workstation, Changsha Medical University, Changsha, 410219, People's Republic of China
- Geng Tian
  - Geneis Beijing Co., Ltd., Beijing, 100102, People's Republic of China
  - Qingdao Genesis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, Shandong, People's Republic of China
- Bo Wang
  - Geneis Beijing Co., Ltd., Beijing, 100102, People's Republic of China
  - Qingdao Genesis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, Shandong, People's Republic of China
|
24
|
Peng L, Tan J, Xiong W, Zhang L, Wang Z, Yuan R, Li Z, Chen X. Deciphering ligand-receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput Biol Med 2023; 163:107137. [PMID: 37364528 DOI: 10.1016/j.compbiomed.2023.107137] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 02/26/2023] [Revised: 05/18/2023] [Accepted: 06/04/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND Cell-cell communication in a tumor microenvironment is vital to tumorigenesis, tumor progression and therapy. Intercellular communication inference helps understand molecular mechanisms of tumor growth, progression and metastasis. METHODS Focusing on ligand-receptor co-expressions, in this study, we developed an ensemble deep learning framework, CellComNet, to decipher ligand-receptor-mediated cell-cell communication from single-cell transcriptomic data. First, credible ligand-receptor interactions (LRIs) are captured by integrating data arrangement, feature extraction, dimension reduction, and LRI classification based on an ensemble of heterogeneous Newton boosting machine and deep neural network. Next, known and identified LRIs are screened based on single-cell RNA sequencing (scRNA-seq) data in certain tissues. Finally, cell-cell communication is inferred by incorporating scRNA-seq data, the screened LRIs, and a joint scoring strategy that combines expression thresholding and the expression product of ligands and receptors. RESULTS The proposed CellComNet framework was compared with four competing protein-protein interaction prediction models (PIPR, XGBoost, DNNXGB, and OR-RCNN) and obtained the best AUCs and AUPRs on four LRI datasets, demonstrating the best LRI classification ability. CellComNet was further applied to analyze intercellular communication in human melanoma and head and neck squamous cell carcinoma (HNSCC) tissues. The results demonstrate that cancer-associated fibroblasts communicate strongly with melanoma cells and that endothelial cells communicate strongly with HNSCC cells. CONCLUSIONS The proposed CellComNet framework efficiently identified credible LRIs and significantly improved cell-cell communication inference performance. We anticipate that CellComNet can contribute to anticancer drug design and tumor-targeted therapy.
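The joint scoring step named in this abstract can be pictured with a small sketch. How CellComNet actually combines the two signals is not given here, so the sketch assumes the simplest reading: a ligand-receptor pair counts as active when both mean expression values exceed a threshold, and the products of the active pairs are summed. The function name, threshold, and gene labels are hypothetical.

```python
def communication_score(expr, sender, receiver, lr_pairs, threshold=0.5):
    """Score sender -> receiver communication.

    expr[cell_type][gene] holds the mean expression of a gene in a cell type.
    Returns (number of active pairs, summed ligand * receptor product).
    """
    n_active, product_sum = 0, 0.0
    for ligand, receptor in lr_pairs:
        lig = expr[sender].get(ligand, 0.0)
        rec = expr[receiver].get(receptor, 0.0)
        # Expression thresholding: both partners must be expressed.
        if lig > threshold and rec > threshold:
            n_active += 1
            # Expression product: strength of the active pair.
            product_sum += lig * rec
    return n_active, product_sum

# Hypothetical mean expression values for two cell types.
expr = {
    "fibroblast": {"LIG_A": 2.0, "LIG_B": 0.1},
    "tumor": {"REC_A": 1.5, "REC_B": 3.0},
}
pairs = [("LIG_A", "REC_A"), ("LIG_B", "REC_B")]

print(communication_score(expr, "fibroblast", "tumor", pairs))
print(communication_score(expr, "tumor", "fibroblast", pairs))
```

The score is directional: swapping sender and receiver changes which cell type must express the ligand, so the two calls above give different results.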
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China; College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Jingwei Tan
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Wei Xiong
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
- Zhao Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Ruya Yuan
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Xing Chen
  - School of Science, Jiangnan University, Wuxi, 214122, Jiangsu, China
|
25
|
Su Z, Lu H, Wu Y, Li Z, Duan L. Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front Genet 2023; 14:1238095. [PMID: 37655066 PMCID: PMC10466784 DOI: 10.3389/fgene.2023.1238095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 07/19/2023] [Indexed: 09/02/2023] Open
Abstract
Introduction: Lung cancer is one of the most frequent neoplasms worldwide with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) demonstrate a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identification of new biomarkers for these two diseases can significantly promote their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue to find new biomarkers for various diseases. Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, graph attention network, and convolutional neural network to learn the biological features of the lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA-disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma. Results: The experimental results show that LDAenDL computed the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively. Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies demonstrate that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may associate with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma. Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.
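The Gaussian kernel similarity mentioned in the Methods can be sketched as follows. The abstract does not give the formula, so this assumes the widely used Gaussian interaction profile form, K(i, j) = exp(-γ‖a_i − a_j‖²), with γ set to a bandwidth parameter divided by the mean squared norm of the profiles. The function name and the toy association profiles are hypothetical.

```python
import math

def gaussian_kernel_similarity(profiles, gamma_prime=1.0):
    """Pairwise Gaussian kernel similarity between binary association profiles.

    profiles[i] is the association vector of entity i (e.g. one lncRNA against
    all diseases). Assumes at least one profile is non-zero.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm  # normalised bandwidth
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim

# Three hypothetical lncRNAs described by their associations with three diseases.
profiles = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
sim = gaussian_kernel_similarity(profiles)
for row in sim:
    print([round(v, 4) for v in row])
```

Entities with identical profiles get similarity 1, and similarity decays with the squared distance between profiles. The same function applies to diseases by passing the transposed association matrix.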
Affiliation(s)
- Zhenguo Su
  - Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
- Huihui Lu
  - Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lian Duan
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
|
26
|
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
  - Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Wenan Chen
  - Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongjian Jin
  - Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xiang Chen
  - Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
|
27
|
Wang F, Yang H, Wu Y, Peng L, Li X. SAELGMDA: Identifying human microbe-disease associations based on sparse autoencoder and LightGBM. Front Microbiol 2023; 14:1207209. [PMID: 37415823 PMCID: PMC10320730 DOI: 10.3389/fmicb.2023.1207209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/18/2023] [Indexed: 07/08/2023] Open
Abstract
Introduction Identification of complex associations between diseases and microbes is important to understand the pathogenesis of diseases and design therapeutic strategies. Biomedical experiment-based Microbe-Disease Association (MDA) detection methods are expensive, time-consuming, and laborious. Methods Here, we developed a computational method called SAELGMDA for potential MDA prediction. First, microbe similarity and disease similarity are computed by integrating their functional similarity and Gaussian interaction profile kernel similarity. Second, each microbe-disease pair is represented as a feature vector by combining the microbe and disease similarity matrices. Next, the obtained feature vectors are mapped to a low-dimensional space based on a Sparse AutoEncoder. Finally, unknown microbe-disease pairs are classified based on a Light Gradient Boosting Machine. Results The proposed SAELGMDA method was compared with four state-of-the-art MDA methods (MNNMDA, GATMDA, NTSHMDA, and LRLSHMDA) under five-fold cross validations on diseases, microbes, and microbe-disease pairs on the HMDAD and Disbiome databases. The results show that SAELGMDA computed the best accuracy, Matthews correlation coefficient, AUC, and AUPR under the majority of conditions, outperforming the other four MDA prediction models. In particular, SAELGMDA obtained the best AUCs of 0.8358 and 0.9301 under cross validation on diseases, 0.9838 and 0.9293 under cross validation on microbes, and 0.9857 and 0.9358 under cross validation on microbe-disease pairs on the HMDAD and Disbiome databases. Colorectal cancer, inflammatory bowel disease, and lung cancer are diseases that severely threaten human health. We used the proposed SAELGMDA method to find possible microbes for the three diseases. The results demonstrate that there are potential associations between Clostridium coccoides and colorectal cancer and between Sphingomonadaceae and inflammatory bowel disease. In addition, Veillonella may associate with autism. The inferred MDAs need further validation. Conclusion We anticipate that the proposed SAELGMDA method contributes to the identification of new MDAs.
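The second step of the Methods, turning a microbe-disease pair into a feature vector, can be sketched in a few lines. The abstract only says the two similarity matrices are combined, so the sketch assumes the simplest option: concatenating the microbe's similarity row with the disease's similarity row. All names and values below are hypothetical.

```python
def pair_feature(microbe_sim, disease_sim, m, d):
    """Concatenate microbe m's similarity row with disease d's similarity row."""
    return list(microbe_sim[m]) + list(disease_sim[d])

def build_samples(assoc, microbe_sim, disease_sim):
    """Build one (feature vector, label) sample per microbe-disease pair.

    assoc[m][d] is 1 for a known association and 0 otherwise.
    """
    X, y = [], []
    for m, row in enumerate(assoc):
        for d, label in enumerate(row):
            X.append(pair_feature(microbe_sim, disease_sim, m, d))
            y.append(label)
    return X, y

# Hypothetical similarity matrices: 3 microbes, 2 diseases.
microbe_sim = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.3], [0.1, 0.3, 1.0]]
disease_sim = [[1.0, 0.2], [0.2, 1.0]]
assoc = [[1, 0], [0, 1], [1, 0]]

X, y = build_samples(assoc, microbe_sim, disease_sim)
print(len(X), len(X[0]), y)
```

Each vector has length (number of microbes + number of diseases), which grows with the dataset. That growth is the kind of dimensionality a downstream reduction step, such as the sparse autoencoder named in the abstract, would address.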
Affiliation(s)
- Feixiang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Huandong Yang
  - Department of Gastrointestinal Surgery, Yidu Central Hospital of Weifang, Weifang, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiaoling Li
  - The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China
  - The Second Department of Oncology, Heilongjiang Second Cancer Hospital, Harbin, China
|
28
|
Zhou L, Wang Y, Peng L, Li Z, Luo X. Identifying potential drug-target interactions based on ensemble deep learning. Front Aging Neurosci 2023; 15:1176400. [PMID: 37396659 PMCID: PMC10309650 DOI: 10.3389/fnagi.2023.1176400] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 07/04/2023] Open
Abstract
Introduction Drug-target interaction (DTI) prediction is an important step in drug research and development. Experimental methods are time-consuming and laborious. Methods In this study, we developed a novel DTI prediction method called EnGDD by combining initial feature acquisition, dimensional reduction, and DTI classification based on Gradient boosting neural network, Deep neural network, and Deep Forest. Results EnGDD was compared with seven state-of-the-art DTI prediction methods (BLM-NII, NRLMF, WNNGIP, NEDTP, DTi2Vec, RoFDT, and MolTrans) on the nuclear receptor, GPCR, ion channel, and enzyme datasets under cross validations on drugs, targets, and drug-target pairs, respectively. EnGDD computed the best recall, accuracy, F1-score, AUC, and AUPR under the majority of conditions, demonstrating its powerful DTI identification performance. EnGDD predicted that D00182 and hsa2099, D07871 and hsa1813, DB00599 and hsa2562, and D00002 and hsa10935 have higher interaction probabilities among unknown drug-target pairs and may be potential DTIs on the four datasets, respectively. In particular, D00002 (Nadide) was identified to interact with hsa10935 (mitochondrial peroxiredoxin 3), whose up-regulation might be used to treat neurodegenerative diseases. Finally, EnGDD was used to find possible drug targets for Parkinson's disease and Alzheimer's disease after confirming its DTI identification performance. The results show that D01277, D04641, and D08969 may be applied to the treatment of Parkinson's disease through targeting hsa1813 (dopamine receptor D2), and that D02173, D02558, and D03822 may offer treatment clues for patients with Alzheimer's disease through targeting hsa5743 (prostaglandin-endoperoxide synthase 2). The above prediction results need further biomedical validation. Discussion We anticipate that our proposed EnGDD model can help discover potential therapeutic clues for various diseases including neurodegenerative diseases.
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yuzhuang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Xueming Luo
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
29
|
Wong L, Wang L, You ZH, Yuan CA, Huang YA, Cao MY. GKLOMLI: a link prediction model for inferring miRNA-lncRNA interactions by using Gaussian kernel-based method on network profile and linear optimization algorithm. BMC Bioinformatics 2023; 24:188. [PMID: 37158823 PMCID: PMC10169329 DOI: 10.1186/s12859-023-05309-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 08/30/2022] [Accepted: 04/27/2023] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND The limited knowledge of miRNA-lncRNA interactions is considered an obstacle to revealing the regulatory mechanism. Accumulating evidence on human diseases indicates that the modulation of gene expression is closely related to the interactions between miRNAs and lncRNAs. However, validating such interactions via crosslinking-immunoprecipitation and high-throughput sequencing (CLIP-seq) experiments inevitably costs considerable money and time and often yields unsatisfactory results. Therefore, more and more computational prediction tools have been developed to offer reliable candidates for a better design of further bio-experiments. METHODS In this work, we proposed a novel link prediction model based on a Gaussian kernel-based method and a linear optimization algorithm for inferring miRNA-lncRNA interactions (GKLOMLI). Given an observed miRNA-lncRNA interaction network, the Gaussian kernel-based method was employed to output two similarity matrices of miRNAs and lncRNAs. Based on the integrated matrix combining the similarity matrices and the observed interaction network, a linear optimization-based link prediction model was trained for inferring miRNA-lncRNA interactions. RESULTS To evaluate the performance of our proposed method, k-fold cross-validation (CV) and leave-one-out CV were implemented, in which each CV experiment was carried out 100 times on a training set generated randomly. The high areas under the curve (AUCs) of 0.8623 ± 0.0027 (2-fold CV), 0.9053 ± 0.0017 (5-fold CV), 0.9151 ± 0.0013 (10-fold CV), and 0.9236 (LOO-CV) illustrated the precision and reliability of our proposed method. CONCLUSION GKLOMLI, with its high performance, is anticipated to help reveal underlying interactions between miRNAs and their target lncRNAs and to decipher the potential mechanisms of complex diseases.
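The evaluation protocol in the Results (repeated k-fold CV scored by AUC) relies on two small building blocks, sketched below under stated assumptions. The AUC is computed with the pairwise (Mann-Whitney) definition, and the fold split is a seeded shuffle; both function names are hypothetical, and nothing here reproduces the GKLOMLI model itself.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

# Toy check: four predictions, two true interactions.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))

# Each repetition uses a different seed, giving a different random split.
folds = k_fold_indices(n=10, k=5, seed=1)
print(folds)
```

Repeating the split with different seeds and averaging the per-run AUCs gives a mean ± standard deviation of the form reported in the abstract. Leave-one-out CV has only one possible split, which may be why that AUC is given without a spread.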
Affiliation(s)
- Leon Wong
  - Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, 200092, Shanghai, China
- Lei Wang
  - Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710139, China
- Chang-An Yuan
  - Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
- Yu-An Huang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710139, China
- Mei-Yuan Cao
  - School of Electrical and Electronic Engineering, Guangdong Technology College, Zhaoqing, 526100, China
  - Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
|
30
|
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023] Open
Abstract
There is strong evidence that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal the potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on the known associations when calculating miRNA and disease similarities and uses multi-classifier voting to predict disease-related miRNAs. As a result, the average AUC of the ELMDA framework was 0.9229 for the HMDD v2.0 database in a fivefold cross-validation. All potential associations in the HMDD v2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD v3.2 database. The ELMDA framework was implemented to investigate gastric neoplasms, prostate neoplasms and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD v3.2 database. Moreover, the ELMDA framework can predict isolated disease-related miRNAs. In conclusion, ELMDA appears to be a reliable method to uncover disease-associated miRNAs.
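Multi-classifier voting, as named in this abstract, can take more than one form, and the abstract does not say which one ELMDA uses. The sketch below shows the two common variants, soft voting (averaging predicted probabilities) and hard voting (majority of thresholded decisions), on hypothetical classifier outputs.

```python
def soft_vote(prob_lists):
    """Average the predicted probabilities of all classifiers per sample."""
    n_clf = len(prob_lists)
    n_samples = len(prob_lists[0])
    return [sum(p[i] for p in prob_lists) / n_clf for i in range(n_samples)]

def hard_vote(prob_lists, threshold=0.5):
    """Predict 1 when a strict majority of classifiers exceeds the threshold."""
    n_clf = len(prob_lists)
    n_samples = len(prob_lists[0])
    decisions = []
    for i in range(n_samples):
        votes = sum(1 for p in prob_lists if p[i] >= threshold)
        decisions.append(1 if 2 * votes > n_clf else 0)
    return decisions

# Hypothetical probabilities from three classifiers for three miRNA-disease pairs.
clf_outputs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.4],
    [0.7, 0.9, 0.3],
]
print([round(p, 3) for p in soft_vote(clf_outputs)])
print(hard_vote(clf_outputs))
```

Soft voting keeps a continuous score, which is what ranking-based results such as a top-50 list or an AUC require. Hard voting yields only a 0/1 decision per pair.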
Affiliation(s)
- Changlong Gu
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiaoying Li
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
|
31
|
Li T, Li Y, Zhu X, He Y, Wu Y, Ying T, Xie Z. Artificial intelligence in cancer immunotherapy: Applications in neoantigen recognition, antibody design and immunotherapy response prediction. Semin Cancer Biol 2023; 91:50-69. [PMID: 36870459 DOI: 10.1016/j.semcancer.2023.02.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/10/2022] [Revised: 02/13/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Cancer immunotherapy is a method of controlling and eliminating tumors by reactivating the body's cancer-immunity cycle and restoring its antitumor immune response. The increased availability of data, combined with advancements in high-performance computing and innovative artificial intelligence (AI) technology, has resulted in a rise in the use of AI in oncology research. State-of-the-art AI models for functional classification and prediction in immunotherapy research are increasingly used to support laboratory-based experiments. This review offers a glimpse of the current AI applications in immunotherapy, including neoantigen recognition, antibody design, and prediction of immunotherapy response. Advancing in this direction will result in more robust predictive models for developing better targets, drugs, and treatments, and these advancements will eventually make their way into the clinical setting, pushing AI forward in the field of precision oncology.
Affiliation(s)
- Tong Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yupeng Li
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Xiaoyi Zhu
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Yao He
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yanling Wu
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Tianlei Ying
  - MOE/NHC Key Laboratory of Medical Molecular Virology, Shanghai Institute of Infectious Disease and Biosecurity, School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Engineering Research Center for Synthetic Immunology, Shanghai, China
- Zhi Xie
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China; Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
|
32
|
Wang MN, Li Y, Lei LL, Ding DW, Xie XJ. Combining non-negative matrix factorization with graph Laplacian regularization for predicting drug-miRNA associations based on multi-source information fusion. Front Pharmacol 2023; 14:1132012. [PMID: 36817132 PMCID: PMC9931722 DOI: 10.3389/fphar.2023.1132012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 01/16/2023] [Indexed: 02/05/2023] Open
Abstract
Increasing evidence suggests that miRNAs play a key role in the occurrence and progression of many complex human diseases. Therefore, targeting dysregulated miRNAs with small molecule drugs in the clinic has become a new treatment approach. Nevertheless, identifying drug-targeted miRNAs by biological experiments is costly and time-consuming. Thus, more reliable computational methods for identifying associations of drugs with miRNAs urgently need to be developed. In this study, we proposed an efficient method, called GNMFDMA, to predict potential associations of drugs with miRNAs by combining graph Laplacian regularization with non-negative matrix factorization. We first calculated the overall similarity matrices of drugs and miRNAs according to the different biological information collected. Subsequently, the new drug-miRNA association adjacency matrix was reformulated based on the K nearest neighbor profiles so as to correct false negative associations. Finally, graph Laplacian regularized collaborative non-negative matrix factorization was used to calculate the association scores of drugs with miRNAs. In cross validation, GNMFDMA obtained an AUC of 0.9193, outperforming the existing methods. In addition, in case studies on three common drugs (i.e., 5-Aza-CdR, 5-FU and Gemcitabine), 30, 31 and 34 of the top-50 associations inferred by GNMFDMA were verified, respectively. These results reveal that GNMFDMA is a reliable and efficient computational approach for identifying potential drug-miRNA associations.
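The role of graph Laplacian regularization in such a factorization can be shown with a short sketch. It builds the Laplacian L = D − S from a symmetric similarity matrix S and checks the standard identity tr(FᵀLF) = ½ Σᵢⱼ Sᵢⱼ‖fᵢ − fⱼ‖², which is why the penalty pulls the latent vectors of similar drugs (or miRNAs) together. The matrices and function names are hypothetical, and the full GNMFDMA objective and update rules are not reproduced.

```python
def graph_laplacian(S):
    """L = D - S, where D is the diagonal matrix of row sums of S."""
    n = len(S)
    return [
        [(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
        for i in range(n)
    ]

def laplacian_penalty(F, L):
    """tr(F^T L F), written as sum_ij L[i][j] * <f_i, f_j>."""
    n = len(F)
    return sum(
        L[i][j] * sum(a * b for a, b in zip(F[i], F[j]))
        for i in range(n)
        for j in range(n)
    )

def pairwise_penalty(F, S):
    """0.5 * sum_ij S[i][j] * ||f_i - f_j||^2 (equal to the trace form)."""
    n = len(F)
    return 0.5 * sum(
        S[i][j] * sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
        for i in range(n)
        for j in range(n)
    )

# Hypothetical symmetric drug similarity matrix and 2-dimensional latent factors.
S = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 0.5, 0.0]]
F = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

L = graph_laplacian(S)
print(round(laplacian_penalty(F, L), 6), round(pairwise_penalty(F, S), 6))
```

In this toy case the penalty is dominated by the second and third items, which are moderately similar (0.5) but far apart in latent space. A regularized factorization would trade reconstruction error against reducing that gap.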
Affiliation(s)
- Mei-Neng Wang
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- Yu Li (corresponding author)
  - School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, China
- Li-Lan Lei
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- De-Wu Ding
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- Xue-Jun Xie
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
|
33
|
Liu Y, Gao Z, Peng C, Jiang X. Exploration of the heterogeneity and interaction of epithelial cells and NK/T-cells in Laryngeal Squamous Cell Carcinoma based on single-cell RNA sequencing data. Braz J Otorhinolaryngol 2023; 89:393-400. [PMID: 37105033 PMCID: PMC10164759 DOI: 10.1016/j.bjorl.2023.02.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Accepted: 02/06/2023] [Indexed: 02/16/2023] Open
Abstract
OBJECTIVES We aimed to explore the heterogeneity and differentiation trajectories of epithelial cells and NK/T-cells in Laryngeal Squamous Cell Carcinoma (LSCC). METHODS We downloaded the GSE150321 data set, which contains single-cell RNA data from the LSCC01 and LSCC02 samples, from Gene Expression Omnibus. UMAP analysis was performed to identify the cell subpopulations and their locations. The Seurat package was used to analyze the differential expression of genes. The function of differentially expressed genes was analyzed using the DAVID database. The monocle2 package was used to analyze differentiation trajectories. We used the CellChat package to observe the signaling pathways and ligand-receptor pairs for epithelial cells and NK/T-cells. RESULTS All the LSCC cells were divided into 16 subpopulations, which included 7 epithelial cell subsets and 3 T-cell subsets. The functional analysis indicated that epithelial cells and NK/T-cells mainly participated in different processes, such as cell cycle, immune response, and cell migration. The differentiation trajectory results then indicated that migration ability and activation of the immune system increased, while apoptosis ability and the glucose metabolic process decreased, along pseudotime. Migration-related epithelial cells act on all T-cells via the CNTN2-CNTN2 ligand-receptor pair, which suggested that CNTN2 might be an important biomarker for regulating migration of epithelial cells. CONCLUSIONS Our study characterized the heterogeneity of LSCC, which provided novel insights into LSCC and identified a new mechanism and target for clinical LSCC therapies. EVIDENCE IV.
Affiliation(s)
- Yanan Liu
  - Heilongjiang Provincial Hospital Affiliated to Harbin Institute of Technology, Department of Otorhinolaryngology, Harbin, Heilongjiang, PR China
- Zhiguang Gao
  - Heilongjiang Provincial Hospital Affiliated to Harbin Institute of Technology, Department of Otorhinolaryngology, Harbin, Heilongjiang, PR China
- Cheng Peng
  - Heilongjiang Provincial Hospital Affiliated to Harbin Institute of Technology, Department of Otorhinolaryngology, Harbin, Heilongjiang, PR China
- Xingli Jiang
  - Heilongjiang Provincial Hospital Affiliated to Harbin Institute of Technology, Department of Otorhinolaryngology, Harbin, Heilongjiang, PR China
|
34
|
Wang T, Sun J, Zhao Q. Investigating cardiotoxicity related with hERG channel blockers using molecular fingerprints and graph attention mechanism. Comput Biol Med 2023; 153:106464. [PMID: 36584603 DOI: 10.1016/j.compbiomed.2022.106464] [Citation(s) in RCA: 112] [Impact Index Per Article: 112.0] [Received: 11/23/2022] [Revised: 12/12/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
Human ether-a-go-go-related gene (hERG) channel blockade by small molecules is a major concern during drug development in the pharmaceutical industry. Failure or inhibition of hERG channel activity caused by drug molecules can prolong the QT interval, which will result in serious cardiotoxicity. However, evaluating the hERG blocking activity of all these small molecular compounds is technically challenging, and the relevant procedures are expensive and time-consuming. In this study, we develop a novel deep learning predictive model named DMFGAM for predicting hERG blockers. To characterize each molecule more comprehensively, we first fuse multiple molecular fingerprint features into its final molecular fingerprint features. Then, we use the multi-head attention mechanism to extract the molecular graph features. The molecular fingerprint features and molecular graph features are fused as the final features of the compounds to make their feature representation more comprehensive. Finally, the molecules are classified as hERG blockers or hERG non-blockers by a fully connected neural network. We conduct a 5-fold cross-validation experiment to evaluate the performance of DMFGAM and verify its robustness on external validation datasets. We believe DMFGAM can serve as a powerful tool to predict hERG channel blockers in the early stages of drug discovery and development.
Affiliation(s)
- Tianyi Wang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Jianqiang Sun
- School of Automation and Electrical Engineering, Linyi University, Linyi, 276000, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
|
35
|
Jia X, Yin Z, Peng Y. Gene differential co-expression analysis of male infertility patients based on statistical and machine learning methods. Front Microbiol 2023; 14:1092143. [PMID: 36778885 PMCID: PMC9911419 DOI: 10.3389/fmicb.2023.1092143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 01/11/2023] [Indexed: 01/28/2023] Open
Abstract
Male infertility has always been one of the important factors affecting infertility in couples of reproductive age. Factors that contribute to male infertility include living habits, hereditary factors, etc. Identifying the genetic causes of male infertility can help us understand its biology and can inform genetic diagnostic testing and the choice of clinical treatment options. While current research has made significant progress on the genes that cause sperm defects in men, genetic studies of sperm content defects are still lacking. This article is based on a dataset of X-chromosome gene expression data from patients with azoospermia and with mild and severe oligospermia. Because the degree of disease differs between patients and the genetic causes may differ as well, common classical clustering methods such as k-means and hierarchical clustering cannot effectively identify samples (that is, they cannot cluster samples and features simultaneously). In this paper, we use machine learning and various statistical methods, such as the hypergeometric distribution, Gibbs sampling, and the Fisher test, together with the gene interaction network, for cluster analysis of gene expression data from male infertility patients; this approach has certain advantages over existing methods. The clusters were identified by differential co-expression analysis of gene expression data in male infertility patients, and the clusters recognized by the model were analyzed by multiple gene enrichment methods, showing different degrees of enrichment in various enzyme activities, cancer, virus-related, ATP and ADP production, and other pathways. In addition, because this paper is an unsupervised analysis of genetic factors in male infertility patients, we constructed a simulated data set in which the clustering results are known, which can be used to measure how well the model recognizes clusters. The comparison shows that the proposed model achieves better identification.
|
36
|
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023] Open
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked to the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) that integrates random walk with restart and Logistic Matrix Factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. First, we fuse disease semantic and Gaussian association profile similarities and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm to extract negative LncRNA-Disease Associations (LDA) based on random walk. Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods, that is, LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results from 5-fold cross validation on the MNDR dataset show that LDA-RWLMF achieves the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, after establishing the performance of LDA-RWLMF, we rank all lncRNA biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer, respectively, among all lncRNAs known to associate with them in the MNDR dataset. We predict that lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively, and need biomedical experimental validation.
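Editorial sketch: the abstract above relies on random walk with restart over a similarity network. The snippet below is a generic, minimal illustration of that technique, not the LDA-RWLMF implementation; the function name, the restart probability of 0.5, and the toy 3-node graph are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a small dense graph.

    adj     : square list-of-lists of non-negative edge weights
    seed    : non-negative restart weights, one per node (at least one > 0)
    restart : probability of jumping back to the seed nodes (assumed value)
    """
    n = len(adj)
    # Column-normalise so W[i][j] is the probability of stepping from j to i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(seed)
    p0 = [s / total for s in seed]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 with the walk restarting at node 0.
scores = random_walk_with_restart([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [1, 0, 0])
```

Nodes closer to the seed receive higher steady-state scores, which is why such scores can be used to rank candidate associations or, as the abstract describes, to pick unlikely pairs as negative samples.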
|
37
|
Chen X, Huang L. Computational model for disease research. Brief Bioinform 2023; 24:6987819. [PMID: 36642407 DOI: 10.1093/bib/bbac615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2023] Open
Affiliation(s)
- Xing Chen
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Li Huang
- The Future Laboratory, Tsinghua University, Beijing 100084, China
|
38
|
Zhao J, Sun J, Shuai SC, Zhao Q, Shuai J. Predicting potential interactions between lncRNAs and proteins via combined graph auto-encoder methods. Brief Bioinform 2023; 24:6896030. [PMID: 36515153 DOI: 10.1093/bib/bbac527] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 09/02/2022] [Revised: 10/23/2022] [Accepted: 11/06/2022] [Indexed: 12/15/2022] Open
Abstract
Long noncoding RNA (lncRNA) is a kind of noncoding RNA with a length of more than 200 nucleotide units. Numerous studies have shown that although lncRNAs cannot be directly translated into proteins, they still play an important role in human growth processes by interacting with proteins. Since traditional biological experiments often require a lot of time and material costs to explore potential lncRNA-protein interactions (LPI), several computational models have been proposed for this task. In this study, we introduce a novel deep learning method known as combined graph auto-encoders (LPICGAE) to predict potential human LPIs. First, we apply a variational graph auto-encoder to learn low-dimensional representations from the high-dimensional features of lncRNAs and proteins. Then the graph auto-encoder is used to reconstruct the adjacency matrix for inferring potential interactions between lncRNAs and proteins. Finally, we minimize the loss of the two processes alternately to obtain the final predicted interaction matrix. The results of 5-fold cross-validation experiments show that our method achieves an average area under the receiver operating characteristic curve of 0.974 and an average accuracy of 0.985, which is better than those of six existing state-of-the-art computational methods. We believe that LPICGAE can help researchers identify more potential relationships between lncRNAs and proteins effectively.
Affiliation(s)
- Jingxuan Zhao
- University of Science and Technology Liaoning, 66459, Anshan, China
| | | | - Stella C Shuai
- Northwestern University, 3270, Evanston, Illinois, United States
| | - Qi Zhao
- University of Science and Technology Liaoning, 66459, Anshan, China
| | - Jianwei Shuai
- Department of Physics, Xiamen University, Xiamen, China
|
39
|
Hu H, Feng Z, Lin H, Zhao J, Zhang Y, Xu F, Chen L, Chen F, Ma Y, Su J, Zhao Q, Shuai J. Modeling and analyzing single-cell multimodal data with deep parametric inference. Brief Bioinform 2023; 24:6987655. [PMID: 36642414 DOI: 10.1093/bib/bbad005] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 09/06/2022] [Revised: 12/11/2022] [Accepted: 01/02/2023] [Indexed: 01/17/2023] Open
Abstract
The proliferation of single-cell multimodal sequencing technologies has enabled us to understand cellular heterogeneity from multiple views, providing novel and actionable biological insights into disease-driving mechanisms. Here, we propose a comprehensive end-to-end single-cell multimodal analysis framework named Deep Parametric Inference (DPI). DPI transforms single-cell multimodal data into a multimodal parameter space by inferring individual modal parameters. Analysis of cord blood mononuclear cells (CBMC) reveals that the multimodal parameter space can characterize the heterogeneity of cells more comprehensively than individual modalities. Furthermore, comparisons with state-of-the-art methods on multiple datasets show that DPI has superior performance. Additionally, DPI can reference and query cell types without batch effects. As a result, DPI can successfully analyze the progression of COVID-19 disease in peripheral blood mononuclear cells (PBMC). Notably, we further propose a cell state vector field and analyze the transformation pattern of bone marrow cell (BMC) states. In conclusion, DPI is a powerful single-cell multimodal analysis framework that can provide new biological insights to biomedical researchers. The Python packages, datasets and user-friendly manuals of DPI are freely available at https://github.com/studentiz/dpi.
Affiliation(s)
- Huan Hu
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, and State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen University, Xiamen 361005, China; Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), and Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
| | - Zhen Feng
- First Affiliated Hospital of Wenzhou Medical University, Wenzhou Medical University, Wenzhou 325000, China
| | - Hai Lin
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), and Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
| | - Junjie Zhao
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510000, China
| | - Yaru Zhang
- Institute of Biomedical Big Data, School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, China
| | - Fei Xu
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Lingling Chen
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Feng Chen
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
| | - Yunlong Ma
- Institute of Biomedical Big Data, School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, China
| | - Jianzhong Su
- Institute of Biomedical Big Data, School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
| | - Jianwei Shuai
- Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, and State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen University, Xiamen 361005, China; Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), and Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
|
40
|
Peng Y, Zhao S, Zeng Z, Hu X, Yin Z. LGBMDF: A cascade forest framework with LightGBM for predicting drug-target interactions. Front Microbiol 2023; 13:1092467. [PMID: 36687573 PMCID: PMC9849804 DOI: 10.3389/fmicb.2022.1092467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 12/07/2022] [Indexed: 01/07/2023] Open
Abstract
Prediction of drug-target interactions (DTIs) plays an important role in drug development. However, traditional laboratory methods to determine DTIs require a lot of time and capital costs. In recent years, many studies have shown that using machine learning methods to predict DTIs can speed up the drug development process and reduce capital costs. An excellent DTI prediction method should have both high prediction accuracy and low computational cost. In this study, we noticed that previous research based on deep forests used XGBoost as the estimator in the cascade. We instead applied LightGBM as the estimator in the cascade forest, and the estimator group was then determined experimentally as three LightGBMs and three ExtraTrees; this new model is called LGBMDF. We conducted 5-fold cross-validation on LGBMDF and other state-of-the-art methods using the same dataset, and compared their Sn, Sp, MCC, AUC and AUPR. Finally, we found that our method has better performance and faster calculation speed.
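Editorial sketch: three of the metrics named in the abstract above (Sn, Sp, MCC) follow directly from the confusion matrix of a binary classifier. The helper below is a hypothetical illustration using the standard definitions; the function name and the toy labels are assumptions, and the zero-denominator fallbacks are a convention chosen here, not something stated in the source.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, MCC) for 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Sn = TP / (TP + FN), Sp = TN / (TN + FP); fall back to 0.0 if undefined.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient from the four confusion-matrix counts.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


# Toy example: 3 true interactions, 3 non-interactions.
sn, sp, mcc = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

MCC is often reported alongside Sn and Sp for interaction prediction because it stays informative when positives and negatives are imbalanced.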
|
41
|
Qi Y, Zheng P, Huang G. DeepLBCEPred: A Bi-LSTM and multi-scale CNN-based deep learning method for predicting linear B-cell epitopes. Front Microbiol 2023; 14:1117027. [PMID: 36910218 PMCID: PMC9992402 DOI: 10.3389/fmicb.2023.1117027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 01/17/2023] [Indexed: 02/24/2023] Open
Abstract
The epitope is the site where antigens and antibodies interact and is vital to understanding the immune system. Experimental identification of linear B-cell epitopes (BCEs) is expensive, labor-intensive, and low-throughput. Although a few computational methods have been proposed to address this challenge, there is still a long way to go for practical applications. We proposed a deep learning method called DeepLBCEPred for predicting linear BCEs, which consists of bi-directional long short-term memory (Bi-LSTM), feed-forward attention, and multi-scale convolutional neural networks (CNNs). We extensively tested the performance of DeepLBCEPred through cross-validation and independent tests on one training dataset and two testing datasets. The empirical results showed that DeepLBCEPred obtained state-of-the-art performance. We also investigated the contribution of different deep learning elements to recognizing linear BCEs. In addition, we have developed a user-friendly web application for linear BCE prediction, which is freely available for all scientific researchers at: http://www.biolscience.cn/DeepLBCEPred/.
Affiliation(s)
- Yue Qi
- School of Information Engineering, Shaoyang University, Shaoyang, Hunan, China
| | - Peijie Zheng
- School of Information Engineering, Shaoyang University, Shaoyang, Hunan, China
| | - Guohua Huang
- School of Information Engineering, Shaoyang University, Shaoyang, Hunan, China
|
42
|
Liao Q, Ye Y, Li Z, Chen H, Zhuo L. Prediction of miRNA-disease associations in microbes based on graph convolutional networks and autoencoders. Front Microbiol 2023; 14:1170559. [PMID: 37187536 PMCID: PMC10175670 DOI: 10.3389/fmicb.2023.1170559] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/21/2023] [Accepted: 03/21/2023] [Indexed: 05/17/2023] Open
Abstract
MicroRNAs (miRNAs) are short RNA molecules that regulate gene expression by targeting and inhibiting the expression of specific RNAs. Because microRNAs affect many diseases in microbial ecology, it is necessary to predict microRNAs' association with diseases at the microbial level. To this end, we propose a novel model, termed GCNA-MDA, in which a dual autoencoder and a graph convolutional network (GCN) are integrated to predict miRNA-disease associations. The proposed method leverages autoencoders to extract robust representations of miRNAs and diseases and meanwhile exploits the GCN to capture the topological information of miRNA-disease networks. To alleviate the impact of insufficient information in the original data, the association similarity and feature similarity data are combined to calculate a more complete initial basic vector of nodes. The experimental results on the benchmark datasets demonstrate that, compared with existing representative methods, the proposed method achieves superior performance, with precision reaching up to 0.8982. These results demonstrate that the proposed method can serve as a tool for exploring miRNA-disease associations in microbial environments.
Affiliation(s)
- Qingquan Liao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Yuxiang Ye
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
| | - Zihang Li
- School of Computing and Data Science, Xiamen University Malaysia, Sepang, Selangor, Malaysia
| | - Hao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- *Correspondence: Hao Chen
| | - Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
|
43
|
Gong H, You X, Jin M, Meng Y, Zhang H, Yang S, Xu J. Graph neural network and multi-data heterogeneous networks for microbe-disease prediction. Front Microbiol 2022; 13:1077111. [PMID: 36620040 PMCID: PMC9814480 DOI: 10.3389/fmicb.2022.1077111] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/22/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
Research on microbe association networks is highly significant for understanding the pathogenic mechanism of microbes and promoting the application of microbes in precision medicine. In this paper, we studied the prediction of microbe-disease associations based on a multi-data biological network and a graph neural network algorithm. The HMDAD database provided a dataset that included 39 diseases, 292 microbes, and 450 known microbe-disease associations. We proposed a Microbe-Disease Heterogeneous Network built from the microbe similarity network, the disease similarity network, and known microbe-disease associations. Furthermore, we integrated the network into the graph convolutional neural network algorithm and developed the GCNN4Micro-Dis model to predict microbe-disease associations. Finally, the performance of the GCNN4Micro-Dis model was evaluated via 5-fold cross-validation, in which we randomly divided all known microbe-disease association data into five groups. The results showed that the average AUC value and standard deviation were 0.8954 ± 0.0030. Our model had good predictive power and can help identify new microbe-disease associations. In addition, we compared GCNN4Micro-Dis with three advanced methods for predicting microbe-disease associations: KATZHMDA, BiRWHMDA, and LRLSHMDA. The results showed that our method had better prediction performance than the other three methods. Furthermore, we selected breast cancer as a case study and found the top 12 microbes related to breast cancer from the intestinal flora of patients, which further verified the model's accuracy.
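Editorial sketch: the evaluation described above (randomly splitting the known associations into five groups and reporting AUC as mean ± standard deviation) can be outlined as follows. This is a generic illustration, not the GCNN4Micro-Dis code; the helper names, the fixed random seed, and the use of the sample standard deviation are assumptions. Only the count of 450 known associations comes from the abstract.

```python
import random
import statistics


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_aucs):
    """Mean and sample standard deviation across folds (assumed convention)."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


# 450 known associations, as in the abstract, dealt into five held-out groups.
folds = k_fold_indices(450, k=5)
```

In each round, one fold of known associations would be hidden from training and scored against unlabeled pairs; `auc` is then computed per fold and `summarize` gives the reported mean and spread.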
Affiliation(s)
- Houwu Gong
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Academy of Military Sciences, Beijing, China
| | - Xiong You
- Center of Rehabilitation Diagnosis and Treatment, Hunan Provincial Rehabilitation Hospital, Changsha, China
| | - Min Jin
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; *Correspondence: Min Jin
| | - Yajie Meng
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
| | - Hanxue Zhang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Shuaishuai Yang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Junlin Xu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
44
|
Gou H, Liu S, Liu L, Luo M, Qin S, He K, Yang X. Obeticholic acid and 5β-cholanic acid 3 exhibit anti-tumor effects on liver cancer through CXCL16/CXCR6 pathway. Front Immunol 2022; 13:1095915. [PMID: 36605219 PMCID: PMC9807878 DOI: 10.3389/fimmu.2022.1095915] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/11/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
Hepatocellular carcinoma (HCC) is the most common type of liver malignancy, with high incidence and mortality rates. Previous in vitro and in vivo studies have confirmed that liver sinusoidal endothelial cells (LSEC) secrete CXCL16, which acts as a messenger to increase the hepatic accumulation of CXCR6+ natural killer T (NKT) cells and exert potent antitumor effects. However, evidence for this process in humans is lacking and its clinical significance is still unclear. In this study, by dissecting human HCC single-cell RNA-seq data, we verified this process through cellphoneDB. NKT cells in patients with high expression of CXCL16 exhibited a higher activation state and produced more interferon-γ (IFN-γ) compared with those with low expression. We next investigated the signaling pathways between activated (CD69 high) and unactivated (CD69 low) NKT cells using NKT cell developmental trajectories and functional enrichment analyses. In in vivo experiments, we found that the farnesoid X receptor agonist (obeticholic acid) combined with the Takeda G protein-coupled receptor 5 antagonist (5β-cholanic acid 3) exhibited significant tumor suppressive effects in the orthotopic liver tumor model, and this result may be related to the CXCL16/CXCR6 axis. In conclusion, our study provides the basis and potential strategies for HCC immunotherapy based on NKT cells.
Affiliation(s)
- Haoxian Gou
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, China; Academician Workstation of Sichuan Province, Luzhou, China
| | - Shenglu Liu
- Academician Workstation of Sichuan Province, Luzhou, China
| | - Linxin Liu
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Ming Luo
- Academician Workstation of Sichuan Province, Luzhou, China
| | - Shu Qin
- Academician Workstation of Sichuan Province, Luzhou, China
| | - Kai He
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, China; Academician Workstation of Sichuan Province, Luzhou, China; *Correspondence: Xiaoli Yang; Kai He
| | - Xiaoli Yang
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, Luzhou, China; Academician Workstation of Sichuan Province, Luzhou, China; *Correspondence: Xiaoli Yang; Kai He
|
45
|
Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022; 9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable for almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Affiliation(s)
- Min Su
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Tao Pan
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Qiu-Zhen Chen
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Wei-Wei Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081 Heilongjiang China
| | - Yi Gong
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Department of Immunology, Nanjing Medical University, Nanjing, 211166 China
| | - Gang Xu
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Huan-Yu Yan
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Si Li
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Qiao-Zhen Shi
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Ya Zhang
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
| | - Xiao He
- Department of Laboratory Medicine, Women and Children’s Hospital of Chongqing Medical University, Chongqing, 401174 China
| | | | - Shi-Cai Fan
- Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110 Guangdong China
| | - Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081 Heilongjiang China
| | - Murray J. Cairns
- School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW 2308 Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW 2305 Australia
| | - Xi Wang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
| | - Yong-Sheng Li
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
|
46
|
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine learning-based methods for RNA data analysis—Volume II. Front Genet 2022; 13:1010089. [DOI: 10.3389/fgene.2022.1010089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022] Open
|
47
|
Zhai S, Li X, Wu Y, Shi X, Ji B, Qiu C. Identifying potential microRNA biomarkers for colon cancer and colorectal cancer through bound nuclear norm regularization. Front Genet 2022; 13:980437. [PMID: 36313468 PMCID: PMC9614659 DOI: 10.3389/fgene.2022.980437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022] Open
Abstract
Colon cancer and colorectal cancer are two common cancer-related deaths worldwide. Identification of potential biomarkers for the two cancers can help us to evaluate their initiation, progression and therapeutic response. In this study, we propose a new microRNA-disease association identification method, BNNRMDA, to discover potential microRNA biomarkers for the two cancers. BNNRMDA better combines disease semantic similarity and Gaussian Association Profile Kernel (GAPK) similarity, microRNA function similarity and GAPK similarity, and the bound nuclear norm regularization model. Compared to other five classical microRNA-disease association identification methods (MIDPE, MIDP, RLSMDA, GRNMF, AND LPLNS), BNNRMDA obtains the highest AUC of 0.9071, demonstrating its strong microRNA-disease association identification performance. BNNRMDA is applied to discover possible microRNA biomarkers for colon cancer and colorectal cancer. The results show that all 73 known microRNAs associated with colon cancer in the HMDD database have the highest association scores with colon cancer and are ranked as top 73. Among 137 known microRNAs associated with colorectal cancer in the HMDD database, 129 microRNAs have the highest association scores with colorectal cancer and are ranked as top 129. In addition, we predict that hsa-miR-103a could be a potential biomarker of colon cancer and hsa-mir-193b and hsa-mir-7days could be potential biomarkers of colorectal cancer.
Affiliation(s)
- Shengyong Zhai
- Department of General Surgery, Weifang People’s Hospital, Shandong, China
- Xiaoling Li
- The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China; Heilongjiang Second Cancer Hospital, Harbin, China
- Yan Wu
- Geneis Beijing Co., Ltd., Beijing, China
- Xiaoli Shi
- Geneis Beijing Co., Ltd., Beijing, China
- Binbin Ji
- Geneis Beijing Co., Ltd., Beijing, China
- Chun Qiu
- Department of Oncology, Hainan General Hospital, Haikou, China; *Correspondence: Chun Qiu
|
48
|
Su Q, Tan Q, Liu X, Wu L. Prioritizing potential circRNA biomarkers for bladder cancer and bladder urothelial cancer based on an ensemble model. Front Genet 2022; 13:1001608. [PMID: 36186429 PMCID: PMC9521272 DOI: 10.3389/fgene.2022.1001608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 08/15/2022] [Indexed: 12/03/2022] Open
Abstract
Bladder cancer is the most common cancer of the urinary system, and bladder urothelial cancer accounts for 90% of bladder cancer cases. Both cancers have high morbidity and mortality rates worldwide, and identifying biomarkers for them helps in their diagnosis and treatment. circRNAs are considered oncogenes or tumor suppressors in cancers, and they play important roles in the occurrence and development of cancers. In this study, we develop an ensemble model, CDA-EnRWLRLS, that predicts circRNA-disease associations (CDAs) by combining random walk with restart and Laplacian regularized least squares, and we use it to screen potential biomarkers for bladder cancer and bladder urothelial cancer. First, we compute disease similarity by combining the semantic similarity and association profile similarity of diseases, and circRNA similarity by combining the functional similarity and association profile similarity of circRNAs. Second, we score each circRNA-disease pair separately with random walk with restart and with Laplacian regularized least squares. Third, the circRNA-disease association scores from the two models are integrated by soft voting to obtain the final CDAs. Finally, we use CDA-EnRWLRLS to screen potential circRNA biomarkers for bladder cancer and bladder urothelial cancer. CDA-EnRWLRLS is compared with three classical CDA prediction methods (CD-LNLP, DWNN-RLS, and KATZHCDA) and two individual models (CDA-RWR and CDA-LRLS), and obtains a better AUC of 0.8654. We predict that circHIPK3 has the highest association with bladder cancer and may be a potential biomarker for it. In addition, circSMARCA5 has the highest association with bladder urothelial cancer and may be a possible biomarker for it.
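The scoring and integration steps described above can be illustrated with a minimal sketch. It assumes the textbook form of random walk with restart (iterate p = (1 - r) * W * p + r * p0 on a column-normalised similarity matrix) and treats soft voting as a weighted average of per-model scores. The restart probability, the weights, the toy similarity matrix, and the placeholder `lrls_scores` are hypothetical, since the abstract does not specify them, and the Laplacian regularized least squares model itself is not implemented here.

```python
def random_walk_with_restart(sim, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Score nodes of a similarity network by random walk with restart.

    sim: square similarity matrix (list of lists) over circRNAs.
    seed: 0/1 list marking circRNAs already known to be associated
          with the disease of interest (at least one must be 1).
    """
    n = len(sim)
    # Column-normalise the similarity matrix into transition probabilities.
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    trans = [[sim[i][j] / col_sums[j] if col_sums[j] else 0.0
              for j in range(n)] for i in range(n)]

    total = sum(seed)
    if total == 0:
        raise ValueError("seed must contain at least one known association")
    p0 = [s / total for s in seed]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

def soft_vote(score_lists, weights=None):
    """Combine per-model score lists by a weighted average."""
    if weights is None:
        weights = [1.0] * len(score_lists)
    weight_sum = sum(weights)
    n = len(score_lists[0])
    return [sum(w * scores[k] for w, scores in zip(weights, score_lists)) / weight_sum
            for k in range(n)]

# Hypothetical toy network of 3 circRNAs; circRNA 0 is the known association.
toy_sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
rwr_scores = random_walk_with_restart(toy_sim, seed=[1, 0, 0])
lrls_scores = [0.6, 0.3, 0.1]  # placeholder standing in for the LRLS model output
final_scores = soft_vote([rwr_scores, lrls_scores])
```

Ranking circRNAs by `final_scores` for a given disease then yields the candidate list. In practice the two score vectors may need to be rescaled to a common range before averaging, which this sketch does not attempt.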
|