1. Tie D, He M, Li W, Xiang Z. Advances in the application of network analysis methods in traditional Chinese medicine research. Phytomedicine: International Journal of Phytotherapy and Phytopharmacology 2025; 136:156256. [PMID: 39615211 DOI: 10.1016/j.phymed.2024.156256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Revised: 11/03/2024] [Accepted: 11/11/2024] [Indexed: 01/16/2025]
Abstract
OBJECTIVE This review aims at evaluating the role and potential applications of network analysis methods in the medicinal substances of traditional Chinese medicine (TCM), theories of TCM compatibility, properties of herbs, and TCM syndromes. METHODS Literature was retrieved from databases, such as CNKI, PubMed, and Web of Science, using keywords, including "network analysis," "network biology," "network pharmacology," and "network medicine." The extracted literature included the biological network construction (including ingredient-target and target-disease relations), analysis of network topology characteristics (including node degree, clustering coefficient, and path length), network modularization analysis, functional annotation and so on. These studies were categorized and organized based on their research methods, application domains, and other relevant characteristics. RESULTS Network analysis algorithms, such as network distance, random walk, matrix factorization, graph embedding, and graph neural networks, are widely applied in fields related to the properties, compatibility, and mechanisms of TCM. They effectively reflect the interactive relations within the complex systems of TCM and elucidate and clarify theories, such as the effective substances, the principles of TCM compatibility, the TCM syndromes, and the properties of TCM. CONCLUSION The network analysis method is a powerful mathematical and computational tool that reveals the structure, dynamics, and functions of complex systems by analyzing the elements and their relations. This approach has effectively promoted the modernization of TCM, providing essential theoretical and practical tools for personalized treatment and scientific research on TCM. It also offers a significant methodological framework for the modernization and internationalization of TCM.
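The topology characteristics named in the METHODS part of this abstract (node degree, clustering coefficient, and path length) can be illustrated with a minimal sketch. The toy graph and the node labels `A` to `D` are hypothetical and are not taken from the review; the functions use only the Python standard library and assume an undirected, connected graph.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def bfs_distances(adj, source):
    """Shortest hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
adj = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
degree = {n: len(nb) for n, nb in adj.items()}
cc = {n: clustering_coefficient(adj, n) for n in adj}
apl = average_path_length(adj)
```

In an ingredient-target network the same three quantities would be computed on nodes representing ingredients and targets; which of them is informative depends on the study design.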
Affiliation(s)
- Defu Tie
  - Medical School, Hangzhou City University, Hangzhou, 310015, China; College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China.
- Mulan He
  - Medical School, Hangzhou City University, Hangzhou, 310015, China; College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China.
- Wenlong Li
  - College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China.
- Zheng Xiang
  - Medical School, Hangzhou City University, Hangzhou, 310015, China.
2. Chen J, Tao R, Qiu Y, Yuan Q. CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization. Brief Bioinform 2024; 25:bbae481. [PMID: 39327064 PMCID: PMC11427075 DOI: 10.1093/bib/bbae481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/27/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Predicting associations between microbes and diseases opens up new avenues for developing diagnostic, preventive, and therapeutic strategies. Given that laboratory-based biological tests to verify these associations are often time-consuming and expensive, there is a critical need for innovative computational frameworks to predict new microbe-disease associations. In this work, we introduce a novel prediction algorithm called Predicting Human Disease-Microbe Associations using Cross-Domain Matrix Factorization (CMFHMDA). Initially, we calculate the composite similarity of diseases and the Gaussian interaction profile similarity of microbes. We then apply the Weighted K Nearest Known Neighbors (WKNKN) algorithm to refine the microbe-disease association matrix. Our CMFHMDA model is subsequently developed by integrating the network data of both microbes and diseases to predict potential associations. The key innovations of this method include using the WKNKN algorithm to preprocess missing values in the association matrix and incorporating cross-domain information from microbes and diseases into the CMFHMDA model. To validate CMFHMDA, we employed three different cross-validation techniques to evaluate the model's accuracy. The results indicate that the CMFHMDA model achieved Area Under the Receiver Operating Characteristic Curve scores of 0.9172, 0.8551, and 0.9351 ± 0.0052 in global Leave-One-Out Cross-Validation (LOOCV), local LOOCV, and five-fold CV, respectively. Furthermore, many predicted associations have been confirmed by published experimental studies, establishing CMFHMDA as an effective tool for predicting potential disease-associated microbes.
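The Gaussian interaction profile similarity mentioned in this abstract can be sketched as follows. This is a minimal illustration of the commonly used form of the kernel, exp(-γ‖a_i - a_j‖²) with γ normalised by the mean squared profile norm; the abstract does not state the exact bandwidth setting, so `gamma_prime` and the toy association matrix are assumptions, not the authors' configuration.

```python
import math


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 matrix.

    Each row is one microbe's association profile across diseases.
    gamma_prime is a hypothetical bandwidth parameter (assumed default 1.0).
    """
    n = len(profiles)
    sq_norms = [sum(x * x for x in p) for p in profiles]
    mean_sq = sum(sq_norms) / n
    # Normalise the bandwidth by the average squared norm of the profiles.
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * d2)
    return sim


# Hypothetical toy matrix: 3 microbes (rows) x 3 diseases (columns).
profiles = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
sim = gip_similarity(profiles)
```

Microbes with identical profiles receive similarity 1, and the similarity decays as the profiles diverge.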
Affiliation(s)
- Jing Chen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China
- Ran Tao
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China
- Yi Qiu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China
- Qun Yuan
  - Suzhou Research Center of Medical School, Suzhou Hospital, Affiliated Hospital of Medical School, Nanjing University, 215153 Suzhou, China
3. Li M, Wang Z, Liu L, Liu X, Zhang W. Subgraph-Aware Graph Kernel Neural Network for Link Prediction in Biological Networks. IEEE J Biomed Health Inform 2024; 28:4373-4381. [PMID: 38630566 DOI: 10.1109/jbhi.2024.3390092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Identifying links within biological networks is important in various biomedical applications. Recent studies have revealed that each node in a network may play a unique role in different links, but most link prediction methods overlook distinctive node roles, hindering the acquisition of effective link representations. Subgraph-based methods have been introduced as solutions but often ignore shared information among subgraphs. To address these limitations, we propose a Subgraph-aware Graph Kernel Neural Network (SubKNet) for link prediction in biological networks. Specifically, SubKNet extracts a subgraph for each node pair and feeds it into a graph kernel neural network, which decomposes each subgraph into a combination of trainable graph filters with diversity regularization for subgraph-aware representation learning. Additionally, node embeddings of the network are extracted as auxiliary information, aiding in distinguishing node pairs that share the same subgraph. Extensive experiments on five biological networks demonstrate that SubKNet outperforms baselines, including methods especially designed for biological networks and methods adapted to various networks. Further investigations confirm that employing graph filters to subgraphs helps to distinguish node roles in different subgraphs, and the inclusion of diversity regularization further enhances its capacity from diverse perspectives, generating effective link representations that contribute to more accurate link prediction.
4. Gong L, Cui X, Liu Y, Lin C, Gao Z. SinCWIm: An imputation method for single-cell RNA sequence dropouts using weighted alternating least squares. Comput Biol Med 2024; 171:108225. [PMID: 38442556 DOI: 10.1016/j.compbiomed.2024.108225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 01/28/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND AND OBJECTIVES Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for exploring cellular heterogeneity, discovering novel or rare cell types, distinguishing between tissue-specific cellular compositions, and understanding cell differentiation during development. However, due to technological limitations, dropout events in scRNA-seq can mistakenly convert some entries in the real data to zero. This is equivalent to introducing noise into the gene expression entries of cells. The contaminated data degrades the performance of downstream analyses, including clustering, cell annotation, and differential gene expression analysis. It is therefore crucial to accurately determine which zeros are due to dropout events and to impute them. METHODS Considering the different confidence levels of different zeros in the gene expression matrix, this paper proposes SinCWIm, a method for handling dropout events in scRNA-seq based on weighted alternating least squares (WALS). The method uses the Pearson correlation coefficient and hierarchical clustering to quantify the confidence of zero entries, and then combines these confidences with WALS for matrix decomposition. The imputation result is then brought closer to the actual values through outlier removal and data correction. RESULTS A total of eight single-cell sequencing datasets were used for comparative experiments to demonstrate the overall superiority of SinCWIm over state-of-the-art models. SinCWIm was applied to cluster the data and evaluated with the adjusted Rand index; the Usoskin, Pollen and Bladder datasets scored 94.46%, 96.48% and 76.74%, respectively. In addition, significant improvements were made in the retention of differentially expressed genes and in visualization. CONCLUSIONS SinCWIm provides a valuable imputation method for handling dropout events in single-cell sequencing data. In comparison to advanced methods, SinCWIm demonstrates excellent performance in clustering, visualization and other aspects. It is applicable to various single-cell sequencing datasets.
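The weighted alternating least squares step named in this abstract can be illustrated with a deliberately small sketch. It is not the SinCWIm implementation: it fits only a rank-1 factorisation, and the weights are set by hand rather than derived from Pearson correlation and hierarchical clustering as the abstract describes. The function name `wals_rank1`, the regularisation value `lam`, and the toy matrix are all assumptions.

```python
def wals_rank1(X, W, lam=1e-9, n_iter=200):
    """Rank-1 weighted ALS: minimise sum_ij W[i][j] * (X[i][j] - u[i] * v[j])**2
    plus lam * (|u|^2 + |v|^2), alternating closed-form updates for u and v."""
    n_rows, n_cols = len(X), len(X[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            num = sum(W[i][j] * X[i][j] * v[j] for j in range(n_cols))
            den = sum(W[i][j] * v[j] ** 2 for j in range(n_cols)) + lam
            u[i] = num / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            num = sum(W[i][j] * X[i][j] * u[i] for i in range(n_rows))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n_rows)) + lam
            v[j] = num / den
    return u, v


# Hypothetical toy expression matrix; the true rank-1 value at [1][1] is 4.0,
# but it was observed as 0.0 (a simulated dropout).
X = [[1.0, 2.0],
     [2.0, 0.0],
     [3.0, 6.0]]
# Weight 0 marks the zero we do not trust; weight 1 marks trusted entries.
W = [[1.0, 1.0],
     [1.0, 0.0],
     [1.0, 1.0]]
u, v = wals_rank1(X, W)
imputed = u[1] * v[1]
```

Because the suspected dropout carries zero weight, the factors are fitted to the trusted entries only, and the product `u[1] * v[1]` supplies the imputed value. Intermediate weights between 0 and 1 would express partial confidence in a zero.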
Affiliation(s)
- Lejun Gong
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China.
- Xiong Cui
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Yang Liu
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Cai Lin
  - Department of Burn, Wound Repair and Regenerative Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China.
- Zhihong Gao
  - Zhejiang Engineering Research Center of Intelligent Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
5. Peng S, Yamamoto A, Ito K. Link prediction on bipartite networks using matrix factorization with negative sample selection. PLoS One 2023; 18:e0289568. [PMID: 37585433 PMCID: PMC10431684 DOI: 10.1371/journal.pone.0289568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 07/19/2023] [Indexed: 08/18/2023] Open
Abstract
We propose a new method for bipartite link prediction using matrix factorization with negative sample selection. Bipartite link prediction is a problem that aims to predict the missing links or relations in a bipartite network. One of the most popular solutions to the problem is via matrix factorization (MF), which performs well but requires reliable information on both absent and present network links as training samples. This, however, is sometimes unavailable since there is no ground truth for absent links. To solve the problem, we propose a technique called negative sample selection, which selects reliable negative training samples using formal concept analysis (FCA) of a given bipartite network in advance of the preceding MF process. We conduct experiments on two hypothetical application scenarios to prove that our joint method outperforms the raw MF-based link prediction method as well as all other previously-proposed unsupervised link prediction methods.
Affiliation(s)
- Siqi Peng
  - Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Akihiro Yamamoto
  - Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- Kimihito Ito
  - International Institute for Zoonosis Control, Division of Bioinformatics, Hokkaido University, Hokkaido, Japan
6. Avşar G, Pir P. A comparative performance evaluation of imputation methods in spatially resolved transcriptomics data. Mol Omics 2023; 19:162-173. [PMID: 36562244 DOI: 10.1039/d2mo00266c] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]
Abstract
Spatially resolved transcriptomics technologies have drawn enormous attention by providing RNA expression patterns together with their spatial information. Even though improved techniques are being developed rapidly, the technologies which give spatially whole transcriptome level profiles suffer from dropout problems because of the low capture rate. Imputation of missing data is one strategy to eliminate this technical problem. We evaluated the imputation performance of five available methods (SpaGE, stPlus, gimVI, Tangram and stLearn) which were indicated as capable of making predictions for the dropouts in spatially resolved transcriptomics datasets. The evaluation was performed qualitatively via visualization of the predictions against the original values and quantitatively with Pearson's correlation coefficient, cosine similarity, root mean squared log-error, Silhouette Index and Calinski Harabasz Index. We found that stPlus and gimVI outperform the other three. However, the performance of all methods was lower than expected which indicates that there is still a gap for imputation tools dealing with dropout events in spatially resolved transcriptomics.
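Three of the quantitative measures listed in this abstract (Pearson's correlation coefficient, cosine similarity, and root mean squared log-error) can be written down in a few lines. The sketch below uses the standard textbook definitions and assumes equal-length, non-constant, non-negative input vectors; the sample values are hypothetical and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)


def rmsle(y_true, y_pred):
    """Root mean squared log-error using log(1 + value); inputs must be >= 0."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical original and imputed expression values for one gene.
original = [1.0, 2.0, 3.0]
imputed = [2.0, 4.0, 6.0]
r = pearson(original, imputed)
c = cosine_similarity(original, imputed)
e = rmsle(original, imputed)
```

Note that correlation and cosine similarity are both 1 for this example even though every imputed value is twice the original, while the log-error is non-zero; this is one reason a study might report scale-insensitive and scale-sensitive measures together.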
Affiliation(s)
- Gülben Avşar
  - Department of Bioengineering, Gebze Technical University, 41400 Kocaeli, Turkey.
- Pınar Pir
  - Department of Bioengineering, Gebze Technical University, 41400 Kocaeli, Turkey.
7. Mariappan R, Jayagopal A, Sien HZ, Rajan V. Neural Collective Matrix Factorization for integrated analysis of heterogeneous biomedical data. Bioinformatics 2022; 38:4554-4561. [PMID: 35929808 DOI: 10.1093/bioinformatics/btac543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 06/30/2022] [Accepted: 08/03/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION In many biomedical studies, there arises the need to integrate data from multiple directly or indirectly related sources. Collective matrix factorization (CMF) and its variants are models designed to collectively learn from arbitrary collections of matrices. The latent factors learnt are rich integrative representations that can be used in downstream tasks, such as clustering or relation prediction with standard machine-learning models. Previous CMF-based methods have numerous modeling limitations. They do not adequately capture complex non-linear interactions and do not explicitly model varying sparsity and noise levels in the inputs, and some cannot model inputs with multiple datatypes. These inadequacies limit their use on many biomedical datasets. RESULTS To address these limitations, we develop Neural Collective Matrix Factorization (NCMF), the first fully neural approach to CMF. We evaluate NCMF on relation prediction tasks of gene-disease association prediction and adverse drug event prediction, using multiple datasets. In each case, data are obtained from heterogeneous publicly available databases and used to learn representations to build predictive models. NCMF is found to outperform previous CMF-based methods and several state-of-the-art graph embedding methods for representation learning in our experiments. Our experiments illustrate the versatility and efficacy of NCMF in representation learning for seamless integration of heterogeneous data. AVAILABILITY AND IMPLEMENTATION https://github.com/ajayago/NCMF_bioinformatics. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ragunathan Mariappan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Aishwarya Jayagopal
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Ho Zong Sien
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Vaibhav Rajan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
8. Xie X, Wang Y, Sheng N, Zhang S, Cao Y, Fu Y. Predicting miRNA-disease associations based on multi-view information fusion. Front Genet 2022; 13:979815. [PMID: 36238163 PMCID: PMC9552014 DOI: 10.3389/fgene.2022.979815] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open
Abstract
MicroRNAs (miRNAs) play an important role in various biological processes, and their abnormal expression can lead to the occurrence of diseases. Exploring the potential relationships between miRNAs and diseases can contribute to the diagnosis and treatment of complex diseases. The growing number of databases storing miRNA and disease information provides opportunities to develop computational methods for discovering unobserved disease-related miRNAs, but challenges remain in how to effectively learn and fuse information from multi-source data. In this study, we propose a multi-view information fusion based method for miRNA-disease association (MDA) prediction, named MVIFMDA. Firstly, multiple heterogeneous networks are constructed by combining the known MDAs and different similarities of miRNAs and diseases based on multi-source information. Secondly, the topology features of miRNAs and diseases are obtained by applying a graph convolutional network to each heterogeneous network view. Moreover, we design an attention strategy at the topology representation level to adaptively fuse representations that carry different structural information. Meanwhile, we learn the attribute representations of miRNAs and diseases from their similarity attribute views with convolutional neural networks. Finally, the complicated associations between miRNAs and diseases are reconstructed by applying a bilinear decoder to the combined features, which combine topology and attribute representations. Experimental results on the public dataset demonstrate that our proposed model consistently outperforms baseline methods. The case studies further show the ability of the MVIFMDA model to infer underlying associations between miRNAs and diseases.
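The bilinear decoder mentioned at the end of this abstract can be sketched in its generic form: a score h_m^T W h_d between a miRNA embedding and a disease embedding. The abstract does not give the output activation or the embedding sizes, so the sigmoid, the 2-dimensional embeddings, and the identity weight matrix below are assumptions for illustration only; in a trained model W would be learned.

```python
import math


def bilinear_score(h_m, W, h_d):
    """Association score sigmoid(h_m^T W h_d) for one miRNA-disease pair.

    h_m: miRNA embedding (length p), h_d: disease embedding (length q),
    W: p x q weight matrix. The sigmoid output is an assumed choice.
    """
    Wd = [sum(W[i][j] * h_d[j] for j in range(len(h_d))) for i in range(len(h_m))]
    z = sum(h_m[i] * Wd[i] for i in range(len(h_m)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical embeddings and an identity weight matrix.
W = [[1.0, 0.0],
     [0.0, 1.0]]
orthogonal = bilinear_score([1.0, 0.0], W, [0.0, 1.0])
aligned = bilinear_score([1.0, 2.0], W, [3.0, 4.0])
```

With an identity W the decoder reduces to a dot product, so orthogonal embeddings score 0.5 and aligned embeddings score close to 1; a learned, non-symmetric W lets the model weight interactions between different embedding dimensions.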
Affiliation(s)
- Xuping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
  - *Correspondence: Yan Wang,
- Nan Sheng
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Shuangquan Zhang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yangkun Cao
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Yuan Fu
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
9. DTIP-TC2A: An analytical framework for drug-target interactions prediction methods. Comput Biol Chem 2022; 99:107707. [DOI: 10.1016/j.compbiolchem.2022.107707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 05/01/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022]
10. Vahabi N, Michailidis G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front Genet 2022; 13:854752. [PMID: 35391796 PMCID: PMC8981526 DOI: 10.3389/fgene.2022.854752] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022] Open
Abstract
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Affiliation(s)
- Nasim Vahabi
  - Informatics Institute, University of Florida, Gainesville, FL, United States
- George Michailidis
  - Informatics Institute, University of Florida, Gainesville, FL, United States