1
Bradshaw MS, Gibbs C, Martin S, Firman T, Gaskell A, Fosdick B, Layer R. Hypothesis generation for rare and undiagnosed diseases through clustering and classifying time-versioned biological ontologies. PLoS One 2024; 19:e0309205. [PMID: 39724242 DOI: 10.1371/journal.pone.0309205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 08/06/2024] [Indexed: 12/28/2024] Open
Abstract
Rare diseases affect 1-in-10 people in the United States and despite increased genetic testing, up to half never receive a diagnosis. Even when using advanced genome sequencing platforms to discover variants, if there is no connection between the variants found in the patient's genome and their phenotypes in the literature, then the patient will remain undiagnosed. When a direct variant-phenotype connection is not known, putting a patient's information in the larger context of phenotype relationships and protein-protein interactions may provide an opportunity to find an indirect explanation. Databases such as STRING contain millions of protein-protein interactions, and the Human Phenotype Ontology (HPO) contains the relations of thousands of phenotypes. By integrating these networks and clustering the entities within, we can potentially discover latent gene-to-phenotype connections. The historical records for STRING and HPO provide a unique opportunity to create a network time series for evaluating the cluster significance. Most excitingly, working with Children's Hospital Colorado, we have provided promising hypotheses about latent gene-to-phenotype connections for 38 patients. We also provide potential answers for 14 patients listed on MyGene2. Clusters our tool finds significant harbor 2.35 to 8.72 times as many gene-to-phenotype edges inferred from known drug interactions than clusters found to be insignificant. Our tool, BOCC, is available as a web app and command line tool.
Affiliation(s)
- Michael S Bradshaw
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
- Connor Gibbs
- Department of Statistics, Colorado State University, Fort Collins, CO, United States of America
- Skylar Martin
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
- Taylor Firman
- Precision Medicine Institute, Children's Hospital Colorado, Aurora, CO, United States of America
- Alisa Gaskell
- Precision Medicine Institute, Children's Hospital Colorado, Aurora, CO, United States of America
- Bailey Fosdick
- Department of Biostatistics & Informatics, Colorado School of Public Health, Aurora, CO, United States of America
- Ryan Layer
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
2
Csikász-Nagy A, Fichó E, Noto S, Reguly I. Computational tools to predict context-specific protein complexes. Curr Opin Struct Biol 2024; 88:102883. [PMID: 38986166 DOI: 10.1016/j.sbi.2024.102883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 05/21/2024] [Accepted: 06/19/2024] [Indexed: 07/12/2024]
Abstract
Interactions between thousands of proteins define cells' protein-protein interaction (PPI) network. Some of these interactions lead to the formation of protein complexes. It is challenging to identify a protein complex in a haystack of protein-protein interactions, and it is even more difficult to predict all protein complexes of the complexome. Simulations and machine learning approaches try to crack these problems by looking at the PPI network or predicted protein structures. Clustering of PPI networks led to the first protein complex predictions, while most recently, atomistic models of protein complexes and deep-learning-based structure prediction methods have also emerged. The simulation of PPI level interactions even enables the quantitative prediction of protein complexes. These methods, the required data sources, and their potential future developments are discussed in this review.
Affiliation(s)
- Attila Csikász-Nagy
- Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.
- Santiago Noto
- Cytocast Hungary Kft, Budapest, Hungary; Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- István Reguly
- Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
3
Lu Y, Li Q, Li T. A novel hierarchical network-based approach to unveil the complexity of functional microbial genome. BMC Genomics 2024; 25:786. [PMID: 39138557 PMCID: PMC11323692 DOI: 10.1186/s12864-024-10692-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 08/07/2024] [Indexed: 08/15/2024] Open
Abstract
Biological networks serve a crucial role in elucidating intricate biological processes. While interspecies environmental interactions have been extensively studied, the exploration of gene interactions within species, particularly among individual microorganisms, is less developed. The increasing amount of microbiome genomic data necessitates a more nuanced analysis of microbial genome structures and functions. In this context, we introduce a complex structure using higher-order network theory, "Solid Motif Structures (SMS)", via a hierarchical biological network analysis of genomes within the same genus, effectively linking microbial genome structure with its function. Leveraging 162 high-quality genomes of Microcystis, a key freshwater cyanobacterium within microbial ecosystems, we established a genome structure network. Employing deep learning techniques, such as adaptive graph encoder, we uncovered 27 critical functional subnetworks and their associated SMSs. Incorporating metagenomic data from seven geographically distinct lakes, we conducted an investigation into Microcystis' functional stability under varying environmental conditions, unveiling unique functional interaction models for each lake. Our work compiles these insights into an extensive resource repository, providing novel perspectives on the functional dynamics within Microcystis. This research offers a hierarchical network analysis framework for understanding interactions between microbial genome structures and functions within the same genus.
Affiliation(s)
- Yuntao Lu
- University of Michigan, Ann Arbor, USA
- Qi Li
- The State Key Laboratory of Freshwater Ecology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China.
- Tao Li
- The State Key Laboratory of Freshwater Ecology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China.
4
Li G, Li S, Liang C, Xiao Q, Luo J. Drug repositioning based on residual attention network and free multiscale adversarial training. BMC Bioinformatics 2024; 25:261. [PMID: 39118000 PMCID: PMC11308596 DOI: 10.1186/s12859-024-05893-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 08/06/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND Conducting traditional wet experiments to guide drug development is an expensive, time-consuming and risky process. Analyzing drug function and repositioning plays a key role in identifying new therapeutic potential of approved drugs and discovering therapeutic approaches for untreated diseases. Exploring drug-disease associations has far-reaching implications for identifying disease pathogenesis and treatment. However, reliable detection of drug-disease relationships via traditional methods is costly and slow. Therefore, investigations into computational methods for predicting drug-disease associations are currently needed. RESULTS This paper presents a novel drug-disease association prediction method, RAFGAE. First, RAFGAE integrates known associations between diseases and drugs into a bipartite network. Second, RAFGAE designs the Re_GAT framework, which includes multilayer graph attention networks (GATs) and two residual networks. The multilayer GATs are utilized for learning the node embeddings, which is achieved by aggregating information from multihop neighbors. The two residual networks are used to alleviate the deep network oversmoothing problem, and an attention mechanism is introduced to combine the node embeddings from different attention layers. Third, two graph autoencoders (GAEs) with collaborative training are constructed to simulate label propagation to predict potential associations. On this basis, free multiscale adversarial training (FMAT) is introduced. FMAT enhances node feature quality through small gradient adversarial perturbation iterations, improving the prediction performance. Finally, tenfold cross-validations on two benchmark datasets show that RAFGAE outperforms current methods. In addition, case studies have confirmed that RAFGAE can detect novel drug-disease associations. CONCLUSIONS The comprehensive experimental results validate the utility and accuracy of RAFGAE. 
We believe that this method may serve as an excellent predictor for identifying unobserved disease-drug associations.
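The first step named in the abstract, integrating known drug-disease associations into a bipartite network, can be illustrated with a small sketch. This is not the RAFGAE implementation: the function name `bipartite_adjacency`, the list-of-lists matrix, and the toy drug and disease labels are all assumptions made for illustration.

```python
def bipartite_adjacency(drugs, diseases, associations):
    """Build a symmetric adjacency matrix for a drug-disease bipartite network.

    Drug nodes take indices 0..len(drugs)-1 and disease nodes follow them.
    Edges exist only between a drug and a disease, never within one side.
    """
    n_drugs = len(drugs)
    n_nodes = n_drugs + len(diseases)
    drug_index = {name: i for i, name in enumerate(drugs)}
    disease_index = {name: n_drugs + j for j, name in enumerate(diseases)}

    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for drug, disease in associations:
        i, j = drug_index[drug], disease_index[disease]
        adjacency[i][j] = 1  # drug -> disease
        adjacency[j][i] = 1  # disease -> drug (undirected network)
    return adjacency


# Hypothetical toy data, not taken from the paper's benchmark datasets.
toy_drugs = ["drugA", "drugB"]
toy_diseases = ["diseaseX"]
toy_associations = [("drugA", "diseaseX")]
toy_adjacency = bipartite_adjacency(toy_drugs, toy_diseases, toy_associations)
```

In this layout the drug-drug and disease-disease blocks stay zero, which is what makes the network bipartite; a graph attention layer would then operate on this adjacency structure.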
Affiliation(s)
- Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China.
- Shuwen Li
- School of Information Engineering, East China Jiaotong University, Nanchang, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
5
Zhao BW, He YZ, Su XR, Yang Y, Li GD, Huang YA, Hu PW, You ZH, Hu L. Motif-Aware miRNA-Disease Association Prediction via Hierarchical Attention Network. IEEE J Biomed Health Inform 2024; 28:4281-4294. [PMID: 38557614 DOI: 10.1109/jbhi.2024.3383591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
As post-transcriptional regulators of gene expression, micro-ribonucleic acids (miRNAs) are regarded as potential biomarkers for a variety of diseases. Hence, the prediction of miRNA-disease associations (MDAs) is of great significance for an in-depth understanding of disease pathogenesis and progression. Existing prediction models mainly concentrate on incorporating different sources of biological information to perform the MDA prediction task while failing to consider the full potential utility of MDA network information at the motif level. To overcome this problem, we propose a novel motif-aware MDA prediction model, namely MotifMDA, by fusing a variety of high- and low-order structural information. In particular, we first design several motifs of interest considering their ability to characterize how miRNAs are associated with diseases through different network structural patterns. Then, MotifMDA adopts a two-layer hierarchical attention to identify novel MDAs. Specifically, the first attention layer learns high-order motif preferences based on their occurrences in the given MDA network, while the second one learns the final embeddings of miRNAs and diseases through coupling high- and low-order preferences. Experimental results on two benchmark datasets have demonstrated the superior performance of MotifMDA over several state-of-the-art prediction models. This strongly indicates that accurate MDA prediction can be achieved by relying solely on MDA network information. Furthermore, our case studies indicate that the incorporation of motif-level structure information allows MotifMDA to discover novel MDAs from different perspectives.
6
Martins C, Neves B, Teixeira AS, Froes M, Sarmento P, Machado J, Magalhães CA, Silva NA, Silva MJ, Leite F. Identifying subgroups in heart failure patients with multimorbidity by clustering and network analysis. BMC Med Inform Decis Mak 2024; 24:95. [PMID: 38622703 PMCID: PMC11020914 DOI: 10.1186/s12911-024-02497-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 04/03/2024] [Indexed: 04/17/2024] Open
Abstract
This study presents a workflow for identifying and characterizing patients with Heart Failure (HF) and multimorbidity utilizing data from Electronic Health Records (EHR). Multimorbidity, the co-occurrence of two or more chronic conditions, poses a significant challenge to healthcare systems. Nonetheless, understanding of patients with multimorbidity, including the most common disease interactions, risk factors, and treatment responses, remains limited, particularly for complex and heterogeneous conditions like HF. We conducted a clustering analysis of 3745 HF patients using demographics, comorbidities, laboratory values, and drug prescriptions. Our analysis revealed four distinct clusters with significant differences in multimorbidity profiles showing differential prognostic implications regarding unplanned hospital admissions. These findings underscore the considerable disease heterogeneity within HF patients and emphasize the potential for improved characterization of patient subgroups for clinical risk stratification through the use of EHR data.
Affiliation(s)
- Catarina Martins
- Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- INESC-ID, Lisboa, Portugal
- Bernardo Neves
- Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal.
- Hospital da Luz Lisboa, Internal Medicine, Luz Saúde, Lisboa, Portugal.
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal.
- Andreia Sofia Teixeira
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- LASIGE and Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Miguel Froes
- Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Pedro Sarmento
- Hospital da Luz Lisboa, Internal Medicine, Luz Saúde, Lisboa, Portugal
- Jaime Machado
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Nuno A Silva
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
- Mário J Silva
- Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- INESC-ID, Lisboa, Portugal
- Francisca Leite
- Hospital da Luz Learning Health, Luz Saúde, Lisboa, Portugal
7
Li J, Chen J, Wang Z, Lei X. HoRDA: Learning higher-order structure information for predicting RNA-disease associations. Artif Intell Med 2024; 148:102775. [PMID: 38325924 DOI: 10.1016/j.artmed.2024.102775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 10/16/2023] [Accepted: 01/14/2024] [Indexed: 02/09/2024]
Abstract
CircRNA and miRNA are crucial non-coding RNAs, which are associated with biological diseases. Exploring the associations between RNAs and diseases often requires significant time and financial investment, a burden that has been greatly alleviated by the application of deep learning methods in bioinformatics. However, existing methods often fail to achieve higher accuracy and are not universal across multiple RNAs. Moreover, complex RNA-disease associations hide important higher-order topology information. To address these issues, we learn higher-order structure information for predicting RNA-disease associations (HoRDA). Firstly, the correlations between RNAs and the correlations between diseases are fully explored by combining similarity with a higher-order graph attention network. Then, a higher-order graph convolutional network is constructed to aggregate neighbor information and to further obtain the representations of RNAs and diseases. Meanwhile, due to the large number of complex and variable higher-order structures in biological networks, we design a higher-order negative sampling strategy to gain more desirable negative samples. Finally, the obtained embeddings of RNAs and diseases are fed into a logistic regression model to acquire the probabilities of RNA-disease associations. Diverse simulation results demonstrate the superiority of the proposed method. Case studies are then conducted on breast neoplasms, colorectal neoplasms, and gastric neoplasms. We validate the proposed higher-order strategies through ablative and exploratory analyses and further demonstrate the practical applicability of HoRDA. HoRDA thus contributes to RNA-disease association prediction.
Affiliation(s)
- Julong Li
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Jianrui Chen
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
- Zhihui Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
8
Hu L, Zhang M, Hu P, Zhang J, Niu C, Lu X, Jiang X, Ma Y. Dual-channel hypergraph convolutional network for predicting herb-disease associations. Brief Bioinform 2024; 25:bbae067. [PMID: 38426326 PMCID: PMC10939431 DOI: 10.1093/bib/bbae067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 01/26/2024] [Accepted: 02/05/2024] [Indexed: 03/02/2024] Open
Abstract
The applicability of herbs in disease treatment has been verified through experience over thousands of years. The understanding of herb-disease associations (HDAs) is still far from complete due to the complicated mechanism inherent in multi-target and multi-component (MTMC) botanical therapeutics. Most of the existing prediction models fail to incorporate the MTMC mechanism. To overcome this problem, we propose a novel dual-channel hypergraph convolutional network, namely HGHDA, for HDA prediction. Technically, HGHDA first adopts an autoencoder to project components and target proteins onto a low-dimensional latent space so as to obtain their embeddings by preserving similarity characteristics in their original feature spaces. To model the high-order relations between herbs and their components, we design a channel in HGHDA to encode a hypergraph that describes the high-order patterns of herb-component relations via hypergraph convolution. The other channel in HGHDA is established in the same way to model the high-order relations between diseases and target proteins. The embeddings of drugs and diseases are then aggregated through our dual-channel network to obtain the prediction results with a scoring function. To evaluate the performance of HGHDA, a series of extensive experiments have been conducted on two benchmark datasets, and the results demonstrate the superiority of HGHDA over the state-of-the-art algorithms proposed for HDA prediction. In addition, our case study on Chuan Xiong and Astragalus membranaceus strongly supports the effectiveness of HGHDA, as seven and eight out of the top 10 diseases predicted by HGHDA for Chuan Xiong and Astragalus membranaceus, respectively, have been reported in the literature.
Affiliation(s)
- Lun Hu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Menglong Zhang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Pengwei Hu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Jun Zhang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Chao Niu
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory Basis of Xinjiang Indigenous Medicinal Plants Resource Utilization, Key Laboratory of Chemistry of Plant Resources in Arid Regions, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Xueying Lu
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory Basis of Xinjiang Indigenous Medicinal Plants Resource Utilization, Key Laboratory of Chemistry of Plant Resources in Arid Regions, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Xiangrui Jiang
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Yupeng Ma
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
9
Zhao BW, Su XR, Yang Y, Li DX, Li GD, Hu PW, Zhao YG, Hu L. Drug-disease association prediction using semantic graph and function similarity representation learning over heterogeneous information networks. Methods 2023; 220:106-114. [PMID: 37972913 DOI: 10.1016/j.ymeth.2023.10.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/13/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Discovering new indications for existing drugs is a promising development strategy at various stages of drug research and development. However, most existing approaches complete their tasks by constructing a variety of heterogeneous networks without considering available higher-order connectivity patterns in heterogeneous biological information networks, which are believed to be useful for improving the accuracy of new drug discovery. To this end, we propose a computational model, called SFRLDDA, for drug-disease association prediction by using semantic graph and function similarity representation learning. Specifically, SFRLDDA first integrates a heterogeneous information network (HIN) from drug-disease, drug-protein, and protein-disease associations and their biological knowledge. Second, different representation learning strategies are applied to obtain the feature representations of drugs and diseases from different perspectives over the constructed semantic graph and function similarity graphs, respectively. Finally, a Random Forest classifier is incorporated by SFRLDDA to discover potential drug-disease associations (DDAs). Experimental results demonstrate that SFRLDDA yields the best performance when compared with other state-of-the-art models on three benchmark datasets. Moreover, case studies also indicate that the simultaneous consideration of semantic graph and function similarity of drugs and diseases in the HIN allows SFRLDDA to precisely predict DDAs in a more comprehensive manner.
Affiliation(s)
- Bo-Wei Zhao
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
- Xiao-Rui Su
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
- Yue Yang
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
- Dong-Xu Li
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
- Guo-Dong Li
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
- Peng-Wei Hu
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
- Yong-Gang Zhao
- Department of Orthopaedic Surgery (hand and foot trauma), People's Hospital of Dongxihu, Wuhan 420100, China.
- Lun Hu
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China.
10
Lin S, Mao X, Hong L, Lin S, Wei DQ, Xiong Y. MATT-DDI: Predicting multi-type drug-drug interactions via heterogeneous attention mechanisms. Methods 2023; 220:1-10. [PMID: 37858611 DOI: 10.1016/j.ymeth.2023.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/13/2023] [Accepted: 10/17/2023] [Indexed: 10/21/2023] Open
Abstract
The joint use of multiple drugs can result in adverse drug-drug interactions (DDIs) and side effects that harm the body. Accurate identification of DDIs is crucial for avoiding accidental drug side effects and understanding potential mechanisms underlying DDIs. Several computational methods have been proposed for multi-type DDI prediction, but most rely on the similarity profiles of drugs as the drug feature vectors, which may result in information leakage and overoptimistic performance when predicting interactions between new drugs. To address this issue, we propose a novel method, MATT-DDI, for predicting multi-type DDIs based on the original feature vectors of drugs and multiple attention mechanisms. MATT-DDI consists of three main modules: the top k most similar drug pair selection module, heterogeneous attention mechanism module and multi‑type DDI prediction module. Firstly, based on the feature vector of the input drug pair (IDP), k drug pairs that are most similar to the input drug pair from the training dataset are selected according to cosine similarity between drug pairs. Then, the vectors of k selected drug pairs are averaged to obtain a new drug pair (NDP). Next, IDP and NDP are fed into heterogeneous attention modules, including scaled dot product attention and bilinear attention, to extract latent feature vectors. Finally, these latent feature vectors are taken as input of the classification module to predict DDI types. We evaluated MATT-DDI on three different tasks. The experimental results show that MATT-DDI provides better or comparable performance compared to several state-of-the-art methods, and its feasibility is supported by case studies. MATT-DDI is a robust model for predicting multi-type DDIs with excellent performance and no information leakage.
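The selection step described in the abstract (choose the k training drug pairs most similar to the input drug pair by cosine similarity, then average their vectors into a new drug pair) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the plain-list feature vectors, and the toy data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def new_drug_pair(idp, training_pairs, k):
    """Average the k training pair vectors most similar to the input drug pair (IDP)."""
    if k < 1 or not training_pairs:
        raise ValueError("need k >= 1 and at least one training pair")
    # Rank training pairs from most to least similar to the input pair.
    ranked = sorted(
        training_pairs,
        key=lambda pair: cosine_similarity(idp, pair),
        reverse=True,
    )
    top_k = ranked[:k]
    # Element-wise mean of the selected vectors gives the new drug pair (NDP).
    return [sum(pair[i] for pair in top_k) / len(top_k) for i in range(len(idp))]


# Hypothetical toy feature vectors, not data from the paper.
toy_idp = [1.0, 0.0, 1.0]
toy_training_pairs = [
    [1.0, 0.0, 1.0],  # same direction as the input pair
    [2.0, 0.0, 2.0],  # same direction, larger magnitude
    [0.0, 1.0, 0.0],  # orthogonal to the input pair
]
toy_ndp = new_drug_pair(toy_idp, toy_training_pairs, k=2)
```

In the method as described, both the input pair and the averaged pair would then be passed to the attention modules; the sketch stops at the averaging step.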
Affiliation(s)
- Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Liang Hong
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China; School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
- Shuangjun Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Zhongjing Research and Industrialization Institute of Chinese Medicine, Nanyang 473006, China; Peng Cheng National Laboratory, Shenzhen 518055, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China.
11
Li DX, Zhou P, Zhao BW, Su XR, Li GD, Zhang J, Hu PW, Hu L. Biocaiv: an integrative webserver for motif-based clustering analysis and interactive visualization of biological networks. BMC Bioinformatics 2023; 24:451. [PMID: 38030973 PMCID: PMC10685597 DOI: 10.1186/s12859-023-05574-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 11/20/2023] [Indexed: 12/01/2023] Open
Abstract
BACKGROUND As an important task in bioinformatics, clustering analysis plays a critical role in understanding the functional mechanisms of many complex biological systems, which can be modeled as biological networks. The purpose of clustering analysis in biological networks is to identify functional modules of interest, but there is a lack of online clustering tools that visualize biological networks and provide in-depth biological analysis for discovered clusters. RESULTS Here we present BioCAIV, a novel webserver dedicated to maximizing accessibility and applicability for the clustering analysis of biological networks. This, together with its user-friendly interface, helps biological researchers perform an accurate clustering analysis of biological networks and identify functionally significant modules for further assessment. CONCLUSIONS BioCAIV is an efficient clustering analysis webserver designed for a variety of biological networks. BioCAIV is freely available without registration requirements at http://bioinformatics.tianshanzw.cn:8888/BioCAIV/ .
Collapse
Affiliation(s)
- Dong-Xu Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
| | - Peng Zhou
- School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
| | - Bo-Wei Zhao
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
| | - Xiao-Rui Su
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
| | - Guo-Dong Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
| | - Jun Zhang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
| | - Peng-Wei Hu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
| | - Lun Hu
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China.
| |
|
12
|
Pan J, You Z, You W, Zhao T, Feng C, Zhang X, Ren F, Ma S, Wu F, Wang S, Sun Y. PTBGRP: predicting phage-bacteria interactions with graph representation learning on microbial heterogeneous information network. Brief Bioinform 2023; 24:bbad328. [PMID: 37742053 DOI: 10.1093/bib/bbad328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 08/14/2023] [Accepted: 08/30/2023] [Indexed: 09/25/2023] Open
Abstract
Identifying potential bacteriophage (phage) candidates to treat bacterial infections plays an essential role in research on human pathogens. Computational approaches are recognized as a valid way to predict bacteria and their target phages. However, most current methods use only lower-order biological information and do not consider higher-order connectivity patterns, which can help improve predictive accuracy. Therefore, we developed a novel microbial heterogeneous interaction network (MHIN)-based model called PTBGRP to predict new phages for bacterial hosts. Specifically, PTBGRP first constructs an MHIN by integrating phage-bacteria interaction (PBI) and six bacteria-bacteria interaction networks with their biological attributes. Then, different representation learning methods are deployed to extract higher-level biological features and lower-level topological features from the MHIN. Finally, PTBGRP employs a deep neural network as the classifier to predict unknown PBI pairs based on the fused biological information. Experimental results demonstrated that PTBGRP achieves the best performance on the corresponding ESKAPE pathogens and PBI dataset when compared with state-of-the-art methods. In addition, case studies of Klebsiella pneumoniae and Staphylococcus aureus further indicate that considering rich heterogeneous information enables PTBGRP to predict PBI accurately from a more comprehensive perspective. The webserver of the PTBGRP predictor is freely available at http://120.77.11.78/PTBGRP/.
Affiliation(s)
- Jie Pan
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
| | - Wencai You
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Tian Zhao
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Chenlu Feng
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Xuexia Zhang
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
| | - Fengzhi Ren
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
| | - Sanxing Ma
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Fan Wu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Shiwei Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| | - Yanmei Sun
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
| |
|
13
|
Li X, Liao M, Wang B, Zan X, Huo Y, Liu Y, Bao Z, Xu P, Liu W. A drug repurposing method based on inhibition effect on gene regulatory network. Comput Struct Biotechnol J 2023; 21:4446-4455. [PMID: 37731599 PMCID: PMC10507583 DOI: 10.1016/j.csbj.2023.09.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/22/2023] Open
Abstract
Numerous computational drug repurposing methods have emerged as efficient alternatives to costly and time-consuming traditional drug discovery approaches. Some of these methods are based on the assumption that the candidate drug should have a reversal effect on disease-associated genes. However, such methods are not applicable when there is limited overlap between disease-related genes and drug-perturbed genes. In this study, we proposed a novel Drug Repurposing method based on the Inhibition Effect on gene regulatory network (DRIE) to identify potential drugs for cancer treatment. DRIE integrates gene expression profiles and a gene regulatory network to calculate an inhibition score using the shortest path in the disease-specific network. The results on eleven datasets indicated the superior performance of DRIE when compared to other state-of-the-art methods. Case studies showed that our method effectively discovered novel drug-disease associations. Our findings demonstrated that the top-ranked drug candidates had already been validated by the CTD database. Additionally, it clearly identified potential agents for three cancers (colorectal, breast, and lung cancer), which was beneficial when annotating drug-disease relationships in the CTD. This study proposed a novel framework for drug repurposing, which would be helpful for drug discovery and development.
Affiliation(s)
- Xianbin Li
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
| | - Minzhen Liao
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| | - Bing Wang
- School of Medicine, Southeast University, Nanjing, China
| | - Xiangzhen Zan
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| | - Yanhao Huo
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| | - Yue Liu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| | - Zhenshen Bao
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
| | - Peng Xu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
| | - Wenbin Liu
- Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
| |
|
14
|
Luo X, Wang L, Hu P, Hu L. Predicting Protein-Protein Interactions Using Sequence and Network Information via Variational Graph Autoencoder. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3182-3194. [PMID: 37155405 DOI: 10.1109/tcbb.2023.3273567] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/10/2023]
Abstract
Protein-protein interactions (PPIs) play a critical role in proteomics studies, and a variety of computational algorithms have been developed to predict PPIs. Though effective, their performance is constrained by the high false-positive and false-negative rates observed in PPI data. To overcome this problem, a novel PPI prediction algorithm, namely PASNVGA, is proposed in this work by combining the sequence and network information of proteins via a variational graph autoencoder. To do so, PASNVGA first applies different strategies to extract the features of proteins from their sequence and network information, and obtains a more compact form of these features using principal component analysis. In addition, PASNVGA designs a scoring function to measure the higher-order connectivity between proteins so as to obtain a higher-order adjacency matrix. With all these features and adjacency matrices, PASNVGA trains a variational graph autoencoder model to further learn the integrated embeddings of proteins. The prediction task is then completed by using a simple feedforward neural network. Extensive experiments have been conducted on five PPI datasets collected from different species. Compared with several state-of-the-art algorithms, PASNVGA has been demonstrated to be a promising PPI prediction algorithm.
|
15
|
Palukuri MV, Patil RS, Marcotte EM. Molecular complex detection in protein interaction networks through reinforcement learning. BMC Bioinformatics 2023; 24:306. [PMID: 37532987 PMCID: PMC10394916 DOI: 10.1186/s12859-023-05425-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 07/20/2023] [Indexed: 08/04/2023] Open
Abstract
BACKGROUND: Proteins often assemble into higher-order complexes to perform their biological functions. Such protein-protein interactions (PPIs) are often experimentally measured for pairs of proteins and summarized in a weighted PPI network, to which community detection algorithms can be applied to define the various higher-order protein complexes. Current methods include unsupervised and supervised approaches, often assuming that protein complexes manifest only as dense subgraphs. In supervised approaches, the focus is not on how to find complexes in a network, but only on learning which subgraphs correspond to complexes, a search currently solved using heuristics. However, learning to walk trajectories on a network to identify protein complexes leads naturally to a reinforcement learning (RL) approach, a strategy not extensively explored for community detection. Here, we develop and evaluate a reinforcement learning pipeline for community detection on weighted protein-protein interaction networks to detect new protein complexes. The algorithm is trained to calculate the value of different subgraphs encountered while walking on the network to reconstruct known complexes. A distributed prediction algorithm then scales the RL pipeline to search for novel protein complexes on large PPI networks. RESULTS: The reinforcement learning pipeline is applied to a human PPI network consisting of 8k proteins and 60k PPIs, which results in 1,157 protein complexes. The method demonstrated competitive accuracy with improved speed compared to previous algorithms. We highlight protein complexes such as C4orf19, C18orf21, and KIAA1522 which are currently minimally characterized. Additionally, the results suggest that TMC04 is a putative additional subunit of the KICSTOR complex and confirm the involvement of C15orf41 in a higher-order complex with HIRA, CDAN1, ASF1A, and by 3D structural modeling.
CONCLUSIONS: Reinforcement learning offers several distinct advantages for community detection, including scalability and knowledge of the walk trajectories defining those communities. Applied to currently available human protein interaction networks, this method had comparable accuracy with other algorithms and notable savings in computational time, and in turn, led to clear predictions of protein function and interactions for several uncharacterized human proteins.
Affiliation(s)
- Meghana V Palukuri
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
- Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX, 78712, USA.
| | - Ridhi S Patil
- Department of Biomedical Engineering, University of Texas, Austin, TX, 78712, USA.
| | - Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
- Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX, 78712, USA.
| |
|
16
|
Kuo KM, Talley PC, Chang CS. The accuracy of artificial intelligence used for non-melanoma skin cancer diagnoses: a meta-analysis. BMC Med Inform Decis Mak 2023; 23:138. [PMID: 37501114 PMCID: PMC10375663 DOI: 10.1186/s12911-023-02229-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 07/07/2023] [Indexed: 07/29/2023] Open
Abstract
BACKGROUND: With the rising incidence of skin cancer and relatively increased mortality rates, improved diagnosis of such a potentially fatal disease is of vital importance. Although frequently curable, it nevertheless places a considerable burden upon healthcare systems. Among the various types of skin cancers, non-melanoma skin cancer is the most prevalent. Despite such prevalence and its associated cost, scant evidence exists concerning the diagnostic accuracy of Artificial Intelligence (AI) for non-melanoma skin cancer. This study meta-analyzes the diagnostic test accuracy of AI used to diagnose non-melanoma forms of skin cancer, and it identifies potential covariates that account for heterogeneity between extant studies. METHODS: Various electronic databases (Scopus, PubMed, ScienceDirect, SpringerLink, and Dimensions) were examined to discern eligible studies beginning from March 2022. AI studies predictive of non-melanoma skin cancer were included. Summary estimates of sensitivity, specificity, and area under the receiver operating characteristic curve were used to evaluate diagnostic accuracy. The revised Quality Assessment of Diagnostic Studies served to assess any risk of bias. RESULTS: A literature search produced 39 eligible articles for meta-analysis. The summary sensitivity, specificity, and area under the receiver operating characteristic curve of AI for diagnosing non-melanoma skin cancer were 0.78, 0.98, and 0.97, respectively. Skin cancer typology, data sources, cross validation, ensemble models, types of techniques, pre-trained models, and image augmentation were significant covariates accounting for heterogeneity in sensitivity and/or specificity. CONCLUSIONS: Meta-analysis results revealed that AI is predictive of non-melanoma skin cancer with acceptable performance, but sensitivity could be improved. Further, ensemble models and pre-trained models can be employed to improve the true positive rate.
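The two per-study accuracy measures named in this abstract can be illustrated with a minimal Python sketch. The counts below are hypothetical, chosen only so that the outputs mirror the reported summary values of 0.78 and 0.98; they are not data from the meta-analysis, and the pooling model the authors used is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 diagnostic table for one study (hypothetical container)."""
    tp: int  # diseased cases the classifier flags as positive
    fn: int  # diseased cases the classifier misses
    tn: int  # healthy cases the classifier clears
    fp: int  # healthy cases the classifier flags as positive


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts, NOT taken from any study in the meta-analysis.
example = ConfusionCounts(tp=78, fn=22, tn=98, fp=2)
print(round(sensitivity(example), 2), round(specificity(example), 2))
```

A sensitivity well below the specificity, as here, means missed cancers are more common than false alarms, which is the gap the conclusions point to.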
Affiliation(s)
- Kuang Ming Kuo
- Department of Business Management, National United University, No.1, Miaoli, 360301, Lienda, Taiwan, Republic of China
| | - Paul C Talley
- Department of Applied English, I-Shou University, No. 1, Sec. 1, Syuecheng Rd., Dashu District, 84001, Kaohsiung City, Taiwan, Republic of China
| | - Chao-Sheng Chang
- Department of Occupational Therapy, I-Shou University, No. 1, Yida Rd., Yanchao District, 82445, Kaohsiung City, Taiwan, Republic of China.
- Department of Emergency Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan, Republic of China.
| |
|
17
|
Lin X, Dai L, Zhou Y, Yu ZG, Zhang W, Shi JY, Cao DS, Zeng L, Chen H, Song B, Yu PS, Zeng X. Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction. Brief Bioinform 2023:bbad235. [PMID: 37401373 DOI: 10.1093/bib/bbad235] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 03/27/2023] [Revised: 05/30/2023] [Accepted: 06/05/2023] [Indexed: 07/05/2023] Open
Abstract
Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). A DDI is a change in the effect of one drug due to the presence of another drug in the human body, and DDIs play an essential role in drug discovery and clinical research. Predicting DDIs through traditional clinical trials and experiments is an expensive and time-consuming process. To apply advanced AI and deep learning correctly, developers and users face various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, natural language processing based and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.
Affiliation(s)
- Xuan Lin
- College of Computer Science, Xiangtan University, Xiangtan, China
| | - Lichang Dai
- College of Computer Science, Xiangtan University, Xiangtan, China
| | - Yafang Zhou
- College of Computer Science, Xiangtan University, Xiangtan, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, China
| | - Jian-Yu Shi
- Northwestern Polytechnical University, Xian, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
| | - Li Zeng
- AIDD department of Yuyao Biotech, Shanghai, China
| | - Haowen Chen
- College of Computer Science and Electronic Engineering, Hunan University, 410013 Changsha, P. R. China
| | - Bosheng Song
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Philip S Yu
- University of Illinois at Chicago and also holds the Wexler Chair in Information Technology
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha, China
| |
|
18
|
Shah E, Maji P. Multi-View Kernel Learning for Identification of Disease Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2278-2290. [PMID: 37027602 DOI: 10.1109/tcbb.2023.3247033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Gene expression data sets and protein-protein interaction (PPI) networks are two heterogeneous data sources that have been extensively studied, due to their ability to capture the co-expression patterns among genes and their topological connections. Although they depict different traits of the data, both of them tend to group co-functional genes together. This phenomenon agrees with the basic assumption of multi-view kernel learning, according to which different views of the data contain a similar inherent cluster structure. Based on this inference, a new multi-view kernel learning based disease gene identification algorithm, termed DiGId, is put forward. A novel multi-view kernel learning approach is proposed that aims to learn a consensus kernel, which efficiently captures the heterogeneous information of individual views as well as depicts the underlying inherent cluster structure. Low-rank constraints are imposed on the learned multi-view kernel, so that it can effectively be partitioned into k or fewer clusters. The learned joint cluster structure is used to curate a set of potential disease genes. Moreover, a novel approach is put forward to quantify the importance of each view. To demonstrate the effectiveness of the proposed approach in capturing the relevant information depicted by individual views, an extensive analysis is performed on four different cancer-related gene expression data sets and a PPI network, considering different similarity measures.
|
19
|
Gao R, Luo J, Ding H, Zhai H. INSnet: a method for detecting insertions based on deep learning network. BMC Bioinformatics 2023; 24:80. [PMID: 36879189 PMCID: PMC9990265 DOI: 10.1186/s12859-023-05216-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 03/01/2023] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND: Many studies have shown that structural variations (SVs) strongly impact human disease. As a common type of SV, insertions are usually associated with genetic diseases. Therefore, accurately detecting insertions is of great significance. Although many methods for detecting insertions have been proposed, these methods often generate some errors and miss some variants. Hence, accurately detecting insertions remains a challenging task. RESULTS: In this paper, we propose a method named INSnet to detect insertions using a deep learning network. First, INSnet divides the reference genome into continuous sub-regions and takes five features for each locus through alignments between long reads and the reference genome. Next, INSnet uses a depthwise separable convolutional network. The convolution operation extracts informative features through spatial information and channel information. INSnet uses two attention mechanisms, the convolutional block attention module (CBAM) and efficient channel attention (ECA), to extract key alignment features in each sub-region. In order to capture the relationship between adjacent sub-regions, INSnet uses a gated recurrent unit (GRU) network to further extract more important SV signatures. After predicting whether a sub-region contains an insertion through the previous steps, INSnet determines the precise site and length of the insertion. The source code is available from GitHub at https://github.com/eioyuou/INSnet. CONCLUSION: Experimental results show that INSnet can achieve better performance than other methods in terms of F1 score on real datasets.
Affiliation(s)
- Runtian Gao
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
| | - Hongyu Ding
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Haixia Zhai
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
| |
|
20
|
Wang X, Yang W, Yang Y, He Y, Zhang J, Wang L, Hu L. PPISB: A Novel Network-Based Algorithm of Predicting Protein-Protein Interactions With Mixed Membership Stochastic Blockmodel. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1606-1612. [PMID: 35939453 DOI: 10.1109/tcbb.2022.3196336] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions (PPIs) play an essential role in most biological processes in cells. Many computational algorithms have thus been proposed to predict PPIs. However, most of them rely heavily on the biological information of proteins while ignoring the latent structural features of proteins present in a PPI network. In this paper, we propose an efficient network-based prediction algorithm, namely PPISB, based on a mixed membership stochastic blockmodel. By simulating the generative process of a PPI network, PPISB is able to capture the latent community structures. The inference procedure adopted by PPISB further optimizes the membership distributions of proteins over different complexes. After that, a distance measure is designed to compute the similarity between two proteins in terms of their likelihoods of being in the same complex, thus verifying whether they interact with each other or not. To evaluate the performance of PPISB, a series of extensive experiments has been conducted with five PPI networks collected from different species, and the results demonstrate that PPISB performs promisingly when applied to predict PPIs in terms of several evaluation metrics. Hence, we reason that PPISB is preferred over state-of-the-art network-based prediction algorithms, especially for predicting potential PPIs.
|
21
|
Yu T, Liu M, Ren Z, Zhang J. A Fast Approximate Method for k-Edge Connected Component Detection in Graphs with High Accuracy. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2023]
|
22
|
Discovering Entities Similarities in Biological Networks Using a Hybrid Immune Algorithm. Informatics 2023. [DOI: 10.3390/informatics10010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023] Open
Abstract
Disease phenotypes are generally caused by the failure of gene modules which often have similar biological roles. Through the study of biological networks, it is possible to identify the intrinsic structure of molecular interactions in order to identify the so-called “disease modules”. Community detection is an interesting and valuable approach to discovering the structure of the community in a complex network, revealing the internal organization of the nodes, and has become a leading research topic in the analysis of complex networks. This work investigates the link between biological modules and network communities in test-case biological networks that are commonly used as a reference point and which include Protein–Protein Interaction Networks, Metabolic Networks and Transcriptional Regulation Networks. In order to identify small and structurally well-defined communities in the biological context, a hybrid immune metaheuristic algorithm Hybrid-IA is proposed and compared with several metaheuristics, hyper-heuristics, and the well-known greedy algorithm Louvain, with respect to modularity maximization. Considering the limitation of modularity optimization, which can fail to identify smaller communities, the reliability of Hybrid-IA was also analyzed with respect to three well-known sensitivity analysis measures (NMI, ARI and NVI) that assess how similar the detected communities are to real ones. By inspecting all outcomes and the performed comparisons, we will see that on one hand Hybrid-IA finds slightly lower modularity values than Louvain, but outperforms all other metaheuristics, while on the other hand, it can detect communities more similar to the real ones when compared to those detected by Louvain.
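The objective that this abstract says is being maximized, modularity, can be sketched in a few lines of Python. This is a minimal illustration of the standard Newman modularity for an undirected, unweighted graph, not the Hybrid-IA or Louvain implementations; the toy graph, the function name, and the node labels are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    edges        : list of (u, v) pairs, undirected, no duplicates, at least one edge
    community_of : dict mapping each node to its community label
    L_c is the number of edges inside community c, d_c the total degree
    of its nodes, and m the total number of edges.
    """
    m = len(edges)
    intra_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}

print(modularity(toy_edges, two_modules))  # 5/14, the two-triangle split
print(modularity(toy_edges, one_module))   # 0.0 when everything is one community
```

Because each community's score is penalized by the square of its share of total degree, very small communities can be merged without lowering Q, which is the resolution limitation the abstract mentions as the reason for also checking NMI, ARI, and NVI.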
|
23
|
Selvaraj MK, Thakur A, Kumar M, Pinnaka AK, Suri CR, Siddhardha B, Elumalai SP. Ion-pumping microbial rhodopsin protein classification by machine learning approach. BMC Bioinformatics 2023; 24:29. [PMID: 36707759 PMCID: PMC9881276 DOI: 10.1186/s12859-023-05138-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 01/04/2023] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Rhodopsin is a seven-transmembrane protein covalently linked with a retinal chromophore that absorbs photons for energy conversion and intracellular signaling in eukaryotes, bacteria, and archaea. Haloarchaeal rhodopsins are Type-I microbial rhodopsins that elicit various light-driven functions such as proton pumping, chloride pumping and phototaxis behaviour. The industrial application of ion-pumping haloarchaeal rhodopsins is limited by the lack of full-length rhodopsin sequence-based classifications, which play an important role in ion-pumping activity. The best-studied haloarchaeal rhodopsin is the proton-pumping bacteriorhodopsin, which shows promising applications in optogenetics, biosensitized solar cells, security ink, data storage, artificial retinal implants and biohydrogen generation. As a result, a low-cost computational approach is required to identify ion-pumping haloarchaeal rhodopsin sequences and their subtypes. RESULTS This study uses a support vector machine (SVM) technique to identify these ion-pumping haloarchaeal rhodopsin proteins. The haloarchaeal ion-pumping rhodopsins, viz. bacteriorhodopsin, halorhodopsin, xanthorhodopsin and sensory rhodopsin, and marine prokaryotic ion-pumping rhodopsins such as actinorhodopsin and proteorhodopsin were used to develop methods that accurately identify ion-pumping haloarchaeal and other Type-I microbial rhodopsins. We achieved overall maximum accuracies of 97.78%, 97.84% and 97.60% for the amino acid composition, dipeptide composition and hybrid approaches, respectively, on tenfold cross-validation using SVM. Predictive models for each class of rhodopsin performed equally well on an independent data set. Similar results were achieved using another machine learning technique, random forest, and the predictive models performed equally well during five-fold cross-validation. Apart from this study, we also tested the own, blank, BLAST dataset and annotated whole-genome rhodopsin sequences of PWS haloarchaeal isolates with the developed methods. The developed web server (https://bioinfo.imtech.res.in/servers/rhodopred) can identify ion-pumping haloarchaeal rhodopsin proteins and their subtypes. We expect this web tool to be useful for rhodopsin researchers. CONCLUSION The overall performance of the developed method shows that it accurately identifies ion-pumping haloarchaeal rhodopsins and their subtypes using known and unknown microbial rhodopsin sequences. We expect that this study will be useful for optogenetics researchers, molecular biologists and rhodopsin researchers.
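The three feature sets named in the abstract (amino acid composition, dipeptide composition and their hybrid) can be illustrated with a minimal sketch. The function names, the handling of non-standard residues and the simple concatenation used for the hybrid vector are assumptions made for illustration; the abstract does not say how the authors computed or combined these features.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue (others are ignored)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent standard residues."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: a pair is counted only when both residues are standard.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def hybrid_features(seq):
    """Assumed hybrid: the 20 + 400 values concatenated into one vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = hybrid_features("MKKL")  # toy sequence, not a real rhodopsin
```

Vectors of this kind would then be passed to a classifier such as an SVM or a random forest; that step is omitted here because it depends on a specific library.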
Affiliation(s)
- Muthu Krishnan Selvaraj
  - MTCC-Microbial Type Culture Collection and Gene Bank, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
- Anamika Thakur
  - Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
- Manoj Kumar
  - Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
- Anil Kumar Pinnaka
  - MTCC-Microbial Type Culture Collection and Gene Bank, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
- Chander Raman Suri
  - Biosensor Department, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
- Busi Siddhardha
  - Department of Microbiology, School of Life Sciences, Pondicherry University, Puducherry, 605014 India
- Senthil Prasad Elumalai
  - Biochemical Engineering Research and Process Development Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR-IMTECH), Chandigarh, 160036 India
|
24
|
He Y, Yang Y, Su X, Zhao B, Xiong S, Hu L. Incorporating higher order network structures to improve miRNA-disease association prediction based on functional modularity. Brief Bioinform 2023; 24:6958503. [PMID: 36562706 DOI: 10.1093/bib/bbac562] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/02/2022] [Revised: 10/29/2022] [Accepted: 11/19/2022] [Indexed: 12/24/2022] Open
Abstract
As microRNAs (miRNAs) are involved in many essential biological processes, their abnormal expressions can serve as biomarkers and prognostic indicators to prevent the development of complex diseases, thus providing accurate early detection and prognostic evaluation. Although a number of computational methods have been proposed to predict miRNA-disease associations (MDAs) for further experimental verification, their performance is limited primarily by the inadequacy of exploiting lower order patterns characterizing known MDAs to identify missing ones from MDA networks. Hence, in this work, we present a novel prediction model, namely HiSCMDA, by incorporating higher order network structures for improved performance of MDA prediction. To this end, HiSCMDA first integrates miRNA similarity network, disease similarity network and MDA network to preserve the advantages of all these networks. After that, it identifies overlapping functional modules from the integrated network by predefining several higher order connectivity patterns of interest. Last, a path-based scoring function is designed to infer potential MDAs based on network paths across related functional modules. HiSCMDA yields the best performance across all datasets and evaluation metrics in the cross-validation and independent validation experiments. Furthermore, in the case studies, 49 and 50 out of the top 50 miRNAs, respectively, predicted for colon neoplasms and lung neoplasms have been validated by well-established databases. Experimental results show that rich higher order organizational structures exposed in the MDA network gain new insight into the MDA prediction based on higher order connectivity patterns.
Affiliation(s)
- Yizhou He
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Yue Yang
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Xiaorui Su
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Bowei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Shengwu Xiong
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
|
25
|
Zhao J, Sun J, Shuai SC, Zhao Q, Shuai J. Predicting potential interactions between lncRNAs and proteins via combined graph auto-encoder methods. Brief Bioinform 2023; 24:6896030. [PMID: 36515153 DOI: 10.1093/bib/bbac527] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/02/2022] [Revised: 10/23/2022] [Accepted: 11/06/2022] [Indexed: 12/15/2022] Open
Abstract
Long noncoding RNA (lncRNA) is a kind of noncoding RNA longer than 200 nucleotides. Numerous studies have shown that, although lncRNAs cannot be directly translated into proteins, they still play an important role in human growth processes by interacting with proteins. Since traditional biological experiments require considerable time and material cost to explore potential lncRNA-protein interactions (LPIs), several computational models have been proposed for this task. In this study, we introduce a novel deep learning method known as combined graph auto-encoders (LPICGAE) to predict potential human LPIs. First, we apply a variational graph auto-encoder to learn low-dimensional representations from the high-dimensional features of lncRNAs and proteins. Then the graph auto-encoder is used to reconstruct the adjacency matrix for inferring potential interactions between lncRNAs and proteins. Finally, we minimize the losses of the two processes alternately to obtain the final predicted interaction matrix. The results of 5-fold cross-validation experiments show that our method achieves an average area under the receiver operating characteristic curve of 0.974 and an average accuracy of 0.985, better than six existing state-of-the-art computational methods. We believe that LPICGAE can help researchers discover more potential relationships between lncRNAs and proteins effectively.
Affiliation(s)
- Jingxuan Zhao
  - University of Science and Technology Liaoning, 66459, Anshan, China
- Stella C Shuai
  - Northwestern University, 3270, Evanston, Illinois, United States
- Qi Zhao
  - University of Science and Technology Liaoning, 66459, Anshan, China
- Jianwei Shuai
  - Department of Physics, Xiamen University, Xiamen, China
|
26
|
Peng W, Wu R, Dai W, Yu N. Identifying cancer driver genes based on multi-view heterogeneous graph convolutional network and self-attention mechanism. BMC Bioinformatics 2023; 24:16. [PMID: 36639646 PMCID: PMC9838012 DOI: 10.1186/s12859-023-05140-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 01/06/2023] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Correctly identifying the driver genes that promote cell growth can significantly assist drug design, cancer diagnosis and treatment. Recent large-scale cancer genomics projects have produced multi-omics data from thousands of cancer patients, which calls for effective models that unlock the hidden knowledge within these valuable data and discover cancer drivers contributing to tumorigenesis. RESULTS In this work, we propose a graph convolution network-based method called MRNGCN that integrates multiple gene relationship networks to identify cancer driver genes. First, we constructed three gene relationship networks, including the gene-gene, gene-outlying gene and gene-miRNA networks. Then, gene feature representations were learnt from the three networks through three parameter-sharing heterogeneous graph convolution network (HGCN) models with the self-attention mechanism. After that, these gene features pass through a convolution layer to generate fused features. Finally, we utilized the fused features and the original features to optimize the model by minimizing the node and link prediction losses. Meanwhile, we combined the fused features, the original features and the three features learned from each network through a logistic regression model to predict cancer driver genes. CONCLUSIONS We applied MRNGCN to predict pan-cancer and cancer type-specific driver genes. Experimental results show that our model performs well in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC) compared to state-of-the-art methods. Ablation experimental results show that our model successfully improved cancer driver identification by integrating multiple gene relationship networks.
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050 China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Rong Wu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050 China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050 China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Ning Yu
  - Department of Computing Sciences, The College at Brockport, State University of New York, 350 New Campus Drive, Brockport, NY 14422 USA
|
27
|
He K, Mao R, Gong T, Cambria E, Li C. JCBIE: a joint continual learning neural network for biomedical information extraction. BMC Bioinformatics 2022; 23:549. [PMID: 36536280 PMCID: PMC9761970 DOI: 10.1186/s12859-022-05096-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022] Open
Abstract
Extracting knowledge from heterogeneous data sources is fundamental for the construction of structured biomedical knowledge graphs (BKGs), where entities and relations are represented as nodes and edges in the graphs, respectively. Previous biomedical knowledge extraction methods considered only limited entity types and relations by using a task-specific training set, which is insufficient for large-scale BKG development and downstream applications in different scenarios. To alleviate this issue, we propose a joint continual learning biomedical information extraction (JCBIE) network to extract entities and relations from different biomedical information datasets. By empirically studying different joint learning and continual learning strategies, the proposed JCBIE can learn and expand different types of entities and relations from different datasets. JCBIE uses two separate encoders in joint-feature extraction and hence effectively avoids the feature confusion problem, compared with using one hard-parameter-sharing encoder. Specifically, it allows us to adopt entity-augmented inputs to establish the interaction between named entity recognition and relation extraction. Finally, a novel evaluation mechanism is proposed for measuring cross-corpus generalization errors, which were ignored by traditional evaluation methods. Our empirical studies show that JCBIE achieves promising performance when a continual learning strategy is adopted with multiple corpora.
Affiliation(s)
- Kai He
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi China
  - Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi China
  - National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, Xi’an, Shaanxi China
- Rui Mao
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Tieliang Gong
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi China
  - Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi China
  - National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, Xi’an, Shaanxi China
- Erik Cambria
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Chen Li
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi China
  - Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi China
  - National Engineering Lab for Big Data Analytics, Xi’an Jiaotong University, Xi’an, Shaanxi China
|
28
|
Hypergraph geometry reflects higher-order dynamics in protein interaction networks. Sci Rep 2022; 12:20879. [PMID: 36463292 PMCID: PMC9719542 DOI: 10.1038/s41598-022-24584-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/18/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022] Open
Abstract
Protein interactions form a complex dynamic molecular system that shapes cell phenotype and function; in this regard, network analysis is a powerful tool for studying the dynamics of cellular processes. Current models of protein interaction networks are limited in that the standard graph model can only represent pairwise relationships. Higher-order interactions are well-characterized in biology, including protein complex formation and feedback or feedforward loops. These higher-order relationships are better represented by a hypergraph as a generalized network model. Here, we present an approach to analyzing dynamic gene expression data using a hypergraph model and quantify network heterogeneity via Forman-Ricci curvature. We observe, on a global level, increased network curvature in pluripotent stem cells and cancer cells. Further, we use local curvature to conduct pathway analysis in a melanoma dataset, finding increased curvature in several oncogenic pathways and decreased curvature in tumor suppressor pathways. We compare this approach to a graph-based model and a differential gene expression approach.
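For readers unfamiliar with Forman-Ricci curvature, the sketch below shows the graph-based analogue only. It assumes the commonly used form for an unweighted, undirected simple graph, F(u, v) = 4 - deg(u) - deg(v); the hypergraph definition used in the paper is not given in the abstract and is not reproduced here. The node names are placeholders, not real proteins.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Map each undirected edge (u, v) to 4 - deg(u) - deg(v).

    Assumes an unweighted simple graph with no self-loops; duplicate
    edges (in either orientation) are collapsed before degrees are counted.
    """
    unique = {tuple(sorted(edge)) for edge in edges}
    degree = defaultdict(int)
    for u, v in unique:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in unique}


# Toy star graph: one hub connected to five leaves.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d", "e")]
curvature = forman_curvature(star)
```

In this form, edges attached to high-degree nodes receive strongly negative values, which is why the measure is used as an indicator of network heterogeneity.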
|
29
|
Zhang ML, Zhao BW, Su XR, He YZ, Yang Y, Hu L. RLFDDA: a meta-path based graph representation learning model for drug-disease association prediction. BMC Bioinformatics 2022; 23:516. [PMID: 36456957 PMCID: PMC9713188 DOI: 10.1186/s12859-022-05069-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Accepted: 11/21/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Drug repositioning is a very important task that provides critical information for exploring the potential efficacy of drugs. Yet developing computational models that can effectively predict drug-disease associations (DDAs) is still a challenging task. Previous studies suggest that the accuracy of DDA prediction can be improved by integrating different types of biological features. But how to conduct an effective integration remains a challenging problem for accurately discovering new indications for approved drugs. METHODS In this paper, we propose a novel meta-path based graph representation learning model, namely RLFDDA, to predict potential DDAs on heterogeneous biological networks. RLFDDA first calculates drug-drug similarities and disease-disease similarities as the intrinsic biological features of drugs and diseases. A heterogeneous network is then constructed by integrating DDAs, disease-protein associations and drug-protein associations. With such a network, RLFDDA adopts a meta-path random walk model to learn the latent representations of drugs and diseases, which are concatenated to construct joint representations of drug-disease associations. As the last step, we employ the random forest classifier to predict potential DDAs with their joint representations. RESULTS To demonstrate the effectiveness of RLFDDA, we have conducted a series of experiments on two benchmark datasets by following a ten-fold cross-validation scheme. The results show that RLFDDA yields the best performance in terms of AUC and F1-score when compared with several state-of-the-art DDA prediction models. We have also conducted a case study on a common drug and a common disease, i.e., paclitaxel and lung tumors, and found that 7 out of top-10 diseases and 8 out of top-10 drugs have already been validated for paclitaxel and lung tumors respectively with literature evidence. Hence, the promising performance of RLFDDA may provide a new perspective for novel DDA discovery over heterogeneous networks.
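The meta-path random walk mentioned in the METHODS part can be sketched as follows. This is an illustration under stated assumptions, not the RLFDDA implementation: the function name, the cyclic drug-protein-disease-protein meta-path and the toy network (`drugA`, `drugB`, `protX`, `diseaseM`) are all hypothetical, and the later step of learning embeddings from the walks is left out.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, length, rng):
    """Random walk that only steps to neighbors of the next type in `metapath`.

    `metapath` is treated as cyclic, so after its last type the walk
    returns to its first type. The walk stops early at a dead end.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the first meta-path type")
    walk = [start]
    step = 0
    while len(walk) < length:
        step = (step + 1) % len(metapath)
        candidates = [n for n in neighbors.get(walk[-1], ())
                      if node_type[n] == metapath[step]]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical heterogeneous network: two drugs and one disease share a protein.
node_type = {"drugA": "drug", "drugB": "drug",
             "protX": "protein", "diseaseM": "disease"}
neighbors = {
    "drugA": ["protX"],
    "drugB": ["protX"],
    "protX": ["drugA", "drugB", "diseaseM"],
    "diseaseM": ["protX"],
}
walk = metapath_walk(neighbors, node_type, "drugA",
                     ["drug", "protein", "disease", "protein"],
                     length=5, rng=random.Random(0))
```

Walks generated this way are typically fed to a skip-gram style model to obtain node embeddings; whether RLFDDA does exactly that is not stated in the abstract.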
Affiliation(s)
- Meng-Long Zhang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
- Yi-Zhou He
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Yue Yang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China
|
30
|
Lin S, Chen W, Chen G, Zhou S, Wei DQ, Xiong Y. MDDI-SCL: predicting multi-type drug-drug interactions via supervised contrastive learning. J Cheminform 2022; 14:81. [DOI: 10.1186/s13321-022-00659-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 11/05/2022] [Indexed: 11/16/2022] Open
Abstract
The joint use of multiple drugs may cause unintended drug-drug interactions (DDIs) and result in adverse consequences for patients. Accurate identification of DDI types can not only provide hints for avoiding these accidental events, but also elucidate the underlying mechanisms by which DDIs occur. Several computational methods have been proposed for multi-type DDI prediction, but room remains for improvement in prediction performance. In this study, we propose a supervised contrastive learning based method, MDDI-SCL, implemented with three-level loss functions, to predict multi-type DDIs. MDDI-SCL is mainly composed of three modules: a drug feature encoder and mean squared error loss module, a drug latent feature fusion and supervised contrastive loss module, and a multi-type DDI prediction and classification loss module. The drug feature encoder and mean squared error loss module uses a self-attention mechanism and an autoencoder to learn drug-level latent features. The drug latent feature fusion and supervised contrastive loss module uses multi-scale feature fusion to learn drug pair-level latent features. The prediction and classification loss module predicts the DDI types of each drug pair. We evaluate MDDI-SCL on three different tasks of two datasets. Experimental results demonstrate that MDDI-SCL achieves performance better than or comparable to the state-of-the-art methods. Furthermore, the effectiveness of supervised contrastive learning is validated by an ablation experiment, and the feasibility of MDDI-SCL is supported by case studies. The source code is available at https://github.com/ShenggengLin/MDDI-SCL.
|
31
|
Hu L, Li Z, Tang Z, Zhao C, Zhou X, Hu P. Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach. BMC Bioinformatics 2022; 23:447. [PMID: 36303135 PMCID: PMC9608884 DOI: 10.1186/s12859-022-04999-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 10/13/2022] [Indexed: 11/10/2022] Open
Abstract
Background The site information of substrates that can be cleaved by human immunodeficiency virus 1 proteases (HIV-1 PRs) is of great significance for designing effective inhibitors against HIV-1 viruses. A variety of machine learning-based algorithms have been developed to predict HIV-1 PR cleavage sites by extracting relevant features from substrate sequences. However, relying only on sequence information is not sufficient to ensure promising performance, owing to the uncertainty in the way the datasets used for training and testing are separated. Moreover, the existence of noisy data, i.e., false positive and false negative cleavage sites, could negatively influence prediction accuracy. Results In this work, an ensemble learning algorithm for predicting HIV-1 PR cleavage sites, namely EM-HIV, is proposed by training a set of weak learners, i.e., biased support vector machine classifiers, with the asymmetric bagging strategy. By doing so, the impact of data imbalance and noisy data can be alleviated. Besides, in order to make full use of substrate sequences, the features used by EM-HIV are collected from three different coding schemes, including amino acid identities, chemical properties and variable-length coevolutionary patterns, for the purpose of constructing more relevant feature vectors of octamers. Experimental results on three independent benchmark datasets demonstrate that EM-HIV outperforms state-of-the-art prediction algorithms in terms of several evaluation metrics. Hence, EM-HIV can be regarded as a useful tool to accurately predict HIV-1 PR cleavage sites.
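The asymmetric bagging strategy named in the abstract can be sketched as below. The sketch follows the usual meaning of the term, in which every bag keeps all minority-class (positive) examples and bootstraps an equally sized sample from the majority (negative) class; whether EM-HIV uses exactly these bag sizes is an assumption, and the function name and placeholder examples are hypothetical.

```python
import random


def asymmetric_bags(positives, negatives, n_bags, rng):
    """Build training bags for an ensemble on imbalanced data.

    Each bag contains every positive example plus a bootstrap sample
    (drawn with replacement) of the negatives, of the same size as the
    positive set. One weak learner would be trained per bag.
    """
    bags = []
    for _ in range(n_bags):
        sampled_negatives = [rng.choice(negatives) for _ in positives]
        bags.append((list(positives), sampled_negatives))
    return bags


# Placeholder identifiers standing in for cleaved / non-cleaved octamers.
positives = ["pos%d" % i for i in range(3)]
negatives = ["neg%d" % i for i in range(20)]
bags = asymmetric_bags(positives, negatives, n_bags=5, rng=random.Random(0))
```

Because each weak learner sees a balanced bag, the ensemble is less dominated by the majority class, and averaging over bags dampens the effect of individual mislabeled negatives.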
Affiliation(s)
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zhenfeng Li
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
- Zehai Tang
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
- Cheng Zhao
  - School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
- Xi Zhou
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Pengwei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
|
32
|
Kurata H, Tsukiyama S. ICAN: Interpretable cross-attention network for identifying drug and target protein interactions. PLoS One 2022; 17:e0276609. [PMID: 36279284 PMCID: PMC9591068 DOI: 10.1371/journal.pone.0276609] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/05/2022] [Accepted: 10/10/2022] [Indexed: 11/18/2022] Open
Abstract
Drug-target protein interaction (DTI) identification is fundamental for drug discovery and drug repositioning, because therapeutic drugs act on disease-causing proteins. However, the DTI identification process often requires expensive and time-consuming tasks, including biological experiments involving large numbers of candidate compounds. Thus, a variety of computational approaches have been developed. Of the many approaches available, chemo-genomics feature-based methods have attracted considerable attention. These methods compute the feature descriptors of drugs and proteins as the input data to train machine and deep learning models to enable accurate prediction of unknown DTIs. In addition, attention-based learning methods have been proposed to identify and interpret DTI mechanisms. However, improvements are needed in both prediction performance and DTI mechanism elucidation. To address these problems, we developed an attention-based method designated the interpretable cross-attention network (ICAN), which predicts DTIs using the Simplified Molecular Input Line Entry System (SMILES) representations of drugs and the amino acid sequences of target proteins. We optimized the attention mechanism architecture by exploring the choice of cross-attention or self-attention, the attention layer depth, and the selection of the context matrices from the attention mechanism. We found that a plain attention mechanism that decodes drug-related protein context features without any protein-related drug context features effectively achieved high performance. ICAN outperformed state-of-the-art methods in several metrics on the DAVIS dataset and was the first to reveal with statistical significance that some weighted sites in the cross-attention weight matrix represent experimental binding sites, thus demonstrating the high interpretability of the results. The program is freely available at https://github.com/kuratahiroyuki/ICAN.
Affiliation(s)
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Sho Tsukiyama
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
|
33
|
Learning to rank complex network node based on the self-supervised graph convolution model. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
34
|
Pan X, Hu L, Hu P, You ZH. Identifying Protein Complexes From Protein-Protein Interaction Networks Based on Fuzzy Clustering and GO Semantic Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2882-2893. [PMID: 34242171 DOI: 10.1109/tcbb.2021.3095947] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
Protein complexes provide valuable insights into the mechanisms of the biological processes in which proteins take part. A variety of computational algorithms have thus been proposed to identify protein complexes in a protein-protein interaction network. However, few of them take into account both network topology and protein attribute information in a unified fuzzy-based clustering framework. Since proteins in the same complex are similar in terms of their attribute information, and fuzzy clustering also makes it possible to identify overlapping complexes, we propose such a novel fuzzy-based clustering framework, namely FCAN-PCI, for improved identification accuracy. To do so, the semantic similarity between the attribute information of proteins is calculated and then integrated into a well-established fuzzy clustering model together with the network topology. After that, a momentum method is adopted to accelerate the clustering procedure. FCAN-PCI finally applies a heuristic search strategy to identify overlapping protein complexes. A series of extensive experiments have been conducted to evaluate the performance of FCAN-PCI by comparing it with state-of-the-art identification algorithms, and the results demonstrate the promising performance of FCAN-PCI.
|
35
|
Dai C, Wang K. Adaptive Weighted Neighbors Method for Sensitivity Analysis. Interdiscip Sci 2022; 14:652-668. [PMID: 35426544 DOI: 10.1007/s12539-022-00512-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 03/16/2022] [Accepted: 03/17/2022] [Indexed: 06/14/2023]
Abstract
Identifying key factors from observational data is important for understanding complex phenomena in many disciplines, including biomedical sciences and biology. However, there are still some limitations in practical applications, such as severely nonlinear input-output relationships and highly skewed output distributions. To acquire more reliable sensitivity analysis (SA) results in these extreme cases, inspired by the weighted k-nearest neighbors algorithm, we propose a new method called adaptive weighted neighbors (AWN). AWN makes full use of the information contained in all training samples instead of limited samples and automatically gives more weight to nearby samples. Then, the bootstrap technique and Jansen's method are used to obtain reliable SA results based on AWN. We demonstrate the performance and accuracy of AWN by analyzing various biological and biomedical data sets, three simulated examples and two case studies, showing that it can effectively overcome the above limitations. We therefore expect it to be a complementary approach for SA.
Affiliation(s)
- Chenxi Dai
  - School of Biomedical Engineering and Imaging Medicine, Army Medical University, Chongqing, 400038, China
- Kaifa Wang
  - School of Mathematics and Statistics, Southwest University, Chongqing, 400715, China
|
36
|
Su XR, Hu L, You ZH, Hu PW, Zhao BW. Multi-view heterogeneous molecular network representation learning for protein-protein interaction prediction. BMC Bioinformatics 2022; 23:234. [PMID: 35710342 PMCID: PMC9205098 DOI: 10.1186/s12859-022-04766-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/10/2022] [Accepted: 05/27/2022] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Protein-protein interaction (PPI) plays an important role in regulating cells and signals. Despite the ongoing efforts of the bioassay group, persistently incomplete data limit our ability to understand the molecular roots of human disease. Therefore, it is urgent to develop a computational method to predict PPIs from the perspective of the molecular system. METHODS In this paper, a highly efficient computational model, MTV-PPI, is proposed for PPI prediction based on a heterogeneous molecular network by learning inter-view protein sequences and intra-view interactions between molecules simultaneously. On the one hand, the inter-view feature is extracted from the protein sequence by the k-mer method. On the other hand, we use a popular embedding method, LINE, to encode the heterogeneous molecular network to obtain the intra-view feature. Thus, the protein representation used in MTV-PPI is constructed by aggregating its inter-view and intra-view features. Finally, a random forest is used to predict potential PPIs. RESULTS To prove the effectiveness of MTV-PPI, we conduct extensive experiments on a collected heterogeneous molecular network, obtaining an accuracy of 86.55%, sensitivity of 82.49%, precision of 89.79%, AUC of 0.9301 and AUPR of 0.9308. Further comparison experiments are performed with various protein representations and classifiers to indicate the effectiveness of MTV-PPI in predicting PPIs based on a complex network. CONCLUSION The experimental results illustrate that MTV-PPI is a promising tool for PPI prediction, which may provide a new perspective for future interaction prediction research based on heterogeneous molecular networks.
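The k-mer step mentioned in this abstract can be sketched as a normalized count vector over all length-k substrings of a protein sequence. This is only an illustration of the general technique: the 20-letter alphabet, the default `k=2`, the normalization, and the function name are assumptions, not details taken from MTV-PPI.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return the relative frequency of every possible k-mer in `sequence`.

    The vector has len(alphabet) ** k entries, ordered as itertools.product
    enumerates them. k-mers containing letters outside `alphabet` are ignored.
    """
    sequence = sequence.upper()
    # Slide a window of width k over the sequence and count each substring.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        # Sequence shorter than k, or no valid k-mers: return an all-zero vector.
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]
```

A vector of this kind has a fixed length regardless of sequence length, which is what makes it usable as input to a classifier such as a random forest.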
Affiliation(s)
- Xiao-Rui Su
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011 China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011 China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Peng-Wei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011 China
- Bo-Wei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011 China
37
Multi-feature Fusion Method Based on Linear Neighborhood Propagation Predict Plant LncRNA-Protein Interactions. Interdiscip Sci 2022; 14:545-554. [PMID: 35040094 DOI: 10.1007/s12539-022-00501-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/29/2021] [Revised: 12/28/2021] [Accepted: 01/04/2022] [Indexed: 12/31/2022]
Abstract
Long non-coding RNAs (lncRNAs) have attracted extensive attention due to their important roles in various biological processes, among which lncRNA-protein interaction plays an important regulatory role in plant immunity and life activities. Laboratory methods are time-consuming and labor-intensive, so many computational methods have gradually emerged as auxiliary tools to assist relevant research. However, there are relatively few methods to predict lncRNA-protein interactions in plants. Due to the lack of experimentally verified interaction data, there is an imbalance between known and unknown interaction samples in plant data sets. In this study, a multi-feature fusion method based on linear neighborhood propagation, called MPLPLNP, is developed to predict unobserved plant lncRNA-protein interaction pairs from known interaction pairs. The linear neighborhood similarity of the feature space is calculated and the results are predicted by label propagation. Meanwhile, multiple feature training is integrated to better explore the potential interaction information in the data. The experimental results show that the proposed multi-feature fusion method can improve the performance of the model and is superior to other state-of-the-art approaches. Moreover, the proposed approach has better performance and generalization ability on various plant datasets, which is expected to facilitate related research in plant molecular biology.
38
Deng P, Zhang F, Li T, Wang H, Horng SJ. Biased unconstrained non-negative matrix factorization for clustering. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.108040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
39
Wang CC, Li TH, Huang L, Chen X. Prediction of potential miRNA-disease associations based on stacked autoencoder. Brief Bioinform 2022; 23:6529883. [PMID: 35176761 DOI: 10.1093/bib/bbac021] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 11/17/2021] [Revised: 01/05/2022] [Accepted: 01/14/2022] [Indexed: 12/11/2022] Open
Abstract
In recent years, a growing number of biological experiments and scientific studies have demonstrated that microRNA (miRNA) plays an important role in the development of complex human diseases. Therefore, discovering miRNA-disease associations can contribute to accurate diagnosis and effective treatment of diseases. Identifying miRNA-disease associations through computational methods based on biological data has been proven to be low-cost and highly efficient. In this study, we proposed a computational model named Stacked Autoencoder for potential MiRNA-Disease Association prediction (SAEMDA). In SAEMDA, all the miRNA-disease samples were used to pretrain a Stacked Autoencoder (SAE) in an unsupervised manner. Then, the positive samples and the same number of selected negative samples were utilized to fine-tune the SAE in a supervised manner after adding an output layer with a softmax classifier to the SAE. SAEMDA can make full use of the feature information of all unlabeled miRNA-disease pairs. Therefore, SAEMDA is suitable for our dataset, which contains few labeled samples and many unlabeled samples. As a result, SAEMDA achieved AUCs of 0.9210 and 0.8343 in global and local leave-one-out cross validation. Besides, SAEMDA obtained an average AUC and standard deviation of 0.9102 ± 0.0029 over 100 runs of 5-fold cross validation. These results were better than those of previous models. Moreover, we carried out three case studies to further demonstrate the predictive accuracy of SAEMDA. As a result, 82% (breast neoplasms), 100% (lung neoplasms) and 90% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by databases. Thus, SAEMDA could be a useful and reliable model to predict potential miRNA-disease associations.
Affiliation(s)
- Chun-Chun Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
  - Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
- Tian-Hao Li
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Li Huang
  - Academy of Arts and Design, Tsinghua University, Beijing, 10084, China
  - The Future Laboratory, Tsinghua University, Beijing, 10084, China
- Xing Chen
  - Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
40
Li G, Zhang P, Sun W, Ren C, Wang L. Bridging-BPs: a novel approach to predict potential drug-target interactions based on a bridging heterogeneous graph and BPs2vec. Brief Bioinform 2022; 23:6509044. [PMID: 35037024 DOI: 10.1093/bib/bbab557] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2021] [Revised: 11/04/2021] [Accepted: 12/05/2021] [Indexed: 11/12/2022] Open
Abstract
Predicting drug-target interactions (DTIs) is a convenient strategy for drug discovery. Although various computational methods have been put forward in recent years, DTIs prediction is still a challenging task. In this paper, based on indirect prior information (we term them as mediators), we proposed a new model, called Bridging-BPs (bridging paths), for DTIs prediction. Specifically, we regarded linkage process between mediators and DTs (drugs and proteins) as 'bridging' and source (drug)-mediators-destination (protein) as bridging paths. By integrating various bridging paths, we constructed a bridging heterogeneous graph for DTIs. After that, an improved graph-embedding algorithm-BPs2vec-was designed to capture deep topological features underlying the bridging graph, thereby obtaining the low-dimensional node vector representations. Then, the vector representations were fed into a Random Forest classifier to train and score the probability, outputting the final classification results for potential DTIs. Under 5-fold cross validation, our method obtained AUPR of 88.97% and AUC of 88.63%, suggesting that Bridging-BPs could effectively mine the link relationships hidden in indirect prior information and it significantly improved the accuracy and robustness of DTIs prediction without direct prior information. Finally, we confirmed the practical prediction ability of Bridging-BPs by case studies.
Affiliation(s)
- Guodong Li
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Ping Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Chengjuan Ren
  - School of Computer Software Convergence Engineering, Kunsan National University, Kunsan, 54150, Korea
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China
41
Hu L, Yang S, Luo X, Yuan H, Sedraoui K, Zhou M. A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce. IEEE/CAA Journal of Automatica Sinica 2022; 9:160-172. [DOI: 10.1109/jas.2021.1004198] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 01/03/2025]
42
Su X, Hu L, You Z, Hu P, Wang L, Zhao B. A deep learning method for repurposing antiviral drugs against new viruses via multi-view nonnegative matrix factorization and its application to SARS-CoV-2. Brief Bioinform 2021; 23:6489102. [PMID: 34965582 DOI: 10.1093/bib/bbab526] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 09/16/2021] [Revised: 10/20/2021] [Accepted: 11/14/2021] [Indexed: 12/15/2022] Open
Abstract
The outbreak of COVID-19 caused by SARS-coronavirus (CoV)-2 has caused millions of deaths since 2019. Although a variety of computational methods have been proposed to repurpose drugs for treating SARS-CoV-2 infections, this is still a challenging task for new viruses, as there are no verified virus-drug associations (VDAs) between them and existing drugs. To efficiently solve the cold-start problem posed by new viruses, a novel constrained multi-view nonnegative matrix factorization (CMNMF) model is designed by jointly utilizing multiple sources of biological information. With the CMNMF model, the similarities of drugs and viruses can be preserved from their own perspectives when they are projected onto a unified latent feature space. Based on the CMNMF model, we propose a deep learning method, namely VDA-DLCMNMF, for repurposing drugs against new viruses. VDA-DLCMNMF first initializes the node representations of drugs and viruses with their corresponding latent feature vectors to avoid a random initialization and then applies a graph convolutional network to optimize their representations. Given an arbitrary drug, its probability of being associated with a new virus is computed according to their representations. To evaluate the performance of VDA-DLCMNMF, we have conducted a series of experiments on three VDA datasets created for SARS-CoV-2. Experimental results demonstrate the promising prediction accuracy of VDA-DLCMNMF. Moreover, incorporating the CMNMF model into deep learning yields new insight into drug repurposing for SARS-CoV-2, as the results of molecular docking experiments reveal that four antiviral drugs identified by VDA-DLCMNMF have the potential ability to treat SARS-CoV-2 infections.
Affiliation(s)
- Xiaorui Su
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011, China
- Zhuhong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China
- Pengwei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011, China
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China
- Bowei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, 830011, China
43
Li H, Qu J, Jiang X, Niu Y. A Correlation Analysis of Geomagnetic Field Characteristics in Geomagnetic Perceiving Navigation. Front Neurorobot 2021; 15:785563. [PMID: 35002669 PMCID: PMC8733244 DOI: 10.3389/fnbot.2021.785563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Accepted: 11/15/2021] [Indexed: 11/29/2022] Open
Abstract
It is well-known that geomagnetic fields have multiple components or parameters, and that these geomagnetic parameters are related to each other. In this paper, a parameter selection method is proposed, and this paper mainly discusses the correlation of geomagnetic field parameters for geomagnetic navigation technology. For the correlation analysis between geomagnetic parameters, the similarity calculation of the correlation coefficient is firstly introduced for geomagnetic navigation technology, and the grouped results are obtained by data analysis. At the same time, the search algorithm (Hex-path algorithm) is used to verify the correlation analysis results. The results show the same convergent state for the approximate correlation coefficient. In other words, the simulation results are in agreement with the similarity calculation results.
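The correlation-based grouping described in this abstract can be illustrated with a Pearson correlation coefficient computed for each pair of parameter series. This is a hedged sketch rather than the paper's procedure: the use of Pearson's coefficient, the `threshold` value, and both function names are assumptions, and the Hex-path search used for verification is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(series, threshold=0.9):
    """List (name_a, name_b, r) for every pair with |r| >= threshold.

    `series` maps a parameter name to its list of measurements.
    Assumption: the threshold is an illustrative cut-off, not a published value.
    """
    names = sorted(series)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```

Parameters that appear together in many strongly correlated pairs carry largely redundant information, so a selection step would typically keep one representative from each such group.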
Affiliation(s)
- Hong Li
  - School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
  - Xi'an Key Laboratory of Advanced Control and Intelligent Process, School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
- Junsuo Qu
  - School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
  - Xi'an Key Laboratory of Advanced Control and Intelligent Process, School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
- Xiangkui Jiang
  - School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
- Yun Niu
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China
44
Zhao BW, Hu L, You ZH, Wang L, Su XR. HINGRL: predicting drug-disease associations with graph representation learning on heterogeneous information networks. Brief Bioinform 2021; 23:6456295. [PMID: 34891172 DOI: 10.1093/bib/bbab515] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 09/09/2021] [Revised: 11/08/2021] [Accepted: 11/09/2021] [Indexed: 12/20/2022] Open
Abstract
Identifying new indications for drugs plays an essential role at many phases of drug research and development. Computational methods are regarded as an effective way to associate drugs with new indications. However, most of them complete their tasks by constructing a variety of heterogeneous networks without considering the biological knowledge of drugs and diseases, which are believed to be useful for improving the accuracy of drug repositioning. To this end, a novel heterogeneous information network (HIN) based model, namely HINGRL, is proposed to precisely identify new indications for drugs based on graph representation learning techniques. More specifically, HINGRL first constructs a HIN by integrating drug-disease, drug-protein and protein-disease biological networks with the biological knowledge of drugs and diseases. Then, different representation strategies are applied to learn the features of nodes in the HIN from the topological and biological perspectives. Finally, HINGRL adopts a Random Forest classifier to predict unknown drug-disease associations based on the integrated features of drugs and diseases obtained in the previous step. Experimental results demonstrate that HINGRL achieves the best performance on two real datasets when compared with state-of-the-art models. Besides, our case studies indicate that the simultaneous consideration of network topology and biological knowledge of drugs and diseases allows HINGRL to precisely predict drug-disease associations from a more comprehensive perspective. The promising performance of HINGRL also reveals that the utilization of rich heterogeneous information provides an alternative view for HINGRL to identify novel drug-disease associations especially for new diseases.
Affiliation(s)
- Bo-Wei Zhao
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Lun Hu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning 530007, China
- Xiao-Rui Su
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
45
Hu L, Zhao BW, Yang S, Luo X, Zhou M. Predicting Large-scale Protein-protein Interactions by Extracting Coevolutionary Patterns with MapReduce Paradigm. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2021:939-944. [DOI: 10.1109/smc52423.2021.9658839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
46
Hu L, Wang X, Huang YA, Hu P, You ZH. A Novel Network-Based Algorithm for Predicting Protein-Protein Interactions Using Gene Ontology. Front Microbiol 2021; 12:735329. [PMID: 34512614 PMCID: PMC8425590 DOI: 10.3389/fmicb.2021.735329] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/02/2021] [Accepted: 08/02/2021] [Indexed: 11/24/2022] Open
Abstract
Proteins are among the most significant components of living organisms, and their main role in cells is to carry out various physiological functions by interacting with each other. Thus, the prediction of protein-protein interactions (PPIs) is crucial for understanding the molecular basis of biological processes, such as chronic infections. Given that laboratory-based experiments are normally time-consuming and labor-intensive, computational prediction algorithms have become popular. However, few of them simultaneously consider both the structural information of PPI networks and the biological information of proteins to improve accuracy. To do so, we assume that the prior information of functional modules is known in advance and then simulate the generative process of a PPI network associated with the biological information of proteins, i.e., Gene Ontology, by using an established Bayesian model. In order to indicate to what extent two proteins are likely to interact with each other, we propose a novel scoring function by combining the membership distributions of proteins with network paths. Experimental results show that our algorithm has a promising performance in terms of several independent metrics when compared with state-of-the-art prediction algorithms, and also reveal that the consideration of modularity in PPI networks provides an alternative, yet much more flexible, way to accurately predict PPIs.
Affiliation(s)
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Xiaojuan Wang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Yu-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Pengwei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
47
Automatic Detection of Melanins and Sebums from Skin Images Using a Generative Adversarial Network. Cognit Comput 2021. [DOI: 10.1007/s12559-021-09870-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
48
Li Z, Hu L, Tang Z, Zhao C. Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning. Front Genet 2021; 12:658078. [PMID: 33868387 PMCID: PMC8044780 DOI: 10.3389/fgene.2021.658078] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/25/2021] [Accepted: 03/08/2021] [Indexed: 11/13/2022] Open
Abstract
Understanding the substrate specificity of HIV-1 protease plays an essential role in the prevention of HIV infection. A variety of computational models have thus been developed to predict substrate sites that are cleaved by HIV-1 protease, but most of them follow a supervised learning scheme that builds classifiers by treating experimentally verified cleavable sites as positive samples and unknown sites as negative samples. However, the negative set may contain noise, as false negative samples may exist. Hence, the classifiers are not as accurate as they could be, owing to the biased prediction results. In this work, unknown substrate sites are regarded as unlabeled samples instead of negative ones. We propose a novel positive-unlabeled learning algorithm, namely PU-HIV, for an effective prediction of HIV-1 protease cleavage sites. Features used by PU-HIV are encoded from different perspectives of substrate sequences, including amino acid identities, coevolutionary patterns and chemical properties. By adjusting the weights of errors generated by positive and unlabeled samples, a biased support vector machine classifier can be built to complete the prediction task. In comparison with state-of-the-art prediction models, benchmarking experiments using cross-validation and independent tests demonstrated the superior performance of PU-HIV in terms of AUC, PR-AUC, and F-measure. Thus, with PU-HIV, it is possible to identify previously unknown but physiologically existing substrate sites that can be cleaved by HIV-1 protease, providing valuable insights into designing novel HIV-1 protease inhibitors for HIV treatment.
Affiliation(s)
- Zhenfeng Li
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zehai Tang
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Cheng Zhao
  - School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China