1
Ortiz-Vilchis P, De-la-Cruz-García JS, Ramirez-Arellano A. Identification of Relevant Protein Interactions with Partial Knowledge: A Complex Network and Deep Learning Approach. Biology 2023; 12:140. [PMID: 36671832; PMCID: PMC9856098; DOI: 10.3390/biology12010140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 01/18/2023]
Abstract
Protein-protein interactions (PPIs) are the basis for understanding most cellular events in biological systems. Several experimental methods, e.g., biochemical, molecular, and genetic methods, have been used to identify protein-protein associations. However, some of them, such as mass spectrometry, are time-consuming and expensive. Machine learning (ML) techniques have been widely used to characterize PPIs, increasing the number of proteins that can be analyzed simultaneously and saving time and resources when identifying and predicting functional linkages between proteins. Previous ML approaches have focused on well-known networks or specific targets, not on identifying relevant proteins when knowledge of the interaction network is partial or absent. The proposed approach generates a sequence of relevant proteins using a bidirectional Long Short-Term Memory (LSTM) network with only partial knowledge of the interactions. The general framework begins with scale-free and fractal complex network analyses, whose outcomes are then used to fine-tune the fractal method for extracting vital proteins from PPI networks. The results show that several PPI networks are self-similar or fractal, but that the two properties do not coexist. On average, the protein sequences generated by the bidirectional LSTM contain 39.5% of the proteins in the original sequence and are 17% as long as the original. Finally, 95% of the generated sequences were true.
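A scale-free analysis usually starts by asking whether the degree distribution follows a power law, P(k) ∝ k^(-α). As a hedged illustration only (the abstract does not say which estimator the authors used), the sketch below applies the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009), α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), to the n degrees k_i ≥ k_min. The function name `powerlaw_alpha_discrete` and the caller-supplied `k_min` are assumptions made for this example.

```python
import math


def powerlaw_alpha_discrete(degrees, k_min):
    """Approximate MLE of the power-law exponent for integer degrees >= k_min.

    Uses the discrete approximation alpha ~= 1 + n / sum(ln(k / (k_min - 0.5))),
    which is most reliable when k_min is not very small.
    """
    if k_min < 1:
        raise ValueError("k_min must be a positive integer")
    # Keep only the tail of the distribution that the power law is fitted to.
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Toy degree sequence of a small hypothetical network.
print(powerlaw_alpha_discrete([1, 1, 2, 2, 3, 5, 8], k_min=1))
```

In practice k_min is itself chosen from the data (for example by minimizing a Kolmogorov-Smirnov distance), and a goodness-of-fit test is needed before calling a network scale-free.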
Affiliation(s)
- Pilar Ortiz-Vilchis
- Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Mexico City 11340, Mexico
- Jazmin-Susana De-la-Cruz-García
- Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria de Ingeniería y Ciencias Sociales y Administrativas, Instituto Politécnico Nacional, Mexico City 08400, Mexico
- Aldo Ramirez-Arellano
- Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria de Ingeniería y Ciencias Sociales y Administrativas, Instituto Politécnico Nacional, Mexico City 08400, Mexico
2
He T, Bai L, Ong YS. Vicinal Vertex Allocation for Matrix Factorization in Networks. IEEE Transactions on Cybernetics 2022; 52:8047-8060. [PMID: 33600331; DOI: 10.1109/tcyb.2021.3051606] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
In this article, we present a novel matrix-factorization-based model, Vicinal vertex allocated matrix factorization (VVAMo), for uncovering clusters in network data. Unlike previous network clustering methods, which consider edge structure, vertex features, or both, the proposed model also incorporates vertex inclinations with respect to topology and features into the learning process. In particular, by taking the latent preferences between vicinal vertices into account, VVAMo can uncover network clusters composed of proximal vertices that share analogous inclinations and, correspondingly, high structural and feature correlations. To ensure that such clusters are effectively uncovered, we propose a unified likelihood function for VVAMo and derive an alternating algorithm for optimizing it. We then provide a theoretical analysis of VVAMo, including a convergence proof and a computational complexity analysis. To investigate the effectiveness of the proposed model, we conduct a comprehensive empirical study on a wide range of commonly used real-world network datasets. The results show that VVAMo outperforms existing classical and state-of-the-art approaches.
3
Liu G, Liu B, Li A, Wang X, Yu J, Zhou X. Identifying Protein Complexes With Clear Module Structure Using Pairwise Constraints in Protein Interaction Networks. Front Genet 2021; 12:664786. [PMID: 34512712; PMCID: PMC8430217; DOI: 10.3389/fgene.2021.664786] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/06/2021] [Accepted: 06/23/2021] [Indexed: 02/02/2023]
Abstract
Protein-protein interaction (PPI) networks can be regarded as powerful platforms for elucidating the principles and mechanisms of cellular organization. Uncovering protein complexes from PPI networks leads to a better understanding of biological function in cellular systems. In recent decades, numerous computational algorithms have been developed to identify protein complexes. However, most of them focus primarily on the topological structure of PPI networks and do not consider the native organized structure among protein complexes. Moreover, PPI networks generated by high-throughput technologies contain a fraction of false interactions, which makes it difficult to identify protein complexes efficiently. To tackle these challenges, we propose a novel semi-supervised protein complex detection model based on non-negative matrix tri-factorization, which not only considers the topological structure of a PPI network but also makes full use of available high-quality known protein pairs through must-link constraints. We propose non-overlapping (NSSNMTF) and overlapping (OSSNMTF) protein complex detection algorithms to identify significant protein complexes with clear module structures from PPI networks. Both proposed algorithms outperform a diverse range of state-of-the-art protein complex identification algorithms on synthetic networks and human-related PPI networks.
Affiliation(s)
- Guangming Liu
- School of Computer Science & Engineering, Xi'an University of Technology, Xi'an, China
- Bo Liu
- Hebei Key Laboratory of Agricultural Big Data, College of Information Science and Technology, Hebei Agricultural University, Baoding, China
- Aimin Li
- School of Computer Science & Engineering, Xi'an University of Technology, Xi'an, China
- Xiaofan Wang
- School of Computer Science & Engineering, Xi'an University of Technology, Xi'an, China
- Jian Yu
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Xuezhong Zhou
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
4
Hu L, Zhang J, Pan X, Yan H, You ZH. HiSCF: leveraging higher-order structures for clustering analysis in biological networks. Bioinformatics 2021; 37:542-550. [PMID: 32931549; DOI: 10.1093/bioinformatics/btaa775] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 01/19/2020] [Revised: 05/12/2020] [Accepted: 09/03/2020] [Indexed: 02/06/2023]
Abstract
MOTIVATION Clustering analysis in a biological network aims to group biological entities into functional modules, thus providing valuable insight into complex biological systems. Existing clustering techniques make use of lower-order connectivity patterns at the level of individual biological entities and their connections, but few of them take into account higher-order connectivity patterns at the level of small network motifs. RESULTS Here, we present a novel clustering framework, HiSCF, that identifies functional modules based on the higher-order structure information available in a biological network. Taking advantage of a higher-order Markov stochastic process, HiSCF performs clustering analysis by exploiting a variety of network motifs. When compared with several state-of-the-art clustering models, HiSCF yields the best accuracy in two practical clustering applications, i.e. protein complex identification and gene co-expression module detection. The promising performance of HiSCF demonstrates that considering higher-order network motifs yields new insight into the analysis of biological networks, such as the identification of overlapping protein complexes and the inference of new signaling pathways, and also reveals the rich higher-order organizational structures present in biological networks. AVAILABILITY AND IMPLEMENTATION HiSCF is available at https://github.com/allenv5/HiSCF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
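HiSCF itself relies on a higher-order Markov stochastic process, which is not reproduced here. As a simpler, hedged illustration of what "higher-order structure" means, the sketch below weights each edge by the number of triangles (the smallest nontrivial motif) that contain it, a motif-adjacency idea used in motif-based clustering. The function name `triangle_motif_weights` and the edge-list input format are assumptions for this example.

```python
from collections import defaultdict


def triangle_motif_weights(edges):
    """Map each undirected edge {u, v} to the number of triangles containing it."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops cannot be part of a triangle
            neighbors[u].add(v)
            neighbors[v].add(u)
    weights = {}
    for u, v in edges:
        if u != v:
            # Every common neighbor w closes the triangle u-v-w.
            weights[frozenset((u, v))] = len(neighbors[u] & neighbors[v])
    return weights


# Toy network: triangle a-b-c plus a pendant edge c-d.
w = triangle_motif_weights([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(w[frozenset(("a", "b"))], w[frozenset(("c", "d"))])  # 1 0
```

Clustering this weighted graph instead of the raw adjacency favors groups of nodes that are densely linked through triangles, which is the intuition behind motif-aware methods; HiSCF's own formulation differs.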
Affiliation(s)
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
- Jun Zhang
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
- Xiangyu Pan
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong 999077, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
5
Yi HC, You ZH, Wang L, Su XR, Zhou X, Jiang TH. In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinformatics 2021; 22:293. [PMID: 34074242; PMCID: PMC8170943; DOI: 10.1186/s12859-020-03882-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/10/2020] [Accepted: 11/13/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Drug repositioning, i.e., finding new uses for existing drugs, can accelerate new drug research and development. Various computational methods have been presented to predict novel drug-disease associations for drug repositioning based on similarity measures among drugs and diseases. However, some known associations between drugs and diseases have not been utilized by previous studies. METHODS In this work, we develop a deep gated recurrent unit (GRU) model to predict potential drug-disease interactions using comprehensive similarity measures and a Gaussian interaction profile kernel. More specifically, the similarity measure is used to extract discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain efficient features of diseases based on known disease-disease associations. Then, a deep GRU model is developed to predict potential drug-disease interactions. RESULTS The performance of the proposed model is evaluated on two benchmark datasets under tenfold cross-validation. To further verify its predictive ability, case studies predicting new potential indications of drugs were carried out. CONCLUSION The experimental results show that the proposed model is a useful tool for predicting new indications for drugs or new treatments for diseases, and can accelerate drug repositioning and related drug research and discovery.
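The Gaussian interaction profile (GIP) kernel is commonly defined as K(i, j) = exp(-γ‖IP_i − IP_j‖²), where IP_i is the binary interaction profile of entity i and γ = γ′ / ((1/n) Σ‖IP_i‖²) normalizes the bandwidth. The sketch below assumes that standard definition with γ′ = 1; the abstract does not give the authors' exact settings, and the function name `gip_kernel` is hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    profiles: list of equal-length 0/1 lists, one interaction profile per entity.
    Returns an n x n list-of-lists similarity matrix.
    """
    n = len(profiles)
    if n == 0:
        return []
    # Bandwidth normalization: gamma = gamma' / mean squared profile norm.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Three hypothetical entities; the first two share an identical profile.
K = gip_kernel([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
print(K[0][1])  # 1.0
```

Identical profiles get similarity 1, and similarity decays with the number of differing interactions, so the kernel can serve as a feature or similarity matrix even when no side information is available.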
Affiliation(s)
- Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Lei Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Xiao-Rui Su
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xi Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Tong-Hai Jiang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
6
Huo X, Sun H, Cao D, Yang J, Peng P, Yu M, Shen K. Identification of prognosis markers for endometrial cancer by integrated analysis of DNA methylation and RNA-Seq data. Sci Rep 2019; 9:9924. [PMID: 31289358; PMCID: PMC6617448; DOI: 10.1038/s41598-019-46195-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/19/2019] [Accepted: 06/24/2019] [Indexed: 12/17/2022]
Abstract
Endometrial cancer is highly malignant and has a poor prognosis at the advanced stage; thus, predicting its prognosis is important. DNA methylation has rapidly gained clinical attention as a biomarker for diagnostic, prognostic and predictive purposes in various cancers. In the present study, differentially methylated positions and differentially expressed genes were identified from DNA methylation and RNA-Seq data. Functional analyses and an interaction network were used to identify hub genes, and the hub genes were validated by overall survival analysis. The top genes were evaluated by immunohistochemical staining of endometrial cancer tissues. Gene function was evaluated with cell growth curves after knockdown of CDC20 and CCNA2 in an endometrial cancer cell line. A total of 329 hypomethylated highly expressed genes and 359 hypermethylated lowly expressed genes were identified, and four hub genes were obtained from the interaction network. Patients with low expression of CDC20 and CCNA2 showed better overall survival. These results were also supported by immunohistochemical staining. Cell growth curves further demonstrated that knockdown of CDC20 and CCNA2 suppresses cell proliferation. We identified two aberrantly methylated genes, CDC20 and CCNA2, as novel biomarkers for precision diagnosis in endometrial cancer (EC).
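The abstract does not state how the four hub genes were selected from the interaction network. A common choice, assumed here purely for illustration, is to rank the nodes by degree and keep the top k. The function name `top_k_hubs` and the gene identifiers in the example are hypothetical.

```python
from collections import Counter


def top_k_hubs(edges, k):
    """Return the k nodes of highest degree in an undirected edge list.

    Duplicate edges (in either orientation) and self-loops are ignored;
    ties are broken alphabetically so the result is deterministic.
    """
    degree = Counter()
    seen = set()
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            continue
        seen.add(key)
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:k]


# Hypothetical gene identifiers; ("B", "A") duplicates ("A", "B").
print(top_k_hubs([("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "A")], 2))
# ['A', 'B']
```

Degree is only one option; centrality measures such as betweenness or closeness are also used to nominate hub genes and can give different rankings.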
Affiliation(s)
- Xiao Huo
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Hengzi Sun
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Dongyan Cao
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jiaxin Yang
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Peng Peng
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Mei Yu
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Keng Shen
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China