1
Ma W, Bi X, Jiang H, Zhang S, Wei Z. CollaPPI: A Collaborative Learning Framework for Predicting Protein-Protein Interactions. IEEE J Biomed Health Inform 2024; 28:3167-3177. [PMID: 38466584 DOI: 10.1109/jbhi.2024.3375621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Exploring protein-protein interaction (PPI) is of paramount importance for elucidating the intrinsic mechanism of various biological processes. Nevertheless, experimental determination of PPI can be both time-consuming and expensive, motivating the exploration of data-driven deep learning technologies as a viable, efficient, and accurate alternative. Nonetheless, most current deep learning-based methods regarded a pair of proteins to be predicted for possible interaction as two separate entities when extracting PPI features, thus neglecting the knowledge sharing among the collaborative protein and the target protein. Aiming at the above issue, a collaborative learning framework CollaPPI was proposed in this study, where two kinds of collaboration, i.e., protein-level collaboration and task-level collaboration, were incorporated to achieve not only the knowledge-sharing between a pair of proteins, but also the complementation of such shared knowledge between biological domains closely related to PPI (i.e., protein function, and subcellular location). Evaluation results demonstrated that CollaPPI obtained superior performance compared to state-of-the-art methods on two PPI benchmarks. Besides, evaluation results of CollaPPI on the additional PPI type prediction task further proved its excellent generalization ability.
2
Wu J, Liu B, Zhang J, Wang Z, Li J. DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning. BMC Bioinformatics 2023; 24:473. [PMID: 38097937 PMCID: PMC10722729 DOI: 10.1186/s12859-023-05594-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/08/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023] Open
Abstract
PURPOSE: Sequenced Protein-Protein Interaction (PPI) prediction represents a pivotal area of study in biology, playing a crucial role in elucidating the mechanistic underpinnings of diseases and facilitating the design of novel therapeutic interventions. Conventional methods for extracting features through experimental processes have proven to be both costly and exceedingly complex. In light of these challenges, the scientific community has turned to computational approaches, particularly those grounded in deep learning methodologies. Despite the progress achieved by current deep learning technologies, their effectiveness diminishes when applied to larger, unfamiliar datasets.
RESULTS: This study introduces a novel deep learning framework, termed DL-PPI, for predicting PPIs based on sequence data. The proposed framework comprises two key components aimed at improving the accuracy of feature extraction from individual protein sequences and capturing relationships between proteins in unfamiliar datasets.
1. Protein Node Feature Extraction Module: To enhance the accuracy of feature extraction from individual protein sequences and facilitate the understanding of relationships between proteins in unknown datasets, the authors devised a novel protein node feature extraction module utilizing the Inception method. This module efficiently captures relevant patterns and representations within protein sequences, enabling more informative feature extraction.
2. Feature-Relational Reasoning Network (FRN): In the Global Feature Extraction module of the model, the authors developed a novel FRN that leverages Graph Neural Networks to determine interactions between pairs of input proteins. The FRN effectively captures the underlying relational information between proteins, contributing to improved PPI predictions.
The DL-PPI framework demonstrates state-of-the-art performance in sequence-based PPI prediction.
Affiliation(s)
- Jiahui Wu: Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Bo Liu: School of Mathematical and Computational Sciences, Massey University, Auckland, 0745, New Zealand
- Jidong Zhang: Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Zhihan Wang: Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Jianqiang Li: Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
3
Huang W, Gu H, Yuan Z. Identifying biomarkers for prenatal diagnosis of neural tube defects based on "omics". Clin Genet 2021; 101:381-389. [PMID: 34761376 DOI: 10.1111/cge.14087] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/14/2021] [Revised: 11/05/2021] [Accepted: 11/06/2021] [Indexed: 11/27/2022]
Abstract
Neural tube defects (NTDs) are the most severe birth defects and the main cause of newborn death, posing a great challenge to the affected children, families, and societies. Presently, the clinical diagnosis of NTDs mainly relies on ultrasound images combined with certain indices, such as alpha-fetoprotein levels in the maternal serum and amniotic fluid. Recently, the discovery of additional biomarkers in maternal tissue has presented new possibilities for prenatal diagnosis. Over the past 20 years, "omics" techniques have provided the premise for the study of biomarkers. This review summarizes recent advances in candidate biomarkers for the prenatal diagnosis of fetal NTDs based on omics techniques using maternal biological specimens of different origins, including amniotic fluid, blood, and urine, which may provide a foundation for the early prenatal diagnosis of NTDs.
Affiliation(s)
- Wanqi Huang: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Hui Gu: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Zhengwei Yuan: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
4
Dong N, Gu H, Liu D, Wei X, Ma W, Ma L, Liu Y, Wang Y, Jia S, Huang J, Wang C, He X, Huang T, He Y, Zhang Q, An D, Bai Y, Yuan Z. Complement factors and alpha-fetoprotein as biomarkers for noninvasive prenatal diagnosis of neural tube defects. Ann N Y Acad Sci 2020; 1478:75-91. [PMID: 32761624 DOI: 10.1111/nyas.14443] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/15/2020] [Revised: 05/30/2020] [Accepted: 06/29/2020] [Indexed: 12/23/2022]
Abstract
Neural tube defects (NTDs) are serious congenital malformations. In this study, we aimed to identify more specific and sensitive maternal serum biomarkers for noninvasive NTD screenings. We collected serum from 37 pregnant women carrying fetuses with NTDs and 38 pregnant women carrying normal fetuses. Isobaric tags for relative and absolute quantitation were conducted for differential proteomic analysis, and an enzyme-linked immunosorbent assay was used to validate the results. We then used a support vector machine (SVM) classifier to establish a disease prediction model for NTD diagnosis. We identified 113 differentially expressed proteins; of these, 23 were either up- or downregulated 1.5-fold or more, including five complement proteins (C1QA, C1S, C1R, C9, and C3); C3 and C9 were downregulated significantly in NTD groups. The accuracy rate of the SVM model of the complement factors (including C1QA, C1S, and C3) was 62.5%, with 60% sensitivity and 67% specificity, while the accuracy rate of the SVM model of alpha-fetoprotein (AFP, an established biomarker for NTDs) was 62.5%, with 75% sensitivity and 50% specificity. Combination of the complement factor and AFP data resulted in the SVM model accuracy of 75%, and receiver operating characteristic curve analysis showed 75% sensitivity and 75% specificity. These data suggest that a disease prediction model based on combined complement factor and AFP data could serve as a more accurate method of noninvasive prenatal NTD diagnosis.
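The accuracy, sensitivity, and specificity quoted in this abstract are all derived from a confusion matrix. As a minimal sketch of that arithmetic, the Python below computes the three metrics. The counts are hypothetical (the abstract does not report the size of the test set) and were chosen only so that each metric comes out at 75%, as reported for the combined model; the names `ConfusionCounts`, `sensitivity`, `specificity`, and `accuracy` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of a binary classifier's outcomes on a test set."""
    tp: int  # NTD cases correctly flagged
    fn: int  # NTD cases missed
    tn: int  # normal cases correctly cleared
    fp: int  # normal cases wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of true cases that the model detects.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of true non-cases that the model clears.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all samples classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts, NOT taken from the paper.
example = ConfusionCounts(tp=6, fn=2, tn=6, fp=2)
print(sensitivity(example), specificity(example), accuracy(example))
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with the AFP-only figures in the abstract (75% and 50% giving 62.5%).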
Affiliation(s)
- Naixuan Dong: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China; School of Sino-Dutch Biomedical & Information Engineering, Northeastern University, Shenyang, China
- Hui Gu: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Dan Liu: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Xiaowei Wei: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Wei Ma: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Ling Ma: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Yusi Liu: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Yanfu Wang: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Shanshan Jia: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Jieting Huang: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Chenfei Wang: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Xuan He: School of Sino-Dutch Biomedical & Information Engineering, Northeastern University, Shenyang, China
- Tianchu Huang: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Yiwen He: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
- Qiang Zhang: Second Respiratory Department, Shengjing Hospital, China Medical University, Shenyang, China
- Dong An: Pediatric Department, The First Hospital of China Medical University, Shenyang, China
- Yuzuo Bai: Department of Pediatric Surgery, Shengjing Hospital, China Medical University, Shenyang, China
- Zhengwei Yuan: Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital, China Medical University, Shenyang, China
5
Sarkar D, Saha S. Machine-learning techniques for the prediction of protein–protein interactions. J Biosci 2019. [DOI: 10.1007/s12038-019-9909-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/06/2023]
6
Saha S, Chatterjee P, Basu S, Nasipuri M, Plewczynski D. FunPred 3.0: improved protein function prediction using protein interaction network. PeerJ 2019; 7:e6830. [PMID: 31198622 PMCID: PMC6535044 DOI: 10.7717/peerj.6830] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/20/2018] [Accepted: 03/21/2019] [Indexed: 11/23/2022] Open
Abstract
Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is performed routinely at the population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Identifying protein functions experimentally is a laborious, time-consuming, and resource-intensive task. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset includes 4,554 unique proteins in 13,528 protein–protein interactions after the elimination of the self-replicating and the self-interacting protein pairs. Using the FunPred 3.0 tool, we achieve mean precision, recall, and F-score values of 0.55, 0.82, and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
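The three scores quoted in this abstract are related: the F-score is conventionally the harmonic mean of precision and recall. The short Python sketch below checks that the reported values are mutually consistent under that standard definition. It assumes the balanced F1 form; the abstract does not say how the mean F-score was aggregated, and the function name `f_score` is illustrative rather than taken from the FunPred code.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall as quoted in the abstract.
print(round(f_score(0.55, 0.82), 2))  # prints 0.66
```

In general the F-score of the mean precision and recall need not equal the mean of per-item F-scores, so this agreement to two decimals is a consistency check, not a reconstruction of the authors' calculation.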
Affiliation(s)
- Sovan Saha: Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, West Bengal, India
- Piyali Chatterjee: Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
- Subhadip Basu: Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Mita Nasipuri: Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Dariusz Plewczynski: Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
7
Lin X, Zhang X. Prediction of Hot Regions in PPIs Based on Improved Local Community Structure Detecting. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1470-1479. [PMID: 29994749 DOI: 10.1109/tcbb.2018.2793858] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Hot regions in PPIs are assembly regions composed of tightly packed hot spots. The discovery of hot regions helps to understand life activities and has important value for biological applications. The identification of hot regions is the basis for protein design and cancer prevention. Existing algorithms for predicting hot regions often have defects, such as low accuracy and instability. This paper proposes a novel hot region prediction method based on diverse biological characteristics. First, features are evaluated using an improved mRMR method. Then, an SVM is adopted to create a classification model based on the selected features. In addition, a new clustering algorithm, namely LCSD (local community structure detecting), is developed to detect and analyze the conformation of hot regions. In the clustering process, the link similarity of protein residues is introduced to handle the boundary nodes. This algorithm can effectively deal with missing residue nodes and control the local community boundaries. The results indicate that the spatial structure of hot regions can be obtained more effectively, and that our method is more effective than previous methods for precise identification of hot regions.
8
Małysiak-Mrozek B. Uncertainty, imprecision, and many-valued logics in protein bioinformatics. Math Biosci 2018; 309:143-162. [PMID: 30118719 DOI: 10.1016/j.mbs.2018.08.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/31/2018] [Revised: 07/24/2018] [Accepted: 08/09/2018] [Indexed: 11/15/2022]
Abstract
Understanding proteins, their structures, functions, mutual interactions, activity in cellular reactions, interactions with drugs, and expression in body cells is key to efficient medical diagnosis, drug production, and treatment of patients. Machine learning and data exploration methods supported by many-valued logics make it possible to capture the imprecision and uncertainties that naturally occur in proteins and other biomolecules. Many-valued logics, like Łukasiewicz logic or fuzzy logic, are non-classical logics that do not restrict the number of truth values to only the two values of true or false, but allow for a larger set of truth degrees. In this paper, we briefly review the use of many-valued logics, especially fuzzy logic, in bioinformatics. Then, we focus on protein bioinformatics and present selected applications of many-valued logics in the analysis of complex protein structures, including: (1) potential-based protein similarity searching, (2) matching proteins on the basis of secondary structures, (3) 3D protein structure alignment, (4) prediction of intrinsically disordered proteins, and (5) fuzzy querying in large collections of Big macromolecular Data. Results of the presented studies show that the utilization of many-valued logics can enrich the investigations of protein molecules, in which uncertainty and imprecision are prevalent problems. The paper discusses all observed benefits brought by the application of many-valued logics in investigations related to selected protein analyses carried out by the author.
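To make "a larger set of truth degrees" concrete, the sketch below implements the standard Łukasiewicz connectives on truth degrees in the interval [0, 1]. These are the textbook definitions of the logic, not code or notation from the reviewed paper; the function names (`luk_not`, `luk_and`, `luk_or`, `luk_implies`) and the example degrees are illustrative assumptions.

```python
def luk_not(a: float) -> float:
    # Negation: 1 - a
    return 1.0 - a


def luk_and(a: float, b: float) -> float:
    # Strong conjunction (Lukasiewicz t-norm): max(0, a + b - 1)
    return max(0.0, a + b - 1.0)


def luk_or(a: float, b: float) -> float:
    # Strong disjunction (t-conorm): min(1, a + b)
    return min(1.0, a + b)


def luk_implies(a: float, b: float) -> float:
    # Implication (residuum of the t-norm): min(1, 1 - a + b)
    return min(1.0, 1.0 - a + b)


# Hypothetical truth degrees, e.g. "structure A resembles B" = 0.75
# and "the residue is disordered" = 0.5.
a, b = 0.75, 0.5
print(luk_not(a), luk_and(a, b), luk_or(a, b), luk_implies(a, b))
```

Restricted to the values 0 and 1, each function reduces to its classical two-valued counterpart, which is why these logics are described as generalizing, rather than replacing, true/false reasoning.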
Affiliation(s)
- Bożena Małysiak-Mrozek: Institute of Informatics, Silesian University of Technology, Akademicka 16, Gliwice 44-100, Poland
9
Dutta P, Basu S, Kundu M. Assessment of Semantic Similarity between Proteins Using Information Content and Topological Properties of the Gene Ontology Graph. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:839-849. [PMID: 28371781 DOI: 10.1109/tcbb.2017.2689762] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/07/2023]
Abstract
The semantic similarity between two interacting proteins can be estimated by combining the similarity scores of the Gene Ontology (GO) terms associated with the proteins. A greater number of similar GO annotations between two proteins indicates greater interaction affinity. Existing semantic similarity measures make use of the GO graph structure, the information content of GO terms, or a combination of both. In this paper, we present a hybrid approach that utilizes both the topological features of the GO graph and the information content of the GO terms. More specifically, we 1) consider a fuzzy clustering of the GO graph based on the level of association of the GO terms, 2) estimate the GO term memberships to each cluster center based on the respective shortest path lengths, and 3) assign weightage to GO term pairs on the basis of their dissimilarity with respect to the cluster centers. We test the performance of our semantic similarity measure against seven other previously published similarity measures using benchmark protein-protein interaction datasets of Homo sapiens and Saccharomyces cerevisiae based on sequence similarity, Pfam similarity, area under ROC curve, and F1 measure.
10
Dutta P, Halder AK, Basu S, Kundu M. A survey on Ebola genome and current trends in computational research on the Ebola virus. Brief Funct Genomics 2017; 17:374-380. [DOI: 10.1093/bfgp/elx020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023] Open