1
Pan Y, Wang Y, Guan J, Zhou S. PCGAN: a generative approach for protein complex identification from protein interaction networks. Bioinformatics 2023; 39:btad473. [PMID: 37531266 PMCID: PMC10457665 DOI: 10.1093/bioinformatics/btad473] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/20/2022] [Revised: 07/23/2023] [Accepted: 08/01/2023] [Indexed: 08/04/2023] Open
Abstract
MOTIVATION: Protein complexes are groups of polypeptide chains linked by non-covalent protein-protein interactions, which play important roles in biological systems and perform numerous functions, including DNA transcription, mRNA translation, and signal transduction. In the past decade, a number of computational methods have been developed to identify protein complexes from protein interaction networks by mining dense subnetworks or subgraphs.
RESULTS: In this article, unlike existing works, we propose a novel approach for this task based on generative adversarial networks, called PCGAN (identifying Protein Complexes by GAN). With the help of some real complexes as training samples, our method can learn a model to generate new complexes from a protein interaction network. To effectively support model training and testing, we construct two more comprehensive and reliable protein interaction networks and a larger gold standard complex set by merging existing ones of the same organism (including human and yeast). Extensive comparison studies indicate that our method is superior to existing protein complex identification methods in terms of various performance metrics. Furthermore, functional enrichment analysis shows that the identified complexes are of high biological significance, which indicates that these generated protein complexes are very likely real complexes.
AVAILABILITY AND IMPLEMENTATION: https://github.com/yul-pan/PCGAN.
Affiliation(s)
- Yuliang Pan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Yang Wang
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
  - Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
2
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kexin Huang
  - Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
3
Lagisetty Y, Bourquard T, Al-Ramahi I, Mangleburg CG, Mota S, Soleimani S, Shulman JM, Botas J, Lee K, Lichtarge O. Identification of risk genes for Alzheimer's disease by gene embedding. Cell Genomics 2022; 2:100162. [PMID: 36268052 PMCID: PMC9581494 DOI: 10.1016/j.xgen.2022.100162] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/30/2022]
Abstract
Most disease-gene association methods do not account for gene-gene interactions, even though these play a crucial role in complex, polygenic diseases like Alzheimer's disease (AD). To discover new genes whose interactions may contribute to pathology, we introduce GeneEMBED. This approach compares the functional perturbations induced in gene interaction network neighborhoods by coding variants from disease versus healthy subjects. In two independent AD cohorts of 5,169 exomes and 969 genomes, GeneEMBED identified novel candidates. These genes were differentially expressed in post mortem AD brains and modulated neurological phenotypes in mice. Four that were differentially overexpressed and modified neurodegeneration in vivo are PLEC, UTRN, TP53, and POLD1. Notably, TP53 and POLD1 are involved in DNA break repair and inhibited by approved drugs. While these data show proof of concept in AD, GeneEMBED is a general approach that should be broadly applicable to identify genes relevant to risk mechanisms and therapy of other complex diseases.
Affiliation(s)
- Yashwanth Lagisetty
  - Department of Biology and Pharmacology, UTHealth McGovern Medical School, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Thomas Bourquard
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Ismael Al-Ramahi
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA
  - Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
- Carl Grant Mangleburg
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Samantha Mota
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Shirin Soleimani
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Joshua M. Shulman
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA
  - Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Neurology, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Juan Botas
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA
  - Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
- Kwanghyuk Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Corresponding author
4
Sheng J, Xue J, Li P, Yi N. [A protein complex recognition method based on spatial-temporal graph convolution neural network]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2022; 42:1075-1081. [PMID: 35869773 DOI: 10.12122/j.issn.1673-4254.2022.07.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
OBJECTIVE: To propose a new method for mining complexes in dynamic protein networks using a spatiotemporal convolution neural network.
METHODS: The edge strength, node strength and edge existence probability were defined for modeling the dynamic protein network. Based on the time series information and structure information on the graph, two convolution operators were designed using the Hilbert-Huang transform, an attention mechanism and residual connections to represent and learn the characteristics of the proteins in the network, and the characteristic map of the dynamic protein network was constructed. Finally, spectral clustering was used to identify the protein complexes.
RESULTS: The simulation results on several public biological datasets showed that the F value of the proposed algorithm exceeded 90% on the DIP and MIPS datasets. Compared with 4 other recognition algorithms (DPCMNE, GE-CFI, VGAE and NOCD), the proposed algorithm improved the recognition efficiency by 34.5%, 28.7%, 25.4% and 17.6%, respectively.
CONCLUSION: The application of deep learning technology can improve the efficiency of the analysis of dynamic protein networks.
Affiliation(s)
- J Sheng
  - Clinical Nursing Teaching and Research Office, The Second Xiangya Hospital of Central South University, Changsha 410011, China
  - Department of Ultrasound Diagnosis, The Second Xiangya Hospital of Central South University, Changsha 410011, China
- J Xue
  - Operation Center, The Third Xiangya Hospital of Central South University, Changsha 410013, China
- P Li
  - School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
- N Yi
  - School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
5
Chang S, Shen L, Li L, Chen X, Han H. Denoising of scanning electron microscope images for biological ultrastructure enhancement. J Bioinform Comput Biol 2022; 20:2250007. [PMID: 35469552 DOI: 10.1142/s021972002250007x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Scanning electron microscopy (SEM) is of great significance for analyzing the ultrastructure. However, due to the requirements of data throughput and electron dose of biological samples in the imaging process, the SEM image of biological samples is often occupied by noise which severely affects the observation of ultrastructure. Therefore, it is necessary to analyze and establish a noise model of SEM and propose an effective denoising algorithm that can preserve the ultrastructure. We first investigated the noise source of SEM images and introduced a signal-related SEM noise model. Then, we validated the effectiveness of the noise model through experiments, which are designed with standard samples to reflect the relation between real signal intensity and noise. Based on the SEM noise model and traditional variance stabilization denoising strategy, we proposed a novel, two-stage denoising method. In the first stage variance stabilization, our VS-Net realizes the separation of signal-dependent noise and signal in the SEM image. In the second stage denoising, our D-Net employs the structure of U-Net and combines the attention mechanism to achieve efficient noise removal. Compared with other existing denoising methods for SEM images, our proposed method is more competitive in objective evaluation and visual effects. Source code is available on GitHub (https://github.com/VictorCSheng/VSID-Net).
Affiliation(s)
- Sheng Chang
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
  - School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
- Lijun Shen
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Linlin Li
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Xi Chen
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
- Hua Han
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
  - School of Future Technology, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
  - The Center for Excellence in Brain, Science and Intelligence Technology, CAS, Shanghai 200031, P. R. China
  - National Laboratory of Pattern Recognition, CASIA, Beijing 100190, P. R. China
6
Ren ZH, Yu CQ, Li LP, You ZH, Guan YJ, Li YC, Pan J. SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information. Front Genet 2022; 13:839540. [PMID: 35360836 PMCID: PMC8963817 DOI: 10.3389/fgene.2022.839540] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/20/2021] [Accepted: 02/07/2022] [Indexed: 11/13/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play essential roles in biological processes such as gene regulation. One critical way in which ncRNAs carry out biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps to clarify the function of ncRNAs. Many high-throughput experiments have been applied to recognize these interactions. Because these approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new RNAs and proteins. Additionally, most of them do not handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through a k-mer sparse matrix with SVD reduction and by learning nucleic acid symbols through natural language processing with a local fusion strategy, respectively. Then, to ease classification, the Hilbert transformation is used to map the raw feature data into a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstract features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are used. In comparison with state-of-the-art methods and with other classification or feature extraction strategies, SAWRPI achieved high performance on the three datasets, which contain two kinds of lncRNA-protein interactions. Based on our findings, SAWRPI is trustworthy, robust, and simple, and can serve as a beneficial supplement for the task of predicting ncRNA-protein interactions.
Affiliation(s)
- Zhong-Hao Ren
  - School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi’an, China
  - *Correspondence: Li-Ping Li; Chang-Qing Yu
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi’an, China
  - *Correspondence: Li-Ping Li; Chang-Qing Yu
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Yong-Jian Guan
  - School of Information Engineering, Xijing University, Xi’an, China
- Yue-Chao Li
  - School of Information Engineering, Xijing University, Xi’an, China
- Jie Pan
  - School of Information Engineering, Xijing University, Xi’an, China
7
Yu H, Shen ZA, Du PF. NPI-RGCNAE: Fast predicting ncRNA-protein interactions using the Relational Graph Convolutional Network Auto-Encoder. IEEE J Biomed Health Inform 2021; 26:1861-1871. [PMID: 34699377 DOI: 10.1109/jbhi.2021.3122527] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
ncRNAs play important roles in a variety of biological processes by interacting with RNA-binding proteins. Therefore, identifying ncRNA-protein interactions is important to understanding the biological functions of ncRNAs. Since experimental methods to determine ncRNA-protein interactions are always costly and time-consuming, computational methods have been proposed as alternative approaches. We developed a novel method, NPI-RGCNAE (predicting ncRNA-Protein Interactions by the Relational Graph Convolutional Network Auto-Encoder). With a reliable negative sample selection strategy, we applied the Relational Graph Convolutional Network encoder and the DistMult decoder to predict ncRNA-protein interactions in an accurate and efficient way. Using 5-fold cross-validation, we found that our method achieved performance comparable to all state-of-the-art methods. Our method requires less than 10% of the training time of all state-of-the-art methods. It is a more efficient choice for large datasets in practice. All datasets and source code of NPI-RGCNAE have been deposited in a public GitHub repository (https://github.com/Angelia0hh/NPI-RGCNAE).
8
Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021; 23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open
Abstract
A graph, which contains a set of objects and relationships, is a natural data structure for describing complex systems. Ubiquitous real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in vast bioinformatics scenarios with data represented in the Euclidean domain. However, rich relational information between biological elements is retained in non-Euclidean biomedical graphs, which are not learning-friendly for classic machine learning methods. Graph representation learning aims to embed a graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently raised widespread interest in both the machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from the molecular level to the genomics, pharmaceutical and healthcare systems levels. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.
Affiliation(s)
- Hai-Cheng Yi
  - Chinese Academy of Sciences, Xinjiang Technical Institute of Physics and Chemistry, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
9
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022] Open
Abstract
Graph neural networks (GNNs), as a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that process graph structure data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are proposed based on the three levels of structural information that can be learned by GNNs: node classification, link prediction, and graph generation. Meanwhile, according to the specific applications for various omics data, we categorize and discuss the related studies in three aspects: disease prediction, drug discovery, and biomedical imaging. Based on the analysis, we provide an outlook on the shortcomings of current studies and point out their developing prospect. Although GNNs have achieved excellent results in many biological tasks at present, they still face challenges in terms of low-quality data processing, methodology, and interpretability and have a long road ahead. We believe that GNNs are potentially an excellent method that solves various biological problems in bioinformatics research.
Affiliation(s)
- Xiao-Meng Zhang
  - School of Information, Yunnan Normal University, Kunming, China
- Li Liang
  - School of Information, Yunnan Normal University, Kunming, China
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, China
  - Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Ming-Jing Tang
  - Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
  - School of Life Sciences, Yunnan Normal University, Kunming, China
10
Zhou S. Introduction to the JBCB special issue on CBC 2019. J Bioinform Comput Biol 2021; 18:2002003. [PMID: 32698723 DOI: 10.1142/s0219720020020035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]