Reference Citation Analysis

Cited by Other Article(s)

Lawson S, Donovan D, Lefevre J. An application of node and edge nonlinear hypergraph centrality to a protein complex hypernetwork. PLoS One 2024;19:e0311433. [PMID: 39361678 PMCID: PMC11449304 DOI: 10.1371/journal.pone.0311433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 09/12/2024] [Indexed: 10/05/2024]

Lu P, Tian J. ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins. Comput Biol Chem 2024;112:108115. [PMID: 38865861 DOI: 10.1016/j.compbiolchem.2024.108115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/15/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]

Abstract

Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often struggle to discern essential proteins accurately, as they rely primarily on information derived from protein-protein interaction (PPI) networks. Although some researchers have attempted to integrate biological data with PPI networks to predict essential proteins, designing effective integration methods remains a challenge. To address these challenges, this paper presents the ACDMBI model. ACDMBI consists of two key modules: feature extraction and classification. To capture relevant information, we draw on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division and then optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Next, protein features are extracted from gene expression data using Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Finally, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate the features extracted from the three data sources and build a multi-layer deep neural network (DNN) for protein classification. Experimental results on brewer's yeast (Saccharomyces cerevisiae) data demonstrate the superior performance of ACDMBI, with an AUC of 0.9533 and an AUPR of 0.9153. Ablation experiments further show that effectively integrating features from diverse biological information significantly improves the model's performance.
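The fusion step described above can be pictured with a small Python sketch. It is a minimal illustration under stated assumptions, not the ACDMBI implementation: the compartment vocabulary, the branch outputs, and the single sigmoid unit (standing in for the multi-layer DNN) are all hypothetical. It shows subcellular localization encoded as a multi-hot one-dimensional vector and late fusion of the three sources by concatenation.

```python
import math
from typing import List, Sequence

# Hypothetical compartment vocabulary for illustration only; not the paper's list.
COMPARTMENTS = ["nucleus", "cytoplasm", "mitochondrion", "membrane"]


def localization_vector(locations: Sequence[str],
                        vocab: Sequence[str] = COMPARTMENTS) -> List[float]:
    """Multi-hot encoding: 1.0 for each compartment the protein is annotated to."""
    present = set(locations)
    return [1.0 if compartment in present else 0.0 for compartment in vocab]


def fuse(*feature_blocks: Sequence[float]) -> List[float]:
    """Late fusion by concatenating the per-source feature vectors in order."""
    fused: List[float] = []
    for block in feature_blocks:
        fused.extend(block)
    return fused


def sigmoid_unit(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """A single sigmoid output unit, sigmoid(w . x + b), standing in for a DNN head."""
    if len(x) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    ppi_features = [0.2, -0.1]   # hypothetical output of the PPI-network branch
    expression_features = [0.5]  # hypothetical output of the expression branch
    location_features = localization_vector(["nucleus", "membrane"])
    fused = fuse(ppi_features, expression_features, location_features)
    print(len(fused), sigmoid_unit(fused, [0.1] * len(fused), 0.0))
```

Concatenation is the simplest fusion choice; it keeps each source's features intact and leaves any cross-source weighting to the classifier layers.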

Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024;25:157. [PMID: 38643108 PMCID: PMC11031909 DOI: 10.1186/s12859-024-05772-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024]

Panarese A. Bowel function and inflammation: Is motility the other side of the coin? World J Gastroenterol 2024;30:1963-1967. [PMID: 38681124 PMCID: PMC11045487 DOI: 10.3748/wjg.v30.i14.1963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 01/23/2024] [Accepted: 03/21/2024] [Indexed: 04/12/2024]

Hasibi R, Michoel T, Oyarzún DA. Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality. NPJ Syst Biol Appl 2024;10:24. [PMID: 38448436 PMCID: PMC10917767 DOI: 10.1038/s41540-024-00348-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]

Wang SH, Zhao Y, Wang CC, Chu F, Miao LY, Zhang L, Zhuo L, Chen X. RFEM: A framework for essential microRNA identification in mice based on rotation forest and multiple feature fusion. Comput Biol Med 2024;171:108177. [PMID: 38422957 DOI: 10.1016/j.compbiomed.2024.108177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 01/21/2024] [Accepted: 02/18/2024] [Indexed: 03/02/2024]

Ding H, Li X, Han P, Tian X, Jing F, Wang S, Song T, Fu H, Kang N. MEG-PPIS: a fast protein-protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network. Bioinformatics 2024;40:btae269. [PMID: 38640481 PMCID: PMC11252844 DOI: 10.1093/bioinformatics/btae269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/19/2024] [Accepted: 04/17/2024] [Indexed: 04/21/2024]

Giordano M, Falbo E, Maddalena L, Piccirillo M, Granata I. Untangling the Context-Specificity of Essential Genes by Means of Machine Learning: A Constructive Experience. Biomolecules 2023;14:18. [PMID: 38254618 PMCID: PMC10813179 DOI: 10.3390/biom14010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/29/2023] [Accepted: 12/20/2023] [Indexed: 01/24/2024]

Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023;24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023]

Abstract

BACKGROUND

Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance.

RESULTS

We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature-processing paths. For each residue, PPINet extracts the features of the target residue and of its context separately. These two types of features are processed by two paths in the network and then combined to form a protein representation in which both types of features carry roughly equal importance. Model ensembling is applied to make use of more features: the base models are trained on different features and then ensembled via stacking. In addition, we present a data balancing strategy that gives our model a significant improvement on highly unbalanced data.
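The separate target and context inputs can be pictured with the following Python sketch. It assumes precomputed per-residue feature vectors and a symmetric window with zero padding at the sequence ends; the function name and the `half_window` parameter are hypothetical, and PPINet's actual window size and feature sets are not given in this abstract.

```python
from typing import List, Sequence, Tuple


def target_and_context(residue_features: Sequence[Sequence[float]], i: int,
                       half_window: int) -> Tuple[List[float], List[float]]:
    """Split the input for residue i into target and context feature vectors.

    The target vector is residue i's own features. The context vector
    concatenates the features of residues i-half_window .. i+half_window,
    excluding i, with zero padding beyond either end of the sequence, so its
    length is always 2 * half_window * feature_dim.
    """
    dim = len(residue_features[0])
    target = list(residue_features[i])
    context: List[float] = []
    for j in range(i - half_window, i + half_window + 1):
        if j == i:
            continue  # the target residue goes through its own path
        if 0 <= j < len(residue_features):
            context.extend(residue_features[j])
        else:
            context.extend([0.0] * dim)  # pad positions outside the sequence
    return target, context


if __name__ == "__main__":
    # Toy per-residue features (2 values per residue) for a 4-residue sequence.
    feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    target, context = target_and_context(feats, 0, half_window=1)
    print(target, context)
```

Keeping the target vector out of the concatenated window is what prevents a key residue's features from being diluted among its neighbours, which is the weakness of plain sliding-window merging noted in the background.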

CONCLUSION

The proposed method is evaluated on a fused dataset constructed from Dset186, Dset_72, and PDBset_164, as well as on the public Dset_448 dataset. Compared with current state-of-the-art methods, our method performs better. On the most important metrics, such as AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is mainly due to the ensemble model, especially the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
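For readers unfamiliar with the two headline metrics, here is a minimal Python sketch. It computes average precision, one common estimator of AUPRC, and recall; the paper's exact AUPRC implementation is not stated, and the function names are illustrative.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision, one common estimator of the area under the PR curve.

    Ranks predictions by descending score and averages precision@k over the
    ranks k at which positives occur. Tied scores keep their input order
    (Python's sort is stable), which is a simplification.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / n_pos


def recall(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of true positives that were predicted positive."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("recall is undefined without positives")
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    return tp / positives
```

For scores [0.9, 0.8, 0.7, 0.6] with labels [1, 0, 1, 0], the positives sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2 = 5/6. Unlike ROC AUC, this quantity is sensitive to class imbalance, which is why it is emphasized for sparse interaction sites.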

Chatterjee S, Sanjeev BS. Over-representation analysis of angiogenic factors in immunosuppressive mechanisms in neoplasms and neurological conditions during COVID-19. Microb Pathog 2023;185:106386. [PMID: 37865274 DOI: 10.1016/j.micpath.2023.106386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/27/2023] [Accepted: 10/09/2023] [Indexed: 10/23/2023]

Abstract

BACKGROUND

Recent studies have emphasized the need to identify key human biological processes and pathways targeted by the Coronaviridae family of viruses, especially severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Coronavirus disease 2019 (COVID-19) has caused death rates of up to 33-55% among COVID-19 patients with malignant neoplasms and Alzheimer's disease. Given this scenario, we identified biological processes and pathways involved in various diseases that are most likely to be affected by COVID-19.

METHODS

The COVID-19 DisGeNET data set (v4.0) contains associations between various diseases and human genes known to interact with viruses from the Coronaviridae family; these associations were obtained from the IntAct Coronavirus data set annotated with DisGeNET data. We constructed a disease-gene network to identify genes involved in various comorbid disease states. Communities in the disease-gene network were identified using the Louvain method, and functional enrichment by over-representation analysis was used to discover significant biological processes and pathways shared between COVID-19 and other diseases.
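Over-representation analysis is commonly scored with a one-sided hypergeometric test: given a background of genes, how surprising is the overlap between a query set (for example, one Louvain community) and the genes annotated to a pathway? The sketch below is a generic illustration, not the authors' pipeline; the function name is hypothetical, and the choice of background is an assumption left to the user.

```python
from math import comb


def ora_pvalue(population: int, annotated: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap) for over-representation.

    population: genes in the background (N)
    annotated:  background genes annotated to the term or pathway (K)
    selected:   genes in the query set, e.g. one community (n)
    overlap:    query genes annotated to the term (k)
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(comb(annotated, x) * comb(population - annotated, selected - x)
               for x in range(overlap, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 in the pathway, community of 3, overlap 2.
    print(ora_pvalue(10, 4, 3, 2))
```

In that toy case the tail is (C(4,2)C(6,1) + C(4,3)C(6,0)) / C(10,3) = 40/120 = 1/3. When many terms are tested, the resulting p-values are usually adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure).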

RESULTS

The COVID-19 DisGeNET data set (v4.0) comprised 828 human genes and 10,473 diseases (including various phenotypes), which together constituted the nodes of the disease-gene network. Each of the 70,210 edges connects a human gene with an associated disease. The top 10 genes linked to the largest number of diseases were VEGFA, BCL2, CTNNB1, ALB, COX2, AGT, HLA-A, HMOX1, FGF2 and COMT. The most vulnerable group of patients thus identified had comorbid conditions such as carcinomas, malignant neoplasms and Alzheimer's disease. Finally, we identified 15 potentially useful biological processes and pathways for improved therapies. Vascular endothelial growth factor (VEGF) is the key mediator of angiogenesis in cancer. It is widely distributed in the brain and plays a crucial role in brain inflammation by regulating the level of angiopoietins. With a degree of 1899, VEGFA was associated with the largest number of diseases in the disease-gene network. Previous studies have indicated that increased blood levels of VEGFA result in dyspnea, pulmonary edema (PE), acute lung injury (ALI) and acute respiratory distress syndrome (ARDS). For COVID-19 patients with neoplasms and other neurological symptoms, our results indicate VEGFA as a therapeutic target for suppressing inflammation. As VEGFs are known to disproportionately affect cancer patients, improving endothelial permeability and vasodilation with anti-VEGF therapy could suppress inflammation and also improve oxygenation. Based on our study, we make a case for clinical investigation of anti-VEGF therapies for such comorbid conditions affected by COVID-19 to achieve better therapeutic outcomes.
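In a bipartite gene-disease network, a gene's degree is simply the number of distinct diseases it is linked to, which is how a figure such as "a degree of 1899" arises. The Python sketch below ranks genes by degree from an edge list; the function name and toy identifiers are hypothetical and the data are not from the study.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_genes_by_degree(edges: Iterable[Tuple[str, str]], k: int) -> List[Tuple[str, int]]:
    """Rank genes by degree in a bipartite gene-disease network.

    Each edge is (gene, disease); duplicate edges are counted once, so a gene's
    degree equals the number of distinct diseases it is linked to. Genes with
    equal degree are returned in no guaranteed order.
    """
    unique_edges = set(edges)
    degree = Counter(gene for gene, _ in unique_edges)
    return degree.most_common(k)


if __name__ == "__main__":
    toy_edges = [("g1", "d1"), ("g1", "d2"), ("g1", "d2"), ("g2", "d1")]
    print(top_genes_by_degree(toy_edges, 2))
```

Deduplicating edges first matters when associations are merged from several annotation sources, since the same gene-disease pair may appear more than once.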

Zhao H, Liu G, Cao X. A seed expansion-based method to identify essential proteins by integrating protein-protein interaction sub-networks and multiple biological characteristics. BMC Bioinformatics 2023;24:452. [PMID: 38036960 PMCID: PMC10688502 DOI: 10.1186/s12859-023-05583-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023]

Ma J, Song J, Young ND, Chang BCH, Korhonen PK, Campos TL, Liu H, Gasser RB. 'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data. Brief Bioinform 2023;25:bbad472. [PMID: 38152979 PMCID: PMC10753293 DOI: 10.1093/bib/bbad472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/22/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023]

Bolouki A, Rahimi M, Azarpira N, Baghban F. Integrated multi-omics analysis identifies epigenetic alteration related to neurodegeneration development in post-traumatic stress disorder patients. Psychiatr Genet 2023;33:167-181. [PMID: 37222234 DOI: 10.1097/ypg.0000000000000340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]

Sun J, Pan L, Li B, Wang H, Yang B, Li W. A Construction Method of Dynamic Protein Interaction Networks by Using Relevant Features of Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2790-2801. [PMID: 37030714 DOI: 10.1109/tcbb.2023.3264241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]

Saxena P, Rauniyar S, Thakur P, Singh RN, Bomgni A, Alaba MO, Tripathi AK, Gnimpieba EZ, Lushbough C, Sani RK. Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria. Front Microbiol 2023;14:1086021. [PMID: 37125195 PMCID: PMC10133479 DOI: 10.3389/fmicb.2023.1086021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 03/23/2023] [Indexed: 05/02/2023]

Abstract

The growth and survival of an organism in a particular environment depend heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that rely on sulfate reduction to meet their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize essential genes based on their key metabolic pathways. Here, we report a feedback-loop framework for discovering genes of interest, from a bio-problem to a gene set of interest, that combines expert annotation with computational prediction. The defined bio-problem was used to retrieve SRB genes from literature databases (PubMed and PubMed Central), and these genes were annotated to the OA G20 genome. The retrieved gene list was then used for protein-protein interaction enrichment and corroborated by pangenome analysis to categorize the enriched gene sets and their pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis revealed the gene distribution: 69.83% of the 116 enriched genes were mapped as "persistent," suggesting that these genes are essential. Likewise, 21.55% of the enriched genes, notably including formate dehydrogenases and metallic hydrogenases, appeared as "shell." Our methodology suggests that semi-automated text mining and network analysis may play a crucial role in deciphering previously unexplored genes and key mechanisms, which can help establish a baseline before performing experimental studies.

Affiliation(s)

Priya Saxena Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
Shailabh Rauniyar Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
Payal Thakur Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
Ram Nageena Singh Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
Alain Bomgni Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Mathew O. Alaba Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Abhilash Kumar Tripathi Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
Etienne Z. Gnimpieba Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States (corresponding author)
Carol Lushbough Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Rajesh Kumar Sani Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States; BuG ReMeDEE Consortium, South Dakota School of Mines and Technology, Rapid City, SD, United States

Manzo M, Giordano M, Maddalena L, Guarracino MR, Granata I. Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis. Studies in Computational Intelligence 2023:117-145. [DOI: 10.1007/978-3-031-24453-7_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]

Identifying Tumor-Associated Genes from Bilayer Networks of DNA Methylation Sites and RNAs. Life (Basel) 2022;13:life13010076. [PMID: 36676027 PMCID: PMC9861397 DOI: 10.3390/life13010076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/10/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/29/2022]

Hao C, Dewar AE, West SA, Ghoul M. Gene transferability and sociality do not correlate with gene connectivity. Proc Biol Sci 2022;289:20221819. [PMID: 36448285 PMCID: PMC9709509 DOI: 10.1098/rspb.2022.1819] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/03/2022]

Bonner S, Barrett IP, Ye C, Swiers R, Engkvist O, Bender A, Hoyt CT, Hamilton WL. A review of biomedical datasets relating to drug discovery: a knowledge graph perspective. Brief Bioinform 2022;23:6712301. [PMID: 36151740 DOI: 10.1093/bib/bbac404] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/12/2022] [Revised: 07/14/2022] [Accepted: 08/20/2022] [Indexed: 12/14/2022]

ELIMINATOR: essentiality analysis using multisystem networks and integer programming. BMC Bioinformatics 2022;23:324. [PMID: 35933325 PMCID: PMC9357337 DOI: 10.1186/s12859-022-04855-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/21/2022] [Indexed: 11/28/2022]

Abstract

A gene is considered essential when it is indispensable for cells to grow and replicate in a certain environment. However, gene essentiality is not a structural property but a contextual one, which depends on the specific biological conditions affecting the cell. This circumstantial essentiality is what draws scientists' attention, since it allows us to identify genes that are essential for cancer cells but not for healthy cells. The same contextuality makes their identification extremely challenging. Large experimental efforts such as Project Achilles, in which the essentiality of thousands of genes is measured together with a plethora of molecular data (transcriptomics, copy number, mutations, etc.) in over one thousand cell lines, can shed light on the causes of a gene's essentiality in a given environment. Here, we present an in-silico method for identifying patient-specific essential genes using constraint-based modelling (CBM). Our method extends the ideas behind traditional CBM to accommodate multisystem networks. In essence, it first calculates the minimum number of lowly expressed genes that the cell must activate to sustain life, as defined by a set of requirements; second, it performs an exhaustive in-silico gene knockout to find the genes whose removal forces the activation of additional lowly expressed genes. We validated the proposed methodology using a set of 452 cancer cell lines derived from the Cancer Cell Line Encyclopedia, for which an exhaustive large-scale experimental CRISPR knockout study (Project Achilles) evaluated the impact of each gene removal. We also show that integrating different essentiality predictions per gene, which we call the Essentiality Congruity Score, reduces the number of false positives. Finally, we applied our method to a breast cancer patient dataset, and our results showed high concordance with previous publications.
These findings suggest that identifying genes whose activity is fundamental to sustaining cellular life in a patient-specific manner is feasible using in-silico methods. Patient-level gene essentiality predictions can pave the way for precision medicine by identifying potential drug targets whose deletion can induce death in tumour cells.
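The two-step logic described above (a minimum activation cost, then exhaustive knockouts that raise it) can be illustrated on a toy model. This Python sketch is not ELIMINATOR: it replaces integer programming over multisystem networks with brute-force enumeration, and its requirement format (each requirement met when all genes of any one alternative gene set are active) is a hypothetical simplification of constraint-based modelling.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Sequence, Set

# Hypothetical format: a requirement lists alternative gene sets; it is met when
# ALL genes of at least ONE alternative are active.
Requirement = Sequence[FrozenSet[str]]


def min_low_activations(reqs: Sequence[Requirement], low: Set[str],
                        knocked_out: Optional[str] = None) -> Optional[int]:
    """Fewest lowly expressed genes that must be active to meet every requirement.

    Returns None when the knockout leaves some requirement unsatisfiable.
    Brute-force enumeration stands in for an integer program.
    """
    options: List[List[FrozenSet[str]]] = []
    for alternatives in reqs:
        usable = [alt for alt in alternatives if knocked_out not in alt]
        if not usable:
            return None  # no way left to meet this requirement
        options.append(usable)
    best: Optional[int] = None
    for choice in product(*options):
        active = frozenset().union(*choice)
        cost = len(active & low)  # only lowly expressed genes count as a cost
        if best is None or cost < best:
            best = cost
    return best


def essential_genes(reqs: Sequence[Requirement], low: Set[str]) -> Set[str]:
    """Genes whose knockout raises the minimum cost or makes the model infeasible."""
    baseline = min_low_activations(reqs, low)
    genes = {g for alternatives in reqs for alt in alternatives for g in alt}
    flagged: Set[str] = set()
    for gene in genes:
        cost = min_low_activations(reqs, low, knocked_out=gene)
        if cost is None or (baseline is not None and cost > baseline):
            flagged.add(gene)
    return flagged
```

For example, with one requirement met by {A} or {B, C}, another met by {D}, and B and C lowly expressed, the baseline cost is 0; knocking out A forces B and C on (cost 2), and knocking out D leaves the second requirement unsatisfiable, so the sketch flags A and D. Enumeration grows exponentially with the number of requirements, which is why an integer-programming formulation is needed at genome scale.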

Yue Y, Ye C, Peng PY, Zhai HX, Ahmad I, Xia C, Wu YZ, Zhang YH. A deep learning framework for identifying essential proteins based on multiple biological information. BMC Bioinformatics 2022;23:318. [PMID: 35927611 PMCID: PMC9351218 DOI: 10.1186/s12859-022-04868-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/07/2022] [Accepted: 07/29/2022] [Indexed: 11/15/2022]

Abstract

Background

Essential proteins have been shown to perform vital functions in cellular processes and are indispensable for the survival and reproduction of the organism. Traditional centrality methods perform poorly on complex protein–protein interaction (PPI) networks, and machine learning approaches based on high-throughput data fail to exploit the temporal and spatial dimensions of biological information.

Results

We propose a deep learning framework that predicts essential proteins by integrating features obtained from the PPI network, subcellular localization, and gene expression profiles. In our model, the node2vec method is applied to learn continuous feature representations of proteins in the PPI network, capturing the diversity of connectivity patterns in the network. Depthwise separable convolution is applied to gene expression profiles to extract their properties and capture trends in gene expression over time under different experimental conditions. Subcellular localization information is mapped into a long one-dimensional vector to capture its characteristics. Additionally, we use a sampling method to mitigate the impact of imbalanced learning when training the model. In experiments on Saccharomyces cerevisiae data, our model outperforms traditional centrality methods and machine learning methods. Comparative experiments also show that our way of processing the various types of biological information is preferable.
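Depthwise separable convolution factors an ordinary convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) mixing of channels. The pure-Python sketch below shows the 1-D case on a channels-by-time matrix; treating rows as expression channels is an assumption for illustration, the function names are hypothetical, and the authors' layer sizes, padding, and framework are not stated here. As in deep learning libraries, "convolution" here is cross-correlation (no kernel flip), with no padding.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def depthwise_conv1d(x: Sequence[Sequence[float]],
                     kernels: Sequence[Sequence[float]]) -> Matrix:
    """Filter each channel with its own kernel; no mixing across channels.

    x: channels x time. kernels: one 1-D kernel per channel. 'Valid' output,
    so each channel shrinks by len(kernel) - 1 time steps.
    """
    out: Matrix = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([sum(channel[t + i] * kernel[i] for i in range(k))
                    for t in range(len(channel) - k + 1)])
    return out


def pointwise_conv1d(x: Sequence[Sequence[float]],
                     weights: Sequence[Sequence[float]]) -> Matrix:
    """1x1 convolution: each output channel is a weighted sum of input channels.

    weights: out_channels x in_channels.
    """
    length = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
            for row in weights]


def depthwise_separable_conv1d(x: Sequence[Sequence[float]],
                               kernels: Sequence[Sequence[float]],
                               weights: Sequence[Sequence[float]]) -> Matrix:
    """Depthwise filtering over time, then pointwise mixing across channels."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


if __name__ == "__main__":
    series = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # toy 2-channel, 3-step input
    print(depthwise_separable_conv1d(series, [[1.0, 1.0], [1.0, -1.0]],
                                     [[1.0, 1.0], [2.0, 0.0]]))
```

Ignoring biases, a standard 1-D convolution with C_in input channels, C_out output channels and kernel length k needs C_in * C_out * k weights, whereas the separable version needs C_in * k + C_in * C_out, which is the usual motivation for using it on modest training sets.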

Conclusions

Our proposed deep learning framework effectively identifies essential proteins by integrating multiple biological data. The results show that a broader selection of subcellular localization information significantly improves prediction and that applying depthwise separable convolution to gene expression profiles enhances performance.

Affiliation(s)

Yi Yue Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China; School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China; State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, China
Chen Ye Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
Pei-Yun Peng Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
Hui-Xin Zhai Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
Iftikhar Ahmad Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
Chuan Xia Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China
Yun-Zhi Wu Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China; State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, China
You-Hua Zhang Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, China; School of Information and Computer, Anhui Agricultural University, Hefei, 230036, China; School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China

Min W, Wan X, Chang TH, Zhang S. A Novel Sparse Graph-Regularized Singular Value Decomposition Model and Its Application to Genomic Data Analysis. IEEE Trans Neural Netw Learn Syst 2022;33:3842-3856. [PMID: 33556027 DOI: 10.1109/tnnls.2021.3054635] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]

ProB-Site: Protein Binding Site Prediction Using Local Features. Cells 2022;11:cells11132117. [PMID: 35805201 PMCID: PMC9266162 DOI: 10.3390/cells11132117] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/19/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 01/16/2023]

Li X, Xiang J, Wu FX, Li M. A Dual Ranking Algorithm Based on the Multiplex Network for Heterogeneous Complex Disease Analysis. IEEE/ACM Trans Comput Biol Bioinform 2022;19:1993-2002. [PMID: 33577455 DOI: 10.1109/tcbb.2021.3059046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]

Zhu Y, Zhang H, Yang Y, Zhang C, Ou-Yang L, Bai L, Deng M, Yi M, Liu S, Wang C. Discovery of pan-cancer related genes via integrative network analysis. Brief Funct Genomics 2022;21:325-338. [PMID: 35760070 DOI: 10.1093/bfgp/elac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/01/2022] [Revised: 05/14/2022] [Accepted: 05/25/2022] [Indexed: 01/02/2023]

Affiliation(s)

Yuan Zhu School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China; Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Handan Road, 200433, Shanghai, China
Houwang Zhang Electrical Engineering, City University of Hong Kong, Kowloon, 999077, Hong Kong, China
Yuanhang Yang School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
Chaoyang Zhang School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, USA
Le Ou-Yang Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Avenue, 518060, Shenzhen, China
Litai Bai School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China; Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
Minghua Deng School of Mathematical Sciences, Peking University, No.5 Yiheyuan Road, 100871, Beijing, China
Ming Yi School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
Song Liu School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China; Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
Chao Wang Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Department of Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue, 430030, Wuhan, China


Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. ENTROPY 2022;24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]

Mosharaf MP, Reza MS, Gov E, Mahumud RA, Mollah MNH. Disclosing Potential Key Genes, Therapeutic Targets and Agents for Non-Small Cell Lung Cancer: Evidence from Integrative Bioinformatics Analysis. Vaccines (Basel) 2022;10:vaccines10050771. [PMID: 35632527 PMCID: PMC9143695 DOI: 10.3390/vaccines10050771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 05/07/2022] [Accepted: 05/08/2022] [Indexed: 12/10/2022]

Abstract

Non-small-cell lung cancer (NSCLC) is considered one of the malignant cancers that cause premature death. The present study aimed to identify a few potential novel genes, highlighting their functions, pathways, and regulators, for the diagnosis, prognosis, and therapy of NSCLC using integrated bioinformatics approaches. First, we identified 1943 differentially expressed genes (DEGs) between NSCLC and control samples using the statistical LIMMA approach. We then selected 11 DEGs (CDK1, EGFR, FYN, UBC, MYC, CCNB1, FOS, RHOB, CDC6, CDC20, and CHEK1) as hub-DEGs (potential key genes) through protein–protein interaction network analysis of the DEGs. Regulatory network analysis of the DEGs and hub-DEGs commonly revealed four transcription factors (FOXC1, GATA2, YY1, and NFIC) and five miRNAs (miR-335-5p, miR-26b-5p, miR-92a-3p, miR-155-5p, and miR-16-5p) as the key transcriptional and post-transcriptional regulators of the DEGs as well as the hub-DEGs. We also characterized the pathogenetic processes of NSCLC by investigating the biological processes, molecular functions, cellular components, and KEGG pathways of the DEGs. Multivariate survival probability curves based on the expression of the hub-DEGs in the SurvExpress web tool and database showed significant differences between the low- and high-risk groups, indicating the strong prognostic power of the hub-DEGs. We then explored repurposable drugs guided by the top-ranked five hub-DEGs using the Connectivity Map (CMap) database. Among the selected drugs, we validated six FDA-approved launched drugs (Dinaciclib, Afatinib, Icotinib, Bosutinib, Dasatinib, and TWS-119) by molecular docking interaction analysis with the respective target proteins for the treatment of NSCLC. The detected therapeutic targets and repurposable drugs require further experimental study to establish them as potential biomarkers for precision medicine in NSCLC treatment.
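
The abstract describes the hub-DEG step only as protein–protein interaction network analysis. A common way to do this, assumed here and not necessarily the authors' exact procedure, is to rank DEGs by their degree in the PPI subnetwork induced on the DEG set and keep the top k. A minimal Python sketch; `hub_genes` is a hypothetical name and the gene identifiers in any usage are toy placeholders:

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def hub_genes(edges: Iterable[Tuple[str, str]], degs: Set[str], k: int) -> List[str]:
    """Return the k DEGs with the highest degree in the DEG-induced PPI subnetwork.

    Edges with an endpoint outside `degs`, self-loops, and duplicate
    (undirected) edges are ignored. Ties are broken alphabetically.
    """
    seen = set()
    degree: Counter = Counter()
    for a, b in edges:
        if a == b or a not in degs or b not in degs:
            continue
        pair = frozenset((a, b))
        if pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:k]
```

Studies of this kind often also use other topological scores (for example, betweenness or clique-based measures); degree is simply the easiest to show.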


Schapke J, Tavares A, Recamonde-Mendoza M. EPGAT: Gene Essentiality Prediction With Graph Attention Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:1615-1626. [PMID: 33497339 DOI: 10.1109/tcbb.2021.3054738] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]

Mahbub S, Bayzid MS. EGRET: edge aggregated graph attention networks and transfer learning improve protein-protein interaction site prediction. Brief Bioinform 2022;23:6518045. [PMID: 35106547 DOI: 10.1093/bib/bbab578] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/13/2021] [Revised: 11/25/2021] [Accepted: 12/16/2021] [Indexed: 12/18/2022]

Liu Y, Liang H, Zou Q, He Z. Significance-Based Essential Protein Discovery. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:633-642. [PMID: 32750873 DOI: 10.1109/tcbb.2020.3004364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Her HL, Lin PT, Wu YW. PangenomeNet: a pan-genome-based network reveals functional modules on antimicrobial resistome for Escherichia coli strains. BMC Bioinformatics 2021;22:548. [PMID: 34758735 PMCID: PMC8579557 DOI: 10.1186/s12859-021-04459-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Accepted: 10/19/2021] [Indexed: 11/30/2022]

Abstract

BACKGROUND

Discerning genes crucial to antimicrobial resistance (AMR) mechanisms is increasingly important for accurately and swiftly identifying AMR pathogenic strains. Pangenome-wide association studies (e.g., Scoary) have identified numerous putative AMR genes. However, only a tiny proportion of these putative resistance genes are annotated in AMR databases or the Gene Ontology, and many are of unknown function (termed hypothetical proteins). An annotation tool is therefore needed to reveal the functional organization of the resistome and expand our knowledge of the AMR gene repertoire.

RESULTS

We developed an approach (PangenomeNet) for building co-functional networks from pan-genomes to infer functions for hypothetical genes. Using Escherichia coli as an example, we demonstrated that a co-functional network can be built from its pan-genome using co-inheritance, domain-sharing, and protein-protein interaction information. Analysis of the network showed that it has the characteristics of biological networks and can be used for functional inference. The subgraph of putative meropenem resistance genes contains clusters of stress response genes and resistance gene acquisition pathways. Resistome subgraphs also reveal drug-specific AMR genes such as beta-lactamases, as well as functional roles shared among multiple drug classes, mostly in stress-related pathways.

CONCLUSIONS

By demonstrating the pan-genome-based co-functional network on the E. coli species, we showed that the network can infer the functional roles of genes, including those without functional annotations, and provides a holistic view of putative antimicrobial resistomes. We hope that the pan-genome network idea can help formulate hypotheses for targeted experimental work.
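
The network-building and inference steps described above can be illustrated with a small Python sketch. It is hypothetical and not the PangenomeNet implementation: co-inheritance is approximated by the Jaccard similarity of gene presence/absence across strains with an arbitrary threshold, each evidence type adds 1 to an edge's weight, and functions are inferred by a weighted guilt-by-association vote among annotated neighbors. All function names are invented for this illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[str, str]
Network = Dict[FrozenSet[str], int]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two strain sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_cofunctional_network(presence: Dict[str, Set[str]],
                               domain_edges: List[Edge],
                               ppi_edges: List[Edge],
                               threshold: float = 0.8) -> Network:
    """Edge weight = number of evidence types linking a gene pair.

    `presence` maps each gene to the set of strains that carry it.
    """
    weight: Network = defaultdict(int)
    # Evidence 1: co-inheritance, i.e. similar presence/absence across strains.
    for g1, g2 in combinations(sorted(presence), 2):
        if jaccard(presence[g1], presence[g2]) >= threshold:
            weight[frozenset((g1, g2))] += 1
    # Evidence 2 and 3: shared protein domains and protein-protein interactions.
    for edges in (domain_edges, ppi_edges):
        for pair in {frozenset(e) for e in edges if e[0] != e[1]}:
            weight[pair] += 1
    return dict(weight)


def infer_function(gene: str, network: Network,
                   annotations: Dict[str, str]) -> Optional[str]:
    """Guilt-by-association: weighted vote over annotated neighbors."""
    votes: Dict[str, int] = defaultdict(int)
    for pair, w in network.items():
        if gene in pair:
            (other,) = pair - {gene}
            label = annotations.get(other)
            if label is not None:
                votes[label] += w
    if not votes:
        return None
    # Highest total weight wins; ties go to the alphabetically first label.
    return max(sorted(votes), key=lambda lab: votes[lab])
```

A hypothetical gene linked to an annotated beta-lactamase by both co-inheritance and a shared domain (weight 2) would inherit that label over a neighbor supported by a single PPI edge (weight 1).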


Liu Y, Chen W, He Z. Essential Protein Recognition via Community Significance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2788-2794. [PMID: 34347602 DOI: 10.1109/tcbb.2021.3102018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]

Li X, Xiang J, Wang J, Li J, Wu FX, Li M. FUNMarker: Fusion Network-Based Method to Identify Prognostic and Heterogeneous Breast Cancer Biomarkers. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2483-2491. [PMID: 32070993 DOI: 10.1109/tcbb.2020.2973148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]

Liu Y, Wei X, Chen W, Hu L, He Z. A graph-traversal approach to identify influential nodes in a network. PATTERNS 2021;2:100321. [PMID: 34553168 PMCID: PMC8441579 DOI: 10.1016/j.patter.2021.100321] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/29/2021] [Revised: 05/16/2021] [Accepted: 07/07/2021] [Indexed: 11/19/2022]

Abstract

Influential node identification plays a significant role in understanding network structure and function. Here we propose a general method for detecting influential nodes in a graph-traversal framework. We evaluate the influence of each node by constructing a breadth-first search (BFS) tree rooted at the target node. From the BFS tree, we generate a curve whose x axis is the level number and whose y axis is the cumulative score of all nodes visited so far. We use the area under this curve as the final influence score of the target node. Experimental results on various networks across different domains demonstrate that our method can be significantly superior to widely used centrality measures on the task of influential node detection.
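
The scoring rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' TARank code. Three choices are assumptions: the base node score (degree in the usage below), the fixed x-axis range `max_level` shared by every root so that curves are comparable, and the trapezoidal rule for the area. `bfs_levels` and `traversal_auc` are names invented for this sketch.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def bfs_levels(adj: Graph, root: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search from root; returns each reachable node's BFS-tree level."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def traversal_auc(adj: Graph, root: Hashable, max_level: int,
                  node_score: Callable[[Hashable], float]) -> float:
    """Hypothetical TARank-style influence score for `root`.

    y(k) is the cumulative score of all nodes at BFS levels 0..k, for
    k = 0..max_level; the result is the trapezoidal area under y(k).
    """
    per_level = [0.0] * (max_level + 1)
    for node, lv in bfs_levels(adj, root).items():
        if lv <= max_level:
            per_level[lv] += node_score(node)
    curve: List[float] = []
    running = 0.0
    for s in per_level:
        running += s
        curve.append(running)
    # Unit spacing between consecutive levels on the x axis.
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(max_level))
```

On the path a–b–c with degree as the base score and `max_level=2`, the middle node b scores 7.0 and the endpoint a scores 5.5: b's curve rises faster because it reaches everything within one hop. Swapping in another base score is how this framework can "enhance" an existing centrality measure.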

• We propose an influential node detection method, TARank, in a graph-traversal framework
• We evaluate the influence of each node by constructing a breadth-first search tree
• TARank is capable of enhancing existing centrality measures
• TARank can yield new, yet effective, centrality measures as well

The discovery of influential nodes is a fundamental research issue in network science. Various methods have been presented in the literature to quantify the influence of each node in a network. To the best of our knowledge, no previous research has addressed the influential node identification problem from a graph-traversal perspective. To fill this gap, we propose TARank, a method that integrates the information collected from the breadth-first search tree to identify influential nodes. The formulation under the graph-traversal framework opens the door to a fundamentally new type of influential node identification method, and more effective methods can be expected to be built on this general framework in the future. Since empirical studies have validated the effectiveness of TARank, it could plausibly be employed in different applications to reveal new findings.


Protein Integrated Network Analysis to Reveal Potential Drug Targets Against Extended Drug-Resistant Mycobacterium tuberculosis XDR1219. Mol Biotechnol 2021;63:1252-1267. [PMID: 34382159 DOI: 10.1007/s12033-021-00377-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]

Li XY, Xiang J, Wu FX, Li M. NetAUC: A network-based multi-biomarker identification method by AUC optimization. Methods 2021;198:56-64. [PMID: 34364986 DOI: 10.1016/j.ymeth.2021.08.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2021] [Revised: 07/08/2021] [Accepted: 08/03/2021] [Indexed: 10/20/2022]

Integrated Inference of Asymmetric Protein Interaction Networks Using Dynamic Model and Individual Patient Proteomics Data. Symmetry (Basel) 2021. [DOI: 10.3390/sym13061097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023]

Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study. MATHEMATICAL AND COMPUTATIONAL APPLICATIONS 2021. [DOI: 10.3390/mca26020040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]

Schonfeld E, Vendrow E, Vendrow J, Schonfeld E. On the relation of gene essentiality to intron structure: a computational and deep learning approach. Life Sci Alliance 2021;4:4/6/e202000951. [PMID: 33906938 PMCID: PMC8127325 DOI: 10.26508/lsa.202000951] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/01/2020] [Revised: 04/12/2021] [Accepted: 04/15/2021] [Indexed: 11/24/2022]

Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021;22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022]

Lv Z, Cui F, Zou Q, Zhang L, Xu L. Anticancer peptides prediction with deep representation learning features. Brief Bioinform 2021;22:6126754. [PMID: 33529337 DOI: 10.1093/bib/bbab008] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 10/06/2020] [Revised: 12/20/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022]

Kumar N, Mishra B, Athar M, Mukhtar S. Inference of Gene Regulatory Network from Single-Cell Transcriptomic Data Using pySCENIC. Methods Mol Biol 2021;2328:171-182. [PMID: 34251625 DOI: 10.1007/978-1-0716-1534-8_10] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 05/06/2023]

Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]

Zeng M, Li M, Fei Z, Wu FX, Li Y, Pan Y, Wang J. A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:296-305. [PMID: 30736002 DOI: 10.1109/tcbb.2019.2897679] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 06/09/2023]

Nandi S, Ganguli P, Sarkar RR. Essential gene prediction using limited gene essentiality information-An integrative semi-supervised machine learning strategy. PLoS One 2020;15:e0242943. [PMID: 33253254 PMCID: PMC7703937 DOI: 10.1371/journal.pone.0242943] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/10/2020] [Accepted: 11/12/2020] [Indexed: 11/24/2022]

Abstract

Essential gene prediction helps to find the minimal set of genes indispensable for the survival of an organism. Machine learning (ML) algorithms have been useful for predicting gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. Our objective is to develop a new ML pipeline that helps annotate the essential genes of less-explored disease-causing organisms for which minimal experimental data are available. The proposed strategy combines an unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and a semi-supervised ML algorithm, the Laplacian Support Vector Machine (LapSVM), to predict essential and non-essential genes from genome-scale metabolic networks using a very limited labeled dataset. A novel scoring technique, the Semi-Supervised Model Selection Score, equivalent to the area under the ROC curve (auROC), is proposed for selecting the best model when supervised performance metrics are difficult to calculate owing to lack of data. Unsupervised feature selection followed by dimension reduction revealed a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle, classifying and predicting essential genes with high accuracy (auROC > 0.85) even with 1% labeled data for model training. After successful validation of the pipeline on both eukaryotes and prokaryotes, showing high accuracy even when the labeled dataset is very limited, this strategy was used to predict the essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Built on a graph-based semi-supervised machine learning scheme, this novel integrative approach to essential gene prediction applies to both prokaryotes and eukaryotes with limited labeled data.
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
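
LapSVM needs a quadratic-programming solver, so the numpy sketch below swaps in its close relative, Laplacian Regularized Least Squares (LapRLS, Belkin et al., 2006). LapRLS uses the same manifold-regularization idea (an RKHS norm plus a graph-Laplacian smoothness penalty over labeled and unlabeled points) but has a closed-form solution. This is not the authors' pipeline: the RBF kernel, the k-nearest-neighbor graph, and all hyperparameter values are assumptions, and the feature-selection and Kamada-Kawai steps are omitted.

```python
import numpy as np


def _sq_dists(X: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def knn_laplacian(X: np.ndarray, k: int) -> np.ndarray:
    """Unnormalized Laplacian L = D - W of a symmetrized k-nearest-neighbor graph."""
    d2 = _sq_dists(X)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(axis=1)) - W


def lap_rls(X, y_labeled, gamma_a=1e-2, gamma_i=1e-1, rbf_gamma=1.0, k=3):
    """Laplacian Regularized Least Squares.

    X holds all samples with the l labeled ones first; y_labeled is in {-1, +1}.
    Returns decision values f(x) for every sample (sign = predicted class).
    """
    n, l = X.shape[0], len(y_labeled)
    K = np.exp(-rbf_gamma * _sq_dists(X))            # RBF kernel matrix
    L = knn_laplacian(X, k)
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])  # selects labeled points
    Y = np.r_[np.asarray(y_labeled, dtype=float), np.zeros(n - l)]
    # Closed-form expansion coefficients: (JK + gA*l*I + gI*l/n^2 * LK) alpha = Y
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n ** 2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return K @ alpha
```

With two well-separated clusters and a single labeled point in each, the Laplacian term lets the labels spread to the unlabeled points of the same cluster, which mirrors the "very limited labeled data" setting described above.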


Liu X, Maiorino E, Halu A, Glass K, Prasad RB, Loscalzo J, Gao J, Sharma A. Robustness and lethality in multilayer biological molecular networks. Nat Commun 2020;11:6043. [PMID: 33247151 PMCID: PMC7699651 DOI: 10.1038/s41467-020-19841-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/26/2020] [Accepted: 10/26/2020] [Indexed: 12/27/2022]

Mazandu GK, Hooper C, Opap K, Makinde F, Nembaware V, Thomford NE, Chimusa ER, Wonkam A, Mulder NJ. IHP-PING-generating integrated human protein-protein interaction networks on-the-fly. Brief Bioinform 2020;22:5943797. [PMID: 33129201 DOI: 10.1093/bib/bbaa277] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/22/2020] [Revised: 09/12/2020] [Accepted: 09/21/2020] [Indexed: 01/04/2023]

Affiliation(s)

Gaston K Mazandu Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa; African Institute for Mathematical Sciences, 5-7 Melrose Road, Muizenberg, 7945, Cape Town, South Africa; Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
Christopher Hooper Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
Kenneth Opap Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
Funmilayo Makinde Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa; African Institute for Mathematical Sciences, 5-7 Melrose Road, Muizenberg, 7945, Cape Town, South Africa
Victoria Nembaware Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
Nicholas E Thomford Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa; School of Medical Sciences, University of Cape Coast, PMB, Cape Coast, Ghana
Emile R Chimusa Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
Ambroise Wonkam Division of Human Genetics, Department of Pathology, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
Nicola J Mulder Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, CIDRI-Africa WT Centre, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa


Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020;36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022]

Abstract

Background

Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem.

Results

In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal of naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits, and that it should be considered a staple of network-based gene classification workflows.
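
The two approaches compared above can be made concrete with a toy numpy sketch: the supervised model uses each gene's full adjacency-matrix row as its feature vector, while label propagation diffuses seed labels over the normalized network. This is not the authors' benchmark code. The gradient-descent logistic-regression learner, the Zhou et al.-style propagation rule, and every hyperparameter are assumptions chosen for illustration, and both function names are hypothetical.

```python
import numpy as np


def label_propagation(A: np.ndarray, y: np.ndarray, alpha: float = 0.8,
                      iters: int = 200) -> np.ndarray:
    """Iterative label propagation: f <- alpha * S f + (1 - alpha) * y,
    with the symmetrically normalized adjacency S = D^-1/2 A D^-1/2."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    S = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    f = y.astype(float)
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f


def supervised_on_rows(A, train_idx, y_train, lr=0.5, epochs=500, l2=1e-3):
    """L2-regularized logistic regression, trained by gradient descent, whose
    feature vector for each gene is its full adjacency-matrix row."""
    X = A[train_idx]
    y_train = np.asarray(y_train, dtype=float)
    w, b = np.zeros(A.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y_train
        w -= lr * (X.T @ err / len(y_train) + l2 * w)
        b -= lr * err.mean()
    return 1.0 / (1.0 + np.exp(-(A @ w + b)))   # scores for every gene
```

The design difference is visible in the code: label propagation only uses the graph to smooth fixed seed labels, whereas the supervised model learns a weight per network neighbor, so it can pick up on which specific connections are predictive of the gene class.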

Availability and implementation

The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

Contact

arjun@msu.edu

Supplementary information

Supplementary data are available at Bioinformatics online.


Khorsand B, Savadi A, Naghibzadeh M. Comprehensive host-pathogen protein-protein interaction network analysis. BMC Bioinformatics 2020;21:400. [PMID: 32912135 PMCID: PMC7488060 DOI: 10.1186/s12859-020-03706-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/01/2020] [Accepted: 07/31/2020] [Indexed: 12/27/2022]

Abstract

BACKGROUND

Infectious diseases claim millions of victims around the world each year. Understanding the infection mechanisms of viruses is indispensable for inhibiting them. One of the best ways to unveil these mechanisms is to investigate the host-pathogen protein-protein interaction network. In this paper we examine many properties of this network. We focus on humans as the host and integrate 32,859 experimentally determined interactions between human proteins and virus proteins from several databases. We investigate different properties of the human proteins targeted by virus proteins and find that most of them have considerably high centrality scores in the human intraspecies protein-protein interaction network. Investigating the network properties of human proteins targeted by different virus proteins can help in designing multipurpose drugs.

RESULTS

Because the host-pathogen protein-protein interaction network is bipartite and centrality measures for this type of network are scarce, we propose seven new centrality measures for analyzing bipartite networks. Applying them to different virus strains reveals the non-randomness of the attack strategies of virus proteins, which could aid drug design and thereby improve quality of life. The measures could also be used to detect host essential proteins, i.e., proteins whose functions are critical for the survival of the host. One of the proposed centralities, named diversity of predators, outperforms the other existing centralities in detecting essential proteins and could be used as an optimal marker of essential proteins.

CONCLUSIONS

Different centralities were applied to analyze the human protein-protein interaction network and to detect characteristics of the human proteins targeted by virus proteins. Moreover, seven new centralities were proposed to analyze the host-pathogen protein-protein interaction network and to detect pathogens' favorite host protein victims. Comparing different centralities in detecting essential proteins reveals that diversity of predators (one of the proposed centralities) is the best essential protein marker.
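
The abstract does not define "diversity of predators". Purely as a hypothetical reading, the Python sketch below counts, for each human protein in the host-pathogen bipartite network, how many distinct virus proteins target it (its bipartite degree) and how many distinct virus species those proteins come from, then ranks proteins by the latter. The paper's actual formula may differ; the `(human_protein, virus_protein, virus_species)` input format and both function names are invented for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def predator_counts(interactions: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[int, int]]:
    """interactions: (human_protein, virus_protein, virus_species) triples.

    Returns {human_protein: (bipartite degree, number of distinct virus species)}.
    """
    partners: Dict[str, Set[str]] = defaultdict(set)
    species: Dict[str, Set[str]] = defaultdict(set)
    for human, vprot, vspecies in interactions:
        partners[human].add(vprot)
        species[human].add(vspecies)
    return {h: (len(partners[h]), len(species[h])) for h in partners}


def rank_by_diversity(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Most species-diverse "predators" first; ties broken by degree, then name."""
    return sorted(counts, key=lambda h: (-counts[h][1], -counts[h][0], h))
```

Under this reading, a protein hit by two different viruses ranks above one hit three times by a single virus, which is the intuition a diversity-based marker would capture and plain bipartite degree would not.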


Zhang X, Xiao W, Xiao W. DeepHE: Accurately predicting human essential genes based on deep learning. PLoS Comput Biol 2020;16:e1008229. [PMID: 32936825 PMCID: PMC7521708 DOI: 10.1371/journal.pcbi.1008229] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/02/2020] [Revised: 09/28/2020] [Accepted: 08/09/2020] [Indexed: 11/19/2022]

Abstract

Accurately predicting essential genes with computational methods can greatly reduce the time and resources needed to find them via wet-lab experiments, and further accelerate drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources, either via centrality measures or machine learning. However, methods for predicting human essential genes are still limited, and their performance needs improvement. In addition, most machine learning based essential gene prediction methods lack mechanisms to handle the class imbalance inherent in the essential gene prediction problem, which might be one factor affecting their performance. We propose a deep learning based method, DeepHE, to predict human essential genes by integrating features derived from sequence data and the protein-protein interaction (PPI) network. A deep learning based network embedding method is used to automatically learn features from the PPI network. In addition, 89 sequence features were derived from the DNA and protein sequences of each gene. These two types of features are integrated to train a multilayer neural network. A cost-sensitive technique is used to address the imbalanced learning problem when training the deep neural network. The experimental results for predicting human essential genes show that DeepHE can accurately predict human gene essentiality, with an average AUC higher than 94%, an area under the precision-recall curve (AP) higher than 90%, and an accuracy higher than 90%. We also compare DeepHE with several widely used traditional machine learning models (SVM, Naïve Bayes, Random Forest, and Adaboost) using the same features and the same cost-sensitive technique to counter the imbalanced learning issue.
The experimental results show that DeepHE significantly outperforms the compared machine learning models. We have demonstrated that human essential genes can be accurately predicted by designing an effective machine learning algorithm and integrating representative features captured from available biological data. The proposed deep learning framework is effective for this task.
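
The abstract does not say which cost-sensitive technique DeepHE uses. One common choice, assumed here for illustration only, is to weight the binary cross-entropy so that each class contributes equally to the loss, with weights inversely proportional to class frequency. A minimal numpy sketch; both function names are hypothetical:

```python
import numpy as np


def balanced_class_weights(y: np.ndarray):
    """Weights inversely proportional to class frequency: n / (2 * n_class)."""
    n = len(y)
    n_pos = float(np.sum(y == 1))
    n_neg = n - n_pos
    return n / (2.0 * n_pos), n / (2.0 * n_neg)


def weighted_bce(p: np.ndarray, y: np.ndarray, w_pos: float, w_neg: float,
                 eps: float = 1e-12) -> float:
    """Mean class-weighted binary cross-entropy; p are predicted P(essential)."""
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    losses = -(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1.0 - p))
    return float(np.mean(losses))
```

With one essential gene among four, the rare positive receives weight 2.0 and each negative 2/3, so both classes contribute equally to the total loss and the network cannot minimize it simply by predicting "non-essential" everywhere.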
