Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kulmanov M, Khan MA, Hoehndorf R, Wren J. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2018;34:660-668. [PMID: 29028931 PMCID: PMC5860606 DOI: 10.1093/bioinformatics/btx624] [Citation(s) in RCA: 212] [Impact Index Per Article: 35.3] [Received: 05/10/2017] [Accepted: 09/27/2017] [Indexed: 12/29/2022] Open

Number

Cited by Other Article(s)

151

Zhang F, Cetin Karayumak S, Hoffmann N, Rathi Y, Golby AJ, O'Donnell LJ. Deep white matter analysis (DeepWMA): Fast and consistent tractography segmentation. Med Image Anal 2020;65:101761. [PMID: 32622304 PMCID: PMC7483951 DOI: 10.1016/j.media.2020.101761] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 09/23/2019] [Revised: 06/16/2020] [Accepted: 06/18/2020] [Indexed: 02/07/2023]

152

Dutta P, Mishra P, Saha S. Incomplete multi-view gene clustering with data regeneration using Shape Boltzmann Machine. Comput Biol Med 2020;125:103965. [PMID: 32931989 DOI: 10.1016/j.compbiomed.2020.103965] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/30/2020] [Revised: 08/08/2020] [Accepted: 08/08/2020] [Indexed: 11/17/2022]

153

Hew B, Tan QW, Goh W, Ng JWX, Mutwil M. LSTrAP-Crowd: prediction of novel components of bacterial ribosomes with crowd-sourced analysis of RNA sequencing data. BMC Biol 2020;18:114. [PMID: 32883264 PMCID: PMC7470450 DOI: 10.1186/s12915-020-00846-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/21/2020] [Accepted: 08/12/2020] [Indexed: 12/12/2022] Open

154

Ranjan A, Fahad MS, Fernandez-Baca D, Deepak A, Tripathi S. Deep Robust Framework for Protein Function Prediction Using Variable-Length Protein Sequences. IEEE/ACM Trans Comput Biol Bioinform 2020;17:1648-1659. [PMID: 30998479 DOI: 10.1109/tcbb.2019.2911609] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]

155

Fan K, Zhang Y. Pseudo2GO: A Graph-Based Deep Learning Method for Pseudogene Function Prediction by Borrowing Information From Coding Genes. Front Genet 2020;11:807. [PMID: 33014009 PMCID: PMC7461887 DOI: 10.3389/fgene.2020.00807] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/26/2020] [Accepted: 07/06/2020] [Indexed: 12/16/2022] Open

156

Saha S, Prasad A, Chatterjee P, Basu S, Nasipuri M. Protein function prediction from dynamic protein interaction network using gene expression data. J Bioinform Comput Biol 2020;17:1950025. [PMID: 31617461 DOI: 10.1142/s0219720019500252] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]

157

Fan K, Guan Y, Zhang Y. Graph2GO: a multi-modal attributed network embedding method for inferring protein functions. Gigascience 2020;9:giaa081. [PMID: 32770210 PMCID: PMC7414417 DOI: 10.1093/gigascience/giaa081] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/14/2019] [Revised: 04/30/2020] [Indexed: 01/17/2023] Open

158

Le DH. UFO: A tool for unifying biomedical ontology-based semantic similarity calculation, enrichment analysis and visualization. PLoS One 2020;15:e0235670. [PMID: 32645039 PMCID: PMC7347127 DOI: 10.1371/journal.pone.0235670] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/17/2019] [Accepted: 06/22/2020] [Indexed: 02/06/2023] Open

Abstract

Background

Biomedical ontologies have grown quickly and have proven useful in many biomedical applications. Important applications of these data include estimating the functional similarity between ontology terms and between annotated biomedical entities, and analyzing enrichment for a set of biomedical entities. Many semantic similarity calculation and enrichment analysis methods have been proposed for such applications, and a number of tools implementing them have been developed on different platforms. However, each of these tools implements only a small number of the methods, and only for a certain type of biomedical ontology, even though the methods can be applied to all types of biomedical ontologies. More importantly, different methods can be dominant in different applications; thus, users have more choice when tools implement a larger number of methods. Additional functions would also facilitate their work with ontologies.

Results

In this study, we developed a Cytoscape app, named UFO, which unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for all types of biomedical ontologies in OBO format. Based on the similarity calculation, UFO can calculate the similarity between two sets of entities and weigh imported entity networks as well as generate functional similarity networks. Besides, it can perform enrichment analysis of a set of entities by different methods. Moreover, UFO can visualize structural relationships between ontology terms, annotating relationships between entities and terms, and functional similarity between entities. Finally, we demonstrated the ability of UFO through some case studies on finding the best semantic similarity measures for assessing the similarity between human disease phenotypes, constructing biomedical entity functional similarity networks for predicting disease-associated biomarkers, and performing enrichment analysis on a set of similar phenotypes.

Conclusions

Taken together, UFO is expected to be a tool where biomedical ontologies can be exploited for various biomedical applications.

Availability

UFO is distributed as a Cytoscape app and can be downloaded freely from the Cytoscape App Store (http://apps.cytoscape.org/apps/ufo) for non-commercial use.

159

Liu Z, Chen Q, Lan W, Liang J, Chen YPP, Chen B. A Survey of Network Embedding for Drug Analysis and Prediction. Curr Protein Pept Sci 2020;22:CPPS-EPUB-107859. [PMID: 32614745 DOI: 10.2174/1389203721666200702145701] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/01/2020] [Revised: 04/05/2020] [Accepted: 05/21/2020] [Indexed: 11/22/2022]

160

Lennox M, Robertson N, Devereux B. Expanding the Vocabulary of a Protein: Application of Subword Algorithms to Protein Sequence Modelling. Annu Int Conf IEEE Eng Med Biol Soc 2020;2020:2361-2367. [PMID: 33018481 DOI: 10.1109/embc44109.2020.9176380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

161

Poverennaya E, Kiseleva O, Romanova A, Pyatnitskiy M. Predicting Functions of Uncharacterized Human Proteins: From Canonical to Proteoforms. Genes (Basel) 2020;11:E677. [PMID: 32575886 PMCID: PMC7350264 DOI: 10.3390/genes11060677] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/30/2020] [Revised: 06/09/2020] [Accepted: 06/19/2020] [Indexed: 01/22/2023] Open

162

ElAbd H, Bromberg Y, Hoarfrost A, Lenz T, Franke A, Wendorff M. Amino acid encoding for deep learning applications. BMC Bioinformatics 2020;21:235. [PMID: 32517697 PMCID: PMC7285590 DOI: 10.1186/s12859-020-03546-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/20/2020] [Accepted: 05/12/2020] [Indexed: 12/25/2022] Open

Abstract

BACKGROUND

The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction, a process called 'end-to-end learning', has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN.

RESULTS

By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension.

CONCLUSION

Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.
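The random-vector baseline recommended in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 8-dimensional baseline, the uniform sampling range and the example peptide are assumptions made for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_table():
    """Map each residue to a 20-dimensional one-hot vector."""
    size = len(AMINO_ACIDS)
    return {
        aa: [1.0 if i == j else 0.0 for j in range(size)]
        for i, aa in enumerate(AMINO_ACIDS)
    }


def random_table(dim, seed=0):
    """Map each residue to a fixed random vector of length `dim`.

    Hypothetical baseline: values are drawn uniformly from [-1, 1];
    the abstract does not say how the random vectors were sampled.
    """
    rng = random.Random(seed)
    return {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def encode(sequence, table):
    """Encode a protein sequence as a list of per-residue vectors."""
    return [table[aa] for aa in sequence]


peptide = "MKV"  # hypothetical example sequence
one_hot = encode(peptide, one_hot_table())
baseline = encode(peptide, random_table(dim=8))
```

Training the same model once per table, with matched dimensions, is one way to separate what an encoding contributes from the mere distinguishability of residues.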

163

Mutwil M. Computational approaches to unravel the pathways and evolution of specialized metabolism. Curr Opin Plant Biol 2020;55:38-46. [PMID: 32200228 DOI: 10.1016/j.pbi.2020.01.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/11/2019] [Revised: 01/19/2020] [Accepted: 01/31/2020] [Indexed: 05/13/2023]

164

Crawford J, Greene CS. Incorporating biological structure into machine learning models in biomedicine. Curr Opin Biotechnol 2020;63:126-134. [PMID: 31962244 PMCID: PMC7308204 DOI: 10.1016/j.copbio.2019.12.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/17/2019] [Revised: 12/17/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022]

165

You R, Yao S, Xiong Y, Huang X, Sun F, Mamitsuka H, Zhu S. NetGO: improving large-scale protein function prediction with massive network information. Nucleic Acids Res 2020;47:W379-W387. [PMID: 31106361 PMCID: PMC6602452 DOI: 10.1093/nar/gkz388] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 03/04/2019] [Revised: 04/24/2019] [Accepted: 05/01/2019] [Indexed: 01/19/2023] Open

Affiliation(s)

Ronghui You: School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
Shuwei Yao: School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
Yi Xiong: Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University
Xiaodi Huang: School of Computing and Mathematics, Charles Sturt University, Albury, NSW 2640, Australia
Fengzhu Sun: Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
Hiroshi Mamitsuka: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan; Department of Computer Science, Aalto University, Espoo and Helsinki, Finland
Shanfeng Zhu: School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China

166

Jin S, Zeng X, Xia F, Huang W, Liu X. Application of deep learning methods in biological networks. Brief Bioinform 2020;22:1902-1917. [PMID: 32363401 DOI: 10.1093/bib/bbaa043] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 12/25/2019] [Revised: 02/19/2020] [Accepted: 03/05/2020] [Indexed: 01/07/2023] Open

167

Cai Y, Wang J, Deng L. SDN2GO: An Integrated Deep Learning Model for Protein Function Prediction. Front Bioeng Biotechnol 2020;8:391. [PMID: 32411695 PMCID: PMC7201018 DOI: 10.3389/fbioe.2020.00391] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/13/2020] [Accepted: 04/07/2020] [Indexed: 02/01/2023] Open

168

Sangphukieo A, Laomettachit T, Ruengjitchatchawalya M. Photosynthetic protein classification using genome neighborhood-based machine learning feature. Sci Rep 2020;10:7108. [PMID: 32346070 PMCID: PMC7189237 DOI: 10.1038/s41598-020-64053-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/17/2019] [Accepted: 04/07/2020] [Indexed: 11/08/2022] Open

169

Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020;11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open

170

Thapa N, Chaudhari M, McManus S, Roy K, Newman RH, Saigo H, Kc DB. DeepSuccinylSite: a deep learning based approach for protein succinylation site prediction. BMC Bioinformatics 2020;21:63. [PMID: 32321437 PMCID: PMC7178942 DOI: 10.1186/s12859-020-3342-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 11/20/2019] [Accepted: 01/08/2020] [Indexed: 01/15/2023] Open

Abstract

Background

Protein succinylation has recently emerged as an important and common post-translational modification (PTM) that occurs on lysine residues. Succinylation is notable both in its size (e.g., at 100 Da, it is one of the larger chemical PTMs) and in its ability to modify the net charge of the modified lysine residue from +1 to −1 at physiological pH. The gross local changes that occur in proteins upon succinylation have been shown to correspond with changes in gene activity and to be perturbed by defects in the citric acid cycle. These observations, together with the fact that succinate is generated as a metabolic intermediate during cellular respiration, have led to suggestions that protein succinylation may play a role in the interaction between cellular metabolism and important cellular functions. For instance, succinylation likely represents an important aspect of genomic regulation and repair and may have important consequences in the etiology of a number of disease states. In this study, we developed DeepSuccinylSite, a novel prediction tool that uses deep learning methodology along with embedding to identify succinylation sites in proteins based on their primary structure.

Results

Using an independent test set of experimentally identified succinylation sites, our method achieved a sensitivity of 79%, a specificity of 68.7% and a Matthews correlation coefficient (MCC) of 0.48, with an area under the receiver operating characteristic (ROC) curve of 0.8. In side-by-side comparisons with previously described succinylation predictors, DeepSuccinylSite represents a significant improvement in overall accuracy for prediction of succinylation sites.
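For readers unfamiliar with these measures, the sketch below shows how sensitivity, specificity and MCC are derived from a binary confusion matrix. The counts are hypothetical: they were chosen so that the rates match the reported ones under an assumed balanced test set of 1000 positives and 1000 negatives, and they are not the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (assumed balanced test set).
sens, spec, mcc = binary_metrics(tp=790, fp=313, tn=687, fn=210)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.2f}")
```

Unlike sensitivity and specificity, MCC depends on the class balance of the test set, so the same two rates can correspond to a different MCC when positives and negatives are not equally represented.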

Conclusion

Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein succinylation.

171

Strodthoff N, Wagner P, Wenzel M, Samek W. UDSMProt: universal deep sequence models for protein classification. Bioinformatics 2020;36:2401-2409. [PMID: 31913448 PMCID: PMC7178389 DOI: 10.1093/bioinformatics/btaa003] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 09/02/2019] [Revised: 12/13/2019] [Accepted: 01/02/2020] [Indexed: 01/03/2023] Open

172

Peng J, Xue H, Wei Z, Tuncali I, Hao J, Shang X. Integrating multi-network topology for gene function prediction using deep neural networks. Brief Bioinform 2020;22:2096-2105. [PMID: 32249297 DOI: 10.1093/bib/bbaa036] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 12/02/2019] [Revised: 02/09/2020] [Accepted: 02/25/2020] [Indexed: 01/18/2023] Open

173

Fabris F, Palmer D, Salama KM, de Magalhães JP, Freitas AA. Using deep learning to associate human genes with age-related diseases. Bioinformatics 2020;36:2202-2208. [PMID: 31845988 PMCID: PMC7141856 DOI: 10.1093/bioinformatics/btz887] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/08/2019] [Revised: 09/06/2019] [Accepted: 12/13/2019] [Indexed: 11/15/2022] Open

174

The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization. Nat Ecol Evol 2020;4:841-852. [PMID: 32231327 PMCID: PMC7269910 DOI: 10.1038/s41559-020-1166-x] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 08/19/2019] [Accepted: 02/27/2020] [Indexed: 12/20/2022]

175

Makrodimitris S, van Ham RCHJ, Reinders MJT. Improving protein function prediction using protein sequence and GO-term similarities. Bioinformatics 2020;35:1116-1124. [PMID: 30169569 PMCID: PMC6449755 DOI: 10.1093/bioinformatics/bty751] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/25/2017] [Revised: 07/04/2018] [Accepted: 08/28/2018] [Indexed: 12/26/2022] Open

176

Yue X, Wang Z, Huang J, Parthasarathy S, Moosavinasab S, Huang Y, Lin SM, Zhang W, Zhang P, Sun H. Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 2020;36:1241-1251. [PMID: 31584634 PMCID: PMC7703771 DOI: 10.1093/bioinformatics/btz718] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 04/09/2019] [Revised: 08/25/2019] [Accepted: 09/26/2019] [Indexed: 01/12/2023] Open

Abstract

MOTIVATION

Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art.

RESULTS

We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction and protein-protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification and protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDA, DDI and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.

AVAILABILITY AND IMPLEMENTATION

As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

177

de Jongh RP, van Dijk AD, Julsing MK, Schaap PJ, de Ridder D. Designing Eukaryotic Gene Expression Regulation Using Machine Learning. Trends Biotechnol 2020;38:191-201. [DOI: 10.1016/j.tibtech.2019.07.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/08/2019] [Revised: 07/12/2019] [Accepted: 07/19/2019] [Indexed: 12/11/2022]

178

Mishra R, Mohapatra R, Mahanty B, Joshi RK. Analysis of microRNAs and their targets from onion (Allium cepa) using genome survey sequences (GSS) and expressed sequence tags (ESTs). Bioinformation 2019;15:907-917. [PMID: 32256010 PMCID: PMC7088424 DOI: 10.6026/97320630015907] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/27/2019] [Revised: 12/30/2019] [Accepted: 12/31/2019] [Indexed: 01/31/2023] Open

179

Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019;22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023] Open

180

Functional Gene Network of Prenyltransferases in Arabidopsis thaliana. Molecules 2019;24:molecules24244556. [PMID: 31842481 PMCID: PMC6943727 DOI: 10.3390/molecules24244556] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/07/2019] [Revised: 12/09/2019] [Accepted: 12/10/2019] [Indexed: 12/17/2022] Open

181

Wang J, Zhang J, Cai Y, Deng L. DeepMiR2GO: Inferring Functions of Human MicroRNAs Using a Deep Multi-Label Classification Model. Int J Mol Sci 2019;20:E6046. [PMID: 31801264 PMCID: PMC6928926 DOI: 10.3390/ijms20236046] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/26/2019] [Revised: 11/25/2019] [Accepted: 11/26/2019] [Indexed: 01/08/2023] Open

182

Bonetta R, Valentino G. Machine learning techniques for protein function prediction. Proteins 2019;88:397-413. [PMID: 31603244 DOI: 10.1002/prot.25832] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 03/19/2019] [Revised: 07/05/2019] [Accepted: 09/17/2019] [Indexed: 12/17/2022]

183

Mishra S, Rastogi YP, Jabin S, Kaur P, Amir M, Khatun S. A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species. Comput Biol Chem 2019;83:107147. [PMID: 31698160 DOI: 10.1016/j.compbiolchem.2019.107147] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/03/2019] [Revised: 10/05/2019] [Accepted: 10/09/2019] [Indexed: 01/06/2023]

184

Mahapatra M, Mahanty B, Joshi RK. Genome wide identification and functional assignments of C₂H₂ Zinc-finger family transcription factors in Dichanthelium oligosanthes. Bioinformation 2019;15:689-696. [PMID: 31787818 PMCID: PMC6859702 DOI: 10.6026/97320630015689] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 10/16/2019] [Indexed: 12/23/2022] Open

185

Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion 2019;50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Indexed: 05/10/2023]

186

Investigation of machine learning techniques on proteomics: A comprehensive survey. Prog Biophys Mol Biol 2019;149:54-69. [PMID: 31568792 DOI: 10.1016/j.pbiomolbio.2019.09.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/05/2019] [Revised: 09/16/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022]

187

Nakano FK, Lietaert M, Vens C. Machine learning for discovering missing or wrong protein function annotations: A comparison using updated benchmark datasets. BMC Bioinformatics 2019;20:485. [PMID: 31547800 PMCID: PMC6755698 DOI: 10.1186/s12859-019-3060-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/07/2019] [Accepted: 08/27/2019] [Indexed: 12/22/2022] Open

Abstract

BACKGROUND

A massive amount of proteomic data is generated on a daily basis; nonetheless, annotating all sequences is costly and often infeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical multi-label classification (HMC) methods to predict annotations, using the Functional Catalogue (FunCat) or Gene Ontology (GO) label hierarchies. Most of these studies employed benchmark datasets created more than a decade ago, and thus train their models on outdated information. In this work, we provide an updated version of these datasets. By querying recent versions of FunCat and GO yeast annotations, we provide 24 new datasets in total. We compare four HMC methods, providing baseline results for the new datasets. Furthermore, we also evaluate whether the predictive models are able to discover new or wrong annotations, by training them on the old data and evaluating their results against the most recent information.

RESULTS

The results demonstrated that the method based on predictive clustering trees, Clus-Ensemble, proposed in 2008, achieved superior results compared to more recent methods on the standard evaluation task. For the discovery of new knowledge, Clus-Ensemble performed better when discovering new annotations in the FunCat taxonomy, whereas hierarchical multi-label classification with genetic algorithm (HMC-GA), a method based on genetic algorithms, was overall superior when detecting annotations that were removed. In the GO datasets, Clus-Ensemble once again had the upper hand when discovering new annotations, whereas HMC-GA performed better for detecting removed annotations. However, in this evaluation, there were less significant differences among the methods.

CONCLUSIONS

The experiments have shown that protein function prediction is a very challenging task which should be further investigated. We believe that the baseline results associated with the updated datasets provided in this work should be considered as guidelines for future studies; nonetheless, the old versions of the datasets should not be disregarded, since other tasks in machine learning could benefit from them.

188

Kulmanov M, Hoehndorf R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 2019;36:422-429. [PMID: 31350877 PMCID: PMC9883727 DOI: 10.1093/bioinformatics/btz595] [Citation(s) in RCA: 133] [Impact Index Per Article: 26.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 07/01/2019] [Accepted: 07/24/2019] [Indexed: 02/03/2023] Open

189

Chen H, Shaw D, Zeng J, Bu D, Jiang T. DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning. Bioinformatics 2019;35:i284-i294. [PMID: 31510699 PMCID: PMC6612874 DOI: 10.1093/bioinformatics/btz367] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open

Abstract

MOTIVATION

Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory for at least two reasons: (i) unlike gene-level annotations, isoform-level functional annotations are scarce; and (ii) the information on isoform functions is concealed in various types of data, including isoform sequences and co-expression relationships among isoforms.

RESULTS

In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework by first using a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refining the prediction using a conditional random field (CRF) based on co-expression relationships. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE could effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristic curve of 0.840 and area under the precision-recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than those of state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences.

AVAILABILITY AND IMPLEMENTATION

https://github.com/haochenucr/DIFFUSE.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

190

Lv Z, Ao C, Zou Q. Protein Function Prediction: From Traditional Classifier to Deep Learning. Proteomics 2019;19:e1900119. [PMID: 31187588 DOI: 10.1002/pmic.201900119] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2019] [Revised: 05/20/2019] [Indexed: 11/12/2022]

191

Teso S, Masera L, Diligenti M, Passerini A. Combining learning and constraints for genome-wide protein annotation. BMC Bioinformatics 2019;20:338. [PMID: 31208327 PMCID: PMC6580517 DOI: 10.1186/s12859-019-2875-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2018] [Accepted: 05/03/2019] [Indexed: 11/28/2022] Open

192

Saha S, Chatterjee P, Basu S, Nasipuri M, Plewczynski D. FunPred 3.0: improved protein function prediction using protein interaction network. PeerJ 2019;7:e6830. [PMID: 31198622 PMCID: PMC6535044 DOI: 10.7717/peerj.6830] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2018] [Accepted: 03/21/2019] [Indexed: 11/23/2022] Open

Abstract

Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is done routinely at the population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that are not yet characterized by detailed experimental studies. Identification of protein functions experimentally is a laborious and time-consuming task involving many resources. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset includes 4,554 unique proteins in 13,528 protein–protein interactions after the elimination of the self-replicating and the self-interacting protein pairs. Using the developed FunPred 3.0 tool, we achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are available freely at: https://github.com/SovanSaha/FunPred-3.0.git.
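
As a quick sanity check on the figures quoted in this abstract, the sketch below computes the F-score as the harmonic mean of precision and recall. It assumes the standard balanced (F1) definition; the abstract does not state which variant was used or how the means were aggregated, and the function name `f_score` is illustrative, not taken from the FunPred 3.0 code.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the standard F1 definition; the abstract does not say
    which F-score variant the authors used.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall as quoted in the abstract.
reported_precision = 0.55
reported_recall = 0.82

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.66
```

Under that assumption, the harmonic mean of the quoted mean precision and recall rounds to the quoted F-score of 0.66. A mean of per-protein F-scores would not in general equal this value, so the agreement is only a consistency check.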

193

Zhang F, Song H, Zeng M, Li Y, Kurgan L, Li M. DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions. Proteomics 2019;19:e1900019. [PMID: 30941889 DOI: 10.1002/pmic.201900019] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2019] [Revised: 03/18/2019] [Indexed: 01/06/2023]

194

Zhu J, Zhao Q, Katsevich E, Sabatti C. Exploratory Gene Ontology Analysis with Interactive Visualization. Sci Rep 2019;9:7793. [PMID: 31127124 PMCID: PMC6534545 DOI: 10.1038/s41598-019-42178-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2018] [Accepted: 03/14/2019] [Indexed: 12/17/2022] Open

195

Stephenson N, Shane E, Chase J, Rowland J, Ries D, Justice N, Zhang J, Chan L, Cao R. Survey of Machine Learning Techniques in Drug Discovery. Curr Drug Metab 2019;20:185-193. [DOI: 10.2174/1389200219666180820112457] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2017] [Revised: 01/01/2018] [Accepted: 03/19/2018] [Indexed: 12/19/2022]

196

Sureyya Rifaioglu A, Doğan T, Jesus Martin M, Cetin-Atalay R, Atalay V. DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks. Sci Rep 2019;9:7344. [PMID: 31089211 PMCID: PMC6517386 DOI: 10.1038/s41598-019-43708-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2018] [Accepted: 04/27/2019] [Indexed: 01/22/2023] Open

197

Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nat Mach Intell 2019. [DOI: 10.1038/s42256-019-0049-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

198

Nelson W, Zitnik M, Wang B, Leskovec J, Goldenberg A, Sharan R. To Embed or Not: Network Embedding as a Paradigm in Computational Biology. Front Genet 2019;10:381. [PMID: 31118945 PMCID: PMC6504708 DOI: 10.3389/fgene.2019.00381] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2019] [Accepted: 04/09/2019] [Indexed: 12/20/2022] Open

199

Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 2019;166:4-21. [PMID: 31022451 DOI: 10.1016/j.ymeth.2019.04.008] [Citation(s) in RCA: 134] [Impact Index Per Article: 26.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2018] [Revised: 03/23/2019] [Accepted: 04/15/2019] [Indexed: 12/13/2022] Open

200

From Genotype to Phenotype: Augmenting Deep Learning with Networks and Systems Biology. Curr Opin Syst Biol 2019;15:68-73. [PMID: 31777764 DOI: 10.1016/j.coisb.2019.04.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023]