Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Mordelet F, Vert JP. ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples. BMC Bioinformatics 2011;12:389. [PMID: 21977986 PMCID: PMC3215680 DOI: 10.1186/1471-2105-12-389] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2011] [Accepted: 10/06/2011] [Indexed: 01/22/2023] Open

For:	Mordelet F, Vert JP. ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples. BMC Bioinformatics 2011;12:389. [PMID: 21977986 PMCID: PMC3215680 DOI: 10.1186/1471-2105-12-389] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2011] [Accepted: 10/06/2011] [Indexed: 01/22/2023] Open

Number

Cited by Other Article(s)

Xie J, Rao J, Xie J, Zhao H, Yang Y. Predicting disease-gene associations through self-supervised mutual infomax graph convolution network. Comput Biol Med 2024;170:108048. [PMID: 38310804 DOI: 10.1016/j.compbiomed.2024.108048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 12/19/2023] [Accepted: 01/26/2024] [Indexed: 02/06/2024]

Molotkov I, Artomov M. Detecting biased validation of predictive models in the positive-unlabeled setting: disease gene prioritization case study. BIOINFORMATICS ADVANCES 2023;3:vbad128. [PMID: 37745001 PMCID: PMC10517638 DOI: 10.1093/bioadv/vbad128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 08/13/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]

Mastropietro A, De Carlo G, Anagnostopoulos A. XGDAG: explainable gene-disease associations via graph neural networks. Bioinformatics 2023;39:btad482. [PMID: 37531293 PMCID: PMC10421968 DOI: 10.1093/bioinformatics/btad482] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Revised: 06/27/2023] [Accepted: 08/01/2023] [Indexed: 08/04/2023] Open

Lee H, Jeon J, Jung D, Won JI, Kim K, Kim YJ, Yoon J. RelCurator: a text mining-based curation system for extracting gene-phenotype relationships specific to neurodegenerative disorders. Genes Genomics 2023;45:1025-1036. [PMID: 37300788 DOI: 10.1007/s13258-023-01405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Accepted: 05/18/2023] [Indexed: 06/12/2023]

Abstract

BACKGROUND

The identification of gene-phenotype relationships is important in medical genetics as it serves as a basis for precision medicine. However, most of the gene-phenotype relationship data are buried in the biomedical literature in textual form.

OBJECTIVE

We propose RelCurator, a curation system that extracts sentences including both gene and phenotype entities related to specific disease categories from PubMed articles, provides rich additional information such as entity taggings, and predictions of gene-phenotype relationships.

METHODS

We targeted neurodegenerative disorders and developed a deep learning model using Bidirectional Gated Recurrent Unit (BiGRU) networks and BioWordVec word embeddings for predicting gene-phenotype relationships from biomedical texts. The prediction model is trained with more than 130,000 labeled PubMed sentences including gene and phenotype entities, which are related to or unrelated to neurodegenerative disorders.

RESULTS

We compared the performance of our deep learning model with those of Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machine (SVM), and simple Recurrent Neural Network (simple RNN) models. Our model performed better with an F1-score of 0.96. Furthermore, the evaluation done using a few curation cases in the real scenario showed the effectiveness of our work. Therefore, we conclude that RelCurator can identify not only new causative genes, but also new genes associated with neurodegenerative disorders' phenotype.

CONCLUSION

RelCurator is a user-friendly method for accessing deep learning-based supporting information and a concise web interface to assist curators while browsing the PubMed articles. Our curation process represents an important and broadly applicable improvement to the state of the art for the curation of gene-phenotype relationships.

Collapse

Wang R, Liang Y, Miao Z, Liu T. BAYESIAN ANALYSIS FOR IMBALANCED POSITIVE-UNLABELLED DIAGNOSIS CODES IN ELECTRONIC HEALTH RECORDS. Ann Appl Stat 2023;17:1220-1238. [PMID: 37152904 PMCID: PMC10156089 DOI: 10.1214/22-aoas1666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]

Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023;18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open

de la Fuente L, Del Pozo-Valero M, Perea-Romero I, Blanco-Kelly F, Fernández-Caballero L, Cortón M, Ayuso C, Mínguez P. Prioritization of New Candidate Genes for Rare Genetic Diseases by a Disease-Aware Evaluation of Heterogeneous Molecular Networks. Int J Mol Sci 2023;24:ijms24021661. [PMID: 36675175 PMCID: PMC9864172 DOI: 10.3390/ijms24021661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023] Open

Affiliation(s)

Lorena de la Fuente Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain Bioinformatics Unit, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
Marta Del Pozo-Valero Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
Irene Perea-Romero Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
Fiona Blanco-Kelly Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
Lidia Fernández-Caballero Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
Marta Cortón Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
Carmen Ayuso Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
Pablo Mínguez Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain Bioinformatics Unit, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain Correspondence:

Collapse

Deelder W, Manko E, Phelan JE, Campino S, Palla L, Clark TG. Geographical classification of malaria parasites through applying machine learning to whole genome sequence data. Sci Rep 2022;12:21150. [PMID: 36476815 PMCID: PMC9729610 DOI: 10.1038/s41598-022-25568-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2021] [Accepted: 12/01/2022] [Indexed: 12/12/2022] Open

Ma H, Dong Z, Chen M, Sheng W, Li Y, Zhang W, Zhang S, Yu Y. A gradient boosting tree model for multi-department venous thromboembolism risk assessment with imbalanced data. J Biomed Inform 2022;134:104210. [PMID: 36122879 DOI: 10.1016/j.jbi.2022.104210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 08/17/2022] [Accepted: 09/12/2022] [Indexed: 11/19/2022]

Abstract

Venous thromboembolism (VTE) is the world's third most common cause of vascular mortality and a serious complication from multiple departments. Risk assessment of VTE guides clinical intervention in time and is of great importance to in-hospital patients. Traditional VTE risk assessment methods based on scaling tools, which always require rules carefully designed by human experts, are difficult to apply to large-population scenarios since the manually designed rules are not guaranteed to be accurate to all populations. In contrast, with the development of the electronic health record (EHR) datasets, data-driven machine-learning-based risk assessment methods have proven superior predictability in many studies in recent years. This paper uses the gradient boosting tree model to study the VTE risk assessment problem with multi-department data. There exist two distinct characteristics of VTE data collected at the level of the entire hospital: its wide distribution and heterogeneity across multiple departments. To this end, we consider the prediction task over multiple departments as a multi-task learning process, and introduce the algorithm of a task-aware tree-based method TSGB to tackle the multi-task prediction problem. Although the introduction of multi-task learning improves overall across-department performance, we reveal the problem of task-wise performance decline while dealing with imbalanced VTE data volume. According to the analysis, we finally propose two variants of TSGB to alleviate the problems and further boost the prediction performance. Compared with state-of-the-art rule-based and multi-task tree-based methods, the experimental results show the proposed methods not only improve the overall across-department AUC performance effectively, but also ensure the improvement of performance over every single department prediction.

Collapse

Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022;20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022] Open

Affiliation(s)

Abhishek Subramanian Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
Pooya Zakeri Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
Mira Mousa Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Halima Alnaqbi Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Fatima Yousif Alshamsi Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Leo Bettoni Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
Ernesto Damiani Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
Habiba Alsafar Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Yvan Saeys Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
Peter Carmeliet Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates

Collapse

He R, Han Z, Yin Y. Towards safe and robust weakly-supervised anomaly detection under subpopulation shift. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Machine learning prediction and tau-based screening identifies potential Alzheimer's disease genes relevant to immunity. Commun Biol 2022;5:125. [PMID: 35149761 PMCID: PMC8837797 DOI: 10.1038/s42003-022-03068-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Accepted: 01/21/2022] [Indexed: 12/19/2022] Open

Santos SDS, Torres M, Galeano D, Sánchez MDM, Cernuzzi L, Paccanaro A. Machine learning and network medicine approaches for drug repositioning for COVID-19. PATTERNS (NEW YORK, N.Y.) 2022;3:100396. [PMID: 34778851 PMCID: PMC8576113 DOI: 10.1016/j.patter.2021.100396] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/18/2021] [Revised: 06/21/2021] [Accepted: 11/01/2021] [Indexed: 12/13/2022]

Velardi P, Madeddu L. Aim in Genomics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_76] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Yang K, Zheng Y, Lu K, Chang K, Wang N, Shu Z, Yu J, Liu B, Gao Z, Zhou X. PDGNet: Predicting Disease Genes Using a Deep Neural Network With Multi-View Features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:575-584. [PMID: 32750864 DOI: 10.1109/tcbb.2020.3002771] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021;23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022] Open

Yang H, Ding Y, Tang J, Guo F. Identifying potential association on gene-disease network via dual hypergraph regularized least squares. BMC Genomics 2021;22:605. [PMID: 34372777 PMCID: PMC8351363 DOI: 10.1186/s12864-021-07864-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2021] [Accepted: 06/29/2021] [Indexed: 12/27/2022] Open

Deelder W, Benavente ED, Phelan J, Manko E, Campino S, Palla L, Clark TG. Using deep learning to identify recent positive selection in malaria parasite sequence data. Malar J 2021;20:270. [PMID: 34126997 PMCID: PMC8201710 DOI: 10.1186/s12936-021-03788-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Accepted: 05/29/2021] [Indexed: 11/18/2022] Open

N-semble-based method for identifying Parkinson’s disease genes. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05974-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Luo P, Chen B, Liao B, Wu F. Predicting disease‐associated genes: Computational methods, databases, and evaluations. WIRES DATA MINING AND KNOWLEDGE DISCOVERY 2021;11. [DOI: 10.1002/widm.1383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2019] [Accepted: 06/13/2020] [Indexed: 09/09/2024]

Classifier chains for positive unlabelled multi-label learning. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106709] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, Chen J, Zhou X. A network-based machine-learning framework to identify both functional modules and disease genes. Hum Genet 2021;140:897-913. [PMID: 33409574 DOI: 10.1007/s00439-020-02253-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Accepted: 12/22/2020] [Indexed: 01/20/2023]

Aim in Genomics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_76-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Liu Y, Guo Y, Liu X, Wang C, Guo M. Pathogenic gene prediction based on network embedding. Brief Bioinform 2020;22:6053103. [PMID: 33367541 DOI: 10.1093/bib/bbaa353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022] Open

Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020;22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023] Open

Song H, Bremer BJ, Hinds EC, Raskutti G, Romero PA. Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning. Cell Syst 2020;12:92-101.e8. [PMID: 33212013 DOI: 10.1016/j.cels.2020.10.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2019] [Revised: 08/13/2020] [Accepted: 10/22/2020] [Indexed: 10/22/2022]

Li H, Guo Y, Cai M, Li L. MicroRNA-disease association prediction by matrix tri-factorization. BMC Genomics 2020;21:617. [PMID: 33208088 PMCID: PMC7677824 DOI: 10.1186/s12864-020-07006-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open

Paliwal S, de Giorgio A, Neil D, Michel JB, Lacoste AM. Preclinical validation of therapeutic targets predicted by tensor factorization on heterogeneous graphs. Sci Rep 2020;10:18250. [PMID: 33106501 PMCID: PMC7589557 DOI: 10.1038/s41598-020-74922-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 09/30/2020] [Indexed: 12/04/2022] Open

Jang J, Gu GH, Noh J, Kim J, Jung Y. Structure-Based Synthesizability Prediction of Crystals Using Partially Supervised Learning. J Am Chem Soc 2020;142:18836-18843. [DOI: 10.1021/jacs.0c07384] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]

Zeng X, Lin Y, He Y, Lu L, Min X, Rodriguez-Paton A. Deep Collaborative Filtering for Prediction of Disease Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:1639-1647. [PMID: 30932845 DOI: 10.1109/tcbb.2019.2907536] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Zhang Y, Qiu Y, Cui Y, Liu S, Zhang W. Predicting drug-drug interactions using multi-modal deep auto-encoders based network embedding and positive-unlabeled learning. Methods 2020;179:37-46. [DOI: 10.1016/j.ymeth.2020.05.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2020] [Revised: 05/06/2020] [Accepted: 05/13/2020] [Indexed: 12/21/2022] Open

Lan C, Chandrasekaran SN, Huan J. On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:1352-1363. [PMID: 31056508 DOI: 10.1109/tcbb.2019.2913855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Le DH. Machine learning-based approaches for disease gene prediction. Brief Funct Genomics 2020;19:350-363. [PMID: 32567652 DOI: 10.1093/bfgp/elaa013] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2020] [Revised: 04/30/2020] [Accepted: 05/09/2020] [Indexed: 12/20/2022] Open

Nicholls HL, John CR, Watson DS, Munroe PB, Barnes MR, Cabrera CP. Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci. Front Genet 2020;11:350. [PMID: 32351543 PMCID: PMC7174742 DOI: 10.3389/fgene.2020.00350] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2019] [Accepted: 03/23/2020] [Indexed: 12/21/2022] Open

Abstract

Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS - the ability to detect genetic association by linkage disequilibrium (LD) - is also its limitation. Whilst the ever-increasing study size and improved design have augmented the power of GWAS to detect effects, differentiation of causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models, i.e., neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on prioritizations of complex disease associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game with consequent wide-ranging translational impact.

Collapse

Affiliation(s)

Hannah L. Nicholls Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
Christopher R. John Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
David S. Watson Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom Oxford Internet Institute, University of Oxford, Oxford, United Kingdom
Patricia B. Munroe Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom NIHR Barts Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
Michael R. Barnes Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom NIHR Barts Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom The Alan Turing Institute, British Library, London, United Kingdom
Claudia P. Cabrera Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom NIHR Barts Biomedical Research Centre, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom

Collapse

Bekker J, Davis J. Learning from positive and unlabeled data: a survey. Mach Learn 2020. [DOI: 10.1007/s10994-020-05877-5] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]

Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020;11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022] Open

Abstract

Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, making up for the lack of sparse data. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm has higher priority than those from other methods, such as IMC and CATAPULT.

Collapse

Tran VD, Sperduti A, Backofen R, Costa F. Heterogeneous networks integration for disease–gene prioritization with node kernels. Bioinformatics 2020;36:2649-2656. [DOI: 10.1093/bioinformatics/btaa008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Revised: 12/19/2019] [Accepted: 01/23/2020] [Indexed: 12/21/2022] Open

Benincasa G, Marfella R, Della Mura N, Schiano C, Napoli C. Strengths and Opportunities of Network Medicine in Cardiovascular Diseases. Circ J 2020;84:144-152. [DOI: 10.1253/circj.cj-19-0879] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Yang K, Wang R, Liu G, Shu Z, Wang N, Zhang R, Yu J, Chen J, Li X, Zhou X. HerGePred: Heterogeneous Network Embedding Representation for Disease Gene Prediction. IEEE J Biomed Health Inform 2020;23:1805-1815. [PMID: 31283472 DOI: 10.1109/jbhi.2018.2870728] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7303724 DOI: 10.1007/978-3-030-50423-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

A novel one-class classification approach to accurately predict disease-gene association in acute myeloid leukemia cancer. PLoS One 2019;14:e0226115. [PMID: 31825992 PMCID: PMC6905554 DOI: 10.1371/journal.pone.0226115] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Accepted: 11/19/2019] [Indexed: 01/02/2023] Open

Long Y, Luo J. WMGHMDA: a novel weighted meta-graph-based model for predicting human microbe-disease association on heterogeneous information network. BMC Bioinformatics 2019;20:541. [PMID: 31675979 PMCID: PMC6824056 DOI: 10.1186/s12859-019-3066-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2019] [Accepted: 09/02/2019] [Indexed: 12/12/2022] Open

Abstract

BACKGROUND

An increasing number of biological and clinical evidences have indicated that the microorganisms significantly get involved in the pathological mechanism of extensive varieties of complex human diseases. Inferring potential related microbes for diseases can not only promote disease prevention, diagnosis and treatment, but also provide valuable information for drug development. Considering that experimental methods are expensive and time-consuming, developing computational methods is an alternative choice. However, most of existing methods are biased towards well-characterized diseases and microbes. Furthermore, existing computational methods are limited in predicting potential microbes for new diseases.

RESULTS

Here, we developed a novel computational model to predict potential human microbe-disease associations (MDAs) based on Weighted Meta-Graph (WMGHMDA). We first constructed a heterogeneous information network (HIN) by combining the integrated microbe similarity network, the integrated disease similarity network and the known microbe-disease bipartite network. And then, we implemented iteratively pre-designed Weighted Meta-Graph search algorithm on the HIN to uncover possible microbe-disease pairs by cumulating the contribution values of weighted meta-graphs to the pairs as their probability scores. Depending on contribution potential, we described the contribution degree of different types of meta-graphs to a microbe-disease pair with bias rating. Meta-graph with higher bias rating will be assigned greater weight value when calculating probability scores.

CONCLUSIONS

The experimental results showed that WMGHMDA outperformed some state-of-the-art methods with average AUCs of 0.9288, 0.9068 ±0.0031 in global leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. In the case studies, 9, 19, 37 and 10, 20, 45 out of top-10, 20, 50 candidate microbes were manually verified by previous reports for asthma and inflammatory bowel disease (IBD), respectively. Furthermore, three common human diseases (Crohn's disease, Liver cirrhosis, Type 1 diabetes) were adopted to demonstrate that WMGHMDA could be efficiently applied to make predictions for new diseases. In summary, WMGHMDA has a high potential in predicting microbe-disease associations.

Collapse

Chen X, Wang L, Qu J, Guan NN, Li JQ. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics 2019;34:4256-4265. [PMID: 29939227 DOI: 10.1093/bioinformatics/bty503] [Citation(s) in RCA: 255] [Impact Index Per Article: 51.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2018] [Accepted: 06/20/2018] [Indexed: 12/17/2022] Open

Abstract

Motivation

It has been shown that microRNAs (miRNAs) play key roles in variety of biological processes associated with human diseases. In Consideration of the cost and complexity of biological experiments, computational methods for predicting potential associations between miRNAs and diseases would be an effective complement.

Results

This paper presents a novel model of Inductive Matrix Completion for MiRNA-Disease Association prediction (IMCMDA). The integrated miRNA similarity and disease similarity are calculated based on miRNA functional similarity, disease semantic similarity and Gaussian interaction profile kernel similarity. The main idea is to complete the missing miRNA-disease association based on the known associations and the integrated miRNA similarity and disease similarity. IMCMDA achieves AUC of 0.8034 based on leave-one-out-cross-validation and improved previous models. In addition, IMCMDA was applied to five common human diseases in three types of case studies. In the first type, respectively, 42, 44, 45 out of top 50 predicted miRNAs of Colon Neoplasms, Kidney Neoplasms, Lymphoma were confirmed by experimental reports. In the second type of case study for new diseases without any known miRNAs, we chose Breast Neoplasms as the test example by hiding the association information between the miRNAs and Breast Neoplasms. As a result, 50 out of top 50 predicted Breast Neoplasms-related miRNAs are verified. In the third type of case study, IMCMDA was tested on HMDD V1.0 to assess the robustness of IMCMDA, 49 out of top 50 predicted Esophageal Neoplasms-related miRNAs are verified.

Availability and implementation

The code and dataset of IMCMDA are freely available at https://github.com/IMCMDAsourcecode/IMCMDA.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

QTG-Finder: A Machine-Learning Based Algorithm To Prioritize Causal Genes of Quantitative Trait Loci in Arabidopsis and Rice. G3-GENES GENOMES GENETICS 2019;9:3129-3138. [PMID: 31358562 PMCID: PMC6778793 DOI: 10.1534/g3.119.400319] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Collier O, Stoven V, Vert JP. LOTUS: A single- and multitask machine learning algorithm for the prediction of cancer driver genes. PLoS Comput Biol 2019;15:e1007381. [PMID: 31568528 PMCID: PMC6786659 DOI: 10.1371/journal.pcbi.1007381] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2018] [Revised: 10/10/2019] [Accepted: 09/04/2019] [Indexed: 12/16/2022] Open

Abstract

Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help finding new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning based approach which allows to integrate various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.

Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, that could be targeted by specific therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it allows to easily combine heterogeneous sources of information into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction software.

Collapse

Timilsina M, Yang H, Sahay R, Rebholz-Schuhmann D. Predicting links between tumor samples and genes using 2-Layered graph based diffusion approach. BMC Bioinformatics 2019;20:462. [PMID: 31500564 PMCID: PMC6734347 DOI: 10.1186/s12859-019-3056-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Accepted: 08/26/2019] [Indexed: 12/21/2022] Open

Abstract

Background

Determining the association between tumor sample and the gene is demanding because it requires a high cost for conducting genetic experiments. Thus, the discovered association between tumor sample and gene further requires clinical verification and validation. This entire mechanism is time-consuming and expensive. Due to this issue, predicting the association between tumor samples and genes remain a challenge in biomedicine.

Results

Here we present, a computational model based on a heat diffusion algorithm which can predict the association between tumor samples and genes. We proposed a 2-layered graph. In the first layer, we constructed a graph of tumor samples and genes where these two types of nodes are connected by “hasGene” relationship. In the second layer, the gene nodes are connected by “interaction” relationship. We applied the heat diffusion algorithms in nine different variants of genetic interaction networks extracted from STRING and BioGRID database. The heat diffusion algorithm predicted the links between tumor samples and genes with mean AUC-ROC score of 0.84. This score is obtained by using weighted genetic interactions of fusion or co-occurrence channels from the STRING database. For the unweighted genetic interaction from the BioGRID database, the algorithms predict the links with an AUC-ROC score of 0.74.

Conclusions

We demonstrate that the gene-gene interaction scores could improve the predictive power of the heat diffusion model to predict the links between tumor samples and genes. We showed the efficient runtime of the heat diffusion algorithm in various genetic interaction network. We statistically validated our prediction quality of the links between tumor samples and genes.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-3056-2) contains supplementary material, which is available to authorized users.

Collapse

Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019;16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open

Picart-Armada S, Barrett SJ, Willé DR, Perera-Lluna A, Gutteridge A, Dessailly BH. Benchmarking network propagation methods for disease gene identification. PLoS Comput Biol 2019;15:e1007276. [PMID: 31479437 PMCID: PMC6743778 DOI: 10.1371/journal.pcbi.1007276] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 09/13/2019] [Accepted: 07/16/2019] [Indexed: 12/17/2022] Open

Abstract

In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found through by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. The impact of the design factors in performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated to disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.

The use of biological network data has proven its effectiveness in many areas from computational biology. Networks consist of nodes, usually genes or proteins, and edges that connect pairs of nodes, representing information such as physical interactions, regulatory roles or co-occurrence. In order to find new candidate nodes for a given biological property, the so-called network propagation algorithms start from the set of known nodes with that property and leverage the connections from the biological network to make predictions. Here, we assess the performance of several network propagation algorithms to find sensible gene targets for 22 common non-cancerous diseases, i.e. those that have been found promising enough to start the clinical trials with any compound. We focus on obtaining performance metrics that reflect a practical scenario in drug development where only a small set of genes can be essayed. We found that the presence of protein complexes biased the performance estimates, leading to over-optimistic conclusions, and introduced two novel strategies to address it. Our results support that network propagation is still a viable approach to find drug targets, but that special care needs to be put on the validation strategy. Algorithms benefitted from the use of a larger -although noisier- network and of direct evidence data, rather than indirect genetic associations to disease.

Collapse

Zakeri P, Simm J, Arany A, ElShal S, Moreau Y. Gene prioritization using Bayesian matrix factorization with genomic and phenotypic side information. Bioinformatics 2019;34:i447-i456. [PMID: 29949967 PMCID: PMC6022676 DOI: 10.1093/bioinformatics/bty289] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open

Cáceres JJ, Paccanaro A. Disease gene prediction for molecularly uncharacterized diseases. PLoS Comput Biol 2019;15:e1007078. [PMID: 31276496 PMCID: PMC6636748 DOI: 10.1371/journal.pcbi.1007078] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 07/17/2019] [Accepted: 05/09/2019] [Indexed: 02/06/2023] Open