Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics 2019;34:i901-i907. [PMID: 30423077 PMCID: PMC6129260 DOI: 10.1093/bioinformatics/bty559] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

Cited by Other Article(s)

Zhang D, Zhao R, Xian G, Kou Y, Ma W. A new model construction based on the knowledge graph for mining elite polyphenotype genes in crops. FRONTIERS IN PLANT SCIENCE 2024;15:1361716. [PMID: 38571713 PMCID: PMC10987776 DOI: 10.3389/fpls.2024.1361716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/26/2023] [Accepted: 03/04/2024] [Indexed: 04/05/2024]

Abstract

Identifying polyphenotype genes that simultaneously regulate important agronomic traits (e.g., plant height, yield, and disease resistance) is critical for developing novel high-quality crop varieties. Predicting the associations between genes and traits requires the organization and analysis of multi-dimensional scientific data. The existing methods for establishing the relationships between genomic data and phenotypic data can only elucidate the associations between genes and individual traits. However, there are relatively few methods for detecting elite polyphenotype genes. In this study, a knowledge graph for traits regulating-genes was constructed by collecting data from the PubMed database and eight other databases related to the staple food crops rice, maize, and wheat as well as the model plant Arabidopsis thaliana. On the basis of the knowledge graph, a model for predicting traits regulating-genes was constructed by combining the data attributes of the gene nodes and the topological relationship attributes of the gene nodes. Additionally, a scoring method for predicting the genes regulating specific traits was developed to screen for elite polyphenotype genes. A total of 125,591 nodes and 547,224 semantic relationships were included in the knowledge graph. The accuracy of the knowledge graph-based model for predicting traits regulating-genes was 0.89, the precision rate was 0.91, the recall rate was 0.96, and the F1 value was 0.94. Moreover, 4,447 polyphenotype genes for 31 trait combinations were identified, among which the rice polyphenotype gene IPA1 and the A. thaliana polyphenotype gene CUC2 were verified via a literature search. Furthermore, the wheat gene TraesCS5A02G275900 was revealed as a potential polyphenotype gene that will need to be further characterized. 
Meanwhile, Venn diagram analysis comparing the polyphenotype gene datasets (genes predicted by our model) with the transcriptome gene datasets (genes differentially expressed in response to disease, drought, or salt) showed that approximately 70% and 54% of the polyphenotype genes were identified in the transcriptome datasets of Arabidopsis and rice, respectively. The application of the knowledge graph-driven model for predicting traits regulating-genes represents a novel method for detecting elite polyphenotype genes.
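The F1 value quoted in the abstract above is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic, assuming the standard definition; the function name `f1_score` is illustrative and not taken from the cited paper. Because the reported precision and recall are rounded to two decimals, a value recomputed from them need not match the reported F1 exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract above.
reported_precision = 0.91
reported_recall = 0.96

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from rounded precision and recall: {f1:.4f}")
```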

Chaturvedi J, Wang T, Velupillai S, Stewart R, Roberts A. Development of a Knowledge Graph Embeddings Model for Pain. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024;2023:299-308. [PMID: 38222382 PMCID: PMC10785867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 01/16/2024]

Li N, Yang Z, Yang Y, Wang J, Lin H. Hyperbolic hierarchical knowledge graph embeddings for biological entities. J Biomed Inform 2023;147:104503. [PMID: 37778673 DOI: 10.1016/j.jbi.2023.104503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 08/25/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023]

Zhang L, Lu D, Bi X, Zhao K, Yu G, Quan N. Predicting disease genes based on multi-head attention fusion. BMC Bioinformatics 2023;24:162. [PMID: 37085750 PMCID: PMC10122338 DOI: 10.1186/s12859-023-05285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2022] [Accepted: 04/12/2023] [Indexed: 04/23/2023] Open

Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023;18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open

Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how two biomedical entities are related. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.05.522941. [PMID: 36711546 PMCID: PMC9882000 DOI: 10.1101/2023.01.05.522941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]

Affiliation(s)

Daniel S. Himmelstein Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Related Sciences
Michael Zietz Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
Vincent Rubinetti Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
Kyle Kloster Carbon, Inc.; Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
Benjamin J. Heil Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
Faisal Alquaddoomi Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
Dongbo Hu Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
David N. Nicholson Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
Yun Hao Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA
Blair D. Sullivan School of Computing, University of Utah, Salt Lake City, Utah, USA
Michael W. Nagle Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, Massachusetts, United States of America; Neurogenomics, Translational Sciences, Neurology Business Group, Eisai Inc, Cambridge, Massachusetts, United States of America
Casey S. Greene Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America

Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how biomedical entities are related. Gigascience 2022;12:giad047. [PMID: 37503959 PMCID: PMC10375517 DOI: 10.1093/gigascience/giad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2023] [Revised: 04/14/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023] Open

Affiliation(s)

Daniel S Himmelstein Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA; Related Sciences, Denver, CO 80202, USA
Michael Zietz Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
Vincent Rubinetti Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA; Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
Kyle Kloster Carbon, Inc., Redwood City, CA 94063, USA; Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
Benjamin J Heil Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Faisal Alquaddoomi Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
Dongbo Hu Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia, PA 19104, USA
David N Nicholson Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
Yun Hao Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Blair D Sullivan School of Computing, University of Utah, Salt Lake City, UT 84112, USA
Michael W Nagle Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, MA 02139, USA; Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA; Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA

Su L, Liu G, Guo Y, Zhang X, Zhu X, Wang J. Integration of Protein-Protein Interaction Networks and Gene Expression Profiles Helps Detect Pancreatic Adenocarcinoma Candidate Genes. Front Genet 2022;13:854661. [PMID: 35711911 PMCID: PMC9197464 DOI: 10.3389/fgene.2022.854661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 05/09/2022] [Indexed: 11/13/2022] Open

Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022;10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023] Open

Zhang Y, Chen L, Li S. CIPHER-SC: Disease-Gene Association Inference Using Graph Convolution on a Context-Aware Network With Single-Cell Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:819-829. [PMID: 32809944 DOI: 10.1109/tcbb.2020.3017547] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Yang K, Zheng Y, Lu K, Chang K, Wang N, Shu Z, Yu J, Liu B, Gao Z, Zhou X. PDGNet: Predicting Disease Genes Using a Deep Neural Network With Multi-View Features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:575-584. [PMID: 32750864 DOI: 10.1109/tcbb.2020.3002771] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Gene prediction of aging-related diseases based on DNN and Mashup. BMC Bioinformatics 2021;22:597. [PMID: 34920719 PMCID: PMC8680025 DOI: 10.1186/s12859-021-04518-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2021] [Accepted: 11/30/2021] [Indexed: 11/17/2022] Open

Abstract

Background

At present, bioinformatics research on the relationship between aging-related diseases and genes mainly proceeds by building a machine learning multi-label model to classify each gene. Most existing methods for predicting pathogenic genes rely on specific types of gene features, or directly encode multiple features with different dimensions and use the same encoder to concatenate them and predict the final results, which limits the applicability of the algorithm. Possible shortcomings include incomplete coverage of gene features by a single type of biomics data, overfitting of small-dimensional datasets by a single encoder, and underfitting of larger-dimensional datasets.

Methods

We use known gene-disease association data and gene descriptors, such as Gene Ontology (GO) terms, protein interaction data (PPI), PathDIP, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), as input for deep learning to predict the association between genes and diseases. Our innovation is to use the Mashup algorithm to reduce the dimensionality of PPI, GO, and other large biological networks, add new pathway data from the KEGG database, and then combine a variety of biological information sources through a modular Deep Neural Network (DNN) to predict the genes related to aging diseases.

Result and conclusion

The results show that our algorithm is more effective than the standard neural network algorithm (the area under the ROC curve increases from 0.8795 to 0.9153), the gradient enhanced tree classifier, and the logistic regression classifier. In this paper, we first use a DNN to learn similar genes associated with known diseases from the complex multi-dimensional feature space, and then provide evidence that the assumed genes are associated with a certain disease.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04518-5.

Bischof C, Mirtschink P, Yuan T, Wu M, Zhu C, Kaur J, Pham MD, Gonzalez-Gonoggia S, Hammer M, Rogg EM, Sharma R, Bottermann K, Gercken B, Hagag E, Berthonneche C, Sossalla S, Stehr SN, Maxeiner J, Duda MA, Latreille M, Zamboni N, Martelli F, Pedrazzini T, Dimmeler S, Krishnan J. Mitochondrial-cell cycle cross-talk drives endoreplication in heart disease. Sci Transl Med 2021;13:eabi7964. [PMID: 34878823 DOI: 10.1126/scitranslmed.abi7964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Affiliation(s)

Corinne Bischof MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK; Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Peter Mirtschink Institute of Clinical Chemistry and Laboratory Medicine, Department of Clinical Pathobiochemistry, University Hospital Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
Ting Yuan Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Department of Medicine III, Division of Cardiology/Nephrology/Angiology, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Meiqian Wu Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Department of Medicine III, Division of Cardiology/Nephrology/Angiology, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Chaonan Zhu Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Department of Medicine III, Division of Cardiology/Nephrology/Angiology, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Jaskiran Kaur Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Department of Medicine III, Division of Cardiology/Nephrology/Angiology, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Minh Duc Pham Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Genome Biologics, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Suam Gonzalez-Gonoggia MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
Marie Hammer Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Eva-Maria Rogg Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Rahul Sharma Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Katharina Bottermann Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Bettina Gercken Institute of Clinical Chemistry and Laboratory Medicine, Department of Clinical Pathobiochemistry, University Hospital Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
Eman Hagag Institute of Clinical Chemistry and Laboratory Medicine, Department of Clinical Pathobiochemistry, University Hospital Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
Corinne Berthonneche Cardiovascular Assessment Facility, University of Lausanne, CHUV, CH-1011 Lausanne, Switzerland
Samuel Sossalla Department of Internal Medicine II, University Medical Center Regensburg, 93053 Regensburg, Germany; Klinik für Kardiologie und Pneumologie, Georg-August-Universität Goettingen, DZHK (German Centre for Cardiovascular Research), Robert-Koch Str. 40, D-37075 Goettingen, Germany
Sebastian N Stehr Department of Anesthesiology and Critical Care Medicine, University Hospital Leipzig, Liebigstrasse 20, D-04103 Leipzig, Germany
Joachim Maxeiner Genome Biologics, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Maria Anna Duda Genome Biologics, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
Mathieu Latreille MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
Nicola Zamboni Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
Fabio Martelli Molecular Cardiology Laboratory, IRCCS-Policlinico San Donato, 20097, San Donato Milanese, Milan, Italy
Thierry Pedrazzini Department of Medicine, University of Lausanne Medical School, CHUV, MP14-220, 1011 Lausanne, Switzerland
Stefanie Dimmeler Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; DZHK Partner Site RheinMain, Mainz, Germany; Cardio-Pulmonary Institute, Giessen, Germany
Jaya Krishnan MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK; Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Department of Medicine III, Division of Cardiology/Nephrology/Angiology, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; Cardio-Pulmonary Institute, Giessen, Germany

Moon J, Posada-Quintero HF, Kim I, Chon KH. Preliminary Analysis of the Risk Factor Identification Embedding Model for Cardiovascular Disease. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021;2021:1946-1949. [PMID: 34891668 DOI: 10.1109/embc46164.2021.9630039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]

Abstract

Cardiovascular Disease (CVD) is responsible for a large part of healthcare costs every year, but susceptibility to it is affected by complex biological and physiological variables including patients' genetics and lifestyles. There has not been much work to develop a framework that incorporates these important and clinically relevant risk factors into a comprehensive model for CVD research. Moreover, the data labeling required to do so, such as annotating gene functions, is an extremely challenging, tedious, and time-consuming process. In this work, our goal was to develop and validate a risk factor embedding model, which incorporates genotype and phenotype without pre-labeled information, to identify various risk factors of CVD. We hypothesize that (1) the background knowledge, which does not require data labeling, could be gathered from published abstract data, and (2) the phenotype and genotype risk factors could be represented in an embedding vector space. We collected 1,363,682 published abstracts from PubMed using the keyword "heart" and 19,264 human gene names, then trained our model using the collected abstracts. We evaluated our CVD risk factor identification model using both intrinsic and extrinsic evaluations. For the intrinsic evaluation, we examined whether the captured top-10 words and genes have references related to the input query "myocardial infarction", one of the CVDs, and our model correctly identified them. For the extrinsic evaluation, we applied our model to a dimensionality reduction task for classification, and our method outperformed other popular methods. These results show the feasibility of our approach for identifying disease-associated risk factors of CVD, incorporating genotype and phenotype.
Clinical Relevance: Our model provides a comprehensive tool to incorporate various risk factors without any a priori data labeling knowledge for CVD. Our approach shows potential to provide discovered knowledge that contributes to better understanding and treatment of CVD.

Kim J, Kim D, Sohn KA. HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball. Bioinformatics 2021;37:2971-2980. [PMID: 33760022 DOI: 10.1093/bioinformatics/btab193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 02/02/2023] Open

Holmgren SD, Boyles RR, Cronk RD, Duncan CG, Kwok RK, Lunn RM, Osborn KC, Thessen AE, Schmitt CP. Catalyzing Knowledge-Driven Discovery in Environmental Health Sciences through a Community-Driven Harmonized Language. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021;18:8985. [PMID: 34501574 PMCID: PMC8430534 DOI: 10.3390/ijerph18178985] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 08/13/2021] [Accepted: 08/19/2021] [Indexed: 01/10/2023]

Hassani‐Pak K, Singh A, Brandizi M, Hearnshaw J, Parsons JD, Amberkar S, Phillips AL, Doonan JH, Rawlings C. KnetMiner: a comprehensive approach for supporting evidence-based gene discovery and complex trait analysis across species. PLANT BIOTECHNOLOGY JOURNAL 2021;19:1670-1678. [PMID: 33750020 PMCID: PMC8384599 DOI: 10.1111/pbi.13583] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2020] [Revised: 12/17/2020] [Accepted: 03/16/2021] [Indexed: 05/03/2023]

Wang X, Yang Y, Li K, Li W, Li F, Peng S. BioERP: biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions. Bioinformatics 2021;37:4793-4800. [PMID: 34329382 DOI: 10.1093/bioinformatics/btab565] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2021] [Revised: 07/18/2021] [Accepted: 07/29/2021] [Indexed: 11/14/2022] Open

Lou P, Dong Y, Jimeno Yepes A, Li C. A representation model for biological entities by fusing structured axioms with unstructured texts. Bioinformatics 2021;37:1156-1163. [PMID: 33107905 DOI: 10.1093/bioinformatics/btaa913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2020] [Revised: 09/04/2020] [Accepted: 10/13/2020] [Indexed: 11/14/2022] Open

Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021;37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022] Open

Liu-Wei W, Kafkas Ş, Chen J, Dimonaco NJ, Tegnér J, Hoehndorf R. DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021;37:2722-2729. [PMID: 33682875 PMCID: PMC8428617 DOI: 10.1093/bioinformatics/btab147] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2020] [Revised: 01/18/2021] [Accepted: 03/01/2021] [Indexed: 11/12/2022] Open

Alshahrani M, Thafar MA, Essack M. Application and evaluation of knowledge graph embeddings in biomedical data. PeerJ Comput Sci 2021;7:e341. [PMID: 33816992 PMCID: PMC7959619 DOI: 10.7717/peerj-cs.341] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Accepted: 11/29/2020] [Indexed: 05/07/2023]

Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, Chen J, Zhou X. A network-based machine-learning framework to identify both functional modules and disease genes. Hum Genet 2021;140:897-913. [PMID: 33409574 DOI: 10.1007/s00439-020-02253-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Accepted: 12/22/2020] [Indexed: 01/20/2023]

Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020;22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023] Open

Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020;36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022] Open

Abstract

Background

Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem.

Results

In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows.

Availability and implementation

The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

Contact

arjun@msu.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

Tang X, Xiao Q, Yu K. Breast Cancer Candidate Gene Detection Through Integration of Subcellular Localization Data With Protein–Protein Interaction Networks. IEEE Trans Nanobioscience 2020;19:556-561. [DOI: 10.1109/tnb.2020.2990178] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020;18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022] Open

Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. BIOINFORMATICS (OXFORD, ENGLAND) 2020;36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]

Tran VD, Sperduti A, Backofen R, Costa F. Heterogeneous networks integration for disease–gene prioritization with node kernels. Bioinformatics 2020;36:2649-2656. [DOI: 10.1093/bioinformatics/btaa008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Revised: 12/19/2019] [Accepted: 01/23/2020] [Indexed: 12/21/2022] Open

Iourov IY, Vorsanova SG, Zelenova MA, Vasin KS, Kurinnaia OS, Korostelev SA, Yurov YB. [Epigenomic variations manifesting as a loss of heterozygosity affecting imprinted genes represent a molecular mechanism of autism spectrum disorders and intellectual disability in children]. Zh Nevrol Psikhiatr Im S S Korsakova 2019;119:91-97. [PMID: 31317896 DOI: 10.17116/jnevro201911905191] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Abstract

AIM

Long continuous stretches of homozygosity (LCSH) are regularly detected in studies using molecular karyotyping (SNP array). Although this type of variation can provide meaningful data on parental kinship, uniparental disomy and chromosome rearrangements, LCSH are rarely considered as a possible epigenetic cause of neurodevelopmental disorders. Despite their direct relationship to imprinting, LCSH in imprinted loci have not been considered in terms of pathogenicity. The present work aims to study LCSH in chromosomal regions containing imprinted genes previously associated with disease in children with idiopathic intellectual disability, autism, congenital malformations and/or epilepsy.

MATERIAL AND METHODS

Five hundred and four patients with autism spectrum disorders and intellectual disability were examined.

RESULTS

LCSH affecting imprinted loci associated with various diseases were identified in 40 (7.9%) individuals. Chromosomal region 7q21.3 was affected in twenty-three cases, 15q11.2 in twelve, 11p15.5 in five and 7q32.2 in four. Four patients had two LCSH affecting imprinted loci. Except for one LCSH in the 7q31.33q32.3 region (~4 Mbp), all LCSH were 1-1.6 Mbp in size. Clinically, these cases resembled the corresponding imprinting diseases (e.g. Silver-Russell, Beckwith-Wiedemann, Prader-Willi and Angelman syndromes). Parental kinship was identified in 8 cases (1.59%), which were not affected by LCSH at imprinted loci.

CONCLUSION

The present study shows that LCSH affecting chromosomal regions 7q21.3, 7q32.2, 11p15.5 and 15q11.2 occur in about 7.9% of children with intellectual disability, autism, congenital malformations and/or epilepsy. Consequently, this type of epigenetic mutation is evidently common in children with neurodevelopmental disorders. LCSH shorter than 2.5-10 Mbp are usually ignored in molecular karyotyping (SNP array) studies; therefore, an important epigenetic cause of intellectual disability, autism or epilepsy is likely to go unnoticed.

Kafkas Ş, Abdelhakim M, Hashish Y, Kulmanov M, Abdellatif M, Schofield PN, Hoehndorf R. PathoPhenoDB, linking human pathogens to their phenotypes in support of infectious disease research. Sci Data 2019;6:79. [PMID: 31160594 PMCID: PMC6546783 DOI: 10.1038/s41597-019-0090-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Accepted: 05/07/2019] [Indexed: 12/11/2022] Open

Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019;9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022] Open

Smaili FZ, Gao X, Hoehndorf R. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics 2018;35:2133-2140. [DOI: 10.1093/bioinformatics/bty933] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 11/02/2018] [Accepted: 11/07/2018] [Indexed: 12/11/2022] Open