Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	[Subscribe] [Scholar Register]

Number

Cited by Other Article(s)

Freidel S, Schwarz E. Knowledge graphs in psychiatric research: Potential applications and future perspectives. Acta Psychiatr Scand 2024. [PMID: 38886846 DOI: 10.1111/acps.13717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 05/15/2024] [Accepted: 06/05/2024] [Indexed: 06/20/2024]

Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024;40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open

Abstract

MOTIVATION

Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, this approach leaves many patients undiagnosed. A common argument is that more disease variants still await discovery, or the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods are based on known gene-disease or gene-phenotype associations as training data and are applicable to genes that have phenotypes associated, thereby limiting their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability.

RESULTS

We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.

AVAILABILITY AND IMPLEMENTATION

EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.

Collapse

Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024;124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]

Sousa RT, Silva S, Pesquita C. Explaining protein-protein interactions with knowledge graph-based semantic similarity. Comput Biol Med 2024;170:108076. [PMID: 38308873 DOI: 10.1016/j.compbiomed.2024.108076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2023] [Revised: 12/11/2023] [Accepted: 01/27/2024] [Indexed: 02/05/2024]

Abstract

The application of artificial intelligence and machine learning methods for several biomedical applications, such as protein-protein interaction prediction, has gained significant traction in recent decades. However, explainability is a key aspect of using machine learning as a tool for scientific discovery. Explainable artificial intelligence approaches help clarify algorithmic mechanisms and identify potential bias in the data. Given the complexity of the biomedical domain, explanations should be grounded in domain knowledge which can be achieved by using ontologies and knowledge graphs. These knowledge graphs express knowledge about a domain by capturing different perspectives of the representation of real-world entities. However, the most popular way to explore knowledge graphs with machine learning is through using embeddings, which are not explainable. As an alternative, knowledge graph-based semantic similarity offers the advantage of being explainable. Additionally, similarity can be computed to capture different semantic aspects within the knowledge graph and increasing the explainability of predictive approaches. We propose a novel method to generate explainable vector representations, KGsim2vec, that uses aspect-oriented semantic similarity features to represent pairs of entities in a knowledge graph. Our approach employs a set of machine learning models, including decision trees, genetic programming, random forest and eXtreme gradient boosting, to predict relations between entities. The experiments reveal that considering multiple semantic aspects when representing the similarity between two entities improves explainability and predictive performance. KGsim2vec performs better than black-box methods based on knowledge graph embeddings or graph neural networks. Moreover, KGsim2vec produces global models that can capture biological phenomena and elucidate data biases.

Collapse

Chen H, King FJ, Zhou B, Wang Y, Canedy CJ, Hayashi J, Zhong Y, Chang MW, Pache L, Wong JL, Jia Y, Joslin J, Jiang T, Benner C, Chanda SK, Zhou Y. Drug target prediction through deep learning functional representation of gene signatures. Nat Commun 2024;15:1853. [PMID: 38424040 PMCID: PMC10904399 DOI: 10.1038/s41467-024-46089-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2023] [Accepted: 02/14/2024] [Indexed: 03/02/2024] Open

Affiliation(s)

Hao Chen Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA. Department of Computer Science and Engineering, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
Frederick J King Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Bin Zhou Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Yu Wang Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Carter J Canedy Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Joel Hayashi Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Yang Zhong Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Max W Chang Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
Lars Pache NCI Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
Julian L Wong Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Yong Jia Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
John Joslin Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
Tao Jiang Department of Computer Science and Engineering, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA
Christopher Benner Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
Sumit K Chanda Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, 92037, USA
Yingyao Zhou Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA.

Collapse

Li W, Wang B, Dai J, Kou Y, Chen X, Pan Y, Hu S, Xu ZZ. Partial order relation-based gene ontology embedding improves protein function prediction. Brief Bioinform 2024;25:bbae077. [PMID: 38446740 PMCID: PMC10917077 DOI: 10.1093/bib/bbae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Revised: 01/22/2024] [Indexed: 03/08/2024] Open

Sanjak J, Binder J, Yadaw AS, Zhu Q, Mathé EA. Clustering rare diseases within an ontology-enriched knowledge graph. J Am Med Inform Assoc 2023;31:154-164. [PMID: 37759342 PMCID: PMC10746319 DOI: 10.1093/jamia/ocad186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2023] [Revised: 08/01/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open

Wang Y, Wegner P, Domingo-Fernández D, Tom Kodamullil A. Multi-ontology embeddings approach on human-aligned multi-ontologies representation for gene-disease associations prediction. Heliyon 2023;9:e21502. [PMID: 38027969 PMCID: PMC10651438 DOI: 10.1016/j.heliyon.2023.e21502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 12/01/2023] Open

Li N, Yang Z, Yang Y, Wang J, Lin H. Hyperbolic hierarchical knowledge graph embeddings for biological entities. J Biomed Inform 2023;147:104503. [PMID: 37778673 DOI: 10.1016/j.jbi.2023.104503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 08/25/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023]

Muniyappan S, Rayan AXA, Varrieth GT. EGeRepDR: An enhanced genetic-based representation learning for drug repurposing using multiple biomedical sources. J Biomed Inform 2023;147:104528. [PMID: 37858852 DOI: 10.1016/j.jbi.2023.104528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 09/11/2023] [Accepted: 10/16/2023] [Indexed: 10/21/2023]

Abstract

MOTIVATION

Drug repurposing (DR) is an imminent approach for identifying novel therapeutic indications for the available drugs and discovering novel drugs for previously untreatable diseases. Nowadays, DR has major attention in the pharmaceutical industry due to the high cost and time of launching new drugs to the market through traditional drug development. DR task majorly depends on genetic information since the drugs revert the modified Gene Expression (GE) of diseases to normal. Many of the existing studies have not considered the genetic importance of predicting the potential candidates.

METHOD

We proposed a novel multimodal framework that utilizes genetic aspects of drugs and diseases such as genes, pathways, gene signatures, or expression to enhance the performance of DR using various data sources. Firstly, the heterogeneous biological network (HBN) is constructed with three types of nodes namely drug, disease, and gene, and 4 types of edges similarities (drug, gene, and disease), drug-gene, gene-disease, and drug-disease. Next, a modified graph auto-encoder (GAE*) model is applied to learn the representation of drug and disease nodes using the topological structure and edge information. Secondly, the HBN is enhanced with the information extracted from biomedical literature and ontology using a novel semi-supervised pattern embedding-based bootstrapping model and novel DR perspective representation learning respectively to improve the prediction performance. Finally, our proposed system uses a neural network model to generate the probability score of drug-disease pairs.

RESULTS

We demonstrate the efficiency of the proposed model on various datasets and achieved outstanding performance in 5-fold cross-validation (AUC = 0.99, AUPR = 0.98). Further, we validated the top-ranked potential candidates using pathway analysis and proved that the known and predicted candidates share common genes in the pathways.

Collapse

Jha K, Saha S, Karmakar S. Prediction of Protein-Protein Interactions Using Vision Transformer and Language Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:3215-3225. [PMID: 37027644 DOI: 10.1109/tcbb.2023.3248797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]

Xu W, Duan L, Zheng H, Li-Ling J, Jiang W, Zhang Y, Wang T, Qin R. An Integrative Disease Information Network Approach to Similar Disease Detection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:2724-2735. [PMID: 34478379 DOI: 10.1109/tcbb.2021.3110127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Nunes S, Sousa R, Pesquita C. Multi-domain knowledge graph embeddings for gene-disease association prediction. J Biomed Semantics 2023;14:11. [PMID: 37580835 PMCID: PMC10426189 DOI: 10.1186/s13326-023-00291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023] Open

Kartheeswaran KP, Rayan AXA, Varrieth GT. Enhanced disease-disease association with information enriched disease representation. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023;20:8892-8932. [PMID: 37161227 DOI: 10.3934/mbe.2023391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]

Abstract

OBJECTIVE

Quantification of disease-disease association (DDA) enables the understanding of disease relationships for discovering disease progression and finding comorbidity. For effective DDA strength calculation, there is a need to address the main challenge of integration of various biomedical aspects of DDA is to obtain an information rich disease representation.

MATERIALS AND METHODS

An enhanced and integrated DDA framework is developed that integrates enriched literature-based with concept-based DDA representation. The literature component of the proposed framework uses PubMed abstracts and consists of improved neural network model that classifies DDAs for an enhanced literature-based DDA representation. Similarly, an ontology-based joint multi-source association embedding model is proposed in the ontology component using Disease Ontology (DO), UMLS, claims insurance, clinical notes etc. Results and Discussion: The obtained information rich disease representation is evaluated on different aspects of DDA datasets such as Gene, Variant, Gene Ontology (GO) and a human rated benchmark dataset. The DDA scores calculated using the proposed method achieved a high correlation mainly in gene-based dataset. The quantified scores also shown better correlation of 0.821, when evaluated on human rated 213 disease pairs. In addition, the generated disease representation is proved to have substantial effect on correlation of DDA scores for different categories of disease pairs.

CONCLUSION

The enhanced context and semantic DDA framework provides an enriched disease representation, resulting in high correlated results with different DDA datasets. We have also presented the biological interpretation of disease pairs. The developed framework can also be used for deriving the strength of other biomedical associations.

Collapse

Castell-Díaz J, Miñarro-Giménez JA, Martínez-Costa C. Supporting SNOMED CT postcoordination with knowledge graph embeddings. J Biomed Inform 2023;139:104297. [PMID: 36736448 DOI: 10.1016/j.jbi.2023.104297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Revised: 12/22/2022] [Accepted: 01/25/2023] [Indexed: 02/03/2023]

Sanjak J, Zhu Q, Mathé EA. Clustering rare diseases within an ontology-enriched knowledge graph. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.15.528673. [PMID: 36824742 PMCID: PMC9949046 DOI: 10.1101/2023.02.15.528673] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/18/2023]

Abstract

Objective

Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing and/or platform based therapeutic development. Toward that aim, we utilized an integrative knowledge graph-based approach to constructing clusters of rare diseases.

Materials and Methods

Data on 3,242 rare diseases were extracted from the National Center for Advancing Translational Science (NCATS) Genetic and Rare Diseases Information center (GARD) internal data resources. The rare disease data was enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were used to convert nodes into vectors upon which k-means clustering was applied. We validated the disease clusters through semantic similarity and feature enrichment analysis.

Results

A node embedding model was trained on the ontology enriched rare disease KG and k-means clustering was applied to the embedding vectors resulting in 37 disease clusters with a mean size of 87 diseases. We validate the disease clusters quantitatively by looking at semantic similarity of clustered diseases, using the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters were shown to be highly related.

Discussion

We demonstrate that node embeddings are an effective method for clustering diseases within a heterogenous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and approved or investigational drugs are enumerated for follow-up efforts.

Conclusion

Our study lays out a method for clustering rare diseases using the graph node embeddings. We develop an easy to maintain pipeline that can be updated when new data on rare diseases emerges. The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems. Detailed subnetwork analysis and in-depth review of individual clusters may lead to translatable findings. Future work will focus on incorporation of additional data sources, with a particular focus on common disease data.

Collapse

Munarko Y, Rampadarath A, Nickerson D. Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE). F1000Res 2023;12:162. [PMID: 37842339 PMCID: PMC10570691 DOI: 10.12688/f1000research.128982.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/25/2023] [Indexed: 10/17/2023] Open

Carvalho RMS, Oliveira D, Pesquita C. Knowledge Graph Embeddings for ICU readmission prediction. BMC Med Inform Decis Mak 2023;23:12. [PMID: 36658526 PMCID: PMC9850812 DOI: 10.1186/s12911-022-02070-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Accepted: 11/28/2022] [Indexed: 01/20/2023] Open

Abstract

BACKGROUND

Intensive Care Unit (ICU) readmissions represent both a health risk for patients,with increased mortality rates and overall health deterioration, and a financial burden for healthcare facilities. As healthcare became more data-driven with the introduction of Electronic Health Records (EHR), machine learning methods have been applied to predict ICU readmission risk. However, these methods disregard the meaning and relationships of data objects and work blindly over clinical data without taking into account scientific knowledge and context. Ontologies and Knowledge Graphs can help bridge this gap between data and scientific context, as they are computational artefacts that represent the entities of a domain and their relationships to each other in a formalized way.

METHODS AND RESULTS

We have developed an approach that enriches EHR data with semantic annotations to ontologies to build a Knowledge Graph. A patient's ICU stay is represented by Knowledge Graph embeddings in a contextualized manner, which are used by machine learning models to predict 30-days ICU readmissions. This approach is based on several contributions: (1) an enrichment of the MIMIC-III dataset with patient-oriented annotations to various biomedical ontologies; (2) a Knowledge Graph that defines patient data with biomedical ontologies; (3) a predictive model of ICU readmission risk that uses Knowledge Graph embeddings; (4) a variant of the predictive model that targets different time points during an ICU stay. Our predictive approaches outperformed both a baseline and state-of-the-art works achieving a mean Area Under the Receiver Operating Characteristic Curve of 0.827 and an Area Under the Precision-Recall Curve of 0.691. The application of this novel approach to help clinicians decide whether a patient can be discharged has the potential to prevent the readmission of [Formula: see text] of Intensive Care Unit patients, without unnecessarily prolonging the stay of those who would not require it.

CONCLUSION

The coupling of semantic annotation and Knowledge Graph embeddings affords two clear advantages: they consider scientific context and they are able to build representations of EHR information of different types in a common format. This work demonstrates the potential for impact that integrating ontologies and Knowledge Graphs into clinical machine learning applications can have.

Collapse

Li X, Han P, Chen W, Gao C, Wang S, Song T, Niu M, Rodriguez-Patón A. MARPPI: boosting prediction of protein-protein interactions with multi-scale architecture residual network. Brief Bioinform 2023;24:6887309. [PMID: 36502435 DOI: 10.1093/bib/bbac524] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 09/29/2022] [Accepted: 11/04/2022] [Indexed: 12/14/2022] Open

Wang H, Zheng H, Chen DZ. TANGO: A GO-Term Embedding Based Method for Protein Semantic Similarity Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:694-706. [PMID: 35030084 DOI: 10.1109/tcbb.2022.3143480] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Jha K, Saha S. Analyzing Effect of Multi-Modality in Predicting Protein-Protein Interactions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:162-173. [PMID: 35259112 DOI: 10.1109/tcbb.2022.3157531] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Zhapa-Camacho F, Kulmanov M, Hoehndorf R. mOWL: Python library for machine learning with biomedical ontologies. Bioinformatics 2022;39:6935780. [PMID: 36534832 PMCID: PMC9848046 DOI: 10.1093/bioinformatics/btac811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Revised: 11/25/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open

He Y, Yu H, Huffman A, Lin AY, Natale DA, Beverley J, Zheng L, Perl Y, Wang Z, Liu Y, Ong E, Wang Y, Huang P, Tran L, Du J, Shah Z, Shah E, Desai R, Huang HH, Tian Y, Merrell E, Duncan WD, Arabandi S, Schriml LM, Zheng J, Masci AM, Wang L, Liu H, Smaili FZ, Hoehndorf R, Pendlington ZM, Roncaglia P, Ye X, Xie J, Tang YW, Yang X, Peng S, Zhang L, Chen L, Hur J, Omenn GS, Athey B, Smith B. A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology. J Biomed Semantics 2022;13:25. [PMID: 36271389 PMCID: PMC9585694 DOI: 10.1186/s13326-022-00279-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Accepted: 09/13/2022] [Indexed: 11/24/2022] Open

Abstract

Background

The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechenisms it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020.

Results

As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19 related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative host-coronavirus PPIs and drug-target knowledge has also been used to support drug repurposing for COVID-19 treatment.

Conclusion

CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13326-022-00279-z.

Collapse

Affiliation(s)

Yongqun He University of Michigan Medical School, Ann Arbor, MI, USA.
Hong Yu People's Hospital of Guizhou Province, Guiyang, Guizhou, China.
Anthony Huffman University of Michigan Medical School, Ann Arbor, MI, USA
Asiyah Yu Lin National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.,National Center for Ontological Research, Buffalo, NY, USA
Darren A Natale Georgetown University Medical Center, Washington, DC, USA
John Beverley National Center for Ontological Research, Buffalo, NY, USA.,The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
Ling Zheng Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
Yehoshua Perl Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
Zhigang Wang Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing, China
Yingtong Liu University of Michigan Medical School, Ann Arbor, MI, USA
Edison Ong University of Michigan Medical School, Ann Arbor, MI, USA
Yang Wang University of Michigan Medical School, Ann Arbor, MI, USA.,People's Hospital of Guizhou Province, Guiyang, Guizhou, China
Philip Huang University of Michigan Medical School, Ann Arbor, MI, USA
Long Tran University of Michigan Medical School, Ann Arbor, MI, USA
Jinyang Du University of Michigan Medical School, Ann Arbor, MI, USA
Zalan Shah University of Michigan Medical School, Ann Arbor, MI, USA
Easheta Shah University of Michigan Medical School, Ann Arbor, MI, USA
Roshan Desai University of Michigan Medical School, Ann Arbor, MI, USA
Hsin-Hui Huang University of Michigan Medical School, Ann Arbor, MI, USA.,National Yang-Ming University, Taipei, Taiwan
Yujia Tian Rutgers University, New Brunswick, NJ, USA
Eric Merrell University at Buffalo, Buffalo, NY, 14260, USA
William D Duncan University of Florida, Gainesville, FL, USA
Sivaram Arabandi OntoPro LLC, Houston, TX, USA
Lynn M Schriml University of Maryland School of Medicine, Baltimore, MD, USA
Jie Zheng Department of Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Anna Maria Masci Office of Data Science, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
Liwei Wang Mayo Clinic, Rochester, MN, USA
Hongfang Liu Mayo Clinic, Rochester, MN, USA
Fatima Zohra Smaili King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Robert Hoehndorf King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Zoë May Pendlington European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
Paola Roncaglia European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
Xianwei Ye People's Hospital of Guizhou Province, Guiyang, Guizhou, China
Jiangan Xie School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China
Yi-Wei Tang Cepheid, Danaher Diagnostic Platform, Shanghai, China
Xiaolin Yang Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing, China
Suyuan Peng National Institute of Health Data Science, Peking University, Beijing, China
Luxia Zhang National Institute of Health Data Science, Peking University, Beijing, China
Luonan Chen Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
Junguk Hur University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND, USA
Gilbert S Omenn University of Michigan Medical School, Ann Arbor, MI, USA
Brian Athey University of Michigan Medical School, Ann Arbor, MI, USA
Barry Smith National Center for Ontological Research, Buffalo, NY, USA.,University at Buffalo, Buffalo, NY, 14260, USA

Collapse

Zhao L, Sun H, Cao X, Wen N, Wang J, Wang C. Learning representations for gene ontology terms by jointly encoding graph structure and textual node descriptors. Brief Bioinform 2022;23:6651302. [PMID: 35901452 DOI: 10.1093/bib/bbac318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Revised: 06/06/2022] [Accepted: 07/13/2022] [Indexed: 11/14/2022] Open

Manifold biomedical text sentence embedding. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022;15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022] Open

Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022;10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023] Open

Kamran AB, Naveed H. GOntoSim: a semantic similarity measure based on LCA and common descendants. Sci Rep 2022;12:3818. [PMID: 35264663 PMCID: PMC8907294 DOI: 10.1038/s41598-022-07624-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 02/14/2022] [Indexed: 11/20/2022] Open

Ieremie I, Ewing RM, Niranjan M. TransformerGO: predicting protein-protein interactions by modelling the attention between sets of gene ontology terms. Bioinformatics 2022;38:2269-2277. [PMID: 35176146 PMCID: PMC9363134 DOI: 10.1093/bioinformatics/btac104] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2021] [Revised: 01/26/2022] [Accepted: 02/15/2022] [Indexed: 02/03/2023] Open

Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022;23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open

Edera AA, Milone DH, Stegmayer G. Anc2vec: embedding gene ontology terms by preserving ancestors relationships. Brief Bioinform 2022;23:6523148. [PMID: 35136916 DOI: 10.1093/bib/bbac003] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Revised: 12/13/2021] [Accepted: 01/04/2022] [Indexed: 12/11/2022] Open

Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022;22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022] Open

Althagafi A, Alsubaie L, Kathiresan N, Mineta K, Aloraini T, Al Mutairi F, Alfadhel M, Gojobori T, Alfares A, Hoehndorf R. DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning. Bioinformatics 2021;38:1677-1684. [PMID: 34951628 PMCID: PMC8896633 DOI: 10.1093/bioinformatics/btab859] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Revised: 12/07/2021] [Accepted: 12/21/2021] [Indexed: 02/03/2023] Open

Affiliation(s)

Azza Althagafi Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia,Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
Lamia Alsubaie Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City (KAMC), Riyadh, Saudi Arabia,Center for Genetics and Inherited Diseases, Taibah University, Almadinah Almunwarah, Saudi Arabia
Nagarajan Kathiresan Supercomputing Core Lab, KAUST, Thuwal, Saudi Arabia
Katsuhiko Mineta Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Taghrid Aloraini Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City (KAMC), Riyadh, Saudi Arabia,King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
Fuad Al Mutairi Genetics & Precision Medicine Department, King Abdulaziz Medical City, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia,King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
Majid Alfadhel Genetics & Precision Medicine Department, King Abdulaziz Medical City, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia,King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia
Takashi Gojobori KCBRC, Biological and Environmental Science and Engineering Division (BESE), KAUST, Thuwal, Saudi Arabia
Ahmad Alfares Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City (KAMC), Riyadh, Saudi Arabia,King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Centre, Ministry of National Guard-Health Affairs (MNG-HA), Riyadh, Saudi Arabia,Department of Pediatrics, College of Medicine, Qassim University, Qassim, Saudi Arabia
Robert Hoehndorf To whom correspondence should be addressed.

Collapse

Nourani E. GoVec: Gene Ontology Representation Learning Using Weighted Heterogeneous Graph and Meta-Path. J Comput Biol 2021;28:1196-1207. [PMID: 34847734 DOI: 10.1089/cmb.2021.0069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open

Network-based protein-protein interaction prediction method maps perturbations of cancer interactome. PLoS Genet 2021;17:e1009869. [PMID: 34727106 PMCID: PMC8610286 DOI: 10.1371/journal.pgen.1009869] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Revised: 11/23/2021] [Accepted: 10/09/2021] [Indexed: 01/09/2023] Open

Konopka T, Vestito L, Smedley D. Dimensional reduction of phenotypes from 53 000 mouse models reveals a diverse landscape of gene function. BIOINFORMATICS ADVANCES 2021;1:vbab026. [PMID: 34870209 PMCID: PMC8633315 DOI: 10.1093/bioadv/vbab026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/14/2021] [Revised: 09/09/2021] [Accepted: 10/07/2021] [Indexed: 01/27/2023]

Kim J, Kim D, Sohn KA. HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball. Bioinformatics 2021;37:2971-2980. [PMID: 33760022 DOI: 10.1093/bioinformatics/btab193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 02/02/2023] Open

Wang X, Yang Y, Li K, Li W, Li F, Peng S. BioERP: biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions. Bioinformatics 2021;37:4793-4800. [PMID: 34329382 DOI: 10.1093/bioinformatics/btab565] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2021] [Revised: 07/18/2021] [Accepted: 07/29/2021] [Indexed: 11/14/2022] Open

Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021;22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open

OWL2Vec*: embedding of OWL ontologies. Mach Learn 2021. [DOI: 10.1007/s10994-021-05997-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Lou P, Dong Y, Jimeno Yepes A, Li C. A representation model for biological entities by fusing structured axioms with unstructured texts. Bioinformatics 2021;37:1156-1163. [PMID: 33107905 DOI: 10.1093/bioinformatics/btaa913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2020] [Revised: 09/04/2020] [Accepted: 10/13/2020] [Indexed: 11/14/2022] Open

Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021;37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022] Open

Slater K, Karwath A, Williams JA, Russell S, Makepeace S, Carberry A, Hoehndorf R, Gkoutos GV. Towards similarity-based differential diagnostics for common diseases. Comput Biol Med 2021;133:104360. [PMID: 33836447 PMCID: PMC8204262 DOI: 10.1016/j.compbiomed.2021.104360] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 03/22/2021] [Accepted: 03/24/2021] [Indexed: 11/30/2022]

Affiliation(s)

Karin Slater College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
Andreas Karwath College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
John A Williams College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
Sophie Russell College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
Silver Makepeace College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
Alexander Carberry College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
Robert Hoehndorf Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
Georgios V Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK

Collapse

Chen YZ, Wang ZZ, Wang Y, Ying G, Chen Z, Song J. nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning. Brief Bioinform 2021;22:6277413. [PMID: 34002774 DOI: 10.1093/bib/bbab146] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Revised: 03/18/2021] [Accepted: 03/25/2021] [Indexed: 12/20/2022] Open

Abstract

Lysine crotonylation (Kcr) is a newly discovered type of protein post-translational modification and has been reported to be involved in various pathophysiological processes. High-resolution mass spectrometry is the primary approach for identification of Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and expensive when compared with computational approaches. To date, several predictors for Kcr site prediction have been developed, most of which are capable of predicting crotonylation sites on either histones alone or mixed histone and nonhistone proteins together. These methods exhibit high diversity in their algorithms, encoding schemes, feature selection techniques and performance assessment strategies. However, none of them were designed for predicting Kcr sites on nonhistone proteins. Therefore, it is desirable to develop an effective predictor for identifying Kcr sites from the large amount of nonhistone sequence data. For this purpose, we first provide a comprehensive review on six methods for predicting crotonylation sites. Second, we develop a novel deep learning-based computational framework termed as CNNrgb for Kcr site prediction on nonhistone proteins by integrating different types of features. We benchmark its performance against multiple commonly used machine learning classifiers (including random forest, logitboost, naïve Bayes and logistic regression) by performing both 10-fold cross-validation and independent test. The results show that the proposed CNNrgb framework achieves the best performance with high computational efficiency on large datasets. Moreover, to facilitate users' efforts to investigate Kcr sites on human nonhistone proteins, we implement an online server called nhKcr and compare it with other existing tools to illustrate the utility and robustness of our method. The nhKcr web server and all the datasets utilized in this study are freely accessible at http://nhKcr.erc.monash.edu/.

Collapse

Li Y, Wang K, Wang G. Evaluating Disease Similarity Based on Gene Network Reconstruction and Representation. Bioinformatics 2021;37:3579-3587. [PMID: 33978702 DOI: 10.1093/bioinformatics/btab252] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 03/01/2021] [Accepted: 04/28/2021] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Quantifying the associations between diseases is of great significance in increasing our understanding of disease biology, improving disease diagnosis, re-positioning, and developing drugs. Therefore, in recent years, the research of disease similarity has received a lot of attention in the field of bioinformatics. Previous work has shown that the combination of the ontology (such as disease ontology and gene ontology) and disease-gene interactions are worthy to be regarded to elucidate diseases and disease associations. However, most of them are either based on the overlap between disease-related gene sets or distance within the ontology's hierarchy. The diseases in these methods are represented by discrete or sparse feature vectors, which cannot grasp the deep semantic information of diseases. Recently, deep representation learning has been widely studied and gradually applied to various fields of bioinformatics. Based on the hypothesis that disease representation depends on its related gene representations, we propose a disease representation model using two most representative gene resources HumanNet and Gene Ontology to construct a new gene network and learn gene (disease) representations. The similarity between two diseases is computed by the cosine similarity of their corresponding representations.

RESULTS

We propose a novel approach to compute disease similarity, which integrates two important factors disease-related genes and gene ontology hierarchy to learn disease representation based on deep representation learning. Under the same experimental settings, the AUC value of our method is 0.8074, which improves the most competitive baseline method by 10.1%. The quantitative and qualitative experimental results show that our model can learn effective disease representations and improve the accuracy of disease similarity computation significantly.

AVAILABILITY

The research shows that this method has certain applicability in the prediction of gene-related diseases, the migration of disease treatment methods, drug development, and so on.

SUPPLEMENTARY INFORMATION

Supplementary data are available at https://github.com/catly/disease_similarity.

Collapse

Zhong X, Rajapakse JC. Graph embeddings on gene ontology annotations for protein-protein interaction prediction. BMC Bioinformatics 2020;21:560. [PMID: 33323115 PMCID: PMC7739483 DOI: 10.1186/s12859-020-03816-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2020] [Accepted: 10/13/2020] [Indexed: 01/15/2023] Open

Cox S, Dong X, Rai R, Christopherson L, Zheng W, Tropsha A, Schmitt C. A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases. J Biomed Inform 2020;111:103579. [PMID: 33007449 DOI: 10.1016/j.jbi.2020.103579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2020] [Revised: 09/14/2020] [Accepted: 09/25/2020] [Indexed: 10/23/2022]

Abstract

Biomedical literature contains unstructured, rich information regarding proteins, ligands, diseases as well as biological pathways in which they are involved. Systematically analyzing such textual corpus has the potential for biomedical discovery of new protein-protein interactions and hidden drug indications. For this purpose, we have investigated a methodology that is based on a well-established text mining tool, Word2Vec, for the analysis of PubMed full text articles to derive word embeddings, and the use of a simple semantic similarity comparison either by itself or in conjunction with k-Nearest Neighbor (kNN) technique for the prediction of new relationships. To test this methodology, three lines of retrospective analyses of a dataset with known P53-interacting proteins have been conducted. First, we demonstrated that Word2Vec semantic similarity can infer functional relatedness among all kinases known to interact with P53. Second, in a series of time-split experiments, we demonstrated that both a simple similarity comparison and kNN models built with papers published up to a certain year were able to discover P53 interactors described in later publications. Third, in a different scenario of time-split experiments, we examined the predictions of P53-interacting proteins based on the kNN models built on data prior to a certain split year for different time ranges past that year, and found that the cumulative number of correct predictions was indeed increasing with time. We conclude that text mining of research papers in the PubMed literature based on Word2Vec analysis followed by a simple similarity comparison or kNN modeling affords excellent predictions of protein-protein interactions between P53 and kinases, and should have wide applications in translational biomedical studies such as repurposing of existing drugs, drug-drug interaction, and elucidation of mechanisms of action for drugs.

Collapse

Smaili FZ, Gao X, Hoehndorf R. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics 2020;36:2229-2236. [PMID: 31821406 PMCID: PMC7141863 DOI: 10.1093/bioinformatics/btz920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 12/06/2019] [Indexed: 12/30/2022] Open

Zhao L, Wang J, Hu Y, Cheng L. Conjoint Feature Representation of GO and Protein Sequence for PPI Prediction Based on an Inception RNN Attention Network. MOLECULAR THERAPY. NUCLEIC ACIDS 2020;22:198-208. [PMID: 33230427 PMCID: PMC7515979 DOI: 10.1016/j.omtn.2020.08.025] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/11/2020] [Accepted: 08/21/2020] [Indexed: 12/18/2022]

Callahan TJ, Tripodi IJ, Pielke-Lombardo H, Hunter LE. Knowledge-Based Biomedical Data Science. Annu Rev Biomed Data Sci 2020;3:23-41. [PMID: 33954284 PMCID: PMC8095730 DOI: 10.1146/annurev-biodatasci-010820-091627] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]