1
Wang G, Feng H, Du M, Feng Y, Cao C. Multimodal Representation Learning via Graph Isomorphism Network for Toxicity Multitask Learning. J Chem Inf Model 2024; 64:8322-8338. [PMID: 39432821 DOI: 10.1021/acs.jcim.4c01061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Toxicity is paramount for comprehending compound properties, particularly in the early stages of drug design. Because toxic effects are diverse and complex, computing compound toxicity remains a challenge. To address it, we propose a multimodal representation learning model, termed multimodal graph isomorphism network (MMGIN), for compound toxicity multitask learning. Based on fingerprints and molecular graphs of compounds, our MMGIN model incorporates a multimodal representation learning model to acquire a comprehensive compound representation. This model adopts a two-channel structure to independently learn fingerprint representation and molecular graph representation. Subsequently, two feedforward neural networks utilize the learned multimodal compound representation to perform multitask learning, encompassing compound toxicity classification and multiple compound category classification simultaneously. To test the effectiveness of our model, we constructed a novel data set, termed the compound toxicity multitask learning (CTMTL) data set, derived from the TOXRIC data set. We compare our MMGIN model with other representative machine learning and deep learning models on the CTMTL and Tox21 data sets. The experimental results demonstrate significant advancements achieved by our MMGIN model. Furthermore, the ablation study underscores the effectiveness of the introduced fingerprints, molecular graphs, the multimodal representation learning model, and the multitask learning model, showcasing the model's superior predictive capability and robustness.
Affiliation(s)
- Guishen Wang
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Hui Feng
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Mengyan Du
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Yuncong Feng
  - School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Chen Cao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166 Jiangsu, China
2
Li X, Zhang F, Zheng L, Guo J. Advancing ecotoxicity assessment: Leveraging pre-trained model for bee toxicity and compound degradability prediction. Journal of Hazardous Materials 2024; 475:134828. [PMID: 38876015 DOI: 10.1016/j.jhazmat.2024.134828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/09/2024] [Accepted: 06/03/2024] [Indexed: 06/16/2024]
Abstract
The prediction of ecological toxicity plays an increasingly important role in modern society. However, the existing models often suffer from poor performance and limited predictive capabilities. In this study, we propose a novel approach for ecological toxicity assessment based on pre-trained models. By leveraging pre-training techniques and graph neural network models, we establish a high-performance predictive model. Furthermore, we incorporate a variational autoencoder to optimize the model, enabling simultaneous discrimination of toxicity to bees and molecular degradability. Additionally, despite the low similarity between the endogenous hormones in bees and the compounds in our dataset, our model confidently predicts that these hormones are non-toxic to bees, which further strengthens the credibility and accuracy of our model. We also discovered a negative correlation between the degradation and bee toxicity of compounds. In summary, this study presents an ecological toxicity assessment model with outstanding performance. The proposed model accurately predicts the toxicity of chemicals to bees and their degradability, offering valuable technical support to relevant fields.
Affiliation(s)
- Xinkang Li
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao
- Feng Zhang
  - College of Plant Protection, Nanjing Agricultural University, Nanjing 210095, China
- Liangzhen Zheng
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518000, China; Zelixir Biotech Company Ltd. Shanghai, China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao
3
Tang X, Tran A, Tan J, Gerstein MB. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024; 40:i357-i368. [PMID: 38940177 PMCID: PMC11256921 DOI: 10.1093/bioinformatics/btae260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models' versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D representation tends to overlook textual data within the biomedical domain. RESULTS We present a unified pre-trained language model, MolLM, that concurrently captures 2D and 3D molecular information alongside biomedical text. MolLM consists of a text Transformer encoder and a molecular Transformer encoder, designed to encode both 2D and 3D molecular structures. To support MolLM's self-supervised pre-training, we constructed 160K molecule-text pairings. Employing contrastive learning as a supervisory signal for learning, MolLM demonstrates robust molecular representation capabilities across four downstream tasks, including cross-modal molecule and text matching, property prediction, captioning, and text-prompted molecular editing. Through ablation, we demonstrate that the inclusion of explicit 3D representations improves performance in these downstream tasks. AVAILABILITY AND IMPLEMENTATION Our code, data, pre-trained model weights, and examples of using our model are all available at https://github.com/gersteinlab/MolLM. In particular, we provide Jupyter Notebooks offering step-by-step guidance on how to use MolLM to extract embeddings for both molecules and text.
Affiliation(s)
- Xiangru Tang
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Andrew Tran
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Jeffrey Tan
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Mark B Gerstein
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
  - Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
4
Chirsir P, Palm EH, Baskaran S, Schymanski EL, Wang Z, Wolf R, Hale SE, Arp HPH. Grouping strategies for assessing and managing persistent and mobile substances. Environmental Sciences Europe 2024; 36:102. [PMID: 38784824 PMCID: PMC11108893 DOI: 10.1186/s12302-024-00919-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 04/24/2024] [Indexed: 05/25/2024]
Abstract
Background Persistent, mobile and toxic (PMT), or very persistent and very mobile (vPvM) substances are a wide class of chemicals that are recalcitrant to degradation, easily transported, and potentially harmful to humans and the environment. Due to their persistence and mobility, these substances are often widespread in the environment once emitted, particularly in water resources, causing increased challenges during water treatment processes. Some PMT/vPvM substances such as GenX and perfluorobutane sulfonic acid have been identified as substances of very high concern (SVHCs) under the European Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation. With hundreds to thousands of potential PMT/vPvM substances yet to be assessed and managed, effective and efficient approaches that avoid a case-by-case assessment and prevent regrettable substitution are necessary to achieve the European Union's zero-pollution goal for a non-toxic environment by 2050. Main Substance grouping has helped global regulation of some highly hazardous chemicals, e.g., through the Montreal Protocol and the Stockholm Convention. This article explores the potential of grouping strategies for identifying, assessing and managing PMT/vPvM substances. The aim is to facilitate early identification of lesser-known or new substances that potentially meet PMT/vPvM criteria, prompt additional testing, avoid regrettable use or substitution, and integrate into existing risk management strategies. Thus, this article provides an overview of PMT/vPvM substances and reviews the definition of PMT/vPvM criteria and various lists of PMT/vPvM substances available. It covers the current definition of groups, compares the use of substance grouping for hazard assessment and regulation, and discusses the advantages and disadvantages of grouping substances for regulation. 
The article then explores strategies for grouping PMT/vPvM substances, including read-across, structural similarity and commonly retained moieties, as well as the potential application of these strategies using cheminformatics to predict P, M and T properties for selected examples. Conclusions Effective substance grouping can accelerate the assessment and management of PMT/vPvM substances, especially for substances that lack information. Advances to read-across methods and cheminformatics tools are needed to support efficient and effective chemical management, preventing broad entry of hazardous chemicals into the global market and favouring safer and more sustainable alternatives.
Affiliation(s)
- Parviel Chirsir
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
- Emma H. Palm
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
- Sivani Baskaran
  - Department of Environmental Engineering, Norwegian Geotechnical Institute, 0806 Oslo, Norway
- Emma L. Schymanski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
- Zhanyun Wang
  - Technology and Society Laboratory, Empa-Swiss Federal Laboratories for Materials Science and Technology, 9014 St. Gallen, Switzerland
- Raoul Wolf
  - Department of Environmental Engineering, Norwegian Geotechnical Institute, 0806 Oslo, Norway
- Sarah E. Hale
  - TZW: DVGW-Technologiezentrum Wasser (German Water Centre), Karlsruher Straße 84, 76139 Karlsruhe, Germany
- Hans Peter H. Arp
  - Department of Environmental Engineering, Norwegian Geotechnical Institute, 0806 Oslo, Norway
  - Department of Chemistry, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
5
Tang X, Lei X, Zhang Y. Prediction of Drug-Target Affinity Using Attention Neural Network. Int J Mol Sci 2024; 25:5126. [PMID: 38791165 PMCID: PMC11121300 DOI: 10.3390/ijms25105126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/19/2024] [Accepted: 04/27/2024] [Indexed: 05/26/2024] Open
Abstract
Studying drug-target interactions (DTIs) is the foundational and crucial phase in drug discovery. Biochemical experiments, while being the most reliable method for determining drug-target affinity (DTA), are time-consuming and costly, making it challenging to meet the current demands for swift and efficient drug development. Consequently, computational DTA prediction methods have emerged as indispensable tools for this research. In this article, we propose a novel deep learning algorithm, named GRA-DTA, for DTA prediction. Specifically, we introduce a Bidirectional Gated Recurrent Unit (BiGRU) combined with a soft attention mechanism to learn target representations. We employ Graph Sample and Aggregate (GraphSAGE) to learn drug representations. To distinguish the different features of drug and target representations and their dimensional contributions, we merge drug and target representations by an attention neural network (ANN) to learn drug-target pair representations, which are fed into fully connected layers to yield the predicted DTA. The experimental results showed that GRA-DTA achieved mean squared errors of 0.142 and 0.225 and concordance indices of 0.897 and 0.890 on the benchmark datasets KIBA and Davis, respectively, surpassing most state-of-the-art DTA prediction algorithms.
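The two metrics reported in this abstract, mean squared error and concordance index, can be illustrated with a minimal Python sketch. It is not taken from the GRA-DTA code: the function names and the toy affinity values are hypothetical, and the pairwise definition used here (ties in the true value skipped, ties in the prediction counted as half) is one common convention that the paper may or may not follow exactly.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(list(zip(y_true, y_pred)), 2):
        if t_i == t_j:
            continue  # pairs with equal true affinity carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # assumed convention: tied predictions count as half
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four drug-target pairs with made-up affinity values.
affinity_true = [5.0, 6.2, 7.1, 8.4]
affinity_pred = [5.3, 6.0, 7.5, 7.4]

print("MSE:", mean_squared_error(affinity_true, affinity_pred))
print("CI:", concordance_index(affinity_true, affinity_pred))
```

A lower MSE means predictions are closer in value, while a higher concordance index means the model ranks drug-target pairs in the right order, which is why the two numbers are reported together.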
Affiliation(s)
- Xin Tang
  - School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
- Yuchen Zhang
  - College of Information Engineering, Northwest A&F University, Xianyang 712199, China
6
Zhang R, Wu C, Yang Q, Liu C, Wang Y, Li K, Huang L, Zhou F. MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning. Bioinformatics 2024; 40:btae118. [PMID: 38426310 PMCID: PMC10984949 DOI: 10.1093/bioinformatics/btae118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 02/04/2024] [Accepted: 02/27/2024] [Indexed: 03/02/2024] Open
Abstract
MOTIVATION Predicting molecular properties is a pivotal task in various scientific domains, including drug discovery, material science, and computational chemistry. This problem is often hindered by the lack of annotated data and imbalanced class distributions, which pose significant challenges in developing accurate and robust predictive models. RESULTS This study tackles these issues by employing pretrained molecular models within a few-shot learning framework. A novel dynamic contrastive loss function is utilized to further improve model performance under class imbalance. The proposed MolFeSCue framework not only facilitates rapid generalization from minimal samples, but also employs a contrastive loss function to extract meaningful molecular representations from imbalanced datasets. Extensive evaluations and comparisons of MolFeSCue and state-of-the-art algorithms have been conducted on multiple benchmark datasets, and the experimental data demonstrate our algorithm's effectiveness in molecular representations and its broad applicability across various pretrained models. Our findings underscore MolFeSCue's potential to accelerate advancements in drug discovery. AVAILABILITY AND IMPLEMENTATION We have made all the source code utilized in this study publicly accessible via GitHub at http://www.healthinformaticslab.org/supp/ or https://github.com/zhangruochi/MolFeSCue. The code (MolFeSCue-v1-00) is also available as the supplementary file of this paper.
Affiliation(s)
- Ruochi Zhang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Chao Wu
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Qian Yang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Chang Liu
  - Beijing Life Science Academy, Beijing 102209, China
- Yan Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Kewei Li
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Lan Huang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
  - School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou 550025, China
7
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common representation of molecules, but it ignores their structural information, while molecular graph representation is weak at expressing 3D structure. In this article, we try to exploit the advantages of different types of representation simultaneously for molecular property prediction. Thus, we propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, the 2D molecular graph, the 3D molecular graph and the molecular image. Then, in order to learn molecular features individually, we use four essential deep learning frameworks, which correspond to the four distinct molecular representations. The final molecular representation is created by integrating the four feature vectors and feeding them into a prediction layer to predict the molecular property. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on these 6 benchmark datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and 3CL protease. The results show that DLF-MFF produces better performance in the identification of anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
Affiliation(s)
- Mei Ma
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
8
Ouabane M, Zaki K, Tabti K, Alaqarbeh M, Sbai A, Sekkate C, Bouachrine M, Lakhlifi T. Molecular toxicity of nitrobenzene derivatives to Tetrahymena pyriformis based on SMILES descriptors using Monte Carlo, docking, and MD simulations. Comput Biol Med 2024; 169:107880. [PMID: 38211383 DOI: 10.1016/j.compbiomed.2023.107880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 12/05/2023] [Accepted: 12/18/2023] [Indexed: 01/13/2024]
Abstract
It is challenging to model the toxicity of nitroaromatic compounds due to limited experimental data. Nitrobenzene derivatives are commonly used in industry and can lead to environmental contamination. Extensive research, including several QSPR studies, has been conducted to understand their toxicity. Predictive QSPR models can help improve chemical safety, but their limitations must be considered, and the molecular factors affecting toxicity should be carefully investigated. The latest QSPR methods, molecular modeling techniques, machine learning algorithms, and computational chemistry tools are essential for developing accurate and robust models. In this work, we used these methods to study a series of fifty compounds derived from nitrobenzene. The Monte Carlo approach was used for QSPR modeling by applying the SMILES molecular structure representation and optimal molecular descriptors. The correlation ideality index (CII) and correlation contradiction index (CCI) were further introduced as validation parameters to estimate the developed models' predictive ability. The statistical quality of the CII models was better than those without CII. The best QSPR model with the following statistical parameters (Split-3): ($R^2 = 0.968$, CCC = 0.984, IIC = 0.861, CII = 0.979, $Q^2 = 0.954$, $Q^2_{F1} = 0.946$, $Q^2_{F2} = 0.938$, $Q^2_{F3} = 0.947$, $R_m^2 = 0.878$, RMSE = 0.187, MAE = 0.151, $F_{\text{Training}} = 390$, $F_{\text{Invisible}} = 218$, $F_{\text{Calibration}} = 240$, $R^2_{\text{Test}} = 0.905$) was selected to generate the studied promoters with increasing and decreasing activity.
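Three of the validation statistics listed in this abstract have simple closed forms and can be sketched in a few lines of Python. This is an illustrative sketch only: it reads CCC as Lin's concordance correlation coefficient, which is an assumption about the abstract's abbreviation, the function names and toy values are hypothetical, and the CORAL-specific indices (IIC, CII) are deliberately not reproduced.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # The squared mean difference penalizes systematic bias, unlike Pearson's r.
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical toy values: predictions shifted by a constant offset of +1.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("CCC:", concordance_correlation(observed, predicted))
```

In the toy data the predictions are perfectly correlated with the observations but offset by one unit, so CCC falls well below 1. That sensitivity to bias is the usual reason for reporting CCC alongside a correlation-based $R^2$.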
Affiliation(s)
- Mohamed Ouabane
  - Molecular Chemistry and Natural Substances Laboratory, Department of Chemistry, Faculty of Science, Moulay Ismail University, Meknes, Morocco; Chemistry-Biology Applied to the Environment URL CNRT 13, Chemistry Department, Faculty of Science, Moulay Ismail University, Meknes, Morocco
- Khadija Zaki
  - Molecular Chemistry and Natural Substances Laboratory, Department of Chemistry, Faculty of Science, Moulay Ismail University, Meknes, Morocco
- Kamal Tabti
  - Molecular Chemistry and Natural Substances Laboratory, Department of Chemistry, Faculty of Science, Moulay Ismail University, Meknes, Morocco
- Marwa Alaqarbeh
  - Basic Science Department, Prince Al Hussein Bin Abdullah II Academy for Civil Protection, Al-Balqa Applied University, Al-Salt, 19117, Jordan
- Abdelouahid Sbai
  - Molecular Chemistry and Natural Substances Laboratory, Department of Chemistry, Faculty of Science, Moulay Ismail University, Meknes, Morocco
- Chakib Sekkate
  - Chemistry-Biology Applied to the Environment URL CNRT 13, Chemistry Department, Faculty of Science, Moulay Ismail University, Meknes, Morocco
- Mohammed Bouachrine
  - Molecular Chemistry and Natural Substances Laboratory, Department of Chemistry, Faculty of Science, Moulay Ismail University, Meknes, Morocco; Higher School of Technology-Khenifra (EST-Khenifra), University of Sultan Moulay Slimane, PB 170, Khenifra, 54000, Morocco
- Tahar Lakhlifi
  - Molecular Chemistry and Natural Substances Laboratory, Department of Chemistry, Faculty of Science, Moulay Ismail University, Meknes, Morocco
9
Veljković AN, Orlov YL, Mitić NS. BioGraph: Data Model for Linking and Querying Diverse Biological Metadata. Int J Mol Sci 2023; 24:6954. [PMID: 37108117 PMCID: PMC10138499 DOI: 10.3390/ijms24086954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 03/30/2023] [Accepted: 04/06/2023] [Indexed: 04/29/2023] Open
Abstract
Studying the association of gene function, diseases, and regulatory gene network reconstruction demands data compatibility. Data from different databases follow distinct schemas and are accessible in heterogeneous ways. Although the experiments differ, data may still be related to the same biological entities. Some entities may not be strictly biological, such as geolocations of habitats or paper references, but they provide a broader context for other entities. The same entities from different datasets can share similar properties, which may or may not be found within other datasets. Joint, simultaneous data fetching from multiple data sources is complicated for the end-user or, in many cases, unsupported and inefficient due to differences in data structures and ways of accessing the data. We propose BioGraph, a new model that enables connecting and retrieving information from linked biological data that originated from diverse datasets. We have tested the model on metadata collected from five diverse public datasets and successfully constructed a knowledge graph containing more than 17 million model objects, of which 2.5 million are individual biological entity objects. The model enables the selection of complex patterns and retrieval of matched results that can be discovered only by joining the data from multiple sources.
Affiliation(s)
- Aleksandar N Veljković
  - Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11158 Belgrade, Serbia
- Yuriy L Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
  - Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples' Friendship University of Russia, 117198 Moscow, Russia
- Nenad S Mitić
  - Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11158 Belgrade, Serbia
10
MSResG: Using GAE and Residual GCN to Predict Drug-Drug Interactions Based on Multi-source Drug Features. Interdiscip Sci 2023; 15:171-188. [PMID: 36646843 DOI: 10.1007/s12539-023-00550-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/15/2022] [Revised: 01/05/2023] [Accepted: 01/07/2023] [Indexed: 01/18/2023]
Abstract
Drug-drug interaction refers to the reaction that may occur when two drugs are taken together, which may threaten patients' health or enhance efficacy in ways that help medical work. Therefore, it is necessary to study and predict it. Traditional experimental methods can be used for drug-drug interaction prediction, but they are time-consuming and costly, so we prefer more accurate and convenient computational methods to predict unknown drug-drug interactions. In this paper, we propose a deep learning framework called MSResG that considers multi-source features of drugs and combines them with a Graph Auto-Encoder for prediction. First, the model obtains four feature representations of drugs from the database, namely chemical substructure, target, pathway and enzyme, and then calculates the Jaccard similarity of the drugs. To balance the different drug features, we integrate the similarities by taking their mean. We then combine the integrated similarity network with the drug interaction network, and encode and decode it using a graph auto-encoder based on a residual graph convolution network. Encoding learns the latent feature vectors of drugs, which contain similarity information and interaction information. Decoding reconstructs the network to predict unknown drug-drug interactions. The experimental results show that our model achieves advanced performance and is superior to other existing advanced methods. A case study also shows that MSResG has practical significance.
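The similarity step described in this abstract (per-feature Jaccard similarity followed by mean integration) can be sketched as follows. This is a minimal illustration, not the MSResG implementation: the drug names, feature identifiers, and function names are all hypothetical, and treating two empty feature sets as having similarity 0 is an assumed convention.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


def similarity_matrix(features, drugs):
    """Pairwise Jaccard similarity for one feature type, in the given drug order."""
    return [[jaccard(features[x], features[y]) for y in drugs] for x in drugs]


def integrate_by_mean(matrices):
    """Element-wise mean of several equally sized similarity matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)] for i in range(n)]


# Hypothetical toy data: three drugs, four feature types with made-up identifiers.
drugs = ["A", "B", "C"]
substructure = {"A": {"s1", "s2"}, "B": {"s2", "s3"}, "C": {"s4"}}
target = {"A": {"t1"}, "B": {"t1"}, "C": {"t2"}}
pathway = {"A": {"p1", "p2"}, "B": {"p1"}, "C": {"p2"}}
enzyme = {"A": {"e1"}, "B": {"e2"}, "C": {"e1"}}

integrated = integrate_by_mean(
    [similarity_matrix(f, drugs) for f in (substructure, target, pathway, enzyme)]
)

for name, row in zip(drugs, integrated):
    print(name, [round(value, 3) for value in row])
```

Averaging gives every feature type equal weight, which matches the abstract's stated goal of balancing the different drug features. The resulting matrix would then serve as the similarity network that is combined with the known interaction network before encoding.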