1
Moon H, Rho M. MultiChem: predicting chemical properties using multi-view graph attention network. BioData Min 2025; 18:4. [PMID: 39815309 PMCID: PMC11737097 DOI: 10.1186/s13040-024-00419-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Accepted: 12/26/2024] [Indexed: 01/18/2025] Open
Abstract
BACKGROUND: Understanding the molecular properties of chemical compounds is essential for identifying potential candidates or ensuring safety in drug discovery. However, exploring the vast chemical space is time-consuming and costly, necessitating the development of time-efficient and cost-effective computational methods. Recent advances in deep learning approaches have offered deeper insights into molecular structures. Leveraging this progress, we developed a novel multi-view learning model.
RESULTS: We introduce a graph-integrated model that captures both local and global structural features of chemical compounds. In our model, graph attention layers are employed to effectively capture essential local structures by jointly considering atom and bond features, while multi-head attention layers extract important global features. We evaluated our model on nine MoleculeNet datasets, encompassing both classification and regression tasks, and compared its performance with state-of-the-art methods. Our model achieved an average area under the receiver operating characteristic curve (AUROC) of 0.822 and a root mean squared error (RMSE) of 1.133, representing a 3% improvement in AUROC and a 7% improvement in RMSE over state-of-the-art models in extensive seed testing.
CONCLUSION: MultiChem highlights the importance of integrating both local and global structural information in predicting molecular properties, while also assessing the stability of the models across multiple datasets using various random seed values.
IMPLEMENTATION: The code is available at https://github.com/DMnBI/MultiChem.
Affiliation(s)
- Heesang Moon
  - Department of Computer Science, Hanyang University, Seoul, Republic of Korea
- Mina Rho
  - Department of Computer Science, Hanyang University, Seoul, Republic of Korea
  - Department of Artificial Intelligence, Seoul, Republic of Korea
  - Department of Biomedical Informatics, Hanyang University, Seoul, Republic of Korea
2
Huang J, Xing Q, Ji J, Yang B. PerCNet: Periodic complete representation for crystal graphs. Neural Netw 2025; 181:106841. [PMID: 39515084 DOI: 10.1016/j.neunet.2024.106841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 09/26/2024] [Accepted: 10/23/2024] [Indexed: 11/16/2024]
Abstract
Crystal molecules are considered as graph structures in different representation methods. A reasonable crystal representation method should capture both local and global information. However, existing methods only consider the local information of crystal molecules by modeling the bond distance and bond angle of first-order neighbors of atoms, which leads to the issue that different crystals will have the same representation. To solve this many-to-one issue, we consider the global information by further considering dihedral angles. We propose a periodic complete representation of graph modeling and a calculation algorithm for infinite extended crystal materials. A theoretical proof that the representation satisfies periodic completeness is provided. Based on the proposed representation, we then propose a network for predicting crystal material properties, PerCNet, with a specially designed message-passing mechanism. To the best of our knowledge, this is the first work that ensures the representation corresponds one-to-one with the crystal material based on graph modeling. Extensive experiments are conducted on two large-scale real-world material benchmark datasets. PerCNet achieves the best performance among baseline methods in terms of MAE. Our code is available at https://github.com/JiaoHuang111/PerCNet.
Affiliation(s)
- Jiao Huang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Artificial Intelligence, Jilin University, Changchun, Jilin, 130012, China
- Qianli Xing
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Jinglong Ji
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Artificial Intelligence, Jilin University, Changchun, Jilin, 130012, China
- Bo Yang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
3
Park J, Lee W, Kim J. Large-Scale Construction and Analysis of Amorphous Porous Polymer Network Materials. ACS Appl Mater Interfaces 2024; 16:57190-57199. [PMID: 39388380 DOI: 10.1021/acsami.4c13221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
In recent decades, data-driven methodologies have emerged as irreplaceable tools in materials science, particularly for elucidating structure-property relationships and facilitating the discovery of novel materials. However, despite the rapid development witnessed in other domains, amorphous materials have received relatively less attention in this context. The disordered atomic structure of amorphous materials resulting from irreversible reactions between building blocks has posed a difficulty in structural modeling, leading to a lack of databases that accurately reflect the amorphous nature of these materials. In this work, a database composed of 10,237 porous polymer networks (PPNs) was constructed from self-assembly simulations, resulting in the largest database of PPNs considering their amorphous characteristics. Through the distinct differences observed in comparison with existing databases, we emphasize that carefully considering the structural disorder of PPNs is essential for accurately characterizing their chemical behaviors. Machine learning models trained on the constructed database have confirmed that the macroscopic properties of amorphous PPNs can be predicted solely from the atomic structures of their monomers, implying that the characteristics of previously unseen PPNs can be assessed without the need for additional self-assembly simulations.
Affiliation(s)
- Junkil Park
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Wonseok Lee
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jihan Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
4
Zhang Q, Mao D, Tu Y, Wu YY. A New Fingerprint and Graph Hybrid Neural Network for Predicting Molecular Properties. J Chem Inf Model 2024; 64:5853-5866. [PMID: 39052623 DOI: 10.1021/acs.jcim.4c00586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Machine learning plays a role in accelerating drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Characterizing molecules typically involves the use of molecular fingerprints and molecular graphs. These are input into a multilayer perceptron (MLP) and variants of graph neural networks, such as graph attention networks (GATs). Due to the diverse types and large dimension of fingerprints, models may contain many features that are relatively irrelevant or redundant; meanwhile, although the GAT excels in handling heterogeneous graph tasks, it lacks the ability to extract collaborative information from neighboring nodes, which is crucial in scenarios where it cannot capture the joint influence of adjacent groups on atoms. To overcome these challenges, we introduce a hybrid model, combining improved GAT and MLP. In GAT, the recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm, which is based on the principle of maximizing relevance while minimizing redundancy. Through experiments on 13 public data sets and 14 breast cell lines, our model demonstrates superior performance compared to state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments were conducted to demonstrate the advantages of our improved version, as well as its antinoise capability and interpretability. These results indicate that our model holds promising prospects for practical applications.
Affiliation(s)
- Qingtian Zhang
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Dangxin Mao
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yusong Tu
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yuan-Yan Wu
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
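The abstract of entry 4 describes a feature selection algorithm based on maximizing relevance while minimizing redundancy. As a rough illustration of that general principle only, the sketch below runs a greedy selection in which relevance is the absolute Pearson correlation between a feature and the target, and redundancy is the mean absolute correlation with the features already chosen. The correlation-based scoring, the function names `pearson` and `mrmr_select`, and the toy data are all assumptions made for illustration; the abstract does not say how the authors measure relevance or redundancy, so this is not their algorithm.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def mrmr_select(columns, target, k):
    """Greedily pick up to k feature indices (hypothetical helper).

    columns: list of feature columns, each a list of numbers.
    Score of a candidate = relevance to target - mean redundancy
    with the features selected so far.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        remaining = [j for j in range(len(columns)) if j not in selected]

        def score(j):
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        selected.append(max(remaining, key=score))
    return selected


# Toy data: feature 1 duplicates feature 0; feature 2 is uncorrelated with both.
columns = [[1, -1, 1, -1], [1, -1, 1, -1], [1, 1, -1, -1]]
target = [3, -1, 1, -3]
print(mrmr_select(columns, target, 2))  # → [0, 2]
```

In the toy data the duplicate feature is skipped even though it is more relevant than feature 2 on its own, because its redundancy with the already selected feature cancels its relevance.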
5
Kang L, Zhou S, Fang S, Liu S. Adapting differential molecular representation with hierarchical prompts for multi-label property prediction. Brief Bioinform 2024; 25:bbae438. [PMID: 39252594 PMCID: PMC11383732 DOI: 10.1093/bib/bbae438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 08/05/2024] [Accepted: 08/21/2024] [Indexed: 09/11/2024] Open
Abstract
Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, which stands for Hierarchical Prompted Molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential expression of tasks in molecular representations and mitigate negative transfer caused by conflicts in individual task information. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atom and motif levels. Meanwhile, TAP utilizes an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks, thereby effectively handling the complexity of multi-label property prediction. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a novel perspective on multi-label molecular representation learning.
Affiliation(s)
- Linjia Kang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Songhua Zhou
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shuyan Fang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shichao Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
6
Jiang J, Li Y, Zhang R, Liu Y. INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction. J Mol Graph Model 2024; 128:108703. [PMID: 38228013 DOI: 10.1016/j.jmgm.2024.108703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/05/2023] [Accepted: 01/02/2024] [Indexed: 01/18/2024]
Abstract
Molecular property prediction plays an essential role in drug discovery for identifying the candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train good prediction models. However, the size of labeled data is usually small for molecular property prediction, which brings great challenges to deep learning-based molecular property prediction methods. Furthermore, the global information of molecules is critical for predicting molecular properties. Therefore, we propose INTransformer for molecular property prediction, which is a data augmentation method via contrastive learning to alleviate the limitations of the labeled molecular data while enhancing the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders to extract the molecular representation from the original SMILES and noisy SMILES respectively, while achieving the goal of data augmentation. To reduce the influence of noise, we use contrastive learning to ensure the molecular encoding of noisy SMILES is consistent with that of the original input so that the molecular representation information can be better extracted by INTransformer. Experiments on various benchmark datasets show that INTransformer achieved competitive performance for molecular property prediction tasks compared with the baselines and state-of-the-art methods.
Affiliation(s)
- Jing Jiang
  - Key Laboratory of Linguistic and Cultural Computing, Ministry of Education, Northwest Minzu University, Lanzhou 730030, China
- Yachao Li
  - Key Laboratory of Linguistic and Cultural Computing, Ministry of Education, Northwest Minzu University, Lanzhou 730030, China
- Ruisheng Zhang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Yunwu Liu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
7
Meewan I, Panmanee J, Petchyam N, Lertvilai P. HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES. Sci Rep 2024; 14:9262. [PMID: 38649402 PMCID: PMC11035669 DOI: 10.1038/s41598-024-59933-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 04/16/2024] [Indexed: 04/25/2024] Open
Abstract
Hepatitis B and C viruses (HBV and HCV) are significant causes of chronic liver diseases, with approximately 350 million infections globally. To accelerate the finding of effective treatment options, we introduce HBCVTr, a novel ligand-based drug design (LBDD) method for predicting the inhibitory activity of small molecules against HBV and HCV. HBCVTr employs a hybrid model consisting of double encoders of transformers and a deep neural network to learn the relationship between small molecules' simplified molecular-input line-entry system (SMILES) and their antiviral activity against HBV or HCV. The prediction accuracy of HBCVTr has surpassed baseline machine learning models and existing methods, with R-squared values of 0.641 and 0.721 for the HBV and HCV test sets, respectively. The trained models were successfully applied to virtual screening against 10 million compounds within 240 h, leading to the discovery of the top novel inhibitor candidates, including IJN04 for HBV and IJN12 and IJN19 for HCV. Molecular docking and dynamics simulations identified IJN04, IJN12, and IJN19 target proteins as the HBV core antigen, HCV NS5B RNA-dependent RNA polymerase, and HCV NS3/4A serine protease, respectively. Overall, HBCVTr offers a new and rapid drug discovery and development screening method targeting HBV and HCV.
Affiliation(s)
- Ittipat Meewan
  - Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Jiraporn Panmanee
  - Research Center for Neuroscience, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Nopphon Petchyam
  - Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, 73170, Thailand
- Pichaya Lertvilai
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92037, USA
8
Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024] Open
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representations significantly influences model learning and interpretation. While atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups, and their interpretation only partially aligns with chemical intuition. Recent research suggests alternative representations that use reduced molecular graphs to integrate higher-level chemical information and that leverage both representations for modeling. However, there is a lack of studies on the applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, but to varying degrees depending on the dataset. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
Affiliation(s)
- Apakorn Kengkanna
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
- Masahito Ohue
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
9
Ouyang H, Xu Z, Hong J, Malroy J, Qian L, Ji S, Zhu X. Mining the Metabolic Capacity of Clostridium sporogenes Aided by Machine Learning. Angew Chem Int Ed Engl 2024; 63:e202319925. [PMID: 38286754 PMCID: PMC10986427 DOI: 10.1002/anie.202319925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 01/31/2024]
Abstract
Anaerobes dominate the microbiota of the gastrointestinal (GI) tract, where a significant portion of small molecules can be degraded or modified. However, the enormous metabolic capacity of gut anaerobes remains largely elusive in contrast to aerobic bacteria, mainly due to the requirement of sophisticated laboratory settings. In this study, we employed an in silico machine learning platform, MoleculeX, to predict the metabolic capacity of a gut anaerobe, Clostridium sporogenes, against small molecules. Experiments revealed that among the top seven candidates predicted as unstable, six indeed exhibited instability in C. sporogenes culture. We further identified, for the first time, several metabolites resulting from the supplementation of everolimus in the bacterial culture. By utilizing bioinformatics and in vitro biochemical assays, we successfully identified an enzyme encoded in the genome of C. sporogenes responsible for everolimus transformation. Our framework thus can potentially facilitate future understanding of small-molecule metabolism in the gut, further improve patient care through personalized medicine, and guide the development of new small-molecule drugs and therapeutic approaches.
Affiliation(s)
- Huanrong Ouyang
  - Department of Chemical Engineering, Texas A&M University, College Station, 77843, United States
- Zhao Xu
  - Department of Computer Science & Engineering, Texas A&M University, College Station, 77843, United States
- Joshua Hong
  - Department of Chemical Engineering, Texas A&M University, College Station, 77843, United States
- Jeshua Malroy
  - Department of Chemical Engineering, Texas A&M University, College Station, 77843, United States
- Liangyu Qian
  - Department of Chemical Engineering, Texas A&M University, College Station, 77843, United States
- Shuiwang Ji
  - Department of Computer Science & Engineering, Texas A&M University, College Station, 77843, United States
- Xuejun Zhu
  - Department of Chemical Engineering, Texas A&M University, College Station, 77843, United States; Interdisciplinary Graduate Program in Genetics and Genomics, Texas A&M University, College Station, 77843, United States
10
Ham KP, Sael L. Evidential meta-model for molecular property prediction. Bioinformatics 2023; 39:btad604. [PMID: 37847785 PMCID: PMC10597608 DOI: 10.1093/bioinformatics/btad604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 09/02/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION: The usefulness of supervised molecular property prediction (MPP) is well-recognized in many applications. However, the insufficiency and the imbalance of labeled data make the learning problem difficult. Moreover, the reliability of the predictions is also a hurdle in the deployment of MPP models in safety-critical fields.
RESULTS: We propose the Evidential Meta-model for Molecular Property Prediction (EM3P2) method that returns uncertainty estimates along with its predictions. Our EM3P2 trains an evidential graph isomorphism network classifier using multi-task molecular property datasets under the model-agnostic meta-learning (MAML) framework while addressing the problem of data imbalance. Our results showed better prediction performance compared to existing meta-MPP models. Furthermore, we showed that the uncertainty estimates returned by our EM3P2 can be used to reject uncertain predictions for applications that require higher confidence.
AVAILABILITY AND IMPLEMENTATION: Source code is available for download at https://github.com/Ajou-DILab/EM3P2.
Affiliation(s)
- Kyung Pyo Ham
  - Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
- Lee Sael
  - Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
  - Department of Software and Computer Engineering, Ajou University, Suwon 16499, Republic of Korea
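The abstract of entry 10 notes that uncertainty estimates can be used to reject uncertain predictions. A minimal sketch of that idea follows, assuming the common evidential-classification convention in which a classifier outputs non-negative evidence per class, the Dirichlet parameters are evidence plus one, and the uncertainty is the number of classes divided by the sum of those parameters. Whether EM3P2 uses exactly this formulation is not stated in the abstract; the function name `evidential_predict` and the threshold `max_uncertainty` are hypothetical.

```python
def evidential_predict(evidence, max_uncertainty=0.5):
    """Return (label, uncertainty); label is None if the prediction is rejected.

    evidence: non-negative evidence values, one per class (assumed to be
    the output of some evidential classifier; not the EM3P2 API).
    """
    alpha = [e + 1.0 for e in evidence]      # Dirichlet parameters
    strength = sum(alpha)                    # total Dirichlet strength
    uncertainty = len(alpha) / strength      # in (0, 1]; 1 means no evidence
    probs = [a / strength for a in alpha]    # expected class probabilities
    label = max(range(len(probs)), key=lambda k: probs[k])
    if uncertainty > max_uncertainty:
        return None, uncertainty             # abstain on uncertain inputs
    return label, uncertainty


# Strong evidence for class 0: the prediction is kept.
print(evidential_predict([18.0, 0.0]))
# Little evidence for either class: the prediction is rejected.
print(evidential_predict([0.5, 0.5]))
```

Raising `max_uncertainty` keeps more predictions at the cost of accepting less certain ones, so the threshold would normally be tuned on held-out data.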
11
Han S, Fu H, Wu Y, Zhao G, Song Z, Huang F, Zhang Z, Liu S, Zhang W. HimGNN: a novel hierarchical molecular graph representation learning framework for property prediction. Brief Bioinform 2023; 24:bbad305. [PMID: 37594313 DOI: 10.1093/bib/bbad305] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/30/2023] [Revised: 07/18/2023] [Accepted: 08/04/2023] [Indexed: 08/19/2023] Open
Abstract
Accurate prediction of molecular properties is an important topic in drug discovery. Recent works have developed various representation schemes for molecular structures to capture different chemical information in molecules. The atom and motif can be viewed as hierarchical molecular structures that are widely used for learning molecular representations to predict chemical properties. Previous works have attempted to exploit both atom and motif to address the problem of information loss in single representation learning for various tasks. To further fuse such hierarchical information, the correspondence between learned chemical features from different molecular structures should be considered. Herein, we propose a novel framework for molecular property prediction, called hierarchical molecular graph neural networks (HimGNN). HimGNN learns hierarchical topology representations by applying graph neural networks on atom- and motif-based graphs. In order to boost the representational power of the motif feature, we design a Transformer-based local augmentation module to enrich motif features by introducing heterogeneous atom information in motif representation learning. Besides, we focus on the molecular hierarchical relationship and propose a simple yet effective rescaling module, called contextual self-rescaling, that adaptively recalibrates molecular representations by explicitly modelling interdependencies between atom and motif features. Extensive computational experiments demonstrate that HimGNN can achieve promising performances over state-of-the-art baselines on both classification and regression tasks in molecular property prediction.
Affiliation(s)
- Shen Han
  - College of Informatics, Huazhong Agricultural University, People's Republic of China
- Haitao Fu
  - College of Informatics, Huazhong Agricultural University, People's Republic of China
- Yuyang Wu
  - College of Plant Science and Technology, Huazhong Agricultural University, People's Republic of China
- Ganglan Zhao
  - College of Informatics, Huazhong Agricultural University, People's Republic of China
- Zhenyu Song
  - College of Informatics, Huazhong Agricultural University, People's Republic of China
- Feng Huang
  - College of Informatics, Huazhong Agricultural University, People's Republic of China
- Zhongfei Zhang
  - Computer Science Department, Binghamton University, Binghamton, NY, USA
- Shichao Liu
  - College of Informatics, Huazhong Agricultural University, People's Republic of China and Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Animal Farming Technology, Ministry of Agriculture, Huazhong Agricultural University
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, People's Republic of China and Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Animal Farming Technology, Ministry of Agriculture, Huazhong Agricultural University
12
Huang Y, Huang HY, Chen Y, Lin YCD, Yao L, Lin T, Leng J, Chang Y, Zhang Y, Zhu Z, Ma K, Cheng YN, Lee TY, Huang HD. A Robust Drug-Target Interaction Prediction Framework with Capsule Network and Transfer Learning. Int J Mol Sci 2023; 24:14061. [PMID: 37762364 PMCID: PMC10531393 DOI: 10.3390/ijms241814061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 08/27/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023] Open
Abstract
Drug-target interactions (DTIs) are considered a crucial component of drug design and drug discovery. To date, many computational methods were developed for drug-target interactions, but they are insufficiently informative for accurately predicting DTIs due to the lack of experimentally verified negative datasets, inaccurate molecular feature representation, and ineffective DTI classifiers. Therefore, we address the limitations of randomly selecting negative DTI data from unknown drug-target pairs by establishing two experimentally validated datasets and propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships of drugs and targets, which adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning and the message-passing neural network (MPNN) for the 2-D graph feature extraction of compounds to accurately and robustly identify drug-target interactions. We compared the performance of CapBM-DTI with state-of-the-art methods using four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. The case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.
Affiliation(s)
- Yixian Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Hsi-Yuan Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yigang Chen
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yang-Chi-Dung Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Lantian Yao
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Tianxiu Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Junlin Leng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yuan Chang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yuntian Zhang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Zihao Zhu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Kun Ma
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yeong-Nan Cheng
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Hsien-Da Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
13
|
Wu T, Tang Y, Sun Q, Xiong L. Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3044-3055. [PMID: 37028366 DOI: 10.1109/tcbb.2023.3253862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g., textual sequence or graph) have been developed. By digitally encoding them, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine both to address the loss of specific information in single-modal representation on various tasks. To further fuse such multi-modal information, the correspondence between the chemical features learned from different representations should be considered. To realize this, we propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representation as attention bias in the Transformer to reinforce feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets have demonstrated the effectiveness of our model.
14
Jang WD, Jang J, Song JS, Ahn S, Oh KS. PredPS: Attention-based graph neural network for predicting stability of compounds in human plasma. Comput Struct Biotechnol J 2023; 21:3532-3539. [PMID: 37484492 PMCID: PMC10362732 DOI: 10.1016/j.csbj.2023.07.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/05/2023] [Revised: 07/02/2023] [Accepted: 07/05/2023] [Indexed: 07/25/2023] Open
Abstract
The stability of compounds in human plasma is crucial for maintaining sufficient systemic drug exposure and is considered an essential factor in the early stages of drug discovery and development. The rapid degradation of compounds in the plasma can result in poor in vivo efficacy. Currently, there are no open-source software programs for predicting human plasma stability. In this study, we developed an attention-based graph neural network, PredPS, to predict the stability of compounds in human plasma using in-house and open-source datasets. PredPS outperformed the two machine learning and two deep learning algorithms that were used for comparison, indicating its efficiency in predicting stability. PredPS achieved an area under the receiver operating characteristic curve of 90.1%, accuracy of 83.5%, sensitivity of 82.3%, and specificity of 84.6% when evaluated using 5-fold cross-validation. In the early stages of drug discovery, PredPS could be a helpful method for predicting the human plasma stability of compounds. Adopting an in silico plasma stability prediction model at the high-throughput screening stage can save time and money. The source code for PredPS is available at https://bitbucket.org/krict-ai/predps and the PredPS web server is available at https://predps.netlify.app.
Affiliation(s)
- Woo Dae Jang
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
- Jidon Jang
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
- Jin Sook Song
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
- Sunjoo Ahn
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
  - Department of Medicinal and Pharmaceutical Chemistry, University of Science and Technology, Daejeon 34129, Republic of Korea
- Kwang-Seok Oh
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
  - Department of Medicinal and Pharmaceutical Chemistry, University of Science and Technology, Daejeon 34129, Republic of Korea
15
Jiang J, Zhang R, Yuan Y, Li T, Li G, Zhao Z, Yu Z. NoiseMol: A noise-robusted data augmentation via perturbing noise for molecular property prediction. J Mol Graph Model 2023; 121:108454. [PMID: 36963306 DOI: 10.1016/j.jmgm.2023.108454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2023] [Revised: 03/05/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
The Simplified Molecular-Input Line-Entry System (SMILES) is one of the most widely used molecular representation methods for molecular property prediction. We conjecture that all the characters in the SMILES string of a molecule are essential for making up the molecule, but most of them contribute little to determining a particular property of the molecule. We verified this conjecture in a preliminary experiment. Motivated by the result, we propose to inject proper noisy information into the SMILES to augment the training data by increasing the diversity of the labeled molecules. To this end, we explore injecting perturbing noise into the original labeled SMILES strings to construct augmented data, alleviating the scarcity of labeled compound data and helping the model extract more useful molecular representations for molecular property prediction. Specifically, we directly adopt mask, swap, deletion, and fusion operations on SMILES strings to randomly mask, swap, and delete atoms in SMILES strings. The augmented data is then used with one of two strategies: each epoch alternately feeds the original and noise-perturbed molecules, or each batch alternately feeds the original and noise-perturbed molecules. We conduct experiments on both Transformer and BiGRU models to validate the effectiveness of the method on widely used datasets from MoleculeNet and ZINC. Experimental results demonstrate that the proposed method outperforms strong baselines on all the datasets. NoiseMol obtains the best performance on BBBP and FDA when compared with state-of-the-art methods. In addition, NoiseMol achieves the best accuracy on LogP. Therefore, injecting perturbing noise into the labeled SMILES strings is an effective and efficient method that improves the prediction performance, generalization, and robustness of deep learning models.
Affiliation(s)
- Jing Jiang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China; Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu, China.
- Ruisheng Zhang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Yongna Yuan
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Tongfeng Li
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China; Computer College, Qinghai Normal University, Xining, Qinghai, China.
- Gaili Li
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Zhili Zhao
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
- Zhixuan Yu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
16
Yuan H, Yu H, Gui S, Ji S. Explainability in Graph Neural Networks: A Taxonomic Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:5782-5799. [PMID: 36063508 DOI: 10.1109/tpami.2022.3204236] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 05/06/2023]
Abstract
Deep learning methods are achieving ever-increasing performance on many artificial intelligence tasks. A major limitation of deep models is that they are not amenable to interpretability. This limitation can be circumvented by developing post hoc techniques to explain predictions, giving rise to the area of explainability. Recently, explainability of deep models on images and texts has achieved significant progress. In the area of graph data, graph neural networks (GNNs) and their explainability are experiencing rapid developments. However, there is neither a unified treatment of GNN explainability methods, nor a standard benchmark and testbed for evaluations. In this survey, we provide a unified and taxonomic view of current GNN explainability methods. Our unified and taxonomic treatment of this subject sheds light on the commonalities and differences of existing methods and sets the stage for further methodological developments. To facilitate evaluations, we provide a testbed for GNN explainability, including datasets, common algorithms and evaluation metrics. Furthermore, we conduct comprehensive experiments to compare and analyze the performance of many techniques. Altogether, this work provides a unified methodological treatment of GNN explainability and a standardized testbed for evaluations.
17
Sarmiento Varón L, González-Puelma J, Medina-Ortiz D, Aldridge J, Alvarez-Saravia D, Uribe-Paredes R, Navarrete MA. The role of machine learning in health policies during the COVID-19 pandemic and in long COVID management. Front Public Health 2023; 11:1140353. [PMID: 37113165 PMCID: PMC10126380 DOI: 10.3389/fpubh.2023.1140353] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/08/2023] [Accepted: 03/20/2023] [Indexed: 04/29/2023] Open
Abstract
The ongoing COVID-19 pandemic is arguably one of the most challenging health crises in modern times. The development of effective strategies to control the spread of SARS-CoV-2 was a major goal for governments and policy makers. Mathematical modeling and machine learning emerged as potent tools to guide and optimize the different control measures. This review briefly summarizes the evolution of the SARS-CoV-2 pandemic during its first 3 years. It details the main public health challenges, focusing on the contribution of mathematical modeling to designing and guiding government action plans and interventions to mitigate the spread of SARS-CoV-2. Next, it describes the application of machine learning methods in a series of case studies, including COVID-19 clinical diagnosis, the analysis of epidemiological variables, and drug discovery by protein engineering techniques. Lastly, it explores the use of machine learning tools for investigating long COVID by identifying patterns and relationships of symptoms, predicting risk indicators, and enabling early evaluation of COVID-19 sequelae.
Affiliation(s)
- Jorge González-Puelma
  - Centro Asistencial Docente y de Investigación, Universidad de Magallanes, Punta Arenas, Chile
  - Escuela de Medicina, Universidad de Magallanes, Punta Arenas, Chile
- David Medina-Ortiz
  - Departamento de Ingeniería en Computación, Facultad de Ingeniería, Universidad de Magallanes, Punta Arenas, Chile
- Jacqueline Aldridge
  - Departamento de Ingeniería en Computación, Facultad de Ingeniería, Universidad de Magallanes, Punta Arenas, Chile
- Diego Alvarez-Saravia
  - Centro Asistencial Docente y de Investigación, Universidad de Magallanes, Punta Arenas, Chile
  - Escuela de Medicina, Universidad de Magallanes, Punta Arenas, Chile
- Roberto Uribe-Paredes
  - Departamento de Ingeniería en Computación, Facultad de Ingeniería, Universidad de Magallanes, Punta Arenas, Chile
- Marcelo A. Navarrete
  - Centro Asistencial Docente y de Investigación, Universidad de Magallanes, Punta Arenas, Chile
  - Escuela de Medicina, Universidad de Magallanes, Punta Arenas, Chile
18
Xiong F, Xu H, Yu M, Chen X, Zhong Z, Guo Y, Chen M, Ou H, Wu J, Xie A, Xiong J, Xu L, Zhang L, Zhong Q, Huang L, Li Z, Zhang T, Jin F, He X. 3CLpro inhibitors: DEL-based molecular generation. Front Pharmacol 2022; 13:1085665. [PMID: 36569316 PMCID: PMC9768338 DOI: 10.3389/fphar.2022.1085665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022] Open
Abstract
Molecular generation (MG) via machine learning (ML) has accelerated drug structural optimization, especially for targets with a large amount of reported bioactivity data. However, molecular generation for structural optimization is often ineffective for new targets. A DNA-encoded library (DEL) can generate systematic, target-specific activity data, including for novel targets with little or no known activity data. Therefore, this study aims to overcome the limitation of molecular generation in structural optimization for new targets. First, we generated molecules using the structure-affinity data (2.96 million samples) for 3C-like protease (3CLpro) from our in-house DEL platform, avoiding reliance on public databases (e.g., ChEMBL and ZINC). Subsequently, to analyze the effect of transfer learning on the positive rate of the molecule generation model, molecular docking and an affinity model based on DEL data were applied. In addition, the generated molecules were subjected to multiple filtering steps, including physicochemical properties, drug-like properties, pharmacophore evaluation, and molecular docking, to select molecules for further study, which were then verified by molecular dynamics simulation.
Affiliation(s)
- Feng Xiong
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China. *Correspondence: Feng Xiong; Feng Jin; Xun He
- Honggui Xu
  - Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
- Mingao Yu
  - Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
- Xingyu Chen
  - Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
- Zhenmin Zhong
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Yuhan Guo
  - Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
- Meihong Chen
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Huanfang Ou
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Jiaqi Wu
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Anhua Xie
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Jiaqi Xiong
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Linlin Xu
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Lanmei Zhang
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Qijian Zhong
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Liye Huang
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Zhenwei Li
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
- Feng Jin
  - Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China. *Correspondence: Feng Xiong; Feng Jin; Xun He
- Xun He
  - Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China. *Correspondence: Feng Xiong; Feng Jin; Xun He
19
Zeng X, Xiang H, Yu L, Wang J, Li K, Nussinov R, Cheng F. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00557-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
20
Jiang J, Zhang R, Ma J, Liu Y, Yang E, Du S, Zhao Z, Yuan Y. TranGRU: focusing on both the local and global information of molecules for molecular property prediction. Appl Intell 2022; 53:15246-15260. [PMID: 36405344 PMCID: PMC9662124 DOI: 10.1007/s10489-022-04280-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 10/17/2022] [Indexed: 11/16/2022]
Abstract
Molecular property prediction is an essential but challenging task in drug discovery. The recurrent neural network (RNN) and Transformer are the mainstream methods for sequence modeling, and both have been successfully applied independently for molecular property prediction. As the local information and global information of molecules are very important for molecular properties, we aim to integrate the bi-directional gated recurrent unit (BiGRU) into the original Transformer encoder, together with self-attention to better capture local and global molecular information simultaneously. To this end, we propose the TranGRU approach, which encodes the local and global information of molecules by using the BiGRU and self-attention, respectively. Then, we use a gate mechanism to reasonably fuse the two molecular representations. In this way, we enhance the ability of the proposed model to encode both local and global molecular information. Compared to the baselines and state-of-the-art methods when treating each task as a single-task classification on Tox21, the proposed approach outperforms the baselines on 9 out of 12 tasks and state-of-the-art methods on 5 out of 12 tasks. TranGRU also obtains the best ROC-AUC scores on BBBP, FDA, LogP, and Tox21 (multitask classification) and has a comparable performance on ToxCast, BACE, and ecoli. On the whole, TranGRU achieves better performance for molecular property prediction. The source code is available in GitHub: https://github.com/Jiangjing0122/TranGRU.
Affiliation(s)
- Jing Jiang
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
  - Key Laboratory of China’s Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Baiyin Road, Lanzhou, 730030 Gansu China
- Ruisheng Zhang
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
- Jun Ma
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
- Yunwu Liu
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
- Enjie Yang
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
- Shikang Du
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
- Zhili Zhao
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
- Yongna Yuan
  - School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China
21
Liu G, Stokes JM. A brief guide to machine learning for antibiotic discovery. Curr Opin Microbiol 2022; 69:102190. [PMID: 35963098 DOI: 10.1016/j.mib.2022.102190] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/19/2022] [Revised: 07/11/2022] [Accepted: 07/12/2022] [Indexed: 11/03/2022]
Abstract
Rising antibiotic resistance and an alarmingly lean antibiotic pipeline require the adoption of novel approaches to rapidly discover new structural and functional classes of antibiotics. Excitingly, algorithmic approaches to antibiotic discovery are sufficiently advanced to meaningfully influence the antibiotic discovery process. Indeed, once trained on high-quality datasets, contemporary machine-learning and deep-learning models can be used to perform predictions for new antibiotics across vast chemical spaces, orders of magnitude more rapidly than compounds can be screened in the laboratory. This increases the probability of discovering new antibiotics with desirable properties. In this short review, we briefly describe the utility of contemporary machine-learning and deep-learning approaches to guide the discovery of new small-molecule antibiotics and unidentified natural products. We then propose a call to action for more open sharing of high-quality screening datasets to accelerate the rate at which forthcoming antibiotic-prediction models can be trained. Together, we aim to introduce antibiotic discoverers to a sample of recent applications of contemporary algorithmic methods to facilitate the wider adoption of these powerful computational approaches.
Affiliation(s)
- Gary Liu
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada; David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M Stokes
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada; David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada.