1
Zhao D, Zhang Y, Chen Y, Li B, Zhou W, Wang L. Highly Accurate and Explainable Predictions of Small-Molecule Antioxidants for Eight In Vitro Assays Simultaneously through an Alternating Multitask Learning Strategy. J Chem Inf Model 2024. [PMID: 38888465 DOI: 10.1021/acs.jcim.4c00748] [Indexed: 06/20/2024]
Abstract
Small-molecule antioxidants can inhibit or retard oxidation reactions and protect cells against free radical damage, and thus play a key role in food, cosmetics, pharmaceuticals, the environment, and materials. Antioxidant discovery is still driven mainly by experiment, and computationally assisted antioxidant discovery is rarely reported. In this study, a functional-group-based alternating multitask self-supervised molecular representation learning method is proposed to simultaneously predict the antioxidant activities of small molecules for eight commonly used in vitro antioxidant assays. Extensive evaluation results reveal that, compared with the baseline models, the multitask FG-BERT model achieves the best overall predictive performance, with the highest average F1, BA, ROC-AUC, and PRC-AUC values of 0.860, 0.880, 0.954, and 0.937 on the test sets, respectively. The Y-scrambling test results further demonstrate that the performance of the deep learning model did not arise by chance and that the model has reliable predictive capabilities. Additionally, the excellent interpretability of the multitask FG-BERT model makes it easy to identify key structural fragments/groups that contribute significantly to the antioxidant effect of a given molecule. Finally, an online antioxidant activity prediction platform called AOP (freely available at https://aop.idruglab.cn/) and its local version were developed based on the high-quality multitask FG-BERT model for experts and nonexperts in the field. We anticipate that it will contribute to the discovery of novel small-molecule antioxidants.
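The Y-scrambling test mentioned in this abstract can be illustrated with a small, generic sketch. It is not the authors' FG-BERT pipeline: the toy 1-D data, the 1-nearest-neighbour classifier, and the function names `loo_1nn_accuracy` and `y_scrambling` are all assumptions made for illustration. The idea is to score a model on the true labels, then repeat the evaluation after randomly shuffling the labels; a model that has learned a real structure-activity relationship should score clearly higher on the true labels.

```python
import random

def loo_1nn_accuracy(xs, ys):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on 1-D features."""
    correct = 0
    for i, x in enumerate(xs):
        # Nearest other sample; on a distance tie, the lower index wins.
        j = min((k for k in range(len(xs)) if k != i), key=lambda k: abs(xs[k] - x))
        correct += ys[j] == ys[i]
    return correct / len(xs)

def y_scrambling(xs, ys, n_rounds=50, seed=0):
    """Score the true labels, then re-score after each of n_rounds label shuffles."""
    rng = random.Random(seed)
    true_score = loo_1nn_accuracy(xs, ys)
    scrambled_scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]          # copy, so the true labels stay untouched
        rng.shuffle(shuffled)     # break the feature-label relationship
        scrambled_scores.append(loo_1nn_accuracy(xs, shuffled))
    return true_score, scrambled_scores

# Toy data (hypothetical): the label depends only on whether x >= 10.
xs = list(range(20))
ys = [0] * 10 + [1] * 10

true_score, scrambled_scores = y_scrambling(xs, ys)
mean_scrambled = sum(scrambled_scores) / len(scrambled_scores)
print(f"true labels: {true_score:.2f}, scrambled mean: {mean_scrambled:.2f}")
```

A large gap between the true-label score and the scrambled-label scores is what supports the claim that performance did not arise by chance; in practice the same procedure would wrap the full training and evaluation loop of the real model.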
Affiliation(s)
- Duancheng Zhao
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Biaoshun Li
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Wenguang Zhou
- Central Laboratory of The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, Foshan 528200, China
- Ling Wang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
2
Qian X, Ju B, Shen P, Yang K, Li L, Liu Q. Meta Learning with Attention Based FP-GNNs for Few-Shot Molecular Property Prediction. ACS OMEGA 2024; 9:23940-23948. [PMID: 38854580 PMCID: PMC11154901 DOI: 10.1021/acsomega.4c02147] [Received: 03/05/2024] [Revised: 05/09/2024] [Accepted: 05/14/2024] [Indexed: 06/11/2024]
Abstract
Molecular property prediction holds significant importance in drug discovery, enabling the identification of biologically active compounds with favorable drug-like properties. However, the low data problem, arising from the scarcity of labeled data in drug discovery, poses a substantial obstacle for accurate predictions. To address this challenge, we introduce a novel architecture, AttFPGNN-MAML, for few-shot molecular property prediction. The proposed approach incorporates a hybrid feature representation to enrich molecular representations and model intermolecular relationships specific to the task. By leveraging ProtoMAML, a meta-learning strategy, our model is trained and adapted to new tasks. Evaluation on two few-shot data sets, MoleculeNet and FS-Mol, demonstrates our method's superior performance in three out of four tasks and across various support set sizes. These results convincingly validate the effectiveness of our method in the realm of few-shot molecular property prediction. The source code is publicly available at https://github.com/sanomics-lab/AttFPGNN-MAML.
Affiliation(s)
- Xiaoliang Qian
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- Bin Ju
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Ping Shen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Keda Yang
- Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, China
- Li Li
- Department of Hepatobiliary Surgery, The First People’s Hospital of Kunming, Kunming 650034, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
3
Wang H, Chen B, Sun H, Zhang Y. Carbon-based molecular properties efficiently predicted by deep learning-based quantum chemical simulation with large language models. Comput Biol Med 2024; 176:108531. [PMID: 38728991 DOI: 10.1016/j.compbiomed.2024.108531] [Received: 02/11/2024] [Revised: 04/21/2024] [Accepted: 04/28/2024] [Indexed: 05/12/2024]
Abstract
The prediction of thermodynamic properties of carbon-based molecules based on their geometrical conformation using fluctuation and density functional theories has achieved great success in the field of energy chemistry, while the excessive computational cost provides both opportunities and challenges for the integration of machine learning. In this work, a deep learning-based quantum chemical prediction model was constructed for efficient prediction of thermodynamic properties of carbon-based molecules. We constructed a novel framework - encoding the 3D information into a large language model (LLM), which in turn generates a 2D SMILES string, while embedding a learnable encoding designed to preserve the integrity of the original 3D information, providing better structural information for the model. Additionally, we have designed an equivariant learning module to encompass representations of conformations and feature learning for conformational sampling. This framework aims to predict thermodynamic properties more accurately than learning from 2D topology alone, while providing faster computational speeds than conventional simulations. By combining machine learning and quantum chemistry, we pioneer efficient practical applications in the field of energy chemistry. Our model advances the integration of data-driven and physics-based modeling to unlock novel insights into carbon-based molecules.
Affiliation(s)
- Haoyu Wang
- University of Shanghai for Science and Technology, Shanghai, China; School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
- Bin Chen
- University of Shanghai for Science and Technology, Shanghai, China; School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Hangling Sun
- Hengtu Imalligent Technology (Shanghai) Co., Ltd., Shanghai, China
- Yuxuan Zhang
- University of Shanghai for Science and Technology, Shanghai, China
4
Chen K, Xu R, Hu X, Li D, Hou T, Kang Y. Recent advances in the development of DprE1 inhibitors using AI/CADD approaches. Drug Discov Today 2024; 29:103987. [PMID: 38670256 DOI: 10.1016/j.drudis.2024.103987] [Received: 01/23/2024] [Revised: 03/22/2024] [Accepted: 04/16/2024] [Indexed: 04/28/2024]
Abstract
Tuberculosis (TB) is a global lethal disease caused by Mycobacterium tuberculosis (Mtb). The flavoenzyme decaprenylphosphoryl-β-d-ribose 2'-oxidase (DprE1) plays a crucial part in the biosynthesis of lipoarabinomannan and arabinogalactan for the cell wall of Mtb and represents a promising target for anti-TB drug development. Therefore, there is an urgent need to discover DprE1 inhibitors with novel scaffolds, improved bioactivity and high drug-likeness. Recent studies have shown that artificial intelligence/computer-aided drug design (AI/CADD) techniques are powerful tools in the discovery of novel DprE1 inhibitors. This review provides an overview of the discovery of DprE1 inhibitors and their underlying mechanism of action and highlights recent advances in the discovery and optimization of DprE1 inhibitors using AI/CADD approaches.
Affiliation(s)
- Kepeng Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Ruolan Xu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Xueping Hu
- Institute of Molecular Sciences and Engineering, Institute of Frontier and Interdisciplinary Science, Shandong University, Qingdao, Shandong 266237, China
- Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
- Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
5
Pang Y, Chen Y, Lin M, Zhang Y, Zhang J, Wang L. MMSyn: A New Multimodal Deep Learning Framework for Enhanced Prediction of Synergistic Drug Combinations. J Chem Inf Model 2024; 64:3689-3705. [PMID: 38676916 DOI: 10.1021/acs.jcim.4c00165] [Indexed: 04/29/2024]
Abstract
Combination therapy is a promising strategy for the successful treatment of cancer. The large number of possible combinations, however, means that it is laborious and expensive to screen for synergistic drug combinations in vitro. Nevertheless, because of the availability of high-throughput screening data and advances in computational techniques, deep learning (DL) can be a useful tool for the prediction of synergistic drug combinations. In this study, we proposed a multimodal DL framework, MMSyn, for the prediction of synergistic drug combinations. First, features embedded in the drug molecules were extracted: structure, fingerprint, and string encoding. Then, gene expression data, DNA copy number, and pathway activity were used to describe cancer cell lines. Finally, these processed features were integrated using an attention mechanism and an interaction module and then input into a multilayer perceptron to predict drug synergy. Experimental results showed that our method outperformed five state-of-the-art DL methods and three traditional machine learning models for drug combination prediction. We verified that MMSyn achieved superior performance in stratified cross-validation settings using both the drug combination and cell line data. Moreover, we performed a set of ablation experiments to illustrate the effectiveness of each component and the efficacy of our model. In addition, our visual representation and case studies further confirmed the effectiveness of our model. All results showed that MMSyn can be used as a powerful tool for the prediction of synergistic drug combinations.
Affiliation(s)
- Yu Pang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Mujie Lin
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jiquan Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, Guiyang 550025, P. R. China
- Ling Wang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
6
Monsia R, Bhattacharyya S. Virtual Screening of Molecules via Neural Fingerprint-based Deep Learning Technique. RESEARCH SQUARE 2024:rs.3.rs-4355625. [PMID: 38766198 PMCID: PMC11100899 DOI: 10.21203/rs.3.rs-4355625/v1] [Indexed: 05/22/2024]
Abstract
A machine learning-based drug screening technique has been developed and optimized using convolutional neural network-derived fingerprints. The neural network-based fingerprinting technique, in which the weights are optimized, was compared with fixed Morgan fingerprints for binary classification of drug-target binding affinity. The assessment was carried out on six different target proteins, using randomly chosen small molecules from the ZINC15 database for training. This new architecture proved to be more efficient in screening out molecules that bind less favorably to specific targets and retaining molecules that bind favorably to them.
Scientific contribution: We have developed a new neural fingerprint-based screening model that has a significant ability to capture hits. Despite using a smaller dataset, this model is capable of mapping chemical space similarly to other contemporary algorithms designed for molecular screening. The novelty of the present algorithm lies in the speed with which the models are trained and tuned before their predictive capabilities are tested, and hence it is a significant step forward in the field of machine learning-embedded computational drug discovery.
7
Gao M, Zhang D, Chen Y, Zhang Y, Wang Z, Wang X, Li S, Guo Y, Webb GI, Nguyen ATN, May L, Song J. GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction. Comput Biol Med 2024; 173:108339. [PMID: 38547658 DOI: 10.1016/j.compbiomed.2024.108339] [Received: 11/19/2023] [Revised: 03/05/2024] [Accepted: 03/17/2024] [Indexed: 04/17/2024]
Abstract
The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction, namely GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
Affiliation(s)
- Mengmeng Gao
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Daokun Zhang
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia.
- Yi Chen
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
- Yiwen Zhang
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
- Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Shanshan Li
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
- Yuming Guo
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
- Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
- Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
- Lauren May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
8
Offensperger F, Tin G, Duran-Frigola M, Hahn E, Dobner S, Ende CWA, Strohbach JW, Rukavina A, Brennsteiner V, Ogilvie K, Marella N, Kladnik K, Ciuffa R, Majmudar JD, Field SD, Bensimon A, Ferrari L, Ferrada E, Ng A, Zhang Z, Degliesposti G, Boeszoermenyi A, Martens S, Stanton R, Müller AC, Hannich JT, Hepworth D, Superti-Furga G, Kubicek S, Schenone M, Winter GE. Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells. Science 2024; 384:eadk5864. [PMID: 38662832 DOI: 10.1126/science.adk5864] [Received: 08/31/2023] [Accepted: 03/22/2024] [Indexed: 05/04/2024]
Abstract
Chemical modulation of proteins enables a mechanistic understanding of biology and represents the foundation of most therapeutics. However, despite decades of research, 80% of the human proteome lacks functional ligands. Chemical proteomics has advanced fragment-based ligand discovery toward cellular systems, but throughput limitations have stymied the scalable identification of fragment-protein interactions. We report proteome-wide maps of protein-binding propensity for 407 structurally diverse small-molecule fragments. We verified that identified interactions can be advanced to active chemical probes of E3 ubiquitin ligases, transporters, and kinases. Integrating machine learning binary classifiers further enabled interpretable predictions of fragment behavior in cells. The resulting resource of fragment-protein interactions and predictive models will help to elucidate principles of molecular recognition and expedite ligand discovery efforts for hitherto undrugged proteins.
Affiliation(s)
- Fabian Offensperger
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Gary Tin
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Miquel Duran-Frigola
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, Cambridge CB1 3DE, UK
- Elisa Hahn
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Sarah Dobner
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Andrea Rukavina
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Vincenth Brennsteiner
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Kevin Ogilvie
- Medicine Design, Pfizer Worldwide Research and Development, Groton, CT 06340, USA
- Nara Marella
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Katharina Kladnik
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Rodolfo Ciuffa
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ariel Bensimon
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Luca Ferrari
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna Biocenter 5, 1030 Vienna, Austria
- University of Vienna, Max Perutz Labs, Vienna Biocenter 5, 1030 Vienna, Austria
- Evandro Ferrada
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Amanda Ng
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Zhechun Zhang
- Molecular Informatics, Machine Learning and Computational Sciences, Early Clinical Development, Pfizer, Cambridge, MA 02139, USA
- Gianluca Degliesposti
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Andras Boeszoermenyi
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Sascha Martens
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna Biocenter 5, 1030 Vienna, Austria
- University of Vienna, Max Perutz Labs, Vienna Biocenter 5, 1030 Vienna, Austria
- Robert Stanton
- Molecular Informatics, Machine Learning and Computational Sciences, Early Clinical Development, Pfizer, Cambridge, MA 02139, USA
- André C Müller
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- J Thomas Hannich
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Giulio Superti-Furga
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Center for Physiology and Pharmacology, Medical University of Vienna, 1090 Vienna, Austria
- Stefan Kubicek
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E Winter
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
9
Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024; 16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting associations often lack robustness and interpretability. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings show moderate positive rates, high recall rates, and interpretability in gene-disease association studies.
Affiliation(s)
- Xinzhi Yao
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Sizhuo Ouyang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Yulong Lian
- College of Science, Huazhong Agricultural University, Wuhan, China
- Qianqian Peng
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Xionghui Zhou
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feier Huang
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feng Shi
- College of Science, Huazhong Agricultural University, Wuhan, China
- Jingbo Xia
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.
10
Zhao Q, Zheng Y, Qiu Y, Yu Y, Huang M, Wu Y, Chen X, Huang Y, Cui S, Zhuang S. Graph Convolutional Network-Enhanced Model for Screening Persistent, Mobile, and Toxic and Very Persistent and Very Mobile Substances. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:6149-6157. [PMID: 38556993 DOI: 10.1021/acs.est.4c01201] [Indexed: 04/04/2024]
Abstract
The global management of persistent, mobile, and toxic (PMT) and very persistent and very mobile (vPvM) substances has been further strengthened with the rapid increase of emerging contaminants. The development of a ready-to-use and publicly available tool for the high-throughput screening of PMT/vPvM substances is thus urgently needed. However, current model building, which couples conventional algorithms, small-scale data sets, and simplistic features, hinders the development of a robust model for screening PMT/vPvM substances with wide application domains. Here, we construct a graph convolutional network (GCN)-enhanced model with feature fusion of a molecular graph and molecular descriptors to effectively utilize the significant correlation between critical descriptors and PMT/vPvM substances. The model is built with 213,084 substances following the latest PMT classification criteria. The application domains of the GCN-enhanced model, assessed by kernel density estimation, demonstrate its high suitability for high-throughput screening of PMT/vPvM substances, with both a high accuracy rate (86.6%) and a low false-negative rate (6.8%). An online server named PMT/vPvM profiler is further developed with a user-friendly web interface (http://www.pmt.zj.cn/). Our study facilitates a more efficient evaluation of PMT/vPvM substances with a globally accessible screening platform.
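For readers unfamiliar with the two figures quoted in this abstract, the sketch below shows how an accuracy rate and a false-negative rate are conventionally computed from confusion-matrix counts. The abstract does not state the exact definitions the authors used, so the standard ones are assumed here, and the counts are hypothetical values chosen for easy arithmetic rather than data from the paper.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def false_negative_rate(tp, fn):
    """Fraction of truly positive substances that the model misses."""
    return fn / (tp + fn)

# Hypothetical screening outcome (not from the paper):
# 100 true PMT/vPvM substances and 100 non-PMT/vPvM substances.
tp, fn = 90, 10   # positives found / positives missed
tn, fp = 80, 20   # negatives correctly cleared / negatives wrongly flagged

acc = accuracy(tp, fp, tn, fn)
fnr = false_negative_rate(tp, fn)
print(f"accuracy = {acc:.1%}, false-negative rate = {fnr:.1%}")
```

In a screening setting the false-negative rate matters as much as accuracy, because a missed PMT/vPvM substance escapes further evaluation, whereas a false positive only triggers an unnecessary follow-up.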
Affiliation(s)
- Qiming Zhao
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yuting Zheng
- Solid Waste and Chemicals Management Center, Ministry of Ecology and Environment of the People's Republic of China, Beijing 100029, China
- Yu Qiu
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yang Yu
- Solid Waste and Chemicals Management Center, Ministry of Ecology and Environment of the People's Republic of China, Beijing 100029, China
- Meiling Huang
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yiqu Wu
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xiyu Chen
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yizhou Huang
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Shixuan Cui
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Shulin Zhuang
- College of Environmental and Resource Sciences, and Women's Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
11
Zhang W, Mou M, Hu W, Lu M, Zhang H, Zhang H, Luo Y, Xu H, Tao L, Dai H, Gao J, Zhu F. MOINER: A Novel Multiomics Early Integration Framework for Biomedical Classification and Biomarker Discovery. J Chem Inf Model 2024; 64:2720-2732. [PMID: 38373720 DOI: 10.1021/acs.jcim.4c00013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
In the context of precision medicine, multiomics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. One commonly used integration method is early integration through concatenation of multiple dimensionally reduced omics matrices, owing to its simplicity and ease of implementation. However, this approach is seriously limited by information loss and a lack of latent feature interaction. Herein, a novel multiomics early integration framework (MOINER) based on information enhancement and image representation learning is presented to address these challenges. MOINER employs the self-attention mechanism to capture the intrinsic correlations of omics features, which makes it significantly outperform the existing state-of-the-art methods for multiomics data integration. Moreover, visualizing the attention embedding and identifying potential biomarkers offer interpretable insights into the prediction results. All source code and the model for MOINER are freely available at https://github.com/idrblab/MOINER.
Affiliation(s)
- Wei Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wei Hu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongquan Xu
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Haibin Dai
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
12
Guo W, Dong Y, Hao GF. Transfer learning empowers accurate pharmacokinetics prediction of small samples. Drug Discov Today 2024; 29:103946. [PMID: 38460571 DOI: 10.1016/j.drudis.2024.103946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 02/22/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Accurate assessment of pharmacokinetic (PK) properties is crucial for selecting optimal candidates and avoiding downstream failures. Transfer learning is an innovative machine learning approach enabling high-throughput prediction with limited data. Recently, transfer learning methods showed promise in predicting ADME/PK parameters. Given the prolific growth of research on transfer learning for PK prediction, a comprehensive review of its advantages and challenges is imperative. This study explores the fundamentals, classifications, toolkits and applications of various transfer learning techniques for PK prediction, demonstrating their utility through three practical case studies. This work will serve as a reference for drug design researchers.
Affiliation(s)
- Wenbo Guo
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
- Yawen Dong
- School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Guiyang 550025, China
13
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024] Open
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aimed at producing models that are useful to bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
Affiliation(s)
- P Llompart
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
14
He Y, Liu K, Liu Y, Han W. Prediction of bitterness based on modular designed graph neural network. Bioinformatics Advances 2024; 4:vbae041. [PMID: 38566918 PMCID: PMC10987211 DOI: 10.1093/bioadv/vbae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/31/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024]
Abstract
Motivation: Bitterness plays a pivotal role in our ability to identify and evade harmful substances in food. As one of the five tastes, it constitutes a critical component of our sensory experiences. However, the reliance on human tasting for discerning flavors presents cost challenges, rendering in silico prediction of bitterness a more practical alternative.
Results: In this study, we introduce the use of Graph Neural Networks (GNNs) in bitterness prediction, superseding traditional machine learning techniques. We developed an advanced model, a Hybrid Graph Neural Network (HGNN), surpassing conventional GNNs according to tests on public datasets. Using HGNN and three other GNNs, we designed BitterGNNs, a bitterness predictor that achieved an AUC value of 0.87 in both external bitter/non-bitter and bitter/sweet evaluations, outperforming the acclaimed RDKFP-MLP predictor (AUC values of 0.86 and 0.85). We further created a bitterness prediction website and database, TastePD (https://www.tastepd.com/). The BitterGNNs predictor, built on GNNs, offers accurate bitterness predictions, enhancing the efficacy of bitterness prediction, aiding the development of advanced food testing methodology, and deepening our understanding of bitterness origins.
Availability and implementation: TastePD is available at https://www.tastepd.com, and all code is at https://github.com/heyigacu/BitterGNN.
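The AUC values quoted in this abstract summarize how well a predictor ranks bitter compounds above non-bitter ones. As a minimal sketch of that standard metric (not the authors' evaluation code), ROC-AUC can be computed as the fraction of positive/negative pairs that are ranked correctly, with ties counted as half. The function name and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Assumption: labels are 1 (bitter) or 0 (non-bitter); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, not data from the paper.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; library implementations typically use a sort-based computation instead.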
Affiliation(s)
- Yi He
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Kaifeng Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Yuyang Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
15
Hatzakis N, Kaestel-Hansen J, de Sautu M, Saminathan A, Scanavachi G, Correia R, Nielsen AJ, Bleshoey S, Boomsma W, Kirchhausen T. Deep learning assisted single particle tracking for automated correlation between diffusion and function. Research Square 2024:rs.3.rs-3716053. [PMID: 38352328 PMCID: PMC10862944 DOI: 10.21203/rs.3.rs-3716053/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
Subcellular diffusion in living systems reflects cellular processes and interactions. Recent advances in optical microscopy allow the tracking of this nanoscale diffusion of individual objects with an unprecedented level of precision. However, the agnostic and automated extraction of functional information from the diffusion of molecules and organelles within the subcellular environment is labor-intensive and poses a significant challenge. Here we introduce DeepSPT, a deep learning framework to interpret the diffusional 2D or 3D temporal behavior of objects in a rapid and efficient manner, agnostically. Demonstrating its versatility, we have applied DeepSPT to automated mapping of the early events of viral infections, identifying distinct types of endosomal organelles, and clathrin-coated pits and vesicles with up to 95% accuracy and within seconds instead of weeks. The fact that DeepSPT effectively extracts biological information from diffusion alone illustrates that, besides structure, motion encodes function at the molecular and subcellular level.
16
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common representation of molecules, but it ignores their structural information, while molecular graph representation has a weak ability to express the 3D structure. In this article, we try to exploit the advantages of different types of representations simultaneously for molecular property prediction. Thus, we propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, the 2D molecular graph, the 3D molecular graph, and the molecular image. Then, in order to learn molecular features individually, we use four essential deep learning frameworks, which correspond to the four distinct molecular representations. The final molecular representation is created by integrating the four feature vectors and feeding them into a prediction layer to predict the molecular property. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on these 6 benchmark datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and the 3CL protease. The results show that DLF-MFF produces better performance in the identification of anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
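The fusion step described in this abstract (integrating four feature vectors and feeding them into a prediction layer) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the vectors are integrated, so plain concatenation followed by a single logistic unit is used here, and the function name, vector sizes, and weights are all hypothetical rather than taken from DLF-MFF.

```python
import math


def fuse_and_predict(feature_vectors, weights, bias=0.0):
    """Concatenate per-representation feature vectors and apply a logistic layer.

    Assumption: each vector was already produced by its own encoder
    (fingerprint, 2D graph, 3D graph, image); only the fusion is shown.
    """
    fused = [x for vec in feature_vectors for x in vec]  # simple concatenation
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid output in (0, 1)
    return fused, probability


# Hypothetical encoder outputs and layer weights, for illustration only.
fingerprint = [1.0, 0.0]
graph_2d = [0.5]
graph_3d = [-0.5]
image = [2.0]
weights = [0.5, 0.1, 1.0, 1.0, -0.25]

fused, prob = fuse_and_predict([fingerprint, graph_2d, graph_3d, image], weights)
print(fused, prob)
```

In a trained model the weights would be learned jointly with the encoders; here they are fixed only so the example runs on its own.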
Affiliation(s)
- Mei Ma
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
17
Zhang Y, Zhou Y, Zhou Y, Yu X, Shen X, Hong Y, Zhang Y, Wang S, Mou M, Zhang J, Tao L, Gao J, Qiu Y, Chen Y, Zhu F. TheMarker: a comprehensive database of therapeutic biomarkers. Nucleic Acids Res 2024; 52:D1450-D1464. [PMID: 37850638 PMCID: PMC10767989 DOI: 10.1093/nar/gkad862] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/19/2023] [Revised: 09/21/2023] [Accepted: 09/29/2023] [Indexed: 10/19/2023] Open
Abstract
Distinct from the traditional diagnostic/prognostic biomarker (adopted as the indicator of disease state/process), the therapeutic biomarker (ThMAR) has emerged as crucial in the clinical development and clinical practice of all therapies. Five types of ThMAR have been found to play indispensable roles in various stages of drug discovery: the Pharmacodynamic Biomarker, essential for guaranteeing the pharmacological effects of a therapy; the Safety Biomarker, critical for assessing the extent or likelihood of therapy-induced toxicity; the Monitoring Biomarker, indispensable for guiding clinical management by serially measuring patients' status; the Predictive Biomarker, crucial for maximizing the clinical outcome of a therapy for specific individuals; and the Surrogate Endpoint, fundamental for accelerating the approval of a therapy. However, these ThMAR data have not been comprehensively described by any of the existing databases. Herein, a database named 'TheMarker' was therefore constructed to (a) systematically offer all five types of ThMAR used at different stages of drug development, (b) comprehensively describe ThMAR information for the largest number of drugs among available databases, and (c) extensively cover the widest range of disease classes by not just focusing on anticancer therapies. The data in TheMarker are expected to have great implications for and a significant impact on drug discovery and clinical practice, and the database is freely accessible without any login requirement at: https://idrblab.org/themarker.
Affiliation(s)
- Yintao Zhang
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Ying Zhou
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- National Key Laboratory of Diagnosis and Treatment of Severe Infectious Disease, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310000, China
- Yuan Zhou
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xinyuan Yu
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xinyi Shen
- Department of Environmental Health Sciences, Yale School of Public Health, Yale University, New Haven 06510, USA
- Yanfeng Hong
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Yuxin Zhang
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Shanshan Wang
- Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- Minjie Mou
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jinsong Zhang
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Jianqing Gao
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yunqing Qiu
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- National Key Laboratory of Diagnosis and Treatment of Severe Infectious Disease, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310000, China
- Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
- Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen 518000, China
- Feng Zhu
- College of Pharmaceutical Sciences, The First Affiliated Hospital, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
18
Wang Y, Pan Z, Mou M, Xia W, Zhang H, Zhang H, Liu J, Zheng L, Luo Y, Zheng H, Yu X, Lian X, Zeng Z, Li Z, Zhang B, Zheng M, Li H, Hou T, Zhu F. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder. Nucleic Acids Res 2023; 51:e110. [PMID: 37889083 PMCID: PMC10682500 DOI: 10.1093/nar/gkad929] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/10/2022] [Revised: 08/01/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
RNAs play essential roles in diverse physiological and pathological processes by interacting with other molecules (RNA/protein/compound), and various computational methods are available for identifying these interactions. However, the encoding features provided by existing methods are limited, and the existing tools do not offer an effective way to integrate the interacting partners. In this study, a task-specific encoding algorithm for RNAs and RNA-associated interactions was therefore developed. This new algorithm is unique in (a) realizing comprehensive RNA feature encoding by introducing a great many novel features and (b) enabling task-specific integration of interacting partners using convolutional autoencoder-directed feature embedding. Compared with existing methods/tools, this novel algorithm demonstrated superior performance in diverse benchmark testing studies. The algorithm, together with its source code, can be readily accessed by all users at: https://idrblab.org/corain/ and https://github.com/idrblab/corain/.
Affiliation(s)
- Yunxia Wang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Weiqi Xia
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Jin Liu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yongchao Luo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Hanqi Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Xinyuan Yu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Xichen Lian
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Mingyue Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Honglin Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
19
Wu W, Qian J, Liang C, Yang J, Ge G, Zhou Q, Guan X. GeoDILI: A Robust and Interpretable Model for Drug-Induced Liver Injury Prediction Using Graph Neural Network-Based Molecular Geometric Representation. Chem Res Toxicol 2023; 36:1717-1730. [PMID: 37839069 DOI: 10.1021/acs.chemrestox.3c00199] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/17/2023]
Abstract
Drug-induced liver injury (DILI) is a significant cause of drug failure and withdrawal due to liver damage. Accurate prediction of hepatotoxic compounds is crucial for safe drug development. Several DILI prediction models have been published, but they are built on different data sets, making it difficult to compare model performance. Moreover, most existing models are based on molecular fingerprints or descriptors, neglecting molecular geometric properties and lacking interpretability. To address these limitations, we developed GeoDILI, an interpretable graph neural network that uses a molecular geometric representation. First, we utilized a geometry-based pretrained molecular representation and optimized it on the DILI data set to improve predictive performance. Second, we leveraged gradient information to obtain high-precision atomic-level weights and deduce the dominant substructure. We benchmarked GeoDILI against recently published DILI prediction models, as well as popular GNN models and fingerprint-based machine learning models using the same data set, showing superior predictive performance of our proposed model. We applied the interpretable method in the DILI data set and derived seven precise and mechanistically elucidated structural alerts. Overall, GeoDILI provides a promising approach for accurate and interpretable DILI prediction with potential applications in drug discovery and safety assessment. The data and source code are available at GitHub repository (https://github.com/CSU-QJY/GeoDILI).
Affiliation(s)
- Wenxuan Wu
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Jiayu Qian
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
- Changjie Liang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Jingya Yang
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
- Guangbo Ge
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Qingping Zhou
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, China
- Xiaoqing Guan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
20
Du BX, Xu Y, Yiu SM, Yu H, Shi JY. ADMET property prediction via multi-task graph learning under adaptive auxiliary task selection. iScience 2023; 26:108285. [PMID: 38026198 PMCID: PMC10654589 DOI: 10.1016/j.isci.2023.108285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/18/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023] Open
Abstract
It is a critical step in lead optimization to evaluate the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds. Classical single-task learning (STL) has effectively predicted individual ADMET endpoints with abundant labels. Conversely, multi-task learning (MTL) can predict multiple ADMET endpoints with fewer labels, but ensuring task synergy and highlighting key molecular substructures remain challenges. To tackle these issues, this work elaborates a multi-task graph learning framework for predicting multiple ADMET properties of drug-like small molecules (MTGL-ADMET) by holding a new paradigm of MTL, "one primary, multiple auxiliaries." It first adeptly combines status theory with maximum flow for auxiliary task selection. The subsequent phase introduces a primary-task-centric MTL model with integrated modules. MTGL-ADMET not only outstrips existing STL and MTL methods but also offers a transparent lens into crucial molecular substructures. It is anticipated that this work can promote lead compound finding and optimization in drug discovery.
Affiliation(s)
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Yi Xu
  - School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Siu-Ming Yiu
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Hui Yu
  - School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China

21
Yang J, Jiang C, Chen J, Qin L, Cheng G. Predicting GPR40 Agonists with A Deep Learning-Based Ensemble Model. ChemistryOpen 2023; 12:e202300051. [PMID: 37404062 PMCID: PMC10661831 DOI: 10.1002/open.202300051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 05/23/2023] [Indexed: 07/06/2023]
Abstract
Recent studies have identified G protein-coupled receptor 40 (GPR40) as a promising target for treating type 2 diabetes mellitus, and GPR40 agonists have several advantages over other hypoglycemic drugs, including cardiovascular protection and suppression of glucagon levels. In this study, we constructed an up-to-date GPR40 ligand dataset for training models and systematically optimized an ensemble model, resulting in a powerful classifier (ROC AUC: 0.9496) for distinguishing GPR40 agonists from non-agonists. The ensemble model is divided into three layers, and the optimization process is carried out in each layer. We believe that these results will prove helpful for the development of both GPR40 agonists and ensemble models. All the data and models are available on GitHub (https://github.com/Jiamin-Yang/ensemble_model).
Affiliation(s)
- Jiamin Yang
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, P. R. China
- Chen Jiang
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, P. R. China
- Jing Chen
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, P. R. China
- Lu‐Ping Qin
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, P. R. China
- Gang Cheng
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, P. R. China

22
Deng J, Yang Z, Wang H, Ojima I, Samaras D, Wang F. A systematic study of key elements underlying molecular property prediction. Nat Commun 2023; 14:6395. [PMID: 37833262 PMCID: PMC10575948 DOI: 10.1038/s41467-023-41948-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/24/2022] [Accepted: 09/18/2023] [Indexed: 10/15/2023]
Abstract
Artificial intelligence (AI) has been widely applied in drug discovery, with molecular property prediction as a major task. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptors datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4200 models on SMILES sequences and 8400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. In addition, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore potential reasons why representation learning models can fail and show that dataset size is essential for representation learning models to excel.
Affiliation(s)
- Jianyuan Deng
  - Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
- Zhibo Yang
  - Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
- Hehe Wang
  - Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
- Iwao Ojima
  - Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
- Dimitris Samaras
  - Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
- Fusheng Wang
  - Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
  - Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA

23
Han M, Jin B, Liang J, Huang C, Arp HPH. Developing machine learning approaches to identify candidate persistent, mobile and toxic (PMT) and very persistent and very mobile (vPvM) substances based on molecular structure. Water Research 2023; 244:120470. [PMID: 37595327 DOI: 10.1016/j.watres.2023.120470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023]
Abstract
Determining which substances on the global market could be classified as persistent, mobile and toxic (PMT) substances or very persistent, very mobile (vPvM) substances is essential to prevent or reduce drinking water contamination from them. This study developed machine learning models based on different molecular descriptors (MDs) and defined applicability domains for the screening of PMT/vPvM substances. The models were trained with 3111 substances with expert weight-of-evidence based PMT/vPvM hazard classifications that considered the highest quality data available. The models were based on the hypothesis that PMT/vPvM substances share similar MDs that are representative of chemical structures resistant to degradation, are associated with low sorption (or high water solubility) and, in some cases, are associated with known toxic mechanisms. All possible model combinations were tested by integrating different molecular description methods, data balancing strategies and machine learning algorithms. Our model allows one-step prediction of candidate PMT/vPvM substances, and our method was compared with the approach of predicting P, M and T separately (i.e. three-step prediction). The results showed that the one-step model achieved a higher accuracy of 92% for PMT/vPvM identification (i.e. positive samples) for an internal test set, and also resulted in a higher accuracy of 90% for an external test set of chemical pollutants detected in Taihu Lake, China. Furthermore, the prediction mechanism of the model was interpreted by Shapley additive explanations (SHAP). This work presents an advance in big-data in silico screening models for the identification of substances that potentially meet the PMT/vPvM criteria.
Affiliation(s)
- Min Han
  - State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China
- Biao Jin
  - State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China
- Jun Liang
  - School of Software, South China Normal University, Foshan, 528225, China
- Chen Huang
  - State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 10069, China
- Hans Peter H Arp
  - Norwegian Geotechnical Institute (NGI), P.O. Box 3930 Ullevaal Stadion, Oslo, N-0806, Norway; Norwegian University of Science and Technology (NTNU), Trondheim, NO-7491, Norway

24
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. Research (Washington, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope because of their heavy dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, is proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS is unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features with an ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China

25
Turon G, Hlozek J, Woodland JG, Kumar A, Chibale K, Duran-Frigola M. First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa. Nat Commun 2023; 14:5736. [PMID: 37714843 PMCID: PMC10504240 DOI: 10.1038/s41467-023-41512-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/11/2023] [Accepted: 09/06/2023] [Indexed: 09/17/2023]
Abstract
Streamlined data-driven drug discovery remains challenging, especially in resource-limited settings. We present ZairaChem, an artificial intelligence (AI)- and machine learning (ML)-based tool for quantitative structure-activity/property relationship (QSAR/QSPR) modelling. ZairaChem is fully automated, requires low computational resources and works across a broad spectrum of datasets. We describe an end-to-end implementation at the H3D Centre, the leading integrated drug discovery unit in Africa, at which no prior AI/ML capabilities were available. By leveraging in-house data collected over a decade, we have developed a virtual screening cascade for malaria and tuberculosis drug discovery comprising 15 models for key decision-making assays ranging from whole-cell phenotypic screening and cytotoxicity to aqueous solubility, permeability, microsomal metabolic stability, cytochrome inhibition, and cardiotoxicity. We show how computational profiling of compounds, prior to synthesis and testing, can inform progression of frontrunner compounds at H3D. This project is a first-of-its-kind deployment at scale of AI/ML tools in a research centre operating in a low-resource setting.
Affiliation(s)
- Gemma Turon
  - Ersilia Open Source Initiative, Cambridge, UK
- Jason Hlozek
  - Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
- John G Woodland
  - Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
  - South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Ankur Kumar
  - Ersilia Open Source Initiative, Cambridge, UK
- Kelly Chibale
  - Department of Chemistry and Holistic Drug Discovery and Development (H3D) Centre, University of Cape Town, Cape Town, South Africa
  - South African Medical Research Council Drug Discovery and Development Research Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa

26
Aldughayfiq B, Ashfaq F, Jhanjhi NZ, Humayun M. Capturing Semantic Relationships in Electronic Health Records Using Knowledge Graphs: An Implementation Using MIMIC III Dataset and GraphDB. Healthcare (Basel) 2023; 11:1762. [PMID: 37372880 DOI: 10.3390/healthcare11121762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 06/03/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023]
Abstract
Electronic health records (EHRs) are an increasingly important source of information for healthcare professionals and researchers. However, EHRs are often fragmented, unstructured, and difficult to analyze due to the heterogeneity of the data sources and the sheer volume of information. Knowledge graphs have emerged as a powerful tool for capturing and representing complex relationships within large datasets. In this study, we explore the use of knowledge graphs to capture and represent complex relationships within EHRs. Specifically, we address the following research question: Can a knowledge graph created using the MIMIC III dataset and GraphDB effectively capture semantic relationships within EHRs and enable more efficient and accurate data analysis? We map the MIMIC III dataset to an ontology using text refinement and Protege; then, we create a knowledge graph using GraphDB and use SPARQL queries to retrieve and analyze information from the graph. Our results demonstrate that knowledge graphs can effectively capture semantic relationships within EHRs, enabling more efficient and accurate data analysis. We provide examples of how our implementation can be used to analyze patient outcomes and identify potential risk factors, contributing to the growing body of literature on the use of knowledge graphs in healthcare. In particular, our study highlights the potential of knowledge graphs to support decision-making and improve patient outcomes by enabling a more comprehensive and holistic analysis of EHR data. Overall, our research contributes to a better understanding of the value of knowledge graphs in healthcare and lays the foundation for further research in this area.
Affiliation(s)
- Bader Aldughayfiq
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
- Farzeen Ashfaq
  - School of Computer Science-SCS, Taylor's University, Subang Jaya 47500, Malaysia
- N Z Jhanjhi
  - School of Computer Science-SCS, Taylor's University, Subang Jaya 47500, Malaysia
- Mamoona Humayun
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia

27
Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Tongtao Yue
  - Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
- David A Winkler
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K.
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Yongguang Yin
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Hao Zhu
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Bing Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China

28
Mucllari E, Zadorozhnyy V, Ye Q, Nguyen DD. Novel Molecular Representations Using Neumann-Cayley Orthogonal Gated Recurrent Unit. J Chem Inf Model 2023; 63:2656-2666. [PMID: 37075324 DOI: 10.1021/acs.jcim.2c01526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2023]
Abstract
Advances in deep neural networks (DNNs) have made a very powerful machine learning method available to researchers across many fields of study, including the biomedical and cheminformatics communities, where DNNs help to improve tasks such as protein performance, molecular design, drug discovery, etc. Many of those tasks rely on molecular descriptors for representing molecular characteristics in cheminformatics. Despite significant efforts and the introduction of numerous methods that derive molecular descriptors, the quantitative prediction of molecular properties remains challenging. One widely used method of encoding molecule features into bit strings is the molecular fingerprint. In this work, we propose using new Neumann-Cayley Gated Recurrent Units (NC-GRU) inside the Neural Nets encoder (AutoEncoder) to create neural molecular fingerprints (NC-GRU fingerprints). The NC-GRU AutoEncoder introduces orthogonal weights into widely used GRU architecture, resulting in faster, more stable training, and more reliable molecular fingerprints. Integrating novel NC-GRU fingerprints and Multi-Task DNN schematics improves the performance of various molecular-related tasks such as toxicity, partition coefficient, lipophilicity, and solvation-free energy, producing state-of-the-art results on several benchmarks.
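As a rough illustration of how orthogonal recurrent weights can be obtained, the sketch below builds an orthogonal matrix from a skew-symmetric one with the Cayley transform and approximates the required inverse with a truncated Neumann series. It is a minimal sketch, not the authors' NC-GRU implementation: the function names, the matrix size, the series order, and the plain (unscaled) form of the transform are all assumptions made for illustration.

```python
import numpy as np


def cayley_exact(A):
    """Cayley transform W = (I + A)^{-1} (I - A).

    W is orthogonal whenever A is skew-symmetric (A.T == -A).
    """
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)


def cayley_neumann(A, order=8):
    """Same transform, with (I + A)^{-1} replaced by the truncated
    Neumann series sum_{k=0}^{order} (-A)^k.

    The series only converges when the spectral radius of A is below 1,
    so this sketch assumes A has small entries.
    """
    I = np.eye(A.shape[0])
    inv_approx = I.copy()
    term = I.copy()
    for _ in range(order):
        term = term @ (-A)       # next power of (-A)
        inv_approx += term
    return inv_approx @ (I - A)


# Hypothetical 4x4 example: a small random skew-symmetric matrix.
rng = np.random.default_rng(0)
M = rng.normal(scale=0.02, size=(4, 4))
A = M - M.T

W_exact = cayley_exact(A)
W_approx = cayley_neumann(A)
```

Under these assumptions `W_exact.T @ W_exact` is the identity up to rounding, and `W_approx` stays close to `W_exact` because the entries of `A` are small; a larger `A` would need more series terms or would not converge at all.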
Affiliation(s)
- Edison Mucllari
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Vasily Zadorozhnyy
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Qiang Ye
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States

29
Peng W, Chen T, Liu H, Dai W, Yu N, Lan W. Improving drug response prediction based on two-space graph convolution. Comput Biol Med 2023; 158:106859. [PMID: 37023539 DOI: 10.1016/j.compbiomed.2023.106859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Revised: 02/22/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
Patients with the same cancer type may present different genomic features and therefore have different drug sensitivities. Accordingly, correctly predicting patients' responses to drugs can guide treatment decisions and improve the outcome of cancer patients. Existing computational methods leverage the graph convolution network model to aggregate features of different types of nodes in the heterogeneous network. Most of them fail to consider the similarity between homogeneous nodes. To this end, we propose an algorithm based on two-space graph convolutional neural networks, TSGCNN, to predict the response of anticancer drugs. TSGCNN first constructs the cell line feature space and the drug feature space and separately performs the graph convolution operation on the feature spaces to diffuse similarity information among homogeneous nodes. After that, we generate a heterogeneous network based on the known cell line and drug relationship and perform graph convolution operations on the heterogeneous network to collect the features of different types of nodes. Subsequently, the algorithm produces the final feature representations for cell lines and drugs by adding their self features, the feature space representations, and the heterogeneous space representations. Finally, we leverage the linear correlation coefficient decoder to reconstruct the cell line-drug correlation matrix for drug response prediction based on the final representations. We tested our model on the Cancer Drug Sensitivity Data (GDSC) and Cancer Cell Line Encyclopedia (CCLE) databases. The results indicate that TSGCNN shows excellent performance in drug response prediction compared with eight other state-of-the-art methods.
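The decoding step can be pictured as follows. This is a minimal sketch that assumes the "linear correlation coefficient" is the Pearson correlation between each cell-line vector and each drug vector; the function name `correlation_decoder` and the toy inputs are hypothetical and are not taken from the TSGCNN code.

```python
import numpy as np


def correlation_decoder(cell_repr, drug_repr, eps=1e-12):
    """Return a (n_cells, n_drugs) matrix of Pearson correlations.

    cell_repr: array of shape (n_cells, dim), final cell-line representations
    drug_repr: array of shape (n_drugs, dim), final drug representations
    eps guards against division by zero for constant vectors.
    """
    # Center every representation vector around its own mean.
    c = cell_repr - cell_repr.mean(axis=1, keepdims=True)
    d = drug_repr - drug_repr.mean(axis=1, keepdims=True)
    # Scale to unit length so the dot product equals the correlation.
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + eps)
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + eps)
    return c @ d.T


# Toy inputs (assumed): two cell lines, one drug, 3-dimensional features.
cells = np.array([[1.0, 2.0, 3.0],
                  [3.0, 2.0, 1.0]])
drugs = np.array([[2.0, 4.0, 6.0]])

scores = correlation_decoder(cells, drugs)
```

With these toy inputs the first cell line is perfectly correlated with the drug and the second is perfectly anti-correlated, so `scores` is close to +1 and -1 respectively. How such scores are mapped to sensitive or resistant labels is a separate design choice that the abstract does not specify.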
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
- Tielin Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
- Hancheng Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport, NY 14422, United States of America
- Wei Lan
  - School of Computer Electronic and Information, Guangxi University, Nanning, Guangxi 530004, China

30
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria

31
Dong J, Qian J, Yu K, Huang S, Cheng X, Chen F, Jiang H, Zeng W. Rational Design of Organelle-Targeted Fluorescent Probes: Insights from Artificial Intelligence. Research (Washington, D.C.) 2023; 6:0075. [PMID: 36930810 PMCID: PMC10013958 DOI: 10.34133/research.0075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 01/18/2023] [Indexed: 01/27/2023]
Abstract
Monitoring the physiological changes of organelles is essential for understanding the local biological information of cells and for improving the diagnosis and therapy of diseases. Currently, fluorescent probes are considered the most powerful tools for imaging and have been widely applied in biomedical fields. However, the expected targeting effects of these probes are often inconsistent with experimental results. The design of fluorescent probes mainly depends on the empirical knowledge of researchers and is therefore hampered by limited chemical space and low efficiency. Herein, we propose a novel multilevel framework for the prediction of organelle-targeted fluorescent probes by employing advanced artificial intelligence algorithms. In this way, not only can the targeting mechanism be interpreted beyond intuition, but a quick evaluation method can also be established for rational design. Furthermore, the targeting and imaging powers of the probes optimized and synthesized with this methodology were verified by quantitative calculation and experiments.
Affiliation(s)
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P.R. China
| | - Jie Qian
- National Engineering Research Center of Rice and Byproduct Deep Processing, School of Food Science and Engineering, Central South University of Forestry and Technology, Changsha 410004, P.R. China
| | - Kunqian Yu
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, P.R. China
| | - Shuai Huang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P.R. China
| | - Xiang Cheng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P.R. China
| | - Fei Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P.R. China
| | - Hualiang Jiang
- State Key Laboratory of Drug Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, P.R. China
| | - Wenbin Zeng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P.R. China
| |
|
32
|
de Souza LP, Fernie AR. Databases and Tools to Investigate Protein-Metabolite Interactions. Methods Mol Biol 2023; 2554:231-249. [PMID: 36178629 DOI: 10.1007/978-1-0716-2624-5_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein-metabolite interactions (PMIs) are directly responsible for the regulation of numerous processes. From the direct regulation of enzymes to complex developmental processes intermediated by hormones, PMIs are central to understanding the molecular mechanisms of important physiological phenomena. Still, demonstrating such interactions experimentally has proven an arduous task. We discuss here some of the current technologies contributing to expanding our knowledge of PMIs, with particular emphasis on platforms and databases for exploring the highly heterogeneous nature of characterized PMIs, which are likely to be an essential resource for the development of new computational approaches to predict and validate interactions based on large-scale PMI screenings.
Affiliation(s)
| | - Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Potsdam-Golm, Germany.
| |
|
33
|
Zhou H, Shan M, Qin LP, Cheng G. Reliable prediction of cannabinoid receptor 2 ligand by machine learning based on combined fingerprints. Comput Biol Med 2023; 152:106379. [PMID: 36502694 DOI: 10.1016/j.compbiomed.2022.106379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 11/15/2022] [Accepted: 11/28/2022] [Indexed: 12/02/2022]
Abstract
Cannabinoid receptors, as part of the family of G protein-coupled receptors (GPCRs), are involved in various physiological functions. Cannabinoid receptor subtype 2 (CB2), mainly distributed in the periphery, is a crucial therapeutic target for anti-epileptic, anti-inflammatory, and anti-fibrotic therapy and for the regulation of bone metabolism, and it regulates these physiological functions without psychiatric side effects. Recently, machine learning methods for predicting biophysical properties have attracted much attention. Successful application of machine learning usually depends heavily on an appropriate representation of the compounds. In this study, we comprehensively evaluate the performance of descriptor-based models (including XGBoost, Random Forest, and KNN) and two graph-based models (D-MPNN, MolMap) for the prediction of CB2 regulators, and find that XGBoost offers outstanding performance for both regression and classification tasks. Thirteen different molecular fingerprints and 12 descriptors, as well as their combinations, were further screened; AvalonFP + AtomPairFP + RDkitFP + MorganFP and AtomPairFP + MorganFP + AvalonFP were the optimum combinations for the regression task (R² increased to 0.667) and the classification task (AUC-ROC increased to 0.933), respectively. Specifically, the best XGBoost regression model with optimum features achieves better performance than Mizera's QSAR model on the same dataset developed by Mizera (R² 0.664 versus 0.62). It also achieves the best performance on the external validation set, with an AUC-ROC of 0.917. By comparison, MolMap and D-MPNN only provide 0.912 and 0.898. The Shapley additive explanation method was used to interpret the models, and feature importances were shown for both the regression and classification tasks. The XGBoost model equipped with the essential molecular fingerprint combinations in this paper may provide valuable clues for designing novel CB2 ligands and for developing models to predict other properties.
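The abstract does not say how several fingerprints are merged into one feature set. A minimal sketch, assuming that "combination" means concatenating the individual bit vectors in a fixed order before they are passed to a model such as XGBoost; the helper `combine_fingerprints` and the 4-bit toy vectors are hypothetical and are not taken from the paper:

```python
from typing import Dict, List, Sequence


def combine_fingerprints(fps: Dict[str, Sequence[int]],
                         order: Sequence[str]) -> List[int]:
    """Concatenate named bit vectors, in the given order, into one feature vector."""
    combined: List[int] = []
    for name in order:
        combined.extend(fps[name])
    return combined


# Toy 4-bit vectors standing in for real fingerprints (hypothetical values).
toy_fingerprints = {
    "AtomPairFP": [1, 0, 0, 1],
    "MorganFP": [0, 1, 1, 0],
    "AvalonFP": [1, 1, 0, 0],
}

# One row of the feature matrix for a single molecule.
features = combine_fingerprints(
    toy_fingerprints, ["AtomPairFP", "MorganFP", "AvalonFP"]
)
```

Keeping the order fixed matters because every molecule must map the same fingerprint bit to the same feature column.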
Affiliation(s)
- Hao Zhou
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China
| | - Mengyi Shan
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China
| | - Lu-Ping Qin
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China.
| | - Gang Cheng
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China.
| |
|
34
|
Krishnan SR, Bung N, Padhi S, Bulusu G, Misra P, Pal M, Oruganti S, Srinivasan R, Roy A. De novo design of anti-tuberculosis agents using a structure-based deep learning method. J Mol Graph Model 2023; 118:108361. [PMID: 36257148 DOI: 10.1016/j.jmgm.2022.108361] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/19/2022] [Revised: 09/10/2022] [Accepted: 10/07/2022] [Indexed: 11/28/2022]
Abstract
Mycobacterium tuberculosis (Mtb) is a pathogen of major concern due to its ability to withstand both first- and second-line antibiotics, leading to drug resistance. Thus, there is a critical need for identification of novel anti-tuberculosis agents targeting Mtb-specific proteins. The ceaseless search for novel antimicrobial agents to combat drug-resistant bacteria can be accelerated by the development of advanced deep learning methods, to explore both existing and uncharted regions of the chemical space. The adaptation of deep learning methods to under-explored pathogens such as Mtb is a challenging aspect, as most of the existing methods rely on the availability of sufficient target-specific ligand data to design novel small molecules with optimized bioactivity. In this work, we report the design of novel anti-tuberculosis agents targeting the Mtb chorismate mutase protein using a structure-based drug design algorithm. The structure-based deep learning method relies on the knowledge of the target protein's binding site structure alone for conditional generation of novel small molecules. The method eliminates the need for curation of a high-quality target-specific small molecule dataset, which remains a challenge even for many druggable targets, including Mtb chorismate mutase. Novel molecules are proposed, that show high complementarity to the target binding site. The graph attention model could identify the probable key binding site residues, which influenced the conditional molecule generator to design new molecules with pharmacophoric features similar to the known inhibitors.
Affiliation(s)
| | - Navneet Bung
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
| | - Siladitya Padhi
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
| | - Gopalakrishnan Bulusu
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India; Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Gachibowli, Hyderabad, 500046, India
| | - Parimal Misra
- Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Gachibowli, Hyderabad, 500046, India
| | - Manojit Pal
- Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Gachibowli, Hyderabad, 500046, India
| | - Srinivas Oruganti
- Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Gachibowli, Hyderabad, 500046, India
| | - Rajgopal Srinivasan
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
| | - Arijit Roy
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India.
| |
|
35
|
Shen WX, Liang SR, Jiang YY, Chen YZ. Enhanced metagenomic deep learning for disease prediction and consistent signature recognition by restructured microbiome 2D representations. PATTERNS (NEW YORK, N.Y.) 2022; 4:100658. [PMID: 36699735 PMCID: PMC9868677 DOI: 10.1016/j.patter.2022.100658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2022] [Revised: 07/15/2022] [Accepted: 11/15/2022] [Indexed: 12/23/2022]
Abstract
Metagenomic analysis has been explored for disease diagnosis and biomarker discovery. Low sample sizes, high dimensionality, and sparsity of metagenomic data challenge metagenomic investigations. Here, an unsupervised microbial embedding, grouping, and mapping algorithm (MEGMA) was developed to transform metagenomic data into individualized multichannel microbiome 2D representation by manifold learning and clustering of microbial profiles (e.g., composition, abundance, hierarchy, and taxonomy). These 2D representations enable enhanced disease prediction by established ConvNet-based AggMapNet models, outperforming the commonly used machine learning and deep learning models in metagenomic benchmark datasets. These 2D representations combined with AggMapNet explainable module robustly identified more reliable and replicable disease-prediction microbes (biomarkers). Employing the MEGMA-AggMapNet pipeline for biomarker identification from 5 disease datasets, 84% of the identified biomarkers have been described in over 74 distinct works as important for these diseases. Moreover, the method also discovered highly consistent sets of biomarkers in cross-cohort colorectal cancer (CRC) patients and microbial shifts in different CRC stages.
Affiliation(s)
- Wan Xiang Shen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543, Singapore
| | - Shu Ran Liang
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
| | - Yu Yang Jiang
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Corresponding author
| | - Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Shenzhen Bay Laboratory, Shenzhen 518000, China; Corresponding author
| |
|
36
|
Wu J, Wang J, Wu Z, Zhang S, Deng Y, Kang Y, Cao D, Hsieh CY, Hou T. ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction. J Chem Inf Model 2022; 62:5975-5987. [PMID: 36417544 DOI: 10.1021/acs.jcim.2c01290] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]
Abstract
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. The accurate prediction of these properties remains to be solved due to data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore the ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network for each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the dilemma of limited data. Additionally, we substitute the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvement of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying its high robustness and generalizability. The interpretability analysis proves that the atom contributions learned by ALipSol are more reasonable compared with the vanilla Attentive FP, and the substitution effects in benzene derivatives agreed well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
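The idea of replacing a gating network with attention over expert outputs can be illustrated with a generic sketch. This is not the ALipSol architecture: it only assumes that each expert emits a scalar prediction and that per-example attention scores are turned into mixing weights with a softmax. The names `softmax` and `mix_experts` and all numbers are hypothetical:

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_outputs: Sequence[float],
                attention_scores: Sequence[float]) -> float:
    """Combine per-expert predictions using attention-derived weights."""
    weights = softmax(attention_scores)
    return sum(w * y for w, y in zip(weights, expert_outputs))


# Three hypothetical expert predictions for one molecule.
expert_outputs = [1.0, 2.0, 3.0]

# Equal scores give equal weights, so the mixture is the plain average.
equal_mix = mix_experts(expert_outputs, [0.0, 0.0, 0.0])

# A dominant score lets one expert determine almost the whole output.
dominated_mix = mix_experts([1.0, 5.0], [0.0, 50.0])
```

Because the scores are computed per example, the weighting can differ from molecule to molecule, which is the "dynamic task relationship" the abstract refers to.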
Affiliation(s)
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
| | - Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
| | - Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shenzhen, 518057 Guangdong, P. R. China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, P. R. China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
| |
|
37
|
Ji Z, Shi R, Lu J, Li F, Yang Y. ReLMole: Molecular Representation Learning Based on Two-Level Graph Similarities. J Chem Inf Model 2022; 62:5361-5372. [PMID: 36302249 DOI: 10.1021/acs.jcim.2c00798] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
Molecular representation is a critical part of various prediction tasks for physicochemical properties of molecules and drug design. As graph notations are common in expressing the structural information of chemical compounds, graph neural networks (GNNs) have become the mainstream backbone model for learning molecular representation. However, the scarcity of task-specific labels in the biomedical domain limits the power of GNNs. Recently, self-supervised pretraining for GNNs has been leveraged to deal with this issue, while the existing pretraining methods are mainly designed for graph data in general domains without considering the specific data properties of molecules. In this paper, we propose a representation learning method for molecular graphs, called ReLMole, which is featured by a hierarchical graph modeling of molecules and a contrastive learning scheme based on two-level graph similarities. We assess the performance of ReLMole on two types of downstream tasks, namely, the prediction of molecular properties (MPs) and drug-drug interaction (DDIs). ReLMole achieves promising results for all the tasks. It outperforms the baseline models by over 2.6% on ROC-AUC averaged across six MP prediction tasks, and it improves the F1 value by 7-18% in DDI prediction for unseen drugs compared with other self-supervised models.
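The abstract does not spell out ReLMole's contrastive objective. As a generic illustration only, the sketch below uses an InfoNCE-style loss, a common choice in contrastive learning that is assumed here rather than taken from the paper: an anchor embedding is pulled toward a similar (positive) embedding and pushed away from dissimilar (negative) ones. The embeddings, the temperature, and the function names are hypothetical:

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE-style loss: small when the anchor is closest to its positive."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings (hypothetical values).
anchor = [1.0, 0.0]
aligned = [1.0, 0.0]      # same direction as the anchor
orthogonal = [0.0, 1.0]   # unrelated direction

# Positive pair already aligned: the loss is close to zero.
good_loss = info_nce(anchor, aligned, [orthogonal])

# Positive pair misaligned and the negative aligned: the loss is large.
bad_loss = info_nce(anchor, orthogonal, [aligned])
```

In a molecular setting, which pairs count as positive would be decided by a similarity criterion; the abstract names two levels of graph similarity for that role.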
Affiliation(s)
- Zewei Ji
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Runhan Shi
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Jiarui Lu
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Fang Li
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Yang
- Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| |
|
38
|
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022]
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been proved to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which imply there is still room for further improving their performances. Meanwhile, only a few of them are accessible at present, which limits their practical applications. The construction of a new tool for LMI prediction is thus imperative for the better understanding of their relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized deep learning algorithm of graph neural network were utilized in this study. ncRNAInter was robust and showed better performance of 26.7% higher Matthews correlation coefficient than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
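The Matthews correlation coefficient (MCC) used for the comparison above is computed from the four cells of a binary confusion matrix. A minimal sketch with hypothetical counts; returning 0.0 when the denominator is zero is a common convention that is assumed here:

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion-matrix counts, not data from the paper.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)     # every call correct
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)    # no better than random
typical = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)    # mostly correct
```

MCC ranges from -1 to 1 and uses all four cells, so it stays informative when the positive and negative classes are imbalanced.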
Affiliation(s)
- Hanyu Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Honglin Li
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
|
39
|
Cai H, Zhang H, Zhao D, Wu J, Wang L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief Bioinform 2022; 23:6702671. [PMID: 36124766 DOI: 10.1093/bib/bbac408] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 05/05/2022] [Revised: 07/28/2022] [Accepted: 08/22/2022] [Indexed: 12/14/2022]
Abstract
Accurate prediction of molecular properties, such as physicochemical and bioactive properties, as well as ADME/T (absorption, distribution, metabolism, excretion and toxicity) properties, remains a fundamental challenge for molecular design, especially for drug design and discovery. In this study, we advanced a novel deep learning architecture, termed FP-GNN (fingerprints and graph neural networks), which combined and simultaneously learned information from molecular graphs and fingerprints for molecular property prediction. To evaluate the FP-GNN model, we conducted experiments on 13 public datasets, an unbiased LIT-PCBA dataset and 14 phenotypic screening datasets for breast cell lines. Extensive evaluation results showed that compared to advanced deep learning and conventional machine learning algorithms, the FP-GNN algorithm achieved state-of-the-art performance on these datasets. In addition, we analyzed the influence of different molecular fingerprints, and the effects of molecular graphs and molecular fingerprints on the performance of the FP-GNN model. Analysis of the anti-noise ability and interpretation ability also indicated that FP-GNN was competitive in real-world situations. Collectively, FP-GNN algorithm can assist chemists, biologists and pharmacists in predicting and discovering better molecules with desired functions or properties.
Affiliation(s)
- Hanxuan Cai
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Huimin Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Jingxing Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| |
|
40
|
Amahong K, Zhang W, Zhou Y, Zhang S, Yin J, Li F, Xu H, Yan T, Yue Z, Liu Y, Hou T, Qiu Y, Tao L, Han L, Zhu F. CovInter: interaction data between coronavirus RNAs and host proteins. Nucleic Acids Res 2022; 51:D546-D556. [PMID: 36200814 PMCID: PMC9825556 DOI: 10.1093/nar/gkac834] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/05/2022] [Revised: 09/07/2022] [Accepted: 09/16/2022] [Indexed: 01/29/2023]
Abstract
Coronavirus has brought about three massive outbreaks in the past two decades. Each step of its life cycle invariably depends on the interactions among virus and host molecules. The interaction between virus RNA and host protein (IVRHP) is unique compared to other virus-host molecular interactions and represents not only an attempt by viruses to promote their translation/replication, but also the host's endeavor to combat viral pathogenicity. In other words, there is an urgent need to develop a database for providing such IVRHP data. In this study, a new database was therefore constructed to describe the interactions between coronavirus RNAs and host proteins (CovInter). This database is unique in (a) unambiguously characterizing the interactions between virus RNA and host protein, (b) comprehensively providing experimentally validated biological function for hundreds of host proteins key in viral infection and (c) systematically quantifying the differential expression patterns (before and after infection) of these key proteins. Given the devastating and persistent threat of coronaviruses, CovInter is highly expected to fill the gap in the whole process of the 'molecular arms race' between viruses and their hosts, which will then aid in the discovery of new antiviral therapies. It's now free and publicly accessible at: https://idrblab.org/covinter/.
Affiliation(s)
| | | | | | - Song Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Hongquan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Tianci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Zixuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Yuhong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Lin Tao
- Correspondence may also be addressed to Lin Tao.
| | - Lianyi Han
- Correspondence may also be addressed to Lianyi Han.
| | - Feng Zhu
- To whom correspondence should be addressed. Tel: +86 189 8946 6518; Fax: +86 571 8820 8444;
| |
|
41
|
A Deep Neural Network-Based Model for Quantitative Evaluation of the Effects of Swimming Training. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:5508365. [PMID: 36210996 PMCID: PMC9546648 DOI: 10.1155/2022/5508365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Revised: 09/06/2022] [Accepted: 09/16/2022] [Indexed: 11/17/2022]
Abstract
This paper analyzes the quantitative assessment model of the swimming training effect based on the deep neural network by constructing a deep neural network model and designing a quantitative assessment model of the swimming training effect. This paper addresses the problem of not considering the influence of the uncertainties existing in the virtual environment when evaluating swimming training and adds the power of the delays in the actual training operation environment, which is used to improve the objectivity and usability of swimming training evaluation results. To better measure the degree of influence of uncertainties, a training evaluation software module is developed to validate the usability of the simulated training evaluation method using simulated case data and compare it with the data after training evaluation using the unimproved evaluation method to verify the correctness and objectivity of the evaluation method in this paper. In the experiments, the feature extractor is a deep neural network, and the classifier is a gradient-boosting decision tree with integrated learning advantages. In the experimental comparison, we can achieve more than 60% accuracy and no more than a 1.00% decrease in recognition rate on DBPNN + GBDT, 78.5% parameter reduction, and 54.5% floating-point reduction on DPBNN. We can effectively reduce 32.1% of video memory occupation. It can be concluded from the experiments that deep neural network models are more effective and easier to obtain relatively accurate experimental results than shallow learning when facing high-dimensional sparse features. At the same time, deep neural networks can also improve the prediction results of external learning models. Therefore, the experimental results of this model are most intuitively accurate when combining deep neural networks with gradient boosting decision trees.
|
42
|
An interpretable machine learning model for selectivity of small molecules against homologous protein family. Future Med Chem 2022; 14:1441-1453. [PMID: 36169035 DOI: 10.4155/fmc-2022-0075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Aim: In the early stages of drug discovery, various experimental and computational methods are used to measure the specificity of small molecules against a target protein. The selectivity of small molecules remains a challenge leading to off-target side effects. Methods: We have developed a multitask deep learning model for predicting the selectivity on closely related homologs of the target protein. The model has been tested on the Janus-activated kinase and dopamine receptor families of proteins. Results & conclusion: The feature-based representation (extended connectivity fingerprint 4) with Extreme Gradient Boosting performed better when compared with deep neural network models in most of the evaluation metrics. Both the Extreme Gradient Boosting and deep neural network models outperformed the graph-based models. Furthermore, to decipher the model decision on selectivity, the important fragments associated with each homologous protein were identified.
|
43
|
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:ijms231911262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022]
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. 
By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
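The two acceptance thresholds quoted in this abstract (inhibition likelihood p > 0.75 and Tanimoto coefficient > 0.6 to known SRC inhibitors) can be illustrated with a minimal sketch. Everything beyond the two threshold values is an assumption made for illustration: molecules are represented as sets of fingerprint on-bit indices, similarity is taken as the maximum over the known inhibitors, and the names `tanimoto` and `passes_filters` are hypothetical rather than taken from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def passes_filters(candidate_bits, known_bits_list, likelihood,
                   p_min=0.75, sim_min=0.6):
    """Return True if a generated molecule clears both thresholds.

    candidate_bits:  on-bit indices of the generated molecule (assumed format)
    known_bits_list: on-bit index sets of known inhibitors (assumed format)
    likelihood:      classifier probability of kinase inhibition
    """
    # Assumption: compare against the most similar known inhibitor.
    best_similarity = max(
        (tanimoto(candidate_bits, known) for known in known_bits_list),
        default=0.0,
    )
    return likelihood > p_min and best_similarity > sim_min


# Toy usage with made-up bit sets.
known = [{1, 2, 3}, {7, 8, 9}]
print(passes_filters({1, 2, 3, 4}, known, likelihood=0.80))
print(passes_filters({1, 2, 3, 4}, known, likelihood=0.70))
```

In practice the bit sets would come from a cheminformatics toolkit's fingerprint function; the abstract does not state which fingerprint or aggregation rule was used.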
|
44
|
Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022; 27:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]
Abstract
With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of molecular data (1D, 2D, and 3D). In addition, we discuss some common DL models, such as ensemble learning and transfer learning, and analyze the interpretability methods for these models. We also highlight the challenges and opportunities of DL methods for molecular representation and property prediction.
|
45
|
Shan M, Jiang C, Qin L, Cheng G. A Review of Computational Methods in Predicting hERG Channel Blockers. ChemistrySelect 2022. [DOI: 10.1002/slct.202201221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Mengyi Shan
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Chen Jiang
- QuanMin RenZheng (HangZhou) Technology Co. Ltd., China
- Lu‐Ping Qin
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Gang Cheng
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
|
46
|
An adaptive graph learning method for automated molecular interactions and properties predictions. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00501-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
47
|
Ma X, Yu S, Zhao B, Bai W, Cui Y, Ni J, Lyu Q, Zhao J. Development and Validation of a Novel Ferroptosis-Related LncRNA Signature for Predicting Prognosis and the Immune Landscape Features in Uveal Melanoma. Front Immunol 2022; 13:922315. [PMID: 35774794 PMCID: PMC9238413 DOI: 10.3389/fimmu.2022.922315] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/17/2022] [Accepted: 05/10/2022] [Indexed: 12/18/2022]
Abstract
Background Ferroptosis is a newly iron-dependent mode of programmed cell death that is involved in a variety of malignancies. But no research has shown a link between ferroptosis-related long non-coding RNAs (FRLs) and uveal melanoma (UM). We aimed to develop a predictive model for UM and explore its potential function in relation to immune cell infiltration. Methods Identification of FRLs was performed using the Cancer Genome Atlas (TCGA) and FerrDb databases. To develop a prognostic FRLs signature, univariate Cox regression and least absolute shrinkage and selection operator (LASSO) were used in training cohort. Kaplan-Meier (K-M) and receiver operating characteristic (ROC) curve analyses were used to assess the reliability of the risk model. The immunological functions of FRLs signature were determined using gene set enrichment analysis (GSEA). Immunological cell infiltration and immune treatment were studied using the ESTIMATE, CIBERSORT, and ssGSEA algorithms. Finally, in vitro assays were carried out to confirm the biological roles of FRLs with known primer sequences (LINC00963, PPP1R14B.AS1, and ZNF667.AS1). Results A five-genes novel FRLs signature was identified. The mean risk score generated by this signature was used to create two risk groups. The high-risk score UM patients had a lower overall survival rate. The area under the curve (AUC) of ROC and K-M analysis further validated the strong prediction capacity of the prognostic signature. Immune cells such as memory CD8 T cells, M1 macrophages, monocytes, and B cells showed a substantial difference between the two groups. GSEA enrichment results showed that the FRLs signature was linked to certain immune pathways. Moreover, UM patients with high-risk scores were highly susceptible to several chemotherapy drugs, such as cisplatin, imatinib, bortezomib, and pazopanib. 
Finally, the experimental validation confirmed that knockdown of three identified lncRNA (LINC00963, PPP1R14B.AS1, and ZNF667.AS1) suppressed the invasive ability of tumor cells in vitro. Conclusion The five-FRLs (AC104129.1, AC136475.3, LINC00963, PPP1R14B.AS1, and ZNF667.AS1) signature has effects on clinical survival prediction and selection of immunotherapies for UM patients.
Affiliation(s)
- Xiaochen Ma
- The Second Clinical Medical College, Jinan University, Shenzhen, China
- Sejie Yu
- The Second Clinical Medical College, Jinan University, Shenzhen, China
- Bin Zhao
- Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen, China
- Wei Bai
- The Second Clinical Medical College, Jinan University, Shenzhen, China
- Yubo Cui
- Department of Ophthalmology, Shenzhen People’s Hospital, The Second Clinical Medical College of Jinan University & The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen, China
- Jinglan Ni
- The Second Clinical Medical College, Jinan University, Shenzhen, China
- Qinghua Lyu
- Department of Ophthalmology, Shenzhen People’s Hospital, The Second Clinical Medical College of Jinan University & The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Qinghua Lyu; Jun Zhao
- Jun Zhao
- Department of Ophthalmology, Shenzhen People’s Hospital, The Second Clinical Medical College of Jinan University & The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Qinghua Lyu; Jun Zhao
|
48
|
Moriwaki H, Saito S, Matsumoto T, Serizawa T, Kunimoto R. Global Analysis of Deep Learning Prediction Using Large-Scale In-House Kinome-Wide Profiling Data. ACS Omega 2022; 7:18374-18381. [PMID: 35694454 PMCID: PMC9178758 DOI: 10.1021/acsomega.2c00664] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Accepted: 05/12/2022] [Indexed: 06/11/2023]
Abstract
In drug discovery, the prediction of activity and absorption, distribution, metabolism, excretion, and toxicity parameters is one of the most important approaches in determining which compound to synthesize next. In recent years, prediction methods based on deep learning as well as non-deep learning approaches have been established, and a number of applications to drug discovery have been reported by various companies and organizations. In this research, we performed activity prediction using deep learning and non-deep learning methods on in-house assay data for several hundred kinases and compared and discussed the prediction results. We found that the prediction accuracy of the single-task graph neural network (GNN) model was generally lower than that of the non-deep learning model (LightGBM), but the multitask GNN model, which combined data from other kinases, comprehensively outperformed LightGBM. In addition, the extrapolative validity of the multitask model was verified by using it for prediction on known kinase ligands. We observed an overlap between characteristic protein-ligand interaction sites and the atoms that are important for prediction. By building appropriate models based on the conditions of the data set and analyzing the feature importance of the prediction results, a ligand-based prediction method may be used not only for activity prediction but also for drug design.
Affiliation(s)
- Hirotomo Moriwaki
- ExaWizards Inc., 21F Shiodome Sumitomo Building, 1-9-2 Higashi Shimbashi, Minato-ku, Tokyo 105-0021, Japan
- Shin Saito
- ExaWizards Inc., 21F Shiodome Sumitomo Building, 1-9-2 Higashi Shimbashi, Minato-ku, Tokyo 105-0021, Japan
- Tomoya Matsumoto
- ExaWizards Inc., 21F Shiodome Sumitomo Building, 1-9-2 Higashi Shimbashi, Minato-ku, Tokyo 105-0021, Japan
- Takayuki Serizawa
- Medicinal Chemistry Research Laboratories, R&D Division, Daiichi-Sankyo Shinagawa R&D Center, Daiichi Sankyo Company, Limited, 1-2-58 Hiromachi, Shinagawa-ku, Tokyo 140-8710, Japan
- Ryo Kunimoto
- Medicinal Chemistry Research Laboratories, R&D Division, Daiichi-Sankyo Shinagawa R&D Center, Daiichi Sankyo Company, Limited, 1-2-58 Hiromachi, Shinagawa-ku, Tokyo 140-8710, Japan
|
49
|
Wang Y, Magar R, Liang C, Barati Farimani A. Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast. J Chem Inf Model 2022; 62:2713-2725. [PMID: 35638560 DOI: 10.1021/acs.jcim.2c00495] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 01/20/2023]
Abstract
Deep learning has been a prevalence in computational chemistry and widely implemented in molecular property predictions. Recently, self-supervised learning (SSL), especially contrastive learning (CL), has gathered growing attention for the potential to learn molecular representations that generalize to the gigantic chemical space. Unlike supervised learning, SSL can directly leverage large unlabeled data, which greatly reduces the effort to acquire molecular property labels through costly and time-consuming simulations or experiments. However, most molecular SSL methods borrow the insights from the machine learning community but neglect the unique cheminformatics (e.g., molecular fingerprints) and multilevel graphical structures (e.g., functional groups) of molecules. In this work, we propose iMolCLR, improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs) in two aspects: (1) mitigating faulty negative contrastive instances via considering cheminformatics similarities between molecule pairs and (2) fragment-level contrasting between intramolecule and intermolecule substructures decomposed from molecules. Experiments have shown that the proposed strategies significantly improve the performance of GNN models on various challenging molecular property predictions. In comparison to the previous CL framework, iMolCLR demonstrates an averaged 1.2% improvement of ROC-AUC on eight classification benchmarks and an averaged 10.1% decrease of the error on six regression benchmarks. On most benchmarks, the generic GNN pretrained by iMolCLR rivals or even surpasses supervised learning models with sophisticated architectures and engineered features. Further investigations demonstrate that representations learned through iMolCLR intrinsically embed scaffolds and functional groups that can reason molecule similarities.
Affiliation(s)
- Yuyang Wang
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Rishikesh Magar
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Chen Liang
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States; Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
50
|
Zhang S, Yan Z, Huang Y, Liu L, He D, Wang W, Fang X, Zhang X, Wang F, Wu H, Wang H. HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer. Bioinformatics 2022; 38:3444-3453. [PMID: 35604079 DOI: 10.1093/bioinformatics/btac342] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/16/2022] [Revised: 05/06/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, those ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform frail for the molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates the concept of self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. RESULTS Our results demonstrate that H-ADMET achieves an overall improvement of 4%, compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new and customised ADMET endpoints, meeting various demands of drug research and development requirements. AVAILABILITY H-ADMET is freely accessible at https://paddlehelix.baidu.com/app/drug/admet/train. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shanzhuo Zhang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Zhiyuan Yan
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Yueyang Huang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Lihang Liu
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Donglong He
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Wei Wang
- School of Computer Science and Technology, Harbin Institute of Technology (HIT), Shenzhen, China
- Xiaomin Fang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Xiaonan Zhang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Fan Wang
- Baidu International Technology (Shenzhen) Co., Ltd., Shenzhen, China
- Hua Wu
- Baidu Inc., Beijing, China
|