1
Duan Y, Yang X, Zeng X, Wang W, Deng Y, Cao D. Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge. J Med Chem 2024; 67:9575-9586. [PMID: 38748846 DOI: 10.1021/acs.jmedchem.4c00692] [Indexed: 06/14/2024]
Abstract
Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
Affiliation(s)
- Yanjing Duan
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Xixi Yang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
2
Pitman C, Santiago-McRae E, Lohia R, Bassi K, Joseph TT, Hansen MEB, Brannigan G. The blobulator: a webtool for identification and visual exploration of hydrophobic modularity in protein sequences. bioRxiv 2024:2024.01.15.575761. [PMID: 38293114 PMCID: PMC10827107 DOI: 10.1101/2024.01.15.575761] [Indexed: 02/01/2024]
Abstract
Motivation: Clusters of hydrophobic residues are known to promote stability in structured proteins and to drive protein aggregation. Recent work has shown that identifying contiguous hydrophobic residue clusters (termed "blobs") is useful in both intrinsically disordered protein (IDP) simulation and human genome studies. However, a graphical interface was unavailable.
Results: Here, we present the blobulator: an interactive and intuitive web interface to detect intrinsic modularity in any protein sequence based on hydrophobicity. We demonstrate three use cases of the blobulator and show how identifying blobs with biologically relevant parameters provides useful information about a globular protein, two orthologous membrane proteins, and an IDP. Other potential applications are discussed, including predicting protein segments with critical roles in tertiary interactions, providing a definition of local order and disorder with clear edges, and aiding in predicting protein features from sequence.
Availability: The blobulator GUI can be found at www.blobulator.branniganlab.org, and the source code, with a pip-installable command-line tool, can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
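The idea of finding contiguous hydrophobic clusters in a sequence can be illustrated with a minimal sketch. This is not the blobulator's code: the function name `hydrophobic_blobs`, the smoothing window, the threshold, and the minimum length are assumptions chosen for illustration, and the Kyte-Doolittle scale is used here only as one common hydropathy scale.

```python
# Kyte-Doolittle hydropathy values (one common scale; an assumption here).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_blobs(seq, threshold=0.4, min_len=4, window=3):
    """Return (start, end) half-open index pairs of hydrophobic runs.

    threshold, min_len and window are illustrative defaults, not values
    taken from the abstract above.
    """
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1].
    h = [(KD[aa] + 4.5) / 9.0 for aa in seq]

    # Smooth with a centered moving average, truncated at the sequence ends.
    half = window // 2
    smooth = []
    for i in range(len(h)):
        lo, hi = max(0, i - half), min(len(h), i + half + 1)
        smooth.append(sum(h[lo:hi]) / (hi - lo))

    # Collect runs above the threshold that are at least min_len long.
    blobs, start = [], None
    for i, value in enumerate(smooth + [float("-inf")]):  # sentinel ends a trailing run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                blobs.append((start, i))
            start = None
    return blobs


if __name__ == "__main__":
    print(hydrophobic_blobs("KKKKLLLLLLKKKK"))
```

Raising `threshold` or `min_len` in this sketch yields fewer, more strongly hydrophobic segments, which is the kind of parameter exploration an interactive interface makes convenient.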
Affiliation(s)
- Connor Pitman
- Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Ezry Santiago-McRae
- Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Ruchi Lohia
- Department of Physiology, University of Toronto, 1 King's College Circle, M5S 1A8, Toronto, Ontario, Canada
- Kaitlin Bassi
- Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Thomas T Joseph
- Department of Anesthesiology and Critical Care, Perelman School of Medicine, University of Pennsylvania, JMB 305, 3620 Hamilton Walk, 19104, PA, USA
- Matthew E B Hansen
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, 19104, PA, USA
- Grace Brannigan
- Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Department of Physics, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
3
Liu J, Lei X, Ji C, Pan Y. Fragment-pair based drug molecule solubility prediction through attention mechanism. Front Pharmacol 2023; 14:1255181. [PMID: 37881183 PMCID: PMC10595153 DOI: 10.3389/fphar.2023.1255181] [Received: 07/08/2023] [Accepted: 09/26/2023] [Indexed: 10/27/2023]
Abstract
The purpose of drug discovery is to identify new drugs, and the solubility of drug molecules is an important physicochemical property in medicinal chemistry that plays a crucial role in this process. In solubility prediction, high-precision computational methods can significantly reduce the experimental costs and time associated with drug development. Therefore, artificial intelligence technologies have been widely used for solubility prediction. This study used the attention mechanism in a deep learning model to capture the atomic-level features of the molecules, and used gated recurrent neural networks to aggregate vectors between layers. It also used molecular fragmentation to divide the complete molecule into pairs of fragments, extracted features from each fragment pair, and finally fused these features to predict the solubility of drug molecules. We compared our method with five existing models using two performance metrics, demonstrating that our method has better performance and greater robustness.
Affiliation(s)
- Jianping Liu
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Chunyan Ji
- Computer Science Department, BNU-HKBU United International College, Zhuhai, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Shenzhen, China
4
Wang Y, Xiong J, Xiao F, Zhang W, Cheng K, Rao J, Niu B, Tong X, Qu N, Zhang R, Wang D, Chen K, Li X, Zheng M. LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP. J Cheminform 2023; 15:76. [PMID: 37670374 PMCID: PMC10478446 DOI: 10.1186/s13321-023-00754-4] [Received: 06/12/2023] [Accepted: 08/25/2023] [Indexed: 09/07/2023]
Abstract
Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling poses a significant challenge to achieving satisfactory generalization capability. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. RTlogD is pre-trained on a chromatographic retention time (RT) dataset, since RT is influenced by lipophilicity. Additionally, microscopic pKa values are incorporated as atomic features, providing valuable insights into ionizable sites and ionization capacity. Furthermore, logP is integrated as an auxiliary task within a multitask learning framework. We conducted ablation studies and presented a detailed analysis, showcasing the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, our RTlogD model demonstrated superior performance compared to commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of our model, and it holds promise for real-world applications in drug discovery and design scenarios.
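Why pKa and logP carry information about logD can be seen from the textbook relation for a single ionizable group. The sketch below is not part of RTlogD; it assumes a monoprotic compound whose ionized form does not partition into octanol, and the function name `logd_monoprotic` is hypothetical.

```python
import math


def logd_monoprotic(logp, pka, ph=7.4, acid=True):
    """Estimate logD at a given pH from logP and a single pKa.

    Assumes one ionizable group and that only the neutral species
    partitions into the organic phase (a simplifying assumption).
    """
    # For an acid the ionized fraction grows with pH; for a base it shrinks.
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # An acid with pKa equal to the pH is half ionized, so logD = logP - log10(2).
    print(logd_monoprotic(2.0, 7.4))
```

Under these assumptions, a compound that is mostly ionized at pH 7.4 has a logD well below its logP, which is consistent with treating ionization capacity as an input to logD prediction.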
Affiliation(s)
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Fu Xiao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaiyang Cheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Buying Niu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing, 210023, China.
5
Stienstra CMK, Ieritano C, Haack A, Hopkins WS. Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis. Anal Chem 2023. [PMID: 37384824 DOI: 10.1021/acs.analchem.3c00921] [Indexed: 07/01/2023]
Abstract
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., number of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
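The RMSE and R² scores quoted above come from k-fold cross-validation. The sketch below shows how such per-fold scores can be computed in plain Python. It is an illustration only: the helper names (`kfold_indices`, `cross_validate`, `fit_line`) are hypothetical, and the one-variable least-squares fit is a stand-in for the stacked regressors described in the abstract.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, fit, k=5, seed=0):
    """Return one (RMSE, R2) pair per fold; fit(xs, ys) must return a predictor."""
    scores = []
    for test_idx in kfold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [model(x[i]) for i in test_idx]
        scores.append((rmse(y_true, y_pred), r2(y_true, y_pred)))
    return scores


def fit_line(xs, ys):
    """Ordinary least squares for one feature (a stand-in model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return lambda v: my + slope * (v - mx)


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [2.0 * v + 1.0 for v in xs]
    for fold_rmse, fold_r2 in cross_validate(xs, ys, fit_line):
        print(round(fold_rmse, 6), round(fold_r2, 6))
```

A value reported as "mean ± spread" can then be obtained by summarizing the per-fold scores, for example with `statistics.mean` and `statistics.stdev`; whether the abstract's uncertainties were computed that way is an assumption.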
Affiliation(s)
- Cailum M K Stienstra
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Christian Ieritano
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Alexander Haack
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Scott Hopkins
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Watermine Innovation, Waterloo, Ontario N0B 2T0, Canada
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong