1. Li J, Zhang J, Guo R, Dai J, Niu Z, Wang Y, Wang T, Jiang X, Hu W. Progress of machine learning in the application of small molecule druggability prediction. Eur J Med Chem 2025; 285:117269. [PMID: 39808972 DOI: 10.1016/j.ejmech.2025.117269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 01/07/2025] [Accepted: 01/08/2025] [Indexed: 01/16/2025]
Abstract
Machine learning (ML) has become an important tool for predicting the pharmaceutical properties of small molecules. Recent advances in ML algorithms enable the rapid and accurate evaluation of solubility, activity, toxicity, pharmacokinetics, and other molecular properties. By conducting virtual screening against drug targets and elucidating drug-target protein interactions, researchers can make preliminary evaluations of the activity and safety of compounds from ultra-large compound libraries, thereby accelerating the screening of lead compounds. Moreover, ML leverages existing experimental data to train models and generate new datasets, addressing the challenge of limited compound and protein target data. This review provides a concise overview of ML applications in predicting small molecule properties, focusing on model construction principles, molecular feature selection, and other essential aspects. It also discusses the potential applications of ML in the screening of pharmaceutical small molecules.
Affiliation(s)
- Junyao Li
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou, China; School of Life Sciences, Huaiyin Normal University, Huaian, 223300, China; Institute of Translational Medicine, School of Medicine, Yangzhou University, Yangzhou, 225009, China
- Jianmei Zhang
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou, China
- Rui Guo
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou, China; Institute of Translational Medicine, School of Medicine, Yangzhou University, Yangzhou, 225009, China
- Jiawei Dai
  - Institute of Translational Medicine, School of Medicine, Yangzhou University, Yangzhou, 225009, China
- Zhiqiang Niu
  - Institute of Translational Medicine, School of Medicine, Yangzhou University, Yangzhou, 225009, China
- Yan Wang
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou, China
- Taoyun Wang
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou, China
- Xiaojian Jiang
  - School of Life Sciences, Huaiyin Normal University, Huaian, 223300, China
- Weicheng Hu
  - Institute of Translational Medicine, School of Medicine, Yangzhou University, Yangzhou, 225009, China
2. Su Y, Wang X, Ye Y, Xie Y, Xu Y, Jiang Y, Wang C. Automation and machine learning augmented by large language models in a catalysis study. Chem Sci 2024; 15:12200-12233. [PMID: 39118602 PMCID: PMC11304797 DOI: 10.1039/d3sc07012c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 06/21/2024] [Indexed: 08/10/2024] Open
Abstract
Recent advancements in artificial intelligence and automation are transforming catalyst discovery and design from a traditional, manual trial-and-error mode into intelligent, high-throughput digital methodologies. This transformation is driven by four key components: high-throughput information extraction, automated robotic experimentation, real-time feedback for iterative optimization, and interpretable machine learning for generating new knowledge. These innovations have given rise to self-driving labs and significantly accelerated materials research. Over the past two years, the emergence of large language models (LLMs) has added a new dimension to this field, providing unprecedented flexibility in information integration, decision-making, and interaction with human researchers. This review explores how LLMs are reshaping catalyst design, heralding a revolutionary change in the field.
Affiliation(s)
- Yuming Su
  - iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
- Xue Wang
  - iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yuanxiang Ye
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yibo Xie
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yujing Xu
  - iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yibin Jiang
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
- Cheng Wang
  - iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
3. Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
Affiliation(s)
- Siyuan Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
4. Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common way to represent molecules, but it ignores their structural information, while molecular graph representation is weak at expressing 3D structure. In this article, we try to exploit the advantages of different types of representation simultaneously for molecular property prediction. We therefore propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, the 2D molecular graph, the 3D molecular graph, and the molecular image. Then, to learn the molecular features individually, we use four deep learning frameworks, each corresponding to one of the four molecular representations. The final molecular representation is created by integrating the four feature vectors, which are fed into a prediction layer to predict the molecular property. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on all 6 datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and 3CL protease. The results show that DLF-MFF produces better performance in the identification of anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
Affiliation(s)
- Mei Ma
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
5. Wang R, Chen J, Song Z, Qi Z. Bridging Machine Learning and Redlich–Kister Theory for Solid–Liquid Equilibria Prediction of Binary Eutectic Solvent Systems. Ind Eng Chem Res 2023. [DOI: 10.1021/acs.iecr.3c00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Affiliation(s)
- Ruizhuan Wang
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiahui Chen
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zhen Song
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zhiwen Qi
  - State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
6. Kee CW. Molecular Understanding and Practical In Silico Catalyst Design in Computational Organocatalysis and Phase Transfer Catalysis-Challenges and Opportunities. Molecules 2023; 28:1715. [PMID: 36838703 PMCID: PMC9966076 DOI: 10.3390/molecules28041715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2022] [Revised: 02/03/2023] [Accepted: 02/05/2023] [Indexed: 02/25/2023] Open
Abstract
Through the lens of organocatalysis and phase transfer catalysis, we will examine the key components to calculate or predict catalysis-performance metrics, such as turnover frequency and measurement of stereoselectivity, via computational chemistry. The state-of-the-art tools available to calculate potential energy and, consequently, free energy, together with their caveats, will be discussed via examples from the literature. Through various examples from organocatalysis and phase transfer catalysis, we will highlight the challenges related to the mechanism, transition state theory, and solvation involved in translating calculated barriers to the turnover frequency or a metric of stereoselectivity. Examples in the literature that validated their theoretical models will be showcased. Lastly, the relevance and opportunity afforded by machine learning will be discussed.
Affiliation(s)
- Choon Wee Kee
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 1 Pesek Road, Jurong Island, Singapore 627833, Republic of Singapore
7. Nascimben M, Rimondini L. Molecular Toxicity Virtual Screening Applying a Quantized Computational SNN-Based Framework. Molecules 2023; 28:1342. [PMID: 36771009 PMCID: PMC9919191 DOI: 10.3390/molecules28031342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/27/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023] Open
Abstract
Spiking neural networks are biologically inspired machine learning algorithms attracting researchers' attention for their applicability to alternative energy-efficient hardware other than traditional computers. In the current work, spiking neural networks have been tested in a quantitative structure-activity analysis targeting the toxicity of molecules. Multiple public-domain databases of compounds have been evaluated with spiking neural networks, achieving accuracies compatible with high-quality frameworks presented in the previous literature. The numerical experiments also included an analysis of hyperparameters and tested the spiking neural networks on molecular fingerprints of different lengths. Proposing alternatives to traditional software and hardware for time- and resource-consuming tasks, such as those found in chemoinformatics, may open the door to new research and improvements in the field.
Affiliation(s)
- Mauro Nascimben
  - Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
  - Enginsoft SpA, 35129 Padua, Italy
- Lia Rimondini
  - Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
8. Jiang J, Ma X, Ouyang D, Williams RO. Emerging Artificial Intelligence (AI) Technologies Used in the Development of Solid Dosage Forms. Pharmaceutics 2022; 14:2257. [PMID: 36365076 PMCID: PMC9694557 DOI: 10.3390/pharmaceutics14112257] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/21/2022] [Revised: 10/11/2022] [Accepted: 10/17/2022] [Indexed: 07/30/2023] Open
Abstract
Artificial Intelligence (AI)-based formulation development is a promising approach for facilitating the drug product development process. AI is a versatile tool that contains multiple algorithms that can be applied in various circumstances. Solid dosage forms, represented by tablets, capsules, powder, granules, etc., are among the most widely used administration methods. During the product development process, multiple factors including critical material attributes (CMAs) and processing parameters can affect product properties, such as dissolution rates, physical and chemical stabilities, particle size distribution, and the aerosol performance of the dry powder. However, the conventional trial-and-error approach for product development is inefficient, laborious, and time-consuming. AI has been recently recognized as an emerging and cutting-edge tool for pharmaceutical formulation development which has gained much attention. This review provides the following insights: (1) a general introduction of AI in the pharmaceutical sciences and principal guidance from the regulatory agencies, (2) approaches to generating a database for solid dosage formulations, (3) insight on data preparation and processing, (4) a brief introduction to and comparisons of AI algorithms, and (5) information on applications and case studies of AI as applied to solid dosage forms. In addition, the powerful technique known as deep learning-based image analytics will be discussed along with its pharmaceutical applications. By applying emerging AI technology, scientists and researchers can better understand and predict the properties of drug formulations to facilitate more efficient drug product development processes.
Affiliation(s)
- Junhuang Jiang
  - Division of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, Austin, TX 78712, USA
- Xiangyu Ma
  - Global Investment Research, Goldman Sachs, New York, NY 10282, USA
- Defang Ouyang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau 999078, China
- Robert O. Williams
  - Division of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, Austin, TX 78712, USA
9. Jiang J, Peng HH, Yang Z, Ma X, Sahakijpijarn S, Moon C, Ouyang D, Williams III RO. The applications of Machine learning (ML) in designing dry powder for inhalation by using thin-film-freezing technology. Int J Pharm 2022; 626:122179. [PMID: 36084876 DOI: 10.1016/j.ijpharm.2022.122179] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/31/2022] [Revised: 09/01/2022] [Accepted: 09/02/2022] [Indexed: 12/19/2022]
Abstract
Dry powder inhalers (DPIs) are one of the most widely used devices for treating respiratory diseases. Thin-film-freezing (TFF) is a particle engineering technology that has been demonstrated to prepare dry powder for inhalation with enhanced physicochemical properties. Aerosol performance, which is indicated by fine particle fraction (FPF) and mass median aerodynamic diameter (MMAD), is an important consideration during the product development process. However, the conventional approach for formulation development requires many trial-and-error experiments, which is both laborious and time-consuming. As a state-of-the-art technique, machine learning has gained more attention in pharmaceutical science and has been widely applied in different settings. In this study, we have successfully built a prediction model for aerosol performance by using both tabular data and scanning electron microscopy (SEM) images. TFF technology was used to prepare 134 dry powder formulations, which were collected as a tabular dataset. After testing many machine learning models, we determined that the Random Forest (RF) model was best for FPF prediction with a mean absolute error of ± 7.251%, and artificial neural networks (ANNs) performed the best in estimating MMAD with a mean absolute error of ± 0.393 μm. In addition, a convolutional neural network was employed for SEM image classification and has demonstrated high accuracy (>83.86%) and adaptability in predicting 316 SEM images of three different drug formulations. In conclusion, the machine learning models using both tabular data and image classification were successfully established to evaluate the aerosol performance of dry powder for inhalation. These machine learning models facilitate the product development process of dry powder for inhalation manufactured by TFF technology and have the potential to significantly reduce the product development workload. The machine learning methodology can also be applied to other formulation design and development processes in the future.
Affiliation(s)
- Junhuang Jiang
  - Department of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, TX, USA
- Han-Hsuan Peng
  - Department of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, TX, USA
- Zhenpei Yang
  - Department of Computer Science, The University of Texas at Austin, TX, USA
- Xiangyu Ma
  - Global Investment Research, Goldman Sachs, NY, USA
- Chaeho Moon
  - Department of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, TX, USA
- Defang Ouyang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Robert O. Williams III
  - Department of Molecular Pharmaceutics and Drug Delivery, College of Pharmacy, The University of Texas at Austin, TX, USA
10. Shi M, Zhang Q, Gao J, Mi X, Luo S. Catalytic Asymmetric α‐Alkylsulfenylation with a Disulfide Reagent. Angew Chem Int Ed Engl 2022; 61:e202209044. [DOI: 10.1002/anie.202209044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Indexed: 11/11/2022]
Affiliation(s)
- Mingying Shi
  - College of Chemistry, Beijing Normal University, Beijing 100875, China
- Qi Zhang
  - Center of Basic Molecular Science (CBMS), Department of Chemistry, Tsinghua University, Beijing 100084, China
- Jiali Gao
  - College of Chemistry, Beijing Normal University, Beijing 100875, China
- Xueling Mi
  - College of Chemistry, Beijing Normal University, Beijing 100875, China
- Sanzhong Luo
  - Center of Basic Molecular Science (CBMS), Department of Chemistry, Tsinghua University, Beijing 100084, China
11. Shi M, Zhang Q, Gao J, Mi X, Luo S. Catalytic Asymmetric α‐Alkylsulfenylation with a Disulfide Reagent. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.202209044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Mingying Shi
  - Beijing Normal University, Department of Chemistry, China
- Qi Zhang
  - Tsinghua University, CBMS, Department of Chemistry, China
- Jiali Gao
  - Beijing Normal University, Department of Chemistry, China
- Xueling Mi
  - Beijing Normal University, Department of Chemistry, China
- Sanzhong Luo
  - Tsinghua University, Department of Chemistry, 100084 Beijing, China