1
Raiyn J, Rayan A, Abu-Lafi S, Rayan A. From Sequence to Solution: Intelligent Learning Engine Optimization in Drug Discovery and Protein Analysis. BioTech 2024; 13:33. [PMID: 39311335 PMCID: PMC11417716 DOI: 10.3390/biotech13030033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 08/22/2024] [Accepted: 08/29/2024] [Indexed: 09/26/2024] Open
Abstract
This study introduces the intelligent learning engine (ILE) optimization technology, a novel approach designed to revolutionize screening processes in bioinformatics, cheminformatics, and a range of other scientific fields. By focusing on the efficient and precise identification of candidates with desirable characteristics, the ILE technology marks a significant leap forward in addressing the complexities of candidate selection in drug discovery, protein classification, and beyond. The study's primary objective is to address the challenges associated with optimizing screening processes to efficiently select candidates across various fields, including drug discovery and protein classification. The methodology employed involves a detailed algorithmic process that includes dataset preparation, encoding of protein sequences, sensor nucleation, and optimization, culminating in the empirical evaluation of molecular activity indexing, homology-based modeling, and classification of proteins such as G-protein-coupled receptors. This process showcases the method's success in multiple sequence alignment, protein identification, and classification. Key results demonstrate the ILE's superior accuracy in protein classification and virtual high-throughput screening, with a notable breakthrough in drug development for assessing drug-induced long QT syndrome risks through hERG potassium channel interaction analysis. The technology showcased exceptional results in the formulation and evaluation of novel cancer drug candidates, highlighting its potential for significant advancements in pharmaceutical innovations. The findings underline the ILE optimization technology as a transformative tool in screening processes due to its proven effectiveness and broad applicability across various domains. 
This breakthrough contributes substantially to the fields of systems optimization and holds promise for diverse applications, enhancing the process of selecting candidate molecules with target properties and advancing drug discovery, protein classification, and modeling.
Affiliation(s)
- Jamal Raiyn
  - Computer Science Department, Faculty of Science, Al-Qasemi Academic College, Baka EL-Garbiah 30100, Israel
- Adam Rayan
  - NGS Ac-Tech—Next Generation Scholars Ltd., Kabul 2496300, Israel
- Saleh Abu-Lafi
  - Faculty of Pharmacy, Al-Quds University, Abu-Dies 144, Palestine
- Anwar Rayan
  - NGS Ac-Tech—Next Generation Scholars Ltd., Kabul 2496300, Israel
  - Science and Technology Department, Faculty of Science, Al-Qasemi Academic College, Baka EL-Garbiah 30100, Israel
2
Yi J, Shi S, Fu L, Yang Z, Nie P, Lu A, Wu C, Deng Y, Hsieh C, Zeng X, Hou T, Cao D. OptADMET: a web-based tool for substructure modifications to improve ADMET properties of lead compounds. Nat Protoc 2024; 19:1105-1121. [PMID: 38263521 DOI: 10.1038/s41596-023-00942-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 10/27/2023] [Indexed: 01/25/2024]
Abstract
Lead optimization is a crucial step in the drug discovery process, which aims to design potential drug candidates from biologically active hits. During lead optimization, active hits undergo modifications to improve their absorption, distribution, metabolism, excretion and toxicity (ADMET) profiles. Medicinal chemists face key questions regarding which compound(s) should be synthesized next and how to balance multiple ADMET properties. Reliable transformation rules from multiple experimental analyses are critical to improve this decision-making process. We developed OptADMET ( https://cadd.nscc-tj.cn/deploy/optadmet/ ), an integrated web-based platform that provides chemical transformation rules for 32 ADMET properties and leverages prior experimental data for lead optimization. The multiproperty transformation rule database contains a total of 41,779 validated transformation rules generated from the analysis of 177,191 reliable experimental datasets. Additionally, 146,450 rules were generated by analyzing 239,194 molecular data predictions. OptADMET provides the ADMET profiles of all optimized molecules from the queried molecule and enables the prediction of desirable substructure transformations and subsequent validation of drug candidates. OptADMET is based on matched molecular pairs analysis derived from synthetic chemistry, thus providing improved practicality over other methods. OptADMET is designed for use by both experimental and computational scientists.
Affiliation(s)
- Jiacai Yi
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
  - School of Computer Science, National University of Defense Technology, Changsha, China
- Shaohua Shi
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, China
- Ziyi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Pengfei Nie
  - National Supercomputer Center in Tianjin, Tianjin, China
- Aiping Lu
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
  - Guangdong-Hong Kong-Macau Joint Lab on Chinese Medicine and Immune Disease Research, Guangzhou, China
- Chengkun Wu
  - School of Computer Science, National University of Defense Technology, Changsha, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, China
- Changyu Hsieh
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha, China
- Tingjun Hou
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
3
Verma A, Awasthi A. Revolutionizing Drug Discovery: The Role of Artificial Intelligence and Machine Learning. Curr Pharm Des 2024; 30:807-810. [PMID: 38409722 DOI: 10.2174/0113816128298691240222054120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/29/2024] [Accepted: 02/12/2024] [Indexed: 02/28/2024]
Affiliation(s)
- Abhishek Verma
  - Department of Pharmaceutics, ISF College of Pharmacy, Moga, Punjab 142001, India
- Ankit Awasthi
  - Department of Pharmaceutics, ISF College of Pharmacy, Moga, Punjab 142001, India
4
Cinaglia P, Cannataro M. Forecasting COVID-19 Epidemic Trends by Combining a Neural Network with Rt Estimation. Entropy (Basel) 2022; 24:929. [PMID: 35885152 PMCID: PMC9322732 DOI: 10.3390/e24070929] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/18/2022] [Revised: 06/25/2022] [Accepted: 07/01/2022] [Indexed: 11/16/2022]
Abstract
On 31 December 2019, a cluster of pneumonia cases of unknown etiology was reported in Wuhan (China). The cases were declared to be Coronavirus Disease 2019 (COVID-19) by the World Health Organization (WHO). COVID-19 has been defined as SARS Coronavirus 2 (SARS-CoV-2). Some countries, e.g., Italy, France, and the United Kingdom (UK), have been subjected to frequent restrictions for preventing the spread of infection, contrary to other ones, e.g., the United States of America (USA) and Sweden. The restrictions afflicted the evolution of trends with several perturbations that destabilized its normal evolution. Globally, Rt has been used to estimate time-varying reproduction numbers during epidemics. Methods: This paper presents a solution based on Deep Learning (DL) for the analysis and forecasting of epidemic trends in new positive cases of SARS-CoV-2 (COVID-19). It combined a neural network (NN) and an Rt estimation by adjusting the data produced by the output layer of the NN on the related Rt estimation. Results: Tests were performed on datasets related to the following countries: Italy, the USA, France, the UK, and Sweden. Positive case registration was retrieved between 24 February 2020 and 11 January 2022. Tests performed on the Italian dataset showed that our solution reduced the Mean Absolute Percentage Error (MAPE) by 28.44%, 39.36%, 22.96%, 17.93%, 28.10%, and 24.50% compared to other ones with the same configuration but that were based on the LSTM, GRU, RNN, ARIMA (1,0,3), and ARIMA (7,2,4) models, or an NN without applying the Rt as a corrective index. It also reduced MAPE by 17.93%, the Mean Absolute Error (MAE) by 34.37%, and the Root Mean Square Error (RMSE) by 43.76% compared to the same model without the adjustment performed by the Rt. Furthermore, it allowed an average MAPE reduction of 5.37%, 63.10%, 17.84%, and 14.91% on the datasets related to the USA, France, the UK, and Sweden, respectively.
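The three error measures named in the abstract above (MAPE, MAE, RMSE) can be computed as in the following minimal sketch. It uses the standard textbook definitions, not the authors' code. The helper `relative_reduction` assumes that a reported "reduction by X%" means a relative change against a baseline error; the abstract does not say whether the figures are relative or in percentage points.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


def relative_reduction(baseline_error, new_error):
    """Relative reduction of an error versus a baseline, in percent (assumed reading)."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

The sample values used to exercise these functions are made up for illustration and are not taken from the paper's datasets.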
Affiliation(s)
- Pietro Cinaglia
  - Department of Health Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
- Mario Cannataro
  - Department of Medical and Surgical Sciences, Magna Graecia University of Catanzaro, 88100 Catanzaro, Italy
5
Yang ZY, Fu L, Lu AP, Liu S, Hou TJ, Cao DS. Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform 2021; 13:86. [PMID: 34774096 PMCID: PMC8590336 DOI: 10.1186/s13321-021-00564-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 10/30/2021] [Indexed: 12/01/2022] Open
Abstract
In the process of drug discovery, the optimization of lead compounds has always been a challenge faced by pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extract and summarize the relationship between structural transformation and property change, is suitable for local structural optimization tasks. Especially, the integration of MMPA with QSAR modeling can further strengthen the utility of MMPA in molecular optimization navigation. In this study, a new semi-automated procedure based on KNIME was developed to support MMPA on both large- and small-scale datasets, including molecular preparation, QSAR model construction, applicability domain evaluation, and MMP calculation and application. Two examples covering regression and classification tasks were provided to gain a better understanding of the importance of MMPA, which has also shown the reliability and utility of this MMPA-by-QSAR pipeline.
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, People's Republic of China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
6
Shan W, Li X, Yao H, Lin K. Convolutional Neural Network-based Virtual Screening. Curr Med Chem 2021; 28:2033-2047. [PMID: 32452320 DOI: 10.2174/0929867327666200526142958] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Revised: 04/19/2020] [Accepted: 04/30/2020] [Indexed: 11/22/2022]
Abstract
Virtual screening is an important means for lead compound discovery. The scoring function is the key to selecting hit compounds. Many scoring functions are currently available; however, there are no all-purpose scoring functions because different scoring functions tend to have conflicting results. Recently, neural networks, especially convolutional neural networks, have constantly been penetrating drug design and most CNN-based virtual screening methods are superior to traditional docking methods, such as Dock and AutoDock. CNN-based virtual screening is expected to improve the previous model of overreliance on computational chemical screening. Utilizing the powerful learning ability of neural networks provides us with a new method for evaluating compounds. We review the latest progress of CNN-based virtual screening and propose prospects.
Affiliation(s)
- Wenying Shan
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Xuanyi Li
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Hequan Yao
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Kejiang Lin
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing, China
7
Sansare S, Duran T, Mohammadiarani H, Goyal M, Yenduri G, Costa A, Xu X, O'Connor T, Burgess D, Chaudhuri B. Artificial neural networks in tandem with molecular descriptors as predictive tools for continuous liposome manufacturing. Int J Pharm 2021; 603:120713. [PMID: 34019974 DOI: 10.1016/j.ijpharm.2021.120713] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/24/2021] [Revised: 05/08/2021] [Accepted: 05/12/2021] [Indexed: 01/12/2023]
Abstract
The current study utilized an artificial neural network (ANN) to generate computational models to achieve process optimization for a previously developed continuous liposome manufacturing system. The liposome formation was based on a continuous manufacturing system with a co-axial turbulent jet in a co-flow technology. The ethanol phase with lipids and aqueous phase resulted in liposomes of homogeneous sizes. The input features of the ANN included critical material attributes (CMAs) (e.g., hydrocarbon tail length, cholesterol percent, and buffer type) and critical process parameters (CPPs) (e.g., solvent temperature and flow rate), while the ANN outputs included critical quality attributes (CQAs) of liposomes (i.e., particle size and polydispersity index (PDI)). Two common ANN architectures, multiple-input-multiple-output (MIMO) models and multiple-input-single-output (MISO) models, were evaluated in this study, where the MISO outperformed MIMO with improved accuracy. Molecular descriptors, obtained from PaDEL-Descriptor software, were used to capture the physicochemical properties of the lipids and used in training of the ANN. The combination of CMAs, CPPs, and molecular descriptors as inputs to the MISO ANN model reduced the training and testing mean relative error. Additionally, a graphic user interface (GUI) was successfully developed to assist the end-user in performing interactive simulated risk analysis and visualizing model predictions.
Affiliation(s)
- Sameera Sansare
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269, USA
- Tibo Duran
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269, USA
- Manish Goyal
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
- Gowtham Yenduri
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269, USA
- Antonio Costa
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269, USA
- Xiaoming Xu
  - Division of Product Quality Research, Office of Testing and Research, Office of Pharmaceutical Quality/CDER/FDA, Silver Spring, MD 20993, USA
- Thomas O'Connor
  - Division of Product Quality Research, Office of Testing and Research, Office of Pharmaceutical Quality/CDER/FDA, Silver Spring, MD 20993, USA
- Diane Burgess
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269, USA
- Bodhisattwa Chaudhuri
  - Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT 06269, USA
  - Institute of Material Sciences, University of Connecticut, Storrs, CT 06269, USA
  - Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, CT 06269, USA
8
Kapourani A, Valkanioti V, Kontogiannopoulos KN, Barmpalexis P. Determination of the physical state of a drug in amorphous solid dispersions using artificial neural networks and ATR-FTIR spectroscopy. Int J Pharm X 2020; 2:100064. [PMID: 33354666 PMCID: PMC7744708 DOI: 10.1016/j.ijpx.2020.100064] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/17/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/11/2022]
Abstract
The objective of the present study was to evaluate the use of artificial neural networks (ANNs) in the development of a new chemometric model that will be able to simultaneously distinguish and quantify the percentage of the crystalline and the neat amorphous drug located within the drug-rich amorphous zones formed in an amorphous solid dispersion (ASD) system. Attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectroscopy was used, while Rivaroxaban (RIV, drug) and Soluplus® (SOL, matrix-carrier) were selected for the preparation of a suitable ASD model system. Adequate calibration and test sets were prepared by spiking different percentages of the crystalline and the amorphous drug in the ASDs (prepared by the melting - quench cooling approach), while a 2⁴ full factorial experimental design was employed for the screening of ANN's structure and training parameters as well as spectra region selection and data preprocessing. Results showed increased prediction performance, measured based on the root mean squared error of prediction (RMSEp) for the test sample, for both the crystalline (RMSEp (crystal) = 0.86) and the amorphous (RMSEp (amorphous) = 2.14) drug. Comparison with traditional regression techniques, such as partial least squares and principal component regressions, revealed the superiority of ANNs, indicating that in cases of high structural similarity between the investigated compounds (i.e., the crystalline and the amorphous forms of the same compound) the implementation of more powerful/sophisticated regression techniques, such as ANNs, is mandatory.
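The full factorial screening design mentioned in the abstract above (two levels for each of four factors, i.e. 2^4 = 16 runs) can be enumerated as in the following minimal sketch. The factor names are hypothetical placeholders, since the abstract does not list the four factors individually, and the coded levels -1/+1 are the usual convention, not values taken from the paper.

```python
from itertools import product

# Hypothetical factor names: the abstract does not name the four screened factors.
FACTORS = ["factor_A", "factor_B", "factor_C", "factor_D"]


def full_factorial(factors, levels=(-1, 1)):
    """Return every combination of coded levels, one dict per experimental run."""
    return [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]


runs = full_factorial(FACTORS)
print(len(runs))  # 16 runs for a two-level, four-factor design
```

Each dictionary in `runs` is one experimental setting; a screening study would train and evaluate one model per setting and compare the resulting errors.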
Affiliation(s)
- Afroditi Kapourani
  - Department of Pharmaceutical Technology, School of Pharmacy, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Vasiliki Valkanioti
  - Department of Pharmaceutical Technology, School of Pharmacy, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Konstantinos N Kontogiannopoulos
  - Department of Pharmaceutical Technology, School of Pharmacy, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
  - Ecoresources P.C., 15-17 Giannitson-Santaroza Str., Thessaloniki 54627, Greece
- Panagiotis Barmpalexis
  - Department of Pharmaceutical Technology, School of Pharmacy, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
9
Serrano A, Imbernón B, Pérez-Sánchez H, Cecilia JM, Bueno-Crespo A, Abellán JL. QN-Docking: An innovative molecular docking methodology based on Q-Networks. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106678] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/16/2023]
10
Li X, Xu Y, Yao H, Lin K. Chemical space exploration based on recurrent neural networks: applications in discovering kinase inhibitors. J Cheminform 2020; 12:42. [PMID: 33430983 PMCID: PMC7278228 DOI: 10.1186/s13321-020-00446-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 03/26/2020] [Accepted: 06/04/2020] [Indexed: 01/10/2023] Open
Abstract
With the rise of artificial intelligence (AI) in drug discovery, de novo molecular generation provides new ways to explore chemical space. However, because de novo molecular generation methods rely on abundant known molecules, generated molecules may have a problem of novelty. Novelty is important in highly competitive areas of medicinal chemistry, such as the discovery of kinase inhibitors. In this study, de novo molecular generation based on recurrent neural networks was applied to discover a new chemical space of kinase inhibitors. During the application, the practicality was evaluated, and new inspiration was found. With the successful discovery of one potent Pim1 inhibitor and two lead compounds that inhibit CDK4, AI-based molecular generation shows potential in drug discovery and development.
Affiliation(s)
- Xuanyi Li
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Yinqiu Xu
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Hequan Yao
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Kejiang Lin
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
11
Chuang KV, Gunsalus LM, Keiser MJ. Learning Molecular Representations for Medicinal Chemistry. J Med Chem 2020; 63:8705-8722. [PMID: 32366098 DOI: 10.1021/acs.jmedchem.0c00385] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Indexed: 12/21/2022]
Abstract
The accurate modeling and prediction of small molecule properties and bioactivities depend on the critical choice of molecular representation. Decades of informatics-driven research have relied on expert-designed molecular descriptors to establish quantitative structure-activity and structure-property relationships for drug discovery. Now, advances in deep learning make it possible to efficiently and compactly learn molecular representations directly from data. In this review, we discuss how active research in molecular deep learning can address limitations of current descriptors and fingerprints while creating new opportunities in cheminformatics and virtual screening. We provide a concise overview of the role of representations in cheminformatics, key concepts in deep learning, and argue that learning representations provides a way forward to improve the predictive modeling of small molecule bioactivities and properties.
Affiliation(s)
- Kangway V Chuang
  - Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
- Laura M Gunsalus
  - Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
- Michael J Keiser
  - Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
12
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022] Open
Abstract
Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally-driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods are facing significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool to drive breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure property/activity relation for material property prediction can be established more accurately and efficiently. In addition, materials design can be revolutionized and accelerated much faster than ever, through ML-enabled molecular generation and inverse molecular design. In this perspective, we review the recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the relevant challenges to solve in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which serve as a tutorial for researchers who have little prior experience with ML and want to apply ML for various applications.
Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future, to meet the tremendous demand of new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Zhiqiang Shen
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Akshay Iyer
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Umar Farooq Ghumman
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Shan Tang
  - State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China
- Jinbo Bi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
- Wei Chen
  - Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Ying Li
  - Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
  - Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
13
Kwon Y, Yoo J, Choi YS, Son WJ, Lee D, Kang S. Efficient learning of non-autoregressive graph variational autoencoders for molecular graph generation. J Cheminform 2019; 11:70. [PMID: 33430985 PMCID: PMC6873411 DOI: 10.1186/s13321-019-0396-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/29/2019] [Accepted: 11/13/2019] [Indexed: 11/10/2022] Open
Abstract
With the advancements in deep learning, deep generative models combined with graph neural networks have been successfully employed for data-driven molecular graph generation. Early methods based on the non-autoregressive approach have been effective in generating molecular graphs quickly and efficiently but have suffered from low performance. In this paper, we present an improved learning method involving a graph variational autoencoder for efficient molecular graph generation in a non-autoregressive manner. We introduce three additional learning objectives and incorporate them into the training of the model: approximate graph matching, reinforcement learning, and auxiliary property prediction. We demonstrate the effectiveness of the proposed method by evaluating it for molecular graph generation tasks using QM9 and ZINC datasets. The model generates molecular graphs with high chemical validity and diversity compared with existing non-autoregressive methods. It can also conditionally generate molecular graphs satisfying various target conditions.
Affiliation(s)
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
  - Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, Republic of Korea
- Jiho Yoo
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Won-Joon Son
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Dongseon Lee
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Seokho Kang
  - Department of Systems Management Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, Republic of Korea
14
Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov Today 2019; 24:2017-2032. [DOI: 10.1016/j.drudis.2019.07.006] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Received: 04/06/2019] [Revised: 06/11/2019] [Accepted: 07/18/2019] [Indexed: 12/27/2022]
15
Neural networks in drug discovery: current insights from medicinal chemists. Future Med Chem 2019; 11:1669-1672. [DOI: 10.4155/fmc-2019-0118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022] Open
16
Abstract
Quantification of noncovalent interactions is key to understanding binding mechanisms and biological systems, to the design of drugs and their delivery, and to the design of receptors for separations, sensors, actuators, or smart materials.