1
Ahmad F, Muhmood T. Clinical translation of nanomedicine with integrated digital medicine and machine learning interventions. Colloids Surf B Biointerfaces 2024; 241:114041. [PMID: 38897022 DOI: 10.1016/j.colsurfb.2024.114041] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
Nanomaterial-based therapeutics are transforming disease prevention, diagnosis, and treatment as nanotechnology grows more sophisticated at a breakneck pace, but very few reach the clinic because of inconsistencies in preclinical studies followed by regulatory hindrances. To tackle this, integrating nanomedicine discovery with digital medicine provides technologies that serve as tools for measuring specific biological activity. On-site data acquisition and analytics, achieved by integrating intelligent sensors with artificial intelligence (AI) or machine learning (ML), can thereby overcome redundancies in nanomedicine discovery. Integrated AI/ML wearable sensors gather clinically relevant biochemical information directly from the subject's body and process the data so that physicians can make the right clinical decision(s) in a time- and cost-effective way. This review summarizes insights and recommends the infusion of actionable, big-data-computation-enabled sensors into the burgeoning field of nanomedicine in academia, research institutes, and the pharmaceutical industry, with the potential for clinical translation. Furthermore, many blind spots are present in modern clinically relevant computation; one that could prevent ML-guided, low-cost new nanomedicine development from being successfully translated into the clinic is also discussed.
Affiliation(s)
- Farooq Ahmad
- State Key Laboratory of Chemistry and Utilization of Carbon Based Energy Resources, College of Chemistry, Xinjiang University, Urumqi 830017, China.
- Tahir Muhmood
- International Iberian Nanotechnology Laboratory (INL), Avenida Mestre José Veiga, Braga 4715-330, Portugal.
2
Saifi I, Bhat BA, Hamdani SS, Bhat UY, Lobato-Tapia CA, Mir MA, Dar TUH, Ganie SA. Artificial intelligence and cheminformatics tools: a contribution to the drug development and chemical science. J Biomol Struct Dyn 2024; 42:6523-6541. [PMID: 37434311 DOI: 10.1080/07391102.2023.2234039] [Received: 02/12/2023] [Accepted: 07/03/2023] [Indexed: 07/13/2023]
Abstract
In the ever-evolving field of drug discovery, the integration of Artificial Intelligence (AI) and Machine Learning (ML) with cheminformatics has proven to be a powerful combination. Cheminformatics, which combines the principles of computer science and chemistry, is used to extract chemical information and search compound databases, while the application of AI and ML allows for the identification of potential hit compounds, optimization of synthesis routes, and prediction of drug efficacy and toxicity. This collaborative approach has led to the discovery, preclinical evaluations and approval of over 70 drugs in recent years. To aid researchers in the pursuit of new drugs, this article presents a comprehensive list of databases, datasets, predictive and generative models, scoring functions and web platforms that have been launched between 2021 and 2022. These resources provide a wealth of information and tools for computer-assisted drug development, and are a valuable asset for those working in the field of cheminformatics. Overall, the integration of AI, ML and cheminformatics has greatly advanced the drug discovery process and continues to hold great potential for the future. As new resources and technologies become available, we can expect to see even more groundbreaking discoveries and advancements in these fields. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ifra Saifi
- Chaudhary Charan Singh University, Meerut, Uttar Pradesh, India
- Basharat Ahmad Bhat
- Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Syed Suhail Hamdani
- Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Umar Yousuf Bhat
- Department of Zoology, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Mushtaq Ahmad Mir
- Department of Clinical Laboratory Sciences, College of Applied Medical Science, King Khalid University, KSA, Saudi Arabia
- Tanvir Ul Hasan Dar
- Department of Biotechnology, School of Biosciences and Biotechnology, BGSB University, Rajouri, India
- Showkat Ahmad Ganie
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
3
Ayala-Orozco C, Teimouri H, Medvedeva A, Li B, Lathem A, Li G, Kolomeisky AB, Tour JM. Chemoinformatics Insights on Molecular Jackhammers and Cancer Cells. J Chem Inf Model 2024; 64:5570-5579. [PMID: 38958581 DOI: 10.1021/acs.jcim.4c00806] [Indexed: 07/04/2024]
Abstract
One of the most challenging tasks in modern medicine is to find novel efficient cancer therapeutic methods with minimal side effects. The recent discovery of several classes of organic molecules known as "molecular jackhammers" is a promising development in this direction. It is known that these molecules can directly target and eliminate cancer cells with no impact on healthy tissues. However, the underlying microscopic picture remains poorly understood. We present a study that utilizes theoretical analysis together with experimental measurements to clarify the microscopic aspects of jackhammers' anticancer activities. Our physical-chemical approach combines statistical analysis with chemoinformatics methods to design and optimize molecular jackhammers. By correlating specific physical-chemical properties of these molecules with their abilities to kill cancer cells, several important structural features are identified and discussed. Although our theoretical analysis enhances understanding of the molecular interactions of jackhammers, it also highlights the need for further research to comprehensively elucidate their mechanisms and to develop a robust physical-chemical framework for the rational design of targeted anticancer drugs.
Affiliation(s)
- Hamid Teimouri
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Angela Medvedeva
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Bowen Li
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Alex Lathem
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Gang Li
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
- James M Tour
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Department of Materials Science and NanoEngineering, Rice University, Houston, Texas 77005, United States
- Smalley-Curl Institute, Rice University, Houston, Texas 77005, United States
- Rice Advanced Materials Institute, Rice University, Houston, Texas 77005, United States
4
Yang L, Guo Q, Zhang L. AI-assisted chemistry research: a comprehensive analysis of evolutionary paths and hotspots through knowledge graphs. Chem Commun (Camb) 2024; 60:6977-6987. [PMID: 38910536 DOI: 10.1039/d4cc01892c] [Indexed: 06/25/2024]
Abstract
Artificial intelligence (AI) offers transformative potential for chemical research through its ability to optimize reactions and processes, enhance energy efficiency, and reduce waste. AI-assisted chemical research (AI + chem) has become a global hotspot. To better understand the current research status of "AI + chem", this study conducted a bibliometric investigation using CiteSpace. The Web of Science Core Collection was used to retrieve original articles related to "AI + chem" published from 2000 to 2024. The obtained data allowed for the visualization of the knowledge background, current research status, and latest knowledge structure of "AI + chem". "AI + chem" has entered a stage of explosive growth, and the number of papers is expected to maintain long-term high-speed growth. This article systematically analyzes the latest progress in "AI + chem" and predicts future trends, including molecular design, reaction prediction, materials design, drug design, and quantum chemistry. The outcomes of this study will provide readers with a comprehensive understanding of the overall landscape of "AI + chem".
Affiliation(s)
- Lin Yang
- School of Intellectual Property, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China
- Qingle Guo
- School of Intellectual Property, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China
- Lijing Zhang
- School of Chemistry, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China.
5
Wang QQ, Song J, Wei D. Origin of Chemoselectivity of Halohydrin Dehalogenase-Catalyzed Epoxide Ring-Opening Reactions. J Chem Inf Model 2024; 64:4530-4541. [PMID: 38808649 DOI: 10.1021/acs.jcim.4c00640] [Indexed: 05/30/2024]
Abstract
By performing molecular dynamics (MD), quantum mechanical/molecular mechanical (QM/MM) calculations, and QM cluster calculations, the origin of chemoselectivity of halohydrin dehalogenase (HHDH)-catalyzed ring-opening reactions of epoxide with the nucleophilic reagent NO2- has been explored. Four possible chemoselective pathways were considered, and the computed results indicate that the pathway associated with the nucleophilic attack on the Cα position of epoxide by NO2- is most energetically favorable and has an energy barrier of 12.9 kcal/mol, which is close to 14.1 kcal/mol derived from experimental kinetic data. A hydrogen bonding network formed by residues Ser140, Tyr153, and Arg157 can strengthen the electrophilicity of the active site of the epoxide substrate to affect chemoselectivity. To predict the energy barrier trends of the chemoselective transition states, multiple analyses including distortion analysis and electrophilic Parr function (Pk+) analysis were carried out with or without an enzyme environment. The obtained insights should be valuable for the rational design of enzyme-catalyzed and biomimetic organocatalytic epoxide ring-opening reactions with special chemoselectivity.
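To put the two barriers quoted in the abstract (12.9 kcal/mol computed, 14.1 kcal/mol from kinetic data) on a common footing, the sketch below converts between a free-energy barrier and a rate constant with the Eyring equation. This is an illustration only: the abstract does not say how the experimental barrier was derived or at what temperature, so the use of the Eyring equation, a transmission coefficient of 1, and T = 298.15 K are all assumptions, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K)


def rate_from_barrier(dg_kcal, temp_k=298.15):
    """Eyring rate constant (1/s) for a barrier in kcal/mol.

    Assumes a transmission coefficient of 1 (not stated in the abstract).
    """
    return (K_B * temp_k / H) * math.exp(-dg_kcal / (R_KCAL * temp_k))


def barrier_from_rate(k_per_s, temp_k=298.15):
    """Invert the Eyring equation: barrier (kcal/mol) from a rate constant (1/s)."""
    return R_KCAL * temp_k * math.log(K_B * temp_k / (H * k_per_s))


if __name__ == "__main__":
    computed, experimental = 12.9, 14.1  # kcal/mol, values from the abstract
    ratio = rate_from_barrier(computed) / rate_from_barrier(experimental)
    print(f"rate ratio implied by the 1.2 kcal/mol gap: {ratio:.1f}")
```

Under these assumptions, the 1.2 kcal/mol gap between the two barriers corresponds to a rate difference of roughly a factor of 7 to 8 at room temperature, which gives a sense of what "close to" means here.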
Affiliation(s)
- Qian-Qian Wang
- College of Chemistry, Zhengzhou University, 100 Science Avenue, Zhengzhou 450001, Henan, P. R. China
- Jinshuai Song
- College of Chemistry, Zhengzhou University, 100 Science Avenue, Zhengzhou 450001, Henan, P. R. China
- Donghui Wei
- College of Chemistry, Zhengzhou University, 100 Science Avenue, Zhengzhou 450001, Henan, P. R. China
6
Cao W, Wu N, Zhang S, Qi Y, Guo R, Wang Z, Qu R. Photodegradation of polychlorinated biphenyls in water/nitrogen-doped silica and air/nitrogen-doped silica systems: Kinetics, mechanism and quantitative structure activity relationship (QSAR) analysis. Sci Total Environ 2024; 924:171586. [PMID: 38461975 DOI: 10.1016/j.scitotenv.2024.171586] [Received: 02/01/2024] [Revised: 03/05/2024] [Accepted: 03/07/2024] [Indexed: 03/12/2024]
Abstract
Developing efficient and low-cost photocatalytic materials is essential for removing polychlorinated biphenyls (PCBs). In this work, the photodegradation of fourteen representative PCBs in both water/nitrogen-doped SiO2 (N-SiO2) and air/N-SiO2 systems was studied. The photodegradation kinetics of the PCBs are consistent with the pseudo-first-order kinetic equation. The variation in the degradation of different PCBs in the two systems is primarily related to the position of the Cl substituent and the effective absorption wavelength range of the PCBs. A total of fourteen intermediates for 4,4'-dichlorobiphenyl (PCB-15), 2,2',4,4',6,6'-hexachlorobiphenyl (PCB-155), and 2,2',3,3',4,4',5,5',6,6'-decachlorobiphenyl (PCB-209), generated from four reaction pathways, were identified based on both mass spectrometry analysis and theoretical calculations. Using the values of ln k (k denotes the pseudo-first-order kinetic constant) for the 11 PCBs in the training set and the calculated molecular and structural parameters, quantitative structure-activity relationship (QSAR) models for the two systems were constructed by multiple linear regression (MLR) to better understand the factors affecting the photodegradation rate of PCBs. The QSAR equations were obtained with Cl atom substitution at position 3 (N3) as the main parameter: ln k = -1.98 - 0.19 N3 for the water/N-SiO2 system and ln k = -1.56 - 0.34 N3 for the air/N-SiO2 system, with correlation coefficients (R²) of 0.66 and 0.73 and leave-one-out cross-validation coefficients (Q²LOO) of 0.51 and 0.59, respectively, and bootstrapping validation coefficients (Q²BOOT) of 0.74 for both, confirming that the models were well fitted and showed high robustness and prediction ability. This study provides valuable insights into photocatalytic degradation studies of PCBs.
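The two QSAR equations and the pseudo-first-order rate law quoted in the abstract can be evaluated directly. The following is a minimal sketch, not the authors' code: the coefficients are copied from the abstract, while the function names, the treatment of N3 as a simple count of Cl atoms at position 3, and the time units are assumptions (the abstract does not give the units of k).

```python
import math

# (intercept, slope) for ln k = intercept + slope * N3, as quoted in the abstract.
QSAR_MODELS = {
    "water/N-SiO2": (-1.98, -0.19),
    "air/N-SiO2": (-1.56, -0.34),
}


def predict_ln_k(system, n3):
    """Predicted ln k for a PCB with n3 Cl atoms at position 3 (assumed to be a count)."""
    intercept, slope = QSAR_MODELS[system]
    return intercept + slope * n3


def predict_k(system, n3):
    """Predicted pseudo-first-order rate constant (units as in the original study)."""
    return math.exp(predict_ln_k(system, n3))


def fraction_remaining(k, t):
    """Pseudo-first-order decay: C(t) / C0 = exp(-k * t), with t in the inverse units of k."""
    return math.exp(-k * t)


if __name__ == "__main__":
    for system in QSAR_MODELS:
        for n3 in (0, 1, 2):
            print(system, n3, round(predict_k(system, n3), 4))
```

Because both slopes are negative, each additional Cl at position 3 lowers the predicted rate constant, and the effect is stronger in the air/N-SiO2 system.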
Affiliation(s)
- Wenqian Cao
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Nannan Wu
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Institute of Agro-product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, PR China
- Shengnan Zhang
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Yumeng Qi
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Ruixue Guo
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Zunyao Wang
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China
- Ruijuan Qu
- State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Jiangsu, Nanjing 210023, PR China.
7
Duda J, Podlewska S. Prediction of probability distributions of molecular properties: towards more efficient virtual screening and better understanding of compound representations. Mol Divers 2024; 28:437-448. [PMID: 36586082 DOI: 10.1007/s11030-022-10589-0] [Received: 10/14/2022] [Accepted: 12/18/2022] [Indexed: 01/01/2023]
Abstract
Various in silico approaches to predicting the activity and properties of chemical compounds now form the basis of computer-aided drug design. While there is a general focus on predicting values, it is mathematically more appropriate to predict probability distributions, which offers additional possibilities, such as the evaluation of uncertainty, higher moments, and quantiles. In this study, we applied the Hierarchical Correlation Reconstruction approach to assess several ADMET properties of chemical compounds. It uses multiple linear regression to independently assess multiple moments, which are then combined into a predicted probability distribution. The method enables inexpensive selection of compounds with properties nearly certain to fall into a particular range during virtual screening and automatic rejection of predictions characterized by a high level of uncertainty; however, unlike currently used virtual screening methods, it focuses on predicting the property distribution, not its actual value. Moreover, the presented protocol enables detection of structural features that should be carefully considered when optimizing compounds towards a particular property, and it provides a deeper understanding of the examined compound representations.
Affiliation(s)
- Jarosław Duda
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
- Sabina Podlewska
- Department of Medicinal Chemistry, Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna Street 12, 31-343, Kraków, Poland.
8
Humayun F, Khan F, Khan A, Alshammari A, Ji J, Farhan A, Fawad N, Alam W, Ali A, Wei DQ. De novo generation of dual-target ligands for the treatment of SARS-CoV-2 using deep learning, virtual screening, and molecular dynamic simulations. J Biomol Struct Dyn 2024; 42:3019-3029. [PMID: 37449757 DOI: 10.1080/07391102.2023.2234481] [Received: 12/26/2022] [Accepted: 04/30/2023] [Indexed: 07/18/2023]
Abstract
De novo generation of molecules with the necessary features offers a promising opportunity for artificial intelligence, such as deep generative approaches. However, creating novel compounds with biological activity toward two distinct targets remains a very challenging task. In this study, we develop a unique computational framework for the de novo generation of bioactive compounds directed at two predetermined therapeutic targets. This framework is referred to as the dual-target ligand generative network. Our approach uses a stochastic policy, a sequence-based simplified molecular input line entry system (SMILES) generator, to explore chemical space. The high-level workflow is to gather and prepare the training data for both targets' molecules, build a neural network model and train it to generate molecules, create new molecules using generative AI, and then virtually screen the newly validated molecules against the SARS-CoV-2 PLpro and 3CLpro drug targets. Results show that the novel molecules generated have higher binding affinity for both targets than the conventional drug, i.e., remdesivir, used for the treatment of SARS-CoV-2. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Fahad Humayun
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Fatima Khan
- National Institute of Health, Islamabad, Pakistan
- Abbas Khan
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Abdulrahman Alshammari
- Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Jun Ji
- Henan Provincial Engineering and Technology Center of Health Products for Livestock and Poultry, Henan Provincial Engineering and Technology Center of Animal Disease Diagnosis and Integrated Control, Nanyang Normal University, Nanyang, PR China
- Ali Farhan
- Department of Chemistry, Chung Yuan Christian University, Taoyuan, Taiwan
- Nasim Fawad
- Poultry Research Institute, Rawalpindi, Pakistan
- Waheed Alam
- National Institute of Health, Islamabad, Pakistan
- Arif Ali
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Dong-Qing Wei
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Centre for Research in Molecular Modeling, Concordia University, Québec, Canada
9
Velásquez-López Y, Ruiz-Escudero A, Arrasate S, González-Díaz H. Implementation of IFPTML Computational Models in Drug Discovery Against Flaviviridae Family. J Chem Inf Model 2024; 64:1841-1852. [PMID: 38466369 DOI: 10.1021/acs.jcim.3c01796] [Indexed: 03/13/2024]
Abstract
The Flaviviridae family consists of single-stranded positive-sense RNA viruses and contains the genera Flavivirus, Hepacivirus, Pegivirus, and Pestivirus. Currently, there is an outbreak of viral diseases caused by this family affecting millions of people worldwide, leading to significant morbidity and mortality rates. Advances in computational chemistry have greatly facilitated the discovery of novel drugs and treatments for diseases associated with this family. Chemoinformatic techniques, such as the perturbation theory machine learning method, have played a crucial role in developing new approaches based on ML models that can effectively aid drug discovery. The IFPTML models have shown their capability to handle, classify, and process large data sets with high specificity. The results obtained from different models indicate that this methodology is proficient in processing the data, resulting in a reduction of the false positive rate by 4.25%, along with an accuracy of 83% and reliability of 92%. These values suggest that the model can serve as a computational tool in assisting drug discovery efforts and the development of new treatments against Flaviviridae family diseases.
Affiliation(s)
- Yendrek Velásquez-López
- Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU. Apdo. 644. 48080 Bilbao (Spain)
- Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito 170504, (Ecuador)
- Andrea Ruiz-Escudero
- Department of Pharmacology, University of the Basque Country UPV/EHU, 48940 Leioa, (Spain)
- IKERDATA S.L., ZITEK, University of Basque Country UPV/EHU, Rectorate Building, 48940 Leioa, Spain
- Sonia Arrasate
- Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU. Apdo. 644. 48080 Bilbao (Spain)
- Humberto González-Díaz
- Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU. Apdo. 644. 48080 Bilbao (Spain)
- BIOFISIKA, Basque Center for Biophysics CSIC-UPV/EHU, 48940 Bilbao (Spain)
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao (Spain)
10
Takiguchi Y, Nakane D, Akitsu T. The prediction of single-molecule magnet properties via deep learning. IUCrJ 2024; 11:182-189. [PMID: 38299376 PMCID: PMC10916298 DOI: 10.1107/s2052252524000770] [Received: 12/12/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024]
Abstract
This paper uses deep learning to present a proof-of-concept for data-driven chemistry in single-molecule magnets (SMMs). Previous discussions within SMM research have proposed links between molecular structures (crystal structures) and single-molecule magnetic properties; however, these have only interpreted the results. Therefore, this study introduces a data-driven approach to predict the properties of SMM structures using deep learning. The deep-learning model learns the structural features of the SMM molecules by extracting the single-molecule magnetic properties from the 3D coordinates presented in this paper. The model accurately determined whether a molecule was a single-molecule magnet, with an accuracy rate of approximately 70% in predicting the SMM properties. The deep-learning model found SMMs from 20 000 metal complexes extracted from the Cambridge Structural Database. Using deep-learning models for predicting SMM properties and guiding the design of novel molecules is promising.
Affiliation(s)
- Yuji Takiguchi
- Department of Chemistry, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 1628601, Japan
- Daisuke Nakane
- Department of Chemistry, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 1628601, Japan
- Takashiro Akitsu
- Department of Chemistry, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 1628601, Japan
11
Bo W, Duan Y, Zou Y, Ma Z, Yang T, Wang P, Guo T, Fu Z, Wang J, Fan L, Liu J, Wang T, Chen L. Local Scaffold Diversity-Contributed Generator for Discovering Potential NLRP3 Inhibitors. J Chem Inf Model 2024; 64:737-748. [PMID: 38258981 DOI: 10.1021/acs.jcim.3c01818] [Indexed: 01/24/2024]
Abstract
Deep generative models have become crucial tools in de novo drug design. In current models for multiobjective optimization in molecular generation, scaffold diversity is limited when multiple constraints are introduced. To enhance scaffold diversity, we herein propose a local scaffold diversity-contributed generator (LSDC), which can be used to generate diverse lead compounds capable of satisfying multiple constraints. Compared to state-of-the-art methods, molecules generated by LSDC exhibit greater diversity when applied to the generation of inhibitors targeting the NOD-like receptor (NLR) family, pyrin domain-containing protein 3 (NLRP3). We present 12 molecules, some of which feature previously unreported scaffolds, and demonstrate their reasonable docking binding modes. The modification of selected scaffolds and subsequent bioactivity evaluation led to the discovery of two potent NLRP3 inhibitors, A22 and A14, with IC50 values of 38.1 nM and 44.43 nM, respectively. The oral bioavailability of compound A14 is also very high (F = 83.09% in mice). This work contributes to the discovery of novel NLRP3 inhibitors and provides a reference for integrating AI-based generation with wet experiments.
Affiliation(s)
- Weichen Bo
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Yangqin Duan
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Yurong Zou
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Ziyan Ma
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Tao Yang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Peng Wang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Tao Guo
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Zhiyuan Fu
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Linchuan Fan
- College of Automation, Chongqing University, Chongqing 40000, China
- Jie Liu
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Taijin Wang
- Chengdu Zenitar Biomedical Technology Co., Ltd, Chengdu 610041, China
- Lijuan Chen
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Chengdu Zenitar Biomedical Technology Co., Ltd, Chengdu 610041, China
12
Satalkar V, Degaga GD, Li W, Pang YT, McShan AC, Gumbart JC, Mitchell JC, Torres MP. Generative β-hairpin design using a residue-based physicochemical property landscape. Biophys J 2024:S0006-3495(24)00070-5. [PMID: 38297834 DOI: 10.1016/j.bpj.2024.01.029] [Received: 10/23/2023] [Revised: 12/20/2023] [Accepted: 01/25/2024] [Indexed: 02/02/2024]
Abstract
De novo peptide design is a new frontier that has broad application potential in the biological and biomedical fields. Most existing models for de novo peptide design are largely based on sequence homology that can be restricted based on evolutionarily derived protein sequences and lack the physicochemical context essential in protein folding. Generative machine learning for de novo peptide design is a promising way to synthesize theoretical data that are based on, but unique from, the observable universe. In this study, we created and tested a custom peptide generative adversarial network intended to design peptide sequences that can fold into the β-hairpin secondary structure. This deep neural network model is designed to establish a preliminary foundation of the generative approach based on physicochemical and conformational properties of 20 canonical amino acids, for example, hydrophobicity and residue volume, using extant structure-specific sequence data from the PDB. The beta generative adversarial network model robustly distinguishes secondary structures of β hairpin from α helix and intrinsically disordered peptides with an accuracy of up to 96% and generates artificial β-hairpin peptide sequences with minimum sequence identities around 31% and 50% when compared against the current NCBI PDB and nonredundant databases, respectively. These results highlight the potential of generative models specifically anchored by physicochemical and conformational property features of amino acids to expand the sequence-to-structure landscape of proteins beyond evolutionary limits.
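The idea of describing a peptide by per-residue physicochemical properties rather than by sequence identity can be sketched as follows. This is only an illustration of the kind of feature encoding the abstract describes, not the authors' pipeline: the abstract names "hydrophobicity" without specifying a scale, so the Kyte-Doolittle hydropathy index is an assumed stand-in, and the function names and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 canonical amino acids.
# Assumption: the scale actually used in the study is not stated in the abstract.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_hydropathy(sequence):
    """Map a one-letter peptide sequence to a list of per-residue hydropathy values."""
    return [KD_HYDROPATHY[residue] for residue in sequence.upper()]


def mean_hydropathy(sequence):
    """Average hydropathy over the whole peptide."""
    values = encode_hydropathy(sequence)
    return sum(values) / len(values)


if __name__ == "__main__":
    peptide = "RVEWKNGKIYTV"  # hypothetical sequence, for illustration only
    print(encode_hydropathy(peptide))
    print(round(mean_hydropathy(peptide), 3))
```

A generative model of the kind described would consume several such property channels per residue (the abstract also mentions residue volume); a single channel is shown here to keep the sketch short.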
Affiliation(s)
- Vardhan Satalkar
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
| | - Gemechis D Degaga
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
| | - Wei Li
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
| | - Yui Tik Pang
- School of Physics, Georgia Institute of Technology, Atlanta, Georgia
| | - Andrew C McShan
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
| | - James C Gumbart
- School of Physics, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
| | - Julie C Mitchell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee.
| | - Matthew P Torres
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia.
| |
|
13
|
Siramshetty VB, Xu X, Shah P. Artificial Intelligence in ADME Property Prediction. Methods Mol Biol 2024; 2714:307-327. [PMID: 37676606 DOI: 10.1007/978-1-0716-3441-7_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Absorption, distribution, metabolism, and excretion (ADME) are key properties of a small molecule that govern pharmacokinetic profiles and impact its efficacy and safety. Computational methods such as machine learning and artificial intelligence have gained significant interest in both academic and industrial settings to predict pharmacokinetic properties of small molecules. These methods are applied in drug discovery to optimize chemical libraries, prioritize hits from biological screens, and optimize ADME properties of lead molecules. In recent years, the drug discovery community has witnessed the use of a range of neural network architectures such as deep neural networks, recurrent neural networks, graph neural networks, and transformer neural networks, which marked a paradigm shift in computer-aided drug design and development. This chapter discusses recent developments with an emphasis on their application to predict ADME properties.
Affiliation(s)
- Vishal B Siramshetty
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Department of Safety Assessment, Genentech, Inc., South San Francisco, CA, USA
| | - Xin Xu
- National Center for Advancing Translational Sciences, Rockville, MD, USA
| | - Pranav Shah
- National Center for Advancing Translational Sciences, Rockville, MD, USA.
| |
|
14
|
Mwangi J, Kamau PM, Thuku RC, Lai R. Design methods for antimicrobial peptides with improved performance. Zool Res 2023; 44:1095-1114. [PMID: 37914524 PMCID: PMC10802102 DOI: 10.24272/j.issn.2095-8137.2023.246] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/02/2023] [Accepted: 09/20/2023] [Indexed: 11/03/2023] Open
Abstract
The recalcitrance of pathogens to traditional antibiotics has made treating and eradicating bacterial infections more difficult. In this regard, developing new antimicrobial agents to combat antibiotic-resistant strains has become a top priority. Antimicrobial peptides (AMPs), a ubiquitous class of naturally occurring compounds with broad-spectrum antipathogenic activity, hold significant promise as an effective solution to the current antimicrobial resistance (AMR) crisis. Several AMPs have been identified and evaluated for their therapeutic application, with many already in the drug development pipeline. Their distinct properties, such as high target specificity, potency, and ability to bypass microbial resistance mechanisms, make AMPs a promising alternative to traditional antibiotics. Nonetheless, several challenges, such as high toxicity, lability to proteolytic degradation, low stability, poor pharmacokinetics, and high production costs, continue to hamper their clinical applicability. Therefore, recent research has focused on optimizing the properties of AMPs to improve their performance. By understanding the physicochemical properties of AMPs that correspond to their activity, such as amphipathicity, hydrophobicity, structural conformation, amino acid distribution, and composition, researchers can design AMPs with desired and improved performance. In this review, we highlight some of the key strategies used to optimize the performance of AMPs, including rational design and de novo synthesis. We also discuss the growing role of predictive computational tools, utilizing artificial intelligence and machine learning, in the design and synthesis of highly efficacious lead drug candidates.
Affiliation(s)
- James Mwangi
- Key Laboratory of Bioactive Peptides of Yunnan Province, Engineering Laboratory of Peptides of Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, National Resource Centre for Non-Human Primates, Kunming Primate Research Centre, National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Sino-African Joint Research Centre, New Cornerstone Science Institute, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan 650204, China
| | - Peter Muiruri Kamau
- Key Laboratory of Bioactive Peptides of Yunnan Province, Engineering Laboratory of Peptides of Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, National Resource Centre for Non-Human Primates, Kunming Primate Research Centre, National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Sino-African Joint Research Centre, New Cornerstone Science Institute, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan 650204, China
| | - Rebecca Caroline Thuku
- Key Laboratory of Bioactive Peptides of Yunnan Province, Engineering Laboratory of Peptides of Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, National Resource Centre for Non-Human Primates, Kunming Primate Research Centre, National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Sino-African Joint Research Centre, New Cornerstone Science Institute, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan 650204, China
| | - Ren Lai
- Key Laboratory of Bioactive Peptides of Yunnan Province, Engineering Laboratory of Peptides of Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, National Resource Centre for Non-Human Primates, Kunming Primate Research Centre, National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Sino-African Joint Research Centre, New Cornerstone Science Institute, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
- Centre for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, Guangdong 511458, China.
| |
|
15
|
Moreira-Filho JT, Neves BJ, Cajas RA, Moraes JD, Andrade CH. Artificial intelligence-guided approach for efficient virtual screening of hits against Schistosoma mansoni. Future Med Chem 2023; 15:2033-2050. [PMID: 37937522 DOI: 10.4155/fmc-2023-0152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 10/06/2023] [Indexed: 11/09/2023] Open
Abstract
Background: The impact of schistosomiasis, which affects over 230 million people, emphasizes the urgency of developing new antischistosomal drugs. Artificial intelligence is vital in accelerating the drug discovery process. Methodology & results: We developed classification and regression machine learning models to predict the schistosomicidal activity of compounds not experimentally tested. The prioritized compounds were tested on schistosomula and adult stages of Schistosoma mansoni. Four compounds demonstrated significant activity against schistosomula, with 50% effective concentration values ranging from 9.8 to 32.5 μM, while exhibiting no toxicity in animal and human cell lines. Conclusion: These findings represent a significant step forward in the discovery of antischistosomal drugs. Further optimization of these active compounds can pave the way for their progression into preclinical studies.
Affiliation(s)
- José Teófilo Moreira-Filho
- Laboratory of Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, 74605-170, Brazil
| | - Bruno Junior Neves
- Laboratory of Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, 74605-170, Brazil
| | - Rayssa Araujo Cajas
- Research Center on Neglected Diseases (NPDN), Universidade Guarulhos, Guarulhos, 07023-070, Brazil
| | - Josué de Moraes
- Research Center on Neglected Diseases (NPDN), Universidade Guarulhos, Guarulhos, 07023-070, Brazil
| | - Carolina Horta Andrade
- Laboratory of Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, 74605-170, Brazil
- Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirao Preto, University of São Paulo, Ribeirão Preto, SP, Brazil
| |
|
16
|
Riedl M, Mukherjee S, Gauthier M. Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma. Mol Pharm 2023; 20:4984-4993. [PMID: 37656906 DOI: 10.1021/acs.molpharmaceut.3c00129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Chemical-specific parameters are either measured in vitro or estimated using quantitative structure-activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a set of descriptors or fingerprints, subset selection, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers, which allowed us to circumvent the need for calculation of these chemical descriptors. In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings based upon the relationships between the chemical tokens in the SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability to reuse the pre-trained model for multiple different fine-tuning tasks, reducing the computational burden of developing multiple models for different endpoints. We used our framework to develop a predictive model for fraction unbound in human plasma (fu,p). This approach is flexible, requires minimum domain expertise, and can be generalized for other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity.
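The masked SMILES token pre-training task mentioned in this abstract can be sketched roughly as follows: split a SMILES string into tokens, hide a random subset, and keep the hidden tokens as prediction targets. The tokenizer pattern, the 15% mask rate, and the `[MASK]` symbol are assumptions made for illustration; the abstract does not describe the authors' tokenizer or masking settings.

```python
import random
import re

# Simplified SMILES tokenizer: bracket atoms, two-letter halogens,
# two-digit ring closures, then any single character.
# Assumption: this pattern is illustrative, not the paper's tokenizer.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    return TOKEN_PATTERN.findall(smiles)


def mask_tokens(tokens, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Replace a random subset of tokens with a mask symbol.

    Returns the masked token list and a dict mapping each masked
    position to the original token (the prediction target).
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize(aspirin)
masked, targets = mask_tokens(tokens)
```

During pre-training a model would be asked to recover `targets` from `masked`; fine-tuning then reuses the learned token embeddings for a specific endpoint.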
|
17
|
Saini V. Machine learning prediction of empirical polarity using SMILES encoding of organic solvents. Mol Divers 2023; 27:2331-2343. [PMID: 36334165 DOI: 10.1007/s11030-022-10559-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/26/2022] [Indexed: 11/07/2022]
Abstract
Machine learning based statistical models have played a significant role in increasing the speed and accuracy with which the chemical and physical properties of chemical compounds can be predicted, compared with experimental, traditional ab initio, and quantum mechanical approaches. The transformative impact of these techniques in the chemical sciences has completely changed the way experiments are designed. The last decade has seen the rise of computer-aided molecular design based on machine learning algorithms. The major challenge has been the generation of machine-readable data in the form of descriptors and observations for training the model, which can be time-consuming and computationally expensive if an atomic-coordinate-based molecular encoding approach is used. In this study, we have tried to solve this problem by using the SMILES representation of molecules to generate various topological, physicochemical, electronic and steric descriptors with open-source cheminformatics packages. With the data generated using these packages, we have been able to develop a simple and explainable quantitative structure-property relationship model using an artificial neural network based on 7 numerical descriptors and 1 categorical descriptor for predicting the empirical polarity of a wide diversity of organic solvents. Since polarity represents the various solute-solvent and solvent-solvent interactions taking place in an organic transformation, knowing it beforehand will help a chemist design better experiments. An ANN algorithm based on 8 descriptors was successfully employed to predict the ET(30) values of organic solvents.
Affiliation(s)
- Vaneet Saini
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh, 160014, India.
| |
|
18
|
Khondkaryan L, Tevosyan A, Navasardyan H, Khachatrian H, Tadevosyan G, Apresyan L, Chilingaryan G, Navoyan Z, Stopper H, Babayan N. Datasets Construction and Development of QSAR Models for Predicting Micronucleus In Vitro and In Vivo Assay Outcomes. Toxics 2023; 11:785. [PMID: 37755795 PMCID: PMC10537630 DOI: 10.3390/toxics11090785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/07/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023]
Abstract
In silico (quantitative) structure-activity relationship modeling is an approach that provides a fast and cost-effective alternative to assess the genotoxic potential of chemicals. However, one of the limiting factors for model development is the availability of consolidated experimental datasets. In the present study, we collected experimental data on micronuclei in vitro and in vivo, utilizing databases and conducting a PubMed search, aided by text mining using the BioBERT large language model. Chemotype enrichment analysis on the updated datasets was performed to identify enriched substructures. Additionally, chemotypes common for both endpoints were found. Five machine learning models in combination with molecular descriptors, twelve fingerprints and two data balancing techniques were applied to construct individual models. The best-performing individual models were selected for the ensemble construction. The curated final dataset consists of 981 chemicals for micronuclei in vitro and 1309 for mouse micronuclei in vivo, respectively. Out of 18 chemotypes enriched in micronuclei in vitro, only 7 were found to be relevant for in vivo prediction. The ensemble model exhibited high accuracy and sensitivity when applied to an external test set of in vitro data. A good balanced predictive performance was also achieved for the micronucleus in vivo endpoint.
Affiliation(s)
- Lusine Khondkaryan
- Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia; (L.K.); (G.T.); (L.A.)
- Toxometris.ai, Yerevan 0009, Armenia; (A.T.); (H.N.); (Z.N.)
| | - Ani Tevosyan
- Toxometris.ai, Yerevan 0009, Armenia; (A.T.); (H.N.); (Z.N.)
- YerevaNN, Yerevan 0025, Armenia; (H.K.); (G.C.)
| | | | - Hrant Khachatrian
- YerevaNN, Yerevan 0025, Armenia; (H.K.); (G.C.)
- Department of Informatics and Applied Mathematics, Yerevan State University, Yerevan 0025, Armenia
| | - Gohar Tadevosyan
- Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia; (L.K.); (G.T.); (L.A.)
- Toxometris.ai, Yerevan 0009, Armenia; (A.T.); (H.N.); (Z.N.)
| | - Lilit Apresyan
- Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia; (L.K.); (G.T.); (L.A.)
- Toxometris.ai, Yerevan 0009, Armenia; (A.T.); (H.N.); (Z.N.)
| | | | - Zaven Navoyan
- Toxometris.ai, Yerevan 0009, Armenia; (A.T.); (H.N.); (Z.N.)
| | - Helga Stopper
- Institute of Pharmacology and Toxicology, University of Würzburg, 97078 Würzburg, Germany;
| | - Nelly Babayan
- Institute of Molecular Biology, NAS RA, Yerevan 0014, Armenia; (L.K.); (G.T.); (L.A.)
- Toxometris.ai, Yerevan 0009, Armenia; (A.T.); (H.N.); (Z.N.)
| |
|
19
|
Alnammi M, Liu S, Ericksen SS, Ananiev GE, Voter AF, Guo S, Keck JL, Hoffmann FM, Wildman SA, Gitter A. Evaluating Scalable Supervised Learning for Synthesize-on-Demand Chemical Libraries. J Chem Inf Model 2023; 63:5513-5528. [PMID: 37625010 PMCID: PMC10538940 DOI: 10.1021/acs.jcim.3c00912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Indexed: 08/27/2023]
Abstract
Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries. However, there has been limited experimental validation of these methods in practical applications on large commercially available or synthesize-on-demand chemical libraries. Through a prospective evaluation with the bacterial protein-protein interaction PriA-SSB, we demonstrate that ligand-based virtual screening can identify many active compounds in large commercial libraries. We use cross-validation to compare different types of supervised learning models and select a random forest (RF) classifier as the best model for this target. When predicting the activity of more than 8 million compounds from Aldrich Market Select, the RF substantially outperforms a naïve baseline based on chemical structure similarity. 48% of the RF's 701 selected compounds are active. The RF model easily scales to score one billion compounds from the synthesize-on-demand Enamine REAL database. We tested 68 chemically diverse top predictions from Enamine REAL and observed 31 hits (46%), including one with an IC50 value of 1.3 μM.
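The screening workflow in this abstract (score a very large library, keep the top predictions, then measure the experimental hit rate) can be sketched in a few lines. The compound names and scores below are hypothetical stand-ins for a trained classifier's output; only the 31-of-68 figure comes from the abstract.

```python
import heapq
from typing import Callable, Iterable


def top_k(compounds: Iterable[str], score: Callable[[str], float], k: int) -> list[tuple[float, str]]:
    """Keep the k highest-scoring compounds while streaming through a library.

    heapq.nlargest holds only k items at a time, so the full list of
    scores never needs to sit in memory.
    """
    return heapq.nlargest(k, ((score(c), c) for c in compounds))


def hit_rate(hits: int, tested: int) -> float:
    """Percentage of tested compounds that were experimentally active."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * hits / tested


# Hypothetical scores standing in for predicted activity probabilities.
toy_scores = {"cmpd_A": 0.91, "cmpd_B": 0.12, "cmpd_C": 0.77, "cmpd_D": 0.40}
selected = top_k(toy_scores, toy_scores.__getitem__, k=2)

# Figures reported in the abstract: 31 hits among 68 tested Enamine REAL picks.
enamine_rate = hit_rate(31, 68)
```

Here `enamine_rate` comes out a little under 46%, consistent with the rounded value quoted in the abstract.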
Affiliation(s)
- Moayad Alnammi
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Information and Computer Science, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
| | - Shengchao Liu
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
| | - Spencer S. Ericksen
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Gene E. Ananiev
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Andrew F. Voter
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
| | - Song Guo
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - James L. Keck
- Department of Biomolecular Chemistry, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
| | - F. Michael Hoffmann
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
- McArdle Laboratory for Cancer Research, University of Wisconsin−Madison, Madison, Wisconsin 53705, United States
| | - Scott A. Wildman
- Small Molecule Screening Facility, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| | - Anthony Gitter
- Department of Computer Sciences, University of Wisconsin−Madison, Madison, Wisconsin 53706, United States
- Morgridge Institute for Research, Madison, Wisconsin 53715, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin−Madison, Madison, Wisconsin 53792, United States
| |
|
20
|
Wicks SL, Morgan BS, Wilson AW, Hargrove AE. Probing Bioactive Chemical Space to Discover RNA-Targeted Small Molecules. bioRxiv 2023:2023.07.31.551350. [PMID: 37577658 PMCID: PMC10418101 DOI: 10.1101/2023.07.31.551350] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/15/2023]
Abstract
Small molecules have become increasingly recognized as invaluable tools to study RNA structure and function and to develop RNA-targeted therapeutics. To rationally design RNA-targeting ligands, a comprehensive understanding and explicit testing of small molecule properties that govern molecular recognition is crucial. To date, most studies have primarily evaluated properties of small molecules that bind RNA in vitro, with little to no assessment of properties that are distinct to selective and bioactive RNA-targeted ligands. Therefore, we curated an RNA-focused library, termed the Duke RNA-Targeted Library (DRTL), that was biased towards the physicochemical and structural properties of biologically active and non-ribosomal RNA-targeted small molecules. The DRTL represents one of the largest academic RNA-focused small molecule libraries curated to date with more than 800 small molecules. These ligands were selected using computational approaches that measure similarity to known bioactive RNA ligands and that diversify the molecules within this space. We evaluated DRTL binding in vitro to a panel of four RNAs using two optimized fluorescent indicator displacement assays, and we successfully identified multiple small molecule hits, including several novel scaffolds for RNA. The DRTL has provided and will continue to provide insights into biologically relevant RNA chemical space, such as the identification of additional RNA-privileged scaffolds and validation of RNA-privileged molecular features. Future DRTL screening will focus on expanding both the targets and assays used, and we welcome collaboration from the scientific community. We envision that the DRTL will be a valuable resource for the discovery of RNA-targeted chemical probes and therapeutic leads.
Affiliation(s)
- Sarah L. Wicks
- Department of Chemistry; Duke University; 124 Science Drive; Durham, NC 27708
| | - Brittany S. Morgan
- Department of Chemistry & Biochemistry; University of Notre Dame; 123 McCourtney Hall Notre Dame, IN 46556
| | - Alexander W. Wilson
- Department of Chemistry; Duke University; 124 Science Drive; Durham, NC 27708
| | - Amanda E. Hargrove
- Department of Chemistry; Duke University; 124 Science Drive; Durham, NC 27708
| |
|
21
|
Weiss T, Wahab A, Bronstein AM, Gershoni-Poranne R. Interpretable Deep-Learning Unveils Structure-Property Relationships in Polybenzenoid Hydrocarbons. J Org Chem 2023; 88:9645-9656. [PMID: 36696660 DOI: 10.1021/acs.joc.2c02381] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/27/2023]
Abstract
In this work, interpretable deep learning was used to identify structure-property relationships governing the HOMO-LUMO gap and the relative stability of polybenzenoid hydrocarbons (PBHs) using a ring-based graph representation. This representation was combined with a subunit-based perception of PBHs, allowing chemical insights to be presented in terms of intuitive and simple structural motifs. The resulting insights agree with conventional organic chemistry knowledge and electronic structure-based analyses and also reveal new behaviors and identify influential structural motifs. In particular, we evaluated and compared the effects of linear, angular, and branching motifs on these two molecular properties and explored the role of dispersion in mitigating the torsional strain inherent in nonplanar PBHs. Hence, the observed regularities and the proposed analysis contribute to a deeper understanding of the behavior of PBHs and form the foundation for design strategies for new functional PBHs.
Affiliation(s)
- Tomer Weiss
- Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel
| | - Alexandra Wahab
- Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich 8093, Switzerland
| | - Alex M Bronstein
- Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel
| | - Renana Gershoni-Poranne
- Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa 32000, Israel
| |
|
22
|
Damavandi S, Shiri F, Emamjomeh A, Pirhadi S, Beyzaei H. A study of the interaction space of two lactate dehydrogenase isoforms (LDHA and LDHB) and some of their inhibitors using proteochemometrics modeling. BMC Chem 2023; 17:70. [PMID: 37415191 DOI: 10.1186/s13065-023-00991-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 06/30/2023] [Indexed: 07/08/2023] Open
Abstract
Lactate dehydrogenase (LDH) is a tetrameric enzyme that reversibly converts pyruvate to lactate. This enzyme has become important because it is associated with diseases such as cancers, heart disease, liver problems, and most importantly, corona disease. As a system-based method, proteochemometrics does not require knowledge of the protein's three-dimensional structure, but rather depends on the amino acid sequence and protein descriptors. Here, we applied this methodology to model a set of LDHA and LDHB isoenzyme inhibitors. To implement the proteochemometrics method, the camb package in the R Studio Server programming environment was used. The activity of 312 compounds of LDHA and LDHB isoenzyme inhibitors was retrieved from the BindingDB database. The proteochemometrics method was applied with three machine learning algorithms (gradient boosting machine, random forest, and support vector machine) as regression methods to find the best model. Through the combination of different models into an ensemble (greedy and stacking optimization), we explored the possibility of improving the performance of models. For the RF best ensemble model of inhibitors of LDHA and LDHB isoenzymes, and were 0.66 and 0.62, respectively. LDH inhibitory activity is influenced by Morgan fingerprints and topological structure descriptors.
Affiliation(s)
- Sedigheh Damavandi
- Department of Bioinformatics, Laboratory of Computational Biotechnology and Bioinformatics (CBB Lab), University of Zabol, Zabol, Iran
| | - Fereshteh Shiri
- Department of Chemistry, Faculty of Science, University of Zabol, Zabol, Iran.
| | - Abbasali Emamjomeh
- Department of Bioinformatics, Laboratory of Computational Biotechnology and Bioinformatics (CBB Lab), University of Zabol, Zabol, Iran
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Somayeh Pirhadi
- Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Hamid Beyzaei
- Department of Chemistry, Faculty of Science, University of Zabol, Zabol, Iran
| |
|
23
|
Di Lascio E, Gerebtzoff G, Rodríguez-Pérez R. Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties. Mol Pharm 2023; 20:1758-1767. [PMID: 36745394 DOI: 10.1021/acs.molpharmaceut.2c00962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure-property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and data availability, training sets have become larger and more diverse. The most common training approaches consist of training a model either with a small set of similar compounds, namely, compounds designed for the same drug discovery project or chemical series (local model approach), or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the choice of the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show a consistently superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift on the relative performance of local and global ML models. Training set size and structural diversity did not have an impact on the relative performance of the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. Results presented in this work can be leveraged to improve ML-based ADME property predictions and thus decision-making in drug discovery projects.
Affiliation(s)
- Elena Di Lascio
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
| | - Grégori Gerebtzoff
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
| | | |
|
24
|
Comparative Studies on Resampling Techniques in Machine Learning and Deep Learning Models for Drug-Target Interaction Prediction. Molecules 2023; 28:1663. [PMID: 36838652 PMCID: PMC9964614 DOI: 10.3390/molecules28041663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 02/12/2023] Open
Abstract
The prediction of drug-target interactions (DTIs) is a vital step in drug discovery. The success of machine learning and deep learning methods in accurately predicting DTIs plays a huge role in drug discovery. However, when dealing with learning algorithms, the datasets used are usually highly dimensional and extremely imbalanced. To solve this issue, the dataset must be resampled accordingly. In this paper, we have compared several data resampling techniques to overcome class imbalance in machine learning methods as well as to study the effectiveness of deep learning methods in overcoming class imbalance in DTI prediction in terms of binary classification using ten (10) cancer-related activity classes from BindingDB. It is found that the use of Random Undersampling (RUS) in predicting DTIs severely affects the performance of a model, especially when the dataset is highly imbalanced, thus, rendering RUS unreliable. It is also found that SVM-SMOTE can be used as a go-to resampling method when paired with the Random Forest and Gaussian Naïve Bayes classifiers, whereby a high F1 score is recorded for all activity classes that are severely and moderately imbalanced. Additionally, the deep learning method called Multilayer Perceptron recorded high F1 scores for all activity classes even when no resampling method was applied.
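Illustrative note (not taken from the cited paper): the abstract's caution about Random Undersampling can be made concrete with a minimal pure-Python sketch. The function name `random_undersample` and the toy 95:5 data set are assumptions chosen for illustration; the authors' actual pipeline and data are not reproduced here.

```python
import random
from collections import Counter


def random_undersample(features, labels, seed=0):
    """Randomly drop samples until every class is as small as the rarest class."""
    rng = random.Random(seed)
    # Group the feature rows by their class label.
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    # Every class is cut down to the size of the minority class.
    target = min(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        for x in rng.sample(rows, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy imbalanced set (hypothetical): 95 inactive (0) and 5 active (1) compounds.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5
X_res, y_res = random_undersample(X, y)
print(Counter(y), "->", Counter(y_res))
```

In this toy case 90 of the 95 majority-class samples are discarded, which shows how much training information RUS can throw away when the imbalance is severe.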
|
25
|
Danel T, Łęski J, Podlewska S, Podolak IT. Docking-based generative approaches in the search for new drug candidates. Drug Discov Today 2023; 28:103439. [PMID: 36372330 DOI: 10.1016/j.drudis.2022.103439] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
Abstract
Despite the popularity of virtual screening (VS) of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design (SBDD). In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design (CADD). In addition, we discuss the most promising directions for the further development of generative protocols coupled with docking.
Affiliation(s)
- Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Jan Łęski
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland
- Igor T Podolak
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
|
26
|
Nascimben M, Rimondini L. Molecular Toxicity Virtual Screening Applying a Quantized Computational SNN-Based Framework. Molecules 2023; 28:1342. [PMID: 36771009 PMCID: PMC9919191 DOI: 10.3390/molecules28031342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/27/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023] Open
Abstract
Spiking neural networks are biologically inspired machine learning algorithms attracting researchers' attention for their applicability to alternative energy-efficient hardware other than traditional computers. In the current work, spiking neural networks have been tested in a quantitative structure-activity analysis targeting the toxicity of molecules. Multiple public-domain databases of compounds have been evaluated with spiking neural networks, achieving accuracies compatible with high-quality frameworks presented in the previous literature. The numerical experiments also included an analysis of hyperparameters and tested the spiking neural networks on molecular fingerprints of different lengths. Proposing alternatives to traditional software and hardware for time- and resource-consuming tasks, such as those found in chemoinformatics, may open the door to new research and improvements in the field.
Affiliation(s)
- Mauro Nascimben
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
- Enginsoft SpA, 35129 Padua, Italy
- Correspondence:
- Lia Rimondini
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
|
27
|
Apprato G, D’Agostini G, Rossetti P, Ermondi G, Caron G. In Silico Tools to Extract the Drug Design Information Content of Degradation Data: The Case of PROTACs Targeting the Androgen Receptor. Molecules 2023; 28:1206. [PMID: 36770875 PMCID: PMC9919651 DOI: 10.3390/molecules28031206] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/14/2022] [Revised: 01/19/2023] [Accepted: 01/24/2023] [Indexed: 01/28/2023] Open
Abstract
Proteolysis-Targeting Chimeras (PROTACs) have recently emerged as a promising technology in the drug discovery landscape. Large interest in the degradation of the androgen receptor (AR) as a new anti-prostatic cancer strategy has resulted in several papers focusing on PROTACs against AR. This study explores the potential of a few in silico tools to extract drug design information from AR degradation data in the format often reported in the literature. After setting up a dataset of 92 PROTACs with consistent AR degradation values, we employed the Bemis-Murcko method for their classification. The resulting clusters were not informative in terms of structure-degradation relationship. Subsequently, we performed Degradation Cliff analysis and identified some key aspects conferring a positive contribution to activity, as well as some methodological limits when applying this approach to PROTACs. Linker structure degradation relationships were also investigated. Then, we built and characterized ternary complexes to validate previous results. Finally, we implemented machine learning classification models and showed that AR degradation for VHL-based but not CRBN-based PROTACs can be predicted from simple permeability-related 2D molecular descriptors.
|
28
|
Molecular Property Prediction of Modified Gedunin Using Machine Learning. Molecules 2023; 28:1125. [PMID: 36770791 PMCID: PMC9921289 DOI: 10.3390/molecules28031125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/16/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023] Open
Abstract
Images of molecules are often utilized in education and synthetic exploration to predict molecular characteristics. Deep learning (DL) has also had an influence on drug research, such as the interpretation of cellular images as well as the development of innovative methods for the synthesis of organic molecules. Although research in these areas has been significant, a comprehensive review of DL applications in drug development would be beyond the scope of a single Account. In this study, we will concentrate on a single major area where DL has influenced molecular design: the prediction of molecular properties of modified gedunin using machine learning (ML). AI and ML technologies are critical in drug research and development. In other words, deep learning (DL) algorithms and artificial neural networks (ANN) have changed the field. In short, advances in AI and ML present a good potential for rational drug design and exploration, which will ultimately benefit humanity. In this paper, long short-term memory (LSTM) was used to convert a modified gedunin SMILES string into a molecular image. The 2D molecular representations and their immediately visible highlights should then provide adequate data to predict the subordinate characteristics of atom design. We aim to find the properties of modified gedunin using K-means clustering and RNN-like ML tools. To support this postulation, neural network (NN) clustering based on the AI picture is used and evaluated in this study. The novel chemical developed via deep learning has long been predicted on characteristics. As a result, LSTM with RNNs allows us to predict the properties of molecules of modified gedunin. The total accuracy of the suggested model is 98.68%. The accuracy of the molecular property prediction of modified gedunin research is promising enough to evaluate extrapolation and generalization. The model suggested in this research requires just seconds or minutes to calculate, making it faster as well as more effective than existing techniques. In short, ML can be a useful tool for predicting the properties of modified gedunin molecules.
|
29
|
Dawson DE, Lau C, Pradeep P, Sayre RR, Judson RS, Tornero-Velez R, Wambaugh JF. A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species. Toxics 2023; 11:98. [PMID: 36850973 PMCID: PMC9962572 DOI: 10.3390/toxics11020098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 01/09/2023] [Accepted: 01/18/2023] [Indexed: 06/18/2023]
Abstract
Per- and polyfluoroalkyl substances (PFAS) are a diverse group of man-made chemicals that are commonly found in body tissues. The toxicokinetics of most PFAS are currently uncharacterized, but long half-lives (t½) have been observed in some cases. Knowledge of chemical-specific t½ is necessary for exposure reconstruction and extrapolation from toxicological studies. We used an ensemble machine learning method, random forest, to model the existing in vivo measured t½ across four species (human, monkey, rat, mouse) and eleven PFAS. Mechanistically motivated descriptors were examined, including two types of surrogates for renal transporters: (1) physiological descriptors, including kidney geometry, for renal transporter expression and (2) structural similarity of defluorinated PFAS to endogenous chemicals for transporter affinity. We developed a classification model for t½ (Bin 1: <12 h; Bin 2: <1 week; Bin 3: <2 months; Bin 4: >2 months). The model had an accuracy of 86.1% in contrast to 32.2% for a y-randomized null model. A total of 3890 compounds were within domain of the model, and t½ was predicted using the bin medians: 4.9 h, 2.2 days, 33 days, and 3.3 years. For human t½, 56% of PFAS were classified in Bin 4, 7% were classified in Bin 3, and 37% were classified in Bin 2. This model synthesizes the limited available data to allow tentative extrapolation and prioritization.
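Illustrative note (not taken from the cited paper): the four half-life bins and their medians quoted in the abstract can be written down as a small lookup. This is a sketch under stated assumptions: "2 months" is taken as 60 days, a year as 365 days, values falling exactly on an edge go to the higher bin, and the names `half_life_bin` and `bin_median_hours` are hypothetical. It shows only the binning step, not the random forest model itself.

```python
# Upper edges of Bins 1-3 in hours (Bin 4 is everything above the last edge).
# Assumption: "2 months" is approximated as 60 days.
BIN_EDGES_H = [12, 7 * 24, 60 * 24]

# Bin medians quoted in the abstract (4.9 h, 2.2 days, 33 days, 3.3 years),
# converted to hours. Assumption: one year is 365 days.
BIN_MEDIANS_H = [4.9, 2.2 * 24, 33 * 24, 3.3 * 365 * 24]


def half_life_bin(t_half_hours):
    """Return the bin number (1-4) for a half-life given in hours."""
    for bin_number, upper_edge in enumerate(BIN_EDGES_H, start=1):
        if t_half_hours < upper_edge:
            return bin_number
    return 4


def bin_median_hours(bin_number):
    """Return the point estimate (bin median, in hours) used for a predicted bin."""
    return BIN_MEDIANS_H[bin_number - 1]


# Example: a measured half-life of 30 days falls into Bin 3.
example_bin = half_life_bin(30 * 24)
print(example_bin, bin_median_hours(example_bin))
```

Reporting the bin median rather than a continuous value keeps the prediction as coarse as the classifier that produced it.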
Affiliation(s)
- Daniel E. Dawson
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
- Christopher Lau
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Public Health and Environmental Assessment, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
- Prachi Pradeep
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
- Oak Ridge Institutes for Science and Education, Oak Ridge, TN 37830, USA
- Risa R. Sayre
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
- Richard S. Judson
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
- Rogelio Tornero-Velez
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
- John F. Wambaugh
- U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
|
30
|
A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0. Processes (Basel) 2023. [DOI: 10.3390/pr11020330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023] Open
Abstract
With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its performance in solving particularly complex problems in industrial chemistry and chemical engineering. Therefore, this review provides an overview of the application of AI techniques, in particular machine learning, in chemical design, synthesis, and process optimization over the past years. In this review, the focus is on the application of AI for structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.
|
31
|
Ogawa K, Sakamoto D, Hosoki R. Computer Science Technology in Natural Products Research: A Review of Its Applications and Implications. Chem Pharm Bull (Tokyo) 2023; 71:486-494. [PMID: 37394596 DOI: 10.1248/cpb.c23-00039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Computational approaches to drug development are rapidly growing in popularity and have been used to produce significant results. Recent developments in information science have expanded databases and chemical informatics knowledge relating to natural products. Natural products have long been well-studied, and a large number of unique structures and remarkable active substances have been reported. Analyzing accumulated natural product knowledge using emerging computational science techniques is expected to yield more new discoveries. In this article, we discuss the current state of natural product research using machine learning. The basic concepts and frameworks of machine learning are summarized. Natural product research that utilizes machine learning is described in terms of the exploration of active compounds, automatic compound design, and application to spectral data. In addition, efforts to develop drugs for intractable diseases will be addressed. Lastly, we discuss key considerations for applying machine learning in this field. This paper aims to promote progress in natural product research by presenting the current state of computational science and chemoinformatics approaches in terms of its applications, strengths, limitations, and implications for the field.
Affiliation(s)
- Keiko Ogawa
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
- Daiki Sakamoto
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
- Rumiko Hosoki
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
|
32
|
Overview of cocaine identification by vibrational spectroscopy and chemometrics. Forensic Sci Int 2023; 342:111540. [PMID: 36565684 DOI: 10.1016/j.forsciint.2022.111540] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/04/2022] [Revised: 11/29/2022] [Accepted: 12/10/2022] [Indexed: 12/23/2022]
Abstract
The use of non-destructive forensic methods for cocaine identification is of outstanding importance, given the amount of samples seized. Techniques such as ATR-FTIR, Raman, and NIR spectroscopy have become alternatives to circumvent this problem, as they allow fast, cheap analysis, and enable the reanalysis of samples. When combined with chemometrics, these spectroscopic methods can be used to determine and quantify cocaine samples, meaning that the limitations of existing techniques can be overcome. This review article covers spectroscopic techniques for identifying cocaine in different forms and matrices, such as food and textiles, which are materials used for smuggling. The chemometric identification of cocaine in oral fluid and water is also discussed. In addition, vibrational spectroscopy techniques using portable equipment are described. This work seeks to evaluate the main chemometric applications of spectroscopic data and to find new perspectives on the identification of cocaine using chemometrics.
|
33
|
Kondinski A, Bai J, Mosbach S, Akroyd J, Kraft M. Knowledge Engineering in Chemistry: From Expert Systems to Agents of Creation. Acc Chem Res 2022; 56:128-139. [PMID: 36516456 PMCID: PMC9850921 DOI: 10.1021/acs.accounts.2c00617] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Passing knowledge from human to human is a natural process that has continued since the beginning of humankind. Over the past few decades, we have witnessed that knowledge is no longer passed only between humans but also from humans to machines. The latter form of knowledge transfer represents a cornerstone in artificial intelligence (AI) and lays the foundation for knowledge engineering (KE). In order to pass knowledge to machines, humans need to structure, formalize, and make knowledge machine-readable. Subsequently, humans also need to develop software that emulates their decision-making process. In order to engineer chemical knowledge, chemists are often required to challenge their understanding of chemistry and thinking processes, which may help improve the structure of chemical knowledge. Knowledge engineering in chemistry dates from the development of expert systems that emulated the thinking process of analytical and organic chemists. Since then, many different expert systems employing rather limited knowledge bases have been developed, solving problems in retrosynthesis, analytical chemistry, chemical risk assessment, etc. However, toward the end of the 20th century, the AI winters slowed down the development of expert systems for chemistry. At the same time, the increasing complexity of chemical research, alongside the limitations of the available computing tools, made it difficult for many chemistry expert systems to keep pace. In the past two decades, the semantic web, the popularization of object-oriented programming, and the increase in computational power have revitalized knowledge engineering. Knowledge formalization through ontologies has become commonplace, triggering the subsequent development of knowledge graphs and cognitive software agents. These tools enable the possibility of interoperability, enabling the representation of more complex systems, inference capabilities, and the synthesis of new knowledge. This Account introduces the history, the core principles of KE, and its applications within the broad realm of chemical research and engineering. In this regard, we first discuss how chemical knowledge is formalized and how a chemist's cognition can be emulated with the help of reasoning algorithms. Following this, we discuss various applications of knowledge graph and agent technology used to solve problems in chemistry related to molecular engineering, chemical mechanisms, multiscale modeling, automation of calculations and experiments, and chemist-machine interactions. These developments are discussed in the context of a universal and dynamic knowledge ecosystem, referred to as The World Avatar (TWA).
Affiliation(s)
- Aleksandar Kondinski
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Jiaru Bai
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Sebastian Mosbach
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, 138602 Singapore
- Jethro Akroyd
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
- Markus Kraft
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, 138602 Singapore
- School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459 Singapore
- E-mail:
|
34
|
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Daniyar Mazitov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Fanny Bonachera
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Igor Baskin
- Department of Material Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
- Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
|
35
|
Lin K, Peng J, Xu C, Gu FL, Lan Z. Automatic Evolution of Machine-Learning-Based Quantum Dynamics with Uncertainty Analysis. J Chem Theory Comput 2022; 18:5837-5855. [DOI: 10.1021/acs.jctc.2c00702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Kunni Lin
- School of Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- MOE Key Laboratory of Environmental Theoretical Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- Jiawei Peng
- School of Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- MOE Key Laboratory of Environmental Theoretical Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- Chao Xu
- MOE Key Laboratory of Environmental Theoretical Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Feng Long Gu
- MOE Key Laboratory of Environmental Theoretical Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Zhenggang Lan
- MOE Key Laboratory of Environmental Theoretical Chemistry, South China Normal University, Guangzhou 510006, P. R. China
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
|
36
|
Sang L, Wang Y, Zong C, Wang P, Zhang H, Guo D, Yuan B, Pan Y. Machine Learning for Evaluating the Cytotoxicity of Mixtures of Nano-TiO2 and Heavy Metals: QSAR Model Apply Random Forest Algorithm after Clustering Analysis. Molecules 2022; 27:6125. [PMID: 36144857 PMCID: PMC9500633 DOI: 10.3390/molecules27186125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 09/09/2022] [Accepted: 09/13/2022] [Indexed: 11/24/2022] Open
Abstract
With the development and application of nanomaterials, their impact on the environment and organisms has attracted attention. As a common nanomaterial, nano-titanium dioxide (nano-TiO2) has adsorption properties to heavy metals in the environment. Quantitative structure-activity relationship (QSAR) is often used to predict the cytotoxicity of a single substance. However, there is little research on the toxicity of interaction between nanomaterials and other substances. In this study, we exposed human renal cortex proximal tubule epithelial (HK-2) cells to mixtures of eight heavy metals with nano-TiO2, measured absorbance values by CCK-8, and calculated cell viability. PLS and two ensemble learning algorithms are used to build multiple QSAR models for data sets, and the test set R2 is increased from 0.38 to 0.78 and 0.85, and RMSE is decreased from 0.18 to 0.12 and 0.10. After selecting the better random forest algorithm, the K-means clustering algorithm is used to continue to optimize the model, increasing the test set R2 to 0.95 and decreasing the RMSE to 0.08 and 0.06. As a reliable machine algorithm, random forest can be used to predict the toxicity of the mixture of nano-metal oxides and heavy metals. The cluster analysis can effectively improve the stability and predictability of the model, and provide a new idea for the prediction of cytotoxicity model in the future.
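Illustrative note (not taken from the cited paper): the abstract compares models by test-set R2 and RMSE. A minimal sketch of the two metrics follows, assuming the usual definitions (R2 as one minus the ratio of residual to total sum of squares; RMSE as the square root of the mean squared error). The toy viability values are hypothetical and do not come from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical cell-viability fractions: measured vs. model-predicted.
measured = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(round(rmse(measured, predicted), 3), round(r_squared(measured, predicted), 3))
```

Because RMSE is expressed in the units of the response (here a viability fraction between 0 and 1), it is best read alongside R2 rather than on its own.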
Affiliation(s)
- Leqi Sang
- College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Yunlin Wang
- College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Cheng Zong
- College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Pengfei Wang
- College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Huazhong Zhang
- Department of Emergency Medicine, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210006, China
- Dan Guo
- Department of Preventive Health Branch, The Affiliated Jiangning Hospital of Nanjing Medical University, Nanjing 211100, China
- Beilei Yuan
- College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Correspondence: (B.Y.); (Y.P.); Tel.: +86-25-5813-9553 (B.Y.)
- Yong Pan
- College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Correspondence: (B.Y.); (Y.P.); Tel.: +86-25-5813-9553 (B.Y.)
|
37
|
Rajkumar M, Bhukya SN, Ahalya N, Elumalai G, Sivanandam K, Almutairi KMA, Alonazi WB, Soma SR, Urugo MM. Impact of ANN in Revealing of Viral Peptides. BioMed Research International 2022; 2022:7760734. [PMID: 35978632 PMCID: PMC9377878 DOI: 10.1155/2022/7760734] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/11/2022] [Revised: 06/13/2022] [Accepted: 06/22/2022] [Indexed: 11/18/2022]
Abstract
All organisms contain antimicrobial peptides (AMPs), which are a critical component of the innate immune system. These chemicals have the ability to suppress the growth of a variety of fungi, bacteria, and viruses. Because AMPs interact with structural components of the microbial cell membrane and have a wide range of cellular targets, bacteria are unlikely to be able to develop resistance to them in the short term. The underlying structure of AMPs is critical in determining the selectivity with which they target their respective targets. As far as we know, peptides have not been tested in a lab to see if they can fight bacteria, fungus, and viruses in real life. In this paper, we develop an artificial neural network (ANN) using a back propagation neural network (BPNN) that enables optimal classification of tendency of a peptide sequence that involves the activities of antifungal, antibacterial, or antiviral. The BPNN is trained on the datasets collected across different repositories and then the overfitting is avoided using particle swarm optimization (PSO) algorithm. Hence, at the time of testing, the BPNN clearly finds the predicted samples belonging to the same classes and this avoids the problem of finding the false positives. The simulation is conducted to test the efficacy of the model against various metrics that includes accuracy, precision, recall, and f1-measure. The effectiveness of the BPNN-PSO model in classifying instances at a faster rate than other techniques is demonstrated by its performance. The principle is straightforward, it is not difficult to programme, it converges more quickly, and it generally offers a superior solution.
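Illustrative note (not taken from the cited paper): the abstract evaluates its classifier by accuracy, precision, recall, and F1-measure. The sketch below computes these for a single positive class, which is a one-vs-rest simplification of the three-class (antifungal, antibacterial, antiviral) setting described. The function name `classification_metrics` and the toy labels are assumptions for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = antiviral peptide, 0 = any other class.
true_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(true_labels, predicted_labels)
print(metrics)
```

Reporting precision alongside recall matters here because a false positive (a peptide wrongly flagged as active) and a false negative carry different costs in screening.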
Affiliation(s)
- M. Rajkumar
- Department of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai, Tamil Nadu, India
| | - Shankar Nayak Bhukya
- Department of Computer Science and Engineering (Data Science), CMR Technical Campus, Hyderabad, Telangana 501401, India
| | - N. Ahalya
- Department of Biotechnology, MS Ramaiah Institute Technology, Bengaluru, Karnataka 560054, India
| | - G. Elumalai
- Department of Electronics and Communication Engineering, Panimalar Engineering College, Chennai, Tamil Nadu 600123, India
| | - K. Sivanandam
- Department of Electronics and Communication Engineering, M.Kumarasamy College of Engineering, Karur, Tamil Nadu 639113, India
| | - Khalid M. A. Almutairi
- Department of Community Health Sciences, College of Applied Medical Sciences, King Saud University, P. O. Box: 10219, Riyadh 11433, Saudi Arabia
| | - Wadi B. Alonazi
- Health Administration Department, College of Business Administration, King Saud University, PO Box: 71115, Riyadh 11587, Saudi Arabia
| | - S. R. Soma
- Department of Biology, University of Tennessee Health Science Center, Memphis, USA
| | - Markos Makiso Urugo
- Department of Food Science and Postharvest Technology, College of Agricultural Sciences, Wachamo University, Hosaena, Ethiopia
| |
|
38
|
Conti S, Ovchinnikov V, Karplus M. ppdx: Automated modeling of protein-protein interaction descriptors for use with machine learning. J Comput Chem 2022; 43:1747-1757. [PMID: 35930347 DOI: 10.1002/jcc.26974] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/07/2022] [Revised: 07/01/2022] [Accepted: 07/13/2022] [Indexed: 11/07/2022]
Abstract
This paper describes ppdx, a Python workflow tool that combines protein sequence alignment, homology modeling, and structural refinement to compute a broad array of descriptors for characterizing protein-protein interactions. The descriptors can be used to predict various properties of interest, such as protein-protein binding affinities or inhibitory concentrations (IC50), using approaches that range from simple regression to more complex machine learning models. The software is highly modular. It supports different protocols for generating structures, and 95 descriptors can currently be computed. More protocols and descriptors can be easily added. The implementation is highly parallel and can fully exploit the available cores in a single workstation, or multiple nodes on a supercomputer, allowing many systems to be analyzed simultaneously. As an illustrative application, ppdx is used to parametrize a model that predicts the IC50 of a set of antigens and a class of antibodies directed to the influenza hemagglutinin stalk.
Affiliation(s)
- Simone Conti
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Victor Ovchinnikov
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Martin Karplus
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA; Laboratoire de Chimie Biophysique, Institut de Science et d'Ingénierie Supramoléculaires, Université de Strasbourg, Strasbourg, France
| |
|
39
|
On the Rapid Calculation of Binding Affinities for Antigen and Antibody Design and Affinity Maturation Simulations. Antibodies (Basel) 2022; 11:antib11030051. [PMID: 35997345 PMCID: PMC9397028 DOI: 10.3390/antib11030051] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Revised: 07/23/2022] [Accepted: 08/01/2022] [Indexed: 02/05/2023] Open
Abstract
The accurate and efficient calculation of protein-protein binding affinities is an essential component in antibody and antigen design and optimization, and in computer modeling of antibody affinity maturation. Such calculations remain challenging despite advances in computer hardware and algorithms, primarily because proteins are flexible molecules, and thus, require explicit or implicit incorporation of multiple conformational states into the computational procedure. The astronomical size of the amino acid sequence space further compounds the challenge by requiring predictions to be computed within a short time so that many sequence variants can be tested. In this study, we compare three classes of methods for antibody/antigen (Ab/Ag) binding affinity calculations: (i) a method that relies on the physical separation of the Ab/Ag complex in equilibrium molecular dynamics (MD) simulations, (ii) a collection of 18 scoring functions that act on an ensemble of structures created using homology modeling software, and (iii) methods based on the molecular mechanics-generalized Born surface area (MM-GBSA) energy decomposition, in which the individual contributions of the energy terms are scaled to optimize agreement with the experiment. When applied to a set of 49 antibody mutations in two Ab/HIV gp120 complexes, all of the methods are found to have modest accuracy, with the highest Pearson correlations reaching about 0.6. In particular, the most computationally intensive method, i.e., MD simulation, did not outperform several scoring functions. The optimized energy decomposition methods provided marginally higher accuracy, but at the expense of requiring experimental data for parametrization. Within each method class, we examined the effect of the number of independent computational replicates, i.e., modeled structures or reinitialized MD simulations, on the prediction accuracy. 
We suggest using about ten modeled structures for scoring methods, and about five simulation replicates for MD simulations as a rule of thumb for obtaining reasonable convergence. We anticipate that our study will be a useful resource for practitioners working to incorporate binding affinity calculations within their protein design and optimization process.
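The evaluation described in this abstract (averaging several independent replicates per mutation, then correlating predictions with experiment) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names, the data layout of one list of replicate scores per mutation, and the example numbers are assumptions.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def correlation_with_experiment(replicate_scores, experimental):
    """Average the replicate predictions for each mutation, then correlate.

    replicate_scores: one list of replicate predictions per mutation
    experimental:     one measured value per mutation
    """
    averaged = [mean(reps) for reps in replicate_scores]
    return pearson_r(averaged, experimental)

# Hypothetical scores for three mutations, two replicates each.
predicted = [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0]]
measured = [4.0, 6.0, 12.0]
print(correlation_with_experiment(predicted, measured))
```

Repeating the calculation with an increasing number of replicates per mutation is one way to check the convergence behavior behind the rule of thumb quoted above.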
|
40
|
Gautam V, Gupta R, Gupta D, Ruhela A, Mittal A, Mohanty SK, Arora S, Gupta R, Saini C, Sengupta D, Murugan NA, Ahuja G. deepGraphh: AI-driven web service for graph-based quantitative structure-activity relationship analysis. Brief Bioinform 2022; 23:6648791. [PMID: 35868454 DOI: 10.1093/bib/bbac288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 06/01/2022] [Accepted: 06/23/2022] [Indexed: 11/12/2022] Open
Abstract
Artificial intelligence (AI)-based computational techniques allow rapid exploration of the chemical space. However, representing compounds as detailed, computation-compatible features is one of the crucial steps in quantitative structure-activity relationship (QSAR) analysis. Recently, graph-based methods have emerged as a powerful alternative to chemistry-restricted fingerprints or descriptors for modeling. Although graph-based modeling offers multiple advantages, its implementation demands in-depth domain knowledge and programming skills. Here we introduce deepGraphh, an end-to-end web service featuring a conglomerate of established graph-based methods for model generation for classification or regression tasks. The graphical user interface of deepGraphh offers highly configurable settings for model parameter tuning, model generation, cross-validation and testing of user-supplied query molecules. deepGraphh supports four widely adopted methods for QSAR analysis, namely, graph convolution network, graph attention network, directed acyclic graph and Attentive FP. Comparative analysis revealed that the methods supported by deepGraphh perform comparably to descriptor-based machine learning techniques. Finally, we used deepGraphh models to predict the blood-brain barrier permeability of human and microbiome-generated metabolites. In summary, deepGraphh offers a one-stop web service for graph-based methods for chemoinformatics.
Affiliation(s)
- Vishakha Gautam
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Rahul Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Deepti Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Anubhav Ruhela
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Ria Gupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Chandan Saini
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India; Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India; Centre for Artificial Intelligence, Indraprastha Institute of Information Technology, New Delhi, India
| | - Natarajan Arul Murugan
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| | - Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), Okhla, Phase III, New Delhi-110020, India
| |
|
41
|
Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
- Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
| | - Piers A. Townsend
- Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
| | | |
|
42
|
Nagy B, Galata DL, Farkas A, Nagy ZK. Application of Artificial Neural Networks in the Process Analytical Technology of Pharmaceutical Manufacturing-a Review. AAPS J 2022; 24:74. [PMID: 35697951 DOI: 10.1208/s12248-022-00706-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/25/2022] [Accepted: 04/06/2022] [Indexed: 01/22/2023] Open
Abstract
Industry 4.0 has started to transform the manufacturing industries by embracing digitalization, automation, and big data, aiming for interconnected systems, autonomous decisions, and smart factories. Machine learning techniques, such as artificial neural networks (ANN), have emerged as potent tools to address the related computational tasks. These advancements have also reached the pharmaceutical industry, where the Process Analytical Technology (PAT) initiative has already paved the way for the real-time analysis of the processes and the science- and risk-based flexible production. This paper aims to assess the potential of ANNs within the PAT concept to aid the modernization of pharmaceutical manufacturing. The current state of ANNs is systematically reviewed for the most common manufacturing steps of solid pharmaceutical products, and possible research gaps and future directions are identified. In this way, this review could aid the further development of machine learning techniques for pharmaceutical production and eventually contribute to the implementation of intelligent manufacturing lines with automated quality assurance.
Affiliation(s)
- Brigitta Nagy
- Department of Organic Chemistry and Technology, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rkp. 3., Budapest, H-1111, Hungary
| | - Dorián László Galata
- Department of Organic Chemistry and Technology, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rkp. 3., Budapest, H-1111, Hungary
| | - Attila Farkas
- Department of Organic Chemistry and Technology, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rkp. 3., Budapest, H-1111, Hungary
| | - Zsombor Kristóf Nagy
- Department of Organic Chemistry and Technology, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rkp. 3., Budapest, H-1111, Hungary.
| |
|
43
|
Antinucci G, Dereli B, Vittoria A, Budzelaar PHM, Cipullo R, Goryunov GP, Kulyabin PS, Uborsky DV, Cavallo L, Ehm C, Voskoboynikov AZ, Busico V. Selection of Low-Dimensional 3-D Geometric Descriptors for Accurate Enantioselectivity Prediction. ACS Catal 2022. [DOI: 10.1021/acscatal.2c00976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Giuseppe Antinucci
- Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Busra Dereli
- Catalysis Research Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Antonio Vittoria
- Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Peter H. M. Budzelaar
- Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Roberta Cipullo
- Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Georgy P. Goryunov
- Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Pavel S. Kulyabin
- Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Dmitry V. Uborsky
- Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Luigi Cavallo
- Catalysis Research Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Christian Ehm
- Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Alexander Z. Voskoboynikov
- Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| | - Vincenzo Busico
- Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
- DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
| |
|
44
|
Bender A, Schneider N, Segler M, Patrick Walters W, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
|
45
|
Rodrigues G, Souza Santos L, Franco OL. Antimicrobial Peptides Controlling Resistant Bacteria in Animal Production. Front Microbiol 2022; 13:874153. [PMID: 35663853 PMCID: PMC9161144 DOI: 10.3389/fmicb.2022.874153] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/11/2022] [Accepted: 04/06/2022] [Indexed: 11/13/2022] Open
Abstract
In the last few decades, antimicrobial resistance (AMR) has been a worldwide concern. The excessive use of antibiotics affects animal and human health. In the last few years, livestock production has used antibiotics as feed supplements. This massive use can be considered a principal factor in the accelerated development of genetic modifications in bacteria. These modifications are responsible for AMR and can spread to pathogenic and commensal bacteria. In addition, antibiotic residues can be dispersed through water and sewage systems, contaminating soil, water, and plants, and can accumulate in animal tissues and products such as muscle, milk, eggs, and fat. These residues can reach humans through the consumption of contaminated water or food. Studies have also demonstrated that antimicrobial resistance may develop through vertical and horizontal gene transfer, producing a risk to public health. Hence, in 2000 the World Health Organization forbade the use of antibiotics for feed supplementation in livestock. In this context, to obtain safe food production, one of the potential substitutes for traditional antibiotics is the use of antimicrobial peptides (AMPs). In general, AMPs present anti-infective activity and, in some cases, also affect the immune response. A limited number of AMP-based drugs are now available for use in animals and humans. This use is still not widespread because of problems such as in vivo effectiveness, stability, and high cost of production. This review elucidates the different applications of AMPs in animal diets, in an effort to generate safe food and control AMR.
Affiliation(s)
- Gisele Rodrigues
- Centro de Análises Proteômicas e Bioquímicas, Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
| | - Lucas Souza Santos
- Centro de Análises Proteômicas e Bioquímicas, Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
| | - Octávio Luiz Franco
- Centro de Análises Proteômicas e Bioquímicas, Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
- *Correspondence: Octávio Luiz Franco
| |
|
46
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany; Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
| | - Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany; Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| |
|
47
|
Zhang J, Wang Q, Shen W. Hyper-parameter optimization of multiple machine learning algorithms for molecular property prediction using hyperopt library. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
48
|
Sun X, Zhu J, Chen B, You H, Xu H. A feature transferring workflow between data-poor compounds in various tasks. PLoS One 2022; 17:e0266088. [PMID: 35353844 PMCID: PMC8967016 DOI: 10.1371/journal.pone.0266088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Accepted: 03/14/2022] [Indexed: 12/03/2022] Open
Abstract
Compound screening by in silico approaches has advantages in identifying high-activity lead compounds and can predict the safety of a drug. A key challenge is that the number of accumulated observations of drug activity and toxicity varies by target across datasets, and some targets are more understudied than others. Owing to an overall insufficiency and imbalance of drug data, it is hard for existing models to accurately predict drug activity and toxicity across multiple tasks. To solve this problem, this paper proposes a two-stage transfer learning workflow to develop a novel prediction model that can accurately predict drug activity and toxicity for targets with insufficient observations. We built a balanced dataset based on the Tox21 dataset and developed a drug activity and toxicity prediction model based on Siamese networks and graph convolution to produce multitask output. We also took advantage of transfer learning from data-rich targets to data-poor targets. We showed greater accuracy in predicting the activity and toxicity of compounds for both data-rich and data-poor targets. In Tox21, a relatively rich dataset, the prediction model achieved an AUROC of 0.877 on classification tasks. In the other five unbalanced datasets, we also found that transfer learning strategies raised model accuracy for understudied targets. Our models can overcome the imbalance in target data and predict the compound activity and toxicity of understudied targets to help prioritize upcoming biological experiments.
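The AUROC figure quoted in this abstract is the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. The sketch below computes it by direct pairwise comparison; it is an illustration under assumptions (binary labels coded as 1 and 0, the function name `auroc`, and made-up scores), not the evaluation code used in the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for active/toxic, 0 for inactive (assumed coding)
    scores: model outputs, higher meaning more likely positive
    Ties between a positive and a negative score count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four compounds (two active, two inactive).
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # prints 0.75
```

The pairwise form is quadratic in the number of compounds, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large datasets.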
Affiliation(s)
- Xiaofei Sun
- Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingyuan Zhu
- School of Science, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Chen
- University of Chinese Academy of Sciences, Beijing, China
- IRIAI, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Hengzhi You
- School of Science, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Huiqing Xu
- Guangdong Energy Group Science and Technology Research Institute Co., Ltd., Guangzhou, Guangdong, China
| |
|
49
|
Machine Learning Prediction of Critical Temperature of Organic Refrigerants by Molecular Topology. Processes (Basel) 2022. [DOI: 10.3390/pr10030577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023] Open
Abstract
In this work, molecular structures, combined with machine learning algorithms, were applied to predict the critical temperatures (Tc) of a group of organic refrigerants. To solve the problem that previous models cannot distinguish isomers, a topological index was introduced. The results indicate that the novel molecular descriptor 'molecular fingerprint + topological index' can effectively differentiate isomers. The average absolute deviation between the predicted and experimental values is 3.99%, which demonstrates a reasonable predictive ability of the present method. In addition, the performance of the proposed model was compared with that of previously reported methods. The results show that the present model is superior to the other approaches with respect to accuracy.
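The percentage deviation reported in this abstract is not defined there. A common definition, assumed here, averages the absolute relative error over all compounds: AAD% = (100/N) Σ |Tc,pred − Tc,exp| / Tc,exp. The sketch below implements that assumed formula; the function name and the example temperatures are hypothetical.

```python
def average_absolute_deviation_percent(predicted, experimental):
    """Mean of |predicted - experimental| / |experimental|, in percent.

    This relative-error form is an assumed definition; the abstract
    itself does not state the formula.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and equal in length")
    total = 0.0
    for p, e in zip(predicted, experimental):
        if e == 0:
            raise ValueError("experimental values must be non-zero")
        total += abs(p - e) / abs(e)
    return 100.0 * total / len(predicted)

# Hypothetical critical temperatures in kelvin (illustrative only).
print(average_absolute_deviation_percent([110.0, 190.0], [100.0, 200.0]))
```

Because the error is relative, temperatures should be expressed on an absolute scale such as kelvin; on the Celsius scale, values near zero would inflate the result.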
|
50
|
Abstract
Drug design is a complex pharmaceutical science with a long history. Many achievements have been made in the field of drug design since the end of the 19th century, when Emil Fischer suggested that the drug-receptor interaction resembles the interplay of a key and a lock. Gradually, drug design has been transformed into a coherent and well-organized science with a solid theoretical background and practical applications. Now, drug design is the most advanced approach for drug discovery. It utilizes innovations in science and technology and includes them in its wide-ranging arsenal of methods and tools in order to achieve the main goal: discovery of effective, specific, non-toxic, safe and well-tolerated drugs. Drug design is one of the most intensively developing modern sciences, and its progress is accelerated by the application of artificial intelligence. The present review aims to capture some of the most important milestones in the development of drug design, to outline some of the most used current methods and to sketch the future perspective from the author's point of view. Without claiming to cover the full range of drug design topics, the review introduces the reader to the content of Molecules' Special Issue "Drug Design-Science and Practice".
Affiliation(s)
- Irini Doytchinova
- Drug Design and Bioinformatics Lab, Faculty of Pharmacy, Medical University of Sofia, 1000 Sofia, Bulgaria
| |
|