201
Lee YO, Kim YJ. The Effect of Resampling on Data‐imbalanced Conditions for Prediction towards Nuclear Receptor Profiling Using Deep Learning. Mol Inform 2020; 39:e1900131. [DOI: 10.1002/minf.201900131] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/24/2019] [Accepted: 01/25/2020] [Indexed: 11/11/2022]
Affiliation(s)
- Yong Oh Lee
- Smart Convergence Group, KIST Europe, Saarbrücken 66123, Germany
- Young Jun Kim
- Environmental Safety Group, KIST Europe, Saarbrücken 66123, Germany
202
Keyvanpour MR, Shirzad MB. An Analysis of QSAR Research Based on Machine Learning Concepts. Curr Drug Discov Technol 2020; 18:17-30. [PMID: 32178612 DOI: 10.2174/1570163817666200316104404] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/08/2019] [Revised: 08/22/2019] [Accepted: 10/28/2019] [Indexed: 11/22/2022]
Abstract
Quantitative Structure-Activity Relationship (QSAR) modeling is a popular approach for correlating chemical molecules with their biological activities based on their chemical structures. Machine learning techniques have proved to be promising solutions for QSAR modeling, and this area of research has consequently attracted much attention. A considerable amount of literature has been published on machine learning based QSAR modeling methodologies, yet the domain still lacks a recent and comprehensive analysis of these algorithms. This study systematically reviews the application of machine learning algorithms in QSAR, aiming to provide an analytical framework. For this purpose, we present a framework called 'ML-QSAR'. This framework has been designed for future research to: a) facilitate the selection of proper strategies among existing algorithms according to the application area requirements, b) help to develop and improve current methods and c) provide a platform to study existing methodologies comparatively. In ML-QSAR, a structured categorization of QSAR modeling research based on machine learning models is presented first. Several criteria are then introduced in order to assess the models. Finally, a qualitative analysis is carried out based on these criteria.
Affiliation(s)
- Mehrnoush Barani Shirzad
- Data Mining Research Laboratory, Department of Computer Engineering, Alzahra University, Tehran, Iran
203
Chen CT, Gu GX. Generative Deep Neural Networks for Inverse Materials Design Using Backpropagation and Active Learning. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2020; 7:1902607. [PMID: 32154072 PMCID: PMC7055566 DOI: 10.1002/advs.201902607] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/21/2019] [Revised: 11/11/2019] [Indexed: 05/19/2023]
Abstract
In recent years, machine learning (ML) techniques are seen to be promising tools to discover and design novel materials. However, the lack of robust inverse design approaches to identify promising candidate materials without exploring the entire design space causes a fundamental bottleneck. A general-purpose inverse design approach is presented using generative inverse design networks. This ML-based inverse design approach uses backpropagation to calculate the analytical gradients of an objective function with respect to design variables. This inverse design approach is capable of overcoming local minima traps by using backpropagation to provide rapid calculations of gradient information and running millions of optimizations with different initial values. Furthermore, an active learning strategy is adopted in the inverse design approach to improve the performance of candidate materials and reduce the amount of training data needed to do so. Compared to passive learning, the active learning strategy is capable of generating better designs and reducing the amount of training data by at least an order-of-magnitude in the case study on composite materials. The inverse design approach is compared with conventional gradient-based topology optimization and gradient-free genetic algorithms and the pros and cons of each method are discussed when applied to materials discovery and design problems.
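The multi-start gradient search described in this abstract can be illustrated with a minimal sketch. It is not the authors' network or code: the objective below is a hypothetical one-dimensional toy function, and its hand-written derivative stands in for the gradients that the abstract obtains by backpropagation through a trained model. The names `objective`, `gradient`, `descend` and `multi_start` are illustrative assumptions.

```python
import math
import random


def objective(x):
    # Hypothetical multimodal objective (lower is better); a stand-in for a
    # learned property predictor evaluated on one design variable.
    return math.sin(3.0 * x) + 0.1 * x * x


def gradient(x):
    # Analytical derivative of the toy objective. In the abstract this role
    # is played by backpropagation through the trained network.
    return 3.0 * math.cos(3.0 * x) + 0.2 * x


def descend(x0, lr=0.01, steps=500):
    # Plain gradient descent from a single initial value.
    x = x0
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


def multi_start(n_starts=200, low=-5.0, high=5.0, seed=0):
    # Run many independent descents from random initial values and keep the
    # best result, so that a single local minimum does not trap the search.
    rng = random.Random(seed)
    candidates = [descend(rng.uniform(low, high)) for _ in range(n_starts)]
    return min(candidates, key=objective)


best = multi_start()
print(best, objective(best))
```

A single descent started near x = 1.5 settles in a shallower local minimum; taking the best of many starts recovers the deeper minimum near x = -0.51 for this toy function.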
Affiliation(s)
- Chun-Teh Chen
- Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
- Grace X Gu
- Department of Mechanical Engineering, University of California, Berkeley, CA 94720, USA
204
Zhang L, Mao H, Liu Q, Gani R. Chemical product design – recent advances and perspectives. Curr Opin Chem Eng 2020. [DOI: 10.1016/j.coche.2019.10.005] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 01/29/2023]
205
Pracht P, Bohle F, Grimme S. Automated exploration of the low-energy chemical space with fast quantum chemical methods. Phys Chem Chem Phys 2020; 22:7169-7192. [PMID: 32073075 DOI: 10.1039/c9cp06869d] [Citation(s) in RCA: 890] [Impact Index Per Article: 222.5] [Indexed: 12/19/2022]
Abstract
We propose and discuss an efficient scheme for the in silico sampling for parts of the molecular chemical space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm. The focus of this work is set on the generation of proper thermodynamic ensembles at a quantum chemical level for conformers, but similar procedures for protonation states, tautomerism and non-covalent complex geometries are also discussed. The conformational ensembles consisting of all significantly populated minimum energy structures normally form the basis of further, mostly DFT computational work, such as the calculation of spectra or macroscopic properties. By using basic quantum chemical methods, electronic effects or possible bond breaking/formation are accounted for and a very reasonable initial energetic ranking of the candidate structures is obtained. Due to the huge computational speedup gained by the fast low-cost quantum chemical methods, overall short computation times even for systems with hundreds of atoms (typically drug-sized molecules) are achieved. Furthermore, specialized applications, such as sampling with implicit solvation models or constrained conformational sampling for transition-states, metal-, surface-, or noncovalently bound complexes are discussed, opening many possible applications in modern computational chemistry and drug discovery. The procedures have been implemented in a freely available computer code called CREST, that makes use of the fast and reliable GFNn-xTB methods.
Affiliation(s)
- Philipp Pracht
- Mulliken Center for Theoretical Chemistry, Universität Bonn, Beringstr. 4, 53115 Bonn, Germany
206
Bonanno E, Ebejer JP. Applying Machine Learning to Ultrafast Shape Recognition in Ligand-Based Virtual Screening. Front Pharmacol 2020; 10:1675. [PMID: 32140104 PMCID: PMC7042174 DOI: 10.3389/fphar.2019.01675] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/02/2019] [Accepted: 12/23/2019] [Indexed: 11/13/2022]
Abstract
Ultrafast Shape Recognition (USR), along with its derivatives, are Ligand-Based Virtual Screening (LBVS) methods that condense 3-dimensional information about molecular shape, as well as other properties, into a small set of numeric descriptors. These can be used to efficiently compute a measure of similarity between pairs of molecules using a simple inverse Manhattan Distance metric. In this study we explore the use of suitable Machine Learning techniques that can be trained using USR descriptors, so as to improve the similarity detection of potential new leads. We use molecules from the Directory for Useful Decoys-Enhanced to construct machine learning models based on three different algorithms: Gaussian Mixture Models (GMMs), Isolation Forests and Artificial Neural Networks (ANNs). We train models based on full molecule conformer models, as well as the Lowest Energy Conformations (LECs) only. We also investigate the performance of our models when trained on smaller datasets so as to model virtual screening scenarios when only a small number of actives are known a priori. Our results indicate significant performance gains over a state of the art USR-derived method, ElectroShape 5D, with GMMs obtaining a mean performance up to 430% better than that of ElectroShape 5D in terms of Enrichment Factor with a maximum improvement of up to 940%. Additionally, we demonstrate that our models are capable of maintaining their performance, in terms of enrichment factor, within 10% of the mean as the size of the training dataset is successively reduced. Furthermore, we also demonstrate that running times for retrospective screening using the machine learning models we selected are faster than standard USR, on average by a factor of 10, including the time required for training. Our results show that machine learning techniques can significantly improve the virtual screening performance and efficiency of the USR family of methods.
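The "inverse Manhattan Distance" similarity mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the normalisation by the number of descriptors follows the commonly cited USR form, and the descriptor vectors and molecule names below are made up.

```python
def usr_similarity(a, b):
    # Inverse Manhattan distance between two descriptor vectors of equal
    # length: 1 / (1 + mean absolute difference). Identical vectors give 1.0
    # and the score decays towards 0 as the vectors diverge.
    if len(a) != len(b) or not a:
        raise ValueError("descriptor vectors must be non-empty and equal length")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


def rank_by_similarity(query, library):
    # Order library entries (name -> descriptor vector) from most to least
    # similar to the query vector.
    return sorted(
        library,
        key=lambda name: usr_similarity(query, library[name]),
        reverse=True,
    )


# Hypothetical 4-value descriptors, for illustration only.
query = [1.0, 2.0, 3.0, 4.0]
library = {
    "mol_a": [1.0, 2.0, 3.0, 4.0],
    "mol_b": [2.0, 3.0, 4.0, 5.0],
    "mol_c": [5.0, 6.0, 7.0, 8.0],
}
print(rank_by_similarity(query, library))
```

Because the score needs only a sum of absolute differences over a short vector, screening reduces to one cheap pass over the library, which is the efficiency argument the abstract makes for the USR family.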
Affiliation(s)
- Etienne Bonanno
- Department of Artificial Intelligence, University of Malta, Msida, Malta
- Jean-Paul Ebejer
- Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
207
Sevakula RK, Au-Yeung WTM, Singh JP, Heist EK, Isselbacher EM, Armoundas AA. State-of-the-Art Machine Learning Techniques Aiming to Improve Patient Outcomes Pertaining to the Cardiovascular System. J Am Heart Assoc 2020; 9:e013924. [PMID: 32067584 PMCID: PMC7070211 DOI: 10.1161/jaha.119.013924] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Indexed: 02/07/2023]
Affiliation(s)
- Jagmeet P Singh
- The Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA
- E Kevin Heist
- The Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA
- Antonis A Armoundas
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA
208
Wu Y, Lou L, Xie ZR. A Pilot Study of All-Computational Drug Design Protocol-From Structure Prediction to Interaction Analysis. Front Chem 2020; 8:81. [PMID: 32117898 PMCID: PMC7028743 DOI: 10.3389/fchem.2020.00081] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/07/2019] [Accepted: 01/24/2020] [Indexed: 11/13/2022]
Abstract
Speeding up the drug discovery process is of great significance. To achieve that, high-efficiency methods should be exploited. Conventional wet-bench methods can hardly meet the demand for speed because the experiments are time-consuming. In silico approaches, by contrast, are much more efficient for drug discovery and design, but they usually play only a supportive role in research processes. To fully exploit the strength of computational methods, we propose a protocol which integrates various in silico approaches, from de novo protein structure prediction to ligand-protein interaction simulation. As a proof of concept, the human SK2/calmodulin complex was used as a target for validation. First, we obtained a predicted structure of SK2/calmodulin and predicted binding sites which were consistent with the literature data. Then we investigated the ligand-protein interaction via virtual mutagenesis, flexible docking, and binding affinity calculation. As a result, the binding energies of mutants show trends similar to those of the EC50 values (R = 0.6 for NS309 in V481 mutants). The results indicate that our protocol can be applied to drug design for proteins of unknown structure. Our study also demonstrates that the integration of in silico approaches is feasible and that it helps accelerate new drug discovery.
Affiliation(s)
- Yifei Wu
- Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA, United States
- Lei Lou
- Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA, United States
- Zhong-Ru Xie
- Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA, United States
209
Feijoo F, Palopoli M, Bernstein J, Siddiqui S, Albright TE. Key indicators of phase transition for clinical trials through machine learning. Drug Discov Today 2020; 25:414-421. [DOI: 10.1016/j.drudis.2019.12.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/20/2019] [Revised: 12/22/2019] [Accepted: 12/30/2019] [Indexed: 02/08/2023]
210
Houssein EH, Hosney ME, Oliva D, Mohamed WM, Hassaballah M. A novel hybrid Harris hawks optimization and support vector machines for drug design and discovery. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2019.106656] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Indexed: 02/04/2023]
211
Computational basis for the design of PLK-2 inhibitors. Struct Chem 2020. [DOI: 10.1007/s11224-019-01394-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
212
Martinez-Mayorga K, Madariaga-Mazon A, Medina-Franco JL, Maggiora G. The impact of chemoinformatics on drug discovery in the pharmaceutical industry. Expert Opin Drug Discov 2020; 15:293-306. [PMID: 31965870 DOI: 10.1080/17460441.2020.1696307] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 01/18/2023]
Abstract
Introduction: Even though there have been substantial advances in our understanding of biological systems, research in drug discovery is only just now beginning to utilize this type of information. The single-target paradigm, which exemplifies the reductionist approach, remains a mainstay of drug research today. A deeper view of the complexity involved in drug discovery is necessary to advance this field. Areas covered: This perspective provides a summary of research areas where cheminformatics has played a key role in drug discovery, including the available resources, as well as a personal perspective on the challenges still faced in the field. Expert opinion: Although great strides have been made in the handling and analysis of biological and pharmacological data, more must be done to link the data to biological pathways. This is crucial if one is to understand how drugs modify disease phenotypes, although this will involve a shift from the single drug/single target paradigm that remains a mainstay of drug research. Moreover, such a shift would require an increased awareness of the role of physiology in the mechanism of drug action, which will require the introduction of new mathematical, computer, and biological methods for chemoinformaticians to be trained in.
Affiliation(s)
- José L Medina-Franco
- Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico
213
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 161] [Impact Index Per Article: 40.3] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022]
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid relying solely on costly, laborious and not-always-deterministic experiments to determine drug–target interactions (DTIs). These approaches should be capable of identifying the potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction followed by a comprehensive catalog consisting of machine learning methods and databases, which have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in prediction of DTI using machine learning approaches are highlighted and we conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian
- Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
214
Ma R, Li Y, Li C, Wan F, Hu H, Xu W, Zeng J. Secure multiparty computation for privacy-preserving drug discovery. Bioinformatics 2020; 36:2872-2880. [DOI: 10.1093/bioinformatics/btaa038] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/05/2019] [Revised: 01/08/2020] [Accepted: 01/15/2020] [Indexed: 01/24/2023]
Abstract
Motivation
Quantitative structure–activity relationship (QSAR) and drug–target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical institutions can lead to better performance in both QSAR and DTI prediction. However, the drug-related data privacy and intellectual property issues have become a noticeable hindrance for inter-institutional collaboration in drug discovery.
Results
We have developed two novel algorithms under secure multiparty computation (MPC), including QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect the situations in real drug discovery scenarios, we have demonstrated that DTIMPC possesses significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature and shows the feasible scalability to handle growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery.
Availability and implementation
The source code of QSARMPC and DTIMPC is available on GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git.
Supplementary information
Supplementary data are available at Bioinformatics online.
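To make the idea of secure multiparty computation in the abstract above more concrete, here is a minimal sketch of additive secret sharing, a generic MPC building block. It is not the QSARMPC or DTIMPC protocol, whose details are not given here; the function names, the field modulus and the example values are all assumptions chosen for illustration.

```python
import secrets

# Field modulus (the Mersenne prime 2**61 - 1); an arbitrary illustrative choice.
PRIME = 2**61 - 1


def share(value, n_parties):
    # Split a non-negative integer smaller than PRIME into n additive shares.
    # Any subset of fewer than n shares is uniformly random on its own.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    # Adding all shares modulo PRIME recovers the secret.
    return sum(shares) % PRIME


def secure_sum(private_values):
    # Each party i splits its private value into one share per party.
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the j-th share of every input locally; only these
    # partial sums are published, never the inputs themselves.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


# Hypothetical private counts held by three institutions.
print(secure_sum([12, 30, 7]))
```

Linear operations such as this sum can be computed on shares without communication beyond the final reconstruction; multiplications, which a neural network needs, require additional machinery that this sketch does not cover.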
Affiliation(s)
- Rong Ma
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Yi Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Chenxing Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Fangping Wan
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Hailin Hu
- School of Medicine, Tsinghua University, Beijing 100084, China
- Wei Xu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
215
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022]
Abstract
Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods face significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool for enabling breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure property/activity relation for material property prediction can be established more accurately and efficiently. In addition, materials design can be revolutionized and accelerated through ML-enabled molecular generation and inverse molecular design. In this perspective, we review the recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the relevant challenges to solve in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which serve as a tutorial for researchers who have little prior experience with ML and want to apply it to various applications.
Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future, to meet the tremendous demand for new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Zhiqiang Shen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Akshay Iyer
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Umar Farooq Ghumman
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Shan Tang
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China
- Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
- Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
216
Schneider M, Pons JL, Bourguet W, Labesse G. Towards accurate high-throughput ligand affinity prediction by exploiting structural ensembles, docking metrics and ligand similarity. Bioinformatics 2020; 36:160-168. [PMID: 31350558 PMCID: PMC6956784 DOI: 10.1093/bioinformatics/btz538] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/25/2019] [Revised: 05/29/2019] [Accepted: 07/19/2019] [Indexed: 11/14/2022]
Abstract
Motivation
Nowadays, virtual screening (VS) plays a major role in the process of drug development. Nonetheless, an accurate estimation of binding affinities, which is crucial at all stages, is not trivial and may require target-specific fine-tuning. Furthermore, drug design also requires improved predictions for putative secondary targets, among which is Estrogen Receptor alpha (ERα).
Results
VS based on combinations of Structure-Based VS (SBVS) and Ligand-Based VS (LBVS) is gaining momentum as a way to improve VS performance. In this study, we propose an integrated approach using ligand docking on multiple structural ensembles to reflect receptor flexibility. Then, we investigate the impact of the two different types of features (structure-based and ligand molecular descriptors) on affinity predictions using a random forest algorithm. We find that ligand-based features have lower predictive power (rP = 0.69, R² = 0.47) than structure-based features (rP = 0.78, R² = 0.60). Their combination maintains high accuracy (rP = 0.73, R² = 0.50) on the internal test set, but it shows superior robustness on external datasets. Further improvement, together with extending the training dataset to include xenobiotics, leads to a novel high-throughput affinity prediction method for ERα ligands (rP = 0.85, R² = 0.71). The presented prediction tool is provided to the community as a dedicated satellite of the @TOME server, to which one can upload a ligand dataset in mol2 format and obtain docked ligands and predicted affinities.
Availability and implementation
http://edmon.cbs.cnrs.fr.
Supplementary information
Supplementary data are available at Bioinformatics online.
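The abstract above reports both a Pearson correlation (rP) and an R² value for each feature set. The sketch below shows how the two metrics can differ, assuming that R² denotes the coefficient of determination (the abstract does not define it). The affinity values are made up for illustration and are not taken from the paper.

```python
import math


def pearson_r(y_true, y_pred):
    # Pearson correlation coefficient: measures linear association only, so
    # it is unaffected by a constant offset or scale in the predictions.
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot. Unlike Pearson r,
    # it penalises systematic bias in the predicted values.
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical affinities: predictions track the truth but are shifted by +1.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(pearson_r(measured, predicted), r_squared(measured, predicted))
```

In this toy case the correlation is perfect while R² is only 0.2, which is one reason the two numbers are worth reporting side by side.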
Affiliation(s)
- Melanie Schneider
- Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
- Jean-Luc Pons
- Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
- William Bourguet
- Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
- Gilles Labesse
- Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
217
Abstract
There has been an upsurge of interest in applying machine learning to chemistry, and impressive predictive accuracies have been achieved, but this has been done without providing any insight into what has been learnt from the training data.
218
Duan Q, Lee J. Fast-developing machine learning support complex system research in environmental chemistry. New J Chem 2020. [DOI: 10.1039/c9nj05717j] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Machine learning will radically accelerate analysis of complex material networks in environmental chemistry.
Affiliation(s)
- Qiannan Duan
- Department of Environment Science, Shaanxi Normal University, Xi’an 710062, China
- State Key Laboratory of Pollution Control and Resource Reuse
- Jianchao Lee
- Department of Environment Science, Shaanxi Normal University, Xi’an 710062, China
219
Preclinical toxicity of innovative molecules: In vitro, in vivo and metabolism prediction. Chem Biol Interact 2020; 315:108896. [DOI: 10.1016/j.cbi.2019.108896] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/20/2019] [Revised: 10/19/2019] [Accepted: 11/08/2019] [Indexed: 11/22/2022]
220
Gadalla AAH, Friberg IM, Kift-Morgan A, Zhang J, Eberl M, Topley N, Weeks I, Cuff S, Wootton M, Gal M, Parekh G, Davis P, Gregory C, Hood K, Hughes K, Butler C, Francis NA. Identification of clinical and urine biomarkers for uncomplicated urinary tract infection using machine learning algorithms. Sci Rep 2019; 9:19694. [PMID: 31873085 PMCID: PMC6928162 DOI: 10.1038/s41598-019-55523-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 05/26/2019] [Accepted: 11/19/2019] [Indexed: 12/14/2022]
Abstract
Women with uncomplicated urinary tract infection (UTI) symptoms are commonly treated with empirical antibiotics, resulting in overuse of antibiotics, which promotes antimicrobial resistance. Available diagnostic tools are either not cost-effective or diagnostically sub-optimal. Here, we identified clinical and urinary immunological predictors for UTI diagnosis. We explored 17 clinical and 42 immunological potential predictors for bacterial culture among women with uncomplicated UTI symptoms using random forest or support vector machine coupled with recursive feature elimination. Urine cloudiness was the best performing clinical predictor to rule out (negative likelihood ratio [LR−] = 0.4) and rule in (LR+ = 2.6) UTI. Using a more discriminatory scale to assess cloudiness (turbidity) increased the accuracy of UTI prediction further (LR+ = 4.4). Urinary levels of MMP9, NGAL, CXCL8 and IL-1β together had a higher LR+ (6.1) and similar LR− (0.4), compared to cloudiness. Varying the bacterial count thresholds for urine culture positivity did not alter best clinical predictor selection, but did affect the number of immunological predictors required for reaching an optimal prediction. We conclude that urine cloudiness is particularly helpful in ruling out negative UTI cases. The identified urinary biomarkers could be used to develop a point of care test for UTI but require further validation.
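The positive and negative likelihood ratios quoted in this abstract follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 - specificity) and LR− = (1 - sensitivity) / specificity. The sketch below computes them from a 2x2 confusion table; the counts are hypothetical and are not the study's data.

```python
def likelihood_ratios(tp, fp, fn, tn):
    # Sensitivity: fraction of culture-positive cases the predictor flags.
    sensitivity = tp / (tp + fn)
    # Specificity: fraction of culture-negative cases the predictor clears.
    specificity = tn / (tn + fp)
    # LR+ > 1 means a positive result raises the odds of disease (rule in);
    # LR- < 1 means a negative result lowers them (rule out).
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical counts for a binary predictor such as "urine is cloudy".
lr_pos, lr_neg = likelihood_ratios(tp=80, fp=30, fn=20, tn=70)
print(round(lr_pos, 2), round(lr_neg, 2))
```

Multiplying the pre-test odds by LR+ (after a positive result) or LR− (after a negative result) gives the post-test odds, which is how values such as those reported in the abstract translate into clinical decisions.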
Affiliation(s)
- Amal A H Gadalla
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.
| | - Ida M Friberg
- Division of Infection & Immunity, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Ann Kift-Morgan
- Division of Infection & Immunity, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Jingjing Zhang
- Division of Infection & Immunity, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Matthias Eberl
- Division of Infection & Immunity, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.,Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom
| | - Nicholas Topley
- Division of Infection & Immunity, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.,Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom
| | - Ian Weeks
- Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom.,Clinical Innovation Hub, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Simone Cuff
- Division of Infection & Immunity, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.,Systems Immunity Research Institute, Cardiff University, Cardiff, United Kingdom.,Clinical Innovation Hub, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Mandy Wootton
- Specialist Antimicrobial Chemotherapy Unit, Public Health Wales Microbiology Cardiff, University Hospital of Wales, Cardiff, United Kingdom
| | - Micaela Gal
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Gita Parekh
- Mologic Ltd., Bedford Technology Park, Thurleigh, Bedford, United Kingdom
| | - Paul Davis
- Mologic Ltd., Bedford Technology Park, Thurleigh, Bedford, United Kingdom
| | - Clive Gregory
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Kerenza Hood
- Centre for Trials Research, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Kathryn Hughes
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom
| | - Christopher Butler
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.,Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom
| | - Nick A Francis
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff, United Kingdom.,Primary Care, Population Sciences and Medical Education, University of Southampton, Southampton, United Kingdom
| |
|
221
|
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
222
|
Der Torossian Torres M, de la Fuente-Nunez C. Reprogramming biological peptides to combat infectious diseases. Chem Commun (Camb) 2019; 55:15020-15032. [PMID: 31782426 DOI: 10.1039/c9cc07898c] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
With the rapid spread of resistance among parasites and bacterial pathogens, antibiotic-resistant infections have drawn much attention worldwide. Consequently, there is an urgent need to develop new strategies to treat neglected diseases and drug-resistant infections. Here, we outline several new strategies that have been developed to counter pathogenic microorganisms by designing and constructing antimicrobial peptides (AMPs). In addition to traditional discovery and design mechanisms guided by chemical biology, synthetic biology and computationally-based approaches offer useful tools for the discovery and generation of bioactive peptides. We believe that the convergence of such fields, coupled with systematic experimentation in animal models, will help translate biological peptides into the clinic. The future of anti-infective therapeutics is headed towards specifically designed molecules whose form is driven by computer-based frameworks. These molecules are selective, stable, and active at therapeutic doses.
Affiliation(s)
- Marcelo Der Torossian Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, and Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, and Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
| |
|
223
|
Kirsch P, Hartman AM, Hirsch AKH, Empting M. Concepts and Core Principles of Fragment-Based Drug Design. Molecules 2019; 24:4309. [PMID: 31779114 PMCID: PMC6930586 DOI: 10.3390/molecules24234309] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Revised: 11/11/2019] [Accepted: 11/20/2019] [Indexed: 02/06/2023] Open
Abstract
In this review, a general introduction to fragment-based drug design and the underlying concepts is given. General considerations and methodologies ranging from library selection/construction through biophysical screening and evaluation methods to in-depth hit qualification and subsequent optimization strategies are discussed. These principles can be generally applied to most classes of drug targets. The examples given for fragment growing, merging, and linking strategies at the end of the review are set in the fields of enzyme-inhibitor design and macromolecule–macromolecule interaction inhibition. Building upon the foundation of fragment-based drug discovery (FBDD) and its methodologies, we also highlight a few new trends in FBDD.
Affiliation(s)
- Philine Kirsch
- Helmholtz-Institute for Pharmaceutical Research Saarland (HIPS)-Helmholtz Centre for Infection Research (HZI), Department of Drug Design and Optimization (DDOP), Campus E8.1, 66123 Saarbrücken, Germany; (P.K.); (A.M.H.); (A.K.H.H.)
- Department of Pharmacy, Saarland University, Campus E8.1, 66123 Saarbrücken, Germany
- German Centre for Infection Research (DZIF), Partner Site Hannover-Braunschweig, 66123 Saarbrücken, Germany
| | - Alwin M. Hartman
- Helmholtz-Institute for Pharmaceutical Research Saarland (HIPS)-Helmholtz Centre for Infection Research (HZI), Department of Drug Design and Optimization (DDOP), Campus E8.1, 66123 Saarbrücken, Germany; (P.K.); (A.M.H.); (A.K.H.H.)
- Department of Pharmacy, Saarland University, Campus E8.1, 66123 Saarbrücken, Germany
- Stratingh Institute for Chemistry, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| | - Anna K. H. Hirsch
- Helmholtz-Institute for Pharmaceutical Research Saarland (HIPS)-Helmholtz Centre for Infection Research (HZI), Department of Drug Design and Optimization (DDOP), Campus E8.1, 66123 Saarbrücken, Germany; (P.K.); (A.M.H.); (A.K.H.H.)
- Department of Pharmacy, Saarland University, Campus E8.1, 66123 Saarbrücken, Germany
- Stratingh Institute for Chemistry, University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| | - Martin Empting
- Helmholtz-Institute for Pharmaceutical Research Saarland (HIPS)-Helmholtz Centre for Infection Research (HZI), Department of Drug Design and Optimization (DDOP), Campus E8.1, 66123 Saarbrücken, Germany; (P.K.); (A.M.H.); (A.K.H.H.)
- Department of Pharmacy, Saarland University, Campus E8.1, 66123 Saarbrücken, Germany
- German Centre for Infection Research (DZIF), Partner Site Hannover-Braunschweig, 66123 Saarbrücken, Germany
- Correspondence: ; Tel.: +49-681-988-062-031
| |
|
224
|
Lee J, Kumar S, Lee SY, Park SJ, Kim MH. Development of Predictive Models for Identifying Potential S100A9 Inhibitors Based on Machine Learning Methods. Front Chem 2019; 7:779. [PMID: 31824919 PMCID: PMC6886474 DOI: 10.3389/fchem.2019.00779] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2019] [Accepted: 10/29/2019] [Indexed: 01/05/2023] Open
Abstract
S100A9 is a potential therapeutic target for various diseases including prostate cancer, colorectal cancer, and Alzheimer's disease. However, the sparsity of atomic level data, such as protein-protein interaction of S100A9 with RAGE, TLR4/MD2, or CD147 (EMMPRIN) hinders the rational drug design of S100A9 inhibitors. Herein we first report predictive models of S100A9 inhibitory effect by applying machine learning classifiers on 2D-molecular descriptors. The models were optimized through feature selectors as well as classifiers to produce the top eight random forest models with robust predictability and high cost-effectiveness. Notably, optimal feature sets were obtained after the reduction of 2,798 features into dozens of features with the chopping of fingerprint bits. Moreover, the high efficiency of compact feature sets allowed us to further screen a large-scale dataset (over 6,000,000 compounds) within a week. Through a consensus vote of the top models, 46 hits (hit rate = 0.000713%) were identified as potential S100A9 inhibitors. We expect that our models will facilitate the drug discovery process by providing high predictive power as well as cost-reduction ability and give insights into designing novel drugs targeting S100A9.
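The consensus vote and hit rate mentioned in this abstract can be illustrated with a short sketch. The abstract does not state the voting rule, so the `min_votes` threshold, the model names, and the compound identifiers below are all assumptions made for illustration only.

```python
def consensus_hits(predictions, min_votes):
    """Return compounds predicted active by at least `min_votes` models.

    predictions: dict mapping a model name to the set of compound ids
    that the model predicts as active.
    """
    votes = {}
    for active_ids in predictions.values():
        for compound_id in active_ids:
            votes[compound_id] = votes.get(compound_id, 0) + 1
    return {cid for cid, n in votes.items() if n >= min_votes}


def hit_rate_percent(n_hits, n_screened):
    """Hit rate expressed as a percentage of the screened library."""
    return 100.0 * n_hits / n_screened


# Hypothetical outputs of three models (not data from the study).
predictions = {
    "rf_1": {"c1", "c2", "c3"},
    "rf_2": {"c2", "c3"},
    "rf_3": {"c3", "c4"},
}
unanimous = consensus_hits(predictions, min_votes=len(predictions))
majority = consensus_hits(predictions, min_votes=2)
print(sorted(unanimous), sorted(majority))
```

A stricter threshold yields fewer but more consistently supported hits, which is the usual trade-off when several top models are combined.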
Affiliation(s)
- Jihyeun Lee
- Department of Pharmacy, Gachon Institute of Pharmaceutical Science, College of Pharmacy, Gachon University, Incheon, South Korea
| | - Surendra Kumar
- Department of Pharmacy, Gachon Institute of Pharmaceutical Science, College of Pharmacy, Gachon University, Incheon, South Korea
| | - Sang-Yoon Lee
- Gachon Advanced Institute for Health Science and Technology, Graduate School and Neuroscience Research Institute, Gachon University, Incheon, South Korea
| | - Sung Jean Park
- Department of Pharmacy, Gachon Institute of Pharmaceutical Science, College of Pharmacy, Gachon University, Incheon, South Korea
| | - Mi-hyun Kim
- Department of Pharmacy, Gachon Institute of Pharmaceutical Science, College of Pharmacy, Gachon University, Incheon, South Korea
| |
|
225
|
Lu J, Hou X, Wang C, Zhang Y. Incorporating Explicit Water Molecules and Ligand Conformation Stability in Machine-Learning Scoring Functions. J Chem Inf Model 2019; 59:4540-4549. [PMID: 31638801 PMCID: PMC6878146 DOI: 10.1021/acs.jcim.9b00645] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Structure-based drug design is critically dependent on the accuracy of molecular docking scoring functions, and there is significant interest in advancing scoring functions with machine learning approaches. In this work, by judiciously expanding the training set, exploring new features related to explicit mediating water molecules as well as ligand conformation stability, and applying extreme gradient boosting (XGBoost) with Δ-Vina parametrization, we have improved the robustness and applicability of machine-learning scoring functions. The new scoring function ΔvinaXGB not only performs consistently among the top compared to classical scoring functions for the CASF-2016 benchmark but also achieves significantly better prediction accuracy in different types of structures that mimic real docking applications.
Affiliation(s)
- Jianing Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Xuben Hou
- Department of Chemistry, New York University, New York, New York 10003, United States
- Department of Medicinal Chemistry, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Science, Shandong University, Jinan, Shandong 250012, China
| | - Cheng Wang
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
226
|
Lipinski CF, Maltarollo VG, Oliveira PR, da Silva ABF, Honorio KM. Advances and Perspectives in Applying Deep Learning for Drug Design and Discovery. Front Robot AI 2019; 6:108. [PMID: 33501123 PMCID: PMC7805776 DOI: 10.3389/frobt.2019.00108] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2019] [Accepted: 10/11/2019] [Indexed: 01/10/2023] Open
Abstract
Discovering (or planning) a new drug candidate involves many parameters, which makes this process slow, costly, and, in some cases, prone to failure at the end. In the last decades, we have witnessed a revolution in the computational area (hardware, software, large-scale computing, etc.), as well as an explosion in data generation (big data), which raises the need for more sophisticated algorithms to analyze this myriad of data. In this scenario, we can highlight the potentialities of artificial intelligence (AI) or computational intelligence (CI) as a powerful tool to analyze medicinal chemistry data. According to IEEE, computational intelligence involves the theory, the design, the application, and the development of biologically and linguistically motivated computational paradigms. In addition, CI encompasses three main methodologies: neural networks (NN), fuzzy systems, and evolutionary computation. In particular, artificial neural networks have been successfully applied in medicinal chemistry studies. A branch of the NN area that has attracted a lot of attention is deep learning (DL), due to its generalization power and ability to extract features from data. Therefore, in this mini-review we will briefly outline the present scope, advances, and challenges related to the use of DL in drug design and discovery, describing successful studies involving quantitative structure-activity relationships (QSAR) and virtual screening (VS) of databases containing thousands of compounds.
Affiliation(s)
- Celio F Lipinski
- Departamento de Química e Física Molecular, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos, Brazil
| | | | - Patricia R Oliveira
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, Brazil
| | - Alberico B F da Silva
- Departamento de Química e Física Molecular, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos, Brazil
| | - Kathia Maria Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, Brazil.,Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, Santo André, Brazil
| |
|
227
|
Chemogenomic Analysis of the Druggable Kinome and Its Application to Repositioning and Lead Identification Studies. Cell Chem Biol 2019; 26:1608-1622.e6. [DOI: 10.1016/j.chembiol.2019.08.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2019] [Revised: 07/18/2019] [Accepted: 08/21/2019] [Indexed: 02/06/2023]
|
228
|
Learning-to-rank technique based on ignoring meaningless ranking orders between compounds. J Mol Graph Model 2019; 92:192-200. [DOI: 10.1016/j.jmgm.2019.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 07/17/2019] [Accepted: 07/17/2019] [Indexed: 11/19/2022]
|
229
|
Martin R, Heider D. ContraDRG: Automatic Partial Charge Prediction by Machine Learning. Front Genet 2019; 10:990. [PMID: 31737032 PMCID: PMC6831742 DOI: 10.3389/fgene.2019.00990] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Accepted: 09/18/2019] [Indexed: 01/14/2023] Open
Abstract
In recent years, machine learning techniques have been widely used in biomedical research to predict unseen data based on models trained on experimentally derived data. In the current study, we used machine learning algorithms to emulate computationally complex predictions in a reverse engineering-like manner and developed ContraDRG, a software tool that can be used to predict partial charges for small molecules based on PRODRG and Automated Topology Builder (ATB) predictions. Both tools generate molecular topology files, including the partial atomic charge, by using different procedures. We show that ContraDRG can accurately predict partial charges in a fraction of the time, because it exploits existing complex models with intensive calculations by using machine learning techniques and thus can also be applied for screening projects with large numbers of molecules. We provide ContraDRG as a web server, which can be used to automatically assign partial charges to incoming user-specified molecules by using our machine learning models. In this study, we compared ContraDRG with PRODRG and ATB with regard to predictivity using statistical methods. ContraDRG allows predicting ATB-derived partial charges with an R² value up to 0.980 and for PRODRG up to 1.00. While ATB requires hours or days for the quantum-mechanically accurate calculation and refinements, ContraDRG does its approximation within seconds.
Affiliation(s)
- Roman Martin
- Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Department of Organic-Analytical Chemistry, TUM Campus Straubing, Straubing, Germany
| | - Dominik Heider
- Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
| |
|
230
|
Cheng L, Kovachki NB, Welborn M, Miller TF. Regression Clustering for Improved Accuracy and Training Costs with Molecular-Orbital-Based Machine Learning. J Chem Theory Comput 2019; 15:6668-6677. [DOI: 10.1021/acs.jctc.9b00884] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Affiliation(s)
- Lixue Cheng
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Nikola B. Kovachki
- Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, United States
| | - Matthew Welborn
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Thomas F. Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| |
|
231
|
Schuler J, Samudrala R. Fingerprinting CANDO: Increased Accuracy with Structure- and Ligand-Based Shotgun Drug Repurposing. ACS OMEGA 2019; 4:17393-17403. [PMID: 31656912 PMCID: PMC6812124 DOI: 10.1021/acsomega.9b02160] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/12/2019] [Accepted: 08/30/2019] [Indexed: 05/08/2023]
Abstract
We have upgraded our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun drug repurposing by including ligand-based, data fusion, and decision tree pipelines. The goal of shotgun drug repurposing is to screen and rank every existing human use drug or compound for every disease/indication. The first version of CANDO implemented a structure-based pipeline that modeled interactions between compounds and proteins on a large scale, generating compound-proteome interaction signatures used to infer the similarity of drug behavior; the new pipelines accomplish this by incorporating molecular fingerprints and the Tanimoto coefficient. We obtain improved benchmarking performance with the new pipelines across all three evaluation metrics used: average indication accuracy, pairwise accuracy, and coverage. The best performing pipeline achieves an average indication accuracy of 19.0% at the top10 cutoff, compared to 11.7% for v1, and 2.2% for a random control. Our results demonstrate that the CANDO drug recovery accuracy is substantially improved by integrating multiple pipelines, thereby enhancing our ability to generate putative therapeutic repurposing candidates, and increasing drug discovery efficiency.
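The Tanimoto coefficient named in this abstract compares two binary fingerprints as the number of shared on-bits divided by the number of on-bits present in either. The sketch below represents each fingerprint as a set of on-bit indices; the drug names and bit values are hypothetical, and this is not the CANDO implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library):
    """Rank library compounds by decreasing similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints, for illustration only.
library = {
    "drug_a": {1, 2, 3, 8},
    "drug_b": {2, 3, 4},
    "drug_c": {9, 10},
}
ranking = rank_by_similarity({1, 2, 3}, library)
print(ranking)
```

Ranking every compound against every other in this way gives a similarity list from which the top-N neighbours (for example the top 10 used in the benchmark) can be read off.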
Affiliation(s)
- James Schuler
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo, Buffalo, New York 14203, United States
| | - Ram Samudrala
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo, Buffalo, New York 14203, United States
| |
|
232
|
Andrade CH, Neves BJ, Melo-Filho CC, Rodrigues J, Silva DC, Braga RC, Cravo PVL. In Silico Chemogenomics Drug Repositioning Strategies for Neglected Tropical Diseases. Curr Med Chem 2019. [DOI: 10.2174/0929867325666180309114824] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Only ~1% of all drug candidates against Neglected Tropical Diseases (NTDs) have reached clinical trials in the last decades, underscoring the need for new, safe and effective treatments. In this context, drug repositioning, which allows finding novel indications for approved drugs whose pharmacokinetic and safety profiles are already known, is emerging as a promising strategy for tackling NTDs. Chemogenomics is a direct descendant of the typical drug discovery process that involves the systematic screening of chemical compounds against drug targets in high-throughput screening (HTS) efforts, for the identification of lead compounds. However, in contrast to the one-drug-one-target paradigm, chemogenomics attempts to identify all potential ligands for all possible targets and diseases. In this review, we summarize current methodological development efforts in drug repositioning that use state-of-the-art computational ligand- and structure-based chemogenomics approaches. Furthermore, we highlight the recent progress in computational drug repositioning for some NTDs, based on curation and modeling of genomic, biological, and chemical data. Additionally, we present in-house and other successful examples and suggest possible solutions to existing pitfalls.
Affiliation(s)
- Carolina Horta Andrade
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Bruno Junior Neves
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Cleber Camilo Melo-Filho
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Juliana Rodrigues
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Diego Cabral Silva
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Rodolpho Campos Braga
- LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
| | - Pedro Vitor Lemos Cravo
- Laboratory of Cheminformatics, Centro Universitario de Anapolis (UniEVANGELICA), Anapolis, GO, 75083-515, Brazil
| |
|
233
|
Compound optimization monitor (COMO) method for computational evaluation of progress in medicinal chemistry projects. FUTURE DRUG DISCOVERY 2019. [DOI: 10.4155/fdd-2019-0016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Aim: Development of a new, practically applicable computational method to monitor progress in lead optimization. Computational approaches that aid in compound optimization are discussed and the Compound Optimization Monitor (COMO) method is introduced and put into scientific context. Methodology & calculations: The methodological concept and the COMO scoring scheme are described in detail. Results & discussions: Calculation parameters are evaluated, and profiling results reported for an ensemble of analog series. Future perspective: The dual role of virtual analogs as diagnostic tools for progress evaluation and as potential candidates for lead optimization is discussed. In light of this dual role, interfacing COMO with machine learning for compound activity prediction and prioritization of candidates is highlighted as a future research objective.
|
234
|
Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov Today 2019; 24:2017-2032. [DOI: 10.1016/j.drudis.2019.07.006] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2019] [Revised: 06/11/2019] [Accepted: 07/18/2019] [Indexed: 12/27/2022]
|
235
|
Zheng L, Fan J, Mu Y. OnionNet: a Multiple-Layer Intermolecular-Contact-Based Convolutional Neural Network for Protein-Ligand Binding Affinity Prediction. ACS OMEGA 2019; 4:15956-15965. [PMID: 31592466 PMCID: PMC6776976 DOI: 10.1021/acsomega.9b01997] [Citation(s) in RCA: 146] [Impact Index Per Article: 29.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2019] [Accepted: 09/06/2019] [Indexed: 05/12/2023]
Abstract
Computational drug discovery provides an efficient tool for helping large-scale lead molecule screening. One of the major tasks of lead discovery is identifying molecules with promising binding affinities toward a target, a protein in general. The accuracies of current scoring functions that are used to predict the binding affinity are not satisfactory enough. Thus, machine learning or deep learning based methods have been developed recently to improve the scoring functions. In this study, a deep convolutional neural network model (called OnionNet) is introduced; its features are based on rotation-free element-pair-specific contacts between ligands and protein atoms, and the contacts are further grouped into different distance ranges to cover both the local and nonlocal interaction information between the ligand and the protein. The prediction power of the model is evaluated and compared with other scoring functions using the comparative assessment of scoring functions (CASF-2013) benchmark and the v2016 core set of the PDBbind database. The robustness of the model is further explored by predicting the binding affinities of the complexes generated from docking simulations instead of experimentally determined PDB structures.
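The featurization described in this abstract (element-pair-specific contacts grouped into distance ranges) can be sketched as a simple counting procedure. The shell boundaries, coordinates, and function name below are hypothetical; the published model defines its own shells and element types, which are not given in the abstract.

```python
import math
from collections import Counter


def shell_contact_counts(ligand_atoms, protein_atoms, shell_edges):
    """Count ligand-protein atom pairs per (shell, ligand element, protein element).

    ligand_atoms / protein_atoms: lists of (element, (x, y, z)) tuples.
    shell_edges: ascending distance boundaries; shell i covers
    shell_edges[i] <= d < shell_edges[i + 1].
    """
    counts = Counter()
    for lig_elem, lig_xyz in ligand_atoms:
        for prot_elem, prot_xyz in protein_atoms:
            d = math.dist(lig_xyz, prot_xyz)
            for i in range(len(shell_edges) - 1):
                if shell_edges[i] <= d < shell_edges[i + 1]:
                    counts[(i, lig_elem, prot_elem)] += 1
                    break  # each pair falls into at most one shell
    return counts


# Hypothetical coordinates (in angstroms) and shell boundaries.
ligand = [("C", (0.0, 0.0, 0.0))]
protein = [("N", (0.0, 0.0, 1.5)), ("O", (0.0, 0.0, 3.5)), ("C", (0.0, 0.0, 10.0))]
features = shell_contact_counts(ligand, protein, shell_edges=[0.0, 2.0, 4.0])
print(dict(features))
```

Because the counts depend only on interatomic distances, they do not change when the complex is rotated or translated, which matches the "rotation-free" property the abstract mentions.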
Affiliation(s)
- Liangzhen Zheng
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
| | - Jingrong Fan
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
| |
|
236
|
Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem 2019; 63:8761-8777. [PMID: 31512867 DOI: 10.1021/acs.jmedchem.9b01101] [Citation(s) in RCA: 139] [Impact Index Per Article: 27.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
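The Shapley value underlying SHAP assigns each feature its average marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values by brute-force enumeration for a toy value function; it is not the SHAP library and not the authors' procedure, and it is only practical for a handful of features. The toy model and its coefficients are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all feature orderings.

    value_fn: maps a frozenset of present feature indices to a model output.
    """
    contributions = [0.0] * n_features
    for order in permutations(range(n_features)):
        present = set()
        previous = value_fn(frozenset(present))
        for feature in order:
            present.add(feature)
            current = value_fn(frozenset(present))
            contributions[feature] += current - previous
            previous = current
    n_orders = factorial(n_features)
    return [c / n_orders for c in contributions]


def toy_activity(present):
    """Hypothetical predicted probability of activity given present substructures."""
    score = 0.1
    if 0 in present:
        score += 0.5
    if 1 in present:
        score += 0.2
    if 0 in present and 1 in present:
        score += 0.1  # interaction term, split equally between features 0 and 1
    return score


phi = shapley_values(toy_activity, n_features=3)
print(phi)
```

The values sum to the difference between the prediction with all features present and the prediction with none, which is the additivity property that makes the attributions easy to map back onto a compound.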
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.,Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riß, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
| |
|
237
|
Martin EJ, Polyakov VR, Zhu XW, Tian L, Mukherjee P, Liu X. All-Assay-Max2 pQSAR: Activity Predictions as Accurate as Four-Concentration IC50s for 8558 Novartis Assays. J Chem Inf Model 2019; 59:4450-4459. [PMID: 31518124 DOI: 10.1021/acs.jcim.9b00375] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
Profile-quantitative structure-activity relationship (pQSAR) is a massively multitask, two-step machine learning method with unprecedented scope, accuracy, and applicability domain. In step one, a "profile" of conventional single-assay random forest regression models are trained on a very large number of biochemical and cellular pIC50 assays using Morgan 2 substructural fingerprints as compound descriptors. In step two, a panel of partial least squares (PLS) models are built using the profile of pIC50 predictions from those random forest regression models as compound descriptors (hence the name). Previously described for a panel of 728 biochemical and cellular kinase assays, we have now built an enormous pQSAR from 11 805 diverse Novartis (NVS) IC50 and EC50 assays. This large number of assays, and hence of compound descriptors for PLS, dictated reducing the profile by only including random forest regression models whose predictions correlate with the assay being modeled. The random forest regression and pQSAR models were evaluated with our "realistically novel" held-out test set, whose median average similarity to the nearest training set member across the 11 805 assays was only 0.34, comparable to the novelty of compounds actually selected from virtual screens. For the 11 805 single-assay random forest regression models, the median correlation of prediction with the experiment was only r²ext = 0.05, virtually random, and only 8% of the models achieved our standard success threshold of r²ext = 0.30. For pQSAR, the median correlation was r²ext = 0.53, comparable to four-concentration experimental IC50s, and 72% of the models met our r²ext > 0.30 standard, totaling 8558 successful models. The successful models included assays from all of the 51 annotated target subclasses, as well as 4196 phenotypic assays, indicating that pQSAR can be applied to virtually any disease area.
Every month, all models are updated to include new measurements, and predictions are made for 5.5 million NVS compounds, totaling 50 billion predictions. Common uses have included virtual screening, selectivity design, toxicity and promiscuity prediction, mechanism-of-action prediction, and others. Several such actual applications are described.
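The success criterion quoted in this abstract can be sketched as follows. This is a minimal illustration only: it assumes that the external r² is the squared Pearson correlation between held-out predictions and measurements, which the abstract suggests but does not define, and the function names `r_squared` and `successful_fraction` are hypothetical.

```python
def r_squared(predicted, measured):
    """Squared Pearson correlation between predictions and measurements.

    Assumption: this stands in for the external r^2 named in the abstract.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        return 0.0  # a constant series carries no correlation information
    return cov * cov / (var_p * var_m)


def successful_fraction(per_assay_results, threshold=0.30):
    """Fraction of assays whose model exceeds the r^2 threshold.

    per_assay_results: list of (predicted, measured) pairs, one per assay.
    """
    scores = [r_squared(pred, meas) for pred, meas in per_assay_results]
    passed = sum(score > threshold for score in scores)
    return passed / len(scores)
```

As a consistency check on the reported figures, 8558 successful models out of 11 805 assays is about 72%, matching the percentage stated in the abstract.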
Affiliation(s)
- Eric J Martin
- Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
| | - Valery R Polyakov
- Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
| | - Xiang-Wei Zhu
- Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
| | - Li Tian
- Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- China Novartis Institutes for BioMedical Research Company, Limited, 2F, Building 4, Novartis Campus, No. 4218 Jinke Road, Zhangjiang, Pudong, Shanghai 201203, China
| | - Prasenjit Mukherjee
- Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
| | - Xin Liu
- Novartis Institute for Biomedical Research, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- China Novartis Institutes for BioMedical Research Company, Limited, 2F, Building 4, Novartis Campus, No. 4218 Jinke Road, Zhangjiang, Pudong, Shanghai 201203, China
|
238
|
Naveja JJ, Pilón-Jiménez BA, Bajorath J, Medina-Franco JL. A general approach for retrosynthetic molecular core analysis. J Cheminform 2019; 11:61. [PMID: 33430974 PMCID: PMC6760108 DOI: 10.1186/s13321-019-0380-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/04/2019] [Accepted: 08/04/2019] [Indexed: 11/13/2022] Open
Abstract
Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for chemical space and structure–activity relationships analysis. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension to ASBS, we herein introduce the development of a general conceptual framework that considers all putative cores of molecules in a compound data set, thus softening the often applied “single molecule–single scaffold” correspondence. A putative core is here defined as any substructure of a molecule complying with two basic rules: (a) the size of the core is a significant proportion of the whole molecule size and (b) the substructure can be reached from the original molecule through a succession of retrosynthesis rules. Thereafter, a bipartite network consisting of molecules and cores can be constructed for a database of chemical structures. Compounds linked to the same cores are considered analogs. We present case studies illustrating the potential of the general framework. The applications range from inter- and intra-core diversity analysis of compound data sets, structure–property relationships, and identification of analog series and ASBS. The molecule–core network herein presented is a general methodology with multiple applications in scaffold analysis. New statistical methods are envisioned that will be able to draw quantitative conclusions from these data. The code to use the method presented in this work is freely available as an additional file. Follow-up applications include analog searching and core structure–property relationships analyses.
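The molecule–core bipartite network described in this abstract can be illustrated with a small sketch. It is not the authors' released code: it assumes the putative cores of each molecule have already been enumerated by some retrosynthetic fragmentation step, and the identifiers (`mol_A`, `core_1`) and function names are hypothetical.

```python
from collections import defaultdict


def build_core_network(molecule_cores):
    """Index the bipartite network from the core side.

    molecule_cores: dict mapping a molecule ID to the set of its putative cores.
    Returns a dict mapping each core to the set of molecules linked to it.
    """
    core_to_molecules = defaultdict(set)
    for molecule, cores in molecule_cores.items():
        for core in cores:
            core_to_molecules[core].add(molecule)
    return dict(core_to_molecules)


def analog_pairs(core_to_molecules):
    """Return every pair of molecules that share at least one core."""
    pairs = set()
    for molecules in core_to_molecules.values():
        ordered = sorted(molecules)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs
```

Because a molecule may be linked to several cores, one compound can belong to more than one analog group, which is the relaxation of the one-molecule-one-scaffold correspondence that the abstract describes.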
Affiliation(s)
- J Jesús Naveja
- PECEM, School of Medicine, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
| | - B Angélica Pilón-Jiménez
- Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
| | - José L Medina-Franco
- Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico.
|
239
|
Molecular Docking: Shifting Paradigms in Drug Discovery. Int J Mol Sci 2019; 20:ijms20184331. [PMID: 31487867 PMCID: PMC6769923 DOI: 10.3390/ijms20184331] [Citation(s) in RCA: 835] [Impact Index Per Article: 167.0] [Received: 08/06/2019] [Revised: 09/02/2019] [Accepted: 09/02/2019] [Indexed: 12/11/2022] Open
Abstract
Molecular docking is an established in silico structure-based method widely used in drug discovery. Docking enables the identification of novel compounds of therapeutic interest, the prediction of ligand-target interactions at a molecular level, and the delineation of structure-activity relationships (SAR), without knowing a priori the chemical structure of other target modulators. Although it was originally developed to help understand the mechanisms of molecular recognition between small and large molecules, the uses and applications of docking in drug discovery have changed considerably in recent years. In this review, we describe how molecular docking was first applied to assist in drug discovery tasks. Then, we illustrate newer and emergent uses and applications of docking, including prediction of adverse effects, polypharmacology, drug repurposing, and target fishing and profiling, and also discuss future applications and the further potential of this technique when combined with emergent techniques such as artificial intelligence.
|
240
|
Onay A, Onay M. A Drug Decision Support System for Developing a Successful Drug Candidate Using Machine Learning Techniques. Curr Comput Aided Drug Des 2019; 16:407-419. [PMID: 31438830 DOI: 10.2174/1573409915666190716143601] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/02/2019] [Revised: 04/24/2019] [Accepted: 05/06/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Virtual screening of candidate drug molecules using machine learning techniques plays a key role in the pharmaceutical industry in the design and discovery of new drugs. Computational classification methods can determine drug types according to the disease groups and distinguish approved drugs from withdrawn ones. INTRODUCTION Classification models developed in this study can be used as a simple filter in drug modelling to eliminate potentially inappropriate molecules in the early stages. In this work, we developed a Drug Decision Support System (DDSS) to classify each drug candidate molecule as potentially drug or non-drug and to predict its disease group. METHODS Molecular descriptors were identified for the determination of a number of rules in drug molecules. They were derived using the ADRIANA.Code program and Lipinski's rule of five. We used an Artificial Neural Network (ANN) to classify drug molecules according to the types of diseases. Closed frequent molecular structures in the form of subgraph fragments were also obtained with the Gaston algorithm included in the ParMol package to find common molecular fragments for withdrawn drugs. RESULTS We observed that TPSA, XlogP, Natoms and HDon_O are the most distinctive features in the pool of molecular descriptors. We evaluated the performance of the classifiers on all datasets and found that classification accuracies are very high on all of them. Neural network models achieved 84.6% and 83.3% accuracies on test sets including cardiac therapy, anti-epileptics and anti-parkinson drugs with approved and withdrawn drugs for drug classification problems. CONCLUSION The experimental evaluation shows that the system is promising for identifying potential drug molecules and classifying them correctly according to the types of diseases.
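Lipinski's rule of five, mentioned in the methods above, is commonly applied as an early filter of the kind the abstract describes. The sketch below uses the widely cited thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). It assumes the descriptor values are already computed; the dictionary keys, function names, and the default tolerance of one violation are illustrative assumptions, not details taken from the study.

```python
def rule_of_five_violations(descriptors):
    """Count how many of Lipinski's four criteria a molecule violates.

    descriptors: dict with precomputed values under the hypothetical keys
    'mol_weight', 'logp', 'h_bond_donors', 'h_bond_acceptors'.
    """
    checks = [
        descriptors["mol_weight"] > 500,       # molecular weight in daltons
        descriptors["logp"] > 5,               # octanol-water partition coefficient
        descriptors["h_bond_donors"] > 5,      # hydrogen-bond donors
        descriptors["h_bond_acceptors"] > 10,  # hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_filter(descriptors, max_violations=1):
    """Keep a candidate if it violates at most `max_violations` criteria."""
    return rule_of_five_violations(descriptors) <= max_violations
```

A filter like this only screens out unlikely candidates early; the classification into drug or non-drug and disease group in the study is performed by the neural network models, not by the rule itself.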
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, Faculty of Engineering & Architecture, Kafkas University, Kars, 36100, Turkey
| | - Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Faculty of Engineering, Van Yuzuncu Yil University, 65100, Van, Turkey
|
241
|
de Almeida AF, Moreira R, Rodrigues T. Synthetic organic chemistry driven by artificial intelligence. Nat Rev Chem 2019. [DOI: 10.1038/s41570-019-0124-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Indexed: 12/14/2022]
|
242
|
Hua D, Patabandige MW, Go EP, Desaire H. The Aristotle Classifier: Using the Whole Glycomic Profile To Indicate a Disease State. Anal Chem 2019; 91:11070-11077. [PMID: 31407893 DOI: 10.1021/acs.analchem.9b01606] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/15/2022]
Abstract
"The totality is not, as it were, a mere heap, but the whole is something besides the parts." (Aristotle). We built a classifier that uses the totality of the glycomic profile, not restricted to a few glycoforms, to differentiate samples from two different sources. This approach, which relies on using thousands of features, is a radical departure from current strategies, where most of the glycomic profile is ignored in favor of selecting a few features, or even a single feature, meant to capture the differences in sample types. The classifier can be used to differentiate the source of the material; applicable sources may be different species of animals, different protein production methods, or, most importantly, different biological states (disease vs healthy). The classifier can be used on glycomic data in any form, including derivatized monosaccharides, intact glycans, or glycopeptides. It takes advantage of the fact that changing the source material can cause a change in the glycomic profile in many subtle ways: some glycoforms can be upregulated, some downregulated, some may appear unchanged, yet their proportion, with respect to other forms present, can be altered to a detectable degree. By classifying samples using the entirety of their glycan abundances, along with the glycans' relative proportions to each other, the "Aristotle Classifier" is more effective at capturing the underlying trends than standard classification procedures used in glycomics, including PCA (principal components analysis). It also outperforms workflows where a single, representative glycomic-based biomarker is used to classify samples. We describe the Aristotle Classifier and provide several examples of its utility for biomarker studies and other classification problems using glycomic data from several sources.
Affiliation(s)
- David Hua
- Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
| | | | - Eden P Go
- Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
| | - Heather Desaire
- Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
|
243
|
Alazmi M, Kuwahara H, Soufan O, Ding L, Gao X. Systematic selection of chemical fingerprint features improves the Gibbs energy prediction of biochemical reactions. Bioinformatics 2019; 35:2634-2643. [PMID: 30590445 PMCID: PMC6662295 DOI: 10.1093/bioinformatics/bty1035] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/28/2018] [Revised: 09/26/2018] [Accepted: 12/19/2018] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION Accurate and wide-ranging prediction of thermodynamic parameters for biochemical reactions can facilitate deeper insights into the workings and the design of metabolic systems. RESULTS Here, we introduce a machine learning method with chemical fingerprint-based features for the prediction of the Gibbs free energy of biochemical reactions. From a large pool of 2D fingerprint-based features, this method systematically selects a small number of relevant ones and uses them to construct a regularized linear model. Since a manual selection of 2D structure-based features can be a tedious and time-consuming task, requiring expert knowledge about the structure-activity relationship of chemical compounds, the systematic feature selection step in our method offers a convenient means to identify relevant 2D fingerprint-based features. By comparing our method with state-of-the-art linear regression-based methods for the standard Gibbs free energy prediction, we demonstrated that its prediction accuracy and prediction coverage are most favorable. Our results show direct evidence that a number of 2D fingerprints collectively provide useful information about the Gibbs free energy of biochemical reactions and that our systematic feature selection procedure provides a convenient way to identify them. AVAILABILITY AND IMPLEMENTATION Our software is freely available for download at http://sfb.kaust.edu.sa/Pages/Software.aspx. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meshari Alazmi
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia
| | - Hiroyuki Kuwahara
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia
| | - Othman Soufan
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada
| | - Lizhong Ding
- Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE
| | - Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia
|
244
|
Torres MD, Sothiselvam S, Lu TK, de la Fuente-Nunez C. Peptide Design Principles for Antimicrobial Applications. J Mol Biol 2019; 431:3547-3567. [DOI: 10.1016/j.jmb.2018.12.015] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Received: 03/16/2018] [Revised: 12/19/2018] [Accepted: 12/22/2018] [Indexed: 02/08/2023]
|
245
|
Liu P, Li H, Li S, Leung KS. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics 2019; 20:408. [PMID: 31357929 PMCID: PMC6664725 DOI: 10.1186/s12859-019-2910-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 10/13/2018] [Accepted: 05/21/2019] [Indexed: 12/11/2022] Open
Abstract
Background Understanding the phenotypic drug response on cancer cell lines plays a vital role in anti-cancer drug discovery and re-purposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to build and test their models. Previously, most research in these areas started from the molecular fingerprints or physicochemical features of drugs, instead of their structures. Results In this paper, a model called twin Convolutional Neural Network for drugs in SMILES format (tCNNS) is introduced for phenotypic screening. tCNNS uses one convolutional network to extract features for drugs from their simplified molecular input line entry specification (SMILES) format and another convolutional network to extract features for cancer cell lines from the genetic feature vectors. After that, a fully connected network is used to predict the interaction between the drugs and the cancer cell lines. When the training set and the testing set are divided based on the interaction pairs between drugs and cell lines, tCNNS achieves 0.826 and 0.831 for the mean and top quartile of the coefficient of determination (R2), respectively, and 0.909 and 0.912 for the mean and top quartile of the Pearson correlation (Rp), respectively, which are significantly better than those of the previous works (Ammad-Ud-Din et al., J Chem Inf Model 54:2347–9, 2014), (Haider et al., PLoS ONE 10:0144490, 2015), (Menden et al., PLoS ONE 8:61318, 2013). However, when the training set and the testing set are divided exclusively based on drugs or cell lines, the performance of tCNNS decreases significantly and Rp and R2 drop to barely above 0. Conclusions Our approach is able to predict the drug effects on cancer cell lines with high accuracy, and its performance remains stable with less but high-quality data, and with fewer features for the cancer cell lines. tCNNS can also solve the problem of outliers in other feature space. Besides achieving high scores in these statistical metrics, tCNNS also provides some insights into the phenotypic screening. However, the performance of tCNNS drops in the blind test. Electronic supplementary material The online version of this article (10.1186/s12859-019-2910-6) contains supplementary material, which is available to authorized users.
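The gap between the two evaluation settings in this abstract comes from how the drug and cell-line pairs are split. The sketch below contrasts a split over interaction pairs with a drug-exclusive ("blind") split. It is an illustration under stated assumptions, not the authors' protocol: the function names, the 20% test fraction, and the seed are all hypothetical choices.

```python
import random


def split_by_pairs(pairs, test_fraction=0.2, seed=0):
    """Randomly split (drug, cell_line) pairs; a drug may appear on both sides."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_drugs(pairs, test_fraction=0.2, seed=0):
    """Hold out whole drugs, so no test drug is ever seen during training."""
    rng = random.Random(seed)
    drugs = sorted({drug for drug, _ in pairs})
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    held_out = set(drugs[:n_test])
    train = [pair for pair in pairs if pair[0] not in held_out]
    test = [pair for pair in pairs if pair[0] in held_out]
    return train, test
```

Under the pair-based split, a test drug has usually been seen in training with other cell lines, so the model can partly interpolate. Under the drug-exclusive split it cannot, which is consistent with the drop in R2 and Rp reported for the blind test.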
Affiliation(s)
- Pengfei Liu
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China.
| | - Hongjian Li
- SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, N.T., Hong Kong, China
- CUHK-SDU Reproductive Genetics Joint Laboratory, School of Biomedical Sciences, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
| | - Shuai Li
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, the Chinese University of Hong Kong, Sha Tin, N.T., Hong Kong, China
|
246
|
Wei Y, Li W, Du T, Hong Z, Lin J. Targeting HIV/HCV Coinfection Using a Machine Learning-Based Multiple Quantitative Structure-Activity Relationships (Multiple QSAR) Method. Int J Mol Sci 2019; 20:ijms20143572. [PMID: 31336592 PMCID: PMC6678913 DOI: 10.3390/ijms20143572] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/02/2019] [Revised: 07/13/2019] [Accepted: 07/21/2019] [Indexed: 12/11/2022] Open
Abstract
Human immunodeficiency virus type-1 and hepatitis C virus (HIV/HCV) coinfection occurs when a patient is simultaneously infected with both human immunodeficiency virus type-1 (HIV-1) and hepatitis C virus (HCV), which is common today in certain populations. However, the treatment of coinfection is a challenge because of the special considerations needed to ensure hepatic safety and avoid drug–drug interactions. Multitarget inhibitors with less toxicity may provide a promising therapeutic strategy for HIV/HCV coinfection. However, the identification of one molecule that acts on multiple targets simultaneously by experimental evaluation is costly and time-consuming. In silico target prediction tools provide more opportunities for the development of multitarget inhibitors. In this study, by combining Naïve Bayes (NB) and support vector machine (SVM) algorithms with two types of molecular fingerprints, MACCS and extended connectivity fingerprints 6 (ECFP6), 60 classification models were constructed to predict compounds that were active against 11 HIV-1 targets and four HCV targets based on a multiple quantitative structure–activity relationships (multiple QSAR) method. Five-fold cross-validation and test set validation were performed to measure the performance of the 60 classification models. Our results show that the 60 multiple QSAR models appeared to have high classification accuracy in terms of the area under the ROC curve (AUC) values, which ranged from 0.83 to 1 with a mean value of 0.97 for the HIV-1 models and from 0.84 to 1 with a mean value of 0.96 for the HCV models. Furthermore, the 60 models were used to comprehensively predict the potential targets of an additional 46 compounds, including 27 approved HIV-1 drugs, 10 approved HCV drugs and nine selected compounds known to be active against one or more targets of HIV-1 or HCV. 
Finally, 20 hits, including seven approved HIV-1 drugs, four approved HCV drugs, and nine other compounds, were predicted to be HIV/HCV coinfection multitarget inhibitors. The reported bioactivity data confirmed that seven out of nine compounds actually interacted with HIV-1 and HCV targets simultaneously with diverse binding affinities. The remaining predicted hits and chemical-protein interaction pairs with the potential ability to suppress HIV/HCV coinfection are worthy of further experimental investigation. This investigation shows that the multiple QSAR method is useful in predicting chemical-protein interactions for the discovery of multitarget inhibitors and provides a unique strategy for the treatment of HIV/HCV coinfection.
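The count of 60 classification models in this abstract is consistent with one model per combination of target, algorithm, and fingerprint: (11 HIV-1 targets + 4 HCV targets) × 2 algorithms × 2 fingerprints = 60. The short sketch below enumerates that grid. The target names are placeholders, since the abstract does not list them, and the reading that every combination yields exactly one model is an inference from the numbers rather than a stated fact.

```python
from itertools import product

# Placeholder target labels; the abstract gives only the counts (11 and 4).
HIV1_TARGETS = [f"HIV1_target_{i}" for i in range(1, 12)]
HCV_TARGETS = [f"HCV_target_{i}" for i in range(1, 5)]
ALGORITHMS = ["NB", "SVM"]          # Naive Bayes, support vector machine
FINGERPRINTS = ["MACCS", "ECFP6"]   # the two fingerprint types named above


def enumerate_models():
    """List every (target, algorithm, fingerprint) combination."""
    targets = HIV1_TARGETS + HCV_TARGETS
    return list(product(targets, ALGORITHMS, FINGERPRINTS))
```

Each target therefore gets four models, which allows predictions for a compound to be compared across algorithms and fingerprints before it is called a hit.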
Affiliation(s)
- Yu Wei
- State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
| | - Wei Li
- State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
- Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300000, China
| | - Tengfei Du
- State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
| | - Zhangyong Hong
- State Key Laboratory of Medicinal Chemical Biology, College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin 300071, China.
| | - Jianping Lin
- State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China.
- Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300000, China.
- Biodesign Center, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
|
247
|
Ståhl N, Falkman G, Karlsson A, Mathiason G, Boström J. Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design. J Chem Inf Model 2019; 59:3166-3176. [PMID: 31273995 DOI: 10.1021/acs.jcim.9b00325] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Indexed: 12/31/2022]
Abstract
In medicinal chemistry programs it is key to design and make compounds that are efficacious and safe. This is a long, complex, and difficult multiparameter optimization process, often including several properties with orthogonal trends. New methods for the automated design of compounds against profiles of multiple properties are thus of great value. Here we present a fragment-based reinforcement learning approach based on an actor-critic model, for the generation of novel molecules with optimal properties. The actor and the critic are both modeled with bidirectional long short-term memory (LSTM) networks. The AI method learns how to generate new compounds with desired properties by starting from an initial set of lead molecules and then improving these by replacing some of their fragments. A balanced binary tree based on the similarity of fragments is used in the generative process to bias the output toward structurally similar molecules. The method is demonstrated by a case study showing that 93% of the generated molecules are chemically valid and more than a third satisfy the targeted objectives, while there were none in the initial set.
Affiliation(s)
- Niclas Ståhl
- School of Informatics, University of Skövde, 541 28 Skövde, Sweden
| | - Göran Falkman
- School of Informatics, University of Skövde, 541 28 Skövde, Sweden
| | | | - Gunnar Mathiason
- School of Informatics, University of Skövde, 541 28 Skövde, Sweden
| | - Jonas Boström
- Medicinal Chemistry, Early Cardiovascular, Renal and Metabolism, R&D BioPharmaceuticals, AstraZeneca, 431 83 Mölndal, Sweden
|
248
|
Meng HY, Jin WL, Yan CK, Yang H. The Application of Machine Learning Techniques in Clinical Drug Therapy. Curr Comput Aided Drug Des 2019; 15:111-119. [PMID: 29804538 DOI: 10.2174/1573409914666180525124608] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/17/2018] [Revised: 05/15/2018] [Accepted: 05/22/2018] [Indexed: 12/19/2022]
Abstract
INTRODUCTION The development of a novel drug is an extremely complicated process that includes target identification, design and manufacture, and selection of the proper therapy, as well as drug dose selection, drug efficacy evaluation, and adverse drug reaction control. Because drug development suffers from limited resources, high costs, long duration, and a low hit-to-lead ratio, and with the development of pharmacogenetics and computer technology, machine learning techniques have assisted novel drug development and have gradually received more attention from researchers. METHODS According to current research, machine learning techniques are widely applied in the discovery of new drugs and novel drug targets, decisions about proper therapy and drug dose, and the prediction of drug efficacy and adverse drug reactions. RESULTS AND CONCLUSION In this article, we discuss the history, workflow, and advantages and disadvantages of machine learning techniques in the processes mentioned above. Although the advantages of machine learning techniques are fairly obvious, their application is currently limited. With further research, the application of machine learning techniques in drug development could become much more widespread and could potentially be one of the major methods used in drug development.
Affiliation(s)
- Huan-Yu Meng
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
| | - Wan-Lin Jin
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
| | - Cheng-Kai Yan
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
| | - Huan Yang
- Department of Neurology, Xiangya Hospital of Central South University, Changsha, China
|
249
|
Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. Wiley Interdiscip Rev Comput Mol Sci 2019. [DOI: 10.1002/wcms.1429] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/18/2022]
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Junjie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University Changsha P. R. China
| | - Xiaoqin Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
|
250
|
Ai S, Lin G, Bai Y, Liu X, Piao L. QSAR Classification-Based Virtual Screening Followed by Molecular Docking Identification of Potential COX-2 Inhibitors in a Natural Product Library. J Comput Biol 2019; 26:1296-1315. [PMID: 31233340 DOI: 10.1089/cmb.2019.0142] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
The development of natural inhibitors that block the function of cyclooxygenase-2 (COX-2) protein, which is responsible for a variety of inflammations and cancers, is a major challenge in the scientific community. In this study, robust QSAR classification models for predicting COX-2 inhibitors were developed, in which a self-organizing feature map neural network and random forest (RF) were adopted to improve the predictive ability of the classification model. An F-score-based criterion combined with RF was used for feature selection, and good overall accuracy for COX-2 inhibitor prediction was demonstrated. We used this model as a virtual screening tool for identifying potential COX-2 inhibitors from a natural product library and found potential hit compounds. Further screening of these compounds by molecular docking simulation identified five potential hits, osthole, kavain, vanillyl acetone, myristicin, and psoralen, with comparable binding affinity to COX-2 protein. In cell experiments, however, only three of the hit compounds, osthole, kavain, and psoralen, showed COX-2 inhibitory activity at the mRNA and protein level.
Affiliation(s)
- Shangjie Ai
- School of Informatic Engineering Science, Hainan University, Haikou, China
| | - Guanfei Lin
- School of Life and Pharmaceutical Science, Hainan University, Haikou, China
| | - Yong Bai
- School of Informatic Engineering Science, Hainan University, Haikou, China
| | - Xiande Liu
- School of Life and Pharmaceutical Science, Hainan University, Haikou, China
| | - Linghua Piao
- Department of Physiology, Hainan Medical University, Haikou, China
|