1
Jin JX, Ren GP, Hu J, Liu Y, Gao Y, Wu KJ, He Y. Force field-inspired transformer network assisted crystal density prediction for energetic materials. J Cheminform 2023; 15:65. [PMID: 37468954 DOI: 10.1186/s13321-023-00736-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 07/12/2023] [Indexed: 07/21/2023] Open
Abstract
Machine learning has great potential in predicting chemical information with greater precision than traditional methods. Graph neural networks (GNNs) have become increasingly popular in recent years, as they can automatically learn the features of a molecule from its graph, significantly reducing the time needed to find and build molecular descriptors. However, the application of machine learning to property prediction for energetic materials is still at an early stage because of insufficient data. In this work, we first curated a dataset of 12,072 compounds containing CHON elements, which are traditionally regarded as the main constituent elements of energetic materials, from the Cambridge Structural Database. We then refined our force field-inspired neural network (FFiNet) by adopting a Transformer encoder, resulting in the force field-inspired Transformer network (FFiTrNet). After this improvement, our model outperforms other machine learning-based and GNN-based models and shows strong predictive capability, especially for high-density materials. Our model also proves capable of predicting the crystal density of a dataset of potential energetic materials (i.e., the Huang & Massa dataset), which will be helpful in practical high-throughput screening of energetic materials.
Affiliation(s)
- Jun-Xuan Jin
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou, 324000, China
- Gao-Peng Ren
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou, 324000, China
- Jianjian Hu
- Xi'an Modern Chemistry Research Institute, Xi'an, 710065, China
- Yingzhe Liu
- Xi'an Modern Chemistry Research Institute, Xi'an, 710065, China
- Yunhu Gao
- Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK
- Ke-Jun Wu
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Institute of Zhejiang University-Quzhou, Quzhou, 324000, China
- Yuchen He
- State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, China
2
Huang B, Fong LWR, Chaudhari R, Zhang S. Development and evaluation of a java-based deep neural network method for drug response predictions. Front Artif Intell 2023; 6:1069353. [PMID: 37035534 PMCID: PMC10076891 DOI: 10.3389/frai.2023.1069353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 03/03/2023] [Indexed: 04/11/2023] Open
Abstract
Accurate prediction of drug response is a crucial step in personalized medicine. Recently, deep learning techniques have achieved significant breakthroughs in a variety of areas, including biomedical research and chemogenomic applications. This motivated us to develop a novel deep learning platform to accurately and reliably predict the response of cancer cells to different drug treatments. In the present work, we describe a Java-based implementation of a deep neural network method, termed JavaDL, to predict cancer responses to drugs based solely on their chemical features. To this end, we devised a novel cost function and added a regularization term that suppresses overfitting. We also adopted an early stopping strategy to further reduce overfitting and improve the accuracy and robustness of our models. To evaluate our method, we compared it with several popular machine learning and deep neural network programs and observed that JavaDL either outperformed those methods in model building or obtained comparable predictions. Finally, JavaDL was employed to predict the drug responses of several aggressive breast cancer cell lines, and the results showed robust and accurate predictions with r² as high as 0.81.
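As a rough illustration of the early stopping strategy this abstract mentions, the following Python sketch picks the epoch to keep from a sequence of validation losses. It is not taken from JavaDL: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the example losses are all assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch whose weights would be kept.

    Hypothetical helper: training is treated as stopped once the validation
    loss has failed to improve by more than `min_delta` for `patience`
    consecutive epochs. Returns -1 if no losses are given.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch and reset the counter.
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop training; later epochs are never evaluated
    return best_epoch


if __name__ == "__main__":
    # Made-up validation losses, for illustration only.
    losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.5]
    print(early_stopping_epoch(losses, patience=3))
```

A small `patience` stops sooner but can miss a later improvement, as the last value in the made-up sequence shows; the right setting depends on how noisy the validation loss is.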
3
Biofilm-i: A Platform for Predicting Biofilm Inhibitors Using Quantitative Structure-Relationship (QSAR) Based Regression Models to Curb Antibiotic Resistance. Molecules 2022; 27:4861. [PMID: 35956807 PMCID: PMC9369795 DOI: 10.3390/molecules27154861] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/06/2022] [Revised: 07/16/2022] [Accepted: 07/17/2022] [Indexed: 11/19/2022]
Abstract
Antibiotic drug resistance has emerged as a major public health threat globally. One of the leading causes of drug resistance is the colonization of microorganisms in biofilm mode. Hence, there is an urgent need to design novel and highly effective biofilm inhibitors that can work either synergistically with antibiotics or individually. Therefore, we have developed a recursive regression-based platform “Biofilm-i” employing a quantitative structure–activity relationship approach for making generalized predictions, along with group and species-specific predictions of biofilm inhibition efficiency of chemical(s). The platform encompasses eight predictors, three analysis tools, and data visualization modules. The experimentally validated biofilm inhibitors for model development were retrieved from the “aBiofilm” resource and processed using a 10-fold cross-validation approach with the support vector machine and random forest machine learning techniques. The data were further sub-divided into training/testing and independent validation sets. On the training/testing data sets, the Pearson’s correlation coefficients for overall chemicals, Gram-positive bacteria, Gram-negative bacteria, fungus, Pseudomonas aeruginosa, Staphylococcus aureus, Candida albicans, and Escherichia coli were 0.60, 0.77, 0.62, 0.77, 0.73, 0.83, 0.70, and 0.71, respectively, via the support vector machine. Further, all the QSAR models performed equally well on the independent validation data sets. Additionally, we checked the performance of the random forest machine learning technique on the above datasets. The integrated analysis tools can convert a chemical structure into different formats, search for similar chemicals in the aBiofilm database, and design analogs. Moreover, the data visualization modules show the distribution of experimentally validated biofilm inhibitors according to their common scaffolds.
The Biofilm-i platform would be of immense help to researchers engaged in designing highly efficacious biofilm inhibitors for tackling the menace of antibiotic drug resistance.
4
In Silico Approaches for Some Sulfa Drugs as Eco-Friendly Corrosion Inhibitors of Iron in Aqueous Medium. Lubricants 2022. [DOI: 10.3390/lubricants10030043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
This paper addresses the prediction of the adsorption behavior and the inhibition capacity of non-toxic sulfonamide-based molecules, also called sulfa drugs, on the surface of mild steel. The electronic structure was investigated through quantum chemical calculations using density functional theory (DFT), and the direct interaction of the inhibitors with the iron (Fe) metal surface was predicted using multiple probability Monte Carlo (MC) simulations. Solubility and environmental toxicity were then examined using a chemical database modeling environment website. It was shown that the presence of substituents containing heteroatoms able to release electrons increased the electron density in the lowest unoccupied and highest occupied molecular orbitals (LUMO and HOMO), which allowed a good interaction between the inhibitors and the steel surface. High values of E_HOMO imply an ability to donate electrons, while low values of E_LUMO are related to the ability to accept electrons, thus allowing good adsorption of the inhibitor molecules on the steel surface. Molecular dynamics simulations revealed that all sulfonamide molecules adsorb flat on the metal surface, conforming to the highly protective Fe(110) surface. The results obtained from the quantum chemistry and molecular dynamics studies are consistent and reveal that the order of effectiveness of the sulfonamide compounds is P7 > P5 > P6 > P1 > P2 > P3 > P4.
5
Zhang T, Androulakis IP, Bonate P, Cheng L, Helikar T, Parikh J, Rackauckas C, Subramanian K, Cho CR. Two heads are better than one: current landscape of integrating QSP and machine learning: An ISoP QSP SIG white paper by the working group on the integration of quantitative systems pharmacology and machine learning. J Pharmacokinet Pharmacodyn 2022; 49:5-18. [PMID: 35103884 PMCID: PMC8837505 DOI: 10.1007/s10928-022-09805-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/27/2021] [Accepted: 01/10/2022] [Indexed: 12/02/2022]
Abstract
Quantitative systems pharmacology (QSP) modeling is applied to address essential questions in drug development, such as the mechanism of action of a therapeutic agent and the progression of disease. Meanwhile, machine learning (ML) approaches also contribute to answering these questions via the analysis of multi-layer ‘omics’ data such as gene expression, proteomics, metabolomics, and high-throughput imaging. Furthermore, ML approaches can also be applied to aspects of QSP modeling. Both approaches are powerful tools, and there is considerable interest in integrating QSP modeling and ML. So far, a few successful implementations have been carried out, from which we have learned how each approach can overcome unique limitations of the other. The QSP + ML working group of the International Society of Pharmacometrics QSP Special Interest Group was convened in September 2019 to identify and begin realizing new opportunities in QSP and ML integration. The working group, which comprises 21 members representing 18 academic and industry organizations, has identified four categories of current research activity, which are described herein together with case studies of applications to drug development decision making. The working group also concluded that the integration of QSP and ML is still in its early stages, moving from evaluating available technical tools to building case studies. This paper reports on this fast-moving field and serves as a foundation for future codification of best practices.
Affiliation(s)
- Tongli Zhang
- University of Cincinnati, Cincinnati, OH, 45267, USA
- Tomáš Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
- Christopher Rackauckas
- Pumas-AI, Baltimore, MD, USA; Department of Mathematics, Massachusetts Institute of Technology, Boston, MA, USA
6
Sowrirajan S, Elangovan N, Ajithkumar G, Manoj KP. (E)-4-((4-Bromobenzylidene) Amino)-N-(Pyrimidin-2-yl) Benzenesulfonamide from 4-Bromobenzaldehyde and Sulfadiazine, Synthesis, Spectral (FTIR, UV–Vis), Computational (DFT, HOMO–LUMO, MEP, NBO, NPA, ELF, LOL, RDG) and Molecular Docking Studies. Polycycl Aromat Compd 2022. [DOI: 10.1080/10406638.2021.2006245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- S. Sowrirajan
- Department of Chemistry, King Fahd University of Petroleum and Minerals, Kingdom of Saudi Arabia
- N. Elangovan
- Department of Chemistry, Arignar Anna Government Arts College (Affiliated to Bharathidasan University), Musiri, Tiruchirappalli, Tamil Nadu, India
- G. Ajithkumar
- Department of Chemistry, Arignar Anna Government Arts College (Affiliated to Bharathidasan University), Musiri, Tiruchirappalli, Tamil Nadu, India
- K. P. Manoj
- Department of Chemistry, Arignar Anna Government Arts College (Affiliated to Bharathidasan University), Musiri, Tiruchirappalli, Tamil Nadu, India
7
Zhou Y, Zhang Y, Lian X, Li F, Wang C, Zhu F, Qiu Y, Chen Y. Therapeutic target database update 2022: facilitating drug discovery with enriched comparative data of targeted agents. Nucleic Acids Res 2021; 50:D1398-D1407. [PMID: 34718717 PMCID: PMC8728281 DOI: 10.1093/nar/gkab953] [Citation(s) in RCA: 289] [Impact Index Per Article: 96.3] [Received: 09/09/2021] [Revised: 09/29/2021] [Accepted: 10/04/2021] [Indexed: 11/14/2022] Open
Abstract
Drug discovery relies on the knowledge of not only drugs and targets, but also the comparative agents and targets. These include poor binders and non-binders for developing discovery tools, prodrugs for improved therapeutics, co-targets of therapeutic targets for multi-target strategies and off-target investigations, and the collective structure-activity and drug-likeness landscapes of enhanced drug feature. However, such valuable data are inadequately covered by the available databases. In this study, a major update of the Therapeutic Target Database, previously featured in NAR, was therefore introduced. This update includes (a) 34 861 poor binders and 12 683 non-binders of 1308 targets; (b) 534 prodrug-drug pairs for 121 targets; (c) 1127 co-targets of 672 targets regulated by 642 approved and 624 clinical trial drugs; (d) the collective structure-activity landscapes of 427 262 active agents of 1565 targets; (e) the profiles of drug-like properties of 33 598 agents of 1102 targets. Moreover, a variety of additional data and function are provided, which include the cross-links to the target structure in PDB and AlphaFold, 159 and 1658 newly emerged targets and drugs, and the advanced search function for multi-entry target sequences or drug structures. The database is accessible without login requirement at: https://idrblab.org/ttd/.
Affiliation(s)
- Ying Zhou
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, 79 QingChun Road, Hangzhou, Zhejiang 310000, China
- Yintao Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xichen Lian
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Chaoxin Wang
- Department of Computer Science, Kansas State University, Manhattan 66506, USA
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, 79 QingChun Road, Hangzhou, Zhejiang 310000, China
- Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China; Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
8
Elangovan N, Sowrirajan S. Synthesis, single crystal (XRD), Hirshfeld surface analysis, computational study (DFT) and molecular docking studies of (E)-4-((2-hydroxy-3,5-diiodobenzylidene)amino)-N-(pyrimidine)-2-yl) benzenesulfonamide. Heliyon 2021; 7:e07724. [PMID: 34458601 PMCID: PMC8379672 DOI: 10.1016/j.heliyon.2021.e07724] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/28/2021] [Revised: 07/14/2021] [Accepted: 08/03/2021] [Indexed: 12/15/2022] Open
Abstract
The Schiff base (E)-4-((2-hydroxy-3,5-diiodobenzylidene)amino)-N-(pyrimidine)-2-yl) benzene sulfonamide (DIDA) was synthesized by condensation of 3,5-diiodosalicylaldehyde and sulfadiazine. The compound was characterized by FTIR, X-ray crystallography and electronic spectra. The title compound was studied by both experimental and theoretical methods, with DFT used for the latter. The IR spectrum was calculated by DFT with the B3LYP/GENSEP basis set. The electronic spectra were computed by the TD-DFT method with the CAM-B3LYP functional and the IEFPCM solvation model, with DMSO as the solvent. Wave function-based properties such as the localized orbital locator, electron localization function and non-covalent interactions have been studied extensively. The ADMET properties of DIDA indicated that the compound has excellent drug-likeness, and PASS studies showed that it has anti-infective properties, which is confirmed by a docking score of -7.4 kcal/mol.
Affiliation(s)
- N Elangovan
- Department of Chemistry, Arignar Anna Government Arts College, Musiri 621211, Bharathidasan University, Tiruchirappalli, Tamilnadu, India
- S Sowrirajan
- Department of Chemistry, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
9
Yoshimori A. Prediction of Molecular Properties Using Molecular Topographic Map. Molecules 2021; 26:4475. [PMID: 34361624 PMCID: PMC8348331 DOI: 10.3390/molecules26154475] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/30/2021] [Revised: 07/21/2021] [Accepted: 07/21/2021] [Indexed: 12/18/2022] Open
Abstract
Prediction of molecular properties plays a critical role in rational drug design. In this study, the Molecular Topographic Map (MTM) is proposed, which is a two-dimensional (2D) map that can be used to represent a molecule. An MTM is generated from the atomic feature set of a molecule using generative topographic mapping and is then used as input data for analyzing structure-property/activity relationships. In the visualization and classification of 20 amino acids, differences among the amino acids can be confirmed visually and are revealed by hierarchical clustering with a similarity matrix of their MTMs. The prediction of molecular properties was performed with convolutional neural networks using MTMs as input data. The performance of the predictive models using MTMs was found to be equal to or better than that of models using Morgan fingerprints or MACCS keys. Furthermore, data augmentation of MTMs using mixup improved the prediction performance. Since molecules converted to MTMs can be treated like 2D images, they can be easily used with existing neural networks for image recognition and related technologies. MTMs can be effectively utilized to predict molecular properties of small molecules to aid drug discovery research.
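To make the mixup augmentation mentioned in this abstract concrete, here is a minimal Python sketch of the general technique applied to two image-like 2D maps. It is not the paper's implementation: the function name `mixup_pair`, the `alpha` value, and the toy 2x2 maps are assumptions for illustration, and real MTMs would be far larger.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Blend two (map, target) samples with a Beta-distributed weight.

    Hypothetical helper: x_a and x_b are equally sized 2D lists standing in
    for two maps, y_a and y_b are their numeric targets. Standard mixup draws
    lam ~ Beta(alpha, alpha) and interpolates inputs and targets alike.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x_mix = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(x_a, x_b)
    ]
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix, lam


if __name__ == "__main__":
    # Toy 2x2 "maps" with made-up targets, for illustration only.
    map_a, target_a = [[1.0, 1.0], [1.0, 1.0]], 1.0
    map_b, target_b = [[0.0, 0.0], [0.0, 0.0]], 0.0
    mixed_map, mixed_target, lam = mixup_pair(
        map_a, target_a, map_b, target_b, rng=random.Random(0)
    )
    print(lam, mixed_target, mixed_map)
```

Because the maps are treated like images, the same interpolation can be applied per pixel; with a small `alpha`, most drawn weights fall near 0 or 1, so the blended samples stay close to one of the two originals.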
Affiliation(s)
- Atsushi Yoshimori
- Institute for Theoretical Medicine, Inc., 26-1, Muraoka-Higashi 2-chome, Fujisawa 251-0012, Japan
10
Mekni N, Coronnello C, Langer T, Rosa MD, Perricone U. Support Vector Machine as a Supervised Learning for the Prioritization of Novel Potential SARS-CoV-2 Main Protease Inhibitors. Int J Mol Sci 2021; 22:7714. [PMID: 34299333 PMCID: PMC8305792 DOI: 10.3390/ijms22147714] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/02/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 12/04/2022] Open
Abstract
In the last year, the COVID-19 pandemic has strongly affected the lifestyle of the world population, spurring the scientific community to a great effort in studying the molecular mechanisms of the infection. Several vaccine formulations are now available and are helping to build immunity. Nevertheless, there is growing interest in the development of novel anti-COVID drugs. In this scenario, the main protease (Mpro) represents an appealing target, being the enzyme responsible for the cleavage of polypeptides during the viral genome transcription. With the aim of sharing new insights for the design of novel Mpro inhibitors, our research group developed a machine learning approach using support vector machine (SVM) classification. Starting from a dataset of two million commercially available compounds, the model was able to classify two hundred novel chemotypes as potentially active against the viral protease. The compounds labelled as active by the SVM were next evaluated through consensus docking studies on two PDB structures, and their binding modes were compared with those of well-known protease inhibitors. The best five compounds selected by consensus docking were then submitted to molecular dynamics to examine the stability of their binding interactions. Of note, the compounds selected via SVM retrieved all the most important interactions known in the literature.
Affiliation(s)
- Nedra Mekni
- Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
- Claudia Coronnello
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
- Thierry Langer
- Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria
- Maria De Rosa
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
- Ugo Perricone
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
11
Vaškevičius M, Kapočiūtė-Dzikienė J, Šlepikas L. Prediction of Chromatography Conditions for Purification in Organic Synthesis Using Deep Learning. Molecules 2021; 26:2474. [PMID: 33922736 PMCID: PMC8123027 DOI: 10.3390/molecules26092474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Revised: 04/15/2021] [Accepted: 04/22/2021] [Indexed: 01/27/2023] Open
Abstract
In this research, a process for developing normal-phase liquid chromatography solvent systems has been proposed. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on the architecture of two hierarchically connected neural network-based components. Using a large database of reaction procedures allows those two components to perform an essential role in the machine-learning-based prediction of chromatographic purification conditions, i.e., solvents and the ratio between solvents. In our paper, we build two datasets and test various molecular vectorization approaches, such as extended-connectivity fingerprints, learned embedding, and auto-encoders along with different types of deep neural networks to demonstrate a novel method for modeling chromatographic solvent systems employing two neural networks in sequence. Afterward, we present our findings and provide insights on the most effective methods for solving prediction tasks. Our approach results in a system of two neural networks with long short-term memory (LSTM)-based auto-encoders, where the first predicts solvent labels (by reaching the classification accuracy of 0.950 ± 0.001) and in the case of two solvents, the second one predicts the ratio between two solvents (R² metric equal to 0.982 ± 0.001). Our approach can be used as a guidance instrument in laboratories to accelerate scouting for suitable chromatography conditions.
Affiliation(s)
- Mantas Vaškevičius
- Department of Applied Informatics, Vytautas Magnus University, LT-44404 Kaunas, Lithuania
- JSC Synhet, Biržų Str. 6, LT-44139 Kaunas, Lithuania
12
Onay A, Onay M. A Drug Decision Support System for Developing a Successful Drug Candidate Using Machine Learning Techniques. Curr Comput Aided Drug Des 2019; 16:407-419. [PMID: 31438830 DOI: 10.2174/1573409915666190716143601] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/02/2019] [Revised: 04/24/2019] [Accepted: 05/06/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Virtual screening of candidate drug molecules using machine learning techniques plays a key role in the pharmaceutical industry in the design and discovery of new drugs. Computational classification methods can determine drug types according to disease groups and distinguish approved drugs from withdrawn ones. INTRODUCTION The classification models developed in this study can be used as a simple filter in drug modelling to eliminate potentially inappropriate molecules in the early stages. In this work, we developed a Drug Decision Support System (DDSS) to classify each drug candidate molecule as potentially drug or non-drug and to predict its disease group. METHODS Molecular descriptors were identified to determine a number of rules for drug molecules. They were derived using the ADRIANA.Code program and Lipinski's rule of five. We used an Artificial Neural Network (ANN) to classify drug molecules correctly according to disease type. Closed frequent molecular structures in the form of subgraph fragments were also obtained with the Gaston algorithm included in the ParMol package to find common molecular fragments for withdrawn drugs. RESULTS We observed that TPSA, XlogP, Natoms and HDon_O are the most distinctive features in the pool of molecular descriptors. We evaluated the performance of the classifiers on all datasets and found that classification accuracies are very high on all of them. Neural network models achieved 84.6% and 83.3% accuracies on test sets including cardiac therapy, anti-epileptic and anti-parkinson drugs with approved and withdrawn drugs for drug classification problems. CONCLUSION The experimental evaluation shows that the system is promising for identifying potential drug molecules and classifying them correctly according to disease type.
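As a small illustration of the Lipinski rule-of-five filter referred to in this abstract, the Python sketch below counts rule violations from precomputed descriptors. It is not the DDSS code: the function names, the `max_violations` parameter, and the descriptor values in the usage example are assumptions, and computing the descriptors themselves (for example with a cheminformatics toolkit) is out of scope here.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a molecule violates.

    The rule of five flags: molecular weight above 500 Da, logP above 5,
    more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors.
    """
    checks = [
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Hypothetical filter: accept molecules with at most `max_violations`."""
    violations = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return violations <= max_violations


if __name__ == "__main__":
    # Illustrative descriptor values, not taken from the paper.
    small_molecule = dict(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
    large_molecule = dict(mol_weight=820.0, logp=6.3, h_donors=7, h_acceptors=12)
    print(passes_rule_of_five(**small_molecule))
    print(passes_rule_of_five(**large_molecule))
```

Allowing one violation follows the common reading of the rule; setting `max_violations=0` gives a stricter pre-filter before any learned classifier is applied.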
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, Faculty of Engineering & Architecture, Kafkas University, Kars, 36100, Turkey
- Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Faculty of Engineering, Van Yuzuncu Yil University, 65100, Van, Turkey
13
Abstract
The most common applications of artificial intelligence (AI) in drug treatment have to do with matching patients to their optimal drug or combination of drugs, predicting drug-target or drug-drug interactions, and optimizing treatment protocols. This review outlines some of the recently developed AI methods aiding the drug treatment and administration process. Selection of the best drug(s) for a patient typically requires the integration of patient data, such as genetics or proteomics, with drug data, like compound chemical descriptors, to score the therapeutic efficacy of drugs. The prediction of drug interactions often relies on similarity metrics, assuming that drugs with similar structures or targets will have comparable behavior or may interfere with each other. Optimizing the dosage schedule for administration of drugs is performed using mathematical models to interpret pharmacokinetic and pharmacodynamic data. The recently developed and powerful models for each of these tasks are addressed, explained, and analyzed here.
Affiliation(s)
- Eden L Romm
- CureMatch Inc., San Diego, California 92121, USA
- Igor F Tsigelny
- CureMatch Inc., San Diego, California 92121, USA; San Diego Supercomputer Center, University of California, San Diego, La Jolla, California 92093, USA
14
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 2019; 93:103159. [PMID: 30926470 DOI: 10.1016/j.jbi.2019.103159] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 11/13/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/22/2022]
Abstract
Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods of interaction prediction help limit the search space for these experiments. These computational methods can be divided into ligand based approaches, docking approaches and chemogenomic approaches. In this review, we aim to describe the various feature based chemogenomic methods for drug target interaction prediction. It provides a comprehensive overview of the various techniques, datasets, tools and metrics. The feature based methods have been categorized, explained and compared. A novel framework for drug target interaction prediction has also been proposed that aims to improve the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature based methods of drug target interaction.
Affiliation(s)
- Kanica Sachdev
- Computer Science and Engineering Department, SMVDU, J&K, India
15
Uslu F, Icoz K, Tasdemir K, Yilmaz B. Automated quantification of immunomagnetic beads and leukemia cells from optical microscope images. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2019.01.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]
16
Yang H, Sun L, Li W, Liu G, Tang Y. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts. Front Chem 2018. [PMID: 29515993 PMCID: PMC5826228 DOI: 10.3389/fchem.2018.00030] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Indexed: 12/17/2022] Open
Abstract
During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review first briefly introduces the computational methods used to predict chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of the article is on recent progress in predictive models built for various toxicities. Available databases and web servers are also listed. Though the methods and models are very helpful for drug design, challenges and limitations remain to be addressed for drug safety assessment in the future.
Affiliation(s)
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Lixia Sun
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
|
17
|
Yang M, Chen J, Xu L, Shi X, Zhou X, Xi Z, An R, Wang X. A novel adaptive ensemble classification framework for ADME prediction. RSC Adv 2018; 8:11661-11683. [PMID: 35542768 PMCID: PMC9079056 DOI: 10.1039/c8ra01206g] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/07/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022] Open
Abstract
AECF is a GA based ensemble method. It includes four components which are (1) data balancing, (2) generating individual models, (3) combining individual models, and (4) optimizing the ensemble.
Affiliation(s)
- Ming Yang
- Department of Pharmacy
- Longhua Hospital Affiliated to Shanghai University of TCM
- Shanghai
- People's Republic of China
- Department of Chemistry
| | - Jialei Chen
- Department of Pharmacy
- Longhua Hospital Affiliated to Shanghai University of TCM
- Shanghai
- People's Republic of China
| | - Liwen Xu
- Department of Pharmacy
- Longhua Hospital Affiliated to Shanghai University of TCM
- Shanghai
- People's Republic of China
| | - Xiufeng Shi
- Department of Pharmacy
- Longhua Hospital Affiliated to Shanghai University of TCM
- Shanghai
- People's Republic of China
| | - Xin Zhou
- Department of Pharmacy
- Longhua Hospital Affiliated to Shanghai University of TCM
- Shanghai
- People's Republic of China
| | - Zhijun Xi
- Department of Pharmacy
- Longhua Hospital Affiliated to Shanghai University of TCM
- Shanghai
- People's Republic of China
| | - Rui An
- Department of Chemistry
- College of Pharmacy
- Shanghai University of Traditional Chinese Medicine
- Shanghai
- People's Republic of China
| | - Xinhong Wang
- Department of Chemistry
- College of Pharmacy
- Shanghai University of Traditional Chinese Medicine
- Shanghai
- People's Republic of China
|
18
|
Optimizing the macrocyclic diterpenic core toward the reversal of multidrug resistance in cancer. Future Med Chem 2017; 8:629-45. [PMID: 27105294 DOI: 10.4155/fmc.16.11] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: From a dataset obtained by chemical derivatization of a macrocyclic diterpenic scaffold, in silico approaches identified which structural features correlate with experimental modulation of P-gp activity. RESULTS/METHODOLOGY: Ninety-two percent of the strongest MDR modulators were positively identified within the dataset by virtual screening. Quantitative structure-activity relationship models with high robustness and predictability were obtained for both MDR1-transfected L5178Y mouse lymphoma T-cells (q² = 0.875, R²pred = 0.921) and human colon adenocarcinoma (q² = 0.820, R²pred = 0.951) cell lines. A new pharmacophoric model suggests that charge distribution within the molecule is important for biological activity. CONCLUSION: For the studied diterpenes, the conformation of the macrocyclic scaffold and its substitution pattern are the main determinants of biological activity, and both are related to steric and electrostatic factors.
|
19
|
Shen W, Xiao T, Chen S, Liu F, Chen YZ, Jiang Y. Predicting the Enzymatic Hydrolysis Half‐lives of New Chemicals Using Support Vector Regression Models Based on Stepwise Feature Elimination. Mol Inform 2017. [DOI: 10.1002/minf.201600153] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
Affiliation(s)
- Wanxiang Shen
- Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China
- The State Key Laboratory Breeding Base-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P. R. China
| | - Tao Xiao
- Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China
- The State Key Laboratory Breeding Base-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P. R. China
| | - Shangying Chen
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore 117543, Singapore
| | - Feng Liu
- Department of Chemistry, Tsinghua University, Beijing 100084, P. R. China
- The State Key Laboratory Breeding Base-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P. R. China
| | - Yu Zong Chen
- The State Key Laboratory Breeding Base-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P. R. China
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore 117543, Singapore
- Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen 518055, P. R. China
| | - Yuyang Jiang
- The State Key Laboratory Breeding Base-Shenzhen Key Laboratory of Chemical Biology, the Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P. R. China
- School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, P. R. China
|
20
|
Development of novel in silico model for developmental toxicity assessment by using naïve Bayes classifier method. Reprod Toxicol 2017; 71:8-15. [PMID: 28428071 DOI: 10.1016/j.reprotox.2017.04.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/14/2016] [Revised: 04/10/2017] [Accepted: 04/13/2017] [Indexed: 02/05/2023]
Abstract
Toxicological testing associated with developmental toxicity endpoints is very expensive, time consuming and labor intensive. Thus, developing alternative approaches for developmental toxicity testing is an important and urgent task in the drug development field. In this investigation, the naïve Bayes classifier was applied to develop a novel prediction model for developmental toxicity. The established prediction model was evaluated by internal 5-fold cross validation and an external test set. The overall prediction results for the internal 5-fold cross validation of the training set and the external test set were 96.6% and 82.8%, respectively. In addition, four simple descriptors and some representative substructures of developmental toxicants were identified. Thus, we hope the established in silico prediction model can be used as an alternative method for toxicological assessment. The molecular information obtained could afford a deeper understanding of developmental toxicants and provide guidance for medicinal chemists working in drug discovery and lead optimization.
|
21
|
Onay A, Onay M, Abul O. Classification of nervous system withdrawn and approved drugs with ToxPrint features via machine learning strategies. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017; 142:9-19. [PMID: 28325450 DOI: 10.1016/j.cmpb.2017.02.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/24/2016] [Revised: 01/20/2017] [Accepted: 02/08/2017] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVES: Early-phase virtual screening of candidate drug molecules, using data mining and machine learning, plays a key role in the pharmaceutical industry in preventing adverse drug effects. Computational classification methods can distinguish approved drugs from withdrawn ones. We focused on 6 data sets including a maximum of 110 approved and 110 withdrawn drugs, for all diseases and for nervous system diseases, to distinguish approved drugs from withdrawn ones. METHODS: In this study, we used support vector machines (SVMs) and ensemble methods (EMs) such as boosted and bagged trees to classify drugs into approved and withdrawn categories. We also used the CORINA Symphony program to identify ToxPrint chemotypes, a set of over 700 predefined chemotypes, for risk and safety assessment of candidate drug molecules. In addition, we studied nervous system withdrawn drugs to determine the key fragments with the ParMol package, which includes the gSpan algorithm. RESULTS: According to our results, the total number of chemotypes and bond CN_amine_aliphatic_generic were the more significant descriptors. The developed Medium Gaussian SVM model reached 78% prediction accuracy on the test set for the drug data set covering all diseases. Bagged tree and linear SVM models showed 89% accuracy for psycholeptic and psychoanaleptic drugs. A set of discriminative fragments in nervous system withdrawn drug (NSWD) data sets was obtained. The fragments associated with drugs removed from the market were benzene, toluene, N,N-dimethylethylamine, crotylamine, 5-methyl-2,4-heptadiene, octatriene and the carbonyl group. CONCLUSION: This paper covers the development of computational classification methods to distinguish approved drugs from withdrawn ones. In addition, the results indicate that identifying discriminative fragments, together with interpretation of the structures of the NSWDs, is significant for designing new approved nervous system drugs.
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, TOBB University of Economics & Technology, 06560, Ankara, Turkey
| | - Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Yuzuncu Yil University, 65080, Van, Turkey.
| | - Osman Abul
- Department of Computer Engineering, TOBB University of Economics & Technology, 06560, Ankara, Turkey
|
22
|
Chen S, Zhang P, Liu X, Qin C, Tao L, Zhang C, Yang SY, Chen YZ, Chui WK. Towards cheminformatics-based estimation of drug therapeutic index: Predicting the protective index of anticonvulsants using a new quantitative structure-index relationship approach. J Mol Graph Model 2016; 67:102-10. [PMID: 27262528 DOI: 10.1016/j.jmgm.2016.05.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/08/2016] [Revised: 05/17/2016] [Accepted: 05/18/2016] [Indexed: 02/05/2023]
Abstract
The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates.
Affiliation(s)
- Shangying Chen
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
| | - Peng Zhang
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
| | - Xin Liu
- Shanghai Applied Protein Technology Co. Ltd, Research Center for Proteome Analysis, Institute of Biochemistry and cell Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200233, China
| | - Chu Qin
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
| | - Lin Tao
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
| | - Cheng Zhang
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
| | - Sheng Yong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Sichuan, China
| | - Yu Zong Chen
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore.
| | - Wai Keung Chui
- Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore.
|
23
|
Recent progresses in the exploration of machine learning methods as in-silico ADME prediction tools. Adv Drug Deliv Rev 2015; 86:83-100. [PMID: 26037068 DOI: 10.1016/j.addr.2015.03.014] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 11/14/2014] [Revised: 03/18/2015] [Accepted: 03/22/2015] [Indexed: 02/05/2023]
Abstract
In-silico methods have been explored as potential tools for assessing ADME and ADME regulatory properties, particularly in early drug discovery stages. Machine learning methods, with their ability to classify diverse structures and complex mechanisms, are well suited for predicting ADME and ADME regulatory properties. Recent efforts have been directed at broadening the scope of application and improving predictive performance, with particular focus on the coverage of ADME properties and the exploration of more diversified training data, appropriate molecular features, and consensus modeling. Moreover, several online machine learning ADME prediction servers have emerged. Here we review this progress and discuss the performance, application prospects and challenges of exploring machine learning methods as useful tools in predicting ADME and ADME regulatory properties.
|
24
|
Mizera M, Talaczyńska A, Zalewski P, Skibiński R, Cielecka-Piontek J. Prediction of HPLC retention times of tebipenem pivoxyl and its degradation products in solid state by applying adaptive artificial neural network with recursive features elimination. Talanta 2015; 137:174-81. [DOI: 10.1016/j.talanta.2015.01.032] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/09/2014] [Revised: 01/22/2015] [Accepted: 01/23/2015] [Indexed: 02/07/2023]
|
25
|
Thai KM, Huynh NT, Ngo TD, Mai TT, Nguyen TH, Tran TD. Three- and four-class classification models for P-glycoprotein inhibitors using counter-propagation neural networks. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2015; 26:139-163. [PMID: 25588022 DOI: 10.1080/1062936x.2014.995701] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Abstract
P-glycoprotein (P-gp) is an ATP binding cassette (ABC) transporter that helps to protect several human organs from xenobiotic exposure. This efflux pump is also responsible for multi-drug resistance (MDR), a problem for chemotherapy in the fight against cancer. Therefore, the discovery of P-gp inhibitors is considered one of the most popular strategies to reverse MDR in tumour cells and to improve the therapeutic efficacy of commonly used cytotoxic drugs. Until now, several generations of P-gp inhibitors have been developed, but they have largely failed in preclinical and clinical studies due to lack of selectivity, poor solubility and severe pharmacokinetic interactions. In this study, counter-propagation neural networks (CPG-NN) were used to generate three models (SION, SIO, SIN) that classify specific 'true' P-gp inhibitors, as well as three other models (CPBN, CPB1, CPN) that distinguish between P-gp inhibitors, CYP 3A inhibitors and co-inhibitors of these proteins, with rather high accuracy values for the test set and the external set. Such three- and four-class classification models helped provide more information about the bioactivities of compounds not only on one target (P-gp), but also on a combination of multiple targets (P-gp, CYP 3A).
Affiliation(s)
- K-M Thai
- a Department of Medicinal Chemistry, School of Pharmacy , University of Medicine and Pharmacy at Ho Chi Minh City , Ho Chi Minh City , Viet Nam
|
26
|
Abstract
The emphasis of this review is particularly on multivariate statistical methods currently used in quantitative structure–activity relationship (QSAR) studies.
Affiliation(s)
- Somayeh Pirhadi
- Drug Design in Silico Lab
- Chemistry Faculty
- K. N. Toosi University of Technology
- Tehran
- Iran
| | | | - Jahan B. Ghasemi
- Drug Design in Silico Lab
- Chemistry Faculty
- K. N. Toosi University of Technology
- Tehran
- Iran
|
27
|
Randhawa V, Kumar Singh A, Acharya V. A systematic approach to prioritize drug targets using machine learning, a molecular descriptor-based classification model, and high-throughput screening of plant derived molecules: a case study in oral cancer. MOLECULAR BIOSYSTEMS 2015; 11:3362-77. [DOI: 10.1039/c5mb00468c] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]
Abstract
Network-based and cheminformatics approaches identify novel lead molecules for CXCR4, a key gene prioritized in oral cancer.
Affiliation(s)
- Vinay Randhawa
- Functional Genomics and Complex Systems Laboratory
- Biotechnology Division
- CSIR-Institute of Himalayan Bioresource Technology
- Council of Scientific and Industrial Research
- Palampur
| | - Anil Kumar Singh
- Biotechnology Division
- CSIR-Institute of Himalayan Bioresource Technology
- Council of Scientific and Industrial Research
- Palampur
- India
| | - Vishal Acharya
- Functional Genomics and Complex Systems Laboratory
- Biotechnology Division
- CSIR-Institute of Himalayan Bioresource Technology
- Council of Scientific and Industrial Research
- Palampur
|
28
|
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
With advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally perform better than a single SVM, while Subset Selection outperforms the other feature selection methods. We tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
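The correlation-filter step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the 0.9 cutoff, the function names `pearson` and `correlation_filter`, and the toy descriptor values are all assumptions made for illustration, since the abstract does not state them.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated here as uncorrelated
    return cov / (sx * sy)

def correlation_filter(columns, threshold=0.9):
    """Return indices of the columns kept after dropping every column whose
    absolute correlation with an already-kept column exceeds the threshold.
    The threshold value is a hypothetical choice, not taken from the paper."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy descriptor matrix stored column-wise (hypothetical values).
descriptors = [
    [1.0, 2.0, 3.0, 4.0],  # feature 0
    [2.0, 4.0, 6.0, 8.1],  # feature 1: almost a multiple of feature 0
    [4.0, 1.0, 3.0, 2.0],  # feature 2
]
kept = correlation_filter(descriptors, threshold=0.9)
```

In this toy case feature 1 is dropped as redundant with feature 0, and the reduced set would then be passed to the classifier.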
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey.
| | - Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
| | - Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
|
29
|
Gunturi SB, Ramamurthi N. A novel approach to generate robust classification models to predict developmental toxicity from imbalanced datasets. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2014; 25:711-727. [PMID: 25102768 DOI: 10.1080/1062936x.2014.942357] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
Computational models to predict the developmental toxicity of compounds are built on imbalanced datasets wherein the toxicants outnumber the non-toxicants. Consequently, the results are biased towards the majority class (toxicants). To overcome this problem and to obtain sensitive but also accurate classifiers, we followed an integrated approach wherein (i) Synthetic Minority Over Sampling (SMOTE) is used for re-sampling, (ii) a genetic algorithm (GA) is used for variable selection and (iii) support vector machines (SVM) are used for model development. The best model, M3, has (i) sensitivity (SE) = 85.54% and specificity (SP) = 85.62% in leave-one-out validation, (ii) classification accuracy of the training set = 99.67%, (iii) classification accuracy of the test set = 92.59%, and (iv) sensitivity = 92.68% and specificity = 92.31% on the test set. Consensus prediction based on models M3-M5 improved these percentages by 5% over M3. From the analysis of results we infer that data imbalance in toxicity studies can be effectively addressed by the application of re-sampling techniques.
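The re-sampling idea in step (i) can be sketched as follows. This is a minimal illustration of SMOTE-style interpolation, not the authors' implementation: the function name `smote`, the neighbour count `k`, the seed and the toy coordinates are assumptions. Each synthetic sample is placed at a random position on the segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class descriptors (here, the non-toxicants).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class. In the workflow described above, the augmented set would then go on to variable selection and model building.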
Affiliation(s)
- S B Gunturi
- a Innovation Labs Hyderabad , Tata Consultancy Services Limited , Madhapur , Hyderabad , India
|
30
|
Zang Q, Rotroff DM, Judson RS. Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods. J Chem Inf Model 2013; 53:3244-61. [DOI: 10.1021/ci400527b] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/25/2022]
Affiliation(s)
| | - Daniel M. Rotroff
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, United States
|
31
|
Newby D, Freitas AA, Ghafourian T. Pre-processing Feature Selection for Improved C&RT Models for Oral Absorption. J Chem Inf Model 2013; 53:2730-42. [DOI: 10.1021/ci400378j] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 01/21/2023]
Affiliation(s)
- Danielle Newby
- Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent ME4 4TB, U.K
| | - Alex. A. Freitas
- School of Computing, University of Kent, Canterbury, Kent CT2 7NF, U.K
| | - Taravat Ghafourian
- Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent ME4 4TB, U.K
- Drug Applied Research Centre and Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, East Azerbaijan 51664, Iran
|
32
|
Li BK, Cong Y, Yang XG, Xue Y, Chen YZ. In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. Comput Biol Med 2013; 43:395-404. [DOI: 10.1016/j.compbiomed.2013.01.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/24/2012] [Revised: 12/31/2012] [Accepted: 01/21/2013] [Indexed: 11/16/2022]
|
33
|
Chang CY, Hsu MT, Esposito EX, Tseng YJ. Oversampling to Overcome Overfitting: Exploring the Relationship between Data Set Composition, Molecular Descriptors, and Predictive Modeling Methods. J Chem Inf Model 2013; 53:958-71. [DOI: 10.1021/ci4000536] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 01/22/2023]
Affiliation(s)
- Chia-Yun Chang
- School of Pharmacy, College of Medicine, National Taiwan University, No.1, Sec.1, Jen-Ai Road, Taipei, Taiwan 100
| | - Ming-Tsung Hsu
- Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
| | | | - Yufeng J. Tseng
- School of Pharmacy, College of Medicine, National Taiwan University, No.1, Sec.1, Jen-Ai Road, Taipei, Taiwan 100
- Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
|
34
|
Shahid M, Shahzad Cheema M, Klenner A, Younesi E, Hofmann-Apitius M. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling. Mol Inform 2013; 32:241-9. [PMID: 27481519 DOI: 10.1002/minf.201200116] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/08/2012] [Accepted: 01/07/2013] [Indexed: 11/10/2022]
Abstract
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs versus other drugs. Application of an SVM-RFE model to a set of drugs successfully separated NDD drugs from non-NDD drugs and resulted in an overall accuracy of ~80% with 10-fold cross validation, using the 40 top-ranked molecular descriptors selected out of a total of 314 descriptors. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors.
Affiliation(s)
- Mohammad Shahid
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Dahlmannstr. 2, 53113 Bonn, Germany
| | - Muhammad Shahzad Cheema
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Dahlmannstr. 2, 53113 Bonn, Germany
| | - Alexander Klenner
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
| | - Erfan Younesi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany.
|
35
|
Han B, Ma X, Zhao R, Zhang J, Wei X, Liu X, Liu X, Zhang C, Tan C, Jiang Y, Chen Y. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries. Chem Cent J 2012; 6:139. [PMID: 23173901 PMCID: PMC3538513 DOI: 10.1186/1752-153x-6-139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/13/2012] [Accepted: 11/07/2012] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical use and trials for the treatment of leukemia and other cancers. These successes and the appearance of drug resistance in some patients have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. RESULTS: We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested by 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53% to 95.01% of inhibitors and 99.81% to 99.90% of non-inhibitors in 5-fold cross validation studies. SVM trained by 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem compounds, 1,496 (0.89%) of 168K MDDR compounds, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. CONCLUSIONS: SVM showed comparable yield and reduced false-hit rates in searching large compound libraries compared to the similarity-based and other machine-learning VS methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold from in-house chemical libraries not reported as Src inhibitors, one of which showed moderate activity. SVM may potentially be explored for searching Src inhibitors from large compound libraries at low false-hit rates.
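The percentages quoted in the RESULTS of this abstract can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the abstract. The library sizes are the rounded figures quoted there (13.56M and 168K), and the count of 31 recovered inhibitors is inferred from 70.45% of 44 rather than stated by the authors.

```python
# (predicted hits, library size) as quoted in the abstract; sizes are rounded there.
screens = {
    "PubChem": (44_843, 13_560_000),
    "MDDR": (1_496, 168_000),
    "MDDR similar to known inhibitors": (719, 9_305),
}

# Fraction of each library predicted as inhibitors, in percent.
hit_rates = {
    name: round(100 * hits / size, 2) for name, (hits, size) in screens.items()
}

# 70.45% of the 44 inhibitors reported since 2011 (count inferred, not stated).
recovered = round(0.7045 * 44)
recall_new = round(100 * recovered / 44, 2)
```

The computed rates reproduce the 0.33%, 0.89% and 7.73% in the abstract. The rate in the similarity-restricted MDDR subset is much higher than in the full libraries, as would be expected for compounds chosen for their resemblance to known inhibitors.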
Affiliation(s)
- Bucong Han
  - The Key Laboratory of Chemical Biology, Guangdong Province, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong, 518055, People’s Republic of China
  - Computation and Systems Biology, Singapore-MIT Alliance, National University of Singapore, E4-04-10, 4 Engineering Drive 3, Singapore, 117576, Singapore
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
- Xiaohua Ma
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
- Ruiying Zhao
  - Central Research Institute of China Chemical Science and Technology, 20 Xueyuan Road, Haidian District, Beijing, 100083, People’s Republic of China
- Jingxian Zhang
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
- Xiaona Wei
  - Computation and Systems Biology, Singapore-MIT Alliance, National University of Singapore, E4-04-10, 4 Engineering Drive 3, Singapore, 117576, Singapore
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
- Xianghui Liu
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
- Xin Liu
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
- Cunlong Zhang
  - The Key Laboratory of Chemical Biology, Guangdong Province, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong, 518055, People’s Republic of China
- Chunyan Tan
  - The Key Laboratory of Chemical Biology, Guangdong Province, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong, 518055, People’s Republic of China
- Yuyang Jiang
  - The Key Laboratory of Chemical Biology, Guangdong Province, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong, 518055, People’s Republic of China
- Yuzong Chen
  - The Key Laboratory of Chemical Biology, Guangdong Province, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong, 518055, People’s Republic of China
  - Computation and Systems Biology, Singapore-MIT Alliance, National University of Singapore, E4-04-10, 4 Engineering Drive 3, Singapore, 117576, Singapore
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore, 117543, Singapore
36
Ghafourian T, Freitas AA, Newby D. The impact of training set data distributions for modelling of passive intestinal absorption. Int J Pharm 2012; 436:711-20. [DOI: 10.1016/j.ijpharm.2012.07.041] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/28/2012] [Revised: 06/11/2012] [Accepted: 07/22/2012] [Indexed: 11/15/2022]
37
Zhang J, Han B, Wei X, Tan C, Chen Y, Jiang Y. A two-step target binding and selectivity support vector machines approach for virtual screening of dopamine receptor subtype-selective ligands. PLoS One 2012; 7:e39076. [PMID: 22720033 PMCID: PMC3376116 DOI: 10.1371/journal.pone.0039076] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 10/25/2011] [Accepted: 05/15/2012] [Indexed: 01/13/2023] [Open Access]
Abstract
Target selective drugs, such as dopamine receptor (DR) subtype selective ligands, are developed for enhanced therapeutics and reduced side effects. In silico methods have been explored for searching DR selective ligands, but encountered difficulties associated with high subtype similarity and ligand structural diversity. Machine learning methods have shown promising potential in searching target selective compounds. Their target selective capability can be further enhanced. In this work, we introduced a new two-step support vector machines target-binding and selectivity screening method for searching DR subtype-selective ligands, which was tested together with three previously-used machine learning methods for searching D1, D2, D3 and D4 selective ligands. It correctly identified 50.6%–88.0% of the 21–408 subtype selective and 71.7%–81.0% of the 39–147 multi-subtype ligands. Its subtype selective ligand identification rates are significantly better than, and its multi-subtype ligand identification rates are comparable to the best rates of the previously used methods. Our method produced low false-hit rates in screening 13.56 M PubChem, 168,016 MDDR and 657,736 ChEMBLdb compounds. Molecular features important for subtype selectivity were extracted by using the recursive feature elimination feature selection method. These features are consistent with literature-reported features. Our method showed similar performance in searching estrogen receptor subtype selective ligands. Our study demonstrated the usefulness of the two-step target binding and selectivity screening method in searching subtype selective ligands from large compound libraries.
Affiliation(s)
- Jingxian Zhang
  - The Key Laboratory of Chemical Biology, Guangdong Province, Graduate School at Shenzhen, Tsinghua University, Shenzhen, People's Republic of China
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Singapore, Singapore
- Bucong Han
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Singapore, Singapore
  - Computation and Systems Biology, Singapore-MIT Alliance, National University of Singapore, Singapore, Singapore
- Xiaona Wei
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Singapore, Singapore
  - Computation and Systems Biology, Singapore-MIT Alliance, National University of Singapore, Singapore, Singapore
- Chunyan Tan
  - The Key Laboratory of Chemical Biology, Guangdong Province, Graduate School at Shenzhen, Tsinghua University, Shenzhen, People's Republic of China
- Yuzong Chen
  - The Key Laboratory of Chemical Biology, Guangdong Province, Graduate School at Shenzhen, Tsinghua University, Shenzhen, People's Republic of China
  - Bioinformatics and Drug Design Group, Department of Pharmacy, Centre for Computational Science and Engineering, National University of Singapore, Singapore, Singapore
  - * E-mail: (YZC); (YYJ)
- Yuyang Jiang
  - The Key Laboratory of Chemical Biology, Guangdong Province, Graduate School at Shenzhen, Tsinghua University, Shenzhen, People's Republic of China
  - * E-mail: (YZC); (YYJ)
38
Joung JY, Kim HJ, Kim HM, Ahn SK, Nam KY, No KT. Prediction Models of P-Glycoprotein Substrates Using Simple 2D and 3D Descriptors by a Recursive Partitioning Approach. B Korean Chem Soc 2012. [DOI: 10.5012/bkcs.2012.33.4.1123] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
39
Wang JF, Cai CZ, Kong CY, Cao ZW, Chen YZ. A Computer Method for Validating Traditional Chinese Medicine Herbal Prescriptions. Am J Chin Med 2012; 33:281-97. [PMID: 15974487 DOI: 10.1142/s0192415x05002825] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
Traditional Chinese medicine (TCM) has been widely practiced and is considered as an alternative to conventional medicine. TCM herbal prescriptions contain a mixture of herbs that collectively exert therapeutic actions and modulating effects. Traditionally defined herbal properties, related to the pharmacodynamic, pharmacokinetic and toxicological, as well as physicochemical properties of their principal ingredients, have been used as the basis for formulating TCM multi-herb prescriptions. These properties are used in this work to develop a computer program for predicting whether a multi-herb recipe is a valid TCM prescription. This program is based on a statistical learning method, support vector machine (SVM), and it is trained by using 575 well-known TCM prescriptions and 1961 non-TCM recipes generated by random combination of TCM herbs. Testing results by using 72 well-known TCM prescriptions and 5039 non-TCM recipes showed that 73.6% of the TCM prescriptions and 99.9% of non-TCM recipes are correctly classified by this system. A further test by using 48 TCM prescriptions published in recent years found that 68.7% of these are correctly classified. These accuracies are comparable to those of SVM classification of other biological systems. Our study indicates the potential of SVM for facilitating the analysis of TCM prescriptions.
Affiliation(s)
- J F Wang
  - Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore
40
Pilkington NCV, Trotter MWB, Holden SB. Multiple Kernel Learning for Drug Discovery. Mol Inform 2012; 31:313-22. [PMID: 27477100 DOI: 10.1002/minf.201100146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/01/2011] [Accepted: 03/12/2012] [Indexed: 01/04/2023]
Abstract
The support vector machine (SVM) methodology has become a popular and well-used component of present chemometric analysis. We assess a relatively recent development of the algorithm, multiple kernel learning (MKL), on published structure-property relationship (SPR) data. The MKL algorithm learns a weighting across multiple kernel-based representations of the data during supervised classifier creation and, thereby, may be used to describe the influence of distinct groups of structural descriptors upon a single structure-property classifier without explicitly omitting any of them. We observe a statistically significant performance improvement over a conventional, single kernel SVM on all three SPR data sets analysed. Furthermore, MKL output is observed to provide useful information regarding the relative influence of five distinct descriptor subsets present in each data set.
Affiliation(s)
- Nicholas C V Pilkington
  - University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; phone: +44 (0)1223 763725
- Matthew W B Trotter
  - Anne McLaren Laboratory for Regenerative Medicine & Department of Surgery, University of Cambridge, UK
  - Celgene Institute for Translational Research Europe (CITRE), Sevilla, Spain
- Sean B Holden
  - University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; phone: +44 (0)1223 763725
41
Wang S, Li Y, Wang J, Chen L, Zhang L, Yu H, Hou T. ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage. Mol Pharm 2012; 9:996-1010. [PMID: 22380484 DOI: 10.1021/mp300023x] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.8] [Indexed: 12/29/2022]
Abstract
Inhibition of the human ether-a-go-go related gene (hERG) potassium channel may result in QT interval prolongation, which causes severe cardiac side effects and is a major problem in clinical studies of drug candidates. The development of in silico tools to filter out potential hERG potassium channel blockers in early stages of the drug discovery process is of considerable interest. Here, a diverse set of 806 compounds with hERG inhibition data was assembled, and the binary hERG classification models using naive Bayesian classification and recursive partitioning (RP) techniques were established and evaluated. The naive Bayesian classifier based on molecular properties and the ECFP_8 fingerprints yielded 84.8% accuracy for the training set using the leave-one-out (LOO) cross-validation procedure and 85% accuracy for the test set of 120 molecules. For the two additional test sets, the model achieved 89.4% accuracy for the WOMBAT-PK test set, and 86.1% accuracy for the PubChem test set. The naive Bayesian classifiers gave better predictions than the RP classifiers. Moreover, the Bayesian classifier, employing molecular fingerprints, highlights the important structural fragments favorable or unfavorable for hERG potassium channel blockage, which offers extra valuable information for the design of compounds avoiding undesirable hERG activity.
Affiliation(s)
- Sichao Wang
  - Institute of Functional Nano & Soft Materials-FUNSOM and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
42
Predicting P-glycoprotein-mediated drug transport based on support vector machine and three-dimensional crystal structure of P-glycoprotein. PLoS One 2011; 6:e25815. [PMID: 21991360 PMCID: PMC3186768 DOI: 10.1371/journal.pone.0025815] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 06/29/2011] [Accepted: 09/11/2011] [Indexed: 01/16/2023] [Open Access]
Abstract
Human P-glycoprotein (P-gp) is an ATP-binding cassette multidrug transporter that confers resistance to a wide range of chemotherapeutic agents in cancer cells by active efflux of the drugs from cells. P-gp also plays a key role in limiting oral absorption and brain penetration and in facilitating biliary and renal elimination of structurally diverse drugs. Thus, identification of drugs or new molecular entities to be P-gp substrates is of vital importance for predicting the pharmacokinetics, efficacy, safety, or tissue levels of drugs or drug candidates. At present, publicly available, reliable in silico models predicting P-gp substrates are scarce. In this study, a support vector machine (SVM) method was developed to predict P-gp substrates and P-gp-substrate interactions, based on a training data set of 197 known P-gp substrates and non-substrates collected from the literature. We showed that the SVM method had a prediction accuracy of approximately 80% on an independent external validation data set of 32 compounds. A homology model of human P-gp based on the X-ray structure of mouse P-gp as a template has been constructed. We showed that molecular docking to the P-gp structures successfully predicted the geometry of P-gp-ligand complexes. Our SVM prediction and the molecular docking methods have been integrated into a free web server (http://pgp.althotas.com), which allows the users to predict whether a given compound is a P-gp substrate and how it binds to and interacts with P-gp. Utilization of such a web server may prove valuable for both rational drug design and screening.
43
Wong WWL, Burkowski FJ. Using kernel alignment to select features of molecular descriptors in a QSAR study. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1373-1384. [PMID: 21339534 DOI: 10.1109/tcbb.2011.31] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/30/2023]
Abstract
Quantitative structure-activity relationships (QSARs) correlate biological activities of chemical compounds with their physicochemical descriptors. By modeling the observed relationship seen between molecular descriptors and their corresponding biological activities, we may predict the behavior of other molecules with similar descriptors. In QSAR studies, it has been shown that the quality of the prediction model strongly depends on the selected features within molecular descriptors. Thus, methods capable of automatic selection of relevant features are very desirable. In this paper, we present a new feature selection algorithm for a QSAR study based on kernel alignment which has been used as a measure of similarity between two kernel functions. In our algorithm, we deploy kernel alignment as an evaluation tool, using recursive feature elimination to compute a molecular descriptor containing the most important features needed for a classification application. Empirical results show that the algorithm works well for the computation of descriptors for various applications involving different QSAR data sets. The prediction accuracies are substantially increased and are comparable to those from earlier studies.
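Kernel alignment, as commonly defined, is the normalized Frobenius inner product of two kernel matrices, A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F). The sketch below illustrates how such a score could drive recursive feature elimination. It is not the authors' algorithm: the linear kernel, the ideal target kernel y·yᵀ, the greedy elimination rule, and all function names and toy data are assumptions made for illustration.

```python
import math


def linear_kernel(rows, features):
    """Gram matrix of dot products restricted to the given feature indices."""
    return [[sum(a[f] * b[f] for f in features) for b in rows] for a in rows]


def frobenius(k1, k2):
    """Frobenius inner product of two equally sized square matrices."""
    return sum(x * y for r1, r2 in zip(k1, k2) for x, y in zip(r1, r2))


def kernel_alignment(k1, k2):
    """Normalized similarity of two kernel matrices (0.0 if either is all zeros)."""
    denom = math.sqrt(frobenius(k1, k1) * frobenius(k2, k2))
    return frobenius(k1, k2) / denom if denom > 0 else 0.0


def rank_features(rows, labels):
    """Greedy backward elimination; returns feature indices, most important first."""
    # Assumed target: the ideal kernel y * y^T built from +1/-1 class labels.
    target = [[yi * yj for yj in labels] for yi in labels]
    remaining = list(range(len(rows[0])))
    dropped = []
    while len(remaining) > 1:
        def score(f):
            # Alignment of the kernel that is left after removing feature f.
            kept = [g for g in remaining if g != f]
            return kernel_alignment(linear_kernel(rows, kept), target)

        # Drop the feature whose removal leaves the best-aligned kernel.
        least_useful = max(remaining, key=score)
        remaining.remove(least_useful)
        dropped.append(least_useful)
    return remaining + dropped[::-1]


# Toy data: feature 0 equals the class label, feature 1 is unrelated to it.
labels = [1, 1, -1, -1]
rows = [[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]]
ranking = rank_features(rows, labels)
print(ranking)
```

On this toy data the label-carrying feature is ranked ahead of the unrelated one, which is the behaviour a feature-selection criterion of this kind is meant to produce.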
Affiliation(s)
- William W L Wong
  - Toronto Health Economics and Technology Assessment Collaborative, Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada.
44
Chen YF, Hsu KC, Lin PT, Hsu DF, Kristal BS, Yang JM. LigSeeSVM: ligand-based virtual screening using support vector machines and data fusion. Int J Comput Biol Drug Des 2011; 4:274-89. [PMID: 21778560 DOI: 10.1504/ijcbdd.2011.041415] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Ligand-based in silico drug screening is useful for lead discovery, in particular for those targets without structures. Here, we have developed LigSeeSVM, a ligand-based screening tool using data fusion and Support Vector Machines (SVMs). We used Atom Pair (AP) structure descriptors and Physicochemical (PC) descriptors of compounds to generate SVM-AP and SVM-PC models. Sequentially, the two models were combined using rank-based data fusion to create LigSeeSVM model. LigSeeSVM was evaluated on five data sets. Experimental results show that the performance of LigSeeSVM is better than other ligand-based virtual screening approaches. We believe that LigSeeSVM is useful for lead compounds.
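Rank-based data fusion combines two screening models by comparing the ranks they assign to each compound, not their raw scores, so the models do not need to share a score scale. The abstract does not state the exact fusion rule, so the sketch below assumes a simple mean-rank rule. The function names, compound identifiers and scores are hypothetical.

```python
def ranks(scores):
    """Map each compound to its rank (1 = highest score); ties broken by name."""
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {compound: position + 1 for position, compound in enumerate(ordered)}


def fuse_by_mean_rank(*score_tables):
    """Order compounds by their average rank across all models (best first)."""
    rank_tables = [ranks(table) for table in score_tables]
    compounds = rank_tables[0].keys()
    fused = {c: sum(r[c] for r in rank_tables) / len(rank_tables) for c in compounds}
    return sorted(fused, key=lambda c: (fused[c], c))


# Hypothetical decision scores from an atom-pair model and a physicochemical model.
svm_ap = {"cpd_a": 0.9, "cpd_b": 0.4, "cpd_c": 0.7}
svm_pc = {"cpd_a": 0.2, "cpd_b": 0.3, "cpd_c": 0.8}

fused_order = fuse_by_mean_rank(svm_ap, svm_pc)
print(fused_order)
```

Here `cpd_c` comes first because both models rank it highly, while `cpd_a` is ranked first by one model and last by the other.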
Affiliation(s)
- Yen-Fu Chen
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 30050, Taiwan.
45
Uğuz H, Güraksın GE, Ergün U, Saraçoğlu R. Biomedical system based on the Discrete Hidden Markov Model using the Rocchio-Genetic approach for the classification of internal carotid artery Doppler signals. Comput Methods Programs Biomed 2011; 103:51-60. [PMID: 20673596 DOI: 10.1016/j.cmpb.2010.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/20/2009] [Revised: 06/24/2010] [Accepted: 07/02/2010] [Indexed: 05/29/2023]
Abstract
When the maximum likelihood (ML) approach is used to calculate the parameters of the Discrete Hidden Markov Model (DHMM), the DHMM parameters of each class are calculated using only the training samples of that class (positive training samples). The training samples not belonging to that class (negative training samples) are not used in the calculation of the DHMM parameters. To remedy this deficiency, a Rocchio algorithm-based approach is suggested that involves the training samples of all classes in the calculation. During the calculation, a genetic algorithm is used as an optimization technique to determine the most appropriate parameter values for adjusting the relative effect of the positive and negative training samples. The proposed method is used to classify internal carotid artery Doppler signals recorded from 136 patients and 55 healthy people. The proposed method reached 97.38% classification accuracy with fivefold cross-validation (CV). The classification results showed that the proposed method was effective for the classification of internal carotid artery Doppler signals.
Affiliation(s)
- Harun Uğuz
  - Department of Computer Engineering, Selçuk University, Konya, Turkey.
46
Klon AE. Machine learning algorithms for the prediction of hERG and CYP450 binding in drug development. Expert Opin Drug Metab Toxicol 2011; 6:821-33. [PMID: 20465523 DOI: 10.1517/17425255.2010.489550] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 01/29/2023]
Abstract
IMPORTANCE OF THE FIELD: The cost of developing new drugs is estimated at approximately $1 billion; the withdrawal of a marketed compound due to toxicity can result in serious financial loss for a pharmaceutical company. There has been a greater interest in the development of in silico tools that can identify compounds with metabolic liabilities before they are brought to market. AREAS COVERED IN THIS REVIEW: The two largest classes of machine learning (ML) models, which will be discussed in this review, have been developed to predict binding to the human ether-a-go-go related gene (hERG) ion channel protein and the various CYP isoforms. Being able to identify potentially toxic compounds before they are made would greatly reduce the number of compound failures and the costs associated with drug development. WHAT THE READER WILL GAIN: This review summarizes the state of modeling hERG and CYP binding towards this goal since 2003 using ML algorithms. TAKE HOME MESSAGE: A wide variety of ML algorithms that are comparable in their overall performance are available. These ML methods may be applied regularly in discovery projects to flag compounds with potential metabolic liabilities.
Affiliation(s)
- Anthony E Klon
  - Ansaris, Computational Chemistry, Four Valley Square, 512 East Township Line Road, Blue Bell, PA 19422, USA.
47
Hecht D. Applications of machine learning and computational intelligence to drug discovery and development. Drug Dev Res 2010. [DOI: 10.1002/ddr.20402] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
Affiliation(s)
- David Hecht
  - Southwestern College, Chula Vista, California
48
Niijima S, Yabuuchi H, Okuno Y. Cross-Target View to Feature Selection: Identification of Molecular Interaction Features in Ligand–Target Space. J Chem Inf Model 2010; 51:15-24. [DOI: 10.1021/ci1001394] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Affiliation(s)
- Satoshi Niijima
  - Department of Systems Bioscience for Drug Discovery, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
- Hiroaki Yabuuchi
  - Department of Systems Bioscience for Drug Discovery, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
- Yasushi Okuno
  - Department of Systems Bioscience for Drug Discovery, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
49
Cao D, Liang Y, Xu Q, Yun Y, Li H. Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features. J Comput Aided Mol Des 2010; 25:67-80. [DOI: 10.1007/s10822-010-9401-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 01/23/2010] [Accepted: 11/03/2010] [Indexed: 10/18/2022]
50
Dehmer MM, Barbarini NN, Varmuza KK, Graber AA. Novel topological descriptors for analyzing biological networks. BMC Struct Biol 2010; 10:18. [PMID: 20565796 PMCID: PMC2906494 DOI: 10.1186/1472-6807-10-18] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 10/29/2009] [Accepted: 06/17/2010] [Indexed: 01/28/2023]
Abstract
BACKGROUND: Topological descriptors, other graph measures, and in a broader sense, graph-theoretical methods, have been proven as powerful tools to perform biological network analysis. However, the majority of the developed descriptors and graph-theoretical methods does not have the ability to take vertex- and edge-labels into account, e.g., atom- and bond-types when considering molecular graphs. Indeed, this feature is important to characterize biological networks more meaningfully instead of only considering pure topological information. RESULTS: In this paper, we put the emphasis on analyzing a special type of biological networks, namely bio-chemical structures. First, we derive entropic measures to calculate the information content of vertex- and edge-labeled graphs and investigate some useful properties thereof. Second, we apply the mentioned measures combined with other well-known descriptors to supervised machine learning methods for predicting Ames mutagenicity. Moreover, we investigate the influence of our topological descriptors - measures for only unlabeled vs. measures for labeled graphs - on the prediction performance of the underlying graph classification problem. CONCLUSIONS: Our study demonstrates that the application of entropic measures to molecules representing graphs is useful to characterize such structures meaningfully. For instance, we have found that if one extends the measures for determining the structural information content of unlabeled graphs to labeled graphs, the uniqueness of the resulting indices is higher. Because measures to structurally characterize labeled graphs are clearly underrepresented so far, the further development of such methods might be valuable and fruitful for solving problems within biological network analysis.
Affiliation(s)
- Matthias M Dehmer
  - Institute for Bioinformatics and Translational Research, UMIT, Eduard Wallnoefer Zentrum 1, A-6060, Hall in Tyrol, Austria
- Nicola N Barbarini
  - Department of Computer Science and Systems, University of Pavia, Via Ferrata 1, 27100, Pavia, Italy
- Kurt K Varmuza
  - Institute of Chemical Engineering, Laboratory for Chemometrics, Vienna University of Technology, Getreidemarkt 9/166, A-1060 Vienna, Austria
- Armin A Graber
  - Institute for Bioinformatics and Translational Research, UMIT, Eduard Wallnoefer Zentrum 1, A-6060, Hall in Tyrol, Austria