1
Yu S, Liao B, Zhu W, Peng D, Wu F. Accurate prediction and key protein sequence feature identification of cyclins. Brief Funct Genomics 2023; 22:411-419. [PMID: 37118891 DOI: 10.1093/bfgp/elad014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/30/2023] Open
Abstract
Cyclin proteins are a group of proteins that activate the cell cycle by forming complexes with cyclin-dependent kinases. Identifying cyclins correctly can provide key clues to understanding the function of cyclins. However, due to the low similarity between cyclin protein sequences, the development of a machine learning-based approach to identify cyclins is urgently needed. In this study, cyclin protein sequence features were extracted using the profile-based auto-cross covariance method. Then the features were ranked and selected with maximum relevance-maximum distance (MRMD) 1.0 and MRMD2.0. Finally, the prediction model was assessed through 10-fold cross-validation. The computational experiments showed that the best protein sequence features generated by MRMD1.0 could correctly predict 98.2% of cyclins using the random forest (RF) classifier, whereas seven-dimensional key protein sequence features identified with MRMD2.0 could correctly predict 96.1% of cyclins, which was superior to previous studies on the same dataset in terms of both dimensionality and performance. Therefore, our work provided a valuable tool for identifying cyclins. The model data can be downloaded from https://github.com/YUshunL/cyclin.
Affiliation(s)
- Shaoyou Yu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Bo Liao
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Wen Zhu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dejun Peng
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fangxiang Wu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
2
Win ZM, Cheong AMY, Hopkins WS. Using Machine Learning To Predict Partition Coefficient (Log P) and Distribution Coefficient (Log D) with Molecular Descriptors and Liquid Chromatography Retention Time. J Chem Inf Model 2023; 63:1906-1913. [PMID: 36926888 DOI: 10.1021/acs.jcim.2c01373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
During preclinical evaluations of drug candidates, several physicochemical (p-chem) properties are measured and employed as metrics to estimate drug efficacy in vivo. Two such p-chem properties are the octanol-water partition coefficient, Log P, and distribution coefficient, Log D, which are useful in estimating the distribution of drugs within the body. Log P and Log D are traditionally measured using the shake-flask method and high-performance liquid chromatography. However, it is challenging to measure these properties for species that are very hydrophobic (or hydrophilic) owing to the very low equilibrium concentrations partitioned into octanol (or aqueous) phases. Moreover, the shake-flask method is relatively time-consuming and can require multistep dilutions as the range of analyte concentrations can differ by several orders of magnitude. Here, we circumvent these limitations by using machine learning (ML) to correlate Log P and Log D with liquid chromatography (LC) retention time (RT). Predictive models based on four ML algorithms, which used molecular descriptors and LC RTs as features, were extensively tested and compared. The inclusion of RT as an additional descriptor improves model performance (MAE = 0.366 and R2 = 0.89), and Shapley additive explanations analysis indicates that RT has the highest impact on model accuracy.
Affiliation(s)
- Zaw-Myo Win
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
  - Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
- Allen M Y Cheong
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
- W Scott Hopkins
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
  - Waterloo Institute for Nanotechnology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
  - WaterMine Innovation, Inc., Waterloo, Ontario N0B 2T0, Canada
3
Hao Y, Fan T, Sun G, Li F, Zhang N, Zhao L, Zhong R. Environmental toxicity risk evaluation of nitroaromatic compounds: Machine learning driven binary/multiple classification and design of safe alternatives. Food Chem Toxicol 2022; 170:113461. [PMID: 36243219 DOI: 10.1016/j.fct.2022.113461] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/30/2022] [Revised: 09/11/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022]
4
Chen S, Li T, Yang L, Zhai F, Jiang X, Xiang R, Ling G. Artificial intelligence-driven prediction of multiple drug interactions. Brief Bioinform 2022; 23:6720429. [PMID: 36168896 DOI: 10.1093/bib/bbac427] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/18/2022] [Revised: 09/01/2022] [Accepted: 09/02/2022] [Indexed: 12/14/2022] Open
Abstract
When a drug is administered to exert its efficacy, it will encounter multiple barriers and go through multiple interactions. Predicting the drug-related multiple interactions is critical for drug development and safety monitoring because it provides foundations for practical, safe compatibility and rational use of multiple drugs. With the progress of artificial intelligence (AI) technology, a variety of novel prediction methods for single interactions have emerged and shown great advantages compared to traditional, expensive and time-consuming laboratory research. To promote the comprehensive and simultaneous prediction of multiple interactions, we systematically reviewed the application of AI in drug-drug, drug-food (excipients) and drug-microbiome interactions. We began by outlining the model methods, evaluation indicators, algorithms and databases commonly used to build models for the three types of drug interactions. The models based on the metabolic enzyme P450, drug similarity and drug targets have been emphasized among the machine learning models of drug-drug interactions. In particular, we discussed the limitations of current approaches and identified potential areas for future research. It is anticipated that this in-depth review will be helpful for the development of the next generation of systematic prediction models for simultaneous multiple interactions.
Affiliation(s)
- Siqi Chen
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Tiancheng Li
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Luna Yang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Fei Zhai
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Xiwei Jiang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Rongwu Xiang
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
  - Liaoning Medical Big Data and Artificial Intelligence Engineering Technology Research Center, Shenyang 110016, China
- Guixia Ling
  - College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
5
Yu S, Peng D, Zhu W, Liao B, Wang P, Yang D, Wu F. Hybrid_DBP: Prediction of DNA-binding proteins using hybrid features and convolutional neural networks. Front Pharmacol 2022; 13:1031759. [PMID: 36299898 PMCID: PMC9589247 DOI: 10.3389/fphar.2022.1031759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 09/27/2022] [Indexed: 11/21/2022] Open
Abstract
DNA-binding proteins (DBP) play an essential role in the genetics and evolution of organisms. A particular DNA sequence could provide underlying therapeutic benefits for hereditary diseases and cancers. Studying these proteins in a timely and effective manner helps clarify their mechanisms and their roles in disease prevention and treatment. However, identifying DNA-binding protein members from sequence databases is time-consuming, costly, and ineffective. Therefore, efficient methods for improving DBP classification are crucial to disease research. In this paper, we developed a novel predictor, Hybrid_DBP, which identifies potential DBP by using hybrid features and convolutional neural networks. The method combines two feature selection methods, MonoDiKGap and Kmer, and then uses MRMD2.0 to remove redundant features. According to the results, 94% of DBP were correctly recognized, and the accuracy on the independent test set reached 91.2%. This means Hybrid_DBP can become a useful tool for predicting DBP.
Affiliation(s)
- Shaoyou Yu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dejun Peng
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Wen Zhu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - *Correspondence: Wen Zhu,
- Bo Liao
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Peng Wang
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dongxuan Yang
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fangxiang Wu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
6
Wang Y, Michael S, Yang SM, Huang R, Cruz-Gutierrez K, Zhang Y, Zhao J, Xia M, Shinn P, Sun H. Retro Drug Design: From Target Properties to Molecular Structures. J Chem Inf Model 2022; 62:2659-2669. [PMID: 35653613 PMCID: PMC9198977 DOI: 10.1021/acs.jcim.2c00123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
Abstract
To deliver more therapeutics to more patients more quickly and economically is the ultimate goal of pharmaceutical researchers. The advent and rapid development of artificial intelligence (AI), in combination with other powerful computational methods in drug discovery, makes this goal more practical than ever before. Here, we describe a new strategy, retro drug design, or RDD, to create novel small-molecule drugs from scratch to meet multiple predefined requirements, including biological activity against a drug target and optimal range of physicochemical and ADMET properties. The molecular structure was represented by an atom typing based molecular descriptor system, optATP, which was further transformed to the space of loading vectors from principal component analysis. Traditional predictive models were trained over experimental data for the target properties using optATP and shallow machine learning methods. The Monte Carlo sampling algorithm was then utilized to find the solutions in the space of loading vectors that have the target properties. Finally, a deep learning model was employed to decode molecular structures from the solutions. To test the feasibility of the algorithm, we challenged RDD to generate novel kinase inhibitors from random numbers with five different ADMET properties optimized at the same time. The best Tanimoto similarity score between the generated valid structures and the available 4,314 kinase inhibitors was < 0.50, indicating a high extent of novelty of the generated compounds. From the 3,040 structures that met all six target properties, 20 were selected for synthesis and experimental measurement of inhibition activity over 97 representative kinases and the ADMET properties. Fifteen and eight compounds were determined to be hits or strong hits, respectively. Five of the six strong kinase inhibitors have excellent experimental ADMET properties. The results presented in this paper illustrate that RDD has the potential to significantly improve the current drug discovery process.
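For readers unfamiliar with the novelty criterion quoted in this abstract, the following is a minimal sketch of a Tanimoto-based novelty check. It assumes each molecule is already encoded as a set of on-bit indices of some binary fingerprint; the abstract does not say which fingerprint was used, and the function names and the empty-fingerprint convention are hypothetical choices made here for illustration only.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def best_similarity(candidate: set, references: list) -> float:
    """Highest Tanimoto similarity between a candidate and any reference compound."""
    return max(tanimoto(candidate, ref) for ref in references)


def is_novel(candidate: set, references: list, cutoff: float = 0.50) -> bool:
    """Hypothetical novelty rule: the best similarity must stay below the cutoff."""
    return best_similarity(candidate, references) < cutoff


# Toy fingerprints, not real kinase inhibitors.
known = [{1, 2, 3, 4}, {10, 11, 12}]
print(tanimoto({1, 2, 3}, {2, 3, 4}))   # 2 shared bits out of 4 distinct bits
print(is_novel({1, 20, 21, 22}, known))
```

The 0.50 default mirrors the threshold quoted in the abstract; how the authors computed similarity in practice is not stated here.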
Affiliation(s)
- Yuhong Wang
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Sam Michael
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Shyh-Ming Yang
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ruili Huang
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Kennie Cruz-Gutierrez
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Yaqing Zhang
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Jinghua Zhao
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Menghang Xia
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Paul Shinn
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Hongmao Sun
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
7
Tang W, Liu W, Wang Z, Hong H, Chen J. Machine learning models on chemical inhibitors of mitochondrial electron transport chain. Journal of Hazardous Materials 2022; 426:128067. [PMID: 34920224 DOI: 10.1016/j.jhazmat.2021.128067] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2021] [Revised: 12/05/2021] [Accepted: 12/08/2021] [Indexed: 06/14/2023]
Abstract
Chemicals can induce adverse effects in humans by inhibiting mitochondrial electron transport chain (ETC) such as disrupting mitochondrial membrane potential, enhancing oxidative stress and causing some diseases. Thus, identifying ETC inhibitors (ETCi) is important to chemical risk assessment and protecting the public health. However, it is not feasible to identify all ETCi with experimental methods. Quantitative structure-activity relationship (QSAR) modeling is a promising method to rapidly and effectively identify ETCi. In this study, QSAR models for predicting ETCi were developed using machine learning methods. A clustering-based under-sampling (CBUS) method was developed to handle the imbalance issue in training sets. Structure-activity landscapes were generated and analyzed for training sets generated by the CBUS method. The consensus QSAR models constructed with CBUS achieved satisfactory performances (balanced accuracy = 0.852) in 100 iterations of five-fold cross validations, indicating the models can effectively classify ETCi. The classification model was further employed to screen chemicals in the Inventory of Existing Chemical Substances of China and 13 chemicals were identified as ETCi. Fifteen structural alerts for ETCi were identified in this study. These results demonstrated that the model and structural alerts are useful to screen ETCi.
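The balanced accuracy reported in this abstract is commonly defined as the mean of sensitivity and specificity, which keeps a majority class from dominating the score. The sketch below illustrates that standard definition on toy labels; it is not taken from the paper, and the function name and the 0/1 label encoding are assumptions.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Imbalanced toy data: 2 actives, 8 inactives; one active is missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

On this toy set plain accuracy looks high because inactives dominate, while balanced accuracy exposes the missed active.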
Affiliation(s)
- Weihao Tang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Wenjia Liu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Huixiao Hong
  - National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR 72079, USA
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
8
Wang Y, Michael S, Huang R, Zhao J, Recabo K, Bougie D, Shu Q, Shinn P, Sun H. Retro Drug Design: From Target Properties to Molecular Structures. bioRxiv: the preprint server for biology 2021. [PMID: 34013260 PMCID: PMC8132216 DOI: 10.1101/2021.05.11.442656] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Abstract
To generate drug molecules of desired properties with computational methods is the holy grail in pharmaceutical research. Here we describe an AI strategy, retro drug design, or RDD, to generate novel small molecule drugs from scratch to meet predefined requirements, including but not limited to biological activity against a drug target, and optimal range of physicochemical and ADMET properties. Traditional predictive models were first trained over experimental data for the target properties, using an atom typing based molecular descriptor system, ATP. Monte Carlo sampling algorithm was then utilized to find the solutions in the ATP space defined by the target properties, and the deep learning model of Seq2Seq was employed to decode molecular structures from the solutions. To test feasibility of the algorithm, we challenged RDD to generate novel drugs that can activate μ opioid receptor (MOR) and penetrate blood brain barrier (BBB). Starting from vectors of random numbers, RDD generated 180,000 chemical structures, of which 78% were chemically valid. About 42,000 (31%) of the valid structures fell into the property space defined by MOR activity and BBB permeability. Out of the 42,000 structures, only 267 chemicals were commercially available, indicating a high extent of novelty of the AI-generated compounds. We purchased and assayed 96 compounds, and 25 of which were found to be MOR agonists. These compounds also have excellent BBB scores. The results presented in this paper illustrate that RDD has potential to revolutionize the current drug discovery process and create novel structures with multiple desired properties, including biological functions and ADMET properties. Availability of an AI-enabled fast track in drug discovery is essential to cope with emergent public health threat, such as pandemic of COVID-19.
9
Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. Food Frontiers 2021. [DOI: 10.1002/fft2.78] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022] Open
Affiliation(s)
- Zhuyifan Ye
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Wenmian Yang
  - State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China
- Yilong Yang
  - School of Software, Beihang University, Beijing, China
- Defang Ouyang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
10
Patel L, Shukla T, Huang X, Ussery DW, Wang S. Machine Learning Methods in Drug Discovery. Molecules 2020; 25:E5277. [PMID: 33198233 PMCID: PMC7696134 DOI: 10.3390/molecules25225277] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Received: 10/15/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022] Open
Abstract
The advancements of information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. The methods for designing drug targets and novel drug discovery now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high through-put computational analysis of databases used for both lead and target discovery, has increased the reliability of the machine learning and deep learning incorporated techniques. The use of these virtual screening and encompassing online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. The applications that produce promising results and methods will be reviewed.
Affiliation(s)
- Lauv Patel
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Tripti Shukla
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Xiuzhen Huang
  - Department of Computer Science, Arkansas State University, Jonesboro, AR 72467, USA
- David W. Ussery
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Shanzhi Wang
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
11
Tang W, Chen J, Hong H. Development of classification models for predicting inhibition of mitochondrial fusion and fission using machine learning methods. Chemosphere 2020; 273:128567. [PMID: 34756375 DOI: 10.1016/j.chemosphere.2020.128567] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/10/2020] [Revised: 10/03/2020] [Accepted: 10/06/2020] [Indexed: 06/13/2023]
Abstract
Mitochondrial fusion and fission are processes that maintain mitochondrial function when cells respond to environmental stresses. Disruption of mitochondrial fusion and fission influences cell health and can cause adverse events such as neurodegenerative disorders. It is critical to identify environmental chemicals that can disrupt mitochondrial fusion and fission. However, experimentally testing all the chemicals is not practical because experimental methods are time-consuming and costly. Quantitative structure-activity relationship (QSAR) modeling is an attractive approach for evaluating the potential of chemicals to disrupt mitochondrial fusion and fission. In this study, QSAR models were developed for differentiating chemicals capable of inhibiting mitochondrial fusion and fission using machine learning algorithms (i.e. random forest, logistic regression, Bernoulli naive Bayes, and deep neural network). One hundred iterations of five-fold cross validations and external validations showed that the best model on mitochondrial fusion had area under the receiver operating characteristic curve (AUC) of 82.8% and 78.1%, respectively; and the best model for mitochondrial fission yielded AUC of 84.3% and 97.5%, respectively. Furthermore, 45 and 56 structural alerts were identified for inhibition of mitochondrial fusion and fission, respectively. The results demonstrated that the models and the structural alerts could be useful for screening chemicals that inhibit mitochondrial fusion and fission.
Affiliation(s)
- Weihao Tang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA
12
Tang W, Chen J, Hong H. Discriminant models on mitochondrial toxicity improved by consensus modeling and resolving imbalance in training. Chemosphere 2020; 253:126768. [PMID: 32464767 DOI: 10.1016/j.chemosphere.2020.126768] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/12/2020] [Revised: 04/08/2020] [Accepted: 04/08/2020] [Indexed: 06/11/2023]
Abstract
Humans and animals may be exposed to tens of thousands of natural and synthetic chemicals during their lifespan. It is difficult to assess risk for all the chemicals with experimental toxicity tests. An alternative approach is to use computational toxicology methods such as quantitative structure-activity relationship (QSAR) modeling. Mitochondrial toxicity is involved in many diseases such as cancer, neurodegeneration, type 2 diabetes, cardiovascular diseases and autoimmune diseases. Thus, it is important to rapidly and efficiently identify chemicals with mitochondrial toxicity. In this study, five machine learning algorithms and twelve types of molecular fingerprints were employed to generate QSAR discriminant models for mitochondrial toxicity. A threshold moving method was adopted to resolve the imbalance issue in the training data. Consensus of the models by an averaging probability strategy improved prediction performance. The best model has correct classification rates of 81.8% and 88.3% in ten-fold cross validation and external validation, respectively. Substructures such as phenol, carboxylic acid, nitro and arylchloride were found informative through analysis of information gain and frequency of substructures. The results demonstrate that resolving imbalance in training and building consensus models can improve classification rates for mitochondrial toxicity prediction.
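The two ideas named in this abstract, consensus by averaging probabilities and threshold moving, can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the per-model probabilities are toy values, the threshold is picked from a hypothetical candidate list by maximizing the mean of sensitivity and specificity on labelled validation data, and all function names are invented for this example.

```python
def consensus_probability(model_probs):
    """Average the positive-class probabilities of several models, sample by sample."""
    n_models = len(model_probs)
    return [sum(per_sample) / n_models for per_sample in zip(*model_probs)]


def classify(probs, threshold):
    """Label a sample positive (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def mean_class_recall(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(pos) / len(pos)
    specificity = sum(1 - p for p in neg) / len(neg)
    return (sensitivity + specificity) / 2


def move_threshold(probs, y_true, candidates):
    """Return the candidate threshold with the best mean class recall (first wins ties)."""
    return max(candidates, key=lambda th: mean_class_recall(y_true, classify(probs, th)))


# Toy validation data: two models, five chemicals, two of them toxic (label 1).
model_a = [0.5, 0.25, 0.25, 0.0, 0.0]
model_b = [0.5, 0.5, 0.0, 0.25, 0.0]
y_true = [1, 1, 0, 0, 0]

avg = consensus_probability([model_a, model_b])
threshold = move_threshold(avg, y_true, candidates=[0.5, 0.25, 0.125])
print(avg, threshold, classify(avg, threshold))
```

With the default cut-off of 0.5 the second toxic chemical would be missed; lowering the threshold recovers it without admitting the inactive compounds in this toy set.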
Affiliation(s)
- Weihao Tang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA
13
Korkmaz S. Deep Learning-Based Imbalanced Data Classification for Drug Discovery. J Chem Inf Model 2020; 60:4180-4190. [DOI: 10.1021/acs.jcim.9b01162] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 12/19/2022]
Affiliation(s)
- Selçuk Korkmaz
  - Trakya University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Edirne, Turkey
14
Hao Y, Sun G, Fan T, Sun X, Liu Y, Zhang N, Zhao L, Zhong R, Peng Y. Prediction on the mutagenicity of nitroaromatic compounds using quantum chemistry descriptors based QSAR and machine learning derived classification methods. Ecotoxicology and Environmental Safety 2019; 186:109822. [PMID: 31634658 DOI: 10.1016/j.ecoenv.2019.109822] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/22/2019] [Revised: 10/11/2019] [Accepted: 10/14/2019] [Indexed: 06/10/2023]
Abstract
Nitroaromatic compounds (NACs) are an important class of environmental organic pollutants. However, because of limited resources, there is insufficient information on their potential adverse effects on human health and the environment. Thus, using in silico technologies to assess their potential hazardous effects is urgent and promising. In this study, quantitative structure-activity relationship (QSAR) and classification models were constructed for a set of NACs based on their mutagenicity against the Salmonella typhimurium TA100 strain. For the QSAR studies, DRAGON descriptors together with quantum chemistry descriptors were calculated to characterize detailed molecular information. Descriptors were screened and QSAR models were developed using genetic algorithm (GA) and multiple linear regression (MLR) analyses. For the classification studies, seven machine learning methods along with six molecular fingerprints were applied to develop qualitative classification models. The goodness of fit, reliability, robustness and predictive performance of all developed models were measured by rigorous statistical validation criteria, and the best QSAR and classification models were then chosen. Moreover, the QSAR models with quantum chemistry descriptors were compared with those without quantum chemistry descriptors and with previously reported models. Notably, we also identified some specific molecular properties or privileged substructures responsible for the high mutagenicity of NACs. Overall, the developed QSAR and classification models can be used as potential tools for rapidly predicting the mutagenicity of new or untested NACs for environmental hazard assessment and regulatory purposes, and may provide insights into the in vivo toxicity mechanisms of NACs and related compounds.
Affiliation(s)
- Yuxing Hao
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Guohui Sun
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Tengjiao Fan
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Xiaodong Sun
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Yongdong Liu
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Na Zhang
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Lijiao Zhao
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Rugang Zhong
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, PR China
- Yongzhen Peng
- National Engineering Laboratory for Advanced Municipal Wastewater Treatment and Reuse Technology, Engineering Research Center of Beijing, Beijing University of Technology, Beijing, 100124, China
|
15
|
Hinge VK, Roy D, Kovalenko A. Prediction of P-glycoprotein inhibitors with machine learning classification models and 3D-RISM-KH theory based solvation energy descriptors. J Comput Aided Mol Des 2019; 33:965-971. [PMID: 31745705 DOI: 10.1007/s10822-019-00253-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/24/2019] [Accepted: 11/14/2019] [Indexed: 11/24/2022]
Abstract
Development of novel in silico methods for discovering new P-glycoprotein (PgP) inhibitors is crucial for the reversal of multi-drug resistance in cancer therapy. Here, we report machine learning based binary classification schemes that distinguish PgP inhibitors from non-inhibitors using molecular solvation theory with excellent accuracy and precision. The excess chemical potential and partial molar volume in various solvents are calculated for PgP± (PgP inhibitor and non-inhibitor) compounds with the statistical-mechanics-based three-dimensional reference interaction site model with the Kovalenko-Hirata closure approximation (3D-RISM-KH molecular theory of solvation). The statistical importance analysis of descriptors identified the 3D-RISM-KH based descriptors as the top molecular descriptors for classification. Among the constructed classification models, the support vector machine predicted the test set of PgP± compounds with the highest accuracy and precision, approximately 97%. The validation of the models confirms the robustness of state-of-the-art molecular solvation theory based descriptors in identifying PgP± compounds.
Affiliation(s)
- Vijaya Kumar Hinge
- Department of Mechanical Engineering, 10-203 Donadeo Innovation Centre for Engineering, University of Alberta, 9211-116 Street NW, Edmonton, AB, T6G 1H9, Canada
- Dipankar Roy
- Department of Mechanical Engineering, 10-203 Donadeo Innovation Centre for Engineering, University of Alberta, 9211-116 Street NW, Edmonton, AB, T6G 1H9, Canada
- Andriy Kovalenko
- Department of Mechanical Engineering, 10-203 Donadeo Innovation Centre for Engineering, University of Alberta, 9211-116 Street NW, Edmonton, AB, T6G 1H9, Canada
- Nanotechnology Research Centre, 11421 Saskatchewan Drive, Edmonton, AB, T6G 2M9, Canada
|
16
|
Abstract
The Python programming language is becoming a promising tool for data analysis in various fields. However, little attention has been paid to using Python in analytical chemistry, even though recent advances in instrumental analysis require robust and reliable data analysis. To overcome the difficulty of accurate analysis, multivariate analysis, or chemometrics, has been widely applied to various kinds of data obtained by instrumental analysis. In the present work, the potential usefulness of Python for chemometrics and related fields in chemistry is reviewed. Many practical tools for chemometrics, e.g., principal component analysis (PCA), partial least squares (PLS) and support vector machine (SVM), are included in the scikit-learn machine learning (ML) library for Python. Other useful libraries, such as pyMCR for multivariate curve resolution (MCR) and 2Dpy for two-dimensional correlation spectroscopy (2D-COS), can be obtained from GitHub. For these reasons, a computational environment for chemometrics is easily constructed in Python.
Affiliation(s)
- Shigeaki Morita
- Department of Engineering Science, Osaka Electro-Communication University
|
17
|
Jiang C, Yang H, Di P, Li W, Tang Y, Liu G. In silico prediction of chemical reproductive toxicity using machine learning. J Appl Toxicol 2019; 39:844-854. [DOI: 10.1002/jat.3772] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/19/2018] [Revised: 12/05/2018] [Accepted: 12/15/2018] [Indexed: 12/30/2022]
Affiliation(s)
- Changsheng Jiang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Peiwen Di
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
18
|
Sun G, Fan T, Sun X, Hao Y, Cui X, Zhao L, Ren T, Zhou Y, Zhong R, Peng Y. In Silico Prediction of O⁶-Methylguanine-DNA Methyltransferase Inhibitory Potency of Base Analogs with QSAR and Machine Learning Methods. Molecules 2018; 23:E2892. [PMID: 30404161 PMCID: PMC6278368 DOI: 10.3390/molecules23112892] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/21/2018] [Revised: 11/04/2018] [Accepted: 11/06/2018] [Indexed: 12/24/2022] Open
Abstract
O⁶-methylguanine-DNA methyltransferase (MGMT), a unique DNA repair enzyme, can confer resistance to DNA anticancer alkylating agents that modify the O⁶-position of guanine. Thus, inhibition of MGMT activity in tumors is of great interest to cancer researchers because it can significantly improve the anticancer efficacy of such alkylating agents. In this study, we performed a quantitative structure activity relationship (QSAR) and classification study based on a total of 134 base analogs and their ED50 values (50% inhibitory concentration) against MGMT. The molecular information of all compounds was described by quantum chemical descriptors and Dragon descriptors. Genetic algorithm (GA) and multiple linear regression (MLR) analysis were combined to develop QSAR models. Classification models were generated by seven machine-learning methods based on six types of molecular fingerprints. The performance of all developed models was assessed by internal and external validation techniques. The best QSAR model was obtained with Q²Loo = 0.83, R² = 0.87, Q²ext = 0.67, and R²ext = 0.69 based on 84 compounds. The QSAR results indicated that topological charge indices, polarizability, ionization potential (IP), and the number of primary aromatic amines are the main contributors to MGMT inhibition by base analogs. For the classification studies, the accuracies of 10-fold cross-validation ranged from 0.750 to 0.885 for the top ten models. Accuracies for the external test set ranged from 0.800 to 0.880 except for the PubChem-Tree model, suggesting a satisfactory predictive ability. Three models (Ext-SVM, Ext-Tree and Graph-RF) showed high and reliable predictive accuracy for both training and external test sets. In addition, several representative substructures characterizing MGMT inhibitors were identified by information gain and substructure frequency analysis. Our studies might be useful for designing and rapidly identifying potential MGMT inhibitors.
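The internal validation statistic Q²Loo quoted in this abstract is conventionally computed as 1 - PRESS / SS_tot, where PRESS sums the squared errors of predictions made for each compound by a model fitted without it. The sketch below assumes a single hypothetical descriptor and ordinary least squares for brevity (the study itself used multi-descriptor MLR), and it takes SS_tot around the mean of all responses, which is one common convention.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def total_sum_of_squares(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)

def r_squared(xs, ys):
    """Fitted R2 on the full training set."""
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_sum_of_squares(ys)

def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict it.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1 - press / total_sum_of_squares(ys)

# Hypothetical descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print(r_squared(descriptor, activity), q2_loo(descriptor, activity))
```

For least-squares models the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so Q²Loo cannot exceed R², consistent with the values reported in the abstract (0.83 versus 0.87).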
Affiliation(s)
- Guohui Sun
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Tengjiao Fan
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Xiaodong Sun
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Yuxing Hao
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Xin Cui
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Lijiao Zhao
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Ting Ren
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Yue Zhou
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, 2A Nanwei Road, Beijing 100050, China
- Rugang Zhong
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science & Bioengineering, Beijing University of Technology, Beijing 100124, China
- Yongzhen Peng
- National Engineering Laboratory for Advanced Municipal Wastewater Treatment & Reuse Technology, Engineering Research Center of Beijing, Beijing University of Technology, Beijing 100124, China
|
19
|
Fan T, Sun G, Zhao L, Cui X, Zhong R. QSAR and Classification Study on Prediction of Acute Oral Toxicity of N-Nitroso Compounds. Int J Mol Sci 2018; 19:E3015. [PMID: 30282923 PMCID: PMC6213880 DOI: 10.3390/ijms19103015] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 09/10/2018] [Revised: 09/29/2018] [Accepted: 09/30/2018] [Indexed: 12/30/2022] Open
Abstract
To better understand the mechanism of in vivo toxicity of N-nitroso compounds (NNCs), rat acute oral toxicity data (50% lethal dose concentration, LD50) for 80 NNCs were used to establish quantitative structure-activity relationship (QSAR) and classification models. Descriptors calculated by quantum chemistry methods were combined with Dragon descriptors to describe the molecular information of all compounds. Genetic algorithm (GA) and multiple linear regression (MLR) analyses were combined to develop QSAR models. Fingerprints and machine learning methods were used to establish classification models. The quality and predictive performance of all established models were evaluated by internal and external validation techniques. The best GA-MLR-based QSAR model, containing eight molecular descriptors, was obtained with Q²loo = 0.7533, R² = 0.8071, Q²ext = 0.7041 and R²ext = 0.7195. The QSAR results showed that the acute oral toxicity of NNCs mainly depends on three factors, namely the polarizability, the ionization potential (IP) and the presence/absence and frequency of the C–O bond. For the classification studies, the best model was obtained using the MACCS keys fingerprint combined with the artificial neural network (ANN) algorithm. The classification models suggested that several representative substructures, including nitrile, hetero N nonbasic, alkylchloride and amine-containing fragments, are the main contributors to the high toxicity of NNCs. Overall, the developed QSAR and classification models of the rat acute oral toxicity of NNCs showed satisfactory predictive abilities. The results provide insight into the toxicity mechanism of NNCs in vivo and might be used for a preliminary assessment of NNC toxicity to mammals.
Affiliation(s)
- Tengjiao Fan
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
- Guohui Sun
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
- Lijiao Zhao
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
- Xin Cui
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
- Rugang Zhong
- Beijing Key Laboratory of Environmental & Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
|
20
|
Kaiser TM, Burger PB, Butch CJ, Pelly SC, Liotta DC. A Machine Learning Approach for Predicting HIV Reverse Transcriptase Mutation Susceptibility of Biologically Active Compounds. J Chem Inf Model 2018; 58:1544-1552. [PMID: 29953819 DOI: 10.1021/acs.jcim.7b00475] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
HIV resistance emerging against antiretroviral drugs represents a great threat to the continued prolongation of the lifespans of HIV-infected patients. Therefore, methods capable of predicting resistance susceptibility in the development of compounds are in great need. By targeting the major reverse transcription residues Y181, K103, and L100, we used the biological activities of compounds against these enzymes and the wild-type reverse transcriptase to create Naïve Bayes Networks. Through this machine learning approach, we could predict, with high accuracy, whether a compound would be susceptible to a loss of potency due to resistance. Also, we could perfectly predict retrospectively whether compounds would be susceptible to both a K103 mutant RT and a Y181 mutant RT. In the study presented here, our method outperformed a traditional molecular mechanics approach. This method should be of broad interest beyond drug discovery efforts, and serves to expand the utility of machine learning for the prediction of physical, chemical, or biological properties using the vast information available in the literature.
Affiliation(s)
- Thomas M Kaiser
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
- Pieter B Burger
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
- Department of Drug Discovery and Biomedical Sciences, College of Pharmacy, Medical University of South Carolina, 280 Calhoun St., MSC 141, Charleston, South Carolina 29425-1410, United States
- Christopher J Butch
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
- Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-IE-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Stephen C Pelly
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
- Dennis C Liotta
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
|
21
|
Volpe DA, Qosa H. Challenges with the precise prediction of ABC-transporter interactions for improved drug discovery. Expert Opin Drug Discov 2018; 13:697-707. [PMID: 29943645 DOI: 10.1080/17460441.2018.1493454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
Introduction: Given that membrane efflux transporters can influence a drug's pharmacokinetics, efficacy and safety, identifying potential substrates and inhibitors of these transporters is a critical element in the drug discovery and development process. Additionally, it is important to predict the inhibition potential of new drugs to avoid clinically significant drug interactions. The goal of preclinical studies is to characterize a new drug as a substrate or inhibitor of efflux transporters. Areas covered: This article reviews preclinical systems that are routinely used to determine whether a new drug is a substrate or inhibitor of efflux transporters, including in silico models, in vitro membrane and cell assays, and animal models. Also included is an examination of studies comparing in vitro inhibition data to clinical drug interaction outcomes. Expert opinion: While a number of models are employed to classify a drug as an efflux substrate or inhibitor, there are challenges in predicting clinical drug interactions. These predictions could be improved through a tiered approach to classifying new drugs, validation of preclinical assays, and refinement of threshold criteria for clinical interaction studies.
Affiliation(s)
- Donna A Volpe
- Office of Clinical Pharmacology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
- Hisham Qosa
- Office of Clinical Pharmacology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
|
22
|
Ghasemi F, Mehridehnavi A, Pérez-Garrido A, Pérez-Sánchez H. Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discov Today 2018; 23:1784-1790. [PMID: 29936244 DOI: 10.1016/j.drudis.2018.06.016] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 10/23/2017] [Revised: 06/05/2018] [Accepted: 06/14/2018] [Indexed: 10/28/2022]
Affiliation(s)
- Fahimeh Ghasemi
- Department of Bioinformatics and Systems Biology, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Hezar-Jerib Ave., 81746 73461, Islamic Republic of Iran
- Alireza Mehridehnavi
- Department of Bioinformatics and Systems Biology, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Hezar-Jerib Ave., 81746 73461, Islamic Republic of Iran
- Alfonso Pérez-Garrido
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), E30107 Murcia, Spain
- Horacio Pérez-Sánchez
- Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), E30107 Murcia, Spain
|
23
|
Fan D, Yang H, Li F, Sun L, Di P, Li W, Tang Y, Liu G. In silico prediction of chemical genotoxicity using machine learning methods and structural alerts. Toxicol Res (Camb) 2018; 7:211-220. [PMID: 30090576 PMCID: PMC6062245 DOI: 10.1039/c7tx00259a] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/29/2017] [Accepted: 12/14/2017] [Indexed: 01/19/2023] Open
Abstract
Genotoxicity tests can detect compounds that have an adverse effect on the process of heredity. The in vivo micronucleus assay, a genotoxicity test method, has been widely used to evaluate the presence and extent of chromosomal damage in human beings. Due to the high cost and laboriousness of experimental tests, computational approaches for predicting genotoxicity based on chemical structures and properties are recognized as an alternative. In this study, a dataset containing 641 diverse chemicals was collected and the molecules were represented by both fingerprints and molecular descriptors. Then classification models were constructed by six machine learning methods, including the support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (kNN), C4.5 decision tree (DT), random forest (RF) and artificial neural network (ANN). The performance of the models was estimated by five-fold cross-validation and an external validation set. The top ten models showed excellent performance for the external validation with accuracies ranging from 0.846 to 0.938, among which models Pubchem_SVM and MACCS_RF showed a more reliable predictive ability. The applicability domain was also defined to distinguish favorable predictions from unfavorable ones. Finally, ten structural fragments which can be used to assess the genotoxicity potential of a chemical were identified by using information gain and structural fragment frequency analysis. Our models might be helpful for the initial screening of potential genotoxic compounds.
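Information gain and fragment frequency analysis, as named in this abstract, can be sketched in a few lines. Information gain is the standard reduction in class entropy after splitting compounds by presence of a fragment. The frequency ratio used here, (N_fragment_class × N_total) / (N_fragment_total × N_class), is one common enrichment definition and is an assumption, since the abstract does not give the formula; the toy labels and fragment flags are hypothetical.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        h -= p * log2(p)
    return h

def information_gain(has_fragment, labels):
    """Entropy of the labels minus the entropy left after splitting on the fragment."""
    n = len(labels)
    present = [y for f, y in zip(has_fragment, labels) if f]
    absent = [y for f, y in zip(has_fragment, labels) if not f]
    remaining = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - remaining

def fragment_enrichment(has_fragment, labels, target=1):
    """How over-represented the fragment is in the target class (1.0 = no enrichment)."""
    n_total = len(labels)
    n_class = sum(1 for y in labels if y == target)
    n_frag = sum(1 for f in has_fragment if f)
    n_frag_class = sum(1 for f, y in zip(has_fragment, labels) if f and y == target)
    return (n_frag_class * n_total) / (n_frag * n_class)

# Hypothetical data: 4 genotoxic (1) and 4 non-genotoxic (0) compounds.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
fragment = [1, 1, 1, 0, 0, 0, 0, 0]  # fragment occurs only in genotoxic compounds

print(information_gain(fragment, labels), fragment_enrichment(fragment, labels))
```

A fragment that ranks high on both measures is informative and enriched in the toxic class, which is the kind of evidence typically used to propose a structural alert.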
Affiliation(s)
- Defang Fan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Fuxing Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Lixia Sun
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Peiwen Di
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China. Tel: +86-21-64250811
|
24
|
Yang M, Chen J, Xu L, Shi X, Zhou X, Xi Z, An R, Wang X. A novel adaptive ensemble classification framework for ADME prediction. RSC Adv 2018; 8:11661-11683. [PMID: 35542768 PMCID: PMC9079056 DOI: 10.1039/c8ra01206g] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/07/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022] Open
Abstract
AECF is a GA based ensemble method. It includes four components which are (1) data balancing, (2) generating individual models, (3) combining individual models, and (4) optimizing the ensemble.
Affiliation(s)
- Ming Yang
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Department of Chemistry
- Jialei Chen
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Liwen Xu
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Xiufeng Shi
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Xin Zhou
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Zhijun Xi
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of TCM, Shanghai, People's Republic of China
- Rui An
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, People's Republic of China
- Xinhong Wang
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai, People's Republic of China
|
25
|
Sun H, Huang R, Xia M, Shahane S, Southall N, Wang Y. Prediction of hERG Liability - Using SVM Classification, Bootstrapping and Jackknifing. Mol Inform 2017; 36:10.1002/minf.201600126. [PMID: 28000393 PMCID: PMC5382096 DOI: 10.1002/minf.201600126] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 10/26/2016] [Accepted: 11/14/2016] [Indexed: 12/11/2022]
Abstract
Drug-induced QT prolongation leads to life-threatening cardiotoxicity, mostly through blockage of the potassium ion (K+) channels encoded by the human ether-à-go-go-related gene (hERG). The hERG channel is one of the most important antitargets to be addressed in the early stage of the drug discovery process, in order to avoid more costly failures in the development phase. Using a thallium flux assay, 4,323 molecules were screened for hERG channel inhibition in a quantitative high throughput screening (qHTS) format. Here, we present support vector classification (SVC) models of hERG channel inhibition with an averaged area under the receiver operating characteristic curve (AUC-ROC) of 0.93 for the tested compounds. Both jackknifing and bootstrapping were employed to rebalance the heavily biased training datasets, and the impact of these two under-sampling rebalancing methods on the performance of the predictive models is discussed. Our results indicated that the rebalancing techniques did not enhance the predictive power of the resulting models; instead, adoption of optimal cutoffs could restore the desirable balance of sensitivity and specificity of the binary classifiers. In an external validation set of 66 drug molecules, the SVC model exhibited an AUC-ROC of 0.86, further demonstrating the utility of this modeling approach to predict hERG liabilities.
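The abstract's point that an optimal cutoff can restore the balance of sensitivity and specificity without changing ranking quality can be sketched as follows. AUC-ROC is computed here as the probability that a randomly chosen inhibitor outscores a randomly chosen non-inhibitor. Choosing the cutoff by Youden's J (sensitivity + specificity - 1) is an assumption, as the abstract does not say how the authors defined "optimal", and the scores are hypothetical.

```python
def auc_roc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(y_true, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
    n_pos = sum(y_true)
    return tp / n_pos, tn / (len(y_true) - n_pos)

def best_cutoff(y_true, scores):
    """Observed score that maximises sensitivity + specificity (Youden's J + 1)."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(y_true, scores, c)))

# Hypothetical imbalanced set: 3 inhibitors (1) and 7 non-inhibitors (0).
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.60, 0.35, 0.30, 0.40, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]

print(auc_roc(y, scores))
print(sens_spec(y, scores, 0.5))
print(best_cutoff(y, scores), sens_spec(y, scores, best_cutoff(y, scores)))
```

The AUC-ROC is unaffected by the cutoff because it depends only on the ranking of scores; only the sensitivity and specificity trade-off moves, which is why a cutoff change can stand in for rebalancing the training data.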
Affiliation(s)
- Hongmao Sun
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Ruili Huang
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Menghang Xia
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Sampada Shahane
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Noel Southall
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Yuhong Wang
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
|
26
|
Wang Q, Li X, Yang H, Cai Y, Wang Y, Wang Z, Li W, Tang Y, Liu G. In silico prediction of serious eye irritation or corrosion potential of chemicals. RSC Adv 2017. [DOI: 10.1039/c6ra25267b] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 11/21/2022] Open
Abstract
Chemical fingerprints combined with machine learning methods were used to build binary classification models for predicting the eye corrosion (EC) and eye irritation (EI) potential of compounds.
Affiliation(s)
- Qin Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Xiao Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yingchun Cai
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yinyin Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Zhuang Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
27
|
Sun H, Nguyen K, Kerns E, Yan Z, Yu KR, Shah P, Jadhav A, Xu X. Highly predictive and interpretable models for PAMPA permeability. Bioorg Med Chem 2016; 25:1266-1276. [PMID: 28082071 DOI: 10.1016/j.bmc.2016.12.049] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 09/27/2016] [Revised: 12/22/2016] [Accepted: 12/27/2016] [Indexed: 11/28/2022]
Abstract
Cell membrane permeability is an important determinant for oral absorption and bioavailability of a drug molecule. An in silico model predicting drug permeability is described, which is built based on a large permeability dataset of 7488 compound entries or 5435 structurally unique molecules measured by the same lab using the parallel artificial membrane permeability assay (PAMPA). On the basis of customized molecular descriptors, the support vector regression (SVR) model trained with 4071 compounds with quantitative data is able to predict the remaining 1364 compounds with qualitative data with an area under the curve of receiver operating characteristic (AUC-ROC) of 0.90. The support vector classification (SVC) model trained with half of the whole dataset, comprising both the quantitative and the qualitative data, produced accurate predictions for the remaining data with an AUC-ROC of 0.88. The results suggest that the developed SVR model is highly predictive and provides medicinal chemists with a useful in silico tool to facilitate design and synthesis of novel compounds with optimal drug-like properties, and thus accelerate lead optimization in drug discovery.
Affiliation(s)
- Hongmao Sun
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA.
| | - Kimloan Nguyen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
| | - Edward Kerns
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
| | - Zhengyin Yan
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
| | - Kyeong Ri Yu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
| | - Pranav Shah
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
| | - Ajit Jadhav
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
| | - Xin Xu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA.
|
28
|
Ngo TD, Tran TD, Le MT, Thai KM. Machine learning-, rule- and pharmacophore-based classification on the inhibition of P-glycoprotein and NorA. SAR and QSAR in Environmental Research 2016; 27:747-780. [PMID: 27667641 DOI: 10.1080/1062936x.2016.1233137] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/05/2016] [Accepted: 09/02/2016] [Indexed: 06/06/2023]
Abstract
The efflux pumps P-glycoprotein (P-gp) in humans and NorA in Staphylococcus aureus are of great interest for medicinal chemists because of their important roles in multidrug resistance (MDR). The high polyspecificity as well as the unavailability of high-resolution X-ray crystal structures of these transmembrane proteins led us to combine ligand-based approaches, which in the case of this study were machine learning, perceptual mapping and pharmacophore modelling. For P-gp inhibitory activity, individual models were developed using different machine learning algorithms and subsequently combined into an ensemble model which showed a good discrimination between inhibitors and noninhibitors (acctrain-diverse = 84%; accinternal-test = 92% and accexternal-test = 100%). For ligand promiscuity between P-gp and NorA, perceptual maps and pharmacophore models were generated for the detection of rules and features. Based on these in silico tools, hit compounds for reversing MDR were discovered from the in-house and DrugBank databases through virtual screening in an attempt to restore drug sensitivity in cancer cells and bacteria.
Affiliation(s)
- T-D Ngo
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Viet Nam
| | - T-D Tran
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Viet Nam
| | - M-T Le
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Viet Nam
| | - K-M Thai
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Viet Nam
|
29
|
Niu AQ, Xie LJ, Wang H, Zhu B, Wang SQ. Prediction of selective estrogen receptor beta agonist using open data and machine learning approach. Drug Des Devel Ther 2016; 10:2323-31. [PMID: 27486309 PMCID: PMC4958355 DOI: 10.2147/dddt.s110603] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/23/2022] Open
Abstract
Background: Estrogen receptors (ERs) are nuclear transcription factors that are involved in the regulation of many complex physiological processes in humans. ERs have been validated as important drug targets for the treatment of various diseases, including breast cancer, ovarian cancer, osteoporosis, and cardiovascular disease. ERs have two subtypes, ER-α and ER-β. Emerging data suggest that the development of subtype-selective ligands that specifically target ER-β could be a more optimal approach to elicit beneficial estrogen-like activities and reduce side effects. Methods: Herein, we focused on ER-β and developed its in silico quantitative structure-activity relationship models using machine learning (ML) methods. Results: The chemical structures and ER-β bioactivity data were extracted from public chemogenomics databases. Four types of popular fingerprint generation methods including MACCS fingerprint, PubChem fingerprint, 2D atom pairs, and Chemistry Development Kit extended fingerprint were used as descriptors. Four ML methods including Naïve Bayesian classifier, k-nearest neighbor, random forest, and support vector machine were used to train the models. The range of classification accuracies was 77.10% to 88.34%, and the range of area under the ROC (receiver operating characteristic) curve values was 0.8151 to 0.9475, evaluated by the 5-fold cross-validation. Comparison analysis suggests that both the random forest and the support vector machine are superior for the classification of selective ER-β agonists. Chemistry Development Kit extended fingerprints and MACCS fingerprint performed better in structural representation between active and inactive agonists. Conclusion: These results demonstrate that combining the fingerprint and ML approaches leads to robust ER-β agonist prediction models, which are potentially applicable to the identification of selective ER-β agonists.
Affiliation(s)
- Ai-Qin Niu
- Department of Gynecology, the First People's Hospital of Shangqiu, Shangqiu, Henan, People's Republic of China
| | - Liang-Jun Xie
- Department of Image Diagnoses, the Third Hospital of Jinan, Jinan, Shandong, People's Republic of China
| | - Hui Wang
- Department of Gynecology, the First People's Hospital of Shangqiu, Shangqiu, Henan, People's Republic of China
| | - Bing Zhu
- Department of Gynecology, the First People's Hospital of Shangqiu, Shangqiu, Henan, People's Republic of China
| | - Sheng-Qi Wang
- Department of Mammary Disease, Guangdong Provincial Hospital of Chinese Medicine, the Second Clinical College of Guangzhou University of Chinese Medicine, Guangzhou, People's Republic of China
|
30
|
Zhang C, Cheng F, Li W, Liu G, Lee PW, Tang Y. In silico Prediction of Drug Induced Liver Toxicity Using Substructure Pattern Recognition Method. Mol Inform 2016; 35:136-44. [PMID: 27491923 DOI: 10.1002/minf.201500055] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 12/12/2015] [Accepted: 12/14/2015] [Indexed: 02/05/2023]
Abstract
Drug-induced liver injury (DILI) is a leading cause of acute liver failure in the US and of less severe liver injury worldwide. It is also one of the major reasons for drug withdrawal from the market. Thus, DILI has become one of the most important safety concerns for drugs and should be predicted at a very early stage of the drug discovery process. In this study, a comprehensive data set containing 1317 diverse compounds was collected from publications. Then, high accuracy classification models were built using five machine learning methods based on MACCS and FP4 fingerprints after evaluating by substructure pattern recognition method. The best model was built using the SVM method together with the FP4 fingerprint at the IG value threshold of 0.0005. Its overall predictive accuracies were 79.7% and 64.5% for the training and test sets, respectively, and it yielded an overall accuracy of 75.0% for the external validation dataset, consisting of 88 compounds collected from a benchmark DILI database, the Liver Toxicity Knowledge Base. This model could be used for drug-induced liver toxicity prediction. Moreover, some key substructure patterns correlated with drug-induced liver toxicity were also identified as structural alerts.
Affiliation(s)
- Chen Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Current address: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA, Tel: +86-21-64251052; Fax: +86-21-64251033
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Philip W Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.
|
31
|
Zhang C, Zhou Y, Gu S, Wu Z, Wu W, Liu C, Wang K, Liu G, Li W, Lee PW, Tang Y. In silico prediction of hERG potassium channel blockage by chemical category approaches. Toxicol Res (Camb) 2016; 5:570-582. [PMID: 30090371 DOI: 10.1039/c5tx00294j] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 08/19/2015] [Accepted: 01/13/2016] [Indexed: 12/18/2022] Open
Abstract
The human ether-a-go-go related gene (hERG) plays an important role in cardiac action potential. It encodes an ion channel protein named Kv11.1, which is related to long QT syndrome and may cause avoidable sudden cardiac death. Therefore, it is important to assess the hERG channel blockage of lead compounds early in the drug discovery process. In this study, we collected a large data set containing 1163 diverse compounds with IC50 values determined by the patch clamp method on mammalian cell lines. The whole data set was divided into 80% as the training set and 20% as the test set. Then, five machine learning methods were applied to build a series of binary classification models based on 13 molecular descriptors, five fingerprints, and molecular descriptors combined with fingerprints at four IC50 thresholds to discriminate hERG blockers from nonblockers. Models built by molecular descriptors combined with fingerprints were validated by using an external validation set containing 407 compounds collected from the hERGCentral database. The performance indicated that the model built by molecular descriptors combined with fingerprints yielded the best results and that each threshold had its own most suitable method, which means that hERG blockage assessment might depend on threshold values. Meanwhile, kNN and SVM methods were better than the others for model building. Furthermore, six privileged substructures were identified using information gain and frequency analysis methods, which could be regarded as structural alerts of cardiac toxicity mediated by hERG channel blockage.
Affiliation(s)
- Chen Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Yuan Zhou
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Shikai Gu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Zengrui Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Wenjie Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Changming Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Kaidong Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Philip W Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China. Tel: +86-21-64251052
|
32
|
Yang M, Chen J, Shi X, Xu L, Xi Z, You L, An R, Wang X. Development of in Silico Models for Predicting P-Glycoprotein Inhibitors Based on a Two-Step Approach for Feature Selection and Its Application to Chinese Herbal Medicine Screening. Mol Pharm 2015; 12:3691-713. [PMID: 26376206 DOI: 10.1021/acs.molpharmaceut.5b00465] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
P-glycoprotein (P-gp) is regarded as an important factor in determining the ADMET (absorption, distribution, metabolism, elimination, and toxicity) characteristics of drugs and drug candidates. Successful prediction of P-gp inhibitors can thus lead to an improved understanding of the underlying mechanisms of both changes in the pharmacokinetics of drugs and drug-drug interactions. Therefore, there has been considerable interest in the development of in silico modeling of P-gp inhibitors in recent years. Considering that a large number of molecular descriptors are used to characterize structurally diverse molecules, efficient feature selection methods are required to extract the most informative predictors. In this work, we constructed an extensive data set of 2428 molecules that includes 1518 P-gp inhibitors and 910 P-gp noninhibitors from multiple resources. Importantly, a two-step feature selection approach based on a genetic algorithm and a greedy forward-searching algorithm was employed to select the minimum set of the most informative descriptors that contribute to the prediction of P-gp inhibitors. To determine the best machine learning algorithm, 18 classifiers coupled with the feature selection method were compared. The top three best-performing models (flexible discriminant analysis, support vector machine, and random forest) and their ensemble model, using respectively only 3, 9, 7, and 14 descriptors, achieve an overall accuracy of 83.2%-86.7% for the training set containing 1040 compounds, an overall accuracy of 82.3%-85.5% for the test set containing 1039 compounds, and a prediction accuracy of 77.4%-79.9% for the external validation set containing 349 compounds. The models were further extensively validated by the DrugBank database (1890 compounds). The proposed models are competitive with and in some cases better than other published models in terms of prediction accuracy and minimum number of descriptors.
The applicability domain was then addressed by developing an ensemble classification model to obtain more reliable predictions. Finally, we employed these models as a virtual screening tool for identifying potential P-gp inhibitors in the Traditional Chinese Medicine Systems Pharmacology (TCMSP) database containing a total of 13 051 unique compounds from 498 herbs, resulting in 875 potential P-gp inhibitors and 15 inhibitor-rich herbs. These predictions were partly supported by a literature search and are valuable not only to develop novel P-gp inhibitors from TCM in the early stages of drug development, but also to optimize the use of herbal remedies.
Affiliation(s)
- Ming Yang
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China; Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
| | - Jialei Chen
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
| | - Xiufeng Shi
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
| | - Liwen Xu
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
| | - Zhijun Xi
- Department of Pharmacy, Longhua Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200032, People's Republic of China
| | - Lisha You
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China
| | - Rui An
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China
| | - Xinhong Wang
- Department of Chemistry, College of Pharmacy, Shanghai University of Traditional Chinese Medicine, Shanghai 200444, People's Republic of China
|
33
|
Korkmaz S, Zararsiz G, Goksuluk D. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development. PLoS One 2015; 10:e0124600. [PMID: 25928885 PMCID: PMC4415797 DOI: 10.1371/journal.pone.0124600] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/29/2015] [Accepted: 03/03/2015] [Indexed: 12/18/2022] Open
Abstract
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared by ten different measures; then the ten best-performing algorithms were selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
Affiliation(s)
- Selcuk Korkmaz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
| | - Gokmen Zararsiz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
| | - Dincer Goksuluk
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
|
34
|
Subhani S, Jayaraman A, Jamil K. Homology modelling and molecular docking of MDR1 with chemotherapeutic agents in non-small cell lung cancer. Biomed Pharmacother 2015; 71:37-45. [PMID: 25960213 DOI: 10.1016/j.biopha.2015.02.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 01/21/2015] [Accepted: 02/09/2015] [Indexed: 10/24/2022] Open
Abstract
MDR1, a protein commonly involved in drug transport, has been linked to multidrug resistance and disease progression in cancers such as non-small cell lung cancer (NSCLC). Hence, targeting this protein is essential for improving drug design and preventing adverse drug-drug interactions. The aim of the study was to examine chemotherapeutic drug binding to MDR1 and the interactions therein. We used Schrödinger Suite 2014 to perform homology modelling of human MDR1 based on mouse MDR1, followed by Induced Fit Docking with Paclitaxel, Docetaxel, Gemcitabine, Carboplatin and Cisplatin. Finally, we evaluated drug binding affinities using Prime/MMGBSA, and using these scores we compared the affinities of combination therapies against MDR1. Analysis of the docking results showed Paclitaxel>Docetaxel>Gemcitabine>Carboplatin>Cisplatin as the order of binding affinities, with Paclitaxel having the best docking score. The combination drug binding affinity analysis showed Paclitaxel+Gemcitabine to have the best docking score and hence, efficacy. Through our investigation we have identified the residues Gln 195 and Gln 946 to be more frequently involved in drug binding interactions with MDR1. Our results suggest that Paclitaxel or the combination of Paclitaxel+Gemcitabine could serve as a suitable therapy against MDR1 in NSCLC patients. Thus, our study provides new insight into the possible repurposing of chemotherapeutic drugs in targeting elevated MDR1 levels in NSCLC patients, thereby ensuring a better overall outcome. Further, our study highlights the use of in silico methodologies in understanding drug binding to protein targets and its relevance to advancing lung cancer therapy.
Affiliation(s)
- Syed Subhani
- Genetics Department, Bhagwan Mahavir Medical Research Centre, #10-1-1, Mahavir Marg, Masab Tank, Hyderabad 500004, Telangana, India.
| | - Archana Jayaraman
- Centre for Biotechnology and Bioinformatics, School of Life Sciences, Jawaharlal Nehru Institute of Advanced Studies (JNIAS), 6th Floor, Buddha Bhawan, M.G. Road, Secunderabad 500003, Telangana, India.
| | - Kaiser Jamil
- Genetics Department, Bhagwan Mahavir Medical Research Centre, #10-1-1, Mahavir Marg, Masab Tank, Hyderabad 500004, Telangana, India; Centre for Biotechnology and Bioinformatics, School of Life Sciences, Jawaharlal Nehru Institute of Advanced Studies (JNIAS), 6th Floor, Buddha Bhawan, M.G. Road, Secunderabad 500003, Telangana, India.
|
35
|
Thai KM, Huynh NT, Ngo TD, Mai TT, Nguyen TH, Tran TD. Three- and four-class classification models for P-glycoprotein inhibitors using counter-propagation neural networks. SAR and QSAR in Environmental Research 2015; 26:139-163. [PMID: 25588022 DOI: 10.1080/1062936x.2014.995701] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Abstract
P-glycoprotein (P-gp) is an ATP binding cassette (ABC) transporter that helps to protect several human organs from xenobiotic exposure. This efflux pump is also responsible for multi-drug resistance (MDR), a major obstacle to chemotherapy in the fight against cancer. Therefore, the discovery of P-gp inhibitors is considered one of the most popular strategies to reverse MDR in tumour cells and to improve therapeutic efficacy of commonly used cytotoxic drugs. Until now, several generations of P-gp inhibitors have been developed but they have largely failed in preclinical and clinical studies due to lack of selectivity, poor solubility and severe pharmacokinetic interactions. In this study, three models (SION, SIO, SIN) to classify specific 'true' P-gp inhibitors as well as three other models (CPBN, CPB1, CPN) to distinguish between P-gp inhibitors, CYP 3A inhibitors and co-inhibitors of these proteins with rather high accuracy values for the test set and the external set were generated based on counter-propagation neural networks (CPG-NN). Such three and four-class classification models helped provide more information about the bioactivities of compounds not only on one target (P-gp), but also on a combination of multiple targets (P-gp, CYP 3A).
Affiliation(s)
- K-M Thai
- Department of Medicinal Chemistry, School of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Viet Nam
|
36
|
Erić S, Kalinić M, Ilić K, Zloh M. Computational classification models for predicting the interaction of drugs with P-glycoprotein and breast cancer resistance protein. SAR and QSAR in Environmental Research 2014; 25:939-966. [PMID: 25435255 DOI: 10.1080/1062936x.2014.976265] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 03/10/2014] [Accepted: 08/13/2014] [Indexed: 06/04/2023]
Abstract
P-glycoprotein (P-gp/ABCB1) and breast cancer resistance protein (BCRP/ABCG2) are two members of the adenosine triphosphate (ATP) binding cassette (ABC) family of transporters which function as membrane efflux transporters and display considerable substrate promiscuity. Both are known to significantly influence the absorption, distribution and elimination of drugs, mediate drug-drug interactions and contribute to multiple drug resistance (MDR) of cancer cells. Correspondingly, timely characterization of the interaction of novel leads and drug candidates with these two transporters is of great importance. In this study, several computational classification models for prediction of transport and inhibition of P-gp and BCRP, respectively, were developed based on newly compiled and critically evaluated experimental data. Artificial neural network (ANN) and support vector machine (SVM) ensemble based models were explored, as well as knowledge-based approaches to descriptor selection. The average overall classification accuracy of best performing models was 82% for P-gp transport, 88% for BCRP transport, 89% for P-gp inhibition and 87% for BCRP inhibition, determined across an array of different test sets. An analysis of substrate overlap between P-gp and BCRP was also performed. The accuracy, simplicity and interpretability of the proposed models suggest that they could be of significant utility in the drug discovery and development settings.
Affiliation(s)
- S Erić
- Department of Pharmaceutical Chemistry, University of Belgrade, Belgrade, Serbia
|
37
|
Balfer J, Bajorath J. Introduction of a methodology for visualization and graphical interpretation of Bayesian classification models. J Chem Inf Model 2014; 54:2451-68. [PMID: 25137527 DOI: 10.1021/ci500410g] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/14/2023]
Abstract
Supervised machine learning models are widely used in chemoinformatics, especially for the prediction of new active compounds or targets of known actives. Bayesian classification methods are among the most popular machine learning approaches for the prediction of activity from chemical structure. Much work has focused on predicting structure-activity relationships (SARs) on the basis of experimental training data. By contrast, only a few efforts have thus far been made to rationalize the performance of Bayesian or other supervised machine learning models and better understand why they might succeed or fail. In this study, we introduce an intuitive approach for the visualization and graphical interpretation of naïve Bayesian classification models. Parameters derived during supervised learning are visualized and interactively analyzed to gain insights into model performance and identify features that determine predictions. The methodology is introduced in detail and applied to assess Bayesian modeling efforts and predictions on compound data sets of varying structural complexity. Different classification models and features determining their performance are characterized in detail. A prototypic implementation of the approach is provided.
Affiliation(s)
- Jenny Balfer
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
|
38
|
Liu Z, Zheng M, Yan X, Gu Q, Gasteiger J, Tijhuis J, Maas P, Li J, Xu J. ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability. J Comput Aided Mol Des 2014; 28:941-50. [PMID: 25031075 DOI: 10.1007/s10822-014-9778-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/08/2014] [Accepted: 07/09/2014] [Indexed: 11/26/2022]
Abstract
Predicting compound chemical stability is important because unstable compounds can lead to either false-positive or false-negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning, based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning from the data set, each ACF is assigned an expected stable probability (p(s)) and an unstable probability (p(uns)). 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to derive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated, with Bayes' theorem, based upon the p(s) and p(uns) values of the compound ACFs. We were able to achieve performance with an AUC value of 84% and a tenfold cross-validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the prediction of the stabilities of organic compounds.
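The abstract does not spell out how the per-fragment p(s) and p(uns) values are combined, so the following is only a minimal sketch of a generic naïve Bayesian combination, not the ChemStable implementation. It assumes that each ACF contributes independently, that p(s) and p(uns) can be treated as per-class fragment likelihoods, and that the class prior is supplied by the caller; the function name `instability_score` and the `prior_unstable` parameter are hypothetical.

```python
import math

def instability_score(fragment_probs, prior_unstable=0.5):
    """Combine per-fragment (p_s, p_uns) pairs into one instability posterior.

    Assumption: fragments are conditionally independent given the class
    (the naive Bayes assumption), and p_s / p_uns act as class likelihoods.
    """
    eps = 1e-12  # floor that keeps log() defined when a probability is 0
    log_unstable = math.log(prior_unstable)
    log_stable = math.log(1.0 - prior_unstable)
    for p_s, p_uns in fragment_probs:
        log_stable += math.log(max(p_s, eps))
        log_unstable += math.log(max(p_uns, eps))
    # Normalize in log space to avoid underflow for compounds with many fragments.
    shift = max(log_stable, log_unstable)
    unstable = math.exp(log_unstable - shift)
    stable = math.exp(log_stable - shift)
    return unstable / (unstable + stable)

# Two fragments that both look stable give a low instability posterior.
score = instability_score([(0.9, 0.1), (0.8, 0.2)])
```

Working in log space is a design choice for numerical safety: a compound with hundreds of fragments would otherwise multiply many small probabilities and underflow to zero.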
Affiliation(s)
- Zhihong Liu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou, 510006, China
|
39
|
Li D, Chen L, Li Y, Tian S, Sun H, Hou T. ADMET Evaluation in Drug Discovery. 13. Development of in Silico Prediction Models for P-Glycoprotein Substrates. Mol Pharm 2014; 11:716-26. [DOI: 10.1021/mp400450m] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Affiliation(s)
- Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lei Chen
- Institute of Functional Nano & Soft Materials (FUNSOM) and Collaborative Innovation Center of Suzhou Nano Science and Technology, Soochow University, Suzhou, Jiangsu 215123, China
- Youyong Li
- Institute of Functional Nano & Soft Materials (FUNSOM) and Collaborative Innovation Center of Suzhou Nano Science and Technology, Soochow University, Suzhou, Jiangsu 215123, China
- Sheng Tian
- Institute of Functional Nano & Soft Materials (FUNSOM) and Collaborative Innovation Center of Suzhou Nano Science and Technology, Soochow University, Suzhou, Jiangsu 215123, China
- Huiyong Sun
- Institute of Functional Nano & Soft Materials (FUNSOM) and Collaborative Innovation Center of Suzhou Nano Science and Technology, Soochow University, Suzhou, Jiangsu 215123, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Institute of Functional Nano & Soft Materials (FUNSOM) and Collaborative Innovation Center of Suzhou Nano Science and Technology, Soochow University, Suzhou, Jiangsu 215123, China
|
40
|
Xu C, Cheng F, Chen L, Du Z, Li W, Liu G, Lee PW, Tang Y. In silico Prediction of Chemical Ames Mutagenicity. J Chem Inf Model 2012; 52:2840-7. [DOI: 10.1021/ci300400a] [Citation(s) in RCA: 114] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Affiliation(s)
- Congying Xu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Lei Chen
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zheng Du
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- Philip W. Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
|
41
|
Modi S, Li J, Malcomber S, Moore C, Scott A, White A, Carmichael P. Integrated in silico approaches for the prediction of Ames test mutagenicity. J Comput Aided Mol Des 2012; 26:1017-33. [DOI: 10.1007/s10822-012-9595-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2011] [Accepted: 08/09/2012] [Indexed: 02/04/2023]
|
42
|
Chen L, Li Y, Yu H, Zhang L, Hou T. Computational models for predicting substrates or inhibitors of P-glycoprotein. Drug Discov Today 2012; 17:343-51. [DOI: 10.1016/j.drudis.2011.11.003] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2011] [Revised: 10/24/2011] [Accepted: 11/10/2011] [Indexed: 01/11/2023]
|
43
|
Carbon-Mangels M, Hutter MC. Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets. Mol Inform 2011; 30:885-95. [PMID: 27468108 DOI: 10.1002/minf.201100069] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2011] [Accepted: 08/19/2011] [Indexed: 11/12/2022]
Abstract
Classification algorithms suffer from the curse of dimensionality, which leads to overfitting, particularly if the problem is over-determined. Therefore, it is of particular interest to identify the most relevant descriptors to reduce the complexity. We applied Bayesian estimates to model the probability distribution of descriptor values used for binary classification using n-fold cross-validation. As a measure of the discriminative power of the classifiers, the symmetric form of the Kullback-Leibler divergence of their probability distributions was computed. We found that the most relevant descriptors possess a Gaussian-like distribution of their values, show the largest divergences, and therefore appear most often in the cross-validation scenario. The results were compared to those of the LASSO feature selection method applied to multiple decision trees and support vector machine approaches for data sets of substrates and nonsubstrates of three cytochrome P450 isoenzymes, which comprise strongly unbalanced compound distributions. In contrast to decision trees and support vector machines, the performance of Bayesian estimates is less affected by unbalanced data sets. This strategy reveals those descriptors that allow a simple linear separation of the classes, whereas the superior accuracy of decision trees and support vector machines can be attributed to nonlinear separation, which is in turn more prone to overfitting.
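As an illustration of the divergence measure named in this abstract, the sketch below computes the symmetric Kullback-Leibler divergence between the value distributions of one descriptor in two classes. It assumes each class distribution is approximated by a single Gaussian fitted from the sample mean and variance, which is a simplification introduced here and not necessarily the authors' estimator; the function name `symmetric_kl_gaussian` is hypothetical.

```python
from statistics import fmean, pvariance

def symmetric_kl_gaussian(values_a, values_b):
    """Symmetric KL divergence, KL(a||b) + KL(b||a), for two fitted Gaussians.

    Assumption: the descriptor values of each class are roughly Gaussian,
    so each class is summarized by its mean and (population) variance.
    """
    mu_a, mu_b = fmean(values_a), fmean(values_b)
    var_a, var_b = pvariance(values_a), pvariance(values_b)
    if var_a <= 0 or var_b <= 0:
        raise ValueError("each class needs non-constant descriptor values")
    sq_diff = (mu_a - mu_b) ** 2
    # The log-variance terms of the two one-sided divergences cancel in the sum.
    return (var_a + sq_diff) / (2 * var_b) + (var_b + sq_diff) / (2 * var_a) - 1.0

# A descriptor whose class means are well separated scores higher
# than one whose class distributions coincide.
separated = symmetric_kl_gaussian([-1.0, 1.0], [1.0, 3.0])
overlapping = symmetric_kl_gaussian([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
```

Ranking descriptors by this score and keeping the largest values would mirror the idea of preferring descriptors with the largest divergences, under the Gaussian assumption stated above.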
Affiliation(s)
- Miriam Carbon-Mangels
- Section of Biostatistics, Paul-Ehrlich-Institut, Federal Institute for Vaccines and Biomedicines, Paul-Ehrlich-Straße 51-59, 63225 Langen, Germany
- Michael C Hutter
- Center for Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany phone/fax: +49 681 302 70703/70702.
|
44
|
Hao M, Li Y, Wang Y, Zhang S. A classification study of human β3-adrenergic receptor agonists using BCUT descriptors. Mol Divers 2011; 15:877-87. [DOI: 10.1007/s11030-011-9321-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2011] [Accepted: 05/17/2011] [Indexed: 10/18/2022]
|
45
|
Chen L, Li Y, Zhao Q, Peng H, Hou T. ADME Evaluation in Drug Discovery. 10. Predictions of P-Glycoprotein Inhibitors Using Recursive Partitioning and Naive Bayesian Classification Techniques. Mol Pharm 2011; 8:889-900. [DOI: 10.1021/mp100465q] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Lei Chen
- Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
- Youyong Li
- Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
- Qing Zhao
- Department of Molecular Immunology, Institute of Basic Medical Sciences, Beijing 100850, China
- Hui Peng
- Department of Molecular Immunology, Institute of Basic Medical Sciences, Beijing 100850, China
- Tingjun Hou
- Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu 215123, China
|
46
|
Li Y, Du G, Cai W, Shao X. Classification and Quantitative Analysis of Azithromycin Tablets by Raman Spectroscopy and Chemometrics. American Journal of Analytical Chemistry 2011. [DOI: 10.4236/ajac.2011.22015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
47
|
Abstract
Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
|
48
|
Cucurull-Sanchez L. Successful identification of key chemical structure modifications that lead to improved ADME profiles. J Comput Aided Mol Des 2010; 24:449-58. [DOI: 10.1007/s10822-010-9361-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2009] [Accepted: 04/26/2010] [Indexed: 11/28/2022]
|
49
|
Lead Discovery Using Virtual Screening. TOPICS IN MEDICINAL CHEMISTRY 2009. [PMCID: PMC7176223 DOI: 10.1007/7355_2009_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The practice of virtual screening (VS) to identify chemical leads to known or novel targets is becoming a core function of the computational chemist within industry. By employing a range of techniques, when attempting to identify compounds with activity against a biological target, a small focused subset of a larger collection of compounds can be identified and tested, often with results much better than selecting a similar number of compounds at random. We will review the key methods available, their relative success, and provide practical insights into best practices and key gaps. We will also argue that the capability of VS methods has grown to a point where fuller integration with experimental methods, including HTS, could increase the effectiveness of both.
|
50
|
Chen X, Liang YZ, Yuan DL, Xu QS. A modified uncorrelated linear discriminant analysis model coupled with recursive feature elimination for the prediction of bioactivity. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2009; 20:1-26. [PMID: 19343582 DOI: 10.1080/10629360902724127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
To meet the requirements of providing accurate, robust, and interpretable prediction of bioactivity, a modified uncorrelated linear discriminant analysis (M-ULDA) model was developed. In addition, a feature selection method called recursive feature elimination (RFE), originally used for support vector machines (SVMs), was introduced and modified to fit the scheme of ULDA. In an evaluation on six pharmaceutical datasets, M-ULDA coupled with RFE showed better or comparable classification accuracy with respect to other well-studied methods such as SVM and decision trees. The RFE used for ULDA has the advantage of increasing the computational speed and provides useful insights into biochemical mechanisms related to pharmaceutical activity by significantly reducing the number of variables used for the final model.
Affiliation(s)
- X Chen
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
|