Reference Citation Analysis

Searched name: Dong-Sheng Cao
Total articles: 93 (from Reference Citation Analysis)
Article PDFs: 14
Articles cited at least once: 76

Number	Citation Analysis
1	CPhaMAS: An online platform for pharmacokinetic data analysis based on optimized parameter fitting algorithm. Comput Methods Programs Biomed 2024;248:108137. [PMID: 38520784 DOI: 10.1016/j.cmpb.2024.108137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 03/15/2024] [Accepted: 03/17/2024] [Indexed: 03/25/2024]
Abstract: BACKGROUND AND OBJECTIVE: Clinical pharmacological modeling and statistical analysis software is an essential tool for drug development and personalized drug therapy. The learning curve of current tools is steep and unfriendly to beginners. It is even steeper when the data show large individual differences or measurement errors, which make it difficult for existing fitting algorithms to estimate pharmacokinetic parameters accurately. This study therefore aims to develop an optimized parameter fitting algorithm that reduces the model's sensitivity to initial values and to integrate it into the CPhaMAS platform, a user-friendly online application for pharmacokinetic data analysis. METHODS: We proposed an optimized Nelder-Mead method that reinitializes the simplex vertices when trapped in local solutions and integrated it into CPhaMAS. The platform includes three modules: compartment model analysis, non-compartmental analysis (NCA), and bioequivalence/bioavailability (BE/BA) analysis. CPhaMAS was evaluated against the existing WinNonlin software. RESULTS: The platform was easy to learn and required no programming. In the accuracy investigation, the optimized Nelder-Mead method in CPhaMAS was more accurate than WinNonlin (smaller mean relative error and higher R²) for two-compartment and extravascular administration models, both when the initial values were set to the true values and when they were set to abnormal values (10 times larger or smaller than the true values). The mean relative error between the NCA parameters calculated by CPhaMAS and by WinNonlin was <0.0001%. When BE was calculated for conventional, highly variable, and narrow-therapeutic-index drugs, the main statistical parameters of Cmax, AUCt, and AUCinf in CPhaMAS had a mean relative error of <0.01% compared with WinNonlin. CONCLUSIONS: CPhaMAS is a user-friendly platform with relatively accurate algorithms and a powerful tool for analysing pharmacokinetic data in new drug development and precision medicine.
Key Words: CPhaMAS; Online platform; Optimized Nelder-Mead method; Pharmacokinetic data analysis
MeSH Headings: Algorithms; Software; Models, Theoretical; Pharmaceutical Preparations; Research Design
2	Enhancing Multi-species Liver Microsomal Stability Prediction through Artificial Intelligence. J Chem Inf Model 2024;64:3222-3236. [PMID: 38498003 DOI: 10.1021/acs.jcim.4c00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract: Liver microsomal stability, a crucial aspect of metabolic stability, significantly affects practical drug discovery. However, current models for predicting liver microsomal stability are built on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. We then developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. The best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, consensus models built from these individual models showed superior predictive performance compared with existing models of the same type. To explore the similarities and differences in liver microsomal stability across species, we conducted a preliminary interpretation of the models and of misclassified molecules using Shapley additive explanations (SHAP) and atom heatmaps. We also used matched molecular pair analysis (MMPA) and substructure extraction to investigate representative structural modifications and substructures that decrease liver microsomal stability in different species. The established prediction models, together with these interpretive insights into liver microsomal stability, will contribute significantly to more efficient exploration of practical drugs for development.
MeSH Headings: Microsomes, Liver/metabolism; Animals; Mice; Rats; Humans; Artificial Intelligence; Machine Learning; Drug Discovery/methods; Pharmaceutical Preparations/metabolism; Pharmaceutical Preparations/chemistry
3	PatentNetML: A Novel Framework for Predicting Key Compounds in Patents Using Network Science and Machine Learning. J Med Chem 2024;67:1347-1359. [PMID: 38181431 DOI: 10.1021/acs.jmedchem.3c01893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2024]
Abstract: Patents play a crucial role in drug research and development, providing early access to unpublished data and offering unique insights. Identifying key compounds in patents is essential for finding novel lead compounds. This study collected a comprehensive data set of 1555 patents, encompassing 1000 key compounds, to explore new approaches for predicting these key compounds. Our novel PatentNetML framework integrates network science and machine learning algorithms, combining network measures, ADMET properties, and physicochemical properties to build robust classification models that identify key compounds. Through model interpretation and an analysis of three case studies, we show the potential of PatentNetML to reveal hidden patterns and connections within diverse patents. Although the framework is pioneering, we acknowledge its limitations when applied to patents that deviate from the assumed central pattern. This work provides a promising foundation for future research on efficiently identifying promising drug candidates and expediting drug discovery in the pharmaceutical industry.
MeSH Headings: Machine Learning; Algorithms; Drug Discovery; Drug Industry
4	ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization. Brief Bioinform 2024;25:bbae008. [PMID: 38385872 PMCID: PMC10883642 DOI: 10.1093/bib/bbae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/17/2023] [Accepted: 01/02/2024] [Indexed: 02/23/2024]
Abstract: Drug discovery and development are laborious and costly. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. Because ADMET optimization involves multiple parameters, it is extremely challenging, owing to the vast chemical space and limited human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) was developed to optimize multiple ADMET endpoints without loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: the Simplified Molecular Input Line Entry System (SMILES) Encoder, the Descriptor Decoder, and the Molecular Optimizer. The SMILES Encoder represents a molecule as a 512-dimensional vector, and the Descriptor Decoder translates this representation back into the corresponding molecular structure with high accuracy. Based on this reversible molecular representation and a particle swarm optimization strategy, the Molecular Optimizer can effectively optimize undesirable ADMET properties without loss of bioactivity, which essentially accomplishes inverse QSAR design. The constrained multi-objective optimization of a poly(ADP-ribose) polymerase-1 inhibitor is provided as a case study to demonstrate the utility of ChemMORT.
Key Words: ADMET evaluation; deep learning; inverse QSAR; lead optimization; particle swarm optimization; reversible molecular representation; substructure modification
MeSH Headings: Humans; Deep Learning; Drug Development; Drug Discovery; Poly(ADP-ribose) Polymerase Inhibitors
Grants: National Key Research and Development Program of China (2022YFA1004303); Foundation of State Key Laboratory of HPCL (2023-KJWHPCL-01); Science Foundation for Indigenous Innovation of National University of Defense Technology (23-ZZCX-JDZ-08); National Science Foundation of China (22173118); Excellent Youth Foundation of Hunan Province (2021JJ10068); Key scientific research projects in higher education institutions of Henan Province (24A520036); HKBU Strategic Development Fund project (SDF19-0402-P02)
5	Comprehensive Review of Drug-Drug Interaction Prediction Based on Machine Learning: Current Status, Challenges, and Opportunities. J Chem Inf Model 2024;64:96-109. [PMID: 38132638 DOI: 10.1021/acs.jcim.3c01304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract: Detecting drug-drug interactions (DDIs) is an essential step in drug development and drug administration. Given the shortcomings of current experimental methods, machine learning (ML) has become a reliable alternative, attracting extensive attention from academia and industry. With the rapid development of computational science and the growing popularity of cross-disciplinary research, many ML-based DDI prediction studies have been published in recent years. To give insight into the current situation and future direction of DDI prediction research, we systematically review these studies from three aspects: (1) classic DDI databases, mainly databases of drugs, side effects, and DDI information; (2) commonly used drug attributes, focusing on chemical, biological, and phenotypic attributes for representing drugs; and (3) popular ML approaches for DDI detection, such as shallow learning-based, deep learning-based, recommender system-based, and knowledge graph-based methods. For each aspect, related studies are described, summarized, and compared. Finally, we summarize the research status of ML-based DDI prediction and point out existing issues, future challenges, potential opportunities, and subsequent research directions.
MeSH Headings: Drug Interactions; Pharmaceutical Preparations; Machine Learning; Databases, Factual; Knowledge Bases
6	Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep. Bioinformatics 2024;40:btae023. [PMID: 38243703 PMCID: PMC10832355 DOI: 10.1093/bioinformatics/btae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 11/24/2023] [Accepted: 01/13/2024] [Indexed: 01/21/2024]
Abstract: MOTIVATION: Spatial clustering is an essential and challenging step in spatial transcriptomics data analysis for unravelling tissue microenvironment and biological function. Graph neural networks are a promising way to combine the gene expression profiles and spatial location information in spatial transcriptomics into latent representations. However, how to choose an appropriate graph deep learning module and graph neural network requires further investigation. RESULTS: In this article, we present GRAPHDeep, which assembles a spatial clustering framework for heterogeneous spatial transcriptomics data. By integrating 2 graph deep learning modules and 20 graph neural networks, it selects the most appropriate combination for each dataset. The constructed spatial clustering method is compared with state-of-the-art algorithms to demonstrate its effectiveness and superiority. The significant new findings are: (i) the number of genes or proteins in spatial omics data is crucial for spatial clustering algorithms; (ii) the variational graph autoencoder is more suitable for spatial clustering tasks than the deep graph infomax module; (iii) UniMP, SAGE, SuperGAT, GATv2, GCN, and TAG are the recommended graph neural networks for spatial clustering tasks; and (iv) the graph neural networks used in existing spatial clustering frameworks are not the best candidates. This study can serve as guidance for choosing an appropriate graph neural network for spatial clustering. AVAILABILITY AND IMPLEMENTATION: The source code of GRAPHDeep is available at https://github.com/narutoten520/GRAPHDeep. The studied spatial omics data are available at https://zenodo.org/record/8141084.
MeSH Headings: Gene Expression Profiling; Algorithms; Neural Networks, Computer; Software; Cluster Analysis
Grants: National Key R&D Programmes (NKPs) of China (2022YFC3601800); Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K202300105)
7	Identification and evaluation of a novel PARP1 inhibitor for the treatment of triple-negative breast cancer. Chem Biol Interact 2023;382:110567. [PMID: 37271214 DOI: 10.1016/j.cbi.2023.110567] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/06/2023] [Revised: 05/20/2023] [Accepted: 05/24/2023] [Indexed: 06/06/2023]
Abstract: Triple-negative breast cancer (TNBC) is a particularly invasive subtype of breast cancer and usually has a poor prognosis because it lacks effective therapeutic targets. Approximately 25% of TNBC patients carry a breast cancer susceptibility gene 1/2 (BRCA1/2) mutation. Clinically, PARP1 inhibitors have been approved for the treatment of patients with BRCA1/2-mutated breast cancer through the mechanism of synthetic lethality. In this study, we identified compound 6 {systematic name: 2-[2-(4-Hydroxy-phenyl)-vinyl]-3H-quinazolin-4-one} as a novel PARP1 inhibitor using established virtual screening methods. Compound 6 exerted stronger PARP1 inhibitory activity and anti-cancer activity than olaparib in BRCA1-mutated TNBC cells and TNBC patient-derived organoids. Unexpectedly, compound 6 also significantly inhibited cell viability and proliferation and induced apoptosis in BRCA wild-type TNBC cells. To elucidate the underlying molecular mechanism, we used cheminformatics analysis and found that tankyrase (TNKS), a vital promoter of homologous-recombination repair, was a potential target of compound 6. Compound 6 not only decreased the expression of PAR but also down-regulated the expression of TNKS, resulting in significant DNA single-strand and double-strand breaks in BRCA wild-type TNBC cells. In addition, compound 6 enhanced the sensitivity of both BRCA1-mutated and wild-type TNBC cells to chemotherapy, including paclitaxel and cisplatin. Collectively, our study identified a novel PARP1 inhibitor, providing a therapeutic candidate for the treatment of TNBC.
Key Words: BRCA1/2; Olaparib; PARP1 inhibitor; TNKS; Triple-negative breast cancer
MeSH Headings: Humans; Triple Negative Breast Neoplasms/drug therapy; Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/metabolism; BRCA1 Protein/genetics; Cell Line, Tumor; BRCA2 Protein; Poly (ADP-Ribose) Polymerase-1
8	Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction. Brief Bioinform 2023:bbad235. [PMID: 37401373 DOI: 10.1093/bib/bbad235] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/27/2023] [Revised: 05/30/2023] [Accepted: 06/05/2023] [Indexed: 07/05/2023]
Abstract: Recent advances in artificial intelligence (AI), and in deep and graph learning models in particular, have established their usefulness in biomedical applications, especially for drug-drug interactions (DDIs). A DDI is a change in the effect of one drug caused by the presence of another drug in the human body, and DDIs play an essential role in drug discovery and clinical research. Predicting DDIs through traditional clinical trials and experiments is expensive and time-consuming. To apply advanced AI and deep learning correctly, developers and users face various challenges, such as the availability and encoding of data resources and the design of computational methods. This review summarizes chemical structure-based, network-based, natural language processing-based, and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods through comparative experiments. We discuss potential technical challenges and highlight future directions for deep and graph learning models to accelerate DDI prediction.
Key Words: deep learning; drug–drug interactions prediction; graph learning
Grants: National Natural Science Foundation of China (62202413); National Key Research and Development Program of China (2020YFC0832405); Science and Technology Innovation Program of Hunan Province of China (2022RC1099); High-Level Talent Aggregation Project in Hunan Province, China Innovation Team (2019RS1060); Hunan Provincial Natural Science Foundation of China (2022JJ20016); Science and Technology Innovation 2030-Major Project (2021ZD0150100); General Project of Hunan Provincial Education Department (21C0074); Open Research Projects of Zhejiang Lab (2021RD0AB02); Natural Science Foundation of Changsha City (kq2202137); National Science Foundation (III-1763325)
9	DKADE: a novel framework based on deep learning and knowledge graph for identifying adverse drug events and related medications. Brief Bioinform 2023:bbad228. [PMID: 37344167 DOI: 10.1093/bib/bbad228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/16/2023] [Accepted: 05/26/2023] [Indexed: 06/23/2023]
Abstract: Adverse drug events (ADEs) are common in clinical practice, can cause significant harm to patients, and increase resource use. Natural language processing (NLP) has been applied to automate ADE detection, but NLP systems become less adaptable when drug entities are missing or multiple medications are specified in clinical narratives. In addition, no Chinese-language NLP system has been developed for ADE detection because of the complexity of Chinese semantics, even though >10 million drug-related adverse events occur annually in China. To address these challenges, we propose DKADE, a deep learning and knowledge graph-based framework for identifying ADEs. DKADE infers missing drug entities and evaluates their correlations with ADEs by combining medication orders with existing drug knowledge. Moreover, DKADE can automatically screen for new adverse drug reactions. Experimental results show that DKADE achieves an overall F1 score of 91.13%. Furthermore, the adaptability of DKADE is validated using real-world external clinical data. In summary, DKADE is a powerful tool for studying drug safety and automating adverse event monitoring.
Key Words: Chinese natural language processing; adverse drug events; deep learning; knowledge graph
Grants: National Key Research and Development Program of China (2021YFF1201400); National Natural Science Foundation of China (22173118); Hunan Provincial Science Fund for Distinguished Young Scholars (2021JJ10068); Science and Technology Innovation Program of Hunan Province (2021RC4011); Key Research and Development Program of Hunan Province of China (2020SK2010); Natural Science Foundation of Hunan Province (2023JJ60513)
10	In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences. J Cheminform 2023;15:48. [PMID: 37088813 PMCID: PMC10123967 DOI: 10.1186/s13321-023-00720-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 04/08/2023] [Indexed: 04/25/2023]
Abstract: Identifying and validating the targets of bioactive small molecules is a significant challenge in drug discovery. In recent years, various in-silico approaches have been proposed to speed up the time- and resource-consuming experiments used for target detection. Herein, we developed several chemogenomic models for target prediction based on multi-scale information of chemical structures and protein sequences. By pairing a compound with multiple protein targets and feeding these compound-target pairs into a well-established model, scores indicating whether the compound interacts with each target can be derived; target prediction is then completed by ranking the output scores. To improve prediction performance, we constructed several chemogenomic models using multi-scale information of chemical structures and protein sequences, and the ensemble model with the best performance was used as our final model. The model was validated with various strategies and external datasets, which confirmed its promising target prediction capability, measured as the fraction of known targets identified in the top-k (k = 1 to 10) list of potential target candidates suggested by the model. Compared with multiple state-of-the-art target prediction methods, our model showed equivalent or better predictive ability in terms of top-k predictions. We expect that our method can serve as a powerful computational tool to narrow down potential targets for experimental testing.
Key Words: Chemogenomic; Ensemble model; Target prediction; XGBoost
Grants: HKBU Strategic Development Fund project (SDF19 0402 P02); National Key Research and Development Program of China (2021YFF1201400); National Natural Science Foundation of China (22173118); Hunan Provincial Science Fund for Distinguished Young Scholars (2021JJ10068); The Project of Intelligent Management Software for Multimodal Medical Big Data for New Generation Information Technology, Ministry of Industry and Information Technology of People's Republic of China (TC210804V); Science and Technology Innovation Program of Hunan Province (2021RC4011) and Changsha Municipal Natural Science Foundation (kq2014144); Changsha Science and Technology Bureau project (kq2001034)
11	Graph deep learning enabled spatial domains identification for spatial transcriptomics. Brief Bioinform 2023;24:7130976. [PMID: 37080761 DOI: 10.1093/bib/bbad146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 03/02/2023] [Accepted: 03/27/2023] [Indexed: 04/22/2023]
Abstract: Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, this paper constructs a graph deep learning (GDL)-based spatial clustering approach. First, a deep graph infomax module embedded with a residual gated graph convolutional neural network is used to process the gene expression profiles and spatial positions in ST data. Then, a Bayesian Gaussian mixture model is applied to the latent embeddings to generate spatial domains. Experiments show that the presented method outperforms other state-of-the-art GDL-enabled techniques on multiple ST datasets. The code and datasets used in this manuscript are available at https://github.com/narutoten520/SCGDL.
Key Words: Bayesian Gaussian mixture models; deep graph infomax; graph deep learning; residual gated graph convolutional neural network; spatial clustering; spatial transcriptome
Grants: National Key Research and Development Programs of China (2022YFC3601802); National Natural Science Foundation of China (22173118); Hunan Provincial Science Fund for Distinguished Young Scholars (2021JJ10068); Science and Technology Innovation Program of Hunan Province (2021RC4011); Key Research and Development Program of Hunan Province (2022GK2021)
12	Improved GNNs for Log D7.4 Prediction by Transferring Knowledge from Low-Fidelity Data. J Chem Inf Model 2023;63:2345-2359. [PMID: 37000044 DOI: 10.1021/acs.jcim.2c01564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2023]
Abstract: The n-octanol/buffer solution distribution coefficient at pH 7.4 (log D7.4) is an indicator of lipophilicity, and it influences a wide variety of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties and the druggability of compounds. In log D7.4 prediction, graph neural networks (GNNs) can uncover subtle structure-property relationships (SPRs) by automatically extracting features from molecular graphs, but their performance is often limited by the small size of available datasets. Herein, we present a transfer learning strategy called pretraining on computational data and then fine-tuning on experimental data (PCFE) to fully exploit the predictive potential of GNNs. PCFE pretrains a GNN model on 1.71 million computed log D values (low-fidelity data) and then fine-tunes it on 19,155 experimental log D7.4 values (high-fidelity data). Experiments with three GNN architectures (graph convolutional network (GCN), graph attention network (GAT), and Attentive FP) demonstrated the effectiveness of PCFE in improving GNNs for log D7.4 prediction. Moreover, the optimal PCFE-trained GNN model (cx-Attentive FP, test-set R² = 0.909) outperformed four strong descriptor-based models: random forest (RF), gradient boosting (GB), support vector machine (SVM), and extreme gradient boosting (XGBoost). The robustness of the cx-Attentive FP model was also confirmed by evaluating the models with different training data sizes and dataset splitting strategies. We therefore developed a webserver and defined the applicability domain for this model. The webserver (http://tools.scbdd.com/chemlogd/) provides free log D7.4 prediction services. In addition, the important descriptors for log D7.4 were detected with the Shapley additive explanations (SHAP) method, and the most relevant substructures were identified with the attention mechanism. Finally, matched molecular pair analysis (MMPA) was performed to summarize the contributions of common chemical substituents to log D7.4, including a variety of hydrocarbon groups, halogen groups, heteroatoms, and polar groups. In conclusion, we believe that the cx-Attentive FP model can serve as a reliable tool for predicting log D7.4, and we hope that pretraining on low-fidelity data can help GNNs make accurate predictions of other endpoints in drug discovery.
13	Comprehensive assessment of nine target prediction web services: which should we choose for target fishing? Brief Bioinform 2023;24:6995377. [PMID: 36681902 DOI: 10.1093/bib/bbad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/29/2022] [Accepted: 01/03/2023] [Indexed: 01/23/2023]
Abstract: Identifying potential targets for known bioactive compounds and novel synthetic analogs is of considerable significance. In silico target fishing (TF) has become an alternative strategy because wet-lab experiments are expensive and laborious, and because of the explosive growth of bioactivity data and the rapid development of high-throughput technologies. However, TF methods are based on different algorithms, molecular representations, and training datasets, which may lead to different results for the same query molecules. This can be confusing for practitioners. Therefore, this study systematically evaluated nine popular ligand-based TF methods using target-based and ligand-target pair-based statistical strategies, to help practitioners choose among multiple TF methods. The evaluation showed that SwissTargetPrediction was the best method, producing the most reliable predictions while enriching more targets. The high-recall similarity ensemble approach (SEA) found real targets for more compounds than the other TF methods. SwissTargetPrediction and SEA can therefore be considered primary choices in future studies. In addition, the results showed that k = 5 was the optimal number of experimental candidate targets. Finally, a novel ensemble TF method based on consensus voting is proposed to improve prediction performance. The ensemble TF method achieves higher precision than the individual TF methods, indicating that it can identify real targets more effectively within a given top-k threshold. The results of this study can guide practitioners in selecting the most effective methods in computational drug discovery.
Key Words: SwissTargetPrediction; consensus voting; ensemble target prediction method; in silico target fishing; similarity ensemble approach
MeSH Headings: Ligands; Hunting; Algorithms
Grants: The 2020 Guangdong Provincial Science and Technology Innovation Strategy Special Fund (2020B1212030006); Changsha Science and Technology Bureau project (kq2001034); Natural Science Foundation of Jilin Province (2022JJ80104); Hunan Provincial Science Fund for Distinguished Young Scholars (2021JJ10068); National Natural Science Foundation of China (22220102001); National Key Research and Development Program of China (2021YFF1201400)
14	Reducing false positive rate of docking-based virtual screening by active learning. Brief Bioinform 2023;24:6987822. [PMID: 36642412 DOI: 10.1093/bib/bbac626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/10/2022] [Accepted: 12/20/2022] [Indexed: 01/17/2023] Abstract Machine learning-based scoring functions (MLSFs) have become a very favorable alternative to classical scoring functions because of their potentially superior screening performance. However, the negative data used to construct MLSFs are rarely described in the literature, and the putative inactive molecules recorded in existing databases usually differ systematically from the active molecules. Here, we propose AMLSF, an easy-to-use method that combines active learning based on negative-molecule selection strategies with an MLSF to iteratively improve the quality of the inactive set and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database with AMLSF and compared the screening results with those of four control models. The results show that AMLSF identified significantly more active molecules among its top 1000 molecules than the control models did. In addition, free energy calculations for the top 10 molecules selected by AMLSF, the null model and the control models based on DUD-E confirmed that AMLSF identifies more active molecules and reduces the false positive rate. Key Words: active learning; false positive; machine learning-based scoring function (MLSF); molecular docking; virtual screening (VS).
15	Structural Analysis and Prediction of Hematotoxicity Using Deep Learning Approaches. J Chem Inf Model 2023;63:111-125. [PMID: 36472475 DOI: 10.1021/acs.jcim.2c01088] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/12/2022] Abstract Hematotoxicity has become a serious but overlooked toxicity in drug discovery. However, only a few in silico models have been reported for predicting hematotoxicity. In this study, we constructed a high-quality dataset comprising 759 hematotoxic compounds and 1623 nonhematotoxic compounds and then established a series of classification models based on combinations of seven machine learning (ML) algorithms and nine molecular representations. The results, based on two data partitioning strategies and applicability domain (AD) analysis, show that the best prediction model, based on Attentive FP, yielded a balanced accuracy (BA) of 72.6% and an area under the receiver operating characteristic curve (AUC) of 76.8% on the validation set, and a BA of 69.2% and an AUC of 75.9% on the test set. In addition, compared with existing filtering rules and models, our model achieved the highest BA, 67.5%, on the external validation set. SHapley Additive exPlanations (SHAP) and atom heatmap approaches were also used to identify the important features and structural fragments related to hematotoxicity, which could help detect undesirable hematotoxic compounds. Furthermore, matched molecular pair analysis (MMPA) and a representative substructure derivation technique were employed to further characterize the transformation principles and distinctive structural features of hematotoxic chemicals.
We believe that the graph-based deep learning models and interpretations presented in this study can serve as a trustworthy and effective tool for assessing hematotoxicity in the development of new drugs.
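Balanced accuracy, the headline metric above, averages sensitivity and specificity, which matters because the dataset is imbalanced (759 hematotoxic versus 1623 nonhematotoxic compounds). A minimal plain-Python sketch; the small label lists used for checking are invented for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate).

    Labels: 1 = hematotoxic, 0 = nonhematotoxic. Both classes must be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# A trivial model that calls everything nonhematotoxic, on a dataset with the
# same class sizes, looks acceptable on plain accuracy but scores only 0.5 here.
y_true = [1] * 759 + [0] * 1623
y_all_negative = [0] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_all_negative)) / len(y_true)
print(round(plain_accuracy, 3))                   # 0.681
print(balanced_accuracy(y_true, y_all_negative))  # 0.5
```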
16	Evaluation of the red & blue LED effects on cutaneous refractory wound healing in male Sprague-Dawley rat using 3 different multi-drug resistant bacteria. Lasers Surg Med 2022;54:725-736. [PMID: 34989417 DOI: 10.1002/lsm.23515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/26/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 01/23/2023] Abstract OBJECTIVES Photobiomodulation (PBM) is widely used in clinical therapy and is an effective approach for combating bacterial infection of cutaneous wounds and modulating the wound healing process. Because lasers have several drawbacks, red & blue LED light (RBLL) may be a more viable light source. This study aimed to evaluate and compare the therapeutic effects of RBLL on different multi-drug resistant (MDR) bacteria in vitro and in a refractory MDR-infected wound model in male Sprague-Dawley (SD) rats in vivo. MATERIALS AND METHODS Methicillin-resistant Staphylococcus aureus (MRSA), extended-spectrum β-lactamase-producing Escherichia coli (ESBLs-Eco), and MDR Pseudomonas aeruginosa (MDR-Pae) were used to evaluate the antibacterial effects of blue LED light in vitro. The effects of RBLL on wound healing in vivo were evaluated by time to closure, wound score, semi-quantitative bacterial culture, histopathological examination and Masson staining of skin tissue, and immunohistochemical (IHC) staining and western blot (WB) analysis of wound tissue. RESULTS Blue LED light inhibited MRSA, ESBLs-Eco, and MDR-Pae in vitro. In vivo, RBLL accelerated wound healing and reduced the pathogenic bacterial load on the wound surface, while increasing blood supply to the wound and inhibiting excessive inflammatory responses.
CONCLUSION RBLL showed great potential for treating MDR bacteria-infected wounds, suggesting that PBM therapy is an inexpensive, convenient, pain-free, and safe intervention for refractory MDR-infected wounds.
17	A novel ribosomal protein S6 kinase 2 inhibitor attenuates the malignant phenotype of cutaneous malignant melanoma cells by inducing cell cycle arrest and apoptosis. Bioengineered 2022;13:13555-13570. [PMID: 36700473 PMCID: PMC9275999 DOI: 10.1080/21655979.2022.2080364] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/09/2023] Abstract Malignant melanoma (MM) is a highly life-threatening tumor that causes the majority of skin cancer-related deaths. Ribosomal protein S6 kinase 2 (RSK2), a downstream effector of the MAPK pathway, has previously been identified as a therapeutic target in melanoma. AE007 was discovered as a targeted RSK2 inhibitor, and subsequent results showed that it inhibits RSK2 by binding directly to its protein kinase domain. AE007 causes cell cycle arrest and apoptosis, thereby dramatically inhibiting the proliferation, migration, and invasion of melanoma cells. In contrast, the compound does not affect melanocytes or keratinocytes. In addition, suppression of RSK2 abrogates the inhibitory effect of AE007 on melanoma cell proliferation. AE007 treatment significantly inhibits the expression of Cyclin D1, Cyclin B1, CDK2, and Bcl-2, while increasing PARP cleavage. Moreover, RNA sequencing shows that AE007 treatment alters the gene expression profile, including the expression of cell cycle and DNA replication genes. In conclusion, AE007 is a promising therapeutic agent for melanoma that acts by targeting RSK2. Key Words: AE007; Melanoma; RSK2; apoptosis; cell cycle arrest.
18	Machine learning to predict metabolic drug interactions related to cytochrome P450 isozymes. J Cheminform 2022;14:23. [PMID: 35428354 PMCID: PMC9013037 DOI: 10.1186/s13321-022-00602-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 03/26/2022] [Indexed: 11/28/2022] Abstract Drug–drug interactions (DDIs) often cause serious adverse reactions and thus result in considerable economic and social losses. Comprehensive DDI evaluation has become a major challenge in pharmaceutical research because experimental assessment is time-consuming and costly, so effective in silico methods are needed to predict and evaluate DDIs accurately and efficiently. In this study, based on a large number of substrates and inhibitors of five important CYP450 isozymes (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4), a series of high-performance predictive models for metabolic DDIs were constructed with two machine learning methods (random forest and XGBoost) and four types of descriptors (MOE_2D, CATS, ECFP4 and MACCS). To reduce the uncertainty of individual models, a consensus method was applied to yield more reliable predictions. A series of evaluations showed that the consensus models were more reliable and robust for predicting DDIs of new drug combinations. In internal validation, the overall prediction accuracy and AUC of the DDI models were approximately 0.8 and 0.9, respectively. On external datasets, the model accuracy was 0.793 for multi-level validation and 0.795 for external validation. Furthermore, we compared our model with several recently published tools, then applied the final model to FDA-approved drugs and proposed 54,013 drug pairs with potential DDIs.
In summary, we developed a powerful DDI prediction model from the perspective of the CYP450 enzyme family, which should be useful for future drug development and clinical pharmacy research.
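The abstract says a consensus method was used to reduce the uncertainty of individual models but does not give the combination rule. A minimal sketch, assuming the consensus is the mean predicted probability across models followed by a 0.5 threshold; the three lists of outputs are invented numbers standing in for models built from different algorithm and descriptor combinations.

```python
from statistics import mean


def consensus_predict(model_probabilities, threshold=0.5):
    """Average per-model DDI probabilities and threshold the mean.

    model_probabilities: one list per model, each holding a probability for
    every drug pair, with the drug pairs in the same order in every list.
    Returns (mean_probabilities, labels) where label 1 means "predicted DDI".
    """
    per_pair = zip(*model_probabilities)  # regroup values by drug pair
    means = [mean(values) for values in per_pair]
    labels = [int(m >= threshold) for m in means]
    return means, labels


outputs = [
    [0.9, 0.2, 0.6],   # hypothetical model 1
    [0.7, 0.4, 0.3],   # hypothetical model 2
    [0.8, 0.3, 0.45],  # hypothetical model 3
]
means, labels = consensus_predict(outputs)
print(labels)  # [1, 0, 0]
```

The third drug pair shows the smoothing effect: model 1 alone would flag a DDI (0.6), but the averaged probability (0.45) does not.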
19	A novel multi-layer prediction approach for sweetness evaluation based on systematic machine learning modeling. Food Chem 2022;372:131249. [PMID: 34634587 DOI: 10.1016/j.foodchem.2021.131249] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/23/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 02/06/2023] Abstract Computational approaches are attracting increasing attention as an alternative to traditional experimental tests for exploring the relationship between sweetness and chemical structure. In this work, we proposed a novel multi-layer sweetness evaluation system based on machine learning methods. It can evaluate the sweetness of compounds from different chemical spaces and categories, including natural, artificial, carbohydrate, non-carbohydrate, nutritive and non-nutritive compounds, making it suitable for different application scenarios. Furthermore, it provides quantitative predictions of sweetness. In addition, the sweetness-related chemical basis and structural transformation rules were obtained with molecular cloud and matched molecular pair analysis (MMPA) methods. This work systematically improved data quality, explored the best machine learning algorithm and molecular characterization strategy, and finally obtained robust models to establish a multi-layer prediction system (available at: https://github.com/ifyoungnet/ChemSweet). We hope that this study will help food scientists efficiently screen and precisely develop high-quality sweeteners. Key Words: Machine learning; Matched molecular pair analysis; Molecular cloud; Sweetener; Sweetness; Virtual screening.
20	ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images. Brief Bioinform 2022;23:6535678. [PMID: 35212357 DOI: 10.1093/bib/bbac033] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Revised: 01/10/2022] [Accepted: 01/24/2022] [Indexed: 11/14/2022] Abstract In most scientific documents, the structures of chemical compounds are presented as images, which computers cannot easily understand or manipulate. This makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from the enormous body of literature. However, existing OCSR methods fall far short of practical requirements because of their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Following the divide-and-conquer principle, we model each atom or bond as a single point at its center. In this way, we can use a fully convolutional neural network (CNN) to generate a series of heatmaps that locate these points and predict relevant properties, such as atom types, atom charges and bond types. The molecular structure can then be recovered by assembling the detected atoms and bonds. Our approach integrates all detection and property prediction tasks into a single fully convolutional network, which is scalable and processes molecular images efficiently. Experimental results demonstrate that our method significantly improves recognition performance compared with publicly available tools.
The proposed method can be considered a promising solution to OCSR problems and a starting point for acquiring molecular information from the literature. Key Words: deep learning; divide and conquer; fully convolutional neural network; optical chemical structure recognition.
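The center-point idea can be illustrated on a single heatmap: each atom or bond center should show up as a local maximum. A minimal plain-Python sketch, assuming a 3×3 neighbourhood and a fixed 0.5 confidence threshold (both illustrative choices, not values from the paper); in a full model, the property heads (atom type, charge, bond type) would then be read out at each returned position.

```python
def find_centers(heatmap, threshold=0.5):
    """Return (row, col) cells that are local maxima at or above `threshold`.

    A cell is kept if its value is >= each of its (up to 8) neighbours, which is
    the same test as comparing it with a 3x3 max-pooled copy of the heatmap.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


atom_heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.1, 0.6, 0.4],
]
print(find_centers(atom_heatmap))  # [(1, 1), (2, 3)]
```

The cell at (3, 2) exceeds the threshold (0.6) but is suppressed because its neighbour at (2, 3) is higher, which is how duplicate detections of one center are avoided.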
21	Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration. RESEARCH 2022. [DOI: 10.34133/research.0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022] Abstract Accurate prediction of pharmacological properties of small molecules is becoming increasingly important in drug discovery. Traditional feature-engineering approaches rely heavily on handcrafted descriptors and/or fingerprints, which require extensive human expert knowledge. With the rapid progress of artificial intelligence technology, data-driven deep learning methods have shown unparalleled advantages over feature-engineering-based methods. However, when applied to molecular property prediction, existing deep learning methods usually suffer from scarce labeled data and cannot share information between tasks, which results in poor generalization. Here, we proposed a novel multitask learning BERT (Bidirectional Encoder Representations from Transformers) framework, named MTL-BERT, which leverages large-scale pretraining, multitask learning, and SMILES (simplified molecular input line entry specification) enumeration to alleviate the data scarcity problem. MTL-BERT first exploits a large amount of unlabeled data through self-supervised pretraining to mine the rich contextual information in SMILES strings, and then fine-tunes the pretrained model on multiple downstream tasks simultaneously by leveraging their shared information. Meanwhile, SMILES enumeration is used as a data augmentation strategy during the pretraining, fine-tuning, and test phases to substantially increase data diversity and help the model learn key patterns from complex SMILES strings.
The experimental results showed that the pretrained MTL-BERT model, with little additional fine-tuning, achieved much better performance than state-of-the-art methods on most of the 60 practical molecular datasets. In addition, MTL-BERT uses attention mechanisms to focus on the SMILES character features essential to the target properties, which makes the model interpretable.
22	BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution. Brief Bioinform 2021;23:6440126. [PMID: 34849567 PMCID: PMC8690188 DOI: 10.1093/bib/bbab491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2021] [Revised: 10/24/2021] [Accepted: 10/25/2021] [Indexed: 01/09/2023] Abstract MOTIVATION Understanding chemical–gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale. In contrast, computational studies enable efficient in silico exploration. For the CGI prediction problem, a common approach is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks have become popular for relation prediction. However, the inherent heterogeneity and complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that can learn latent information from the interaction network and make accurate predictions. RESULTS We developed BioNet, a deep biological network model with a graph encoder–decoder architecture. The graph encoder uses graph convolution to learn the latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. The learning process consists of two consecutive steps: the embedded information learned by the encoder is then used to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79,325 entities as nodes and 34,005,501 relations as edges.
To train such a massive deep graph model, BioNet introduces a parallel training algorithm that uses multiple graphics processing units (GPUs). Evaluation experiments indicated that BioNet achieves outstanding prediction performance, with a best area under the receiver operating characteristic (ROC) curve of 0.952, significantly surpassing state-of-the-art methods. For further validation, the top CGIs predicted by BioNet for cancer and COVID-19 were verified against external curated data and published literature. Key Words: chemical–gene interaction; graph convolution network; heterogeneous biological network; parallel computing.
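RESCAL scores a candidate triple (chemical, relation type k, gene) with a bilinear form, score = e_chem^T R_k e_gene, where each relation type has its own d × d matrix R_k. A minimal sketch with toy 2-dimensional embeddings; in BioNet the embeddings would come from the graph-convolution encoder, and the relation name and the logistic sigmoid used to turn the score into a probability are assumptions here.

```python
import math


def rescal_score(e_head, relation, e_tail):
    """Bilinear RESCAL score e_head^T @ relation @ e_tail.

    relation is a d x d matrix (list of rows) specific to one interaction type;
    e_head and e_tail are length-d entity embeddings.
    """
    relation_times_tail = [sum(r * t for r, t in zip(row, e_tail)) for row in relation]
    return sum(h * v for h, v in zip(e_head, relation_times_tail))


def interaction_probability(e_head, relation, e_tail):
    """Map the raw score to (0, 1) with a logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-rescal_score(e_head, relation, e_tail)))


chemical = [1.0, 0.5]  # toy embeddings
gene = [0.5, 1.0]
increases_expression = [[0.0, 1.0],  # hypothetical, deliberately asymmetric relation
                        [0.0, 0.0]]
print(rescal_score(chemical, increases_expression, gene))  # 1.0
print(rescal_score(gene, increases_expression, chemical))  # 0.25
```

Because R_k need not be symmetric, swapping head and tail changes the score, so the decoder can represent directed interaction types.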
23	Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform 2021;13:86. [PMID: 34774096 PMCID: PMC8590336 DOI: 10.1186/s13321-021-00564-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 10/30/2021] [Indexed: 12/01/2022] Abstract In drug discovery, lead optimization has always been a challenge for pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool for efficiently extracting and summarizing the relationship between structural transformations and property changes, is well suited to local structural optimization tasks. In particular, integrating MMPA with QSAR modeling can further strengthen the utility of MMPA for guiding molecular optimization. In this study, a new semi-automated procedure based on KNIME was developed to support MMPA on both large- and small-scale datasets, covering molecular preparation, QSAR model construction, applicability domain evaluation, and MMP calculation and application. Two examples, covering regression and classification tasks, illustrate the importance of MMPA and demonstrate the reliability and utility of this MMPA-by-QSAR pipeline. Key Words: Lead optimization; MMPA; MMPA-by-QSAR pipeline; Medicinal chemical rule; QSAR.
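The core MMP step, pairing molecules that differ at a single site and summarizing the property change per transformation, can be sketched in plain Python. This is not the KNIME workflow itself: it assumes each molecule has already been cut once into a constant core and a variable substituent (a toolkit such as RDKit's rdMMPA module would normally do the fragmentation), and the fragment strings and logD values are invented.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def matched_pairs(fragments):
    """Group single-cut fragments by core and collect property changes.

    fragments: (mol_id, core, substituent, value) tuples. Two molecules that
    share a core but differ in substituent form a matched molecular pair.
    Returns {"R_a>>R_b": [value_b - value_a, ...]}.
    """
    by_core = defaultdict(list)
    for _mol_id, core, substituent, value in fragments:
        by_core[core].append((substituent, value))
    transforms = defaultdict(list)
    for members in by_core.values():
        for (sub_a, val_a), (sub_b, val_b) in combinations(members, 2):
            if sub_a == sub_b:
                continue
            if sub_a > sub_b:  # store every pair in one canonical direction
                sub_a, sub_b, val_a, val_b = sub_b, sub_a, val_b, val_a
            transforms[f"{sub_a}>>{sub_b}"].append(val_b - val_a)
    return transforms


def summarize(transforms, min_pairs=2):
    """Mean property change and pair count for well-supported transformations."""
    return {t: (mean(d), len(d)) for t, d in transforms.items() if len(d) >= min_pairs}


data = [
    ("m1", "c1ccc([*:1])cc1", "[*:1]Cl", 2.8),
    ("m2", "c1ccc([*:1])cc1", "[*:1]F", 2.3),
    ("m3", "Cc1ccc([*:1])cc1", "[*:1]Cl", 3.3),
    ("m4", "Cc1ccc([*:1])cc1", "[*:1]F", 2.9),
    ("m5", "Cc1ccc([*:1])cc1", "[*:1]OC", 2.6),
]
rules = summarize(matched_pairs(data))
# expected: one well-supported rule, [*:1]Cl>>[*:1]F, mean change of about -0.45 over 2 pairs
print(rules)
```

The `min_pairs` filter mirrors the usual practice of trusting only transformations observed in several independent pairs.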
24	Learning to SMILES: BAN-based strategies to improve latent representation learning from molecules. Brief Bioinform 2021;22:6356874. [PMID: 34427296 DOI: 10.1093/bib/bbab327] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/15/2021] [Revised: 07/15/2021] [Accepted: 07/25/2021] [Indexed: 01/22/2023] Abstract Computational methods have become indispensable tools for accelerating the drug discovery process and reducing the excessive dependence on time-consuming and labor-intensive experiments. Traditional feature-engineering approaches rely heavily on expert knowledge to devise useful features, which can be costly and sometimes biased. Emerging deep learning (DL) methods offer a data-driven way to automatically learn expressive representations from complex raw data. Inspired by this, researchers have applied various deep neural network models to simplified molecular input line entry specification (SMILES) strings, which encode the full composition and structure of molecules. However, current models usually suffer from scarce labeled data. This results in poor generalization of SMILES-based DL models, which prevents them from competing with state-of-the-art computational methods. In this study, we used the BiLSTM (bidirectional long short-term memory) attention network (BAN), in which a novel multi-step attention mechanism facilitates the extraction of key features from SMILES strings. Meanwhile, SMILES enumeration was used as a data augmentation method in the training phase to substantially increase the amount of labeled data and the likelihood of mining more patterns from complex SMILES. We again took advantage of SMILES enumeration in the prediction phase to correct model prediction bias and provide more accurate predictions.
Combined with the BAN model, our strategies can greatly improve the quality of the latent features learned from SMILES strings. On 11 canonical absorption, distribution, metabolism, excretion and toxicity (ADMET)-related tasks, our method outperformed state-of-the-art approaches. Key Words: SMILES; attention mechanism; data augmentation; deep learning; drug discovery.
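Using SMILES enumeration at prediction time amounts to scoring several equivalent SMILES strings of one molecule and aggregating the results. A minimal sketch, assuming a simple mean as the aggregation rule (the abstract does not state it); `model` and `enumerate_smiles` are caller-supplied placeholders. With RDKit, one way to produce variants is `Chem.MolToSmiles(mol, doRandom=True)`, but the toy usage hard-codes them so the example has no dependencies.

```python
from statistics import mean


def predict_with_enumeration(smiles, model, enumerate_smiles, n_variants=10):
    """Average a model's predictions over equivalent SMILES of one molecule.

    model: callable mapping a SMILES string to a numeric prediction.
    enumerate_smiles: callable returning up to n alternative SMILES strings
    for the same molecule (e.g. randomized atom orderings).
    """
    variants = [smiles] + list(enumerate_smiles(smiles, n_variants))
    return mean(model(s) for s in variants)


# Toy stand-ins: three spellings of ethanol that a sequence model may score differently.
toy_scores = {"CCO": 0.6, "OCC": 0.9, "C(C)O": 0.3}
toy_model = toy_scores.__getitem__


def toy_enumerator(smi, n):
    """Hypothetical enumerator returning fixed alternative SMILES for ethanol."""
    return ["OCC", "C(C)O"][:n]


print(round(predict_with_enumeration("CCO", toy_model, toy_enumerator), 3))  # 0.6
```

Averaging over spellings reduces the dependence of the prediction on one arbitrary atom ordering, which is the kind of bias the abstract refers to.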
25	Computational Bioactivity Fingerprint Similarities To Navigate the Discovery of Novel Scaffolds. J Med Chem 2021;64:7544-7554. [PMID: 34008979 DOI: 10.1021/acs.jmedchem.1c00234] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022] Abstract As one of the central tasks of modern medicinal chemistry, scaffold hopping is expected to lead to the discovery of structurally novel, biologically active compounds and to broaden the chemical space of known active compounds. Here, we report the computational bioactivity fingerprint (CBFP) for easier scaffold hopping, in which the activities predicted by multiple quantitative structure-activity relationship (QSAR) models are integrated to characterize the biological space of a molecule. In retrospective benchmarks, the CBFP representation shows outstanding scaffold hopping potential relative to other chemical descriptors. In a prospective validation aimed at discovering novel inhibitors of poly [ADP-ribose] polymerase 1, 35 structurally diverse predicted compounds were tested, 25 of which showed detectable growth-inhibitory activity; the most potent (compound 6) has an IC₅₀ of 0.263 nM. These results support the use of the CBFP representation as a bioactivity proxy of molecules for exploring uncharted chemical space and discovering novel compounds.
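A CBFP can be pictured as the vector of activities that a panel of QSAR models predicts for one molecule; scaffold hopping then ranks library compounds by how closely their vectors match the query's, regardless of chemical structure. A minimal sketch, assuming Pearson correlation as the similarity measure (the abstract does not name one); the QSAR models are placeholder callables and the fingerprint values are invented.

```python
from math import sqrt


def cbfp(molecule, qsar_models):
    """Computational bioactivity fingerprint: one predicted activity per QSAR model."""
    return [model(molecule) for model in qsar_models]


def pearson(u, v):
    """Pearson correlation between two equal-length fingerprints."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def rank_by_bioactivity(query_fp, library):
    """Order library compound names by fingerprint similarity to the query."""
    return sorted(library, key=lambda name: pearson(query_fp, library[name]), reverse=True)


query = [7.5, 5.0, 6.2, 4.1]  # e.g. predicted pIC50 values from 4 QSAR models (invented)
library = {
    "scaffold_A": [7.1, 4.8, 6.0, 4.3],  # similar activity profile
    "scaffold_B": [4.0, 6.9, 4.5, 7.2],  # opposite activity profile
}
print(rank_by_bioactivity(query, library))  # ['scaffold_A', 'scaffold_B']
```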
26	MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Brief Bioinform 2021;22:6265201. [PMID: 33951729 DOI: 10.1093/bib/bbab152] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 01/27/2021] [Revised: 03/11/2021] [Accepted: 04/01/2021] [Indexed: 11/12/2022] Abstract MOTIVATION Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in feature design and selection. With the development of artificial intelligence (AI) technologies, data-driven methods have shown unparalleled advantages over feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from scarce labeled data and show poor generalization. RESULTS In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message-passing mechanism of graph neural networks (GNNs) into the powerful BERT model to facilitate learning from molecular graphs. Furthermore, an effective self-supervised learning strategy named masked atoms prediction was proposed to pretrain the MG-BERT model on a large amount of unlabeled data to mine context information in molecules. We found that the MG-BERT model can generate context-sensitive atomic representations after pretraining and transfer the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model, with a small amount of additional fine-tuning, consistently outperforms state-of-the-art methods on all 11 ADMET datasets.
Moreover, the MG-BERT model uses attention mechanisms to focus on the atomic features essential to the target property, providing excellent interpretability for the trained model. The MG-BERT model does not require any hand-crafted features as input, and its interpretability makes it more reliable, providing a novel framework for developing state-of-the-art models for a wide range of drug discovery tasks. Key Words: atomic representation; deep learning; molecular graph BERT; molecular property prediction; self-supervised learning.
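Masked-atom pretraining can be sketched as a BERT-style corruption step on a molecule's atom list. The abstract does not give the masking proportions, so this sketch assumes the original BERT recipe (select 15% of tokens; of those, 80% masked, 10% swapped for a random token, 10% left unchanged). It covers only data preparation, not MG-BERT's message-passing Transformer; the atom list and vocabulary are invented.

```python
import random

MASK = "[MASK]"


def mask_atoms(atoms, vocab, mask_rate=0.15, rng=None):
    """BERT-style corruption of a molecule's atom tokens for masked-atom pretraining.

    Returns (corrupted, targets): targets[i] holds the original atom at each
    selected position (the model is trained to recover it) and None elsewhere.
    """
    rng = rng or random.Random()
    n_select = max(1, round(mask_rate * len(atoms)))
    corrupted = list(atoms)
    targets = [None] * len(atoms)
    for i in rng.sample(range(len(atoms)), n_select):
        targets[i] = atoms[i]
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random atom type
        # remaining 10%: leave the atom unchanged
    return corrupted, targets


atoms = ["C", "C", "O", "C", "N", "C", "C", "Cl", "C", "O",
         "C", "C", "S", "C", "C", "F", "C", "C", "N", "O"]
corrupted, targets = mask_atoms(atoms, vocab=["C", "N", "O", "S", "F", "Cl"], rng=random.Random(0))
print(sum(t is not None for t in targets))  # 3
```

Only the selected positions contribute to the pretraining loss, which is why `targets` is None everywhere else.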
27	Systematic comparison of ligand-based and structure-based virtual screening methods on poly (ADP-ribose) polymerase-1 inhibitors. Brief Bioinform 2021;22:6262239. [PMID: 33940596 DOI: 10.1093/bib/bbab135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2021] [Revised: 03/16/2021] [Accepted: 03/23/2021] [Indexed: 11/12/2022] Abstract Poly (ADP-ribose) polymerase-1 (PARP1) has been regarded as a vital target in recent years, and PARP1 inhibitors can be used in ovarian and breast cancer therapy. However, most PARP1 inhibitors suffer from low solubility and permeability. Therefore, discovering more molecules with novel frameworks would create greater opportunities for broader clinical application. In the present study, multiple virtual screening (VS) methods were employed to evaluate the screening efficiency of ligand-based, structure-based and data fusion methods on the PARP1 target. The VS methods include 2D similarity screening, structure-activity relationship (SAR) models, docking and complex-based pharmacophore screening. The sum rank, sum score and reciprocal rank methods were adopted for data fusion. The evaluation results show that similarity searching based on the Torsion fingerprint, six SAR models, Glide docking and pharmacophore screening with Phase achieve excellent screening performance. The best data fusion method is reciprocal rank, but sum score also performs well in framework enrichment. In general, the ligand-based VS methods perform better in PARP1 inhibitor screening. These findings indicate that adding ligand-based methods at the early screening stage can greatly improve screening efficiency and enrich more highly active PARP1 inhibitors with diverse structures.
Key Words: PARP1 inhibitors; data fusion; pharmacophore; similarity searching; virtual screening (VS).
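The three fusion rules named above can be sketched as follows. The abstract does not define them precisely, so this sketch assumes their common forms: sum rank adds each molecule's rank across methods (lower is better), sum score adds min-max-normalized scores, and reciprocal rank adds 1/rank. Scores are assumed to be oriented so that higher means more likely active (e.g. docking energies negated first); the molecules and numbers are invented.

```python
def ranks_from_scores(scores):
    """Map molecule -> rank (1 = best) for one method; higher score is better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {mol: i + 1 for i, mol in enumerate(ordered)}


def min_max(scores):
    """Rescale one method's scores to [0, 1] so different methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {mol: (s - lo) / span for mol, s in scores.items()}


def fuse(method_scores, rule="reciprocal_rank"):
    """Combine several methods' score dicts (same molecules) into one ranking."""
    molecules = list(method_scores[0])
    ranks = [ranks_from_scores(s) for s in method_scores]
    if rule == "sum_rank":  # smaller summed rank is better
        fused = {m: -sum(r[m] for r in ranks) for m in molecules}
    elif rule == "sum_score":  # larger summed normalized score is better
        normalized = [min_max(s) for s in method_scores]
        fused = {m: sum(n[m] for n in normalized) for m in molecules}
    elif rule == "reciprocal_rank":  # sum of 1/rank rewards top placements
        fused = {m: sum(1.0 / r[m] for r in ranks) for m in molecules}
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return sorted(molecules, key=fused.get, reverse=True)


similarity = {"m1": 0.9, "m2": 0.4, "m3": 0.7, "m4": 0.1}
sar_model = {"m1": 0.6, "m2": 0.8, "m3": 0.7, "m4": 0.2}
docking = {"m1": 7.0, "m2": 9.5, "m3": 6.0, "m4": 5.0}  # negated docking scores
for rule in ("sum_rank", "sum_score", "reciprocal_rank"):
    print(rule, fuse([similarity, sar_model, docking], rule))
```

Reciprocal rank rewards a molecule that one method places at the very top, whereas sum rank penalizes any poor placement; the rules can therefore disagree on real data even though they agree on this toy set.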
28	PySmash: Python package and individual executable program for representative substructure generation and application. Brief Bioinform 2021;22:6168498. [PMID: 33709154 DOI: 10.1093/bib/bbab017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/07/2020] [Revised: 01/06/2021] [Accepted: 01/12/2021] [Indexed: 01/23/2023] Abstract BACKGROUND Substructure screening is widely applied to evaluate the molecular potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models when designing new compounds with desirable physicochemical and biological properties. As experimental data continue to accumulate, data-driven computational systems that can derive representative substructures from large chemical libraries are attracting more attention. Therefore, an integrated and convenient tool to generate and apply representative substructures is urgently needed. RESULTS In this study, PySmash, a user-friendly and powerful tool for generating different types of representative substructures, was developed. The current version of PySmash provides both a Python package and an individual executable program, making it easy to operate and to integrate into pipelines. Three types of substructure generation algorithms are provided: circular, path-based and functional group-based. Users can customize substructure size, accuracy and coverage, statistical significance, and parallel computation settings. In addition, PySmash supports screening of external data. CONCLUSION PySmash, a user-friendly and integrated tool for the automatic generation and application of representative substructures, is presented.
Three screening examples, covering toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash. Key Words: ADMET; Python package; QSAR; software; substructure screening.
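PySmash exposes accuracy, coverage and statistical significance as user-set criteria, but the abstract does not define them. The following plain-Python sketch is not PySmash's API; it assumes accuracy is the fraction of matching compounds that are positive, coverage is the fraction of positives that match, and significance comes from a one-sided binomial test against the dataset's overall positive rate. The labels and the substructure name are invented.

```python
from math import comb


def substructure_stats(has_sub, is_positive):
    """Accuracy, coverage and one-sided binomial p-value for one substructure.

    has_sub[i] / is_positive[i]: whether compound i contains the substructure /
    is labelled positive (e.g. toxic), as 0/1 or False/True.
    """
    n_total = len(is_positive)
    n_pos = sum(is_positive)
    hits = [p for h, p in zip(has_sub, is_positive) if h]
    n_hit, n_hit_pos = len(hits), sum(hits)
    accuracy = n_hit_pos / n_hit if n_hit else 0.0  # share of matches that are positive
    coverage = n_hit_pos / n_pos if n_pos else 0.0  # share of positives that match
    base_rate = n_pos / n_total
    # P(X >= n_hit_pos) for X ~ Binomial(n_hit, base_rate)
    p_value = sum(comb(n_hit, k) * base_rate**k * (1 - base_rate) ** (n_hit - k)
                  for k in range(n_hit_pos, n_hit + 1))
    return accuracy, coverage, p_value


is_toxic = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
has_nitro = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # hypothetical substructure matches
accuracy, coverage, p_value = substructure_stats(has_nitro, is_toxic)
print(accuracy, coverage, round(p_value, 4))  # 0.75 0.75 0.1792
```

On such a small toy set the enrichment is not significant (p ≈ 0.18), which is exactly what a significance threshold is meant to catch before a substructure is reported as a toxicophore.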
29	Clinical experience of the use of Integra in combination with negative pressure wound therapy: an alternative method for the management of wounds with exposed bone or tendon. J Plast Surg Hand Surg 2021;55:1-5. [PMID: 33433246 DOI: 10.1080/2000656x.2020.1781140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022] Abstract Integra has attracted great interest for treating wounds with exposed bone or tendon, which can lead to associated morbidities. However, Integra alone results in poor wound outcomes. We conducted a randomized clinical study to evaluate the combined effects of Integra and negative pressure wound therapy (NPWT). Thirty-six patients with wounds with exposed bone or tendon were treated with either Integra alone or Integra combined with NPWT (n = 18 each). Negative pressure (125 mm Hg) was applied intermittently until Integra was revascularized. The take rate of Integra and the time from Integra coverage to skin transplantation were recorded for each case. The average take rate of Integra was lower in the conventional treatment group (Integra with partial packing compression dressings) than in the new treatment group (Integra with NPWT) (p < 0.001, 95% CI: 6.44-0.20). The mean time from Integra coverage to skin transplantation was longer in the conventional treatment group than in the new treatment group (p < 0.001, 95% CI: -13.18 to -11.24). NPWT could potentially increase the take rate of Integra and shorten the hospital stay. Integra combined with NPWT could be a treatment option for wounds with exposed bone or tendon. Key Words: Integra; negative pressure wound therapy; take rate; time.
30	QSAR-assisted-MMPA to expand chemical transformation space for lead optimization. Brief Bioinform 2021;22:6071857. [PMID: 33418563 DOI: 10.1093/bib/bbaa374] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/07/2020] [Revised: 10/25/2020] [Accepted: 11/25/2020] [Indexed: 11/13/2022] Abstract Matched molecular pairs analysis (MMPA) has become a powerful tool for automatically and systematically identifying medicinal chemistry transformations from compound/property datasets. However, accurate determination of matched molecular pair (MMP) transformations largely depends on the size and quality of existing experimental data, and a lack of high-quality experimental data severely hampers the extraction of more effective medicinal chemistry knowledge. Here, we developed a new strategy called quantitative structure-activity relationship (QSAR)-assisted-MMPA to expand the number of chemical transformations, and took the logD7.4 property endpoint as an example to demonstrate the reliability of the new method. A reliable logD7.4 consensus prediction model was first established, and its applicability domain was strictly assessed. By applying this model to screen two chemical databases with a strict applicability domain threshold, we obtained additional high-quality logD7.4 data. Then, MMPA was performed on the predicted and experimental data to derive more chemical rules. To validate the reliability of these chemical rules, we compared the magnitude and directionality of the property changes of the predicted rules with those of the measured rules.
Then, we compared the novel chemical rules generated by our proposed approach with published chemical rules and found that the magnitude and directionality of the property changes were consistent, indicating that the proposed QSAR-assisted-MMPA approach has the potential to enrich the collection of rule types or even identify completely novel rules. Finally, we found that the number of MMP rules derived from the experimental data could be amplified by the predicted data, which helps in analyzing medicinal chemistry rules in a local chemical environment. In summary, the proposed QSAR-assisted-MMPA approach could be regarded as a very promising strategy to expand the chemical transformation space for lead optimization, especially when there are not enough experimental data to support MMPA. Key Words: MMPA; QSAR; lead optimization; machine learning; medicinal chemical rules
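The rule-derivation step in the QSAR-assisted-MMPA abstract (keep predicted pairs only inside a strict applicability domain, then aggregate the property change per transformation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PairDelta` record, the `ad_score` confidence, the 0.8 threshold and the three-pair minimum are all hypothetical, and the matched pairs are assumed to have already been fragmented and labeled by transformation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairDelta:
    """One matched molecular pair (hypothetical record for this sketch)."""
    transformation: str      # label of the A>>B transformation
    delta: float             # property(B) - property(A), e.g. change in logD7.4
    predicted: bool          # True if the values come from the QSAR model
    ad_score: float = 1.0    # ASSUMED applicability-domain confidence in [0, 1]


def derive_rules(pairs, ad_threshold=0.8, min_pairs=3):
    """Aggregate MMP deltas into rules; predicted pairs count only inside the AD."""
    grouped = defaultdict(list)
    for p in pairs:
        if p.predicted and p.ad_score < ad_threshold:
            continue  # drop predictions outside the (hypothetical) AD threshold
        grouped[p.transformation].append(p.delta)

    rules = {}
    for trans, deltas in grouped.items():
        if len(deltas) < min_pairs:
            continue  # too few pairs to support a rule
        rules[trans] = {
            "n": len(deltas),
            "mean_delta": mean(deltas),  # magnitude of the property change
            "frac_increase": sum(d > 0 for d in deltas) / len(deltas),  # directionality
        }
    return rules


pairs = [
    PairDelta("A>>B", 0.5, predicted=False),
    PairDelta("A>>B", 0.3, predicted=False),
    PairDelta("A>>B", 0.4, predicted=True, ad_score=0.9),  # inside the AD: kept
    PairDelta("A>>B", 0.2, predicted=True, ad_score=0.5),  # outside the AD: dropped
]
print(derive_rules(pairs))
```

In this toy input, `A>>B` has only two measured pairs and would not meet the hypothetical three-pair minimum on experimental data alone; the single in-domain prediction lifts it over the threshold, which is the kind of rule amplification the abstract describes.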
31	Quantitative structure-toxicity relationship model for acute toxicity of organophosphates via multiple administration routes in rats and mice. JOURNAL OF HAZARDOUS MATERIALS 2021;401:123724. [PMID: 33113726 DOI: 10.1016/j.jhazmat.2020.123724] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/03/2020] [Revised: 07/29/2020] [Accepted: 08/13/2020] [Indexed: 06/11/2023] Abstract Organophosphates (OPs) are highly toxic compounds widely used in the agricultural and chemical industries, and their introduction into the environment poses serious hazards to humans and ecological systems. To assess and ultimately mitigate these hazards, this study predicted the acute toxicity of OPs from their chemical structure and administration route. Acute toxicity data for 161 OPs in two species via six different administration routes were manually collected and used to develop a series of quantitative structure-toxicity relationship (QSTR) models with robust and practical predictive abilities. The random forest algorithm was used to develop the models, employing both quantum chemical and two-dimensional descriptors in accordance with OECD guidelines. Correlation results and feature similarities indicated that acute toxicity data from rats and mice via the same administration route could be combined for modeling, whereas data from different routes could not. Six QSTR models for each route in a single species and two QSTR models for a single route in the two species were constructed, achieving practical predictive performance. Despite significant variation across their datasets, the prediction models could predict the acute toxicity of novel or unknown OPs, enable rapid assessment, and provide guidance for regulatory decisions to reduce the hazards of OPs.
Key Words: OPs; Pesticides; QSTR; Random forest; Toxic compound. MESH Headings: Algorithms; Animals; Ecosystem; Mice; Organophosphates/toxicity; Pharmaceutical Preparations; Quantitative Structure-Activity Relationship; Rats
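The "Impact Index Per Article" figures in this listing are not defined on the page. Every value shown is consistent with dividing the RCA citation count by the number of years since publication, taking 2024 as the reference year and rounding halves up to one decimal (for example, 8 citations for the 2021 entry 30 gives 2.7, and 17 citations for the 2020 Scopy entry gives 4.3). Counting the publication year itself up to 2023 gives the same denominators. The sketch below reproduces that arithmetic; the formula and the reference year are inferences from the listed numbers, not a documented RCA definition.

```python
from decimal import ROUND_HALF_UP, Decimal


def impact_index(citations: int, pub_year: int, reference_year: int = 2024) -> Decimal:
    """Citations per year since publication, rounded half-up to one decimal.

    ASSUMPTION: inferred from the values in this listing, not an official RCA formula.
    """
    years = reference_year - pub_year
    if years <= 0:
        return Decimal("0")  # hypothetical choice for articles published this year
    per_year = Decimal(citations) / Decimal(years)
    # ROUND_HALF_UP matches the listing (17 / 4 = 4.25 is shown as 4.3);
    # Python's built-in round() uses round-half-to-even and would give 4.2.
    return per_year.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(impact_index(8, 2021), impact_index(17, 2020), impact_index(53, 2019))
```

Using `Decimal` rather than floats keeps the half-up rounding exact for values such as 4.25 and 8.25 that sit precisely on a rounding boundary.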
32	ChemFLuo: a web-server for structure analysis and identification of fluorescent compounds. Brief Bioinform 2020;22:5985287. [PMID: 33201188 DOI: 10.1093/bib/bbaa282] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2020] [Revised: 09/12/2020] [Accepted: 09/25/2020] [Indexed: 11/14/2022] Abstract BACKGROUND Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent appearance of potentially fluorescent compounds greatly interferes with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. Therefore, a credible screening tool to detect fluorescent compounds in chemical libraries is urgently needed in the early stages of drug discovery. RESULTS In this study, we developed ChemFLuo, a webserver for fluorescent compound detection, based on two large, high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models from combinations of three machine learning algorithms and seven types of molecular representations. The best blue fluorescence prediction model achieved a balanced accuracy (BA) of 0.858 and an area under the receiver operating characteristic curve (AUC) of 0.931 on the validation set, and BA = 0.823 and AUC = 0.903 on the test set. The best green fluorescence prediction model achieved BA = 0.810 and AUC = 0.887 on the validation set, and BA = 0.771 and AUC = 0.852 on the test set. In addition to the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for screening potential fluorescent compounds.
Comparison with other fluorescence detection tools and application to external validation sets and large molecule libraries demonstrated the reliability of the prediction models for fluorescent compound detection. CONCLUSION ChemFLuo is a public webserver for filtering out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/. Key Words: false positives; fluorescent compounds; frequent hitters; machine learning; public webserver; substructure screening
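ChemFLuo's results (and those of the FLuc inhibitor models below) are reported as balanced accuracy (BA), the mean of sensitivity and specificity, which is less misleading than plain accuracy when one class is much rarer than the other. A minimal sketch of the standard definition, assuming labels of 1 for fluorescent and 0 for non-fluorescent; this is the textbook formula, not code taken from ChemFLuo:

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# One positive among ten compounds; the "model" calls everything negative.
print(balanced_accuracy([1] + [0] * 9, [0] * 10))
```

In the usage line, a model that labels every compound non-fluorescent reaches 90% plain accuracy on the 10%-positive set, yet its BA is only 0.5, the same as random guessing.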
33	Does regional lymph node status have a predictive effect on the prognosis of Merkel cell carcinoma? J Plast Reconstr Aesthet Surg 2020;74:845-856. [PMID: 33199219 DOI: 10.1016/j.bjps.2020.10.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2020] [Accepted: 10/20/2020] [Indexed: 11/17/2022] Abstract BACKGROUND No previous study has examined whether regional lymph node (RLN) status affects the prognosis of Merkel cell carcinoma (MCC). METHODS Survival and disease data of MCC patients were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Overall survival (OS) and cause-specific survival (CSS) rates were the endpoints. RESULTS A total of 1822 patients were included, with a mean age of 72.5 years. There were 862 RLN-positive patients (47.3%) and 960 RLN-negative patients (52.7%). Regression analysis showed that primary site, sex, and tumor size were statistically significant, independent predictors of RLN status. The five-year OS and CSS of RLN-negative patients were 71.4% and 92.3%, respectively, much higher than those of RLN-positive patients (37.5% and 65.8%, respectively) (P <0.001). In univariate survival analysis, positive RLN status significantly predicted worse OS and MSS (P <0.001). In multivariate analysis, RLN status had no statistically significant effect on patient prognosis. CONCLUSION The prognosis of patients with RLN metastasis is worse than that of patients without RLN metastasis, but RLN status is not an independent predictor of prognosis in patients with MCC.
Key Words: Merkel cell carcinoma; Prognosis; Regional lymph node status; Survival. MESH Headings: Aged; Aged, 80 and over; Carcinoma, Merkel Cell/mortality; Carcinoma, Merkel Cell/pathology; Female; Humans; Lymphatic Metastasis/pathology; Male; Middle Aged; Prognosis; SEER Program; Skin Neoplasms/mortality; Skin Neoplasms/pathology; Survival Rate
34	A combinatorial target screening strategy for deorphaning macromolecular targets of natural product. Eur J Med Chem 2020;204:112644. [PMID: 32738412 DOI: 10.1016/j.ejmech.2020.112644] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/29/2020] [Revised: 06/02/2020] [Accepted: 07/02/2020] [Indexed: 11/24/2022] Abstract Natural products, as an ideal starting point for molecular design, play a pivotal role in drug discovery; however, ambiguous targets and mechanisms have limited in-depth research on them and their broader application. In-silico target prediction methods have become an alternative to target identification experiments owing to their high accuracy and speed, but most studies use only a single prediction method, which may reduce the accuracy and reliability of the prediction. Here, we present a combinatorial target screening strategy that facilitates multi-target screening of natural products by exploiting the characteristics of diverse in-silico target prediction methods; it consists of ligand-based online approaches, consensus SAR modelling and target-specific re-scoring function modelling. To validate the practicability of the strategy, the natural product neferine, a bisbenzylisoquinoline alkaloid isolated from the lotus seed, was taken as an example to illustrate the screening process, and a series of corresponding experiments was performed to explore the pharmacological mechanisms of neferine. The proposed computational method could be used for complementary hypothesis generation and rapid analysis of potential targets of natural products. Key Words: Combinatorial target screening; Consensus SAR modelling; Natural products; Neferine; Target-specific modelling
35	The ups and downs of Poly(ADP-ribose) Polymerase-1 inhibitors in cancer therapy–Current progress and future direction. Eur J Med Chem 2020;203:112570. [DOI: 10.1016/j.ejmech.2020.112570] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/09/2020] [Revised: 06/10/2020] [Accepted: 06/11/2020] [Indexed: 12/13/2022]
36	Scopy: an integrated negative design python library for desirable HTS/VS database design. Brief Bioinform 2020;22:5901981. [PMID: 32892221 DOI: 10.1093/bib/bbaa194] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/11/2020] [Revised: 07/27/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022] Abstract BACKGROUND High-throughput screening (HTS) and virtual screening (VS) have been widely used to identify potential hits from large chemical libraries. However, the frequent occurrence of 'noisy compounds' in the screened libraries, such as compounds with poor drug-likeness, poor selectivity or potential toxicity, greatly weakens the enrichment capability of HTS and VS campaigns. Therefore, comprehensive and credible tools to detect noisy compounds in chemical libraries are urgently needed in the early stages of drug discovery. RESULTS In this study, we developed Scopy, a freely available, integrated Python library for negative design, which supports data preparation, calculation of descriptors, scaffolds and screening filters, and data visualization. The current version of Scopy can calculate 39 basic molecular properties, 3 comprehensive molecular evaluation scores, 2 types of molecular scaffolds, 6 types of substructure descriptors and 2 types of fingerprints. Scopy also provides a number of important screening rules, including 15 drug-likeness rules (13 drug-likeness rules and 2 building block rules), 8 frequent hitter rules (four assay interference substructure filters and four promiscuous compound substructure filters), and 11 toxicophore filters (five human-related toxicity substructure filters, three environment-related toxicity substructure filters and three comprehensive toxicity substructure filters).
Moreover, this library supports four visualization functions to help users better understand the screened data: a basic feature radar chart, a feature-feature-related scatter diagram, a functional group marker gram and a cloud gram. CONCLUSION Scopy provides a comprehensive Python package to filter out compounds with undesirable properties or substructures, which will benefit the design of high-quality chemical libraries for drug design and discovery. It is freely available at https://github.com/kotori-y/Scopy. Key Words: HTS; drug-likeness; frequent hitters; negative design; toxicity
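Scopy's drug-likeness rules act as property filters on each compound. To show what such a rule looks like, the sketch below checks Lipinski's rule of five on precomputed properties. The `MolProps` record, the example property values and the one-violation tolerance are assumptions for this illustration; the code does not use or reproduce Scopy's API, which computes the properties from structures itself.

```python
from dataclasses import dataclass


@dataclass
class MolProps:
    """Precomputed molecular properties (hypothetical input record)."""
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol/water partition coefficient
    hbd: int           # hydrogen-bond donors
    hba: int           # hydrogen-bond acceptors


def lipinski_violations(m: MolProps) -> list[str]:
    """Return the names of the rule-of-five criteria the molecule violates."""
    checks = {
        "MW > 500": m.mol_weight > 500,
        "logP > 5": m.logp > 5,
        "HBD > 5": m.hbd > 5,
        "HBA > 10": m.hba > 10,
    }
    return [name for name, failed in checks.items() if failed]


def passes_rule_of_five(m: MolProps, max_violations: int = 1) -> bool:
    """Common convention: flag a compound only if it breaks more than one criterion."""
    return len(lipinski_violations(m)) <= max_violations


small = MolProps(mol_weight=180.16, logp=1.2, hbd=1, hba=4)   # illustrative values
large = MolProps(mol_weight=650.0, logp=6.2, hbd=2, hba=8)    # illustrative values
print(passes_rule_of_five(small), lipinski_violations(large))
```

Returning the list of violated criteria, rather than only a pass/fail flag, keeps the reason for rejection visible, which matters when a negative-design filter removes compounds from a screening library.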
37	Clinical Features and Prognosis of Merkel Cell Carcinoma in Elderly Patients. Med Sci Monit 2020;26:e924570. [PMID: 32653892 PMCID: PMC7375029 DOI: 10.12659/msm.924570] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/25/2020] [Accepted: 04/28/2020] [Indexed: 12/12/2022] Abstract BACKGROUND Merkel cell carcinoma (MCC) occurs primarily among elderly patients over 70 years old, but the ability to predict the prognosis of these patients is poor. This population-based study aimed to identify prognostic risk factors for elderly patients with MCC. MATERIAL AND METHODS Survival and disease information for MCC patients aged 65 years or older was downloaded from the SEER database, and the data were split into 2 groups at age 80 years, with overall survival and MCC-specific survival as the main outcome indicators. RESULTS Application of the inclusion criteria yielded 1973 patients with MCC, of whom 55.6% were aged 65-80 years and 1258 (63.8%) were male. In survival analysis, the factors significantly correlated with overall survival and MCC-specific survival were N stage, M stage, liver metastasis, and lymph node surgery. CONCLUSIONS We provide epidemiological insights into Merkel cell carcinoma in elderly patients and confirm that patients receiving lymph node surgery have better outcomes. To the best of our knowledge, this is the first study to show that the occurrence of liver metastasis is associated with poor prognosis. Our results will help strengthen monitoring of the liver condition of elderly patients and support performing necessary lymph node surgery within the patient's tolerance.
Key Words: carcinoma, merkel cell; prognosis; survival analysis. MESH Headings: Aged; Aged, 80 and over; Carcinoma, Merkel Cell/epidemiology; Carcinoma, Merkel Cell/metabolism; Carcinoma, Merkel Cell/mortality; Carcinoma, Merkel Cell/pathology; Female; Humans; Lymphatic Metastasis/physiopathology; Male; Neoplasm Recurrence, Local/pathology; Prognosis; SEER Program; Sentinel Lymph Node Biopsy/methods; Skin Neoplasms/pathology; Survival Analysis
38	Improving structure-based virtual screening performance via learning from scoring function components. Brief Bioinform 2020;22:5851268. [PMID: 32496540 DOI: 10.1093/bib/bbaa094] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 01/02/2020] [Revised: 03/30/2020] [Accepted: 04/28/2020] [Indexed: 11/12/2022] Abstract Scoring functions (SFs) based on complex machine learning (ML) algorithms have gradually emerged as a promising alternative for overcoming the weaknesses of classical SFs. However, extensive efforts have been devoted to developing SFs based on new protein-ligand interaction representations and advanced alternative ML algorithms, rather than on the energy components obtained by decomposing existing SFs. Here, we propose a new method named energy auxiliary terms learning (EATL), in which the scoring components are extracted and used as input to develop three levels of ML SFs, namely EATL SFs, docking-EATL SFs and comprehensive SFs, with ascending VS performance. The EATL approach not only outperforms classical SFs in absolute performance (ROC) and initial enrichment (BEDROC) but also performs comparably to other advanced ML-based methods on the diverse subset of the Directory of Useful Decoys: Enhanced (DUD-E). Tests on the relatively unbiased actives as decoys (AD) dataset also confirmed the effectiveness of EATL. Furthermore, the idea of learning from SF components to improve screening power can also be extended to other available docking programs and SFs. Key Words: docking program; machine learning; scoring function (SF); virtual screening. MESH Headings: Drug Discovery; Machine Learning; Molecular Docking Simulation; Protein Binding; Proteins/chemistry
39	Improving Docking-Based Virtual Screening Ability by Integrating Multiple Energy Auxiliary Terms from Molecular Docking Scoring. J Chem Inf Model 2020;60:4216-4230. [PMID: 32352294 DOI: 10.1021/acs.jcim.9b00977] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 12/18/2022] Abstract Virtual screening (VS) based on molecular docking is an efficient method for retrieving novel hit compounds in drug discovery. However, the accuracy of current docking scoring functions (SFs) is usually insufficient. In this study, to improve the screening power of SFs, we proposed a novel approach named EAT-Score, which applies eXtreme Gradient Boosting (XGBoost) directly to the energy auxiliary terms (EAT) provided by molecular docking scoring. Here, EAT specifically refers to the output of Molecular Operating Environment (MOE) scoring, including the energy scores of five different classical SFs and the Protein-Ligand Interaction Fingerprint (PLIF) terms. The ability of EAT-Score to discriminate actives from decoys was rigorously validated on the DUD-E diverse subset using different performance metrics. The results showed that EAT-Score performed much better than classical SFs in VS, improving AUC values by around 0.3. Meanwhile, EAT-Score achieved comparable or even better prediction performance than other state-of-the-art VS methods, such as some machine learning (ML)-based SFs and classical SFs implemented in docking programs, in terms of AUC, LogAUC, or BEDROC. Furthermore, the EAT-Score model can capture important binding pattern information from protein-ligand complexes through Shapley additive explanations (SHAP) analysis, which may be very helpful in interpreting the ligand binding mechanism for a given target and thereby guiding drug design.
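The AUC values quoted for EAT-Score, and for the other classifiers in this list, measure how well a score ranks actives above decoys. One standard way to compute the ROC AUC, shown here as a sketch rather than the evaluation code used in the paper, is the pairwise (Mann-Whitney) formulation: the fraction of active/decoy pairs in which the active scores higher, with ties counted as one half. The scores are assumed to be oriented so that higher means more likely active; docking energies where lower is better would need to be negated first.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random decoy."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(active_scores) * len(decoy_scores))


print(roc_auc([0.9, 0.8], [0.7, 0.85]))
```

The nested loop is O(n·m), which is fine for a sketch; for DUD-E-sized sets a sort-based rank computation gives the same value in O((n+m) log(n+m)). On this scale an AUC of 0.5 is random ranking, so an improvement of around 0.3 is a large gain.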
40	A multi-scale systems pharmacology approach uncovers the anti-cancer molecular mechanism of Ixabepilone. Eur J Med Chem 2020;199:112421. [PMID: 32428794 DOI: 10.1016/j.ejmech.2020.112421] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/29/2020] [Revised: 04/29/2020] [Accepted: 05/03/2020] [Indexed: 12/21/2022] Abstract It has been recognized that FDA-approved drugs may have more molecular targets than is commonly thought. Thus, identifying the exact drug-target interactions (DTIs) is of great significance for exploring new molecular mechanisms of drugs. Here, we developed a multi-scale systems pharmacology (MSSP) method for the large-scale prediction of DTIs. We used MSSP to integrate drug-related and target-related data from multiple levels with the network structure formed by known drug-target relationships to predict likely unknown DTIs. Prediction results revealed that Ixabepilone, an epothilone B analog used to treat breast cancer, may target Bcl-2, an oncogene that contributes to tumor progression and therapy resistance by inhibiting apoptosis. Furthermore, we demonstrated that Ixabepilone could bind to Bcl-2 and decrease its protein expression in breast cancer cells. The down-regulation of Bcl-2 by Ixabepilone results from promoting its degradation by affecting p-Bcl-2. We further found that Ixabepilone could induce autophagy by releasing Beclin1 from the Beclin1/Bcl-2 complex. Inhibition of autophagy by Beclin1 knockdown or a pharmacological inhibitor augmented apoptosis, thus enhancing the antitumor efficacy of Ixabepilone against breast cancer cells in vitro and in vivo. Ixabepilone also decreased Bcl-2 protein expression and induced cytoprotective autophagy in human hepatic carcinoma and glioma cells.
In conclusion, this study not only provides a feasible alternative way to explore new molecular mechanisms of drugs by combining computational DTI prediction with experiments, but also reveals an effective strategy to reinforce the antitumor efficacy of Ixabepilone. Key Words: Autophagy; Bcl-2; Drug-target interactions (DTIs); Ixabepilone; Multi-scale systems pharmacology (MSSP)
41	Substrate-Photocaged Enzymatic Fluorogenic Probe Enabling Sequential Activation for Light-Controllable Monitoring of Intracellular Tyrosinase Activity. Anal Chem 2020;92:7194-7199. [PMID: 32309931 DOI: 10.1021/acs.analchem.0c00746] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/15/2022] Abstract Tyrosinase (TYR) is a crucial enzyme involved in melanogenesis, and its overexpression is closely associated with melanoma. To precisely monitor intracellular TYR activity, a remotely controllable molecular imaging tool is highly desirable but remains to be explored. In this work, we present the first photocaged tyrosinase fluorogenic probe, made by caging the substrate of the enzymatic probe with a photolabile group. Because of its sequential light- and enzyme-activation feature, this probe exhibits a photocontrollable "turn on" response toward TYR with good selectivity and high sensitivity (detection limit: 0.08 U/mL). Fluorescence imaging results validate that the caged probe can visualize intracellular endogenous tyrosinase activity in a photocontrolled fashion, thus offering a promising molecular imaging tool for investigating the physiological functions and pathological roles of TYR. Moreover, our sequential activation strategy has great potential for developing more photocontrollable enzymatic fluorogenic probes with spatiotemporal resolution.
42	Frequency and prognosis of metastasis to liver, lung, bone and brain from Merkel cell carcinoma. Future Oncol 2020;16:1101-1113. [PMID: 32314598 DOI: 10.2217/fon-2020-0064] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022] Abstract Aim: To describe the factors affecting distant metastasis of Merkel cell carcinoma (MCC) and the prognosis of metastatic MCC. Materials & methods: MCC patient information was downloaded from the SEER database. Logistic regression and Cox proportional hazards models were used to screen for significant factors. Results: A total of 3449 patients were enrolled. Surgery and chemotherapy were significantly correlated with the occurrence of distant metastasis. For MCC cause-specific survival, regional lymph node removal, sentinel lymph node biopsy, radiation and chemotherapy can significantly reduce the prognostic risk of patients with distant metastases. Conclusion: Our study identified the factors affecting distant metastasis and prognosis of MCC; more prospective studies are needed to verify our findings. Key Words: merkel cell carcinoma; metastasis; prognosis; survival
43	Frequent hitters: nuisance artifacts in high-throughput screening. Drug Discov Today 2020;25:657-667. [DOI: 10.1016/j.drudis.2020.01.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/19/2019] [Revised: 12/28/2019] [Accepted: 01/16/2020] [Indexed: 11/27/2022]
44	Structural Analysis and Identification of False Positive Hits in Luciferase-Based Assays. J Chem Inf Model 2020;60:2031-2043. [PMID: 32202787 DOI: 10.1021/acs.jcim.9b01188] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/07/2023] Abstract Luciferase-based bioluminescence detection techniques are highly favored in high-throughput screening (HTS), and firefly luciferase (FLuc) is the most commonly used variant. However, FLuc inhibitors can interfere with luciferase activity, which may produce false positive signals in HTS assays. To reduce unnecessary costs in time and money, an in silico prediction model for FLuc inhibitors is highly desirable. In this study, we built an extensive data set consisting of 20 888 FLuc inhibitors and 198 608 noninhibitors, and then developed a group of classification models from combinations of three machine learning (ML) algorithms and four types of molecular representations. The best prediction model, based on XGBoost with ECFP4 and MOE2d descriptors, yielded a balanced accuracy (BA) of 0.878 and an area under the receiver operating characteristic curve (AUC) of 0.958 for the validation set, and a BA of 0.886 and an AUC of 0.947 for the test set. Three external validation sets, namely set 1 (3231 FLuc inhibitors and 69 783 noninhibitors), set 2 (695 FLuc inhibitors and 75 913 noninhibitors), and set 3 (1138 FLuc inhibitors and 8155 noninhibitors), were used to verify the predictive ability of our models. The BA values of the best model on the three external validation sets were 0.864, 0.845, and 0.791, respectively.
In addition, the important features or structural fragments related to FLuc inhibitors, along with their influence on predictions, were identified by the Shapley additive explanations (SHAP) method, which may provide valuable clues for detecting undesirable luciferase inhibitors. Based on these important and explanatory features, 16 rules were proposed for detecting FLuc inhibitors, which achieve a correct rate of 70% for FLuc inhibitors. Furthermore, a comparison with existing prediction rules and models for FLuc inhibitors used in virtual screening verified the high reliability of the models and rules proposed in this study. We also used the model to screen three curated chemical databases, and almost 10% of the molecules in the evaluated databases were predicted to be inhibitors, highlighting the potential risk of false positives in luciferase-based assays. Finally, a public web server called ChemFLuc (http://admet.scbdd.com/chemfluc/index/) was developed, which offers a freely available service for predicting potential FLuc inhibitors.
45	Application of Negative Design To Design a More Desirable Virtual Screening Library. J Med Chem 2020;63:4411-4429. [DOI: 10.1021/acs.jmedchem.9b01476] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/25/2022]
46	BioMedR: an R/CRAN package for integrated data analysis pipeline in biomedical study. Brief Bioinform 2019;22:474-484. [PMID: 31885044 DOI: 10.1093/bib/bbz150] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/24/2019] [Revised: 10/22/2019] [Accepted: 10/30/2019] [Indexed: 02/01/2023] Abstract BACKGROUND With the rapid development of biotechnology and information technology, publicly available data in chemistry and biology are growing explosively. The wealth of information in these resources needs to be extracted and transformed into useful knowledge by various data mining methods. However, a main computational challenge is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are employed. To further explore these complicated data, an integrated toolkit that represents different types of molecular objects and supports various data mining algorithms is urgently needed. RESULTS We developed BioMedR, a freely available R/CRAN package for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences, and six types of interaction descriptors using three different combining strategies.
Moreover, this package implements five similarity calculation methods and four powerful clustering algorithms, as well as several useful auxiliary tools, with the aim of building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. CONCLUSION BioMedR provides a comprehensive and uniform R package that links different representations of molecular objects with each other and will benefit cheminformatics, bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/. Key Words: R package; bioinformatics; cheminformatics; drug discovery; molecular representation
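The BioMedR abstract does not name its five similarity methods. As one widely used example of a fingerprint similarity measure, the Tanimoto (Jaccard) coefficient of two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below represents each fingerprint as the set of its on-bit indices; it is an illustrative assumption, not BioMedR's R API, and the handling of two empty fingerprints is a hypothetical choice.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # ASSUMPTION: two empty fingerprints are treated as dissimilar
    return len(bits_a & bits_b) / len(union)


print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))
```

Storing only the on-bits keeps sparse fingerprints compact, and the set intersection and union map directly onto the numerator and denominator of the coefficient.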
47	Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis. J Chem Inf Model 2019;60:63-76. [DOI: 10.1021/acs.jcim.9b00718] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022]
48	Structural Analysis and Identification of Colloidal Aggregators in Drug Discovery. J Chem Inf Model 2019;59:3714-3726. [DOI: 10.1021/acs.jcim.9b00541] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
49	A hybrid variable selection strategy based on continuous shrinkage of variable space in multivariate calibration. Anal Chim Acta 2019;1058:58-69. [DOI: 10.1016/j.aca.2019.01.022] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 11/21/2018] [Revised: 01/14/2019] [Accepted: 01/16/2019] [Indexed: 10/27/2022]
50	Identification of a Novel Bcl-2 Inhibitor by Ligand-Based Screening and Investigation of Its Anti-cancer Effect on Human Breast Cancer Cells. Front Pharmacol 2019;10:391. [PMID: 31057406 PMCID: PMC6478794 DOI: 10.3389/fphar.2019.00391] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/02/2018] [Accepted: 03/29/2019] [Indexed: 01/23/2023] Abstract Bcl-2 family proteins are important regulators of apoptosis and are associated with cancer. The anti-apoptotic proteins of the Bcl-2 family, such as Bcl-2, are overexpressed in numerous tumors and contribute to cancer formation, development and therapy resistance. Therefore, Bcl-2 is a promising target for drug development, and several Bcl-2 inhibitors are currently undergoing clinical trials. In this study, we carried out a QSAR-based virtual screening approach to find potential Bcl-2 inhibitors in the SPECS database. A surface plasmon resonance (SPR) binding assay was performed to examine the interaction between the Bcl-2 protein and the screened inhibitors. We then measured the anti-tumor activities of the 8 candidate compounds and found that compound M1 has a significant cytotoxic effect on breast cancer cells. We further showed that compound M1 downregulated Bcl-2 expression and activated apoptosis by inducing mitochondrial dysfunction. In conclusion, we identified a novel Bcl-2 inhibitor by QSAR screening, which exerted significant cytotoxic activity in breast cancer cells by inducing mitochondria-mediated apoptosis. Key Words: Bcl-2; QSAR; breast cancer cell; small molecule inhibitors; virtual screening