1
|
Gong X, Li S, Huang J, Tan S, Zhang Q, Tian Y, Li Q, Wang L, Tong HHY, Yao X, Chen C, Lee SMY, Liu H. Discovery of potent LRRK2 inhibitors by ensemble virtual screening strategy and bioactivity evaluation. Eur J Med Chem 2024; 279:116812. [PMID: 39241668 DOI: 10.1016/j.ejmech.2024.116812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 08/28/2024] [Accepted: 08/28/2024] [Indexed: 09/09/2024]
Abstract
Leucine-rich repeat kinase 2 (LRRK2) has been reported to be associated with familial and idiopathic Parkinson's disease (PD) risk and is a promising target for drug discovery against PD. To identify novel and effective LRRK2 inhibitors, an ensemble virtual screening strategy combining fingerprint similarity, complex-based pharmacophore and structure-based molecular docking was proposed and applied. Using this strategy, we selected 25 compounds from ∼1.7 million compounds for in vitro and in vivo tests. First, kinase inhibitory activity tests based on the ADP-Glo assay identified the three most potent compounds, LY2023-19, LY2023-24 and LY2023-25, with IC50 values of 556.4 nM, 218.1 nM and 22.4 nM against the LRRK2 G2019S mutant, respectively. Further cellular experiments indicated that the three hit compounds significantly inhibited Ser935 phosphorylation of both wild-type and G2019S LRRK2, with IC50 values ranging from 27 nM to 1674 nM in HEK293T cells. MD simulations of the three compounds with G2019S LRRK2 showed that the hydrogen bond formed by Glu1948 and Ala1950 is crucial for the binding of LRRK2. Afterwards, a 6-OHDA-induced PD zebrafish model was constructed to evaluate the neuroprotective effects of the hit compounds. The locomotion of the 6-OHDA-treated zebrafish larvae improved after treatment with LY2023-24. The obtained results can provide valuable guidance for the development of PD drugs targeting LRRK2.
Affiliation(s)
- Xiaoqing Gong
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Shuli Li
- State Key Laboratory of Quality Research in Chinese Medicine and Institute of Chinese Medical Sciences, University of Macau, 999078, China
| | - Junli Huang
- Department of Pharmacy, The People's Hospital of Guangxi Zhuang Autonomous Region & Guangxi Academy of Medical Sciences, Nanning, 530021, China
| | - Shuoyan Tan
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China
| | - Qianqian Zhang
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Yanan Tian
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Qin Li
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Lingling Wang
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Henry H Y Tong
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China
| | - Chunxia Chen
- Department of Pharmacy, The People's Hospital of Guangxi Zhuang Autonomous Region & Guangxi Academy of Medical Sciences, Nanning, 530021, China.
| | - Simon Ming-Yuen Lee
- State Key Laboratory of Quality Research in Chinese Medicine and Institute of Chinese Medical Sciences, University of Macau, 999078, China; Research Centre for Chinese Medicine Innovation & Department of Food Science and Nutrition, The Hong Kong Polytechnic University, Hung Hom, 999077, China.
| | - Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, 999078, China.
|
2
|
Bhatt R, Wang A, Durrant JD. Teaching old docks new tricks with machine learning enhanced ensemble docking. Sci Rep 2024; 14:20722. [PMID: 39237737 PMCID: PMC11377811 DOI: 10.1038/s41598-024-71699-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] Open
Abstract
We here introduce Ensemble Optimizer (EnOpt), a machine-learning tool to improve the accuracy and interpretability of ensemble virtual screening (VS). Ensemble VS is an established method for predicting protein/small-molecule (ligand) binding. Unlike traditional VS, which focuses on a single protein conformation, ensemble VS better accounts for protein flexibility by predicting binding to multiple protein conformations. Each compound is thus associated with a spectrum of scores (one score per protein conformation) rather than a single score. To effectively rank and prioritize the molecules for further evaluation (including experimental testing), researchers must select which protein conformations to consider and how best to map each compound's spectrum of scores to a single value, decisions that are system-specific. EnOpt uses machine learning to address these challenges. We perform benchmark VS to show that for many systems, EnOpt ranking distinguishes active compounds from inactive or decoy molecules more effectively than traditional ensemble VS methods. To encourage broad adoption, we release EnOpt free of charge under the terms of the MIT license.
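The mapping from a spectrum of scores to a single value can be illustrated with the two conventional aggregations that a learned method would be compared against. This is a minimal sketch, not EnOpt itself: the function names, compound names and scores are hypothetical, and it assumes that lower (more negative) docking scores indicate stronger predicted binding.

```python
from statistics import mean


def aggregate_scores(scores_by_conformation, how="best"):
    """Collapse one compound's per-conformation scores into a single value.

    Assumption: lower (more negative) scores mean stronger predicted binding.
    """
    if not scores_by_conformation:
        raise ValueError("need at least one score")
    if how == "best":
        return min(scores_by_conformation)
    if how == "mean":
        return mean(scores_by_conformation)
    raise ValueError(f"unknown aggregation: {how}")


def rank_compounds(ensemble_scores, how="best"):
    """Return compound names ordered from best to worst aggregated score."""
    return sorted(
        ensemble_scores,
        key=lambda name: aggregate_scores(ensemble_scores[name], how),
    )


# Hypothetical scores (kcal/mol) against three protein conformations.
ensemble_scores = {
    "cmpd_A": [-7.0, -9.5, -6.0],
    "cmpd_B": [-8.0, -8.2, -8.1],
    "cmpd_C": [-5.0, -5.5, -6.5],
}

print(rank_compounds(ensemble_scores, "best"))
print(rank_compounds(ensemble_scores, "mean"))
```

In this toy data the best-score and mean-score rules disagree on the top compound, which is the kind of system-specific choice the abstract describes.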
Affiliation(s)
- Roshni Bhatt
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Ann Wang
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Jacob D Durrant
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA.
|
3
|
Zhao Q, Yu Y, Hao N, Miao P, Li X, Liu C, Li Z. Data fusion of Laser-induced breakdown spectroscopy and Near-infrared spectroscopy to quantitatively detect heavy metals in lily. Microchem J 2023. [DOI: 10.1016/j.microc.2023.108670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]
|
4
|
Nhat Phuong D, Flower DR, Chattopadhyay S, Chattopadhyay AK. Towards Effective Consensus Scoring in Structure-Based Virtual Screening. Interdiscip Sci 2023; 15:131-145. [PMID: 36550341 PMCID: PMC9941253 DOI: 10.1007/s12539-022-00546-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Revised: 12/11/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
Virtual screening (VS) is a computational strategy that uses in silico automated protein docking inter alia to rank potential ligands, or by extension rank protein-ligand pairs, identifying potential drug candidates. Most docking methods use preferred sets of physicochemical descriptors (PCDs) to model the interactions between host and guest molecules. Thus, conventional VS is often data-specific, method-dependent and with demonstrably differing utility in identifying candidate drugs. This study proposes four universality classes of novel consensus scoring (CS) algorithms that combine docking scores, derived from ten docking programs (ADFR, DOCK, Gemdock, Ledock, PLANTS, PSOVina, QuickVina2, Smina, Autodock Vina and VinaXB), using decoys from the DUD-E repository ( http://dude.docking.org/ ) against 29 MRSA-oriented targets to create a general VS formulation that can identify active ligands for any suitable protein target. Our results demonstrate that CS provides improved ligand-protein docking fidelity when compared to individual docking platforms. This approach requires only a small number of docking combinations and can serve as a viable and parsimonious alternative to more computationally expensive docking approaches. Predictions from our CS algorithm are compared against independent machine learning evaluations using the same docking data, complementing the CS outcomes. Our method is a reliable approach for identifying protein targets and high-affinity ligands that can be tested as high-probability candidates for drug repositioning.
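As a rough illustration of consensus scoring in general, the sketch below averages each ligand's rank across several docking programs. It is a generic rank-averaging scheme, not one of the four CS classes proposed in the paper; the program names, ligand names and scores are hypothetical, and lower scores are assumed to be better.

```python
def ranks(scores):
    """Map each ligand to its rank (1 = best), assuming lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {ligand: position + 1 for position, ligand in enumerate(ordered)}


def consensus_rank(scores_by_program):
    """Average every ligand's rank over all docking programs."""
    per_program = [ranks(scores) for scores in scores_by_program.values()]
    ligands = per_program[0].keys()
    return {
        ligand: sum(r[ligand] for r in per_program) / len(per_program)
        for ligand in ligands
    }


# Hypothetical docking scores from three programs for the same three ligands.
scores_by_program = {
    "program_1": {"lig_a": -9.1, "lig_b": -7.4, "lig_c": -8.0},
    "program_2": {"lig_a": -7.0, "lig_b": -6.9, "lig_c": -7.2},
    "program_3": {"lig_a": -8.8, "lig_b": -5.0, "lig_c": -8.1},
}

consensus = consensus_rank(scores_by_program)
for ligand in sorted(consensus, key=consensus.get):
    print(ligand, round(consensus[ligand], 2))
```

Ranks are averaged rather than raw scores because different programs report scores on different scales, so raw values are not directly comparable.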
Affiliation(s)
- Do Nhat Phuong
- Department of Mathematics, College of Engineering and Physical Sciences, Aston University, Birmingham, B4 7ET UK
| | - Darren R. Flower
- Life and Health Sciences, Aston University, Birmingham, B4 7ET UK
| | | | - Amit K. Chattopadhyay
- Department of Mathematics, College of Engineering and Physical Sciences, Aston University, Birmingham, B4 7ET UK
|
5
|
Sciabola S, Torella R, Nagata A, Boehm M. Critical Assessment of State-of-the-Art Ligand-Based Virtual Screening Methods. Mol Inform 2022; 41:e2200103. [DOI: 10.1002/minf.202200103] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/06/2022] [Accepted: 07/24/2022] [Indexed: 11/10/2022]
|
6
|
Fan N, Hirte S, Kirchmair J. Maximizing the Performance of Similarity-Based Virtual Screening Methods by Generating Synergy from the Integration of 2D and 3D Approaches. Int J Mol Sci 2022; 23:ijms23147747. [PMID: 35887097 PMCID: PMC9322642 DOI: 10.3390/ijms23147747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/05/2022] [Accepted: 07/08/2022] [Indexed: 02/04/2023] Open
Abstract
Methods for the pairwise comparison of 2D and 3D molecular structures are established approaches in virtual screening. In this work, we explored three strategies for maximizing the virtual screening performance of these methods: (i) the merging of hit lists obtained from multi-compound screening using a single screening method, (ii) the merging of the hit lists obtained from 2D and 3D screening by parallel selection, and (iii) the combination of both of these strategies in an integrated approach. We found that any of these strategies led to a boost in virtual screening performance, with the clearest advantages observed for the integrated approach. On test sets for virtual screening, covering 50 pharmaceutically relevant proteins, the integrated approach, using sets of five query molecules, yielded, on average, an area under the receiver operating characteristic curve (AUC) of 0.84, an early enrichment among the top 1% of ranked compounds (EF1%) of 53.82 and a scaffold recovery rate among the top 1% of ranked compounds (SRR1%) of 0.50. In comparison, the 2D and 3D methods on their own (when using a single query molecule) yielded AUC values of 0.68 and 0.54, EF1% values of 19.96 and 17.52, and SRR1% values of 0.20 and 0.17, respectively. In conclusion, based on these results, the integration of 2D and 3D methods, via a (balanced) parallel selection strategy, is recommended, and, in particular, when combined with multi-query screening.
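Parallel selection can be sketched as a round-robin merge of ranked hit lists: each list contributes its next not-yet-selected compound in turn. This is a minimal sketch of a balanced variant, with hypothetical compound identifiers; the abstract does not state how the authors handle duplicates or ties, so skipping already-selected compounds is an assumption.

```python
def parallel_selection(ranked_lists, n_select):
    """Merge ranked hit lists by taking the next unseen compound from each list in turn."""
    merged, seen = [], set()
    iterators = [iter(hits) for hits in ranked_lists]
    while iterators and len(merged) < n_select:
        for it in list(iterators):
            for compound in it:
                if compound not in seen:
                    seen.add(compound)
                    merged.append(compound)
                    break
            else:
                # No unseen compound left in this list.
                iterators.remove(it)
            if len(merged) == n_select:
                break
    return merged


# Hypothetical ranked hit lists from a 2D and a 3D similarity search.
hits_2d = ["m1", "m2", "m3", "m4"]
hits_3d = ["m2", "m5", "m1", "m6"]

print(parallel_selection([hits_2d, hits_3d], 5))
```

Because compounds already taken from one list are skipped in the other, each method keeps contributing new candidates until the requested number is reached or both lists are exhausted.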
Affiliation(s)
- Ningning Fan
- Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany;
| | - Steffen Hirte
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria;
- Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria
| | - Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany;
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria;
- Correspondence: Tel.: +43-1-4277-55104
|
7
|
Feature Reduction for Molecular Similarity Searching Based on Autoencoder Deep Learning. Biomolecules 2022; 12:biom12040508. [PMID: 35454097 PMCID: PMC9029813 DOI: 10.3390/biom12040508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/09/2022] [Revised: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 01/27/2023] Open
Abstract
The concept of molecular similarity has been commonly used in rational drug design, where structurally similar molecules are examined in molecular databases to retrieve functionally similar molecules. The most widely used conventional similarity methods use two-dimensional (2D) fingerprints to evaluate the similarity of molecules to a target query. However, these descriptors include redundant and irrelevant features that might impair the performance of similarity searching methods. Thus, this study proposed a new approach for identifying the important features of molecules in chemical datasets based on the representation of the molecular features using an Autoencoder (AE), with the aim of removing irrelevant and redundant features. The proposed approach was evaluated using the MDL Drug Data Report standard dataset (MDDR). Based on the experimental findings, the proposed approach performed better than several existing benchmark similarity methods such as the Tanimoto Similarity Method (TAN), Adapted Similarity Measure of Text Processing (ASMTP), and Quantum-Based Similarity Method (SQB). The results demonstrated that the proposed approach is superior, particularly on structurally heterogeneous datasets, where it yielded improved results compared to previously used methods with the same goal of improving molecular similarity searching.
|
8
|
Extended continuous similarity indices: theory and application for QSAR descriptor selection. J Comput Aided Mol Des 2022; 36:157-173. [PMID: 35288838 DOI: 10.1007/s10822-022-00444-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/11/2021] [Accepted: 02/23/2022] [Indexed: 01/10/2023]
Abstract
Extended (or n-ary) similarity indices have been recently proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these novel indices allow comparing any number of objects at the same time. This results in a remarkable efficiency gain with respect to other approaches, since now we can compare N molecules in O(N) instead of the common quadratic O(N²) timescale. This favorable scaling has motivated the application of these indices to diversity selection, clustering, phylogenetic analysis, chemical space visualization, and post-processing of molecular dynamics simulations. However, the current formulation of the n-ary indices is limited to vectors with binary or categorical inputs. Here, we present the further generalization of this formalism so it can be applied to numerical data, i.e. to vectors with continuous components. We discuss several ways to achieve this extension and present their analytical properties. As a practical example, we apply this formalism to the problem of feature selection in QSAR and prove that the extended continuous similarity indices provide a convenient way to discern between several sets of descriptors.
|
9
|
Quevedo-Tumailli V, Ortega-Tenezaca B, González-Díaz H. IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int J Mol Sci 2021; 22:13066. [PMID: 34884870 PMCID: PMC8657696 DOI: 10.3390/ijms222313066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/16/2022] Open
Abstract
Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of a pre-clinical assay depends on the conditions of the assay per se, the chemical structure of the drug, the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data across different datasets, the high heterogeneity of data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information-Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a large amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins and their functions. In addition, we created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj).
These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), etc. We also created another partition (cprotj = cpj) of categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to fuse all this information (from the three databases) and train a predictive model. Shannon's entropy measure Shk (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) with the form of Moving Average (MA) operators were used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model showed the best performance, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium.
Affiliation(s)
- Viviana Quevedo-Tumailli
- Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain; (V.Q.-T.); (B.O.-T.)
- Research Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
| | - Bernabe Ortega-Tenezaca
- Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain; (V.Q.-T.); (B.O.-T.)
- Information and Communications Technology Management Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
- BIOFISIKA, Basque Centre for Biophysics, CSIC-UPV/EHU, 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
|
10
|
Ricci-Lopez J, Aguila SA, Gilson MK, Brizuela CA. Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning. J Chem Inf Model 2021; 61:5362-5376. [PMID: 34652141 DOI: 10.1021/acs.jcim.1c00511] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 11/29/2022]
Abstract
One of the main challenges of structure-based virtual screening (SBVS) is the incorporation of the receptor's flexibility, as its explicit representation in every docking run implies a high computational cost. Therefore, a common alternative to include the receptor's flexibility is the approach known as ensemble docking. Ensemble docking consists of using a set of receptor conformations and performing the docking assays over each of them. However, there is still no agreement on how to combine the ensemble docking results to obtain the final ligand ranking. A common choice is to use consensus strategies to aggregate the ensemble docking scores, but these strategies exhibit slight improvement regarding the single-structure approach. Here, we claim that using machine learning (ML) methodologies over the ensemble docking results could improve the predictive power of SBVS. To test this hypothesis, four proteins were selected as study cases: CDK2, FXa, EGFR, and HSP90. Protein conformational ensembles were built from crystallographic structures, whereas the evaluated compound library comprised up to three benchmarking data sets (DUD, DEKOIS 2.0, and CSAR-2012) and cocrystallized molecules. Ensemble docking results were processed through 30 repetitions of 4-fold cross-validation to train and validate two ML classifiers: logistic regression and gradient boosting trees. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best performance case achieved with single-structure docking. We provide statistical evidence that supports the effectiveness of ML to improve the ensemble docking performance.
Affiliation(s)
- Joel Ricci-Lopez
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico; Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Sergio A Aguila
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, La Jolla, San Diego, California 92093, United States
| | - Carlos A Brizuela
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico
|
11
|
Combination of consensus and ensemble docking strategies for the discovery of human dihydroorotate dehydrogenase inhibitors. Sci Rep 2021; 11:11417. [PMID: 34075175 PMCID: PMC8169699 DOI: 10.1038/s41598-021-91069-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/22/2021] [Accepted: 05/21/2021] [Indexed: 02/06/2023] Open
Abstract
Inconsistencies in the performance of the virtual screening (VS) process, which depend on the software used and the structural conformation of the protein, are a challenging issue in the drug design and discovery field. Varying performance, especially in terms of early recognition of potential hit compounds, negatively affects the whole process and wastes time and resources. Appropriate application of ensemble docking and consensus-scoring approaches can significantly increase the reliability of VS results. Dihydroorotate dehydrogenase (DHODH) is a key enzyme in the pyrimidine biosynthesis pathway. It is considered a valuable therapeutic target in cancer, autoimmune and viral diseases. Based on a benchmark study and an analysis of the effect of different combinations of the applied methods and approaches, we suggest a structure-based virtual screening (SBVS) workflow that can be used to increase the reliability of VS.
|
12
|
Miranda-Quintana RA, Rácz A, Bajusz D, Héberger K. Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection. J Cheminform 2021; 13:33. [PMID: 33892799 PMCID: PMC8067665 DOI: 10.1186/s13321-021-00504-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/20/2020] [Accepted: 03/12/2021] [Indexed: 11/10/2022] Open
Abstract
Despite being a central concept in cheminformatics, molecular similarity has so far been limited to the simultaneous comparison of only two molecules at a time and using one index, generally the Tanimoto coefficient. In a recent contribution we have not only introduced a complete mathematical framework for extended similarity calculations (i.e. comparisons of more than two molecules at a time) but also defined a series of novel indices. Part 1 is a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. Their features were revealed by sum of ranking differences and ANOVA. Here, in addition to characterizing several important aspects of the newly introduced similarity metrics, we will highlight their applicability and utility in real-life scenarios using datasets with popular molecular fingerprints. Remarkably, for large datasets, the use of extended similarity measures provides an unprecedented speed-up over "traditional" pairwise similarity matrix calculations. We also provide illustrative examples of a more direct algorithm based on the extended Tanimoto similarity to select diverse compound sets, resulting in much higher levels of diversity than traditional approaches. We discuss the inner and outer consistency of our indices, which are key in practical applications, showing whether the n-ary and binary indices rank the data in the same way. We demonstrate the use of the new n-ary similarity metrics on t-distributed stochastic neighbor embedding (t-SNE) plots of datasets of varying diversity, or corresponding to ligands of different pharmaceutical targets, which show that our indices provide a better measure of set compactness than standard binary measures. We also present a conceptual example of the applicability of our indices in agglomerative hierarchical algorithms. The Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons
Affiliation(s)
| | - Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| | - Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| | - Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary.
|
13
|
Miranda-Quintana RA, Bajusz D, Rácz A, Héberger K. Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics. J Cheminform 2021; 13:32. [PMID: 33892802 PMCID: PMC8067658 DOI: 10.1186/s13321-021-00505-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/15/2020] [Accepted: 03/12/2021] [Indexed: 12/14/2022] Open
Abstract
Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and similarity metrics, there were no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules) at the same time. The present study bridges this gap, by introducing a straightforward computational framework for comparing multiple objects at the same time and providing extended formulas for as many similarity metrics as possible. In the binary case (i.e. when comparing two molecules pairwise) these are naturally reduced to their well-known formulas. We provide a detailed analysis on the effects of various parameters on the similarity values calculated by the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of variance analysis (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sum of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana et al. J Cheminform. 2021. https://doi.org/10.1186/s13321-021-00504-4 . Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons .
Affiliation(s)
| | - Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| | - Anita Rácz
- Plasma Chemistry Research Group, ELKH Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| | - Károly Héberger
- Plasma Chemistry Research Group, ELKH Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary.
|
14
|
Miranda-Quintana RA, Bajusz D, Rácz A, Héberger K. Differential Consistency Analysis: Which Similarity Measures can be Applied in Drug Discovery? Mol Inform 2021; 40:e2060017. [PMID: 33891369 DOI: 10.1002/minf.202060017] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/26/2021] [Accepted: 03/01/2021] [Indexed: 12/16/2022]
Abstract
Similarity measures are widely used in various areas from taxonomy to cheminformatics. To this end, a large number of similarity and distance measures (or, collectively, comparative measures) have been introduced, with only a few studies directed to revealing their inner relationships. We present a thorough analytical study of the conditions leading to two comparative measures providing equivalent results over a given set of molecules. A key part of this work is the introduction of a novel way to study the consistency between comparative measures: the differential consistency analysis (DCA). This tool reveals how the consistency can be established in an analytical way with minimal (or no) assumptions. We found that the consensus between Tanimoto and the Cosine coefficients improved by choosing a reference whose similarity to the rest of the molecules varies less, or by representing the molecules in a way that does not depend strongly on their size (i. e. bit frequency in the chosen fingerprint representation). The presented derivations are just some generic examples; DCA can be applied widely and for all binary similarity coefficients introduced so far, independently from the molecular representations.
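The dependence of consistency on molecule size can be seen in a small example with the standard Tanimoto and cosine coefficients for binary fingerprints. This is an illustrative sketch only, not the DCA procedure: fingerprints are represented as sets of "on" bit positions, and the three fingerprints are made up so that one candidate sets far more bits than the other.

```python
from math import sqrt


def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def cosine(a, b):
    """Cosine coefficient: shared on-bits over the geometric mean of the bit counts."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Hypothetical fingerprints given as sets of on-bit positions.
reference = {0, 1, 2, 3}
small = {0, 1, 9}                        # 3 bits set, 2 shared with reference
large = {0, 1, 2, 3} | set(range(10, 17))  # 11 bits set, 4 shared with reference

for name, fp in (("small", small), ("large", large)):
    print(name, round(tanimoto(reference, fp), 3), round(cosine(reference, fp), 3))
```

Here Tanimoto ranks the small molecule closer to the reference while cosine ranks the large one closer, so the two measures are not consistent on this set; the disagreement comes from the difference in bit counts.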
Affiliation(s)
| | - Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| | - Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| | - Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary
| |
|
15
|
Zhang X, Shen C, Guo X, Wang Z, Weng G, Ye Q, Wang G, He Q, Yang B, Cao D, Hou T. ASFP (Artificial Intelligence based Scoring Function Platform): a web server for the development of customized scoring functions. J Cheminform 2021; 13:6. [PMID: 33541407 PMCID: PMC7860246 DOI: 10.1186/s13321-021-00486-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/19/2020] [Accepted: 01/17/2021] [Indexed: 12/18/2022] Open
Abstract
Virtual screening (VS) based on molecular docking has emerged as one of the mainstream technologies of drug discovery due to its low cost and high efficiency. However, the scoring functions (SFs) implemented in most docking programs are not always accurate enough and how to improve their prediction accuracy is still a big challenge. Here, we propose an integrated platform called ASFP, a web server for the development of customized SFs for structure-based VS. There are three main modules in ASFP: (1) the descriptor generation module that can generate up to 3437 descriptors for the modelling of protein–ligand interactions; (2) the AI-based SF construction module that can establish target-specific SFs based on the pre-generated descriptors through three machine learning (ML) techniques; (3) the online prediction module that provides some well-constructed target-specific SFs for VS and an additional generic SF for binding affinity prediction. Our methodology has been validated on several benchmark datasets. The target-specific SFs can achieve an average ROC AUC of 0.973 towards 32 targets and the generic SF can achieve the Pearson correlation coefficient of 0.81 on the PDBbind version 2016 core set. To sum up, the ASFP server is a powerful tool for structure-based VS.
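The two validation metrics quoted above, ROC AUC for screening power and the Pearson correlation coefficient for affinity prediction, can be computed as in the following minimal sketch. It is not part of ASFP; the function names and the toy scores are hypothetical, and it assumes that a higher score means a compound is more likely to be active.

```python
from math import sqrt


def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy
    (ties count as half), which equals the area under the ROC curve."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def pearson_r(x, y):
    """Pearson correlation coefficient between predicted and measured values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data (made up for illustration only).
print(roc_auc([0.9, 0.8, 0.3], [0.5, 0.2]))
print(pearson_r([6.1, 7.4, 5.0, 8.2], [6.0, 7.0, 5.5, 8.5]))
```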
Affiliation(s)
- Xujun Zhang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Xueying Guo
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Gaoqi Weng
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Qing Ye
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Gaoang Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Qiaojun He
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Bo Yang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, 10013, China.
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
| |
|
16
|
Venkatesh B, Anuradha J. A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data. International Journal of Knowledge-based and Intelligent Engineering Systems 2021. [DOI: 10.3233/kes-190134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data: there are many gene expression features but few samples. To increase the classification accuracy and the processing speed of the model, an optimal subset of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets. Performance metrics such as Accuracy, Recall, Precision, and F1-Score are used for evaluation. The experimental results show that the proposed method outperforms the other feature selection methods.
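The filter phase described above can be sketched as a rank aggregation step. The abstract does not give the exact membership function or aggregation rule, so everything specific here is an assumption: each rank is mapped to a Gaussian membership value centred on rank 1, and the memberships from the three filters are averaged. The names `gaussian_membership` and `aggregate_ranks`, the `sigma` value, and the toy ranks are all hypothetical.

```python
from math import exp


def gaussian_membership(rank, centre=1.0, sigma=2.0):
    """Gaussian membership value: 1.0 at the top rank, decaying for worse ranks.
    The centre and sigma values are illustrative assumptions."""
    return exp(-((rank - centre) ** 2) / (2.0 * sigma ** 2))


def aggregate_ranks(rank_lists, sigma=2.0):
    """Order features by their mean membership across all filter methods.

    rank_lists maps a filter name to a dict of feature -> rank (1 = best).
    """
    features = list(next(iter(rank_lists.values())))
    score = {
        f: sum(gaussian_membership(r[f], sigma=sigma) for r in rank_lists.values())
        / len(rank_lists)
        for f in features
    }
    return sorted(features, key=lambda f: score[f], reverse=True)


# Made-up ranks from the three filters named in the abstract.
ranks = {
    "relief": {"g1": 1, "g2": 2, "g3": 3},
    "mrmr": {"g1": 2, "g2": 1, "g3": 3},
    "fc": {"g1": 1, "g2": 3, "g3": 2},
}
print(aggregate_ranks(ranks))
```

The ordered feature list would then be passed to the wrapper phase, which is not sketched here.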
|
17
|
Nasser M, Salim N, Hamza H, Saeed F, Rabiu I. Improved Deep Learning Based Method for Molecular Similarity Searching Using Stack of Deep Belief Networks. Molecules 2020; 26:E128. [PMID: 33383976 PMCID: PMC7795308 DOI: 10.3390/molecules26010128] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/19/2020] [Revised: 12/24/2020] [Accepted: 12/25/2020] [Indexed: 11/24/2022] Open
Abstract
Virtual screening (VS) is a computational practice applied in drug discovery research. VS is popularly applied in a computer-based search for new lead molecules based on molecular similarity searching. In chemical databases, similarity searching is used to identify molecules that have similarities to a user-defined reference structure and is evaluated by quantitative measures of intermolecular structural similarity. Among existing approaches, 2D fingerprints are widely used. The similarity of a reference structure and a database structure is measured by the computation of association coefficients. In most classical similarity approaches, it is assumed that the molecular features in both biological and non-biologically-related activity carry the same weight. However, based on the chemical structure, it has been found that some distinguishable features are more important than others. Hence, this difference should be taken into consideration by placing more weight on each important fragment. The main aim of this research is to enhance the performance of similarity searching by using multiple descriptors. In this paper, a deep learning method known as deep belief networks (DBN) has been used to reweight the molecule features. Several descriptors have been used for the MDL Drug Data Report (MDDR) dataset, each of which represents different important features. The proposed method has been implemented with each descriptor individually to select the important features based on a new weight, with a lower error rate, and merging together all new features from all descriptors to produce a new descriptor for similarity searching. Based on the extensive experiments conducted, the results show that the proposed method outperformed several existing benchmark similarity methods, including Bayesian inference networks (BIN), the Tanimoto similarity method (TAN), adapted similarity measure of text processing (ASMTP) and the quantum-based similarity method (SQB). The results of the proposed multi-descriptor-based stack of deep belief networks method (SDBN) demonstrated a higher accuracy compared to existing methods on structurally heterogeneous datasets.
Affiliation(s)
- Maged Nasser
- School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
| | - Naomie Salim
- School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
| | - Hentabli Hamza
- School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
| | - Faisal Saeed
- College of Computer Science and Engineering, Taibah University, Medina 344, Saudi Arabia
| | - Idris Rabiu
- School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
| |
|
18
|
Miften FS, Diykh M, Abdulla S, Siuly S, Green JH, Deo RC. A new framework for classification of multi-category hand grasps using EMG signals. Artif Intell Med 2020; 112:102005. [PMID: 33581825 DOI: 10.1016/j.artmed.2020.102005] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/10/2020] [Revised: 12/10/2020] [Accepted: 12/23/2020] [Indexed: 11/26/2022]
Abstract
Electromyogram (EMG) signals have had a great impact on many applications, including prosthetic or rehabilitation devices, human-machine interactions, and clinical and biomedical areas. In recent years, EMG signals have been used as a popular tool to generate device control commands for rehabilitation equipment, such as robotic prostheses. The intention of this study was to design an EMG signal-based expert model for hand-grasp classification that could enhance prosthetic hand movements for people with disabilities. The study, thus, aimed to introduce an innovative framework for recognising hand movements using EMG signals. The proposed framework consists of logarithmic spectrogram-based graph signal (LSGS), AdaBoost k-means (AB-k-means) and an ensemble of feature selection (FS) techniques. First, the LSGS model is applied to analyse and extract the desirable features from EMG signals. Then, to assist in selecting the most influential features, an ensemble FS is added to the design. Finally, in the classification phase, a novel classification model, named AB-k-means, is developed to classify the selected EMG features into different hand grasps. The proposed hybrid LSGS-based scheme is evaluated with a publicly available EMG hand movement dataset from the UCI repository. Using the same dataset, the LSGS-AB-k-means design model is also benchmarked against several classifiers, including state-of-the-art algorithms. The results demonstrate that the proposed model achieves a high classification rate and superior results compared to several previous research works. This study, therefore, establishes that the proposed model can accurately classify EMG hand grasps and can be implemented as a control unit with low cost and a high classification rate.
Affiliation(s)
| | - Mohammed Diykh
- School of Sciences, University of Southern Queensland, Australia; University of Thi-Qar, College of Education for Pure Science, Iraq.
| | - Shahab Abdulla
- USQ College, University of Southern Queensland, Australia.
| | - Siuly Siuly
- Institute for Sustainable Industries & Liveable Cities, Victoria University, Australia.
| | - Jonathan H Green
- USQ College, University of Southern Queensland, Australia; Faculty of the Humanities, University of the Free State, South Africa.
| | - Ravinesh C Deo
- School of Sciences, University of Southern Queensland, Australia.
| |
|
19
|
Bhattacharjee H, Vlachos DG. Thermochemical Data Fusion Using Graph Representation Learning. J Chem Inf Model 2020; 60:4673-4683. [PMID: 32966072 DOI: 10.1021/acs.jcim.0c00699] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]
Abstract
Large databases are required for "Big Data" applications in catalysis and materials science. Thermochemical databases can be created by combining data from various sources and by correcting low-fidelity data sets to higher accuracy with minimal computation. To achieve this "data fusion", thermochemical quantities of interest, calculated at various levels of density functional theory (DFT), need to be mapped to the same, high levels of theory. In this work, a graph theoretical, statistical framework is proposed for such tasks. Subgraph frequencies are shown to provide a natural representation for learning these fusion maps. The maps are linear and are learnt with automated descriptor selection. Using a data set of as few as ∼1% from the QM9 database of 133 885 molecules, these models can predict multiple thermochemical quantities at a higher level of theory with an accuracy of 1 kcal/mol. The method is explainable, generalizable, and provides a diagnostic tool for outlier identification.
Affiliation(s)
- Himaghna Bhattacharjee
- Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States; Catalysis Center for Energy Innovation and RAPID Manufacturing Institute, 221 Academy Street, Newark, Delaware 19716, United States
| | - Dionisios G Vlachos
- Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States; Catalysis Center for Energy Innovation and RAPID Manufacturing Institute, 221 Academy Street, Newark, Delaware 19716, United States
| |
|
20
|
Vázquez J, López M, Gibert E, Herrero E, Luque FJ. Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches. Molecules 2020; 25:E4723. [PMID: 33076254 PMCID: PMC7587536 DOI: 10.3390/molecules25204723] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 09/07/2020] [Revised: 10/06/2020] [Accepted: 10/11/2020] [Indexed: 12/20/2022] Open
Abstract
Virtual screening (VS) is a cornerstone of the drug discovery pipeline. A variety of computational approaches, which are generally classified as ligand-based (LB) and structure-based (SB) techniques, exploit key structural and physicochemical properties of ligands and targets to enable the screening of virtual libraries in the search for active compounds. Though LB and SB methods have found widespread application in the discovery of novel drug-like candidates, their complementary natures have stimulated continued efforts toward the development of hybrid strategies that combine LB and SB techniques, integrating them in a holistic computational framework that exploits the available information of both ligand and target to enhance the success of drug discovery projects. In this review, we analyze the main strategies and concepts that have emerged in recent years for defining hybrid LB + SB computational schemes in VS studies. Particular attention is focused on the combination of molecular similarity and docking, illustrated with selected applications taken from the literature.
Affiliation(s)
- Javier Vázquez
- Pharmacelera, Plaça Pau Vila, 1, Sector C 2a, Edificio Palau de Mar, 08039 Barcelona, Spain
- Department of Nutrition, Food Science and Gastronomy, Faculty of Pharmacy and Food Sciences, Institute of Biomedicine (IBUB), and Institute of Theoretical and Computational Chemistry (IQTC-UB), University of Barcelona, Av. Prat de la Riba 171, E-08921 Santa Coloma de Gramanet, Spain
| | - Manel López
- AB Science, Parc Scientifique de Luminy, Zone Luminy Enterprise, Case 922, 163 Av. de Luminy, 13288 Marseille, France
| | - Enric Gibert
- Pharmacelera, Plaça Pau Vila, 1, Sector C 2a, Edificio Palau de Mar, 08039 Barcelona, Spain
| | - Enric Herrero
- Pharmacelera, Plaça Pau Vila, 1, Sector C 2a, Edificio Palau de Mar, 08039 Barcelona, Spain
| | - F. Javier Luque
- Department of Nutrition, Food Science and Gastronomy, Faculty of Pharmacy and Food Sciences, Institute of Biomedicine (IBUB), and Institute of Theoretical and Computational Chemistry (IQTC-UB), University of Barcelona, Av. Prat de la Riba 171, E-08921 Santa Coloma de Gramanet, Spain
| |
|
21
|
Prieto-Martínez FD, Medina-Franco JL. Current advances on the development of BET inhibitors: insights from computational methods. Advances in Protein Chemistry and Structural Biology 2020; 122:127-180. [PMID: 32951810 DOI: 10.1016/bs.apcsb.2020.06.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/06/2023]
Abstract
The term epigenetics was coined almost 70 years ago to describe heritable phenotype changes that do not alter DNA sequences. Research in the field has uncovered significant roles for such mechanisms, which account for the biogenesis of several diseases. Further studies have led the way for drug development that targets epi-enzymes, mainly for cancer treatment. Of the numerous epi-targets involved with histone acetylation, bromodomains have captured the spotlight of drug discovery focused on novel therapies. However, due to high sequence identity, the development of potent and selective inhibitors poses a significant challenge. Herein, we discuss recent computational developments on BET inhibitors and other methods that may be applied for drug discovery in general. As a proof-of-concept, we discuss a virtual screening to identify novel BET inhibitors based on coumarin derivatives. From public data, we identified putative structure-activity relationships of the coumarin scaffold and propose R-group modifications for BET selectivity. Results showed that the optimization and design of novel coumarins could be further explored.
Affiliation(s)
- Fernando D Prieto-Martínez
- Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico
| | - José L Medina-Franco
- Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico
| |
|
22
|
Carlos HV, Marta BL, Orlando PM, Samuel UE, Sader R, Seifert LB. Stress distribution is susceptible to the angle of the osteotomy in the high oblique sagittal osteotomy (HOSO): biomechanical evaluation using finite element analyses. Comput Methods Biomech Biomed Engin 2020; 24:67-75. [PMID: 32845167 DOI: 10.1080/10255842.2020.1810242] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
Abstract
Aim: This computational study aimed to evaluate the influence of the angle of the osteotomy when performing a high oblique sagittal osteotomy over the distribution of stress to the osteosynthesis plates and mandibular segments. Material and methods: For this purpose, a finite element analysis of different combinations was carried out based on the osteotomy angle and mandibular mobilization using Autodesk Inventor®, resulting in a total of 72 simulations. To check the correlation between the osteotomy angles and the tension in the mandibular structure in different mobilizations, a Student's t-test was used. Results: The results of the advancement mobilizations (2.5 mm to 5.5 mm) reported increasing values for tension in the probe of the fourth screw and in the probe of the plate surface as the osteotomy angle increased (p-value < 10⁻⁸). The results of the setback mobilizations (-2.5 mm to -5.5 mm) show comparable values (p-value < 10⁻⁸). The resulting contact surface between bone segments varies depending on the osteotomy angle, increasing 44.67% from 45° to 70° and decreasing 22.05% when the angle is reduced to 30°. Conclusion: The angle of the osteotomy is a very relevant parameter in the design of the studied mandibular osteotomy, since the distribution of the reported stresses is substantially susceptible to its variation.
Affiliation(s)
- Herrera-Vizcaíno Carlos
- Department of Oral, Maxillofacial and Facial Plastic Surgery, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
| | - Baselga Lahoz Marta
- Applied Mechanics and Bioengineering Group (AMB) of Aragón Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
| | | | - Udeabor E Samuel
- Department of Oral and Maxillofacial Surgery, College of Dentistry, King Khalid University, Abha, Saudi Arabia
| | - Robert Sader
- Department of Oral, Maxillofacial and Facial Plastic Surgery, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
| | - Lukas Benedikt Seifert
- Department of Oral, Maxillofacial and Facial Plastic Surgery, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
| |
|
23
|
Santibáñez-Morán MG, López-López E, Prieto-Martínez FD, Sánchez-Cruz N, Medina-Franco JL. Consensus virtual screening of dark chemical matter and food chemicals uncover potential inhibitors of SARS-CoV-2 main protease. RSC Adv 2020; 10:25089-25099. [PMID: 35517466 PMCID: PMC9055157 DOI: 10.1039/d0ra04922k] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/04/2020] [Accepted: 06/23/2020] [Indexed: 12/15/2022] Open
Abstract
The pandemic caused by SARS-CoV-2 (COVID-19 disease) has claimed more than 500 000 lives worldwide, and more than nine million people are infected. Unfortunately, an effective drug or vaccine for its treatment is yet to be found. The increasing information available on critical molecular targets of SARS-CoV-2 and active compounds against related coronaviruses facilitates the proposal (or repurposing) of drug candidates for the treatment of COVID-19, with the aid of in silico methods. As part of a global effort to fight the COVID-19 pandemic, herein we report a consensus virtual screening of extensive collections of food chemicals and compounds known as dark chemical matter. The rationale is to contribute to global efforts with a description of currently underexplored chemical space regions. The consensus approach included combining similarity searching with various queries and fingerprints, molecular docking with two docking protocols, and ADMETox profiling. We propose compounds commercially available for experimental testing. The full list of virtual screening hits is disclosed.
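A consensus of similarity searching and docking, as described above, is often implemented as a rank fusion. The sketch below is a generic mean-rank fusion and is not the authors' protocol; the compound names and scores are made up, and it assumes that higher similarity and lower (more negative) docking scores are better.

```python
def ranks(scores, higher_is_better):
    """Map each compound to its rank (1 = best) under one scoring method."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def consensus(similarity, docking):
    """Order compounds by the mean of their similarity and docking ranks.
    Ties in mean rank are broken alphabetically to keep the output stable."""
    sim_rank = ranks(similarity, higher_is_better=True)
    dock_rank = ranks(docking, higher_is_better=False)
    mean_rank = {name: (sim_rank[name] + dock_rank[name]) / 2 for name in similarity}
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))


# Hypothetical compounds with Tanimoto similarity and docking score (kcal/mol).
similarity = {"A": 0.80, "B": 0.60, "C": 0.30}
docking = {"A": -7.0, "B": -9.5, "C": -6.0}
print(consensus(similarity, docking))
```

Averaging ranks rather than raw scores sidesteps the different value ranges of the two methods, at the cost of discarding how large the score differences are.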
Affiliation(s)
- Marisa G Santibáñez-Morán
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México Mexico City Mexico +52 (55) 5622-3899, ext. 44458
| | - Edgar López-López
- Department of Pharmacology, Center of Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV) Mexico City Mexico
| | - Fernando D Prieto-Martínez
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México Mexico City Mexico +52 (55) 5622-3899, ext. 44458
| | - Norberto Sánchez-Cruz
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México Mexico City Mexico +52 (55) 5622-3899, ext. 44458
| | - José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México Mexico City Mexico +52 (55) 5622-3899, ext. 44458
| |
|
24
|
Gamela RR, Costa VC, Sperança MA, Pereira-Filho ER. Laser-induced breakdown spectroscopy (LIBS) and wavelength dispersive X-ray fluorescence (WDXRF) data fusion to predict the concentration of K, Mg and P in bean seed samples. Food Res Int 2020; 132:109037. [DOI: 10.1016/j.foodres.2020.109037] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/08/2019] [Revised: 01/23/2020] [Accepted: 01/25/2020] [Indexed: 12/23/2022]
|
25
|
Sziklai BR, Héberger K. Apportionment and districting by Sum of Ranking Differences. PLoS One 2020; 15:e0229209. [PMID: 32203513 PMCID: PMC7089544 DOI: 10.1371/journal.pone.0229209] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/07/2019] [Accepted: 01/14/2020] [Indexed: 11/18/2022] Open
Abstract
Sum of Ranking Differences is an innovative statistical method that ranks competing solutions based on a reference point. The latter might arise naturally, or can be aggregated from the data. We provide two case studies to feature both possibilities. Apportionment and districting are two critical issues that emerge in relation to democratic elections. Theoreticians invented clever heuristics to measure malapportionment and the compactness of the shape of the constituencies, yet there is no unique best method in either case. Using data from Norway and the US we rank the standard methods both for the apportionment and for the districting problem. In case of apportionment, we find that all the classical methods perform reasonably well, with subtle but significant differences. By a small margin the Leximin method emerges as a winner, but, somewhat unexpectedly, the non-regular Imperiali method ties for first place. In districting, the Lee-Sallee index and a novel parametric method, the so-called Moment Invariant, perform best, although the latter is sensitive to the function’s chosen parameter.
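The core computation of Sum of Ranking Differences can be sketched as follows: rank the objects by each method and by the reference, then sum the absolute rank differences per method. This minimal version uses the row-wise mean as the aggregated reference, which is one common choice and an assumption here; it omits the normalization and the permutation-based validation that usually accompany SRD. The data are made up.

```python
def rank_values(values):
    """Ranks with 1 = smallest value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def srd(method_values, reference_values):
    """Sum of absolute differences between method ranks and reference ranks."""
    method_ranks = rank_values(method_values)
    reference_ranks = rank_values(reference_values)
    return sum(abs(m - r) for m, r in zip(method_ranks, reference_ranks))


# Rows are objects, columns are competing methods (toy data).
methods = {"m1": [1, 2, 3, 4], "m2": [4, 3, 2, 1], "m3": [1, 3, 2, 4]}
n_objects = 4
reference = [sum(v[i] for v in methods.values()) / len(methods) for i in range(n_objects)]

for name, values in methods.items():
    print(name, srd(values, reference))
```

A smaller SRD value means the method's ranking is closer to the reference ranking.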
Affiliation(s)
- Balázs R. Sziklai
- Institute of Economics, Centre for Economic and Regional Studies, Budapest, Hungary
- Department of Operations Research and Actuarial Sciences, Corvinus University of Budapest, Budapest, Hungary
| | - Károly Héberger
- Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
| |
|
27
|
Vogt M, Bajorath J. ccbmlib - a Python package for modeling Tanimoto similarity value distributions. F1000Res 2020; 9:Chem Inf Sci-100. [PMID: 32161645 PMCID: PMC7050271 DOI: 10.12688/f1000research.22292.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Accepted: 02/04/2020] [Indexed: 11/15/2023] Open
Abstract
The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
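The idea of attaching a p-value to a Tanimoto coefficient can be illustrated with an empirical background distribution. The sketch below does not use the ccbmlib API and does not reproduce its statistical models; it simply estimates the p-value as the fraction of background similarities at least as large as the observed one. The fingerprints (sets of on-bit indices), the tiny reference database, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def empirical_p_value(tc, background):
    """Share of background Tanimoto values that are at least as large as tc."""
    return sum(1 for value in background if value >= tc) / len(background)


# Hypothetical reference database; a real one would hold many thousands of compounds.
database = [{1, 2, 3}, {2, 3, 4, 5}, {7, 8}, {1, 9}, {3, 4, 5, 6}]
background = [
    tanimoto(a, b) for i, a in enumerate(database) for b in database[i + 1:]
]

query, hit = {1, 2, 3, 4}, {1, 2, 3}
tc = tanimoto(query, hit)
print(tc, empirical_p_value(tc, background))
```

With a background this small, the estimate cannot resolve small p-values, which is one motivation for fitting a parametric model of the similarity distribution instead of counting.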
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, University of Bonn, Endenicher Allee 19c, Bonn, NRW, 53115, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, University of Bonn, Endenicher Allee 19c, Bonn, NRW, 53115, Germany
| |
|
28
|
Martinez-Mayorga K, Madariaga-Mazon A, Medina-Franco JL, Maggiora G. The impact of chemoinformatics on drug discovery in the pharmaceutical industry. Expert Opin Drug Discov 2020; 15:293-306. [PMID: 31965870 DOI: 10.1080/17460441.2020.1696307] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 01/18/2023]
Abstract
Introduction: Even though there have been substantial advances in our understanding of biological systems, research in drug discovery is only just now beginning to utilize this type of information. The single-target paradigm, which exemplifies the reductionist approach, remains a mainstay of drug research today. A deeper view of the complexity involved in drug discovery is necessary to advance this field. Areas covered: This perspective provides a summary of research areas where cheminformatics has played a key role in drug discovery, including the available resources, as well as a personal perspective on the challenges still faced in the field. Expert opinion: Although great strides have been made in the handling and analysis of biological and pharmacological data, more must be done to link the data to biological pathways. This is crucial if one is to understand how drugs modify disease phenotypes, although this will involve a shift from the single drug/single target paradigm that remains a mainstay of drug research. Moreover, such a shift would require an increased awareness of the role of physiology in the mechanism of drug action, which will require the introduction of new mathematical, computer, and biological methods for chemoinformaticians to be trained in.
Affiliation(s)
| | | | - José L Medina-Franco
- Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | | |
|
29
|
Santana R, Zuluaga R, Gañán P, Arrasate S, Onieva E, González-Díaz H. Designing nanoparticle release systems for drug-vitamin cancer co-therapy with multiplicative perturbation-theory machine learning (PTML) models. NANOSCALE 2019; 11:21811-21823. [PMID: 31691701 DOI: 10.1039/c9nr05070a] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/10/2023]
Abstract
Nano-systems for cancer co-therapy including vitamins or vitamin derivatives have shown adequate results to continue with further research studies to better understand them. However, the number of different combinations of drugs, vitamins, nanoparticle types, coating agents, synthesis conditions, and system types (nanocapsules, micelles, etc.) to be tested is very large, generating a high cost in experimentation. In this context, there are reports of large datasets of preclinical assays of compounds (like in the ChEMBL database) and increasing but yet limited reports of experimental measurements of nano-systems per se. On the other hand, Machine Learning is gaining momentum in Nanotechnology and Pharmaceutical Sciences as a tool for rational design of new drugs and drug-release nano-systems. In this work, we propose to combine Perturbation Theory principles and Machine Learning to develop a PTML model for rational selection of the components of cancer co-therapy drug-vitamin release nano-systems (DVRNs). In doing so, we apply information fusion techniques with 2 data sets: (1) a large ChEMBL dataset of >36 000 preclinical assays of vitamin derivatives and (2) a new dataset of >1000 outcomes of DVRNs, collected herein from the literature for the first time. The ChEMBL dataset used covers a considerable number of assay conditions (cjvit) each one with multiple levels. These conditions included >504 biological activity parameters (c0vit), >340 types of proteins (c1vit), >650 types of cells (c2vit), >120 assay organisms (c3vit), >60 assay strains (c4vit). Regarding the DVRNs, there are 25 different types of nano-systems (njn), with up to 16 conditions (cjn) including also different levels such as 8 biological activity parameters (c0n), 9 raw nanomaterials (c4n), 15 assay cells (c11n), etc. In the first stage, we used Moving Average operators to quantify the perturbations (deviations) in all input variables with respect to the conditions. After that, we used multiplicative PT operators to carry out data fusion, and dimension reduction, and Linear Discriminant Analysis (LDA) to seek the PTML model. The best PTML model found showed values of specificity, sensitivity, and accuracy in the range of 83-88% in training and external validation series for >130 000 cases (DVRNs vs. ChEMBL data pairs) formed after data fusion. To the best of our knowledge, this is the first general purpose model for the rational design of DVRNs for cancer co-therapy.
|
30
|
Schuler J, Samudrala R. Fingerprinting CANDO: Increased Accuracy with Structure- and Ligand-Based Shotgun Drug Repurposing. ACS OMEGA 2019; 4:17393-17403. [PMID: 31656912 PMCID: PMC6812124 DOI: 10.1021/acsomega.9b02160] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/12/2019] [Accepted: 08/30/2019] [Indexed: 05/08/2023]
Abstract
We have upgraded our Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun drug repurposing by including ligand-based, data fusion, and decision tree pipelines. The goal of shotgun drug repurposing is to screen and rank every existing human use drug or compound for every disease/indication. The first version of CANDO implemented a structure-based pipeline that modeled interactions between compounds and proteins on a large scale, generating compound-proteome interaction signatures used to infer the similarity of drug behavior; the new pipelines accomplish this by incorporating molecular fingerprints and the Tanimoto coefficient. We obtain improved benchmarking performance with the new pipelines across all three evaluation metrics used: average indication accuracy, pairwise accuracy, and coverage. The best performing pipeline achieves an average indication accuracy of 19.0% at the top10 cutoff, compared to 11.7% for v1, and 2.2% for a random control. Our results demonstrate that the CANDO drug recovery accuracy is substantially improved by integrating multiple pipelines, thereby enhancing our ability to generate putative therapeutic repurposing candidates, and increasing drug discovery efficiency.
Affiliation(s)
- James Schuler
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo, Buffalo, New York 14203, United States
| | - Ram Samudrala
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo, Buffalo, New York 14203, United States
| |
|
31
|
Lipkus AH, Watkins SP, Gengras K, McBride MJ, Wills TJ. Recent Changes in the Scaffold Diversity of Organic Chemistry As Seen in the CAS Registry. J Org Chem 2019; 84:13948-13956. [DOI: 10.1021/acs.joc.9b02111] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Alan H. Lipkus
- CAS, P.O. Box 3012, Columbus, Ohio 43210-0012, United States
| | | | - Keith Gengras
- CAS, P.O. Box 3012, Columbus, Ohio 43210-0012, United States
| | | | - Todd J. Wills
- CAS, P.O. Box 3012, Columbus, Ohio 43210-0012, United States
| |
|
32
|
Bajusz D, Rácz A, Héberger K. Comparison of Data Fusion Methods as Consensus Scores for Ensemble Docking. Molecules 2019; 24:E2690. [PMID: 31344902 PMCID: PMC6695709 DOI: 10.3390/molecules24152690] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/14/2019] [Revised: 07/18/2019] [Accepted: 07/22/2019] [Indexed: 12/05/2022] Open
Abstract
Ensemble docking is a widely applied concept in structure-based virtual screening (to at least partly account for protein flexibility), usually granting a significant performance gain at a modest cost of speed. From the individual, single-structure docking scores, a consensus score needs to be produced by data fusion: this is usually done by taking the best docking score from the available pool (in most cases, and in this study as well, this is the minimum score). Nonetheless, there are a number of other fusion rules that can be applied. We report here the results of a detailed statistical comparison of seven fusion rules for ensemble docking, on five case studies of current drug targets, based on four performance metrics. Sevenfold cross-validation and variance analysis (ANOVA) allowed us to highlight the best fusion rules. The results are presented in bubble plots, to unite the four performance metrics into a single, comprehensive image. Notably, we suggest the use of the geometric and harmonic means as better alternatives to the generally applied minimum fusion rule.
Affiliation(s)
- Dávid Bajusz
- Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary
| | - Anita Rácz
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary.
| | - Károly Héberger
- Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary
| |
|
33
|
Fragment-based discovery of a chemical probe for the PWWP1 domain of NSD3. Nat Chem Biol 2019; 15:822-829. [PMID: 31285596 DOI: 10.1038/s41589-019-0310-x] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 03/29/2018] [Accepted: 05/19/2019] [Indexed: 01/10/2023]
Abstract
Here, we report the fragment-based discovery of BI-9321, a potent, selective and cellular active antagonist of the NSD3-PWWP1 domain. The human NSD3 protein is encoded by the WHSC1L1 gene located in the 8p11-p12 amplicon, frequently amplified in breast and squamous lung cancer. Recently, it was demonstrated that the PWWP1 domain of NSD3 is required for the viability of acute myeloid leukemia cells. To further elucidate the relevance of NSD3 in cancer biology, we developed a chemical probe, BI-9321, targeting the methyl-lysine binding site of the PWWP1 domain with sub-micromolar in vitro activity and cellular target engagement at 1 µM. As a single agent, BI-9321 downregulates Myc messenger RNA expression and reduces proliferation in MOLM-13 cells. This first-in-class chemical probe BI-9321, together with the negative control BI-9466, will greatly facilitate the elucidation of the underexplored biological function of PWWP domains.
|
34
|
Lee S, Dietrich F, Karniadakis GE, Kevrekidis IG. Linking Gaussian process regression with data-driven manifold embeddings for nonlinear data fusion. Interface Focus 2019; 9:20180083. [PMID: 31065346 PMCID: PMC6501345 DOI: 10.1098/rsfs.2018.0083] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Accepted: 02/27/2019] [Indexed: 01/21/2023] Open
Abstract
In statistical modelling with Gaussian process regression, it has been shown that combining (few) high-fidelity data with (many) low-fidelity data can enhance prediction accuracy, compared to prediction based on the few high-fidelity data only. Such information fusion techniques for multi-fidelity data commonly approach the high-fidelity model f_h(t) as a function of two variables (t, s), and then use f_l(t) as the s data. More generally, the high-fidelity model can be written as a function of several variables (t, s_1, s_2, …); the low-fidelity model f_l and, say, some of its derivatives can then be substituted for these variables. In this paper, we will explore mathematical algorithms for multi-fidelity information fusion that use such an approach towards improving the representation of the high-fidelity function with only a few training data points. Given that f_h may not be a simple function (and sometimes not even a function) of f_l, we demonstrate that using additional functions of t, such as derivatives or shifts of f_l, can drastically improve the approximation of f_h through Gaussian processes. We also point out a connection with 'embedology' techniques from topology and dynamical systems. Our illustrative examples range from instructive caricatures to computational biology models, such as Hodgkin-Huxley neural oscillations.
Affiliation(s)
- Seungjoon Lee
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Felix Dietrich
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Ioannis G. Kevrekidis
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| |
|
35
|
Kalliokoski T, Sinervo K. Predicting pKa for Small Molecules on Public and In‐house Datasets Using Fast Prediction Methods Combined with Data Fusion. Mol Inform 2019; 38:e1800163. [DOI: 10.1002/minf.201800163] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/30/2018] [Accepted: 04/06/2019] [Indexed: 11/05/2022]
Affiliation(s)
| | - Kai Sinervo
- Orion Pharma Orionintie 1 A 02101 Espoo Finland
| |
|
36
|
Gere A, Radványi D, Héberger K. Which insect species can best be proposed for human consumption? INNOV FOOD SCI EMERG 2019. [DOI: 10.1016/j.ifset.2019.01.016] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/26/2022]
|
37
|
Mellor C, Marchese Robinson R, Benigni R, Ebbrell D, Enoch S, Firman J, Madden J, Pawar G, Yang C, Cronin M. Molecular fingerprint-derived similarity measures for toxicological read-across: Recommendations for optimal use. Regul Toxicol Pharmacol 2019; 101:121-134. [DOI: 10.1016/j.yrtph.2018.11.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 08/23/2018] [Revised: 10/09/2018] [Accepted: 11/12/2018] [Indexed: 12/20/2022]
|
38
|
Feature Selection Applied to Microarray Data. Methods Mol Biol 2019; 1986:123-152. [PMID: 31115887 DOI: 10.1007/978-1-4939-9442-7_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Abstract
A typical characteristic of microarray data is that it has a very high number of features (in the order of thousands) while the number of examples is usually less than 100. In the context of microarray classification, this poses a challenge for machine learning methods, which can suffer overfitting and thus degradation in their performance. A common solution is to apply a dimensionality reduction technique before classification, to reduce the number of features. This chapter will be focused on one of the most famous dimensionality reduction techniques: feature selection. We will see how feature selection can help improve the classification accuracy in several microarray data scenarios.
|
39
|
Pereira T, Ferreira FL, Cardoso S, Silva D, de Mendonça A, Guerreiro M, Madeira SC. Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer's disease: a feature selection ensemble combining stability and predictability. BMC Med Inform Decis Mak 2018; 18:137. [PMID: 30567554 PMCID: PMC6299964 DOI: 10.1186/s12911-018-0710-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/18/2018] [Accepted: 11/21/2018] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is an utmost open issue in AD-related research. Neuropsychological assessment has proven to be useful in identifying MCI patients who are likely to convert to dementia. However, the large battery of neuropsychological tests (NPTs) performed in clinical practice and the limited number of training examples are a challenge to machine learning when learning prognostic models. In this context, it is paramount to pursue approaches that effectively seek reduced sets of relevant features. Subsets of NPTs from which prognostic models can be learnt should not only be good predictors, but also stable, promoting generalizable and explainable models. METHODS We propose a feature selection (FS) ensemble combining stability and predictability to choose the most relevant NPTs for prognostic prediction in AD. First, we combine the outcome of multiple (filter and embedded) FS methods. Then, we use a wrapper-based approach optimizing both stability and predictability to compute the number of selected features. We use two large prospective studies (ADNI and the Portuguese Cognitive Complaints Cohort, CCC) to evaluate the approach and assess the predictive value of a large number of NPTs. RESULTS The best subsets of features include approximately 30 and 20 (from the original 79 and 40) features, for ADNI and CCC data, respectively, yielding stability above 0.89 and 0.95, and AUC above 0.87 and 0.82. Most NPTs learnt using the proposed feature selection ensemble have been identified in the literature as strong predictors of conversion from MCI to AD. CONCLUSIONS The FS ensemble approach was able to 1) identify subsets of stable and relevant predictors from a consensus of multiple FS methods using baseline NPTs and 2) learn reliable prognostic models of conversion from MCI to AD using these subsets of features. The machine learning models learnt from these features outperformed the models trained without FS and achieved competitive results when compared to commonly used FS algorithms. Furthermore, the selected features are derived from a consensus of methods thus being more robust, while releasing users from choosing the most appropriate FS method to be used in their classification task.
Affiliation(s)
- Telma Pereira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
| | | | - Sandra Cardoso
- Laboratório de Neurociências, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
| | - Dina Silva
- Cognitive Neuroscience Research Group, Department of Psychology and Educational Sciences and Centre for Biomedical Research (CBMR), University of Algarve, Faro, Portugal
| | - Alexandre de Mendonça
- Laboratório de Neurociências, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
| | - Manuela Guerreiro
- Laboratório de Neurociências, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
| | - Sara C. Madeira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
| | - for the Alzheimer’s Disease Neuroimaging Initiative
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Laboratório de Neurociências, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Cognitive Neuroscience Research Group, Department of Psychology and Educational Sciences and Centre for Biomedical Research (CBMR), University of Algarve, Faro, Portugal
| |
|
40
|
López-Cabrera JD, Lorenzo-Ginori JV. Feature selection for the classification of traced neurons. J Neurosci Methods 2018; 303:41-54. [DOI: 10.1016/j.jneumeth.2018.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/28/2017] [Revised: 03/19/2018] [Accepted: 04/04/2018] [Indexed: 10/17/2022]
|
41
|
Abstract
INTRODUCTION Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.
Affiliation(s)
- Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| |
|
42
|
In-silico guided discovery of novel CCR9 antagonists. J Comput Aided Mol Des 2018; 32:573-582. [DOI: 10.1007/s10822-018-0113-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/29/2017] [Accepted: 03/19/2018] [Indexed: 12/15/2022]
|
43
|
Cross JB. Methods for Virtual Screening of GPCR Targets: Approaches and Challenges. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2017; 1705:233-264. [PMID: 29188566 DOI: 10.1007/978-1-4939-7465-8_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
Virtual screening (VS) has become an integral part of the drug discovery process and is a valuable tool for finding novel chemical starting points for GPCR targets. Ligand-based VS makes use of biochemical data for known, active compounds and has been applied successfully to many diverse GPCRs. Recent progress in GPCR X-ray crystallography has made it possible to incorporate detailed structural information into the VS process. This chapter outlines the latest VS techniques along with examples that highlight successful applications of these methods. Best practices for increasing the likelihood of VS success, as well as ongoing challenges, are also discussed.
Affiliation(s)
- Jason B Cross
- University of Texas MD Anderson Cancer Center, Houston, TX, 77054, USA.
| |
|
44
|
Ruiz IL, Gómez-Nieto MÁ. Advantages of Relative versus Absolute Data for the Development of Quantitative Structure–Activity Relationship Classification Models. J Chem Inf Model 2017; 57:2776-2788. [DOI: 10.1021/acs.jcim.7b00492] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Affiliation(s)
- Irene Luque Ruiz
- Department of Computing and Numerical Analysis, University of Córdoba, Albert Einstein building, Campus de Rabanales, E-14071, Córdoba, Spain
| | - Miguel Ángel Gómez-Nieto
- Department of Computing and Numerical Analysis, University of Córdoba, Albert Einstein building, Campus de Rabanales, E-14071, Córdoba, Spain
| |
|
45
|
Skinnider MA, Dejong CA, Franczak BC, McNicholas PD, Magarvey NA. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm. J Cheminform 2017; 9:46. [PMID: 29086195 PMCID: PMC5559407 DOI: 10.1186/s13321-017-0234-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/21/2017] [Accepted: 08/08/2017] [Indexed: 12/25/2022] Open
Abstract
Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.
Affiliation(s)
- Michael A Skinnider
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada; Department of Chemistry and Chemical Biology, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
| | - Chris A Dejong
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada; Department of Chemistry and Chemical Biology, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
| | - Brian C Franczak
- Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada; Department of Mathematics and Statistics, MacEwan University, Edmonton, AB, Canada
| | - Paul D McNicholas
- Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada
| | - Nathan A Magarvey
- Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada; Department of Chemistry and Chemical Biology, Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
| |
|
46
|
Pertusi DA, O’Donnell G, Homsher MF, Solly K, Patel A, Stahler SL, Riley D, Finley MF, Finger EN, Adam GC, Meng J, Bell DJ, Zuck PD, Hudak EM, Weber MJ, Nothstein JE, Locco L, Quinn C, Amoss A, Squadroni B, Hartnett M, Heo MR, White T, May SA, Boots E, Roberts K, Cocchiarella P, Wolicki A, Kreamer A, Kutchukian PS, Wassermann AM, Uebele VN, Glick M, Rusinko A, Culberson JC. Prospective Assessment of Virtual Screening Heuristics Derived Using a Novel Fusion Score. SLAS DISCOVERY 2017; 22:995-1006. [DOI: 10.1177/2472555217706058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
High-throughput screening (HTS) is a widespread method in early drug discovery for identifying promising chemical matter that modulates a target or phenotype of interest. Because HTS campaigns involve screening millions of compounds, it is often desirable to initiate screening with a subset of the full collection. Subsequently, virtual screening methods prioritize likely active compounds in the remaining collection in an iterative process. With this approach, orthogonal virtual screening methods are often applied, necessitating the prioritization of hits from different approaches. Here, we introduce a novel method of fusing these prioritizations and benchmark it prospectively on 17 screening campaigns using virtual screening methods in three descriptor spaces. We found that the fusion approach retrieves 15% to 65% more active chemical series than any single machine-learning method and that appropriately weighting contributions of similarity and machine-learning scoring techniques can increase enrichment by 1% to 19%. We also use fusion scoring to evaluate the tradeoff between screening more chemical matter initially in lieu of replicate samples to prevent false-positives and find that the former option leads to the retrieval of more active chemical series. These results represent guidelines that can increase the rate of identification of promising active compounds in future iterative screens.
Affiliation(s)
- Dante A. Pertusi
- Modeling and Informatics, Merck & Co., Inc., West Point, PA, USA
| | - Gregory O’Donnell
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Michelle F. Homsher
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Kelli Solly
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Amita Patel
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Shannon L. Stahler
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Daniel Riley
- Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Merck & Co., Inc., West Point, PA, USA
| | - Michael F. Finley
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
- Eleftheria N. Finger
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Discovery & Preclinical Development, GlaxoSmithKline, Collegeville, PA, USA
- Gregory C. Adam
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Merck & Co., Inc., West Point, PA, USA
- Juncai Meng
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- David J. Bell
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Merck & Co., Inc., North Wales, PA, USA
- Paul D. Zuck
  - Merck & Co., Inc., North Wales, PA, USA
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Edward M. Hudak
  - Discovery Sample Management, Merck & Co., Inc., North Wales, PA, USA
- Michael J. Weber
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Jennifer E. Nothstein
  - Merck & Co., Inc., West Point, PA, USA
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Louis Locco
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Carissa Quinn
  - Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Adam Amoss
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Brian Squadroni
  - Merck & Co., Inc., West Point, PA, USA
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Michelle Hartnett
  - Discovery Sciences, Janssen Research and Development LLC, Spring House, PA, USA
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Mee Ra Heo
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Merck & Co., Inc., North Wales, PA, USA
- Tara White
  - Discovery Sample Management, Merck & Co., Inc., North Wales, PA, USA
- S. Alex May
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Evelyn Boots
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Kenneth Roberts
  - Automation and Engineering, Merck & Co., Inc., North Wales, PA, USA
- Alex Wolicki
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
- Anthony Kreamer
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Merck & Co., Inc., Kenilworth, NJ, USA
- Victor N. Uebele
  - Screening and Protein Sciences, Merck & Co., Inc., North Wales, PA, USA
  - Merck & Co., Inc., North Wales, PA, USA
- Meir Glick
  - Modeling and Informatics, Merck & Co., Inc., Boston, MA, USA
- Andrew Rusinko
  - Modeling and Informatics, Merck & Co., Inc., West Point, PA, USA
|
47
|
Seijo-Pardo B, Bolón-Canedo V, Alonso-Betanzos A. Testing Different Ensemble Configurations for Feature Selection. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9619-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
|
48
|
O'Hagan S, Kell DB. Analysis of drug-endogenous human metabolite similarities in terms of their maximum common substructures. J Cheminform 2017; 9:18. [PMID: 28316656 PMCID: PMC5344883 DOI: 10.1186/s13321-017-0198-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/14/2016] [Accepted: 02/09/2017] [Indexed: 12/21/2022]
Abstract
In previous work, we have assessed the structural similarities between marketed drugs (‘drugs’) and endogenous natural human metabolites (‘metabolites’ or ‘endogenites’), using ‘fingerprint’ methods in common use, and the Tanimoto and Tversky similarity metrics, finding that the fingerprint encoding used had a dramatic effect on the apparent similarities observed. By contrast, the maximal common substructure (MCS), when the means of determining it is fixed, is a means of determining similarities that is largely independent of the fingerprints, and also has a clear chemical meaning. We here explored the utility of the MCS and metrics derived therefrom. In many cases, a shared scaffold helps cluster drugs and endogenites, and gives insight into enzymes (in particular transporters) that they both share. Tanimoto and Tversky similarities based on the MCS tend to be smaller than those based on the MACCS fingerprint-type encoding, though the converse is also true for a significant fraction of the comparisons. While no single molecular descriptor can account for these differences, a machine learning-based analysis of the nature of the differences (MACCS_Tanimoto vs MCS_Tversky) shows that they are indeed deterministic, although the features that are used in the model to account for this vary greatly with each individual drug. The extent of its utility and interpretability vary with the drug of interest, implying that while MCS is neither ‘better’ nor ‘worse’ for every drug–endogenite comparison, it is sufficiently different to be of value. The overall conclusion is thus that the use of the MCS provides an additional and valuable strategy for understanding the structural basis for similarities between synthetic, marketed drugs and natural intermediary metabolites.
Affiliation(s)
- Steve O'Hagan
  - School of Chemistry, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
- Douglas B Kell
  - School of Chemistry, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
  - Centre for the Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
|
49
|
Al-Dabbagh MM, Salim N, Himmat M, Ahmed A, Saeed F. Quantum probability ranking principle for ligand-based virtual screening. J Comput Aided Mol Des 2017; 31:365-378. [PMID: 28220440 DOI: 10.1007/s10822-016-0003-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/03/2016] [Accepted: 12/16/2016] [Indexed: 10/20/2022]
Abstract
Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. Virtual screening tools are widely used to improve the cost-effectiveness of lead discovery programs by ranking chemical compound databases in decreasing probability of biological activity, following the probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria attempt to draw an analogy between the physical experiment and the molecular structure ranking process for 2D fingerprints in ligand-based virtual screening (LBVS). The development of the QPRP criteria in LBVS employs quantum concepts at three levels. First, at the representation level, the model aims to develop a new framework of molecular representation by connecting molecular compounds with a mathematical quantum space. Second, it estimates the similarity between chemical libraries and reference structures using a quantum-based similarity searching method. Finally, it ranks the molecules using the QPRP approach. Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that QPRP outperformed the classical ranking principle (PRP) for molecular chemical compounds.
Affiliation(s)
- Naomie Salim
  - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310, Malaysia
- Mubarak Himmat
  - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310, Malaysia
- Ali Ahmed
  - Faculty of Engineering, Karary University, Khartoum, 12304, Sudan
- Faisal Saeed
  - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310, Malaysia
|
50
|
Franco P, Porta N, Holliday JD, Willett P. Molecular similarity considerations in the licensing of orphan drugs. Drug Discov Today 2017; 22:377-381. [DOI: 10.1016/j.drudis.2016.11.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/05/2016] [Revised: 11/19/2016] [Accepted: 11/30/2016] [Indexed: 11/17/2022]
|