Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Swamidass SJ, Azencott CA, Lin TW, Gramajo H, Tsai S, Baldi P. Influence relevance voting: an accurate and interpretable virtual high throughput screening method. J Chem Inf Model 2009;49:756-66. [PMID: 19391629 PMCID: PMC2750043 DOI: 10.1021/ci8004379] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

For:	Swamidass SJ, Azencott CA, Lin TW, Gramajo H, Tsai S, Baldi P. Influence relevance voting: an accurate and interpretable virtual high throughput screening method. J Chem Inf Model 2009;49:756-66. [PMID: 19391629 PMCID: PMC2750043 DOI: 10.1021/ci8004379] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Number

Cited by Other Article(s)

Wu X, Zhou Q, Mu L, Hu X. Machine learning in the identification, prediction and exploration of environmental toxicology: Challenges and perspectives. JOURNAL OF HAZARDOUS MATERIALS 2022;438:129487. [PMID: 35816807 DOI: 10.1016/j.jhazmat.2022.129487] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Revised: 06/16/2022] [Accepted: 06/26/2022] [Indexed: 06/15/2023]

Ma H, Bian Y, Rong Y, Huang W, Xu T, Xie W, Ye G, Huang J. Cross-dependent graph neural networks for molecular property prediction. Bioinformatics 2022;38:2003-2009. [PMID: 35094072 DOI: 10.1093/bioinformatics/btac039] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Revised: 12/14/2021] [Accepted: 01/25/2022] [Indexed: 02/03/2023] Open

Chen J, Si YW, Un CW, Siu SWI. Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network. J Cheminform 2021;13:93. [PMID: 34838140 PMCID: PMC8627024 DOI: 10.1186/s13321-021-00570-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2021] [Accepted: 11/11/2021] [Indexed: 12/28/2022] Open

Bo W, Chen L, Qin D, Geng S, Li J, Mei H, Li B, Liang G. Application of quantitative structure-activity relationship to food-derived peptides: Methods, situations, challenges and prospects. Trends Food Sci Technol 2021. [DOI: 10.1016/j.tifs.2021.05.031] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Prediction of chemical compounds properties using a deep learning model. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05961-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Shamsara J. Evaluation of the performance of various machine learning methods on the discrimination of the active compounds. Chem Biol Drug Des 2021;97:930-943. [DOI: 10.1111/cbdd.13819] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2020] [Revised: 12/10/2020] [Accepted: 12/21/2020] [Indexed: 12/12/2022]

Yang S, Ye Q, Ding J, Yin, Lu A, Chen X, Hou T, Cao D. Current advances in ligand‐based target prediction. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Jablonka K, Ongari D, Moosavi SM, Smit B. Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. Chem Rev 2020;120:8066-8129. [PMID: 32520531 PMCID: PMC7453404 DOI: 10.1021/acs.chemrev.0c00004] [Citation(s) in RCA: 149] [Impact Index Per Article: 37.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Indexed: 12/16/2022]

TranScreen: Transfer Learning on Graph-Based Anti-Cancer Virtual Screening Model. BIG DATA AND COGNITIVE COMPUTING 2020. [DOI: 10.3390/bdcc4030016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Abbasi K, Poso A, Ghasemi J, Amanlou M, Masoudi-Nejad A. Deep Transferable Compound Representation across Domains and Tasks for Low Data Drug Discovery. J Chem Inf Model 2019;59:4528-4539. [PMID: 31661955 DOI: 10.1021/acs.jcim.9b00626] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]

Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M, Palmer A, Settels V, Jaakkola T, Jensen K, Barzilay R. Analyzing Learned Molecular Representations for Property Prediction. J Chem Inf Model 2019;59:3370-3388. [PMID: 31361484 PMCID: PMC6727618 DOI: 10.1021/acs.jcim.9b00237] [Citation(s) in RCA: 591] [Impact Index Per Article: 118.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2019] [Indexed: 12/23/2022]

Schaub AJ, Moreno GO, Zhao S, Truong HV, Luo R, Tsai SC. Computational structural enzymology methodologies for the study and engineering of fatty acid synthases, polyketide synthases and nonribosomal peptide synthetases. Methods Enzymol 2019;622:375-409. [PMID: 31155062 PMCID: PMC7197764 DOI: 10.1016/bs.mie.2019.03.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Liu S, Alnammi M, Ericksen SS, Voter AF, Ananiev GE, Keck JL, Hoffmann FM, Wildman SA, Gitter A. Practical Model Selection for Prospective Virtual Screening. J Chem Inf Model 2018;59:282-293. [PMID: 30500183 PMCID: PMC6351977 DOI: 10.1021/acs.jcim.8b00363] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]

Srinivas R, Klimovich PV, Larson EC. Implicit-descriptor ligand-based virtual screening by means of collaborative filtering. J Cheminform 2018;10:56. [PMID: 30467684 PMCID: PMC6755561 DOI: 10.1186/s13321-018-0310-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Accepted: 11/13/2018] [Indexed: 12/20/2022] Open

Ghasemi F, Mehridehnavi A, Pérez-Garrido A, Pérez-Sánchez H. Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discov Today 2018;23:1784-1790. [PMID: 29936244 DOI: 10.1016/j.drudis.2018.06.016] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2017] [Revised: 06/05/2018] [Accepted: 06/14/2018] [Indexed: 10/28/2022]

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018;15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 790] [Impact Index Per Article: 131.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open

Affiliation(s)

Travers Ching Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
Daniel S Himmelstein Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Brett K Beaulieu-Jones Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Alexandr A Kalinin Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
Brian T Do Harvard Medical School, Boston, MA, USA
Gregory P Way Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Enrico Ferrero Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
Paul-Michael Agapow Data Science Institute, Imperial College London, London, UK
Michael Zietz Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Michael M Hoffman Princess Margaret Cancer Centre, Toronto, Ontario, Canada Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Wei Xie Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Gail L Rosen Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Benjamin J Lengerich Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Johnny Israeli Biophysics Program, Stanford University, Stanford, CA, USA
Jack Lanchantin Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Stephen Woloszynek Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Anne E Carpenter Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Avanti Shrikumar Department of Computer Science, Stanford University, Stanford, CA, USA
Jinbo Xu Toyota Technological Institute at Chicago, Chicago, IL, USA
Evan M Cofer Department of Computer Science, Trinity University, San Antonio, TX, USA Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
Christopher A Lavender Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
Srinivas C Turaga Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
Amr M Alexandari Department of Computer Science, Stanford University, Stanford, CA, USA
Zhiyong Lu National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
David J Harris Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
Dave DeCaprio ClosedLoop.ai, Austin, TX, USA
Yanjun Qi Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Anshul Kundaje Department of Computer Science, Stanford University, Stanford, CA, USA Department of Genetics, Stanford University, Stanford, CA, USA
Yifan Peng National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Laura K Wiley Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
Marwin H S Segler Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
Simina M Boca Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
S Joshua Swamidass Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
Austin Huang Department of Medicine, Brown University, Providence, RI, USA
Anthony Gitter Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA Morgridge Institute for Research, Madison, WI, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

Collapse

Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V. MoleculeNet: a benchmark for molecular machine learning. Chem Sci 2017;9:513-530. [PMID: 29629118 PMCID: PMC5868307 DOI: 10.1039/c7sc02664a] [Citation(s) in RCA: 884] [Impact Index Per Article: 126.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2017] [Accepted: 10/30/2017] [Indexed: 12/22/2022] Open

Lenselink EB, Ten Dijke N, Bongers B, Papadatos G, van Vlijmen HWT, Kowalczyk W, IJzerman AP, van Westen GJP. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminform 2017;9:45. [PMID: 29086168 PMCID: PMC5555960 DOI: 10.1186/s13321-017-0232-0] [Citation(s) in RCA: 165] [Impact Index Per Article: 23.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2017] [Accepted: 07/31/2017] [Indexed: 11/10/2022] Open

Abstract

The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better at almost one standard deviation higher than the mean performance. Furthermore, Multi-task and PCM implementations were shown to improve performance over single task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized 'DNN_PCM'). Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols. Graphical Abstract .

Collapse

Goh GB, Hodas NO, Vishnu A. Deep learning for computational chemistry. J Comput Chem 2017;38:1291-1307. [PMID: 28272810 DOI: 10.1002/jcc.24764] [Citation(s) in RCA: 297] [Impact Index Per Article: 42.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Revised: 01/09/2017] [Accepted: 01/18/2017] [Indexed: 02/06/2023]

Lima AN, Philot EA, Trossini GHG, Scott LPB, Maltarollo VG, Honorio KM. Use of machine learning approaches for novel drug discovery. Expert Opin Drug Discov 2016;11:225-39. [PMID: 26814169 DOI: 10.1517/17460441.2016.1146250] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 2016;30:595-608. [PMID: 27558503 DOI: 10.1007/s10822-016-9938-8] [Citation(s) in RCA: 597] [Impact Index Per Article: 74.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2016] [Accepted: 08/11/2016] [Indexed: 10/21/2022]

Accurate and efficient target prediction using a potency-sensitive influence-relevance voter. J Cheminform 2015;7:63. [PMID: 26719774 PMCID: PMC4696267 DOI: 10.1186/s13321-015-0110-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2015] [Accepted: 12/02/2015] [Indexed: 11/10/2022] Open

Dörr A, Rosenbaum L, Zell A. A ranking method for the concurrent learning of compounds with various activity profiles. J Cheminform 2015;7:2. [PMID: 25643067 PMCID: PMC4306736 DOI: 10.1186/s13321-014-0050-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2014] [Accepted: 12/11/2014] [Indexed: 11/30/2022] Open

Lavecchia A. Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2014;20:318-31. [PMID: 25448759 DOI: 10.1016/j.drudis.2014.10.012] [Citation(s) in RCA: 353] [Impact Index Per Article: 35.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2014] [Revised: 09/27/2014] [Accepted: 10/24/2014] [Indexed: 12/19/2022]

Xie L, Ge X, Tan H, Xie L, Zhang Y, Hart T, Yang X, Bourne PE. Towards structural systems pharmacology to study complex diseases and personalized medicine. PLoS Comput Biol 2014;10:e1003554. [PMID: 24830652 PMCID: PMC4022462 DOI: 10.1371/journal.pcbi.1003554] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Ng C, Hauptman R, Zhang Y, Bourne PE, Xie L. Anti-infectious drug repurposing using an integrated chemical genomics and structural systems biology approach. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2014:136-47. [PMID: 24297541 PMCID: PMC6322395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

Zaretzki J, Matlock M, Swamidass SJ. XenoSite: Accurately Predicting CYP-Mediated Sites of Metabolism with Neural Networks. J Chem Inf Model 2013;53:3373-83. [DOI: 10.1021/ci400518g] [Citation(s) in RCA: 135] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

Browning MR, Calhoun BT, Swamidass SJ. Managing missing measurements in small-molecule screens. J Comput Aided Mol Des 2013;27:469-78. [DOI: 10.1007/s10822-013-9642-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2013] [Accepted: 03/29/2013] [Indexed: 12/22/2022]

Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012;52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]

Ranu S, Calhoun BT, Singh AK, Swamidass SJ. Probabilistic Substructure Mining From Small-Molecule Screens. Mol Inform 2011;30:809-15. [DOI: 10.1002/minf.201100058] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2011] [Accepted: 06/04/2011] [Indexed: 12/20/2022]

Swamidass SJ. Mining small-molecule screens to repurpose drugs. Brief Bioinform 2011;12:327-35. [PMID: 21715466 DOI: 10.1093/bib/bbr028] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open

Rosenbaum L, Hinselmann G, Jahn A, Zell A. Interpreting linear support vector machine models with heat map molecule coloring. J Cheminform 2011;3:11. [PMID: 21439031 PMCID: PMC3076244 DOI: 10.1186/1758-2946-3-11] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2010] [Accepted: 03/25/2011] [Indexed: 11/17/2022] Open

Abstract

Background

Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides a strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines showed to have a convincing performance on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity.

Results

We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach assists to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor.

Conclusions

In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. Particularly substructures identified as important by our method might be a starting point for optimization of a lead compound. The heat map coloring should be considered as complementary to structure based modeling approaches. As such, it helps to get a better understanding of the binding mode of an inhibitor.

Collapse

Gago G, Diacovich L, Arabolaza A, Tsai SC, Gramajo H. Fatty acid biosynthesis in actinomycetes. FEMS Microbiol Rev 2011;35:475-97. [PMID: 21204864 DOI: 10.1111/j.1574-6976.2010.00259.x] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Grave KD, Costa F. Molecular graph augmentation with rings and functional groups. J Chem Inf Model 2011;50:1660-8. [PMID: 20795702 DOI: 10.1021/ci9005035] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Hinselmann G, Rosenbaum L, Jahn A, Fechner N, Ostermann C, Zell A. Large-Scale Learning of Structure−Activity Relationships Using a Linear Support Vector Machine and Problem-Specific Metrics. J Chem Inf Model 2011;51:203-13. [DOI: 10.1021/ci100073w] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Swamidass SJ, Bittker JA, Bodycombe NE, Ryder SP, Clemons PA. An economic framework to prioritize confirmatory tests after a high-throughput screen. ACTA ACUST UNITED AC 2010;15:680-6. [PMID: 20547534 DOI: 10.1177/1087057110372803] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Geppert H, Vogt M, Bajorath J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model 2010;50:205-16. [PMID: 20088575 DOI: 10.1021/ci900419k] [Citation(s) in RCA: 231] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Swamidass SJ, Azencott CA, Daily K, Baldi P. A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Bioinformatics 2010;26:1348-56. [PMID: 20378557 DOI: 10.1093/bioinformatics/btq140] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

The performance of classifiers is often assessed using Receiver Operating Characteristic ROC [or (AC) accumulation curve or enrichment curve] curves and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of any interest and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this 'early retrieval' problem.

RESULTS

To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization-the CROC(exp), an exponential transform of the ROC curve-as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way for measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix.

AVAILABILITY

Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/ CONTACT: pfbaldi@ics.uci.edu

Collapse