1
Özçelik R, Öztürk H, Özgür A, Ozkirimli E. ChemBoost: A Chemical Language Based Approach for Protein - Ligand Binding Affinity Prediction. Mol Inform 2020; 40:e2000212. [PMID: 33225594 DOI: 10.1002/minf.202000212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/15/2020] [Accepted: 11/20/2020] [Indexed: 11/07/2022]
Abstract
Identification of high affinity drug-target interactions is a major research question in drug discovery. Proteins are generally represented by their structures or sequences. However, structures are available only for a small subset of biomolecules and sequence similarity is not always correlated with functional similarity. We propose ChemBoost, a chemical language based approach for affinity prediction using SMILES syntax. We hypothesize that SMILES is a codified language and ligands are documents composed of chemical words. These documents can be used to learn chemical word vectors that represent words in similar contexts with similar vectors. In ChemBoost, the ligands are represented via chemical word embeddings, while the proteins are represented through sequence-based features and/or chemical words of their ligands. Our aim is to process the patterns in SMILES as a language to predict protein-ligand affinity, even when we cannot infer the function from the sequence. We used eXtreme Gradient Boosting to predict protein-ligand affinities in KIBA and BindingDB data sets. ChemBoost was able to predict drug-target binding affinity as well as or better than state-of-the-art machine learning systems. When powered with ligand-centric representations, ChemBoost was more robust to the changes in protein sequence similarity and successfully captured the interactions between a protein and a ligand, even if the protein has low sequence similarity to the known targets of the ligand.
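The idea of treating a SMILES string as a document made of chemical words can be illustrated with a minimal sketch. It assumes, purely for illustration, that a chemical word is an overlapping character k-mer and that a ligand is represented by word counts rather than learned embeddings; the function names `chemical_words` and `bag_of_words` and the toy ligands are hypothetical and are not taken from the ChemBoost paper.

```python
from collections import Counter


def chemical_words(smiles: str, k: int = 3) -> list[str]:
    """Split a SMILES string into overlapping character k-mers.

    Assumption: k-mers stand in for "chemical words"; the paper's own
    word identification scheme may differ.
    """
    if len(smiles) < k:
        # Strings shorter than k are kept as a single word (or dropped if empty).
        return [smiles] if smiles else []
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def bag_of_words(smiles: str, vocabulary: list[str], k: int = 3) -> list[int]:
    """Count how often each vocabulary word occurs in one ligand."""
    counts = Counter(chemical_words(smiles, k))
    return [counts[word] for word in vocabulary]


# Toy ligands: ethanol, acetic acid, benzene.
ligands = ["CCO", "CC(=O)O", "c1ccccc1"]

# The vocabulary is the sorted set of all words seen in the toy corpus.
vocabulary = sorted({w for s in ligands for w in chemical_words(s)})

# One fixed-length feature vector per ligand.
features = [bag_of_words(s, vocabulary) for s in ligands]
```

Fixed-length vectors of this kind are the sort of input a gradient-boosting regressor, such as the eXtreme Gradient Boosting model named in the abstract, could be trained on; the abstract itself describes learned word embeddings rather than raw counts.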
Affiliation(s)
- Rıza Özçelik
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
- Hakime Öztürk
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
- Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
- Elif Ozkirimli
- Department of Chemical Engineering, Boğaziçi University, Istanbul, Turkey
- Data and Analytics Chapter, Pharma International Informatics, F. Hoffmann-La Roche AG, Switzerland
2
Plehiers PP, Coley CW, Gao H, Vermeire FH, Dobbelaere MR, Stevens CV, Van Geem KM, Green WH. Artificial Intelligence for Computer-Aided Synthesis In Flow: Analysis and Selection of Reaction Components. Frontiers in Chemical Engineering 2020. [DOI: 10.3389/fceng.2020.00005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022]
3
Beker W, Gajewska EP, Badowski T, Grzybowski BA. Prediction of Major Regio‐, Site‐, and Diastereoisomers in Diels–Alder Reactions by Using Machine‐Learning: The Importance of Physically Meaningful Descriptors. Angew Chem Int Ed Engl 2018. [DOI: 10.1002/ange.201806920] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/08/2022]
Affiliation(s)
- Wiktor Beker
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Ewa P. Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
4
Beker W, Gajewska EP, Badowski T, Grzybowski BA. Prediction of Major Regio‐, Site‐, and Diastereoisomers in Diels–Alder Reactions by Using Machine‐Learning: The Importance of Physically Meaningful Descriptors. Angew Chem Int Ed Engl 2018; 58:4515-4519. [DOI: 10.1002/anie.201806920] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 06/14/2018] [Revised: 08/22/2018] [Indexed: 01/15/2023]
Affiliation(s)
- Wiktor Beker
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Ewa P. Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Tomasz Badowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, 01-224 Warsaw, Poland
- Center for Soft and Living Matter and Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
5
Affiliation(s)
- Shahar Harel
- Department of Computer Science, Technion - Israel Institute of Technology, Haifa 3200003, Israel
- Kira Radinsky
- Department of Computer Science, Technion - Israel Institute of Technology, Haifa 3200003, Israel
6
Szymkuć S, Gajewska EP, Klucznik T, Molga K, Dittwald P, Startek M, Bajczyk M, Grzybowski BA. Computer-Assisted Synthetic Planning: The End of the Beginning. Angew Chem Int Ed Engl 2016; 55:5904-37. [PMID: 27062365 DOI: 10.1002/anie.201506101] [Citation(s) in RCA: 310] [Impact Index Per Article: 38.8] [Received: 07/03/2015] [Revised: 09/14/2015] [Indexed: 11/07/2022]
Abstract
Exactly half a century has passed since the launch of the first documented research project (1965 Dendral) on computer-assisted organic synthesis. Many more programs were created in the 1970s and 1980s but the enthusiasm of these pioneering days had largely dissipated by the 2000s, and the challenge of teaching the computer how to plan organic syntheses earned itself the reputation of a "mission impossible". This is quite curious given that, in the meantime, computers have "learned" many other skills that had been considered exclusive domains of human intellect and creativity: for example, machines can nowadays play chess better than human world champions and they can compose classical music pleasant to the human ear. Although there have been no similar feats in organic synthesis, this Review argues that to concede defeat would be premature. Indeed, bringing together the combination of modern computational power and algorithms from graph/network theory, chemical rules (with full stereo- and regiochemistry) coded in appropriate formats, and the elements of quantum mechanics, the machine can finally be "taught" how to plan syntheses of non-trivial organic molecules in a matter of seconds to minutes. The Review begins with an overview of some basic theoretical concepts essential for the big-data analysis of chemical syntheses. It progresses to the problem of optimizing pathways involving known reactions. It culminates with discussion of algorithms that allow for a completely de novo and fully automated design of syntheses leading to relatively complex targets, including those that have not been made before. Of course, there are still things to be improved, but computers are finally becoming relevant and helpful to the practice of organic-synthetic planning.
Paraphrasing Churchill's famous words after the Allies' first major victory over the Axis forces in Africa, it is not the end, it is not even the beginning of the end, but it is the end of the beginning for computer-assisted synthesis planning. The machine is here to stay.
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Ewa P Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Tomasz Klucznik
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Piotr Dittwald
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Michał Startek
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland
- Michał Bajczyk
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Center for Soft and Living Matter of Korea's Institute for Basic Science (IBS), Department of Chemistry, Ulsan National Institute of Science and Technology, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
7
Szymkuć S, Gajewska EP, Klucznik T, Molga K, Dittwald P, Startek M, Bajczyk M, Grzybowski BA. Computergestützte Syntheseplanung: Das Ende vom Anfang. Angew Chem Int Ed Engl 2016. [DOI: 10.1002/ange.201506101] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/29/2022]
Affiliation(s)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Ewa P. Gajewska
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Tomasz Klucznik
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Piotr Dittwald
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Michał Startek
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland
- Michał Bajczyk
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw, 02-224, Poland
- Center for Soft and Living Matter of Korea's Institute for Basic Science (IBS), Department of Chemistry, Ulsan National Institute of Science and Technology, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, South Korea
8
Emami FS, Vahid A, Wylie EK, Szymkuć S, Dittwald P, Molga K, Grzybowski BA. A Priori Estimation of Organic Reaction Yields. Angew Chem Int Ed Engl 2015. [DOI: 10.1002/anie.201503890] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
9
Emami FS, Vahid A, Wylie EK, Szymkuć S, Dittwald P, Molga K, Grzybowski BA. A Priori Estimation of Organic Reaction Yields. Angew Chem Int Ed Engl 2015. [DOI: 10.1002/ange.201503890] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Affiliation(s)
- Fateme S. Emami
- Department of Chemical and Biological Engineering, Northwestern University (USA)
- Amir Vahid
- Department of Chemical and Biological Engineering, Northwestern University (USA)
- Elizabeth K. Wylie
- Department of Chemical and Biological Engineering, Northwestern University (USA)
- Sara Szymkuć
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw (Poland)
- Piotr Dittwald
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw (Poland)
- Karol Molga
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw (Poland)
- Bartosz A. Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw (Poland)
- Department of Chemistry and the IBS Center for Soft and Living Matter, UNIST, Ulsan (South Korea)