1
|
Granulo N, Sosnin S, Digles D, Ecker GF. The macrocycle inhibitor landscape of SLC-transporter. Mol Inform 2024; 43:e202300287. [PMID: 38288682 DOI: 10.1002/minf.202300287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/08/2024] [Accepted: 01/29/2024] [Indexed: 03/06/2024]
Abstract
In recent years, interest in solute carrier (SLC) transporters has increased because of their potential as drug targets. At the same time, macrocycles have demonstrated promising activity as therapeutic agents. However, the overall macrocycle/SLC-transporter interaction landscape has not yet been fully mapped. In this study, we present a statistical analysis of macrocycles with measured activity against SLC transporters. Using a KNIME-based data mining pipeline, we retrieved a total of 825 bioactivity data points for macrocycles interacting with SLC transporters. For further analysis of the SLC inhibitor profiles, we developed an interactive KNIME workflow as well as an interactive map of the chemical space coverage based on parametric t-SNE models. The parametric t-SNE models discriminate well among the targets of several SLC subfamilies. The KNIME workflow, the dataset, and the visualization tool are freely available to the community.
Affiliation(s)
- Nejra Granulo
- Department of Pharmaceutical Sciences, University of Vienna, Josef Holaubek Platz 2, 1090, Vienna, Austria
- Research Platform NeGeMac-Next Generation Macrocycles to Address Challenging Protein Interfaces, University of Vienna, 1090, Vienna, Austria
| | - Sergey Sosnin
- Department of Pharmaceutical Sciences, University of Vienna, Josef Holaubek Platz 2, 1090, Vienna, Austria
| | - Daniela Digles
- Department of Pharmaceutical Sciences, University of Vienna, Josef Holaubek Platz 2, 1090, Vienna, Austria
| | - Gerhard F Ecker
- Department of Pharmaceutical Sciences, University of Vienna, Josef Holaubek Platz 2, 1090, Vienna, Austria
- Research Platform NeGeMac-Next Generation Macrocycles to Address Challenging Protein Interfaces, University of Vienna, 1090, Vienna, Austria
| |
|
2
|
Huang J, Osthushenrich T, MacNamara A, Mälarstig A, Brocchetti S, Bradberry S, Scarabottolo L, Ferrada E, Sosnin S, Digles D, Superti-Furga G, Ecker GF. ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction. RSC Adv 2024; 14:13083-13094. [PMID: 38655474 PMCID: PMC11034476 DOI: 10.1039/d4ra00748d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] [Open Access]
Abstract
The solute carrier transporter family 6 (SLC6) is of key interest because of its critical role in the transport of small amino acids or amino acid-like molecules. Dysfunction of these transporters is strongly associated with human diseases including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As an encoding approach, amino acid descriptors were used to calculate the average sequence properties for both original and mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains, defined according to the transmembrane domains of the SLC6 transporters, to analyse the regions' contributions to the pathogenicity prediction. Subsequently, several classification models were built, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search. Model performance was estimated using repeated stratified k-fold cross-validation. The accuracy values of the generated models range from 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk of pathogenicity. When the models were applied to an independent validation set, accuracy dropped to 0.6 on average, with high precision but low sensitivity.
Affiliation(s)
- Jiahui Huang
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
| | - Tanja Osthushenrich
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II Wuppertal Germany
| | - Aidan MacNamara
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II Wuppertal Germany
| | - Anders Mälarstig
- Emerging Science & Innovation, Pfizer Worldwide Research, Development and Medical Cambridge MA USA
| | | | | | | | - Evandro Ferrada
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences Vienna Austria
| | - Sergey Sosnin
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
| | - Daniela Digles
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
| | - Giulio Superti-Furga
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences Vienna Austria
| | - Gerhard F Ecker
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
| |
|
3
|
Smajić A, Rami I, Sosnin S, Ecker GF. Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets. Chem Res Toxicol 2023; 36:1300-1312. [PMID: 37439496 PMCID: PMC10445286 DOI: 10.1021/acs.chemrestox.3c00042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Indexed: 07/14/2023]
Abstract
Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without restrictions and to create predictive tools for a broad spectrum of applications in the field of toxicology. We therefore investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, whereas models based on industry data sets predict negatives more often than those built using publicly available data sets. This finding is further supported by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the use of these models in consensus modeling for the prediction of potential adverse events.
Affiliation(s)
- Aljoša Smajić
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
| | - Iris Rami
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
| | - Sergey Sosnin
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
| | - Gerhard F. Ecker
- Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
| |
|
4
|
Sosnina EA, Sosnin S, Fedorov MV. Improvement of multi-task learning by data enrichment: application for drug discovery. J Comput Aided Mol Des 2023; 37:183-200. [PMID: 36943645 DOI: 10.1007/s10822-023-00500-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/21/2023] [Indexed: 03/23/2023]
Abstract
Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data included, the more new compound-target interactions were required for prediction improvement. We also found that, even with multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
Affiliation(s)
- Ekaterina A Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
| | - Sergey Sosnin
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1190, Vienna, Austria
| | - Maxim V Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
- Sirius University of Science and Technology, Olympiisky Prospect 1, Sochi, Russia, 354340
| |
|
5
|
Kostyukevich Y, Sosnin S, Osipenko S, Kovaleva O, Rumiantseva L, Kireev A, Zherebker A, Fedorov M, Nikolaev EN. PyFragMS-A Web Tool for the Investigation of the Collision-Induced Fragmentation Pathways. ACS Omega 2022; 7:9710-9719. [PMID: 35350354 PMCID: PMC8945079 DOI: 10.1021/acsomega.1c07272] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/24/2021] [Accepted: 02/28/2022] [Indexed: 05/13/2023]
Abstract
Dissociation induced by the accumulation of internal energy via collisions of ions with neutral molecules is one of the most important fragmentation techniques in mass spectrometry (MS), and the identification of small singly charged molecules is based mainly on the fragmentation spectrum. Many studies have been dedicated to creating databases of experimentally measured tandem mass spectrometry (MS/MS) spectra (such as MzCloud, Metlin, etc.) and to developing software for predicting MS/MS fragments in silico from the molecular structure (such as MetFrag, CFM-ID, CSI:FingerID, etc.). However, the fragmentation mechanisms and pathways are still not fully understood. One limiting obstacle is that protomers (positive ions protonated at different sites) produce different fragmentation spectra, and these spectra overlap when different protomers are present. Here, we propose to combine two powerful approaches: computing fragmentation trees that carry information on all consecutive fragmentations, and considering the MS/MS data of isotopically labeled compounds. We have created PyFragMS, a web tool consisting of a database of annotated MS/MS spectra of isotopically labeled molecules (after H/D and/or 16O/18O exchange) and a collection of instruments for computing fragmentation trees for an arbitrary molecule. Using PyFragMS, we investigated how the site of protonation influences the fragmentation pathway for small molecules. PyFragMS also supports database searches that take the MS/MS data of the isotopically labeled compounds into account.
|
6
|
Andronov M, Fedorov MV, Sosnin S. Exploring Chemical Reaction Space with Reaction Difference Fingerprints and Parametric t-SNE. ACS Omega 2021; 6:30743-30751. [PMID: 34805702 PMCID: PMC8600617 DOI: 10.1021/acsomega.1c04778] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/31/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
Humans prefer visual representations for the analysis of large databases. In this work, we suggest a method for the visualization of the chemical reaction space. Our technique uses the t-SNE approach that is parameterized using a deep neural network (parametric t-SNE). We demonstrated that the parametric t-SNE combined with reaction difference fingerprints could provide a tool for the projection of chemical reactions on a low-dimensional manifold for easy exploration of reaction space. We showed that the global reaction landscape projected on a 2D plane corresponds well with the already known reaction types. The application of a pretrained parametric t-SNE model to new reactions allows chemists to study these reactions in a global reaction space. We validated the feasibility of this approach for two commercial drugs, darunavir and montelukast. We believe that our method can help to explore reaction space and will inspire chemists to find new reactions and synthetic ways.
Affiliation(s)
- Mikhail Andronov
- Faculty of Fundamental Physical and Chemical Engineering, Lomonosov Moscow State University, Leninskie gory, 1, Moscow 119991, Russian Federation
| | - Maxim V. Fedorov
- Sirius University of Science and Technology, Olimpiysky Ave. b.1, Sochi 354000, Russian Federation
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
- Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
| | - Sergey Sosnin
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
- Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow 121205, Russian Federation
| |
|
7
|
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021; 129:109001. [PMID: 34647794 PMCID: PMC8516060 DOI: 10.1289/ehp10369] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/22/2021] [Accepted: 09/24/2021] [Indexed: 05/21/2023]
|
8
|
Krasnov L, Khokhlov I, Fedorov MV, Sosnin S. Transformer-based artificial neural networks for the conversion between chemical notations. Sci Rep 2021; 11:14798. [PMID: 34285269 PMCID: PMC8292511 DOI: 10.1038/s41598-021-94082-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/10/2021] [Accepted: 07/06/2021] [Indexed: 11/08/2022] [Open Access]
Abstract
We developed a Transformer-based artificial neural approach to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. The overall performance of our model is comparable to that of rule-based solutions. We showed that the accuracy and speed of computation, as well as the robustness of the model, allow it to be used in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development while keeping the required level of accuracy. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones.
Affiliation(s)
- Lev Krasnov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Department of Chemistry, Lomonosov Moscow State University, GSP-1, 1-3 Leninskiye Gory, Moscow, 119991, Russia
| | - Ivan Khokhlov
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
| | - Maxim V Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
| | - Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
- Syntelly LLC, Bolshoy Boulevard 30, bld. 1, Moscow, 121205, Russia
| |
|
9
|
Mansouri K, Karmaus A, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash A, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo D, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021; 129:79001. [PMID: 34242083 PMCID: PMC8270350 DOI: 10.1289/ehp9883] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Accepted: 06/28/2021] [Indexed: 05/28/2023]
|
10
|
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TE, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown J, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021; 129:47013. [PMID: 33929906 PMCID: PMC8086800 DOI: 10.1289/ehp8495] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. OBJECTIVES: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). METHODS: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. RESULTS: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. DISCUSSION: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
Affiliation(s)
- Kamel Mansouri
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
| | - Agnes L. Karmaus
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | | | - Grace Patlewicz
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Prachi Pradeep
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Domenico Alberga
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | | | - Timothy E.H. Allen
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Dave Allen
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | - Vinicius M. Alves
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | - Carolina H. Andrade
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | | | - Davide Ballabio
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Shannon Bell
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | - Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Sudin Bhattacharya
- Institute for Quantitative Health Science and Engineering, Department of Biomedical Engineering, Michigan State University, East Lansing, Michigan, USA
| | - Joyce V. Bastos
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | - Stephen Boyd
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
| | - J.B. Brown
- Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Stephen J. Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Yaroslav Chushak
- Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
- Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
| | - Heather Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | - Alex M. Clark
- Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
| | - Viviana Consonni
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | | | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
| | - Sherif Farag
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Maxim Fedorov
- Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Feng Gao
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
| | - Jeffery M. Gearhart
- Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
- Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
| | - Garett Goh
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Jonathan M. Goodman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Francesca Grisoni
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Christopher M. Grulke
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | | | - Matthew Hirn
- Department of Computational Mathematics, Science & Engineering, Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
| | - Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
| | | | - Giovanna J. Lavado
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | | | - Xinhao Li
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Filippo Lunghini
- Laboratoire de Chemoinformatique, UMR7140, Université de Strasbourg, Strasbourg, France
| | - Giuseppe F. Mangiatordi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Gilles Marcou
- Laboratoire de Chemoinformatique, UMR7140, Université de Strasbourg, Strasbourg, France
| | - Dan Marsh
- Underwriters Laboratories, Northbrook, Illinois, USA
| | - Todd Martin
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Cincinnati, Ohio, USA
| | | | - Eugene N. Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | | | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Orazio Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Reine Note
- L’Oréal Research & Innovation, Aulnay-sous-Bois, France
| | - Paritosh Pande
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | | | - Tyler Peryea
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Robert Rallo
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | | | - Patricia Ruiz
- Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Daniel P. Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | - Ahmed Sayed
- Rosettastein Consulting UG, Freising, Germany
| | - Risa Sayre
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Timothy Sheils
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Charles Siegel
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Arthur C. Silva
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | - Anton Simeonov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Sergey Sosnin
- Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Noel Southall
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Judy Strickland
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Brian Teppen
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
| | - Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
- BIGCHEM GmbH, Unterschleissheim, Germany
| | - Dennis Thomas
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | | | - Roberto Todeschini
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Ignacio Tripodi
- Computer Science/Interdisciplinary Quantitative Biology, University of Colorado, Boulder, Colorado, USA
| | - Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, Strasbourg, France
| | - Kristijan Vukovic
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Zhongyu Wang
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
| | - Liguo Wang
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
| | | | - Andrew J. Wedlake
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | | | - Dan Wilson
- The Dow Chemical Company, Midland, Michigan, USA
| | - Zijun Xiao
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
| | - Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Gergely Zahoranszky-Kohalmi
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Zhen Zhang
- Dow Agrosciences, Indianapolis, Indiana, USA
| | - Tongan Zhao
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | | | - Warren Casey
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
| | - Nicole C. Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
11
Sosnina EA, Sosnin S, Nikitina AA, Nazarov I, Osolodkin DI, Fedorov MV. Recommender Systems in Antiviral Drug Discovery. ACS Omega 2020; 5:15039-15051. [PMID: 32632398 PMCID: PMC7315437 DOI: 10.1021/acsomega.0c00857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/26/2020] [Accepted: 06/03/2020] [Indexed: 06/11/2023]
Abstract
Recommender systems (RSs), which underwent rapid development and had an enormous impact on e-commerce, have the potential to become useful tools for drug discovery. In this paper, we applied RS methods for the prediction of the antiviral activity class (active/inactive) for compounds extracted from ChEMBL. Two main RS approaches were applied: collaborative filtering (Surprise implementation) and content-based filtering (sparse-group inductive matrix completion (SGIMC) method). The effectiveness of RS approaches was investigated for prediction of antiviral activity classes ("interactions") for compounds and viruses, for which some of their interactions with other viruses or compounds are known, and for prediction of interaction profiles for new compounds. Both approaches achieved relatively good prediction quality for binary classification of individual interactions and compound profiles, as quantified by cross-validation and external validation receiver operating characteristic (ROC) score >0.9. Thus, even simple recommender systems may serve as an effective tool in antiviral drug discovery.
Affiliation(s)
- Ekaterina A. Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Institute of Physiologically Active Compounds, RAS, Severniy pr. 1, Chernogolovka 142432, Russia
| | - Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
| | - Anastasia A. Nikitina
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1 bd. 3, Moscow 119991, Russia
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
| | - Ivan Nazarov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
| | - Dmitry I. Osolodkin
- FSBSI “Chumakov FSC R&D IBP RAS”, Poselok Instituta Poliomielita 8 bd. 1, Poselenie Moskovsky, Moscow 108819, Russia
- Institute of Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Trubetskaya Ulitsa 8, Moscow 119991, Russia
| | - Maxim V. Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow 143026, Russia
- Syntelly LLC, Skolkovo Innovation Center, Bolshoy Boulevard 30, Moscow 121205, Russia
- Physics, John Anderson Building, University of Strathclyde, 107 Rottenrow East, Glasgow G4 0NG, U.K.
12
Karlov D, Sosnin S, Fedorov MV, Popov P. graphDelta: MPNN Scoring Function for the Affinity Prediction of Protein-Ligand Complexes. ACS Omega 2020; 5:5150-5159. [PMID: 32201802 PMCID: PMC7081425 DOI: 10.1021/acsomega.9b04162] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 12/06/2019] [Accepted: 02/21/2020] [Indexed: 06/04/2023]
Abstract
In this work, we present graph-convolutional neural networks for the prediction of binding constants of protein-ligand complexes. We derived the model using multitask learning, where the target variables are the dissociation constant (Kd), inhibition constant (Ki), and half maximal inhibitory concentration (IC50). Being rigorously trained on the PDBbind dataset, the model achieves a Pearson correlation coefficient of 0.87 and an RMSE of 1.05 in pK units, outperforming the recently developed 3D convolutional neural network model KDEEP.
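The two metrics quoted in this abstract (Pearson correlation and RMSE in pK units) can be illustrated with a minimal sketch. The function names `to_pk`, `rmse`, and `pearson_r` and the sample values are hypothetical and are not taken from the graphDelta paper or its code; the sketch only assumes that pK is the negative base-10 logarithm of a binding constant (Kd, Ki, or IC50) expressed in mol/L.

```python
import math


def to_pk(constant_molar):
    """Convert a binding constant in mol/L to pK = -log10(constant)."""
    return -math.log10(constant_molar)


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Illustrative values only: measured constants in mol/L and made-up predictions.
    measured = [to_pk(k) for k in (1e-9, 5e-8, 2e-6, 1e-4)]
    predicted = [8.4, 7.6, 5.1, 4.5]
    print("RMSE (pK units):", rmse(measured, predicted))
    print("Pearson r:", pearson_r(measured, predicted))
```

Working in pK units puts nanomolar and millimolar binders on one linear scale, so an RMSE of about 1 corresponds to roughly a tenfold error in the underlying constant.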
Affiliation(s)
- Dmitry S. Karlov
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
| | - Sergey Sosnin
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Skolkovo Innovation Center, Syntelly LLC, 42 Bolshoy Boulevard, Moscow 143026, Russia
| | - Maxim V. Fedorov
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Skolkovo Innovation Center, Syntelly LLC, 42 Bolshoy Boulevard, Moscow 143026, Russia
- University of Strathclyde, Physics, John Anderson Building, 107 Rottenrow East, Glasgow G4 0NG, U.K.
| | - Petr Popov
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
13
Kostyukevich Y, Vladimirov G, Stekolschikova E, Ivanov D, Yablokov A, Zherebker A, Sosnin S, Orlov A, Fedorov M, Khaitovich P, Nikolaev E. Hydrogen/Deuterium Exchange Aiding Compound Identification for LC-MS and MALDI Imaging Lipidomics. Anal Chem 2019; 91:13465-13474. [PMID: 31490663 DOI: 10.1021/acs.analchem.9b02461] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/02/2023]
Abstract
We present a novel approach for increasing the reliability of compound identification in LC-MS and MALDI imaging lipidomics. Our approach characterizes compounds not only by elution time, accurate mass, and fragmentation spectra but also by the number of labile hydrogens, which can be measured using hydrogen/deuterium (H/D) exchange. The number of labile hydrogens (those from -OH and -NH groups) serves as an additional structural descriptor when performing a database search. For the LC-MS experiment, the H/D exchange was performed in the heating capillary of a modified electrospray ionization (ESI) source, while for MALDI imaging, the exchange was performed in the ion funnel at 10 Torr pressure. This approach achieved a considerable degree of deuteration, enough to unambiguously distinguish between different classes of lipids. The proposed analytical approach may be used to identify not only lipids but also peptides and metabolites. Special software for the automatic filtering of molecules based on the number of functional groups was also developed.
Affiliation(s)
- Yury Kostyukevich
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudnyi, Moscow Region 141700, Russia
| | - Gleb Vladimirov
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Elena Stekolschikova
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Daniil Ivanov
- Moscow Institute of Physics and Technology, Dolgoprudnyi, Moscow Region 141700, Russia
- Emanuel Institute of Biochemical Physics, Russian Academy of Sciences, Kosygina Street, 4, Moscow 119334, Russia
| | - Arthur Yablokov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Leninskij pr. 38 k.2, Moscow 119334, Russia
| | - Alexander Zherebker
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Sergey Sosnin
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Alexey Orlov
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Maxim Fedorov
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Philipp Khaitovich
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
| | - Evgeny Nikolaev
- Skolkovo Institute of Science and Technology, Novaya Street, 100, Skolkovo 143025, Russian Federation
14
Sosnin S, Vashurina M, Withnall M, Karpov P, Fedorov M, Tetko IV. A Survey of Multi-task Learning Methods in Chemoinformatics. Mol Inform 2019; 38:e1800108. [PMID: 30499195 PMCID: PMC6587441 DOI: 10.1002/minf.201800108] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/27/2018] [Accepted: 10/16/2018] [Indexed: 01/09/2023]
Abstract
Despite the increasing volume of available data, the proportion of experimentally measured data remains small compared to the virtual chemical space of possible chemical structures. Therefore, there is a strong interest in simultaneously predicting different ADMET and biological properties of molecules, which are frequently strongly correlated with one another. Such joint data analyses can increase the accuracy of models by exploiting their common representation and identifying common features between individual properties. In this work we review recent developments in multi-task learning approaches and cover the freely available tools and packages that can be used to perform such studies.
Affiliation(s)
- Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
| | - Mariia Vashurina
- Helmholtz Zentrum München – German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
| | - Michael Withnall
- Helmholtz Zentrum München – German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
| | - Pavel Karpov
- Helmholtz Zentrum München – German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
| | - Maxim Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- University of Strathclyde, Department of Physics, John Anderson Building, 107 Rottenrow East, G4 0NG Glasgow, United Kingdom
| | - Igor V. Tetko
- Helmholtz Zentrum München – German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
- BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764 Neuherberg, Germany
15
Abstract
Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different end points: it can be measured for different species using different types of administration, etc., and it is questionable whether knowledge transfer between end points is possible. We performed a comparative study of multitask toxicity prediction for a broad chemical space using different descriptors and modeling algorithms and applied multitask learning to a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides significant improvement over single-output models and other machine learning methods. Our research reveals that multitask learning can be very useful for improving the quality of acute toxicity modeling and raises a discussion about the usage of multitask approaches for regulatory purposes. Our MultiTox models are freely available on the OCHEM platform (ochem.eu/multitox) under a CC-BY-NC license.
Affiliation(s)
- Sergey Sosnin
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
| | - Dmitry Karlov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
| | - Igor V Tetko
- Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Institute of Structural Biology and BIGCHEM GmbH, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany
| | - Maxim V Fedorov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- University of Strathclyde, Department of Physics, John Anderson Building, 107 Rottenrow East, Glasgow G4 0NG, U.K.
16
Karlov DS, Sosnin S, Tetko IV, Fedorov MV. Chemical space exploration guided by deep neural networks. RSC Adv 2019; 9:5151-5157. [PMID: 35514634 PMCID: PMC9060647 DOI: 10.1039/c8ra10182e] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/11/2018] [Accepted: 01/29/2019] [Indexed: 11/21/2022]
Abstract
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com).
Affiliation(s)
- Dmitry S. Karlov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
| | - Sergey Sosnin
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Syntelly LLC
| | - Igor V. Tetko
- Helmholtz Zentrum München – Research Center for Environmental Health (GmbH), Institute of Structural Biology, Germany
- BIGCHEM GmbH, Germany
| | - Maxim V. Fedorov
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow 143026, Russia
- Syntelly LLC
17
Sosnin S, Misin M, Palmer DS, Fedorov MV. 3D matters! 3D-RISM and 3D convolutional neural network for accurate bioaccumulation prediction. J Phys Condens Matter 2018; 30:32LT03. [PMID: 29964270 DOI: 10.1088/1361-648x/aad076] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
In this work, we present a new method for predicting complex physical-chemical properties of organic molecules. The approach utilizes a 3D convolutional neural network (ActivNet4) that takes solvent spatial distributions around solutes as input. These spatial distributions are obtained from a molecular theory called the three-dimensional reference interaction site model (3D-RISM). We have shown that the method achieves good accuracy in predicting the bioconcentration factor, which is difficult to predict by direct application of molecular theory or simulations. Our research demonstrates that a combination of molecular theories with modern machine learning approaches can be effectively used for predicting properties that are otherwise inaccessible to purely theory-based models.
Affiliation(s)
- Sergey Sosnin
- Center for Computational and Data-intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobelya Ulitsa 3 Moscow, 121205, Russia
18
Sosnina EA, Osolodkin DI, Radchenko EV, Sosnin S, Palyulin VA. Influence of Descriptor Implementation on Compound Ranking Based on Multiparameter Assessment. J Chem Inf Model 2018; 58:1083-1093. [PMID: 29689160 DOI: 10.1021/acs.jcim.7b00734] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Most of the common molecular descriptors have numerous different implementations. This can influence the results of compound prioritization based on the multiparameter assessment (MPA) approach that allows a medicinal chemist to simultaneously analyze and achieve the desired balance of the diverse and often conflicting molecular and pharmacological properties. In this study, we analyzed the feasibility of using different implementations of common descriptors (logP, logS, TPSA, logBB, hERG, nHBA) interchangeably in predesigned sets of requirements in the course of multiparameter compound optimization. The influence of methods of descriptor calculation, continuity or discreteness of their values, their applicability domains, as well as of the nature of desirability functions in an MPA profile were examined in terms of the stability of MPA compound ranking. It was shown that the interchangeable use of different methods of descriptor calculation is reliably acceptable only for continuously distributed parameters transformed by a smooth desirability function. If a descriptor in an MPA scheme is discretely distributed, only the implementation that was used for building the scoring profile may be used for assessment. An inconsistency of assessment due to different applicability domains of descriptors was also demonstrated.
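The abstract's point about smooth desirability functions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the logistic form of `smooth_desirability`, the hard cutoff in `step_desirability`, the geometric-mean aggregation in `mpa_score`, and the logP values 4.9 and 5.1 standing in for two implementations of the same descriptor.

```python
import math


def smooth_desirability(value, midpoint, steepness=2.0):
    """Logistic desirability in (0, 1): near 1 well below midpoint, near 0 well above."""
    return 1.0 / (1.0 + math.exp(steepness * (value - midpoint)))


def step_desirability(value, cutoff):
    """Hard-threshold desirability: 1.0 at or below the cutoff, otherwise 0.0."""
    return 1.0 if value <= cutoff else 0.0


def mpa_score(desirabilities):
    """Aggregate per-descriptor desirabilities as a geometric mean (an assumed choice)."""
    return math.prod(desirabilities) ** (1.0 / len(desirabilities))


if __name__ == "__main__":
    # Hypothetical logP values for one compound from two descriptor implementations.
    logp_a, logp_b = 4.9, 5.1
    other = 0.8  # hypothetical desirability of a second descriptor

    for name, fn in (("smooth", smooth_desirability), ("step", step_desirability)):
        score_a = mpa_score([fn(logp_a, 5.0), other])
        score_b = mpa_score([fn(logp_b, 5.0), other])
        print(name, score_a, score_b)
```

With the smooth function, a small disagreement between implementations shifts the score only slightly, whereas the hard cutoff flips the descriptor's contribution between 1 and 0, which is the kind of ranking instability the abstract describes.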
Affiliation(s)
- Ekaterina A Sosnina
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
| | - Dmitry I Osolodkin
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Chumakov Institute of Poliomyelitis and Viral Encephalitides, Chumakov FSC R&D IBP RAS, Moscow 108819, Russia
- Sechenov First Moscow State Medical University, Moscow 119991, Russia
| | - Eugene V Radchenko
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
| | - Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
| | - Vladimir A Palyulin
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia