Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cortes-Ciriano I, Bender A. Improved Chemical Structure-Activity Modeling Through Data Augmentation. J Chem Inf Model 2015;55:2682-92. [PMID: 26619900 DOI: 10.1021/acs.jcim.5b00570] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

For:	Cortes-Ciriano I, Bender A. Improved Chemical Structure-Activity Modeling Through Data Augmentation. J Chem Inf Model 2015;55:2682-92. [PMID: 26619900 DOI: 10.1021/acs.jcim.5b00570] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Number

Cited by Other Article(s)

Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review. Comput Biol Med 2024;179:108734. [PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 06/01/2024] [Accepted: 06/08/2024] [Indexed: 07/06/2024]

Abstract

Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated with the increasing use of machine learning (ML), mainly deep learning (DL), and computing hardware and software advancements. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces a few limitations that need to be addressed to harness its full potential in drug discovery. Some notable limitations are insufficient, unlabeled, and non-uniform data, the resemblance of some AI-generated molecules with existing molecules, unavailability of inadequate benchmarks, intellectual property rights (IPRs) related hurdles in data sharing, poor understanding of biology, focus on proxy data and ligands, lack of holistic methods to represent input (molecular structures) to prevent pre-processing of input molecules (feature engineering), etc. The major component in AI infrastructure is input data, as most of the successes of AI-driven efforts to improve drug discovery depend on the quality and quantity of data, used to train and test AI algorithms, besides a few other factors. Additionally, data-gulping DL approaches, without sufficient data, may collapse to live up to their promise. Current literature suggests a few methods, to certain extent, effectively handle low data for better output from the AI models in the context of drug discovery. These are transferring learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), data synthesis (DS), etc. One different method, which enables sharing of proprietary data on a common platform (without compromising data privacy) to train ML model, is federated learning (FL). In this review, we compare and discuss these methods, their recent applications, and limitations while modeling small molecule data to get the improved output of AI methods in drug discovery. Article also sums up some other novel methods to handle inadequate data.

Collapse

Sikder R, Zhang H, Gao P, Ye T. Machine learning framework for predicting cytotoxicity and identifying toxicity drivers of disinfection byproducts. JOURNAL OF HAZARDOUS MATERIALS 2024;469:133989. [PMID: 38461660 DOI: 10.1016/j.jhazmat.2024.133989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/25/2023] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 03/12/2024]

Almeida RL, Maltarollo VG, Coelho FGF. Overcoming class imbalance in drug discovery problems: Graph neural networks and balancing approaches. J Mol Graph Model 2024;126:108627. [PMID: 37801808 DOI: 10.1016/j.jmgm.2023.108627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 10/08/2023]

Loh C, Christensen T, Dangovski R, Kim S, Soljačić M. Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science. Nat Commun 2022;13:4223. [PMID: 35864122 PMCID: PMC9304370 DOI: 10.1038/s41467-022-31915-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2021] [Accepted: 07/07/2022] [Indexed: 11/18/2022] Open

Serov N, Vinogradov V. Artificial intelligence to bring nanomedicine to life. Adv Drug Deliv Rev 2022;184:114194. [PMID: 35283223 DOI: 10.1016/j.addr.2022.114194] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 03/04/2022] [Accepted: 03/07/2022] [Indexed: 12/13/2022]

Abstract

The technology of drug delivery systems (DDSs) has demonstrated an outstanding performance and effectiveness in production of pharmaceuticals, as it is proved by many FDA-approved nanomedicines that have an enhanced selectivity, manageable drug release kinetics and synergistic therapeutic actions. Nonetheless, to date, the rational design and high-throughput development of nanomaterial-based DDSs for specific purposes is far from a routine practice and is still in its infancy, mainly due to the limitations in scientists' capabilities to effectively acquire, analyze, manage, and comprehend complex and ever-growing sets of experimental data, which is vital to develop DDSs with a set of desired functionalities. At the same time, this task is feasible for the data-driven approaches, high throughput experimentation techniques, process automatization, artificial intelligence (AI) technology, and machine learning (ML) approaches, which is referred to as The Fourth Paradigm of scientific research. Therefore, an integration of these approaches with nanomedicine and nanotechnology can potentially accelerate the rational design and high-throughput development of highly efficient nanoformulated drugs and smart materials with pre-defined functionalities. In this Review, we survey the important results and milestones achieved to date in the application of data science, high throughput, as well as automatization approaches, combined with AI and ML to design and optimize DDSs and related nanomaterials. This manuscript mission is not only to reflect the state-of-art in data-driven nanomedicine, but also show how recent findings in the related fields can transform the nanomedicine's image. We discuss how all these results can be used to boost nanomedicine translation to the clinic, as well as highlight the future directions for the development, data-driven, high throughput experimentation-, and AI-assisted design, as well as the production of nanoformulated drugs and smart materials with pre-defined properties and behavior. This Review will be of high interest to the chemists involved in materials science, nanotechnology, and DDSs development for biomedical applications, although the general nature of the presented approaches enables knowledge translation to many other fields of science.

Collapse

Robinson RLM, Sarimveis H, Doganis P, Jia X, Kotzabasaki M, Gousiadou C, Harper SL, Wilkins T. Identifying diverse metal oxide nanomaterials with lethal effects on embryonic zebrafish using machine learning. BEILSTEIN JOURNAL OF NANOTECHNOLOGY 2021;12:1297-1325. [PMID: 34934606 PMCID: PMC8649207 DOI: 10.3762/bjnano.12.97] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/26/2021] [Accepted: 10/28/2021] [Indexed: 06/14/2023]

Abstract

Manufacturers of nanomaterial-enabled products need models of endpoints that are relevant to human safety to support the "safe by design" paradigm and avoid late-stage attrition. Increasingly, embryonic zebrafish (Danio Rerio) are recognised as a key human safety relevant in vivo test system. Hence, machine learning models were developed for identifying metal oxide nanomaterials causing lethality to embryonic zebrafish up to 24 hours post-fertilisation, or excess lethality in the period of 24-120 hours post-fertilisation, at concentrations of 250 ppm or less. Models were developed using data from the Nanomaterial Biological-Interactions Knowledgebase for a dataset of 44 diverse, coated and uncoated metal or, in one case, metalloid oxide nanomaterials. Different modelling approaches were evaluated using nested cross-validation on this dataset. Models were initially developed for both lethality endpoints using multiple descriptors representing the composition of the core, shell and surface functional groups, as well as particle characteristics. However, interestingly, the 24 hours post-fertilisation data were found to be harder to predict, which could reflect different exposure routes. Hence, subsequent analysis focused on the prediction of excess lethality at 120 hours-post fertilisation. The use of two data augmentation approaches, applied for the first time in nano-QSAR research, was explored, yet both failed to boost predictive performance. Interestingly, it was found that comparable results to those originally obtained using multiple descriptors could be obtained using a model based upon a single, simple descriptor: the Pauling electronegativity of the metal atom. Since it is widely recognised that a variety of intrinsic and extrinsic nanomaterial characteristics contribute to their toxicological effects, this is a surprising finding. This may partly reflect the need to investigate more sophisticated descriptors in future studies. Future studies are also required to examine how robust these modelling results are on truly external data, which were not used to select the single descriptor model. This will require further laboratory work to generate comparable data to those studied herein.

Collapse

Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021;22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open

Kim J, Kim Y, Lee EK, Chae CH, Lee K, Kim WJ, Choi IS. Rotational Variance-Based Data Augmentation in 3D Graph Convolutional Network. Chem Asian J 2021;16:2610-2613. [PMID: 34369653 DOI: 10.1002/asia.202100789] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Revised: 07/30/2021] [Indexed: 01/17/2023]

GPCR_LigandClassify.py; a rigorous machine learning classifier for GPCR targeting compounds. Sci Rep 2021;11:9510. [PMID: 33947911 PMCID: PMC8097070 DOI: 10.1038/s41598-021-88939-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open

Augmentation in Healthcare: Augmented Biosignal Using Deep Learning and Tensor Representation. JOURNAL OF HEALTHCARE ENGINEERING 2021;2021:6624764. [PMID: 33575018 PMCID: PMC7861952 DOI: 10.1155/2021/6624764] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2020] [Revised: 11/22/2020] [Accepted: 01/12/2021] [Indexed: 11/25/2022]

Lentelink NJ, Palkovits S. Transfer Learning as Tool to Enhance Predictions of Molecular Properties Based on 2D Projections. ADVANCED THEORY AND SIMULATIONS 2020. [DOI: 10.1002/adts.202000148] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Jablonka K, Ongari D, Moosavi SM, Smit B. Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. Chem Rev 2020;120:8066-8129. [PMID: 32520531 PMCID: PMC7453404 DOI: 10.1021/acs.chemrev.0c00004] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Indexed: 12/16/2022]

Cortés-Ciriano I, Škuta C, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction. J Cheminform 2020;12:41. [PMID: 33431016 PMCID: PMC7339533 DOI: 10.1186/s13321-020-00444-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 01/22/2023] Open

Abstract

Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using K_i, K_d, IC₅₀ and EC₅₀ data from ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP we assembled IC₅₀ data from ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and 25 diverse protein target data sets. This study complements part 1 where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Despite being inherently noisy, we show that using QAFFP as descriptors leads to errors in prediction on the test set in the ~ 0.65-0.95 pIC₅₀ units range, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC₅₀ units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02-0.08 pIC₅₀ units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression .

Collapse

Li X, Fourches D. Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT. J Cheminform 2020;12:27. [PMID: 33430978 PMCID: PMC7178569 DOI: 10.1186/s13321-020-00430-x] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Accepted: 04/15/2020] [Indexed: 12/25/2022] Open

Drakakis G, Cortés-Ciriano I, Alexander-Dann B, Bender A. Elucidating Compound Mechanism of Action and Predicting Cytotoxicity Using Machine Learning Approaches, Taking Prediction Confidence into Account. ACTA ACUST UNITED AC 2020;11:e73. [PMID: 31483099 DOI: 10.1002/cpch.73] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Cardoso-Silva J, Papageorgiou LG, Tsoka S. Network-based piecewise linear regression for QSAR modelling. J Comput Aided Mol Des 2019;33:831-844. [PMID: 31628660 PMCID: PMC6825651 DOI: 10.1007/s10822-019-00228-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2019] [Accepted: 09/28/2019] [Indexed: 02/07/2023]

Simeon S, Montanari D, Gleeson MP. Investigation of Factors Affecting the Performance of in silico Volume Distribution QSAR Models for Human, Rat, Mouse, Dog & Monkey. Mol Inform 2019;38:e1900059. [DOI: 10.1002/minf.201900059] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Accepted: 07/03/2019] [Indexed: 01/09/2023]

Cortés-Ciriano I, Bender A. KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminform 2019;11:41. [PMID: 31218493 PMCID: PMC6582521 DOI: 10.1186/s13321-019-0364-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Accepted: 06/09/2019] [Indexed: 02/08/2023] Open

Abstract

The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekulé structure representations alone by extending existing architectures (AlexNet, DenseNet-201, ResNet152 and VGG-19), which were pretrained on unrelated image data sets. We show that the predictive power of the generated models, which just require standard 2D compound representations as input, is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that by simply averaging the output of the RF models and ConvNets we obtain significantly lower errors in prediction for multiple data sets, although the effect size is small, than those obtained with either model alone, indicating that the features extracted by the convolutional layers of the ConvNets provide complementary predictive signal to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images permit to model COX isoform selectivity on a continuous scale with errors in prediction comparable to the uncertainty of the data. Overall, in this work we present a set of ConvNet architectures for the prediction of compound activity from their Kekulé structure representations with state-of-the-art performance, that require no generation of compound descriptors or use of sophisticated image processing techniques. The code needed to reproduce the results presented in this study and all the data sets are provided at https://github.com/isidroc/kekulescope .

Collapse

Grenet I, Merlo K, Comet JP, Tertiaux R, Rouquié D, Dayan F. Stacked Generalization with Applicability Domain Outperforms Simple QSAR on in Vitro Toxicological Data. J Chem Inf Model 2019;59:1486-1496. [PMID: 30735402 DOI: 10.1021/acs.jcim.8b00553] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]

Lagunin AA, Geronikaki A, Eleftheriou P, Pogodin PV, Zakharov AV. Rational Use of Heterogeneous Data in Quantitative Structure-Activity Relationship (QSAR) Modeling of Cyclooxygenase/Lipoxygenase Inhibitors. J Chem Inf Model 2019;59:713-730. [PMID: 30688458 DOI: 10.1021/acs.jcim.8b00617] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

Lagunin AA, Romanova MA, Zadorozhny AD, Kurilenko NS, Shilov BV, Pogodin PV, Ivanov SM, Filimonov DA, Poroikov VV. Comparison of Quantitative and Qualitative (Q)SAR Models Created for the Prediction of K_i and IC₅₀ Values of Antitarget Inhibitors. Front Pharmacol 2018;9:1136. [PMID: 30364128 PMCID: PMC6192375 DOI: 10.3389/fphar.2018.01136] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2018] [Accepted: 09/18/2018] [Indexed: 12/20/2022] Open

Cardoso‐Silva J, Papadatos G, Papageorgiou LG, Tsoka S. Optimal Piecewise Linear Regression Algorithm for QSAR Modelling. Mol Inform 2018;38:e1800028. [DOI: 10.1002/minf.201800028] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2018] [Accepted: 08/02/2018] [Indexed: 12/20/2022]

Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018;58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Fukunishi Y, Yamasaki S, Yasumatsu I, Takeuchi K, Kurosawa T, Nakamura H. Quantitative Structure-activity Relationship (QSAR) Models for Docking Score Correction. Mol Inform 2017;36:1600013. [PMID: 28001004 PMCID: PMC5297997 DOI: 10.1002/minf.201600013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2016] [Accepted: 04/01/2016] [Indexed: 01/26/2023]