Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol Inform 2016;35:615-621. [PMID: 27464907 PMCID: PMC5129546 DOI: 10.1002/minf.201600073] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 05/19/2016] [Accepted: 07/06/2016] [Indexed: 01/19/2023]

Cited by Other Article(s)

Kostal J, Voutchkova-Kostal A, Bercu JP, Graham JC, Hillegass J, Masuda-Herrera M, Trejo-Martin A, Gould J. Quantum-Mechanics Calculations Elucidate Skin-Sensitizing Pharmaceutical Compounds. Chem Res Toxicol 2024;37:1404-1414. [PMID: 39069667 DOI: 10.1021/acs.chemrestox.4c00185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]

Abstract

Skin sensitization is a critical end point in occupational toxicology that necessitates the use of fast, accurate, and affordable models to aid in establishing handling guidance for worker protection. While many in silico models have been developed, the scarcity of reliable data for active pharmaceutical ingredients (APIs) and their intermediates (together regarded as pharmaceutical compounds) brings into question the reliability of these tools, which are largely constructed using publicly available nonspecialty chemicals. Here, we present the quantum-mechanical (QM) Computer-Aided Discovery and REdesign (CADRE) model, which was developed with the bioactive and structurally complex chemical space in mind by relying on the fundamentals of chemical interactions in key events (versus structural attributes of training-set data). Validated in this study on 345 APIs and intermediates, CADRE achieved 95% accuracy, sensitivity, and specificity and a combined 79% accuracy in assigning potency categories compared to the mouse local lymph node assay data. We show how historical outcomes from CADRE testing in the pharmaceutical space, generated over the past 10 years on ca. 2500 chemicals, can be used to probe the relationships between sensitization mechanisms (or the underlying chemical classes) and the probability of eliciting a sensitization response in mice of a given potency. We believe this information to be of value to both practitioners, who can use it to quickly screen and triage their data sets, as well as to model developers to fine-tune their structure-based tools. Lastly, we leverage our experimentally validated subset of APIs and intermediates to show the importance of dermal permeability on the sensitization potential and potency. 
We demonstrate that common physicochemical properties used to assess permeation, such as the octanol-water partition coefficient and molecular weight, are poor proxies for the more accurate energy-pair distributions that can be computed from mixed QM and classical simulations using model representations of the stratum corneum.

Sosnin S. MolCompass: multi-tool for the navigation in chemical space and visual validation of QSAR/QSPR models. J Cheminform 2024;16:98. [PMID: 39129016 PMCID: PMC11318166 DOI: 10.1186/s13321-024-00888-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/21/2024] [Indexed: 08/13/2024] Open

Comajuncosa-Creus A, Lenes A, Sánchez-Palomino M, Dalton D, Aloy P. Stereochemically-aware bioactivity descriptors for uncharacterized chemical compounds. J Cheminform 2024;16:70. [PMID: 38890727 PMCID: PMC11186078 DOI: 10.1186/s13321-024-00867-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 06/05/2024] [Indexed: 06/20/2024] Open

Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. Environmental Science & Technology 2024;58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]

Velásquez-López Y, Ruiz-Escudero A, Arrasate S, González-Díaz H. Implementation of IFPTML Computational Models in Drug Discovery Against Flaviviridae Family. J Chem Inf Model 2024;64:1841-1852. [PMID: 38466369 PMCID: PMC10966645 DOI: 10.1021/acs.jcim.3c01796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/13/2024]

Pinzi L, Rastelli G. Trends and Applications in Computationally Driven Drug Repurposing. Int J Mol Sci 2023;24:16511. [PMID: 38003701 PMCID: PMC10671888 DOI: 10.3390/ijms242216511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 11/06/2023] [Indexed: 11/26/2023] Open

Khodadadi Karimvand S, Mohammad Jafari J, Vali Zade S, Abdollahi H. Practical and comparative application of efficient data reduction - Multivariate curve resolution. Anal Chim Acta 2023;1243:340824. [PMID: 36697179 DOI: 10.1016/j.aca.2023.340824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 01/11/2023] [Accepted: 01/11/2023] [Indexed: 01/13/2023]

Neves P, McClure K, Verhoeven J, Dyubankova N, Nugmanov R, Gedich A, Menon S, Shi Z, Wegner JK. Global reactivity models are impactful in industrial synthesis applications. J Cheminform 2023;15:20. [PMID: 36774523 PMCID: PMC9921076 DOI: 10.1186/s13321-023-00685-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/16/2022] [Accepted: 01/22/2023] [Indexed: 02/13/2023] Open

Abstract

Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, leading to a triple loss when accounting for the cost of opportunity in time wasted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources, and fine-tune it to achieve an uncertainty calibrated global yield prediction model. This model is an improvement upon state of the art not just from the increase in pre-train data but also by introducing a new embedding layer which solves a few limitations of SMILES and enables integration of additional information such as equivalents and molecule role into the reaction encoding; the model is called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against a state-of-the-art synthesis focused BERT showing a near 20-point improvement in r2 score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that the application of the model can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an on-going drug discovery project and showing that it can also be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.

Boiko DA, Kashin AS, Sorokin VR, Agaev YV, Zaytsev RG, Ananikov VP. Analyzing ionic liquid systems using real-time electron microscopy and a computational framework combining deep learning and classic computer vision techniques. J Mol Liq 2023. [DOI: 10.1016/j.molliq.2023.121407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]

Machine Learning Prediction of Mycobacterial Cell Wall Permeability of Drugs and Drug-like Compounds. Molecules (Basel, Switzerland) 2023;28:molecules28020633. [PMID: 36677691 PMCID: PMC9863426 DOI: 10.3390/molecules28020633] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/18/2022] [Revised: 12/30/2022] [Accepted: 12/30/2022] [Indexed: 01/11/2023]

Parastar H, Tauler R. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.201801134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Xiong F, Yu M, Xu H, Zhong Z, Li Z, Guo Y, Zhang T, Zeng Z, Jin F, He X. Discovery of TIGIT inhibitors based on DEL and machine learning. Front Chem 2022;10:982539. [PMID: 35958238 PMCID: PMC9360614 DOI: 10.3389/fchem.2022.982539] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/30/2022] [Accepted: 07/11/2022] [Indexed: 11/13/2022] Open

Murali V, Muralidhar YP, Königs C, Nair M, Madhu S, Nedungadi P, Srinivasa G, Athri P. Predicting clinical trial outcomes using drug bioactivities through graph database integration and machine learning. Chem Biol Drug Des 2022;100:169-184. [PMID: 35587730 DOI: 10.1111/cbdd.14092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 04/24/2022] [Accepted: 05/15/2022] [Indexed: 11/29/2022]

Abstract

The ability to estimate the probability of a drug to receive approval in clinical trials provides natural advantages to optimizing pharmaceutical research workflows. Success rates of clinical trials have deep implications for costs and duration of development, and are under pressure due to stringent regulatory approval processes. We propose a machine learning approach that can predict the outcome of the trial with reliable accuracies, using biological activities, physicochemical properties of the compounds, target-related features, and NLP-based compound representation. In the above list, biological activities have never been used as an independent variable towards the prediction of clinical trial outcomes. We have extracted the drug-disease pair from clinical trials and mapped target(s) to that pair using multiple data sources. Empirical results demonstrate that ensemble learning outperforms independently trained, small-data ML models. We report results and inferences derived from a random forest classifier with an average accuracy of 93%, and an F1 score of 0.96 for the "Pass" class. "Pass" refers to one of the two classes (Pass/Fail) of all clinical trials, and the model performed well in predicting the "Pass" category. Through the analysis of feature contributions to predictive capability, we have demonstrated that bioactivity plays a statistically significant role in predicting clinical trial outcome. A significant effort has gone into the production of the dataset that, for the first time, integrates clinical trial information with protein targets. Cleaned, organized, integrated data and code to map these entities, created as a part of this work, are available open-source. This reproducibility and the freely available code ensure that researchers with access to deep curated and proprietary clinical trial databases (we only use open-source data in this study) can further expand the scope of the results.

Sharma C, Sinha R, Johnson K. Practical and comprehensive formalisms for modelling contemporary graph query languages. Inform Syst 2021. [DOI: 10.1016/j.is.2021.101816] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]

Wang Z, Zhang W, Liu B. Computational Analysis of Synthetic Planning: Past and Future. Chinese J Chem 2021. [DOI: 10.1002/cjoc.202100273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]

Sicho M, Liu X, Svozil D, van Westen GJP. GenUI: interactive and extensible open source software platform for de novo molecular generation and cheminformatics. J Cheminform 2021;13:73. [PMID: 34563271 PMCID: PMC8465716 DOI: 10.1186/s13321-021-00550-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/27/2021] [Accepted: 09/05/2021] [Indexed: 03/05/2023] Open

Abstract

Many contemporary cheminformatics methods, including computer-aided de novo drug design, hold promise to significantly accelerate and reduce the cost of drug discovery. Thanks to this attractive outlook, the field has thrived and in the past few years has seen an especially significant growth, mainly due to the emergence of novel methods based on deep neural networks. This growth is also apparent in the development of novel de novo drug design methods with many new generative algorithms now available. However, widespread adoption of new generative techniques in the fields like medicinal chemistry or chemical biology is still lagging behind the most recent developments. Upon taking a closer look, this fact is not surprising since in order to successfully integrate the most recent de novo drug design methods in existing processes and pipelines, a close collaboration between diverse groups of experimental and theoretical scientists needs to be established. Therefore, to accelerate the adoption of both modern and traditional de novo molecular generators, we developed Generator User Interface (GenUI), a software platform that makes it possible to integrate molecular generators within a feature-rich graphical user interface that is easy to use by experts of diverse backgrounds. GenUI is implemented as a web service and its interfaces offer access to cheminformatics tools for data preprocessing, model building, molecule generation, and interactive chemical space visualization. Moreover, the platform is easy to extend with customizable frontend React.js components and backend Python extensions. GenUI is open source and a recently developed de novo molecular generator, DrugEx, was integrated as a proof of principle. In this work, we present the architecture and implementation details of GenUI and discuss how it can facilitate collaboration in the disparate communities interested in de novo molecular generation and computer-aided drug discovery.

Cañizares-Carmenate Y, Mena-Ulecia K, MacLeod Carey D, Perera-Sardiña Y, Hernández-Rodríguez EW, Marrero-Ponce Y, Torrens F, Castillo-Garit JA. Machine learning approach to discovery of small molecules with potential inhibitory action against vasoactive metalloproteases. Mol Divers 2021;26:1383-1397. [PMID: 34216326 DOI: 10.1007/s11030-021-10260-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/10/2021] [Accepted: 06/17/2021] [Indexed: 11/26/2022]

Wang H, Xiong W. Vibrational Sum-Frequency Generation Hyperspectral Microscopy for Molecular Self-Assembled Systems. Annu Rev Phys Chem 2021;72:279-306. [PMID: 33441031 DOI: 10.1146/annurev-physchem-090519-050510] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]

Żurański AM, Martinez Alvarado JI, Shields BJ, Doyle AG. Predicting Reaction Yields via Supervised Learning. Acc Chem Res 2021;54:1856-1865. [PMID: 33788552 DOI: 10.1021/acs.accounts.0c00770] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Indexed: 12/15/2022]

Abstract

Numerous disciplines, such as image recognition and language translation, have been revolutionized by using machine learning (ML) to leverage big data. In organic synthesis, providing accurate chemical reactivity predictions with supervised ML could assist chemists with reaction prediction, optimization, and mechanistic interrogation. To apply supervised ML to chemical reactions, one needs to define the object of prediction (e.g., yield, enantioselectivity, solubility, or a recommendation) and represent reactions with descriptive data. Our group's effort has focused on representing chemical reactions using DFT-derived physical features of the reacting molecules and conditions, which serve as features for building supervised ML models. In this Account, we present a review and perspective on three studies conducted by our group where ML models have been employed to predict reaction yield. First, we focus on a small reaction data set where 16 phosphine ligands were evaluated in a single Ni-catalyzed Suzuki-Miyaura cross-coupling reaction, and the reaction yield was modeled with linear regression. In this setting, where the regression complexity is strongly limited by the amount of available data, we emphasize the importance of identifying single features that are directly relevant to reactivity. Next, we focus on models trained on two larger data sets obtained with high-throughput experimentation (HTE). With hundreds to thousands of reactions available, more complex models can be explored, for example, models that algorithmically perform feature selection from a broad set of candidate features. We examine how a variety of ML algorithms model these data sets and how well these models generalize to out-of-sample substrates. 
Specifically, we compare the ML models that use DFT-based featurization to a baseline model that is obtained with features that carry no physical information, that is, random features, and to a naive non-ML model that averages yields of reactions that share the same conditions and substrate combinations. We find that for only one of the two data sets, DFT-based featurization leads to a significant, although moderate, out-of-sample prediction improvement. The source of this improvement was further isolated to specific features which allowed us to formulate a testable mechanistic hypothesis that was validated experimentally. Finally, we offer remarks on supervised ML model building on HTE data sets focusing on algorithmic improvements in model training. Statistical methods in chemistry have a rich history, but only recently has ML gained widespread attention in reaction development. As the untapped potential of ML is explored, novel tools are likely to arise from future research. Our studies suggest that supervised ML can lead to improved predictions of reaction yield over simpler modeling methods and facilitate mechanistic understanding of reaction dynamics. However, further research and development is required to establish ML as an indispensable tool in reactivity modeling.

Orlova Y, Gambardella AA, Kryven I, Keune K, Iedema PD. Generative Algorithm for Molecular Graphs Uncovers Products of Oil Oxidation. J Chem Inf Model 2021;61:1457-1469. [PMID: 33615781 PMCID: PMC7988456 DOI: 10.1021/acs.jcim.0c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/06/2020] [Indexed: 12/13/2022]

Senthil S, Chakraborty S, Ramakrishnan R. Troubleshooting unstable molecules in chemical space. Chem Sci 2021;12:5566-5573. [PMID: 34163773 PMCID: PMC8179589 DOI: 10.1039/d0sc05591c] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/09/2020] [Accepted: 02/27/2021] [Indexed: 01/11/2023] Open

Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021;61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/01/2023]

Affiliation(s)

Sankalp Jain: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
Vishal B Siramshetty: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
Vinicius M Alves: UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
Eugene N Muratov: UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
Nicole Kleinstreuer: Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States; National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
Alexander Tropsha: UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
Marc C Nicklaus: Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
Anton Simeonov: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
Alexey V Zakharov: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States

Yeung AWK, Atanasov AG, Sheridan H, Klager E, Eibensteiner F, Völkl-Kernsock S, Kletecka-Pulker M, Willschke H, Schaden E. Open Innovation in Medical and Pharmaceutical Research: A Literature Landscape Analysis. Front Pharmacol 2021;11:587526. [PMID: 33519448 PMCID: PMC7840485 DOI: 10.3389/fphar.2020.587526] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/28/2020] [Accepted: 11/16/2020] [Indexed: 12/12/2022] Open

Affiliation(s)

Andy Wai Kan Yeung: Oral and Maxillofacial Radiology, Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong, China; Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
Atanas G Atanasov: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria; Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, Magdalenka, Poland; Institute of Neurobiology, Bulgarian Academy of Sciences, Sofia, Bulgaria; Department of Pharmacognosy, University of Vienna, Vienna, Austria
Helen Sheridan: NatPro Centre. School of Pharmacy and Pharmaceutical Sciences, Trinity College Dublin, Dublin, Ireland
Elisabeth Klager: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
Fabian Eibensteiner: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria; Division of Pediatric Nephrology and Gastroenterology, Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics, Medical University of Vienna, Vienna, Austria
Sabine Völkl-Kernsock: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
Maria Kletecka-Pulker: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria
Harald Willschke: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria; Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University Vienna, Vienna, Austria
Eva Schaden: Ludwig Boltzmann Institute for Digital Health and Patient Safety, Medical University of Vienna, Vienna, Austria; Department of Anaesthesia, Intensive Care Medicine and Pain Medicine, Medical University Vienna, Vienna, Austria

Rodrigues JF, Florea L, de Oliveira MCF, Diamond D, Oliveira ON. Big data and machine learning for materials science. Discover Materials 2021;1:12. [PMID: 33899049 PMCID: PMC8054236 DOI: 10.1007/s43939-021-00012-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 05/11/2023]

Thakkar A, Johansson S, Jorner K, Buttar D, Reymond JL, Engkvist O. Artificial intelligence and automation in computer aided synthesis planning. React Chem Eng 2021. [DOI: 10.1039/d0re00340a] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/13/2022]

Matter H, Buning C, Stefanescu DD, Ruf S, Hessler G. Using Graph Databases to Investigate Trends in Structure–Activity Relationship Networks. J Chem Inf Model 2020;60:6120-6134. [DOI: 10.1021/acs.jcim.0c00947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil II: Ausblick. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]

Tetko IV, Engkvist O. From Big Data to Artificial Intelligence: chemoinformatics meets new challenges. J Cheminform 2020;12:74. [PMID: 33339533 PMCID: PMC7747384 DOI: 10.1186/s13321-020-00475-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/18/2020] [Accepted: 11/18/2020] [Indexed: 12/17/2022] Open

Abuín JM, Lopes N, Ferreira L, Pena TF, Schmidt B. Big Data in metagenomics: Apache Spark vs MPI. PLoS One 2020;15:e0239741. [PMID: 33022000 PMCID: PMC7537910 DOI: 10.1371/journal.pone.0239741] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/06/2020] [Accepted: 09/14/2020] [Indexed: 11/23/2022] Open

Abstract

The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem while the need to deal with low level details, such as data distribution schemes or communication patterns among processing nodes, can be ignored. However, performance and scalability are also of high importance when dealing with increasing problem sizes, which makes High Performance Computing (HPC) technologies such as the message passing interface (MPI) a promising alternative. Recently, MetaCacheSpark, an Apache Spark based software for detection and quantification of species composition in food samples, has been proposed. This tool can be used to analyze high throughput sequencing data sets of metagenomic DNA and allows for dealing with large-scale collections of complex eukaryotic and bacterial reference genomes. In this work, we propose MetaCache-MPI, a fast and memory efficient solution for computing clusters which is based on MPI instead of Apache Spark. In order to evaluate its performance, a comparison is performed between the original single CPU version of MetaCache, the Spark version and the MPI version we are introducing. Results show that for 32 processes, MetaCache-MPI is 1.65× faster while consuming 48.12% of the RAM used by Spark for building a metagenomics database. For querying this database, also with 32 processes, the MPI version is 3.11× faster, while using 55.56% of the memory used by Spark. 
We conclude that the new MetaCache-MPI version is faster in both building and querying the database and uses less RAM, when compared with MetaCacheSpark, while keeping the accuracy of the original implementation.

McDonagh JL, Swope WC, Anderson RL, Johnston MA, Bray DJ. What can digitisation do for formulated product innovation and development? Polym Int 2020. [DOI: 10.1002/pi.6056] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/19/2022]

Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part II: Outlook. Angew Chem Int Ed Engl 2020;59:23414-23436. [PMID: 31553509 DOI: 10.1002/anie.201909989] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 08/06/2019] [Indexed: 01/19/2023]

Kostal J, Voutchkova-Kostal A. Going All In: A Strategic Investment in In Silico Toxicology. Chem Res Toxicol 2020;33:880-888. [PMID: 32166946 DOI: 10.1021/acs.chemrestox.9b00497] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Wang Y. User online behavior based on big data distributed clustering algorithm. INT J ADV ROBOT SYST 2020. [DOI: 10.1177/1729881420917293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Ma R, Li Y, Li C, Wan F, Hu H, Xu W, Zeng J. Secure multiparty computation for privacy-preserving drug discovery. Bioinformatics 2020;36:2872-2880. [DOI: 10.1093/bioinformatics/btaa038] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2019] [Revised: 01/08/2020] [Accepted: 01/15/2020] [Indexed: 01/24/2023]

Abstract

Motivation: Quantitative structure–activity relationship (QSAR) and drug–target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical institutions can lead to better performance in both QSAR and DTI prediction. However, the drug-related data privacy and intellectual property issues have become a noticeable hindrance for inter-institutional collaboration in drug discovery.

Results: We have developed two novel algorithms under secure multiparty computation (MPC), including QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect the situations in real drug discovery scenarios, we have demonstrated that DTIMPC possesses significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature and shows the feasible scalability to handle growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery.

Availability and implementation: The source codes of QSARMPC and DTIMPC are available on the GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git.

Supplementary information: Supplementary data are available at Bioinformatics online.

Grizou J, Points LJ, Sharma A, Cronin L. A curious formulation robot enables the discovery of a novel protocell behavior. SCIENCE ADVANCES 2020;6:eaay4237. [PMID: 32064348 PMCID: PMC6994213 DOI: 10.1126/sciadv.aay4237] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Accepted: 11/20/2019] [Indexed: 05/11/2023]

Pinzi L, Rastelli G. Identification of Target Associations for Polypharmacology from Analysis of Crystallographic Ligands of the Protein Data Bank. J Chem Inf Model 2019;60:372-390. [PMID: 31800237 DOI: 10.1021/acs.jcim.9b00821] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Karaman B, Sippl W. Computational Drug Repurposing: Current Trends. Curr Med Chem 2019;26:5389-5409. [DOI: 10.2174/0929867325666180530100332] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2018] [Revised: 05/06/2018] [Accepted: 05/14/2018] [Indexed: 01/31/2023]

Tarasova OA, Biziukova NY, Filimonov DA, Poroikov VV, Nicklaus MC. Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications. J Chem Inf Model 2019;59:3635-3644. [PMID: 31453694 DOI: 10.1021/acs.jcim.9b00164] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

de Almeida AF, Moreira R, Rodrigues T. Synthetic organic chemistry driven by artificial intelligence. Nat Rev Chem 2019. [DOI: 10.1038/s41570-019-0124-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Louzoun‐Zada S, Jaber QZ, Fridman M. Guiding Drugs to Target‐Harboring Organelles: Stretching Drug‐Delivery to a Higher Level of Resolution. Angew Chem Int Ed Engl 2019. [DOI: 10.1002/ange.201906284] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Louzoun-Zada S, Jaber QZ, Fridman M. Guiding Drugs to Target-Harboring Organelles: Stretching Drug-Delivery to a Higher Level of Resolution. Angew Chem Int Ed Engl 2019;58:15584-15594. [PMID: 31237741 DOI: 10.1002/anie.201906284] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Indexed: 01/04/2023]

Savosina PI, Stolbov LA, Druzhilovskiy DS, Filimonov DA, Nicklaus MC, Poroikov VV. [Discovering new antiretroviral compounds in "Big Data" chemical space of the SAVI library]. BIOMEDIT︠S︡INSKAI︠A︡ KHIMII︠A︡ 2019;65:73-79. [PMID: 30950810 DOI: 10.18097/pbmc20196502073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Osypenko A, Dhers S, Lehn JM. Pattern Generation and Information Transfer through a Liquid/Liquid Interface in 3D Constitutional Dynamic Networks of Imine Ligands in Response to Metal Cation Effectors. J Am Chem Soc 2019;141:12724-12737. [DOI: 10.1021/jacs.9b05438] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Duros V, Grizou J, Sharma A, Mehr SHM, Bubliauskas A, Frei P, Miras HN, Cronin L. Intuition-Enabled Machine Learning Beats the Competition When Joint Human-Robot Teams Perform Inorganic Chemical Experiments. J Chem Inf Model 2019;59:2664-2671. [PMID: 31025861 PMCID: PMC6593393 DOI: 10.1021/acs.jcim.9b00304] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Sosnin S, Vashurina M, Withnall M, Karpov P, Fedorov M, Tetko IV. A Survey of Multi-task Learning Methods in Chemoinformatics. Mol Inform 2019;38:e1800108. [PMID: 30499195 PMCID: PMC6587441 DOI: 10.1002/minf.201800108] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2018] [Accepted: 10/16/2018] [Indexed: 01/09/2023]

Lovrić M, Molero JM, Kern R. PySpark and RDKit: Moving towards Big Data in Cheminformatics. Mol Inform 2019;38:e1800082. [DOI: 10.1002/minf.201800082] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Accepted: 02/18/2019] [Indexed: 12/16/2022]

Sosnin S, Karlov D, Tetko IV, Fedorov MV. Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space. J Chem Inf Model 2019;59:1062-1072. [PMID: 30589269 DOI: 10.1021/acs.jcim.8b00685] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

A Strength-Weaknesses-Opportunities-Threats (SWOT) Analysis of Cheminformatics in Natural Product Research. PROGRESS IN THE CHEMISTRY OF ORGANIC NATURAL PRODUCTS 2019;110:239-271. [PMID: 31621015 DOI: 10.1007/978-3-030-14632-0_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Zhavoronkov A, Mamoshina P, Vanhaelen Q, Scheibye-Knudsen M, Moskalev A, Aliper A. Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Res Rev 2019;49:49-66. [PMID: 30472217 DOI: 10.1016/j.arr.2018.11.003] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2018] [Revised: 11/07/2018] [Accepted: 11/21/2018] [Indexed: 12/14/2022]

Prabhu GRD, Witek HA, Urban PL. Telechemistry: monitoring chemical reactions via the cloud using the Particle Photon Wi-Fi module. REACT CHEM ENG 2019. [DOI: 10.1039/c9re00043g] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]