1
Korolev V, Protsenko P. Accurate, interpretable predictions of materials properties within transformer language models. Patterns (N Y) 2023; 4:100803. [PMID: 37876904 PMCID: PMC10591138 DOI: 10.1016/j.patter.2023.100803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 06/06/2023] [Accepted: 07/04/2023] [Indexed: 10/26/2023]
Abstract
Property prediction accuracy has long been a key parameter of machine learning in materials informatics. Accordingly, advanced models showing state-of-the-art performance turn into highly parameterized black boxes missing interpretability. Here, we present an elegant way to make their reasoning transparent. Human-readable text-based descriptions automatically generated within a suite of open-source tools are proposed as materials representation. Transformer language models pretrained on 2 million peer-reviewed articles take as input well-known terms such as chemical composition, crystal symmetry, and site geometry. Our approach outperforms crystal graph networks by classifying four out of five analyzed properties if one considers all available reference data. Moreover, fine-tuned text-based models show high accuracy in the ultra-small data limit. Explanations of their internal machinery are produced using local interpretability techniques and are faithful and consistent with domain expert rationales. This language-centric framework makes accurate property predictions accessible to people without artificial-intelligence expertise.
Affiliation(s)
- Vadim Korolev
- Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
- Pavel Protsenko
- Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
2
Huang G, Guo Y, Chen Y, Nie Z. Application of Machine Learning in Material Synthesis and Property Prediction. Materials (Basel) 2023; 16:5977. [PMID: 37687675 PMCID: PMC10488794 DOI: 10.3390/ma16175977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023]
Abstract
Material innovation plays a very important role in technological progress and industrial development. Traditional experimental exploration and numerical simulation often require considerable time and resources. A new approach is urgently needed to accelerate the discovery and exploration of new materials. Machine learning can greatly reduce computational costs, shorten the development cycle, and improve computational accuracy. It has become one of the most promising research approaches in the process of novel material screening and material property prediction. In recent years, machine learning has been widely used in many fields of research, such as superconductivity, thermoelectrics, photovoltaics, catalysis, and high-entropy alloys. In this review, the basic principles of machine learning are briefly outlined. Several commonly used algorithms in machine learning models and their primary applications are then introduced. The research progress of machine learning in predicting material properties and guiding material synthesis is discussed. Finally, a future outlook on machine learning in the materials science field is presented.
Affiliation(s)
- Zhengwei Nie
- School of Mechanical and Power Engineering, Nanjing Tech University, Nanjing 211816, China
3
Sun H, Zhang H, Ren G, Zhang C. A Knowledge Transfer Framework for General Alloy Materials Properties Prediction. Materials (Basel) 2022; 15:7442. [PMID: 36363034 PMCID: PMC9654329 DOI: 10.3390/ma15217442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 10/03/2022] [Accepted: 10/10/2022] [Indexed: 06/16/2023]
Abstract
Biomedical metal implants have many applications in clinical treatment. Due to a variety of application requirements, alloy materials with specific properties are being designed continuously. The traditional alloy properties testing experiment is faced with high-cost and time-consuming challenges. Machine learning can accurately predict the properties of materials at a lower cost. However, the predicted performance is limited by the material dataset. We propose a calculation framework of alloy properties based on knowledge transfer. The purpose of the framework is to improve the prediction performance of machine learning models on material datasets. In addition to assembling the experiment dataset, the simulation dataset is also generated manually in the proposed framework. Domain knowledge is extracted from the simulation data and transferred to help train experiment data by the framework. The high accuracy of the simulation data (above 0.9) shows that the framework can effectively extract domain knowledge. With domain knowledge, the prediction performance of experimental data can reach more than 0.8. And it is 10% higher than the traditional machine learning method. The explanatory ability of the model is enhanced with the help of domain knowledge. In addition, five tasks are applied to show the framework is a general method.
Affiliation(s)
- Hang Sun
- School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Heye Zhang
- School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Guangli Ren
- Department of Pediatric, General Hospital of Southern Theater Command of PLA, Guangzhou 510010, China
- Chao Zhang
- School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
4
Li Z, Jiang M, Wang S, Zhang S. Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022;:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]
Abstract
With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of molecular data (1D, 2D, and 3D). In addition, we discuss some common DL models, such as ensemble learning and transfer learning, and analyze the interpretability methods for these models. We also highlight the challenges and opportunities of DL methods for molecular representation and property prediction.
5
Endo S. Applicability Domain of Polyparameter Linear Free Energy Relationship Models Evaluated by Leverage and Prediction Interval Calculation. Environ Sci Technol 2022; 56:5572-5579. [PMID: 35420030 PMCID: PMC9069697 DOI: 10.1021/acs.est.2c00865] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 05/12/2023]
Abstract
Polyparameter linear free energy relationships (PP-LFERs) are accurate and robust models employed to predict equilibrium partition coefficients (K) of organic chemicals. The accuracy of predictions by a PP-LFER depends on the composition of the respective calibration data set. Generally, extrapolation outside the domain defined by the calibration data is likely to be less accurate than interpolation. In this study, the applicability domain (AD) of PP-LFERs was systematically evaluated by calculating the leverage (h) and prediction interval (PI). Repeated simulations with experimental data showed that the root mean squared error of predictions increased with h. However, the analysis also showed that PP-LFERs calibrated with a large number (e.g., 100) of training data were highly robust against extrapolation error. For such PP-LFERs, the common definition of extrapolation (h > 3 hmean, where hmean is the mean h of all training compounds) may be excessively strict. Alternatively, the PI is proposed as a metric to define the AD of PP-LFERs, as it provides a concrete estimate of the error range that agrees well with the observed errors, even for extreme extrapolations. Additionally, published PP-LFERs were evaluated in terms of their AD using the new concept of AD probes, which indicated the varying predictive performance of PP-LFERs in the existing literature for environmentally relevant compounds.
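The leverage criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares model whose design matrix includes an intercept column, computes the leverage as h = x^T (X^T X)^-1 x, and flags a query as an extrapolation when h > 3 hmean. The function names, the use of NumPy, and the toy one-descriptor data are assumptions made for illustration; this is not the author's code, and it does not cover the prediction-interval calculation.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h = x^T (X^T X)^-1 x of each query row w.r.t. the training design matrix."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    # Row-wise quadratic form x_i^T (X^T X)^-1 x_i
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)

def is_extrapolation(X_train, X_query, factor=3.0):
    """Flag query rows whose leverage exceeds factor * mean training leverage."""
    h_mean = leverages(X_train, X_train).mean()
    return leverages(X_train, X_query) > factor * h_mean

# Hypothetical calibration set: intercept column plus one descriptor.
X_cal = [[1, 0], [1, 1], [1, 2], [1, 3]]
# One query inside the calibrated range and one far outside it.
X_new = [[1, 1.5], [1, 10]]
flags = is_extrapolation(X_cal, X_new)
```

The training leverages always sum to the number of fitted parameters, so hmean equals p/n. With more training compounds the threshold falls, which is worth keeping in mind given the abstract's remark that the 3 hmean rule may be too strict for large calibration sets.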
Affiliation(s)
- Satoshi Endo
- Health and Environmental Risk Division, National Institute for Environmental Studies (NIES), Onogawa 16-2, Tsukuba, 305-8506 Ibaraki, Japan
- Graduate School of Engineering, Osaka City University, Sugimoto 3-3-138, Sumiyoshi, 558-8585 Osaka, Japan
- Phone: +81-29-850-2695. Fax: +81-29-850-2870
6
Liu W, Wu Y, Hong Y, Zhang Z, Yue Y, Zhang J. Applications of machine learning in computational nanotechnology. Nanotechnology 2022; 33:162501. [PMID: 34965514 DOI: 10.1088/1361-6528/ac46d7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/05/2021] [Accepted: 12/28/2021] [Indexed: 06/14/2023]
Abstract
Machine learning (ML) has gained extensive attention in recent years due to its powerful data analysis capabilities. It has been successfully applied to many fields and has helped researchers achieve several major theoretical and applied breakthroughs. Some of the notable applications in the field of computational nanotechnology are ML potentials, property prediction, and material discovery. This review summarizes the state-of-the-art research progress in these three fields. ML potentials bridge the efficiency versus accuracy gap between density functional calculations and classical molecular dynamics. For property predictions, ML provides a robust method that eliminates the need for repetitive calculations for different simulation setups. Material design and drug discovery assisted by ML reduce the capital and time investment by orders of magnitude. In this perspective, several common ML potentials and ML models are first introduced. Using these state-of-the-art models, developments in property predictions and material discovery are overviewed. Finally, the paper concludes with an outlook on future directions of data-driven research activities in computational nanotechnology.
Affiliation(s)
- Wenxiang Liu
- Key Laboratory of Hydraulic Machinery Transients (MOE), School of Power and Mechanical Engineering, Wuhan University, Wuhan, Hubei 430072, People's Republic of China
- Yongqiang Wu
- Weichai Power CO., Ltd, Weifang 261061, People's Republic of China
- Yang Hong
- Research Computing, RCAC, Purdue University, West Lafayette, IN 47907, United States of America
- Zhongtao Zhang
- Holland Computing Center, University of Nebraska-Lincoln, Lincoln, NE, United States of America
- Yanan Yue
- Key Laboratory of Hydraulic Machinery Transients (MOE), School of Power and Mechanical Engineering, Wuhan University, Wuhan, Hubei 430072, People's Republic of China
- Jingchao Zhang
- NVIDIA AI Technology Center (NVAITC), Santa Clara, CA 95051, United States of America
7
Tang L, Zhu W. Computational Design of High Energy RDX-Based Derivatives: Property Prediction, Intermolecular Interactions, and Decomposition Mechanisms. Molecules 2021; 26:7199. [PMID: 34885779 DOI: 10.3390/molecules26237199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2021] [Revised: 11/23/2021] [Accepted: 11/25/2021] [Indexed: 11/17/2022]
Abstract
A series of new high-energy insensitive compounds were designed based on the 1,3,5-trinitro-1,3,5-triazinane (RDX) skeleton by incorporating -N(NO2)-CH2-N(NO2)-, -N(NH2)-, -N(NO2)-, and -O- linkages. Their electronic structures, heats of formation, detonation properties, and impact sensitivities were then analyzed and predicted using DFT. The types of intermolecular interactions in their bimolecular assemblies were analyzed. The thermal decomposition of one compound with excellent performance was studied through ab initio molecular dynamics simulations. All the designed compounds exhibit excellent detonation properties superior to those of 2,4,6,8,10,12-hexanitro-2,4,6,8,10,12-hexaazaisowurtzitane (CL-20), and lower impact sensitivity than CL-20. Thus, they may be viewed as promising candidates for high energy density compounds. Overall, our design strategy of constructing bicyclic or cage compounds based on the RDX framework by incorporating intermolecular linkages is very beneficial for developing novel energetic compounds with excellent detonation performance and low sensitivity.
8
Abstract
From studying the atomic structure and chemical behavior to the discovery of new materials and investigating properties of existing materials, machine learning (ML) has been employed in realms that are arduous to probe experimentally. While numerous highly accurate models, specifically for property prediction, have been reported in the literature, there has been a lack of a generalized framework. Herein we propose a novel feature selection approach that enables the development of a unified ML model for property prediction for several classes of materials. It involves an ingenious blending of selected features from various classes of data such that the resultant feature set equips the model with global data descriptors capturing both class-specific as well as global traits. We took accurate band gaps of three distinct classes of 2D materials as our target property to develop the proposed feature blending approach. Using Gaussian process regression (GPR) with the blended features, the ML model developed here resulted in an average root-mean-squared error of 0.12 eV for unseen data belonging to any of the participating classes. The feature blending approach proposed here can be extended to additional classes of materials and also to predict other properties.
9
Abstract
Deep learning, an emerging field of artificial intelligence based on neural networks in machine learning, has been applied in various fields and is highly valued. Herein, we mainly review several mainstream architectures in deep learning, including deep neural networks, convolutional neural networks and recurrent neural networks in the field of drug discovery. The applications of these architectures in molecular de novo design, property prediction, biomedical imaging and synthetic planning have also been explored. Apart from that, we further discuss the future direction of the deep learning approaches and the main challenges we need to address.
Affiliation(s)
- Feng Wang
- College of Information Science and Engineering, Huaide College of Changzhou University, Taizhou 214500, China
- XiaoMin Diao
- College of Information Science and Engineering, Huaide College of Changzhou University, Taizhou 214500, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
10
De Breuck PP, Evans ML, Rignanese GM. Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet. J Phys Condens Matter 2021; 33:404002. [PMID: 34237716 DOI: 10.1088/1361-648x/ac1280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Accepted: 07/08/2021] [Indexed: 06/13/2023]
Abstract
As the number of novel data-driven approaches to material science continues to grow, it is crucial to perform consistent quality, reliability and applicability assessments of model performance. In this paper, we benchmark the Materials Optimal Descriptor Network (MODNet) method and architecture against the recently released MatBench v0.1, a curated test suite of materials datasets. MODNet is shown to outperform current leaders on 6 of the 13 tasks, while closely matching the current leaders on a further 2 tasks; MODNet performs particularly well when the number of samples is below 10 000. Attention is paid to two topics of concern when benchmarking models. First, we encourage the reporting of a more diverse set of metrics as it leads to a more comprehensive and holistic comparison of model performance. Second, an equally important task is the uncertainty assessment of a model towards a target domain. Significant variations in validation errors can be observed, depending on the imbalance and bias in the training set (i.e., similarity between training and application space). By using an ensemble MODNet model, confidence intervals can be built and the uncertainty on individual predictions can be quantified. Imbalance and bias issues are often overlooked, and yet are important for successful real-world applications of machine learning in materials science and condensed matter.
Affiliation(s)
- Pierre-Paul De Breuck
- Université catholique de Louvain (UCLouvain), Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, B-1348 Louvain-la-Neuve, Belgium
- Matthew L Evans
- Université catholique de Louvain (UCLouvain), Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, B-1348 Louvain-la-Neuve, Belgium
- Gian-Marco Rignanese
- Université catholique de Louvain (UCLouvain), Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, B-1348 Louvain-la-Neuve, Belgium
11
Abstract
Prediction of molecular properties plays a critical role towards rational drug design. In this study, the Molecular Topographic Map (MTM) is proposed, which is a two-dimensional (2D) map that can be used to represent a molecule. An MTM is generated from the atomic features set of a molecule using generative topographic mapping and is then used as input data for analyzing structure-property/activity relationships. In the visualization and classification of 20 amino acids, differences of the amino acids can be visually confirmed from and revealed by hierarchical clustering with a similarity matrix of their MTMs. The prediction of molecular properties was performed on the basis of convolutional neural networks using MTMs as input data. The performance of the predictive models using MTM was found to be equal to or better than that using Morgan fingerprint or MACCS keys. Furthermore, data augmentation of MTMs using mixup has improved the prediction performance. Since molecules converted to MTMs can be treated like 2D images, they can be easily used with existing neural networks for image recognition and related technologies. MTM can be effectively utilized to predict molecular properties of small molecules to aid drug discovery research.
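The mixup augmentation mentioned in this abstract can be sketched as follows. Mixup forms a convex combination of two training examples and of their labels, with the mixing weight usually drawn from a Beta(alpha, alpha) distribution. The function names, the default alpha, and the tiny 1x2 "maps" are hypothetical; how the authors applied mixup to MTMs is not described in the abstract.

```python
import random

def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two 2D maps (lists of rows) and their label vectors with weight lam."""
    x_mix = [
        [lam * p + (1.0 - lam) * q for p, q in zip(row_a, row_b)]
        for row_a, row_b in zip(x_a, x_b)
    ]
    y_mix = [lam * p + (1.0 - lam) * q for p, q in zip(y_a, y_b)]
    return x_mix, y_mix

def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha); alpha is a tunable assumption."""
    return rng.betavariate(alpha, alpha)

# Two hypothetical 1x2 maps with one-hot class labels.
x_mixed, y_mixed = mixup([[0.0, 4.0]], [1.0, 0.0], [[4.0, 0.0]], [0.0, 1.0], lam=0.25)
```

Because the maps are treated like 2D images, the same blending works for any two arrays of equal shape.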
Affiliation(s)
- Atsushi Yoshimori
- Institute for Theoretical Medicine, Inc., 26-1, Muraoka-Higashi 2-chome, Fujisawa 251-0012, Japan
12
Li C, Wang J, Niu Z, Yao J, Zeng X. A spatial-temporal gated attention module for molecular property prediction based on molecular geometry. Brief Bioinform 2021; 22:6210061. [PMID: 33822856 DOI: 10.1093/bib/bbab078] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/02/2021] [Revised: 02/04/2021] [Accepted: 02/19/2021] [Indexed: 12/27/2022]
Abstract
MOTIVATION: Geometry-based properties and characteristics of drug molecules play an important role in drug development for virtual screening in computational chemistry. The 3D characteristics of molecules largely determine the properties of the drug and the binding characteristics of the target. However, most previous studies focused on 1D or 2D molecular descriptors while ignoring the 3D topological structure, thereby degrading the performance of molecule-related prediction. Because simulating 3D molecular conformers with dynamics is very time-consuming, we aim to use machine learning to represent 3D molecules using 3D molecular coordinates generated from the 2D structure. RESULTS: We propose Drug3D-Net, a novel deep neural network architecture based on the spatial geometric structure of molecules for predicting molecular properties. It is a grid-based 3D convolutional neural network with a spatial-temporal gated attention module, which can extract geometric features for molecular prediction tasks during convolution. The effectiveness of Drug3D-Net is verified on public molecular datasets. Compared with other deep learning methods, Drug3D-Net shows superior performance in predicting molecular properties and biochemical activities. AVAILABILITY AND IMPLEMENTATION: https://github.com/anny0316/Drug3D-Net. SUPPLEMENTARY DATA: Supplementary data are available online at https://academic.oup.com/bib.
Affiliation(s)
- Chunyan Li
- School of Informatics, Xiamen University, Xiamen 361005, China; Yunnan Minzu University, Kunming 650500, China
- Jianmin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410205, China
- Junfeng Yao
- School of Informatics, Xiamen University, Xiamen 361005, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410205, China
13
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology & Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
14
Togo R, Saito N, Maeda K, Ogawa T, Haseyama M. Rubber Material Property Prediction Using Electron Microscope Images of Internal Structures Taken under Multiple Conditions. Sensors (Basel) 2021; 21:2088. [PMID: 33809765 DOI: 10.3390/s21062088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2021] [Revised: 03/10/2021] [Accepted: 03/10/2021] [Indexed: 11/22/2022]
Abstract
A method for prediction of properties of rubber materials utilizing electron microscope images of internal structures taken under multiple conditions is presented in this paper. Electron microscope images of rubber materials are taken under several conditions, and effective conditions for the prediction of properties are different for each rubber material. Novel approaches for the selection and integration of reliable prediction results are used in the proposed method. The proposed method enables selection of reliable results based on prediction intervals that can be derived by the predictors that are each constructed from electron microscope images taken under each condition. By monitoring the relationship between prediction results and prediction intervals derived from the corresponding predictors, it can be determined whether the target prediction results are reliable. Furthermore, the proposed method integrates the selected reliable results based on Dempster–Shafer (DS) evidence theory, and this integration result is regarded as a final prediction result. The DS evidence theory enables integration of multiple prediction results, even if the results are obtained from different imaging conditions. This means that integration can even be realized if electron microscope images of each material are taken under different conditions and even if these conditions are different for target materials. This nonconventional approach is suitable for our application, i.e., property prediction. Experiments on rubber material data showed that the evaluation index mean absolute percent error (MAPE) was under 10% by the proposed method. The performance of the proposed method outperformed conventional comparative property estimation methods. Consequently, the proposed method can realize accurate prediction of the properties with consideration of the characteristic of electron microscope images described above.
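The integration step in this abstract relies on Dempster-Shafer evidence theory. A minimal sketch of Dempster's rule of combination for two sources is given below, assuming each source reports a mass function over subsets of a small frame of discernment. The frame ("low"/"high" property classes), the mass values, and the function name are hypothetical; the abstract does not specify how the authors map prediction intervals to masses.

```python
from itertools import product

def combine_masses(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize by 1 - conflict."""
    combined = {}
    conflict = 0.0
    for (set_a, w_a), (set_b, w_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + w_a * w_b
        else:
            conflict += w_a * w_b  # mass assigned to contradictory hypotheses
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {hyp: w / (1.0 - conflict) for hyp, w in combined.items()}

LOW = frozenset({"low"})
HIGH = frozenset({"high"})
EITHER = frozenset({"low", "high"})

# Hypothetical evidence from predictors trained on two imaging conditions.
source_1 = {LOW: 0.6, EITHER: 0.4}
source_2 = {LOW: 0.5, HIGH: 0.3, EITHER: 0.2}
fused = combine_masses(source_1, source_2)
```

Mass placed on the whole frame (EITHER here) expresses ignorance, which is how a source with a wide prediction interval could be down-weighted without being discarded.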
15
Tsukanov A, Ivonin D, Gotman I, Gutmanas EY, Grachev E, Pervikov A, Lerner M. Effect of Cold-Sintering Parameters on Structure, Density, and Topology of Fe-Cu Nanocomposites. Materials (Basel) 2020; 13:541. [PMID: 31979235 PMCID: PMC7040682 DOI: 10.3390/ma13030541] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/15/2019] [Revised: 01/16/2020] [Accepted: 01/19/2020] [Indexed: 12/01/2022]
Abstract
The design of advanced nanostructured materials with predetermined physical properties requires knowledge of the relationship between these properties and the internal structure of the material at the nanoscale, as well as the dependence of the internal structure on the production (synthesis) parameters. This work is the first report of computer-aided analysis of high pressure consolidation (cold sintering) of bimetallic nanoparticles of two immiscible (Fe and Cu) metals using the embedded atom method (EAM). A detailed study of the effect of cold sintering parameters on the internal structure and properties of bulk Fe–Cu nanocomposites was conducted within the limitations of the numerical model. The variation of estimated density and bulk porosity as a function of Fe-to-Cu ratio and consolidation pressure was found in good agreement with the experimental data. For the first time, topological analysis using Minkowski functionals was applied to characterize the internal structure of a bimetallic nanocomposite. The dependence of topological invariants on input processing parameters was described for various components and structural phases. The model presented allows formalizing the relationship between the internal structure and properties of the studied nanocomposites. Based on the obtained topological invariants and Hadwiger’s theorem we propose a new tool for computer-aided design of bimetallic Fe–Cu nanocomposites.
Affiliation(s)
- Alexey Tsukanov
- Center for Computational and Data-Intensive Science and Engineering (CDISE), Skolkovo Institute of Science and Technology (Skoltech), 30, bld. 1, Bolshoy Boulevard, 121205 Moscow, Russia
- Dmitriy Ivonin
- Faculty of Physics, Lomonosov Moscow State University, GSP-1, 1-2 Leninskie Gory, 119991 Moscow, Russia
- Irena Gotman
- Department of Mechanical Engineering, ORT Braude College, Karmiel 2161002, Israel
- Elazar Y. Gutmanas
- Department of Materials Science and Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Eugene Grachev
- Faculty of Physics, Lomonosov Moscow State University, GSP-1, 1-2 Leninskie Gory, 119991 Moscow, Russia
- Aleksandr Pervikov
- Institute of Strength Physics and Materials Science of SB RAS, 2/4, pr. Akademicheskii, 634055 Tomsk, Russia
- Marat Lerner
- Institute of Strength Physics and Materials Science of SB RAS, 2/4, pr. Akademicheskii, 634055 Tomsk, Russia
16
Abstract
Artificial Intelligence (AI) plays a pivotal role in drug discovery. In particular artificial neural networks such as deep neural networks or recurrent networks drive this area. Numerous applications in property or activity predictions like physicochemical and ADMET properties have recently appeared and underpin the strength of this technology in quantitative structure-property relationships (QSPR) or quantitative structure-activity relationships (QSAR). Artificial intelligence in de novo design drives the generation of meaningful new biologically active molecules towards desired properties. Several examples establish the strength of artificial intelligence in this field. Combination with synthesis planning and ease of synthesis is feasible and more and more automated drug discovery by computers is expected in the near future.
Affiliation(s)
- Gerhard Hessler
- R&D, Integrated Drug Discovery, Industriepark Hoechst, 65926 Frankfurt am Main, Germany.
17
Abstract
Chemoinformatics provides computer methods for learning from chemical data and for modeling tasks a chemist is facing. The field has evolved in the past 50 years and has substantially shaped how chemical research is performed by providing access to chemical information on a scale unattainable by traditional methods. Many physical, chemical and biological data have been predicted from structural data. For the early phases of drug design, methods have been developed that are used in all major pharmaceutical companies. However, all domains of chemistry can benefit from chemoinformatics methods; many areas that are not yet well developed, but could substantially gain from the use of chemoinformatics methods. The quality of data is of crucial importance for successful results. Computer-assisted structure elucidation and computer-assisted synthesis design have been attempted in the early years of chemoinformatics. Because of the importance of these fields to the chemist, new approaches should be made with better hardware and software techniques. Society's concern about the impact of chemicals on human health and the environment could be met by the development of methods for toxicity prediction and risk assessment. In conjunction with bioinformatics, our understanding of the events in living organisms could be deepened and, thus, novel strategies for curing diseases developed. With so many challenging tasks awaiting solutions, the future is bright for chemoinformatics.
18
Skvortsov VS, Alekseychuk NN, Khudyakov DV, Romero Reyes IV. [pIPredict: a computer tool for predicting isoelectric points of peptides and proteins]. Biomed Khim 2015; 61:83-91. [PMID: 25762601 DOI: 10.18097/pbmc20156101083] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/23/2022]
Abstract
The data on approximate values of isoelectric point (pI) of peptides obtained during their fractionation by isoelectric focusing can be successfully used for the calculation of the pKa's scale for amino acid residues. This scale can be used for pI prediction. The data of peptide fractionation also provides information about various posttranslational modifications (PTM), so that the prediction of pI may be performed for a wide range of protein forms. In this study, pKa values were calculated using a set of 13448 peptides (including 300 peptides with PTMs significant for pI calculation). The pKa constants were calculated for N-terminal, internal and C-terminal amino acid residues separately. The comparative analysis has shown that our scale increases the accuracy of pI prediction for peptides and proteins and successfully competes with traditional scales and such methods as support vector machines and artificial neural networks. The prediction performed by this scale, can be made in our program pIPredict with GUI written in JAVA as executable jar-archive. The program is freely available for academic users at http://www.ibmc.msk.ru/LPCIT/pIPredict. The software has also the possibility of pI predicting by some other scales; it recognizes some PTM and has the ability to use a custom scale.
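How a pKa scale yields a pI prediction can be sketched as follows: the net charge of a peptide at a given pH is summed from Henderson-Hasselbalch terms for each ionizable group, and the pI is the pH at which that charge crosses zero. The pKa values below are rounded, textbook-style placeholders chosen for illustration. They are not the scale derived in the paper, which also distinguishes N-terminal, internal, and C-terminal residues and handles PTMs; the function names are likewise hypothetical and unrelated to the pIPredict program.

```python
# Illustrative placeholder pKa values (assumption, not the published scale).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0

def net_charge(sequence, ph):
    """Net charge of a peptide at the given pH from Henderson-Hasselbalch terms."""
    positive = [N_TERM_PKA] + [POSITIVE_PKA[a] for a in sequence if a in POSITIVE_PKA]
    negative = [C_TERM_PKA] + [NEGATIVE_PKA[a] for a in sequence if a in NEGATIVE_PKA]
    charge = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in positive)
    charge -= sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in negative)
    return charge

def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection on pH; valid because the net charge decreases monotonically with pH."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

pi_basic = isoelectric_point("GKKRG")
pi_acidic = isoelectric_point("GDDEG")
```

A custom scale, which the abstract says the program supports, would in this sketch amount to swapping in a different set of pKa dictionaries.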
19
Rybina AV, Skvortsov VS, Kopylov AT, Zgoda VG. [A plain method of prediction of visibility of peptides in mass spectrometry with electrospray ionization]. Biomed Khim 2015; 60:707-12. [PMID: 25552513 DOI: 10.18097/pbmc20146006707] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
A new method has been proposed for screening essential peptides for protein detection and quantification by direct positive electrospray mass spectrometry. The method is based on predicting the normalized abundance of mass spectrometric peaks using a linear regression model. It has the following limitations: (i) peptides must be selected so that at pH 2.5 they are present mainly as 2+ and 3+ ions; (ii) only peptides with C-terminal lysine or arginine residues are considered. The amino acid composition of the peptide, the peptide concentration, the ratio of the polar surface of the peptide to its total surface, and the ratio of the polar volume to the total volume are used as independent variables in the equation. Several combinations of variables were considered, and the best linear regression model had a determination coefficient of 0.54 in the leave-one-out validation procedure. This model confidently discriminates between peptides with high and low response ability and therefore makes it possible to select only the most promising peptides. This plain and fast screening method can be successfully applied to reduce the list of observed peptides.
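The leave-one-out determination coefficient quoted in this abstract can be illustrated with a short sketch. It assumes one common definition, Q2 = 1 - PRESS / SS_tot, where each sample is predicted by an ordinary least-squares model fitted to all other samples. The abstract does not say which variant the authors used, and the function name, the use of NumPy, and the toy data are assumptions.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out determination coefficient for an OLS model with intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        # Fit on all samples except i, then predict the held-out sample.
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        press += (y[i] - design[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical data: one descriptor, a noise-free response and a noisy one.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
q2_exact = loo_q2(descriptor, [1.0, 3.0, 5.0, 7.0, 9.0])
q2_noisy = loo_q2(descriptor, [1.0, 4.0, 4.0, 8.0, 8.0])
```

Unlike the ordinary R2, this quantity can be negative when held-out predictions are worse than predicting the mean, so a value of 0.54 indicates moderate predictive ability.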
Affiliation(s)
- A V Rybina
- Orekhovich Institute of Biomedical Chemistry
- A T Kopylov
- Orekhovich Institute of Biomedical Chemistry
- V G Zgoda
- Orekhovich Institute of Biomedical Chemistry