Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cravero F, Schustik SA, Martínez MJ, Vázquez GE, Díaz MF, Ponzoni I. Feature Selection for Polymer Informatics: Evaluating Scalability and Robustness of the FS4RV_DD Algorithm Using Synthetic Polydisperse Data Sets. J Chem Inf Model 2020;60:592-603. [PMID: 31790226 DOI: 10.1021/acs.jcim.9b00867] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 02/08/2023]



Cited by Other Article(s)

Tao L, Varshney V, Li Y. Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature. J Chem Inf Model 2021;61:5395-5413. [PMID: 34662106 DOI: 10.1021/acs.jcim.1c01031] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 11/27/2022]

Abstract

In polymer informatics, using machine learning (ML) techniques to evaluate the glass transition temperature T_g and other polymer properties has attracted extensive attention. This data-centric approach is much more efficient and practical than laborious experimental measurement when a daunting number of polymer structures must be screened. Various ML models have been shown to perform well for T_g prediction. However, they are trained on different data sets, use different structure representations, and rely on different feature engineering methods. The critical question is therefore how to select an ML model that handles T_g prediction with good generalization ability. To provide a fair comparison of ML techniques and examine the key factors that affect model performance, we carry out a systematic benchmark study by compiling 79 different ML models and training them on a large and diverse data set. The three major components in setting up an ML model are structure representations, feature representations, and ML algorithms. For polymer structure representation, we consider the polymer monomer, the repeat unit, and an oligomer with a longer chain structure. Based on the chosen structure, a feature representation is calculated, including Morgan fingerprints with or without substructure frequency, RDKit descriptors, molecular embeddings, molecular graphs, etc. The resulting feature input is then used to train different ML algorithms, such as deep neural networks, convolutional neural networks, random forest, support vector machine, LASSO regression, and Gaussian process regression. We evaluate the performance of these ML models using a holdout test set and an extra unlabeled data set from high-throughput molecular dynamics simulation. Particular attention is paid to the models' generalization ability on the unlabeled data set, and the models' sensitivity to the topology and molecular weight of polymers is also taken into consideration. This benchmark study provides not only a guideline for the T_g prediction task but also a useful reference for other polymer informatics tasks.
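
The three-component setup described in the abstract (structure representation, feature representation, ML algorithm) can be pictured as a grid of candidate configurations scored on a holdout set. The Python sketch below is only an illustration under stated assumptions: the component labels are taken from the abstract, but the identifiers, the `benchmark_grid` and `holdout_split` helpers, the 20% test fraction, and the seed are hypothetical. The full cross product is not the authors' list of 79 models, since the abstract does not say which combinations were used.

```python
import random
from itertools import product

# Labels follow the abstract; the identifiers themselves are hypothetical.
STRUCTURES = ["monomer", "repeat_unit", "oligomer"]
FEATURES = [
    "morgan_binary",        # Morgan fingerprint without substructure frequency
    "morgan_frequency",     # Morgan fingerprint with substructure frequency
    "rdkit_descriptors",
    "molecular_embedding",
    "molecular_graph",
]
ALGORITHMS = ["dnn", "cnn", "random_forest", "svm", "lasso", "gpr"]


def benchmark_grid():
    """Return every (structure, feature, algorithm) triple.

    Assumption: the full cross product is enumerated. The study's 79 models
    are a selection whose exact membership the abstract does not state.
    """
    return list(product(STRUCTURES, FEATURES, ALGORITHMS))


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then reserve a fraction as the holdout test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    grid = benchmark_grid()
    print(len(grid), "candidate configurations in the full cross product")
    train, test = holdout_split(range(10))
    print(len(train), "training items,", len(test), "holdout items")
```

In a real benchmark each triple would be mapped to a featurizer and a model, every model would be trained on the same training split, and all would be scored on the same holdout items so that the comparison stays fair.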


Soares TA, Wahab HA. Outlook on the Development and Application of Molecular Simulations in Latin America. J Chem Inf Model 2020;60:435-438. [PMID: 32009389 DOI: 10.1021/acs.jcim.0c00112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]