1. Sharma V, Giammona M, Zubarev D, Tek A, Nguyen K, Sundberg L, Congiu D, La YH. Formulation Graphs for Mapping Structure-Composition of Battery Electrolytes to Device Performance. J Chem Inf Model 2023; 63:6998-7010. PMID: 37948621; PMCID: PMC10685446; DOI: 10.1021/acs.jcim.3c01030.
Abstract
Advanced computational methods are being actively sought to address the challenges associated with the discovery and development of new combinatorial materials, such as formulations. A widely adopted approach involves domain-informed high-throughput screening of individual components that can be combined into a formulation. This accelerates the discovery of new compounds for a target application but still leaves identifying the right "formulation" from the shortlisted chemical space largely a laboratory experiment-driven process. We report a deep learning model, the Formulation Graph Convolution Network (F-GCN), that maps the structure-composition relationship of formulation constituents to the property of the liquid formulation as a whole. Multiple GCNs, assembled in parallel, featurize the formulation constituents on the fly in a domain-intuitive way. The resulting molecular descriptors are scaled by each constituent's molar percentage in the formulation and then integrated into a combined formulation descriptor that represents the complete formulation to an external learning architecture. The use case of the proposed formulation learning model is demonstrated for battery electrolytes by training and testing it on two exemplary data sets pairing electrolyte formulations with battery performance: one sourced from the literature on Li/Cu half-cells, the other obtained from lab experiments on lithium-iodide full-cell chemistry. The model predicts performance metrics such as Coulombic efficiency (CE) and specific capacity of new electrolyte formulations with the lowest reported errors. The best-performing F-GCN model uses molecular descriptors derived from molecular graphs that are informed with HOMO-LUMO and electric moment properties of the molecules via a knowledge transfer technique.
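The descriptor-combination step this abstract describes, scaling each constituent's GCN-derived embedding by its molar fraction and summing into one formulation descriptor, can be sketched as below. This is a minimal illustration, not the authors' code: the function name and toy numbers are assumptions, and plain lists stand in for the paper's GCN outputs.

```python
import math

def formulation_descriptor(embeddings, mole_fractions):
    """Scale each constituent's molecular descriptor by its molar fraction
    and sum them into a single fixed-length formulation descriptor."""
    if not math.isclose(sum(mole_fractions), 1.0, rel_tol=1e-6):
        raise ValueError("mole fractions must sum to 1")
    dim = len(embeddings[0])
    return [sum(x * emb[k] for x, emb in zip(mole_fractions, embeddings))
            for k in range(dim)]

# toy example: two solvents and a salt with 4-dimensional descriptors
emb = [[1.0, 0.0, 2.0, 0.5],
       [0.0, 1.0, 1.0, 0.5],
       [3.0, 3.0, 0.0, 1.0]]
fractions = [0.6, 0.3, 0.1]
descriptor = formulation_descriptor(emb, fractions)  # -> [0.9, 0.6, 1.5, 0.55]
```

The weighted sum keeps the combined descriptor the same length regardless of how many constituents the formulation contains, which is what lets a fixed external learner consume it.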
Affiliation(s)
- Vidushi Sharma
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Maxwell Giammona
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Dmitry Zubarev
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Andy Tek
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Khanh Nguyen
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Linda Sundberg
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Daniele Congiu
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
- Young-Hye La
- IBM Almaden Research Center, 650 Harry Rd, San Jose, California 95120, United States
2. Ng WP, Liang Q, Yang J. Low-Data Deep Quantum Chemical Learning for Accurate MP2 and Coupled-Cluster Correlations. J Chem Theory Comput 2023; 19:5439-5449. PMID: 37506400; DOI: 10.1021/acs.jctc.3c00518.
Abstract
Accurate ab initio prediction of electronic energies by explicitly solving post-Hartree-Fock equations is very expensive for macromolecules. Here we exploit the physically justified local correlation feature in a compact basis of small molecules and construct an expressive low-data deep neural network (dNN) model to obtain machine-learned electron correlation energies on par with MP2 and CCSD levels of theory for more complex molecules and for data sets not represented in the training set. We show that our dNN-powered model is data efficient and makes highly transferable predictions across alkanes of various lengths, organic molecules with non-covalent and biomolecular interactions, and water clusters of different sizes and morphologies. In particular, by training on 800 (H2O)8 clusters with the local correlation descriptors, accurate MP2/cc-pVTZ correlation energies up to (H2O)128 can be predicted with a small random error within chemical accuracy of exact values, while the majority of prediction deviations are attributable to an intrinsically systematic error. Our results reveal that an extremely compact local correlation feature set, although poor for any direct post-Hartree-Fock calculation, has a prominent advantage in preserving the important electron correlation patterns needed for accurate, transferable predictions across distinct molecular compositions, bond types, and geometries.
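The transferability from (H2O)8 to (H2O)128 rests on size-extensivity: the model predicts a correlation contribution per local pair, and the total energy is their sum, so system size only changes how many terms are summed. A minimal sketch of that idea follows; the linear `pair_model` and toy descriptors are stand-in assumptions for the paper's deep network and local correlation features.

```python
def total_correlation_energy(pair_features, pair_model):
    """Sum per-pair correlation contributions predicted from local
    correlation descriptors. Summing over pairs keeps the prediction
    size-extensive, so a model trained on small clusters can be
    applied to much larger ones."""
    return sum(pair_model(f) for f in pair_features)

# stand-in for a trained dNN: a fixed linear map on each pair descriptor
pair_model = lambda f: -0.01 * sum(f)

# two "pairs" with 2-dimensional toy descriptors
e_corr = total_correlation_energy([[1.0, 2.0], [3.0, 4.0]], pair_model)  # -> -0.1
```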
Affiliation(s)
- Wai-Pan Ng
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
- Qiujiang Liang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Jun Yang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
3. Shi J, Albreiki F, Colón YJ, Srivastava S, Whitmer JK. Transfer Learning Facilitates the Prediction of Polymer-Surface Adhesion Strength. J Chem Theory Comput 2023; 19:4631-4640. PMID: 37068204; DOI: 10.1021/acs.jctc.2c01314.
Abstract
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Appl Mater Interfaces 2022, 14, 37161-37169], ML models were applied to predict the adhesive free energy of polymer-surface interactions with high accuracy from sequence data alone, demonstrating success in the inverse design of polymer sequences for known surface compositions. While the method was shown to be successful in designing polymers for a known surface, extensive data sets were needed for each specific surface to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without regenerating a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network that improves the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The knowledge shared by the pretrained model significantly improves prediction accuracy on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problem of inverse materials design.
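The pretrain-then-fine-tune recipe this abstract describes can be reduced to a self-contained sketch: fit a model on plentiful source data, then use those weights, rather than a cold start, to seed a short fine-tune on scarce target data. A one-parameter linear fit stands in for the deep network here, and all function names and numbers are illustrative assumptions.

```python
def gd_fit(xs, ys, w0=0.0, lr=0.01, steps=100):
    """Fit y ~ w * x by gradient descent on squared error, starting from w0."""
    w = w0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

# large "source" dataset with slope 2.0: the pretraining stage
xs_src = [1.0, 2.0, 3.0, 4.0]
w_pre = gd_fit(xs_src, [2.0 * x for x in xs_src], steps=500)

# tiny "target" dataset with a slightly different slope (2.2), few steps
xs_tgt = [1.0, 3.0]
ys_tgt = [2.2 * x for x in xs_tgt]
w_finetuned = gd_fit(xs_tgt, ys_tgt, w0=w_pre, steps=3)  # warm start from pretrained weight
w_scratch = gd_fit(xs_tgt, ys_tgt, w0=0.0, steps=3)      # cold start for comparison
```

With the same small fine-tuning budget, the warm start lands much closer to the target slope than training from scratch, which is the effect the paper exploits when the source and target systems are related.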
Affiliation(s)
- Jiale Shi
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Fahed Albreiki
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- Yamil J Colón
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Samanvaya Srivastava
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- California NanoSystems Institute, University of California, Los Angeles, Los Angeles, California 90095, United States
- Institute for Carbon Management, University of California, Los Angeles, Los Angeles, California 90095, United States
- Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jonathan K Whitmer
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
4. Martin TB, Audus DJ. Emerging Trends in Machine Learning: A Polymer Perspective. ACS Polym Au 2023; 3:239-258. PMID: 37334191; PMCID: PMC10273415; DOI: 10.1021/acspolymersau.2c00053.
Abstract
In the last five years, there has been tremendous growth in machine learning and artificial intelligence as applied to polymer science. Here, we highlight the unique challenges presented by polymers and how the field is addressing them. We focus on emerging trends, with an emphasis on topics that have received less attention in the review literature. Finally, we provide an outlook for the field, outline important growth areas in machine learning and artificial intelligence for polymer science, and discuss notable advances from the greater materials science community.
Affiliation(s)
- Tyler B. Martin
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Debra J. Audus
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
5. Kolluru A, Shoghi N, Shuaibi M, Goyal S, Das A, Zitnick CL, Ulissi Z. Transfer learning using attentions across atomic systems with graph neural networks (TAAG). J Chem Phys 2022; 156:184702. PMID: 35568535; DOI: 10.1063/5.0088019.
Abstract
Recent advances in Graph Neural Networks (GNNs) have transformed the space of molecular and catalyst discovery. Although the underlying physics across these domains remains the same, most prior work has focused on building domain-specific models for either small molecules or materials. However, building large datasets across all domains is computationally expensive; the use of transfer learning (TL) to generalize across domains is therefore a promising but under-explored approach to this problem. To evaluate this hypothesis, we use a model pretrained on the Open Catalyst Dataset (OC20) and study its behavior when fine-tuned on a set of different datasets and tasks, including MD17, the *CO adsorbate dataset, and OC20 across different tasks. Through extensive TL experiments, we demonstrate that the initial layers of GNNs learn a more basic representation that is consistent across domains, whereas the final layers learn more task-specific features. These well-known TL strategies show significant improvement over non-pretrained models for in-domain tasks, with improvements of 53% for the *CO dataset and 17% across the Open Catalyst Project (OCP) task, and result in up to a 4x speedup in model training depending on the target data and task. However, they do not perform well on the MD17 dataset, yielding worse performance than the non-pretrained model for a few molecules. Based on these observations, we propose transfer learning using attentions across atomic systems with Graph Neural Networks (TAAG), an attention-based approach that adapts to prioritize and transfer important features from the interaction layers of GNNs. The proposed method outperforms the best TL approach for out-of-domain datasets, such as MD17, and gives a mean improvement of 6% over a model trained from scratch.
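The attention idea at the heart of TAAG, learning weights that prioritize which pretrained interaction-layer outputs to transfer, can be sketched as a softmax-weighted pooling over per-layer feature vectors. This is a simplified illustration under stated assumptions, not the paper's actual architecture; the function name and toy features are hypothetical.

```python
import math

def attention_pool(layer_features, scores):
    """Softmax learned per-layer scores, then return the weighted sum of
    the interaction-layer feature vectors; higher-scored layers contribute
    more to the transferred representation."""
    m = max(scores)                                # shift for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [v / z for v in w]                         # softmax weights
    dim = len(layer_features[0])
    return [sum(wi * f[k] for wi, f in zip(w, layer_features))
            for k in range(dim)]

# two interaction layers with 2-dimensional toy features and equal scores
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # -> [0.5, 0.5]
```

Because the scores are learned during fine-tuning, the pooling can shift weight toward the early, domain-general layers that the abstract identifies as most transferable.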
Affiliation(s)
- Adeesh Kolluru
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Nima Shoghi
- Meta AI Research, Menlo Park, California 94025, USA
- Muhammed Shuaibi
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Abhishek Das
- Meta AI Research, Menlo Park, California 94025, USA
- Zachary Ulissi
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA