1. Kuroshima D, Kilgour M, Tuckerman ME, Rogal J. Machine Learning Classification of Local Environments in Molecular Crystals. J Chem Theory Comput 2024. [PMID: 38959410 DOI: 10.1021/acs.jctc.4c00418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
Identifying local structural motifs and packing patterns of molecular solids is a challenging task for both simulation and experiment. We demonstrate two novel approaches to characterize local environments in different polymorphs of molecular crystals using learning models that employ either flexibly learned or handcrafted molecular representations. In the first case, we follow our earlier work on graph learning in molecular crystals, deploying an atomistic graph convolutional network combined with molecule-wise aggregation to enable per-molecule environmental classification. For the second model, we develop a new set of descriptors based on symmetry functions combined with a point-vector representation of the molecules, encoding information about the positions and relative orientations of the molecule. We demonstrate very high classification accuracy for both approaches on urea and nicotinamide crystal polymorphs and practical applications to the analysis of dynamical trajectory data for nanocrystals and solid-solid interfaces. Both architectures are applicable to a wide range of molecules and diverse topologies, providing an essential step in the exploration of complex condensed matter phenomena.
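As a rough illustration of what a symmetry-function descriptor looks like, the sketch below computes a generic Behler-Parrinello-style radial symmetry function over a set of molecular centers. It is not the descriptor set developed in the paper, which also encodes relative orientations through a point-vector representation; the function names, the parameter values (`eta`, `r_s`, `r_c`), and the toy coordinates are all assumptions made for this example.

```python
import numpy as np


def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)


def radial_symmetry_function(centers, i, eta, r_s, r_c):
    """G_i = sum over j != i of exp(-eta * (r_ij - r_s)^2) * f_c(r_ij)."""
    r = np.linalg.norm(centers - centers[i], axis=1)
    r = np.delete(r, i)  # exclude the central molecule itself
    return float(np.sum(np.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)))


# Hypothetical molecular centers in arbitrary length units (assumption).
centers = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
)
g0 = radial_symmetry_function(centers, 0, eta=1.0, r_s=1.0, r_c=3.0)
```

Because the value depends only on interparticle distances, it is unchanged by rigid rotations and translations of the whole environment. That invariance is what makes such functions usable as inputs to a per-molecule classifier.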
Affiliation(s)
- Daisuke Kuroshima
  - Department of Chemistry, New York University (NYU), New York, New York 10003, United States
- Michael Kilgour
  - Department of Chemistry, New York University (NYU), New York, New York 10003, United States
- Mark E Tuckerman
  - Department of Chemistry, New York University (NYU), New York, New York 10003, United States
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Rd. North, Shanghai 200062, China
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- Jutta Rogal
  - Department of Chemistry, New York University (NYU), New York, New York 10003, United States
  - Fachbereich Physik, Freie Universität Berlin, Berlin 14195, Germany
2. Kadan A, Ryczko K, Wildman A, Wang R, Roitberg A, Yamazaki T. Accelerated Organic Crystal Structure Prediction with Genetic Algorithms and Machine Learning. J Chem Theory Comput 2023; 19:9388-9402. [PMID: 38059458 DOI: 10.1021/acs.jctc.3c00853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
We present a high-throughput, end-to-end pipeline for organic crystal structure prediction (CSP): the problem of identifying the stable crystal structures that will form from a given molecule based only on its molecular composition. Our tool uses neural network potentials to allow for efficient screening and structural relaxation of generated crystal candidates. Our pipeline consists of two distinct stages: random search, whereby crystal candidates are randomly generated and screened, and optimization, where a genetic algorithm (GA) optimizes this screened population. We assess the performance of each stage of our pipeline on 21 molecules taken from the Cambridge Crystallographic Data Centre's CSP blind tests. We show that random search alone yields matches for ≈50% of targets. We then validate the potential of our full pipeline, making use of the GA to optimize the root-mean-square deviation between crystal candidates and the experimentally derived structure. With this approach, we are able to find matches for ≈80% of candidates with 10-100 times smaller initial population sizes than when using random search. Lastly, we run our full pipeline with an ANI model that is trained on a small data set of molecules extracted from crystal structures in the Cambridge Structural Database, generating ≈60% of targets. By leveraging machine learning models trained to predict energies at the density functional theory level, our pipeline has the potential to approach the accuracy of ab initio methods and the efficiency of empirical force fields.
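The two-stage structure described in this abstract (random generation and screening, followed by genetic-algorithm optimization of the screened population) can be sketched on a toy problem. This is not the authors' pipeline: the neural network potential and the RMSD objective are replaced by a hypothetical squared-distance score, candidates are plain parameter vectors rather than crystal structures, and every name and setting below is an assumption made for illustration.

```python
import random


def toy_score(candidate, target):
    """Stand-in for an energy or RMSD: squared distance to a hypothetical target."""
    return sum((c - t) ** 2 for c, t in zip(candidate, target))


def random_search(n, dim, keep, target, rng):
    """Stage 1: generate random candidates and keep the best-scoring ones."""
    pool = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n)]
    pool.sort(key=lambda c: toy_score(c, target))
    return pool[:keep]


def genetic_algorithm(pop, target, generations, rng, sigma=0.3):
    """Stage 2: evolve the screened population by crossover and mutation."""
    size = len(pop)
    for _ in range(generations):
        children = []
        for _ in range(size):
            a, b = rng.sample(pop, 2)
            # Uniform crossover followed by Gaussian mutation of each gene.
            child = [rng.choice(pair) + rng.gauss(0.0, sigma) for pair in zip(a, b)]
            children.append(child)
        # Elitist selection: parents compete with children for survival.
        pop = sorted(pop + children, key=lambda c: toy_score(c, target))[:size]
    return pop


rng = random.Random(0)
target = [1.0, -2.0, 0.5]  # hypothetical optimum (assumption)
screened = random_search(n=200, dim=3, keep=20, target=target, rng=rng)
evolved = genetic_algorithm(screened, target, generations=30, rng=rng)
best_before = toy_score(screened[0], target)
best_after = toy_score(evolved[0], target)
```

Because selection here is elitist, the best score after the GA stage can never be worse than the best score from screening. In a real CSP setting the scoring function would be far more expensive, which is why the size of the initial population matters.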
Affiliation(s)
- Amit Kadan
  - Good Chemistry Company, 1285 W Pender Street, Vancouver, British Columbia V6E 4B1, Canada
- Kevin Ryczko
  - Good Chemistry Company, 1285 W Pender Street, Vancouver, British Columbia V6E 4B1, Canada
- Andrew Wildman
  - Good Chemistry Company, 1285 W Pender Street, Vancouver, British Columbia V6E 4B1, Canada
- Rodrigo Wang
  - Good Chemistry Company, 1285 W Pender Street, Vancouver, British Columbia V6E 4B1, Canada
- Adrian Roitberg
  - Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, Florida 32611-7200, United States
- Takeshi Yamazaki
  - Good Chemistry Company, 1285 W Pender Street, Vancouver, British Columbia V6E 4B1, Canada
3. Beran GJO. Frontiers of molecular crystal structure prediction for pharmaceuticals and functional organic materials. Chem Sci 2023; 14:13290-13312. [PMID: 38033897 PMCID: PMC10685338 DOI: 10.1039/d3sc03903j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] [Open access]
Abstract
The reliability of organic molecular crystal structure prediction has improved tremendously in recent years. Crystal structure predictions for small, mostly rigid molecules are quickly becoming routine. Structure predictions for larger, highly flexible molecules are more challenging, but their crystal structures can also now be predicted with increasing rates of success. These advances are ushering in a new era where crystal structure prediction drives the experimental discovery of new solid forms. After briefly discussing the computational methods that enable successful crystal structure prediction, this perspective presents case studies from the literature that demonstrate how state-of-the-art crystal structure prediction can transform how scientists approach problems involving the organic solid state. Applications to pharmaceuticals, porous organic materials, photomechanical crystals, organic semi-conductors, and nuclear magnetic resonance crystallography are included. Finally, efforts to improve our understanding of which predicted crystal structures can actually be produced experimentally and other outstanding challenges are discussed.
Affiliation(s)
- Gregory J O Beran
  - Department of Chemistry, University of California Riverside, Riverside, CA 92521, USA
4. Lecca P, Lecca M. Graph embedding and geometric deep learning relevance to network biology and structural chemistry. Front Artif Intell 2023; 6:1256352. [PMID: 38035201 PMCID: PMC10687447 DOI: 10.3389/frai.2023.1256352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023] [Open access]
Abstract
Graphs have been used to model complex relationships among data in the biological sciences since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks, where recent artificial intelligence techniques, usually employed in other types of networks (e.g., social, citation, and trademark networks), aim to implement various data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable, since a biological network of realistic size is a large system characterised by a high density of interactions and often by non-linear dynamics and a non-Euclidean latent geometry. Graph embedding is currently emerging as a new learning paradigm that shifts the task of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space, so that many graph mining and learning tasks can be performed more easily with efficient non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.
Affiliation(s)
- Paola Lecca
  - Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
- Michela Lecca
  - Fondazione Bruno Kessler, Digital Industry Center, Technologies of Vision, Trento, Italy
5. Wang B, Hilleke KP, Hajinazar S, Frapper G, Zurek E. Structurally Constrained Evolutionary Algorithm for the Discovery and Design of Metastable Phases. J Chem Theory Comput 2023; 19:7960-7971. [PMID: 37856841 DOI: 10.1021/acs.jctc.3c00594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
Metastable materials are abundant in nature and technology, showcasing remarkable properties that inspire innovative materials design. However, traditional crystal structure prediction methods, which rely solely on energetic factors to determine a structure's fitness, are not suitable for predicting the vast number of potentially synthesizable phases that represent a local minimum corresponding to a state in thermodynamic equilibrium. Here, we present a new approach for the prediction of metastable phases with specific structural features and interface this method with the XtalOpt evolutionary algorithm. Our method relies on structural features that include the local crystalline order (e.g., the coordination number or chemical environment) and symmetry (e.g., Bravais lattice and space group) to filter the breeding pool of an evolutionary crystal structure search. The effectiveness of this approach is benchmarked on three known metastable systems: XeN8, with a two-dimensional polymeric nitrogen sublattice; brookite TiO2; and a recently characterized high-pressure BaH4 phase. Additionally, a newly predicted metastable melaminate salt, P1̅ WC3N6, was found to possess an energy that is lower than that of two phases proposed in a recent computational study. The method presented here could help in identifying the structures of compounds that have already been synthesized, and in developing new synthesis targets with desired properties.
Affiliation(s)
- Busheng Wang
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260-3000, United States
- Katerina P Hilleke
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260-3000, United States
- Samad Hajinazar
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260-3000, United States
- Gilles Frapper
  - Applied Quantum Chemistry Group, E4 Team, IC2MP UMR 7285, Université de Poitiers, CNRS, Poitiers 86073, France
- Eva Zurek
  - Department of Chemistry, State University of New York at Buffalo, Buffalo, New York 14260-3000, United States