1
Isazawa T, Cole JM. How Beneficial Is Pretraining on a Narrow Domain-Specific Corpus for Information Extraction about Photocatalytic Water Splitting? J Chem Inf Model 2024; 64:3205-3212. [PMID: 38544337 DOI: 10.1021/acs.jcim.4c00063] [Indexed: 04/23/2024]
Abstract
Language models trained on domain-specific corpora have been employed to increase performance in specialized tasks. However, little previous work has been reported on how specific a "domain-specific" corpus should be. Here, we test a number of language models trained on corpora of varying specificity by employing them in the task of extracting information about photocatalytic water splitting. We find that more specific corpora can benefit performance on downstream tasks. Furthermore, PhotocatalysisBERT, a model pretrained from scratch on scientific papers on photocatalytic water splitting, demonstrates improved performance over previous work in associating the correct photocatalyst with the correct photocatalytic activity during information extraction, achieving a precision of 60.8(+11.5)% and a recall of 37.2(+4.5)%.
Affiliation(s)
- Taketomo Isazawa
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
2
Jung SG, Jung G, Cole JM. Automatic Prediction of Peak Optical Absorption Wavelengths in Molecules Using Convolutional Neural Networks. J Chem Inf Model 2024; 64:1486-1501. [PMID: 38422386 PMCID: PMC10934802 DOI: 10.1021/acs.jcim.3c01792] [Received: 11/07/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/02/2024]
Abstract
Molecular design depends heavily on optical properties for applications such as solar cells and polymer-based batteries. Accurate prediction of these properties is essential, and multiple predictive methods exist, from ab initio to data-driven techniques. Although theoretical methods, such as time-dependent density functional theory (TD-DFT) calculations, have well-established physical relevance and are among the most popular methods in computational physics and chemistry, they exhibit errors that are inherent in their approximate nature. These high-throughput electronic structure calculations also incur a substantial computational cost. With the emergence of big-data initiatives, cost-effective, data-driven methods have gained traction, although their usability is highly contingent on the degree of data quality and sparsity. In this study, we present a workflow that employs deep residual convolutional neural networks (DR-CNN) and gradient boosting feature selection to predict peak optical absorption wavelengths (λmax) exclusively from SMILES representations of dye molecules and solvents; one would normally measure λmax using UV-vis absorption spectroscopy. We use a multifidelity modeling approach, integrating 34,893 DFT calculations and 26,395 experimentally derived λmax data, to deliver more accurate predictions via a Bayesian-optimized gradient boosting machine. Our approach is benchmarked against the state of the art that is reported in the scientific literature; results demonstrate that learnt representations via a DR-CNN workflow that is integrated with other machine learning methods can accelerate the design of molecules for specific optical characteristics.
Affiliation(s)
- Son Gyo Jung
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
- Guwon Jung
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
  - Scientific Computing Department, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
3
Jung SG, Jung G, Cole JM. Automatic Prediction of Band Gaps of Inorganic Materials Using a Gradient Boosted and Statistical Feature Selection Workflow. J Chem Inf Model 2024; 64:1187-1200. [PMID: 38320103 PMCID: PMC10900294 DOI: 10.1021/acs.jcim.3c01897] [Indexed: 02/08/2024]
Abstract
Machine learning (ML) methods can train a model to predict material properties by exploiting patterns in materials databases that arise from structure-property relationships. However, the importance of ML-based feature analysis and selection is often neglected when creating such models. Such analysis and selection are especially important when dealing with multifidelity data because they afford a complex feature space. This work shows how a gradient-boosted statistical feature-selection workflow can be used to train predictive models that classify materials by their metallicity and predict their band gap against experimental measurements, as well as computational data that are derived from electronic-structure calculations. These models are fine-tuned via Bayesian optimization, using solely the features that are derived from chemical compositions of the materials data. We test these models against experimental, computational, and a combination of experimental and computational data. We find that the multifidelity modeling option can reduce the number of features required to train a model. The performance of our workflow is benchmarked against state-of-the-art algorithms, the results of which demonstrate that our approach is either comparable to or superior to them. The classification model realized an accuracy score of 0.943, a macro-averaged F1-score of 0.940, an area under the receiver operating characteristic curve of 0.985, and an average precision of 0.977, while the regression model achieved a mean absolute error of 0.246, a root-mean-squared error of 0.402, and an R² of 0.937. This illustrates the efficacy of our modeling approach and highlights the importance of thorough feature analysis and judicious selection over a "black-box" approach to feature engineering in ML-based modeling.
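For readers unfamiliar with the regression figures of merit quoted in this abstract, the following is a minimal sketch of how mean absolute error, root-mean-squared error, and R² are conventionally defined. It is not the authors' code; the function name `regression_metrics` and the sample values are assumptions made purely for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in residuals) / n

    # Root-mean-squared error: square root of the mean squared residual.
    ss_res = sum(e * e for e in residuals)
    rmse = math.sqrt(ss_res / n)

    # R^2: one minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Made-up values, chosen only to keep the arithmetic easy to follow.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.5, 1.0, 1.5, 3.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE penalizes large individual errors more heavily than MAE does, which is why the two are usually reported together.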
Affiliation(s)
- Son Gyo Jung
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
- Guwon Jung
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
  - Scientific Computing Department, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, U.K.
4
Huang D, Cole JM. A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor. Sci Data 2024; 11:80. [PMID: 38233439 PMCID: PMC10794197 DOI: 10.1038/s41597-023-02897-3] [Received: 09/22/2023] [Accepted: 12/27/2023] [Indexed: 01/19/2024] [Open access]
Abstract
A database of thermally activated delayed fluorescent (TADF) molecules was automatically generated from the scientific literature. It consists of 25,482 data records with an overall precision of 82%. Among these, 5,349 records have chemical names in the form of SMILES strings which are represented with 91% accuracy; these are grouped in a subsidiary database. Each data record contains one of the following four properties: maximum emission wavelength (λEM), photoluminescence quantum yield (PLQY), singlet-triplet energy splitting (ΔEST), and delayed lifetime (τD). The databases were created through text mining using ChemDataExtractor, a chemistry-aware natural-language-processing toolkit, which has been adapted for TADF research. The text-mined corpus consisted of 2,733 papers from the Royal Society of Chemistry and Elsevier. To the best of our knowledge, these databases are the first databases that have been auto-generated for TADF molecules from existing publications. The databases have been publicly released for experimental and computational applications in the TADF research field.
Affiliation(s)
- Dingyun Huang
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
- Jacqueline M Cole
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
5
Dong Q, Cole JM. Snowball 2.0: Generic Material Data Parser for ChemDataExtractor. J Chem Inf Model 2023; 63:7045-7055. [PMID: 37934697 PMCID: PMC10685441 DOI: 10.1021/acs.jcim.3c01281] [Received: 08/12/2023] [Revised: 10/19/2023] [Accepted: 10/20/2023] [Indexed: 11/09/2023]
Abstract
The ever-growing amount of chemical data found in the scientific literature has led to the emergence of data-driven materials discovery. The first step in the pipeline, to automatically extract chemical information from plain text, has been driven by the development of software toolkits such as ChemDataExtractor. Such data extraction processes have created a demand for parsers that efficiently enable text mining. Here, we present Snowball 2.0, a sentence parser based on a semisupervised machine-learning algorithm. It can be used to extract any chemical property without additional training. We validate its precision, recall, and F-score by training and testing a model with sentences of semiconductor band gap information curated from journal articles. Snowball 2.0 builds on two previously developed Snowball algorithms. Evaluation of Snowball 2.0 shows a 15-20% increase in recall with marginally reduced precision over the previous version which has been incorporated into ChemDataExtractor 2.0, giving Snowball 2.0 better performance in most configurations. Snowball 2.0 offers more and better parsing options for ChemDataExtractor, and it is more capable in the pipeline of automated data extraction. Snowball 2.0 also features better generalizability, performance, learning efficiencies, and user-friendliness.
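The precision, recall, and F-score named in this abstract are standard evaluation metrics for extraction tasks. The sketch below shows their conventional definitions from counts of correct, spurious, and missed extractions. It is illustrative only and does not reproduce the authors' evaluation code; the function name `precision_recall_f1` and the counts are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw extraction counts.

    Hypothetical helper for illustration; not part of ChemDataExtractor.
    """
    extracted = true_positives + false_positives  # everything the parser returned
    relevant = true_positives + false_negatives   # everything it should have returned

    # Precision: fraction of extracted records that are correct.
    precision = true_positives / extracted if extracted else 0.0

    # Recall: fraction of relevant records that were extracted.
    recall = true_positives / relevant if relevant else 0.0

    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    return precision, recall, f1


# Made-up counts: 80 correct extractions, 20 spurious, 120 missed.
p, r, f1 = precision_recall_f1(80, 20, 120)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, a gain in recall can raise it even when precision falls slightly, which is the kind of trade-off the abstract describes.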
Affiliation(s)
- Qingyang Dong
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
6
Jung SG, Jung G, Cole JM. Gradient boosted and statistical feature selection workflow for materials property predictions. J Chem Phys 2023; 159:194106. [PMID: 37971034 DOI: 10.1063/5.0171540] [Received: 08/09/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023] [Open access]
Abstract
With the emergence of big data initiatives and the wealth of available chemical data, data-driven approaches are becoming a vital component of materials discovery pipelines or workflows. The screening of materials using machine-learning models, in particular, is increasingly gaining momentum to accelerate the discovery of new materials. However, the black-box treatment of machine-learning methods suffers from a lack of model interpretability, as feature relevance and interactions can be overlooked or disregarded. In addition, naive approaches to model training often lead to irrelevant features being used, which necessitates various regularization techniques to achieve model generalization; this incurs a high computational cost. We present a feature-selection workflow that overcomes this problem by leveraging a gradient boosting framework and statistical feature analyses to identify a subset of features, in a recursive manner, which maximizes their relevance to the target variable or classes. We subsequently obtain minimal feature redundancy through multicollinearity reduction by performing feature correlation and hierarchical cluster analyses. The features are further refined using a wrapper method, which follows a greedy search approach by evaluating all possible feature combinations against the evaluation criterion. A case study on elastic material-property prediction and a case study on the classification of materials by their metallicity illustrate the use of our proposed workflow, although it is highly general, as demonstrated through our wider subsequent prediction of various material properties. Our Bayesian-optimized machine-learning models generated results, without the use of regularization techniques, that are comparable to the state of the art reported in the scientific literature.
Affiliation(s)
- Son Gyo Jung
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
- Guwon Jung
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
  - Scientific Computing Department, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
7
Wilary D, Cole JM. ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes. J Chem Inf Model 2023; 63:6053-6067. [PMID: 37729111 PMCID: PMC10565829 DOI: 10.1021/acs.jcim.3c00422] [Received: 03/16/2023] [Indexed: 09/22/2023]
Abstract
Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood by any chemist, like most graphical representations, such drawings are not easily understood by machines; this poses a challenge in the context of data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, thus slowing down the speed of data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We have evaluated our tool on a test set composed of reaction schemes that were taken from open-source journal articles and realized F1 score metrics between 75 and 96%. These evaluation metrics can be further improved by tuning our object-detection models to a specific chemical subdomain thanks to a data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction.
Affiliation(s)
- Damian M. Wilary
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
8
Isazawa T, Cole JM. Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications. Sci Data 2023; 10:651. [PMID: 37739960 PMCID: PMC10517137 DOI: 10.1038/s41597-023-02511-6] [Received: 01/30/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023] [Open access]
Abstract
We present an automatically generated dataset of 15,755 records that were extracted from 47,357 papers. These records contain water-splitting activity in the presence of certain photocatalysts, along with additional information about the chemical reaction conditions under which this activity was recorded. These conditions include any co-catalysts and additives that were present during water splitting, the length of time for which the photocatalytic experiment was conducted, and the type of light source used, including its wavelength. Despite the text extraction of such a wide range of chemical reaction attributes, the dataset afforded good precision (71.2%) and recall (36.3%). These figures-of-merit were calculated based on a random sample of open-access papers from the corpus. Mining such a complex set of attributes required the development of novel techniques in knowledge extraction and interdependency resolution, leveraging inter- and intra-sentence relations, which are also described in this paper. We present a new version (version 2.2) of the chemistry-aware text-mining toolkit ChemDataExtractor, in which these new techniques are included.
Affiliation(s)
- Taketomo Isazawa
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
9
Abstract
Text mining in the optical-materials domain is becoming increasingly important as the number of scientific publications in this area grows rapidly. Language models such as Bidirectional Encoder Representations from Transformers (BERT) have opened up a new era and brought a significant boost to state-of-the-art natural-language-processing (NLP) tasks. In this paper, we present two "materials-aware" text-based language models for optical research, OpticalBERT and OpticalPureBERT, which are trained on a large corpus of scientific literature in the optical-materials domain. These two models outperform BERT and previous state-of-the-art models in a variety of text-mining tasks about optical materials. We also release the first "materials-aware" table-based language model, OpticalTable-SQA. This is a querying facility that solicits answers to questions about optical materials using tabular information that pertains to this scientific domain. The OpticalTable-SQA model was realized by fine-tuning the Tapas-SQA model using a manually annotated OpticalTableQA data set which was curated specifically for this work. While preserving its sequential question-answering performance on general tables, the OpticalTable-SQA model significantly outperforms Tapas-SQA on optical-materials-related tables. All models and data sets are available to the optical-materials-science community.
Affiliation(s)
- Jiuyang Zhao
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Shu Huang
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
10
Abstract
Contemporary electrocatalysts for the reduction of CO2 often suffer from low stability, activity, and selectivity, or a combination thereof. Mn-carbonyl complexes represent a promising class of molecular electrocatalysts for the reduction of CO2 to CO as they are able to promote this reaction at relatively mild overpotentials, whereby rare-earth metals are not required. The electronic and geometric structure of the reaction center of these molecular electrocatalysts is precisely known and can be tuned via ligand modifications. However, ligand characteristics that are required to achieve high catalytic turnover at minimal overpotential remain unclear. We consider 55 Mn-carbonyl complexes, which have previously been synthesized and characterized experimentally. Four intermediates were identified that are common across all catalytic mechanisms proposed for Mn-carbonyl complexes, and their structures were used to calculate descriptors for each of the 55 Mn-carbonyl complexes. These electronic-structure-based descriptors encompass the binding energies, the highest occupied and lowest unoccupied molecular orbitals, and partial charges. Trends in turnover frequency and overpotential with these descriptors were analyzed to afford meaningful physical insights into what ligand characteristics lead to good catalytic performance, and how this is affected by the reaction conditions. These insights can be expected to significantly contribute to the rational design of more active Mn-carbonyl electrocatalysts.
Affiliation(s)
- Jacob Florian
  - Cavendish Laboratory, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Campus for Science and Innovation, Didcot OX11 0QX, U.K.
11
Jung G, Jung SG, Cole JM. Automatic Materials Characterization from Infrared Spectra Using Convolutional Neural Networks. Chem Sci 2023; 14:3600-3609. [PMID: 37006683 PMCID: PMC10055241 DOI: 10.1039/d2sc05892h] [Received: 10/25/2022] [Accepted: 02/22/2023] [Indexed: 02/25/2023] [Open access]
Abstract
Infrared spectroscopy is a ubiquitous technique used to characterize unknown materials in the form of solids, liquids, or gases by identifying the constituent functional groups of molecules through the analysis...
Affiliation(s)
- Guwon Jung
  - Department of Physics, Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, UK
  - Scientific Computing Department, Science and Technology Facilities Council, Didcot OX11 0FA, UK
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, UK
- Son Gyo Jung
  - Department of Physics, Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, UK
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, UK
- Jacqueline M Cole
  - Department of Physics, Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, UK
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, UK
12
Abstract
A great number of scientific papers are published every year in the field of battery research, which forms a huge textual data source. However, it is difficult to explore and retrieve useful information efficiently from these large unstructured sets of text. The Bidirectional Encoder Representations from Transformers (BERT) model, trained on a large data set in an unsupervised way, provides a route to process the scientific text automatically with minimal human effort. To this end, we realized six battery-related BERT models, namely, BatteryBERT, BatteryOnlyBERT, and BatterySciBERT, each of which consists of both cased and uncased models. They have been trained specifically on a corpus of battery research papers. The pretrained BatteryBERT models were then fine-tuned on downstream tasks, including battery paper classification and extractive question-answering for battery device component classification that distinguishes anode, cathode, and electrolyte materials. Our BatteryBERT models were found to outperform the original BERT models on the specific battery tasks. The fine-tuned BatteryBERT was then used to perform battery database enhancement. We also provide a website application for its interactive use and visualization.
Affiliation(s)
- Shu Huang
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
13
Huang S, Cole JM. BatteryDataExtractor: battery-aware text-mining software embedded with BERT models. Chem Sci 2022; 13:11487-11495. [PMID: 36348711 PMCID: PMC9627715 DOI: 10.1039/d2sc04322j] [Received: 08/03/2022] [Accepted: 09/05/2022] [Indexed: 11/21/2022] [Open access]
Abstract
Due to the massive growth of scientific publications, literature mining is becoming increasingly popular among researchers as a way to thoroughly explore scientific text and extract data to create new databases or augment existing ones. Efforts in literature-mining software design and implementation have improved text-mining productivity, but most of the toolkits that mine text are based on traditional machine-learning algorithms, which hinder the performance of downstream text-mining tasks. Natural-language processing (NLP) and text-mining technologies have seen rapid development since the release of transformer models, such as bidirectional encoder representations from transformers (BERT). Upgrading rule-based or machine-learning-based literature-mining toolkits by embedding transformer models into the software is therefore likely to improve their text-mining performance. To this end, we release a Python-based literature-mining toolkit for the field of battery materials, BatteryDataExtractor, which involves the embedding of BatteryBERT models in its automated data-extraction pipeline. This pipeline employs BERT models for token-classification tasks, such as abbreviation detection, part-of-speech tagging, and chemical-named-entity recognition, as well as new double-turn question-answering data-extraction models for auto-generating repositories of inter-related material and property data as well as general information. We demonstrate that BatteryDataExtractor exhibits state-of-the-art performance on the evaluation data sets for both token classification and automated data extraction. To aid the use of BatteryDataExtractor, its code is provided as open-source software, with associated documentation to serve as a user guide.
Affiliation(s)
- Shu Huang
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, UK
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, UK
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, UK
14
Abstract
The number of scientific publications reporting cutting-edge third-generation photovoltaic devices is increasing rapidly, owing to the pressing need to develop renewable-energy technologies that address the climate-change crisis. Consequently, the field could benefit from a central repository where photovoltaic-performance metrics, such as the power-conversion efficiency (η) are recorded. We present two automatically generated databases that contain photovoltaic properties and device material data for dye-sensitized solar cells (DSCs) and perovskite solar cells (PSCs), totalling 660,881 data entries representing 57,678 photovoltaic devices. The databases were generated by applying the text-mining toolkit ChemDataExtractor on a corpus of 25,720 articles. A multi-faceted evaluation, incorporating manual and automatic methods, was applied to ensure that the data contained therein were of the highest quality, with precision metrics ranging from 73.1% to 95.8%. The DSC database contains 475,045 entries representing 41,680 devices, and the PSC database contains 185,836 entries representing 15,818 devices. The databases are available in MongoDB and JSON formats, which can be queried in Python, R, Java and MATLAB for data-driven photovoltaic materials discovery.
Affiliation(s)
- Edward J Beard
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
  - Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL, 60439, USA
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0FS, UK
|
15
|
Abstract
Predicting the properties of materials prior to their synthesis is of great significance in materials science. Optical materials exhibit a large number of interesting properties that make them useful in a wide range of applications, including optical glasses, optical fibers, and laser optics. In all of these applications, refraction and its chromatic dispersion can directly reflect the characteristics of the transmitted light and determine the practical utility of the material. We demonstrate the feasibility of reconstructing chromatic-dispersion relations of well-known optical materials by aggregating data over a large number of independent sources, which are contained within a material database of experimentally determined refractive indices and wavelength values. We also employ this database to develop a machine-learning platform that can predict refractive indices of compounds without needing to know the structure or other properties of a material of interest. We present a web-based application that enables users to build their customized machine-learning models; this will help the scientific community to conduct further research into the discovery of optical materials.
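As a rough illustration of how a chromatic-dispersion relation might be reconstructed from refractive-index values pooled across many sources, the sketch below fits a two-term Cauchy equation, n(λ) = A + B/λ², by ordinary least squares. The choice of the Cauchy form, the helper name `fit_cauchy`, and the synthetic data points are all assumptions made for illustration; the abstract does not state which dispersion model or fitting procedure was used.

```python
import random

def fit_cauchy(wavelengths_um, indices):
    """Fit n = A + B / wavelength**2 by ordinary least squares (hypothetical helper)."""
    x = [1.0 / w ** 2 for w in wavelengths_um]  # regress n on 1/lambda^2
    mean_x = sum(x) / len(x)
    mean_n = sum(indices) / len(indices)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxn = sum((xi - mean_x) * (ni - mean_n) for xi, ni in zip(x, indices))
    b = sxn / sxx              # slope: Cauchy coefficient B (um^2)
    a = mean_n - b * mean_x    # intercept: Cauchy coefficient A
    return a, b

# Synthetic (wavelength, n) pairs standing in for values aggregated from the literature.
random.seed(0)
wavelengths = [0.40 + 0.01 * i for i in range(61)]  # 0.40 to 1.00 um
true_a, true_b = 1.45, 0.004
observed = [true_a + true_b / w ** 2 + random.gauss(0.0, 1e-4) for w in wavelengths]

a_fit, b_fit = fit_cauchy(wavelengths, observed)
print(f"A = {a_fit:.4f}, B = {b_fit:.5f} um^2")
```

With real aggregated data, outlier rejection and a more flexible model (for example, a Sellmeier equation) would probably be needed; the linear Cauchy fit is used here only because it keeps the example short.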
Affiliation(s)
- Jiuyang Zhao
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
|
16
|
Zhao J, Cole JM. A database of refractive indices and dielectric constants auto-generated using ChemDataExtractor. Sci Data 2022; 9:192. [PMID: 35504964 PMCID: PMC9065060 DOI: 10.1038/s41597-022-01295-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/03/2021] [Accepted: 03/01/2022] [Indexed: 01/20/2023] Open
Abstract
The ability to auto-generate databases of optical properties holds great potential for advancing optical research, especially with regard to the data-driven discovery of optical materials. An optical property database of refractive indices and dielectric constants is presented, which comprises a total of 49,076 refractive index and 60,804 dielectric constant data records on 11,054 unique chemicals. The database was auto-generated using the state-of-the-art natural language processing software, ChemDataExtractor, using a corpus of 388,461 scientific papers. The data repository offers a representative overview of the information on linear optical properties that resides in scientific papers from the past 30 years. Public availability of these data will enable a quick search for the optical property of certain materials. The large size of this repository will accelerate data-driven research on the design and prediction of optical materials and their properties. To the best of our knowledge, this is the first auto-generated database of optical properties from a large number of scientific papers. We provide a web interface to aid the use of our database.
Measurement(s): refractive indices, dielectric constants. Technology Type(s): natural language processing.
Affiliation(s)
- Jiuyang Zhao
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
- Jacqueline M Cole
  - Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
|
17
|
Cole JM, Gosztola DJ, Velazquez-Garcia JDJ. Structural Capture of η1-OSO to η2-(OS)O Coordination Isomerism in a New Ruthenium-Based SO2-Linkage Photoisomer That Exhibits Single-Crystal Optical Actuation. J Phys Chem C Nanomater Interfaces 2022; 126:6047-6059. [PMID: 35573119 PMCID: PMC9098168 DOI: 10.1021/acs.jpcc.2c00170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2022] [Revised: 03/11/2022] [Indexed: 06/15/2023]
Abstract
Recent discoveries of a range of single-crystal optical actuators are feeding a new form of materials chemistry, given their broad range of potential applications, from light-induced molecular motors to light sensors and optical-memory media. A series of ruthenium-based coordination complexes that exhibit sulfur dioxide linkage photoisomerization is of particular interest because they exhibit single-crystal optical actuation via either optical switching or nano-optomechanical transduction processes. We report the discovery of a new complex in this series of chemicals, [Ru(SO2)(NH3)4(3-fluoropyridine)]tosylate2 (1), which forms an η1-OSO photoisomer with 70% photoconversion upon the application of 505 nm light. The uncoordinated oxygen atom in this η1-OSO photoisomer impinges on one of the arene rings in a neighboring tosylate counter ion of 1 just enough that incipient nano-optomechanical transduction is observed. The structure and optical properties of this actuator are characterized via in situ light-induced single-crystal X-ray diffraction (photocrystallography), single-crystal optical absorption spectroscopy and microscopy, as well as single-crystal Raman spectroscopy. These materials-characterization methods were also used to track thermally induced reverse isomerization processes in 1. One of these processes involves an η1-OSO to η2-(OS)O transition, which was found to proceed sufficiently slowly at 110 K that its structural mechanism could be determined via a time sequence of photocrystallography experiments. The resulting data allowed us to structurally capture the transition, which was shown to occur via a form of coordination isomerism. Our newfound knowledge about this structural mechanism will aid the molecular design of new [RuSO2] complexes with functional applications.
Affiliation(s)
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, U.K.
  - Center for Nanoscale Materials, Argonne National Laboratory, 9700 S Cass Avenue, Lemont, Illinois 60439, United States
- David J. Gosztola
  - Center for Nanoscale Materials, Argonne National Laboratory, 9700 S Cass Avenue, Lemont, Illinois 60439, United States
- Jose de J. Velazquez-Garcia
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
|
18
|
Flanagan PJ, Cole JM. Clustering a database of optically-absorbing organic molecules via a hierarchical fingerprint scheme that categorizes similar functional molecular fragments. J Chem Phys 2022; 156:154110. [DOI: 10.1063/5.0087603] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
A measure of chemical similarity is only useful if it implies similarity in some relevant property space. Typically, similarity calculations operate by assigning each molecule a chemical fingerprint: a fixed-length vector of bits where the on-bits signify the presence of a certain feature. Common fingerprinting schemes, such as extended-connectivity fingerprints, are by definition general and fail to capture much of the domain-specific theory that underpins similarity in a specific domain. We develop a hierarchical fingerprinting scheme that is bespoke to a database of ~4,500 organic molecules and their cognate optical absorption spectral properties. Our fingerprinting scheme incorporates molecular fragmentation and domain-specific chemical intuition into an algorithm that categorizes each fragment as being one of either a core chemical group, a substituent or a bridge. The algorithm is applied to every molecule in the database to generate a pool of chemically relevant fragments that are labeled according to their structural category. The fingerprint of each molecule is then composed of a nested Python dictionary specifying the unique identifiers of its constituent fragment entities, and the structural links between them, to give a hierarchical molecular encoding scheme. Four case studies showcase the application of our fingerprinting scheme, known as ChemCluster, to the subject database. In each case, the clustered molecules display many interesting chemical trends. The enhanced similarity comparisons afforded by our fingerprinting scheme, as well as the large repository of categorized fragments generated during its development, constitute the first step towards using this database in a data-driven materials discovery workflow.
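To make the two fingerprint styles mentioned in this abstract more concrete, here is a minimal sketch. The first part computes the Tanimoto (Jaccard) coefficient that is commonly used to compare fixed-length bit-vector fingerprints. The second part shows one possible nested-dictionary layout and a simple shared-fragment comparison. The dictionary keys, fragment identifiers, and the `fragment_ids` helper are hypothetical; the abstract does not specify the actual ChemCluster data layout or its similarity measure.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two equal-length bit vectors."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(bits_a, bits_b) if a and b)
    either = sum(1 for a, b in zip(bits_a, bits_b) if a or b)
    return both / either if either else 1.0

# Conventional fingerprints: each on-bit signals the presence of some feature.
fp_1 = [1, 0, 1, 1, 0, 0, 1, 0]
fp_2 = [1, 0, 0, 1, 0, 1, 1, 0]
similarity = tanimoto(fp_1, fp_2)  # 3 shared on-bits out of 5 set in either

# Hypothetical hierarchical fingerprints: fragment identifiers grouped by
# structural category, with bridges recording which cores they link.
mol_a = {
    "cores": {"core_12": {"substituents": ["sub_107"]},
              "core_45": {"substituents": []}},
    "bridges": {"bridge_3": ["core_12", "core_45"]},
}
mol_b = {
    "cores": {"core_12": {"substituents": ["sub_210"]}},
    "bridges": {},
}

def fragment_ids(fingerprint):
    """Collect every fragment identifier in a nested fingerprint (assumed layout)."""
    ids = set(fingerprint["bridges"])
    for core_id, core in fingerprint["cores"].items():
        ids.add(core_id)
        ids.update(core["substituents"])
    return ids

shared = fragment_ids(mol_a) & fragment_ids(mol_b)
print(similarity, sorted(shared))
```

A flat bit vector records only which features are present, whereas the nested form also keeps the category of each fragment and the links between fragments, which is the kind of information a hierarchical comparison could draw on.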
|
19
|
Zhu M, Cole JM. PDFDataExtractor: A Tool for Reading Scientific Text and Interpreting Metadata from the Typeset Literature in the Portable Document Format. J Chem Inf Model 2022; 62:1633-1643. [PMID: 35349259 PMCID: PMC9049592 DOI: 10.1021/acs.jcim.1c01198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
The layout of portable document format (PDF) files is constant to any screen, and the metadata therein are latent, compared to mark-up languages such as HTML and XML. No semantic tags are usually provided, and a PDF file is not designed to be edited or its data interpreted by software. However, data held in PDF files need to be extracted in order to comply with open-source data requirements that are now government-regulated. In the chemical domain, related chemical and property data also need to be found, and their correlations need to be exploited to enable data science in areas such as data-driven materials discovery. Such relationships may be realized using text-mining software such as the “chemistry-aware” natural-language-processing tool, ChemDataExtractor; however, this tool has limited data-extraction capabilities from PDF files. This study presents the PDFDataExtractor tool, which can act as a plug-in to ChemDataExtractor. It outperforms other PDF-extraction tools for the chemical literature by coupling its functionalities to the chemical-named entity-recognition capabilities of ChemDataExtractor. The intrinsic PDF-reading abilities of ChemDataExtractor are much improved. The system features a template-based architecture. This enables semantic information to be extracted from the PDF files of scientific articles in order to reconstruct the logical structure of articles. While other existing PDF-extracting tools focus on quantity mining, this template-based system is more focused on quality mining on different layouts. PDFDataExtractor outputs information in JSON and plain text, including the metadata of a PDF file, such as paper title, authors, affiliation, email, abstract, keywords, journal, year, digital object identifier (DOI), reference, and issue number. With a self-created evaluation article set, PDFDataExtractor achieved promising precision for all key assessed metadata areas of the document text.
Affiliation(s)
- Miao Zhu
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
|
20
|
Abstract
Chemical Named Entity Recognition (NER) forms the basis of information extraction tasks in the chemical domain. However, while such tasks can involve multiple domains of chemistry at the same time, currently available named entity recognizers are specialized in one part of chemistry, resulting in such workflows failing for a biased subset of mentions. This paper presents a single model that performs at close to the state-of-the-art for both organic (CHEMDNER, 89.7 F1 score) and inorganic (Matscholar, 88.0 F1 score) NER tasks at the same time. Our NER system utilizing the BERT architecture is available as part of ChemDataExtractor 2.1, along with the data sets and scripts used to train the model.
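For readers less familiar with the F1 scores quoted above, the following sketch shows how precision, recall, and F1 are conventionally derived from counts of correct and incorrect predictions. The function name and the counts are invented for illustration and are not taken from the paper's evaluation.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from entity-level prediction counts."""
    predicted = true_positives + false_positives   # everything the model tagged
    actual = true_positives + false_negatives      # everything in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts only: 90 correct entities, 10 spurious, 15 missed.
precision, recall, f1 = precision_recall_f1(90, 10, 15)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled towards the lower of precision and recall, so a model cannot score well by maximizing only one of them.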
Affiliation(s)
- Taketomo Isazawa
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0FS, U.K.
|
21
|
Cole JM, Mayer UFJ. Characterizing Interfacial Structures of Dye-Sensitized Solar Cell Working Electrodes. Langmuir 2022; 38:871-890. [PMID: 35014533 DOI: 10.1021/acs.langmuir.1c02165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this feature article, we discuss the fundamental use of materials-characterization methods that directly determine structural information on the dye···TiO2 interface in dye-sensitized solar cells (DSCs). This interface is usually buried within the DSC and submerged in solvent and electrolyte, which renders such metrological work nontrivial. We will show how ex-situ X-ray reflectometry (XRR), atomic-force microscopy (AFM), grazing-incidence X-ray scattering (GIXS), pair-distribution-function analysis of X-ray diffraction data (gaPDF), and in-situ neutron reflectometry (NR) can be used to deliver specific structural information on the dye···TiO2 interface regarding dye anchoring, dye aggregation, molecular dye orientation, intermolecular spacing between dye molecules, interactions between the dye molecules and the TiO2 surface, and interactions between the dye molecules and the electrolyte components and precursors. Some of these materials-characterization techniques have been developed specifically for this purpose. We will demonstrate how the direct acquisition of such information from materials-characterization experiments is crucial for assembling a holistic structural picture of this interface, which in turn can be used to develop DSC design guidelines. Moreover, we will show how these methodologies can be used in the experimental-validation process of "design-to-device" pipelines for big-data- and machine-learning-based materials discovery. We conclude with an outlook on further developments of this design-to-device approach as well as the materials characterization of more dye···TiO2 interfacial structures that involve known DSC dyes using the methods described herein. In addition, we propose to combine these formally disparate metrologies so that their complementary merits can be exploited simultaneously. New metrologies of this kind could serve as a "one-stop-shop" for the materials characterization of surfaces, interfaces, and bulk structures in DSCs and other devices with layered architectures.
Affiliation(s)
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, United Kingdom
- Ulrich F J Mayer
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
|
22
|
Yildirim B, Washington A, Doutch J, Cole JM. Calculating small-angle scattering intensity functions from electron-microscopy images. RSC Adv 2022; 12:16656-16662. [PMID: 35754871 PMCID: PMC9169464 DOI: 10.1039/d2ra00685e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 04/19/2022] [Indexed: 11/21/2022] Open
Abstract
We outline procedures to calculate small-angle scattering (SAS) intensity functions from 2-dimensional electron-microscopy (EM) images. Two types of scattering systems were considered: (a) the sample is a set of particles confined to a plane; or (b) the sample is modelled as parallel, infinitely long cylinders that extend into the image plane. In each case, an EM image is segmented into particle instances and the background, whereby coordinates and morphological parameters are computed and used to calculate the constituents of the SAS-intensity function. We compare our results with experimental SAS data, discuss limitations, both general and case specific, and outline some applications of this method which could potentially complement experimental SAS.
Affiliation(s)
- Batuhan Yildirim
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0QX, UK
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0FA, UK
- Adam Washington
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0QX, UK
- James Doutch
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0QX, UK
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0QX, UK
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0FA, UK
|
23
|
Jain A, Cole JM, Vázquez-Mayagoitia Á, Sternberg MG. Modeling dark- and light-induced crystal structures and single-crystal optical absorption spectra of ruthenium-based complexes that undergo SO2-linkage photoisomerization. J Chem Phys 2021; 155:234111. [PMID: 34937382 DOI: 10.1063/5.0077415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
A family of coordination complexes of the type [Ru(SO2)(NH3)4X]m+Yn− (m, n = 1 or 2) exhibit optical switching capabilities in their single-crystal states. This striking effect is caused by the light-induced formation of SO2-linkage photoisomers, which are metastable if kept at suitably cool temperatures. We modeled the dark- and light-induced states of these large crystalline complexes via plane-wave (PW)- and molecular-orbital (MO)-based density functional theory (DFT) and time-dependent DFT in order to calculate their structural and optical properties; the calculated results are compared with experimental data. We show that the PW-DFT-based periodic models replicate the structural properties of these complexes more effectively than the MO-DFT-based molecular-fragment models, observing only small deviations in key bond lengths relative to the experimentally derived crystal structures. The periodic models were also found to more effectively simulate trends seen in experimental optical absorption spectra, with optical absorbance and coverage of the visible region increasing with the formation of the photoinduced geometries. The contribution of the metastable photoisomeric species more heavily focuses on the lower-energy end of the spectra. Spectra generated from the molecular-fragment models are limited by the geometry of the fragment used and the number of excited-state roots considered in those calculations. In general, periodic models outperform the molecular-fragment models owing to their ability to better appreciate the periodic phenomena that are present in these crystalline materials as opposed to MO approaches, which are finite methods. We thus demonstrate that PW-DFT-based periodic models should be considered as a more than viable method for simulating the optical and electronic properties of these single-crystal optical switches.
Affiliation(s)
- Apoorv Jain
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Michael G Sternberg
  - Argonne National Laboratory, 9700 S Cass Avenue, Lemont, Illinois 60439, USA
|
24
|
Abstract
Chemical reaction schemes are commonly used for visual encapsulation of chemical information. Figures of reaction schemes contain chemical transformations, the chemical species involved, as well as reaction conditions. From a data-mining point of view, they constitute rich sources, densely packed with knowledge. Yet, the challenge of automatically extracting data from them has remained largely untackled. This work presents ReactionDataExtractor, a software tool that can be used for the automatic extraction of information from multistep reaction schemes. Its capabilities include segmentation of reaction steps, regions containing reaction conditions, chemical diagrams, as well as optical character and structure recognition. A combination of rules and unsupervised machine-learning approaches is used, with bespoke algorithms that detect arrows, structures, labels, and conditions. It can be used as a low-maintenance tool for database generation capable of extracting data from large quantities of images supplied by the user. On assessment using a self-generated evaluation set, the tool achieved precision and recall metrics of between 67% and 91% in the six core areas of data extraction. The ReactionDataExtractor tool is released under the MIT license and is available to download from http://www.reactiondataextractor.org.
Affiliation(s)
- Damian M Wilary
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
|
25
|
Abstract
The ever-growing abundance of data found in heterogeneous sources, such as scientific publications, has forced the development of automated techniques for data extraction. While in the past, in the physical sciences domain, the focus has been on the precise extraction of individual properties, attention has recently been devoted to the extraction of higher-level relationships. Here, we present a framework for the automated population of ontologies, that is, the direct extraction of a larger group of properties linked by a semantic network. We exploit data-rich sources, such as tables within documents, and present a new model concept that enables data extraction for chemical and physical properties with the ability to organize hierarchical data as nested information. Combining these capabilities with automatically generated parsers for data extraction and forward-looking interdependency resolution, we illustrate the power of our approach via the automatic extraction of a crystallographic hierarchy of information. This includes 18 interrelated submodels of nested data, extracted from an evaluation set of scientific articles, yielding an overall precision of 92.2%, across 26 different journals. Our method and associated toolkit, ChemDataExtractor 2.0, offers a key step toward the seamless integration of primary literature sources into a data-driven scientific framework.
Affiliation(s)
- Juraj Mavračić
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Callum J Court
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Taketomo Isazawa
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Stephen R Elliott
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Jacqueline M Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0FS, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
|
26
|
Cole JM, Gosztola DJ, Velazquez-Garcia JDJ. Nanooptomechanical Transduction in a Single Crystal with 100% Photoconversion. J Phys Chem C Nanomater Interfaces 2021; 125:8907-8915. [PMID: 34084264 PMCID: PMC8162413 DOI: 10.1021/acs.jpcc.1c02457] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/18/2021] [Revised: 04/02/2021] [Indexed: 06/12/2023]
Abstract
Materials that exhibit nanooptomechanical transduction in their single-crystal form have prospective use in light-driven molecular machinery, nanotechnology, and quantum computing. Linkage photoisomerization is typically the source of such transduction in coordination complexes, although the isomers tend to undergo only partial photoconversion. We present a nanooptomechanical transducer, trans-[Ru(SO2)(NH3)4(3-bromopyridine)]tosylate2, whose S-bound η1-SO2 isomer fully converts into an O-bound η1-OSO photoisomer that is metastable while kept at 100 K. Its 100% photoconversion is confirmed structurally via photocrystallography, while single-crystal optical absorption and Raman spectroscopies reveal its metal-to-ligand charge-transfer and temperature-dependent characteristics. This perfect optical switching affords the material good prospects for nanooptomechanical transduction with single-photon control.
Affiliation(s)
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - Argonne National Laboratory, 9700 South Cass Avenue, Lemont, Illinois 60439, United States
- David J. Gosztola
  - Argonne National Laboratory, 9700 South Cass Avenue, Lemont, Illinois 60439, United States
- Jose de J. Velazquez-Garcia
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
|
27
|
Abstract
Automating the analysis portion of materials characterization by electron microscopy (EM) has the potential to accelerate the process of scientific discovery. To this end, we present a Bayesian deep-learning model for semantic segmentation and localization of particle instances in EM images. These segmentations can subsequently be used to compute quantitative measures such as particle-size distributions, radial-distribution functions, average sizes, and aspect ratios of the particles in an image. Moreover, by making use of the epistemic uncertainty of our model, we obtain uncertainty estimates of its outputs and use these to filter out false-positive predictions and hence produce more accurate quantitative measures. We incorporate our method into the ImageDataExtractor package, as ImageDataExtractor 2.0, which affords a full pipeline to automatically extract particle information for large-scale data-driven materials discovery. Finally, we present and make publicly available the Electron Microscopy Particle Segmentation (EMPS) data set. This is the first human-labeled particle instance segmentation data set, consisting of 465 EM images and their corresponding semantic instance segmentation maps.
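The step from an instance segmentation to quantitative particle measures can be sketched as follows. This is not the ImageDataExtractor 2.0 implementation or its API: the label-mask format (0 for background, a positive integer per particle), the per-particle uncertainty scores, the 0.5 threshold, and the function name `particle_measures` are all assumptions chosen to keep the example small.

```python
import math
from collections import defaultdict

def particle_measures(label_mask, pixel_size_nm=1.0):
    """Per-particle area, equivalent circular diameter and bounding-box aspect ratio."""
    pixels = defaultdict(list)
    for row, line in enumerate(label_mask):
        for col, label in enumerate(line):
            if label != 0:  # 0 marks the background
                pixels[label].append((row, col))
    measures = {}
    for label, coords in pixels.items():
        rows = [r for r, _ in coords]
        cols = [c for _, c in coords]
        height = max(rows) - min(rows) + 1
        width = max(cols) - min(cols) + 1
        area = len(coords) * pixel_size_nm ** 2
        measures[label] = {
            "area_nm2": area,
            "diameter_nm": 2.0 * math.sqrt(area / math.pi),
            "aspect_ratio": max(height, width) / min(height, width),
        }
    return measures

# Toy label mask with three particle instances (labels 1, 2 and 3).
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 2, 2],
    [0, 3, 0, 2, 2],
]
# Hypothetical per-particle uncertainty scores; high values are treated as
# likely false positives and dropped before averaging.
uncertainty = {1: 0.05, 2: 0.10, 3: 0.80}

measures = particle_measures(mask, pixel_size_nm=2.0)
kept = {label: m for label, m in measures.items() if uncertainty[label] < 0.5}
mean_diameter = sum(m["diameter_nm"] for m in kept.values()) / len(kept)
print(sorted(kept), round(mean_diameter, 2))
```

Collecting the `diameter_nm` values of the retained particles over many images would give a particle-size distribution; the bounding-box aspect ratio used here is a crude stand-in for a fitted-ellipse measure.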
Affiliation(s)
- Batuhan Yildirim
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, U.K.
  - Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, U.K.
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0AS, U.K.
|
28
|
Deng K, Cole JM, Cooper JFK, Webster JRP, Haynes R, Al Bahri OK, Steinke NJ, Guan S, Stan L, Zhan X, Zhu T, Nye DW, Stenning GBG. Electrolyte/Dye/TiO2 Interfacial Structures of Dye-Sensitized Solar Cells Revealed by In Situ Neutron Reflectometry with Contrast Matching. Langmuir 2021; 37:1970-1982. [PMID: 33492974 DOI: 10.1021/acs.langmuir.0c03508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
The nature of an interfacial structure buried within a device assembly is often critical to its function. For example, the dye/TiO2 interfacial structure that comprises the working electrode of a dye-sensitized solar cell (DSC) governs its photovoltaic output. These structures have been determined outside of the DSC device, using ex situ characterization methods; yet, they really should be probed while held within a DSC since they are modulated by the device environment. Dye/TiO2 structures will be particularly influenced by a layer of electrolyte ions that lies above the dye self-assembly. We show that electrolyte/dye/TiO2 interfacial structures can be resolved using in situ neutron reflectometry with contrast matching. We find that electrolyte constituents ingress into the self-assembled monolayer of dye molecules that anchor onto TiO2. Some dye/TiO2 anchoring configurations are modulated by the formation of electrolyte/dye intermolecular interactions. These electrolyte-influencing structural changes will affect dye-regeneration and electron-injection DSC operational processes. This underpins the importance of this in situ structural determination of electrolyte/dye/TiO2 interfaces within representative DSC device environments.
Affiliation(s)
- Ke Deng
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
| | - Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Argonne National Laboratory, 9700 South Cass Avenue, Lemont, Illinois 60439, United States
| | - Joshaniel F K Cooper
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
| | - John R P Webster
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
| | - Richard Haynes
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
| | - Othman K Al Bahri
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
| | - Nina-Juliane Steinke
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
| | - Shaoliang Guan
- Research Complex at Harwell, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA, United Kingdom
- Cardiff Catalysis Institute, School of Chemistry, Cardiff University, Cardiff CF10 3AT, United Kingdom
| | - Liliana Stan
- Argonne National Laboratory, 9700 South Cass Avenue, Lemont, Illinois 60439, United States
| | - Xiaozhi Zhan
- Dongguan Neutron Science Center, Dongguan 523000, China
| | - Tao Zhu
- Beijing National Laboratory for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
- Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China
| | - Daniel W Nye
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
| | - Gavin B G Stenning
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
| |
|
29
|
Cole JM, Gosztola DJ, Velazquez-Garcia JDJ, Grass Wang S, Chen YS. Rapid build up of nanooptomechanical transduction in single crystals of a ruthenium-based SO2 linkage photoisomer. Chem Commun (Camb) 2021; 57:1320-1323. [PMID: 33331833 DOI: 10.1039/d0cc06755e] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Abstract
Single-crystal nanooptomechanical transduction occurs in [Ru(SO2)(NH3)4(H2O)]chlorobenzenesulfonate2, reaching maximal levels within 40 s at 100 K when photostimulated by 505 nm light. Its in situ light-induced crystal structure reveals the molecular origins of this optical actuation: 26.0(3)% of the η1-SO2 ligand photoconverts into an η1-OSO photoisomer which, in turn, induces a 49.6(9)° arene ring rotation in its neighbouring counter ion.
Affiliation(s)
- Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.
| | | | | | | | | |
|
30
|
Cole JM, Gosztola DJ, Sylvester SO. Low-energy optical switching of SO2 linkage isomerisation in single crystals of a ruthenium-based coordination complex. RSC Adv 2021; 11:13183-13192. [PMID: 35423860 PMCID: PMC8697492 DOI: 10.1039/d1ra01696b] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/03/2021] [Accepted: 03/26/2021] [Indexed: 01/13/2023] Open
Abstract
Single crystals that behave as optical switches are desirable for a wide range of applications, from optical sensors to read–write memory media. A series of ruthenium-based complexes that exhibit optical switching in their single-crystal form via SO2 linkage photoisomerisation are of prospective interest for these technologies. This study explores the optical switching behaviour in one such complex, trans-[Ru(SO2)(NH3)4(H2O)]tosylate2 (1), in terms of its dark and photoinduced crystal structure, as well as its light and thermal decay characteristics, which are deduced by photocrystallography, single-crystal optical absorption spectroscopy and microscopy. Photocrystallography results reveal that a photoisomerisation level of 21.5(5)% is achievable in 1. Biphasic photochromic crystals of 1 were generated by applying green and then red light to switch on and off the η2-(OS)O photoisomer in different regions of a crystal. Heat is a known alternative to its thermal decay, whereby a method is demonstrated that employs optical absorption spectra to determine its activation energy of 30 kJ mol−1. This low-energy barrier to optical switching agrees well with computational studies on 1, as well as being comparable to activation energies in ruthenium-based nitrosyl linkage photoisomers that also display solid-state optical switching.
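The abstract does not spell out how the activation energy is obtained from the optical absorption spectra. One common route, shown here only as a generic sketch, is an Arrhenius analysis in which a decay rate constant k(T) is assumed to be extracted from the time dependence of the absorption at each temperature T; the symbols k, A, and E_a are introduced here for illustration.

```latex
k(T) = A \exp\!\left(-\frac{E_a}{RT}\right)
\quad\Longrightarrow\quad
\ln k = \ln A - \frac{E_a}{R}\,\frac{1}{T}
```

Under that assumption, a linear fit of ln k against 1/T has slope −E_a/R, where R = 8.314 J mol−1 K−1 is the gas constant.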
Affiliation(s)
- Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
| | | | - Sven O. Sylvester
- Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
| |
|
31
|
Abstract
Charge transfer across conjugated organic molecules is the functional basis of many optoelectronic and semiconductor devices. The ability to design such molecules to suit a given device application is highly desirable; yet, realizing this prospect is impeded by the lack of an algorithm that quantifies the extent of intramolecular charge transfer (ICT) in absolute terms. In turn, an algorithm to describe ICT is held back by a poor definition of one of its key dependent terms: conjugation. Current equations assume that π-bonding operates solely across two bonds, even though conjugation extends beyond these limits, and such equations only yield relative measures of π-conjugation. This work presents a four-step algorithm that enumerates ICT on an absolute scale. The method is applied successfully to four types of optoelectronic materials; results demonstrate the need to reconsider certain fundamental chemical-bonding and ICT concepts for conjugated molecules. These findings have implications for all optoelectronic and semiconducting materials.
Affiliation(s)
- Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
- ISIS Neutron and Muon Facility, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, OX11 0QX, United Kingdom
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| |
|
32
|
Court C, Yildirim B, Jain A, Cole JM. 3-D Inorganic Crystal Structure Generation and Property Prediction via Representation Learning. J Chem Inf Model 2020; 60:4518-4535. [PMID: 32866381 PMCID: PMC7592118 DOI: 10.1021/acs.jcim.0c00464] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/01/2020] [Indexed: 12/31/2022]
Abstract
Generative models have been successfully used to synthesize completely novel images, text, music, and speech. As such, they present an exciting opportunity for the design of new materials for functional applications. So far, generative deep-learning methods applied to molecular and drug discovery have yet to produce stable and novel 3-D crystal structures across multiple material classes. To that end, we, herein, present an autoencoder-based generative deep-representation learning pipeline for geometrically optimized 3-D crystal structures that simultaneously predicts the values of eight target properties. The system is highly general, as demonstrated through creation of novel materials from three separate material classes: binary alloys, ternary perovskites, and Heusler compounds. Comparison of these generated structures to those optimized via electronic-structure calculations shows that our generated materials are valid and geometrically optimized.
Affiliation(s)
- Callum J. Court
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.
| | - Batuhan Yildirim
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.
| | - Apoorv Jain
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0AS, U.K.
| | - Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0AS, U.K.
| |
|
33
|
Abstract
A database of battery materials is presented which comprises a total of 292,313 data records, with 214,617 unique chemical-property data relations between 17,354 unique chemicals and up to five material properties: capacity, voltage, conductivity, Coulombic efficiency and energy. 117,403 data are multivariate on a property where it is the dependent variable in part of a data series. The database was auto-generated by mining text from 229,061 academic papers using the chemistry-aware natural language processing toolkit, ChemDataExtractor version 1.5, which was modified for the specific domain of batteries. The collected data can be used as a representative overview of battery material information that is contained within text of scientific papers. Public availability of these data will also enable battery materials design and prediction via data-science methods. To the best of our knowledge, this is the first auto-generated database of battery materials extracted from a relatively large number of scientific papers. We also provide a Graphical User Interface (GUI) to aid the use of this database.
Affiliation(s)
- Shu Huang
- Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
| | - Jacqueline M Cole
- Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.
- ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK.
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK.
| |
|
34
|
Beard EJ, Cole JM. ChemSchematicResolver: A Toolkit to Decode 2D Chemical Diagrams with Labels and R-Groups into Annotated Chemical Named Entities. J Chem Inf Model 2020; 60:2059-2072. [PMID: 32212690 DOI: 10.1021/acs.jcim.0c00042] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/29/2022]
Abstract
The number of journal articles in the scientific domain has grown to the point where it has become impossible for researchers to capitalize on all findings in their relevant discipline. Information is stored in these articles in a number of ways, including figures that describe important results. In organic chemistry, these figures often present chemical schematic diagrams that graphically define the structures of carbon-based compounds. These diagrams are intuitive for an expert to comprehend, but they are not designed for machines. This work presents ChemSchematicResolver, a software tool that can be used to identify chemical schematic diagrams within the figure of a document, resolve any R-group substituents within them, and convert the resulting diagrams to a machine-readable format in a high-throughput, autonomous fashion. The tool includes a new algorithm that is used to identify relevant diagrams and a mechanism that combines these data with contextual information from the rest of the document for the creation of highly relational databases. It includes support for a variety of general R-group structures, the first time this is available in any open-source chemical schematic diagram extraction tool. It is presented alongside a self-generated evaluation set, on which the most important assessment metric, precision, achieved 83-100% for all assessed areas. The ChemSchematicResolver tool is released under the MIT license and is available to download from www.chemschematicresolver.org.
Affiliation(s)
- Edward J Beard
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
| | - Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
| |
|
35
|
Abstract
The world needs new materials to stimulate the chemical industry in key sectors of our economy: environment and sustainability, information storage, optical telecommunications, and catalysis. Yet, nearly all functional materials are still discovered by "trial-and-error", of which the lack of predictability affords a major materials bottleneck to technological innovation. The average "molecule-to-market" lead time for materials discovery is currently 20 years. This is far too long for industrial needs, as highlighted by the Materials Genome Initiative, which has ambitious targets of up to 4-fold reductions in average molecule-to-market lead times. Such a large step change in progress can only be realistically achieved if one adopts an entirely new approach to materials discovery. Fortunately, a fundamentally new approach to materials discovery has been emerging, whereby data science with artificial intelligence offers a prospective solution to speed up these average molecule-to-market lead times. This approach is known as data-driven materials discovery. Its broad prospects have only recently become a reality, given the timely and major advances in "big data", artificial intelligence, and high-performance computing (HPC). Access to massive data sets has been stimulated by government-regulated open-access requirements for data and literature. Natural-language processing (NLP) and machine-learning (ML) tools that can mine data and find patterns therein are becoming mainstream. Exascale HPC capabilities that can aid data mining and pattern recognition and also generate their own data from calculations are now within our grasp.
These timely advances present an ideal opportunity to develop data-driven materials-discovery strategies to systematically design and predict new chemicals for a given device application. This Account shows how data science can afford materials discovery via a four-step "design-to-device" pipeline that entails (1) data extraction, (2) data enrichment, (3) material prediction, and (4) experimental validation. Massive databases of cognate chemical and property information are first forged from "chemistry-aware" natural-language-processing tools, such as ChemDataExtractor, and enriched using machine-learning methods and high-throughput quantum-chemical calculations. New materials for a bespoke application can then be predicted by mining these databases with algorithmic encodings of relationships between chemical structures and physical properties that are known to deliver functional materials. These may take the form of classification, enumeration, or machine-learning algorithms. A data-mining workflow short-lists these predictions to a handful of lead candidate materials that go forward to experimental validation. This design-to-device approach is being developed to offer a roadmap for the accelerated discovery of new chemicals for functional applications. Case studies presented demonstrate its utility for photovoltaic, optical, and catalytic applications. While this Account is focused on applications in the physical sciences, the generic pipeline discussed is readily transferable to other scientific disciplines such as biology and medicine.
Affiliation(s)
- Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K
- Mathematical Institute, University of Oxford, Woodstock Road, Oxford OX2 6GG, U.K
| |
|
36
|
Kettle B, Gerstmayr E, Streeter MJV, Albert F, Baggott RA, Bourgeois N, Cole JM, Dann S, Falk K, Gallardo González I, Hussein AE, Lemos N, Lopes NC, Lundh O, Ma Y, Rose SJ, Spindloe C, Symes DR, Šmíd M, Thomas AGR, Watt R, Mangles SPD. Single-Shot Multi-keV X-Ray Absorption Spectroscopy Using an Ultrashort Laser-Wakefield Accelerator Source. Phys Rev Lett 2019; 123:254801. [PMID: 31922780 DOI: 10.1103/physrevlett.123.254801] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/24/2019] [Revised: 10/16/2019] [Indexed: 06/10/2023]
Abstract
Single-shot absorption measurements have been performed using the multi-keV x rays generated by a laser-wakefield accelerator. A 200 TW laser was used to drive a laser-wakefield accelerator in a mode which produced broadband electron beams with a maximum energy above 1 GeV and a broad divergence of ≈15 mrad FWHM. Betatron oscillations of these electrons generated (1.2 ± 0.2) × 10^6 photons/eV in the 5 keV region, with a signal-to-noise ratio of approximately 300:1. This was sufficient to allow high-resolution x-ray absorption near-edge structure measurements at the K edge of a titanium sample in a single shot. We demonstrate that this source is capable of single-shot, simultaneous measurements of both the electron and ion distributions in matter heated to eV temperatures by comparison with density functional theory simulations. The unique combination of a high-flux, large bandwidth, few femtosecond duration x-ray pulse synchronized to a high-power laser will enable key advances in the study of ultrafast energetic processes such as electron-ion equilibration.
Affiliation(s)
- B Kettle
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| | - E Gerstmayr
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| | - M J V Streeter
- Physics Department, Lancaster University, Lancaster LA1 4YB, United Kingdom
| | - F Albert
- Lawrence Livermore National Laboratory (LLNL), Livermore, California 94550, USA
| | - R A Baggott
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| | - N Bourgeois
- Central Laser Facility, STFC Rutherford Appleton Laboratory, Didcot OX11 0QX, United Kingdom
| | - J M Cole
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| | - S Dann
- Physics Department, Lancaster University, Lancaster LA1 4YB, United Kingdom
| | - K Falk
- Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstrasse 400, 01328 Dresden, Germany
- Institute of Physics of the ASCR, Na Slovance 1999/2, 182 21 Prague, Czech Republic
- Technische Universität Dresden, 01062, Dresden, Germany
| | | | - A E Hussein
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
| | - N Lemos
- Lawrence Livermore National Laboratory (LLNL), Livermore, California 94550, USA
| | - N C Lopes
- GoLP/Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, U.L., Lisboa 1049-001, Portugal
| | - O Lundh
- Department of Physics, Lund University, P.O. Box 118, S-22100, Lund, Sweden
| | - Y Ma
- Physics Department, Lancaster University, Lancaster LA1 4YB, United Kingdom
| | - S J Rose
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| | - C Spindloe
- Central Laser Facility, STFC Rutherford Appleton Laboratory, Didcot OX11 0QX, United Kingdom
| | - D R Symes
- Central Laser Facility, STFC Rutherford Appleton Laboratory, Didcot OX11 0QX, United Kingdom
| | - M Šmíd
- Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstrasse 400, 01328 Dresden, Germany
| | - A G R Thomas
- Physics Department, Lancaster University, Lancaster LA1 4YB, United Kingdom
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
| | - R Watt
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| | - S P D Mangles
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, United Kingdom
| |
|
37
|
Beard EJ, Sivaraman G, Vázquez-Mayagoitia Á, Vishwanath V, Cole JM. Comparative dataset of experimental and computational attributes of UV/vis absorption spectra. Sci Data 2019; 6:307. [PMID: 31804487 PMCID: PMC6895184 DOI: 10.1038/s41597-019-0306-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 01/24/2019] [Accepted: 11/12/2019] [Indexed: 02/07/2023] Open
Abstract
The ability to auto-generate databases of optical properties holds great prospects in data-driven materials discovery for optoelectronic applications. We present a cognate set of experimental and computational data that describes key features of optical absorption spectra. This includes an auto-generated database of 18,309 records of experimentally determined UV/vis absorption maxima, λmax, and associated extinction coefficients, ϵ, where present. This database was produced using the text-mining toolkit, ChemDataExtractor, on 402,034 scientific documents. High-throughput electronic-structure calculations using fast (simplified Tamm-Dancoff approach) and traditional (time-dependent) density functional theory were executed to predict λmax and oscillator strengths, f (related to ϵ) for a subset of validated compounds. Paired quantities of these computational and experimental data show strong correlations in λmax, f and ϵ, laying the path for reliable in silico calculations of additional optical properties. The total dataset of 8,488 unique compounds and a subset of 5,380 compounds with experimental and computational data, are available in MongoDB, CSV and JSON formats. These can be queried using Python, R, Java, and MATLAB, for data-driven optoelectronic materials discovery.
Measurement(s): ultraviolet–visible spectrum • absorption wavelength • extinction coefficient
Technology Type(s): digital curation
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.10304897
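Since the abstract notes that the data can be queried from Python, the following minimal sketch shows one way paired experimental and computed absorption maxima might be compared once loaded from JSON. The records and the field names (`compound`, `lambda_max_exp_nm`, `lambda_max_calc_nm`) are invented for illustration and are not taken from the published database schema.

```python
import json

# Hypothetical records; the real database's field names and values may differ.
raw = """[
  {"compound": "A", "lambda_max_exp_nm": 450.0, "lambda_max_calc_nm": 440.0},
  {"compound": "B", "lambda_max_exp_nm": 520.0, "lambda_max_calc_nm": 535.0},
  {"compound": "C", "lambda_max_exp_nm": 380.0, "lambda_max_calc_nm": null}
]"""

records = json.loads(raw)

# Keep only compounds that have both an experimental and a computed value.
paired = [
    r for r in records
    if r["lambda_max_exp_nm"] is not None and r["lambda_max_calc_nm"] is not None
]

# Mean absolute error between computed and experimental absorption maxima.
mae = sum(
    abs(r["lambda_max_calc_nm"] - r["lambda_max_exp_nm"]) for r in paired
) / len(paired)
```

Restricting the comparison to paired records mirrors the abstract's distinction between the full dataset and the subset that has both experimental and computational data.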
Affiliation(s)
- Edward J Beard
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
| | - Ganesh Sivaraman
- Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL, 60439, USA
| | | | | | - Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK
- Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL, 60439, USA
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0FS, UK
| |
|
38
|
Abstract
The rise of data science is leading to new paradigms in data-driven materials discovery. This carries an essential notion that large data sources containing chemical structure and property information can be mined in a fashion that detects and exploits structure-property relationships, such that chemicals can be predicted to suit a given material application. The success of material predictions is predicated on these large data sources of chemical structure and property information being suited to a target application. Microscopy is commonly used to characterize chemical structure, especially in fields such as nanotechnology where material properties are highly dependent on the size and shape of nanoparticles. Large data sources of nanoparticle information stemming from microscopy images would thus be highly beneficial. Millions of microscopy images exist, but they lie fragmented across the literature, typically presented individually within a paper article and usually in a qualitative fashion therein, even though they harbor a wealth of numeric information. We present the ImageDataExtractor toolkit that autoidentifies and autoextracts microscopy images from scientific documents, whereupon it autonomously analyzes each image to produce quantitative particle size and shape information about its subject material. Each image is quantified by decoding its scale bar information using optical character recognition, with help from super-resolution convolutional neural networks where required. Individual particles are detected and profiled using various thresholding, segmentation, polygon fitting, and edge correction routines. The high-throughput operational capability of ImageDataExtractor means that it can be used to generate large-data sources of particle information for data-driven materials discovery. Evaluation metrics, precision and recall, are greater than 80% for the majority of the image processing steps, and precision is above 80% for all critical steps. 
The ImageDataExtractor tool is released under the MIT license and is available to download from http://www.imagedataextractor.org.
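To make the scale-bar step concrete, the sketch below converts a pixel-based particle area into a physical size once the scale bar has been read. The function names and the equivalent-circle definition of particle diameter are assumptions for illustration and are not taken from the ImageDataExtractor code base.

```python
import math


def nm_per_pixel(scale_bar_nm, scale_bar_px):
    """Physical length represented by one pixel, from the decoded scale bar."""
    if scale_bar_px <= 0:
        raise ValueError("scale bar length in pixels must be positive")
    return scale_bar_nm / scale_bar_px


def particle_diameter_nm(area_px, scale):
    """Equivalent-circle diameter in nm for a particle of the given pixel area."""
    area_nm2 = area_px * scale ** 2  # areas scale with the square of the length scale
    return 2.0 * math.sqrt(area_nm2 / math.pi)


# Example: a 100 nm scale bar that spans 50 pixels gives 2 nm per pixel.
scale = nm_per_pixel(100.0, 50.0)
diameter = particle_diameter_nm(math.pi * 25.0, scale)
```

Note that an error in the decoded scale bar propagates linearly into every diameter, which is presumably why the abstract singles out scale-bar reading as a step that sometimes needs super-resolution assistance.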
Affiliation(s)
- Karim T Mukaddem
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
| | - Edward J Beard
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
| | - Batuhan Yildirim
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
| | - Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
| |
|
39
|
Affiliation(s)
- Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- ISIS Neutron and Muon Facility, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot OX11 0QX, United Kingdom
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, United States
| | - Giulio Pepe
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
| | - Othman K. Al Bahri
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
| | - Christopher B. Cooper
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, United Kingdom
| |
|
40
|
Hussein AE, Senabulya N, Ma Y, Streeter MJV, Kettle B, Dann SJD, Albert F, Bourgeois N, Cipiccia S, Cole JM, Finlay O, Gerstmayr E, González IG, Higginbotham A, Jaroszynski DA, Falk K, Krushelnick K, Lemos N, Lopes NC, Lumsdon C, Lundh O, Mangles SPD, Najmudin Z, Rajeev PP, Schlepütz CM, Shahzad M, Smid M, Spesyvtsev R, Symes DR, Vieux G, Willingale L, Wood JC, Shahani AJ, Thomas AGR. Laser-wakefield accelerators for high-resolution X-ray imaging of complex microstructures. Sci Rep 2019; 9:3249. [PMID: 30824838 PMCID: PMC6397215 DOI: 10.1038/s41598-019-39845-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/03/2018] [Accepted: 01/29/2019] [Indexed: 12/19/2022] Open
Abstract
Laser-wakefield accelerators (LWFAs) are high acceleration-gradient plasma-based particle accelerators capable of producing ultra-relativistic electron beams. Within the strong focusing fields of the wakefield, accelerated electrons undergo betatron oscillations, emitting a bright pulse of X-rays with a micrometer-scale source size that may be used for imaging applications. Non-destructive X-ray phase contrast imaging and tomography of heterogeneous materials can provide insight into their processing, structure, and performance. To demonstrate the imaging capability of X-rays from an LWFA we have examined an irregular eutectic in the aluminum-silicon (Al-Si) system. The lamellar spacing of the Al-Si eutectic microstructure is on the order of a few micrometers, thus requiring high spatial resolution. We present comparisons between the sharpness and spatial resolution in phase contrast images of this eutectic alloy obtained via X-ray phase contrast imaging at the Swiss Light Source (SLS) synchrotron and X-ray projection microscopy via an LWFA source. An upper bound on the resolving power of 2.7 ± 0.3 μm of the LWFA source in this experiment was measured. These results indicate that betatron X-rays from laser wakefield acceleration can provide an alternative to conventional synchrotron sources for high resolution imaging of eutectics and, more broadly, complex microstructures.
Affiliation(s)
- A E Hussein
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, MI, 48109-2099, USA.
| | - N Senabulya
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI, 48109-2099, USA
| | - Y Ma
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, MI, 48109-2099, USA
- Physics Department, Lancaster University, Lancaster, LA1 4YB, UK
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
| | - M J V Streeter
- Physics Department, Lancaster University, Lancaster, LA1 4YB, UK
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | - B Kettle
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | - S J D Dann
- Physics Department, Lancaster University, Lancaster, LA1 4YB, UK
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
| | - F Albert
- Lawrence Livermore National Laboratory, NIF and Photon Sciences, Livermore, CA, 94550, USA
| | - N Bourgeois
- Central Laser Facility, STFC Rutherford Appleton Laboratory, Didcot, OX11 0QX, UK
| | - S Cipiccia
- Diamond Light Source, Harwell Science and Innovation Campus, Fermi Avenue, Didcot, OX11 0DE, UK
| | - J M Cole
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | - O Finlay
- Physics Department, Lancaster University, Lancaster, LA1 4YB, UK
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
| | - E Gerstmayr
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | | | - A Higginbotham
- York Plasma Institute, Department of Physics, University of York, York, YO10 5DD, UK
| | - D A Jaroszynski
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
- SUPA, Department of Physics, University of Strathclyde, Glasgow, G4 0NG, UK
| | - K Falk
- Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01328, Dresden, Germany
- Technische Universität Dresden, 01062, Dresden, Germany
- Institute of Physics of the ASCR, 182 21, Prague, Czech Republic
| | - K Krushelnick
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, MI, 48109-2099, USA
| | - N Lemos
- Lawrence Livermore National Laboratory, NIF and Photon Sciences, Livermore, CA, 94550, USA
| | - N C Lopes
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
- GoLP/Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, U.L., Lisboa, 1049-001, Portugal
| | - C Lumsdon
- York Plasma Institute, Department of Physics, University of York, York, YO10 5DD, UK
| | - O Lundh
- Department of Physics, Lund University, P.O. Box 118, S-22100, Lund, Sweden
| | - S P D Mangles
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | - Z Najmudin
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | - P P Rajeev
- Central Laser Facility, STFC Rutherford Appleton Laboratory, Didcot, OX11 0QX, UK
| | - C M Schlepütz
- Swiss Light Source, Paul Scherrer Institute, CH-5232, Villigen, Switzerland
| | - M Shahzad
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
- SUPA, Department of Physics, University of Strathclyde, Glasgow, G4 0NG, UK
| | - M Smid
- Helmholtz-Zentrum Dresden-Rossendorf, Bautzner Landstraße 400, 01328, Dresden, Germany
- ELI Beamlines, Institute of Physics of the ASCR, 182 21, Prague, Czech Republic
| | - R Spesyvtsev
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
- SUPA, Department of Physics, University of Strathclyde, Glasgow, G4 0NG, UK
| | - D R Symes
- Central Laser Facility, STFC Rutherford Appleton Laboratory, Didcot, OX11 0QX, UK
| | - G Vieux
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
- SUPA, Department of Physics, University of Strathclyde, Glasgow, G4 0NG, UK
| | - L Willingale
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, MI, 48109-2099, USA
| | - J C Wood
- The John Adams Institute for Accelerator Science, Imperial College London, London, SW7 2AZ, UK
| | - A J Shahani
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI, 48109-2099, USA
| | - A G R Thomas
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, MI, 48109-2099, USA
- Physics Department, Lancaster University, Lancaster, LA1 4YB, UK
- The Cockcroft Institute, Keckwick Lane, Daresbury, WA4 4AD, UK
| |
|
41
|
Cole JM, Ashcroft CM. Generic Classification Scheme for Second-Order Dipolar Nonlinear Optical Organometallic Complexes That Exhibit Second Harmonic Generation. J Phys Chem A 2018; 123:702-714. [DOI: 10.1021/acs.jpca.8b11687] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Affiliation(s)
- Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0FS, U.K
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, United States
- Mathematical Institute, University of Oxford, Woodstock Road, Oxford OX2 6GG, U.K
| | - Christopher M. Ashcroft
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
| |
|
42
|
Behm KT, Cole JM, Joglekar AS, Gerstmayr E, Wood JC, Baird CD, Blackburn TG, Duff M, Harvey C, Ilderton A, Kuschel S, Mangles SPD, Marklund M, McKenna P, Murphy CD, Najmudin Z, Poder K, Ridgers CP, Sarri G, Samarin GM, Symes D, Warwick J, Zepf M, Krushelnick K, Thomas AGR. A spectrometer for ultrashort gamma-ray pulses with photon energies greater than 10 MeV. Rev Sci Instrum 2018; 89:113303. [PMID: 30501337 DOI: 10.1063/1.5056248] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/13/2018] [Accepted: 10/16/2018] [Indexed: 06/09/2023]
Abstract
We present a design for a pixelated scintillator based gamma-ray spectrometer for non-linear inverse Compton scattering experiments. By colliding a laser wakefield accelerated electron beam with a tightly focused, intense laser pulse, gamma-ray photons up to 100 MeV energies and with few femtosecond duration may be produced. To measure the energy spectrum and angular distribution, a 33 × 47 array of cesium-iodide crystals was oriented such that the 47 crystal length axis was parallel to the gamma-ray beam and the 33 crystal length axis was oriented in the vertical direction. Using an iterative deconvolution method similar to the YOGI code, modeling of the scintillator response using GEANT4 and fitting to a quantum Monte Carlo calculated photon spectrum, we are able to extract the gamma ray spectra generated by the inverse Compton interaction.
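The idea of iteratively unfolding a spectrum from a detector response can be sketched as follows. This is not the YOGI code or the authors' procedure: it swaps in a generic multiplicative (maximum-likelihood expectation-maximisation style) update, and the response matrix, function name, and numbers are all hypothetical.

```python
def unfold_mlem(response, measured, iterations=200):
    """Estimate a non-negative spectrum s such that response applied to s matches measured.

    response[i][j]: signal in detector channel i per unit flux in energy bin j.
    measured[i]:    recorded signal in detector channel i.
    """
    n_channels = len(response)
    n_bins = len(response[0])
    # Sensitivity of each energy bin: total signal it produces across all channels.
    sensitivity = [sum(response[i][j] for i in range(n_channels)) for j in range(n_bins)]
    estimate = [1.0] * n_bins  # flat, strictly positive starting guess
    for _ in range(iterations):
        # Forward-model the current estimate through the detector response.
        predicted = [sum(response[i][j] * estimate[j] for j in range(n_bins))
                     for i in range(n_channels)]
        # Multiplicative update: rescale each bin by the measured/predicted ratio it sees.
        for j in range(n_bins):
            if sensitivity[j] == 0:
                continue
            correction = sum(response[i][j] * measured[i] / predicted[i]
                             for i in range(n_channels) if predicted[i] > 0)
            estimate[j] *= correction / sensitivity[j]
    return estimate

# Hypothetical two-channel, two-bin detector and a noise-free measurement.
response = [[1.0, 0.5],
            [0.2, 1.0]]
measured = [4.0, 2.6]  # produced by a spectrum of [3.0, 2.0]
spectrum = unfold_mlem(response, measured)
print([round(x, 3) for x in spectrum])
```

The multiplicative form keeps every bin non-negative without an explicit constraint, which is one reason updates of this kind are popular for spectral unfolding; a real analysis would take the response from a detector simulation rather than a hand-written matrix.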
Affiliation(s)
- K T Behm
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
| | - J M Cole
- The John Adams Institute for Accelerator Science, Imperial College London, London SW7 2AZ, United Kingdom
| | - A S Joglekar
- Physics and Astronomy, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - E Gerstmayr
- The John Adams Institute for Accelerator Science, Imperial College London, London SW7 2AZ, United Kingdom
| | - J C Wood
- The John Adams Institute for Accelerator Science, Imperial College London, London SW7 2AZ, United Kingdom
| | - C D Baird
- York Plasma Institute, Department of Physics, University of York, York YO10 5DD, United Kingdom
| | - T G Blackburn
- Department of Physics, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
| | - M Duff
- SUPA Department of Physics, University of Strathclyde, Glasgow G4 0NG, United Kingdom
| | - C Harvey
- Department of Physics, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
| | - A Ilderton
- Department of Physics, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
| | - S Kuschel
- Institut für Optik und Quantenelektronik, Friedrich-Schiller-Universität, 07743 Jena, Germany
| | - S P D Mangles
- The John Adams Institute for Accelerator Science, Imperial College London, London SW7 2AZ, United Kingdom
| | - M Marklund
- Department of Physics, Chalmers University of Technology, SE-41296 Gothenburg, Sweden
| | - P McKenna
- SUPA Department of Physics, University of Strathclyde, Glasgow G4 0NG, United Kingdom
| | - C D Murphy
- York Plasma Institute, Department of Physics, University of York, York YO10 5DD, United Kingdom
| | - Z Najmudin
- The John Adams Institute for Accelerator Science, Imperial College London, London SW7 2AZ, United Kingdom
| | - K Poder
- The John Adams Institute for Accelerator Science, Imperial College London, London SW7 2AZ, United Kingdom
| | - C P Ridgers
- York Plasma Institute, Department of Physics, University of York, York YO10 5DD, United Kingdom
| | - G Sarri
- School of Mathematics and Physics, The Queen's University of Belfast, BT7 1NN Belfast, United Kingdom
| | - G M Samarin
- School of Mathematics and Physics, The Queen's University of Belfast, BT7 1NN Belfast, United Kingdom
| | - D Symes
- Central Laser Facility, Rutherford Appleton Laboratory, Didcot OX11 0QX, United Kingdom
| | - J Warwick
- School of Mathematics and Physics, The Queen's University of Belfast, BT7 1NN Belfast, United Kingdom
| | - M Zepf
- Institut für Optik und Quantenelektronik, Friedrich-Schiller-Universität, 07743 Jena, Germany
| | - K Krushelnick
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
| | - A G R Thomas
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
| |
|
43
|
Cramer AJ, Cole JM. Host-guest prospects of neodymium and gadolinium ultraphosphate frameworks for nuclear waste storage: Multi-temperature topological analysis of nanoporous cages in RP5O14. J Solid State Chem 2018. [DOI: 10.1016/j.jssc.2018.07.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
|
44
|
|
45
|
Affiliation(s)
- Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0FS, U.K
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, United States
| | - Jose de J. Velazquez-Garcia
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
| | - David J. Gosztola
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, United States
| | - SuYin Grass Wang
- ChemMatCARS Beamline, The University of Chicago, Advanced Photon Source, Argonne, Illinois 60439, United States
| | - Yu-Sheng Chen
- ChemMatCARS Beamline, The University of Chicago, Advanced Photon Source, Argonne, Illinois 60439, United States
| |
|
46
|
Cole JM. Data-Driven Molecular Engineering of Solar-Powered Windows. Comput Sci Eng 2018. [DOI: 10.1109/mcse.2018.011111129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
47
|
Warwick J, Dzelzainis T, Dieckmann ME, Schumaker W, Doria D, Romagnani L, Poder K, Cole JM, Alejo A, Yeung M, Krushelnick K, Mangles SPD, Najmudin Z, Reville B, Samarin GM, Symes DD, Thomas AGR, Borghesi M, Sarri G. Experimental Observation of a Current-Driven Instability in a Neutral Electron-Positron Beam. Phys Rev Lett 2017; 119:185002. [PMID: 29219555 DOI: 10.1103/physrevlett.119.185002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/10/2017] [Indexed: 06/07/2023]
Abstract
We report on the first experimental observation of a current-driven instability developing in a quasineutral matter-antimatter beam. Strong magnetic fields (≥1 T) are measured, via means of a proton radiography technique, after the propagation of a neutral electron-positron beam through a background electron-ion plasma. The experimentally determined equipartition parameter of ε_B ≈ 10⁻³ is typical of values inferred from models of astrophysical gamma-ray bursts, in which the relativistic flows are also expected to be pair dominated. The data, supported by particle-in-cell simulations and simple analytical estimates, indicate that these magnetic fields persist in the background plasma for thousands of inverse plasma frequencies. The existence of such long-lived magnetic fields can be related to analog astrophysical systems, such as those prevalent in lepton-dominated jets.
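For orientation, the equipartition parameter quoted in this abstract is conventionally a ratio of energy densities. The form below is the usual textbook definition in SI units, written as an assumption; the abstract does not state the exact normalisation used in the paper, and the symbol U_kin for the kinetic energy density of the flow is introduced here for illustration only.

```latex
\epsilon_B \;=\; \frac{B^{2} / (2\mu_0)}{U_{\mathrm{kin}}}
```

Under this reading, a value near 10⁻³ means that roughly one part in a thousand of the flow's kinetic energy density is held in the magnetic field.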
Affiliation(s)
- J Warwick
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - T Dzelzainis
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - M E Dieckmann
- Department of Science and Technology (ITN), Linköping University, Campus Norrköping, 60174 Norrköping, Sweden
| | - W Schumaker
- SLAC National Accelerator Laboratory, Menlo Park, California 94025, USA
| | - D Doria
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - L Romagnani
- LULI, Ecole Polytechnique, CNRS, CEA, UPMC, 91128 Palaiseau, France
| | - K Poder
- The John Adams Institute for Accelerator Science, Blackett Laboratory, Imperial College London, London SW7 2AZ, United Kingdom
| | - J M Cole
- The John Adams Institute for Accelerator Science, Blackett Laboratory, Imperial College London, London SW7 2AZ, United Kingdom
| | - A Alejo
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - M Yeung
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - K Krushelnick
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
| | - S P D Mangles
- The John Adams Institute for Accelerator Science, Blackett Laboratory, Imperial College London, London SW7 2AZ, United Kingdom
| | - Z Najmudin
- The John Adams Institute for Accelerator Science, Blackett Laboratory, Imperial College London, London SW7 2AZ, United Kingdom
| | - B Reville
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - G M Samarin
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - D D Symes
- Central Laser Facility, Rutherford Appleton Laboratory, Didcot, Oxfordshire OX11 0QX, United Kingdom
| | - A G R Thomas
- Center for Ultrafast Optical Science, University of Michigan, Ann Arbor, Michigan 48109-2099, USA
- Physics Department, Lancaster University, Lancaster LA1 4YB, United Kingdom
| | - M Borghesi
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| | - G Sarri
- School of Mathematics and Physics, Queen's University Belfast, University Road, Belfast BT7 1NN, United Kingdom
| |
|
48
|
McCree-Grey J, Cole JM, Holt SA, Evans PJ, Gong Y. Dye···TiO2 interfacial structure of dye-sensitised solar cell working electrodes buried under a solution of I−/I3− redox electrolyte. Nanoscale 2017; 9:11793-11805. [PMID: 28786471 DOI: 10.1039/c7nr03936k] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
Dye-sensitised solar cells (DSCs) have niche prospects for electricity-generating windows that could equip buildings for energy-sustainable future cities. However, this 'smart window' technology is being held back by a lack of understanding in how the dye interacts with its device environment at the molecular level. A better appreciation of the dye···TiO2 interfacial structure of the DSC working electrodes would be particularly valuable since associated structure-function relationships could be established; these rules would provide a 'toolkit' for the molecular engineering of more suitable DSC dyes via rational design. Previous materials characterisation efforts have been limited to determining this interfacial structure within an environment exposed to air or situated in a solvent medium. This study is the first to reveal the structure of this buried interface within the functional device environment, and represents the first application of in situ neutron reflectometry to DSC research. By incorporating the electrolyte into the structural model of this buried interface, we reveal how lithium cations from the electrolyte constituents influence the dye···TiO2 binding configuration of an organic sensitiser, MK-44, via Li+ complexation to the cyanoacrylate group. This dye is the molecular congener of the high-performance MK-2 DSC dye, whose hexa-alkyl chains appear to stabilise it from Li+ complexation. Our in situ neutron reflectometry findings are built up from auxiliary structural models derived from ex situ X-ray reflectometry and corroborated via density functional theory and UV/vis absorption spectroscopy. Significant differences between the in situ and ex situ dye···TiO2 interfacial structures are found, highlighting the need to characterise the molecular structure of DSC working electrodes while in a fully assembled device.
Affiliation(s)
- Jonathan McCree-Grey
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.
| | | | | | | | | |
|
49
|
Cole JM, Blood-Forsythe MA, Lin TC, Pattison P, Gong Y, Vázquez-Mayagoitia Á, Waddell PG, Zhang L, Koumura N, Mori S. Discovery of S···C≡N Intramolecular Bonding in a Thiophenylcyanoacrylate-Based Dye: Realizing Charge Transfer Pathways and Dye···TiO2 Anchoring Characteristics for Dye-Sensitized Solar Cells. ACS Appl Mater Interfaces 2017; 9:25952-25961. [PMID: 28692246 DOI: 10.1021/acsami.7b03522] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]
Abstract
Donor-π-acceptor dyes containing thiophenyl π-conjugated units and cyanoacrylate acceptor groups are among the best-performing organic chromophores used in dye-sensitized solar cell (DSC) applications. Yet, the molecular origins of their high photovoltaic output have remained unclear until now. This synchrotron-based X-ray diffraction study elucidates these origins for the high-performance thiophenylcyanoacrylate-based dye MK-2 (7.7% DSC device efficiency) and its molecular building block, MK-44. The crystal structures of MK-2 and MK-44 are both determined, while a high-resolution charge-density mapping of the smaller molecule was also possible, enabling the nature of its bonding to be detailed. A strong S···C≡N intramolecular interaction is discovered, which bears a bond critical point, thus proving that this interaction should be formally classified as a chemical bond. A topological analysis of the π-conjugated portion of MK-44 shows that this S···C≡N bonding underpins the highly efficient intramolecular charge transfer (ICT) in thiophenylcyanoacrylate dyes. This manifests as two bipartite ICT pathways bearing carboxylate and nitrile end points. In turn, these pathways dictate a preferred COO/CN anchoring mode for the dye as it adsorbs onto TiO2 surfaces, to form the dye···TiO2 interface that constitutes the DSC working electrode. These results corroborate a recent proposal that all cyanoacrylate groups anchor onto TiO2 in this COO/CN binding configuration. Conformational analysis of the MK-44 and MK-2 crystal structures reveals that this S···C≡N bonding will persist in MK-2. Accordingly, this newly discovered bond affords a rational explanation for the attractive photovoltaic properties of MK-2. More generally, this study provides the first unequivocal evidence for an S···C≡N interaction, confirming previous speculative assignments of such interactions in other compounds.
Affiliation(s)
- Jacqueline M Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, United Kingdom
- Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, Illinois 60439, United States
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge, CB3 0FS, United Kingdom
| | - Martin A Blood-Forsythe
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
| | - Tze-Chia Lin
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
| | - Philip Pattison
- Swiss Norwegian Beamlines, European Synchrotron Radiation Facility, F-38000 Grenoble, France
| | - Yun Gong
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
| | | | - Paul G Waddell
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
- Australian Centre for Neutron Scattering, Australian Nuclear Science and Technology Organisation, Lucas Heights NSW 2234, Australia
| | - Lei Zhang
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, United Kingdom
| | - Nagatoshi Koumura
- National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8565, Japan
| | - Shogo Mori
- Division of Chemistry and Materials, Shinshu University, Faculty of Textile Science and Technology, Ueda, Nagano 386-8567, Japan
| |
|
50
|
Abstract
The use of principal component analysis (PCA) to statistically infer features of local structure from experimental pair distribution function (PDF) data is assessed on a case study of rare-earth phosphate glasses (REPGs). Such glasses, codoped with two rare-earth ions (R and R') of different sizes and optical properties, are of interest to the laser industry. The determination of structure-property relationships in these materials is an important aspect of their technological development. Yet, realizing the local structure of codoped REPGs presents significant challenges relative to their singly doped counterparts; specifically, R and R' are difficult to distinguish in terms of establishing relative material compositions, identifying atomic pairwise correlation profiles in a PDF that are associated with each ion, and resolving peak overlap of such profiles in PDFs. This study demonstrates that PCA can be employed to help overcome these structural complications, by statistically inferring trends in PDFs that exist for a restricted set of experimental data on REPGs, and using these as training data to predict material compositions and PDF profiles in unknown codoped REPGs. The application of these PCA methods to resolve individual atomic pairwise correlations in t(r) signatures is also presented. The training methods developed for these structural predictions are prevalidated by testing their ability to reproduce known physical phenomena, such as the lanthanide contraction, on PDF signatures of the structurally simpler singly doped REPGs. The intrinsic limitations of applying PCA to analyze PDFs relative to the quality control of source data, data processing, and sample definition, are also considered. While this case study is limited to lanthanide-doped REPGs, this type of statistical inference may easily be extended to other inorganic solid-state materials and be exploited in large-scale data-mining efforts that probe many t(r) functions.
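The core PCA step that this abstract relies on can be sketched on a small set of sampled curves. The sketch extracts only the leading principal component by power iteration; it is an illustration under stated assumptions, not the authors' training procedure, and the function name and toy data are hypothetical.

```python
import math

def first_principal_component(curves, iterations=500):
    """Return (mean_curve, component, scores) for the leading principal component.

    curves: list of equal-length lists, one sampled curve (for example a t(r)) per sample.
    """
    n_samples = len(curves)
    n_points = len(curves[0])
    mean = [sum(c[k] for c in curves) / n_samples for k in range(n_points)]
    centered = [[c[k] - mean[k] for k in range(n_points)] for c in curves]

    # Power iteration on the covariance matrix, applied implicitly as X^T (X v).
    v = [1.0 / math.sqrt(n_points)] * n_points
    for _ in range(iterations):
        proj = [sum(row[k] * v[k] for k in range(n_points)) for row in centered]
        w = [sum(proj[i] * centered[i][k] for i in range(n_samples))
             for k in range(n_points)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v; keep the current vector
        v = [x / norm for x in w]
    scores = [sum(row[k] * v[k] for k in range(n_points)) for row in centered]
    return mean, v, scores

# Toy data: a fixed baseline plus a varying amount of one hypothetical feature shape.
shape = [1.0, 2.0, 0.0, -1.0]
amounts = [-1.0, 0.0, 1.0, 2.0]
curves = [[5.0 + a * s for s in shape] for a in amounts]
mean, component, scores = first_principal_component(curves)
print([round(x, 3) for x in component], [round(x, 3) for x in scores])
```

In this toy set the scores vary linearly with the amount of the feature, which is the property that lets scores from known samples be used to estimate a quantity such as composition in an unknown one. A production analysis would normally use a full singular value decomposition from a numerical library rather than hand-written power iteration.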
Affiliation(s)
- Jacqueline M Cole
- Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, United States
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0FS, U.K
| | - Xie Cheng
- Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
| | - Michael C Payne
- Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
| |
|