1
Mroz AM, Toka PN, Del Río Chanona EA, Jelfs KE. Web-BO: towards increased accessibility of Bayesian optimisation (BO) for chemistry. Faraday Discuss 2024. [PMID: 39344946 DOI: 10.1039/d4fd00109e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Historically, the chemical discovery process has predominantly been a matter of trial-and-improvement, where small modifications are made to a chemical system, guided by chemical knowledge, with the aim of optimising towards a target property or combination of properties. While a trial-and-improvement approach is frequently successful, especially when assisted by the help of serendipity, the approach is incredibly time- and resource-intensive. Complicating this further, the available chemical space that could, in theory, be explored is remarkably vast. As we are faced with near infinite possibilities and limited resources, we require improved search methods to effectively move towards desired optima, e.g. chemical systems exhibiting a target property, or several desired properties. Bayesian optimisation (BO) has recently gained significant traction in chemistry, where within the BO framework, prior knowledge is used to inform and guide the search process to optimise towards desired chemical targets, e.g. optimal reaction conditions to maximise yield, or optimal catalyst exhibiting improved catalytic activity. While powerful, implementing BO algorithms in practice is largely limited to interfacing via various APIs - requiring advanced coding experience and bespoke scripts for each optimisation task. Further, it is challenging to seamlessly link these with electronic lab notebooks via a graphical user interface (GUI). Ultimately, this limits the accessibility of BO algorithms. Here, we present Web-BO, a GUI to support BO for chemical optimisation tasks. We demonstrate its performance using an open source dataset and associated emulator, and link the platform with an existing electronic lab notebook, datalab. By providing a GUI-based BO service, we hope to improve the accessibility of data-driven optimisation tools in chemistry; https://suprashare.rcs.ic.ac.uk/web-bo/.
Affiliation(s)
- Austin M Mroz
  - Department of Chemistry, Imperial College London, White City Campus, W12 0BZ, UK
  - I-X Centre for AI in Science, Imperial College London, White City Campus, W12 0BZ, UK
- Piotr N Toka
  - Department of Chemistry, Imperial College London, White City Campus, W12 0BZ, UK
- Kim E Jelfs
  - Department of Chemistry, Imperial College London, White City Campus, W12 0BZ, UK
2
Steinbeck C, Koepler O, Bach F, Herres-Pawlis S, Jung N, Liermann J, Neumann S, Razum M, Baldauf C, Biedermann F, Bocklitz T, Boehm F, Broda F, Czodrowski P, Engel T, Hicks M, Kast S, Kettner C, Koch W, Lanza G, Link A, Mata R, Nagel W, Porzel A, Schlörer N, Schulze T, Weinig HG, Wenzel W, Wessjohann L, Wulle S. NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany. Research Ideas and Outcomes 2020. [DOI: 10.3897/rio.6.e55852] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/06/2023] Open
Abstract
The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem to serve the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is the development and maintenance of a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative and easy to use services and novel scientific approaches based on re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions including data for their experimental and theoretical characterisation.
This overarching goal is achieved by working towards a number of key objectives:
Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.
Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry, in order to support the FAIR principles for research data. Finally, develop standards, if there is a lack.
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research and promote subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate studies curricula.
Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness for and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers.
Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.
Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.
3
Krahe MA, Toohey J, Wolski M, Scuffham PA, Reilly S. Research data management in practice: Results from a cross-sectional survey of health and medical researchers from an academic institution in Australia. Health Inf Manag J 2019; 49:108-116. [DOI: 10.1177/1833358319831318] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Abstract
Background: Building or acquiring research data management (RDM) capacity is a major challenge for health and medical researchers and academic institutes alike. Considering that RDM practices influence the integrity and longevity of data, targeting RDM services and support in recognition of needs is especially valuable in health and medical research. Objective: This project sought to examine the current RDM practices of health and medical researchers from an academic institution in Australia. Method: A cross-sectional survey was used to collect information from a convenience sample of 81 members of a research institute (68 academic staff and 13 postgraduate students). A survey was constructed to assess selected data management tasks associated with the earlier stages of the research data life cycle. Results: Our study indicates that RDM tasks associated with creating, processing and analysis of data vary greatly among researchers and are likely influenced by their level of research experience and RDM practices within their immediate teams. Conclusion: Evaluating the data management practices of health and medical researchers, contextualised by tasks associated with the research data life cycle, is an effective way of shaping RDM services and support in this group. Implications: This study recognises that institutional strategies targeted at tasks associated with the creation, processing and analysis of data will strengthen researcher capacity, instil good research practice and, over time, improve health informatics and research data quality.
Affiliation(s)
- Julie Toohey
  - Library and Learning Services, Griffith University, Gold Coast, QLD, Australia
- Malcolm Wolski
  - eResearch Services, Griffith University, Nathan, QLD, Australia
- Paul A Scuffham
  - Centre for Applied Health Economics, Griffith University, Nathan, QLD, Australia
  - Menzies Health Institute Queensland, Griffith University, Gold Coast, QLD, Australia
- Sheena Reilly
  - Health Group, Griffith University, Gold Coast, QLD, Australia
4
Procedures for systematic capture and management of analytical data in academia. Anal Chim Acta X 2019; 1:100007. [PMID: 33117974 PMCID: PMC7587027 DOI: 10.1016/j.acax.2019.100007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/02/2018] [Revised: 01/02/2019] [Accepted: 01/29/2019] [Indexed: 11/22/2022] Open
Abstract
Data management in universities is a challenging endeavor in particular due to the diverse infrastructure of devices and software in combination with limited budget. Nevertheless, in particular the analytical measurements and data sets need to be stored if possible digitally and in a well-organized manner. This manuscript describes how scientists can achieve a data management workflow focusing on data capture and storage by small adaptions to commonly used systems. The presented method includes data transfer options from ubiquitous devices like NMR instruments, GC (MS) or LC (MS), IR and Raman, or mass spectrometers to a central server and the visualization of the available data files in an electronic lab notebook (ELN). The given instruments were chosen according to the needs of synthetic chemists, in particular devices needed in organic, inorganic and polymer chemistry where single data files in the range of several megabytes per data set are produced. Altogether, three different data transfer systems were elaborated to allow a flexible handling of different devices running with different proprietary software: The first procedure allows data capture via the use of a mail server as data exchange point. With the second procedure, data are automatically mirrored from a local file folder to a central storage server where new files are monitored and processed. The third procedure was designed to transfer data with manual support to a central server which is supervised to register new information. All components that are necessary to install and use the herein elaborated functions are available as Open Source and the designed workflows are described step by step to facilitate the adaption of procedures in other universities accordingly if desired.
5
Tremouilhac P, Nguyen A, Huang YC, Kotov S, Lütjohann DS, Hübsch F, Jung N, Bräse S. Chemotion ELN: an Open Source electronic lab notebook for chemists in academia. J Cheminform 2017; 9:54. [PMID: 29086216 PMCID: PMC5612905 DOI: 10.1186/s13321-017-0240-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 07/03/2017] [Accepted: 09/18/2017] [Indexed: 11/18/2022] Open
Abstract
The development of an electronic lab notebook (ELN) for researchers working in the field of chemical sciences is presented. The web-based application is available as Open Source software that offers modern solutions for chemical researchers. The Chemotion ELN is equipped with the basic functionalities necessary for the acquisition and processing of chemical data, in particular the work with molecular structures and calculations based on molecular properties. The ELN supports planning, description, storage, and management for the routine work of organic chemists. It also provides tools for communicating and sharing the recorded research data among colleagues. Meeting the requirements of a state-of-the-art research infrastructure, the ELN allows the search for molecules and reactions not only within the user’s data but also in conventional external sources as provided by SciFinder and PubChem. The presented development makes allowance for the growing dependency of scientific activity on the availability of digital information by providing Open Source instruments to record and reuse research data. The current version of the ELN has been in use for over half a year in our chemistry research group; it serves as a common infrastructure for chemistry research and enables chemistry researchers to build their own databases of digital information as a prerequisite for the detailed, systematic investigation and evaluation of chemical reactions and mechanisms.
Affiliation(s)
- Pierre Tremouilhac
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- An Nguyen
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Yu-Chieh Huang
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Serhii Kotov
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Dominic Sebastian Lütjohann
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
  - Cubuslab GmbH, Lange Straße 2, 76199, Karlsruhe, Germany
- Florian Hübsch
  - Ninja-Concept GmbH, Haid-und-Neu-Straße 18, 76131, Karlsruhe, Germany
- Nicole Jung
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
  - Institute of Organic Chemistry, Karlsruhe Institute of Technology, Fritz-Haber-Weg 6, 76131, Karlsruhe, Germany
- Stefan Bräse
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
  - Institute of Organic Chemistry, Karlsruhe Institute of Technology, Fritz-Haber-Weg 6, 76131, Karlsruhe, Germany
6
Perrier L, Blondal E, Ayala AP, Dearborn D, Kenny T, Lightfoot D, Reka R, Thuna M, Trimble L, MacDonald H. Research data management in academic institutions: A scoping review. PLoS One 2017; 12:e0178261. [PMID: 28542450 PMCID: PMC5441653 DOI: 10.1371/journal.pone.0178261] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/27/2017] [Accepted: 04/26/2017] [Indexed: 11/18/2022] Open
Abstract
OBJECTIVE The purpose of this study is to describe the volume, topics, and methodological nature of the existing research literature on research data management in academic institutions. MATERIALS AND METHODS We conducted a scoping review by searching forty literature databases encompassing a broad range of disciplines from inception to April 2016. We included all study types and data extracted on study design, discipline, data collection tools, and phase of the research data lifecycle. RESULTS We included 301 articles plus 10 companion reports after screening 13,002 titles and abstracts and 654 full-text articles. Most articles (85%) were published from 2010 onwards and conducted within the sciences (86%). More than three-quarters of the articles (78%) reported methods that included interviews, cross-sectional, or case studies. Most articles (68%) included the Giving Access to Data phase of the UK Data Archive Research Data Lifecycle that examines activities such as sharing data. When studies were grouped into five dominant groupings (Stakeholder, Data, Library, Tool/Device, and Publication), data quality emerged as an integral element. CONCLUSION Most studies relied on self-reports (interviews, surveys) or accounts from an observer (case studies) and we found few studies that collected empirical evidence on activities amongst data producers, particularly those examining the impact of research data management interventions. As well, fewer studies examined research data management at the early phases of research projects. The quality of all research outputs needs attention, from the application of best practices in research data management studies, to data producers depositing data in repositories for long-term use.
Affiliation(s)
- Laure Perrier
  - Gerstein Science Information Centre, University of Toronto, Toronto, Ontario, Canada
- Erik Blondal
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- A. Patricia Ayala
  - Gerstein Science Information Centre, University of Toronto, Toronto, Ontario, Canada
- Dylanne Dearborn
  - Gerstein Science Information Centre, University of Toronto, Toronto, Ontario, Canada
- Tim Kenny
  - Gibson D. Lewis Health Science Library, UNT Health Science Center, Fort Worth, Texas, United States of America
- David Lightfoot
  - St. Michael’s Hospital Library, St. Michael’s Hospital, Toronto, Ontario, Canada
- Roger Reka
  - Faculty of Information, University of Toronto, Toronto, Ontario, Canada
- Mindy Thuna
  - Engineering & Computer Science Library, University of Toronto, Toronto, Ontario, Canada
- Leanne Trimble
  - Map and Data Library, University of Toronto, Toronto, Ontario, Canada
7
Willoughby C, Logothetis TA, Frey JG. Effects of using structured templates for recalling chemistry experiments. J Cheminform 2016; 8:9. [PMID: 26900406 PMCID: PMC4759737 DOI: 10.1186/s13321-016-0118-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/31/2015] [Accepted: 01/26/2016] [Indexed: 11/10/2022] Open
Abstract
Background: The way that we recall information is dependent upon both the knowledge in our memories and the conditions under which we recall the information. Electronic Laboratory Notebooks can provide a structured interface for the capture of experiment records through the use of forms and templates. These templates can be useful by providing cues to help researchers to remember to record particular aspects of their experiment, but they may also constrain the information that is recorded by encouraging them to record only what is asked for. It is therefore unknown whether using structured templates for capturing experiment records will have positive or negative effects on the quality and usefulness of the records for assessment and future use. In this paper we report on the results of a set of studies investigating the effects of different template designs on the recording of experiments by undergraduate students and academic researchers. Results: The results indicate that using structured templates to write up experiments does make a significant difference to the information that is recalled and recorded. These differences have both positive and negative effects, with templates prompting the capture of specific information that is otherwise forgotten, but also apparently losing some of the personal elements of the experiment experience such as observations and explanations. Other unexpected effects were seen with templates that can change the information that is captured, but also interfere with the way an experiment is conducted. Conclusions: Our results showed that using structured templates can improve the completeness of the experiment context information captured but can also cause a loss of personal elements of the experiment experience when compared with allowing the researcher to structure their own record. The results suggest that interfaces for recording information about chemistry experiments, whether paper-based questionnaires or templates in Electronic Laboratory Notebooks, can be an effective way to improve the quality of experiment write-ups, but that care needs to be taken to ensure that the correct cues are provided.
Scientists have traditionally recorded their research in paper notebooks, a format that provides great flexibility for capturing information. In contrast, Electronic Laboratory Notebooks frequently make use of forms or structured templates for capturing experiment records. Structured templates can provide cues that can improve record quality by increasing the amount of information captured and encouraging consistency. However, using the wrong cues can lead to a loss of personal elements of the experiment experience and frustrate users.
[Image: two participants from one of our studies recording their experiment using a computer-based template.]
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0118-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Cerys Willoughby
  - Faculty of Natural and Environmental Sciences, University of Southampton, Southampton, SO17 1BJ UK
- Thomas A Logothetis
  - Faculty of Natural and Environmental Sciences, University of Southampton, Southampton, SO17 1BJ UK
- Jeremy G Frey
  - Faculty of Natural and Environmental Sciences, University of Southampton, Southampton, SO17 1BJ UK