Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Guo J, Ibanez-Lopez AS, Gao H, Quach V, Coley CW, Jensen KF, Barzilay R. Automated Chemical Reaction Extraction from Scientific Literature. J Chem Inf Model 2021;62:2035-2045. [PMID: 34115937 DOI: 10.1021/acs.jcim.1c00284] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 11/29/2022]

Cited by Other Article(s)

Jeong J, Park T, Song J, Kang S, Won J, Han J, Min K. Integrating Data Mining and Natural Language Processing to Construct a Melting Point Database for Organometallic Compounds. J Chem Inf Model 2024;64:7432-7446. [PMID: 39352375 DOI: 10.1021/acs.jcim.4c01254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2024]

Abstract

As semiconductor devices are miniaturized, the importance of atomic layer deposition (ALD) technology is growing. When designing ALD precursors, it is important to consider the melting point, because the precursors should have melting points lower than the process temperature. However, obtaining melting point data is challenging due to experimental sensitivity and high computational costs. As a result, a comprehensive and well-organized database of melting points for organometallic compounds (OMCs) has not yet been reported. Therefore, in this study, we constructed a database of melting points for 1,845 OMCs, including 58 metal and 6 metalloid elements. The database contains CAS numbers, molecular formulas, and structural information and was constructed through automatic extraction and systematic curation. The melting point information was obtained in two ways: 1) 1,434 materials were taken from 11 chemical vendor databases, and 2) 411 materials were identified through natural language processing (NLP) techniques, with an accuracy of 86.3%, from 2,096 scientific papers published over the past 29 years. In our database, the OMCs contain up to around 250 atoms and have melting points that range from -170 to 1610 °C. The main source is the Chemsrc database, accounting for 607 materials (32.9%), and Fe is the most common central metal or metalloid element (15.0%), followed by Si (11.6%) and B (6.7%). To validate the utility of the constructed database, a multimodal neural network model integrating graph-based and feature-based information as descriptors was developed to predict the melting points of the OMCs, but it showed only moderate performance. We believe the current approach reduces the time and cost associated with manual data collection and processing, contributing to effective screening of potentially promising ALD precursors and providing crucial information for the advancement of the semiconductor industry.

Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024;20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open

Ai Q, Meng F, Shi J, Pelkie B, Coley CW. Extracting structured data from organic synthesis procedures using a fine-tuned large language model. Digital Discovery 2024;3:1822-1831. [PMID: 39157760 PMCID: PMC11322921 DOI: 10.1039/d4dd00091a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Accepted: 07/30/2024] [Indexed: 08/20/2024]

Huang Z, Li X, Li A, Yang Y, He L, Zhang Z, Wu S, Wang Y, Cai S, He Y, Liu X. MPNTEXT: An Interactive Platform for Automatically Extracting Metal-Polyphenol Networks and Their Applications from Scientific Literature. J Chem Inf Model 2024. [PMID: 39258795 DOI: 10.1021/acs.jcim.4c01093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]

Zhang X, Li Y, Li C, Zhu J, Gan Z, Wang L, Sun X, You H. A chemical reaction entity recognition method based on a natural language data augmentation strategy. Chem Commun (Camb) 2024;60:9610-9613. [PMID: 39148332 DOI: 10.1039/d4cc01471e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]

Rebello NJ, Arora A, Mochigase H, Lin TS, Shi J, Audus DJ, Muckley ES, Osmani A, Olsen BD. The Block Copolymer Phase Behavior Database. J Chem Inf Model 2024;64:6464-6476. [PMID: 39126359 DOI: 10.1021/acs.jcim.4c00242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2024]

Abstract

The Block Copolymer Database (BCDB) is a platform that allows users to search, submit, visualize, benchmark, and download experimental phase measurements and their associated characterization information for di- and multiblock copolymers. To the best of our knowledge, there is no widely accepted data model for publishing experimental and simulation data on block copolymer self-assembly. The proposed data schema, which carries traceable information, can accommodate any number of blocks. At the time of publication, it contains a total of over 5400 block copolymer melt phase measurements that were mined from the literature and manually curated, together with simulation data points of the phase diagram generated from self-consistent field theory, which can rapidly be augmented. This database can be accessed via the Community Resource for Innovation in Polymer Technology (CRIPT) web application and the Materials Data Facility. The chemical structure of the polymer is encoded in BigSMILES, an extension of the Simplified Molecular-Input Line-Entry System (SMILES) into the macromolecular domain, and the user can search repeat units and functional groups using the SMARTS search syntax (SMILES Arbitrary Target Specification). The user can also query characterization and phase information using Structured Query Language (SQL) and download custom sets of block copolymer data to train machine learning models. Finally, a protocol is presented in which GPT-4, an AI-powered large language model, is used to rapidly screen block copolymer papers from the literature using only the abstract text and to determine whether they contain BCDB data, allowing the database to grow as the number of published papers on the World Wide Web increases. The F1 score for this model is 0.74. This platform is an important step in making polymer data more accessible to the broader community.

Su Y, Wang X, Ye Y, Xie Y, Xu Y, Jiang Y, Wang C. Automation and machine learning augmented by large language models in a catalysis study. Chem Sci 2024;15:12200-12233. [PMID: 39118602 PMCID: PMC11304797 DOI: 10.1039/d3sc07012c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 06/21/2024] [Indexed: 08/10/2024] Open

Fan V, Qian Y, Wang A, Wang A, Coley CW, Barzilay R. OpenChemIE: An Information Extraction Toolkit for Chemistry Literature. J Chem Inf Model 2024;64:5521-5534. [PMID: 38950894 DOI: 10.1021/acs.jcim.4c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]

Zhang W, Wang Q, Kong X, Xiong J, Ni S, Cao D, Niu B, Chen M, Li Y, Zhang R, Wang Y, Zhang L, Li X, Xiong Z, Shi Q, Huang Z, Fu Z, Zheng M. Fine-tuning large language models for chemical text mining. Chem Sci 2024;15:10600-10611. [PMID: 38994403 PMCID: PMC11234886 DOI: 10.1039/d4sc00924j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 06/02/2024] [Indexed: 07/13/2024] Open

Affiliation(s)

Wei Zhang Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Qinggong Wang Nanjing University of Chinese Medicine 138 Xianlin Road Nanjing 210023 China
Xiangtai Kong Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Jiacheng Xiong Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Shengkun Ni Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Duanhua Cao Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou Zhejiang 310058 China
Buying Niu Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Mingan Chen Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China School of Physical Science and Technology, ShanghaiTech University Shanghai 201210 China Lingang Laboratory Shanghai 200031 China
Yameng Li ProtonUnfold Technology Co., Ltd Suzhou China
Runze Zhang Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Yitian Wang Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Lehan Zhang Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Xutong Li Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
Zhaoping Xiong ProtonUnfold Technology Co., Ltd Suzhou China
Qian Shi Lingang Laboratory Shanghai 200031 China
Ziming Huang Medizinische Klinik und Poliklinik I, Klinikum der Universität München, Ludwig-Maximilians-Universität Munich Germany
Zunyun Fu Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
Mingyue Zheng Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China Nanjing University of Chinese Medicine 138 Xianlin Road Nanjing 210023 China

Shaw WJ, Kidder MK, Bare SR, Delferro M, Morris JR, Toma FM, Senanayake SD, Autrey T, Biddinger EJ, Boettcher S, Bowden ME, Britt PF, Brown RC, Bullock RM, Chen JG, Daniel C, Dorhout PK, Efroymson RA, Gaffney KJ, Gagliardi L, Harper AS, Heldebrant DJ, Luca OR, Lyubovsky M, Male JL, Miller DJ, Prozorov T, Rallo R, Rana R, Rioux RM, Sadow AD, Schaidle JA, Schulte LA, Tarpeh WA, Vlachos DG, Vogt BD, Weber RS, Yang JY, Arenholz E, Helms BA, Huang W, Jordahl JL, Karakaya C, Kian KC, Kothandaraman J, Lercher J, Liu P, Malhotra D, Mueller KT, O'Brien CP, Palomino RM, Qi L, Rodriguez JA, Rousseau R, Russell JC, Sarazen ML, Sholl DS, Smith EA, Stevens MB, Surendranath Y, Tassone CJ, Tran B, Tumas W, Walton KS. A US perspective on closing the carbon cycle to defossilize difficult-to-electrify segments of our economy. Nat Rev Chem 2024;8:376-400. [PMID: 38693313 DOI: 10.1038/s41570-024-00587-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/16/2024] [Indexed: 05/03/2024]

Affiliation(s)

Wendy J Shaw Pacific Northwest National Laboratory, Richland, WA, USA.
Michelle K Kidder Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Simon R Bare SLAC National Accelerator Laboratory, Menlo Park, CA, USA.
Massimiliano Delferro Argonne National Laboratory, Lemont, IL, USA.
James R Morris Ames National Laboratory, Ames, IA, USA.
Francesca M Toma Lawrence Berkeley National Laboratory, Berkeley, CA, USA. Institute of Functional Materials for Sustainability, Helmholtz Zentrum Hereon, Teltow, Brandenburg, Germany.
Sanjaya D Senanayake Brookhaven National Laboratory, Upton, NY, USA.
Tom Autrey Pacific Northwest National Laboratory, Richland, WA, USA
Elizabeth J Biddinger Department of Chemical Engineering, The City College of New York, New York, NY, USA
Shannon Boettcher Lawrence Berkeley National Laboratory, Berkeley, CA, USA Department of Chemical & Biomolecular Engineering and Department of Chemistry, University of California, Berkeley, Berkeley, CA, USA
Mark E Bowden Pacific Northwest National Laboratory, Richland, WA, USA
Phillip F Britt Oak Ridge National Laboratory, Oak Ridge, TN, USA
Robert C Brown Department of Mechanical Engineering, Iowa State University, Ames, IA, USA
R Morris Bullock Pacific Northwest National Laboratory, Richland, WA, USA
Jingguang G Chen Brookhaven National Laboratory, Upton, NY, USA Department of Chemical Engineering, Columbia University, New York, NY, USA
Claus Daniel Argonne National Laboratory, Lemont, IL, USA
Peter K Dorhout Vice President for Research, Iowa State University, Ames, IA, USA
Rebecca A Efroymson Oak Ridge National Laboratory, Oak Ridge, TN, USA
Kelly J Gaffney SLAC National Accelerator Laboratory, Menlo Park, CA, USA
Laura Gagliardi Department of Chemistry, The University of Chicago, Chicago, IL, USA
Aaron S Harper Pacific Northwest National Laboratory, Richland, WA, USA
David J Heldebrant Pacific Northwest National Laboratory, Richland, WA, USA Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA
Oana R Luca Department of Chemistry, University of Colorado Boulder, Boulder, CO, USA
Maxim Lyubovsky Booz Allen Hamilton, Washington DC, USA
Jonathan L Male Pacific Northwest National Laboratory, Richland, WA, USA Biological Systems Engineering Department, Washington State University, Pullman, WA, USA
Daniel J Miller Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Tanya Prozorov Ames National Laboratory, Ames, IA, USA
Robert Rallo Pacific Northwest National Laboratory, Richland, WA, USA
Rachita Rana Department of Chemical Engineering, University of California, Davis, CA, USA
Robert M Rioux Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
Aaron D Sadow Ames National Laboratory, Ames, IA, USA Department of Chemistry, Iowa State University, Ames, IA, USA
Joshua A Schaidle National Renewable Energy Laboratory, Golden, CO, USA
Lisa A Schulte Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA, USA
William A Tarpeh Department of Chemical Engineering, Stanford University, Stanford, CA, USA
Dionisios G Vlachos Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
Bryan D Vogt Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, USA
Robert S Weber Pacific Northwest National Laboratory, Richland, WA, USA
Jenny Y Yang Department of Chemistry, University of California Irvine, Irvine, CA, USA
Elke Arenholz Pacific Northwest National Laboratory, Richland, WA, USA
Brett A Helms Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Wenyu Huang Ames National Laboratory, Ames, IA, USA Department of Chemistry, Iowa State University, Ames, IA, USA
James L Jordahl Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA, USA
Canan Karakaya Oak Ridge National Laboratory, Oak Ridge, TN, USA
Kourosh Cyrus Kian Independent consultant, Washington DC, USA Department of Chemical Engineering, Worcester Polytechnic Institute, Worcester, MA, USA
Jotheeswari Kothandaraman Pacific Northwest National Laboratory, Richland, WA, USA
Johannes Lercher Pacific Northwest National Laboratory, Richland, WA, USA Department of Chemistry, Technical University of Munich, Munich, Germany
Ping Liu Brookhaven National Laboratory, Upton, NY, USA
Deepika Malhotra Pacific Northwest National Laboratory, Richland, WA, USA
Karl T Mueller Pacific Northwest National Laboratory, Richland, WA, USA
Casey P O'Brien Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN, USA
Robert M Palomino BASF Corporation, Iselin, NJ, USA
Long Qi Ames National Laboratory, Ames, IA, USA
José A Rodriguez Brookhaven National Laboratory, Upton, NY, USA
Roger Rousseau Oak Ridge National Laboratory, Oak Ridge, TN, USA
Jake C Russell Advanced Research Projects Agency - Energy, Department of Energy, Washington DC, USA
Michele L Sarazen Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
David S Sholl Oak Ridge National Laboratory, Oak Ridge, TN, USA
Emily A Smith Ames National Laboratory, Ames, IA, USA Department of Chemistry, Iowa State University, Ames, IA, USA
Michaela Burke Stevens SLAC National Accelerator Laboratory, Menlo Park, CA, USA
Yogesh Surendranath Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
Christopher J Tassone SLAC National Accelerator Laboratory, Menlo Park, CA, USA
Ba Tran Pacific Northwest National Laboratory, Richland, WA, USA
William Tumas National Renewable Energy Laboratory, Golden, CO, USA
Krista S Walton School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Bai J, Mosbach S, Taylor CJ, Karan D, Lee KF, Rihm SD, Akroyd J, Lapkin AA, Kraft M. A dynamic knowledge graph approach to distributed self-driving laboratories. Nat Commun 2024;15:462. [PMID: 38263405 PMCID: PMC10805810 DOI: 10.1038/s41467-023-44599-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/21/2023] [Indexed: 01/25/2024] Open

Affiliation(s)

Jiaru Bai Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
Sebastian Mosbach Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
Connor J Taylor Astex Pharmaceuticals, 436 Cambridge Science Park Milton Road, Cambridge, CB4 0QA, UK Innovation Centre in Digital Molecular Technologies, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK Faculty of Engineering, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
Dogancan Karan Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
Kok Foong Lee CMCL Innovations, Sheraton House, Cambridge, CB3 0AX, UK
Simon D Rihm Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
Jethro Akroyd Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
Alexei A Lapkin Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore Innovation Centre in Digital Molecular Technologies, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
Markus Kraft Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK. Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore. School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459, Singapore, Singapore. The Alan Turing Institute, London, NW1 2DB, UK.

Zhang B, Xiao H, Ye G, Song Z, Han T, Sharman E, Luo M, Cheng A, Zhu Q, Zhao H, Zhang G, Wang S, Jiang J. Label-Free Data Mining of Scientific Literature by Unsupervised Syntactic Distance Analysis. J Phys Chem Lett 2024;15:212-219. [PMID: 38157213 DOI: 10.1021/acs.jpclett.3c03345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]

Affiliation(s)

Baicheng Zhang Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Hengyu Xiao Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Guilin Ye Hefei JiShu Quantum Technology Co. Ltd., Hefei 230026, China
Zhaokun Song Hefei JiShu Quantum Technology Co. Ltd., Hefei 230026, China
Tiantian Han Hefei JiShu Quantum Technology Co. Ltd., Hefei 230026, China
Edward Sharman Department of Neurology, University of California, Irvine, California 92697, United States
Man Luo Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Aoyuan Cheng Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Qing Zhu Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Haitao Zhao Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Guoqing Zhang Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Song Wang Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China
Jun Jiang Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, China

Matsumoto Y, Gotoh H. Compound Classification and Consideration of Correlation with Chemical Descriptors from Articles on Antioxidant Capacity Using Natural Language Processing. J Chem Inf Model 2024;64:119-127. [PMID: 38118462 DOI: 10.1021/acs.jcim.3c01826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/22/2023]

Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023;25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open

Machi K, Akiyama S, Nagata Y, Yoshioka M. OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles. J Chem Inf Model 2023;63:6619-6628. [PMID: 37859303 PMCID: PMC10647022 DOI: 10.1021/acs.jcim.3c01449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]

Li S, Zhang Y, Fang Z, Meng K, Tian R, He H, Sun S. Extracting the Synthetic Route of Pd-Based Catalysts in Methanol Steam Reforming from the Scientific Literature. J Chem Inf Model 2023;63:6249-6260. [PMID: 37807535 DOI: 10.1021/acs.jcim.3c01442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]

Jablonka KM, Ai Q, Al-Feghali A, Badhwar S, Bocarsly JD, Bran AM, Bringuier S, Brinson LC, Choudhary K, Circi D, Cox S, de Jong WA, Evans ML, Gastellu N, Genzling J, Gil MV, Gupta AK, Hong Z, Imran A, Kruschwitz S, Labarre A, Lála J, Liu T, Ma S, Majumdar S, Merz GW, Moitessier N, Moubarak E, Mouriño B, Pelkie B, Pieler M, Ramos MC, Ranković B, Rodriques SG, Sanders JN, Schwaller P, Schwarting M, Shi J, Smit B, Smith BE, Van Herck J, Völker C, Ward L, Warren S, Weiser B, Zhang S, Zhang X, Zia GA, Scourtas A, Schmidt KJ, Foster I, White AD, Blaiszik B. 14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon. Digital Discovery 2023;2:1233-1250. [PMID: 38013906 PMCID: PMC10561547 DOI: 10.1039/d3dd00113j] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 06/12/2023] [Accepted: 08/08/2023] [Indexed: 11/04/2023]

Affiliation(s)

Kevin Maik Jablonka Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Qianxiang Ai Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Alexander Al-Feghali Department of Chemistry, McGill University Montreal Quebec Canada
Shruti Badhwar Reincarnate Inc. USA
Joshua D Bocarsly Yusuf Hamied Department of Chemistry, University of Cambridge Lensfield Road Cambridge CB2 1EW UK
Andres M Bran Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
Stefan Bringuier Independent Researcher San Diego CA USA
L Catherine Brinson Mechanical Engineering and Materials Science, Duke University USA
Kamal Choudhary Material Measurement Laboratory, National Institute of Standards and Technology Maryland 20899 USA
Defne Circi Mechanical Engineering and Materials Science, Duke University USA
Sam Cox Department of Chemical Engineering, University of Rochester USA
Wibe A de Jong Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
Matthew L Evans Institut de la Matière Condensée et des Nanosciences (IMCN), UCLouvain Chemin des Étoiles 8 Louvain-la-Neuve 1348 Belgium Matgenix SRL 185 Rue Armand Bury 6534 Gozée Belgium
Nicolas Gastellu Department of Chemistry, McGill University Montreal Quebec Canada
Jerome Genzling Department of Chemistry, McGill University Montreal Quebec Canada
María Victoria Gil Instituto de Ciencia y Tecnología del Carbono (INCAR), CSIC Francisco Pintado Fe 26 33011 Oviedo Spain
Ankur K Gupta Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
Zhi Hong Department of Computer Science, University of Chicago Chicago Illinois 60637 USA
Alishba Imran Computer Science, University of California Berkeley CA 94704 USA
Sabine Kruschwitz Bundesanstalt für Materialforschung und -prüfung Unter den Eichen 87 12205 Berlin Germany
Anne Labarre Department of Chemistry, McGill University Montreal Quebec Canada
Jakub Lála Francis Crick Institute 1 Midland Rd London NW1 1AT UK
Tao Liu Department of Chemistry, McGill University Montreal Quebec Canada
Steven Ma Department of Chemistry, McGill University Montreal Quebec Canada
Sauradeep Majumdar Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Garrett W Merz American Family Insurance Data Science Institute, University of Wisconsin-Madison Madison WI 53706 USA
Nicolas Moitessier Department of Chemistry, McGill University Montreal Quebec Canada
Elias Moubarak Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Beatriz Mouriño Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Brenden Pelkie Department of Chemical Engineering, University of Washington Seattle WA 98105 USA
Michael Pieler OpenBioML.org UK Stability.AI UK
Mayk Caldas Ramos Department of Chemical Engineering, University of Rochester USA
Bojana Ranković Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
Samuel G Rodriques Francis Crick Institute 1 Midland Rd London NW1 1AT UK
Jacob N Sanders Department of Chemistry and Biochemistry, University of California Los Angeles CA 90095 USA
Philippe Schwaller Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
Marcus Schwarting Department of Computer Science, University of Chicago Chicago IL 60490 USA
Jiale Shi Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Berend Smit Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Ben E Smith Yusuf Hamied Department of Chemistry, University of Cambridge Lensfield Road Cambridge CB2 1EW UK
Joren Van Herck Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Christoph Völker Bundesanstalt für Materialforschung und -prüfung Unter den Eichen 87 12205 Berlin Germany
Logan Ward Data Science and Learning Division, Argonne National Lab USA
Sean Warren Department of Chemistry, McGill University Montreal Quebec Canada
Benjamin Weiser Department of Chemistry, McGill University Montreal Quebec Canada
Sylvester Zhang Department of Chemistry, McGill University Montreal Quebec Canada
Xiaoqi Zhang Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL) Sion Valais Switzerland
Ghezal Ahmad Zia Bundesanstalt für Materialforschung und -prüfung Unter den Eichen 87 12205 Berlin Germany
Aristana Scourtas Globus, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
K J Schmidt Globus, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
Ian Foster Department of Computer Science, University of Chicago, Data Science and Learning Division, Argonne National Lab USA
Andrew D White Department of Chemical Engineering, University of Rochester USA
Ben Blaiszik Globus, University of Chicago, Data Science and Learning Division, Argonne National Lab USA

Panayi A, Ward K, Benhadji-Schaff A, Ibanez-Lopez AS, Xia A, Barzilay R. Evaluation of a prototype machine learning tool to semi-automate data extraction for systematic literature reviews. Syst Rev 2023;12:187. [PMID: 37803451 PMCID: PMC10557215 DOI: 10.1186/s13643-023-02351-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 09/13/2023] [Indexed: 10/08/2023] Open

Abstract

BACKGROUND

Evidence-based medicine requires synthesis of research through rigorous and time-intensive systematic literature reviews (SLRs), with significant resource expenditure for data extraction from scientific publications. Machine learning may enable the timely completion of SLRs and reduce errors by automating data identification and extraction.

METHODS

We evaluated the use of machine learning to extract data from publications related to SLRs in oncology (SLR 1) and Fabry disease (SLR 2). SLR 1 predominantly contained interventional studies and SLR 2 observational studies. Predefined key terms and data were manually annotated to train and test bidirectional encoder representations from transformers (BERT) and bidirectional long short-term memory machine learning models. Using human annotation as a reference, we assessed the ability of the models to identify biomedical terms of interest (entities) and their relations. We also pretrained BERT on a corpus of 100,000 open access clinical publications and/or enhanced context-dependent entity classification with a conditional random field (CRF) model. Performance was measured using the F1 score, a metric that combines precision and recall. We defined successful matches as partial overlap of entities of the same type.
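
The matching rule described above (a predicted entity counts as correct when it partially overlaps a reference entity of the same type) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the span representation, the function names `overlaps` and `partial_match_f1`, and the toy spans are all assumptions made for illustration, and the F1 score is taken as the usual harmonic mean of precision and recall.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans have the same type and share at least one position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def partial_match_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under a partial-overlap, same-type matching rule."""
    # A prediction is correct if it overlaps any reference span of the same type.
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # A reference span is found if any prediction of the same type overlaps it.
    matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy annotations (invented for this example, not taken from the study).
gold = [(0, 10, "AGE"), (20, 30, "DOSAGE"), (40, 50, "PFS")]
pred = [(2, 8, "AGE"), (20, 25, "PFS"), (45, 60, "PFS"), (70, 80, "AGE")]

precision, recall, f1 = partial_match_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the four predictions overlap a same-type reference span and two of the three reference spans are found, so partial matching is more lenient than exact-boundary matching would be on the same data.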

RESULTS

For entity recognition, the pretrained BERT+CRF model had the best performance, with an F1 score of 73% in SLR 1 and 70% in SLR 2. Entity types identified with the highest accuracy were metrics for progression-free survival (SLR 1, F1 score 88%) or for patient age (SLR 2, F1 score 82%). Treatment arm dosage was identified less successfully (F1 scores 60% [SLR 1] and 49% [SLR 2]). The best-performing model for relation extraction, pretrained BERT relation classification, exhibited F1 scores higher than 90% in cases with at least 80 relation examples for a pair of related entity types.

CONCLUSIONS

The performance of BERT is enhanced by pretraining with biomedical literature and by combining with a CRF model. With refinement, machine learning may assist with manual data extraction for SLRs.

Reid JP, Betinol IO, Kuang Y. Mechanism to model: a physical organic chemistry approach to reaction prediction. Chem Commun (Camb) 2023;59:10711-10721. [PMID: 37552047 DOI: 10.1039/d3cc03229a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/09/2023]

Qian Y, Guo J, Tu Z, Coley CW, Barzilay R. RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing. J Chem Inf Model 2023. [PMID: 37368970 DOI: 10.1021/acs.jcim.3c00439] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/29/2023]

Shetty P, Rajan AC, Kuenneth C, Gupta S, Panchumarti LP, Holm L, Zhang C, Ramprasad R. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. NPJ Comput Mater 2023;9:52. [PMID: 37033291 PMCID: PMC10073792 DOI: 10.1038/s41524-023-01003-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/16/2022] [Accepted: 03/16/2023] [Indexed: 06/19/2023]

Qian Y, Guo J, Tu Z, Li Z, Coley CW, Barzilay R. MolScribe: Robust Molecular Structure Recognition with Image-to-Graph Generation. J Chem Inf Model 2023;63:1925-1934. [PMID: 36971363 DOI: 10.1021/acs.jcim.2c01480] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 03/29/2023]

Wang W, Liu Y, Wang Z, Hao G, Song B. The way to AI-controlled synthesis: how far do we need to go? Chem Sci 2022;13:12604-12615. [PMID: 36519036 PMCID: PMC9645373 DOI: 10.1039/d2sc04419f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2022] [Accepted: 09/26/2022] [Indexed: 09/08/2024] Open

Shavalieva G, Papadokonstantakis S, Peters G. Prior Knowledge for Predictive Modeling: The Case of Acute Aquatic Toxicity. J Chem Inf Model 2022;62:4018-4031. [PMID: 35998659 PMCID: PMC9472271 DOI: 10.1021/acs.jcim.1c01079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Indexed: 11/30/2022]

Tang C, McInnes BT. Cascade Processes with Micellar Reaction Media: Recent Advances and Future Directions. Molecules 2022;27:5611. [PMID: 36080376 PMCID: PMC9458028 DOI: 10.3390/molecules27175611] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2022] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022] Open

Gao H, Zhu LT, Luo ZH, Fraga MA, Hsing IM. Machine Learning and Data Science in Chemical Engineering. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Rarey M, Nicklaus MC, Warr W. Special Issue on Reaction Informatics and Chemical Space. J Chem Inf Model 2022;62:2009-2010. [DOI: 10.1021/acs.jcim.2c00390] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]

Zhang L, He M. Prediction of solar cell materials via unsupervised literature learning. J Phys Condens Matter 2021;34:095902. [PMID: 34844235 DOI: 10.1088/1361-648x/ac3e1e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/29/2021] [Indexed: 06/13/2023]