1
de Blasio P, Elsborg J, Vegge T, Flores E, Bhowmik A. CALiSol-23: Experimental electrolyte conductivity data for various Li-salts and solvent combinations. Sci Data 2024; 11:750. [PMID: 38987528 PMCID: PMC11237020 DOI: 10.1038/s41597-024-03575-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 06/26/2024] [Indexed: 07/12/2024] Open
Abstract
Ion transport in non-aqueous electrolytes is crucial for high performance lithium-ion battery (LIB) development. The design of superior electrolytes requires extensive experimentation across the compositional space. To support data driven accelerated electrolyte discovery efforts, we curated and analyzed a large dataset covering a wide range of experimentally recorded ionic conductivities for various combinations of lithium salts, solvents, concentrations, and temperatures. The dataset is named as 'Conductivity Atlas for Lithium salts and Solvents' (CALiSol-23). Comprehensive datasets are lacking but are critical to building chemistry agnostic machine learning models for conductivity as well as data driven electrolyte optimization tasks. CALiSol-23 was derived from an exhaustive review of literature concerning experimental non-aqueous electrolyte conductivity measurement. The final dataset consists of 13,825 individual data points from 27 different experimental articles, in total covering 38 solvents, a broad temperature range, and 14 lithium salts. CALiSol-23 can help expedite machine learning model development that can help in understanding the complexities of ion transport and streamlining the optimization of non-aqueous electrolyte mixtures.
Affiliation(s)
- Paolo de Blasio
  - Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- Jonas Elsborg
  - Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- Tejs Vegge
  - Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- Eibar Flores
  - Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
  - SINTEF Industry, Sustainable Energy Technology, Trondheim, 7034, Norway
- Arghya Bhowmik
  - Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
2
Liang Z, Lin C, Tan G, Li J, He Y, Cai S. A low-cost machine learning framework for predicting drug-drug interactions based on fusion of multiple features and a parameter self-tuning strategy. Phys Chem Chem Phys 2024; 26:6300-6315. [PMID: 38305788 DOI: 10.1039/d4cp00039k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Poly-drug therapy is now recognized as a crucial treatment, and the analysis of drug-drug interactions (DDIs) offers substantial theoretical support and guidance for its implementation. Predicting potential DDIs using intelligent algorithms is an emerging approach in pharmacological research. However, the existing supervised models and deep learning-based techniques still have several limitations. This paper proposes a novel DDI analysis and prediction framework called the Multi-View Semi-supervised Graph-based (MVSG) framework, which provides a comprehensive judgment by integrating multiple DDI features and functions without any time-consuming training process. Unlike conventional approaches, MVSG can search for the most suitable similarity (or distance) measurement among DDI data and construct graph structures for each feature. By employing a parameter self-tuning strategy, MVSG fuses multiple graphs according to the contributions of features' information. The actual anticancer drug data are extracted from the authoritative public database for evaluating the effectiveness of our framework, including 904 drugs, 7730 DDI records and 19 types of drug interactions. Validation results indicate that the prediction is more accurate when multiple features are adopted by our framework. In comparison to conventional machine learning techniques, MVSG can achieve higher performance even with less labeled data and without a training process. Finally, MVSG is employed to narrow down the search for potential valuable combinations.
Affiliation(s)
- Zexiao Liang
  - School of Integrated Circuits, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
- Canxin Lin
  - School of Computer Science and Technology, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
- Guoliang Tan
  - School of Automation, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
- Jianzhong Li
  - School of Integrated Circuits, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
- Yan He
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
- Shuting Cai
  - School of Integrated Circuits, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
3
Heid E, Greenman KP, Chung Y, Li SC, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ. Chemprop: A Machine Learning Package for Chemical Property Prediction. J Chem Inf Model 2024; 64:9-17. [PMID: 38147829 PMCID: PMC10777403 DOI: 10.1021/acs.jcim.3c01250] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/28/2023]
Abstract
Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.
Affiliation(s)
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
- Kevin P. Greenman
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Shih-Cheng Li
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- David E. Graff
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Florence H. Vermeire
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H. Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Charles J. McGill
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States
4
Cui S, Gao Y, Huang Y, Shen L, Zhao Q, Pan Y, Zhuang S. Advances and applications of machine learning and deep learning in environmental ecology and health. Environmental Pollution (Barking, Essex: 1987) 2023; 335:122358. [PMID: 37567408 DOI: 10.1016/j.envpol.2023.122358] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/11/2023] [Revised: 08/02/2023] [Accepted: 08/08/2023] [Indexed: 08/13/2023]
Abstract
Machine learning (ML) and deep learning (DL) possess excellent advantages in data analysis (e.g., feature extraction, clustering, classification, regression, image recognition and prediction) and risk assessment and management in environmental ecology and health (EEH). Considering the rapid growth and increasing complexity of data in EEH, it is of significance to summarize recent advances and applications of ML and DL in EEH. This review summarized the basic processes and fundamental algorithms of the ML and DL modeling, and indicated the urgent needs of ML and DL in EEH. Recent research hotspots such as environmental ecology and restoration, environmental fate of new pollutants, chemical exposures and risks, chemical hazard identification and control were highlighted. Various applications of ML and DL in EEH demonstrate their versatility and technological revolution, and present some challenges. The perspective of ML and DL in EEH were further outlined to promote the innovative analysis and cultivation of the ML-driven research paradigm.
Affiliation(s)
- Shixuan Cui
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
  - Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, 310006, China
- Yuchen Gao
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Yizhou Huang
  - Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, 310006, China
- Lilai Shen
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Qiming Zhao
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Yaru Pan
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
- Shulin Zhuang
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, 310058, China
  - Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, 310006, China
5
Biswas S, Chung Y, Ramirez J, Wu H, Green WH. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. J Chem Inf Model 2023; 63:4574-4588. [PMID: 37487557 DOI: 10.1021/acs.jcim.3c00546] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/26/2023]
Abstract
Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.
Affiliation(s)
- Sayandeep Biswas
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Josephine Ramirez
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
6
Peiro Ahmady Langeroudy K, Kharazi Esfahani P, Khorsand Movaghar MR. Enhanced intelligent approach for determination of crude oil viscosity at reservoir conditions. Sci Rep 2023; 13:1666. [PMID: 36717732 PMCID: PMC9887002 DOI: 10.1038/s41598-023-28770-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/11/2022] [Accepted: 01/24/2023] [Indexed: 01/31/2023] Open
Abstract
Oil viscosity plays a prominent role in all areas of petroleum engineering, such as simulating reservoirs, predicting production rate, evaluating oil well performance, and even planning for thermal enhanced oil recovery (EOR) that involves fluid flow calculations. Experimental methods of determining oil viscosity, such as the rotational viscometer, are more accurate than other methods. The compositional method can also properly estimate oil viscosity. However, the composition of oil should be determined experimentally, which is costly and time-consuming. Therefore, the occasional inaccessibility of experimental data may make it inevitable to look for convenient methods for fast and accurate prediction of oil viscosity. Hence, in this study, the error in viscosity prediction has been minimized by taking into account the amount of dissolved gas in oil (solution gas-oil ratio: Rs) as a representative of oil composition along with other conventional black oil features including temperature, pressure, and API gravity by employing recently developed machine learning methods based on the gradient boosting decision tree (GBDT): extreme gradient boosting (XGBoost), CatBoost, and GradientBoosting. Moreover, the advantage of the proposed method lies in its independence to input viscosity data in each pressure region/stage. The results were then compared with well-known correlations and machine-learning methods employing the black oil approach applying least square support vector machine (LSSVM) and compositional approach implementing decision trees (DTs). XGBoost is offered as the best method with its greater precision and lower error. It provides an overall average absolute relative deviation (AARD) of 1.968% which has reduced the error of the compositional method by half and the black oil method (saturated region) by five times. This shows the proper viscosity prediction and corroborates the applied method's performance.
Affiliation(s)
- Kiana Peiro Ahmady Langeroudy
  - Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran
- Parsa Kharazi Esfahani
  - Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran
- Mohammad Reza Khorsand Movaghar
  - Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran
7
Gao H, Zhu LT, Luo ZH, Fraga MA, Hsing IM. Machine Learning and Data Science in Chemical Engineering. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Hanyu Gao
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, People’s Republic of China
- Li-Tao Zhu
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Zheng-Hong Luo
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Marco A. Fraga
  - Instituto Nacional de Tecnologia − INT, Av. Venezuela, 82/518, Rio de Janeiro, RJ 20081-312, Brazil
- I-Ming Hsing
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, People’s Republic of China