1
Mahato KD, Kumar U. Optimized Machine learning techniques Enable prediction of organic dyes photophysical Properties: Absorption Wavelengths, emission Wavelengths, and quantum yields. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 308:123768. [PMID: 38134661 DOI: 10.1016/j.saa.2023.123768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/05/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023]
Abstract
Applications of organic dyes, ranging from basic research to industry, depend on their photophysical properties. Two needs must be addressed: (1) knowledge of the photophysical properties of existing dyes well before real applications and (2) discovery of new organic dyes with desired photophysical properties, either to upgrade existing applications or to develop new ones. Both share the goal of estimating photophysical properties with high accuracy, at minimum cost in time and money, well before laboratory experiments. For this purpose, machine learning-based techniques are the most suitable approach. In this study, we applied optimized machine-learning techniques to a dataset of 3066 organic dyes and evaluated the models using three metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The Quadratic Support Vector Machine (QSVM) was the best predictive model, with RMSE = 16.614, MAE = 10.837, and R² = 0.961 for absorption wavelengths and RMSE = 23.636, MAE = 16.278, and R² = 0.929 for emission wavelengths. These R² values are 0.7% and 0.4% greater than the recently reported values of 0.954 and 0.925 for the Gradient Boost Regression Tree (GBRT) model for absorption and emission wavelengths, respectively. Furthermore, we estimated the quantum yield and found that the Coarse Gaussian Support Vector Machine (CGSVM) outperformed all examined models. To further validate these models, we compared the predicted results with the experimental results of selected dyes. The proposed automated approach can be used to predict photophysical properties without much computer programming knowledge.
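For readers unfamiliar with the three evaluation metrics named in this abstract, the following is a minimal Python sketch of how RMSE, MAE, and the coefficient of determination are commonly computed, plus the relative-gain arithmetic behind the quoted 0.7% and 0.4% figures. The function names and the sample wavelengths are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state how the authors implemented these calculations.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


def relative_gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical measured and predicted values (illustration only, not study data).
measured = [450.0, 520.0, 610.0, 700.0]
predicted = [455.0, 510.0, 615.0, 690.0]
rmse, mae, r2 = regression_metrics(measured, predicted)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.4f}")

# R2 values quoted in the abstract: QSVM versus the earlier GBRT report.
print(round(relative_gain_percent(0.961, 0.954), 1))  # absorption
print(round(relative_gain_percent(0.929, 0.925), 1))  # emission
```

Read as relative gains, the quoted R² values round to the 0.7% and 0.4% stated in the abstract.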
Affiliation(s)
- Kapil Dev Mahato
  - Department of Physics, National Institute of Technology Jamshedpur, Jharkhand 831014, India
- Uday Kumar
  - Department of Physics, National Institute of Technology Jamshedpur, Jharkhand 831014, India
2
Bi X, Lin L, Chen Z, Ye J. Artificial Intelligence for Surface-Enhanced Raman Spectroscopy. SMALL METHODS 2024; 8:e2301243. [PMID: 37888799 DOI: 10.1002/smtd.202301243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Surface-enhanced Raman spectroscopy (SERS), widely acknowledged as a sensitive fingerprinting analytical technique, has shown high application value in a broad range of fields, including biomedicine, environmental protection, and food safety, among others. In the pursuit of ever more sensitive, robust, and comprehensive sensing and imaging, advances keep emerging across the whole SERS pipeline, from the design of SERS substrates and reporter molecules, synthetic route planning, and instrument refinement to data preprocessing and analysis methods. Artificial intelligence (AI), which is created to imitate and eventually exceed human behaviors, has shown its power in learning high-level representations and recognizing complicated patterns with exceptional automaticity. Therefore, faced with intertwined influencing factors and explosive data sizes, AI has been increasingly leveraged in all of the above aspects of SERS, showing high efficiency in accelerating systematic optimization and deepening the understanding of the fundamental physics and spectral data, far beyond what human labor and conventional computation can achieve. In this review, recent progress in SERS through the integration of AI is summarized, and new insights into the challenges and perspectives are provided, with the aim of putting SERS on the fast track.
Affiliation(s)
- Xinyuan Bi
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Li Lin
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Zhou Chen
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Jian Ye
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
  - Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
  - Shanghai Key Laboratory of Gynecologic Oncology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
3
Mao J, Chao K, Jiang FL, Ye XP, Yang T, Li P, Zhu X, Hu PJ, Zhou BJ, Huang M, Gao X, Wang XD. Comparison and development of machine learning for thalidomide-induced peripheral neuropathy prediction of refractory Crohn’s disease in Chinese population. World J Gastroenterol 2023; 29:3855-3870. [PMID: 37426324 PMCID: PMC10324537 DOI: 10.3748/wjg.v29.i24.3855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/07/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023] Open
Abstract
BACKGROUND Thalidomide is an effective treatment for refractory Crohn’s disease (CD). However, thalidomide-induced peripheral neuropathy (TiPN), which has a large individual variation, is a major cause of treatment failure. TiPN is rarely predictable and recognized, especially in CD. It is necessary to develop a risk model to predict TiPN occurrence.
AIM To develop and compare a predictive model of TiPN using machine learning based on comprehensive clinical and genetic variables.
METHODS A retrospective cohort of 164 CD patients from January 2016 to June 2022 was used to establish the model. The National Cancer Institute Common Toxicity Criteria Sensory Scale (version 4.0) was used to assess TiPN. With 18 clinical features and 150 genetic variables, five predictive models were established and evaluated by the confusion matrix, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), specificity, sensitivity (recall rate), precision, accuracy, and F1 score.
RESULTS The top five risk variables associated with TiPN were interleukin-12 rs1353248 [P = 0.0004, odds ratio (OR): 8.983, 95% confidence interval (CI): 2.497-30.90], dose (mg/d, P = 0.002), brain-derived neurotrophic factor (BDNF) rs2030324 (P = 0.001, OR: 3.164, 95%CI: 1.561-6.434), BDNF rs6265 (P = 0.001, OR: 3.150, 95%CI: 1.546-6.073) and BDNF rs11030104 (P = 0.001, OR: 3.091, 95%CI: 1.525-5.960). In the training set, gradient boosting decision tree (GBDT), extremely randomized trees (ET), random forest (RF), logistic regression and extreme gradient boosting (XGBoost) obtained AUROC values > 0.90 and AUPRC > 0.87. Among these models, XGBoost and GBDT obtained the two highest AUROC (0.90 and 1), AUPRC (0.98 and 1), accuracy (0.96 and 0.98), precision (0.90 and 0.95), F1 score (0.95 and 0.98), specificity (0.94 and 0.97), and sensitivity (1). In the validation set, the XGBoost algorithm exhibited the best predictive performance, with the highest specificity (0.857), accuracy (0.818), AUPRC (0.86) and AUROC (0.89). ET and GBDT obtained the highest sensitivity (1) and F1 score (0.8). Overall, compared with other state-of-the-art classifiers such as ET, GBDT and RF, the XGBoost algorithm not only showed more stable performance but also yielded higher AUROC and AUPRC scores, demonstrating its high accuracy in predicting TiPN occurrence.
CONCLUSION The powerful XGBoost algorithm accurately predicts TiPN using 18 clinical features and 14 genetic variables. With the ability to identify high-risk patients using single nucleotide polymorphisms, it offers a feasible option for improving thalidomide efficacy in CD patients.
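Several of the evaluation measures listed in METHODS (sensitivity, specificity, precision, accuracy, and F1 score) follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in Python; the function name and the example counts are hypothetical and are not the study's data. AUROC and AUPRC are omitted because they require ranked prediction scores, not a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("metric undefined: a required denominator is zero")
    sensitivity = tp / (tp + fn)        # recall, true-positive rate
    specificity = tn / (tn + fp)        # true-negative rate
    precision = tp / (tp + fp)          # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```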
Affiliation(s)
- Jing Mao
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
  - Guangdong Provincial Key Laboratory of New Drug Design and Evaluation, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Kang Chao
  - Department of Gastroenterology, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Fu-Lin Jiang
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Xiao-Ping Ye
  - Department of Pharmacy, Guangdong Women and Children Hospital, Guangzhou 510000, Guangdong Province, China
- Ting Yang
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
  - Guangdong Provincial Key Laboratory of New Drug Design and Evaluation, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Pan Li
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
  - Guangdong Provincial Key Laboratory of New Drug Design and Evaluation, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Xia Zhu
  - Department of Gastroenterology, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Pin-Jin Hu
  - Department of Gastroenterology, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Bai-Jun Zhou
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Min Huang
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
  - Guangdong Provincial Key Laboratory of New Drug Design and Evaluation, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Xiang Gao
  - Department of Gastroenterology, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
- Xue-Ding Wang
  - Institute of Clinical Pharmacology, School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
  - Guangdong Provincial Key Laboratory of New Drug Design and Evaluation, Sun Yat-sen University, Guangzhou 510006, Guangdong Province, China
4
Hung SH, Ye ZR, Cheng CF, Chen B, Tsai MK. Enhanced Predictions for the Experimental Photophysical Data Using the Featurized Schnet-Bondstep Approach. J Chem Theory Comput 2023. [PMID: 37126224 DOI: 10.1021/acs.jctc.3c00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
This work assesses a modification of the SchNET model for predicting experimental molecular photophysical properties, including absorption energy (ΔEabs), emission energy (ΔEemi), and photoluminescence quantum yield (PLQY). The solution environment was introduced outside the interaction layers of SchNET so as not to overly amplify the solute-solvent interactions, a choice supported by the change in prediction errors between the presence and absence of the solvent effect. Two featurization schemes under the framework of the Schnet-bondstep approach, based on the concepts of reduced-atomic-number and reduced-atomic-neighbor, were demonstrated. These featurized models provide good predictions of ΔEabs and ΔEemi, with errors less than 0.1 eV. The corresponding predictions of PLQY were shown to be comparable to those of the previous graph convolution network model.
Affiliation(s)
- Sheng-Hsuan Hung
  - Department of Chemistry, National Taiwan Normal University, Taipei 11677, Taiwan
- Zong-Rong Ye
  - Department of Chemistry, National Taiwan Normal University, Taipei 11677, Taiwan
- Chi-Feng Cheng
  - Department of Chemistry, National Taiwan Normal University, Taipei 11677, Taiwan
- Berlin Chen
  - Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 11677, Taiwan
- Ming-Kang Tsai
  - Department of Chemistry, National Taiwan Normal University, Taipei 11677, Taiwan
  - Department of Chemistry, Fu-Jen Catholic University, New Taipei City 24205, Taiwan
5
Ksenofontov AA, Isaev YI, Lukanov MM, Makarov DM, Eventova VA, Khodov IA, Berezin MB. Accurate prediction of 11B NMR chemical shift of BODIPYs via machine learning. Phys Chem Chem Phys 2023; 25:9472-9481. [PMID: 36935644 DOI: 10.1039/d3cp00253e] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/08/2023]
Abstract
In this article, we present a model based on the RFR machine learning method and ISIDA fragment descriptors for predicting the 11B NMR chemical shift of BODIPYs. The model is freely available at https://ochem.eu/article/146458. The model predicts the 11B NMR chemical shift with high quality (RMSE, 5CV (FINALE training set) = 0.40 ppm, RMSE (TEST set) = 0.14 ppm). In addition, we compared the "cost" and user-friendliness of the model with those of quantum-chemical calculations using the DFT/GIAO approach. The model's 11B NMR chemical shift prediction accuracy (RMSE) is more than three times higher than that of the DFT/GIAO calculations, and the model is tremendously faster. As a result, we provide all researchers with a convenient tool, together with the database we collected, for predicting the 11B NMR chemical shift of boron-containing dyes. We believe that the new model will make it easier for researchers to correctly interpret experimentally determined 11B NMR chemical shifts and to select better conditions for performing an NMR experiment.
Affiliation(s)
- Alexander A Ksenofontov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Yaroslav I Isaev
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Michail M Lukanov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Dmitry M Makarov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Varvara A Eventova
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
  - Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Ilya A Khodov
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia
- Mechail B Berezin
  - G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia