1
Furxhi I, Faccani L, Zanoni I, Brigliadori A, Vespignani M, Costa AL. Design rules applied to silver nanoparticles synthesis: A practical example of machine learning application. Comput Struct Biotechnol J 2024; 25:20-33. [PMID: 38444982 PMCID: PMC10914561 DOI: 10.1016/j.csbj.2024.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 02/12/2024] [Accepted: 02/14/2024] [Indexed: 03/07/2024]
Abstract
The synthesis of silver nanoparticles with controlled physicochemical properties is essential for governing their intended functionalities and safety profiles. However, the synthesis process involves multiple parameters that could influence the resulting properties. This challenge could be addressed by developing predictive models that forecast endpoints from key synthesis parameters. In this study, we manually extracted synthesis-related data from the literature and applied various machine learning algorithms. The extracted data included reactant concentrations, experimental conditions, and physicochemical properties, as well as the antibacterial efficiencies and toxicological profiles of the synthesized nanoparticles. In a second step, based on data completeness, we employed regression algorithms to establish relationships between synthesis parameters and desired endpoints and to build predictive models. The models for core size and antibacterial efficiency were trained and validated using a cross-validation approach. Finally, feature impact was evaluated via Shapley values to provide insight into the contribution of each feature to the predictions. Synthesis duration, scale of synthesis, and the choice of capping agents emerged as the most significant predictors. This study demonstrates the potential of machine learning to aid the rational design of the synthesis process and paves the way for the development of safe-by-design principles by providing insight into how the synthesis process can be optimized to achieve the desired properties. The study also provides a valuable dataset compiled from literature sources with significant time and effort from multiple researchers. Access to such datasets notably aids computational advances in the field of nanotechnology.
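As an illustration of what "feature impact via Shapley values" means, the following minimal sketch computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over all feature orderings. The model `toy_core_size`, its coefficients, and the feature names are hypothetical placeholders, not the models or data from the paper, and "absent" features are represented by a fixed baseline value, which is only one of several possible conventions.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Averages each feature's marginal contribution over every feature
    ordering; features not yet "switched on" keep their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]            # switch feature i to its real value
            updated = predict(current)
            phi[i] += updated - previous  # marginal contribution of feature i
            previous = updated
    return [total / len(orderings) for total in phi]


# Hypothetical toy model (an assumption for illustration only):
# inputs are [duration, scale, capping_agent_flag].
def toy_core_size(features):
    duration, scale, capping = features
    return 10.0 + 2.0 * duration + 0.5 * scale + 3.0 * capping * duration


x = [2.0, 4.0, 1.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # reference point standing in for "feature absent"
phi = shapley_values(toy_core_size, x, baseline)
print(phi)  # → [7.0, 2.0, 3.0]
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline (22 - 10 = 12), and the interaction term between duration and capping agent is split equally between the two features. Exact enumeration grows factorially with the number of features, so practical tools rely on approximations.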
Affiliation(s)
- Irini Furxhi
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
  - Transgero Limited, Limerick, Ireland
- Lara Faccani
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Ilaria Zanoni
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Andrea Brigliadori
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Maurizio Vespignani
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
- Anna Luisa Costa
  - CNR-ISSMC (Former ISTEC), National Research Council of Italy-Institute of Science, Technology and Sustainability for Ceramics, Faenza, Italy
2
König C, Vellido A. Understanding predictions of drug profiles using explainable machine learning models. BioData Min 2024; 17:25. [PMID: 39090651 PMCID: PMC11293102 DOI: 10.1186/s13040-024-00378-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Accepted: 07/26/2024] [Indexed: 08/04/2024]
Abstract
PURPOSE: The analysis of absorption, distribution, metabolism, and excretion (ADME) molecular properties is relevant to drug design, as these properties directly influence a drug's effectiveness at its target location. This study concerns their prediction using explainable Machine Learning (ML) models. The aim is to find which molecular features are relevant to the prediction of the different ADME properties and to measure their impact on the predictive model. METHODS: The relative relevance of individual features for ADME activity is gauged by estimating feature importance in the ML models' predictions. Feature importance is calculated using feature permutation, and the individual impact of features is measured by SHapley Additive exPlanations (SHAP). RESULTS: The study reveals the relevance of specific molecular descriptors for each ADME property and quantifies their impact on the ADME property prediction. CONCLUSION: The reported research illustrates how explainable ML models can provide detailed insight into the individual contributions of molecular features to the final prediction of an ADME property, supporting experts in drug candidate selection through a better understanding of the impact of molecular features.
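The feature-permutation idea mentioned in the methods can be sketched as follows: shuffle one feature column at a time and record how much the model's error grows. This is a generic illustration under stated assumptions, not the authors' pipeline; the data are synthetic, `toy_model` is a hypothetical stand-in for a trained ADME predictor, and mean squared error is an arbitrary choice of metric.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base_error = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - base_error)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it uses features 0 and 1
# and ignores feature 2 entirely.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]  # synthetic descriptors
y = [toy_model(row) for row in X]                               # synthetic targets

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup the first feature receives the largest importance and the ignored third feature receives exactly zero, because shuffling it cannot change any prediction.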
Affiliation(s)
- Caroline König
  - Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Centre, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
  - Department of Computer Science, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
- Alfredo Vellido
  - Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Centre, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
  - Department of Computer Science, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
3
Ege D, Boccaccini AR. Investigating the Effect of Processing and Material Parameters of Alginate Dialdehyde-Gelatin (ADA-GEL)-Based Hydrogels on Stiffness by XGB Machine Learning Model. Bioengineering (Basel) 2024; 11:415. [PMID: 38790283 PMCID: PMC11117982 DOI: 10.3390/bioengineering11050415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 03/26/2024] [Accepted: 04/18/2024] [Indexed: 05/26/2024]
Abstract
To address the limitations of alginate and gelatin as separate hydrogels, partially oxidized alginate, alginate dialdehyde (ADA), is usually combined with gelatin to prepare ADA-GEL hydrogels. These hydrogels offer tunable properties, controllable degradation, and suitable stiffness for 3D bioprinting and tissue engineering applications. Several processing variables affect the final properties of the hydrogel, including degree of oxidation, gelatin content, and type of crosslinking agent. In addition, in 3D-printed structures, pore size and the possible addition of a filler to make a hydrogel composite also affect the final physical and biological properties. This study used datasets from 13 research papers, encompassing 33 unique combinations of ADA concentration, gelatin concentration, CaCl2 and microbial transglutaminase (mTG) concentrations (as crosslinkers), pore size, and bioactive glass (BG) filler content, with stiffness as the single target property. The Extreme Gradient Boosting (XGB) machine learning algorithm was used to create a predictive model for understanding the combined influence of these parameters on hydrogel stiffness. The stiffness of ADA-GEL hydrogels is notably affected by the ADA to GEL ratio, and higher gelatin content for different ADA gel concentrations weakens the scaffold, likely due to the presence of unbound gelatin. Pore size and the inclusion of a BG particulate filler also have a significant impact on stiffness; smaller pore sizes and higher BG content lead to increased stiffness. The analysis of the available data indicates that optimizing the ADA-GEL composition and including BG fillers are key to tailoring the stiffness of these 3D-printed hydrogels.
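For readers unfamiliar with boosted-tree regression, the sketch below shows the core idea in its plainest form: each round fits a one-split tree (a stump) to the current residuals and adds a damped copy of it to the ensemble. This is ordinary gradient boosting with squared-error loss written from scratch, not the XGBoost library and not the authors' model; XGBoost adds regularization, second-order gradient information, and deeper trees. The data are synthetic and the two generic features are placeholders.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error: stumps fitted to residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, left_mean, right_mean = fit_stump(X, residuals)
        stumps.append((j, thr, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if row[j] <= thr else right_mean)
                 for row, p in zip(X, preds)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)


# Synthetic data (an assumption for illustration): two generic features on a grid.
X = [[a, b] for a in (1.0, 2.0, 3.0, 4.0, 5.0) for b in (0.2, 0.4, 0.6, 0.8)]
y = [10.0 - 1.5 * a - 4.0 * b for a, b in X]

model = fit_boosting(X, y)
mean_y = sum(y) / len(y)
mse_baseline = sum((t - mean_y) ** 2 for t in y) / len(y)
mse_boosted = sum((t - predict(model, row)) ** 2 for row, t in zip(X, y)) / len(y)
print(mse_baseline, mse_boosted)
```

On this toy grid the boosted ensemble fits the training data far better than predicting the mean. With only a few dozen real data points, as in the study above, held-out validation would be needed before trusting such a fit.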
Affiliation(s)
- Duygu Ege
  - Institute of Biomaterials, Department of Materials Science and Engineering, University of Erlangen-Nuremberg, 91058 Erlangen, Germany
  - Institute of Biomedical Engineering, Bogazici University, Rasathane St., Kandilli, 34684 İstanbul, Turkey
- Aldo R. Boccaccini
  - Institute of Biomaterials, Department of Materials Science and Engineering, University of Erlangen-Nuremberg, 91058 Erlangen, Germany
4
Fu L, Li M, Lv J, Yang C, Zhang Z, Qin S, Li W, Wang X, Chen L. Deep neural network for discovering metabolism-related biomarkers for lung adenocarcinoma. Front Endocrinol (Lausanne) 2023; 14:1270772. [PMID: 37955007 PMCID: PMC10634586 DOI: 10.3389/fendo.2023.1270772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 10/03/2023] [Indexed: 11/14/2023]
Abstract
Introduction: Lung cancer is a major cause of illness and death worldwide. Lung adenocarcinoma (LUAD) is its most common subtype. Metabolite-mRNA interactions play a crucial role in cancer metabolism. Thus, metabolism-related mRNAs are potential targets for cancer therapy. Methods: This study constructed a network of metabolite-mRNA interactions (MMIs) using four databases. We retrieved mRNAs from The Cancer Genome Atlas (TCGA)-LUAD cohort showing significant expression changes between tumor and non-tumor tissues and identified metabolism-related differentially expressed (DE) mRNAs among the MMIs. Candidate mRNAs showing significant contributions to the deep neural network (DNN) model were mined. Using MMIs and the results of function analysis, we created a subnetwork comprising candidate mRNAs and metabolites. Results: Finally, 10 biomarkers were obtained after survival analysis and validation. Their good prognostic value in LUAD was validated in independent datasets. Their effectiveness was confirmed in the TCGA and an independent Clinical Proteomic Tumor Analysis Consortium (CPTAC) dataset by comparison with traditional machine-learning models. Conclusion: To summarize, 10 metabolism-related biomarkers were identified, and their prognostic value was confirmed through the MMI network and the DNN model. Our strategy may pave the way for investigating metabolic biomarkers in other cancers.
Affiliation(s)
- Lei Fu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Manshi Li
  - Department of Radiation Oncology, The Fourth Affiliated Hospital of China Medical University, Shenyang, China
- Junjie Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chengcheng Yang
  - Department of Respiratory, Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Zihan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shimei Qin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xinyan Wang
  - Department of Respiratory, Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
5
Saberi S, Nasiri H, Ghorbani O, Friswell MI, Castro SGP. Explainable Artificial Intelligence to Investigate the Contribution of Design Variables to the Static Characteristics of Bistable Composite Laminates. Materials (Basel) 2023; 16:5381. [PMID: 37570085 PMCID: PMC10419828 DOI: 10.3390/ma16155381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/12/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023]
Abstract
Material properties, geometrical dimensions, and environmental conditions can greatly influence the characteristics of bistable composite laminates. In the current work, to understand how each input feature contributes to the curvatures of the stable equilibrium shapes of bistable laminates and to the snap-through force needed to change these configurations, the correlation between these inputs and outputs is studied using a novel explainable artificial intelligence (XAI) approach called SHapley Additive exPlanations (SHAP). SHAP is employed to explain the contribution and importance of the features influencing the curvatures and the snap-through force, since XAI models change the data into a form that is more convenient for users to understand and interpret. The principle of minimum energy and the Rayleigh-Ritz method are applied to obtain the responses of the bistable laminates used as the input datasets in SHAP. SHAP effectively evaluates the importance of the input variables to the output parameters. The results show that the transverse thermal expansion coefficient and moisture variation have the most impact on the model's output for the transverse curvatures and snap-through force. The eXtreme Gradient Boosting (XGBoost) and Finite Element (FE) methods are also employed to identify the feature importance and validate the theoretical approach, respectively.
Affiliation(s)
- Saeid Saberi
  - Department of Mechanical Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Hamid Nasiri
  - Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran 159163-4311, Iran
- Omid Ghorbani
  - Department of Engineering, Kharazmi University, Tehran 15719-14911, Iran
- Saullo G. P. Castro
  - Department of Aerospace Structures and Materials, Delft University of Technology, Kluyverweg 1, 2629HS Delft, The Netherlands
6
Wu S, Pan Z, Li X, Wang Y, Tang J, Li H, Lu G, Li J, Feng Z, He Y, Liu X. Machine Learning Assisted Photothermal Conversion Efficiency Prediction of Anticancer Photothermal Agents. Chem Eng Sci 2023. [DOI: 10.1016/j.ces.2023.118619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
7
Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One 2023; 18:e0281922. [PMID: 36821544 PMCID: PMC9949629 DOI: 10.1371/journal.pone.0281922] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/23/2022] [Accepted: 02/05/2023] [Indexed: 02/24/2023]
Abstract
Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, which makes it hard for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model metrics (e.g., AUROC, Sensitivity, Specificity) through bootstrap simulation and SHapley Additive exPlanations (SHAP) could increase model transparency and improve model selection. Data from the England National Health Services Heart Disease Prediction Cohort was used. After comparison of model metrics for XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boosting, XGBoost was used as the machine-learning model of choice in this study. Bootstrap simulation (N = 10,000) was used to empirically derive the distribution of model metrics and covariate Gain statistics. SHAP was used to explain the machine-learning output, and simulation was used to evaluate the variance of model accuracy metrics. For the XGBoost modeling method, we observed (through 10,000 completed simulations) that the AUROC ranged from 0.771 to 0.947, a difference of 0.176, the balanced accuracy ranged from 0.688 to 0.894, a 0.205 difference, the sensitivity ranged from 0.632 to 0.939, a 0.307 difference, and the specificity ranged from 0.595 to 0.944, a 0.394 difference. Among 10,000 simulations completed, we observed that the gain for Angina ranged from 0.225 to 0.456, a difference of 0.231, for Cholesterol ranged from 0.148 to 0.326, a difference of 0.178, for maximum heart rate (MaxHR) ranged from 0.081 to 0.200, a difference of 0.119, and for Age ranged from 0.059 to 0.157, a difference of 0.098. Using simulations to empirically evaluate the variability of model metrics, and explanatory algorithms to check whether covariates match the literature, is necessary for increased transparency, reliability, and utility of machine learning methods. These variance statistics, combined with model accuracy statistics, can help researchers identify the best model for a given dataset.
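The idea of empirically deriving the spread of a model metric can be sketched with a simple bootstrap over a fixed set of predictions. This is a simplified illustration under stated assumptions: the labels and predictions are synthetic, only sensitivity is shown, and the resampling unit is the evaluation set. The abstract does not specify whether the authors also re-split and retrained the model in each simulation, so this should not be read as their exact procedure.

```python
import random


def sensitivity(y_true, y_pred):
    """True-positive rate; returns None if the sample has no positives."""
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        return None
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / positives


def bootstrap_range(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Resample cases with replacement and return (min, max) of the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:  # skip degenerate resamples
            values.append(value)
    return min(values), max(values)


# Synthetic labels and predictions (assumptions for illustration only):
# 30 positives of which 24 are detected, 30 negatives with 5 false alarms.
y_true = [1] * 30 + [0] * 30
y_pred = [1] * 24 + [0] * 6 + [0] * 25 + [1] * 5

point_estimate = sensitivity(y_true, y_pred)
low, high = bootstrap_range(y_true, y_pred, sensitivity)
print(point_estimate, low, high)
```

The point estimate here is 0.8, while the bootstrap minimum and maximum bracket it, showing how much a single reported metric can vary across resamples of the same data.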
Affiliation(s)
- Alexander A. Huang
  - Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
  - Department of MD Education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Samuel Y. Huang
  - Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
  - Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia, United States of America