1
Hong C, Wu X, Huang J, Dai H. Biomimetic fusion: Platyper's dual vision for predicting protein-surface interactions. Materials Horizons 2024. [PMID: 38916578 DOI: 10.1039/d4mh00066h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Predicting protein binding to material surfaces remains a challenge. Here, a novel approach, the platypus dual perception neural network (Platyper), was developed to describe the interactions in protein-surface systems involving bioceramics with BMPs. The resulting model integrates a graph convolutional neural network (GCN) based on interatomic potentials with a convolutional neural network (CNN) model based on images of molecular structures. This dual-vision approach, inspired by the platypus's adaptive sensory system, addresses the challenge of accurately predicting the complex binding and unbinding dynamics in steered molecular dynamics (SMD) simulations. The model's effectiveness is demonstrated through its application in predicting surface interactions in protein-ligand systems. Notably, Platyper improves computational efficiency compared to classical SMD-based methods and overcomes the limitations of GNN-based methods for large-scale atomic simulations. The incorporation of heat maps enhances the model's interpretability, providing valuable insights into its predictive capabilities. Overall, Platyper represents a promising advancement in the accurate and efficient prediction of protein-surface interactions in the context of bioceramics and growth factors.
Affiliation(s)
- Chuhang Hong
- State Key Laboratory of Advanced Technology for Materials Synthesis and Processing, Biomedical Materials and Engineering Research Center of Hubei Province, Wuhan University of Technology, Wuhan 430070, China.
- Xiaopei Wu
- State Key Laboratory of Advanced Technology for Materials Synthesis and Processing, Biomedical Materials and Engineering Research Center of Hubei Province, Wuhan University of Technology, Wuhan 430070, China.
- Jian Huang
- Materials Genome Institute, Shanghai University, Shanghai, 200444, China.
- Honglian Dai
- State Key Laboratory of Advanced Technology for Materials Synthesis and Processing, Biomedical Materials and Engineering Research Center of Hubei Province, Wuhan University of Technology, Wuhan 430070, China.
- Foshan Xianhu Laboratory of the Advanced Energy Science and Technology Guangdong Laboratory, Xianhu Hydrogen Valley, Foshan 528200, China
2
Abrofarakh M, Moghadam H, Abdulrahim HK. Investigation of direct contact membrane distillation (DCMD) performance using CFD and machine learning approaches. Chemosphere 2024; 357:141969. [PMID: 38604515 DOI: 10.1016/j.chemosphere.2024.141969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/24/2024] [Accepted: 04/08/2024] [Indexed: 04/13/2024]
Abstract
Direct Contact Membrane Distillation (DCMD) is emerging as an effective method for water desalination, known for its efficiency and adaptability. This study delves into the performance of DCMD by integrating two powerful analytical tools: Computational Fluid Dynamics (CFD) and Artificial Neural Networks (ANN). The research thoroughly examines the impact of various factors, such as inlet temperatures, velocities, channel heights, salt concentration, and membrane characteristics, on the process's efficiency, specifically calculating the water vapor flux. A rigorous validation of the CFD model aligns well with established studies, ensuring reliability. Subsequently, over 1000 data points reflecting variations in input factors are utilized to train and validate the ANN. The training phase demonstrated high accuracy, with near-zero mean squared errors and R² values close to one, indicating a strong predictive capability. Further analysis post-ANN training shed light on key relationships: higher membrane porosity boosts water vapor flux, whereas thicker membranes reduce it. Additionally, it was detailed how salt concentration, channel dimensions, inlet temperatures, and velocities significantly influence the distillation process. Finally, a mathematical model was proposed for water vapor flux as a function of key input factors. The results highlighted that salt mole fraction and hot water inlet temperature have the most effect on the water vapor flux. This comprehensive investigation contributes to the understanding of DCMD and emphasizes the potential of combining CFD and ANN for optimizing and innovating water desalination technology.
Affiliation(s)
- Moslem Abrofarakh
- Department of Chemical Engineering, Faculty of Engineering, University of Sistan and Baluchestan, Zahedan, Iran
- Hamid Moghadam
- Department of Chemical Engineering, Faculty of Engineering, University of Sistan and Baluchestan, Zahedan, Iran.
- Hassan K Abdulrahim
- Water Research Center (WRC), Kuwait Institute for Scientific Research (KISR), P.O. Box 24885, 13109, Safat, Kuwait
3
Tan J, Yang R, Xiao L, Xia Y, Qin W. Personalized decision support system for tailoring IgA nephropathy treatment strategies. Eur J Intern Med 2024; 124:69-77. [PMID: 38443263 DOI: 10.1016/j.ejim.2024.02.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/06/2024] [Accepted: 02/04/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND: The ongoing debate surrounding the use of immunosuppressive treatments for IgA nephropathy (IgAN) underscores the demand for personalized and effective strategies.
METHODS: We analyzed data from 807 IgAN patients over 5+ years using three methods: Random Forest with molecular biomarkers, network biomarkers with graph engineering, and an auto-encoder model. All models were trained using identical demographic, clinical, and pathological data, employing an 80-20 split for training and testing purposes.
RESULTS: In the comprehensive assessment of IgAN prognosis, the Random Forest model, employing molecular biomarkers, demonstrated strong performance metrics (AUC = 0.83, sensitivity = 0.51, specificity = 0.96). However, traditional graph feature engineering on patient-specific networks outperformed these results with an AUC of 0.90, sensitivity of 0.64, and specificity of 0.94. The Auto-encoder model showed the best accuracy (AUC = 0.91, sensitivity = 0.46, specificity = 0.96). The findings highlighted the superior predictive capabilities of network biomarkers over molecular biomarkers for adverse renal outcome prediction in IgAN. Consequently, we integrated Auto-encoder-derived Network Biomarkers with Random Forest Models to enhance prognostic precision in diverse IgAN treatment scenarios. The prediction for the prognosis of patients receiving supportive care, glucocorticoid therapy, and immunosuppressant treatment yielded AUC values of 0.95, 0.96, and 1, respectively, indicating high specificity. Drawing from these insights, we pioneered the development of an innovative decision support model for IgAN treatment. This model demonstrated the ability to make medical decisions comparable to those by experienced nephrologists, enabling the customization of personalized disease management strategies.
CONCLUSION: Our system accurately predicted IgAN prognosis and evaluated various treatment efficacies, aiding physicians in devising optimal therapeutic strategies for patients.
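The three metrics quoted in this abstract (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. The labels, scores, and the 0.5 decision threshold below are made-up toy values, not data from the study, and the function names are hypothetical helpers rather than anything the authors describe.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels at a score threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = adverse outcome, 0 = no adverse outcome.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.5, 0.1, 0.8]

sens, spec = sensitivity_specificity(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(labels, scores):.2f}")
```

AUC is computed over all thresholds, whereas sensitivity and specificity depend on one chosen threshold, which is why a model can combine a high AUC with a modest sensitivity (as in the reported AUC of 0.91 with sensitivity of 0.46).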
Affiliation(s)
- Jiaxing Tan
- Division of Nephrology, Department of Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Rongxin Yang
- College of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Liyin Xiao
- College of Computer Science, Sichuan University, Chengdu, Sichuan, China
- Yuanlin Xia
- School of Mechanical Engineering, Sichuan University College of Computer Science, Sichuan University, Chengdu, China
- Wei Qin
- Division of Nephrology, Department of Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China.
4
Li P, Dong L, Li C, Li Y, Zhao J, Peng B, Wang W, Zhou S, Liu W. Machine Learning to Promote Efficient Screening of Low-Contact Electrode for 2D Semiconductor Transistor Under Limited Data. Advanced Materials (Deerfield Beach, Fla.) 2024; 36:e2312887. [PMID: 38606800 DOI: 10.1002/adma.202312887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/09/2024] [Indexed: 04/13/2024]
Abstract
Low-barrier and high-injection electrodes are crucial for high-performance (HP) 2D semiconductor devices. Conventional trial-and-error methodologies for electrode material screening are impractical because of their low efficiency and arbitrary specificity. Although machine learning has emerged as a promising alternative to tackle this problem, its practical application in semiconductor devices is hindered by its substantial data requirements. In this paper, a comprehensive scheme combining an autoencoding regularized adversarial neural network and a feature-adaptive variational active learning algorithm for screening low-contact electrode materials for 2D semiconductor transistors with limited data is proposed. The proposed scheme exhibits exceptional performance by training with only 15% of the total data points, where the mean square errors are 0.17 and 0.27 eV for the vertical and lateral Schottky barrier, respectively, and 2.88% for tunneling probability. Further, it exhibits an optimal predictive performance for 100 randomly sampled training datasets, reveals the underlying physical insight based on the identified features, and realizes continual improvement by employing detailed density-of-states descriptors. Finally, the empirical evaluations of the transport characteristics are conducted and verified by constructing MOSFET devices. These findings demonstrate the considerable potential of machine-learning techniques for screening high-efficiency electrode materials and constructing HP 2D semiconductor devices.
Affiliation(s)
- Penghui Li
- Shaanxi Province Key Laboratory of Thin Films Technology and Optical Test, Xi'an Technological University, Xi'an, 710032, China
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
- Linpeng Dong
- Shaanxi Province Key Laboratory of Thin Films Technology and Optical Test, Xi'an Technological University, Xi'an, 710032, China
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
- Chong Li
- Xi'an Xiangteng Microelectronics Technology Co., Ltd, Xi'an, 710075, China
- Yan Li
- Shaanxi Province Key Laboratory of Thin Films Technology and Optical Test, Xi'an Technological University, Xi'an, 710032, China
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
- Jie Zhao
- Shaanxi Province Key Laboratory of Thin Films Technology and Optical Test, Xi'an Technological University, Xi'an, 710032, China
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
- Bo Peng
- Key Laboratory of Wide Band-Gap Semiconductor Materials and Devices, School of Microelectronics, Xidian University, Xi'an, 710071, China
- Wei Wang
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
- Shun Zhou
- Shaanxi Province Key Laboratory of Thin Films Technology and Optical Test, Xi'an Technological University, Xi'an, 710032, China
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
- Weiguo Liu
- Shaanxi Province Key Laboratory of Thin Films Technology and Optical Test, Xi'an Technological University, Xi'an, 710032, China
- School of Opto-electronical Engineering, Xi'an Technological University, Xi'an, 710032, China
5
van Tilborg D, Brinkmann H, Criscuolo E, Rossen L, Özçelik R, Grisoni F. Deep learning for low-data drug discovery: Hurdles and opportunities. Curr Opin Struct Biol 2024; 86:102818. [PMID: 38669740 DOI: 10.1016/j.sbi.2024.102818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches, which are notoriously 'data-hungry', might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is sparking great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/DerekvTilborg
- Helena Brinkmann
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/hlnbrkmnn
- Emanuele Criscuolo
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/emanuelecriscu9
- Luke Rossen
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/molecular_ml
- Rıza Özçelik
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/Rza_ozcelik
- Francesca Grisoni
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands.
6
Khan MK, Raza M, Shahbaz M, Hussain I, Khan MF, Xie Z, Shah SSA, Tareen AK, Bashir Z, Khan K. The recent advances in the approach of artificial intelligence (AI) towards drug discovery. Front Chem 2024; 12:1408740. [PMID: 38882215 PMCID: PMC11176507 DOI: 10.3389/fchem.2024.1408740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 04/26/2024] [Indexed: 06/18/2024] Open
Abstract
Artificial intelligence (AI) has recently emerged as a unique developmental influence that is playing an important role in the development of medicine. The AI medium is showing potential for unprecedented advancements in truth and efficiency. The intersection of AI has the potential to revolutionize drug discovery. However, AI also has limitations, and experts should be aware of its data access and ethical issues. The use of AI techniques for drug discovery applications has increased considerably over the past few years, including combinatorial QSAR and QSPR, virtual screening, and de novo drug design. The purpose of this survey is to give a general overview of drug discovery based on artificial intelligence and its associated applications. We also highlight the gaps present in traditional methods of drug design. In addition, potential strategies and approaches to overcome current challenges are discussed to address the constraints of AI within this field. We hope that this survey plays a comprehensive role in understanding the potential of AI in drug discovery.
Affiliation(s)
- Mahroza Kanwal Khan
- College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
- Mohsin Raza
- Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
- Muhammad Shahbaz
- Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
- Iftikhar Hussain
- Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- A. J. Drexel Nanomaterials Institute and Department of Materials Science and Engineering, Drexel University, Philadelphia, PA, United States
- Muhammad Farooq Khan
- Department of Electrical Engineering, Sejong University, Seoul, Republic of Korea
- Zhongjian Xie
- Shenzhen Children's Hospital, Clinical Medical College of Southern University of Science and Technology, Shenzhen, China
- Syed Shoaib Ahmad Shah
- Department of Chemistry, School of Natural Sciences, National University of Sciences and Technology, Islamabad, Pakistan
- Ayesha Khan Tareen
- School of Mechanical Engineering, Dongguan University of Technology, Dongguan, China
- Zoobia Bashir
- College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
- Karim Khan
- Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
7
Fooladi H, Hirte S, Kirchmair J. Quantifying the Hardness of Bioactivity Prediction Tasks for Transfer Learning. J Chem Inf Model 2024; 64:4031-4046. [PMID: 38739465 PMCID: PMC11134514 DOI: 10.1021/acs.jcim.4c00160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/24/2024] [Accepted: 04/24/2024] [Indexed: 05/16/2024]
Abstract
Today, machine learning methods are widely employed in drug discovery. However, the chronic lack of data continues to hamper their further development, validation, and application. Several modern strategies aim to mitigate the challenges associated with data scarcity by learning from data on related tasks. These knowledge-sharing approaches encompass transfer learning, multitask learning, and meta-learning. A key question remaining to be answered for these approaches is about the extent to which their performance can benefit from the relatedness of available source (training) tasks; in other words, how difficult ("hard") a test task is to a model, given the available source tasks. This study introduces a new method for quantifying and predicting the hardness of a bioactivity prediction task based on its relation to the available training tasks. The approach involves the generation of protein and chemical representations and the calculation of distances between the bioactivity prediction task and the available training tasks. In the example of meta-learning on the FS-Mol data set, we demonstrate that the proposed task hardness metric is inversely correlated with performance (Pearson's correlation coefficient r = -0.72). The metric will be useful in estimating the task-specific gain in performance that can be achieved through meta-learning.
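The inverse relationship reported in this abstract is a Pearson correlation between per-task hardness and per-task performance. A minimal sketch of that calculation follows. The hardness and performance values are invented for illustration and the function name is hypothetical; the sketch does not reproduce the FS-Mol experiment or the authors' hardness metric.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-task values: a hardness score and a model performance score.
hardness = [0.1, 0.3, 0.5, 0.7, 0.9]
performance = [0.80, 0.74, 0.70, 0.61, 0.58]

r = pearson_r(hardness, performance)
print(f"Pearson r = {r:.2f}")  # negative: harder tasks score lower in this toy data
```

A coefficient near -1 means that performance falls almost linearly as hardness rises; the reported r = -0.72 indicates a strong but not perfect inverse relationship.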
Affiliation(s)
- Hosein Fooladi
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
- Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria
- Steffen Hirte
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria
- Johannes Kirchmair
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
8
Vecchi E, Bassetti D, Graziato F, Pospíšil L, Horenko I. Gauge-Optimal Approximate Learning for Small Data Classification. Neural Comput 2024; 36:1198-1227. [PMID: 38669692 DOI: 10.1162/neco_a_01664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 01/16/2024] [Indexed: 04/28/2024]
Abstract
Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that presents, under the assumption of a discrete segmentation of the feature space, a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
Affiliation(s)
- Edoardo Vecchi
- Università della Svizzera Italiana, Faculty of Informatics, Institute of Computing, 6962 Lugano, Switzerland
- Davide Bassetti
- Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
- Lukáš Pospíšil
- VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17 708 33 Ostrava, Czech Republic
- Illia Horenko
- Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
9
Focke K, De Santis M, Wolter M, Martinez B JA, Vallet V, Pereira Gomes AS, Olejniczak M, Jacob CR. Interoperable workflows by exchanging grid-based data between quantum-chemical program packages. J Chem Phys 2024; 160:162503. [PMID: 38686818 DOI: 10.1063/5.0201701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 04/02/2024] [Indexed: 05/02/2024] Open
Abstract
Quantum-chemical subsystem and embedding methods require complex workflows that may involve multiple quantum-chemical program packages. Moreover, such workflows require the exchange of voluminous data that go beyond simple quantities, such as molecular structures and energies. Here, we describe our approach for addressing this interoperability challenge by exchanging electron densities and embedding potentials as grid-based data. We describe the approach that we have implemented to this end in a dedicated code, PyEmbed, currently part of a Python scripting framework. We discuss how it has facilitated the development of quantum-chemical subsystem and embedding methods and highlight several applications that have been enabled by PyEmbed, including wave-function theory (WFT) in density-functional theory (DFT) embedding schemes mixing non-relativistic and relativistic electronic structure methods, real-time time-dependent DFT-in-DFT approaches, the density-based many-body expansion, and workflows including real-space data analysis and visualization. Our approach demonstrates, in particular, the merits of exchanging (complex) grid-based data and, in general, the potential of modular software development in quantum chemistry, which hinges upon libraries that facilitate interoperability.
Affiliation(s)
- Kevin Focke
- Institute of Physical and Theoretical Chemistry, Technische Universität Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
- Matteo De Santis
- CNRS, UMR 8523-PhLAM-Physique des Lasers Atomes et Molécules, Univ. Lille, F-59000 Lille, France
- Mario Wolter
- Institute of Physical and Theoretical Chemistry, Technische Universität Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
- Jessica A Martinez B
- CNRS, UMR 8523-PhLAM-Physique des Lasers Atomes et Molécules, Univ. Lille, F-59000 Lille, France
- Department of Chemistry, Rutgers University, Newark, New Jersey 07102, USA
- Valérie Vallet
- CNRS, UMR 8523-PhLAM-Physique des Lasers Atomes et Molécules, Univ. Lille, F-59000 Lille, France
- Małgorzata Olejniczak
- Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland
- Christoph R Jacob
- Institute of Physical and Theoretical Chemistry, Technische Universität Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
10
Gou Q, Liu J, Su H, Guo Y, Chen J, Zhao X, Pu X. Exploring an accurate machine learning model to quickly estimate stability of diverse energetic materials. iScience 2024; 27:109452. [PMID: 38523799 PMCID: PMC10960145 DOI: 10.1016/j.isci.2024.109452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 01/27/2024] [Accepted: 03/06/2024] [Indexed: 03/26/2024] Open
Abstract
High energy and low sensitivity have been the focus of developing new energetic materials (EMs). However, there has been a lack of a quick and accurate method for evaluating the stability of diverse EMs. Here, we develop a machine learning prediction model with high accuracy for bond dissociation energy (BDE) of EMs. A reliable and representative BDE dataset of EMs is constructed by collecting 778 experimental energetic compounds and quantum mechanics calculation. To sufficiently characterize the BDE of EMs, a hybrid feature representation is proposed by coupling the local target bond into the global structure characteristics. To alleviate the limitation of the low dataset, pairwise difference regression is utilized as a data augmentation with the advantage of reducing systematic errors and improving diversity. Benefiting from these improvements, the XGBoost model achieves the best prediction accuracy with R² of 0.98 and MAE of 8.8 kJ mol⁻¹, significantly outperforming competitive models.
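Pairwise difference regression, named in this abstract as a data augmentation step, can be sketched as follows: every ordered pair of training samples becomes a new training example whose target is the difference of the two labels, and a prediction for a new sample averages "anchor label plus predicted difference" over all training anchors. The sketch below assumes a single numeric feature and a one-parameter linear model in place of the XGBoost model and hybrid features used in the study; the data and function names are hypothetical.

```python
from itertools import permutations


def make_pairs(xs, ys):
    """Turn n samples into n*(n-1) ordered pairs of (feature difference, label difference)."""
    index_pairs = permutations(range(len(xs)), 2)
    return [(xs[i] - xs[j], ys[i] - ys[j]) for i, j in index_pairs]


def fit_slope(pairs):
    """Least-squares slope through the origin for the model dy ~ slope * dx."""
    numerator = sum(dx * dy for dx, dy in pairs)
    denominator = sum(dx * dx for dx, _ in pairs)
    return numerator / denominator


def predict(x_new, xs, ys, slope):
    """Average, over every training anchor, of anchor label + predicted difference."""
    estimates = [y + slope * (x_new - x) for x, y in zip(xs, ys)]
    return sum(estimates) / len(estimates)


# Toy data (hypothetical) following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

pairs = make_pairs(xs, ys)
slope = fit_slope(pairs)
print(len(pairs), slope, predict(4.0, xs, ys, slope))
```

Four samples yield twelve ordered pairs, which is the augmentation effect: the training set grows roughly quadratically. Any constant offset shared by all labels cancels in the differences, which is one way to read the abstract's remark about reducing systematic errors.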
Affiliation(s)
- Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Jing Liu
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Haoming Su
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiayi Chen
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Xueyan Zhao
- Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621900, China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
11
Khiari Z. Recent Developments in Bio-Ink Formulations Using Marine-Derived Biomaterials for Three-Dimensional (3D) Bioprinting. Mar Drugs 2024; 22:134. [PMID: 38535475 PMCID: PMC10971850 DOI: 10.3390/md22030134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 03/12/2024] [Accepted: 03/13/2024] [Indexed: 05/01/2024] Open
Abstract
3D bioprinting is a disruptive, computer-aided, additive manufacturing technology that allows complex 3D structures to be built layer by layer. This technology is believed to offer tremendous opportunities in several fields including biomedical, pharmaceutical, and food industries. Several bioprinting processes and bio-ink materials have emerged recently. However, there is still a pressing need to develop low-cost sustainable bio-ink materials with superior qualities (excellent mechanical, viscoelastic and thermal properties, biocompatibility, and biodegradability). Marine-derived biomaterials, including polysaccharides and proteins, represent a viable and renewable source for bio-ink formulations. Therefore, this review focuses on the use of marine-derived biomaterials in bio-ink formulations. It starts with a general overview of 3D bioprinting processes followed by a description of the most commonly used marine-derived biomaterials for 3D bioprinting, with special attention paid to chitosan, glycosaminoglycans, alginate, carrageenan, collagen, and gelatin. The challenges facing the application of marine-derived biomaterials in 3D bioprinting within the biomedical and pharmaceutical fields along with future directions are also discussed.
Affiliation(s)
- Zied Khiari
- National Research Council of Canada, Aquatic and Crop Resource Development Research Centre, 1411 Oxford Street, Halifax, NS B3H 3Z1, Canada
12
Mahato KD, Kumar U. Optimized Machine learning techniques Enable prediction of organic dyes photophysical Properties: Absorption Wavelengths, emission Wavelengths, and quantum yields. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2024; 308:123768. [PMID: 38134661 DOI: 10.1016/j.saa.2023.123768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/05/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023]
Abstract
Applications of organic dyes, ranging from basic research to industry, are functions of their photophysical properties. Two important aspects need to be addressed: (1) knowledge of the photophysical properties of existing dyes long before real applications and (2) discovery of new organic dyes with desired photophysical properties for either upgradation of existing or development of new applications. These two cases are coupled together with the common goal of estimating photophysical properties with high accuracy at the minimum cost of time and money long before the hard-core laboratory experiment. For this purpose, machine learning-based techniques are the most suitable approach. In this study, we used optimized machine-learning techniques to assess a dataset of 3066 organic dyes, which were evaluated using three evaluation parameters: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The Quadratic Support Vector Machine (QSVM) was the best predictive model, with RMSE = 16.614, MAE = 10.837, and R² = 0.961 for absorption wavelengths and RMSE = 23.636, MAE = 16.278, and R² = 0.929 for emission wavelengths. These R² values are 0.7% and 0.4% greater than the Gradient Boost Regression Tree (GBRT) model's recently reported values of 0.954 and 0.925 for absorption and emission wavelengths, respectively. Furthermore, we estimated the quantum yield and found that the Coarse Gaussian Support Vector Machine (CGSVM) outperformed all examined models. For more validation of these models, we compared the predicted results with the experimental results of selective dyes. The proposed automated approach can be used for predicting photophysical properties without much computer programming knowledge.
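The three evaluation parameters named in this abstract, and the relative R² gains it quotes, can be checked with a short sketch. The wavelength values below are invented toy numbers, not data from the study, and the function names are hypothetical; only the four R² values passed to the last two calls come from the abstract.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return rmse, mae, 1.0 - ss_res / ss_tot


def relative_gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Toy absorption wavelengths (hypothetical): observed vs. predicted.
y_true = [400.0, 450.0, 500.0, 550.0]
y_pred = [410.0, 440.0, 505.0, 545.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")

# R^2 values quoted in the abstract: QSVM vs. the earlier GBRT model.
print(round(relative_gain_percent(0.961, 0.954), 1))  # absorption
print(round(relative_gain_percent(0.929, 0.925), 1))  # emission
```

The last two lines reproduce the quoted 0.7% and 0.4% figures, which confirms that they are relative gains in R² rather than differences in percentage points.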
Affiliation(s)
- Kapil Dev Mahato
- Department of Physics, National Institute of Technology Jamshedpur, Jharkhand 831014, India.
| | - Uday Kumar
- Department of Physics, National Institute of Technology Jamshedpur, Jharkhand 831014, India
|
13
|
Novais Â, Gonçalves AB, Ribeiro TG, Freitas AR, Méndez G, Mancera L, Read A, Alves V, López-Cerero L, Rodríguez-Baño J, Pascual Á, Peixe L. Development and validation of a quick, automated, and reproducible ATR FT-IR spectroscopy machine-learning model for Klebsiella pneumoniae typing. J Clin Microbiol 2024; 62:e0121123. [PMID: 38284762 PMCID: PMC10865814 DOI: 10.1128/jcm.01211-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 12/18/2023] [Indexed: 01/30/2024] Open
Abstract
The reliability of Fourier-transform infrared (FT-IR) spectroscopy for Klebsiella pneumoniae typing and outbreak control has been previously assessed, but issues remain in standardization and reproducibility. We developed and validated a reproducible FT-IR with attenuated total reflectance (ATR) workflow for the identification of K. pneumoniae lineages. We used 293 isolates representing multidrug-resistant K. pneumoniae lineages causing outbreaks worldwide (2002-2021) to train a random forest (RF) classification model based on capsular (KL)-type discrimination. This model was validated with 280 contemporaneous isolates (2021-2022), using wzi sequencing and whole-genome sequencing as references. Repeatability and reproducibility were tested in different culture media and instruments over time. Our RF model allowed the classification of 33 capsular (KL)-types and up to 36 clinically relevant K. pneumoniae lineages based on the discrimination of specific KL- and O-type combinations. We obtained high rates of accuracy (89%), sensitivity (88%), and specificity (92%), including from cultures obtained directly from the clinical sample, which makes typing information available the same day bacteria are identified. The workflow was reproducible in different instruments over time (>98% correct predictions). Direct colony application, spectral acquisition, and automated KL prediction through Clover MS Data analysis software allow a short time-to-result (5 min/isolate). We demonstrated that FT-IR ATR spectroscopy provides meaningful, reproducible, and accurate information at a very early stage (as soon as bacteria are identified) to support infection control and public health surveillance. The high robustness together with automated and flexible workflows for data analysis provides opportunities to consolidate real-time applications at a global level.
IMPORTANCE We created and validated an automated and simple workflow for the identification of clinically relevant Klebsiella pneumoniae lineages by FT-IR spectroscopy and machine learning, a method that can provide quick and reliable typing information to support real-time decisions of outbreak management and infection control. This method and workflow are of interest to support clinical microbiology diagnostics and to aid public health surveillance.
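The accuracy, sensitivity, and specificity figures quoted in this abstract are standard classification metrics. The sketch below shows one common way to compute them for a multi-class typing problem, treating each class one-vs-rest. The abstract does not state how the authors aggregated per-class values, so this is an assumption; the function names and the short label lists are hypothetical illustrations only.

```python
def accuracy(y_true, y_pred):
    """Fraction of isolates whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single class label.

    Assumes the class occurs at least once and is absent at least once
    in y_true; otherwise one of the ratios is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference and predicted KL-types for six isolates.
reference = ["KL2", "KL2", "KL17", "KL17", "KL107", "KL2"]
predicted = ["KL2", "KL17", "KL17", "KL17", "KL107", "KL2"]

acc = accuracy(reference, predicted)
sens_kl2, spec_kl2 = sensitivity_specificity(reference, predicted, "KL2")
sens_kl17, spec_kl17 = sensitivity_specificity(reference, predicted, "KL17")
print(acc, sens_kl2, spec_kl2, sens_kl17, spec_kl17)
```

In this toy example the single misclassified KL2 isolate lowers the sensitivity for KL2 and the specificity for KL17 at the same time, which is why the three metrics are usually reported together.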
Affiliation(s)
- Ângela Novais
- UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal
| | - Ana Beatriz Gonçalves
- UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal
| | - Teresa G. Ribeiro
- UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal
- CCP, Culture Collection of Porto, Faculty of Pharmacy, University of Porto, Porto, Portugal
| | - Ana R. Freitas
- UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal
- 1H-TOXRUN, One Health Toxicology Research Unit, University Institute of Health Sciences, CESPU, CRL, Gandra, Portugal
| | - Gema Méndez
- CLOVER Bioanalytical Software, Granada, Spain
| | | | - Antónia Read
- Clinical Microbiology Laboratory, Local Healthcare Unit, Matosinhos, Portugal
| | - Valquíria Alves
- Clinical Microbiology Laboratory, Local Healthcare Unit, Matosinhos, Portugal
| | - Lorena López-Cerero
- Unidad Clínica de Enfermedades Infecciosas y Microbiología, Hospital Universitario Vírgen Macarena, Instituto de Biomedicina de Sevilla (IBIS; CSIC/Hospital Virgen Macarena/Universidad de Sevilla), Sevilla, Spain
- Departamentos de Microbiología y Medicina, Universidad de Sevilla, Sevilla, Spain
| | - Jesús Rodríguez-Baño
- Unidad Clínica de Enfermedades Infecciosas y Microbiología, Hospital Universitario Vírgen Macarena, Instituto de Biomedicina de Sevilla (IBIS; CSIC/Hospital Virgen Macarena/Universidad de Sevilla), Sevilla, Spain
- Departamentos de Microbiología y Medicina, Universidad de Sevilla, Sevilla, Spain
| | - Álvaro Pascual
- Unidad Clínica de Enfermedades Infecciosas y Microbiología, Hospital Universitario Vírgen Macarena, Instituto de Biomedicina de Sevilla (IBIS; CSIC/Hospital Virgen Macarena/Universidad de Sevilla), Sevilla, Spain
- Departamentos de Microbiología y Medicina, Universidad de Sevilla, Sevilla, Spain
| | - Luísa Peixe
- UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal
- CCP, Culture Collection of Porto, Faculty of Pharmacy, University of Porto, Porto, Portugal
|
14
|
Wang X, Xiong Z, Hong W, Liao X, Yang G, Jiang Z, Jing L, Huang S, Fu Z, Zhu F. Identification of cuproptosis-related gene clusters and immune cell infiltration in major burns based on machine learning models and experimental validation. Front Immunol 2024; 15:1335675. [PMID: 38410514 PMCID: PMC10894925 DOI: 10.3389/fimmu.2024.1335675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 01/23/2024] [Indexed: 02/28/2024] Open
Abstract
Introduction: Burns are a global public health problem. Major burns can stimulate the body to enter a stress state, thereby increasing the risk of infection and adversely affecting the patient's prognosis. Recently, it has been discovered that cuproptosis, a form of cell death, is associated with various diseases. Our research aims to explore the molecular clusters associated with cuproptosis in major burns and construct predictive models. Methods: We analyzed the expression and immune infiltration characteristics of cuproptosis-related factors in major burns based on the GSE37069 dataset. Using 553 samples from major burn patients, we explored the molecular clusters based on cuproptosis-related genes and their associated immune cell infiltrates. WGCNA was used to identify cluster-specific genes. Subsequently, the performance of different machine learning models was compared to select the optimal model. The effectiveness of the predictive model was validated using a nomogram, calibration curves, decision curves, and an external dataset. Finally, five core genes related to cuproptosis and major burns were validated using RT-qPCR. Results: In both major burn and normal samples, we determined the cuproptosis-related genes associated with major burns through WGCNA analysis. Through immune infiltrate profiling analysis, we found significant immune differences between different clusters. The clustering was most stable at K=2. GSVA analysis shows that specific genes in cluster 2 are closely associated with various functions. After identifying the cross-core genes, machine learning models indicate that generalized linear models have better accuracy. Ultimately, a generalized linear model for five highly correlated genes was constructed, and validation with an external dataset showed an AUC of 0.982. The accuracy of the model was further verified through calibration curves, decision curves, and modal graphs. Further analysis of clinical relevance revealed that these correlated genes were closely related to time of injury. Conclusion: This study has revealed the intricate relationship between cuproptosis and major burns. Research has identified 15 cuproptosis-related genes that are associated with major burns. Through a machine learning model, five core genes related to cuproptosis and major burns have been selected and validated.
Affiliation(s)
- Xin Wang
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Zhenfang Xiong
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Wangbing Hong
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Xincheng Liao
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Guangping Yang
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Zhengying Jiang
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Lanxin Jing
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Shengyu Huang
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Zhonghua Fu
- Medical Center of Burn Plastic and Wound Repair, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China
| | - Feng Zhu
- Department of Critical Care Medicine, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Department of Burns, The First Affiliated Hospital, Naval Medical University, Shanghai, China
|
15
|
Wu C, Luo J, Xiao Y. Multi-omics assists genomic prediction of maize yield with machine learning approaches. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2024; 44:14. [PMID: 38343399 PMCID: PMC10853138 DOI: 10.1007/s11032-024-01454-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/19/2024] [Indexed: 02/28/2024]
Abstract
With the improvement of high-throughput technologies in recent years, large multi-dimensional plant omics data have been produced, and big-data-driven yield prediction research has received increasing attention. Machine learning offers promising computational and analytical solutions to interpret the biological meaning of large amounts of data in crops. In this study, we utilized multi-omics datasets from 156 maize recombinant inbred lines, containing 2496 single nucleotide polymorphisms (SNPs), 46 image traits (i-traits) from 16 developmental stages obtained through an automatic phenotyping platform, and 133 primary metabolites. Based on benchmark tests with different types of prediction models, some machine learning methods, such as Partial Least Squares (PLS), Random Forest (RF), and Gaussian process with Radial basis function kernel (GaussprRadial), achieved better prediction of maize yield, albeit with slight differences in method preference among i-traits, genomic, and metabolic data. We found that better yield prediction may result from differing capabilities in ranking and filtering data features, which are linked with biological meaning such as photosynthesis-related or kernel development-related regulations. Finally, by integrating multiple omics data with the RF machine learning approach, we further improved the prediction accuracy of grain yield from 0.32 to 0.43. Our research provides new ideas for the application of plant omics data and artificial intelligence approaches to facilitate crop genetic improvements. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-024-01454-z.
Affiliation(s)
- Chengxiu Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
| | - Jingyun Luo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
| | - Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Hubei Hongshan Laboratory, Wuhan, 430070 China
|
16
|
Ao YF, Dörr M, Menke MJ, Born S, Heuson E, Bornscheuer UT. Data-Driven Protein Engineering for Improving Catalytic Activity and Selectivity. Chembiochem 2024; 25:e202300754. [PMID: 38029350 DOI: 10.1002/cbic.202300754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 12/01/2023]
Abstract
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, are challenged by the experimental screening of a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships, aiming to improve enzymes through data-driven protein engineering. Furthermore, we discuss prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Affiliation(s)
- Yu-Fei Ao
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
- Beijing National Laboratory for Molecular Sciences, CAS Key Laboratory of Molecular Recognition and Function, Institute of Chemistry, Chinese Academy of Sciences, Zhongguancun North First Street 2, Beijing, 100190, China
- University of Chinese Academy of Sciences, Yuquan Road 19(A), Beijing, 100049, China
| | - Mark Dörr
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
| | - Marian J Menke
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
| | - Stefan Born
- Technische Universität Berlin, Chair of Bioprocess Engineering, Ackerstraße 76, 13355, Berlin, Germany
| | - Egon Heuson
- Univ. Lille, CNRS, Centrale Lille, Univ. Artois, UMR 8181 UCCS, Unité de Catalyse et Chimie du Solide, 59000, Lille, France
| | - Uwe T Bornscheuer
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17487, Greifswald, Germany
|
17
|
Bass L, Elder LH, Folescu DE, Forouzesh N, Tolokh IS, Karpatne A, Onufriev AV. Improving the Accuracy of Physics-Based Hydration-Free Energy Predictions by Machine Learning the Remaining Error Relative to the Experiment. J Chem Theory Comput 2024; 20:396-410. [PMID: 38149593 PMCID: PMC10950260 DOI: 10.1021/acs.jctc.3c00981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
The accuracy of computational models of water is key to atomistic simulations of biomolecules. We propose a computationally efficient way to improve the accuracy of the prediction of hydration-free energies (HFEs) of small molecules: the remaining errors of the physics-based models relative to the experiment are predicted and mitigated by machine learning (ML) as a postprocessing step. Specifically, the trained graph convolutional neural network attempts to identify the "blind spots" in the physics-based model predictions, where the complex physics of aqueous solvation is poorly accounted for, and partially corrects for them. The strategy is explored for five classical solvent models representing various accuracy/speed trade-offs, from the fast analytical generalized Born (GB) to the popular TIP3P explicit solvent model; experimental HFEs of small neutral molecules from the FreeSolv set are used for the training and testing. For all of the models, the ML correction reduces the resulting root-mean-square error relative to the experiment for HFEs of small molecules, without significant overfitting and with negligible computational overhead. For example, on the test set, the relative accuracy improvement is 47% for the fast analytical GB, making it, after the ML correction, almost as accurate as uncorrected TIP3P. For the TIP3P model, the accuracy improvement is about 39%, bringing the ML-corrected model's accuracy below the 1 kcal/mol threshold. In general, the relative benefit of the ML corrections is smaller for more accurate physics-based models, reaching the lower limit of about 20% relative accuracy gain compared with that of the physics-based treatment alone. The proposed strategy of using ML to learn the remaining error of physics-based models offers a distinct advantage over training ML alone directly on reference HFEs: it preserves the correct overall trend, even well outside of the training set.
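The residual-learning strategy described in this abstract (fit a model to the error of the physics-based prediction, then add the predicted error back as a correction) can be illustrated with the minimal sketch below. It is not the authors' method: a one-feature least-squares line stands in for their graph convolutional network, and the descriptor values, energies, and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit y ~ a*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length sequences."""
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)


# Hypothetical data: one molecular descriptor per molecule, a physics-based
# hydration free energy (kcal/mol), and the experimental value (kcal/mol).
descriptor = [0.0, 1.0, 2.0, 3.0]
physics_hfe = [-1.0, -2.0, -3.0, -4.0]
experiment_hfe = [-1.4, -3.1, -4.4, -6.1]

# Step 1: the learning target is the remaining error, not the HFE itself.
residual = [e - p for e, p in zip(experiment_hfe, physics_hfe)]

# Step 2: fit the stand-in model to that residual.
a, b = fit_line(descriptor, residual)

# Step 3: corrected prediction = physics prediction + predicted residual.
corrected_hfe = [p + (a * x + b) for p, x in zip(physics_hfe, descriptor)]

# Note: this compares errors on the fitting data only; a real evaluation
# would use a held-out test set, as the abstract describes.
rmse_physics = rmse(experiment_hfe, physics_hfe)
rmse_corrected = rmse(experiment_hfe, corrected_hfe)
print(rmse_physics, rmse_corrected)
```

Because the physics-based term stays in the final prediction, a correction that predicts zero far from the training data still returns the physics result, which is one way to read the abstract's remark that the overall trend is preserved outside the training set.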
Affiliation(s)
- Lewis Bass
- Department of Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
| | - Luke H Elder
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
| | - Dan E Folescu
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Department of Mathematics, Virginia Tech, Blacksburg, Virginia 24061, United States
| | - Negin Forouzesh
- Department of Computer Science, California State University, Los Angeles, California 90032, United States
| | - Igor S Tolokh
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
| | - Anuj Karpatne
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
| | - Alexey V Onufriev
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Department of Physics, Virginia Tech, Blacksburg, Virginia 24061, United States
- Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, Virginia 24061, United States
|
18
|
Mondello A, Dal Bo M, Toffoli G, Polano M. Machine learning in onco-pharmacogenomics: a path to precision medicine with many challenges. Front Pharmacol 2024; 14:1260276. [PMID: 38264526 PMCID: PMC10803549 DOI: 10.3389/fphar.2023.1260276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 12/26/2023] [Indexed: 01/25/2024] Open
Abstract
Over the past two decades, Next-Generation Sequencing (NGS) has revolutionized the approach to cancer research. Applications of NGS include the identification of tumor-specific alterations that can influence tumor pathobiology and also impact diagnosis, prognosis and therapeutic options. Pharmacogenomics (PGx) studies the role of inherited individual genetic patterns in drug response and has taken advantage of NGS technology, which provides access to high-throughput data that can, however, be difficult to manage. Machine learning (ML) has recently been used in the life sciences to discover hidden patterns in complex NGS data and to solve various PGx problems. In this review, we provide a comprehensive overview of the NGS approaches that can be employed and the different PGx studies that use NGS data. We also survey the ML algorithms that can serve as fundamental strategies in the PGx field to improve personalized medicine in cancer.
Affiliation(s)
| | | | | | - Maurizio Polano
- Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano (CRO), Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Aviano, Italy
|
19
|
Li B, Wang Y, Yin Z, Xu L, Xie L, Xu X. Decision tree-based identification of important molecular fragments for protein-ligand binding. Chem Biol Drug Des 2024; 103:e14427. [PMID: 38230776 DOI: 10.1111/cbdd.14427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/16/2023] [Accepted: 12/11/2023] [Indexed: 01/18/2024]
Abstract
Fragment-based drug design is an emerging technology in pharmaceutical research and development. One of the key aspects of this technology is the identification and quantitative characterization of molecular fragments. This study presents a strategy for identifying important molecular fragments based on molecular fingerprints and decision tree algorithms and verifies its feasibility in predicting protein-ligand binding affinity. Specifically, the three-dimensional (3D) structures of protein-ligand complexes are encoded using extended-connectivity fingerprints (ECFP), and three decision tree models, namely Random Forest, XGBoost, and LightGBM, are used to quantitatively characterize the feature importance, thereby extracting important molecular fragments with high reliability. Few-shot learning reveals that the extracted molecular fragments contribute significantly and consistently to the binding affinity even with a small sample size. Despite the absence of location and distance information for molecular fragments in ECFP, 3D visualization, in combination with the reverse ECFP process, shows that the majority of the extracted fragments are located at the binding interface of the protein and the ligand. This alignment with the distance constraints critical for binding affinity further supports the reliability of the strategy for identifying important molecular fragments.
Affiliation(s)
- Baiyi Li
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Yunsong Wang
- School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Zuode Yin
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, China
|
20
|
Fu M, He R, Zhang Z, Ma F, Shen L, Zhang Y, Duan M, Zhang Y, Wang Y, Zhu L, He J. Multinomial machine learning identifies independent biomarkers by integrated metabolic analysis of acute coronary syndrome. Sci Rep 2023; 13:20535. [PMID: 37996510 PMCID: PMC10667512 DOI: 10.1038/s41598-023-47783-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/18/2023] [Indexed: 11/25/2023] Open
Abstract
A multi-class classification model for acute coronary syndrome (ACS) remains to be constructed based on multi-fluid metabolomics. Major confounders may exert spurious effects on the relationship between metabolism and ACS. The study aims to identify an independent biomarker panel for the multiclassification of HC, UA, and AMI by integrating serum and urinary metabolomics. We performed a liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics study on 300 serum and urine samples from 44 patients with unstable angina (UA), 77 with acute myocardial infarction (AMI), and 29 healthy controls (HC). Multinomial machine learning approaches, including multinomial adaptive least absolute shrinkage and selection operator (LASSO) regression and random forest (RF), and assessment of the confounders were applied to integrate a multi-class classification biomarker panel for HC, UA and AMI. Different metabolic landscapes were portrayed during the transition from HC to UA and then to AMI. Glycerophospholipid metabolism and arginine biosynthesis were predominant during the progression from HC to UA and then to AMI. The multiclass metabolic diagnostic model (MDM) dependent on ACS, including 2-ketobutyric acid, LysoPC(18:2(9Z,12Z)), argininosuccinic acid, and cyclic GMP, demarcated HC, UA, and AMI, providing a C-index of 0.84 (HC vs. UA), 0.98 (HC vs. AMI), and 0.89 (UA vs. AMI). The diagnostic value of MDM largely derives from the contribution of 2-ketobutyric acid, and LysoPC(18:2(9Z,12Z)) in serum. Higher 2-ketobutyric acid and cyclic GMP levels were positively correlated with ACS risk and atherosclerosis plaque burden, while LysoPC(18:2(9Z,12Z)) and argininosuccinic acid showed the reverse relationship. An independent multiclass biomarker panel for HC, UA, and AMI was constructed using the multinomial machine learning methods based on serum and urinary metabolite signatures.
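The pairwise C-index values reported in this abstract (for example HC vs. UA) measure how often a model scores a subject from one group higher than a subject from the other. For a two-group comparison this is equivalent to the area under the ROC curve. The sketch below is a minimal stand-alone illustration; the function name `c_index` and the scores are hypothetical, not taken from the study.

```python
def c_index(scores_pos, scores_neg):
    """Concordance index for two groups.

    Counts the fraction of (positive, negative) pairs in which the positive
    subject has the higher score; tied pairs count as half-concordant.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores: higher means "more likely AMI".
ami_scores = [0.9, 0.8, 0.4]
ua_scores = [0.5, 0.3]

c_ua_vs_ami = c_index(ami_scores, ua_scores)
print(round(c_ua_vs_ami, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported range of 0.84 to 0.98 indicates that the panel separates HC from AMI more cleanly than HC from UA.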
Affiliation(s)
- Meijiao Fu
- Ningxia Medical University, Yinchuan, 750004, Ningxia, China
| | - Ruhua He
- Department of Cardiology, General Hospital of Ningxia Medical University, Yinchuan, 750004, Ningxia, China
| | - Zhihan Zhang
- Department of Cardiology, Hanzhong Central Hospital, Hanzhong, 723200, Shanxi, China
| | - Fuqing Ma
- Department of Cardiology, The Fifth People's Hospital of Ningxia, Shizuishan, 753000, Ningxia, China
| | - Libo Shen
- Center for Cardiovascular Diseases, People's Hospital of Ningxia Hui Autonomous Region, Yinchuan, 750002, Ningxia, China
| | - Yu Zhang
- Ningxia Medical University, Yinchuan, 750004, Ningxia, China
| | - Mingyu Duan
- Ningxia Medical University, Yinchuan, 750004, Ningxia, China
| | - Yameng Zhang
- Department of Cardiology, The Second Affiliated Hospital of Henan University of Science and Technology, Luoyang, 471000, Henan, China
| | - Yifan Wang
- Department of Radiology, General Hospital of Ningxia Medical University, Yinchuan, 750004, Ningxia, China
| | - Li Zhu
- Department of Radiology, General Hospital of Ningxia Medical University, Yinchuan, 750004, Ningxia, China.
| | - Jun He
- Department of Cardiology, General Hospital of Ningxia Medical University, Yinchuan, 750004, Ningxia, China.
|
21
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
22
|
Shi Y, Zhang C, Pan S, Chen Y, Miao X, He G, Wu Y, Ye H, Weng C, Zhang H, Zhou W, Yang X, Liang C, Chen D, Hong L, Su F. The diagnosis of tuberculous meningitis: advancements in new technologies and machine learning algorithms. Front Microbiol 2023; 14:1290746. [PMID: 37942080 PMCID: PMC10628659 DOI: 10.3389/fmicb.2023.1290746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 10/09/2023] [Indexed: 11/10/2023] Open
Abstract
Tuberculous meningitis (TBM) poses a diagnostic challenge, particularly impacting vulnerable populations such as infants and those with untreated HIV. Given the diagnostic intricacies of TBM, there's a pressing need for rapid and reliable diagnostic tools. This review scrutinizes the efficacy of up-and-coming technologies like machine learning in transforming TBM diagnostics and management. Advanced diagnostic technologies like targeted gene sequencing, real-time polymerase chain reaction (RT-PCR), miRNA assays, and metagenomic next-generation sequencing (mNGS) offer promising avenues for early TBM detection. The capabilities of these technologies are further augmented when paired with mass spectrometry, metabolomics, and proteomics, enriching the pool of disease-specific biomarkers. Machine learning algorithms, adept at sifting through voluminous datasets like medical imaging, genomic profiles, and patient histories, are increasingly revealing nuanced disease pathways, thereby elevating diagnostic accuracy and guiding treatment strategies. While these burgeoning technologies offer hope for more precise TBM diagnosis, hurdles remain in terms of their clinical implementation. Future endeavors should zero in on the validation of these tools through prospective studies, critically evaluating their limitations, and outlining protocols for seamless incorporation into established healthcare frameworks. Through this review, we aim to present an exhaustive snapshot of emerging diagnostic modalities in TBM, the current standing of machine learning in meningitis diagnostics, and the challenges and future prospects of converging these domains.
Affiliation(s)
- Yi Shi
- Department of Infectious Diseases, Wenzhou Central Hospital, Wenzhou, China
- The First School of Medicine, Wenzhou Medical University, Wenzhou, China
| | - Chengxi Zhang
- School of Materials Science and Engineering, Shandong Jianzhu University, Jinan, China
| | - Shuo Pan
- The First School of Medicine, Wenzhou Medical University, Wenzhou, China
| | - Yi Chen
- The First School of Medicine, Wenzhou Medical University, Wenzhou, China
| | - Xingguo Miao
- Department of Infectious Diseases, Wenzhou Central Hospital, Wenzhou, China
- Department of Infectious Diseases, Wenzhou Sixth People’s Hospital, Wenzhou, China
- Wenzhou Key Laboratory of Diagnosis and Treatment of Emerging and Recurrent Infectious Diseases, Wenzhou, China
| | - Guoqiang He
- Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, China
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, China
| | - Yanchan Wu
- School of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Hui Ye
- Department of Infectious Diseases, Wenzhou Central Hospital, Wenzhou, China
- Department of Infectious Diseases, Wenzhou Sixth People’s Hospital, Wenzhou, China
- Wenzhou Key Laboratory of Diagnosis and Treatment of Emerging and Recurrent Infectious Diseases, Wenzhou, China
| | - Chujun Weng
- The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, China
| | - Huanhuan Zhang
- School and Hospital of Stomatology, Wenzhou Medical University, Wenzhou, China
| | - Wenya Zhou
- School and Hospital of Stomatology, Wenzhou Medical University, Wenzhou, China
| | - Xiaojie Yang
- Wenzhou Medical University Renji College, Wenzhou, China
| | - Chenglong Liang
- The First School of Medicine, Wenzhou Medical University, Wenzhou, China
| | - Dong Chen
- Wenzhou Key Laboratory of Diagnosis and Treatment of Emerging and Recurrent Infectious Diseases, Wenzhou, China
- Wenzhou Central Blood Station, Wenzhou, China
| | - Liang Hong
- Department of Infectious Diseases, The Third Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| | - Feifei Su
- Department of Infectious Diseases, Wenzhou Central Hospital, Wenzhou, China
- Department of Infectious Diseases, Wenzhou Sixth People’s Hospital, Wenzhou, China
- Wenzhou Key Laboratory of Diagnosis and Treatment of Emerging and Recurrent Infectious Diseases, Wenzhou, China
| |
|
23
|
Silva Junior HC, Menezes HNS, Ferreira GB, Guedes GP. Rapid and Accurate Prediction of the Axial Magnetic Anisotropy in Cobalt(II) Complexes Using a Machine-Learning Approach. Inorg Chem 2023; 62:14838-14842. [PMID: 37676736 DOI: 10.1021/acs.inorgchem.3c02569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Estimating the magnetic anisotropy for single-ion magnets is complex due to its multireference nature. This study demonstrates that deep neural networks (DNNs) can provide accurate axial magnetic anisotropy (D) values, closely matching the complete-active-space self-consistent-field (CASSCF) quality using density functional theory (DFT) data. We curated an 86-parameter database (UFF1) with electronic data from over 33000 cobalt(II) compounds. The DNN achieved an R² of 0.906 and a mean absolute error of 18.1 cm⁻¹ in comparison to reference CASSCF D values. Remarkably, it is 11 times more accurate than DFT methods and 7700 times faster. This approach hints at DNNs predicting the anisotropy in larger molecules, even when trained on smaller ligands.
Affiliation(s)
- Henrique C Silva Junior
- Instituto de Química, Universidade Federal Fluminense, Niterói, Rio de Janeiro 24020-141, Brazil
| | - Heloisa N S Menezes
- Instituto de Química, Universidade Federal Fluminense, Niterói, Rio de Janeiro 24020-141, Brazil
| | - Glaucio B Ferreira
- Instituto de Química, Universidade Federal Fluminense, Niterói, Rio de Janeiro 24020-141, Brazil
| | - Guilherme P Guedes
- Instituto de Química, Universidade Federal Fluminense, Niterói, Rio de Janeiro 24020-141, Brazil
| |
|