1
Alsenan S, Al-Turaiki I, Aldayel M, Tounsi M. Role of Optimization in RNA-Protein-Binding Prediction. Curr Issues Mol Biol 2024; 46:1360-1373. [PMID: 38392205 PMCID: PMC11154364 DOI: 10.3390/cimb46020087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/24/2024]
Abstract
RNA-binding proteins (RBPs) play an important role in biological processes such as gene regulation. Understanding their behavior, for example their binding sites, can help in understanding RBP-related diseases. Studies have focused on predicting RNA binding by means of machine learning algorithms, including deep convolutional neural network models. An integral part of deep learning modeling is tuning hyperparameters and minimizing a loss function with optimization algorithms. In this paper, we investigate the role of optimization in the RBP classification problem using the CLIP-Seq 21 dataset. Three optimization methods, namely grid search, random search, and Bayesian optimization, are applied to the CNN model for RNA-protein binding prediction. The empirical results show AUCs of 94.42%, 93.78%, 93.23%, and 92.68% on the ELAVL1C, ELAVL1B, ELAVL1A, and HNRNPC datasets, respectively, and a mean AUC of 85.30% on 24 datasets. This paper's findings provide evidence of the role of optimizers in improving the performance of RNA-protein binding prediction.
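As a rough illustration only (not the authors' code), the Python sketch below contrasts two of the three tuning strategies named above, grid search and random search, under the same budget of nine evaluations. The function `validation_score` is a hypothetical stand-in for training the CNN and measuring validation AUC; the hyperparameter names, ranges, and the toy score surface are assumptions.

```python
import itertools
import random


def validation_score(learning_rate: float, dropout: float) -> float:
    # Hypothetical stand-in for "train the CNN, return validation AUC".
    # Toy surface peaking at learning_rate=0.01, dropout=0.3 (assumed values).
    return 1.0 - (learning_rate - 0.01) ** 2 * 1e3 - (dropout - 0.3) ** 2


def grid_search(learning_rates, dropouts):
    # Exhaustively evaluate every combination in the Cartesian product.
    return max(itertools.product(learning_rates, dropouts),
               key=lambda p: validation_score(*p))


def random_search(n_trials, seed=0):
    # Sample each hyperparameter independently; learning rate on a log scale.
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-4, -1), rng.uniform(0.0, 0.6))
              for _ in range(n_trials)]
    return max(trials, key=lambda p: validation_score(*p))


best_grid = grid_search([0.001, 0.01, 0.1], [0.1, 0.3, 0.5])  # 9 evaluations
best_random = random_search(n_trials=9)                        # 9 evaluations
```

With the same budget, the grid only ever tries three learning rates, whereas random search draws a fresh value of each hyperparameter on every trial. Bayesian optimization, the third method in the study, would instead pick each trial using a surrogate model of the results so far.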
Affiliation(s)
- Shrooq Alsenan
- Information Systems Department, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Isra Al-Turaiki
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11653, Saudi Arabia
- Mashael Aldayel
- Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
- Mohamed Tounsi
- Department of Computer Science, College of Computer and Information Sciences, Prince Sultan University, P.O. Box 66833, Riyadh 12435, Saudi Arabia
2
Dragan P, Joshi K, Atzei A, Latek D. Keras/TensorFlow in Drug Design for Immunity Disorders. Int J Mol Sci 2023; 24:15009. [PMID: 37834457 PMCID: PMC10573944 DOI: 10.3390/ijms241915009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 09/21/2023] [Accepted: 09/29/2023] [Indexed: 10/15/2023]
Abstract
Homeostasis of the host immune system is regulated by white blood cells with a variety of cell surface receptors for cytokines. Chemotactic cytokines (chemokines) activate their receptors to evoke the chemotaxis of immune cells in homeostatic migrations or, in inflammatory conditions, towards inflamed tissue or pathogens. Dysregulation of the immune system, leading to disorders such as allergies, autoimmune diseases, or cancer, requires efficient, fast-acting drugs to minimize the long-term effects of chronic inflammation. Here, we performed structure-based virtual screening (SBVS) assisted by a Keras/TensorFlow neural network (NN) to find novel compound scaffolds acting on three chemokine receptors: CCR2, CCR3, and one CXC receptor, CXCR3. The Keras/TensorFlow NN was used here not as the typical binary classifier but as an efficient multi-class classifier that can discard not only inactive compounds but also low- or medium-activity compounds. Several compounds proposed by SBVS and the NN were tested in 100 ns all-atom molecular dynamics simulations to confirm their binding affinity. To improve the baseline binding affinity of the compounds, new chemical modifications were proposed. The modified compounds were compared with known antagonists of these three chemokine receptors. Known CXCR3 compounds were among the top predicted compounds; thus, the benefit of using Keras/TensorFlow in drug discovery, in addition to structure-based approaches, has been shown. Furthermore, we showed that the Keras/TensorFlow NN can accurately predict the receptor subtype selectivity of compounds, a task at which SBVS often fails. We cross-tested chemokine receptor datasets retrieved from ChEMBL and curated datasets for cannabinoid receptors. The NN model trained on the cannabinoid receptor datasets retrieved from ChEMBL was the most accurate in receptor subtype selectivity prediction. Among NN models trained on the chemokine receptor datasets, the CXCR3 model showed the highest accuracy in differentiating the receptor subtype for a given compound dataset.
Affiliation(s)
- Paulina Dragan
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
- Kavita Joshi
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
- Alessandro Atzei
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
- Department of Life and Environmental Science, Food Toxicology Unit, University of Cagliari, University Campus of Monserrato, SS 554, 09042 Cagliari, Italy
- Dorota Latek
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-903 Warsaw, Poland
3
Zhu Z, Dou B, Cao Y, Jiang J, Zhu Y, Chen D, Feng H, Liu J, Zhang B, Zhou T, Wei GW. TIDAL: Topology-Inferred Drug Addiction Learning. J Chem Inf Model 2023; 63:1472-1489. [PMID: 36826415 DOI: 10.1021/acs.jcim.3c00046] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/25/2023]
Abstract
Drug addiction is a global public health crisis, and the design of antiaddiction drugs remains a major challenge due to intricate mechanisms. Since experimental drug screening and optimization are too time-consuming and expensive, there is an urgent need to develop innovative artificial intelligence (AI) methods to address the challenge. We tackle this challenge with topology-inferred drug addiction learning (TIDAL), built by integrating multiscale topological Laplacians, a deep bidirectional transformer, and ensemble-assisted neural networks (EANNs). Multiscale topological Laplacians are a novel class of algebraic topology tools that embed molecular topological invariants and algebraic invariants into their harmonic spectra and nonharmonic spectra, respectively. These invariants complement sequence information extracted from a bidirectional transformer. We validate the proposed TIDAL framework on 22 drug addiction-related, 4 hERG, and 12 DAT data sets, and the results suggest that TIDAL is a state-of-the-art framework for the modeling and analysis of drug addiction data. We carry out cross-target analysis of current drug addiction candidates to flag their side effects and identify their repurposing potential. Our analysis reveals drug-mediated linear and bilinear target correlations. Finally, TIDAL is applied to shed light on the relative efficacy, repurposing potential, and potential side effects of 12 existing antiaddiction medications. Our results suggest that TIDAL provides a new computational strategy for the urgently needed development of drugs against substance addiction.
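The statement that harmonic spectra carry topological invariants can be checked in the simplest setting. For the 0th-order (graph) Laplacian L0 = D - A, the multiplicity of the zero eigenvalue equals the number of connected components (the Betti-0 number). The Python sketch below is a simplified assumption, not TIDAL's construction: it treats hypothetical 2-D coordinates as atoms, connects pairs within a cutoff radius, and computes the nullity of L0 at several radii to mimic a multiscale filtration. The nonharmonic (nonzero) part of the spectrum is not computed here.

```python
from fractions import Fraction
from math import dist


def laplacian(points, radius):
    # 0th-order graph Laplacian L0 = D - A: atoms i and j are joined
    # when their distance is <= radius (one scale of the filtration).
    n = len(points)
    adj = [[1 if i != j and dist(points[i], points[j]) <= radius else 0
            for j in range(n)] for i in range(n)]
    return [[sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
            for i in range(n)]


def rank(matrix):
    # Exact rank by Gaussian elimination over rationals (no float tolerance).
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def harmonic_multiplicity(points, radius):
    # Number of zero eigenvalues of L0 = nullity = Betti-0 at this scale.
    return len(points) - rank(laplacian(points, radius))


# Hypothetical toy "molecule": two pairs of atoms far apart.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
betti0 = [harmonic_multiplicity(coords, r) for r in (0.5, 1.5, 4.5)]
```

As the radius grows, components merge (four isolated atoms, then two pairs, then one connected cluster), which is the kind of multiscale topological signal the abstract refers to.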
Affiliation(s)
- Zailiang Zhu
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, P. R. China
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Yukang Cao
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, P. R. China
- Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China; Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Dong Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Tianshou Zhou
- Key Laboratory of Computational Mathematics, Guangdong Province, and School of Mathematics, Sun Yat-sen University, Guangzhou, 510006, P. R. China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
4
Xu Z, Mo L, Zhou J, Fang W, Qin H. Stepwise decomposition-integration-prediction framework for runoff forecasting considering boundary correction. Sci Total Environ 2022; 851:158342. [PMID: 36037902 DOI: 10.1016/j.scitotenv.2022.158342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
Accurately predicting river runoff is of substantial significance for flood control, water resource allocation, and basin ecological dispatching. To explore the reasonable and effective application of time series decomposition in runoff forecasting, this study proposed a novel stepwise decomposition-integration-prediction framework considering boundary correction (SDIPBC), using a stepwise decomposition sampling method and a multi-input neural network. On this basis, we implemented a hybrid forecasting model, called STL-LSTM (SDIPBC), that combines the seasonal-trend decomposition procedure based on loess (STL) with the long short-term memory (LSTM) network to estimate mid- to long-term river runoff. The reliability of the method was assessed using the historical runoff series of the Lianghekou and Jinping I reservoirs in the Yalong River Basin, China, and several single and hybrid models were developed for comparative experiments. The results show that existing decomposition-based hybrid forecasting frameworks are not suitable for practical runoff forecasting. The proposed SDIPBC framework avoids using future information and improves the prediction accuracy of the single prediction model. In terms of the Nash-Sutcliffe efficiency coefficient (NSE), the ten-day runoff forecasting accuracy of STL-LSTM (SDIPBC) at the Lianghekou and Jinping I reservoirs reached 0.845 and 0.862, respectively, improvements of 1.81% and 2.38% over the single LSTM model, indicating that this is a practical and reliable decomposition-based hybrid runoff forecasting method.
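The NSE values quoted above use the standard Nash-Sutcliffe efficiency, NSE = 1 - Σ(Q_obs - Q_sim)^2 / Σ(Q_obs - mean(Q_obs))^2, where 1 is a perfect fit and 0 means the forecast is no better than the mean of the observations. A minimal Python version follows; any series passed to it in an example are hypothetical, not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    # NSE = 1 - sum((Q_obs - Q_sim)^2) / sum((Q_obs - mean(Q_obs))^2)
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and equally long")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical ten-day runoff values (observed vs. forecast).
score = nash_sutcliffe([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because the denominator is the variance of the observations, a forecast that simply repeats the observed mean scores exactly 0, which is why NSE is a stricter benchmark than raw error.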
Affiliation(s)
- Zhanxing Xu
- School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; Hubei Key Laboratory of Digital Valley Science and Technology, Wuhan 430074, China
- Li Mo
- School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; Hubei Key Laboratory of Digital Valley Science and Technology, Wuhan 430074, China
- Jianzhong Zhou
- School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; Hubei Key Laboratory of Digital Valley Science and Technology, Wuhan 430074, China
- Wei Fang
- School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; Hubei Key Laboratory of Digital Valley Science and Technology, Wuhan 430074, China
- Hui Qin
- School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; Hubei Key Laboratory of Digital Valley Science and Technology, Wuhan 430074, China
5
Learning-Based Clutter Mitigation with Subspace Projection and Sparse Representation in Holographic Subsurface Radar Imaging. Remote Sens 2022. [DOI: 10.3390/rs14030682] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
Holographic subsurface radar (HSR) is an effective remote sensing modality for surveying shallowly buried objects with high-resolution plan-view images. However, strong reflections from the rough surface and from inhomogeneities obscure the responses of stationary targets. In this paper, a learning-based method is proposed to mitigate clutter in HSR applications. The proposed method first decomposes the HSR image into raw clutter and target data using an adaptive subspace projection approach. Then, an autoencoder performs unsupervised learning to extract the target features and mitigate the clutter. Sparse representation is also incorporated to further optimize the model, and the alternating direction method of multipliers (ADMM) is used to solve the optimization problem precisely and efficiently. Experiments on real data demonstrate that the proposed method can effectively mitigate strong clutter while preserving the target. The visual and quantitative results show that the proposed method outperforms widely used state-of-the-art clutter mitigation approaches in suppressing clutter in HSR images.
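ADMM splits a problem into subproblems that are each easy to solve and coordinates them through a dual variable. The sketch below applies scaled-form ADMM to the simplest sparse-representation problem, l1-regularized denoising with an identity dictionary: minimize 0.5*||x - b||^2 + lam*||z||_1 subject to x = z. This toy problem is an assumption chosen for illustration, not the autoencoder model from the paper; its exact solution is the soft-thresholded input, which makes the result easy to verify.

```python
def soft_threshold(values, kappa):
    # Proximal operator of kappa*|.|: shrink each entry toward zero by kappa.
    return [v - kappa if v > kappa else v + kappa if v < -kappa else 0.0
            for v in values]


def admm_sparse_denoise(b, lam, rho=1.0, iters=200):
    # Scaled-form ADMM for: min 0.5*||x - b||^2 + lam*||z||_1  s.t.  x = z.
    n = len(b)
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        # x-update: closed-form minimizer of 0.5||x-b||^2 + (rho/2)||x - z + u||^2.
        x = [(bi + rho * (zi - ui)) / (1.0 + rho) for bi, zi, ui in zip(b, z, u)]
        # z-update: proximal step on the l1 term.
        z = soft_threshold([xi + ui for xi, ui in zip(x, u)], lam / rho)
        # Dual update accumulates the constraint residual x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


sparse_code = admm_sparse_denoise([3.0, -0.5, 1.2, -2.0], lam=1.0)
```

Small entries are driven exactly to zero by the z-update, which is what makes the representation sparse; in a real model the x-update would involve the learned dictionary or network rather than the identity.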
6
Babu T, Singh T, Gupta D, Hameed S. Colon cancer prediction on histological images using deep learning features and Bayesian optimized SVM. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-189850] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/24/2023]
Abstract
Colon cancer has one of the highest mortality rates among cancers worldwide. However, histopathological analysis that relies on the expertise of pathologists is a demanding and time-consuming process. Automated diagnosis of colon cancer from biopsy examination plays an important role in patient care and prognosis. Because conventional handcrafted feature extraction requires specialized experience to select realistic features, deep learning approaches have been adopted, as they can extract abstract high-level features automatically. This paper presents a colon cancer detection system that uses transfer learning architectures to automatically extract high-level features from colon biopsy images for automated diagnosis and prognosis. In this study, image features are extracted from a pre-trained convolutional neural network (CNN) and used to train a Bayesian-optimized support vector machine (SVM) classifier. Moreover, the pre-trained AlexNet, VGG-16, and Inception-V3 networks were compared to identify the best network for colon cancer detection. Furthermore, the proposed framework is evaluated on four datasets: two collected from Indian hospitals (at magnifications of 4X, 10X, 20X, and 40X) and two public colon image datasets. Compared with existing classifiers and methods on the public datasets, the Inception-V3 network, with accuracies ranging from 96.5% to 99%, was the best suited to the proposed framework.
Affiliation(s)
- Tina Babu
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
- Tripty Singh
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
- Deepa Gupta
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
- Shahin Hameed
- Department of Pathology, MVR Cancer Center and Research Institute, Poolacode, Kerala, India
7
Warszycki D, Struski Ł, Śmieja M, Kafel R, Kurczab R. Pharmacoprint: A Combination of a Pharmacophore Fingerprint and Artificial Intelligence as a Tool for Computer-Aided Drug Design. J Chem Inf Model 2021; 61:5054-5065. [PMID: 34547888 DOI: 10.1021/acs.jcim.1c00589] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
Structural fingerprints and pharmacophore modeling are methodologies that have been used for at least two decades in various fields of cheminformatics, from similarity searching to machine learning (ML). Advances in in silico techniques led to combining these two methodologies into a new approach known as the pharmacophore fingerprint. Herein, we propose a high-resolution pharmacophore fingerprint called Pharmacoprint that encodes the presence and types of, and relationships between, the pharmacophore features of a molecule. Pharmacoprint was evaluated in classification experiments using ML algorithms (logistic regression, support vector machines, linear support vector machines, and neural networks) and outperformed other popular molecular fingerprints (i.e., ECFP4, Estate, MACCS, PubChem, Substructure, Klekota-Roth, CDK, Extended, and GraphOnly) and the ChemAxon pharmacophoric features fingerprint. Pharmacoprint consisted of 39 973 bits; several methods were applied for dimensionality reduction, and the best algorithm not only reduced the length of the bit string but also improved the efficiency of the ML tests. Further optimization allowed us to define the best parameter settings for using Pharmacoprint in discrimination tests and for maximizing statistical parameters. Finally, Pharmacoprint generated for three-dimensional (3D) structures with defined hydrogens as input data was applied to neural networks with a supervised autoencoder for selecting the most important bits, which allowed us to maximize the Matthews correlation coefficient up to 0.962. The results show the potential of Pharmacoprint as a new, promising tool for computer-aided drug design.
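The Matthews correlation coefficient (MCC) reported above summarizes all four cells of a binary confusion matrix and ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction). A minimal Python version of the standard formula follows; the counts in the usage line are hypothetical, not Pharmacoprint results.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention return 0.0 when any marginal is zero (MCC is otherwise undefined).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts from a binary active/inactive screen.
mcc = matthews_corrcoef(tp=45, tn=47, fp=3, fn=5)
```

Unlike plain accuracy, MCC stays low when a classifier only does well on the majority class, which is why it is a common headline metric for imbalanced active/inactive compound sets.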
Affiliation(s)
- Dawid Warszycki
- Maj Institute of Pharmacology Polish Academy of Sciences, Smetna 12 Street, 31-343, Cracow, Poland
- Łukasz Struski
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Lojasiewicza Street, 30-348, Cracow, Poland
- Marek Śmieja
- Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Lojasiewicza Street, 30-348, Cracow, Poland
- Rafał Kafel
- Maj Institute of Pharmacology Polish Academy of Sciences, Smetna 12 Street, 31-343, Cracow, Poland
- Rafał Kurczab
- Maj Institute of Pharmacology Polish Academy of Sciences, Smetna 12 Street, 31-343, Cracow, Poland
8
Mekni N, Coronnello C, Langer T, Rosa MD, Perricone U. Support Vector Machine as a Supervised Learning for the Prioritization of Novel Potential SARS-CoV-2 Main Protease Inhibitors. Int J Mol Sci 2021; 22:7714. [PMID: 34299333 PMCID: PMC8305792 DOI: 10.3390/ijms22147714] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/02/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 12/04/2022]
Abstract
In the last year, the COVID-19 pandemic has greatly affected the lifestyle of the world population, prompting a major effort by the scientific community to study the molecular mechanisms of infection. Several vaccine formulations are now available and are helping to achieve immunity. Nevertheless, there is growing interest in the development of novel anti-COVID drugs. In this scenario, the main protease (Mpro) represents an appealing target, being the enzyme responsible for the cleavage of polypeptides during viral genome transcription. With the aim of sharing new insights for the design of novel Mpro inhibitors, our research group developed a machine learning approach using support vector machine (SVM) classification. Starting from a dataset of two million commercially available compounds, the model classified two hundred novel chemotypes as potentially active against the viral protease. The compounds labelled as active by the SVM were then evaluated through consensus docking studies on two PDB structures, and their binding modes were compared with those of well-known protease inhibitors. The best five compounds selected by consensus docking were then subjected to molecular dynamics simulations to further assess the stability of their binding interactions. Of note, the compounds selected via SVM reproduced all of the most important interactions reported in the literature.
Affiliation(s)
- Nedra Mekni
- Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy
- Claudia Coronnello
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy
- Thierry Langer
- Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria
- Maria De Rosa
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy
- Ugo Perricone
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy
9
Osman MH, Mohamed RH, Sarhan HM, Park EJ, Baik SH, Lee KY, Kang J. Machine Learning Model for Predicting Postoperative Survival of Patients with Colorectal Cancer. Cancer Res Treat 2021; 54:517-524. [PMID: 34126702 PMCID: PMC9016295 DOI: 10.4143/crt.2021.206] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/10/2021] [Accepted: 06/13/2021] [Indexed: 11/24/2022]
Abstract
Purpose Machine learning (ML) is a strong candidate for making accurate predictions, as it can apply powerful computational algorithms to large amounts of data. We developed an ML-based model to predict the survival of patients with colorectal cancer (CRC) using data from two independent datasets. Materials and Methods A total of 364,316 and 1,572 CRC patients were included from the Surveillance, Epidemiology, and End Results (SEER) database and a Korean dataset, respectively. As SEER combines data from 18 cancer registries, internal validation was done using 18-fold cross-validation, and external validation was then performed by testing the trained model on the Korean dataset. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, and positive predictive value. Results Clinicopathological characteristics differed significantly between the two datasets, and SEER showed a significantly lower 5-year survival rate than the Korean dataset (60.1% vs. 75.3%, p < 0.001). The ML-based model using the light gradient boosting algorithm performed better in predicting 5-year survival than American Joint Committee on Cancer staging (AUROC, 0.804 vs. 0.736; p < 0.001). The features with the greatest influence on model performance were age, number of examined lymph nodes, and tumor size. In the validation set, the sensitivity and positive predictive values of predicting 5-year survival for the two classes (dead or alive) were reported as 68.14%, 77.51% and 49.88%, 88.1%, respectively. Survival probability can be checked using the web-based survival predictor (http://colorectalcancer.pythonanywhere.com). Conclusion The ML-based model achieved much better performance than staging in the individualized estimation of survival of patients with CRC.
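The abstract ties its 18 folds to the 18 SEER registries, which suggests that each fold holds out one whole registry; that reading is an assumption, since the abstract does not spell out the split. The Python sketch below shows such a leave-one-group-out split; the registry labels are hypothetical placeholders.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    # Yield (train_indices, test_indices), holding out one whole group per fold.
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for group in sorted(by_group):
        test = by_group[group]
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, test


# Hypothetical registry label per patient record; 18 labels would give 18 folds.
registry_of_patient = ["reg01", "reg02", "reg01", "reg03", "reg02"]
folds = list(leave_one_group_out(registry_of_patient))
```

Holding out whole registries tests whether the model transfers to a population it has not seen, which is closer in spirit to the separate external validation on the Korean dataset than a random patient-level split would be.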
Affiliation(s)
- Eun Jung Park
- Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Seung Hyuk Baik
- Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Kang Young Lee
- Department of Surgery, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Jeonghyun Kang
- Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
10
Cardoso VGK, Poppi RJ. Non-invasive identification of commercial green tea blends using NIR spectroscopy and support vector machine. Microchem J 2021. [DOI: 10.1016/j.microc.2021.106052] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/06/2023]
11
Shen WX, Zeng X, Zhu F, Wang YL, Qin C, Tan Y, Jiang YY, Chen YZ. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00301-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/04/2023]
12
Wu L, Hu C, Liu WV. Forecasting the deterioration of cement-based mixtures under sulfuric acid attack using support vector regression based on Bayesian optimization. SN Appl Sci 2020. [DOI: 10.1007/s42452-020-03778-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
13
Yahav A, Zurakhov G, Adler O, Adam D. Strain Curve Classification Using Supervised Machine Learning Algorithm with Physiologic Constraints. Ultrasound Med Biol 2020; 46:2424-2438. [PMID: 32505614 DOI: 10.1016/j.ultrasmedbio.2020.03.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/22/2019] [Revised: 03/05/2020] [Accepted: 03/05/2020] [Indexed: 06/11/2023]
Abstract
Speckle tracking echocardiography (STE) enables quantification of myocardial deformation by generating spatiotemporal strain curves, or time-strain curves (TSCs). Currently, only peak global longitudinal strain is assessed in clinical practice because of uncertainty about the accuracy of STE. We describe a supervised, physiologically constrained, fully automatic machine learning algorithm, trained with labeled data, for classifying TSCs into physiologic or artifactual classes. A data set of 415 healthy patients, with three cine loops per patient corresponding to the three standard 2-D longitudinal views, was processed using a previously published in-house STE software package termed K-SAD. We report an accuracy of 86.4% for classifying TSCs as physiologic, artifactual, or undetermined curves. The positive predictive value for a physiologic strain curve is 89%. This is a necessary step toward a similar separation of pathologic conditions, to allow full use of the temporal information concealed in layer-specific segmental TSCs.
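The study reports an overall three-class accuracy together with a positive predictive value (PPV) for the physiologic class. For a multi-class confusion matrix, accuracy is the diagonal total over all cases, and the PPV of one class is its correct predictions over everything predicted as that class. The Python sketch below shows the arithmetic; the matrix values are hypothetical and are not the study's counts.

```python
CLASSES = ("physiologic", "artifactual", "undetermined")


def accuracy(confusion):
    # confusion[i][j] = number of curves with true class i predicted as class j.
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


def positive_predictive_value(confusion, k):
    # PPV for class k = correct predictions of k / all predictions of k (column k).
    predicted_k = sum(row[k] for row in confusion)
    return confusion[k][k] / predicted_k if predicted_k else 0.0


# Hypothetical counts: rows = true class, columns = predicted class, ordered as CLASSES.
example = [
    [8, 1, 1],
    [1, 6, 1],
    [0, 1, 1],
]
acc = accuracy(example)
ppv_physiologic = positive_predictive_value(example, CLASSES.index("physiologic"))
```

Accuracy pools all three classes, while the physiologic PPV answers a narrower clinical question: how often a curve flagged as physiologic really is one.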
Affiliation(s)
- Amir Yahav
- Faculty of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa, Israel.
- Grigoriy Zurakhov
- Faculty of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa, Israel
- Omri Adler
- Faculty of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa, Israel
- Dan Adam
- Faculty of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa, Israel
14
HamediRad M, Chao R, Weisberg S, Lian J, Sinha S, Zhao H. Towards a fully automated algorithm driven platform for biosystems design. Nat Commun 2019; 10:5150. [PMID: 31723141 PMCID: PMC6853954 DOI: 10.1038/s41467-019-13189-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 04/14/2019] [Accepted: 10/24/2019] [Indexed: 12/16/2022]
Abstract
Large-scale data acquisition and analysis are often required for the successful implementation of the design, build, test, and learn (DBTL) cycle in biosystems design. However, this has long been hindered by experimental cost, variability, biases, and insights missed by traditional analysis methods. Here, we report the application of an integrated robotic system coupled with machine learning algorithms to fully automate the DBTL process for biosystems design. As a proof of concept, we demonstrated its capacity by optimizing the lycopene biosynthetic pathway. This fully automated robotic platform, BioAutomata, evaluates fewer than 1% of possible variants while outperforming random screening by 77%. A paired predictive model and Bayesian algorithm select experiments, which are performed by the Illinois Biological Foundry for Advanced Biomanufacturing (iBioFAB). BioAutomata excels at black-box optimization problems, where experiments are expensive and noisy and the success of the experiment does not depend on extensive prior knowledge of biological mechanisms. Existing efforts have focused on single elements in the automation of the DBTL cycle for biosystems design. Here, the authors integrate a robotic system with machine learning algorithms to fully automate the DBTL cycle and apply it to optimizing the lycopene biosynthetic pathway.
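A common way to realize a "predictive model plus Bayesian algorithm" loop of the kind described above is Bayesian optimization with a Gaussian-process surrogate and an acquisition function. The pure-Python sketch below uses an upper-confidence-bound (UCB) acquisition on a one-dimensional toy objective. The kernel, length scale, noise level, `beta`, and `toy_objective` are all assumptions, and this is not BioAutomata's implementation; in a real DBTL setting each objective call would be a wet-lab experiment.

```python
import math


def rbf(a, b, length=0.2):
    # Squared-exponential kernel on scalar inputs (assumed length scale).
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m):
    # Lower-triangular L with m = L L^T (m must be symmetric positive definite).
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    # Forward substitution for L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def fit_gp(xs, ys, noise=1e-3):
    # Zero-mean GP posterior; the noise term keeps the kernel matrix well conditioned.
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    w = solve_lower(low, ys)  # w = L^-1 y

    def predict(x):
        v = solve_lower(low, [rbf(a, x) for a in xs])  # v = L^-1 k_x
        mean = sum(p * q for p, q in zip(v, w))         # k_x^T K^-1 y
        var = max(rbf(x, x) - sum(p * p for p in v), 0.0)
        return mean, math.sqrt(var)

    return predict


def bayes_opt(objective, candidates, initial, n_iter=8, beta=2.0):
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)
        untried = [c for c in candidates if c not in xs]
        if not untried:
            break

        def ucb(c):
            # Favour candidates with a high predicted value or high uncertainty.
            mean, std = predict(c)
            return mean + beta * std

        nxt = max(untried, key=ucb)
        xs.append(nxt)
        ys.append(objective(nxt))  # in practice: run the experiment
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


def toy_objective(x):
    # Hypothetical "titer" curve with its optimum at x = 0.7.
    return -(x - 0.7) ** 2


grid = [i / 20 for i in range(21)]
best_x, best_y = bayes_opt(toy_objective, grid, initial=[0.0, 1.0])
```

The UCB weight `beta` trades exploration against exploitation; with expensive experiments, the point is to spend each evaluation where the surrogate is either promising or uncertain, rather than screening the whole design space.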
Affiliation(s)
- Mohammad HamediRad
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; LifeFoundry Inc., 60 Hazelwood Dr., Champaign, IL, 61820, USA
- Ran Chao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; LifeFoundry Inc., 60 Hazelwood Dr., Champaign, IL, 61820, USA
- Scott Weisberg
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Jiazhang Lian
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, 310027, Hangzhou, China
- Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Departments of Chemistry and Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
15
Exploring the Potential of Spherical Harmonics and PCVM for Compounds Activity Prediction. Int J Mol Sci 2019; 20:ijms20092175. [PMID: 31052500 PMCID: PMC6539940 DOI: 10.3390/ijms20092175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2019] [Revised: 04/14/2019] [Accepted: 04/29/2019] [Indexed: 01/11/2023]
Abstract
Biologically active chemical compounds may provide remedies for several diseases. Machine learning techniques applied to drug discovery, which are cheaper and faster than wet-lab experiments, can identify molecules with the expected pharmacological activity more effectively. It is therefore urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We use representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews' correlation coefficient, and Cohen's kappa, reveal that combining our relatively short Spherical Harmonics-based representation with Probabilistic Classification Vector Machines achieves very satisfactory performance for GPCRs.
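The abstract names five evaluation measures. As a hedged illustration only, the sketch below computes them from a binary 2x2 confusion matrix using their standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the SHPCVM study, which may aggregate scores across folds or receptors differently.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification scores from a 2x2 confusion matrix.

    Hypothetical helper for illustration; not code from the SHPCVM paper.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Matthews' correlation coefficient; conventionally 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement p_o corrected for chance agreement p_e,
    # where p_e sums (predicted marginal * actual marginal) over both classes.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts: 40 TP, 10 FP, 5 FN, 45 TN.
scores = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(scores)
```

With these hypothetical counts, accuracy is 0.85 while kappa is 0.7, because half of the agreement (p_e = 0.5) would be expected by chance; this is why chance-corrected measures such as kappa and MCC are often reported alongside accuracy.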
16
Martinez-Cantin R. Funneled Bayesian Optimization for Design, Tuning and Control of Autonomous Systems. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:1489-1500. [PMID: 29993824 DOI: 10.1109/tcyb.2018.2805695] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
In this paper, we tackle several problems that appear in robotics and autonomous systems: algorithm tuning, automatic control, and intelligent design. All of these problems can be mapped to global optimization problems in which evaluations are expensive. Bayesian optimization (BO) has become a fundamental global optimization algorithm in many problems where sample efficiency is of paramount importance. BO uses a probabilistic surrogate model to learn the response function and reduce the number of samples required. Gaussian processes (GPs) have become a standard surrogate model because of their flexibility in representing a distribution over functions. In black-box settings, the common assumption is that the underlying function can be modeled with a stationary GP. In this paper, we present a novel kernel function specifically designed for BO that allows nonstationary behavior of the surrogate model in an adaptive local region. This kernel is able to reconstruct nonstationarity even with the irregular sampling distribution that arises from BO. Furthermore, in our experiments, we found that this new kernel results in improved local search (exploitation) without penalizing global search (exploration) in many applications. We provide extensive results on well-known optimization benchmarks, machine learning hyperparameter tuning, reinforcement learning and control problems, and UAV wing optimization. The results show that the new method is able to outperform the state of the art in BO on both stationary and nonstationary problems.
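To make the BO loop described in this abstract concrete, here is a minimal one-dimensional sketch, assuming a standard stationary squared-exponential GP surrogate with fixed hyperparameters and the expected-improvement (EI) acquisition function maximized over a grid. It deliberately does not implement the funneled, nonstationary kernel proposed in the paper; the objective, kernel length scale, jitter, grid, and all function names are hypothetical choices for illustration.

```python
import math
import random


def rbf(a, b, length=0.2, var=1.0):
    # Stationary squared-exponential kernel: the usual GP default, not the paper's kernel.
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    # Lower-triangular L with K = L L^T (K must be symmetric positive definite).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solves L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper(L, b):
    # Back substitution with the transpose: solves L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, jitter=1e-6):
    # Zero-mean GP: alpha = K^{-1} y computed through the Cholesky factor.
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return X, L, solve_upper(L, solve_lower(L, y))


def predict(model, x):
    # Posterior mean k*^T alpha and standard deviation sqrt(k(x, x) - v^T v).
    X, L, alpha = model
    k_star = [rbf(x, a) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    # EI for minimization: rewards a low predicted mean and high uncertainty.
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(f, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    X = [rng.random() for _ in range(n_init)]  # random initial design on [0, 1]
    y = [f(x) for x in X]
    grid = [i / 200 for i in range(201)]        # candidate points for the acquisition
    for _ in range(n_iter):
        model = fit_gp(X, y)
        best = min(y)
        x_next = max(grid, key=lambda c: expected_improvement(*predict(model, c), best))
        X.append(x_next)
        y.append(f(x_next))                     # one "expensive" evaluation per step
    i_best = min(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]


# Hypothetical objective with its minimum at x = 0.3.
x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
print(x_best, y_best)
```

Each iteration refits the surrogate on every evaluation so far and spends exactly one new evaluation where EI is largest, which is how BO keeps the sample count low. The stationary `rbf` kernel is the component a nonstationary approach like the one described above would replace.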
17
Zhang Y, Wang Y, Zhou W, Fan Y, Zhao J, Zhu L, Lu S, Lu T, Chen Y, Liu H. A combined drug discovery strategy based on machine learning and molecular docking. Chem Biol Drug Des 2019; 93:685-699. [PMID: 30688405 DOI: 10.1111/cbdd.13494] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/01/2018] [Revised: 01/04/2019] [Accepted: 01/19/2019] [Indexed: 12/14/2022]
Abstract
Data mining methods based on machine learning play an increasingly important role in drug design and discovery. In the current work, eight machine learning methods, namely decision trees, k-nearest neighbors, support vector machines, random forests, extremely randomized trees, AdaBoost, gradient boosting trees, and XGBoost, were evaluated comprehensively through a case study of ACC inhibitor data sets. Internal and external data sets were used to cross-validate the eight machine learning methods. The extremely randomized trees model performed best and was adopted as the first step of virtual screening. Combined with structure-based virtual screening as the second step, this strategy obtained desirable results. This work indicates that combining machine learning methods with traditional structure-based virtual screening can effectively strengthen the ability to find potential hits in large compound databases for a given target.
Affiliation(s)
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuchen Wang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Weineng Zhou
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuanrong Fan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Junnan Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Lu Zhu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Shuai Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
18
Cai Y, Yang H, Li W, Liu G, Lee PW, Tang Y. Computational Prediction of Site of Metabolism for UGT-Catalyzed Reactions. J Chem Inf Model 2018; 59:1085-1095. [DOI: 10.1021/acs.jcim.8b00851] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
Affiliation(s)
- Yingchun Cai
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Philip W. Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
19
Merino-Casallo F, Gomez-Benito MJ, Juste-Lanas Y, Martinez-Cantin R, Garcia-Aznar JM. Integration of in vitro and in silico Models Using Bayesian Optimization With an Application to Stochastic Modeling of Mesenchymal 3D Cell Migration. Front Physiol 2018; 9:1246. [PMID: 30271351 PMCID: PMC6142046 DOI: 10.3389/fphys.2018.01246] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/01/2018] [Accepted: 08/17/2018] [Indexed: 11/13/2022]
Abstract
Cellular migration plays a crucial role in many aspects of life and development. In this paper, we propose a computational model of 3D migration that is solved by means of the tau-leaping algorithm and whose parameters have been calibrated using Bayesian optimization. Our focus is twofold: to optimize the numerical performance of the mechano-chemical model and to automate the calibration of in silico models using Bayesian optimization. The presented mechano-chemical model allows us to simulate the stochastic behavior of our chemically reacting system in combination with the mechanical constraints imposed by the surrounding collagen-based matrix. This numerical model has been used to simulate fibroblast migration. Moreover, we have performed in vitro analysis of migrating fibroblasts embedded in 3D collagen-based fibrous matrices (2 mg/ml). These in vitro experiments were performed with the main objective of calibrating our model. Nine model parameters were calibrated by testing 300 different parametrizations in a completely automatic approach. Two competing evaluation metrics based on the Bhattacharyya coefficient were defined to fit the model parameters. These metrics evaluate how accurately the in silico model replicates the in vitro measurements of the two main variables quantified in the experimental data (the number of protrusions and the length of the longest protrusion). An optimal parametrization is selected by balancing the two evaluation metrics. Results show that the calibrated model is able to predict the main features observed in the in vitro experiments.
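The abstract does not spell out the two Bhattacharyya-based metrics, so the following is only a sketch of the underlying quantity: the standard Bhattacharyya coefficient BC(p, q) = Σ_i sqrt(p_i q_i) between two empirical distributions, which equals 1 for identical distributions and 0 for distributions with no overlap. The helper names, the bin width used for continuous lengths, and all sample values are hypothetical.

```python
import math
from collections import Counter


def empirical_pmf(samples):
    # Relative frequency of each observed (discrete) value.
    counts = Counter(samples)
    n = len(samples)
    return {value: c / n for value, c in counts.items()}


def bin_values(values, width):
    # Map continuous measurements (e.g. protrusion lengths) to integer bin indices.
    return [math.floor(v / width) for v in values]


def bhattacharyya_coefficient(sample_a, sample_b):
    # BC = sum_i sqrt(p_i * q_i); only values observed in both samples contribute.
    p = empirical_pmf(sample_a)
    q = empirical_pmf(sample_b)
    return sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())


# Hypothetical protrusion counts per cell: in vitro vs. one simulated parametrization.
in_vitro = [1, 2, 2, 3]
in_silico = [2, 2, 3, 3]
bc_protrusions = bhattacharyya_coefficient(in_vitro, in_silico)

# Hypothetical longest-protrusion lengths (arbitrary units), binned at width 10.
bc_lengths = bhattacharyya_coefficient(
    bin_values([12.0, 25.5, 31.0], 10.0), bin_values([14.0, 22.0, 48.0], 10.0)
)
print(bc_protrusions, bc_lengths)
```

A higher coefficient means the simulated distribution overlaps more with the measured one. How the paper turns the two coefficients into competing objectives and balances them is not described in the abstract, so that step is left out here.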
Affiliation(s)
- Francisco Merino-Casallo
- Multiscale in Mechanical and Biological Engineering, Department of Mechanical Engineering, Aragón Institute of Engineering Research, Universidad de Zaragoza, Zaragoza, Spain
- Maria J Gomez-Benito
- Multiscale in Mechanical and Biological Engineering, Department of Mechanical Engineering, Aragón Institute of Engineering Research, Universidad de Zaragoza, Zaragoza, Spain
- Yago Juste-Lanas
- Multiscale in Mechanical and Biological Engineering, Department of Mechanical Engineering, Aragón Institute of Engineering Research, Universidad de Zaragoza, Zaragoza, Spain
- Ruben Martinez-Cantin
- Centro Universitario de la Defensa, Zaragoza, Spain; SigOpt, Inc., San Francisco, CA, United States
- Jose M Garcia-Aznar
- Multiscale in Mechanical and Biological Engineering, Department of Mechanical Engineering, Aragón Institute of Engineering Research, Universidad de Zaragoza, Zaragoza, Spain
20
Cai Y, Yang H, Li W, Liu G, Lee PW, Tang Y. Multiclassification Prediction of Enzymatic Reactions for Oxidoreductases and Hydrolases Using Reaction Fingerprints and Machine Learning Methods. J Chem Inf Model 2018; 58:1169-1181. [PMID: 29733642 DOI: 10.1021/acs.jcim.7b00656] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Drug metabolism is a complex process in the human body involving a series of enzymatically catalyzed reactions. Because it is costly and time-consuming to investigate drug metabolism experimentally, computational methods have been developed to predict drug metabolism and have shown great advantages. As a first step, classification of metabolic reactions and enzymes is highly desirable for drug metabolism prediction. In this study, we developed multiclassification models for predicting the reaction types catalyzed by oxidoreductases and hydrolases, in which three reaction fingerprints were used to describe the reactions and seven machine learning algorithms were employed for model building. Data retrieved from KEGG, containing 1055 hydrolysis and 2510 redox reactions, were used to build the hydrolase and oxidoreductase models, respectively. The external validation data consisted of 213 hydrolysis and 512 redox reactions extracted from the Rhea database. The best models were built by neural network or logistic regression with a 2048-bit transformation reaction fingerprint. The predictive accuracies of the main class, subclass, and superclass classification models on the external validation sets were all above 90%. This study will be very helpful for enzymatic reaction annotation and further study of metabolism prediction.
Affiliation(s)
- Yingchun Cai
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Philip W Lee
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
21
Liang Y, Qin D, Zhang Y, Liu W, Liang G. Comprehensive Interactions of ACE Inhibitors With Their Receptor by a Support Vector Machine Model and Molecular Docking. J CHIN CHEM SOC-TAIP 2017. [DOI: 10.1002/jccs.201600803] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Affiliation(s)
- Ya'nan Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, School of Bioengineering; Chongqing University; Chongqing 400044 P. R. China
- Dongya Qin
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, School of Bioengineering; Chongqing University; Chongqing 400044 P. R. China
- Yonghong Zhang
- Medicine Engineering Research Center & School of Pharmacy; Chongqing Medical University; Chongqing 400016 P. R. China
- Wanqian Liu
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, School of Bioengineering; Chongqing University; Chongqing 400044 P. R. China
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, School of Bioengineering; Chongqing University; Chongqing 400044 P. R. China
22
Consistency between traditional Chinese medicine constitution-based classification and genetic classification. JOURNAL OF TRADITIONAL CHINESE MEDICAL SCIENCES 2015. [DOI: 10.1016/j.jtcms.2016.01.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]