1
Chuntakaruk H, Boonpalit K, Kinchagawat J, Nakarin F, Khotavivattana T, Aonbangkhen C, Shigeta Y, Hengphasatporn K, Nutanong S, Rungrotmongkol T, Hannongbua S. Machine learning-guided design of potent darunavir analogs targeting HIV-1 proteases: A computational approach for antiretroviral drug discovery. J Comput Chem 2024; 45:953-968. [PMID: 38174739 DOI: 10.1002/jcc.27298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/30/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
In the pursuit of novel antiretroviral therapies targeting human immunodeficiency virus type-1 (HIV-1) proteases (PRs), recent advances in drug discovery have embraced machine learning (ML) techniques to guide the design process. This study employs ensemble learning models to identify crucial substructures as significant features for drug development. A collection of 160 darunavir (DRV) analogs was designed based on these key substructures and subsequently screened using molecular docking techniques. Chemical structures with high fitness scores were selected, combined, and subjected to one-dimensional (1D) screening based on beyond Lipinski's rule of five (bRo5) and ADME (absorption, distribution, metabolism, and excretion) prediction, as implemented in the Combined Analog generator Tool (CAT) program. A total of 473 screened analogs were subjected to docking analysis using a convolutional neural network scoring function against both the wild-type (WT) and 12 major mutated PRs. DRV analogs with negative changes in binding free energy (ΔΔG_bind) compared to DRV could be categorized into four attractive groups based on their interactions with the majority of vital PRs. The analysis of interaction profiles revealed that potent designed analogs, targeting both WT and mutant PRs, exhibited interactions with common key amino acid residues. This observation further confirms that the ML model-guided approach effectively identified the substructures that play a crucial role in potent analogs. The approach is expected to function as a powerful computational tool, offering valuable guidance in the identification of chemical substructures for synthesis and subsequent experimental testing.
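The selection criterion mentioned in the abstract (a negative change in binding free energy relative to DRV) can be illustrated with a minimal sketch. It assumes that the change is computed as the analog's binding free energy minus that of DRV, so a negative value indicates stronger predicted binding. The function names and all energy values below are hypothetical placeholders, not data from the paper.

```python
def ddg_bind(dg_analog: float, dg_reference: float) -> float:
    """Change in binding free energy of an analog relative to the reference ligand."""
    return dg_analog - dg_reference


def improved_analogs(dg_by_analog: dict[str, float], dg_reference: float) -> list[str]:
    """Return analogs with a negative change, most negative (strongest) first."""
    ddg = {name: ddg_bind(dg, dg_reference) for name, dg in dg_by_analog.items()}
    return sorted((name for name, value in ddg.items() if value < 0), key=lambda name: ddg[name])


# Placeholder binding free energies in kcal/mol (illustrative only, not from the paper).
DG_DRV = -10.0
EXAMPLE_ENERGIES = {"analog_A": -11.5, "analog_B": -9.2, "analog_C": -10.4}

selected = improved_analogs(EXAMPLE_ENERGIES, DG_DRV)
print(selected)
```

In this toy data set, `analog_B` is dropped because its value relative to the reference is positive.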
Affiliation(s)
- Hathaichanok Chuntakaruk
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Biochemistry, Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, Thailand
- Kajjana Boonpalit
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Jiramet Kinchagawat
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Fahsai Nakarin
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Tanatorn Khotavivattana
  - Center of Excellence in Natural Products Chemistry (CENP), Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Chanat Aonbangkhen
  - Center of Excellence in Natural Products Chemistry (CENP), Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Yasuteru Shigeta
  - Center for Computational Sciences, University of Tsukuba, Ibaraki, Japan
- Sarana Nutanong
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Thanyada Rungrotmongkol
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Biochemistry, Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, Thailand
- Supot Hannongbua
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Chemistry, Faculty of Science, Center of Excellence in Computational Chemistry (CECC), Chulalongkorn University, Bangkok, Thailand
2
Zhang B, Zhao D. An Ensemble Learning Model for Detecting Soybean Seedling Emergence in UAV Imagery. Sensors (Basel) 2023; 23:6662. [PMID: 37571446 PMCID: PMC10422598 DOI: 10.3390/s23156662] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/20/2023] [Revised: 07/19/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023]
Abstract
Efficient detection and evaluation of soybean seedling emergence is an important measure for making field management decisions. However, there are many indicators related to emergence, and using multiple models to detect them separately makes data processing too slow to aid timely field management. In this study, we aimed to integrate several deep learning and image processing methods to build a model that evaluates multiple indicators of soybean seedling emergence. An unmanned aerial vehicle (UAV) was used to acquire RGB images of soybean seedlings at the emergence (VE), cotyledon (VC), and first node (V1) stages. The number of emerged soybean seedlings was obtained by the seedling emergence detection module, and image datasets were constructed using the seedling automatic cutting module. The improved AlexNet was used as the backbone network of the growth stage discrimination module. The above modules were combined to calculate the emergence proportion in each stage and determine the uniformity of soybean seedling emergence. The results show that the seedling emergence detection module was able to identify the number of soybean seedlings with an average accuracy of 99.92%, an R² of 0.9784, an RMSE of 6.07, and an MAE of 5.60. The improved AlexNet was more lightweight, its training time was reduced, its average accuracy was 99.07%, and its average loss was 0.0355. The model was validated in the field, and the error between predicted and real emergence proportions ranged from 0.0060 to 0.0775. The study provides an effective ensemble learning model for the detection and evaluation of soybean seedling emergence, which can provide a theoretical basis for decisions on soybean field management and precision operations and has the potential to evaluate the emergence of other crops.
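For readers unfamiliar with the count-error metrics quoted above, the following is a minimal sketch of how R², RMSE, and MAE are commonly computed. It assumes R² is the coefficient of determination (the paper may define it differently, for example as a squared correlation), and the seedling counts are invented for illustration.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Compute MAE, RMSE and the coefficient of determination for paired values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Invented per-plot seedling counts (manual count vs. model count).
manual_counts = [100.0, 120.0, 140.0, 160.0]
model_counts = [102.0, 118.0, 143.0, 157.0]
metrics = regression_metrics(manual_counts, model_counts)
print(metrics)
```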
Affiliation(s)
- Bo Zhang
  - College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
3
Guo G, Li S, Liu Y, Cao Z, Deng Y. Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach. Int J Environ Res Public Health 2022; 20:702. [PMID: 36613022 PMCID: PMC9819684 DOI: 10.3390/ijerph20010702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 12/27/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
The cavity length, which is a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. Additionally, the hyperparameters of three ensemble learning models, namely random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting tree (XGBOOST), were fine-tuned by the Bayesian optimization (BO) algorithm to improve prediction accuracy, and the models were compared with five empirical methods. The XGBOOST method was observed to present the highest prediction accuracy. Further interpretability analysis carried out using the Sobol method demonstrated its ability to reasonably capture the varying relative significance of different input features under different flow conditions. The Sobol sensitivity analysis also revealed two patterns of extracting information from the input features in ML models: (1) the main effect of individual features in ensemble learning and (2) the interactive effect between features in SVR. According to the results, the models that extract individual information predict the cavity length more accurately than the model that uses interactive information. Furthermore, XGBOOST captures more correct information from the features, which leads to Sobol indices that vary in accordance with observed phenomena; its predicted results also fit the experimental points best.
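The 10-fold cross-validation mentioned in the abstract can be sketched as follows. This is a generic, standard-library illustration of how sample indices are partitioned into folds, not the authors' pipeline; the function name `k_fold_indices` and the fixed seed are assumptions made for the example.

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal, disjoint folds
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# Each candidate input configuration would be scored on the same folds,
# and the configuration with the best average validation score kept.
splits = list(k_fold_indices(n_samples=25, k=10))
print(len(splits), [len(test) for _, test in splits])
```

Every sample appears in exactly one test fold, so each configuration is evaluated on all of the data without ever being tested on samples it was trained on.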
Affiliation(s)
- Ganggui Guo
  - School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
- Shanshan Li
  - Conservancy and Hydropower Engineering, Xi’an University of Technology, Xian 710048, China
- Yakun Liu
  - School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
- Ze Cao
  - School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
- Yangyu Deng
  - School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
4
Wang R, Li H, Jing J, Jiang L, Dong W. WYSIWYG: IoT Device Identification Based on WebUI Login Pages. Sensors (Basel) 2022; 22:4892. [PMID: 35808388 PMCID: PMC9269544 DOI: 10.3390/s22134892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 06/19/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]
Abstract
With the improvement of intelligence and interconnection, Internet of Things (IoT) devices tend to become more vulnerable and exposed to many threats. Device identification is the foundation of many cybersecurity operations, such as asset management, vulnerability reaction, and situational awareness, which are important for enhancing the security of IoT devices. The more information sources and angles of view we have, the more precise the identification results we obtain. This study proposes a novel alternative method for IoT device identification that uses commonly available WebUI login pages, which have distinctive vendor-specific characteristics, as the data source. It uses an ensemble learning model based on a combination of Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN) for device vendor identification, and develops an Optical Character Recognition (OCR) based method for device type and model identification. The experimental results show that the ensemble learning model can achieve 99.1% accuracy and 99.5% F1-Score in determining whether a device is from a vendor that appeared in the training dataset and, if the answer is positive, 98% accuracy and 98.3% F1-Score in identifying which vendor it is from. The OCR-based method can identify fine-grained attributes of the device and achieve an accuracy of 99.46% in device model identification, which is higher than the results of the Shodan cyber search engine by a considerable margin of 11.39%.
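The two-step decision described above (first decide whether the vendor was seen in training, then decide which vendor it is) can be illustrated with a small sketch. The abstract does not state how the CNN and DNN outputs are combined or how unseen vendors are detected, so the soft-voting average and the confidence threshold used here are assumptions, and all names and probabilities are hypothetical.

```python
from typing import Optional


def soft_vote(probs_a: dict[str, float], probs_b: dict[str, float]) -> dict[str, float]:
    """Average the class probabilities produced by two models (soft voting)."""
    vendors = set(probs_a) | set(probs_b)
    return {v: (probs_a.get(v, 0.0) + probs_b.get(v, 0.0)) / 2 for v in vendors}


def identify_vendor(probs_a: dict[str, float], probs_b: dict[str, float],
                    threshold: float = 0.6) -> Optional[str]:
    """Return the most likely vendor, or None if confidence is below the threshold.

    Returning None stands in for "vendor not seen in the training data".
    """
    averaged = soft_vote(probs_a, probs_b)
    best = max(averaged, key=lambda v: averaged[v])
    return best if averaged[best] >= threshold else None


# Hypothetical per-vendor probabilities from two models for two login pages.
known = identify_vendor({"vendor_x": 0.9, "vendor_y": 0.1}, {"vendor_x": 0.7, "vendor_y": 0.3})
unknown = identify_vendor({"vendor_x": 0.5, "vendor_y": 0.5}, {"vendor_x": 0.4, "vendor_y": 0.6})
print(known, unknown)
```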
Affiliation(s)
- Ruimin Wang
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
  - Key Laboratory of Cyberspace Situation Awareness of Henan Province, Zhengzhou 450000, China
- Haitao Li
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
- Jing Jing
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
- Liehui Jiang
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
  - Key Laboratory of Cyberspace Situation Awareness of Henan Province, Zhengzhou 450000, China
- Weiyu Dong
  - State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
5
Ganie SM, Malik MB, Arif T. Performance analysis and prediction of type 2 diabetes mellitus based on lifestyle data using machine learning approaches. J Diabetes Metab Disord 2022; 21:339-352. [PMID: 35673418 PMCID: PMC9167316 DOI: 10.1007/s40200-022-00981-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Accepted: 01/17/2022] [Indexed: 10/18/2022]
Abstract
Objective: Diabetes is a chronic fatal disease that has affected millions of people all over the globe. Type 2 Diabetes Mellitus (T2DM) accounts for 90% of the affected population among all types of diabetes. Millions of T2DM patients remain undiagnosed due to lack of awareness and under-resourced healthcare systems. There is therefore a dire need for a diagnostic and prognostic tool that helps healthcare providers, clinicians, and practitioners with early prediction so that they can recommend the lifestyle changes required to stop the progression of diabetes. The main objective of this research is to develop a framework based on machine learning techniques that uses only lifestyle indicators to predict T2DM. Moreover, the prediction model can be used without visits to clinical labs and hospital readmissions. Method: A proposed framework is presented and implemented based on machine learning paradigms using lifestyle indicators for better prediction of T2DM. The research involved experts such as diabetologists, endocrinologists, dieticians, and nutritionists in selecting the contributing lifestyle and biological features (1552 instances and 11 attributes) to promote health and manage complications of T2DM. The dataset was collected through surveys and Google Forms from different geographical regions. Results: Seven machine learning classifiers were employed, namely K-Nearest Neighbour (KNN), Linear Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB). The Gradient Boosting classifier performed best, with an accuracy of 97.24% for training and 96.90% for testing, followed by RF, DT, NB, SVM, LR, and KNN with 95.36%, 92.52%, 90.72%, 90.20%, 90.20%, and 77.06%, respectively. However, in terms of precision, RF achieved the highest performance (0.980%) and KNN the lowest (0.793%). As far as recall is concerned, GB achieved the highest rate of 0.975% and KNN the worst rate of 0.774%. GB also performed best in terms of F1-score. According to the ROCs, GB and NB had a better area under the curve than the others. Conclusion: The research developed a realistic health management system for T2DM based on machine learning techniques that uses only lifestyle data for prediction. To extend the current study, these models should be applied to different, larger, and real-time datasets that share common data with T2DM in order to establish the efficacy of the proposed system.
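The accuracy, precision, recall, and F1-score figures compared above are all derived from a classifier's confusion matrix. The following minimal sketch shows the standard definitions for a binary classifier; the counts are invented for illustration and do not come from the study.

```python
def classification_scores(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented counts: 90 true positives, 10 false positives, 5 false negatives, 95 true negatives.
scores = classification_scores(tp=90, fp=10, fn=5, tn=95)
print(scores)
```

Because precision and recall weigh false positives and false negatives differently, one classifier can lead on precision while another leads on recall, which is the pattern the abstract reports for RF and GB.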
Affiliation(s)
- Majid Bashir Malik
  - Department of Computer Sciences, BGSB University, UT J&K, Rajouri, India
- Tasleem Arif
  - Department of Information Technology, BGSB University, UT J&K, Rajouri, India
6
Li L, Blomberg AJ, Stern RA, Kang CM, Papatheodorou S, Wei Y, Liu M, Peralta AA, Vieira CLZ, Koutrakis P. Predicting Monthly Community-Level Domestic Radon Concentrations in the Greater Boston Area with an Ensemble Learning Model. Environ Sci Technol 2021; 55:7157-7166. [PMID: 33939421 DOI: 10.1021/acs.est.0c08792] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/12/2023]
Abstract
Inhaling radon and its progeny is associated with adverse health outcomes. However, previous studies of the health effects of residential exposure to radon in the United States were commonly based on a county-level temporally invariant radon model that was developed using measurements collected in the mid- to late 1980s. We developed a machine learning model to predict monthly radon concentrations for each ZIP Code Tabulation Area (ZCTA) in the Greater Boston area based on 363,783 short-term measurements by Spruce Environmental Technologies, Inc., during the period 2005-2018. A two-stage ensemble-based model was developed to predict radon concentrations for all ZCTAs and months. Stage one included 12 base statistical models that independently predicted ZCTA-level radon concentrations based on geological, architectural, socioeconomic, and meteorological factors for each ZCTA. Stage two aggregated the predictions of these 12 base models using an ensemble learning method. The results of a 10-fold cross-validation showed that the stage-two model has a good prediction accuracy with a weighted R² of 0.63 and root mean square error of 22.6 Bq/m³. The community-level time-varying predictions from our model have good predictive precision and accuracy and can be used in future prospective epidemiological studies in the Greater Boston area.
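The two-stage structure described above (independent base models, then an aggregation step) can be sketched in a few lines. The abstract does not name the aggregation method, so the inverse-error weighting used here is only one simple possibility and is an assumption; the function names and all numbers are hypothetical, and two base models stand in for the twelve.

```python
def inverse_mse_weights(base_predictions: list[list[float]], targets: list[float]) -> list[float]:
    """Stage two (fit): weight each base model by the inverse of its validation MSE."""
    inverse_errors = []
    for predictions in base_predictions:
        mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
        inverse_errors.append(1.0 / (mse + 1e-12))  # small constant avoids division by zero
    total = sum(inverse_errors)
    return [w / total for w in inverse_errors]


def ensemble_predict(base_outputs: list[float], weights: list[float]) -> float:
    """Stage two (predict): weighted average of the base models' outputs for one sample."""
    return sum(w * p for w, p in zip(weights, base_outputs))


# Stage one: validation predictions from two hypothetical base models.
validation_targets = [10.0, 20.0, 30.0]
model_a = [11.0, 21.0, 31.0]   # MSE = 1
model_b = [12.0, 22.0, 32.0]   # MSE = 4
weights = inverse_mse_weights([model_a, model_b], validation_targets)

# New sample: the base models predict 41 and 42; the ensemble leans toward model A.
combined = ensemble_predict([41.0, 42.0], weights)
print(weights, combined)
```

In practice the weights would be fitted on held-out folds, such as those of a 10-fold cross-validation, so that the aggregation step does not reward base models that merely overfit the training data.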
Affiliation(s)
- Longxiang Li
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Annelise J Blomberg
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Rebecca A Stern
  - Harvard John A. Paulson School of Engineering and Applied Sciences, 29 Oxford St., Cambridge, Massachusetts 02138, United States
- Choong-Min Kang
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Stefania Papatheodorou
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, United States
- Yaguang Wei
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Man Liu
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Adjani A Peralta
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Carolina L Z Vieira
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States
- Petros Koutrakis
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, 401 Park Drive, Boston, Massachusetts 02114, United States