1
Boonyarit B, Yamprasert N, Kaewnuratchadasorn P, Kinchagawat J, Prommin C, Rungrotmongkol T, Nutanong S. GraphEGFR: Multi-task and transfer learning based on molecular graph attention mechanism and fingerprints improving inhibitor bioactivity prediction for EGFR family proteins on data scarcity. J Comput Chem 2024; 45:2001-2023. [PMID: 38713612 DOI: 10.1002/jcc.27388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/16/2024] [Accepted: 04/19/2024] [Indexed: 05/09/2024]
Abstract
The proteins within the human epidermal growth factor receptor (EGFR) family, members of the tyrosine kinase receptor family, play a pivotal role in the molecular mechanisms driving the development of various tumors. Tyrosine kinase inhibitors, key compounds in targeted therapy, encounter challenges in cancer treatment due to emerging drug resistance mutations. Consequently, machine learning has undergone significant evolution to address the challenges of cancer drug discovery related to EGFR family proteins. However, the application of deep learning in this area is hindered by inherent difficulties associated with small-scale data, particularly the risk of overfitting. Moreover, the design of a model architecture that facilitates learning through multi-task and transfer learning, coupled with appropriate molecular representation, poses substantial challenges. In this study, we introduce GraphEGFR, a deep learning regression model designed to enhance molecular representation and model architecture for predicting the bioactivity of inhibitors against both wild-type and mutant EGFR family proteins. GraphEGFR integrates a graph attention mechanism for molecular graphs with deep and convolutional neural networks for molecular fingerprints. We observed that GraphEGFR models employing multi-task and transfer learning strategies generally achieve predictive performance comparable to existing competitive methods. The integration of molecular graphs and fingerprints adeptly captures relationships between atoms and enables both global and local pattern recognition. We further validated potential multi-targeted inhibitors for wild-type and mutant HER1 kinases, exploring key amino acid residues through molecular dynamics simulations to understand molecular interactions. 
This predictive model offers a robust strategy that could significantly contribute to overcoming the challenges of developing deep learning models for drug discovery with limited data and exploring new frontiers in multi-targeted kinase drug discovery for EGFR family proteins.
Affiliation(s)
- Bundit Boonyarit
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Nattawin Yamprasert
  - School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand
- Jiramet Kinchagawat
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Chanatkran Prommin
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Thanyada Rungrotmongkol
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence in Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Sarana Nutanong
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
2
Ahmad S, Raza K. An extensive review on lung cancer therapeutics using machine learning techniques: state-of-the-art and perspectives. J Drug Target 2024; 32:635-646. [PMID: 38662768 DOI: 10.1080/1061186x.2024.2347358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
There are over 100 types of human cancer, accounting for millions of deaths every year. Lung cancer alone claims over 1.8 million lives per year, a figure expected to surpass 3.2 million by 2050, which underscores the urgent need for rapid drug development and repurposing initiatives. The application of AI emerges as a pivotal solution to developing anti-cancer therapeutics. This state-of-the-art review aims to explore the various applications of AI in lung cancer therapeutics. Predictive models can analyse large datasets, including clinical data, genetic information, and treatment outcomes, for novel drug design and to generate personalised treatment recommendations, potentially optimising therapeutic strategies, enhancing treatment efficacy, and minimising adverse effects. A thorough literature review was conducted based on articles indexed in PubMed and Scopus. We compiled the use of various machine learning approaches, including CNNs, RNNs, GANs, VAEs, and other AI techniques, enhancing efficiency with accuracy exceeding 95%, which is validated through a computer-aided drug design process. AI can revolutionise lung cancer therapeutics, streamlining processes and saving biological scientists' time and effort; however, further research is needed to overcome challenges and fully unlock AI's potential in lung cancer therapeutics.
Affiliation(s)
- Shaban Ahmad
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
3
Chuntakaruk H, Boonpalit K, Kinchagawat J, Nakarin F, Khotavivattana T, Aonbangkhen C, Shigeta Y, Hengphasatporn K, Nutanong S, Rungrotmongkol T, Hannongbua S. Machine learning-guided design of potent darunavir analogs targeting HIV-1 proteases: A computational approach for antiretroviral drug discovery. J Comput Chem 2024; 45:953-968. [PMID: 38174739 DOI: 10.1002/jcc.27298] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/26/2023] [Revised: 11/30/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
In the pursuit of novel antiretroviral therapies for human immunodeficiency virus type-1 (HIV-1) proteases (PRs), recent improvements in drug discovery have embraced machine learning (ML) techniques to guide the design process. This study employs ensemble learning models to identify crucial substructures as significant features for drug development. A collection of 160 darunavir (DRV) analogs was designed based on these key substructures and subsequently screened using molecular docking techniques. Chemical structures with high fitness scores were selected and combined, then subjected to one-dimensional (1D) screening based on beyond Lipinski's rule of five (bRo5) and ADME (absorption, distribution, metabolism, and excretion) prediction, as implemented in the Combined Analog generator Tool (CAT) program. A total of 473 screened analogs were subjected to docking analysis through a convolutional neural network scoring function against both the wild-type (WT) and 12 major mutated PRs. DRV analogs with negative changes in binding free energy (ΔΔG_bind) compared to DRV could be categorized into four attractive groups based on their interactions with the majority of vital PRs. The analysis of interaction profiles revealed that potent designed analogs, targeting both WT and mutant PRs, exhibited interactions with common key amino acid residues. This observation further confirms that the ML model-guided approach effectively identified the substructures that play a crucial role in potent analogs. It is expected to function as a powerful computational tool, offering valuable guidance in the identification of chemical substructures for synthesis and subsequent experimental testing.
Affiliation(s)
- Hathaichanok Chuntakaruk
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Biochemistry, Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, Thailand
- Kajjana Boonpalit
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Jiramet Kinchagawat
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Fahsai Nakarin
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Tanatorn Khotavivattana
  - Center of Excellence in Natural Products Chemistry (CENP), Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Chanat Aonbangkhen
  - Center of Excellence in Natural Products Chemistry (CENP), Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Yasuteru Shigeta
  - Center for Computational Sciences, University of Tsukuba, Ibaraki, Japan
- Sarana Nutanong
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Thanyada Rungrotmongkol
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Biochemistry, Faculty of Science, Center of Excellence in Structural and Computational Biology, Chulalongkorn University, Bangkok, Thailand
- Supot Hannongbua
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Chemistry, Faculty of Science, Center of Excellence in Computational Chemistry (CECC), Chulalongkorn University, Bangkok, Thailand
4
Syahid NF, Weerapreeyakul N, Srisongkram T. StackBRAF: A Large-Scale Stacking Ensemble Learning for BRAF Affinity Prediction. ACS Omega 2023; 8:20881-20891. [PMID: 37332807 PMCID: PMC10268632 DOI: 10.1021/acsomega.3c01641] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/10/2023] [Accepted: 05/22/2023] [Indexed: 06/20/2023]
Abstract
The B-rapidly accelerated fibrosarcoma (BRAF) is a proto-oncogene that plays a vital role in cell signaling and growth regulation. Identifying a potent BRAF inhibitor can enhance therapeutic success in high-stage cancers, particularly metastatic melanoma. In this study, we proposed a stacking ensemble learning framework for the accurate prediction of BRAF inhibitors. We obtained 3857 curated molecules with BRAF inhibitory activity, expressed as the negative logarithm of the half-maximal inhibitory concentration (pIC50), from the ChEMBL database. Twelve molecular fingerprints from PaDEL-Descriptor were calculated for model training. Three machine learning algorithms, namely extreme gradient boosting, support vector regression, and multilayer perceptron, were used to construct new predictive features (PFs). The meta-ensemble random forest regression, called StackBRAF, was created based on the 36 PFs (12 fingerprints × 3 algorithms). The StackBRAF model achieves lower mean absolute error (MAE) and higher coefficients of determination (R² and Q²) than the individual baseline models. The stacking ensemble learning model provides good y-randomization results, indicating a strong correlation between molecular features and pIC50. An applicability domain of the model with an acceptable Tanimoto similarity score was also defined. Moreover, a large-scale high-throughput screening of 2123 FDA-approved drugs against the BRAF protein was successfully demonstrated using the StackBRAF algorithm. Thus, the StackBRAF model proved beneficial as a drug design algorithm for BRAF inhibitor drug discovery and drug development.
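The pIC50 scale and the Tanimoto-based applicability domain mentioned in this abstract can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and uses a hypothetical similarity threshold of 0.5; the function names, the threshold, and the nearest-neighbour rule are illustrative assumptions, not the procedure used in the StackBRAF paper. The pIC50 helper uses the standard definition, pIC50 = -log10(IC50 in mol/L).

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints share no information; treat as dissimilar.
        return 0.0
    return len(a & b) / len(union)


def in_applicability_domain(query_bits, training_fps, threshold=0.5):
    """Hypothetical rule: a query is inside the domain when its nearest
    training fingerprint reaches the similarity threshold.
    training_fps must be a non-empty list of bit sets."""
    nearest = max(tanimoto(query_bits, fp) for fp in training_fps)
    return nearest >= threshold


if __name__ == "__main__":
    training = [{2, 3, 4}, {9}]          # toy training fingerprints
    print(pic50_from_ic50_nm(1000.0))    # IC50 of 1 micromolar
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(in_applicability_domain({1, 2, 3}, training))
```

A prediction for a query compound outside the domain would normally be flagged as unreliable rather than discarded; how the threshold is chosen depends on the dataset.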
Affiliation(s)
- Nur Fadhilah Syahid
  - Graduate School in the Program of Pharmaceutical Chemistry and Natural Products, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
- Natthida Weerapreeyakul
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
  - Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen 40002, Thailand
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
  - Human High Performance and Health Promotion Research Institute, Khon Kaen University, Khon Kaen 40002, Thailand
5
Srisongkram T, Khamtang P, Weerapreeyakul N. Prediction of KRAS G12C inhibitors using conjoint fingerprint and machine learning-based QSAR models. J Mol Graph Model 2023; 122:108466. [PMID: 37058997 DOI: 10.1016/j.jmgm.2023.108466] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/30/2022] [Revised: 03/19/2023] [Accepted: 03/29/2023] [Indexed: 04/16/2023]
Abstract
Kirsten rat sarcoma virus G12C (KRAS G12C) is the major protein mutation associated with non-small cell lung cancer (NSCLC) severity. Inhibiting KRAS G12C is therefore one of the key therapeutic strategies for NSCLC patients. In this paper, a cost-effective data-driven drug design approach employing machine learning-based quantitative structure-activity relationship (QSAR) analysis was built for predicting ligand affinities against the KRAS G12C protein. A curated and non-redundant dataset of 1033 compounds with KRAS G12C inhibitory activity (pIC50) was used to build and test the models. The PubChem fingerprint, Substructure fingerprint, Substructure fingerprint count, and the conjoint fingerprint (a combination of PubChem fingerprint and Substructure fingerprint count) were used to train the models. Using comprehensive validation methods and various machine learning algorithms, the results clearly showed that the XGBoost regression (XGBoost) achieved the highest performance in terms of goodness of fit, predictivity, generalizability and model robustness (R² = 0.81, Q²_CV = 0.60, Q²_Ext = 0.62, R² − Q²_Ext = 0.19, R²_Y-Random = 0.31 ± 0.03, Q²_Y-Random = −0.09 ± 0.04). The top 13 molecular fingerprints that correlated with the predicted pIC50 values were SubFPC274 (aromatic atoms), SubFPC307 (number of chiral-centers), PubChemFP37 (≥1 Chlorine), SubFPC18 (number of alkylarylethers), SubFPC1 (number of primary carbons), SubFPC300 (number of 1,3-tautomerizables), PubChemFP621 (N-C:C:C:N structure), PubChemFP23 (≥1 Fluorine), SubFPC2 (number of secondary carbons), SubFPC295 (number of C-ONS bonds), PubChemFP199 (≥4 6-membered rings), PubChemFP180 (≥1 nitrogen-containing 6-membered ring), and SubFPC180 (number of tertiary amines). These molecular fingerprints were virtualized and validated using molecular docking experiments. In conclusion, this conjoint fingerprint and XGBoost-QSAR model proved useful as a high-throughput screening tool for KRAS G12C inhibitor identification and drug design.
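The validation statistics quoted in this abstract (the coefficient of determination, the gap between fitted and external performance, and y-randomization) can be sketched as follows. This is a minimal illustration under stated assumptions: a one-feature least-squares line stands in for the XGBoost regressor, the data are toy values, and all function names are hypothetical rather than taken from the paper.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one feature (stand-in for a real QSAR model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def generalization_gap(r2_fit, q2_ext):
    """Difference between goodness of fit and external predictivity."""
    return r2_fit - q2_ext


def y_randomization_r2(x, y, n_rounds=20, seed=0):
    """Mean R^2 after refitting on shuffled responses. A value far below
    the real R^2 suggests the model is not fitting chance correlations."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        slope, intercept = fit_line(x, shuffled)
        preds = [slope * xi + intercept for xi in x]
        scores.append(r_squared(shuffled, preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    x = [float(i) for i in range(10)]   # toy descriptor values
    y = [2.0 * xi + 1.0 for xi in x]    # toy pIC50-like responses
    slope, intercept = fit_line(x, y)
    preds = [slope * xi + intercept for xi in x]
    print(r_squared(y, preds))
    print(y_randomization_r2(x, y))
    print(generalization_gap(0.81, 0.62))  # values reported in the abstract
```

The reported gap R² − Q²_Ext = 0.19 is simply 0.81 − 0.62; a small gap is commonly read as a sign that the model is not badly overfitted.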
Affiliation(s)
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand
- Natthida Weerapreeyakul
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand