1
|
Ding H, Xing F, Zou L, Zhao L. QSAR analysis of VEGFR-2 inhibitors based on machine learning, Topomer CoMFA and molecule docking. BMC Chem 2024; 18:59. [PMID: 38555462 PMCID: PMC10981835 DOI: 10.1186/s13065-024-01165-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 03/12/2024] [Indexed: 04/02/2024] Open
Abstract
VEGFR-2 kinase inhibitors are clinically approved drugs that can effectively target cancer angiogenesis. However, such inhibitors have adverse effects such as skin toxicity, gastrointestinal reactions and hepatic impairment. In this study, machine learning and Topomer CoMFA, which is an alignment-dependent, descriptor-based method, were employed to build structural activity relationship models of potentially new VEGFR-2 inhibitors. The prediction ac-curacy of the training and test sets of the 2D-SAR model were 82.4 and 80.1%, respectively, with KNN. Topomer CoMFA approach was then used for 3D-QSAR modeling of VEGFR-2 inhibitors. The coefficient of q2 for cross-validation of the model 1 was greater than 0.5, suggesting that a stable drug activity-prediction model was obtained. Molecular docking was further performed to simulate the interactions between the five most promising compounds and VEGFR-2 target protein and the Total Scores were all greater than 6, indicating that they had a strong hydrogen bond interactions were present. This study successfully used machine learning to obtain five potentially novel VEGFR-2 inhibitors to increase our arsenal of drugs to combat cancer.
Collapse
Affiliation(s)
- Hao Ding
- Department of Ultrasound, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning, China
| | - Fei Xing
- Department of Oncology, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning, China
| | - Lin Zou
- Medical College of Guangxi University, Nanning, 530004, Guangxi, China
| | - Liang Zhao
- Hepatobiliary and Splenic Surgery Ward, Department of General Surgery, Shengjing Hospital of China Medical University, Shenyang, 110004, Liaoning, China.
| |
Collapse
|
2
|
Shen Y, Huang J, Jia L, Zhang C, Xu J. Bioinformatics and machine learning driven key genes screening for hepatocellular carcinoma. Biochem Biophys Rep 2024; 37:101587. [PMID: 38107663 PMCID: PMC10724547 DOI: 10.1016/j.bbrep.2023.101587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2023] [Revised: 11/01/2023] [Accepted: 11/17/2023] [Indexed: 12/19/2023] Open
Abstract
Liver cancer, a global menace, ranked as the sixth most prevalent and third deadliest cancer in 2020. The challenge of early diagnosis and treatment, especially for hepatocellular carcinoma (HCC), persists due to late-stage detections. Understanding HCC's complex pathogenesis is vital for advancing diagnostics and therapies. This study combines bioinformatics and machine learning, examining HCC comprehensively. Three datasets underwent meticulous scrutiny, employing various analytical tools such as Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, protein interaction assessment, and survival analysis. These rigorous investigations uncovered twelve pivotal genes intricately linked with HCC's pathophysiological intricacies. Among them, CYP2C8, CYP2C9, EPHX2, and ESR1 were significantly positively correlated with overall patient survival, while AKR1B10 and NQO1 displayed a negative correlation. Moreover, the Adaboost prediction model yielded an 86.8 % accuracy, showcasing machine learning's potential in deciphering complex dataset patterns for clinically relevant predictions. These findings promise to contribute valuable insights into the elusive mechanisms driving liver cancer (HCC). They hold the potential to guide the development of more precise diagnostic methods and treatment strategies in the future. In the fight against this global health challenge, unraveling HCC's intricacies is of paramount importance.
Collapse
Affiliation(s)
- Ye Shen
- Department of Radiology, Wujin Hospital Affiliated with Jiangsu University, Changzhou, 213002, China
| | - Juanjie Huang
- Department of General Surgery, Dongguan Qingxi Hospital, Dongguan, 523660, China
| | - Lei Jia
- International Health Medicine Innovation Center, Shenzhen University, ShenZhen, 518060, China
| | - Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong, 528000, China
| | - Jianxing Xu
- Department of Radiology, Wujin Hospital Affiliated with Jiangsu University, Changzhou, 213002, China
- Department of Radiology, The Wujin Clinical College of Xuzhou Medical University, Changzhou, 213002, China
| |
Collapse
|
3
|
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] Open
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. The reason being, effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. Also, properties that favor the existence of binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performances in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
Collapse
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao, ; Magbubah Essack,
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao, ; Magbubah Essack,
| |
Collapse
|
4
|
Sandaruwan PD, Wannige CT. An improved deep learning model for hierarchical classification of protein families. PLoS One 2021; 16:e0258625. [PMID: 34669708 PMCID: PMC8528337 DOI: 10.1371/journal.pone.0258625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2020] [Accepted: 10/01/2021] [Indexed: 12/28/2022] Open
Abstract
Although genes carry information, proteins are the main role player in providing all the functionalities of a living organism. Massive amounts of different proteins involve in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in their structure or function. Protein characterization to identify protein structure and function is done accurately using laboratory experiments. With the rapidly increasing huge amount of novel protein sequences, these experiments have become difficult to carry out since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods are introduced to classify proteins and predict their functional properties. With the progress of the performance of the computational techniques, deep learning plays a key role in many areas. Novel deep learning models such as DeepFam, ProtCNN have been presented to classify proteins into their families recently. However, these deep learning models have been used to carry out the non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam with high accuracy to classify proteins hierarchically into different levels simultaneously. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved an accuracy of 98.62% and 96.14% for the popular Pfam dataset and COG dataset respectively.
Collapse
|
5
|
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence with a discrete model or a vector, but still keep it with considerable sequence-order information or its special pattern. To deal with such a challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. The ideas and their approaches have further stimulated the birth for "distorted key theory", "wenxing diagram", and substantially strengthening the power in treating the multi-label systems, as well as the establishment of the famous "5-steps rule". All these logic developments are quite natural that are very useful not only for theoretical scientists but also for experimental scientists in conducting genetics/genomics analysis and drug development. Presented in this review paper are also their future perspectives; i.e., their impacts will become even more significant and propounding.
Collapse
|