1
Chu H, Liu T. Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models. Int J Mol Sci 2024; 25:4507. [PMID: 38674091 PMCID: PMC11049818 DOI: 10.3390/ijms25084507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/15/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024] Open
Abstract
Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.
Affiliation(s)
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
2
Zhang Y, Deng Z, Xu X, Feng Y, Junliang S. Application of Artificial Intelligence in Drug-Drug Interactions Prediction: A Review. J Chem Inf Model 2024; 64:2158-2173. [PMID: 37458400 DOI: 10.1021/acs.jcim.3c00582] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/09/2024]
Abstract
Drug-drug interactions (DDIs) are a critical aspect of drug research that can have adverse effects on patients and can lead to serious consequences. Predicting these events accurately can significantly improve clinicians' ability to make better decisions and establish optimal treatment regimens. However, manually detecting these interactions is time-consuming and labor-intensive. Utilizing the advancements in Artificial Intelligence (AI) is essential for achieving accurate forecasts of DDIs. In this review, DDI prediction tasks are classified into three types according to the type of DDI prediction: undirected DDI prediction, DDI events prediction, and asymmetric DDI prediction. The paper then reviews the progress of AI for each of these three prediction tasks and provides a summary of the data sets and representative methods used in these three prediction directions. In this review, we aim to provide a comprehensive overview of drug interaction prediction. The first section introduces commonly used databases and presents an overview of current research advancements and techniques across the three domains of DDI. Additionally, we introduce classical machine learning techniques for predicting undirected drug interactions and provide a timeline for the progression of the predicted drug interaction events. Finally, we discuss the difficulties and prospects of AI approaches to predicting DDIs, emphasizing their potential for improving clinical decision-making and patient outcomes.
Affiliation(s)
- Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Zengqian Deng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Xiaoyu Xu
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Yinfei Feng
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Shang Junliang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276800, China
3
Arif M, Fang G, Ghulam A, Musleh S, Alam T. DPI_CDF: druggable protein identifier using cascade deep forest. BMC Bioinformatics 2024; 25:145. [PMID: 38580921 DOI: 10.1186/s12859-024-05744-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 03/13/2024] [Indexed: 04/07/2024] Open
Abstract
BACKGROUND Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Therefore, computational methods are highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of protein sequence to generate features. Then a hierarchical deep forest model fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS The empirical outcomes on 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. When compared to current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process. AVAILABILITY The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF.
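The accuracy and MCC values quoted in this abstract are both derived from a binary confusion matrix. As a minimal sketch of that arithmetic (the function name `confusion_metrics` and the example counts are illustrative assumptions, not figures from the paper), the standard definitions can be written as:

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and MCC for binary counts.

    tp/tn/fp/fn are the true-positive, true-negative, false-positive and
    false-negative counts. Undefined ratios fall back to 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient:
    # (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = confusion_metrics(tp=90, tn=85, fp=15, fn=10)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy for druggable-protein classifiers.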
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Ge Fang
- State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing 210023, P. R. China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Ali Ghulam
- Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
- Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
4
Zou H. iDPPIV-SI: identifying dipeptidyl peptidase IV inhibitory peptides by using multiple sequence information. J Biomol Struct Dyn 2024; 42:2144-2152. [PMID: 37125813 DOI: 10.1080/07391102.2023.2203257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 04/10/2023] [Indexed: 05/02/2023]
Abstract
Currently, diabetes has become a great threat to people's health worldwide. Recent research shows that dipeptidyl peptidase IV (DPP-IV) inhibitory peptides may be potential pharmaceutical agents for treating diabetes. Thus, there is a need to discriminate DPP-IV inhibitory peptides from non-DPP-IV inhibitory peptides. To address this issue, a novel computational model called iDPPIV-SI was developed in this study. First, 50 different types of physicochemical (PC) properties were employed to denote the peptide sequences. Three different feature descriptors, including the 1-order and 2-order correlation methods and the discrete wavelet transform, were applied to collect useful information from the PC matrix. Furthermore, the least absolute shrinkage and selection operator (LASSO) algorithm was employed to select the most discriminative features. All of the chosen features were fed into a support vector machine (SVM) for identifying DPP-IV inhibitory peptides. iDPPIV-SI achieved 91.26% and 98.12% classification accuracies on the training and independent datasets, respectively. This is a significant improvement in classification performance over state-of-the-art predictors. The datasets and MATLAB codes (based on MATLAB2015b) used in the current study are available at https://figshare.com/articles/online_resource/iDPPIV-SI/20085878. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
5
Parvatikar PP, Patil S, Khaparkhuntikar K, Patil S, Singh PK, Sahana R, Kulkarni RV, Raghu AV. Artificial intelligence: Machine learning approach for screening large database and drug discovery. Antiviral Res 2023; 220:105740. [PMID: 37935248 DOI: 10.1016/j.antiviral.2023.105740] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/19/2023] [Revised: 10/17/2023] [Accepted: 10/26/2023] [Indexed: 11/09/2023]
Abstract
Recent research in drug discovery faces many difficulties, including the development of new drugs during disease outbreaks and drug resistance due to rapidly accumulating mutations. Virtual screening is the most widely used method in computer-aided drug discovery. It has a prominent ability to screen drug targets from large molecular databases. Recently, a number of web servers have been developed for quickly screening publicly accessible chemical databases. In a nutshell, deep learning algorithms and artificial neural networks have modernised the field. Several drug discovery processes have used machine learning and deep learning algorithms, including peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modelling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physicochemical activity. Although there are presently a wide variety of data-driven AI/ML tools available, the majority of these tools have, up to this point, been developed in the context of non-communicable diseases like cancer, and a number of obstacles have prevented the translation of these tools to the discovery of treatments against infectious diseases. In this review, various aspects of AI and ML in virtual screening of large databases are discussed. With an emphasis on antivirals as well as other diseases, it offers a perspective on the advantages, drawbacks, and hazards of AI/ML techniques in the search for innovative treatments.
Affiliation(s)
- Prachi P Parvatikar
- Department of Biotechnology, Allied Health Science, BLDE (Deemed-to-be University), Vijayapur 586103, Karnataka, India
- Sudha Patil
- Department of Pharmaceutics, BLDEA's SSM College of Pharmacy and Research Centre, Vijayapur 586 103, Karnataka, India
- Kedar Khaparkhuntikar
- Department of Pharmaceutics, National Institute of Pharmaceutical Education and Research (NIPER), Hyderabad, Telangana, 500037, India
- Shruti Patil
- Department of Biotechnology, Allied Health Science, BLDE (Deemed-to-be University), Vijayapur 586103, Karnataka, India
- Pankaj K Singh
- Department of Pharmaceutics, National Institute of Pharmaceutical Education and Research (NIPER), Hyderabad, Telangana, 500037, India
- R Sahana
- Department of Computer Science and Engineering, RV Institute of Technology and Management, 560076, Bengaluru, India
- Raghavendra V Kulkarni
- Department of Biotechnology, Allied Health Science, BLDE (Deemed-to-be University), Vijayapur 586103, Karnataka, India; Department of Pharmaceutics, BLDEA's SSM College of Pharmacy and Research Centre, Vijayapur 586 103, Karnataka, India
- Anjanapura V Raghu
- Department of Science and Technology, BLDE (Deemed-to-be University), Vijayapur 586103, Karnataka, India
6
Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023:1-12. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are extracted by composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor called multi-block pseudo amino acid composition (MB-PseAAC). The CTD, CKSAAP, PsePSSM, and MB-PseAAC feature vectors are integrated, and sequential forward selection is used as the feature selection algorithm. The selected features are provided to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB). The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model achieves better results than existing predictors. Our novel protocol can play an active role in designing novel drugs and may be fruitful for exploring potential targets. This study will help to capture a more universal view of a potential target. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
7
Yu Z, Yin Z, Zou H. iAMY-RECMFF: Identifying amyloidgenic peptides by using residue pairwise energy content matrix and features fusion algorithm. J Bioinform Comput Biol 2023; 21:2350023. [PMID: 37899353 DOI: 10.1142/s0219720023500233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
Various diseases, including Huntington's disease, Alzheimer's disease, and Parkinson's disease, have been reported to be linked to amyloid. Therefore, it is crucial to distinguish amyloid from non-amyloid proteins or peptides. While experimental approaches are typically preferred, they are costly and time-consuming. In this study, we have developed a machine learning framework called iAMY-RECMFF to discriminate amyloidgenic from non-amyloidgenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then utilized Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were inputted into a support vector machine for identifying amyloidgenic peptides. Experimental results demonstrate that our proposed method significantly improves the identification of amyloidgenic peptides compared to existing predictors. This suggests that our method may serve as a powerful tool in identifying amyloidgenic peptides. To facilitate academic use, the dataset and codes used in the current study are accessible at https://figshare.com/articles/online_resource/iAMY-RECMFF/22816916.
Affiliation(s)
- Zizheng Yu
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China
- Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China
8
Zou H, Yu W. Integrating Low-Order and High-Order Correlation Information for Identifying Phage Virion Proteins. J Comput Biol 2023; 30:1131-1143. [PMID: 37729064 DOI: 10.1089/cmb.2022.0237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023] Open
Abstract
Phage virion proteins (PVPs) play an important role in the host cell. Fast and accurate identification of PVPs is beneficial for the discovery and development of related drugs. Although wet experimental approaches are the first choice to identify PVPs, they are costly and time-consuming. Thus, researchers have turned their attention to computational models, which can speed up related studies. Therefore, we proposed a novel machine-learning model to identify PVPs in the current study. First, 50 different types of physicochemical properties were used to denote protein sequences. Next, two different approaches, including Pearson's correlation coefficient (PCC) and maximal information coefficient (MIC), were employed to extract discriminative information. Further, to capture the high-order correlation information, we used PCC and MIC once again. After that, we adopted the least absolute shrinkage and selection operator algorithm to select the optimal feature subset. Finally, these chosen features were fed into a support vector machine to discriminate PVPs from phage non-virion proteins. We performed experiments on two different datasets to validate the effectiveness of our proposed method. Experimental results showed a significant improvement in performance compared with state-of-the-art approaches. It indicates that the proposed computational model may become a powerful predictor in identifying PVPs.
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Wanting Yu
- College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang, China
9
Mozafari N, Mozafari N, Dehshahri A, Azadi A. Knowledge Gaps in Generating Cell-Based Drug Delivery Systems and a Possible Meeting with Artificial Intelligence. Mol Pharm 2023; 20:3757-3778. [PMID: 37428824 DOI: 10.1021/acs.molpharmaceut.3c00162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2023]
Abstract
Cell-based drug delivery systems are new strategies in targeted delivery in which cells or cell-membrane-derived systems are used as carriers and release their cargo in a controlled manner. Recently, great attention has been directed to cells as carrier systems for treating several diseases. There are various challenges in the development of cell-based drug delivery systems. The prediction of the properties of these platforms is a prerequisite step in their development to reduce undesirable effects. Integrating nanotechnology and artificial intelligence leads to more innovative technologies. Artificial intelligence quickly mines data and makes decisions more quickly and accurately. Machine learning as a subset of the broader artificial intelligence has been used in nanomedicine to design safer nanomaterials. Here, how challenges of developing cell-based drug delivery systems can be solved with potential predictive models of artificial intelligence and machine learning is portrayed. The most famous cell-based drug delivery systems and their challenges are described. Last but not least, artificial intelligence and most of its types used in nanomedicine are highlighted. The present Review has shown the challenges of developing cells or their derivatives as carriers and how they can be used with potential predictive models of artificial intelligence and machine learning.
Affiliation(s)
- Negin Mozafari
- Department of Pharmaceutics, School of Pharmacy, Shiraz University of Medical Sciences, 71468 64685 Shiraz, Iran
- Niloofar Mozafari
- Design and System Operations Department, Regional Information Center for Science and Technology, 71946 94171 Shiraz, Iran
- Ali Dehshahri
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, 71468 64685 Shiraz, Iran
- Pharmaceutical Sciences Research Centre, Shiraz University of Medical Sciences, 71468 64685 Shiraz, Iran
- Amir Azadi
- Department of Pharmaceutics, School of Pharmacy, Shiraz University of Medical Sciences, 71468 64685 Shiraz, Iran
- Pharmaceutical Sciences Research Centre, Shiraz University of Medical Sciences, 71468 64685 Shiraz, Iran
10
Cunningham M, Pins D, Dezső Z, Torrent M, Vasanthakumar A, Pandey A. PINNED: identifying characteristics of druggable human proteins using an interpretable neural network. J Cheminform 2023; 15:64. [PMID: 37468968 PMCID: PMC10354961 DOI: 10.1186/s13321-023-00735-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 07/10/2023] [Indexed: 07/21/2023] Open
Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features which distinguish between "druggable" and "undruggable" proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model capable of generating druggability sub-scores based on each of four distinct categories, combining them to form an overall druggability score. The model achieves an excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic (AUC) of 0.95. Our use of multiple sub-scores allows the assessment of potential protein targets of interest based on distinct contributors to druggability, leading to a more interpretable and holistic model to identify novel targets.
Affiliation(s)
- Michael Cunningham
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Danielle Pins
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Zoltán Dezső
- Genomics Research Center, AbbVie Inc., 1000 Gateway Boulevard, South San Francisco, CA, 94080, USA
- Maricel Torrent
- Small Molecule Therapeutics and Platform Technologies, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Aparna Vasanthakumar
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Abhishek Pandey
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
11
Chen J, Gu Z, Xu Y, Deng M, Lai L, Pei J. QuoteTarget: A sequence-based transformer protein language model to identify potentially druggable protein targets. Protein Sci 2023; 32:e4555. [PMID: 36564866 PMCID: PMC9878469 DOI: 10.1002/pro.4555] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/06/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022]
Abstract
The development of efficient computational methods for drug target protein identification can compensate for the high cost of experiments and is therefore of great significance for drug development. However, existing structure-based drug target protein-identification algorithms are limited by the insufficient number of proteins with experimentally resolved structures. Moreover, sequence-based algorithms cannot effectively extract information from protein sequences and thus display insufficient accuracy. Here, we combined the sequence-based self-supervised pretraining protein language model ESM1b with a graph convolutional neural network classifier to develop an improved, sequence-based drug target protein identification method. This complete model, named QuoteTarget, efficiently encodes proteins based on sequence information alone and achieves an accuracy of 95% with the nonredundant drug target and nondrug target datasets constructed for this study. When applied to all proteins from Homo sapiens, QuoteTarget identified 1213 potential undeveloped drug target proteins. We further inferred residue-binding weights from the well-trained network using the gradient-weighted class activation mapping (Grad-CAM) algorithm. Notably, we found that without any binding site information input, significant residues inferred by the model closely match the experimentally confirmed drug molecule-binding sites. Thus, our work provides a highly effective sequence-based identifier for drug target proteins, as well as new insights into recognizing drug molecule-binding sites. The entire model is available at https://github.com/Chenjxjx/drug-target-prediction.
Affiliation(s)
- Jiaxiao Chen
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Zhonghui Gu
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Youjun Xu
- Infinite Intelligence Pharma, Beijing, China
- Minghua Deng
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Beijing, China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Beijing, China
12
Iraji MS, Tanha J, Habibinejad M. Druggable protein prediction using a multi-canal deep convolutional neural network based on autocovariance method. Comput Biol Med 2022; 151:106276. [PMID: 36410099 DOI: 10.1016/j.compbiomed.2022.106276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 10/18/2022] [Accepted: 10/30/2022] [Indexed: 11/09/2022]
Abstract
Drug targets must be identified and positioned correctly to research and manufacture new drugs. In this study, rather than using traditional methods for drug expansion, the drug target is determined using machine learning. Machine learning has attracted significant interest and extensive research in recent years due to its low cost and speed of operation. As a result, it is critical to develop an intelligent classification system for drug proteins. This study proposes two distinct deep learning models for the prediction of druggable protein classes. The translation of drug-protein sequences is based on six physicochemical properties of amino acids. Following the application of the autocovariance method, the converted sequences are used as fixed-length input vectors in a deep stacked sparse auto-encoder (DSSAE) network. The coded protein sequences are also used as a six-channel input vector for the deep convolutional neural network model. The experimental results show that the deep convolutional model classifies druggable proteins more effectively than previous studies. The proposed approach achieved a sensitivity of 96.92%, a specificity of 99.51%, and an accuracy of 98.29%.
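The autocovariance step mentioned in this abstract turns a variable-length sequence into a fixed-length vector. The sketch below illustrates the general technique only, under stated assumptions: it uses a single property scale (Kyte-Doolittle hydropathy) as a stand-in because the abstract does not list the six properties, and the names `autocovariance` and `encode_sequence` and the choice of `max_lag` are hypothetical. Any normalisation used in the paper is not reproduced here.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocovariance(profile, max_lag):
    """Autocovariance of a numeric profile for lags 1..max_lag.

    AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean)
    """
    n = len(profile)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_sequence(sequence, scales, max_lag):
    """Concatenate autocovariance features over every property scale.

    The output length is len(scales) * max_lag, independent of sequence length.
    """
    features = []
    for scale in scales:
        profile = [scale[residue] for residue in sequence]
        features.extend(autocovariance(profile, max_lag))
    return features


# Two sequences of different length map to vectors of the same length.
short_vec = encode_sequence("MKTAYIAKQR", [KYTE_DOOLITTLE], max_lag=3)
long_vec = encode_sequence("MKTAYIAKQRQISFVKSHFSRQ", [KYTE_DOOLITTLE], max_lag=3)
print(len(short_vec), len(long_vec))
```

With six property scales and a chosen maximum lag, the same function would yield a vector of length six times `max_lag`, which is the kind of fixed-length input an auto-encoder network needs.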
Affiliation(s)
- Mohammad Saber Iraji
- Department of Computer Engineering and Information Technology, Payame Noor University, Tehran, Iran; Department of Computer Engineering, University of Tabriz, Tabriz, Iran
- Jafar Tanha
- Department of Computer Engineering, University of Tabriz, Tabriz, Iran
- Mahboobeh Habibinejad
- Department of Computer Engineering and Information Technology, Payame Noor University, Tehran, Iran
13
Raies A, Tulodziecka E, Stainer J, Middleton L, Dhindsa RS, Hill P, Engkvist O, Harper AR, Petrovski S, Vitsios D. DrugnomeAI is an ensemble machine-learning framework for predicting druggability of candidate drug targets. Commun Biol 2022; 5:1291. [PMID: 36434048 PMCID: PMC9700683 DOI: 10.1038/s42003-022-04245-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/26/2022] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
The druggability of targets is a crucial consideration in drug target selection. Here, we adopt a stochastic semi-supervised ML framework to develop DrugnomeAI, which estimates the druggability likelihood for every protein-coding gene in the human exome. DrugnomeAI integrates gene-level properties from 15 sources resulting in 324 features. The tool generates exome-wide predictions based on labelled sets of known drug targets (median AUC: 0.97), highlighting features from protein-protein interaction networks as top predictors. DrugnomeAI provides generic as well as specialised models stratified by disease type or drug therapeutic modality. The top-ranking DrugnomeAI genes were significantly enriched for genes previously selected for clinical development programs (p value < 1 × 10⁻³⁰⁸) and for genes achieving genome-wide significance in phenome-wide association studies of 450 K UK Biobank exomes for binary (p value = 1.7 × 10⁻⁵) and quantitative traits (p value = 1.6 × 10⁻⁷). We accompany our method with a web application (http://drugnomeai.public.cgr.astrazeneca.com) to visualise the druggability predictions and the key features that define gene druggability, per disease type and modality.
Affiliation(s)
- Arwa Raies
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Ewa Tulodziecka
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - James Stainer
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Lawrence Middleton
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Ryan S Dhindsa
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, USA
| | - Pamela Hill
- Emerging Innovations, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA, USA
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Andrew R Harper
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Department of Medicine, University of Melbourne, Austin Health, Melbourne, VIC, Australia
| | - Dimitrios Vitsios
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
| |
|
14
|
Wei Q, Zhang Q, Gao H, Song T, Salhi A, Yu B. DEEPStack-RBP: Accurate identification of RNA-binding proteins based on autoencoder feature selection and deep stacking ensemble classifier. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
|
15
|
Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework. iScience 2022; 25:104883. [PMID: 36046193 PMCID: PMC9421381 DOI: 10.1016/j.isci.2022.104883] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/24/2022] [Revised: 07/08/2022] [Accepted: 08/02/2022] [Indexed: 11/22/2022] Open
Abstract
Discovery of potential drugs requires rapid and precise identification of drug targets. Although traditional experimental methodologies can accurately identify drug targets, they are time-consuming and inappropriate for high-throughput screening. Computational approaches based on machine learning (ML) algorithms can expedite the prediction of druggable proteins; however, the performance of the existing computational methods remains unsatisfactory. This study proposes a computational tool, SPIDER, to enhance the accurate prediction of druggable proteins. SPIDER employs various feature descriptors pertaining to several aspects, including physicochemical properties, compositional information, and composition-transition-distribution information, coupled with well-known ML algorithms to facilitate the construction of the final meta-predictor. The experimental results showed that SPIDER enabled more precise and robust prediction of druggable proteins than the baseline models and existing methods on the independent test dataset. A web server was established and made freely available online. Computational models can expedite the identification of potential druggable proteins. SPIDER represents the first stacked model proposed for druggable protein prediction. SPIDER enables more precise prediction of druggable proteins than existing methods. The SPIDER web server is available at http://pmlabstack.pythonanywhere.com/SPIDER.
|
16
|
Zou H, Yang F, Yin Z. Integrating multiple sequence features for identifying anticancer peptides. Comput Biol Chem 2022; 99:107711. [DOI: 10.1016/j.compbiolchem.2022.107711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 05/16/2022] [Accepted: 05/29/2022] [Indexed: 11/03/2022]
|
17
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit–explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring “the state of the art” in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI–PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI–PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI–PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the “state of the art” on research in the AI–PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | | | - Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- *Correspondence: Myriam M. Altamirano-Bustamante,
| |
|
18
|
Bektaş J. EKSL: An effective novel dynamic ensemble model for unbalanced datasets based on LR and SVM hyperplane-distances. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.03.042] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
|
19
|
Zou H. iAHTP-LH: Integrating Low-Order and High-Order Correlation Information for Identifying Antihypertensive Peptides. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10414-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
20
|
Serov N, Vinogradov V. Artificial intelligence to bring nanomedicine to life. Adv Drug Deliv Rev 2022; 184:114194. [PMID: 35283223 DOI: 10.1016/j.addr.2022.114194] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 12/16/2021] [Revised: 03/04/2022] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
The technology of drug delivery systems (DDSs) has demonstrated outstanding performance and effectiveness in the production of pharmaceuticals, as shown by the many FDA-approved nanomedicines that have enhanced selectivity, manageable drug release kinetics and synergistic therapeutic actions. Nonetheless, to date, the rational design and high-throughput development of nanomaterial-based DDSs for specific purposes is far from routine practice and is still in its infancy, mainly because of the limits on scientists' capabilities to effectively acquire, analyze, manage, and comprehend complex and ever-growing sets of experimental data, which is vital for developing DDSs with a set of desired functionalities. At the same time, this task is feasible for data-driven approaches, high-throughput experimentation techniques, process automatization, artificial intelligence (AI) technology, and machine learning (ML) approaches, which together are referred to as The Fourth Paradigm of scientific research. Therefore, an integration of these approaches with nanomedicine and nanotechnology can potentially accelerate the rational design and high-throughput development of highly efficient nanoformulated drugs and smart materials with pre-defined functionalities. In this Review, we survey the important results and milestones achieved to date in the application of data science, high-throughput, and automatization approaches, combined with AI and ML, to design and optimize DDSs and related nanomaterials. The mission of this manuscript is not only to reflect the state of the art in data-driven nanomedicine, but also to show how recent findings in related fields can transform the image of nanomedicine. 
We discuss how all these results can be used to boost nanomedicine translation to the clinic, as well as highlight the future directions for the development, data-driven, high throughput experimentation-, and AI-assisted design, as well as the production of nanoformulated drugs and smart materials with pre-defined properties and behavior. This Review will be of high interest to the chemists involved in materials science, nanotechnology, and DDSs development for biomedical applications, although the general nature of the presented approaches enables knowledge translation to many other fields of science.
Affiliation(s)
- Nikita Serov
- International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, Saint-Petersburg 191002, Russian Federation
| | - Vladimir Vinogradov
- International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, Saint-Petersburg 191002, Russian Federation.
| |
|
21
|
Sikander R, Ghulam A, Ali F. XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set. Sci Rep 2022; 12:5505. [PMID: 35365726 PMCID: PMC8976041 DOI: 10.1038/s41598-022-09484-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 10/17/2021] [Accepted: 03/07/2022] [Indexed: 11/19/2022] Open
Abstract
Accurate identification of drug targets in the human body is of great significance for designing novel drugs. Compared with traditional experimental methods, prediction of drug targets via machine learning algorithms has attracted the attention of many researchers because it is fast and accurate. In this study, we propose a machine learning-based method, namely XGB-DrugPred, for accurate prediction of druggable proteins. Features are extracted from primary protein sequences by group dipeptide composition, reduced amino acid alphabet, and the novel encoder pseudo amino acid composition segmentation. To select the best feature set, eXtreme Gradient Boosting-recursive feature elimination is implemented. The best feature set is provided to eXtreme Gradient Boosting (XGB), Random Forest, and Extremely Randomized Tree classifiers for model training and prediction. The performance of these classifiers is evaluated by tenfold cross-validation. The empirical results show that the XGB-based predictor achieves the best results compared with the other classifiers and existing methods in the literature.
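The tenfold cross-validation mentioned in this abstract partitions the samples into ten folds and tests on each fold exactly once while training on the other nine. The following is a minimal sketch of that partitioning only; the function name is hypothetical, no shuffling or class stratification is applied, and nothing here reproduces the authors' actual pipeline.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. Shuffling and stratification,
    which real experiments usually add, are deliberately left out.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the ten scores averaged into the reported figure.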
Affiliation(s)
- Rahu Sikander
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China.
| | - Ali Ghulam
- Computerization and Network Section, Sindh Agriculture University, Tandojam, Pakistan
| | - Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
| |
|
22
|
Zou H, Zhan C. Using Multi‐Level Correlation Information to Identify Amyloidogenic Peptides. ChemistrySelect 2022. [DOI: 10.1002/slct.202104578] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics Jiangxi Science and Technology Normal University Nanchang 330003 China
| | - Chun Zhan
- School of Communications and Electronics Jiangxi Science and Technology Normal University Nanchang 330003 China
| |
|
23
|
Zou H, Yang F, Yin Z. iTTCA-MFF: identifying tumor T cell antigens based on multiple feature fusion. Immunogenetics 2022; 74:447-454. [PMID: 35246701 DOI: 10.1007/s00251-022-01258-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 02/26/2022] [Indexed: 11/05/2022]
Abstract
Cancer is a terrible disease, and recent studies have reported that tumor T cell antigens (TTCAs) may play a promising role in cancer treatment. Since experimental methods are still expensive and time-consuming, it is highly desirable to develop automatic computational methods to identify tumor T cell antigens from the huge number of natural and synthetic peptides. Hence, in this study, a novel computational model called iTTCA-MFF was proposed to identify TTCAs. To describe the sequence effectively, the physicochemical (PC) properties of amino acids and the residue pairwise energy content matrix (RECM) were first employed to encode peptide sequences. Then, two different approaches, covariance and Pearson's correlation coefficient (PCC), were used to collect discriminative information from the PC and RECM matrices. Next, an effective feature selection approach called the least absolute shrinkage and selection operator (LASSO) was adopted to select the optimal features. These selected features were fed into a support vector machine (SVM) for identifying TTCAs. We performed experiments on two different datasets; the experimental results indicated that the proposed method is promising and may play a complementary role to the existing methods for identifying TTCAs. The datasets and codes are available at https://figshare.com/articles/online_resource/iTTCA-MFF/17636120.
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China.
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
| |
|
24
|
Ding Y, Tang J, Guo F, Zou Q. Identification of drug-target interactions via multiple kernel-based triple collaborative matrix factorization. Brief Bioinform 2022; 23:6520305. [PMID: 35134117 DOI: 10.1093/bib/bbab582] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 10/21/2021] [Revised: 12/02/2021] [Accepted: 12/19/2021] [Indexed: 12/15/2022] Open
Abstract
Targeted drugs have been applied to the treatment of cancer on a large scale, and some patients have shown a therapeutic response. Detecting drug-target interactions (DTIs) through biochemical experiments is time-consuming. Machine learning (ML) is now widely applied in large-scale drug screening; however, few methods fuse multiple sources of information. We propose a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. The multiple kernel matrices (containing chemical, biological and clinical information) are integrated via a multi-kernel learning (MKL) algorithm. The original adjacency matrix of DTIs is decomposed into three matrices: the latent feature matrix of the drug space, the latent feature matrix of the target space, and the bi-projection matrix (used to join the two feature spaces). To obtain better prediction performance, the MKL algorithm regulates the weight of each kernel matrix according to the prediction error. The weights of drug side-effects and target sequence are the highest. Compared with other computational methods, our model performs better on four test data sets.
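In symbols, the three-matrix decomposition and the kernel combination described in this abstract could be written as below. All notation (Y, A, B, Θ, the latent dimensions k_d and k_t, and the weights w_i) is chosen here for illustration and is not taken from the paper; the non-negativity and sum-to-one constraints on the weights are a common multiple-kernel-learning convention, assumed rather than confirmed.

```latex
% Y: n x m drug-target adjacency matrix (n drugs, m targets)
% A: latent features of the drug space, B: latent features of the target space
% \Theta: bi-projection matrix joining the two latent spaces
Y \approx A \, \Theta \, B^{\top},
\qquad
A \in \mathbb{R}^{n \times k_d},\quad
\Theta \in \mathbb{R}^{k_d \times k_t},\quad
B \in \mathbb{R}^{m \times k_t}

% Combined kernel on one side (drug or target) from p base kernels
K = \sum_{i=1}^{p} w_i K_i,
\qquad
w_i \ge 0,\quad \sum_{i=1}^{p} w_i = 1
```

Under this reading, a predicted interaction score for drug i and target j is the (i, j) entry of the product on the right-hand side of the first expression.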
Affiliation(s)
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, P.R.China
| | - Jijun Tang
- Department of Computational Science and Engineering, University of South Carolina, Columbia, U.S
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, P.R.China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
| |
|
25
|
Accurate prediction of immunoglobulin proteins using machine learning model. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100885] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022] Open
|
26
|
Yu L, Xue L, Liu F, Li Y, Jing R, Luo J. The applications of deep learning algorithms on in silico druggable proteins identification. J Adv Res 2022; 41:219-231. [PMID: 36328750 PMCID: PMC9637576 DOI: 10.1016/j.jare.2022.01.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/26/2021] [Revised: 12/21/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022] Open
Abstract
We developed the first deep learning-based druggable protein classifier for fast and accurate identification of potential druggable proteins. Experimental results on a standard dataset demonstrate that the prediction performance of deep learning model is comparable to those of existing methods. We visualized the representations of druggable proteins learned by deep learning models, which helps us understand how they work. Our analysis reconfirms that the attention mechanism is especially useful for explaining deep learning models.
Introduction The top priority in drug development is to identify novel and effective drug targets. In vitro assays are frequently used for this purpose; however, traditional experimental approaches are insufficient for large-scale exploration of novel drug targets, as they are expensive, time-consuming and laborious. Therefore, computational methods have emerged in recent decades as an alternative to aid experimental drug discovery studies by developing sophisticated predictive models to estimate unknown drugs/compounds and their targets. The recent success of deep learning (DL) techniques in machine learning and artificial intelligence has further attracted a great deal of attention in the biomedicine field, including computational drug discovery. Objectives This study focuses on the practical applications of deep learning algorithms for predicting druggable proteins and proposes a powerful predictor for fast and accurate identification of potential drug targets. Methods Using a gold-standard dataset, we explored several typical protein features and different deep learning algorithms and evaluated their performance in a comprehensive way. We provide an overview of the entire experimental process, including protein features and descriptors, neural network architectures, libraries and toolkits for deep learning modelling, performance evaluation metrics, model interpretation and visualization. Results Experimental results show that the hybrid model (architecture: CNN-RNN (BiLSTM) + DNN; feature: dictionary encoding + DC_TC_CTD) performed better than the other models on the benchmark dataset. This hybrid model was able to achieve 90.0% accuracy and 0.800 MCC on the test dataset and 84.8% and 0.703 on a nonredundant independent test dataset, which is comparable to those of existing methods. Conclusion We developed the first deep learning-based classifier for fast and accurate identification of potential druggable proteins. 
We hope that this study will be helpful for future researchers who would like to use deep learning techniques to develop relevant predictive models.
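The abstract above reports accuracy together with the Matthews correlation coefficient (MCC). As a reminder of how the two metrics relate to a binary confusion matrix, here is a small sketch using the standard definitions; the function name and the counts in the usage note are made up for illustration and are not the paper's data.

```python
from math import sqrt


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc
```

For a hypothetical balanced test set with 45 true positives, 45 true negatives, 5 false positives and 5 false negatives, the function returns an accuracy of 0.9 and an MCC of 0.8, which shows how the two numbers can sit side by side on a balanced dataset.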
|
27
|
Gong Y, Liao B, Wang P, Zou Q. DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins. Front Pharmacol 2021; 12:771808. [PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/07/2021] [Accepted: 11/15/2021] [Indexed: 01/09/2023] Open
Abstract
Drug targets are biological macromolecules or biomolecular structures that bind specifically to a particular drug to produce a therapeutic effect or regulate physiological functions. Because of the importance of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The first key step in modern drug research and development is to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k = 2), cross-covariance, and grouped amino acid composition. It removes redundant features and analyses key features through MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reached 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM can identify 80.0343% of the potentially druggable proteins, which indicates the significance of these features for research.
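Of the three descriptors named in this abstract, grouped amino acid composition is the simplest to illustrate: residues are binned into a few physicochemical groups and the fraction of the sequence falling in each group becomes a feature. The sketch below assumes a five-group scheme commonly used in sequence-feature toolkits; the group membership, names, and function are assumptions made here and may differ from what the paper used.

```python
# Assumed five-group scheme (illustrative; not confirmed by the paper).
RESIDUE_GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def grouped_aac(sequence):
    """Return the fraction of residues in each group, keyed by group name."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(seq.count(residue) for residue in members) / len(seq)
        for name, members in RESIDUE_GROUPS.items()
    }
```

For the toy peptide `"GAKDE"` this gives 0.4 aliphatic, 0.2 positive and 0.4 negative, with the remaining groups at 0.0, so every sequence maps to a five-element vector.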
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Peng Wang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
28
|
iDHS-DT: Identifying DNase I hypersensitive sites by integrating DNA dinucleotide and trinucleotide information. Biophys Chem 2021; 281:106717. [PMID: 34798459 DOI: 10.1016/j.bpc.2021.106717] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Revised: 11/10/2021] [Accepted: 11/10/2021] [Indexed: 01/02/2023]
Abstract
DNase I hypersensitive sites (DHSs) are important for identifying the location of gene regulatory elements, such as promoters, enhancers, and silencers. Thus, it is crucial to discriminate DHSs from non-DHSs. Although some traditional methods, such as Southern blots and the DNase-seq technique, can identify DHSs, these approaches are time-consuming, laborious, and expensive. To address these issues, researchers have turned their attention to computational approaches. In this study, we developed a novel predictor called iDHS-DT to identify DHSs. In this predictor, the DNA sequences were first represented by physicochemical (PC) properties of DNA dinucleotides and trinucleotides. Then, three different descriptors, including auto-covariance, cross-covariance, and discrete wavelet transform, were used to collect related features from the PC matrix. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was employed to remove irrelevant and redundant features. Finally, the selected features were fed into a support vector machine (SVM) to distinguish DHSs from non-DHSs. The proposed method achieved 97.64% and 98.22% classification accuracy on datasets S1 and S2, respectively. Compared with the existing predictors, our proposed model shows a significant improvement in classification performance. Experimental results demonstrated that the proposed method is powerful in identifying DHSs.
|
29
|
A band selection approach based on wavelet support vector machine ensemble model and membrane whale optimization algorithm for hyperspectral image. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02270-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
30
Zou H. Identifying blood-brain barrier peptides by using amino acids physicochemical properties and features fusion method. Pept Sci (Hoboken) 2021. [DOI: 10.1002/pep2.24247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
31
Zou H, Yin Z. m7G-DPP: Identifying N7-methylguanosine sites based on dinucleotide physicochemical properties of RNA. Biophys Chem 2021; 279:106697. [PMID: 34628276 DOI: 10.1016/j.bpc.2021.106697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/18/2021] [Revised: 10/01/2021] [Accepted: 10/02/2021] [Indexed: 11/17/2022]
Abstract
N7-methylguanosine (m7G) modification is one of the most common post-transcriptional RNA modifications and plays a vital role in the regulation of gene expression. Dysfunction of m7G may result in developmental defects and the appearance of some serious diseases. Thus, fast and accurate identification of m7G sites is an urgent task. Because experimental approaches are costly and time-consuming, researchers have focused their attention on computational models. Hence, in the current study, we proposed a novel predictor called m7G-DPP to identify m7G sites. In the predictor, the RNA sequences were first encoded by the physicochemical (PC) properties of dinucleotides. Then, a sliding window approach was adopted to divide the PC matrix into multiple matrices, and Pearson's correlation coefficient (PCC), dynamic time warping (DTW), and distance correlation (DC) were employed to extract classification features at each window. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was applied to select discriminative features. Finally, the selected features were fed into a support vector machine to identify m7G sites. Experimental results showed that the proposed method is effective and may play a complementary role in current m7G site prediction studies. The MATLAB code and dataset can be obtained from https://figshare.com/articles/online_resource/m7G-DPP/15000348.
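The sliding-window feature extraction described in this abstract can be sketched in Python as follows. The sketch covers only the Pearson correlation step, and it assumes that the features are correlations between pairs of property rows inside each window; that pairing scheme, the window width and step, and all function names are illustrative assumptions, not details confirmed by the abstract.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat an undefined correlation as 0
    return cov / sqrt(vx * vy)


def window_pcc_features(matrix, width, step):
    """Slide a window over the columns of a property matrix.

    `matrix` holds one row per physicochemical property and one column
    per sequence position. For every window, the correlation of each
    pair of property rows is appended to the feature vector.
    """
    n_cols = len(matrix[0])
    feats = []
    for start in range(0, n_cols - width + 1, step):
        window = [row[start:start + width] for row in matrix]
        for r1, r2 in combinations(window, 2):
            feats.append(pearson(r1, r2))
    return feats


# Toy 3-property x 6-position matrix (made-up numbers).
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
]
features = window_pcc_features(toy, width=4, step=2)
print(len(features))
```

With 3 properties and 2 windows the sketch yields 6 features (3 row pairs per window); a feature selector such as LASSO would then operate on vectors of this kind.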
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China.
- Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China
32
Identification of drug-target interactions via multi-view graph regularized link propagation model. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.100] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 02/06/2023]
33
de Oliveira LN, do Nascimento EO, Caldas LVE. A new natural detector for irradiations with blue LED light source in photodynamic therapy measurements via UV-Vis spectroscopy. Photochem Photobiol Sci 2021; 20:1381-1395. [PMID: 34591269 DOI: 10.1007/s43630-021-00088-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2021] [Accepted: 08/03/2021] [Indexed: 11/28/2022]
Abstract
Photodynamic therapy has recently been studied, bringing innovations that reduce the patient's exposure time to light. This work aimed to investigate the feasibility of using Coutarea hexandra (Jacq.) K. Schum (CHS) as a detector in photodynamic therapy measurements. For this, an irradiator containing a blue LED bulb lamp was utilized. The CHS samples were irradiated with ten doses from 0.60 up to 6.0 kJ/cm², and six concentrations were prepared (1, 2, 3, 4, 5, and 6 mg/ml) for the CHS detector samples. After irradiation, the detector samples were evaluated using UV-Vis spectrophotometry. The results showed the behavior of the CHS detector with dose and concentration; its sensitivity and linearity were also evaluated by both the Wavelength Method (WM) and the Kernel Principal Component Regression (KPCR) statistical method. The values obtained indicate that this method can be applied to the CHS sample detector. In conclusion, CHS is a promising detector in the field of photodynamic therapy.
Affiliation(s)
- Lucas N de Oliveira
- Instituto Federal de Educação, Ciência e Tecnologia de Goiás-IFG, Rua 75, 46, Campus Goiânia, Goiânia, GO, 74055-110, Brazil; Instituto de Pesquisas Energéticas e Nucleares, Comissão Nacional de Energia Nuclear-IPEN/CNEN, Av. Prof. Lineu Prestes, 2242, São Paulo, SP, 05508-000, Brazil.
- Eriberto O do Nascimento
- Instituto Federal de Educação, Ciência e Tecnologia de Goiás-IFG, Rua 75, 46, Campus Goiânia, Goiânia, GO, 74055-110, Brazil
- Linda V E Caldas
- Instituto de Pesquisas Energéticas e Nucleares, Comissão Nacional de Energia Nuclear-IPEN/CNEN, Av. Prof. Lineu Prestes, 2242, São Paulo, SP, 05508-000, Brazil
34
Identifying Dipeptidyl Peptidase-IV Inhibitory Peptides Based on Correlation Information of Physicochemical Properties. Int J Pept Res Ther 2021. [DOI: 10.1007/s10989-021-10280-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
35
Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021; 137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacterium Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting normal cells. However, due to the rapid growth of peptide samples, predicting TB accurately has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve expected results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physicochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods were then combined to generate a heterogeneous vector. Finally, utilizing individual and heterogeneous vectors, five classification models of distinct nature were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. Using training and independent datasets, the proposed ensemble model achieved accuracies of 94.47% and 92.68%, respectively. The proposed "iAtbP-Hyb-EnC" model outperformed existing predictors, reporting ~10% higher training accuracy. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Ashfaq Ahmad
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Ateeq Ur Rehman
- Department of Information Technology, The University of Haripur, KP, Pakistan.
- Salman Khan
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
- Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
36
Garnica O, Gómez D, Ramos V, Hidalgo JI, Ruiz-Giardín JM. Diagnosing hospital bacteraemia in the framework of predictive, preventive and personalised medicine using electronic health records and machine learning classifiers. EPMA J 2021; 12:365-381. [PMID: 34484472 PMCID: PMC8405861 DOI: 10.1007/s13167-021-00252-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/11/2021] [Accepted: 07/30/2021] [Indexed: 12/12/2022]
Abstract
Background: Bacteraemia prediction is relevant because sepsis is one of the most important causes of morbidity and mortality. Bacteraemia prognosis primarily depends on a rapid diagnosis. Bacteraemia prediction would shorten the diagnosis by up to 6 days and, in conjunction with individual patient variables, should be considered when deciding on the early administration of personalised antibiotic treatment and medical services, the choice of specific diagnostic techniques and the determination of additional treatments, such as surgery, that would prevent subsequent complications. Machine learning techniques could help physicians make these informed decisions by predicting bacteraemia using the data already available in electronic hospital records.
Objective: This study presents the application of machine learning techniques to these records to predict the blood culture's outcome, which would reduce the lag in starting a personalised antibiotic treatment and the medical costs associated with erroneous treatments due to conservative assumptions about blood culture outcomes.
Methods: Six supervised classifiers were created using three machine learning techniques, Support Vector Machine, Random Forest and K-Nearest Neighbours, on the electronic health records of hospital patients. The best approach to handle missing data was chosen and, for each machine learning technique, two classification models were created: the first uses the features known at the time of blood extraction, whereas the second uses four extra features revealed during the blood culture.
Results: The six classifiers were trained and tested using a dataset of 4357 patients with 117 features per patient. In the best case, the models reach a state-of-the-art accuracy of 85.9%, a sensitivity of 87.4% and an AUC of 0.93.
Conclusions: Our results provide cutting-edge metrics of interest in predictive medical models, with values that exceed the medical practice threshold and previous results in the literature using classical modelling techniques in specific types of bacteraemia. Additionally, the consistency of the results is reasserted because the three classifiers' importance rankings show similar features that coincide with those that physicians use in their manual heuristics. Therefore, the efficacy of these machine learning techniques confirms their viability to assist in the aims of predictive and personalised medicine once the disease presents bacteraemia-compatible symptoms and to assist in improving the healthcare economy.
Affiliation(s)
- Oscar Garnica
- Departamento de Arquitectura de Computadores, Universidad Complutense de Madrid, Madrid, Spain
- Diego Gómez
- Universidad Complutense de Madrid, Madrid, Spain
- Víctor Ramos
- Universidad Complutense de Madrid, Madrid, Spain
- J. Ignacio Hidalgo
- Departamento de Arquitectura de Computadores, Universidad Complutense de Madrid, Madrid, Spain
- José M. Ruiz-Giardín
- Departamento de Medicina Interna, Hospital Universitario de Fuenlabrada, Madrid, Spain
37
Liu Y, Jin S, Song L, Han Y, Yu B. Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier. J Mol Graph Model 2021; 107:107962. [PMID: 34198216 DOI: 10.1016/j.jmgm.2021.107962] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/04/2020] [Revised: 05/03/2021] [Accepted: 06/02/2021] [Indexed: 01/29/2023]
Abstract
Ubiquitination is a common and reversible post-translational protein modification that regulates apoptosis and plays an important role in protein degradation and cell diseases. However, experimental identification of protein ubiquitination sites is usually time-consuming and labor-intensive, so it is necessary to establish effective predictors. In this study, we propose a ubiquitination site prediction method based on multi-view features, namely UbiSite-XGBoost. Firstly, we use seven single-view feature encoding methods to convert protein sequence fragments into numerical information. Secondly, the least absolute shrinkage and selection operator (LASSO) is applied to remove redundant information and obtain the optimal feature subsets. Finally, these features are input into the eXtreme gradient boosting (XGBoost) classifier to predict ubiquitination sites. Five-fold cross-validation shows that the AUC values on the Set1-Set6 datasets are 0.8258, 0.7592, 0.7853, 0.8345, 0.8979 and 0.8901, respectively. The synthetic minority oversampling technique (SMOTE) is employed on the unbalanced Set4-Set6 datasets, and the AUC values are 0.9777, 0.9782 and 0.9860, respectively. In addition, we have constructed three independent test datasets, for which the AUC values are 0.8007, 0.6897 and 0.7280, respectively. The results show that the proposed method UbiSite-XGBoost is superior to other ubiquitination prediction methods and provides new guidance for the identification of ubiquitination sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSite-XGBoost/.
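The idea behind the SMOTE step mentioned in this abstract can be sketched in a few lines of Python: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified, standard-library-only illustration; the function name `smote`, its parameters, and the toy data are assumptions, and it is not the implementation used by the authors or by any particular library.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points from a list of minority samples.

    Requires at least two minority samples. Each new point is an
    interpolation between a randomly chosen sample and one of its
    `k` nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=4)
print(len(synthetic))
```

Because the new points are interpolations of existing ones, oversampling is normally applied to training folds only; applying it before splitting the data can inflate cross-validation scores.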
Affiliation(s)
- Yushuang Liu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Shuping Jin
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Lili Song
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Yu Han
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
38
Xia T, Zhuo P, Xiao L, Du S, Wang D, Xi L. Multi-stage fault diagnosis framework for rolling bearing based on OHF Elman AdaBoost-Bagging algorithm. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/27/2022]
39
Zhang S, Zhu F, Yu Q, Zhu X. Identifying DNA-binding proteins based on multi-features and LASSO feature selection. Biopolymers 2021; 112:e23419. [PMID: 33476047 DOI: 10.1002/bip.23419] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/08/2021] [Revised: 01/08/2021] [Accepted: 01/08/2021] [Indexed: 01/22/2023]
Abstract
DNA-binding proteins perform an indispensable function in the maintenance and processing of genetic information and are inefficiently identified by traditional experimental methods due to their huge quantities. By contrast, machine learning methods, as an emerging technique, demonstrate satisfactory speed and accuracy when used to study these molecules. This work focuses on extracting four different features from primary and secondary sequence features: Reduced sequence and index-vectors (RS), Pseudo-amino acid components (PseAACS), Position-specific scoring matrix-Auto Cross Covariance Transform (PSSM-ACCT), and Position-specific scoring matrix-Discrete Wavelet Transform (PSSM-DWT). Using the LASSO dimension reduction method, we experiment on combinations of feature submodels to obtain the optimized number of top-ranked features. These features are then used to train the Ensemble subspace discriminant, Ensemble bagged tree and KNN classifiers to predict DNA-binding proteins. Three datasets, PDB594, PDB1075, and PDB186, are adopted to evaluate the performance of the proposed approach. The PDB1075 and PDB594 datasets are used for five-fold cross-validation, and PDB186 is used for the independent test. In five-fold cross-validation, PDB1075 and PDB594 reach accuracies of 86.98% and 88.9%, respectively, with the Ensemble subspace discriminant. The accuracy of the independent test by multi-classifier voting is 83.33%, which suggests that the methodology proposed in this work is capable of predicting DNA-binding proteins effectively.
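The multi-classifier voting mentioned at the end of this abstract can be illustrated with a minimal Python sketch of hard (majority) voting over the label predictions of several classifiers. Whether the authors used hard or weighted voting is not stated in the abstract, so the scheme below, the function name `majority_vote`, and the toy predictions are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label list.

    `predictions` holds one list of labels per classifier, all of the
    same length. With an odd number of binary classifiers no ties occur.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Made-up outputs of three classifiers on three test proteins
# (1 = DNA-binding, 0 = non-binding).
subspace_discriminant = [1, 0, 1]
bagged_tree = [1, 1, 0]
knn = [0, 0, 1]
final = majority_vote([subspace_discriminant, bagged_tree, knn])
print(final)
```

Each test sample receives the label that at least two of the three classifiers agree on.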
Affiliation(s)
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, China
- Fu Zhu
- School of Mathematics and Statistics, Xidian University, Xi'an, China
- Qianhao Yu
- School of Artificial Intelligence, Xidian University, Xi'an, China
- Xiaoyue Zhu
- School of Electronic Engineering, Xidian University, Xi'an, China
40
Identification of Drug–Target Interactions via Dual Laplacian Regularized Least Squares with Multiple Kernel Fusion. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106254] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Indexed: 12/29/2022]
41
Chen C, Zhang Q, Yu B, Yu Z, Lawrence PJ, Ma Q, Zhang Y. Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput Biol Med 2020; 123:103899. [DOI: 10.1016/j.compbiomed.2020.103899] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 01/23/2020] [Revised: 06/28/2020] [Accepted: 06/28/2020] [Indexed: 10/23/2022]
42
Liu B, He H, Luo H, Zhang T, Jiang J. Artificial intelligence and big data facilitated targeted drug discovery. Stroke Vasc Neurol 2019; 4:206-213. [PMID: 32030204 PMCID: PMC6979871 DOI: 10.1136/svn-2019-000290] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/16/2019] [Accepted: 10/28/2019] [Indexed: 12/20/2022] Open
Abstract
The different kinds of biological databases publicly available nowadays provide a goldmine of multidisciplinary big data. The Cancer Genome Atlas is a cancer database including detailed information of many patients with cancer. DrugBank is a database including detailed information of approved, investigational and withdrawn drugs, as well as other nutraceutical and metabolite structures. PubChem is a chemical compound database including all commercially available compounds as well as other synthesisable compounds. Protein Data Bank is a crystal structure database including X-ray, cryo-EM and nuclear magnetic resonance protein three-dimensional structures as well as their ligands. On the other hand, artificial intelligence (AI) is playing an important role in the drug discovery process. The integration of such big data and AI is making a great difference in the discovery of novel targeted drugs. In this review, we focus on the currently available advanced methods for the discovery of highly effective lead compounds with great absorption, distribution, metabolism, excretion and toxicity properties.
Affiliation(s)
- Benquan Liu
- Jiangsu Key Lab of Drug Screening, China Pharmaceutical University, Nanjing, China
- Huiqin He
- Jiangsu Key Lab of Drug Screening, China Pharmaceutical University, Nanjing, China
- Hongyi Luo
- Jiangsu Key Lab of Drug Screening, China Pharmaceutical University, Nanjing, China
- Tingting Zhang
- Jiangsu Key Lab of Drug Screening, China Pharmaceutical University, Nanjing, China
- Jingwei Jiang
- Jiangsu Key Lab of Drug Screening, China Pharmaceutical University, Nanjing, China