1
Ebrahimie E, Zamansani F, Alanazi IO, Sabi EM, Khazandi M, Ebrahimi F, Mohammadi-Dehcheshmeh M, Ebrahimi M. Advances in understanding the specificity function of transporters by machine learning. Comput Biol Med 2021; 138:104893. [PMID: 34598069 DOI: 10.1016/j.compbiomed.2021.104893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/16/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/25/2022]
Abstract
Understanding the underlying molecular mechanism of transporter activity is one of the major discussions in structural biology. A transporter can exclusively transport one ion (specific transporter) or multiple ions (general transporter). This study compared categorical and numerical features of general and specific calcium transporters using machine learning and attribute weighting models. To this end, 444 protein features, such as the frequency of dipeptides, organism, and subcellular location, were extracted for general (n = 103) and specific calcium transporters (n = 238). Aliphatic index, subcellular location, organism, Ile-Leu frequency, glycine frequency, hydrophobic frequency, and specific dipeptides such as Ile-Leu, Phe-Val, and Tyr-Gln were the key features in differentiating general from specific calcium transporters. Calcium transporters in the cell outer membranes were specific, while the inner ones were general; additionally, when the hydrophobic frequency or aliphatic index increases, the calcium transporter acts as a general transporter. Random Forest with accuracy criterion showed the highest accuracy (88.88% ± 5.75%) and a high AUC (0.964 ± 0.020), based on 5-fold cross-validation. Decision Tree with accuracy criterion was able to predict the specificity of a calcium transporter irrespective of the organism and subcellular location. This study demonstrates the precise classification of transporter function based on sequence-derived physicochemical features.
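Two of the sequence-derived features named in this abstract, dipeptide frequency and aliphatic index, can be computed directly from an amino acid sequence. The following is a minimal sketch, not the authors' pipeline: the function names and the toy sequence are hypothetical, and the aliphatic index uses the standard Ikai (1980) coefficients (2.9 for Val, 3.9 for Ile and Leu), which is assumed to be the definition used in the study.

```python
from collections import Counter


def dipeptide_frequencies(seq):
    """Relative frequency of each overlapping dipeptide in a protein sequence."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    if total == 0:
        return {}
    return {pair: n / total for pair, n in Counter(pairs).items()}


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue (assumed Ikai definition)."""
    if not seq:
        return 0.0
    counts = Counter(seq)

    def mole_pct(residue):
        return 100.0 * counts[residue] / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


# Hypothetical toy sequence, for illustration only.
toy = "MILAVILG"
freqs = dipeptide_frequencies(toy)
ai = aliphatic_index(toy)
```

Here `freqs["IL"]` is the Ile-Leu dipeptide frequency for the toy sequence; in a real feature table each protein would contribute one such value per dipeptide.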
Affiliation(s)
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia.
- Fatemeh Zamansani
  - Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran.
- Ibrahim O Alanazi
  - National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, 6086, Saudi Arabia.
- Essa M Sabi
  - Department of Pathology, Clinical Biochemistry Unit, College of Medicine, King Saud University, Riyadh, 11461, Saudi Arabia.
- Manouchehr Khazandi
  - UniSA Clinical and Health Sciences, The University of South Australia, Adelaide, 5000, Australia.
- Faezeh Ebrahimi
  - Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran.
- Mansour Ebrahimi
  - School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia; Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
2
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND: From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular issue. METHODS: We systematically review the recent work on identifying drug targets from the view of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. RESULTS: Machine learning algorithms make up the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION: The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zhao
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ningyi Zhang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
3
Clustering of fungal hexosaminidase enzymes based on free alignment method using MLP neural network. Neural Comput Appl 2018. [DOI: 10.1007/s00521-017-2876-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
4
Kargarfard F, Sami A, Mohammadi-Dehcheshmeh M, Ebrahimie E. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 2016; 17:925. [PMID: 27852224 PMCID: PMC5112743 DOI: 10.1186/s12864-016-3250-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/21/2016] [Accepted: 11/02/2016] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. METHODS: To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts, so the dataset is multi-labeled, and we utilized a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. RESULTS: We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. CONCLUSION: Host range identification is facilitated by the high-support combined rules found in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
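The association rules described in this abstract pair sequence positions with a host label and are ranked by support. The sketch below shows how support and confidence could be computed for one such rule on a multi-labeled dataset. It is an illustration under stated assumptions, not the CBA or Ripper procedure used in the paper: the function name, the record layout, and the toy sequences are hypothetical, and a rule such as HA14V is read here as "residue V at position 14 of segment HA", which is an interpretation of the notation.

```python
def rule_support_confidence(records, conditions, host):
    """Support and confidence of the rule: all position/residue conditions -> host.

    records    : list of (sequence, set_of_hosts) pairs (multi-label)
    conditions : dict mapping 1-based position -> required residue
    host       : host label on the right-hand side of the rule
    """
    matched_hosts = [
        hosts
        for seq, hosts in records
        if all(len(seq) >= pos and seq[pos - 1] == res
               for pos, res in conditions.items())
    ]
    covered = len(matched_hosts)                       # records matching the conditions
    hits = sum(1 for hosts in matched_hosts if host in hosts)
    support = hits / len(records) if records else 0.0  # rule holds, over all records
    confidence = hits / covered if covered else 0.0    # rule holds, over matching records
    return support, confidence


# Hypothetical toy records: (segment sequence, hosts the strain was assigned to).
toy_records = [
    ("MKV", {"avian"}),
    ("MKV", {"avian", "human"}),
    ("MKA", {"human"}),
    ("MRV", {"swine"}),
]
support, confidence = rule_support_confidence(toy_records, {3: "V"}, "avian")
```

A combined rule is expressed by passing several positions in `conditions`; a strain labeled with more than one host counts toward each of its hosts.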
Affiliation(s)
- Fatemeh Kargarfard
  - Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Ashkan Sami
  - Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Manijeh Mohammadi-Dehcheshmeh
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Esmaeil Ebrahimie
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
  - School of Medicine, Faculty of Health Sciences, The University of Adelaide, Adelaide, Australia
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia
  - School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia
5
Pashaiasl M, Khodadadi K, Kayvanjoo AH, Pashaei-asl R, Ebrahimie E, Ebrahimi M. Unravelling evolution of Nanog, the key transcription factor involved in self-renewal of undifferentiated embryonic stem cells, by pattern recognition in nucleotide and tandem repeats characteristics. Gene 2016; 578:194-204. [DOI: 10.1016/j.gene.2015.12.023] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/26/2015] [Revised: 12/10/2015] [Accepted: 12/10/2015] [Indexed: 12/27/2022]
6
Jamali AA, Ferdousi R, Razzaghi S, Li J, Safdari R, Ebrahimie E. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov Today 2016; 21:718-24. [PMID: 26821132 DOI: 10.1016/j.drudis.2016.01.007] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 11/15/2015] [Revised: 12/05/2015] [Accepted: 01/19/2016] [Indexed: 12/14/2022]
Abstract
Application of computational methods in drug discovery has received increased attention in recent years as a way to accelerate drug target prediction. Based on 443 sequence-derived protein features, we applied the most commonly used machine learning methods to predict whether a protein is druggable and to identify the best-performing algorithm for this task. In addition, feature selection procedures were used to obtain the best performance of each classifier at its optimum number of features. When run on all features, Neural Network was the best classifier, with 89.98% accuracy, based on a k-fold cross-validation test. Among all the algorithms applied, the optimum number of most-relevant features was 130, according to the Support Vector Machine-Feature Selection (SVM-FS) algorithm. This study resulted in the discovery of new drug targets that potentially can be employed in cell signaling pathways, gene expression, and signal transduction. The DrugMiner web tool was developed based on the findings of this study to provide researchers with the ability to predict druggable proteins. DrugMiner is freely available at www.DrugMiner.org.
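The accuracy quoted in this abstract comes from a k-fold cross-validation test. As a rough sketch of that evaluation scheme, the code below splits sample indices into k folds and averages per-fold accuracy for any classifier supplied as `fit` and `predict` callables. Everything here is an assumption for illustration: the function names, the shuffling seed, the toy data, and the majority-class baseline are hypothetical and are not the classifiers or data used in the study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds (requires n_samples >= k)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(X, y, k, fit, predict):
    """Mean accuracy over k folds; fit(X, y) -> model, predict(model, x) -> label."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical baseline that always predicts the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Toy data: 10 samples with one dummy feature, 7 labeled 1 and 3 labeled 0.
X_toy = [[float(i)] for i in range(10)]
y_toy = [1] * 7 + [0] * 3
baseline_accuracy = cross_val_accuracy(X_toy, y_toy, 5, fit_majority, predict_majority)
```

Comparing a real classifier against such a baseline on the same folds is one way to judge whether a reported accuracy reflects learned structure rather than class imbalance.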
Affiliation(s)
- Ali Akbar Jamali
  - Research Center for Pharmaceutical Nanotechnology (RCPN), Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Saeed Razzaghi
  - Information Technology Center, The University of Zanjan, Zanjan, Iran
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia
- Reza Safdari
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
- Esmaeil Ebrahimie
  - School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia; Department of Genetics & Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, SA, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, SA, Australia.
7
New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput Biol Med 2014; 54:14-23. [DOI: 10.1016/j.compbiomed.2014.08.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/17/2014] [Revised: 08/16/2014] [Accepted: 08/17/2014] [Indexed: 12/11/2022]
8
Bakhtiarizadeh MR, Moradi-Shahrbabak M, Ebrahimi M, Ebrahimie E. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology. J Theor Biol 2014; 356:213-22. [PMID: 24819464 DOI: 10.1016/j.jtbi.2014.04.040] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 01/15/2014] [Revised: 04/03/2014] [Accepted: 04/29/2014] [Indexed: 01/05/2023]
Abstract
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology-based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs, and 92.06% (CV) and 92.90% (IE) on average for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBP classes. The proposed approach has the potential to integrate with and improve the common sequence alignment-based methods.
Affiliation(s)
- Mohammad Moradi-Shahrbabak
  - Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Mansour Ebrahimi
  - Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Esmaeil Ebrahimie
  - Department of Crop Production & Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran; School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia.
9
Ebrahimi M, Aghagolzadeh P, Shamabadi N, Tahmasebi A, Alsharifi M, Adelson DL, Hemmatzadeh F, Ebrahimie E. Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS One 2014; 9:e96984. [PMID: 24809455 PMCID: PMC4014573 DOI: 10.1371/journal.pone.0096984] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2013] [Accepted: 04/07/2014] [Indexed: 01/05/2023] Open
Abstract
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physicochemical attributes which govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physicochemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross-validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting the possible future structure of influenza pandemics.
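The abstract reports features by the share of attribute weighting algorithms that selected them (for example, 80% for Gln frequency). One plausible way to compute such a share is to count, for each feature, how many algorithms assign it a normalized weight at or above a cutoff. The sketch below assumes that scheme; the 0.5 cutoff, the function name, the algorithm names, and all weights are hypothetical and are not taken from the paper.

```python
def selection_fraction(weights_by_algorithm, threshold=0.5):
    """Fraction of weighting algorithms that select each feature.

    weights_by_algorithm : {algorithm_name: {feature_name: weight in [0, 1]}}
    threshold            : assumed cutoff; a feature counts as selected by an
                           algorithm when its weight is >= threshold
    """
    n_algorithms = len(weights_by_algorithm)
    votes = {}
    for weights in weights_by_algorithm.values():
        for feature, weight in weights.items():
            votes.setdefault(feature, 0)
            if weight >= threshold:
                votes[feature] += 1
    return {feature: count / n_algorithms for feature, count in votes.items()}


# Hypothetical normalized weights from four made-up weighting algorithms.
toy_weights = {
    "algo_a": {"freq_Gln": 0.9, "pct_Cys": 0.6, "freq_Ala": 0.1},
    "algo_b": {"freq_Gln": 0.8, "pct_Cys": 0.4, "freq_Ala": 0.2},
    "algo_c": {"freq_Gln": 0.7, "pct_Cys": 0.9, "freq_Ala": 0.0},
    "algo_d": {"freq_Gln": 0.3, "pct_Cys": 0.2, "freq_Ala": 0.1},
}
fractions = selection_fraction(toy_weights)

# Features selected by at least half of the algorithms form a trimmed feature set.
trimmed = sorted(f for f, share in fractions.items() if share >= 0.5)
```

A trimmed dataset in the sense of the abstract would then keep only the columns chosen by a given weighting algorithm (or, as in `trimmed`, by a consensus of them) before training a classifier.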
Affiliation(s)
- Mansour Ebrahimi
  - Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Parisa Aghagolzadeh
  - Department of Nephrology, Hypertension, and Clinical Pharmacology, University of Bern, Bern, Switzerland
- Narges Shamabadi
  - Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Mohammed Alsharifi
  - School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- David L. Adelson
  - School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- Farhid Hemmatzadeh
  - School of Animal and Veterinary Science, The University of Adelaide, Adelaide, Australia
  - * E-mail: (FH); (EE)
- Esmaeil Ebrahimie
  - School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
  - * E-mail: (FH); (EE)