1
|
Monroe LK, Truong DP, Miner JC, Adikari SH, Sasiene ZJ, Fenimore PW, Alexandrov B, Williams RF, Nguyen HB. Conotoxin Prediction: New Features to Increase Prediction Accuracy. Toxins (Basel) 2023; 15:641. [PMID: 37999504 PMCID: PMC10675404 DOI: 10.3390/toxins15110641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 10/27/2023] [Accepted: 10/30/2023] [Indexed: 11/25/2023] Open
Abstract
Conotoxins are toxic, disulfide-bond-rich peptides from cone snail venom that target a wide range of receptors and ion channels with multiple pathophysiological effects. Conotoxins have extraordinary potential for medical therapeutics that include cancer, microbial infections, epilepsy, autoimmune diseases, neurological conditions, and cardiovascular disorders. Despite the potential for these compounds in novel therapeutic treatment development, the process of identifying and characterizing the toxicities of conotoxins is difficult, costly, and time-consuming. This challenge requires a series of diverse, complex, and labor-intensive biological, toxicological, and analytical techniques for effective characterization. While recent attempts, using machine learning based solely on primary amino acid sequences to predict biological toxins (e.g., conotoxins and animal venoms), have improved toxin identification, these methods are limited due to peptide conformational flexibility and the high frequency of cysteines present in toxin sequences. This results in an enumerable set of disulfide-bridged foldamers with different conformations of the same primary amino acid sequence that affect function and toxicity levels. Consequently, a given peptide may be toxic when its cysteine residues form a particular disulfide-bond pattern, while alternative bonding patterns (isoforms) or its reduced form (free cysteines with no disulfide bridges) may have little or no toxicological effects. Similarly, the same disulfide-bond pattern may be possible for other peptide sequences and result in different conformations that all exhibit varying toxicities to the same receptor or to different receptors. We present here new features, when combined with primary sequence features to train machine learning algorithms to predict conotoxins, that significantly increase prediction accuracy.
Collapse
Affiliation(s)
- Lyman K. Monroe
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Duc P. Truong
- Theoretical Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Jacob C. Miner
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Samantha H. Adikari
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Zachary J. Sasiene
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Paul W. Fenimore
- Theoretical Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Boian Alexandrov
- Theoretical Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Robert F. Williams
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Hau B. Nguyen
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| |
Collapse
|
2
|
Wang X, Liu Y, Li J, Wang G. StackCirRNAPred: computational classification of long circRNA from other lncRNA based on stacking strategy. BMC Bioinformatics 2022; 23:563. [PMID: 36575368 PMCID: PMC9793644 DOI: 10.1186/s12859-022-05118-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND CircRNAs are essential for the regulation of post-transcriptional gene expression, including as miRNA sponges, and play an important role in disease development. Some computational tools have been proposed recently to predict circRNA, since only one classifier is used, there is still much that can be done to improve the performance. RESULTS StackCirRNAPred was proposed, the computational classification of long circRNA from other lncRNA based on stacking strategy. In order to cope with the potential problem that a single feature might not be able to distinguish circRNA well from other lncRNA, we first extracted features from different sources, including nucleic acid composition, sequence spatial features and physicochemical properties, Alu and tandem repeats. We innovatively apply the stacking strategy to integrate the more advantageous classifiers of RF, LightGBM, XGBoost. This allows the model to incorporate these features more flexibly. StackCirRNAPred was found to be significantly better than other tools, with precision, accuracy, F1, recall and MCC of 0.843, 0.833, 0.831, 0.819 and 0.666 respectively. We tested it directly on the mouse dataset. StackCirRNAPred was still significantly better than other methods, with precision, accuracy, F1, recall and MCC of 0.837, 0.839, 0.839, 0.841, 0.677. CONCLUSIONS We proposed StackCirRNAPred based on stacking strategy to distinguish long circRNAs from other lncRNAs. With the test results demonstrating the validity and robustness of StackCirRNAPred, we hope StackCirRNAPred will complement existing circRNA prediction methods and is helpful in down-stream research.
Collapse
Affiliation(s)
- Xin Wang
- grid.19373.3f0000 0001 0193 3564School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yadong Liu
- grid.19373.3f0000 0001 0193 3564School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jie Li
- grid.19373.3f0000 0001 0193 3564School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Guohua Wang
- grid.19373.3f0000 0001 0193 3564School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
Collapse
|
3
|
Ruelas-Callejas A, Aguilar MB, Arteaga-Tlecuitl R, Gomora JC, López-Vera E. The T-1 conotoxin μ-SrVA from the worm hunting marine snail Conus spurius preferentially blocks the human Na V1.5 channel. Peptides 2022; 156:170859. [PMID: 35940316 DOI: 10.1016/j.peptides.2022.170859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/16/2022] [Revised: 08/01/2022] [Accepted: 08/02/2022] [Indexed: 11/20/2022]
Abstract
Conotoxin sr5a had previously been identified in the vermivorous cone snail Conus spurius. This conotoxin is a highly hydrophobic peptide, with the sequence IINWCCLIFYQCC, which has a cysteine pattern "CC-CC" belonging to the T-1 superfamily. It is well known that this superfamily binds to molecular targets such as calcium channels, G protein-coupled receptors (GPCR), and neuronal nicotinic acetylcholine receptors (nAChR) and exerts an effect mainly in the central nervous system. However, its effects on other molecular targets are not yet defined, suggesting the potential of newly relevant molecular interactions. To find and demonstrate a potential molecular target for conotoxin sr5a electrophysiological assays were performed on three subtypes of voltage-activated sodium channels (NaV1.5, NaV1.6, and NaV1.7) expressed in HEK-293 cells with three different concentrations of sr5a(200, 400, and 600 nM). 200 nM sr5a blocked currents mediated by NaV1.5 by 33%, NaV1.6 by 14%, and NaV1.7 by 7%. The current-voltage (I-V) relationships revealed that conotoxin sr5a exhibits a preferential activity on the NaV1.5 subtype; the activation of NaV1.5 conductance was not modified by the blocking effect of sr5a, but sr5a affected the voltage-dependence of inactivation of channels. Since peptide sr5a showed a specific activity for a sodium channel subtype, we can assign a pharmacological family and rename it as conotoxin µ-SrVA.
Collapse
Affiliation(s)
- Angélica Ruelas-Callejas
- Laboratorio de Toxinología Marina, Unidad Académica de Ecología y Biodiversidad Acuática, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
| | - Manuel B Aguilar
- Laboratorio de Neurofarmacología Marina, Departamento de Neurobiología Celular y Molecular, Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro 76230, Mexico
| | - Rogelio Arteaga-Tlecuitl
- Departamento de Neuropatología Molecular, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Ciudad de México 0410, Mexico
| | - Juan Carlos Gomora
- Departamento de Neuropatología Molecular, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Ciudad de México 0410, Mexico
| | - Estuardo López-Vera
- Laboratorio de Toxinología Marina, Unidad Académica de Ecología y Biodiversidad Acuática, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico.
| |
Collapse
|
4
|
Raju B, Narendra G, Verma H, Kumar M, Sapra B, Kaur G, jain SK, Silakari O. Machine Learning Enabled Structure-Based Drug Repurposing Approach to Identify Potential CYP1B1 Inhibitors. ACS OMEGA 2022; 7:31999-32013. [PMID: 36120033 PMCID: PMC9476183 DOI: 10.1021/acsomega.2c02983] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
Drug-metabolizing enzyme (DME)-mediated pharmacokinetic resistance of some clinically approved anticancer agents is one of the main reasons for cancer treatment failure. In particular, some commonly used anticancer medicines, including docetaxel, tamoxifen, imatinib, cisplatin, and paclitaxel, are inactivated by CYP1B1. Currently, no approved drugs are available to treat this CYP1B1-mediated inactivation, making the pharmaceutical industries strive to discover new anticancer agents. Because of the extreme complexity and high risk in drug discovery and development, it is worthwhile to come up with a drug repurposing strategy that may solve the resistance problem of existing chemotherapeutics. Therefore, in the current study, a drug repurposing strategy was implemented to find the possible CYP1B1 inhibitors using machine learning (ML) and structure-based virtual screening (SB-VS) approaches. Initially, three different ML models were developed such as support vector machines (SVMs), random forest (RF), and artificial neural network (ANN); subsequently, the best-selected ML model was employed for virtual screening of the selleckchem database to identify potential CYP1B1 inhibitors. The inhibition potency of the obtained hits was judged by analyzing the crucial active site amino acid interactions against CYP1B1. After a thorough assessment of docking scores, binding affinities, as well as binding modes, four compounds were selected and further subjected to in vitro analysis. From the in vitro analysis, it was observed that chlorprothixene, nadifloxacin, and ticagrelor showed promising inhibitory activity toward CYP1B1 in the IC50 range of 0.07-3.00 μM. These new chemical scaffolds can be explored as adjuvant therapies to address CYP1B1-mediated drug-resistance problems.
Collapse
Affiliation(s)
- Baddipadige Raju
- Molecular
Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug
Research, Punjabi University, Patiala, Punjab 147002, India
| | - Gera Narendra
- Molecular
Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug
Research, Punjabi University, Patiala, Punjab 147002, India
| | - Himanshu Verma
- Molecular
Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug
Research, Punjabi University, Patiala, Punjab 147002, India
| | - Manoj Kumar
- Molecular
Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug
Research, Punjabi University, Patiala, Punjab 147002, India
| | - Bharti Sapra
- Molecular
Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug
Research, Punjabi University, Patiala, Punjab 147002, India
| | - Gurleen Kaur
- Center
for Basic and Translational Research in Health Sciences, Guru Nanak Dev University, Amritsar 143005, India
| | - Subheet Kumar jain
- Center
for Basic and Translational Research in Health Sciences, Guru Nanak Dev University, Amritsar 143005, India
| | - Om Silakari
- Molecular
Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug
Research, Punjabi University, Patiala, Punjab 147002, India
| |
Collapse
|
5
|
Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Malaria is a mosquito-borne disease that results in millions of cases and deaths annually. The development of a fast computational method that identifies secretory proteins of the malaria parasite is important for research on antimalarial drugs and vaccines. Thus, a method was developed to identify the secretory proteins of malaria parasites. In this method, a reduced alphabet was selected to recode the original protein sequence. A feature synthesis method was used to synthesise three different types of feature information. Finally, the random forest method was used as a classifier to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. Experiments using the benchmark dataset demonstrated that the overall accuracy achieved by the proposed method was greater than 97.8 percent using the 10-fold cross-validation method. Furthermore, the reduced schemes and characteristic performance analyses are discussed.
Collapse
|
6
|
Wang F, Chang S, Wei D. Prediction of conotoxin type based on long short-term memory network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:6700-6708. [PMID: 34517552 DOI: 10.3934/mbe.2021332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Aiming at the problems of the wet experiment method in identifying the types of conotoxins, such as the complexity, low efficiency and high cost, this study proposes a method that uses the sequence information of the conotoxin peptides combined with long short term memory networks (LSTM) models to predict the Methods of spirotoxin category. This method only needs to take the conotoxin peptide sequence as input, and adopts the character embedding method in text processing to automatically map the sequence to the feature vector representation, and the model extracts features for training and prediction. Experimental results show that the correct index of this method on the test set reaches 0.80, and the AUC value reaches 0.817. For the same test set, the AUC value of the KNN algorithm is 0.641, and the AUC value of the method proposed in this paper is 0.817, indicating that this method can effectively assist in identifying the type of conotoxin.
Collapse
Affiliation(s)
- Feng Wang
- Changzhou University Huaide College, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Dashun Wei
- Changzhou University Huaide College, China
| |
Collapse
|
7
|
Raju B, Verma H, Narendra G, Sapra B, Silakari O. Multiple machine learning, molecular docking, and ADMET screening approach for identification of selective inhibitors of CYP1B1. J Biomol Struct Dyn 2021; 40:7975-7990. [PMID: 33769194 DOI: 10.1080/07391102.2021.1905552] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
Cytochrome P4501B1 is a ubiquitous family protein that is majorly overexpressed in tumors and is responsible for biotransformation-based inactivation of anti-cancer drugs. This inactivation marks the cause of resistance to chemotherapeutics. In the present study, integrated in-silico approaches were utilized to identify selective CYP1B1 inhibitors. To achieve this objective, we initially developed different machine learning models corresponding to two isoforms of the CYP1 family i.e. CYP1A1 and CYP1B1. Subsequently, small molecule databases including ChemBridge, Maybridge, and natural compound library were screened from the selected models of CYP1B1 and CYP1A1. The obtained CYP1B1 inhibitors were further subjected to molecular docking and ADMET analysis. The selectivity of the obtained hits for CYP1B1 over the other isoforms was also judged with molecular docking analysis. Finally, two hits were found to be the most stable which retained key interactions within the active site of CYP1B1 after the molecular dynamics simulations. Novel compound with CYP-D9 and CYP-14 IDs were found to be the most selective CYP1B1 inhibitors which may address the issue of resistance. Moreover, these compounds can be considered as safe agents for further cell-based and animal model studies.Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Baddipadige Raju
- Molecular Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, India
| | - Himanshu Verma
- Molecular Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, India
| | - Gera Narendra
- Molecular Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, India
| | - Bharti Sapra
- Molecular Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, India
| | - Om Silakari
- Molecular Modeling Lab (MML), Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala, Punjab, India
| |
Collapse
|
8
|
Sun Z, Huang S, Zheng L, Liang P, Yang W, Zuo Y. ICTC-RAAC: An improved web predictor for identifying the types of ion channel-targeted conotoxins by using reduced amino acid cluster descriptors. Comput Biol Chem 2020; 89:107371. [PMID: 32950852 DOI: 10.1016/j.compbiolchem.2020.107371] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Conotoxins are small peptide toxins which are rich in disulfide and have the unique diversity of sequences. It is significant to correctly identify the types of ion channel-targeted conotoxins because that they are considered as the optimal pharmacological candidate medicine in drug design owing to their ability specifically binding to ion channels and interfering with neural transmission. Comparing with other feature extracting methods, the reduced amino acid cluster (RAAC) better resolved in simplifying protein complexity and identifying functional conserved regions. Thus, in our study, 673 RAACs generated from 74 types of reduced amino acid alphabet were comprehensively assessed to establish a state-of-the-art predictor for predicting ion channel-targeted conotoxins. The results showed Type 20, Cluster 9 (T = 20, C = 9) in the tripeptide composition (N = 3) achieved the best accuracy, 89.3%, which was based on the algorithm of amino acids reduction of variance maximization. Further, the ANOVA with incremental feature selection (IFS) was used for feature selection to improve prediction performance. Finally, the cross-validation results showed that the best overall accuracy we calculated was 96.4% and 1.8% higher than the best accuracy of previous studies. Based on the predictor we proposed, a user-friendly webserver was established and can be friendly accessed at http://bioinfor.imu.edu.cn/ictcraac.
Collapse
Affiliation(s)
- Zijie Sun
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; School of Mathematical Sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Wuritu Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
| |
Collapse
|
9
|
Jin AH, Muttenthaler M, Dutertre S, Himaya SWA, Kaas Q, Craik DJ, Lewis RJ, Alewood PF. Conotoxins: Chemistry and Biology. Chem Rev 2019; 119:11510-11549. [PMID: 31633928 DOI: 10.1021/acs.chemrev.9b00207] [Citation(s) in RCA: 167] [Impact Index Per Article: 33.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The venom of the marine predatory cone snails (genus Conus) has evolved for prey capture and defense, providing the basis for survival and rapid diversification of the now estimated 750+ species. A typical Conus venom contains hundreds to thousands of bioactive peptides known as conotoxins. These mostly disulfide-rich and well-structured peptides act on a wide range of targets such as ion channels, G protein-coupled receptors, transporters, and enzymes. Conotoxins are of interest to neuroscientists as well as drug developers due to their exquisite potency and selectivity, not just against prey but also mammalian targets, thereby providing a rich source of molecular probes and therapeutic leads. The rise of integrated venomics has accelerated conotoxin discovery with now well over 10,000 conotoxin sequences published. However, their structural and pharmacological characterization lags considerably behind. In this review, we highlight the diversity of new conotoxins uncovered since 2014, their three-dimensional structures and folds, novel chemical approaches to their syntheses, and their value as pharmacological tools to unravel complex biology. Additionally, we discuss challenges and future directions for the field.
Collapse
Affiliation(s)
- Ai-Hua Jin
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia
| | - Markus Muttenthaler
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia.,Institute of Biological Chemistry, Faculty of Chemistry , University of Vienna , 1090 Vienna , Austria
| | - Sebastien Dutertre
- Département des Acides Amines, Peptides et Protéines, Unité Mixte de Recherche 5247, Université Montpellier 2-Centre Nationale de la Recherche Scientifique , Institut des Biomolécules Max Mousseron , Place Eugène Bataillon , 34095 Montpellier Cedex 5 , France
| | - S W A Himaya
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia
| | - Quentin Kaas
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia
| | - David J Craik
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia
| | - Richard J Lewis
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia
| | - Paul F Alewood
- Institute for Molecular Bioscience , The University of Queensland , Brisbane Queensland 4072 , Australia
| |
Collapse
|
10
|
Verma N, Singh H, Khanna D, Rana PS, Bhadada SK. Classification of drug molecules for oxidative stress signalling pathway. IET Syst Biol 2019; 13:243-250. [PMID: 31538958 PMCID: PMC8687196 DOI: 10.1049/iet-syb.2018.5078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
In humans, oxidative stress is involved in the development of diabetes, cancer, hypertension, Alzheimers’ disease, and heart failure. One of the mechanisms in the cellular defence against oxidative stress is the activation of the Nrf2‐antioxidant response element (ARE) signalling pathway. Computation of activity, efficacy, and potency score of ARE signalling pathway and to propose a multi‐level prediction scheme for the same is the main aim of the study as it contributes in a big amount to the improvement of oxidative stress in humans. Applying the process of knowledge discovery from data, required knowledge is gathered and then machine learning techniques are applied to propose a multi‐level scheme. The validation of the proposed scheme is done using the K‐fold cross‐validation method and an accuracy of 90% is achieved for prediction of activity score for ARE molecules which determine their power to refine oxidative stress.
Collapse
Affiliation(s)
- Nikhil Verma
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India.
| | - Harpreet Singh
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
| | - Divya Khanna
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
| | - Prashant Singh Rana
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
| | - Sanjay Kumar Bhadada
- Department of Endocrinology, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
| |
Collapse
|
11
|
Taju SW, Ou Y. DeepIon: Deep learning approach for classifying ion transporters and ion channels from membrane proteins. J Comput Chem 2019; 40:1521-1529. [DOI: 10.1002/jcc.25805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Revised: 01/19/2019] [Accepted: 01/30/2019] [Indexed: 01/20/2023]
Affiliation(s)
- Semmy Wellem Taju
- Department of Computer Science and EngineeringYuan Ze University Chung‐Li 32003 Taiwan
| | - Yu‐Yen Ou
- Department of Computer Science and EngineeringYuan Ze University Chung‐Li 32003 Taiwan
| |
Collapse
|
12
|
Mansbach RA, Travers T, McMahon BH, Fair JM, Gnanakaran S. Snails In Silico: A Review of Computational Studies on the Conopeptides. Mar Drugs 2019; 17:E145. [PMID: 30832207 PMCID: PMC6471681 DOI: 10.3390/md17030145] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2019] [Revised: 02/21/2019] [Accepted: 02/22/2019] [Indexed: 12/26/2022] Open
Abstract
Marine cone snails are carnivorous gastropods that use peptide toxins called conopeptides both as a defense mechanism and as a means to immobilize and kill their prey. These peptide toxins exhibit a large chemical diversity that enables exquisite specificity and potency for target receptor proteins. This diversity arises in terms of variations both in amino acid sequence and length, and in posttranslational modifications, particularly the formation of multiple disulfide linkages. Most of the functionally characterized conopeptides target ion channels of animal nervous systems, which has led to research on their therapeutic applications. Many facets of the underlying molecular mechanisms responsible for the specificity and virulence of conopeptides, however, remain poorly understood. In this review, we will explore the chemical diversity of conopeptides from a computational perspective. First, we discuss current approaches used for classifying conopeptides. Next, we review different computational strategies that have been applied to understanding and predicting their structure and function, from machine learning techniques for predictive classification to docking studies and molecular dynamics simulations for molecular-level understanding. We then review recent novel computational approaches for rapid high-throughput screening and chemical design of conopeptides for particular applications. We close with an assessment of the state of the field, emphasizing important questions for future lines of inquiry.
Collapse
Affiliation(s)
- Rachael A Mansbach
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Timothy Travers
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Benjamin H McMahon
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Jeanne M Fair
- Biosecurity and Public Health Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - S Gnanakaran
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| |
Collapse
|
13
|
Kabir M, Ahmad S, Iqbal M, Hayat M. iNR-2L: A two-level sequence-based predictor developed via Chou's 5-steps rule and general PseAAC for identifying nuclear receptors and their families. Genomics 2019; 112:276-285. [PMID: 30779939 DOI: 10.1016/j.ygeno.2019.02.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2018] [Revised: 01/09/2019] [Accepted: 02/07/2019] [Indexed: 12/25/2022]
Abstract
Nuclear receptor proteins (NRPs) perform a vital role in regulating gene expression. With the rapidity growth of NRPs in post-genomic era, it is highly recommendable to identify NRPs and their sub-families accurately from their primary sequences. Several conventional methods have been used for discrimination of NRPs and their sub-families, but did not achieve considerable results. In a sequel, a two-level new computational model "iNR-2 L" is developed. Two discrete methods namely: Dipeptide Composition and Tripeptide Composition were used to formulate NRPs sequences. Further, both the descriptor spaces were merged to construct hybrid space. Furthermore, feature selection technique minimum redundancy and maximum relevance was employed in order to select salient features as well as reduce the noise and redundancy. The experiential outcomes exhibited that the proposed model iNR-2 L achieved outstanding results. It is anticipated that the proposed computational model might be a practical and effective tool for academia and research community.
Collapse
Affiliation(s)
- Muhammad Kabir
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
| | - Saeed Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
| | - Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
| |
Collapse
|
14
|
Dao FY, Lv H, Wang F, Feng CQ, Ding H, Chen W, Lin H. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics 2018; 35:2075-2083. [DOI: 10.1093/bioinformatics/bty943] [Citation(s) in RCA: 147] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2018] [Revised: 11/06/2018] [Accepted: 11/13/2018] [Indexed: 02/07/2023] Open
Affiliation(s)
- Fu-Ying Dao
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lv
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fang Wang
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chao-Qin Feng
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wei Chen
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
15
|
Ariaeenejad S, Mousivand M, Moradi Dezfouli P, Hashemi M, Kavousi K, Hosseini Salekdeh G. A computational method for prediction of xylanase enzymes activity in strains of Bacillus subtilis based on pseudo amino acid composition features. PLoS One 2018; 13:e0205796. [PMID: 30346964 PMCID: PMC6197662 DOI: 10.1371/journal.pone.0205796] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2017] [Accepted: 10/02/2018] [Indexed: 01/09/2023] Open
Abstract
Xylanases are hydrolytic enzymes which based on physicochemical properties, structure, mode of action and substrate specificities are classified into various glycoside hydrolase (GH) families. The purpose of this study is to show that the activity of the members of the xylanase family in the specified pH and temperature conditions can be computationally predicted. The proposed computational regression model was trained and tested with the Pseudo Amino Acid Composition (PseAAC) features extracted solely from the amino acid sequences of enzymes. The xylanases with experimentally determined activities were used as the training dataset to adjust the model parameters. To develop the model, 41 strains of Bacillus subtilis isolated from field soil were screened. From them, 28 strains with the highest halo diameter were selected for further studies. The performance of the model for prediction of xylanase activity was evaluated in three different temperature and pH conditions using stratified cross-validation and jackknife methods. The trained model can be used for determining the activity of newly found xylanases in the specified condition. Such computational models help to scale down the experimental costs and save time by identifying enzymes with appropriate activity for scientific and industrial usage. Our methodology for activity prediction of xylanase enzymes can be potentially applied to the members of the other enzyme families. The availability of sufficient experimental data in specified pH and temperature conditions is a prerequisite for training the learning model and to achieve high accuracy.
Collapse
Affiliation(s)
- Shohreh Ariaeenejad
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Maryam Mousivand
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Parinaz Moradi Dezfouli
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Maryam Hashemi
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| | - Kaveh Kavousi
- Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Ghasem Hosseini Salekdeh
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
| |
Collapse
|
16
|
Using a Classifier Fusion Strategy to Identify Anti-angiogenic Peptides. Sci Rep 2018; 8:14062. [PMID: 30218091 PMCID: PMC6138733 DOI: 10.1038/s41598-018-32443-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2018] [Accepted: 09/07/2018] [Indexed: 12/27/2022] Open
Abstract
Anti-angiogenic peptides perform distinct physiological functions and potential therapies for angiogenesis-related diseases. Accurate identification of anti-angiogenic peptides may provide significant clues to understand the essential angiogenic homeostasis within tissues and develop antineoplastic therapies. In this study, an ensemble predictor is proposed for anti-angiogenic peptide prediction by fusing an individual classifier with the best sensitivity and another individual one with the best specificity. We investigate predictive capabilities of various feature spaces with respect to the corresponding optimal individual classifiers and ensemble classifiers. The accuracy and Matthew’s Correlation Coefficient (MCC) of the ensemble classifier trained by Bi-profile Bayes (BpB) features are 0.822 and 0.649, respectively, which represents the highest prediction results among the investigated prediction models. Discriminative features are obtained from BpB using the Relief algorithm followed by the Incremental Feature Selection (IFS) method. The sensitivity, specificity, accuracy, and MCC of the ensemble classifier trained by the discriminative features reach up to 0.776, 0.888, 0.832, and 0.668, respectively. Experimental results indicate that the proposed method is far superior to the previous study for anti-angiogenic peptide prediction.
Collapse
|
17
|
Classes, Databases, and Prediction Methods of Pharmaceutically and Commercially Important Cystine-Stabilized Peptides. Toxins (Basel) 2018; 10:toxins10060251. [PMID: 29921767 PMCID: PMC6024828 DOI: 10.3390/toxins10060251] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Revised: 06/12/2018] [Accepted: 06/14/2018] [Indexed: 12/13/2022] Open
Abstract
Cystine-stabilized peptides represent a large family of peptides characterized by high structural stability and bactericidal, fungicidal, or insecticidal properties. Found throughout a wide range of taxa, this broad and functionally important family can be subclassified into distinct groups dependent upon their number and type of cystine bonding patters, tertiary structures, and/or their species of origin. Furthermore, the annotation of proteins related to the cystine-stabilized family are under-represented in the literature due to their difficulty of isolation and identification. As a result, there are several recent attempts to collate them into data resources and build analytic tools for their dynamic prediction. Ultimately, the identification and delivery of new members of this family will lead to their growing inclusion into the repertoire of commercial viable alternatives to antibiotics and environmentally safe insecticides. This review of the literature and current state of cystine-stabilized peptide biology is aimed to better describe peptide subfamilies, identify databases and analytics resources associated with specific cystine-stabilized peptides, and highlight their current commercial success.
Collapse
|
18
|
Tang H, Zhao YW, Zou P, Zhang CM, Chen R, Huang P, Lin H. HBPred: a tool to identify growth hormone-binding proteins. Int J Biol Sci 2018; 14:957-964. [PMID: 29989085 PMCID: PMC6036759 DOI: 10.7150/ijbs.24174] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2017] [Accepted: 01/15/2018] [Indexed: 12/19/2022] Open
Abstract
Hormone-binding protein (HBP) is a kind of soluble carrier protein and can selectively and non-covalently interact with hormone. HBP plays an important role in life growth, but its function is still unclear. Correct recognition of HBPs is the first step to further study their function and understand their biological process. However, it is difficult to correctly recognize HBPs from more and more proteins through traditional biochemical experiments because of high experimental cost and long experimental period. To overcome these disadvantages, we designed a computational method for identifying HBPs accurately in the study. At first, we collected HBP data from UniProt to establish a high-quality benchmark dataset. Based on the dataset, the dipeptide composition was extracted from HBP residue sequences. In order to find out the optimal features to provide key clues for HBP identification, the analysis of various (ANOVA) was performed for feature ranking. The optimal features were selected through the incremental feature selection strategy. Subsequently, the features were inputted into support vector machine (SVM) for prediction model construction. Jackknife cross-validation results showed that 88.6% HBPs and 81.3% non-HBPs were correctly recognized, suggesting that our proposed model was powerful. This study provides a new strategy to identify HBPs. Moreover, based on the proposed model, we established a webserver called HBPred, which could be freely accessed at http://lin-group.cn/server/HBPred.
Collapse
Affiliation(s)
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Ya-Wei Zhao
- Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Ping Zou
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Chun-Mei Zhang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Rong Chen
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Po Huang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
19
|
Ojeda PG, Ramírez D, Alzate-Morales J, Caballero J, Kaas Q, González W. Computational Studies of Snake Venom Toxins. Toxins (Basel) 2017; 10:E8. [PMID: 29271884 PMCID: PMC5793095 DOI: 10.3390/toxins10010008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2017] [Revised: 12/09/2017] [Accepted: 12/18/2017] [Indexed: 12/17/2022] Open
Abstract
Most snake venom toxins are proteins, and participate to envenomation through a diverse array of bioactivities, such as bleeding, inflammation, and pain, cytotoxic, cardiotoxic or neurotoxic effects. The venom of a single snake species contains hundreds of toxins, and the venoms of the 725 species of venomous snakes represent a large pool of potentially bioactive proteins. Despite considerable discovery efforts, most of the snake venom toxins are still uncharacterized. Modern bioinformatics tools have been recently developed to mine snake venoms, helping focus experimental research on the most potentially interesting toxins. Some computational techniques predict toxin molecular targets, and the binding mode to these targets. This review gives an overview of current knowledge on the ~2200 sequences, and more than 400 three-dimensional structures of snake toxins deposited in public repositories, as well as of molecular modeling studies of the interaction between these toxins and their molecular targets. We also describe how modern bioinformatics have been used to study the snake venom protein phospholipase A2, the small basic myotoxin Crotamine, and the three-finger peptide Mambalgin.
Collapse
Affiliation(s)
- Paola G Ojeda
- Center for Bioinformatics and Molecular Simulations (CBSM), Universidad de Talca, 3460000 Talca, Chile.
- Facultad de Ciencias de la Salud, Instituto de Ciencias Biomedicas, Universidad Autonoma de Chile, 3460000 Talca, Chile.
| | - David Ramírez
- Center for Bioinformatics and Molecular Simulations (CBSM), Universidad de Talca, 3460000 Talca, Chile.
- Facultad de Ciencias de la Salud, Instituto de Ciencias Biomedicas, Universidad Autonoma de Chile, 3460000 Talca, Chile.
| | - Jans Alzate-Morales
- Center for Bioinformatics and Molecular Simulations (CBSM), Universidad de Talca, 3460000 Talca, Chile.
| | - Julio Caballero
- Center for Bioinformatics and Molecular Simulations (CBSM), Universidad de Talca, 3460000 Talca, Chile.
| | - Quentin Kaas
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia.
| | - Wendy González
- Center for Bioinformatics and Molecular Simulations (CBSM), Universidad de Talca, 3460000 Talca, Chile.
- Millennium Nucleus of Ion Channels-Associated Diseases (MiNICAD), Universidad de Talca, 3460000 Talca, Chile.
| |
Collapse
|
20
|
Wei L, Tang J, Zou Q. SkipCPP-Pred: an improved and promising sequence-based predictor for predicting cell-penetrating peptides. BMC Genomics 2017. [PMID: 29513192 PMCID: PMC5657092 DOI: 10.1186/s12864-017-4128-1] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
Background Cell-penetrating peptides (CPPs) are short peptides (5–30 amino acids) that can enter almost any cell without significant damage. On account of their high delivery efficiency, CPPs are promising candidates for gene therapy and cancer treatment. Accordingly, techniques that correctly predict CPPs are anticipated to accelerate CPP applications in future therapeutics. Recently, computational methods have been reportedly successful in predicting CPPs. Unfortunately, the predictive performance of existing methods is not satisfactory and reliable so as to accurately identify CPPs. Results In this study, we propose a novel computational predictor called SkipCPP-Pred to further improve the predictive performance. The novelty of the proposed predictor is that we present a sequence-based feature representation algorithm called adaptive k-skip-n-gram that sufficiently captures the intrinsic correlation information of residues. By fusing the proposed adaptive skip features with a random forest (RF) classifier, we successfully construct the prediction model of SkipCPP-Pred. The various jackknife results demonstrate that the proposed SkipCPP-Pred is 3.6% higher than state-of-the-art CPP predictors in terms of accuracy. Moreover, we construct a high-quality benchmark dataset by reducing the data redundancy and enhancing the similarity between the positive and negative classes. Using this dataset to build prediction models, we can successfully avoid the performance bias lying in existing methods and yield a promising predictive model. Conclusions The proposed SkipCPP-Pred is a simple and fast sequence-based predictor featured with the adaptive k-skip-n-gram model for the improved prediction of CPPs. Currently, SkipCPP-Pred is publicly available from an online webserver (http://server.malab.cn/SkipCPP-Pred/Index.html). Electronic supplementary material The online version of this article (10.1186/s12864-017-4128-1) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China.,State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, 300074, China
| | - Jijun Tang
- School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China.
| |
Collapse
|
21
|
Saghapour E, Kermani S, Sehhati M. A novel feature ranking method for prediction of cancer stages using proteomics data. PLoS One 2017; 12:e0184203. [PMID: 28934234 PMCID: PMC5608217 DOI: 10.1371/journal.pone.0184203] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2017] [Accepted: 08/15/2017] [Indexed: 12/21/2022] Open
Abstract
Proteomic analysis of cancers' stages has provided new opportunities for the development of novel, highly sensitive diagnostic tools which helps early detection of cancer. This paper introduces a new feature ranking approach called FRMT. FRMT is based on the Technique for Order of Preference by Similarity to Ideal Solution method (TOPSIS) which select the most discriminative proteins from proteomics data for cancer staging. In this approach, outcomes of 10 feature selection techniques were combined by TOPSIS method, to select the final discriminative proteins from seven different proteomic databases of protein expression profiles. In the proposed workflow, feature selection methods and protein expressions have been considered as criteria and alternatives in TOPSIS, respectively. The proposed method is tested on seven various classifier models in a 10-fold cross validation procedure that repeated 30 times on the seven cancer datasets. The obtained results proved the higher stability and superior classification performance of method in comparison with other methods, and it is less sensitive to the applied classifier. Moreover, the final introduced proteins are informative and have the potential for application in the real medical practice.
Collapse
Affiliation(s)
- Ehsan Saghapour
- Department of Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Saeed Kermani
- Department of Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- * E-mail:
| | - Mohammadreza Sehhati
- Department of Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| |
Collapse
|
22
|
Dao FY, Yang H, Su ZD, Yang W, Wu Y, Hui D, Chen W, Tang H, Lin H. Recent Advances in Conotoxin Classification by Using Machine Learning Methods. Molecules 2017; 22:molecules22071057. [PMID: 28672838 PMCID: PMC6152242 DOI: 10.3390/molecules22071057] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Revised: 06/12/2017] [Accepted: 06/19/2017] [Indexed: 11/16/2022] Open
Abstract
Conotoxins are disulfide-rich small peptides, which are invaluable peptides that target ion channel and neuronal receptors. Conotoxins have been demonstrated as potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer's disease, Parkinson's disease, and epilepsy. In addition, conotoxins are also ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research as well. Thus, the accurate identification of conotoxin types will provide key clues for the biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire the structure and function information by using biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we reviewed the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark dataset; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides the basis for in-depth study of conotoxins and drug therapy research.
Collapse
Affiliation(s)
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Zhen-Dong Su
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Development and Planning Department, Inner Mongolia University, Hohhot 010021, China.
| | - Yun Wu
- College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China.
| | - Ding Hui
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| |
Collapse
|
23
|
Pan Y, Liu D, Deng L. Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties. PLoS One 2017; 12:e0179314. [PMID: 28614374 PMCID: PMC5470696 DOI: 10.1371/journal.pone.0179314] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2017] [Accepted: 05/27/2017] [Indexed: 12/20/2022] Open
Abstract
Single amino acid variations (SAVs) potentially alter biological functions, including causing diseases or natural differences between individuals. Identifying the relationship between a SAV and certain disease provides the starting point for understanding the underlying mechanisms of specific associations, and can help further prevention and diagnosis of inherited disease.We propose PredSAV, a computational method that can effectively predict how likely SAVs are to be associated with disease by incorporating gradient tree boosting (GTB) algorithm and optimally selected neighborhood features. A two-step feature selection approach is used to explore the most relevant and informative neighborhood properties that contribute to the prediction of disease association of SAVs across a wide range of sequence and structural features, especially some novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves promising performances with an AUC score of 0.908 and a specificity of 0.838, which are significantly better than that of the other existing methods. Furthermore, we validate the capability of our proposed method by an independent test and gain a competitive advantage as a result. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, can return reliable predictions in distinguishing between disease-associated and neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as increased overall performance.
Collapse
Affiliation(s)
- Yuliang Pan
- School of Software, Central South University, Changsha, China
| | - Diwei Liu
- School of Software, Central South University, Changsha, China
| | - Lei Deng
- School of Software, Central South University, Changsha, China
- Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
| |
Collapse
|
24
|
Tahir M, Hayat M. Machine learning based identification of protein–protein interactions using derived features of physiochemical properties and evolutionary profiles. Artif Intell Med 2017; 78:61-71. [DOI: 10.1016/j.artmed.2017.06.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2017] [Revised: 06/09/2017] [Accepted: 06/11/2017] [Indexed: 02/09/2023]
|
25
|
Wei L, Xing P, Su R, Shi G, Ma ZS, Zou Q. CPPred-RF: A Sequence-based Predictor for Identifying Cell-Penetrating Peptides and Their Uptake Efficiency. J Proteome Res 2017; 16:2044-2053. [PMID: 28436664 DOI: 10.1021/acs.jproteome.7b00019] [Citation(s) in RCA: 129] [Impact Index Per Article: 18.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Abstract
Cell-penetrating peptides (CPPs), have been proven as important drug-delivery vehicles, demonstrating the potential as therapeutic candidates. The past decade has witnessed a rapid growth in CPP-based research. Recently, many computational efforts have been made to develop machine-learning-based methods for identifying CPPs. Although much progress has been made, existing methods still suffer low feature representation capability that limits further performance improvement. In this study, we propose a novel predictor called CPPred-RF, in which we integrate multiple sequence-based feature descriptors to sufficiently explore distinct information embedded in CPPs, employ a well-established feature selection technique to improve the feature representation, and, for the first time, construct a two-layer prediction framework based on the random forest algorithm. The jackknife results on benchmark data sets show that the proposed CPPred-RF is at least competitive with the state-of-the-art predictors. Moreover, we establish the first online Web server in terms of predicting CPPs and their uptake efficiency simultaneously. It is freely available at http://server.malab.cn/CPPred-RF .
Collapse
Affiliation(s)
- Leyi Wei
- School of Computer Science and Technology, Tianjin University , Tianjin 300072, China
| | - PengWei Xing
- School of Computer Science and Technology, Tianjin University , Tianjin 300072, China
| | - Ran Su
- School of Software, Tianjin University , Tianjin 300354, China
| | - Gaotao Shi
- School of Computer Science and Technology, Tianjin University , Tianjin 300072, China
| | - Zhanshan Sam Ma
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences , Kunming 650223, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University , Tianjin 300072, China.,State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences , Kunming 650223, China
| |
Collapse
|
26
|
Asako Y, Uesawa Y. High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures. Molecules 2017; 22:molecules22040675. [PMID: 28441746 PMCID: PMC6154693 DOI: 10.3390/molecules22040675] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Revised: 04/16/2017] [Accepted: 04/19/2017] [Indexed: 12/20/2022] Open
Abstract
Many agonists for the estrogen receptor are known to disrupt endocrine functioning. We have developed a computational model that predicts agonists for the estrogen receptor ligand-binding domain in an assay system. Our model was entered into the Tox21 Data Challenge 2014, a computational toxicology competition organized by the National Center for Advancing Translational Sciences. This competition aims to find high-performance predictive models for various adverse-outcome pathways, including the estrogen receptor. Our predictive model, which is based on the random forest method, delivered the best performance in its competition category. In the current study, the predictive performance of the random forest models was improved by strictly adjusting the hyperparameters to avoid overfitting. The random forest models were optimized from 4000 descriptors simultaneously applied to 10,000 activity assay results for the estrogen receptor ligand-binding domain, which have been measured and compiled by Tox21. Owing to the correlation between our model's and the challenge's results, we consider that our model currently possesses the highest predictive power on agonist activity of the estrogen receptor ligand-binding domain. Furthermore, analysis of the optimized model revealed some important features of the agonists, such as the number of hydroxyl groups in the molecules.
Collapse
Affiliation(s)
- Yuki Asako
- Department of Clinical Pharmaceutics Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan.
| | - Yoshihiro Uesawa
- Department of Clinical Pharmaceutics Meiji Pharmaceutical University, 2-522-1 Noshio, Kiyose, Tokyo 204-8588, Japan.
| |
Collapse
|
27
|
Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model. BIOMED RESEARCH INTERNATIONAL 2017; 2017:2929807. [PMID: 28497044 PMCID: PMC5401747 DOI: 10.1155/2017/2929807] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/29/2016] [Revised: 02/22/2017] [Accepted: 03/19/2017] [Indexed: 12/20/2022]
Abstract
The conotoxin proteins are disulfide-rich small peptides. Predicting the types of ion channel-targeted conotoxins has great value in the treatment of chronic diseases, epilepsy, and cardiovascular diseases. To solve the problem of information redundancy existing when using current methods, a new model is presented to predict the types of ion channel-targeted conotoxins based on AVC (Analysis of Variance and Correlation) and SVM (Support Vector Machine). First, the F value is used to measure the significance level of the feature for the result, and the attribute with smaller F value is filtered by rough selection. Secondly, redundancy degree is calculated by Pearson Correlation Coefficient. And the threshold is set to filter attributes with weak independence to get the result of the refinement. Finally, SVM is used to predict the types of ion channel-targeted conotoxins. The experimental results show the proposed AVC-SVM model reaches an overall accuracy of 91.98%, an average accuracy of 92.17%, and the total number of parameters of 68. The proposed model provides highly useful information for further experimental research. The prediction model will be accessed free of charge at our web server.
Collapse
|
28
|
Identifying the Types of Ion Channel-Targeted Conotoxins by Incorporating New Properties of Residues into Pseudo Amino Acid Composition. BIOMED RESEARCH INTERNATIONAL 2016; 2016:3981478. [PMID: 27631006 PMCID: PMC5008028 DOI: 10.1155/2016/3981478] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/13/2016] [Accepted: 07/31/2016] [Indexed: 12/31/2022]
Abstract
Conotoxins are a kind of neurotoxin which can specifically interact with potassium, sodium type, and calcium channels. They have become potential drug candidates to treat diseases such as chronic pain, epilepsy, and cardiovascular diseases. Thus, correctly identifying the types of ion channel-targeted conotoxins will provide important clue to understand their function and find potential drugs. Based on this consideration, we developed a new computational method to rapidly and accurately predict the types of ion-targeted conotoxins. Three kinds of new properties of residues were proposed to use in pseudo amino acid composition to formulate conotoxins samples. The support vector machine was utilized as classifier. A feature selection technique based on F-score was used to optimize features. Jackknife cross-validated results showed that the overall accuracy of 94.6% was achieved, which is higher than other published results, demonstrating that the proposed method is superior to published methods. Hence the current method may play a complementary role to other existing methods for recognizing the types of ion-target conotoxins.
Collapse
|
29
|
In Silico Prediction of Gamma-Aminobutyric Acid Type-A Receptors Using Novel Machine-Learning-Based SVM and GBDT Approaches. BIOMED RESEARCH INTERNATIONAL 2016; 2016:2375268. [PMID: 27579307 PMCID: PMC4992803 DOI: 10.1155/2016/2375268] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/24/2016] [Revised: 06/08/2016] [Accepted: 06/19/2016] [Indexed: 11/17/2022]
Abstract
Gamma-aminobutyric acid type-A receptors (GABAARs) belong to multisubunit membrane spanning ligand-gated ion channels (LGICs) which act as the principal mediators of rapid inhibitory synaptic transmission in the human brain. Therefore, the category prediction of GABAARs just from the protein amino acid sequence would be very helpful for the recognition and research of novel receptors. Based on the proteins' physicochemical properties, amino acids composition and position, a GABAAR classifier was first constructed using a 188-dimensional (188D) algorithm at 90% cd-hit identity and compared with pseudo-amino acid composition (PseAAC) and ProtrWeb web-based algorithms for human GABAAR proteins. Then, four classifiers including gradient boosting decision tree (GBDT), random forest (RF), a library for support vector machine (libSVM), and k-nearest neighbor (k-NN) were compared on the dataset at cd-hit 40% low identity. This work obtained the highest correctly classified rate at 96.8% and the highest specificity at 99.29%. But the values of sensitivity, accuracy, and Matthew's correlation coefficient were a little lower than those of PseAAC and ProtrWeb; GBDT and libSVM can make a little better performance than RF and k-NN at the second dataset. In conclusion, a GABAAR classifier was successfully constructed using only the protein sequence information.
Collapse
|
30
|
Using the SMOTE technique and hybrid features to predict the types of ion channel-targeted conotoxins. J Theor Biol 2016; 403:75-84. [DOI: 10.1016/j.jtbi.2016.04.034] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2015] [Revised: 04/25/2016] [Accepted: 04/29/2016] [Indexed: 12/22/2022]
|
31
|
Liao Z, Ju Y, Zou Q. Prediction of G Protein-Coupled Receptors with SVM-Prot Features and Random Forest. SCIENTIFICA 2016; 2016:8309253. [PMID: 27529053 PMCID: PMC4978840 DOI: 10.1155/2016/8309253] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/26/2016] [Revised: 06/26/2016] [Accepted: 06/30/2016] [Indexed: 06/06/2023]
Abstract
G protein-coupled receptors (GPCRs) are the largest receptor superfamily. In this paper, we try to employ physical-chemical properties, which come from SVM-Prot, to represent GPCR. Random Forest was utilized as classifier for distinguishing them from other protein sequences. MEME suite was used to detect the most significant 10 conserved motifs of human GPCRs. In the testing datasets, the average accuracy was 91.61%, and the average AUC was 0.9282. MEME discovery analysis showed that many motifs aggregated in the seven hydrophobic helices transmembrane regions adapt to the characteristic of GPCRs. All of the above indicate that our machine-learning method can successfully distinguish GPCRs from non-GPCRs.
Collapse
Affiliation(s)
- Zhijun Liao
- School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian 350108, China
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
- State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China
| |
Collapse
|
32
|
Wuyun Q, Zheng W, Zhang Y, Ruan J, Hu G. Improved Species-Specific Lysine Acetylation Site Prediction Based on a Large Variety of Features Set. PLoS One 2016; 11:e0155370. [PMID: 27183223 PMCID: PMC4868276 DOI: 10.1371/journal.pone.0155370] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2016] [Accepted: 04/27/2016] [Indexed: 12/21/2022] Open
Abstract
Lysine acetylation is a major post-translational modification. It plays a vital role in numerous essential biological processes, such as gene expression and metabolism, and is related to some human diseases. To fully understand the regulatory mechanism of acetylation, identification of acetylation sites is first and most important. However, experimental identification of protein acetylation sites is often time consuming and expensive. Therefore, the alternative computational methods are necessary. Here, we developed a novel tool, KA-predictor, to predict species-specific lysine acetylation sites based on support vector machine (SVM) classifier. We incorporated different types of features and employed an efficient feature selection on each type to form the final optimal feature set for model learning. And our predictor was highly competitive for the majority of species when compared with other methods. Feature contribution analysis indicated that HSE features, which were firstly introduced for lysine acetylation prediction, significantly improved the predictive performance. Particularly, we constructed a high-accurate structure dataset of H.sapiens from PDB to analyze the structural properties around lysine acetylation sites. Our datasets and a user-friendly local tool of KA-predictor can be freely available at http://sourceforge.net/p/ka-predictor.
Collapse
Affiliation(s)
- Qiqige Wuyun
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China, 300071
| | - Wei Zheng
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China, 300071
| | - Yanping Zhang
- Department of Mathematics, School of Science, Hebei University of Engineering, Handan, China, 056038
| | - Jishou Ruan
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China, 300071
- State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China, 300071
| | - Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China, 300071
- * E-mail:
| |
Collapse
|
33
|
Chen J, Liu B, Huang D. Protein Remote Homology Detection Based on an Ensemble Learning Approach. BIOMED RESEARCH INTERNATIONAL 2016; 2016:5813645. [PMID: 27294123 PMCID: PMC4875977 DOI: 10.1155/2016/5813645] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/29/2016] [Accepted: 02/21/2016] [Indexed: 12/15/2022]
Abstract
Protein remote homology detection is one of the central problems in bioinformatics. Although some computational methods have been proposed, the problem is still far from being solved. In this paper, an ensemble classifier for protein remote homology detection, called SVM-Ensemble, was proposed with a weighted voting strategy. SVM-Ensemble combined three basic classifiers based on different feature spaces, including Kmer, ACC, and SC-PseAAC. These features consider the characteristics of proteins from various perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset showed that the proposed SVM-Ensemble can obviously improve the predictive performance for the protein remote homology detection. Moreover, it achieved the best performance and outperformed other state-of-the-art methods.
Collapse
Affiliation(s)
- Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Bingquan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Dong Huang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| |
Collapse
|
34
|
Che Y, Ju Y, Xuan P, Long R, Xing F. Identification of Multi-Functional Enzyme with Multi-Label Classifier. PLoS One 2016; 11:e0153503. [PMID: 27078147 PMCID: PMC4831692 DOI: 10.1371/journal.pone.0153503] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2016] [Accepted: 03/30/2016] [Indexed: 11/23/2022] Open
Abstract
Enzymes are important and effective biological catalyst proteins participating in almost all active cell processes. Identification of multi-functional enzymes is essential in understanding the function of enzymes. Machine learning methods perform better in protein structure and function prediction than traditional biological wet experiments. Thus, in this study, we explore an efficient and effective machine learning method to categorize enzymes according to their function. Multi-functional enzymes are predicted with a special machine learning strategy, namely, multi-label classifier. Sequence features are extracted from a position-specific scoring matrix with autocross-covariance transformation. Experiment results show that the proposed method obtains an accuracy rate of 94.1% in classifying six main functional classes through five cross-validation tests and outperforms state-of-the-art methods. In addition, 91.25% accuracy is achieved in multi-functional enzyme prediction, which is often ignored in other enzyme function prediction studies. The online prediction server and datasets can be accessed from the link http://server.malab.cn/MEC/.
Collapse
Affiliation(s)
- Yuxin Che
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
| | - Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
| | - Ren Long
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Fei Xing
- School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361005, China
| |
Collapse
|
35
|
Survey of Natural Language Processing Techniques in Bioinformatics. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:674296. [PMID: 26525745 PMCID: PMC4615216 DOI: 10.1155/2015/674296] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/10/2015] [Revised: 06/12/2015] [Accepted: 06/21/2015] [Indexed: 01/02/2023]
Abstract
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Collapse
|
36
|
Kabir M, Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol Genet Genomics 2015; 291:285-96. [DOI: 10.1007/s00438-015-1108-5] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2015] [Accepted: 08/19/2015] [Indexed: 10/23/2022]
|
37
|
Feng YE. Identify Secretory Protein of Malaria Parasite with Modified Quadratic Discriminant Algorithm and Amino Acid Composition. Interdiscip Sci 2015; 8:156-161. [DOI: 10.1007/s12539-015-0112-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2014] [Revised: 12/15/2014] [Accepted: 03/16/2015] [Indexed: 12/13/2022]
|
38
|
Bioinformatics-Aided Venomics. Toxins (Basel) 2015; 7:2159-87. [PMID: 26110505 PMCID: PMC4488696 DOI: 10.3390/toxins7062159] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2015] [Revised: 06/03/2015] [Accepted: 06/05/2015] [Indexed: 12/12/2022] Open
Abstract
Venomics is a modern approach that combines transcriptomics and proteomics to explore the toxin content of venoms. This review will give an overview of computational approaches that have been created to classify and consolidate venomics data, as well as algorithms that have helped discovery and analysis of toxin nucleic acid and protein sequences, toxin three-dimensional structures and toxin functions. Bioinformatics is used to tackle specific challenges associated with the identification and annotations of toxins. Recognizing toxin transcript sequences among second generation sequencing data cannot rely only on basic sequence similarity because toxins are highly divergent. Mass spectrometry sequencing of mature toxins is challenging because toxins can display a large number of post-translational modifications. Identifying the mature toxin region in toxin precursor sequences requires the prediction of the cleavage sites of proprotein convertases, most of which are unknown or not well characterized. Tracing the evolutionary relationships between toxins should consider specific mechanisms of rapid evolution as well as interactions between predatory animals and prey. Rapidly determining the activity of toxins is the main bottleneck in venomics discovery, but some recent bioinformatics and molecular modeling approaches give hope that accurate predictions of toxin specificity could be made in the near future.
Collapse
|
39
|
Ding H, Feng PM, Chen W, Lin H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. MOLECULAR BIOSYSTEMS 2015; 10:2229-35. [PMID: 24931825 DOI: 10.1039/c4mb00316k] [Citation(s) in RCA: 106] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were initially formulated by the g-gap dipeptide compositions. Subsequently, the analysis of variance (ANOVA) with incremental feature selection (IFS) was used to search for the optimal feature set. It was observed that, in jackknife cross-validation, the optimal feature set including 160 optimized features can produce the maximum accuracy of 85.02%. By performing feature analysis, we found that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction and that some of the 1-gap dipeptides were important and mainly contributed to the virion protein prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web-server, PVPred, was established and can be freely accessed from the website (http://lin.uestc.edu.cn/server/PVPred). We believe that the PVPred will become a powerful tool to study phage virion proteins and to guide the related experimental validations.
Collapse
Affiliation(s)
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | | | | | | |
Collapse
|
40
|
Ganjtabesh M, Montaseri S, Zare-Mirakabad F. Using temperature effects to predict the interactions between two RNAs. J Theor Biol 2015; 364:98-102. [PMID: 25218429 DOI: 10.1016/j.jtbi.2014.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2014] [Revised: 08/30/2014] [Accepted: 09/02/2014] [Indexed: 10/24/2022]
Abstract
MOTIVATION Interaction of two RNA molecules is considered as an important factor that regulates gene expression post-transcriptional process. Most of the ncRNAs prevent the translation of their target mRNA(s) by forming stable bindings with them. Although several computational methods have been proposed to predict the interactions between two RNAs, none of them can produce reliable and accurate results. RESULTS In this paper, a new approach entitled tempRNAs is presented to accurately predict interaction structure between two RNAs based on a gradual temperature decrease. For each specified temperature, our algorithm contains two main steps. First, the secondary structure of each RNA is determined with respect to the previous base pairs as constraints. Second, two RNAs are concatenated and then the interaction between them is calculated according to the previous base pairs. The secondary structures are determined based on minimum free energy model. The proposed algorithm is evaluated for a set of known interacting RNA pairs. The results show the higher accuracy of the proposed method in comparison to the other state-of-the-art algorithms, namely inRNAs and RactIP.
Collapse
Affiliation(s)
- Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran.
| | - Soheila Montaseri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran.
| | - Fatemeh Zare-Mirakabad
- Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran.
| |
Collapse
|
41
|
Lin H, Deng EZ, Ding H, Chen W, Chou KC. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res 2014; 42:12961-72. [PMID: 25361964 PMCID: PMC4245931 DOI: 10.1093/nar/gku1019] [Citation(s) in RCA: 398] [Impact Index Per Article: 39.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The σ54 promoters are unique in prokaryotic genome and responsible for transcripting carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by the rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC. For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites were governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters.
Collapse
Affiliation(s)
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Gordon Life Science Institute, Belmont, MA, USA
| | - En-Ze Deng
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China Gordon Life Science Institute, Belmont, MA, USA
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA, USA Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
| |
Collapse
|
42
|
Transmission of intra-cellular genetic information: a system proposal. J Theor Biol 2014; 358:208-31. [PMID: 24928152 DOI: 10.1016/j.jtbi.2014.05.040] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2013] [Revised: 05/05/2014] [Accepted: 05/27/2014] [Indexed: 11/21/2022]
Abstract
One of the great challenges of the scientific community on theories of genetic information, genetic communication and genetic coding is to determine a mathematical structure related to DNA sequences. In this paper we propose a model of an intra-cellular transmission system of genetic information similar to a model of a power and bandwidth efficient digital communication system in order to identify a mathematical structure in DNA sequences where such sequences are biologically relevant. The model of a transmission system of genetic information is concerned with the identification, reproduction and mathematical classification of the nucleotide sequence of single stranded DNA by the genetic encoder. Hence, a genetic encoder is devised where labelings and cyclic codes are established. The establishment of the algebraic structure of the corresponding codes alphabets, mappings, labelings, primitive polynomials (p(x)) and code generator polynomials (g(x)) are quite important in characterizing error-correcting codes subclasses of G-linear codes. These latter codes are useful for the identification, reproduction and mathematical classification of DNA sequences. The characterization of this model may contribute to the development of a methodology that can be applied in mutational analysis and polymorphisms, production of new drugs and genetic improvement, among other things, resulting in the reduction of time and laboratory costs.
Collapse
|
43
|
Hayat M, Iqbal N. Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 116:184-192. [PMID: 24997484 DOI: 10.1016/j.cmpb.2014.06.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/13/2014] [Revised: 06/09/2014] [Accepted: 06/13/2014] [Indexed: 06/03/2023]
Abstract
Proteins control all biological functions in living species. Protein structure is comprised of four major classes including all-α class, all-β class, α+β, and α/β. Each class performs different function according to their nature. Owing to the large exploration of protein sequences in the databanks, the identification of protein structure classes is difficult through conventional methods with respect to cost and time. Looking at the importance of protein structure classes, it is thus highly desirable to develop a computational model for discriminating protein structure classes with high accuracy. For this purpose, we propose a silco method by incorporating Pseudo Average Chemical Shift and Support Vector Machine. Two feature extraction schemes namely Pseudo Amino Acid Composition and Pseudo Average Chemical Shift are used to explore valuable information from protein sequences. The performance of the proposed model is assessed using four benchmark datasets 25PDB, 1189, 640 and 399 employing jackknife test. The success rates of the proposed model are 84.2%, 85.0%, 86.4%, and 89.2%, respectively on the four datasets. The empirical results reveal that the performance of our proposed model compared to existing models is promising in the literature so far and might be useful for future research.
Collapse
Affiliation(s)
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
| | - Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| |
Collapse
|
44
|
Feng Y, Luo L. Using long-range contact number information for protein secondary structure prediction. INT J BIOMATH 2014. [DOI: 10.1142/s1793524514500521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
In this paper, we first combine tetra-peptide structural words with contact number for protein secondary structure prediction. We used the method of increment of diversity combined with quadratic discriminant analysis to predict the structure of central residue for a sequence fragment. The method is used tetra-peptide structural words and long-range contact number as information resources. The accuracy of Q3 is over 83% in 194 proteins. The accuracies of predicted secondary structures for 20 amino acid residues are ranged from 81% to 88%. Moreover, we have introduced the residue long-range contact, which directly indicates the separation of contacting residue in terms of the position in the sequence, and examined the negative influence of long-range residue interactions on predicting secondary structure in a protein. The method is also compared with existing prediction methods. The results show that our method is more effective in protein secondary structures prediction.
Collapse
Affiliation(s)
- Yonge Feng
- College of Science, Inner Mongolia Agriculture University, Hohhot 010018, P. R. China
| | - Liaofu Luo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, P. R. China
| |
Collapse
|
45
|
Xing YQ, Liu GQ, Zhao XJ, Zhao HY, Cai L. Genome-wide characterization and prediction of Arabidopsis thaliana replication origins. Biosystems 2014; 124:1-6. [PMID: 25050475 DOI: 10.1016/j.biosystems.2014.07.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2013] [Revised: 03/25/2014] [Accepted: 07/15/2014] [Indexed: 01/25/2023]
Abstract
Identification of replication origins is crucial for the faithful duplication of genomic DNA. The frequencies of single nucleotides and dinucleotides, GC/AT bias and GC/AT profile in the vicinity of Arabidopsis thaliana replication origins were analyzed in the present work. The guanine content or cytosine content is higher in origin of replication (Ori) than in non-Ori. The SS (S=G or C) dinucleotides are favoured in Ori whereas WW (W=A or T) dinucleotides are favoured in non-Ori. GC/AT bias and GC/AT profile in Ori are significantly different from that in non-Ori. Furthermore, by inputting DNA sequence features into support vector machine, we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5% as evaluated by the 10-fold cross-validation. This result suggested that apart from DNA sequence, deciphering the selection of replication origin must integrate many other factors including nucleosome positioning, DNA methylation, histone modification, etc. In addition, by comparing predictive performance we found that the predictive accuracy of SVM using sequence features on the context of WS language is significantly better than that of RY language. Furthermore, the same conclusion was also obtained in S. cerevisiae and D. melanogaster.
Collapse
Affiliation(s)
- Yong-Qiang Xing
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Guo-Qing Liu
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Xiu-Juan Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
| | - Hong-Yu Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China
| | - Lu Cai
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China.
| |
Collapse
|
46
|
iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BIOMED RESEARCH INTERNATIONAL 2014; 2014:286419. [PMID: 24991545 PMCID: PMC4058692 DOI: 10.1155/2014/286419] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/13/2014] [Revised: 04/22/2014] [Accepted: 05/07/2014] [Indexed: 11/30/2022]
Abstract
Conotoxins are small disulfide-rich neurotoxic peptides, which can bind to ion channels with very high specificity and modulate their activities. Over the last few decades, conotoxins have been the drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases. According to their functions and targets, conotoxins are generally categorized into three types: potassium-channel type, sodium-channel type, and calcium-channel types. With the avalanche of peptide sequences generated in the postgenomic age, it is urgent and challenging to develop an automated method for rapidly and accurately identifying the types of conotoxins based on their sequence information alone. To address this challenge, a new predictor, called iCTX-Type, was developed by incorporating the dipeptide occurrence frequencies of a conotoxin sequence into a 400-D (dimensional) general pseudoamino acid composition, followed by the feature optimization procedure to reduce the sample representation from 400-D to 50-D vector. The overall success rate achieved by iCTX-Type via a rigorous cross-validation was over 91%, outperforming its counterpart (RBF network). Besides, iCTX-Type is so far the only predictor in this area with its web-server available, and hence is particularly useful for most experimental scientists to get their desired results without the need to follow the complicated mathematics involved.
Collapse
|
47
|
Human proteins characterization with subcellular localizations. J Theor Biol 2014; 358:61-73. [PMID: 24862400 DOI: 10.1016/j.jtbi.2014.05.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2014] [Revised: 05/04/2014] [Accepted: 05/05/2014] [Indexed: 11/20/2022]
Abstract
Proteins are responsible for performing the vast majority of cellular functions which are critical to a cell's survival. The knowledge of the subcellular localization of proteins can provide valuable information about their molecular functions. Therefore, one of the fundamental goals in cell biology and proteomics is to analyze the subcellular localizations and functions of these proteins. Recent large-scale human genomics and proteomics studies have made it possible to characterize human proteins at a subcellular localization level. In this study, according to the annotation in Swiss-Prot, 8842 human proteins were classified into seven subcellular localizations. Human proteins in the seven subcellular localizations were compared by using topological properties, biological properties, codon usage indices, mRNA expression levels, protein complexity and physicochemical properties. All these properties were found to be significantly different in the seven categories. In addition, based on these properties and pseudo-amino acid compositions, a machine learning classifier was built for the prediction of protein subcellular localization. The study presented here was an attempt to address the aforementioned properties for comparing human proteins of different subcellular localizations. We hope our findings presented in this study may provide important help for the prediction of protein subcellular localization and for understanding the general function of human proteins in cells.
Collapse
|
48
|
Yang L, Wang J, Wang H, Lv Y, Zuo Y, Jiang W. Analysis and identification of toxin targets by topological properties in protein–protein interaction network. J Theor Biol 2014; 349:82-91. [DOI: 10.1016/j.jtbi.2014.02.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2013] [Revised: 01/20/2014] [Accepted: 02/01/2014] [Indexed: 10/25/2022]
|
49
|
Zhang SW, Shao DD, Zhang SY, Wang YB. Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression. MOLECULAR BIOSYSTEMS 2014; 10:1400-8. [PMID: 24695957 DOI: 10.1039/c3mb70588a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
The identification of disease genes is very important not only to provide greater understanding of gene function and cellular mechanisms which drive human disease, but also to enhance human disease diagnosis and treatment. Recently, high-throughput techniques have been applied to detect dozens or even hundreds of candidate genes. However, experimental approaches to validate the many candidates are usually time-consuming, tedious and expensive, and sometimes lack reproducibility. Therefore, numerous theoretical and computational methods (e.g. network-based approaches) have been developed to prioritize candidate disease genes. Many network-based approaches implicitly utilize the observation that genes causing the same or similar diseases tend to correlate with each other in gene-protein relationship networks. Of these network approaches, the random walk with restart algorithm (RWR) is considered to be a state-of-the-art approach. To further improve the performance of RWR, we propose a novel method named ESFSC to identify disease-related genes, by enlarging the seed set according to the centrality of disease genes in a network and fusing information of the protein-protein interaction (PPI) network topological similarity and the gene expression correlation. The ESFSC algorithm restarts at all of the nodes in the seed set consisting of the known disease genes and their k-nearest neighbor nodes, then walks in the global network separately guided by the similarity transition matrix constructed with PPI network topological similarity properties and the correlational transition matrix constructed with the gene expression profiles. As a result, all the genes in the network are ranked by weighted fusing the above results of the RWR guided by two types of transition matrices. Comprehensive simulation results of the 10 diseases with 97 known disease genes collected from the Online Mendelian Inheritance in Man (OMIM) database show that ESFSC outperforms existing methods for prioritizing candidate disease genes. The top prediction results of Alzheimer's disease are consistent with previous literature reports.
Collapse
Affiliation(s)
- Shao-Wu Zhang
- College of Automation, Northwestern Polytechnical University, 710072, Xi'an, China.
| | | | | | | |
Collapse
|
50
|
Ding H, Feng PM, Chen W, Lin H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. MOLECULAR BIOSYSTEMS 2014. [DOI: 10.1039/c4mb00316k pmid: 24931825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells.
Collapse
Affiliation(s)
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education
- Center of Bioinformatics
- School of Life Science and Technology
- University of Electronic Science and Technology of China
- Chengdu 610054, China
| | - Peng-Mian Feng
- School of Public Health
- Hebei United University
- Tangshan 063000, China
| | - Wei Chen
- Department of Physics
- School of Sciences
- and Center for Genomics and Computational Biology
- Hebei United University
- Tangshan 063000, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education
- Center of Bioinformatics
- School of Life Science and Technology
- University of Electronic Science and Technology of China
- Chengdu 610054, China
| |
Collapse
|