1
|
Rukh G, Akbar S, Rehman G, Alarfaj FK, Zou Q. StackedEnC-AOP: prediction of antioxidant proteins using transform evolutionary and sequential features based multi-scale vector with stacked ensemble learning. BMC Bioinformatics 2024; 25:256. [PMID: 39098908 PMCID: PMC11298090 DOI: 10.1186/s12859-024-05884-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Accepted: 07/29/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND Antioxidant proteins are involved in several biological processes and can protect DNA and cells from free-radical damage. These proteins regulate the body's oxidative stress and play a significant role in many antioxidant-based drugs. Current in vitro approaches are costly, time-consuming, and unable to efficiently screen and identify the targeted motifs of antioxidant proteins. METHODS We propose StackedEnC-AOP, an accurate prediction method for discriminating antioxidant proteins. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix: the PSSM-based images are decomposed via two levels of DWT to form a pseudo position-specific scoring matrix (PsePSSM-DWT) based embedded vector. Additionally, the evolutionary difference formula and composite physicochemical properties methods are employed to collect structural and sequential descriptors. The sequential features, evolutionary descriptors, and physicochemical properties are then combined into a single vector to compensate for the weaknesses of the individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are chosen using minimum redundancy and maximum relevance (mRMR). The optimal feature vector is used to train a stacking-based ensemble meta-model. RESULTS StackedEnC-AOP achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. On an independent set, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98. CONCLUSION StackedEnC-AOP performed significantly better than current computational models, improving accuracy by ~5% and ~3% on the training and independent sets, respectively. The efficacy and consistency of StackedEnC-AOP make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.
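The two-level DWT step described in the abstract can be illustrated with a minimal sketch. It assumes a Haar wavelet and a (mean, max, min) summary of each sub-band; the abstract names neither choice, and the function names `haar_step`, `summary`, `dwt_column_features`, and `pssm_dwt_vector` are hypothetical, so this should be read as an illustration of the idea rather than the authors' PsePSSM-DWT implementation.

```python
import math

def haar_step(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(signal) % 2:                      # pad odd-length input with its last value
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def summary(band):
    """Assumed sub-band summary: mean, maximum, minimum."""
    return [sum(band) / len(band), max(band), min(band)]

def dwt_column_features(column, levels=2):
    """Decompose one PSSM column and summarise every detail band plus the final approximation."""
    features = []
    approx = list(column)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(summary(detail))
    features.extend(summary(approx))
    return features

def pssm_dwt_vector(pssm, levels=2):
    """Concatenate the column features of an L x 20 PSSM given as a list of rows."""
    vector = []
    for j in range(len(pssm[0])):
        column = [row[j] for row in pssm]
        vector.extend(dwt_column_features(column, levels))
    return vector
```

With two levels, each column contributes 3 sub-bands x 3 statistics, so a real 20-column PSSM would yield a fixed 180-dimensional vector regardless of sequence length, which is the property that makes such descriptors usable as classifier input.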
Affiliation(s)
- Gul Rukh
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Shahid Akbar
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Gauhar Rehman
  - Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Fawaz Khaled Alarfaj
  - Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), 31982, Al-Ahsa, Saudi Arabia
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China
|
2
|
Meng C, Pei Y, Bu Y, Zou Q, Ju Y. Machine learning-based antioxidant protein identification model: Progress and evaluation. J Cell Biochem 2023; 124:1825-1834. [PMID: 37877550 DOI: 10.1002/jcb.30491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 09/30/2023] [Accepted: 10/06/2023] [Indexed: 10/26/2023]
Abstract
Efficient and accurate identification of antioxidant proteins is of great significance. In recent years, many models for identifying antioxidant proteins have been proposed, but the low sensitivity and high dimensionality of the models are common problems. The generalization ability of the model needs to be improved. Researchers have tried different feature extraction algorithms and feature selection algorithms to obtain the most effective feature combination and have chosen more appropriate classification algorithms and tools to improve model performance. In this article, we systematically reviewed the data set of the most frequently used antioxidant proteins and the method selection for each step of model establishment and discussed the characteristics of each method. We have conducted a detailed analysis of recent research and believe that the practical ability and efficiency of model application can be improved by reducing model dimensions. The key to improving the performance of antioxidant protein recognition models in the future may lie in feature selection, so this paper also focuses on the combination of feature extraction and selection steps in the analysis of the model building process.
Affiliation(s)
- Chaolu Meng
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
  - Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot, China
- Yue Pei
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yongbo Bu
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
|
3
|
Shen Z, Liu T, Xu T. Accurate Identification of Antioxidant Proteins Based on a Combination of Machine Learning Techniques and Hidden Markov Model Profiles. Comput Math Methods Med 2021; 2021:5770981. [PMID: 34413898 PMCID: PMC8369162 DOI: 10.1155/2021/5770981] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Revised: 07/15/2021] [Accepted: 07/26/2021] [Indexed: 01/19/2023]
Abstract
Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, the identification of AOPs by using wet-lab experimental techniques is often time-consuming and expensive. In this study, we proposed an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, we performed the analysis of variance (ANOVA) method to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to conduct the prediction of AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, the 10-fold cross-validation (CV), the jackknife CV, and the independent test were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most of the existing methods and could be used to quickly annotate AOPs and guide the experimental process.
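The auto cross-covariance (ACC) transform mentioned above turns a variable-length profile into a fixed-length vector. The following is a minimal sketch of the standard ACC definition applied to a generic profile matrix; the function name `acc_features` and the toy inputs are assumptions for illustration, and the maximum lag used by AOP-HMM is not stated in the abstract.

```python
def acc_features(profile, max_lag):
    """Auto cross-covariance of an L x N profile (list of rows).

    For each lag 1..max_lag and each ordered column pair (j1, j2), the value is
    the mean over positions i of
        (P[i][j1] - mean_j1) * (P[i + lag][j2] - mean_j2).
    Pairs with j1 == j2 are the auto-covariance terms, the rest are
    cross-covariance terms. Output length is N * N * max_lag.
    """
    n_rows = len(profile)
    n_cols = len(profile[0])
    if not 0 < max_lag < n_rows:
        raise ValueError("max_lag must be positive and smaller than the profile length")
    means = [sum(row[j] for row in profile) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in profile]
    features = []
    for lag in range(1, max_lag + 1):
        span = n_rows - lag                  # number of position pairs at this lag
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                total = sum(centered[i][j1] * centered[i + lag][j2] for i in range(span))
                features.append(total / span)
    return features
```

For a 20-column profile the vector has 400 values per lag, independent of protein length, which is why a dimensionality-reduction step such as the ANOVA filter described in the abstract typically follows.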
Affiliation(s)
- Zhehan Shen
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Ting Xu
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
|
4
|
Feng C, Wei H, Yang D, Feng B, Ma Z, Han S, Zou Q, Shi H. ORS-Pred: An optimized reduced scheme-based identifier for antioxidant proteins. Proteomics 2021; 21:e2100017. [PMID: 34009737 DOI: 10.1002/pmic.202100017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2021] [Revised: 04/22/2021] [Accepted: 05/12/2021] [Indexed: 12/30/2022]
Abstract
Antioxidant proteins can terminate a chain of reactions caused by free radicals and protect cells from damage. To identify antioxidant proteins rapidly, a computational model was proposed based on an optimized recoding scheme, sequence information, and machine learning methods. First, over 600 recoding schemes were collected to build a scheme set. Then, the original sequence was recoded as a reduced expression whose g-gap dipeptides (g = 0, 1, 2) were used as the features of proteins. Furthermore, a random forest (RF) method was used to evaluate the classification ability of the obtained dipeptide features. After going through all schemes, the scheme with the best predictive performance was chosen as the optimized reduction scheme. Finally, for the RF method, a grid search strategy was used to select a better parameter combination to identify antioxidant proteins. In the experiment, the present method correctly recognized 90.13-99.87% of the antioxidant samples. Other experimental results also showed that the present method was efficient at identifying antioxidant proteins. Finally, we also developed a web server that is freely accessible to researchers.
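The recoding and g-gap dipeptide steps can be sketched as follows. The six-group reduction scheme `GROUPS` is a hypothetical example chosen only for illustration; it is not the optimized scheme selected in ORS-Pred, and the function names are likewise assumptions.

```python
from itertools import product

# Hypothetical reduction scheme: each string is one group of amino acids.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

def reduce_sequence(seq, groups=GROUPS):
    """Recode a protein sequence so each residue becomes its group letter (a, b, c, ...)."""
    lookup = {aa: chr(ord("a") + k) for k, group in enumerate(groups) for aa in group}
    return "".join(lookup[aa] for aa in seq.upper() if aa in lookup)

def g_gap_dipeptide_freqs(reduced, alphabet, g):
    """Frequencies of residue pairs separated by g positions (g = 0 means adjacent)."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(reduced) - g - 1             # number of pairs available at this gap
    for i in range(max(total, 0)):
        counts[reduced[i] + reduced[i + g + 1]] += 1
    return [counts[p] / total if total > 0 else 0.0 for p in pairs]

def encode(seq, groups=GROUPS, gaps=(0, 1, 2)):
    """Concatenate g-gap dipeptide frequencies of the reduced sequence for every gap."""
    alphabet = [chr(ord("a") + k) for k in range(len(groups))]
    reduced = reduce_sequence(seq, groups)
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide_freqs(reduced, alphabet, g))
    return features
```

A scheme with k groups gives k * k features per gap, so this six-group example produces 108 features for g = 0, 1, 2 instead of the 1200 that the full 20-letter alphabet would need. Scoring each candidate scheme with a classifier and keeping the best one, as the abstract describes, would wrap a loop around `encode`.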
Affiliation(s)
- Changli Feng
  - Department of Information Science and Technology, Taishan University, Taian, China
- Haiyan Wei
  - Department of Teachers and Education, Taishan University, Taian, China
- Deyun Yang
  - Department of Information Science and Technology, Taishan University, Taian, China
- Bin Feng
  - Department of Information Science and Technology, Taishan University, Taian, China
- Zhaogui Ma
  - Department of Information Science and Technology, Taishan University, Taian, China
- Shuguang Han
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - China and Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, China
- Hua Shi
  - School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China
|
5
|
ANOX: A robust computational model for predicting the antioxidant proteins based on multiple features. Anal Biochem 2021; 631:114257. [PMID: 34043981 DOI: 10.1016/j.ab.2021.114257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 05/12/2021] [Accepted: 05/14/2021] [Indexed: 11/20/2022]
Abstract
As an indispensable component of various living organisms, the antioxidant proteins have been studied for anti-aging and prevention of various diseases, such as altitude sickness, coronary heart disease, and even cancer. However, the traditional experimental methods for identifying the antioxidant proteins are very expensive and time-consuming. Thus, to address the challenge, a new predictor, named ANOX, was developed in this study. Multiple features, such as frequency matrix features (FRE), amino acid and dipeptide composition (AADP), evolutionary difference formula features (EEDP), k-separated bigrams (KSB), and PSI-PRED secondary structure (PRED), were extracted to generate the original feature space. To find the optimized feature subset, the Max-Relevance-Max-Distance (MRMD) algorithm was implemented for feature ranking and our model received the best performance with the top 1170 features. Rigorous tests were performed to evaluate the performance of ANOX, and the results showed that ANOX achieved a major improvement in the prediction accuracy of the antioxidant proteins (AUC:0.930 and 0.935 using 5-fold cross-validation or the jackknife test) compared to the state-of-the-art predictor AOPs-SVM (AUC:0.869 and 0.885). The dataset used in this study and the source code of ANOX are all available at https://github.com/NWAFU-LiuLab/ANOX.
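Of the descriptors listed above, amino acid and dipeptide composition (AADP) is the simplest to illustrate. The sketch below computes it from the raw sequence; this is an assumption made for brevity, since AADP can also be derived from evolutionary profiles and the abstract does not say which form ANOX uses. The function name `aadp` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aadp(seq):
    """Return 20 amino acid frequencies followed by 400 adjacent-dipeptide frequencies."""
    seq = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)  # drop non-standard residues
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    aac = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    pair_counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
    n_pairs = len(seq) - 1
    dpc = [pair_counts.get(a + b, 0) / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]
    return aac + dpc
```

The result is a 420-dimensional vector. Combining several such descriptor families is what produces the large original feature space that a ranking step like MRMD then prunes.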
|
6
|
Identification of antioxidant proteins using a discriminative intelligent model of k-space amino acid pairs based descriptors incorporating with ensemble feature selection. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/20/2023]
|
7
|
Ao C, Zhou W, Gao L, Dong B, Yu L. Prediction of antioxidant proteins using hybrid feature representation method and random forest. Genomics 2020; 112:4666-4674. [DOI: 10.1016/j.ygeno.2020.08.016] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/23/2020] [Revised: 08/10/2020] [Accepted: 08/13/2020] [Indexed: 12/19/2022]
|
8
|
Li X, Tang Q, Tang H, Chen W. Identifying Antioxidant Proteins by Combining Multiple Methods. Front Bioeng Biotechnol 2020; 8:858. [PMID: 32793581 PMCID: PMC7391787 DOI: 10.3389/fbioe.2020.00858] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/30/2020] [Accepted: 07/03/2020] [Indexed: 11/13/2022] Open
Abstract
Antioxidant proteins play important roles in preventing free radical oxidation from damaging cells and DNA. They have become ideal candidates for disease prevention and treatment. Therefore, it is urgent to identify antioxidants from natural compounds. Since experimental methods are still not cost-effective, a series of computational methods have been proposed to identify antioxidant proteins. However, the performance of the current methods is still not satisfactory. In this study, a support vector machine-based method, called Vote9, was proposed to identify antioxidants, in which the sequences were encoded using the features generated from 9 optimal individual models. Results from the jackknife test demonstrated that Vote9 is comparable with the best of the existing predictors for this task. We hope that Vote9 will become a useful tool, or at least play a complementary role to the existing methods for identifying antioxidants.
Affiliation(s)
- Xianhai Li
  - School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Qiang Tang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hua Tang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Wei Chen
  - School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
|
9
|
Feng P, Feng L. Recent Advances on Antioxidant Identification Based on Machine Learning Methods. Curr Drug Metab 2020; 21:804-809. [PMID: 32682368 DOI: 10.2174/1389200221666200719001449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/01/2020] [Revised: 03/17/2020] [Accepted: 05/13/2020] [Indexed: 11/22/2022]
Abstract
Antioxidants are molecules that can prevent damage to cells caused by free radicals. Recent studies have also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarize recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants are also discussed. We hope this review will provide insights into research on antioxidant identification.
Affiliation(s)
- Pengmian Feng
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
- Lijing Feng
  - School of Sciences, North China University of Science and Technology, Tangshan 063000, China
|
10
|
López-Cortés A, Cabrera-Andrade A, Vázquez-Naya JM, Pazos A, Gonzáles-Díaz H, Paz-Y-Miño C, Guerrero S, Pérez-Castillo Y, Tejera E, Munteanu CR. Prediction of breast cancer proteins involved in immunotherapy, metastasis, and RNA-binding using molecular descriptors and artificial neural networks. Sci Rep 2020; 10:8515. [PMID: 32444848 PMCID: PMC7244564 DOI: 10.1038/s41598-020-65584-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/02/2019] [Accepted: 04/28/2020] [Indexed: 12/12/2022] Open
Abstract
Breast cancer (BC) is a heterogeneous disease where genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants are involved. Due to the complexity of BC, the prediction of proteins involved in this disease is a trending topic in drug design. This work proposes an accurate prediction classifier for BC proteins using six sets of protein sequence descriptors and 13 machine-learning methods. After using a univariate feature selection for the mix of five descriptor families, the best classifier was obtained using the multilayer perceptron method (artificial neural network) and 300 features. The performance of the model is demonstrated by an area under the receiver operating characteristic curve (AUROC) of 0.980 ± 0.0037 and an accuracy of 0.936 ± 0.0056 (3-fold cross-validation). Regarding the prediction of 4,504 cancer-associated proteins using this model, the best ranked cancer immunotherapy proteins related to BC were RPS27, SUPT4H1, CLPSL2, POLR2K, RPL38, AKT3, CDK3, RPS20, RASL11A and UBTD1; the best ranked metastasis driver proteins related to BC were S100A9, DDA1, TXN, PRNP, RPS27, S100A14, S100A7, MAPK1, AGR3 and NDUFA13; and the best ranked RNA-binding proteins related to BC were S100A9, TXN, RPS27L, RPS27, RPS27A, RPL38, MRPL54, PPAN, RPS20 and CSRP1. This powerful model predicts several BC-related proteins that should be studied in depth to find new biomarkers and better therapeutic targets. Scripts can be downloaded at https://github.com/muntisa/neural-networks-for-breast-cancer-proteins.
Affiliation(s)
- Andrés López-Cortés
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, Quito, 170129, Ecuador
  - RNASA-IMEDIR, Computer Science Faculty, University of Coruna, Coruna, 15071, Spain
  - Red Latinoamericana de Implementación y Validación de Guías Clínicas Farmacogenómicas (RELIVAF-CYTED), Quito, Ecuador
- Alejandro Cabrera-Andrade
  - RNASA-IMEDIR, Computer Science Faculty, University of Coruna, Coruna, 15071, Spain
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Avenue de los Granados, Quito, 170125, Ecuador
  - Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Avenue de los Granados, Quito, 170125, Ecuador
- José M Vázquez-Naya
  - RNASA-IMEDIR, Computer Science Faculty, University of Coruna, Coruna, 15071, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n 15071, A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
- Alejandro Pazos
  - RNASA-IMEDIR, Computer Science Faculty, University of Coruna, Coruna, 15071, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n 15071, A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
- Humberto Gonzáles-Díaz
  - Department of Organic Chemistry II, University of the Basque Country UPV/EHU, Leioa 48940, Biscay, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, 48011, Biscay, Spain
- César Paz-Y-Miño
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, Quito, 170129, Ecuador
- Santiago Guerrero
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, Quito, 170129, Ecuador
- Yunierkis Pérez-Castillo
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Avenue de los Granados, Quito, 170125, Ecuador
  - Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Avenue de los Granados, Quito, 170125, Ecuador
- Eduardo Tejera
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Avenue de los Granados, Quito, 170125, Ecuador
  - Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de Las Américas, Avenue de los Granados, Quito, 170125, Ecuador
- Cristian R Munteanu
  - RNASA-IMEDIR, Computer Science Faculty, University of Coruna, Coruna, 15071, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n 15071, A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006, A Coruña, Spain
|
11
|
Xu Y, Wen Y, Han G. Antioxidant Proteins' Identification Based on Support Vector Machine. Comb Chem High Throughput Screen 2020; 23:319-325. [PMID: 32141416 DOI: 10.2174/1386207323666200306125538] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/14/2019] [Revised: 12/23/2019] [Accepted: 01/13/2020] [Indexed: 12/26/2022]
Abstract
BACKGROUND Evidence has increasingly indicated that, in human disease, cell metabolism is deeply associated with proteins. Structural mutations and dysregulation of these proteins contribute to the development of complex diseases. Free radicals are unstable molecules that seek electrons from surrounding atoms for stability. Once a free radical binds to an atom in the body, a chain reaction occurs, which causes damage to cells and DNA. An antioxidant protein is a substance that protects cells from free radical damage. Accurate identification of antioxidant proteins is important for understanding their role in delaying aging and in preventing and treating related diseases. Therefore, computational methods to identify antioxidant proteins have become an effective way of pinpointing candidates prior to experimental verification. METHODS In this study, a support vector machine was used to identify antioxidant proteins, with amino acid compositions and 9-gap dipeptide compositions as features and Principal Component Analysis for feature reduction. RESULTS The prediction accuracy (Acc) reached 98.38%, the recall rate of the positive samples (Sn) was 99.27%, the recall rate of the negative samples (Sp) reached 97.54%, and the MCC value was 0.9678. To further evaluate the proposed method, its predictive performance on 20 antioxidant proteins from the National Center for Biotechnology Information (NCBI) was studied; all 20 antioxidant proteins were correctly predicted. Experimental results demonstrate that the method performs better than the state-of-the-art methods for identification of antioxidant proteins. CONCLUSION We collected experimental protein data from UniProt, including 253 antioxidant proteins and 1552 non-antioxidant proteins. The optimal feature set used in this paper is composed of amino acid composition and 9-gap dipeptide composition. The proteins are classified by a support vector machine, and the model evaluation indices are obtained based on 5-fold cross-validation. Comparison with existing classification models further shows that the SVM recognition model constructed in this paper is helpful for the recognition of antioxidant proteins.
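The four evaluation indices reported above (Acc, Sn, Sp, MCC) all follow from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation from confusion counts."""
    sn = tp / (tp + fn)                      # recall on positive (antioxidant) samples
    sp = tn / (tn + fp)                      # recall on negative samples
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

For example, `binary_metrics(tp=90, fn=10, tn=80, fp=20)` gives Sn = 0.9, Sp = 0.8 and Acc = 0.85. On a dataset as imbalanced as the one described (253 positives against 1552 negatives), accuracy alone can look high even when sensitivity is poor, which is why Sn and MCC are worth reporting alongside it.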
Affiliation(s)
- Yuanke Xu
  - School of Mathematics and Computational Science, Xiangtan University, Hunan, China
- Yaping Wen
  - School of Mathematics and Computational Science, Xiangtan University, Hunan, China
- Guosheng Han
  - School of Mathematics and Computational Science, Xiangtan University, Hunan, China
|
12
|
Concu R, Cordeiro MNDS. Alignment-Free Method to Predict Enzyme Classes and Subclasses. Int J Mol Sci 2019; 20:5389. [PMID: 31671806 PMCID: PMC6862210 DOI: 10.3390/ijms20215389] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/09/2019] [Revised: 10/21/2019] [Accepted: 10/23/2019] [Indexed: 01/03/2023] Open
Abstract
The Enzyme Classification (EC) number is a numerical classification scheme for enzymes, established using the chemical reactions they catalyze. This classification is based on the recommendation of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Six enzyme classes were recognised in the first Enzyme Classification and Nomenclature List, reported by the International Union of Biochemistry in 1961. However, a new enzyme group was recently added as the six existing EC classes could not describe enzymes involved in the movement of ions or molecules across membranes. Such enzymes are now classified in the new EC class of translocases (EC 7). Several computational methods have been developed in order to predict the EC number. However, due to this new change, all such methods are now outdated and need updating. In this work, we developed a new multi-task quantitative structure-activity relationship (QSAR) method aimed at predicting all 7 EC classes and subclasses. In so doing, we developed an alignment-free model based on artificial neural networks that proved to be very successful.
Affiliation(s)
- Riccardo Concu
  - LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- M Natália D S Cordeiro
  - LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
|
13
|
Meng C, Jin S, Wang L, Guo F, Zou Q. AOPs-SVM: A Sequence-Based Classifier of Antioxidant Proteins Using a Support Vector Machine. Front Bioeng Biotechnol 2019; 7:224. [PMID: 31620433 PMCID: PMC6759716 DOI: 10.3389/fbioe.2019.00224] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 07/21/2019] [Accepted: 09/03/2019] [Indexed: 01/03/2023] Open
Abstract
Antioxidant proteins play important roles in countering oxidative damage in organisms. Because it is time-consuming and has a high cost, the accurate identification of antioxidant proteins using biological experiments is a challenging task. For these reasons, we proposed a model using machine-learning algorithms that we named AOPs-SVM, which was developed based on sequence features and a support vector machine. Using a testing dataset, we conducted a jackknife cross-validation test with the proposed AOPs-SVM classifier and obtained 0.68 in sensitivity, 0.985 in specificity, 0.942 in average accuracy, 0.741 in MCC, and 0.832 in AUC. This outperformed existing classifiers. The experiment results demonstrate that the AOPs-SVM is an effective classifier and contributes to the research related to antioxidant proteins. A web server was built at http://server.malab.cn/AOPs-SVM/index.jsp to provide open access.
Affiliation(s)
- Chaolu Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Shunshan Jin
  - Department of Neurology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
14
|
Butt AH, Rasool N, Khan YD. Prediction of antioxidant proteins by incorporating statistical moments based features into Chou's PseAAC. J Theor Biol 2019; 473:1-8. [DOI: 10.1016/j.jtbi.2019.04.019] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 01/04/2019] [Revised: 04/02/2019] [Accepted: 04/16/2019] [Indexed: 12/23/2022]
|
15
|
Concu R, D. S. Cordeiro MN, Munteanu CR, González-Díaz H. PTML Model of Enzyme Subclasses for Mining the Proteome of Biofuel Producing Microorganisms. J Proteome Res 2019; 18:2735-2746. [DOI: 10.1021/acs.jproteome.8b00949] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]
Affiliation(s)
- Riccardo Concu
  - LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- M. Natália. D. S. Cordeiro
  - LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Cristian R. Munteanu
  - RNASA-IMEDIR, Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain
  - INIBIC Biomedical Research Institute of Coruña, CHUAC University Hospital, 15006 A Coruña, Spain
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Biscay, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain
|
16
|
Liu Y, Munteanu CR, Kong Z, Ran T, Sahagún-Ruiz A, He Z, Zhou C, Tan Z. Identification of coenzyme-binding proteins with machine learning algorithms. Comput Biol Chem 2019; 79:185-192. [PMID: 30851647 DOI: 10.1016/j.compbiolchem.2019.01.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/02/2018] [Revised: 09/11/2018] [Accepted: 01/25/2019] [Indexed: 01/12/2023]
Abstract
The coenzyme-binding proteins play a vital role in the cellular metabolism processes, such as fatty acid biosynthesis, enzyme and gene regulation, lipid synthesis, particular vesicular traffic, and β-oxidation donation of acyl-CoA esters. Based on the theory of Star Graph Topological Indices (SGTIs) of protein primary sequences, we proposed a method to develop a first classification model for predicting proteins with coenzyme-binding properties. To simulate the properties of coenzyme-binding proteins, we created a dataset containing 2897 proteins, of which 456 have coenzyme-binding activity. The SGTIs of the peptide sequences were calculated with the Sequence to Star Network (S2SNet) application. We used the SGTIs as inputs to several classification techniques in the machine learning software Weka. A Random Forest classifier based on 3 features of the embedded and non-embedded graphs was identified as the best predictive model for coenzyme-binding proteins. This model achieved a true positive (TP) rate of 91.7%, a false positive (FP) rate of 7.6%, and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.971. The prediction of new coenzyme-binding proteins using this model could be useful for further drug development or enzyme metabolism research.
Affiliation(s)
- Yong Liu
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China
- Cristian R Munteanu
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain; Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña, 15006, Spain
- Zhiwei Kong
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; University of the Chinese Academy of Sciences, Beijing, 100049, PR China
- Tao Ran
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, Alberta, T1J 4B1, Canada
- Alfredo Sahagún-Ruiz
- Department of Microbiology and Immunology, Faculty of Veterinary Medicine and Animal Science, National Autonomous University of Mexico, Universidad 3000, Copilco Coyoacán, CP 04510, México D.F., Mexico
- Zhixiong He
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China.
- Chuanshe Zhou
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China
- Zhiliang Tan
- Key Laboratory for Agro-Ecological Processes in Subtropical Region, National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, South Central Experimental Station of Animal Nutrition and Feed Science in the Ministry of Agriculture, Institute of Subtropical Agriculture, The Chinese Academy of Sciences, Changsha, Hunan, 410125, PR China; Hunan Co-Innovation Center of Animal Production Safety, CICAPS, Changsha, Hunan, 410128, PR China
|
17
|
Blanco JL, Porto-Pazos AB, Pazos A, Fernandez-Lozano C. Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection. Sci Rep 2018; 8:15688. [PMID: 30356060 PMCID: PMC6200741 DOI: 10.1038/s41598-018-33911-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/29/2018] [Accepted: 10/06/2018] [Indexed: 12/22/2022] Open
Abstract
Screening and in silico modeling are critical activities for the reduction of experimental costs. They also speed up research notably and strengthen the theoretical framework, thus allowing researchers to numerically quantify the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and merged them to obtain baseline results for the performance of the models, particularly of each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features, which are highly present in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), we statistically and significantly outperformed the best previously published anti-angiogenic model with a generalized linear model via coordinate descent (glmnet), achieving a mean AUC value greater than 0.96 and an accuracy of 0.86 with 200 molecular descriptors, mixed from the three groups. A final analysis with the top-40 discriminative anti-angiogenic activity peptides is presented along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG.
Affiliation(s)
- Jose Liñares Blanco
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain
- Ana B Porto-Pazos
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
- Alejandro Pazos
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
- Carlos Fernandez-Lozano
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
|
18
|
Xu L, Liang G, Shi S, Liao C. SeqSVM: A Sequence-Based Support Vector Machine Method for Identifying Antioxidant Proteins. Int J Mol Sci 2018; 19:ijms19061773. [PMID: 29914044 PMCID: PMC6032279 DOI: 10.3390/ijms19061773] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 05/21/2018] [Revised: 06/10/2018] [Accepted: 06/11/2018] [Indexed: 12/20/2022] Open
Abstract
Antioxidant proteins can be beneficial in disease prevention. More attention has been paid to the functionality of antioxidant proteins. Therefore, identifying antioxidant proteins is important for the study. In our work, we propose a computational method, called SeqSVM, for predicting antioxidant proteins based on their primary sequence features. Redundant features are removed by the max relevance max distance method. Finally, the antioxidant proteins are identified by a support vector machine (SVM). The experimental results demonstrated that our method performs better than existing methods, with an overall accuracy of 89.46%. Although the proposed computational method can attain an encouraging classification result, the experimental results are verified based on biochemical approaches, such as wet biochemistry and molecular biology techniques.
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518060, China.
- Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518060, China.
- Shuhua Shi
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518060, China.
- Changrui Liao
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China.
|
19
|
González-Durruthy M, Monserrat JM, Rasulev B, Casañola-Martín GM, Barreiro Sorrivas JM, Paraíso-Medina S, Maojo V, González-Díaz H, Pazos A, Munteanu CR. Carbon Nanotubes' Effect on Mitochondrial Oxygen Flux Dynamics: Polarography Experimental Study and Machine Learning Models using Star Graph Trace Invariants of Raman Spectra. Nanomaterials 2017; 7:nano7110386. [PMID: 29137126 PMCID: PMC5707603 DOI: 10.3390/nano7110386] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/07/2017] [Revised: 11/06/2017] [Accepted: 11/08/2017] [Indexed: 11/16/2022]
Abstract
This study presents the impact of carbon nanotubes (CNTs) on mitochondrial oxygen mass flux (Jm) under three experimental conditions. New experimental results and a new methodology are reported for the first time and they are based on CNT Raman spectra star graph transform (spectral moments) and perturbation theory. The experimental measures of Jm showed that no tested CNT family can inhibit the oxygen consumption profiles of mitochondria. The best model for the prediction of Jm for other CNTs was provided by random forest using eight features, obtaining test R-squared (R2) of 0.863 and test root-mean-square error (RMSE) of 0.0461. The results demonstrate the capability of encoding CNT information into spectral moments of the Raman star graphs (SG) transform with a potential applicability as predictive tools in nanotechnology and material risk assessments.
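The abstract above reports a test R-squared and a test RMSE for a regression model. As a minimal sketch of how these two scores are usually computed from observed and predicted values, the following uses the common textbook definitions; the function names and the toy numbers are illustrative assumptions and are not taken from the cited study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values chosen only for illustration (not data from the paper).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Both scores should be computed on a held-out test subset, as the abstract does, because training-set values tend to overstate model quality.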
Affiliation(s)
- Michael González-Durruthy
- Institute of Biological Science (ICB), Federal University of Rio Grande, Rio Grande, RS 96270-900, Brazil.
- Jose M Monserrat
- Institute of Biological Science (ICB), Federal University of Rio Grande, Rio Grande, RS 96270-900, Brazil.
- Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University (NDSU), Fargo, ND 58102, USA.
- José María Barreiro Sorrivas
- Computer Science School (ETSIINF), Polytechnic University of Madrid (UPM), Calle de los Ciruelos, Boadilla del Monte, 28660 Madrid, Spain.
- Sergio Paraíso-Medina
- Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, 28660 Madrid, Spain.
- Víctor Maojo
- Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, 28660 Madrid, Spain.
- Humberto González-Díaz
- Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Biscay, Spain.
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain.
- Alejandro Pazos
- INIBIC Institute of Biomedical Research, CHUAC, UDC, 15006 Coruña, Spain.
- RNASA-IMEDIR, Computer Sciences Faculty, University of Coruña, 15071 Coruña, Spain.
- Cristian R Munteanu
- INIBIC Institute of Biomedical Research, CHUAC, UDC, 15006 Coruña, Spain.
- RNASA-IMEDIR, Computer Sciences Faculty, University of Coruña, 15071 Coruña, Spain.
|
20
|
González-Durruthy M, Alberici LC, Curti C, Naal Z, Atique-Sawazaki DT, Vázquez-Naya JM, González-Díaz H, Munteanu CR. Experimental-Computational Study of Carbon Nanotube Effects on Mitochondrial Respiration: In Silico Nano-QSPR Machine Learning Models Based on New Raman Spectra Transform with Markov-Shannon Entropy Invariants. J Chem Inf Model 2017; 57:1029-1044. [PMID: 28414908 DOI: 10.1021/acs.jcim.6b00458] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/19/2022]
Abstract
The study of selective toxicity of carbon nanotubes (CNTs) on mitochondria (CNT-mitotoxicity) is of major interest for future biomedical applications. In the current work, the mitochondrial oxygen consumption (E3) is measured under three experimental conditions by exposure to pristine and oxidized CNTs (hydroxylated and carboxylated). Respiratory functional assays showed that the information on the CNT Raman spectroscopy could be useful to predict structural parameters of mitotoxicity induced by CNTs. The in vitro functional assays show that the mitochondrial oxidative phosphorylation by ATP-synthase (or state V3 of respiration) was not perturbed in isolated rat-liver mitochondria. For the first time a star graph (SG) transform of the CNT Raman spectra is proposed in order to obtain the raw information for a nano-QSPR model. Box-Jenkins and perturbation theory operators are used for the SG Shannon entropies. A modified RRegrs methodology is employed to test four regression methods such as multiple linear regression (LM), partial least squares regression (PLS), neural networks regression (NN), and random forest (RF). RF provides the best models to predict the mitochondrial oxygen consumption in the presence of specific CNTs with R2 of 0.998-0.999 and RMSE of 0.0068-0.0133 (training and test subsets). This work is aimed at demonstrating that the SG transform of Raman spectra is useful to encode CNT information, similarly to the SG transform of the blood proteome spectra in cancer or electroencephalograms in epilepsy and also as a prospective chemoinformatics tool for nanorisk assessment. All data files and R object models are available at https://dx.doi.org/10.6084/m9.figshare.3472349 .
Affiliation(s)
- José M Vázquez-Naya
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071 A Coruña, Spain
- Humberto González-Díaz
- Department of Organic Chemistry II, Faculty of Science and Technology, University of the Basque Country UPV/EHU, 48940, Leioa, Bizkaia, Spain; IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Bizkaia, Spain
- Cristian R Munteanu
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071 A Coruña, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), A Coruña, 15006, Spain
|
21
|
Hamzeh-Mivehroud M, Sokouti B, Dastmalchi S. An Introduction to the Basic Concepts in QSAR-Aided Drug Design. Oncology 2017. [DOI: 10.4018/978-1-5225-0549-5.ch002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The need for the development of new drugs to combat existing and newly identified conditions is unavoidable. One of the important tools used in the advanced drug development pipeline is computer-aided drug design. Traditionally, to find a drug many ligands were synthesized and evaluated for their effectiveness using suitable bioassays and if all other drug-likeness features were met, the candidate(s) would possibly reach the market. Although this approach is still in use in advanced format, computational methods are an indispensable component of modern drug development projects. One of the methods used from very early days of rationalizing the drug design approaches is Quantitative Structure-Activity Relationship (QSAR). This chapter overviews QSAR modeling steps by introducing molecular descriptors, mathematical model development for relating biological activities to molecular structures, and model validation. At the end, several successful cases where QSAR studies were used extensively are presented.
Affiliation(s)
- Siavoush Dastmalchi
- Biotechnology Research Center, Tabriz University of Medical Sciences, Iran & School of Pharmacy, Tabriz University of Medical Sciences, Iran
|
22
|
Fernandez-Lozano C, Cuiñas RF, Seoane JA, Fernández-Blanco E, Dorado J, Munteanu CR. Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models. J Theor Biol 2015; 384:50-8. [DOI: 10.1016/j.jtbi.2015.07.038] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/23/2015] [Revised: 07/20/2015] [Accepted: 07/27/2015] [Indexed: 12/11/2022]
|
23
|
Identifying Antioxidant Proteins by Using Optimal Dipeptide Compositions. Interdiscip Sci 2015; 8:186-191. [DOI: 10.1007/s12539-015-0124-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/26/2015] [Revised: 06/29/2015] [Accepted: 08/29/2015] [Indexed: 11/26/2022]
|
24
|
Liu Y, Munteanu CR, Fernández Blanco E, Tan Z, Santos Del Riego A, Pazos A. Prediction of Nucleotide Binding Peptides Using Star Graph Topological Indices. Mol Inform 2015; 34:736-41. [PMID: 27491034 DOI: 10.1002/minf.201500064] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/29/2015] [Accepted: 07/06/2015] [Indexed: 01/14/2023]
Abstract
The nucleotide binding proteins are involved in many important cellular processes, such as transmission of genetic information or energy transfer and storage. Therefore, the screening of new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model that is able to predict new nucleotide binding peptides, using only the amino acid sequence. Thus, the methodology uses a Star graph molecular descriptor of the peptide sequences and the Machine Learning technique for the best classifier. The best model represents a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent, considering similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) value of 0.938 and true positive rate (TPR) of 0.886 (test subset). The prediction of new nucleotide binding peptides with this model could be useful for drug target studies in drug development.
Affiliation(s)
- Yong Liu
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160; Faculty of Veterinary Medicine and Animal Science, Autonomous University of the State of Mexico, Toluca, 50090, México; Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
- Cristian R Munteanu
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Enrique Fernández Blanco
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Zhiliang Tan
- Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
- Antonino Santos Del Riego
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Alejandro Pazos
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
|
25
|
Hamzeh-Mivehroud M, Sokouti B, Dastmalchi S. An Introduction to the Basic Concepts in QSAR-Aided Drug Design. Quantitative Structure-Activity Relationships in Drug Design, Predictive Toxicology, and Risk Assessment 2015. [DOI: 10.4018/978-1-4666-8136-1.ch001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
The need for the development of new drugs to combat existing and newly identified conditions is unavoidable. One of the important tools used in the advanced drug development pipeline is computer-aided drug design. Traditionally, to find a drug many ligands were synthesized and evaluated for their effectiveness using suitable bioassays and if all other drug-likeness features were met, the candidate(s) would possibly reach the market. Although this approach is still in use in advanced format, computational methods are an indispensable component of modern drug development projects. One of the methods used from very early days of rationalizing the drug design approaches is Quantitative Structure-Activity Relationship (QSAR). This chapter overviews QSAR modeling steps by introducing molecular descriptors, mathematical model development for relating biological activities to molecular structures, and model validation. At the end, several successful cases where QSAR studies were used extensively are presented.
Affiliation(s)
- Maryam Hamzeh-Mivehroud
- Biotechnology Research Center & School of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
- Babak Sokouti
- Biotechnology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Siavoush Dastmalchi
- Biotechnology Research Center & School of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
|
26
|
Feng PM, Lin H, Chen W. Identification of antioxidants from sequence information using naïve Bayes. Computational and Mathematical Methods in Medicine 2013; 2013:567529. [PMID: 24062796 PMCID: PMC3766563 DOI: 10.1155/2013/567529] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 05/03/2013] [Revised: 07/20/2013] [Accepted: 07/22/2013] [Indexed: 12/22/2022]
Abstract
Antioxidant proteins are substances that protect cells from the damage caused by free radicals. Accurate identification of new antioxidant proteins is important in understanding their roles in delaying aging. Therefore, it is highly desirable to develop computational methods to identify antioxidant proteins. In this study, a Naïve Bayes-based method was proposed to predict antioxidant proteins using amino acid compositions and dipeptide compositions. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 66.88% for the discrimination between antioxidant and nonantioxidant proteins, which is superior to that of other state-of-the-art classifiers. These results suggest that the proposed method could be an effective and promising high-throughput method for antioxidant protein identification.
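The abstract above names amino acid compositions and dipeptide compositions as the input features. A minimal sketch of how such features are commonly derived from a protein sequence follows; the function names, the choice to skip non-standard residues, and the example sequence are assumptions made for illustration and do not reproduce the authors' implementation or their feature selection step.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue (20 features)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues)
    return {
        aa: (residues.count(aa) / total if total else 0.0)
        for aa in AMINO_ACIDS
    }


def dipeptide_composition(sequence):
    """Return the fraction of each adjacent residue pair (400 features).

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical toy sequence, used only to show the call pattern.
features = {**amino_acid_composition("ACCA"), **dipeptide_composition("ACCA")}
print(len(features))  # 20 + 400 feature values
```

Concatenating the two dictionaries gives a 420-dimensional vector per sequence, which is the kind of input a Naïve Bayes classifier could then be trained on after feature selection.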
Affiliation(s)
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
- Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
|
27
|
Scerbo M, Radhakrishnan H, Cotton B, Dua A, Del Junco D, Wade C, Holcomb JB. Prehospital triage of trauma patients using the Random Forest computer algorithm. J Surg Res 2013; 187:371-6. [PMID: 24484906 DOI: 10.1016/j.jss.2013.06.037] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/04/2013] [Revised: 06/14/2013] [Accepted: 06/19/2013] [Indexed: 10/26/2022]
Abstract
BACKGROUND Overtriage not only wastes resources but also displaces the patient from their community and causes delay of treatment for the more seriously injured. This study aimed to validate the Random Forest computer model (RFM) as means of better triaging trauma patients to level 1 trauma centers. METHODS Adult trauma patients with "medium activation" presenting via helicopter to a level 1 trauma center from May 2007 to May 2009 were included. The "medium activation" trauma patient is alert and hemodynamically stable on scene but has either subnormal vital signs or accumulation of risk factors that may indicate a potentially serious injury. Variables included in the RFM analysis were demographics, mechanism of injury, prehospital fluid, medications, vitals, and disposition. Statistical analysis was performed via the Random Forest algorithm to compare our institutional triage rate to rates determined by the RFM. RESULTS A total of 1653 patients were included in this study, of which 496 were used in the testing set of the RFM. In our testing set, 33.8% of patients brought to our level 1 trauma center could have been managed at a level 3 trauma center, and 88% of patients who required a level 1 trauma center were identified correctly. In the testing set, there was an overtriage rate of 66%, whereas using the RFM, we decreased the overtriage rate to 42% (P < 0.001). There was an undertriage rate of 8.3%. The RFM predicted patient disposition with a sensitivity of 89%, specificity of 42%, negative predictive value of 92%, and positive predictive value of 34%. CONCLUSIONS Although prospective validation is required, it appears that computer modeling potentially could be used to guide triage decisions, allowing both more accurate triage and more efficient use of the trauma system.
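The RESULTS above report sensitivity, specificity, negative predictive value, and positive predictive value. As a minimal sketch, these four quantities follow from the standard confusion-matrix definitions shown below; the function name and the counts in the usage line are hypothetical, since the abstract does not give the underlying counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: positives correctly flagged     fn: positives missed
    tn: negatives correctly cleared     fp: negatives wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A pattern of high sensitivity with low specificity, as in the abstract, means few seriously injured patients are missed at the cost of many false alarms, which is why the predictive values depend strongly on how common serious injury is in the cohort.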
Affiliation(s)
- Michelle Scerbo
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas
- Hari Radhakrishnan
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas
- Bryan Cotton
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas
- Anahita Dua
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas
- Deborah Del Junco
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas
- Charles Wade
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas
- John B Holcomb
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research (CeTIR), University of Texas-Houston, Houston, Texas.
|