1
|
Ghosh V, Bhattacharjee A, Kumar A, Ojha PK. q-RASTR modelling for prediction of diverse toxic chemicals towards T. pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2024; 35:11-30. [PMID: 38193248 DOI: 10.1080/1062936x.2023.2298452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 12/16/2023] [Indexed: 01/10/2024]
Abstract
A series of diverse organic compounds impose serious detrimental effects on the health of living organisms and the environment. Determination of the structural aspects of compounds that impart toxicity and evaluation of the same is crucial before public usage. The present study aims to determine the structural characteristics of compounds for Tetrahymena pyriformis toxicity using the q-RASTR (Quantitative Read Across Structure-Toxicity Relationship) model. It was developed using RASTR and 2-D descriptors for a dataset of 1792 compounds with defined endpoint (pIGC50) against a model organism, T. pyriformis. For the current study, the whole dataset was divided based on activity/property into the training and test sets, and the q-RASTR model was developed employing six descriptors (three latent variables) having r2, Q2F1 and Q2 values of 0.739, 0.767, and 0.735, respectively. The generated model was thoroughly validated using internationally recognized internal and external validation criteria to assess the model's dependability and predictability. It was highlighted that high molecular weight, aromatic hydroxyls, nitrogen, double bonds, and hydrophobicity increase the toxicity of organic compounds. The current study demonstrates the applicability of the RASTR algorithm in QSTR model development for the prediction of toxic chemicals (pIGC50) towards T. pyriformis.
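As an illustration of the statistics quoted in this abstract, the following is a minimal Python sketch of how r2 and the external Q2F1 metric are commonly computed. It assumes the standard definitions (Q2F1 compares test-set residuals with deviations from the training-set mean); the function names are hypothetical and nothing here is taken from the paper itself.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def q2_f1(y_test, y_test_pred, y_train):
    """External Q2F1: test-set squared residuals scaled by the squared
    deviations of the test responses from the TRAINING-set mean."""
    train_mean = sum(y_train) / len(y_train)
    press = sum((t - p) ** 2 for t, p in zip(y_test, y_test_pred))
    ss_ext = sum((t - train_mean) ** 2 for t in y_test)
    return 1.0 - press / ss_ext
```

Because Q2F1 uses the training mean as its baseline, it can differ noticeably from a plain r2 computed on the test set when the two sets have different mean responses.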
Affiliation(s)
- V Ghosh
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- A Bhattacharjee
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- A Kumar
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- P K Ojha
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
|
2
|
Li X, Huang J, Chen R, You Z, Peng J, Shi Q, Li G, Liu F. Chromium in soil detection using adaptive weighted normalization and linear weighted network framework for LIBS matrix effect reduction. JOURNAL OF HAZARDOUS MATERIALS 2023; 448:130885. [PMID: 36738619 DOI: 10.1016/j.jhazmat.2023.130885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 01/12/2023] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
Rapid and accurate detection of chromium (Cr) in agricultural soil is of great significance for soil pollution assessment. Laser-induced breakdown spectroscopy (LIBS) can serve as a rapid, chemical-free alternative to conventional chemical methods for hazardous metal analysis. However, LIBS detection is hampered by measurement uncertainty and the matrix effect. In this study, an averaging strategy combined with a linear weighted network (LWNet) was proposed to reduce the uncertainty, and an adaptive weighted normalization-LWNet (AWN-LWNet) framework was proposed to reduce the matrix effect across two soil types. The results indicated that LWNet outperformed traditional machine learning and achieved average relative errors (ARE) of 2.08% and 3.03% for yellow brown soil and lateritic red soil, respectively. Moreover, LWNet could effectively mine Cr feature peaks even at low spectral resolution. Among the models commonly used to reduce the matrix effect, AWN-LWNet performed best (ARE = 4.12%). AWN-LWNet also greatly reduced the number of spectral variables used as model input (from 22016 to 72). Extracting the Cr peaks from the models made the differences in Cr peak intensity directly observable, which provided a spectral interpretation of the matrix effect reduction. The two methods have the potential to enable the detection of hazardous metals in soil by LIBS.
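The abstract reports model quality as an average relative error (ARE) without defining it. A minimal Python sketch follows, assuming ARE is the mean of |predicted - reference| / |reference| expressed as a percentage; this definition and the function name are assumptions, not details confirmed by the paper.

```python
def average_relative_error(reference, predicted):
    """Mean absolute relative error in percent (assumed ARE definition)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each prediction against its reference concentration.
    rel_errors = [abs(p - r) / abs(r) for r, p in zip(reference, predicted)]
    return 100.0 * sum(rel_errors) / len(rel_errors)
```

Under this definition, predictions of 98 and 206 against reference values of 100 and 200 have relative errors of 2% and 3%, giving an ARE of 2.5%.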
Affiliation(s)
- Xiaolong Li
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
- Jing Huang
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
- Rongqin Chen
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
- Zhengkai You
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
- Jiyu Peng
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Qingcai Shi
- QIN Soil Testing Laboratory (Shandong) Co., Ltd, Shidanli Road, Linshu 276700, China
- Gang Li
- CAS Key Lab of Urban Environment and Health, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; Jiaxing Key Lab of Soil Health, Yangtze Delta Region Healthy Agriculture Institute, Jiaxing 314503, China
- Fei Liu
- College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China.
|
3
|
Jia Q, Wang S, Yu M, Wang Q, Yan F. Two QSAR models for predicting the toxicity of chemicals towards Tetrahymena pyriformis based on topological-norm descriptors and spatial-norm descriptors. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2023; 34:147-161. [PMID: 36749040 DOI: 10.1080/1062936x.2023.2171478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/28/2022] [Accepted: 01/17/2023] [Indexed: 06/18/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modelling is important for safe, rapid and effective risk assessment of chemicals. In this study, two QSAR models were established with 1230 chemicals to predict toxicity towards Tetrahymena pyriformis using the multiple linear regression (MLR) method. The topological (T)-QSAR model was developed using topological-norm descriptors generated from the topological structure, and the spatial (S)-QSAR model was built with spatial-norm descriptors obtained from the three-dimensional structure of molecules together with topological-norm descriptors. The r2 values for the training and test sets are 0.8304 and 0.8338 for the T-QSAR model, and 0.8485 and 0.8585 for the S-QSAR model, which indicates that both models can be used to predict toxicity quickly and accurately. The developed models were also validated. Satisfactory validation results and statistical parameters demonstrated that QSAR models based on the topological-norm descriptors and spatial-norm descriptors proposed in this paper could be further utilized to estimate the toxicity of chemicals towards Tetrahymena pyriformis.
Affiliation(s)
- Q Jia
- School of Marine and Environmental Science, Tianjin Marine Environmental Protection and Restoration Technology Engineering Center, Tianjin University of Science and Technology, Tianjin, PR China
- S Wang
- School of Marine and Environmental Science, Tianjin Marine Environmental Protection and Restoration Technology Engineering Center, Tianjin University of Science and Technology, Tianjin, PR China
- M Yu
- School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin, PR China
- Q Wang
- School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin, PR China
- F Yan
- School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin, PR China
|
4
|
Tu K, Wen S, Cheng Y, Xu Y, Pan T, Hou H, Gu R, Wang J, Wang F, Sun Q. A model for genuineness detection in genetically and phenotypically similar maize variety seeds based on hyperspectral imaging and machine learning. PLANT METHODS 2022; 18:81. [PMID: 35690826 PMCID: PMC9188178 DOI: 10.1186/s13007-022-00918-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/12/2022] [Accepted: 05/31/2022] [Indexed: 05/24/2023]
Abstract
BACKGROUND Variety genuineness and purity are essential indices of maize seed quality that affect yield. However, detection methods for variety genuineness are time-consuming, expensive, require extensive training, or destroy the seeds in the process. Here, we present an accurate, high-throughput, cost-effective, and non-destructive method for screening variety genuineness that uses seed phenotype data with machine learning to distinguish between genetically and phenotypically similar seed varieties. Specifically, we obtained image data of seed morphology and hyperspectral reflectance for Jingke 968 and nine other closely-related varieties (non-Jingke 968). We then compared the robustness of three common machine learning algorithms in distinguishing these varieties based on the phenotypic imaging data. RESULTS Our results showed that hyperspectral imaging (HSI) combined with a multilayer perceptron (MLP) or support vector machine (SVM) model could distinguish Jingke 968 from varieties that differed by as few as two loci, with a 99% or higher accuracy, while machine vision imaging provided ~ 90% accuracy. Through model validation and updating with varieties not included in the training data, we developed a genuineness detection model for Jingke 968 that effectively discriminated between genetically similar and distant varieties. CONCLUSIONS This strategy has potential for wide adoption in large-scale variety genuineness detection operations for internal quality control or governmental regulatory agencies, or for accelerating the breeding of new varieties. Besides, it could easily be extended to other target varieties and other crops.
Affiliation(s)
- Keling Tu
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Shaozhe Wen
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing Vegetable Research Center, Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, People's Republic of China
- Ying Cheng
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Yanan Xu
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Tong Pan
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Haonan Hou
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Riliang Gu
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Jianhua Wang
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China
- Fengge Wang
- Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Maize Research Center, Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, People's Republic of China.
- Qun Sun
- Department of Plant Genetics & Breeding and Seed Science, College of Agronomy and Biotechnology, Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University/The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research, Beijing, 100193, People's Republic of China.
|
5
|
Xu M, Yang H, Liu G, Tang Y, Li W. In Silico Prediction of Chemical Aquatic Toxicity by Multiple Machine Learning and Deep Learning Approaches. J Appl Toxicol 2022; 42:1766-1776. [PMID: 35653511 DOI: 10.1002/jat.4354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 05/16/2022] [Accepted: 05/31/2022] [Indexed: 11/08/2022]
Abstract
Fish are among the model animals used to evaluate the adverse effects of chemicals released into the ecosystem. However, the low throughput and relatively high expense of fish testing make it impossible to test all new chemicals in manufacture. Hence, using in silico models to prioritize compounds for testing has been widely applied in environmental risk assessment and drug discovery. In this study, we constructed local predictive models for four fish species (bluegill sunfish, rainbow trout, fathead minnow, and sheepshead minnow) and global models with the data for all four fish. A total of 1874 unique compounds with their labels, i.e. toxic (LC50 < 10 ppm) or nontoxic, were collected from ECOTOX and the literature. Both conventional machine learning methods and a deep learning architecture, the graph convolutional network (GCN), were used to build predictive models. The classification accuracy of the best local model for each fish species was higher than 0.83. For the global models, two strategies, consistency prediction and probability threshold, were adopted to improve the predictive capability at the cost of limiting the applicability domain. For the 63% of compounds in domain, the accuracy was around 0.97. By comparing the deep learning and machine learning methods, we found that the single-task GCN showed specific advantages in performance, whereas the multi-task GCN showed no advantages over the conventional machine learning methods. The data and models are available on GitHub (https://github.com/ChemPredict/ChemicalAquaticToxicity).
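The labelling rule and the two global-model strategies mentioned in this abstract can be sketched roughly as follows. This is a hypothetical Python illustration only: the function names, the 0.5 voting cutoff, and the 0.8 confidence threshold are assumptions, and the authors' actual implementation (see their GitHub repository) may differ.

```python
def label_from_lc50(lc50_ppm, cutoff_ppm=10.0):
    """Binary label from the abstract's rule: toxic if LC50 < 10 ppm."""
    return "toxic" if lc50_ppm < cutoff_ppm else "nontoxic"


def consensus_predict(toxic_probabilities, threshold=0.8):
    """Combine several models' P(toxic) values for one compound.

    Returns a label only when all models agree (consistency) and every
    model is at least `threshold` confident (probability threshold).
    Returns None when the compound is treated as out of domain.
    """
    if not toxic_probabilities:
        raise ValueError("at least one model probability is required")
    votes = [p >= 0.5 for p in toxic_probabilities]
    if all(votes) and min(toxic_probabilities) >= threshold:
        return "toxic"
    if not any(votes) and max(toxic_probabilities) <= 1.0 - threshold:
        return "nontoxic"
    return None  # models disagree or are not confident enough
```

Raising the threshold in such a scheme would typically trade coverage for accuracy, which is the trade-off the abstract describes as limiting the applicability domain.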
Affiliation(s)
- Minjie Xu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
|
6
|
Yoosefzadeh-Najafabadi M, Eskandari M, Torabi S, Torkamaneh D, Tulpan D, Rajcan I. Machine-Learning-Based Genome-Wide Association Studies for Uncovering QTL Underlying Soybean Yield and Its Components. Int J Mol Sci 2022; 23:5538. [PMID: 35628351 PMCID: PMC9141736 DOI: 10.3390/ijms23105538] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/05/2022] [Revised: 05/11/2022] [Accepted: 05/13/2022] [Indexed: 12/14/2022] Open
Abstract
A genome-wide association study (GWAS) is currently one of the most recommended approaches for discovering marker-trait associations (MTAs) for complex traits in plant species. Conventional GWAS methods suffer from insufficient statistical power, especially in species with a narrow genetic base. Using sophisticated mathematical methods such as machine learning (ML) algorithms may address this issue and advance the use of this valuable genetic method in applied plant-breeding programs. In this study, we evaluated the potential use of two ML algorithms, support vector regression (SVR) and random forest (RF), in a GWAS and compared them with two conventional methods, mixed linear models (MLM) and fixed and random model circulating probability unification (FarmCPU), for identifying MTAs for soybean-yield components. Important soybean-yield component traits, including the number of reproductive nodes (RNP), non-reproductive nodes (NRNP), total nodes (NP), and total pods (PP) per plant, along with yield and maturity, were assessed using a panel of 227 soybean genotypes evaluated at two locations over two years (four environments). Using the SVR-mediated GWAS method, we were able to discover MTAs colocalized with previously reported quantitative trait loci (QTL) with potential causal effects on the target traits, supported by the functional annotation of candidate gene analyses. This study demonstrated the potential benefit of using sophisticated mathematical approaches, such as SVR, in a GWAS to complement conventional GWAS methods for identifying MTAs that can improve the efficiency of genomic-based soybean-breeding programs.
Affiliation(s)
- Milad Eskandari
- Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada; (M.Y.-N.); (S.T.); (I.R.)
- Sepideh Torabi
- Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada; (M.Y.-N.); (S.T.); (I.R.)
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC G1V 0A6, Canada;
- Dan Tulpan
- Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada;
- Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada; (M.Y.-N.); (S.T.); (I.R.)
|
7
|
Kaneko H. Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables. Heliyon 2021; 7:e07356. [PMID: 34195450 PMCID: PMC8237311 DOI: 10.1016/j.heliyon.2021.e07356] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/22/2021] [Revised: 05/02/2021] [Accepted: 06/16/2021] [Indexed: 11/24/2022] Open
Abstract
The selection of descriptors, X, is crucial for improving the interpretation and prediction accuracy of a regression model. In this study, variable (feature) selection methods were evaluated in two ways: by the prediction accuracy of models constructed using the selected X, and by the selection results themselves, namely the number of selected X and the number of selected variables that are unrelated to the objective variable y (e.g., an activity or property). The variable selection methods examined were least absolute shrinkage and selection operator, genetic algorithm-based partial least squares, genetic algorithm-based support vector regression, and Boruta. Several regression analysis methods were used to test the prediction accuracy of the models constructed using the selected X. The characteristics of each variable selection method were analyzed using eight datasets. The results showed that even when variables unrelated to y were selected by variable selection and the number of unrelated variables was the same as the number of the original variables, a regression model with good accuracy, which ignores the influence of such noise variables, can be constructed by applying various regression analysis methods. Additionally, the variables related to y must not be deleted. These findings provide a basis for improving the variable selection methods.
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
|
8
|
Hesami M, Naderi R, Tohidfar M, Yoosefzadeh-Najafabadi M. Development of support vector machine-based model and comparative analysis with artificial neural network for modeling the plant tissue culture procedures: effect of plant growth regulators on somatic embryogenesis of chrysanthemum, as a case study. PLANT METHODS 2020; 16:112. [PMID: 32817755 PMCID: PMC7424974 DOI: 10.1186/s13007-020-00655-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 05/12/2020] [Accepted: 08/08/2020] [Indexed: 05/12/2023]
Abstract
BACKGROUND Optimizing the somatic embryogenesis protocol can be considered the first and foremost step in successful gene transformation studies. However, an optimized embryogenesis protocol is usually difficult to achieve because the process is costly, time-consuming, and complex. Therefore, novel computational approaches, such as machine learning algorithms, are needed for this purpose. In the present study, two machine learning algorithms, Multilayer Perceptron (MLP) as an artificial neural network (ANN) and support vector regression (SVR), were employed to model somatic embryogenesis of chrysanthemum, as a case study, and their prediction accuracy was compared. RESULTS The results showed that SVR (R2 > 0.92) had better performance accuracy than MLP (R2 > 0.82). Moreover, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) was applied to optimize the somatic embryogenesis, and the results showed that the highest embryogenesis rate (99.09%) and the maximum number of somatic embryos per explant (56.24) can be obtained from a medium containing 9.10 μM 2,4-dichlorophenoxyacetic acid (2,4-D), 4.70 μM kinetin (KIN), and 18.73 μM sodium nitroprusside (SNP). According to our results, SVR-NSGA-II was able to optimize the chrysanthemum's somatic embryogenesis accurately. CONCLUSIONS SVR-NSGA-II can be employed as a reliable and applicable computational methodology in future plant tissue culture studies.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON Canada
- Roohangiz Naderi
- Department of Horticultural Science, Faculty of Agriculture, University of Tehran, Karaj, Iran
- Masoud Tohidfar
- Department of Plant Biotechnology, Faculty of Science and Biotechnology, Shahid Beheshti University, G.C., Tehran, Iran
|
9
|
Hu Y, Lu Y, Wang S, Zhang M, Qu X, Niu B. Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs. Curr Drug Targets 2020; 20:488-500. [PMID: 30091413 DOI: 10.2174/1389450119666180809122244] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/17/2018] [Revised: 06/19/2018] [Accepted: 06/25/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Globally, the numbers of cancer patients and cancer deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. OBJECTIVE To study the application of machine learning in predicting anticancer drug activity, this review selects several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and lists examples of their applications in anticancer drug design. RESULTS Machine learning contributes substantially to anticancer drug design and saves researchers time and cost. However, it can only be an assisting tool for drug design. CONCLUSION This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction of anticancer drug activity are discussed, and anticancer drug research remains in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
Affiliation(s)
- Yan Hu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yi Lu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Shuo Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Mengying Zhang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Xiaosheng Qu
- National Engineering Laboratory of Southwest Endangered Medicinal Resources Development, Guangxi Botanical Garden of Medicinal Plants, 530023,Nanning, China
- Bing Niu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
|
10
|
|
11
|
Zheng X, Lai W, Chen H, Fang S. Data Prediction of Mobile Network Traffic in Public Scenes by SOS-vSVR Method. SENSORS 2020; 20:s20030603. [PMID: 31978957 PMCID: PMC7037419 DOI: 10.3390/s20030603] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/23/2019] [Revised: 01/14/2020] [Accepted: 01/14/2020] [Indexed: 11/16/2022]
Abstract
Accurate base station traffic data in a public place with large changes in the number of people could help predict the occurrence of network congestion, which would allow network resources to be allocated effectively. This is of great significance for festival network support, routine maintenance, and resource scheduling. However, there are few related reports on base station traffic prediction, especially in public scenes with fluctuations in people flow. This study proposes a public scene traffic data prediction method based on the v-Support Vector Regression (vSVR) algorithm. To achieve optimal traffic prediction, a symbiotic organisms search (SOS) was adopted to optimize the vSVR parameters, and the optimal input time step was determined through a large number of experiments. Experimental data were collected over three months, at one-hour granularity, at the base station of Huainan Wanda Plaza in the Anhui province of China. To verify the predictive performance of vSVR, the classic regression algorithms extreme learning machine (ELM) and variational Bayesian Linear Regression (vBLR) were used for comparison, and their optimal prediction results were compared with the vSVR predictions. Experimental results show that the prediction results from SOS-vSVR were the best. Outcomes of this study could provide guidance for preventing network congestion and improving the user experience.
Affiliation(s)
- Xiaoliang Zheng
- State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Anhui University of Science and Technology, Huainan 232000, China;
- School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232000, China;
- Wenhao Lai
- School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232000, China;
- Correspondence:
- Hualiang Chen
- School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232000, China;
- Shen Fang
- Huainan Branch of China Mobile Group Anhui Company Limited, Huainan 232000, China;
|
12
|
Wang Y, Zheng B, Xu M, Cai S, Younseo J, Zhang C, Jiang B. Prediction and Analysis of Hub Genes in Renal Cell Carcinoma based on CFS Gene Selection Method Combined with Adaboost Algorithm. Med Chem 2020; 16:654-663. [PMID: 31584378 DOI: 10.2174/1573406415666191004100744] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/09/2019] [Revised: 06/04/2019] [Accepted: 08/23/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND Renal cell carcinoma (RCC) is the most common malignant tumor of the adult kidney. OBJECTIVE The aim of this study was to identify key gene signatures in RCC and uncover their potential mechanisms. METHODS First, the gene expression profiles of GSE53757, which contained 144 samples (72 kidney cancer samples and 72 controls), were downloaded from the GEO database. Differentially expressed genes (DEGs) between the kidney cancer samples and the controls were then identified. After that, GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key genes from the DEGs. In addition, a classification model distinguishing the kidney cancer samples from the controls was built with Adaboost based on the selected key genes. RESULTS A total of 213 DEGs, including 80 up-regulated and 133 down-regulated genes, were selected by the CFS method as the feature genes to build the classification model between the kidney cancer samples and the controls. The accuracy of the classification model in the 5-fold cross-validation test and the independent set test is 84.4% and 83.3%, respectively. In addition, TYROBP, CD4163, CAV1, CXCL9, CXCL11 and CXCL13 can also be found among the top 20 hub genes screened by the protein-protein interaction (PPI) network. CONCLUSION These results indicate that CFS is a useful tool for identifying key genes in kidney cancer. We also predicted genes such as TYROBP, CD4163, CAV1, CXCL9, CXCL11 and CXCL13 that might serve as target genes for diagnosing kidney cancer.
Affiliation(s)
- Yina Wang
- Department of VIP Medical Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
- Benrong Zheng
- Department of VIP Medical Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
- Manbin Xu
- Department of Head and Neck Surgery, The Cancer Hospital of Shantou University Medical College, Shantou 515000, China
- Shaoping Cai
- Department of Acupuncture and Moxibustion Foshan Hospital of TCM, Foshan 528000, China
- Jeong Younseo
- Center for Bioinformatics and Computational Biology, Pai Chai University, Daejeon, South Korea
| | - Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong, 528000, China
| | - Boxiong Jiang
- Department of VIP Medical Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
| |
Collapse
|
13
|
Yoosefzadeh-Najafabadi M, Earl HJ, Tulpan D, Sulik J, Eskandari M. Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean. Frontiers in Plant Science 2020; 11:624273. [PMID: 33510761 PMCID: PMC7835636 DOI: 10.3389/fpls.2020.624273] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 10/31/2020] [Accepted: 12/10/2020] [Indexed: 05/20/2023]
Abstract
Recent substantial advances in high-throughput field phenotyping have provided plant breeders with affordable and efficient tools for evaluating a large number of genotypes for important agronomic traits at early growth stages. Nevertheless, the implementation of large datasets generated by high-throughput phenotyping tools such as hyperspectral reflectance in cultivar development programs is still challenging due to the essential need for intensive knowledge in computational and statistical analyses. In this study, the robustness of three common machine learning (ML) algorithms, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), was evaluated for predicting soybean (Glycine max) seed yield using hyperspectral reflectance. For this aim, hyperspectral reflectance data covering the whole spectrum from 395 to 1005 nm were collected at the R4 and R5 growth stages on 250 soybean genotypes grown in four environments. The recursive feature elimination (RFE) approach was performed to reduce the dimensionality of the hyperspectral reflectance data and select variables with the largest importance values. The results indicated that R5 is the more informative stage for measuring hyperspectral reflectance to predict seed yields. The 395 nm reflectance band was also identified as a highly ranked band in predicting the soybean seed yield. By considering either full or selected variables as the input variables, the ML algorithms were evaluated individually and in combination using the ensemble-stacking (E-S) method to predict the soybean yield. The RF algorithm had the highest performance, with 84% yield classification accuracy, among all the individually tested algorithms. Therefore, by selecting RF as the metaClassifier for the E-S method, the prediction accuracy increased to 0.93 using all variables and 0.87 using selected variables, showing the success of E-S as an ensemble technique.
This study demonstrated that soybean breeders could implement the E-S algorithm using either the full or selected spectral reflectance to select high-yielding soybean genotypes, among a large number of genotypes, at early growth stages.
Affiliation(s)
- Hugh J. Earl
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Dan Tulpan
- Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- John Sulik
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Milad Eskandari
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- *Correspondence: Milad Eskandari,
|
14
|
|
15
|
Niu B, Liang C, Lu Y, Zhao M, Chen Q, Zhang Y, Zheng L, Chou KC. Glioma stages prediction based on machine learning algorithm combined with protein-protein interaction networks. Genomics 2019; 112:837-847. [PMID: 31150762 DOI: 10.1016/j.ygeno.2019.05.024] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/03/2019] [Accepted: 05/25/2019] [Indexed: 12/18/2022]
Abstract
BACKGROUND Glioma is the most lethal nervous system cancer. Recent studies have made great efforts to study the occurrence and development of glioma, but the molecular mechanisms are still unclear. This study was designed to reveal the molecular mechanisms of glioma based on protein-protein interaction networks combined with machine learning methods. Key differentially expressed genes (DEGs) were screened and selected by using the protein-protein interaction (PPI) networks. RESULTS As a result, 19 genes between grade I and grade II, 21 genes between grade II and grade III, and 20 genes between grade III and grade IV were selected. Then, five machine learning methods were employed to predict the glioma stages based on the selected key genes. After comparison, the Complement Naive Bayes classifier was employed to build the prediction model for grade II-III, with an accuracy of 72.8%, and random forest was employed to build the prediction models for grade I-II and grade III-IV, with accuracies of 97.1% and 83.2%, respectively. Finally, the selected genes were analyzed by PPI networks, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results improve our understanding of the biological functions of the selected DEGs involved in glioma growth. We expect that the key genes identified have a guiding significance for the occurrence of gliomas or, at the very least, that they are useful for tumor researchers. CONCLUSION Machine learning combined with PPI networks and GO and KEGG analyses of the selected DEGs improves our understanding of the biological functions involved in glioma growth.
Affiliation(s)
- Bing Niu
- School of Life Sciences, Shanghai University, Shanghai 200444, China; Gordon Life Science Institute, Boston, MA 02478, USA
- Chaofeng Liang
- Department of Neurosurgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yi Lu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Manman Zhao
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Qin Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yuhui Zhang
- Renji Hospital, Medical School, Shanghai Jiaotong University, 160 Pujian Rd, New Pudong District, Shanghai 200127, China; Changhai Hospital, Second Military Medical University, Shanghai 200433, China
- Linfeng Zheng
- Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China; Department of Radiology, Shanghai First People's Hospital, Baoshan Branch, Shanghai 200940, China
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA
|
16
|
Han Q, Yang C, Lu J, Zhang Y, Li J. Metabolism of Oxalate in Humans: A Potential Role Kynurenine Aminotransferase/Glutamine Transaminase/Cysteine Conjugate Beta-lyase Plays in Hyperoxaluria. Curr Med Chem 2019; 26:4944-4963. [PMID: 30907303 DOI: 10.2174/0929867326666190325095223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/28/2018] [Revised: 02/17/2019] [Accepted: 02/22/2019] [Indexed: 11/22/2022]
Abstract
Hyperoxaluria, excessive urinary oxalate excretion, is a significant health problem worldwide. Disrupted oxalate metabolism has been implicated in hyperoxaluria and, accordingly, an enzymatic disturbance in oxalate biosynthesis can result in primary hyperoxaluria. Alanine glyoxylate aminotransferase-1 and glyoxylate reductase, the enzymes involved in the metabolism of glyoxylate (a precursor of oxalate), have been related to primary hyperoxalurias. Some studies suggest that other enzymes such as glycolate oxidase and alanine glyoxylate aminotransferase-2 might be associated with primary hyperoxaluria as well, but evidence of a definitive link between the clinical cases and gene mutations is not strong. There are still some idiopathic hyperoxalurias whose etiologies require further study. Some aminotransferases, particularly kynurenine aminotransferases, can convert glyoxylate to glycine. Based on the biochemical and structural characteristics, expression level, and subcellular localization of some aminotransferases, a number of them appear able to catalyze the transamination of glyoxylate to glycine more efficiently than alanine glyoxylate aminotransferase-1. The aim of this minireview is to explore other underlying causes of primary hyperoxaluria and stimulate research toward achieving a comprehensive understanding of the mechanisms leading to the disease. Herein, we review all aminotransferases in the liver for their functions in glyoxylate metabolism. In particular, kynurenine aminotransferase-I and III are discussed regarding their biochemical and structural characteristics, cellular localization, and enzyme inhibition. Kynurenine aminotransferase-III is, so far, the most efficient putative mitochondrial enzyme to transaminate glyoxylate to glycine in mammalian livers; it might be an interesting enzyme to examine in the etiology of primary hyperoxaluria and should be carefully investigated for its involvement in oxalate metabolism.
Affiliation(s)
- Qian Han
- Key Laboratory of Tropical Biological Resources of Ministry of Education, Hainan University, Haikou, Hainan 570228, China
- Cihan Yang
- Key Laboratory of Tropical Biological Resources of Ministry of Education, Hainan University, Haikou, Hainan 570228, China
- Jun Lu
- Central South University Xiangya School of Medicine Affiliated Haikou People's Hospital, Haikou, Hainan 570208, China
- Yinai Zhang
- Central South University Xiangya School of Medicine Affiliated Haikou People's Hospital, Haikou, Hainan 570208, China
- Jianyong Li
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, United States
|
17
|
Wu J, Mai G, Deng B, Younseo J, Du D, Chen F, Ma Q. Quantitative Structure-activity Relationship of Acetylcholinesterase Inhibitors based on mRMR Combined with Support Vector Regression. Lett Org Chem 2019. [DOI: 10.2174/1570178615666181008125341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
In this work, support vector regression (SVR), an effective machine learning method proposed by Vapnik, was applied to establish a QSAR model for a series of acetylcholinesterase inhibitors (AchEI). Fourteen descriptors were selected for constructing the SVR model by using the mRMR-Forward feature selection method. The parameters (ε, C) were adjusted by the leave-one-out cross-validation (LOOCV) method, which was used to judge the predictive power of different models. After optimization, one optimal SVR-QSAR model was obtained, and the mean relative error (MRE) of LOOCV using SVR was 1.72%. As a result, LogP negatively affected the activity, whereas Refractivity and Water Accessible Surface Area positively affected the activity.
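As a rough illustration of how a leave-one-out mean relative error is obtained, the sketch below holds out each compound once, refits on the rest, and averages the relative prediction errors. It rests on stated assumptions: a one-descriptor ordinary least-squares line is swapped in for SVR (so there are no ε or C parameters to tune), and the descriptor and activity values are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (a stand-in model, not SVR)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def loocv_mre(xs, ys):
    """Leave-one-out CV: predict each held-out sample, return the mean relative error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        prediction = a * xs[i] + b
        errors.append(abs(prediction - ys[i]) / abs(ys[i]))
    return sum(errors) / len(errors)

# Hypothetical descriptor values and activities; not data from the study.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(f"LOOCV MRE = {loocv_mre(xs, ys):.2%}")
```

In a setting like the one described, the same loop would be repeated for each candidate (ε, C) pair and the pair with the lowest LOOCV error retained.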
Affiliation(s)
- Jiaxiang Wu
- Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Guozhao Mai
- Department of Rehabilitation Medicine, The People's Hospital of Heshan, Guangdong, China
- Bowen Deng
- Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Jeong Younseo
- Center for Bioinformatics and Computational Biology, Pai Chai University, Daejeon, South Korea
- Dongsu Du
- Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Fuxue Chen
- Shanghai Key Laboratory of Bio-Crops, College of Life Science, Shanghai University, Shanghai, China
- Qiaorong Ma
- Department of Clinical Laboratory, Minzu Hospital of Guangxi Zhuang Autonomous Region, Affiliated Minzu Hospital of Guangxi Medical University, Nanning, Guangxi, China
|
18
|
Chen W, Liang X, Nong Z, Li Y, Pan X, Chen C, Huang L. The Multiple Applications and Possible Mechanisms of the Hyperbaric Oxygenation Therapy. Med Chem 2018; 15:459-471. [PMID: 30569869 DOI: 10.2174/1573406415666181219101328] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/21/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/18/2022]
Abstract
Hyperbaric Oxygenation Therapy (HBOT) is used as an adjunctive method for multiple diseases. The method fits routine treatment, is non-invasive, and provides 100% pure oxygen (O2) at above-normal atmospheric pressure in a specialized chamber. It is well known that O2 deficiency induces a series of adverse events. For preventing the injury induced by anoxia, the capability of HBOT to deliver pressurized O2 seems significant. In recent years, HBOT has displayed therapeutic efficacy to some degree, and it is thought to be beneficial in angiogenesis, tissue ischemia and hypoxia, nervous system disease, diabetic complications, malignancies, carbon monoxide (CO) poisoning and chronic radiation-induced injury. Both single and combination HBOT have been applied in previous studies, and this manuscript reviews the current applications and possible mechanisms of HBOT. The applicability and validity of HBOT for clinical treatment remain controversial, even though it is regarded as an adjunct to conventional medical treatment with many other clinical benefits. Receiving pressurized O2 also has negative side effects, such as oxidative stress injury, DNA damage, cellular metabolic disturbance, activation of coagulation, endothelial dysfunction, acute neurotoxicity and pulmonary toxicity. It is therefore imperative to weigh the advantages and disadvantages of HBOT comprehensively in order to obtain a satisfying therapeutic outcome.
Affiliation(s)
- Wan Chen
- Department of Emergency, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
- Xingmei Liang
- Department of Pharmacy, Guangxi Medical College, Nanning, Guangxi 530021, China
- Zhihuan Nong
- Department of Pharmacology, Guangxi Institute of Chinese Medicine and Pharmaceutical Science, Nanning 530022, China
- Yaoxuan Li
- Department of Neurology, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning 530022, China
- Xiaorong Pan
- Department of Hyperbaric oxygen, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
- Chunxia Chen
- Department of Hyperbaric oxygen, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
- Luying Huang
- Department of Respiratory Medicine, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
|
19
|
Liang Y, Zhang S. Identify Gram-negative bacterial secreted protein types by incorporating different modes of PSSM into Chou’s general PseAAC via Kullback–Leibler divergence. J Theor Biol 2018; 454:22-29. [DOI: 10.1016/j.jtbi.2018.05.035] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/10/2018] [Revised: 05/19/2018] [Accepted: 05/29/2018] [Indexed: 12/14/2022]
|
20
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Jia JH, Chou KC. iKcr-PseEns: Identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics 2018; 110:239-246. [DOI: 10.1016/j.ygeno.2017.10.008] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Received: 10/10/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 01/23/2023]
|
21
|
Genome-wide analysis of H3K36me3 and its regulations to cancer-related genes expression in human cell lines. Biosystems 2018; 171:59-65. [DOI: 10.1016/j.biosystems.2018.07.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/06/2018] [Revised: 07/01/2018] [Accepted: 07/09/2018] [Indexed: 01/11/2023]
|
22
|
Villaverde JJ, Sevilla-Morán B, López-Goti C, Alonso-Prados JL, Sandín-España P. Considerations of nano-QSAR/QSPR models for nanopesticide risk assessment within the European legislative framework. The Science of the Total Environment 2018; 634:1530-1539. [PMID: 29710651 DOI: 10.1016/j.scitotenv.2018.04.033] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 10/23/2017] [Revised: 04/02/2018] [Accepted: 04/03/2018] [Indexed: 06/08/2023]
Abstract
The European market for pesticides is currently legislated through the well-developed Regulation (EC) No. 1107/2009. This regulation promotes the competitiveness of European agriculture, recognizing the necessity of safe pesticides for human and animal health and the environment to protect crops against pests, diseases and weeds. In this sense, nanotechnology can provide a tremendous opportunity to achieve a more rational use of pesticides. However, the lack of information regarding nanopesticides and their fate and behavior in the environment and their effects on human and animal health is inhibiting rapid nanopesticide incorporation into European Union agriculture. This review analyzes the recent state of knowledge on nanopesticide risk assessment, highlighting the challenges that need to be overcome to accelerate the arrival of these new tools for plant protection to European agricultural professionals. Novel nano-Quantitative Structure-Activity/Structure-Property Relationship (nano-QSAR/QSPR) tools for risk assessment are analyzed, including modeling methods and validation procedures, with regard to the potential of these computational instruments to meet the current requirements for authorization of nanoformulations. Future trends on these issues, of pressing importance within the context of the current European pesticide legislative framework, are also discussed. Standard protocols to make high-quality and well-described datasets for series of related but differently sized nanoparticles/nanopesticides are required.
Affiliation(s)
- Juan José Villaverde
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
- Beatriz Sevilla-Morán
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
- Carmen López-Goti
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
- Pilar Sandín-España
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
|
23
|
Mei J, Zhao J. Analysis and prediction of presynaptic and postsynaptic neurotoxins by Chou's general pseudo amino acid composition and motif features. J Theor Biol 2018; 447:147-153. [DOI: 10.1016/j.jtbi.2018.03.034] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 01/22/2018] [Revised: 03/14/2018] [Accepted: 03/25/2018] [Indexed: 11/26/2022]
|
24
|
Patil RB, Barbosa EG, Sangshetti JN, Sawant SD, Zambre VP. 3D-QSAR with R: A new 3D-QSAR methodology applied to a set of DGAT1 inhibitors [corrected]. Comput Biol Chem 2018; 74:123-131. [PMID: 29602042 DOI: 10.1016/j.compbiolchem.2018.02.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/09/2018] [Revised: 02/23/2018] [Accepted: 02/25/2018] [Indexed: 12/21/2022]
Abstract
The rapid advances in computational methods for drug design have resulted in accurate predictions of the biological activities of ligands with or without the availability of enzyme structures. 3D-QSAR is one of the computational methods used for this purpose. Currently, freely available 3D-QSAR methods suffer from limitations such as complex methodologies and difficulty in analyzing results, applying statistical methods and validating the models built. The present work describes a simple and novel 3D-QSAR methodology, which uses the bash scripts LQTA_R_LJ, LQTA_R_QQ and LQTA_R_HB together with the freely available R statistical program. These scripts generate Lennard-Jones, Coulomb and hydrogen bond descriptors, which capture the steric 3D property, the electrostatic property and the hydrogen bond formation capacity, respectively. The scripts were tested on a set of DGAT1 inhibitors, and the results showed that the 3D-QSAR models built have better predictive abilities in terms of R2 0.735, Q2loo 0.635 and R2ext 0.715. The 3D-QSAR model suggested that substitution of an alkyl group on the oxadiazolyl ring at the 6th position of the pyrrolo-pyridazine ring is undesirable, whereas a substituted phenyl ring at the 7th position is responsible for improved DGAT1 inhibitory activity. The analysis also suggested that the 6th position could be substituted with the oxadiazolyl ring or analogous heterocyclic rings, where substituting the 3rd position of such heterocyclic rings with a rigid hydrophobic substituent can improve DGAT1 activity.
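To make the Lennard-Jones and Coulomb descriptors named in this abstract concrete, here is a minimal sketch of the two textbook pair potentials summed over a probe placed near a molecule. It is not the authors' LQTA_R_LJ or LQTA_R_QQ code (those are bash and R scripts); the function names, probe parameters, atom coordinates and charges are all hypothetical, and the Coulomb constant is the approximate value commonly used with kcal/mol, ångström and elementary-charge units.

```python
import math

# Approximate conversion constant in kcal*angstrom/(mol*e^2); exact digits vary by force field.
COULOMB_CONST = 332.06

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q1, q2):
    """Coulomb energy between point charges q1 and q2 separated by r."""
    return COULOMB_CONST * q1 * q2 / r

def probe_descriptors(probe_xyz, atoms, probe_charge=1.0, epsilon=0.1, sigma=3.4):
    """Sum steric (LJ) and electrostatic (Coulomb) energies of a probe over all atoms.

    atoms is a list of (x, y, z, partial_charge) tuples; all parameters are illustrative.
    """
    lj_total = 0.0
    qq_total = 0.0
    for x, y, z, charge in atoms:
        r = math.dist(probe_xyz, (x, y, z))
        lj_total += lennard_jones(r, epsilon, sigma)
        qq_total += coulomb(r, probe_charge, charge)
    return lj_total, qq_total

# Hypothetical three-atom fragment with made-up partial charges.
atoms = [(0.0, 0.0, 0.0, -0.4), (1.2, 0.0, 0.0, 0.3), (0.0, 1.2, 0.0, 0.1)]
lj, qq = probe_descriptors((4.0, 0.0, 0.0), atoms)
print(f"LJ descriptor = {lj:.4f}, Coulomb descriptor = {qq:.4f}")
```

Evaluating such sums at every point of a 3D grid around each aligned molecule yields one descriptor column per grid point, which is the general shape of input that a 3D-QSAR regression consumes.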
Affiliation(s)
- Rajesh B Patil
- Department of Pharmaceutical Chemistry, Sinhgad Technical Education Society's, Smt. Kashibai Navale College of Pharmacy, Pune-Saswad Road, Kondhwa (Bk.), Pune, 411048, Maharashtra, India
- Euzebio G Barbosa
- Chemistry Institute, University of Campinas (UNICAMP), POB 6154, Campinas, SP, 13083-970, Brazil
- Jaiprakash N Sangshetti
- Department of Pharmaceutical Chemistry, Y. B. Chavan College of Pharmacy, Dr. Rafiq Zakaria Campus, Aurangabad, 431001, Maharashtra, India
- Sanjay D Sawant
- Department of Pharmaceutical Chemistry, Sinhgad Technical Education Society's, Smt. Kashibai Navale College of Pharmacy, Pune-Saswad Road, Kondhwa (Bk.), Pune, 411048, Maharashtra, India
- Vishal P Zambre
- Department of Pharmaceutical Chemistry, Sinhgad Technical Education Society's, Smt. Kashibai Navale College of Pharmacy, Pune-Saswad Road, Kondhwa (Bk.), Pune, 411048, Maharashtra, India
|
25
|
Dehzangi A, López Y, Lal SP, Taherzadeh G, Sattar A, Tsunoda T, Sharma A. Improving succinylation prediction accuracy by incorporating the secondary structure via helix, strand and coil, and evolutionary information from profile bigrams. PLoS One 2018; 13:e0191900. [PMID: 29432431 PMCID: PMC5809022 DOI: 10.1371/journal.pone.0191900] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 09/02/2017] [Accepted: 01/12/2018] [Indexed: 11/18/2022]
Abstract
Post-translational modification refers to the biological mechanism involved in the enzymatic modification of proteins after they are translated in the ribosome. This mechanism comprises a wide range of structural modifications, which bring dramatic variations to the biological function of proteins. One of the recently discovered modifications is succinylation. Although succinylation can be detected through mass spectrometry, its current experimental detection turns out to be a time-consuming process unable to keep pace with the exponential growth of sequenced proteins. Therefore, the implementation of fast and accurate computational methods has emerged as a feasible solution. This paper proposes a novel classification approach, which effectively incorporates the secondary structure and evolutionary information of proteins through profile bigrams for succinylation prediction. The proposed predictor, abbreviated as SSEvol-Suc, uses the above features to train an AdaBoost classifier and then predict succinylated lysine residues. When SSEvol-Suc was compared with four benchmark predictors, it outperformed them in metrics such as sensitivity (0.909), accuracy (0.875) and Matthews correlation coefficient (0.75).
Affiliation(s)
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, Maryland, United States of America
- Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- * E-mail:
- Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Queensland, Australia
- Abdul Sattar
- School of Information and Communication Technology, Griffith University, Queensland, Australia
- Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
- Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- CREST, JST, Tokyo, Japan
- Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji
|
26
|
Prediction of HIV-1 and HIV-2 proteins by using Chou's pseudo amino acid compositions and different classifiers. Sci Rep 2018; 8:2359. [PMID: 29402983 PMCID: PMC5799304 DOI: 10.1038/s41598-018-20819-x] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 12/04/2017] [Accepted: 01/24/2018] [Indexed: 01/02/2023]
Abstract
Human immunodeficiency virus (HIV) is the retroviral agent that causes acquired immune deficiency syndrome (AIDS). The number of HIV-caused deaths was about 4 million in 2016 alone; it was estimated that about 33 million to 46 million people worldwide were living with HIV. HIV disease is especially harmful because the progressive destruction of the immune system impairs the ability to form specific antibodies and to maintain efficacious killer T cell activity. Successful prediction of HIV proteins is important for understanding their biological and pharmacological functions. In this study, based on the concept of Chou’s pseudo amino acid (PseAA) composition and increment of diversity (ID), support vector machine (SVM), logistic regression (LR), and multilayer perceptron (MP) classifiers were presented to predict HIV-1 proteins and HIV-2 proteins. The results of the jackknife test indicated that the highest prediction accuracy and CC values, obtained by the SVM and MP, were 0.9909 and 0.9763, respectively, indicating that the classifiers presented in this study are suitable for predicting the two groups of HIV proteins.
|
27
|
Feng P, Yang H, Ding H, Lin H, Chen W, Chou KC. iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 2018; 111:96-102. [PMID: 29360500 DOI: 10.1016/j.ygeno.2018.01.005] [Citation(s) in RCA: 188] [Impact Index Per Article: 31.3] [Received: 10/27/2017] [Revised: 12/24/2017] [Accepted: 01/07/2018] [Indexed: 11/29/2022]
Abstract
N6-methyladenine (6mA) is one kind of post-replication modification (PTM or PTRM) occurring in a wide range of DNA sequences. Accurate identification of its sites will be very helpful for revealing the biological functions of 6mA, but it is time-consuming and expensive to determine them by experiments alone. Unfortunately, so far, no bioinformatics tool is available to do so. To fill in such an empty area, we have proposed a novel predictor called iDNA6mA-PseKNC that is established by incorporating nucleotide physicochemical properties into Pseudo K-tuple Nucleotide Composition (PseKNC). It has been observed via rigorous cross-validations that the predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 93%, 100%, 96%, and 0.93, respectively. For the convenience of most experimental scientists, a user-friendly web server for iDNA6mA-PseKNC has been established at http://lin-group.cn/server/iDNA6mA-PseKNC, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
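The four metrics quoted in this abstract are all functions of the confusion-matrix counts, where MCC denotes the Matthews correlation coefficient. The sketch below shows the standard formulas; the counts are hypothetical, chosen only so the output lands near the reported magnitudes, and are not the paper's data.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)                      # specificity: fraction of non-sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Hypothetical counts for a balanced benchmark; not taken from the paper.
sn, sp, acc, mcc = binary_metrics(tp=93, fn=7, tn=100, fp=0)
print(f"Sn={sn:.2f} Sp={sp:.2f} Acc={acc:.3f} MCC={mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the positive and negative classes are unbalanced.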
Affiliation(s)
- Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, USA
- Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA
|
28
|
Yu CY, Li XX, Yang H, Li YH, Xue WW, Chen YZ, Tao L, Zhu F. Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate. Int J Mol Sci 2018; 19:E183. [PMID: 29316706 PMCID: PMC5796132 DOI: 10.3390/ijms19010183] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/22/2017] [Revised: 12/09/2017] [Accepted: 01/04/2018] [Indexed: 12/27/2022]
Abstract
The function of a protein is of great interest in the cutting-edge research of biological mechanisms, disease development and drug/target discovery. Besides experimental explorations, a variety of computational methods have been designed to predict protein function. Among these in silico methods, the prediction of BLAST is based on protein sequence similarity, while that of machine learning is also based on the sequence, but without considering sequence similarity. This characteristic of machine learning makes it a good complement to BLAST and many other approaches in predicting the function of remotely related proteins and of homologous proteins with distinct functions. However, the identification accuracies of these in silico methods and their false discovery rates have not yet been assessed, which greatly limits the usage of these algorithms. Herein, a comprehensive comparison of the performances of four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed by four standard statistical indexes based on independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, substantially higher sensitivity was observed for SVM and BLAST than for PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
Affiliation(s)
- Chun Yan Yu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Xiao Xu Li
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Hong Yang
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Ying Hong Li
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Wei Wei Xue
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
- Yu Zong Chen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543, Singapore.
- Lin Tao
  - School of Medicine, Hangzhou Normal University, Hangzhou 310012, China.
- Feng Zhu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing 401331, China.
  - Innovative Drug Research and Bioinformatics Group, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
|
29
|
Zhang L, Kong L. iRSpot-ADPM: Identify recombination spots by incorporating the associated dinucleotide product model into Chou's pseudo components. J Theor Biol 2018; 441:1-8. [PMID: 29305179 DOI: 10.1016/j.jtbi.2017.12.025] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/19/2017] [Revised: 12/18/2017] [Accepted: 12/24/2017] [Indexed: 10/18/2022]
Abstract
Gene recombination is a key process that produces hereditary differences. Recombination spot identification plays an important role in revealing genome evolution and promoting the study of DNA function. However, traditional experiments cannot keep pace with the huge amounts of DNA sequence data generated by sequencing. At present, some machine learning methods have been proposed to speed up this identification process. However, the correlations between nucleotide pairs at different positions along the DNA sequence, which reflect important sequence order information, are often ignored. To address this, this study proposes a novel feature extraction method, called iRSpot-ADPM, based on DNA properties in a given DNA sequence. 85 features are selected from the original feature set according to the weights calculated by a support vector machine. Five-fold cross-validation tests on two widely used benchmark datasets indicate that the proposed method outperforms its existing counterparts on specificity (Spec), Matthews correlation coefficient (MCC) and overall accuracy (OA). The experimental results show that the proposed method is effective for accurate recombination spot identification. Moreover, it is anticipated that the proposed method could be extended to other biological sequences and be helpful in future research. The datasets and Matlab source code can be downloaded from the URL: http://stxy.neuq.edu.cn/info/1095/1157.htm.
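The three indexes named in this abstract (Spec, MCC and OA) all follow from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' Matlab code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute Spec, OA and MCC from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives (hypothetical inputs).
    """
    # Specificity: fraction of true negatives among all actual negatives.
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    # Overall accuracy: fraction of all samples classified correctly.
    oa = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Spec": spec, "OA": oa, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

In a five-fold cross-validation setting, one would typically accumulate the four counts over the held-out folds (or average the per-fold metrics) before reporting.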
Affiliation(s)
- Lichao Zhang
  - School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao 066004, PR China.
- Liang Kong
  - School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066004, PR China
|
30
|
Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol 2018; 437:239-250. [DOI: 10.1016/j.jtbi.2017.10.030] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 08/05/2017] [Revised: 09/29/2017] [Accepted: 10/27/2017] [Indexed: 12/27/2022]
|
31
|
Borowska M, Brzozowska E, Kuć P, Oczeretko E, Mosdorf R, Laudański P. Identification of preterm birth based on RQA analysis of electrohysterograms. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 153:227-236. [PMID: 29157455 DOI: 10.1016/j.cmpb.2017.10.018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/08/2017] [Revised: 10/10/2017] [Accepted: 10/12/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE Common methods for data analysis are mainly based on linear concepts, but in recent years nonlinear dynamics methods have been introduced. It is well known that in typical biological systems a lack of stationarity and rather sudden changes of state are the properties distinguishing them from each other. There is an urgent need to better understand the mechanical activity of the myometrium (its contractility) in order to find a solution to the problem of preterm delivery, the largest cause of neonatal deaths and morbidity. The electrohysterographic signal (EHG) is a good non-linear, bioelectrical indicator for the detection and identification of term and preterm birth.
METHODS The material of the study consists of EHG signals obtained from 20 patients between the 24th and the 28th week of pregnancy with threatened preterm labor. The women were divided into two groups: group A (n = 10), those delivering after more than 7 days, and group B (n = 10), those delivering within 7 days. In this paper, bioelectrical signals were analyzed by recurrence quantification analysis (RQA) and principal component analysis (PCA) to distinguish particular patterns for term and preterm birth. To date, these methods have not been used for the evaluation of bioelectrical activity in the uterus. Support Vector Machine classification (multiclass SVM) was used to train novel classifiers for the EHG signals. Statistical analysis was performed by means of the non-parametric Mann-Whitney test.
RESULTS From among eleven parameters obtained from recurrence quantification analysis, the five most appropriate were chosen: Recurrence Rate, Determinism, Laminarity, Entropy and Recurrence Period Density Entropy. A significant increase (p < .001) in Recurrence Rate was found in patients from group B, while an increase in the other parameters, except Laminarity, was found in patients from group A. The accuracy of classification obtained as a result of the analysis increased to 83.32%.
CONCLUSION We showed that the appropriately selected recurrence quantifiers obtained for these time series could be used to classify all the signals into the appropriate group. The proposed analysis could help in detecting preterm labor based on the EHG signal dynamics.
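Two of the RQA parameters named above, Recurrence Rate and Determinism, can be illustrated on a toy series. The sketch below is a simplified illustration under stated assumptions, not the authors' pipeline: it compares scalar samples directly (no phase-space embedding), counts the main diagonal in the Recurrence Rate but excludes it from Determinism, and uses a minimum diagonal line length `lmin=2`. Conventions differ between RQA tools, and all function names here are hypothetical.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when samples i and j lie within eps of each other."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrent points in the whole matrix (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def determinism(R, lmin=2):
    """Fraction of off-diagonal recurrence points on diagonal lines of length >= lmin."""
    n = len(R)
    total = 0      # recurrence points above the main diagonal
    on_lines = 0   # those belonging to sufficiently long diagonal lines
    for k in range(1, n):          # the matrix is symmetric, so upper diagonals suffice
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                total += 1
                run += 1
            else:
                if run >= lmin:
                    on_lines += run
                run = 0
        if run >= lmin:            # close a line that reaches the matrix border
            on_lines += run
    return on_lines / total if total else 0.0


if __name__ == "__main__":
    periodic = [0, 1, 0, 1, 0, 1]  # toy signal, not EHG data
    R = recurrence_matrix(periodic, eps=0.1)
    print(recurrence_rate(R), determinism(R))
```

A strictly periodic signal such as the toy series places every off-diagonal recurrence on a long diagonal line, which is why Determinism is commonly read as a measure of predictability.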
Affiliation(s)
- Marta Borowska
  - Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland.
- Ewelina Brzozowska
  - Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland
- Paweł Kuć
  - Department of Perinatology, Medical University of Bialystok, M. Skłodowskiej-Curie 24A, 15-276 Białystok, Poland
- Edward Oczeretko
  - Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland
- Romuald Mosdorf
  - Faculty of Mechanical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Białystok, Poland
- Piotr Laudański
  - Department of Perinatology, Medical University of Bialystok, M. Skłodowskiej-Curie 24A, 15-276 Białystok, Poland
|
32
|
pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC. Genomics 2018; 110:50-58. [DOI: 10.1016/j.ygeno.2017.08.005] [Citation(s) in RCA: 180] [Impact Index Per Article: 30.0] [Received: 07/06/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 11/22/2022]
|
33
|
Yang L, Ge S, Huang J, Bao X. Synthesis of novel (E)-2-(4-(1H-1,2,4-triazol-1-yl)styryl)-4-(alkyl/arylmethyleneoxy)quinazoline derivatives as antimicrobial agents. Mol Divers 2017; 22:71-82. [PMID: 29119421 DOI: 10.1007/s11030-017-9792-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/20/2017] [Accepted: 10/23/2017] [Indexed: 10/18/2022]
Abstract
A series of novel (E)-2-(4-(1H-1,2,4-triazol-1-yl)styryl)-4-(alkyl/arylmethyleneoxy)quinazoline derivatives (4a-4s) were synthesized in good to excellent yields, and their structures were fully characterized by [Formula: see text] NMR, [Formula: see text] NMR, HRMS and IR spectra. The structure of compound 4b was further confirmed via single-crystal X-ray diffraction analysis. The bioassay results indicated that compounds 4s, 4q and 4n inhibit phytopathogenic bacterium Xanthomonas axonopodis pv. citri (Xac) more potently than commercial bactericide bismerthiazol. However, not a single compound can effectively inhibit three pathogenic fungi tested at 50 [Formula: see text].
Affiliation(s)
- Lan Yang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Centre for Research and Development of Fine Chemicals, Guizhou University, Guiyang, 550025, China
- Shijia Ge
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Centre for Research and Development of Fine Chemicals, Guizhou University, Guiyang, 550025, China
- Jian Huang
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Centre for Research and Development of Fine Chemicals, Guizhou University, Guiyang, 550025, China
- Xiaoping Bao
  - State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Centre for Research and Development of Fine Chemicals, Guizhou University, Guiyang, 550025, China.
|
34
|
Xu C, Ge L, Zhang Y, Dehmer M, Gutman I. Computational prediction of therapeutic peptides based on graph index. J Biomed Inform 2017; 75:63-69. [DOI: 10.1016/j.jbi.2017.09.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 07/14/2017] [Revised: 09/14/2017] [Accepted: 09/25/2017] [Indexed: 11/25/2022]
|
35
|
Cheng X, Xiao X, Chou KC. pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics 2017; 110:S0888-7543(17)30102-7. [PMID: 28989035 DOI: 10.1016/j.ygeno.2017.10.002] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Received: 09/12/2017] [Revised: 09/28/2017] [Accepted: 10/04/2017] [Indexed: 01/21/2023]
Abstract
Information on the subcellular localization of proteins is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for timely identification of their subcellular locations based on the sequence information alone. The current study is focused on Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved. This is because mounting evidence has indicated that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can deal with single-location proteins only. In fact, proteins with multiple locations may have special biological functions important for both basic research and drug design. In this study, by using multi-label theory, we developed a new predictor called "pLoc-mGneg" for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high-quality benchmark dataset indicated that the proposed predictor is remarkably superior to "iLoc-Gneg", the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Affiliation(s)
- Xiang Cheng
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
- Kuo-Chen Chou
  - The Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia.
|
36
|
Du QS, Wang SQ, Xie NZ, Wang QY, Huang RB, Chou KC. 2L-PCA: a two-level principal component analyzer for quantitative drug design and its applications. Oncotarget 2017; 8:70564-70578. [PMID: 29050302 PMCID: PMC5642577 DOI: 10.18632/oncotarget.19757] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/26/2017] [Accepted: 06/30/2017] [Indexed: 01/25/2023]
Abstract
A two-level principal component predictor (2L-PCA) was proposed based on the principal component analysis (PCA) approach. It can be used to quantitatively analyze various compounds and peptides about their functions or potentials to become useful drugs. One level is for dealing with the physicochemical properties of drug molecules, while the other level is for dealing with their structural fragments. The predictor has the self-learning and feedback features to automatically improve its accuracy. It is anticipated that 2L-PCA will become a very useful tool for timely providing various useful clues during the process of drug development.
Affiliation(s)
- Qi-Shi Du
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
- Shu-Qing Wang
  - School of Pharmacy, Tianjin Medical University, Tianjin 300070, China
- Neng-Zhong Xie
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
- Qing-Yan Wang
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
- Ri-Bo Huang
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
- Kuo-Chen Chou
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
|
37
|
pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC. Gene 2017; 628:315-321. [DOI: 10.1016/j.gene.2017.07.036] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 04/21/2017] [Revised: 07/08/2017] [Accepted: 07/11/2017] [Indexed: 12/25/2022]
|
38
|
Highly accurate prediction of protein self-interactions by incorporating the average block and PSSM information into the general PseAAC. J Theor Biol 2017; 432:80-86. [PMID: 28802824 DOI: 10.1016/j.jtbi.2017.08.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/12/2017] [Revised: 08/05/2017] [Accepted: 08/08/2017] [Indexed: 11/23/2022]
Abstract
Determining whether proteins can interact with their partners is a challenging task for fundamental research. Protein self-interaction (SIP) is a special case of protein-protein interactions (PPIs) and plays a key role in the regulation of cellular functions. Due to the limitations of experimental self-interaction identification, it is very important to develop an effective computational tool for predicting SIPs based on protein sequences. In this study, we developed a novel computational method called RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) for detecting SIPs from protein sequences. Firstly, the Average Blocks (AB) feature extraction method is employed to represent protein sequences on a Position Specific Scoring Matrix (PSSM). Secondly, Principal Component Analysis (PCA) is used to reduce the dimension of the AB vector and thereby the influence of noise. Then, by employing the Relevance Vector Machine (RVM) algorithm, the performance of RVM-AB is assessed and compared with the state-of-the-art support vector machine (SVM) classifier and other existing methods on yeast and human datasets respectively. In the five-fold test experiment, the RVM-AB model achieved very high accuracies of 93.01% and 97.72% on the yeast and human datasets respectively, which are significantly better than those of the SVM-based method and other previous methods. The experimental results showed that the RVM-AB prediction model is efficient and robust. It can serve as an automatic decision support tool for detecting SIPs. To facilitate extensive studies in future proteomics research, the RVMAB server is freely available for academic use at http://219.219.62.123:8888/SIP_AB.
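The Average Blocks step described above turns a variable-length PSSM into a fixed-length vector by splitting the sequence positions into blocks and averaging each PSSM column within a block. The sketch below is a minimal illustration of that idea, not the RVM-AB implementation: the abstract does not state the number of blocks, so `n_blocks=20` is an assumption, and the function name, the integer block boundaries and the zero-filling of empty blocks are all hypothetical choices. The PCA and RVM steps are not shown.

```python
def average_blocks(pssm, n_blocks=20):
    """Average each PSSM column within n_blocks consecutive sequence blocks.

    pssm is a list of rows, one per residue position, each holding one score
    per amino-acid column (typically 20). The result has
    n_blocks * len(pssm[0]) entries regardless of sequence length.
    """
    length = len(pssm)
    width = len(pssm[0])
    features = []
    for b in range(n_blocks):
        # Integer boundaries spread the positions as evenly as possible.
        start = b * length // n_blocks
        end = (b + 1) * length // n_blocks
        rows = pssm[start:end]
        if rows:
            # Column-wise mean over the rows of this block.
            features.extend(sum(col) / len(rows) for col in zip(*rows))
        else:
            # Sequence shorter than n_blocks: pad the empty block with zeros.
            features.extend([0.0] * width)
    return features


if __name__ == "__main__":
    toy_pssm = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 positions, 2 columns (toy data)
    print(average_blocks(toy_pssm, n_blocks=2))
```

With a standard 20-column PSSM and 20 blocks this yields a 400-dimensional vector, which a dimensionality-reduction step such as PCA could then shrink before classification.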
|
39
|
Xiao X, Cheng X, Su S, Mao Q, Chou KC. pLoc-mGpos: Incorporate Key Gene Ontology Information into General PseAAC for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins. Natural Science 2017. [DOI: 10.4236/ns.2017.99032] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 11/20/2022]
|