1
|
Yin X, Wang W, Seah SYK, Mine Y, Fan MZ. Deglycosylation Differentially Regulates Weaned Porcine Gut Alkaline Phosphatase Isoform Functionality along the Longitudinal Axis. Pathogens 2023; 12:407. [PMID: 36986329 PMCID: PMC10053101 DOI: 10.3390/pathogens12030407] [Received: 01/10/2023] [Revised: 02/27/2023] [Accepted: 02/28/2023]
Abstract
Gut alkaline phosphatases (APs) dephosphorylate the lipid moiety of endotoxin as well as other pathogen-associated molecular patterns (PAMPs), thus maintaining gut eubiosis and preventing metabolic endotoxemia. Early-weaned pigs experience gut dysbiosis, enteric diseases and growth retardation in association with decreased intestinal AP functionality. However, the role of glycosylation in modulating weaned porcine gut AP functionality is unclear. Herein, three research approaches were taken to investigate how deglycosylation affects the activity kinetics of weaned porcine gut AP. In the first approach, the weaned porcine jejunal AP isoform (IAP) was fractionated by fast protein liquid chromatography (FPLC), and kinetic characterization of the purified IAP fractions identified the glycosylated mature IAP as the higher-affinity, lower-capacity form (p < 0.05) compared with the lower-affinity, higher-capacity non-glycosylated pre-mature IAP. In the second approach, enzyme activity kinetic analyses showed that N-deglycosylation of AP by peptide N-glycosidase F (PNGase F) reduced (p < 0.05) IAP maximal activity in the jejunum and ileum and decreased AP affinity (p < 0.05) in the large intestine. In the third approach, the porcine IAP isoform X1 (IAPX1) gene was overexpressed in prokaryotic ClearColi BL21(DE3) cells, and the recombinant porcine IAPX1 showed reduced (p < 0.05) enzyme affinity and maximal enzyme activity. Therefore, the level of glycosylation can modulate the plasticity of weaned porcine gut AP functionality towards maintaining the gut microbiome and whole-body physiological status.
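The "affinity" and "capacity" compared in this abstract are conventionally read as the Michaelis constant (Km, where a lower Km means higher affinity) and the maximal velocity (Vmax). The sketch below assumes simple Michaelis-Menten kinetics and estimates both parameters with a Lineweaver-Burk (double-reciprocal) least-squares fit; the abstract does not state which fitting method the authors used, and the function names and substrate concentrations are hypothetical.

```python
from typing import List, Tuple


def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate: List[float], rate: List[float]) -> Tuple[float, float]:
    """Estimate (Vmax, Km) by least squares on 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rate]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals Km / Vmax
    intercept = mean_y - slope * mean_x    # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


if __name__ == "__main__":
    # Hypothetical noise-free rates generated with Vmax = 10 and Km = 2 (arbitrary units).
    s_values = [0.5, 1.0, 2.0, 4.0, 8.0]
    v_values = [michaelis_menten(s, 10.0, 2.0) for s in s_values]
    vmax_hat, km_hat = lineweaver_burk_fit(s_values, v_values)
    print(f"Vmax ~ {vmax_hat:.3f}, Km ~ {km_hat:.3f}")
```

The double-reciprocal transform weights low-substrate measurements heavily, so with noisy data a direct nonlinear fit of the Michaelis-Menten equation is usually preferred.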
Affiliation(s)
- Xindi Yin
- Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
- Key Laboratory of Precision Nutrition and Food Quality, Department of Nutrition and Health, China Agricultural University, Beijing 100083, China
- Weijun Wang
- Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
- Canadian Food Inspection Agency (CFIA)-Ontario Operation, Guelph, ON N1G 4S9, Canada
- Stephen Y. K. Seah
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
- Yoshinori Mine
- Department of Food Science, University of Guelph, Guelph, ON N1G 2W1, Canada
- Ming Z. Fan
- Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
- One Health Institute, University of Guelph, Guelph, ON N1G 2W1, Canada
2
Zhou J, Huang S, Zhou T, Armaghani DJ, Qiu Y. Employing a genetic algorithm and grey wolf optimizer for optimizing RF models to evaluate soil liquefaction potential. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10140-5]
3
Yu G, Zhang L, Zhang Y, Zhou J, Zhang T, Bi X. Prediction and risk stratification from hospital discharge records based on Hierarchical sLDA. BMC Med Inform Decis Mak 2022; 22:14. [PMID: 35033059 PMCID: PMC8760773 DOI: 10.1186/s12911-022-01747-3] [Received: 06/22/2021] [Accepted: 01/05/2022]
Abstract
BACKGROUND The rapid development of information technology has made risk stratification easier to adopt, to the benefit of both patients and clinicians. Risk stratification supports accurate, individualized prevention and therapeutic decision-making. Hospital discharge records (HDRs) routinely record accurate diagnostic conclusions for patients. In this paper, we therefore propose an improved supervised model for risk stratification based on HDRs of coronary heart disease (CHD). METHODS We introduce an improved four-layer supervised latent Dirichlet allocation (sLDA) approach, called the Hierarchical sLDA model, which encodes patient features in HDRs as one-hot patient feature-value pairs according to clinical guidelines for CHD laboratory tests. Missing data and class imbalance were addressed with random forests (RF) and the synthetic minority oversampling technique (SMOTE), respectively. After TF-IDF processing of the datasets, a variational Bayes expectation-maximization method and a generalized linear model were used to infer each patient's latent clinical state (i.e., risk stratification) and to predict CHD. Model performance was evaluated by accuracy, macro-F1, and training and testing time. RESULTS Based on the characteristics of our datasets (i.e., patient feature-value pairs), we construct a supervised topic model by adding one more Dirichlet distribution hyperparameter to sLDA. Compared with the established Multi-class sLDA model, our approach improves training time by 59.74% and testing time by 25.58% with almost no loss of average prediction accuracy on our datasets. CONCLUSIONS We propose a model for risk stratification and prediction of CHD based on sLDA. Experimental results show that the proposed Hierarchical sLDA model is competitive in both time performance and accuracy. Hierarchical processing of patient features can substantially alleviate the low efficiency of the time-consuming Gibbs sampling used in the sLDA model.
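The METHODS section applies TF-IDF to records built from one-hot feature-value pairs. As a rough illustration only, the sketch below treats each discharge record as a "document" whose tokens are strings such as "troponin=high" and weights them with the plain tf * log(N/df) variant; the token names are hypothetical, and the abstract does not specify which TF-IDF variant the authors used.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(records: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each feature-value token of each record by tf * idf.

    tf  = count of the token in the record / number of tokens in the record
    idf = log(N / df), where df = number of records containing the token
    """
    n_records = len(records)
    doc_freq: Counter = Counter()
    for tokens in records:
        doc_freq.update(set(tokens))  # count each token at most once per record
    weighted = []
    for tokens in records:
        counts = Counter(tokens)
        total = len(tokens)
        weighted.append({
            tok: (c / total) * math.log(n_records / doc_freq[tok])
            for tok, c in counts.items()
        })
    return weighted


if __name__ == "__main__":
    # Hypothetical one-hot feature-value pairs for three patients.
    records = [
        ["sex=male", "troponin=high", "ldl=high"],
        ["sex=male", "troponin=normal", "ldl=high"],
        ["sex=male", "troponin=normal", "ldl=normal"],
    ]
    for row in tf_idf(records):
        print(row)
```

A feature-value pair shared by every record (here "sex=male") receives zero weight, which is the intended effect of the idf term.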
Affiliation(s)
- Guanglei Yu
- School of Medical Engineering and Technology, Xinjiang Medical University, No.567 North Shangde Road, Urumqi, China
- Linlin Zhang
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Ying Zhang
- The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Jiaqi Zhou
- School of Medical Engineering and Technology, Xinjiang Medical University, No.567 North Shangde Road, Urumqi, China
- Tao Zhang
- School of Medical Engineering and Technology, Xinjiang Medical University, No.567 North Shangde Road, Urumqi, China
- Xuehua Bi
- School of Medical Engineering and Technology, Xinjiang Medical University, No.567 North Shangde Road, Urumqi, China
4
Iannetta AA, Hicks LM. Maximizing Depth of PTM Coverage: Generating Robust MS Datasets for Computational Prediction Modeling. Methods Mol Biol 2022; 2499:1-41. [PMID: 35696073 DOI: 10.1007/978-1-0716-2317-6_1]
Abstract
Post-translational modifications (PTMs) regulate complex biological processes by modulating protein activity, stability, and localization. Knowing the specific modification type and its location within a protein sequence can help establish its functional significance. Computational models have increasingly been shown to offer a low-cost, high-throughput approach to comprehensive PTM prediction. Because these algorithms are optimized using existing experimental PTM data, accurate prediction depends on robust datasets. Herein, advances in mass spectrometry-based proteomics technologies that maximize PTM coverage are reviewed. Further, the experimental validation required for PTM predictions is explored, so that follow-up mechanistic studies focus on accurate modification sites.
Affiliation(s)
- Anthony A Iannetta
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Leslie M Hicks
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
5
Zhu Y, Yin S, Zheng J, Shi Y, Jia C. O-glycosylation site prediction for Homo sapiens by combining properties and sequence features with support vector machine. J Bioinform Comput Biol 2021; 20:2150029. [PMID: 34806952 DOI: 10.1142/s0219720021500293]
Abstract
O-glycosylation is a post-translational modification of proteins that is important in regulating almost all cells. It is related to a large number of physiological and pathological phenomena. Identifying O-glycosylation sites is key to further investigating the molecular mechanisms of this post-translational modification. This study aimed to collect a reliable Homo sapiens dataset and to develop, using multiple features, an O-glycosylation predictor for Homo sapiens named Captor. Random undersampling and the synthetic minority oversampling technique (SMOTE) were employed to handle the imbalanced data. In addition, the Kruskal-Wallis (K-W) test was adopted to optimize the feature vectors and improve model performance. After a comprehensive comparison of traditional machine learning classifiers and deep learning methods, a support vector machine (SVM), which performed best, was used to train and optimize the final prediction model. On the independent test set, Captor outperformed the existing O-glycosylation tool, suggesting that Captor could provide more instructive guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/.
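The abstract reports that the Kruskal-Wallis test was used to optimize the feature vectors but not exactly how. One plausible reading, assumed here, is a univariate filter that ranks each feature by its H statistic between glycosylated and non-glycosylated samples. The helper names are hypothetical, and the tie-correction factor is omitted for brevity (SciPy's `scipy.stats.kruskal` applies it).

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: List[Sequence[float]]) -> float:
    """Kruskal-Wallis H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def rank_features(pos: List[List[float]], neg: List[List[float]]) -> List[int]:
    """Return feature indices sorted by decreasing H between the two classes."""
    n_features = len(pos[0])
    scores = [
        kruskal_wallis_h([[row[f] for row in pos], [row[f] for row in neg]])
        for f in range(n_features)
    ]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)
```

Features with larger H separate the two classes more strongly; a cut-off on the ranked list then yields the reduced feature vector.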
Affiliation(s)
- Yan Zhu
- School of Science, Dalian Maritime University, Dalian 116026, P. R. China
- Shuwan Yin
- School of Science, Dalian Maritime University, Dalian 116026, P. R. China
- Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, P. R. China
- Yixia Shi
- School of Mathematics and Statistics, Lingnan Normal University, Zhanjiang 524048, P. R. China
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, P. R. China
6
Do DT, Le TQT, Le NQK. Using deep neural networks and biological subwords to detect protein S-sulfenylation sites. Brief Bioinform 2020; 22:5866114. [PMID: 32613242 DOI: 10.1093/bib/bbaa128] [Received: 03/12/2020] [Revised: 05/11/2020] [Accepted: 05/26/2020]
Abstract
Protein S-sulfenylation is a crucial post-translational modification (PTM) in which a hydroxyl group covalently binds to the thiol of a cysteine residue. Recent studies have shown that this modification plays an important role in signal transduction, transcriptional regulation and apoptosis. To date, the dynamics of sulfenic acids in proteins remain unclear because of their fleeting nature. Identifying S-sulfenylation sites could therefore be key to deciphering the structures and functions of this modification, which are important in cell biology and disease. However, for lack of effective methods, researchers in this field have been limited to a handful of wet-lab techniques that are time-consuming and costly. This motivated us to develop an in silico model that detects S-sulfenylation sites from protein sequence information alone. In this study, protein sequences were treated as natural-language sentences composed of biological subwords, and a deep neural network was then employed to perform the classification. On the independent dataset, the model achieved a sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the curve (AUC) of 85.71%, 69.47%, 77.09%, 0.5554 and 0.833, respectively. Our results suggest that the proposed method (fastSulf-DNN) performs excellently in predicting S-sulfenylation sites compared with other well-known tools on a benchmark dataset.
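The abstract does not define "biological subwords" further. A minimal sketch, assuming a fixed-width window centred on each cysteine (padded with "X" at the sequence termini) that is then split into overlapping k-mers to form a space-separated "sentence"; the window half-width, k, padding symbol and function names are illustrative choices, not values from the paper.

```python
from typing import List


def cysteine_windows(protein: str, half_width: int = 3, pad: str = "X") -> List[str]:
    """Return one window of length 2*half_width + 1 centred on each cysteine (C)."""
    padded = pad * half_width + protein + pad * half_width
    windows = []
    for i, aa in enumerate(protein):
        if aa == "C":
            # Residue i of the protein sits at index i + half_width in the padded string.
            windows.append(padded[i:i + 2 * half_width + 1])
    return windows


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers, the 'words' of the sentence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def to_sentence(window: str, k: int = 3) -> str:
    """Join the k-mers of a window into a space-separated 'sentence'."""
    return " ".join(kmer_words(window, k))


if __name__ == "__main__":
    for w in cysteine_windows("MKCLAGC", half_width=2):
        print(w, "->", to_sentence(w))
```

Sentences of this kind could then be passed to a word-embedding model such as fastText, which additionally breaks each word into character n-grams.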
Affiliation(s)
- Duyen Thi Do
- Faculty of Applied Sciences, Ton Duc Thang University
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University
7
Gana R, Vasudevan S. Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data. BMC Mol Cell Biol 2019; 20:21. [PMID: 31253080 PMCID: PMC6599295 DOI: 10.1186/s12860-019-0200-9] [Received: 03/21/2019] [Accepted: 05/27/2019]
Abstract
BACKGROUND To date, no consensus sequon for O-glycosylation has been claimed. Predicting the likelihood of O-glycosylation from sequence and structural information using classical regression analysis is therefore quite difficult. In particular, if a binary response is used to distinguish between O-glycosylated and non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find. RESULTS Sequences from three similar post-translational modifications (PTMs) of proteins occurring at, or very near, the S/T-site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. The results include: 1) The consensus composite sequon for O-glycosylation is ~(W-S/T-W), where "~" denotes the "not" operator. 2) The consensus sequon for phosphorylation is ~(W-S/T/Y/H-W), although W-S/T/Y/H-W is not an absolute inhibitor of phosphorylation. 3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations to non-O-glycosylatable sequences, although N - ~P - S/T is not an absolute inhibitor of O-glycosylation. 4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins. 5) Some N-glycosylated sequences are also phosphorylated at the S/T-site in the N - ~P - S/T sequon. 6) Accessible surface area (ASA) values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences. 7) Structural attributes (beta turn II, II´, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation; the LPM with sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%. 8) With sequence data only, the KS statistic erodes to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not glycosylated; the 95% confidence interval around this misprediction rate is 16% to 26%.
CONCLUSIONS The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation.
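A KS statistic of 99% is easiest to read as the two-sample Kolmogorov-Smirnov distance: the largest vertical gap between the cumulative distributions of predicted probabilities for glycosylated and non-glycosylated sequences. The sketch below assumes that is how the statistic was computed (the abstract does not say) and takes the model scores as plain lists; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max over t of |F_pos(t) - F_neg(t)|."""
    pos = sorted(scores_pos)
    neg = sorted(scores_neg)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed scores,
    # so checking every observed score is enough to find the supremum.
    for t in pos + neg:
        f_pos = bisect_right(pos, t) / len(pos)
        f_neg = bisect_right(neg, t) / len(neg)
        d = max(d, abs(f_pos - f_neg))
    return d


if __name__ == "__main__":
    # Hypothetical LPM outputs for glycosylated (pos) and non-glycosylated (neg) sequences.
    print(ks_statistic([0.3, 0.6, 0.9], [0.1, 0.2, 0.4]))
```

A value of 1.0 means the two score distributions do not overlap at all, so 99% indicates almost complete separation and 80% a noticeably weaker one.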
Affiliation(s)
- Rajaram Gana
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC, USA
- Sona Vasudevan
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC, USA
8
SVM-SulfoSite: A support vector machine based predictor for sulfenylation sites. Sci Rep 2018; 8:11288. [PMID: 30050050 PMCID: PMC6062547 DOI: 10.1038/s41598-018-29126-x] [Received: 04/23/2018] [Accepted: 07/02/2018]
Abstract
Protein S-sulfenylation, which results from oxidation of free thiols on cysteine residues, has recently emerged as an important post-translational modification that regulates the structure and function of proteins involved in a variety of physiological and pathological processes. By altering the size and physicochemical properties of modified cysteine residues, sulfenylation can affect the cellular function of proteins in several different ways. Thus, the ability to rapidly and accurately identify putative sulfenylation sites in proteins will provide important insights into redox-dependent regulation of protein function in a variety of cellular contexts. Although bottom-up proteomic approaches, such as tandem mass spectrometry (MS/MS), provide a wealth of information about global changes in the sulfenylation state of proteins, MS/MS-based experiments are often labor-intensive, costly and technically challenging. Therefore, to complement existing proteomic approaches, researchers have developed a series of computational tools to identify putative sulfenylation sites on proteins. However, existing methods often suffer from low accuracy, specificity, and/or sensitivity. In this study, we developed SVM-SulfoSite, a novel sulfenylation prediction tool that uses support vector machines (SVM) to identify key determinants of sulfenylation among five feature classes: binary code, physicochemical properties, k-space amino acid pairs, amino acid composition and high-quality physicochemical indices. Using 10-fold cross-validation, SVM-SulfoSite achieved 95% sensitivity and 83% specificity, with an overall accuracy of 89% and a Matthews correlation coefficient (MCC) of 0.79. Likewise, on an independent test set of experimentally identified sulfenylation sites, our method achieved an accuracy of 74%, sensitivity of 62%, specificity of 80% and MCC of 0.42, with an area under the receiver operating characteristic (ROC) curve of 0.81.
Moreover, in side-by-side comparisons, SVM-SulfoSite performed as well as or better than existing sulfenylation prediction tools. Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein S-sulfenylation.
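The accuracy, sensitivity, specificity and MCC figures quoted in this abstract all derive from the four cells of a binary confusion matrix. The standard definitions are sketched below; the function name is hypothetical, and any counts passed to it are illustrative rather than taken from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Assumes at least one actual positive (tp + fn > 0) and one actual negative (tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC = 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    print(binary_metrics(tp=8, tn=6, fp=4, fn=2))
```

Because MCC uses all four cells, it is less inflated by class imbalance than accuracy, which is one reason both are usually reported together.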
9
Pashaei E, Aydin N. Binary black hole algorithm for feature selection and classification on biological data. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2017.03.002]
10
Ma L, Fan S. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. BMC Bioinformatics 2017; 18:169. [PMID: 28292263 PMCID: PMC5351181 DOI: 10.1186/s12859-017-1578-z] [Received: 08/25/2016] [Accepted: 03/03/2017]
Abstract
Background The random forests algorithm is a widely applicable, general-purpose classifier that is robust against overfitting. However, random forests still have some drawbacks. To improve their performance, this paper therefore seeks to improve imbalanced data processing, feature selection and parameter optimization. Results We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) effectively improves on the classification results obtained with the original data and with random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid random forests (RF) algorithm is proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, namely the hybrid genetic-random forests, hybrid particle swarm-random forests and hybrid fish swarm-random forests algorithms, can achieve the minimum OOB error and show the best generalization ability. Conclusion The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, this feasible and effective algorithm produces better classification results. Moreover, the F-value, G-mean, AUC and OOB scores of the hybrid algorithms surpass those of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
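CURE-SMOTE, as described in this abstract, combines CURE clustering with SMOTE. The sketch below shows only the plain SMOTE interpolation step that such variants build on, not the CURE clustering or noise removal; neighbours are found by brute force, and the function and parameter names are illustrative. Libraries such as imbalanced-learn provide a maintained `SMOTE` implementation.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[List[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-feature minority class.
    minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
    for point in smote(minority_class, n_new=3, k=2, seed=42):
        print(point)
```

Because every synthetic sample is interpolated inside the minority class, noisy or outlying minority points are also interpolated from, which is the weakness that clustering-based variants such as CURE-SMOTE aim to reduce.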
Affiliation(s)
- Li Ma
- School of Information Science and Technology, Jinan University, Guangzhou, 510632, China
- Suohai Fan
- School of Information Science and Technology, Jinan University, Guangzhou, 510632, China