1
Liu L, Huang Y, Zheng Y, Liao Y, Ma S, Wang Q. ScnML models single-cell transcriptome to predict spinal cord neuronal cell status. Front Genet 2024; 15:1413484. [PMID: 38894722 PMCID: PMC11183327 DOI: 10.3389/fgene.2024.1413484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open
Abstract
Injuries to the spinal cord nervous system often result in permanent loss of sensory, motor, and autonomic functions. Accurately identifying the cellular state of spinal cord nerves is extremely important and could facilitate the development of new therapeutic and rehabilitative strategies. Existing experimental techniques for identifying the development of spinal cord nerves are both labor-intensive and costly. In this study, we developed a machine learning predictor, ScnML, for predicting subpopulations of spinal cord nerve cells and identifying marker genes. On the training dataset, ScnML achieved an accuracy of 94.33%. On the test dataset, the XGBoost-based ScnML achieved an accuracy, precision, recall, and F1-measure of 94.08%, 94.24%, 94.26%, and 94.24%, respectively. Importantly, ScnML identified new significant genes through model interpretation and biological landscape analysis. ScnML can be a powerful tool for predicting the status of spinal cord neuronal cells, revealing potential specific biomarkers quickly and efficiently, and providing crucial insights for precision medicine and rehabilitation recovery.
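The four metrics reported in this abstract can be computed from predicted and true cell labels. The sketch below is a minimal illustration only: it assumes macro-averaging across classes (the abstract does not state which averaging was used), and the function name `classification_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for illustration; the abstract
    does not say how the per-class scores were combined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (hypothetical, not data from the study).
y_true = ["neuron", "neuron", "glia", "glia"]
y_pred = ["neuron", "glia", "glia", "glia"]
metrics = classification_metrics(y_true, y_pred)
```

Because precision and recall are averaged per class, the four numbers can differ slightly from one another even on a balanced test set, which is consistent with the closely spaced values reported above.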
Affiliation(s)
- Lijia Liu
- School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Yuxuan Huang
- Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, Jiangsu, China
- Yuan Zheng
- Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Yihan Liao
- Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Siyuan Ma
- School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Qian Wang
- Department of Neurology, The First Hospital of Tsinghua University, Beijing, China
2
Yang S, Liu D, Song Y, Liang Y, Yu H, Zuo Y. Designing a structure-function alphabet of helix based on reduced amino acid clusters. Arch Biochem Biophys 2024; 754:109942. [PMID: 38387828 DOI: 10.1016/j.abb.2024.109942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 02/16/2024] [Accepted: 02/19/2024] [Indexed: 02/24/2024]
Abstract
Several simple secondary structures can form complex and diverse functional proteins, which suggests that secondary structures contain considerable hidden information and are arranged according to certain principles so as to carry enough information for functional specificity and diversity. However, this information and these principles have not been understood systematically. In our study, we designed a structure-function alphabet of helices based on reduced amino acid clusters to describe the typical features of helices and to explore this information. First, we selected 480 typical helices from membrane proteins, zymoproteins, transcription factors, and other proteins to define and calculate the interval ranges, and classified the helices in terms of hydrophilicity, charge, and length: (1) hydrophobic helix (≤43%), amphiphilic helix (43%-71%), and hydrophilic helix (≥71%); (2) positive helix, negative helix, electrically neutral helix, and uncharged helix; (3) short helix (≤8 aa), medium-length helix (9-28 aa), and long helix (≥29 aa). We then designed an alphabet containing 36 triplet codes according to this classification, so that the main features of each helix can be represented by only three letters. This alphabet not only provides a preliminary definition of helix characteristics but also greatly reduces the informational dimension of protein structure. Finally, we present an application example to demonstrate the value of the structure-function alphabet in protein functional determination and differentiation.
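The three-way classification described in this abstract (3 hydrophilicity classes × 4 charge classes × 3 length classes = 36 combinations) can be sketched as follows. Only the thresholds come from the abstract. The set of residues counted as hydrophilic, the rule separating "electrically neutral" from "uncharged", the handling of the boundary values, and all names below are assumptions made for illustration; the abstract does not give the actual letters of the alphabet, so descriptive labels are returned instead.

```python
# Assumed residue sets (illustrative; not taken from the paper).
HYDROPHILIC = set("RKDENQHSTY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def hydrophilicity_class(seq):
    """Classify by percentage of hydrophilic residues (thresholds from the abstract)."""
    pct = 100.0 * sum(aa in HYDROPHILIC for aa in seq) / len(seq)
    if pct <= 43:
        return "hydrophobic"
    if pct < 71:
        return "amphiphilic"
    return "hydrophilic"


def charge_class(seq):
    """Classify by net charge; the neutral/uncharged split is an assumption."""
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    if pos == 0 and neg == 0:
        return "uncharged"  # no charged residues at all
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"        # charged residues present, but they cancel out


def length_class(seq):
    """Classify by helix length in residues (thresholds from the abstract)."""
    n = len(seq)
    if n <= 8:
        return "short"
    if n <= 28:
        return "medium"
    return "long"


def helix_code(seq):
    """Return the (hydrophilicity, charge, length) triplet for one helix."""
    if not seq:
        raise ValueError("helix sequence must not be empty")
    seq = seq.upper()
    return hydrophilicity_class(seq), charge_class(seq), length_class(seq)


# Toy sequences (hypothetical helices).
code_a = helix_code("AAAALLLL")
code_b = helix_code("KRKRKRKRKR")
code_c = helix_code("AKLAEL" * 5)
```

Each helix is thus reduced to one of 36 possible triplets, which is the dimensionality reduction the abstract refers to.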
Affiliation(s)
- Siqi Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Dongyang Liu
- Key Laboratory of Photobiology, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China; University of Chinese Academy of Sciences, Beijing, China
- Yancheng Song
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Haoyu Yu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
3
Liu S, Liang Y, Li J, Yang S, Liu M, Liu C, Yang D, Zuo Y. Integrating reduced amino acid composition into PSSM for improving copper ion-binding protein prediction. Int J Biol Macromol 2023:124993. [PMID: 37307968 DOI: 10.1016/j.ijbiomac.2023.124993] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/14/2023]
Abstract
Copper ion-binding proteins play an essential role in metabolic processes and are critical factors in many diseases, such as breast cancer, lung cancer, and Menkes disease. Many algorithms have been developed for predicting metal ion classification and binding sites, but none have been applied to copper ion-binding proteins. In this study, we developed a copper ion-binding protein classifier, RPCIBP, which integrates the reduced amino acid composition into the position-specific scoring matrix (PSSM). The reduced amino acid composition filters out a large number of useless evolutionary features, improving the operational efficiency and predictive ability of the model (feature dimension reduced from 2900 to 200, ACC increased from 83% to 85.1%). Compared with the basic models using only three sequence feature extraction methods (ACC of 73.8%-86.2% on the training set and 69.3%-87.5% on the test set), the model integrating the evolutionary features of the reduced amino acid composition showed higher accuracy and robustness (ACC of 83.1%-90.8% on the training set and 79.1%-91.9% on the test set). The best copper ion-binding protein classifiers, selected through the feature selection process, were deployed on a user-friendly web server (http://bioinfor.imu.edu.cn/RPCIBP). RPCIBP can accurately predict copper ion-binding proteins, which is convenient for further structural and functional studies and conducive to mechanism exploration and targeted drug development.
Affiliation(s)
- Shanghua Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
- Jinzhao Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Siqi Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Ming Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Chengfang Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Dezhi Yang
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
4
Nguyen Q, Tran HV, Nguyen BP, Do TTT. Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition. ACS Omega 2022; 7:32322-32330. [PMID: 36119976 PMCID: PMC9475634 DOI: 10.1021/acsomega.2c03696] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/14/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
Transcription factors (TFs) play an important role in gene expression and in the regulation of 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments near the transcription initiation site and form complexes that allow polymerase enzymes to bind and initiate transcription. Previous studies showed that methylated DNA could inhibit or prevent TFs from binding to DNA fragments. However, recent studies have found TFs that can bind to methylated DNA fragments. Identifying these TFs is an important stepping stone to a better understanding of cellular gene expression mechanisms. Because experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose using the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those of other studies on the same problems.
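As a rough illustration of the representation named in this abstract, the sketch below computes a g-gap dipeptide composition over a reduced amino acid alphabet: each residue is first mapped to its cluster, then pairs of residues separated by `g` intervening positions are counted and normalized. The six-cluster grouping and all names are hypothetical choices made for this example; the abstract does not specify which reduction scheme or gap value the authors used.

```python
from itertools import product

# Hypothetical reduction of the 20 amino acids into 6 clusters
# (illustration only; not the grouping used in the paper).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every residue to the first letter of its cluster.
REDUCE = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}
ALPHABET = [cluster[0] for cluster in CLUSTERS]
PAIRS = [a + b for a, b in product(ALPHABET, repeat=2)]  # 6 * 6 = 36 features


def reduced_ggap_dipeptide_composition(seq, g=0):
    """Return the normalized frequency of each reduced residue pair.

    A pair is formed by positions i and i + g + 1, so g = 0 gives
    adjacent dipeptides. Non-standard residues (e.g. 'X') raise KeyError.
    """
    reduced = [REDUCE[aa] for aa in seq.upper()]
    total = len(reduced) - g - 1  # number of g-gap pairs in the sequence
    if g < 0 or total <= 0:
        raise ValueError("sequence too short for the requested gap")

    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        counts[reduced[i] + reduced[i + g + 1]] += 1
    return [counts[pair] / total for pair in PAIRS]


# Toy sequence (hypothetical): reduced form is A, K, D, D, G.
features = reduced_ggap_dipeptide_composition("AKDEG", g=1)
```

With six clusters the feature vector has 36 entries instead of the 400 needed for the full 20-letter alphabet, which is the main motivation for the reduction. A vector like this could then be passed to a support vector machine or any other classifier.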
Affiliation(s)
- Quang H. Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Hoang V. Tran
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Binh P. Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Trang T. T. Do
- School of Innovation, Design and Technology, Wellington Institute of Technology, 21 Kensington Avenue, Lower Hutt 5012, New Zealand
5
Liang Y, Yang S, Zheng L, Wang H, Zhou J, Huang S, Yang L, Zuo Y. Research progress of reduced amino acid alphabets in protein analysis and prediction. Comput Struct Biotechnol J 2022; 20:3503-3510. [PMID: 35860409 PMCID: PMC9284397 DOI: 10.1016/j.csbj.2022.07.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/14/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 11/29/2022] Open
Abstract
A comprehensive summary of the literature on the reduced amino acid alphabets. A systematic review of the development history of reduced amino acid alphabets. Rich application cases of amino acid reduction alphabets are described in the article. A detailed analysis of the properties and uses of the reduced amino acid alphabets.
Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important method for protein structure and function analysis but also opens a broad horizon for the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the dimensionality, complexity, and noise of traditional feature engineering, and provides more interpretable predictive models that let machine learning capture key features. In this paper, we systematically review the strategies and methods of reduced amino acid (RAA) alphabets and summarize the main research on them in protein sequence alignment, functional classification, and prediction of structural properties. Finally, we give a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
Affiliation(s)
- Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Siqi Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Hao Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Jian Zhou
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Corresponding authors.
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Corresponding authors.
6
Zou H. iAHTP-LH: Integrating Low-Order and High-Order Correlation Information for Identifying Antihypertensive Peptides. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10414-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
7
Zou H. Identifying blood-brain barrier peptides by using amino acids physicochemical properties and features fusion method. Pept Sci (Hoboken) 2021. [DOI: 10.1002/pep2.24247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
8
Zhou J, Bo S, Wang H, Zheng L, Liang P, Zuo Y. Identification of Disease-Related 2-Oxoglutarate/Fe (II)-Dependent Oxygenase Based on Reduced Amino Acid Cluster Strategy. Front Cell Dev Biol 2021; 9:707938. [PMID: 34336861 PMCID: PMC8323781 DOI: 10.3389/fcell.2021.707938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 06/10/2021] [Indexed: 11/17/2022] Open
Abstract
The 2-oxoglutarate/Fe (II)-dependent (2OG) oxygenase superfamily is mainly responsible for protein modification, nucleic acid repair and/or modification, and fatty acid metabolism, and plays important roles in cancer, cardiovascular disease, and other diseases. Its members are likely to become new targets for the treatment of cancer and other diseases, so the accurate identification of 2OG oxygenases is of great significance. Many computational methods have been proposed to predict functional proteins and so compensate for time-consuming and expensive experimental identification. However, machine learning has not been applied to the study of 2OG oxygenases. In this study, we developed OGFE_RAAC, a prediction model that identifies whether a protein is a 2OG oxygenase. To improve the performance of OGFE_RAAC, 673 amino acid reduction alphabets were used to recode the protein sequences and determine the optimal feature representation scheme. In a 10-fold cross-validation test, the model identified 2OG oxygenases with an accuracy of 91.04%. The results on the independent dataset also showed that the model has excellent generalization and robustness. It is expected to become an effective tool for the identification of 2OG oxygenases. In further research, we also found that the function of 2OG oxygenases may be related to their polarity and hydrophobicity, which will help follow-up studies on the catalytic mechanism of 2OG oxygenases and the way they interact with their substrates. Based on the model we built, a user-friendly web server was established and is accessible at http://bioinfor.imu.edu.cn/ogferaac.
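The 10-fold cross-validation mentioned in this abstract follows a standard pattern: the data are split into 10 folds, each fold serves once as the held-out test set, and the accuracies are averaged. The sketch below shows that pattern in plain Python. It is not the authors' pipeline; the function names, the shuffling seed, and the `fit`/`predict` callables are all hypothetical, and the split is not stratified by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Average test accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns a list of labels; both are supplied by the caller.
    """
    accuracies = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy usage with a threshold "model" (hypothetical data, not from the study).
X = list(range(20))
y = [int(value >= 10) for value in X]
score = cross_val_accuracy(
    fit=lambda X_train, y_train: 10,                             # fixed threshold
    predict=lambda threshold, X_test: [int(v >= threshold) for v in X_test],
    X=X, y=y, k=5,
)
splits = list(k_fold_indices(25, k=5))
```

Every sample appears in exactly one test fold, so the averaged accuracy uses each sample once for evaluation while never training and testing on the same sample within a fold.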
Affiliation(s)
- Jian Zhou
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Suling Bo
- College of Computer and Information, Inner Mongolia Medical University, Hohhot, China
- Hao Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China