1
Arif M, Musleh S, Ghulam A, Fida H, Alqahtani Y, Alam T. StackDPPred: Multiclass prediction of defensin peptides using stacked ensemble learning with optimized features. Methods 2024; 230:129-139. [PMID: 39173785 DOI: 10.1016/j.ymeth.2024.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 07/30/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Host defense peptides, or antimicrobial peptides (AMPs), are promising candidates for protecting the host against microbial pathogens such as bacteria, viruses, fungi, and yeast. Defensins are a type of AMP that act as potential therapeutic agents and play vital roles in various biological processes. Conventional experiments to identify defensin peptides (DPs) are time-consuming and expensive. Computational methods can therefore compensate for the shortcomings of wet-lab experiments by accurately predicting the functional types of DPs. In this paper, we propose a novel multi-class ensemble-based prediction model called StackDPPred for identifying the properties of DPs. The peptide sequences are encoded using split amino acid composition (SAAC), segmented position-specific scoring matrix (SegPSSM), histogram of oriented gradients-based PSSM (HOGPSSM), and feature extraction based on graphical and statistical features (FEGS) descriptors. Next, principal component analysis (PCA) is used to select the best subset of attributes. After that, the optimized features are fed into single machine learning and stacking-based ensemble classifiers. Furthermore, the ablation study demonstrates the robustness and efficacy of the stacking approach using reduced features for predicting DPs and their families. The proposed StackDPPred method improves the overall accuracy by 13.41% and 7.62% compared with the existing DP predictors iDPF-PseRAAC and iDEF-PseRAAC, respectively, on the validation test. Additionally, we applied the local interpretable model-agnostic explanations (LIME) algorithm to understand the contribution of selected features to the overall prediction. We believe StackDPPred could serve as a valuable tool for accelerating the screening of large-scale DPs and the peptide-based drug discovery process.
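One of the encodings named in this abstract, split amino acid composition (SAAC), can be illustrated with a minimal Python sketch. The abstract does not give the segment lengths, so the `terminal_len` parameter, the three-way split into N-terminal, middle, and C-terminal parts, and the function names `aac` and `saac` are assumptions made for illustration, not the authors' implementation.

```python
# Minimal SAAC sketch. The segment length and the three-way split are
# assumptions for illustration; they are not taken from the StackDPPred paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(segment):
    """Return the fraction of each of the 20 standard residues in a segment."""
    n = len(segment)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(a) / n for a in AMINO_ACIDS]


def saac(sequence, terminal_len=5):
    """Concatenate the compositions of the N-terminal, middle, and C-terminal parts."""
    seq = sequence.upper()
    n_term = seq[:terminal_len]
    c_term = seq[-terminal_len:] if terminal_len else ""
    # The middle part is empty when the peptide is shorter than 2 * terminal_len.
    middle = seq[terminal_len:len(seq) - terminal_len]
    return aac(n_term) + aac(middle) + aac(c_term)


features = saac("ACDEFGHIKLMNPQRSTVWY", terminal_len=5)
print(len(features))  # 3 segments x 20 residues
```

Each peptide yields a 60-dimensional vector under these assumptions; a dimensionality reduction step such as the PCA mentioned in the abstract would then operate on vectors of this kind, concatenated with the other descriptors.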
Affiliation(s)
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Saleh Musleh
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Ali Ghulam
  - Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
- Huma Fida
  - Department of Microbiology, Abdul Wali Khan University Mardan, 23200, KPK, Pakistan
- Tanvir Alam
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar.
2
Feng C, Wei H, Li X, Feng B, Xu C, Zhu X, Liu R. A stacking-based algorithm for antifreeze protein identification using combined physicochemical, pseudo amino acid composition, and reduction property features. Comput Biol Med 2024; 176:108534. [PMID: 38754217 DOI: 10.1016/j.compbiomed.2024.108534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 04/03/2024] [Accepted: 04/28/2024] [Indexed: 05/18/2024]
Abstract
Antifreeze proteins have wide applications in the medical and food industries. In this study, we propose a stacking-based classifier that can effectively identify antifreeze proteins. Initially, feature extraction was performed in three aspects: reduction properties, scalable pseudo amino acid composition, and physicochemical properties. A hybrid feature set comprising the combined information from these three categories was obtained. Subsequently, we trained LightGBM, XGBoost, and RandomForest models on the training set, and their outputs were passed to the Logistic algorithm for matching, thereby establishing a stacking algorithm. The proposed algorithm was tested on the test set and an independent validation set. Experimental data indicate that the algorithm achieved a recognition accuracy of 98.3% on the test set and 98.5% on the validation set. Lastly, we analyzed from multiple aspects why the numerical features achieved high recognition capabilities. Data dimensionality reduction and analysis from two-dimensional and three-dimensional views revealed separability between positive and negative samples, and the protein three-dimensional structure further demonstrated significant differences in related features between the two samples. Analysis of the classifier revealed that Hr*Hr, HrHr, Sc-PseAAC_1, and 188D(152,116,57,183) were among the seven most important numerical features affecting algorithm recognition. For Hr*Hr and HrHr, supportive sequence-level evidence for the reduction dictionary was found in terms of conservation area analysis, multiple sequence alignment, and amino acid conservative substitution. Moreover, the importance of the reduction dictionary was recognized through a comparative analysis of importance before and after the reduction, confirming the effectiveness of the dictionary in improving feature importance.
A decision tree model has been utilized to discern the distinctions between dipeptides associated with the physical and chemical properties of His(H), Iso(I), Leu(L), and Lys(K) and other dipeptides. We finally analyzed the other seven features of importance, and data analysis confirmed that hydrophobicity, secondary structure, charge properties, van der Waals forces, and solvent accessibility are also factors affecting the antifreeze capability of proteins.
Affiliation(s)
- Changli Feng
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Haiyan Wei
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Xin Li
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Bin Feng
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Chugui Xu
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Xiaorong Zhu
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Ruijun Liu
  - School of Software, Beihang University, Beijing, 100191, China.
3
Liu S, Liang Y, Li J, Yang S, Liu M, Liu C, Yang D, Zuo Y. Integrating reduced amino acid composition into PSSM for improving copper ion-binding protein prediction. Int J Biol Macromol 2023:124993. [PMID: 37307968 DOI: 10.1016/j.ijbiomac.2023.124993] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/14/2023]
Abstract
Copper ion-binding proteins play an essential role in metabolic processes and are critical factors in many diseases, such as breast cancer, lung cancer, and Menkes disease. Many algorithms have been developed for predicting metal ion classification and binding sites, but none have been applied to copper ion-binding proteins. In this study, we developed a copper ion-binding protein classifier, RPCIBP, which integrates the reduced amino acid composition into the position-specific scoring matrix (PSSM). The reduced amino acid composition filters out a large number of useless evolutionary features, improving the operational efficiency and predictive ability of the model (feature dimension reduced from 2900 to 200, ACC increased from 83% to 85.1%). Compared with the basic model using only three sequence feature extraction methods (ACC between 73.8% and 86.2% on the training set, and between 69.3% and 87.5% on the test set), the model integrating the evolutionary features of the reduced amino acid composition showed higher accuracy and robustness (ACC between 83.1% and 90.8% on the training set, and between 79.1% and 91.9% on the test set). The best copper ion-binding protein classifiers, selected through the feature selection process, were deployed in a user-friendly web server (http://bioinfor.imu.edu.cn/RPCIBP). RPCIBP can accurately predict copper ion-binding proteins, which is convenient for further structural and functional studies and conducive to mechanism exploration and targeted drug development.
Affiliation(s)
- Shanghua Liu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
- Jinzhao Li
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Ming Liu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Chengfang Liu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Dezhi Yang
  - Inner Mongolia International Mongolian Hospital, Hohhot 010065, China.
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China.
4
Liang Y, Yang S, Zheng L, Wang H, Zhou J, Huang S, Yang L, Zuo Y. Research progress of reduced amino acid alphabets in protein analysis and prediction. Comput Struct Biotechnol J 2022; 20:3503-3510. [PMID: 35860409 PMCID: PMC9284397 DOI: 10.1016/j.csbj.2022.07.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/14/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 11/29/2022]
Abstract
A comprehensive summary of the literature on the reduced amino acid alphabets. A systematic review of the development history of reduced amino acid alphabets. Rich application cases of amino acid reduction alphabets are described in the article. A detailed analysis of the properties and uses of the reduced amino acid alphabets.
Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important method for protein structure and function analysis, but also opens a broad horizon for the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the dimensional complexity and noise of traditional feature engineering, and provides more interpretable predictive models in which machine learning can capture key features. In this paper, we systematically review studies on the strategies and methods of reduced amino acid (RAA) alphabets, and summarize the main research on their use in protein sequence alignment, functional classification, and prediction of structural properties. Finally, we give a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
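The recoding idea summarized in this abstract, representing a sequence with fewer residue types, can be sketched in a few lines of Python. The seven clusters in `HYPOTHETICAL_CLUSTERS` are an assumption chosen loosely by physicochemical similarity; they are not one of the 672 alphabets analyzed in the review, and the function names are illustrative only.

```python
# Sketch of reduced amino acid (RAA) recoding. The clustering below is a
# hypothetical example, not an alphabet taken from the review.
HYPOTHETICAL_CLUSTERS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "G", "P"]


def build_mapping(clusters):
    """Map every residue to the first letter of the cluster that contains it."""
    mapping = {}
    for cluster in clusters:
        representative = cluster[0]
        for residue in cluster:
            mapping[residue] = representative
    return mapping


def recode(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown residues become 'X'."""
    return "".join(mapping.get(r, "X") for r in sequence.upper())


def reduced_composition(sequence, clusters):
    """Composition over cluster representatives (7 values here instead of 20)."""
    recoded = recode(sequence, build_mapping(clusters))
    n = len(recoded)
    # Unknown residues ('X') count toward the length but not toward any cluster.
    return [recoded.count(c[0]) / n if n else 0.0 for c in clusters]


print(recode("MKTLLDE", build_mapping(HYPOTHETICAL_CLUSTERS)))
print(reduced_composition("MKTLLDE", HYPOTHETICAL_CLUSTERS))
```

With this clustering, a plain composition vector shrinks from 20 to 7 dimensions, which is the kind of dimensionality reduction the abstract attributes to RAA alphabets.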
Affiliation(s)
- Yuchao Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Siqi Yang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Hao Wang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Shenghui Huang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - Corresponding authors.
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
  - Corresponding authors.
5
Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Malaria is a mosquito-borne disease that results in millions of cases and deaths annually. The development of a fast computational method that identifies secretory proteins of the malaria parasite is important for research on antimalarial drugs and vaccines. Thus, a method was developed to identify the secretory proteins of malaria parasites. In this method, a reduced alphabet was selected to recode the original protein sequence. A feature synthesis method was used to synthesise three different types of feature information. Finally, the random forest method was used as a classifier to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. Experiments using the benchmark dataset demonstrated that the overall accuracy achieved by the proposed method was greater than 97.8 percent using the 10-fold cross-validation method. Furthermore, the reduced schemes and characteristic performance analyses are discussed.
6
Charoenkwan P, Schaduangrat N, Mahmud SMH, Thinnukool O, Shoombuatong W. Recent development of machine learning-based methods for the prediction of defensin family and subfamily. EXCLI Journal 2022; 21:757-771. [PMID: 35949489 PMCID: PMC9360473 DOI: 10.17179/excli2022-4913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 05/03/2022] [Indexed: 11/05/2022]
Abstract
Nearly all living species possess host defense peptides called defensins, which are crucial for innate immunity. These peptides work by killing microbes directly or indirectly through activation of the immune system, thus providing protection to the host. Numerous peptide-based drugs are currently being evaluated in preclinical and clinical trials. Although experimental methods can help to precisely identify the defensin peptide family and subfamily, these approaches are often time-consuming and costly. On the other hand, machine learning (ML) methods are able to effectively employ protein sequence information without knowledge of a protein's three-dimensional structure, which highlights their predictive ability for large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. Therefore, summarizing the advantages and disadvantages of the existing methods is urgently needed in order to provide useful suggestions for the development and improvement of new computational models for the identification of the defensin peptide family and subfamily. With this goal in mind, we first provide a comprehensive survey of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. Herein, we cover different important aspects, including the dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods, and web server availability/usability. Moreover, we provide our thoughts on the limitations of existing methods and future perspectives for improving the prediction performance and model interpretability. The insights and suggestions gained from this review are anticipated to serve as valuable guidance for researchers developing more robust and useful predictors.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
- Nalini Schaduangrat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
- S. M. Hasan Mahmud
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka 1229, Bangladesh
- Orawit Thinnukool
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
  - To whom correspondence should be addressed: Watshara Shoombuatong, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Phone: +66 2 441 4371, Fax: +66 2 441 4380
7
Kaur D, Patiyal S, Arora C, Singh R, Lodhi G, Raghava GPS. In-Silico Tool for Predicting, Scanning, and Designing Defensins. Front Immunol 2021; 12:780610. [PMID: 34880873 PMCID: PMC8645896 DOI: 10.3389/fimmu.2021.780610] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/21/2021] [Accepted: 10/28/2021] [Indexed: 12/12/2022]
Abstract
Defensins are host defense peptides present in nearly all living species, which play a crucial role in innate immunity. These peptides provide protection to the host, either by killing microbes directly or indirectly by activating the immune system. In the era of antibiotic resistance, there is a need to develop a fast and accurate method for predicting defensins. In this study, a systematic attempt has been made to develop models for predicting defensins from available information on defensins. We created a dataset of defensins and non-defensins called the main dataset that contains 1,036 defensins and 1,035 AMPs (antimicrobial peptides, or non-defensins) to understand the difference between defensins and AMPs. Our analysis indicates that certain residues like Cys, Arg, and Tyr are more abundant in defensins in comparison to AMPs. We developed machine learning technique-based models on the main dataset using a wide range of peptide features. Our SVM (support vector machine)-based model discriminates defensins and AMPs with MCC of 0.88 and AUC of 0.98 on the validation set of the main dataset. In addition, we created an alternate dataset that consists of 1,036 defensins and 1,054 non-defensins obtained from Swiss-Prot. Models were also developed on the alternate dataset to predict defensins. Our SVM-based model achieved maximum MCC of 0.96 with AUC of 0.99 on the validation set of the alternate dataset. All models were trained, tested, and validated using standard protocols. Finally, we developed a web-based service "DefPred" to predict defensins, scan defensins in proteins, and design the best defensins from their analogs. The stand-alone software and web server of DefPred are available at https://webs.iiitd.edu.in/raghava/defpred.
Affiliation(s)
- Dilraj Kaur
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Ritesh Singh
  - Department of Computer Science, Indraprastha Institute of Information Technology, New Delhi, India
- Gaurav Lodhi
  - Department of Computer Science, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
8
Feng C, Wei H, Yang D, Feng B, Ma Z, Han S, Zou Q, Shi H. ORS-Pred: An optimized reduced scheme-based identifier for antioxidant proteins. Proteomics 2021; 21:e2100017. [PMID: 34009737 DOI: 10.1002/pmic.202100017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2021] [Revised: 04/22/2021] [Accepted: 05/12/2021] [Indexed: 12/30/2022]
Abstract
Antioxidant proteins can terminate a chain of reactions caused by free radicals and protect cells from damage. To identify antioxidant proteins rapidly, a computational model was proposed based on an optimized recoding scheme, sequence information, and machine learning methods. First, over 600 recoding schemes were collected to build a scheme set. Then, the original sequence was recoded as a reduced expression whose g-gap dipeptides (g = 0, 1, 2) were used as the features of proteins. Furthermore, a random forest (RF) method was used to evaluate the classification ability of the obtained dipeptide features. After going through all schemes, the scheme with the best predictive performance was chosen as the optimized reduction scheme. Finally, for the RF method, a grid search strategy was used to select a better parameter combination to identify antioxidant proteins. In the experiment, the present method correctly recognized 90.13-99.87% of the antioxidant samples. Other experimental results also showed that the present method was efficient at identifying antioxidant proteins. Finally, we also developed a web server that is freely accessible to researchers.
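The g-gap dipeptide features mentioned in this abstract count pairs of residues separated by g intervening positions. A minimal Python sketch follows; the function name `g_gap_dipeptides`, the normalization to frequencies, and the toy two-letter alphabet in the usage line are assumptions for illustration rather than the ORS-Pred implementation.

```python
# Sketch of g-gap dipeptide composition on an already recoded sequence.
# Normalizing counts to frequencies is an assumption made for illustration.
from collections import Counter
from itertools import product


def g_gap_dipeptides(sequence, alphabet, g=0):
    """Frequency of residue pairs (x, y) where y follows x after g skipped positions."""
    pairs = Counter(
        sequence[i] + sequence[i + g + 1]
        for i in range(len(sequence) - g - 1)
    )
    total = sum(pairs.values())
    keys = ["".join(p) for p in product(alphabet, repeat=2)]
    return {k: (pairs[k] / total if total else 0.0) for k in keys}


# Toy example on a two-letter reduced alphabet.
print(g_gap_dipeptides("ABAB", "AB", g=0))
print(g_gap_dipeptides("ABAB", "AB", g=1))
```

For a reduced alphabet with k letters, each value of g contributes k squared features, so concatenating g = 0, 1, 2 as the abstract describes gives 3k squared features per sequence.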
Affiliation(s)
- Changli Feng
  - Department of Information Science and Technology, Taishan University, Taian, China
- Haiyan Wei
  - Department of Teachers and Education, Taishan University, Taian, China
- Deyun Yang
  - Department of Information Science and Technology, Taishan University, Taian, China
- Bin Feng
  - Department of Information Science and Technology, Taishan University, Taian, China
- Zhaogui Ma
  - Department of Information Science and Technology, Taishan University, Taian, China
- Shuguang Han
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, China
- Hua Shi
  - School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China
9
Zhou J, Bo S, Wang H, Zheng L, Liang P, Zuo Y. Identification of Disease-Related 2-Oxoglutarate/Fe (II)-Dependent Oxygenase Based on Reduced Amino Acid Cluster Strategy. Front Cell Dev Biol 2021; 9:707938. [PMID: 34336861 PMCID: PMC8323781 DOI: 10.3389/fcell.2021.707938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 06/10/2021] [Indexed: 11/17/2022]
Abstract
The 2-oxoglutarate/Fe (II)-dependent (2OG) oxygenase superfamily is mainly responsible for protein modification, nucleic acid repair and/or modification, and fatty acid metabolism, and plays important roles in cancer, cardiovascular disease, and other diseases. Its members are likely to become new targets for the treatment of cancer and other diseases, so the accurate identification of 2OG oxygenases is of great significance. Many computational methods have been proposed to predict functional proteins to compensate for time-consuming and expensive experimental identification. However, machine learning has not been applied to the study of 2OG oxygenases. In this study, we developed OGFE_RAAC, a prediction model to identify whether a protein is a 2OG oxygenase. To improve the performance of OGFE_RAAC, 673 amino acid reduction alphabets were used to determine the optimal feature representation scheme by recoding the protein sequence. The 10-fold cross-validation test showed that the accuracy of the model in identifying 2OG oxygenases is 91.04%. In addition, the independent dataset results showed that the model has excellent generalization and robustness. It is expected to become an effective tool for the identification of 2OG oxygenases. With further research, we also found that the function of 2OG oxygenases may be related to their polarity and hydrophobicity, which will help follow-up studies on the catalytic mechanism of 2OG oxygenases and the way they interact with the substrate. Based on the model we built, a user-friendly web server was established and can be accessed at http://bioinfor.imu.edu.cn/ogferaac.
Affiliation(s)
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Suling Bo
  - College of Computer and Information, Inner Mongolia Medical University, Hohhot, China
- Hao Wang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Pengfei Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
10
Cao Y, Yu C, Huang S, Wang S, Zuo Y, Yang L. Characterization and Prediction of Presynaptic and Postsynaptic Neurotoxins Based on Reduced Amino Acids and Biological Properties. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200707150512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: Presynaptic and postsynaptic neurotoxins are two important types of neurotoxins. Because of their important role in pharmacology and neuroscience, their identification has become very important in biology.
Method: In this study, a statistical test and the F-score were used to quantify the differences between the two neurotoxin types in amino acids and biological properties. A support vector machine was used to predict presynaptic and postsynaptic neurotoxins using reduced amino acid alphabet types.
Results: Using the reduced amino acid alphabet as the input to the support vector machine, the overall accuracy of our classifier increased to 91.07%, which was the highest overall accuracy in this study. Compared with other published methods, our classifier obtained better predictive results.
Conclusion: In summary, we analyzed the differences between the two neurotoxins in amino acids and biological properties, and constructed a classifier that can predict these two neurotoxins using the reduced amino acid alphabet.
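The abstract does not say which F-score definition was used. As a hedged illustration, the sketch below assumes the widely used two-class feature-ranking form, in which the squared deviations of the class means from the overall mean are divided by the sum of the within-class sample variances. The function name `f_score` and the toy values are hypothetical.

```python
# Sketch of a per-feature F-score for two classes. The definition used here
# (between-class scatter over summed sample variances) is an assumption; the
# abstract does not state the exact formula.
from statistics import mean, variance


def f_score(pos, neg):
    """Score one feature from its values in the positive and negative class.

    Each class needs at least two values because sample variance is used.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # n - 1 denominators
    # A feature that is constant within both classes gets an infinite score.
    return between / within if within else float("inf")


# Toy values for one feature, e.g. the frequency of one reduced residue type.
print(f_score([1, 2, 3], [4, 5, 6]))
```

A larger score indicates that the feature separates the two classes more clearly, which is how such a score is typically used to rank amino acids or properties before training a classifier.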
Affiliation(s)
- Yiyin Cao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Chunlu Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shenghui Huang
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
11
Dong GF, Zheng L, Huang SH, Gao J, Zuo YC. Amino Acid Reduction Can Help to Improve the Identification of Antimicrobial Peptides and Their Functional Activities. Front Genet 2021; 12:669328. [PMID: 33959153 PMCID: PMC8093877 DOI: 10.3389/fgene.2021.669328] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 02/03/2023]
Abstract
Antimicrobial peptides (AMPs) are considered potential substitutes for antibiotics in the field of new anti-infective drug design. Several machine learning algorithms and web servers have been developed for identifying AMPs and their functional activities. However, there is still room for improvement in prediction algorithms and feature extraction methods. The reduced amino acid (RAA) alphabet effectively addresses the problems of simplifying protein complexity and recognizing structurally conserved regions. This article evaluates in detail the performance of more than 5,000 reduced amino acid descriptors generated from 74 types of reduced amino acid alphabets in the first and second stages, in order to construct a two-stage classifier, Identification of Antimicrobial Peptides by Reduced Amino Acid Cluster (iAMP-RAAC), for identifying AMPs and their functional activities, respectively. The results show that the first-stage AMP classifier achieves accuracies of 97.21% and 97.11% on the training dataset and the independent test dataset, respectively. In the second stage, our classifier still shows good performance. At least three of the four metrics, sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews correlation coefficient (MCC), exceed the results reported in the literature. Further, ANOVA with incremental feature selection (IFS) is used for feature selection, which further improves the prediction performance of each stage. Finally, a user-friendly web server, iAMP-RAAC, is established at http://bioinfor.imu.edu.cn/iampraac.
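The four metrics named in this abstract (SN, SP, ACC, and MCC) follow directly from the binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not results from the paper. It assumes that both classes are present, so the SN and SP denominators are non-zero.

```python
# Standard binary classification metrics from confusion-matrix counts.
# The counts in the usage example are made up for illustration.
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


print(binary_metrics(tp=9, tn=9, fp=1, fn=1))
```

Unlike ACC, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when the positive and negative sets differ in size.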
Collapse
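The four metrics named in the abstract above (SN, SP, ACC, MCC) follow from the confusion-matrix counts of a binary classifier. The sketch below uses their standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute SN, SP, ACC and MCC from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Sensitivity: fraction of true positives among all actual positives.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives among all actual negatives.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all samples classified correctly.
    acc = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

MCC uses all four counts, so it is less sensitive to class imbalance than accuracy alone, which is presumably why abstracts in this listing report it alongside ACC.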
Affiliation(s)
- Gai-Fang Dong
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Lei Zheng
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Sheng-Hui Huang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Jing Gao
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Yong-Chun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
|
12
|
ANPrAod: Identify Antioxidant Proteins by Fusing Amino Acid Clustering Strategy and N-Peptide Combination. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5518209. [PMID: 33927782 PMCID: PMC8049822 DOI: 10.1155/2021/5518209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/02/2021] [Revised: 03/02/2021] [Accepted: 03/10/2021] [Indexed: 11/18/2022]
Abstract
Antioxidant proteins play significant roles in disease control and in delaying aging because they prevent free radicals from damaging organisms. Accurate identification of antioxidant proteins has important implications for the development of new drugs and the treatment of related diseases, as they play a critical role in the control or prevention of cancer and aging-related conditions. Since experimental identification techniques are time-consuming and expensive, many computational methods have been proposed to identify antioxidant proteins. Although the accuracy of these methods is acceptable, some challenges remain. In this study, we developed a computational model called ANPrAod to identify antioxidant proteins based on a support vector machine. To eliminate potentially redundant features and improve prediction accuracy, we calculated 673 amino acid reduction alphabets to find the optimal feature representation scheme. The final model produced an overall accuracy of 87.53% with an ROC of 0.7266 in five-fold cross-validation, which was better than existing methods. The results on the independent dataset also demonstrated the robustness and reliability of ANPrAod, which could be a promising tool for antioxidant protein identification and contribute to hypothesis-driven experimental design.
|
13
|
Zhang ZM, Guan ZX, Wang F, Zhang D, Ding H. Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families. Med Chem 2021; 16:594-604. [PMID: 31584374 DOI: 10.2174/1573406415666191004125551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/20/2019] [Revised: 06/18/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. According to the alignments of the conserved domains, NRs are classified into seven or eight subfamilies: (1) NR1: thyroid hormone like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1 like (fushi tarazu-F1 like), (6) NR6: germ cell nuclear factor like (germ cell nuclear factor), and (7) NR0: knirps like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX like (DAX, SHP); alternatively, NR0 is divided into (7) NR7: knirps like and (8) NR8: DAX like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomic era, the number of proteins with known sequences has increased explosively. Conventional methods for accurately classifying NR families are experimental, with high cost and low efficiency. This has created a greater need for bioinformatics tools that effectively recognize NRs and their subfamilies in order to understand their biological function. In this review, we summarize the application of machine learning methods in the prediction of NRs from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.
Affiliation(s)
- Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
14
|
Wang H, Xi Q, Liang P, Zheng L, Hong Y, Zuo Y. IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy. Amino Acids 2021; 53:239-251. [PMID: 33486591 DOI: 10.1007/s00726-021-02941-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/17/2020] [Accepted: 01/11/2021] [Indexed: 12/18/2022]
Abstract
Enzymes have been proven to play considerable roles in disease diagnosis and biological functions. Feature extraction that truly reflects the intrinsic properties of a protein is the most critical step in the automatic identification of enzymes. Although many feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which can identify whether a protein is a human enzyme and distinguish the function of the human enzyme. To improve the feature representation ability, protein sequences were encoded by a new feature vector called 'reduced amino acid cluster'. We calculated 673 amino acid reduction alphabets to determine the optimal feature representation scheme. The tenfold cross-validation test showed that IHEC_RAAC identified human enzymes with an accuracy of 74.66% and further discriminated the human enzyme classes with an accuracy of 54.78%, which were 2.06% and 8.68% higher than the state-of-the-art predictors, respectively. Additionally, the results on the independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes, providing further guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac.
Affiliation(s)
- Hao Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Qilemuge Xi
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Yan Hong
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
|
15
|
Sun Z, Huang S, Zheng L, Liang P, Yang W, Zuo Y. ICTC-RAAC: An improved web predictor for identifying the types of ion channel-targeted conotoxins by using reduced amino acid cluster descriptors. Comput Biol Chem 2020; 89:107371. [PMID: 32950852 DOI: 10.1016/j.compbiolchem.2020.107371] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/07/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Conotoxins are small, disulfide-rich peptide toxins with highly diverse sequences. It is important to correctly identify the types of ion channel-targeted conotoxins because they are considered optimal pharmacological candidates in drug design, owing to their ability to bind specifically to ion channels and interfere with neural transmission. Compared with other feature extraction methods, the reduced amino acid cluster (RAAC) performs better at simplifying protein complexity and identifying functionally conserved regions. Thus, in our study, 673 RAACs generated from 74 types of reduced amino acid alphabet were comprehensively assessed to establish a state-of-the-art predictor of ion channel-targeted conotoxins. The results showed that Type 20, Cluster 9 (T = 20, C = 9) with the tripeptide composition (N = 3), which is based on a variance-maximization amino acid reduction algorithm, achieved the best accuracy, 89.3%. Further, ANOVA with incremental feature selection (IFS) was used for feature selection to improve prediction performance. Finally, the cross-validation results showed that the best overall accuracy was 96.4%, which is 1.8% higher than the best accuracy of previous studies. Based on the proposed predictor, a user-friendly web server was established and can be accessed at http://bioinfor.imu.edu.cn/ictcraac.
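The ANOVA-with-IFS step mentioned in the abstract above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the `evaluate` callback is assumed to return a quality score (for example, a cross-validated accuracy) for a given list of feature indices.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    # Between-group and within-group sums of squares.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(X, y):
    """Return feature indices sorted by decreasing F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def incremental_feature_selection(X, y, evaluate):
    """Add ranked features one at a time; keep the best-scoring prefix."""
    order = rank_features(X, y)
    best_score, best_subset = float("-inf"), []
    for m in range(1, len(order) + 1):
        subset = order[:m]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Toy data for illustration only: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.1, 3.0], [0.9, 4.0], [3.0, 4.0], [3.1, 5.0], [2.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_features(X, y)
subset, score = incremental_feature_selection(
    X, y, evaluate=lambda idx: 1.0 if idx == [0] else 0.5  # stand-in scorer
)
```

In practice the stand-in scorer would be replaced by training and validating a classifier on the selected columns; the ranking is computed once, so the cost is dominated by the repeated evaluations.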
Affiliation(s)
- Zijie Sun
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; School of Mathematical Sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Wuritu Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
|
16
|
Li J, Shi X, You ZH, Yi HC, Chen Z, Lin Q, Fang M. Using Weighted Extreme Learning Machine Combined With Scale-Invariant Feature Transform to Predict Protein-Protein Interactions From Protein Evolutionary Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1546-1554. [PMID: 31940546 DOI: 10.1109/tcbb.2020.2965919] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Protein-Protein Interactions (PPIs) play an irreplaceable role in the biological activities of organisms. Although many high-throughput methods are used to identify PPIs in different kinds of organisms, they have shortcomings such as being costly and time-consuming. To address these problems, computational methods have been developed to predict PPIs. In this paper, we present a method to predict PPIs using protein sequences. First, protein sequences are transformed into a Position Weight Matrix (PWM), from which the Scale-Invariant Feature Transform (SIFT) algorithm extracts features. Then Principal Component Analysis (PCA) is applied to reduce the dimension of the features. Finally, a Weighted Extreme Learning Machine (WELM) classifier is employed to predict PPIs, and a series of evaluation results are obtained. Since SIFT and WELM are used for feature extraction and classification, respectively, we call the proposed method SIFT-WELM. When the proposed method is applied to three well-known PPI datasets of Yeast, Human and Helicobacter pylori, the average accuracies under five-fold cross-validation are 94.83, 97.60 and 83.64 percent, respectively. To evaluate the proposed approach properly, we compare it with the Support Vector Machine (SVM) classifier and other recently developed methods in different aspects. Moreover, the training time of our method is greatly shortened compared with previous methods such as SVM, ACC and PCVMZM.
|
17
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. This review therefore summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with the full background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of research progress in this field.
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
18
|
Li H, Du H, Wang X, Gao P, Liu Y, Lin W. Remarks on Computational Method for Identifying Acid and Alkaline Enzymes. Curr Pharm Des 2020; 26:3105-3114. [PMID: 32552636 DOI: 10.2174/1381612826666200617170826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Accepted: 05/07/2020] [Indexed: 11/22/2022]
Abstract
The catalytic efficiency of enzymes is thousands of times higher than that of ordinary catalysts. Thus, they are widely used in industrial and medical fields. However, enzymes, being proteins, can be destroyed and inactivated by high temperature or by excessively acidic or alkaline environments. It is well known that most enzymes work well in an environment with a pH of 6-8, while some special enzymes remain active only in an alkaline environment with pH > 8 or an acidic environment with pH < 6. Therefore, the identification of acidic and alkaline enzymes has become a key task for industrial production. Because of the wide variety of enzymes, determining the acidity or alkalinity of an enzyme by experimental methods is laborious and sometimes infeasible. Converting protein sequences into digital features and building computational models can efficiently and accurately identify the acidity and alkalinity of enzymes. This review summarizes progress in the digital features used to represent proteins and in the computational methods used to identify acidic and alkaline enzymes. We hope that this paper will provide convenience, ideas, and guidance for computationally classifying acid and alkaline enzymes.
Affiliation(s)
- Hongfei Li
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Haoze Du
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, 27109, United States
| | - Xianfang Wang
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Yifeng Liu
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Weizhong Lin
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, United States
|
19
|
Zheng L, Liu D, Yang W, Yang L, Zuo Y. RaacLogo: a new sequence logo generator by using reduced amino acid clusters. Brief Bioinform 2020; 22:5855392. [PMID: 32524143 DOI: 10.1093/bib/bbaa096] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/08/2020] [Revised: 04/12/2020] [Accepted: 04/29/2020] [Indexed: 12/15/2022] Open
Abstract
Sequence logos give a fast and concise visualization of consensus sequences. Proteins exhibit greater complexity and diversity than DNA, which usually affects the graphical representation of the logo. Reduced amino acids are a powerful means of simplifying the complexity of sequence alignments, which motivated us to establish RaacLogo. As a new sequence logo generator that uses reduced amino acid alphabets, RaacLogo can easily generate many different simplified logos tailored to users, who select among various reduced amino acid alphabets derived from more than 40 clustering algorithms. The current web server provides 74 types of reduced amino acid alphabet, which were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with protein alignment. A two-dimensional selector is proposed for easily selecting desired RAACs based on underlying biological knowledge. It is anticipated that the RaacLogo web server will play an important role in protein sequence alignment, topological estimation and protein design experiments. RaacLogo is freely available at http://bioinfor.imu.edu.cn/raaclogo.
Affiliation(s)
- Lei Zheng
- State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University
| | - Dongyang Liu
- State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University
| | - Wuritu Yang
- State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University
| | - Yongchun Zuo
- State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of life sciences, Inner Mongolia University
|
20
|
Yu Y, Wang S, Wang Y, Cao Y, Yu C, Pan Y, Su D, Lu Q, Zuo Y, Yang L. Using Reduced Amino Acid Alphabet and Biological Properties to Analyze and Predict Animal Neurotoxin Protein. Curr Drug Metab 2020; 21:810-817. [PMID: 32433000 DOI: 10.2174/1389200221666200520090555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2019] [Revised: 01/07/2020] [Accepted: 01/15/2020] [Indexed: 11/22/2022]
Abstract
AIMS Because of the high affinity of animal neurotoxin proteins for certain specific target sites, they are commonly used as pharmacological tools and therapeutic agents in medicine to gain deep insights into the function of the nervous system. BACKGROUND AND OBJECTIVE Animal neurotoxin proteins are one of the most common functional groups among animal toxin proteins. Thus, it is very important to characterize and predict them. METHODS In this study, the differences between animal neurotoxin proteins and non-toxin proteins were analyzed. RESULTS Significant differences were found between them. In addition, a support vector machine was proposed to predict animal neurotoxin proteins. Our classifier achieved an overall accuracy of 96.46%. Furthermore, random forest and k-nearest neighbors were applied to predict animal neurotoxin proteins. CONCLUSION The comparison indicated that the predictive performance of our classifier was better than that of the other two algorithms.
Affiliation(s)
- Yao Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yakun Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yiyin Cao
- Public Health College, Harbin Medical University, Harbin 150081, China
| | - Chunlu Yu
- Public Health College, Harbin Medical University, Harbin 150081, China
| | - Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
21
|
Zheng L, Huang S, Mu N, Zhang H, Zhang J, Chang Y, Yang L, Zuo Y. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5650975. [PMID: 31802128 PMCID: PMC6893003 DOI: 10.1093/database/baz131] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 06/20/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/12/2022]
Abstract
Reducing the amino acid alphabet can significantly simplify protein complexity, which can improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although some reduced alphabets have been proposed, different classification rules can produce distinct results in protein sequence analysis. Thus, a systematic framework for reduced alphabets is urgently needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with specific protein problems. Users can easily select desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of a protein. The tool produces K-tuple reduced amino acid composition from three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as a sequence alignment, merged RAA composition, feature distribution and a logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAAC. The optimal model can be selected according to evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook presents a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
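The K-tuple reduced amino acid composition described in part (ii) of the abstract above can be illustrated with a short sketch. It makes several assumptions: the six-cluster grouping in `CLUSTERS` is a made-up example and not one of the 74 alphabet types, the function names are hypothetical, only the K-tuple and g-gap parameters are modeled (the λ-correlation parameter is omitted), and residues outside the 20 standard amino acids are simply skipped.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: each string is one cluster of residues.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(clusters):
    """Map every residue to the first letter of its cluster."""
    return {aa: cluster[0] for cluster in clusters for aa in cluster}


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet, skipping unknown residues."""
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)


def ktuple_composition(seq, clusters, k=2, gap=0):
    """Frequency of every K-tuple whose positions are spaced gap + 1 apart."""
    reduced = reduce_sequence(seq, build_mapping(clusters))
    step = gap + 1
    span = (k - 1) * step  # distance from first to last position of a tuple
    counts = Counter(
        "".join(reduced[i + j * step] for j in range(k))
        for i in range(len(reduced) - span)
    )
    total = sum(counts.values())
    letters = [cluster[0] for cluster in clusters]
    # Fixed-length feature vector: one entry per possible reduced K-tuple.
    return {
        "".join(t): (counts["".join(t)] / total if total else 0.0)
        for t in product(letters, repeat=k)
    }


features = ktuple_composition("AKDE", CLUSTERS, k=2, gap=0)
gapped = ktuple_composition("AKDE", CLUSTERS, k=2, gap=1)
```

With six clusters, a dipeptide (K = 2) vector has 36 entries instead of the 400 needed for the full 20-letter alphabet, which is the dimensionality reduction the abstract refers to.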
Affiliation(s)
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Nengjiang Mu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Haoyue Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Jiayu Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Yu Chang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
|
22
|
Yan J, Bhadra P, Li A, Sethiya P, Qin L, Tai HK, Wong KH, Siu SWI. Deep-AmPEP30: Improve Short Antimicrobial Peptides Prediction with Deep Learning. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 20:882-894. [PMID: 32464552 PMCID: PMC7256447 DOI: 10.1016/j.omtn.2020.05.006] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 02/21/2020] [Revised: 04/08/2020] [Accepted: 05/06/2020] [Indexed: 12/12/2022]
Abstract
Antimicrobial peptides (AMPs) are a valuable source of antimicrobial agents and a potential solution to the multi-drug resistance problem. In particular, short-length AMPs have been shown to have enhanced antimicrobial activities, higher stability, and lower toxicity to human cells. We present a short-length (≤30 aa) AMP prediction method, Deep-AmPEP30, based on an optimal feature set of PseKRAAC reduced amino acid composition and a convolutional neural network. On a balanced benchmark dataset of 188 samples, Deep-AmPEP30 yields an improved performance of 77% in accuracy, 85% in the area under the receiver operating characteristic curve (AUC-ROC), and 85% in the area under the precision-recall curve (AUC-PR) over existing machine learning-based methods. To demonstrate its power, we screened the genome sequence of Candida glabrata (a gut commensal fungus expected to interact with and/or inhibit other microbes in the gut) for potential AMPs and identified a peptide of 20 aa (P3, FWELWKFLKSLWSIFPRRRP) with strong antibacterial activity against Bacillus subtilis and Vibrio parahaemolyticus. The potency of the peptide is remarkably comparable to that of ampicillin. Therefore, Deep-AmPEP30 is a promising prediction tool to identify short-length AMPs from genomic sequences for drug discovery. Our method is available at https://cbbio.cis.um.edu.mo/AxPEP for both individual sequence prediction and genome screening for AMPs.
Affiliation(s)
- Jielu Yan
- Department of Computer and Information Science, University of Macau, Macau, China
| | - Pratiti Bhadra
- Department of Computer and Information Science, University of Macau, Macau, China
| | - Ang Li
- Faculty of Health Sciences, University of Macau, Macau, China
| | - Pooja Sethiya
- Faculty of Health Sciences, University of Macau, Macau, China
| | - Longguang Qin
- Faculty of Health Sciences, University of Macau, Macau, China
| | - Hio Kuan Tai
- Department of Computer and Information Science, University of Macau, Macau, China
| | - Koon Ho Wong
- Faculty of Health Sciences, University of Macau, Macau, China; Institute of Translational Medicines, University of Macau, Macau, China
| | - Shirley W I Siu
- Department of Computer and Information Science, University of Macau, Macau, China.
| |
|
23
|
Zhang L, Kong L. A Novel Amino Acid Properties Selection Method for Protein Fold Classification. Protein Pept Lett 2020; 27:287-294. [PMID: 32207399 DOI: 10.2174/0929866526666190718151753] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/14/2019] [Revised: 04/17/2019] [Accepted: 06/10/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Amino acid physicochemical properties encoded in protein primary structure play a crucial role in protein folding. However, it is not yet clear which of the properties are the most suitable for protein fold classification. OBJECTIVE: To avoid exhaustively searching the full property space, this study proposed an amino acid property selection method that rapidly obtains a suitable combination of properties for protein fold classification. METHODS: The proposed method was based on a sequential floating forward selection strategy. Beginning with an empty set, a variable number of features was added iteratively until the termination condition was met. RESULTS: The experimental results indicate that, with appropriately selected amino acid properties, the proposed method improved prediction accuracies by 0.26-5% on a widely used benchmark dataset. CONCLUSION: The proposed property selection method can be extended to other classification problems in bioinformatics that depend on biomolecular properties.
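The selection loop described in this abstract can be sketched generically as below. This is a minimal illustration of sequential floating forward selection, not the authors' implementation: the function name `sffs`, the `score` callback, the `max_size` stopping rule, and the toy property weights are all assumptions, since the abstract does not state the scoring function or the termination condition.

```python
from typing import Callable, FrozenSet, Iterable


def sffs(features: Iterable[str],
         score: Callable[[FrozenSet[str]], float],
         max_size: int) -> FrozenSet[str]:
    """Return the best-scoring subset found by sequential floating forward selection."""
    pool = sorted(set(features))           # sorted for deterministic tie-breaking
    selected: FrozenSet[str] = frozenset()
    best = {0: (float("-inf"), selected)}  # best (score, subset) seen for each subset size

    while len(selected) < min(max_size, len(pool)):
        # Forward step: add the one feature that gives the highest score.
        selected = max((selected | {f} for f in pool if f not in selected), key=score)
        k = len(selected)
        if k not in best or score(selected) > best[k][0]:
            best[k] = (score(selected), selected)

        # Floating step: remove features while that beats the best smaller subset.
        while len(selected) > 2:
            reduced = max((selected - {f} for f in sorted(selected)), key=score)
            if score(reduced) > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (score(reduced), reduced)
            else:
                break

    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical additive score: each property has a made-up weight.
weights = {"hydrophobicity": 3.0, "polarity": 2.0, "volume": 1.0, "noise": -1.0}
chosen = sffs(weights, lambda s: sum(weights[f] for f in s), max_size=4)
print(sorted(chosen))
```

In practice the `score` callback would be a cross-validated classification accuracy rather than a sum of weights; the additive toy score is used here only so the sketch runs without a dataset.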
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, China; College of Sciences, Northeastern University, Shenyang, China
| | - Liang Kong
- School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, China
| |
|
24
|
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. To develop effective drugs and vaccines against infection, it is necessary to accurately identify these mitochondrial proteins. Although biochemical experiments can provide precise details about the mitochondrial proteins, they are expensive and time-consuming. In this review, we summarize the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the construction strategies of these computational methods. Finally, we discuss the future development of algorithms for mitochondrial protein recognition.
Affiliation(s)
- Ting Liu
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
| | - Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
| |
|
25
|
pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset. Genomics 2019; 111:1274-1282. [DOI: 10.1016/j.ygeno.2018.08.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 05/03/2018] [Revised: 08/14/2018] [Accepted: 08/16/2018] [Indexed: 12/17/2022]
|
26
|
Wang F, Guan ZX, Dao FY, Ding H. A Brief Review of the Computational Identification of Antifreeze Protein. Curr Org Chem 2019. [DOI: 10.2174/1385272823666190718145613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
Many cold-adapted organisms produce antifreeze proteins (AFPs) to counter the freezing of cell fluids by controlling the growth of ice crystals. AFPs have been found in various species, including vertebrates, invertebrates, plants, bacteria, and fungi. AFPs from fish, insects and plants display high diversity. Thus, the identification of AFPs is a challenging task in computational proteomics. With the accumulation of AFPs and the development of machine learning methods, it is possible to construct a high-throughput tool to identify AFPs in a timely manner. In this review, we briefly review the application of machine learning methods to antifreeze protein identification from different aspects, including published benchmark datasets, sequence descriptors, classification algorithms and published methods. We hope that this review will suggest new ideas and directions for research on identifying antifreeze proteins.
Affiliation(s)
- Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
27
|
Malik N, Khatkar A, Dhiman P. Computational Analysis and Synthesis of Syringic Acid Derivatives as Xanthine Oxidase Inhibitors. Med Chem 2019; 16:643-653. [PMID: 31584375 DOI: 10.2174/1573406415666191004134346] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/12/2019] [Revised: 05/07/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Xanthine oxidase (XO; EC 1.17.3.2) has been considered a potent drug target for the cure and management of pathological conditions caused by high levels of uric acid in the bloodstream. The role of xanthine oxidase in hyperuricemia and gout is well established: it catalyzes the oxidative hydroxylation of hypoxanthine to xanthine and the further conversion of xanthine to uric acid. In this research, syringic acid, a bioactive phenolic acid, was explored to determine the capability of itself and its derivatives to inhibit xanthine oxidase. OBJECTIVE: The study aimed to develop new xanthine oxidase inhibitors with antioxidant potential from natural constituents. METHODS: With the help of molecular docking, we designed and synthesized syringic acid derivatives hybridized with alcohols and amines to form ester and amide linkages. The synthesized compounds were evaluated for their antioxidant and xanthine oxidase inhibitory potential. RESULTS: SY3 produced very good xanthine oxidase inhibitory activity. The enzyme kinetic studies showed that the syringic acid derivatives inhibited XO in a competitive manner, with IC50 values ranging from 7.18 μM to 15.60 μM, and SY3 was the most active derivative. Molecular simulation revealed that the new syringic acid derivatives interacted with the amino acid residues SER1080, PHE798, GLN1194, ARG912, GLN767, ALA1078 and MET1038 positioned inside the binding site of XO. All the derivatives showed very good antioxidant potential. CONCLUSION: Molecular docking proved to be an effective and selective tool in the design of new syringic acid derivatives. This hybridization of two natural constituents could lead to desirable xanthine oxidase inhibitors with improved activity.
Affiliation(s)
- Neelam Malik
- Department of Pharmaceutical Sciences, M.D. University Rohtak, Rohtak, Haryana, India
| | - Anurag Khatkar
- Laboratory for Preservation Technology and Enzyme Inhibition Studies, Department of Pharmaceutical Sciences, M.D. University, Rohtak, Haryana, India
| | - Priyanka Dhiman
- Department of Pharmaceutical Sciences, M.D. University Rohtak, Rohtak, Haryana, India
| |
|
28
|
Le NQK. Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles. J Proteome Res 2019. [DOI: 10.1021/acs.jproteome.9b00411] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798
| |
|
29
|
Zuo Y, Chang Y, Huang S, Zheng L, Yang L, Cao G. iDEF-PseRAAC: Identifying the Defensin Peptide by Using Reduced Amino Acid Composition Descriptor. Evol Bioinform Online 2019; 15:1176934319867088. [PMID: 31391777 PMCID: PMC6669840 DOI: 10.1177/1176934319867088] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/18/2019] [Accepted: 07/08/2019] [Indexed: 11/18/2022] Open
Abstract
Defensins, one of the major classes of host defense peptides, play a significant role in innate immunity and are highly evolved in almost all living organisms. High-throughput computational methods can help in designing drugs or medical means to defend against pathogens. To take up this challenge, this study designed an up-to-date server based on a rigorous benchmark dataset, referred to as iDEF-PseRAAC, for predicting the defensin family. By extracting primary sequence compositions based on different types of reduced amino acid alphabets, the selected feature subset reached a best overall accuracy of 92.38%. We therefore conclude that the information provided by the many types of amino acid reduction offers an efficient and rational methodology for defensin identification. A free online server is available for academic users at http://bioinfor.imu.edu.cn/idpf. We expect that iDEF-PseRAAC may be a promising tool for the functional annotation of defensin proteins.
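The idea of a reduced amino acid composition descriptor can be illustrated with a short sketch. The six-group clustering below is a hypothetical scheme chosen only for illustration, and `GROUPS`, `REDUCE`, and `reduced_composition` are assumed names; the abstract does not state which reduction schemes iDEF-PseRAAC uses.

```python
from collections import Counter

# Hypothetical six-group reduction of the 20 standard amino acids (illustration only).
GROUPS = {
    "A": "AVLIMC",  # aliphatic and sulfur-containing
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}
# Map each residue to the representative letter of its group.
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduced_composition(seq: str) -> dict[str, float]:
    """Return the fraction of residues falling in each reduced-alphabet group."""
    reduced = [REDUCE[aa] for aa in seq.upper() if aa in REDUCE]  # skip non-standard residues
    counts = Counter(reduced)
    n = len(reduced)
    return {rep: (counts[rep] / n if n else 0.0) for rep in GROUPS}


print(reduced_composition("KKDG"))
```

A sequence of any length is mapped to a fixed-length vector (six values here), which is what makes such descriptors convenient inputs for a classifier.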
Affiliation(s)
- Yongchun Zuo
- College of Veterinary Medicine, Inner Mongolia Agricultural University, Hohhot, China; State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Yu Chang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Guifang Cao
- College of Veterinary Medicine, Inner Mongolia Agricultural University, Hohhot, China
| |
|
30
|
Le NQK. Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles. J Proteome Res 2019; 18:3503-3511. [DOI: 10.1021/acs.jproteome.9b00411] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Indexed: 12/20/2022]
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798
| |
|
31
|
Pan Z, Zhang H, Liang C, Li G, Xiao Q, Ding P, Luo J. Self-Weighted Multi-Kernel Multi-Label Learning for Potential miRNA-Disease Association Prediction. Mol Ther Nucleic Acids 2019; 17:414-423. [PMID: 31319245 PMCID: PMC6637211 DOI: 10.1016/j.omtn.2019.06.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/28/2019] [Revised: 05/22/2019] [Accepted: 06/12/2019] [Indexed: 11/23/2022]
Abstract
Researchers have realized that microRNAs (miRNAs) play significant roles in the pathogenesis of various diseases. Although many computational models have been proposed to predict the associations between miRNAs and diseases, prediction performance could still be improved. In this paper, we propose a novel self-weighted, multi-kernel, multi-label learning (SwMKML) method to predict disease-related miRNAs. SwMKML adaptively learns two optimal kernel matrices for both miRNAs and diseases from multiple kernels constructed from known miRNA-disease associations. Moreover, the miRNA-disease associations predicted from both spaces are updated simultaneously based on a multi-label framework. Compared with four state-of-the-art computational models, SwMKML achieved best results of 95.5%, 93.1%, and 84.1% in global leave-one-out cross-validation, 5-fold cross-validation, and overall prediction accuracy, respectively. A case study conducted on head and neck neoplasms further identified two potential prognostic biomarkers, hsa-mir-125b-1 and hsa-mir-125b-2, for the disease. SwMKML is freely available at Github, and we anticipate that it may become an effective tool for potential miRNA-disease association prediction.
Affiliation(s)
- Zhenxia Pan
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
| | - Huaxiang Zhang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
| | - Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
| | - Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
| | - Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha 410006, China
| | - Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
| |
|
32
|
Caetano Dos Santos FL, Michalek IM, Laurila K, Kaukinen K, Hyttinen J, Lindfors K. Automatic classification of IgA endomysial antibody test for celiac disease: a new method deploying machine learning. Sci Rep 2019; 9:9217. [PMID: 31239486 PMCID: PMC6592927 DOI: 10.1038/s41598-019-45679-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/06/2019] [Accepted: 06/12/2019] [Indexed: 01/06/2023] Open
Abstract
Widespread use of the endomysial autoantibody (EmA) test in the diagnosis of celiac disease is limited by its subjectivity and its requirement for an expert evaluator. The study aimed to determine whether machine learning can be applied to create a new observer-independent method of automatic assessment and classification of the EmA test for celiac disease. The study material comprised 2597 high-quality IgA-class EmA images collected in 2017–2018. According to standard procedure, a highly experienced professional classified the samples into the following four classes: I - positive, II - negative, III - IgA deficient, and IV - equivocal. Machine learning was deployed to create a classification model. The sensitivity and specificity of the model were 82.84% and 99.40%, respectively. The accuracy was 96.80%. The classification error was 3.20%. The area under the curve was 99.67%, 99.61%, 100%, and 99.89% for classes I, II, III, and IV, respectively. The mean assessment time per image was 16.11 seconds. This is the first study deploying machine learning for the automatic classification of the IgA-class EmA test for celiac disease. The results indicate that machine learning enables quick and precise EmA test analysis and can be further developed to simplify EmA analysis.
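The relationship between the metrics reported in this abstract can be made concrete with a small sketch. It assumes a binary (one-vs-rest) confusion matrix, which the abstract does not confirm for this four-class study; the function name `binary_metrics` and the counts passed to it are hypothetical and are not the study's data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, specificity, accuracy and error from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": accuracy,
        "error": 1.0 - accuracy,        # classification error is the complement of accuracy
    }


# Made-up counts, for illustration only.
m = binary_metrics(tp=8, fp=5, tn=85, fn=2)
print(m)
```

The last entry reflects the same complement relation visible in the reported figures, where an accuracy of 96.80% and a classification error of 3.20% sum to 100%.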
Affiliation(s)
| | | | - Kaija Laurila
- Celiac Disease Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | - Katri Kaukinen
- Celiac Disease Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Internal Medicine, Tampere University Hospital, Tampere, Finland
| | - Jari Hyttinen
- Computational Biophysics and Imaging Group, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | - Katri Lindfors
- Celiac Disease Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| |
|
33
|
Chen W, Feng P, Liu T, Jin D. Recent Advances in Machine Learning Methods for Predicting Heat Shock Proteins. Curr Drug Metab 2019; 20:224-228. [DOI: 10.2174/1389200219666181031105916] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 02/08/2023]
Abstract
Background: As molecular chaperones, Heat Shock Proteins (HSPs) not only play key roles in protein folding and maintaining protein stability, but are also linked with multiple kinds of diseases. Therefore, HSPs have been regarded as a focus of drug design. Since HSPs from different families perform distinct functions, accurately classifying the families of HSPs is the key step to clearly understanding their biological functions. In contrast to labor-intensive and cost-ineffective experimental methods, computational classification of HSP families has emerged as an alternative approach. Methods: We reviewed the papers that described the existing datasets of HSPs and the representative computational approaches developed for the identification and classification of HSPs. Results: The two benchmark datasets of HSPs, namely HSPIR and sHSPdb, were introduced, which provide invaluable resources for computationally identifying HSPs. The gold standard dataset and sequence encoding schemes for building computational methods of classifying HSPs were also introduced. The three representative web servers for identifying HSPs and their families were described. Conclusion: The existing machine learning methods for identifying the different families of HSPs have yielded quite encouraging results and have helped to promote research on HSPs. However, the number of HSPs with known structures is very limited. Therefore, determining the structures of HSPs is also urgent and will be helpful in revealing their functions.
Affiliation(s)
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Tao Liu
- School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
| | - Dianchuan Jin
- School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
| |
|
34
|
Characterization of human proteins with different subcellular localizations by topological and biological properties. Genomics 2018; 111:1831-1838. [PMID: 30543849 DOI: 10.1016/j.ygeno.2018.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/23/2018] [Revised: 12/02/2018] [Accepted: 12/07/2018] [Indexed: 11/20/2022]
Abstract
Knowing a protein's localization can provide a valuable information resource for elucidating protein function. In recent years, with the advances in human genomics and proteomics, it has become possible to characterize human proteins located in different subcellular localizations. In this study, we used topological and biological properties to characterize human proteins from six subcellular localizations. Almost all of these properties were found to be significantly different among the six protein categories. Network topology analysis indicated that several significant topological properties, including the degree and k-core, were higher for the mitochondrial proteins. Biological property analysis showed that the nuclear proteins appeared to be correlated with important biological functions. We hope these findings may help in comprehensively understanding the biological function of proteins and in predicting protein subcellular localizations in humans.
|
35
|
Lai HY, Chen XX, Chen W, Tang H, Lin H. Sequence-based predictive modeling to identify cancerlectins. Oncotarget 2018; 8:28169-28175. [PMID: 28423655 PMCID: PMC5438640 DOI: 10.18632/oncotarget.15963] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 01/18/2017] [Accepted: 02/24/2017] [Indexed: 11/25/2022] Open
Abstract
Lectins are a diverse group of glycoproteins or carbohydrate-binding proteins that are widely distributed across species. They can specifically recognize and exclusively bind to certain saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promote the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracies of most of these techniques are very limited. In this work, a benchmark dataset was constructed from the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and the binomial distribution was applied to screen the optimal feature set. Ultimately, an SVM-based predictor was built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. The results revealed that our prediction model could perform better than published predictive tools.
Affiliation(s)
- Hong-Yan Lai
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xin-Xin Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
36
|
Kumar S. Prediction of Metal Ion Binding Sites in Proteins from Amino Acid Sequences by Using Simplified Amino Acid Alphabets and Random Forest Model. Genomics Inform 2017; 15:162-169. [PMID: 29307143 PMCID: PMC5769865 DOI: 10.5808/gi.2017.15.4.162] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/16/2017] [Revised: 11/16/2017] [Accepted: 11/16/2017] [Indexed: 11/20/2022] Open
Abstract
Metal binding proteins or metallo-proteins are important for the stability of the protein and also serve as co-factors in various functions like controlling metabolism, regulating signal transport, and metal homeostasis. In structural genomics, prediction of metal binding proteins helps in the selection of a suitable growth medium for overexpression studies and also helps in obtaining the functional protein. Computational prediction using machine learning approaches has been widely used in various fields of bioinformatics, based on the premise that all the information is contained in the amino acid sequence. In this study, random forest prediction systems were deployed with simplified amino acid alphabets for the prediction of individual major metal ion binding sites: copper, calcium, cobalt, iron, magnesium, manganese, nickel, and zinc.
Affiliation(s)
- Suresh Kumar
- Department of Diagnostic and Allied Health Sciences, Faculty of Health and Life Sciences, Management and Science University, 40100 Shah Alam, Malaysia
| |
|
37
|
Carvalho TFM, Silva JCF, Calil IP, Fontes EPB, Cerqueira FR. Rama: a machine learning approach for ribosomal protein prediction in plants. Sci Rep 2017; 7:16273. [PMID: 29176736 PMCID: PMC5701237 DOI: 10.1038/s41598-017-16322-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/05/2017] [Accepted: 10/30/2017] [Indexed: 12/04/2022] Open
Abstract
Ribosomal proteins (RPs) play a fundamental role in all types of cells, as they are major components of ribosomes, which are essential for the translation of mRNAs. Furthermore, these proteins are involved in various physiological and pathological processes. The intrinsic biological relevance of RPs has motivated advanced studies for the identification of unrevealed RPs. In this work, we propose a new computational method, termed Rama, for the prediction of RPs, based on machine learning techniques, with a particular interest in plants. To perform an effective classification, Rama uses a set of fundamental attributes of the amino acid side chains and applies a two-step procedure to classify proteins with unknown function as RPs. The evaluation of the resultant predictive models showed that Rama could achieve mean sensitivity, precision, and specificity of 0.91, 0.91, and 0.82, respectively. Furthermore, a list of proteins that have no annotation in Phytozome v.10 and are annotated as RPs in Phytozome v.12 was correctly classified by our models. Additional computational experiments have also shown that Rama differentiates ribosomal proteins from RNA-binding proteins with high accuracy. Finally, two novel proteins of Arabidopsis thaliana were validated in biological experiments. Rama is freely available at http://inctipp.bioagro.ufv.br:8080/Rama.
Affiliation(s)
| | - José Cleydson F Silva
- Computer Science Department, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
| | - Iara Pinheiro Calil
- National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
| | - Elizabeth Pacheco Batista Fontes
- National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil.
| | - Fabio Ribeiro Cerqueira
- Computer Science Department, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil.
- Department of Production Engineering, Universidade Federal Fluminense, Petrópolis, 25650-050, Rio de Janeiro, Brazil.
| |
|
38
|
Du PF. Predicting Protein Submitochondrial Locations: The 10th Anniversary. Curr Genomics 2017; 18:316-321. [PMID: 29081687 PMCID: PMC5635615 DOI: 10.2174/1389202918666170228143256] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/16/2022] Open
Abstract
Predicting protein submitochondrial location has been studied for about ten years. A number of methods have been developed. The prediction performances have been improved to an almost perfect level. In this review, we introduce the background of this research topic. We also compare the methods, the performances and the datasets that have been used by these studies. Towards the end, we provide hints for the future directions of this research topic.
Affiliation(s)
- Pu-Feng Du
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
| |
|
39
|
Huo H, Li T, Wang S, Lv Y, Zuo Y, Yang L. Prediction of presynaptic and postsynaptic neurotoxins by combining various Chou's pseudo components. Sci Rep 2017; 7:5827. [PMID: 28724993 PMCID: PMC5517432 DOI: 10.1038/s41598-017-06195-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 03/20/2017] [Accepted: 06/08/2017] [Indexed: 11/09/2022] Open
Abstract
Presynaptic and postsynaptic neurotoxins are two groups of neurotoxins. Identification of presynaptic and postsynaptic neurotoxins is an important work for numerous newly found toxins. It is both costly and time consuming to determine these two neurotoxins by experimental methods. As a complement, using computational methods for predicting presynaptic and postsynaptic neurotoxins could provide some useful information in a timely manner. In this study, we described four algorithms for predicting presynaptic and postsynaptic neurotoxins from sequence driven features by using Increment of Diversity (ID), Multinomial Naive Bayes Classifier (MNBC), Random Forest (RF), and K-nearest Neighbours Classifier (IBK). Each protein sequence was encoded by pseudo amino acid (PseAA) compositions and three biological motif features, including MEME, Prosite and InterPro motif features. The Maximum Relevance Minimum Redundancy (MRMR) feature selection method was used to rank the PseAA compositions and the 50 top ranked features were selected to improve the prediction accuracy. The PseAA compositions and three kinds of biological motif features were combined and 12 different parameters that defined as P1-P12 were selected as the input parameters of ID, MNBC, RF, and IBK. The prediction results obtained in this study were significantly better than those of previously developed methods.
Affiliation(s)
- Haiyan Huo
- Department of Environmental Engineering, Hohhot University for Nationalities, Hohhot, 010051, China
- Tao Li
- College of Life Science, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yingli Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, Inner Mongolia University, Hohhot, 010021, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
40
Tahir M, Hayat M. Machine learning based identification of protein–protein interactions using derived features of physiochemical properties and evolutionary profiles. Artif Intell Med 2017; 78:61-71. [DOI: 10.1016/j.artmed.2017.06.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 03/15/2017] [Revised: 06/09/2017] [Accepted: 06/11/2017] [Indexed: 02/09/2023]
41
Shafee TMA, Lay FT, Phan TK, Anderson MA, Hulett MD. Convergent evolution of defensin sequence, structure and function. Cell Mol Life Sci 2017; 74:663-682. [PMID: 27557668 PMCID: PMC11107677 DOI: 10.1007/s00018-016-2344-5] [Citation(s) in RCA: 123] [Impact Index Per Article: 17.6] [Received: 05/26/2016] [Revised: 07/27/2016] [Accepted: 08/15/2016] [Indexed: 02/06/2023]
Abstract
Defensins are a well-characterised group of small, disulphide-rich, cationic peptides that are produced by essentially all eukaryotes and are highly diverse in their sequences and structures. Most display broad range antimicrobial activity at low micromolar concentrations, whereas others have other diverse roles, including cell signalling (e.g. immune cell recruitment, self/non-self-recognition), ion channel perturbation, toxic functions, and enzyme inhibition. The defensins consist of two superfamilies, each derived from an independent evolutionary origin, which have subsequently undergone extensive divergent evolution in their sequence, structure and function. Referred to as the cis- and trans-defensin superfamilies, they are classified based on their secondary structure orientation, cysteine motifs and disulphide bond connectivities, tertiary structure similarities and precursor gene sequence. The utility of displaying loops on a stable, compact, disulphide-rich core has been exploited by evolution on multiple occasions. The defensin superfamilies represent a case where the ensuing convergent evolution of sequence, structure and function has been particularly extreme. Here, we discuss the extent, causes and significance of these convergent features, drawing examples from across the eukaryotes.
Affiliation(s)
- Thomas M A Shafee
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Fung T Lay
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Thanh Kha Phan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Marilyn A Anderson
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Mark D Hulett
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
42
Borup R, Thuesen LL, Andersen CY, Nyboe-Andersen A, Ziebe S, Winther O, Grøndahl ML. Competence Classification of Cumulus and Granulosa Cell Transcriptome in Embryos Matched by Morphology and Female Age. PLoS One 2016; 11:e0153562. [PMID: 27128483 PMCID: PMC4851390 DOI: 10.1371/journal.pone.0153562] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/16/2015] [Accepted: 03/31/2016] [Indexed: 12/23/2022] Open
Abstract
Objective: By focusing on differences in the mural granulosa cell (MGC) and cumulus cell (CC) transcriptomes from follicles resulting in competent (live birth) and non-competent (no pregnancy) oocytes, the study aims to define a competence classifier expression profile in the two cellular compartments. Design: A case-control study. Setting: University-based facilities for clinical services and research. Patients: MGC and CC samples from 60 women undergoing IVF treatment following the long GnRH-agonist protocol were collected. Samples from 16 oocytes where live birth was achieved and 16 age- and embryo morphology-matched incompetent oocytes were included in the study. Methods: MGC and CC were isolated immediately after oocyte retrieval. From the 16 competent and non-competent follicles, mRNA was extracted and expression profiles were generated on the Human Gene 1.0 ST Affymetrix array. Live birth prediction analysis was performed using machine learning algorithms (support vector machines), with performance estimated by leave-one-out cross-validation and independent validation on an external data set. Results: We defined a signature of 30 genes expressed in CC predictive of live birth. This live birth prediction model had an accuracy of 81%, a sensitivity of 0.83, a specificity of 0.80, a positive predictive value of 0.77, and a negative predictive value of 0.86. Receiver operating characteristic analysis found an area under the curve of 0.86, significantly greater than random chance. When applied to 3 external data sets with the end-point outcome measure of blastocyst formation, the signature resulted in 62%, 75% and 88% accuracy, respectively. The genes in the classifier are primarily connected to apoptosis and involvement in formation of extracellular matrix. We were not able to define a robust MGC classifier signature that could classify live birth with accuracy above random chance level.
Conclusion: We have developed a cumulus cell classifier, which showed a promising performance on external data. This suggests that the gene signature at least partly includes genes that relate to competence in the developing blastocyst.
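The accuracy, sensitivity, specificity and predictive values quoted in this abstract are all ratios taken from a 2x2 confusion matrix. A minimal sketch of how they relate follows. The counts are hypothetical and are not taken from the study, and the class and property names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = live birth, by assumption)."""
    tp: int  # positives predicted positive
    fn: int  # positives predicted negative
    tn: int  # negatives predicted negative
    fp: int  # negatives predicted positive

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Fraction of true positives that were detected.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of true negatives that were correctly rejected.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: fraction of positive calls that are correct.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value: fraction of negative calls that are correct.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only (NOT the study's data).
cm = ConfusionMatrix(tp=8, fn=2, tn=9, fp=1)
print(cm.accuracy, cm.sensitivity, cm.specificity, cm.ppv, cm.npv)
```

Sensitivity and specificity are conditioned on the true class, whereas the predictive values are conditioned on the prediction, which is why the two pairs can differ even for the same classifier.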
Affiliation(s)
- Rehannah Borup
- Center for Genomic Medicine, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Lea Langhoff Thuesen
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Claus Yding Andersen
- Laboratory of Reproductive Biology, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Anders Nyboe-Andersen
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Søren Ziebe
- Fertility Clinic, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Ole Winther
- Bioinformatics Center, Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark
- Marie Louise Grøndahl
- Fertility Clinic, University Hospital of Copenhagen, Herlev Hospital, Copenhagen, Denmark