1
Cheong HH, Zuo W, Chen J, Un CW, Si YW, Wong KH, Kwok HF, Siu SWI. Identification of Anticancer Peptides from the Genome of Candida albicans: in Silico Screening, in Vitro and in Vivo Validations. J Chem Inf Model 2024; 64:6174-6189. [PMID: 39008832 DOI: 10.1021/acs.jcim.4c00501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Anticancer peptides (ACPs) are promising future therapeutics, but their experimental discovery remains time-consuming and costly. To accelerate the discovery process, we propose a computational screening workflow to identify, filter, and prioritize peptide sequences based on predicted class probability, antitumor activity, and toxicity. The workflow was applied to identify novel ACPs with potent activity against colorectal cancer from the genome sequences of Candida albicans. As a result, four candidates were identified and validated in the HCT116 colon cancer cell line. Among them, PCa1 and PCa2 emerged as the most potent, displaying IC50 values of 3.75 and 56.06 μM, respectively, and demonstrating a 4-fold selectivity for cancer cells over normal cells. In the colon xenograft nude mice model, the administration of both peptides resulted in substantial inhibition of tumor growth without causing significant adverse effects. In conclusion, this work not only contributes a proven computational workflow for ACP discovery but also introduces two peptides, PCa1 and PCa2, as promising candidates poised for further development as targeted therapies for colon cancer. The method as a web service is available at https://app.cbbio.online/acpep/home and the source code at https://github.com/cartercheong/AcPEP_classification.git.
Affiliation(s)
- Hong-Hin Cheong
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Weimin Zuo
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Jiarui Chen
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Chon-Wai Un
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Yain-Whar Si
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Koon Ho Wong
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Hang Fai Kwok
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Shirley W I Siu
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macau SAR 999078, China
- Institute of Science and Environment, University of Saint Joseph, Estrada Marginal da Ilha Verde 14-17, Macau SAR 999078, China
2
Chen XG, Yang X, Li C, Lin X, Zhang W. Non-coding RNA identification with pseudo RNA sequences and feature representation learning. Comput Biol Med 2023; 165:107355. [PMID: 37639767 DOI: 10.1016/j.compbiomed.2023.107355] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Revised: 07/16/2023] [Accepted: 08/12/2023] [Indexed: 08/31/2023]
Abstract
Distinguishing non-coding RNAs (ncRNAs) from coding RNAs is very important in bioinformatics. Although many methods have been proposed for solving this task, it remains highly challenging to further improve the accuracy of ncRNA identification. In this paper, we propose a coding potential predictor using feature representation learning based on pseudo RNA sequences named CPPFLPS. In this method, we use the pseudo RNA sequences generated by simulating RNA sequence mutations as new samples for data augmentation, and six string operations simulating RNA sequence mutations are considered: base replacement, base insertion, base deletion, subsequence reversion, subsequence repetition and subsequence deletion. In the feature representation learning framework, different types of pseudo RNA sequences are added to the training set to form new training sets that can be used to train baseline classifiers, thus obtaining baseline models. The resulting labels of these baseline models are used as feature vectors to represent RNA sequences, and the resulting feature vectors acquired after feature selection are used to train a predictive model for distinguishing ncRNAs from coding RNAs. Our method achieves better performance compared with that of existing state-of-the-art methods. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPPFLPS.
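The six string operations named in the abstract can be sketched in Python as follows. This is a minimal illustration, not the CPPFLPS implementation: the function names, the uniform random choice of positions, and the `max_len` cap on subsequence length are assumptions made here, since the abstract does not say how positions or lengths are chosen.

```python
import random

BASES = "ACGU"  # RNA alphabet assumed for this sketch


def base_replacement(seq, rng):
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(seq))
    choices = [b for b in BASES if b != seq[i]]
    return seq[:i] + rng.choice(choices) + seq[i + 1:]


def base_insertion(seq, rng):
    """Insert one random base at a random position."""
    i = rng.randrange(len(seq) + 1)
    return seq[:i] + rng.choice(BASES) + seq[i:]


def base_deletion(seq, rng):
    """Delete one randomly chosen base."""
    i = rng.randrange(len(seq))
    return seq[:i] + seq[i + 1:]


def _random_span(seq, rng, max_len):
    """Pick a random span of 2..max_len bases (hypothetical helper).

    Requires len(seq) >= 2.
    """
    length = rng.randint(2, min(max_len, len(seq)))
    start = rng.randrange(len(seq) - length + 1)
    return start, start + length


def subsequence_reversion(seq, rng, max_len=10):
    """Reverse a randomly chosen subsequence in place."""
    s, e = _random_span(seq, rng, max_len)
    return seq[:s] + seq[s:e][::-1] + seq[e:]


def subsequence_repetition(seq, rng, max_len=10):
    """Duplicate a randomly chosen subsequence right after itself."""
    s, e = _random_span(seq, rng, max_len)
    return seq[:e] + seq[s:e] + seq[e:]


def subsequence_deletion(seq, rng, max_len=10):
    """Remove a randomly chosen subsequence."""
    s, e = _random_span(seq, rng, max_len)
    return seq[:s] + seq[e:]


OPERATIONS = {
    "base_replacement": base_replacement,
    "base_insertion": base_insertion,
    "base_deletion": base_deletion,
    "subsequence_reversion": subsequence_reversion,
    "subsequence_repetition": subsequence_repetition,
    "subsequence_deletion": subsequence_deletion,
}


def make_pseudo_sequences(seq, seed=0):
    """Return one pseudo sequence per operation type for a single input."""
    rng = random.Random(seed)
    return {name: op(seq, rng) for name, op in OPERATIONS.items()}
```

Each function returns a new string and leaves the input unchanged, so pseudo sequences of one type can be collected into their own augmented training set, one per baseline classifier.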
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Xiaofei Yang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Chenhong Li
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Xianguang Lin
- School of Biomedical Engineering, South-Central Minzu University, Wuhan, 430074, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central Minzu University, Wuhan, 430074, China; Key Laboratory of Cognitive Science(South-Central Minzu University), State Ethnic Affairs Commission, Wuhan, 430074, China.
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
3
Lei Y, Meng Y, Guo X, Ning K, Bian Y, Li L, Hu Z, Anashkina AA, Jiang Q, Dong Y, Zhu X. Overview of structural variation calling: Simulation, identification, and visualization. Comput Biol Med 2022; 145:105534. [DOI: 10.1016/j.compbiomed.2022.105534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/20/2022] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 12/11/2022]
4
Chen XG, Liu S, Zhang W. Predicting Coding Potential of RNA Sequences by Solving Local Data Imbalance. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1075-1083. [PMID: 32886613 DOI: 10.1109/tcbb.2020.3021800] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/11/2023]
Abstract
Non-coding RNAs (ncRNAs) play an important role in various biological processes and are associated with diseases. Distinguishing between coding RNAs and ncRNAs, also known as predicting the coding potential of RNA sequences, is critical for downstream biological function analysis. Many machine learning-based methods have been proposed for predicting the coding potential of RNA sequences. Recent studies reveal that most existing methods perform poorly on RNA sequences with short open reading frames (sORFs, ORF length < 303 nt). In this work, we analyze the distribution of ORF lengths of RNA sequences and observe that coding RNAs with sORFs are scarce and far fewer than ncRNAs with sORFs. Thus, there is a local data imbalance among RNA sequences with sORFs. We propose a coding potential prediction method, CPE-SLDI, which uses data oversampling techniques to augment the samples of coding RNAs with sORFs so as to alleviate the local data imbalance. Compared with existing methods, CPE-SLDI achieves better performance, and our studies reveal that data augmentation by various data oversampling techniques can enhance the performance of coding potential prediction, especially for RNA sequences with sORFs. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPESLDI.
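The sORF criterion above (ORF length below 303 nt) can be illustrated with a short Python sketch. The ORF definition used here (first AUG to the next in-frame stop codon, stop codon included, forward strand only) and the function names are assumptions for illustration; the abstract does not state how CPE-SLDI extracts ORFs.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf_length(seq):
    """Length in nt of the longest AUG..stop ORF over the three forward frames.

    Returns 0 when no complete ORF is found. DNA input is accepted by
    mapping T to U.
    """
    seq = seq.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def has_short_orf(seq, threshold=303):
    """True when the longest ORF is shorter than the sORF threshold."""
    return longest_orf_length(seq) < threshold
```

A flag like this could be used to split a dataset into sORF and non-sORF subsets before counting coding and non-coding sequences in each, which is where the local imbalance described in the abstract would show up.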
5
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022] Open
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic unit controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, a class of genes has persisted in a conserved form across multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the genes that are irreplaceable for the organism's life activities are the essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful for assessing the potential risks of disease and for helping determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes.
In this paper, we review the research status of essential genes and the essential genome database of bacteria, expound the computational prediction method of essential genes based on communication coding theory, and discuss the significance and practical application value of essential genes.
Affiliation(s)
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Lihong Wang
- Beidahuang Industry Group General Hospital, Harbin, China
6
Zhang Z, Zhao Y, Wang J, Guo M. DeepRCI: predicting ATP-binding proteins using the residue-residue contact information. IEEE J Biomed Health Inform 2021; 26:2822-2829. [PMID: 34941538 DOI: 10.1109/jbhi.2021.3137840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Adenosine-5'-triphosphate (ATP) is a direct energy source for various activities of tissues and cells in the body. The release of energy from ATP requires the assistance of ATP-binding proteins. Therefore, the identification of ATP-binding proteins is of great significance for research on organisms. Several methods for predicting ATP-binding proteins already exist. However, their accuracy is low, so their predictions are unreliable. Here, we designed a novel method, called DeepRCI (based on Deep convolutional neural network and Residue-residue Contact Information), for predicting ATP-binding proteins. DeepRCI achieved an accuracy of 93.61% on the test set, which was a significant improvement over the state-of-the-art methods.
7
Le NQK, Ho QT. Deep transformers and convolutional neural network in identifying DNA N6-methyladenine sites in cross-species genomes. Methods 2021; 204:199-206. [PMID: 34915158 DOI: 10.1016/j.ymeth.2021.12.004] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 11/13/2021] [Revised: 11/30/2021] [Accepted: 12/09/2021] [Indexed: 12/19/2022] Open
Abstract
As one of the most common post-transcriptional epigenetic modifications, N6-methyladenine (6mA) plays an essential role in various cellular processes and disease pathogenesis. Therefore, accurately identifying 6mA modifications is necessary for a deep understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models were developed with small training datasets. Hence, their practical application to genome-wide detection is quite limited. To overcome the existing limitations, we present a novel model based on transformer architecture and deep learning to identify DNA 6mA sites from cross-species genomes. The model is constructed on a benchmark dataset and explores features derived from pre-trained transformer word embedding approaches. Subsequently, a convolutional neural network is employed to learn the generated features and produce the prediction outcomes. As a result, our predictor achieved excellent performance in the independent test, with an accuracy and Matthews correlation coefficient (MCC) of 79.3% and 0.58, respectively. Overall, it achieved better accuracy than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, our model is expected to assist biologists in accurately identifying 6mA sites and formulating novel testable biological hypotheses. We also release source codes and datasets freely at https://github.com/khanhlee/bert-dna for front-end users.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan.
- Quang-Thai Ho
- College of Information & Communication Technology, Can Tho University, Viet Nam; Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
8
Chen XG, Zhang W, Yang X, Li C, Chen H. ACP-DA: Improving the Prediction of Anticancer Peptides Using Data Augmentation. Front Genet 2021; 12:698477. [PMID: 34276801 PMCID: PMC8279753 DOI: 10.3389/fgene.2021.698477] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 04/21/2021] [Accepted: 06/07/2021] [Indexed: 12/09/2022] Open
Abstract
Anticancer peptides (ACPs) have provided a promising perspective for cancer treatment, and the prediction of ACPs is very important for the discovery of new cancer treatment drugs. It is time consuming and expensive to use experimental methods to identify ACPs, so computational methods for ACP identification are urgently needed. There have been many effective computational methods, especially machine learning-based methods, proposed for such predictions. Most of the current machine learning methods try to find suitable features or design effective feature learning techniques to accurately represent ACPs. However, the performance of these methods can be further improved for cases with insufficient numbers of samples. In this article, we propose an ACP prediction model called ACP-DA (Data Augmentation), which uses data augmentation for insufficient samples to improve the prediction performance. In our method, to better exploit the information of peptide sequences, peptide sequences are represented by integrating binary profile features and AAindex features, and then the samples in the training set are augmented in the feature space. After data augmentation, the samples are used to train the machine learning model, which is used to predict ACPs. The performance of ACP-DA exceeds that of existing methods, and ACP-DA achieves better performance in the prediction of ACPs compared with a method without data augmentation. The proposed method is available at http://github.com/chenxgscuec/ACPDA.
Affiliation(s)
- Xian-Gan Chen
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, China; Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan, China
- Xiaofei Yang
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
- Chenhong Li
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
- Hengling Chen
- School of Biomedical Engineering, South-Central University for Nationalities, Wuhan, China; Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, South-Central University for Nationalities, Wuhan, China; Key Laboratory of Cognitive Science (South-Central University for Nationalities), State Ethnic Affairs Commission, Wuhan, China
9
Zhang J, Sun M, Zhao Y, Geng G, Hu Y. Identification of Gingivitis-Related Genes Across Human Tissues Based on the Summary Mendelian Randomization. Front Cell Dev Biol 2021; 8:624766. [PMID: 34026747 PMCID: PMC8134671 DOI: 10.3389/fcell.2020.624766] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/01/2020] [Accepted: 12/02/2020] [Indexed: 11/13/2022] Open
Abstract
Periodontal diseases are among the most frequent inflammatory diseases affecting children and adolescents; they affect the supporting structures of the teeth, lead to tooth loss, and contribute to systemic inflammation. Gingivitis is the most common periodontal infection. It is mainly caused by a substance produced by microbial plaque, systemic disorders, and genetic abnormalities in the host. Identifying gingivitis-related genes across human tissues is significant not only for understanding disease mechanisms but also for disease development and clinical diagnosis. The genome-wide association study (GWAS) is a commonly used method to mine disease-related genetic variants. However, due to factors such as linkage disequilibrium, it is difficult for GWAS to identify genes directly related to the disease. Hence, we constructed a data integration method that uses Summary Mendelian randomization (SMR) to combine GWAS with expression quantitative trait locus (eQTL) data to identify gingivitis-related genes. Five eQTL studies from different human tissues and one GWAS study were referenced in this paper. This study identified several candidate SNPs and genes related to gingivitis in a tissue-specific or cross-tissue manner. Further, we analyzed and explained the functions of these genes. The R program for the SMR method has been uploaded to GitHub (https://github.com/hxdde/SMR).
Affiliation(s)
- Jiahui Zhang
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Mingai Sun
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yuanyuan Zhao
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Guannan Geng
- Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
10
Dao FY, Lv H, Su W, Sun ZJ, Huang QL, Lin H. iDHS-Deep: an integrated tool for predicting DNase I hypersensitive sites by deep neural network. Brief Bioinform 2021; 22:6158360. [PMID: 33751027 DOI: 10.1093/bib/bbab047] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 12/25/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 01/09/2023] Open
Abstract
A DNase I hypersensitive site (DHS) is a region of chromatin that is hypersensitive to the DNase I enzyme. It is an important part of the noncoding region and contains a variety of regulatory elements, such as promoters, enhancers, and transcription factor-binding sites. Moreover, disease-related (or trait-related) loci are usually enriched in DHS regions. Therefore, the detection of DHS regions is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region is a potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets in different cell types and developmental stages, demonstrating that the method is superior in the identification of DHSs. Furthermore, for the convenience of wet-lab researchers, the user-friendly web server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, by which users can easily distinguish DHSs from non-DHSs and obtain the corresponding developmental stage of a DHS.
Affiliation(s)
- Fu-Ying Dao
- Informational Biology at University of Electronic Science and Technology of China, China
- Hao Lv
- Informational Biology at University of Electronic Science and Technology of China, China
- Wei Su
- Informational Biology at University of Electronic Science and Technology of China, China
- Zi-Jie Sun
- Informational Biology at University of Electronic Science and Technology of China, China
- Qin-Lai Huang
- Informational Biology at University of Electronic Science and Technology of China, China
- Hao Lin
- Informational Biology at University of Electronic Science and Technology of China, China
11
Huang Q, Zhou W, Guo F, Xu L, Zhang L. 6mA-Pred: identifying DNA N6-methyladenine sites based on deep learning. PeerJ 2021; 9:e10813. [PMID: 33604189 PMCID: PMC7866889 DOI: 10.7717/peerj.10813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/28/2020] [Accepted: 12/30/2020] [Indexed: 01/03/2023] Open
Abstract
With the accumulation of data on 6mA modification sites, an increasing number of scholars have begun to focus on the identification of 6mA sites. Despite the recognized importance of 6mA sites, methods for their identification remain lacking, with most existing methods being aimed at their identification in individual species. In the present study, we aimed to develop an identification method suitable for multiple species. Based on previous research, we propose a method for 6mA site recognition. Our experiments prove that the proposed 6mA-Pred method is effective for identifying 6mA sites in genes from taxa such as rice, Mus musculus, and human. A series of experimental results show that 6mA-Pred is an excellent method. We provide the source code used in the study, which can be obtained from http://39.100.246.211:5004/6mA_Pred/.
Affiliation(s)
- Qianfei Huang
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
12
Bian H, Guo M, Wang J. Recognition of Mitochondrial Proteins in Plasmodium Based on the Tripeptide Composition. Front Cell Dev Biol 2020; 8:578901. [PMID: 33043014 PMCID: PMC7525148 DOI: 10.3389/fcell.2020.578901] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/01/2020] [Accepted: 08/13/2020] [Indexed: 01/31/2023] Open
Abstract
Mitochondria play essential roles in eukaryotic cells, especially in Plasmodium cells. They have several unusual evolutionary and functional features that are vital for disease diagnosis and drug design. Thus, predicting mitochondrial proteins of Plasmodium has become a worthwhile task. However, existing computational methods can only predict mitochondrial proteins of Plasmodium falciparum (P. falciparum for short), and these methods have low accuracy. It is highly desirable to design a classifier with high accuracy for predicting mitochondrial proteins for all Plasmodium species, not only P. falciparum. We propose a novel method, named PM-OTC, for predicting mitochondrial proteins in Plasmodium. PM-OTC uses the Support Vector Machine (SVM) as the classifier and the selected tripeptide composition as the features. We adopted the 5-fold cross-validation method to train and test PM-OTC. Results demonstrate that PM-OTC achieves an accuracy of 94.91% and that its performance is superior to that of other methods.
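Tripeptide composition, the feature type named in the abstract, can be sketched in Python as follows. This sketch computes the full 8000-dimensional composition over the 20 standard amino acids; the feature selection step and the SVM classifier used by PM-OTC are not reproduced, and the function name and the handling of non-standard residues (windows containing them are skipped) are assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 8000 items
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(protein):
    """Return the frequency of each of the 8000 tripeptides in a sequence.

    Frequencies are counts over overlapping windows of length 3, divided
    by the number of windows made only of standard residues.
    """
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(protein) - 2):
        j = INDEX.get(protein[i:i + 3])
        if j is not None:  # skip windows with non-standard residues
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]
```

Because most of the 8000 entries are zero for any single protein, a selection step that keeps only informative tripeptides, as the abstract describes, is a natural follow-up before training a classifier.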
Affiliation(s)
- Haodong Bian
- School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, China; Stage Key Laboratories of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot, China
13
Zhang J, Zhang Y, Li Y, Guo S, Yang G. Identification of Cancer Biomarkers in Human Body Fluids by Using Enhanced Physicochemical-incorporated Evolutionary Conservation Scheme. Curr Top Med Chem 2020; 20:1888-1897. [PMID: 32648847 DOI: 10.2174/1568026620666200710100743] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/15/2019] [Revised: 03/01/2020] [Accepted: 03/02/2020] [Indexed: 02/07/2023]
Abstract
OBJECTIVE: Cancer is one of the most serious diseases affecting human health. Among all current cancer treatments, early diagnosis and control significantly help increase the chances of cure. Detecting cancer biomarkers in body fluids is now attracting more attention among oncologists. In silico prediction of body fluid-related proteins, which can serve as cancer biomarkers, opens a door for labor-intensive and time-consuming biochemical experiments.
METHODS: In this work, we propose a novel method for high-throughput identification of cancer biomarkers in human body fluids. We incorporate physicochemical properties into the weighted observed percentages (WOP) and position-specific scoring matrices (PSSM) profiles to enhance their attributes that reflect the evolutionary conservation of the body fluid-related proteins. The least absolute shrinkage and selection operator (LASSO) feature selection strategy is introduced to generate the optimal feature subset.
RESULTS: The ten-fold cross-validation results on training datasets demonstrate the accuracy of the proposed model. We also test our proposed method on independent testing datasets and apply it to the identification of potential cancer biomarkers in human body fluids.
CONCLUSION: The testing results indicate that our approach generalizes well.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Yu Zhang
- Information Engineering College, Huanghuai University, Zhumadian, China
- Yanlin Li
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Song Guo
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Guifu Yang
- College of Information Science and Technology, Northeast Normal University, Changchun, China
14
Liang R, Xie J, Zhang C, Zhang M, Huang H, Huo H, Cao X, Niu B. Identifying Cancer Targets Based on Machine Learning Methods via Chou's 5-steps Rule and General Pseudo Components. Curr Top Med Chem 2019; 19:2301-2317. [PMID: 31622219 DOI: 10.2174/1568026619666191016155543] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/14/2019] [Revised: 07/19/2019] [Accepted: 08/26/2019] [Indexed: 01/09/2023]
Abstract
In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental, and lifestyle factors should be studied together in cancer research, owing to the complexity and various forms of the disease. The increasing availability and growth rate of 'big data' derived from various omics opens a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods in handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning, and naïve Bayes classifiers.
Affiliation(s)
- Ruirui Liang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Jiayang Xie
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Chi Zhang
- Foshan Huaxia Eye Hospital, Huaxia Eye Hospital Group, Foshan 528000, China
- Mengying Zhang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Hai Huang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Haizhong Huo
- Department of General Surgery, Shanghai Ninth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
- Xin Cao
- Zhongshan Hospital, Institute of Clinical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Bing Niu
- School of Life Sciences, Shanghai University, Shanghai, 200444, China