Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhu X, He J, Zhao S, Tao W, Xiong Y, Bi S. A comprehensive comparison and analysis of computational predictors for RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Brief Funct Genomics 2020;18:367-376. [PMID: 31609411 DOI: 10.1093/bfgp/elz018] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/20/2019] [Revised: 07/07/2019] [Accepted: 07/15/2019] [Indexed: 12/16/2022]



Cited by Other Article(s)

Liu Y, Liu Y, Wang GA, Cheng Y, Bi S, Zhu X. BERT-Kgly: A Bidirectional Encoder Representations From Transformers (BERT)-Based Model for Predicting Lysine Glycation Site for Homo sapiens. Frontiers in Bioinformatics 2022;2:834153. [PMID: 36304324 PMCID: PMC9580886 DOI: 10.3389/fbinf.2022.834153] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/13/2021] [Accepted: 01/20/2022] [Indexed: 12/21/2022]

Staem5: A novel computational approach for accurate prediction of m5C site. Molecular Therapy. Nucleic Acids 2021;26:1027-1034. [PMID: 34786208 PMCID: PMC8571400 DOI: 10.1016/j.omtn.2021.10.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/05/2021] [Revised: 08/27/2021] [Accepted: 10/06/2021] [Indexed: 12/25/2022]

WHISTLE server: A high-accuracy genomic coordinate-based machine learning platform for RNA modification prediction. Methods 2021;203:378-382. [PMID: 34245870 DOI: 10.1016/j.ymeth.2021.07.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/08/2021] [Revised: 06/28/2021] [Accepted: 07/05/2021] [Indexed: 01/12/2023]

Zhao S, Ju Y, Ye X, Zhang J, Han S. Bioluminescent Proteins Prediction with Voting Strategy. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200601122328] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/01/2023]

Niu M, Lin Y, Zou Q. sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks. Plant Molecular Biology 2021;105:483-495. [PMID: 33385273 DOI: 10.1007/s11103-020-01102-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 05/17/2020] [Accepted: 12/01/2020] [Indexed: 06/12/2023]

Huang Q, Zhou W, Guo F, Xu L, Zhang L. 6mA-Pred: identifying DNA N6-methyladenine sites based on deep learning. PeerJ 2021;9:e10813. [PMID: 33604189 PMCID: PMC7866889 DOI: 10.7717/peerj.10813] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/28/2020] [Accepted: 12/30/2020] [Indexed: 01/03/2023]

He S, Guo F, Zou Q, Ding H. MRMD2.0: A Python Tool for Machine Learning with Feature Ranking and Reduction. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200503030350] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Indexed: 11/22/2022]

Chen K, Song B, Tang Y, Wei Z, Xu Q, Su J, de Magalhães JP, Rigden DJ, Meng J. RMDisease: a database of genetic variants that affect RNA modifications, with implications for epitranscriptome pathogenesis. Nucleic Acids Res 2021;49:D1396-D1404. [PMID: 33010174 PMCID: PMC7778951 DOI: 10.1093/nar/gkaa790] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 07/20/2020] [Revised: 09/08/2020] [Accepted: 09/11/2020] [Indexed: 12/11/2022]

Zhuang J, Liu D, Lin M, Qiu W, Liu J, Chen S. PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm. Front Genet 2021;12:773882. [PMID: 34868261 PMCID: PMC8637112 DOI: 10.3389/fgene.2021.773882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2021] [Accepted: 10/04/2021] [Indexed: 11/16/2022]

Chen X, Xiong Y, Liu Y, Chen Y, Bi S, Zhu X. m5CPred-SVM: a novel method for predicting m5C sites of RNA. BMC Bioinformatics 2020;21:489. [PMID: 33126851 PMCID: PMC7602301 DOI: 10.1186/s12859-020-03828-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/30/2020] [Accepted: 10/21/2020] [Indexed: 02/08/2023]

Abstract

BACKGROUND

As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is yet satisfactory, and both need further improvement.

RESULTS

In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species, H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated based on RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the results showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that our method, m5CPred-SVM, can become a useful tool for accurate identification of m5C sites.

CONCLUSION

In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites of three different species. The results show that our model outperformed the existing state-of-the-art models. Our model is available for users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM.
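The sequential forward feature selection step mentioned in the abstract above can be illustrated with a minimal greedy sketch. This is not the authors' implementation: the feature names and the `toy_score` function are hypothetical stand-ins, and in practice the score would be something like cross-validated accuracy of the chosen classifier.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature that most improves `score`; stop when nothing helps."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score every candidate subset formed by adding one remaining feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no single addition improves the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical per-feature gains, used only to make the sketch runnable.
TOY_GAINS = {"kmer_freq": 0.20, "psp": 0.15, "onehot": 0.03, "noise": 0.0}


def toy_score(subset: List[str]) -> float:
    """Stand-in for a validation score: additive gains minus a small size penalty."""
    return 0.5 + sum(TOY_GAINS[f] for f in subset) - 0.01 * len(subset)


chosen, final_score = sequential_forward_selection(list(TOY_GAINS), toy_score)
```

With this toy score the search keeps the three informative features and stops before adding `noise`, because the size penalty outweighs a zero gain.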


Li Q, Xu L, Li Q, Zhang L. Identification and Classification of Enhancers Using Dimension Reduction Technique and Recurrent Neural Network. Computational and Mathematical Methods in Medicine 2020;2020:8852258. [PMID: 33133227 PMCID: PMC7591959 DOI: 10.1155/2020/8852258] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/25/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 12/21/2022]

Tang R, Zhang Y, Liang C, Xu J, Meng Q, Hua J, Liu J, Zhang B, Yu X, Shi S. The role of m6A-related genes in the prognosis and immune microenvironment of pancreatic adenocarcinoma. PeerJ 2020;8:e9602. [PMID: 33062408 PMCID: PMC7528816 DOI: 10.7717/peerj.9602] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 03/30/2020] [Accepted: 07/03/2020] [Indexed: 12/13/2022]

Abstract

Background

Pancreatic adenocarcinoma (PAAD) is among the most lethal diseases and has a dismal prognosis; however, efficient treatment is currently limited. Several studies have observed epigenetic variation during tumorigenesis, suggesting the potential role of RNA methylation, especially N6-methyladenosine (m6A) modification, as a novel epigenetic modification mediating PAAD prognosis.

Methods

The expression levels of m6A-related genes were downloaded from The Cancer Genome Atlas-Pancreatic Adenocarcinoma (TCGA) and Genotype-Tissue Expression (GTEx) projects, and the findings were validated in four Gene Expression Omnibus (GEO) datasets. A predictive model was constructed using lasso regression and evaluated by survival analysis and a receiver operating characteristic curve. Consensus clustering identified two distinct subgroups with different immune activity signatures based on the expression pattern of m6A-related genes. The relationship between the mutation state of m6A-related genes and infiltration of immune cells was established and visualized using Tumor Immune Estimation Resource (https://cistrome.shinyapps.io/timer/).

Results

Fourteen of twenty-one m6A-related genes were differentially expressed between PAAD and normal tissues in the TCGA-GTEx cohort. Among these genes, HNRNPC, IGF2BP2 and YTHDF1 were further validated in four GEO datasets. Moreover, an m6A-based model exhibited moderate accuracy in predicting overall survival in PAAD samples. Additionally, potential m6A modification targets were screened by selecting genes from a set of 23,391 genes that not only harbored the most m6A-modified sites but also showed a robust correlation with PAAD survival. Moreover, we correlated the expression level of m6A-related genes with the immune microenvironment of pancreatic cancer for the first time. Specifically, both arm-level gain and deletion of ALKBH5 decreased the infiltration of CD8+ T cells (P < 0.05 and P < 0.01, respectively).

Conclusion

Collectively, our findings suggest a novel anticancer strategy for restoring balanced RNA methylation in tumor cells and guide clinical physicians in developing a new practical approach for considering the impact of related genes on prognosis.


Affiliation(s)

Rong Tang, Yiyin Zhang, Chen Liang, Jin Xu, Qingcai Meng, Jie Hua, Jiang Liu, Bo Zhang, Xianjun Yu, Si Shi: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, Shanghai, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Pancreatic Cancer Institute, Shanghai, China; Pancreatic Cancer Institute, Fudan University, Shanghai, China


Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020;11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022]

Wahab A, Mahmoudi O, Kim J, Chong KT. DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning. Cells 2020;9:E1756. [PMID: 32707969 PMCID: PMC7465362 DOI: 10.3390/cells9081756] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/26/2020] [Revised: 07/17/2020] [Accepted: 07/17/2020] [Indexed: 11/24/2022]

Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. BioMed Research International 2020;2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]

Dou L, Li X, Ding H, Xu L, Xiang H. Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features. Molecular Therapy. Nucleic Acids 2020;21:332-342. [PMID: 32645685 PMCID: PMC7340967 DOI: 10.1016/j.omtn.2020.06.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 02/18/2020] [Revised: 06/03/2020] [Accepted: 06/04/2020] [Indexed: 12/14/2022]

Its2vec: Fungal Species Identification Using Sequence Embedding and Random Forest Classification. BioMed Research International 2020;2020:2468789. [PMID: 32566672 PMCID: PMC7275950 DOI: 10.1155/2020/2468789] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/03/2020] [Revised: 03/20/2020] [Accepted: 03/25/2020] [Indexed: 12/19/2022]

Mahmoudi O, Wahab A, Chong KT. iMethyl-Deep: N6 Methyladenosine Identification of Yeast Genome with Automatic Feature Extraction Technique by Using Deep Learning Algorithm. Genes (Basel) 2020;11:genes11050529. [PMID: 32397453 PMCID: PMC7288457 DOI: 10.3390/genes11050529] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/08/2020] [Revised: 04/30/2020] [Accepted: 05/05/2020] [Indexed: 12/12/2022]

Meng C, Hu Y, Zhang Y, Guo F. PSBP-SVM: A Machine Learning-Based Computational Identifier for Predicting Polystyrene Binding Peptides. Front Bioeng Biotechnol 2020;8:245. [PMID: 32296690 PMCID: PMC7137786 DOI: 10.3389/fbioe.2020.00245] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/31/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022]

Govindaraj RG, Subramaniyam S, Manavalan B. Extremely-randomized-tree-based Prediction of N⁶-Methyladenosine Sites in Saccharomyces cerevisiae. Curr Genomics 2020;21:26-33. [PMID: 32655295 PMCID: PMC7324895 DOI: 10.2174/1389202921666200219125625] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/01/2019] [Revised: 12/28/2019] [Accepted: 01/24/2020] [Indexed: 02/07/2023]

Hou R, Wang L, Wu YJ. Predicting ATP-Binding Cassette Transporters Using the Random Forest Method. Front Genet 2020;11:156. [PMID: 32269586 PMCID: PMC7109328 DOI: 10.3389/fgene.2020.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/26/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022]

Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020;11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022]

Abstract

Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene–disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, compensating for the sparsity of the data. On the other hand, based on the prior knowledge that most of the unknown gene–disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm prioritizes candidate genes better than other methods, such as IMC and CATAPULT.
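The mean percentile ranking (MPR) reported in the abstract above can be sketched as follows. The abstract does not give the formula, so this assumes one common definition: for each held-out true gene, take its percentile position in the score-sorted candidate list (0% at the top, 100% at the bottom) and average over cases, so lower values are better. The gene names and scores in the usage lines are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def percentile_rank(scores: Dict[str, float], true_item: str) -> float:
    """Percentile position of `true_item` when candidates are sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) == 1:
        return 0.0  # a single candidate is trivially ranked first
    position = ranked.index(true_item)  # 0-based rank of the true item
    return 100.0 * position / (len(ranked) - 1)


def mean_percentile_ranking(cases: Iterable[Tuple[Dict[str, float], str]]) -> float:
    """Average percentile rank over (candidate scores, true item) cases."""
    ranks = [percentile_rank(scores, true_item) for scores, true_item in cases]
    if not ranks:
        raise ValueError("at least one case is required")
    return sum(ranks) / len(ranks)


# Hypothetical predicted scores for three candidate genes of one disease.
example_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1}
example_mpr = mean_percentile_ranking(
    [(example_scores, "geneA"), (example_scores, "geneB")]
)
```

Under this definition a random ranking gives an MPR near 50%, which is the baseline against which a value such as the reported 10.94% would be read.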


Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020;8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022]

Huang Q, Zhang J, Wei L, Guo F, Zou Q. 6mA-RicePred: A Method for Identifying DNA N⁶-Methyladenine Sites in the Rice Genome Based on Feature Fusion. Frontiers in Plant Science 2020;11:4. [PMID: 32076430 PMCID: PMC7006724 DOI: 10.3389/fpls.2020.00004] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 12/04/2019] [Accepted: 01/06/2020] [Indexed: 06/01/2023]

Wang X, Zhu X, Ye M, Wang Y, Li CD, Xiong Y, Wei DQ. STS-NLSP: A Network-Based Label Space Partition Method for Predicting the Specificity of Membrane Transporter Substrates Using a Hybrid Feature of Structural and Semantic Similarity. Front Bioeng Biotechnol 2019;7:306. [PMID: 31781551 PMCID: PMC6851049 DOI: 10.3389/fbioe.2019.00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022]

Abstract

Membrane transport proteins play crucial roles in the pharmacokinetics of substrate drugs and in drug resistance in cancer, and are vital to drug discovery, development and anti-cancer therapeutics. However, experimental methods to profile a substrate drug against a panel of transporters to determine its specificity are labor intensive and time consuming. In this article, we aim to develop an in silico multi-label classification approach to predict whether a substrate can specifically recognize one of the 13 categories of drug transporters, ranging from ATP-binding cassette to solute carrier families, using both structural fingerprints and chemical ontology information of substrates. The data-driven network-based label space partition (NLSP) method was utilized to construct the model based on a hybrid similarity-based feature that integrates 2D fingerprint and semantic similarity. This method builds predictors for each label cluster (possibly intersecting) detected by community detection algorithms and takes the union of label sets for a compound as the final prediction. NLSP falls into the ensemble-of-multi-label-classifiers category in the multi-label learning field. We utilized the Cramér's V statistic to quantify the label correlations and depicted them via a heatmap. The jackknife tests and iterative stratification based cross-validation method were adopted on a benchmark dataset to evaluate the prediction performance of the proposed models in both a multi-label and a label-wise manner. Compared with other powerful multi-label methods, ML-kNN, MTSVM, and RAkELd, our multi-label classification model of NLSP-RF (random forest-based NLSP) has proven to be a feasible and effective model, and performed satisfactorily in the predictive task of transporter-substrate specificity. The idea behind the NLSP method is intriguing, and the power of NLSP remains to be explored for multi-label learning problems in bioinformatics.
The benchmark dataset, intermediate results and Python code which can fully reproduce our experiments and results are available at https://github.com/dqwei-lab/STS.
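The Cramér's V statistic used in the abstract above to quantify label correlations can be computed from a contingency table as V = sqrt(chi² / (n · (min(r, c) − 1))). The sketch below is a plain, uncorrected version for two categorical sequences; it is an illustration under that standard definition, not the code from the authors' repository, and the label vectors in the usage lines are made up.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Cramér's V between two equal-length categorical sequences (no bias correction)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts
    rows = Counter(xs)            # row marginals
    cols = Counter(ys)            # column marginals
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # a constant variable shows no association
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Made-up binary label columns for four compounds.
label_a = [0, 0, 1, 1]
label_b = [0, 0, 1, 1]  # identical to label_a
label_c = [0, 1, 0, 1]  # independent of label_a
v_same = cramers_v(label_a, label_b)
v_indep = cramers_v(label_a, label_c)
```

Applying `cramers_v` to every pair of label columns gives the symmetric matrix that a heatmap of label correlations would display.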
