Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Peng J, Zhang X, Hui W, Lu J, Li Q, Liu S, Shang X. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach. BMC Syst Biol 2018;12:18. [PMID: 29560823 PMCID: PMC5861498 DOI: 10.1186/s12918-018-0539-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

For:	Peng J, Zhang X, Hui W, Lu J, Li Q, Liu S, Shang X. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach. BMC Syst Biol 2018;12:18. [PMID: 29560823 PMCID: PMC5861498 DOI: 10.1186/s12918-018-0539-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Number

Cited by Other Article(s)

Wang H, Zheng H, Chen DZ. TANGO: A GO-Term Embedding Based Method for Protein Semantic Similarity Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:694-706. [PMID: 35030084 DOI: 10.1109/tcbb.2022.3143480] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Chen Y, Hu Y, Hu X, Feng C, Chen M. CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure. Bioinformatics 2022;38:4380-4386. [PMID: 35900147 DOI: 10.1093/bioinformatics/btac520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Revised: 06/16/2022] [Indexed: 12/24/2022] Open

Han Y, Klinger K, Rajpal DK, Zhu C, Teeple E. Empowering the discovery of novel target-disease associations via machine learning approaches in the open targets platform. BMC Bioinformatics 2022;23:232. [PMID: 35710324 PMCID: PMC9202116 DOI: 10.1186/s12859-022-04753-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Accepted: 05/26/2022] [Indexed: 11/10/2022] Open

Abstract

Background

The Open Targets (OT) Platform integrates a wide range of data sources on target-disease associations to facilitate identification of potential therapeutic drug targets to treat human diseases. However, due to the complexity that targets are usually functionally pleiotropic and efficacious for multiple indications, challenges in identifying novel target to indication associations remain. Specifically, persistent need exists for new methods for integration of novel target-disease association evidence and biological knowledge bases via advanced computational methods. These offer promise for increasing power for identification of the most promising target-disease pairs for therapeutic development. Here we introduce a novel approach by integrating additional target-disease features with machine learning models to further uncover druggable disease to target indications.

Results

We derived novel target-disease associations as supplemental features to OT platform-based associations using three data sources: (1) target tissue specificity from GTEx expression profiles; (2) target semantic similarities based on gene ontology; and (3) functional interactions among targets by embedding them from protein–protein interaction (PPI) networks. Machine learning models were applied to evaluate feature importance and performance benchmarks for predicting targets with known drug indications. The evaluation results show the newly integrated features demonstrate higher importance than current features in OT. In addition, these also show superior performance over association benchmarks and may support discovery of novel therapeutic indications for highly pursued targets.

Conclusion

Our newly generated features can be used to represent additional underlying biological relatedness among targets and diseases to further empower improved performance for predicting novel indications for drug targets through advanced machine learning models. The proposed methodology enables a powerful new approach for systematic evaluation of drug targets with novel indications.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04753-4.

Collapse

Yuan L, Yang Z, Zhao J, Sun T, Hu C, Shen Z, Yu G. Pan-Cancer Bioinformatics Analysis of Gene UBE2C. Front Genet 2022;13:893358. [PMID: 35571064 PMCID: PMC9091452 DOI: 10.3389/fgene.2022.893358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Accepted: 03/29/2022] [Indexed: 11/30/2022] Open

Kamran AB, Naveed H. GOntoSim: a semantic similarity measure based on LCA and common descendants. Sci Rep 2022;12:3818. [PMID: 35264663 PMCID: PMC8907294 DOI: 10.1038/s41598-022-07624-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 02/14/2022] [Indexed: 11/20/2022] Open

Mallick K, Mallik S, Bandyopadhyay S, Chakraborty S. A Novel Graph Topology-Based GO-Similarity Measure for Signature Detection From Multi-Omics Data and its Application to Other Problems. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:773-785. [PMID: 32866101 DOI: 10.1109/tcbb.2020.3020537] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Harikumar H, Quinn TP, Rana S, Gupta S, Venkatesh S. Personalized single-cell networks: a framework to predict the response of any gene to any drug for any patient. BioData Min 2021;14:37. [PMID: 34353329 PMCID: PMC8340371 DOI: 10.1186/s13040-021-00263-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Accepted: 05/10/2021] [Indexed: 11/15/2022] Open

Abstract

BACKGROUND

The last decade has seen a major increase in the availability of genomic data. This includes expert-curated databases that describe the biological activity of genes, as well as high-throughput assays that measure gene expression in bulk tissue and single cells. Integrating these heterogeneous data sources can generate new hypotheses about biological systems. Our primary objective is to combine population-level drug-response data with patient-level single-cell expression data to predict how any gene will respond to any drug for any patient.

METHODS

We take 2 approaches to benchmarking a "dual-channel" random walk with restart (RWR) for data integration. First, we evaluate how well RWR can predict known gene functions from single-cell gene co-expression networks. Second, we evaluate how well RWR can predict known drug responses from individual cell networks. We then present two exploratory applications. In the first application, we combine the Gene Ontology database with glioblastoma single cells from 5 individual patients to identify genes whose functions differ between cancers. In the second application, we combine the LINCS drug-response database with the same glioblastoma data to identify genes that may exhibit patient-specific drug responses.

CONCLUSIONS

Our manuscript introduces two innovations to the integration of heterogeneous biological data. First, we use a "dual-channel" method to predict up-regulation and down-regulation separately. Second, we use individualized single-cell gene co-expression networks to make personalized predictions. These innovations let us predict gene function and drug response for individual patients. Taken together, our work shows promise that single-cell co-expression data could be combined in heterogeneous information networks to facilitate precision medicine.

Collapse

Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021;22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open

Li Y, Wang K, Wang G. Evaluating Disease Similarity Based on Gene Network Reconstruction and Representation. Bioinformatics 2021;37:3579-3587. [PMID: 33978702 DOI: 10.1093/bioinformatics/btab252] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 03/01/2021] [Accepted: 04/28/2021] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Quantifying the associations between diseases is of great significance in increasing our understanding of disease biology, improving disease diagnosis, re-positioning, and developing drugs. Therefore, in recent years, the research of disease similarity has received a lot of attention in the field of bioinformatics. Previous work has shown that the combination of the ontology (such as disease ontology and gene ontology) and disease-gene interactions are worthy to be regarded to elucidate diseases and disease associations. However, most of them are either based on the overlap between disease-related gene sets or distance within the ontology's hierarchy. The diseases in these methods are represented by discrete or sparse feature vectors, which cannot grasp the deep semantic information of diseases. Recently, deep representation learning has been widely studied and gradually applied to various fields of bioinformatics. Based on the hypothesis that disease representation depends on its related gene representations, we propose a disease representation model using two most representative gene resources HumanNet and Gene Ontology to construct a new gene network and learn gene (disease) representations. The similarity between two diseases is computed by the cosine similarity of their corresponding representations.

RESULTS

We propose a novel approach to compute disease similarity, which integrates two important factors disease-related genes and gene ontology hierarchy to learn disease representation based on deep representation learning. Under the same experimental settings, the AUC value of our method is 0.8074, which improves the most competitive baseline method by 10.1%. The quantitative and qualitative experimental results show that our model can learn effective disease representations and improve the accuracy of disease similarity computation significantly.

AVAILABILITY

The research shows that this method has certain applicability in the prediction of gene-related diseases, the migration of disease treatment methods, drug development, and so on.

SUPPLEMENTARY INFORMATION

Supplementary data are available at https://github.com/catly/disease_similarity.

Collapse

Liu W, Sun X, Peng L, Zhou L, Lin H, Jiang Y. RWRNET: A Gene Regulatory Network Inference Algorithm Using Random Walk With Restart. Front Genet 2020;11:591461. [PMID: 33101398 PMCID: PMC7545090 DOI: 10.3389/fgene.2020.591461] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2020] [Accepted: 09/02/2020] [Indexed: 11/30/2022] Open

Peng J, Guan J, Hui W, Shang X. A novel subnetwork representation learning method for uncovering disease-disease relationships. Methods 2020;192:77-84. [PMID: 32946974 DOI: 10.1016/j.ymeth.2020.09.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2020] [Revised: 08/20/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022] Open

Lu K, Yang K, Niyongabo E, Shu Z, Wang J, Chang K, Zou Q, Jiang J, Jia C, Liu B, Zhou X. Integrated network analysis of symptom clusters across disease conditions. J Biomed Inform 2020;107:103482. [PMID: 32535270 DOI: 10.1016/j.jbi.2020.103482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2019] [Revised: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 10/24/2022]

Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:769-776. [PMID: 30872239 DOI: 10.1109/tcbb.2019.2904965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020;11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open

Liu H, Guan J, Li H, Bao Z, Wang Q, Luo X, Xue H. Predicting the Disease Genes of Multiple Sclerosis Based on Network Representation Learning. Front Genet 2020;11:328. [PMID: 32373160 PMCID: PMC7186413 DOI: 10.3389/fgene.2020.00328] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2020] [Accepted: 03/19/2020] [Indexed: 02/02/2023] Open

Wang H, Wang J, Dong C, Lian Y, Liu D, Yan Z. A Novel Approach for Drug-Target Interactions Prediction Based on Multimodal Deep Autoencoder. Front Pharmacol 2020;10:1592. [PMID: 32047432 PMCID: PMC6997437 DOI: 10.3389/fphar.2019.01592] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2019] [Accepted: 12/09/2019] [Indexed: 01/09/2023] Open

Peng J, Lu G, Xue H, Wang T, Shang X. TS-GOEA: a web tool for tissue-specific gene set enrichment analysis based on gene ontology. BMC Bioinformatics 2019;20:572. [PMID: 31760951 PMCID: PMC6876092 DOI: 10.1186/s12859-019-3125-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Wang Y, Juan L, Peng J, Zang T, Wang Y. Prioritizing candidate diseases-related metabolites based on literature and functional similarity. BMC Bioinformatics 2019;20:574. [PMID: 31760947 PMCID: PMC6876110 DOI: 10.1186/s12859-019-3127-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Abstract

Background

As the terminal products of cellular regulatory process, functional related metabolites have a close relationship with complex diseases, and are often associated with the same or similar diseases. Therefore, identification of disease related metabolites play a critical role in understanding comprehensively pathogenesis of disease, aiming at improving the clinical medicine. Considering that a large number of metabolic markers of diseases need to be explored, we propose a computational model to identify potential disease-related metabolites based on functional relationships and scores of referred literatures between metabolites. First, obtaining associations between metabolites and diseases from the Human Metabolome database, we calculate the similarities of metabolites based on modified recommendation strategy of collaborative filtering utilizing the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with similarities between metabolites as weight. To improve the ability of identifying disease-related metabolites, we introduce scores of text mining from the existing database of chemicals and proteins into DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and scores of literatures. Finally, we utilize random walking with restart (RWR) in this network to predict candidate metabolites related to diseases.

Results

We construct the disease-associated metabolite network and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. Subsequently, we extract training sets and testing sets from two different versions of the Human Metabolome database and assess the performance of DMN and FLDMN on 19 diseases, respectively. As a result, the average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%. As a further improved network, FLDMN is proven to be successful in predicting potential metabolic signatures for 19 diseases with an average AUC value of 76.03%.

Conclusion

In this paper, a computational model is proposed for exploring metabolite-disease pairs and has good performance in predicting potential metabolites related to diseases through adequate validation. This result suggests that integrating literature and functional associations can be an effective way to construct disease associated metabolite network for prioritizing candidate diseases-related metabolites.

Collapse

Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y. ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinformatics 2019;20:573. [PMID: 31760933 PMCID: PMC6876095 DOI: 10.1186/s12859-019-3132-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

During procedures for conducting multiple sequence alignment, that is so essential to use the substitution score of pairwise alignment. To compute adaptive scores for alignment, researchers usually use Hidden Markov Model or probabilistic consistency methods such as partition function. Recent studies show that optimizing the parameters for hidden Markov model, as well as integrating hidden Markov model with partition function can raise the accuracy of alignment. The combination of partition function and optimized HMM, which could further improve the alignment's accuracy, however, was ignored by these researches.

RESULTS

A novel algorithm for MSA called ProbPFP is presented in this paper. It intergrate optimized HMM by particle swarm with partition function. The algorithm of PSO was applied to optimize HMM's parameters. After that, the posterior probability obtained by the HMM was combined with the one obtained by partition function, and thus to calculate an integrated substitution score for alignment. In order to evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The results demonstrate that the alignments obtained by ProbPFP got the maximum mean TC scores and mean SP scores on these two benchmark datasets: SABmark and OXBench, and it got the second highest mean TC scores and mean SP scores on the benchmark dataset BAliBASE. ProbPFP is also compared with 4 other outstanding methods, by reconstructing the phylogenetic trees for six protein families extracted from the database TreeFam, based on the alignments obtained by these 5 methods. The result indicates that the reference trees are closer to the phylogenetic trees reconstructed from the alignments obtained by ProbPFP than the other methods.

CONCLUSIONS

We propose a new multiple sequence alignment method combining optimized HMM and partition function in this paper. The performance validates this method could make a great improvement of the alignment's accuracy.

Collapse

Wang Y, Nie C, Zang T, Wang Y. Predicting circRNA-Disease Associations Based on circRNA Expression Similarity and Functional Similarity. Front Genet 2019;10:832. [PMID: 31572444 PMCID: PMC6751509 DOI: 10.3389/fgene.2019.00832] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 08/13/2019] [Indexed: 12/19/2022] Open

Peng J, Wang X, Shang X. Combining gene ontology with deep neural networks to enhance the clustering of single cell RNA-Seq data. BMC Bioinformatics 2019;20:284. [PMID: 31182005 PMCID: PMC6557741 DOI: 10.1186/s12859-019-2769-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC SYSTEMS BIOLOGY 2019;13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Peng J, Guan J, Shang X. Predicting Parkinson's Disease Genes Based on Node2vec and Autoencoder. Front Genet 2019;10:226. [PMID: 31001311 PMCID: PMC6454041 DOI: 10.3389/fgene.2019.00226] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2018] [Accepted: 02/28/2019] [Indexed: 12/26/2022] Open

Cheng L, Zhuang H, Ju H, Yang S, Han J, Tan R, Hu Y. Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front Genet 2019;10:94. [PMID: 30891058 PMCID: PMC6413727 DOI: 10.3389/fgene.2019.00094] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2018] [Accepted: 01/29/2019] [Indexed: 12/17/2022] Open

Xu L, Liang G, Liao C, Chen GD, Chang CC. k-Skip-n-Gram-RF: A Random Forest Based Method for Alzheimer's Disease Protein Identification. Front Genet 2019;10:33. [PMID: 30809242 PMCID: PMC6379451 DOI: 10.3389/fgene.2019.00033] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Accepted: 01/17/2019] [Indexed: 11/18/2022] Open

Cheng L, Zhuang H, Yang S, Jiang H, Wang S, Zhang J. Exposing the Causal Effect of C-Reactive Protein on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front Genet 2018;9:657. [PMID: 30619477 PMCID: PMC6306438 DOI: 10.3389/fgene.2018.00657] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Accepted: 12/03/2018] [Indexed: 12/21/2022] Open

Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018;19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

A Similarity Regression Fusion Model for Integrating Multi-Omics Data to Identify Cancer Subtypes. Genes (Basel) 2018;9:genes9070314. [PMID: 29933539 PMCID: PMC6070922 DOI: 10.3390/genes9070314] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2018] [Revised: 06/08/2018] [Accepted: 06/11/2018] [Indexed: 12/26/2022] Open

Peng J, Hui W, Shang X. Measuring phenotype-phenotype similarity through the interactome. BMC Bioinformatics 2018;19:114. [PMID: 29671400 PMCID: PMC5907215 DOI: 10.1186/s12859-018-2102-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Wang Z, Wu X, Wang Y. A framework for analyzing DNA methylation data from Illumina Infinium HumanMethylation450 BeadChip. BMC Bioinformatics 2018;19:115. [PMID: 29671397 PMCID: PMC5907140 DOI: 10.1186/s12859-018-2096-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Chu Y, Teng M, Wang Y. Modeling and correct the GC bias of tumor and normal WGS data for SCNA based tumor subclonal population inferring. BMC Bioinformatics 2018;19:112. [PMID: 29671389 PMCID: PMC5907144 DOI: 10.1186/s12859-018-2099-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

Hu Y, Zhao T, Zhang N, Zang T, Zhang J, Cheng L. Identifying diseases-related metabolites using random walk. BMC Bioinformatics 2018;19:116. [PMID: 29671398 PMCID: PMC5907145 DOI: 10.1186/s12859-018-2098-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open

Hao X, Hao J, Wang L, Hou H. Effective norm emergence in cell systems under limited communication. BMC Bioinformatics 2018;19:119. [PMID: 29671391 PMCID: PMC5907317 DOI: 10.1186/s12859-018-2097-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Guo Y, Liu S, Li Z, Shang X. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data. BMC Bioinformatics 2018;19:118. [PMID: 29671390 PMCID: PMC5907304 DOI: 10.1186/s12859-018-2095-4] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed as an alternative of deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been proved that the deep forest model has competitive or even better performance than deep neural networks in some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data.

RESULTS

In this paper, we propose a deep learning model, so-called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. The BCDForest distinguishes from the standard deep forest model with the following two main contributions: First, a named multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification.

CONCLUSIONS

The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.

Collapse

Peng J, Wang H, Lu J, Hui W, Wang Y, Shang X. Identifying term relations cross different gene ontology categories. BMC Bioinformatics 2017;18:573. [PMID: 29297309 PMCID: PMC5751813 DOI: 10.1186/s12859-017-1959-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open