Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chen L, Zhang YH, Huang G, Pan X, Wang S, Huang T, Cai YD. Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection. Mol Genet Genomics 2017;293:137-149. [DOI: 10.1007/s00438-017-1372-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2017] [Accepted: 09/07/2017] [Indexed: 12/15/2022]

For:	Chen L, Zhang YH, Huang G, Pan X, Wang S, Huang T, Cai YD. Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection. Mol Genet Genomics 2017;293:137-149. [DOI: 10.1007/s00438-017-1372-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2017] [Accepted: 09/07/2017] [Indexed: 12/15/2022]

Number

Cited by Other Article(s)

Niu M, Wang C, Chen Y, Zou Q, Qi R, Xu L. CircRNA identification and feature interpretability analysis. BMC Biol 2024;22:44. [PMID: 38408987 PMCID: PMC10898045 DOI: 10.1186/s12915-023-01804-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2023] [Accepted: 12/18/2023] [Indexed: 02/28/2024] Open

Wang J, Lu S, Wang SH, Zhang YD. A review on extreme learning machine. MULTIMEDIA TOOLS AND APPLICATIONS 2022;81:41611-41660. [DOI: 10.1007/s11042-021-11007-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/21/2020] [Revised: 02/26/2021] [Accepted: 05/05/2021] [Indexed: 08/30/2023]

Attention-Based Deep Multiple-Instance Learning for Classifying Circular RNA and Other Long Non-Coding RNA. Genes (Basel) 2021;12:genes12122018. [PMID: 34946967 PMCID: PMC8701965 DOI: 10.3390/genes12122018] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 12/23/2022] Open

Niu M, Ju Y, Lin C, Zou Q. Characterizing viral circRNAs and their application in identifying circRNAs in viruses. Brief Bioinform 2021;23:6377516. [PMID: 34585234 DOI: 10.1093/bib/bbab404] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 08/23/2021] [Accepted: 09/02/2021] [Indexed: 01/19/2023] Open

Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021;22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023] Open

Jiang JY, Ju CJT, Hao J, Chen M, Wang W. JEDI: circular RNA prediction based on junction encoders and deep interaction among splice sites. Bioinformatics 2021;37:i289-i298. [PMID: 34252942 PMCID: PMC8336595 DOI: 10.1093/bioinformatics/btab288] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open

Abstract

Motivation

Circular RNA (circRNA) is a novel class of long non-coding RNAs that have been broadly discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process, where the donor site backspliced to an upstream acceptor site. These circRNA sequences are conserved across species. More importantly, rising evidence suggests their vital roles in gene regulation and association with diseases. As the fundamental effort toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recently, advanced computational methods leverage deep learning to capture the relevant patterns from RNA sequences and model their interactions to facilitate the prediction. However, these methods fail to fully explore positional information of splice junctions and their deep interaction.

Results

We present a robust end-to-end framework, Junction Encoder with Deep Interaction (JEDI), for circRNA prediction using only nucleotide sequences. JEDI first leverages the attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks and then presents the novel cross-attention layer to model deep interaction among these sites for backsplicing. Finally, JEDI can not only predict circRNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate JEDI significantly outperforms state-of-the-art approaches in circRNA prediction on both isoform level and gene level. Moreover, JEDI also shows promising results on zero-shot backsplicing discovery, where none of the existing approaches can achieve.

Availability and implementation

The implementation of our framework is available at https://github.com/hallogameboy/JEDI.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Liu D, Fang L. Current research on circular RNAs and their potential clinical implications in breast cancer. Cancer Biol Med 2021;18:j.issn.2095-3941.2020.0275. [PMID: 34018386 PMCID: PMC8330541 DOI: 10.20892/j.issn.2095-3941.2020.0275] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023] Open

Singh D, Madhawan A, Roy J. Identification of multiple RNAs using feature fusion. Brief Bioinform 2021;22:6272794. [PMID: 33971667 DOI: 10.1093/bib/bbab178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Revised: 04/08/2021] [Indexed: 11/13/2022] Open

Bonidia RP, Sampaio LDH, Domingues DS, Paschoal AR, Lopes FM, de Carvalho ACPLF, Sanches DS. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief Bioinform 2021;22:6135010. [PMID: 33585910 DOI: 10.1093/bib/bbab011] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Revised: 12/13/2020] [Accepted: 01/07/2021] [Indexed: 11/14/2022] Open

CircNet: an encoder–decoder-based convolution neural network (CNN) for circular RNA identification. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05673-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Applications of Deep Learning in Biomedicine. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11507-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open

Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020;8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023] Open

Li R, Wang X, Song Y, Lei L. Hierarchical extreme learning machine with L21-norm loss and regularization. INT J MACH LEARN CYB 2020. [DOI: 10.1007/s13042-020-01234-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]

Peng L, Shen L, Liao L, Liu G, Zhou L. RNMFMDA: A Microbe-Disease Association Identification Method Based on Reliable Negative Sample Selection and Logistic Matrix Factorization With Neighborhood Regularization. Front Microbiol 2020;11:592430. [PMID: 33193260 PMCID: PMC7652725 DOI: 10.3389/fmicb.2020.592430] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Accepted: 09/17/2020] [Indexed: 12/22/2022] Open

Zhu JH, Yan QL, Wang JW, Chen Y, Ye QH, Wang ZJ, Huang T. The Key Genes for Perineural Invasion in Pancreatic Ductal Adenocarcinoma Identified With Monte-Carlo Feature Selection Method. Front Genet 2020;11:554502. [PMID: 33193628 PMCID: PMC7593847 DOI: 10.3389/fgene.2020.554502] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Accepted: 08/17/2020] [Indexed: 12/20/2022] Open

Xu Y, Zhang YH, Li J, Pan XY, Huang T, Cai YD. New Computational Tool Based on Machine-learning Algorithms for the Identification of Rhinovirus Infection-Related Genes. Comb Chem High Throughput Screen 2020;22:665-674. [PMID: 31782358 DOI: 10.2174/1386207322666191129114741] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2019] [Revised: 05/22/2019] [Accepted: 07/09/2019] [Indexed: 12/14/2022]

Chaabane M, Williams RM, Stephens AT, Park JW. circDeep: deep learning approach for circular RNA classification from other long non-coding RNA. Bioinformatics 2020;36:73-80. [PMID: 31268128 PMCID: PMC6956777 DOI: 10.1093/bioinformatics/btz537] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2018] [Revised: 06/13/2019] [Accepted: 07/01/2019] [Indexed: 01/17/2023] Open

Zhang G, Deng Y, Liu Q, Ye B, Dai Z, Chen Y, Dai X. Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning. Front Genet 2020;11:655. [PMID: 32849764 PMCID: PMC7396586 DOI: 10.3389/fgene.2020.00655] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Accepted: 05/29/2020] [Indexed: 12/11/2022] Open

Li M, Chen F, Zhang Y, Xiong Y, Li Q, Huang H. Identification of Post-myocardial Infarction Blood Expression Signatures Using Multiple Feature Selection Strategies. Front Physiol 2020;11:483. [PMID: 32581823 PMCID: PMC7287215 DOI: 10.3389/fphys.2020.00483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Accepted: 04/20/2020] [Indexed: 12/24/2022] Open

Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements. Processes (Basel) 2020. [DOI: 10.3390/pr8060638] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Abstract Because of the promising results obtained by machine learning (ML) approaches in several fields, every day is more common, the utilization of ML to solve problems in bioinformatics. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated for TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most commonly used way to compare measures is by using empirical analysis, a non-result-based methodology has been proposed, called measure invariance properties. These properties are calculated on the basis of whether a given measure changes its value under certain modifications in the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies applied ML to detect and classify TEs, there are no works evaluating performance metrics in TE tasks. Here, we analyzed 26 different metrics utilized in binary, multiclass, and hierarchical classifications, through bibliographic sources, and their invariance properties. Then, we corroborated our findings utilizing freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even using highly unbalanced datasets, multimodal negative class, and training datasets with errors or outliers. Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application. Collapse

CirRNAPL: A web server for the identification of circRNA based on extreme learning machine. Comput Struct Biotechnol J 2020;18:834-842. [PMID: 32308930 PMCID: PMC7153170 DOI: 10.1016/j.csbj.2020.03.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2019] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 12/27/2022] Open

Zhang J, Hu H, Xu S, Jiang H, Zhu J, Qin E, He Z, Chen E. The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Front Genet 2020;11:17. [PMID: 32117436 PMCID: PMC7010953 DOI: 10.3389/fgene.2020.00017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Accepted: 01/07/2020] [Indexed: 12/11/2022] Open

Orozco-Arias S, Isaza G, Guyot R, Tabares-Soto R. A systematic review of the application of machine learning in the detection and classification of transposable elements. PeerJ 2019;7:e8311. [PMID: 31976169 PMCID: PMC6967008 DOI: 10.7717/peerj.8311] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Accepted: 11/28/2019] [Indexed: 12/16/2022] Open

Liu M, Liu G. Prediction of Citrullination Sites on the Basis of mRMR Method and SNN. Comb Chem High Throughput Screen 2019;22:705-715. [PMID: 31782357 DOI: 10.2174/1386207322666191129113508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Revised: 10/02/2019] [Accepted: 10/25/2019] [Indexed: 01/31/2023]

Abstract

BACKGROUND

Citrullination, an important post-translational modification of proteins, alters the molecular weight and electrostatic charge of the protein side chains. Citrulline, in protein sequences, is catalyzed by a class of Peptidyl Arginine Deiminases (PADs). Dependent on Ca2+, PADs include five isozymes: PAD 1, 2, 3, 4/5, and 6. Citrullinated proteins have been identified in many biological and pathological processes. Among them, abnormal protein citrullination modification can lead to serious human diseases, including multiple sclerosis and rheumatoid arthritis.

OBJECTIVE

It is important to identify the citrullination sites in protein sequences. The accurate identification of citrullination sites may contribute to the studies on the molecular functions and pathological mechanisms of related diseases.

METHODS AND RESULTS

In this study, after an encoded training set (containing 116 positive and 348 negative samples) into the feature matrix, the mRMR method was used to analyze the 941- dimensional features which were sorted on the basis of their importance. Then, a predictive model based on a self-normalizing neural network (SNN) was proposed to predict the citrullination sites in protein sequences. Incremental Feature Selection (IFS) and 10-fold cross-validation were used as the model evaluation method. Three classical machine learning models, namely random forest, support vector machine, and k-nearest neighbor algorithm, were selected and compared with the SNN prediction model using the same evaluation methods. SNN may be the best tool for citrullination site prediction. The maximum value of the Matthews Correlation Coefficient (MCC) reached 0.672404 on the basis of the optimal classifier of SNN.

CONCLUSION

The results showed that the SNN-based prediction methods performed better when evaluated by some common metrics, such as MCC, accuracy, and F1-Measure. SNN prediction model also achieved a better balance in the classification and recognition of positive and negative samples from datasets compared with the other three models.

Collapse

Chen L, Li D, Shao Y, Wang H, Liu Y, Zhang Y. Identifying Microbiota Signature and Functional Rules Associated With Bacterial Subtypes in Human Intestine. Front Genet 2019;10:1146. [PMID: 31803234 PMCID: PMC6872643 DOI: 10.3389/fgene.2019.01146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022] Open

Identifying Methylation Pattern and Genes Associated with Breast Cancer Subtypes. Int J Mol Sci 2019;20:ijms20174269. [PMID: 31480430 PMCID: PMC6747348 DOI: 10.3390/ijms20174269] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2019] [Revised: 08/19/2019] [Accepted: 08/29/2019] [Indexed: 12/18/2022] Open

Analysis of Expression Pattern of snoRNAs in Different Cancer Types with Machine Learning Algorithms. Int J Mol Sci 2019;20:ijms20092185. [PMID: 31052553 PMCID: PMC6539089 DOI: 10.3390/ijms20092185] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2019] [Revised: 04/29/2019] [Accepted: 04/30/2019] [Indexed: 01/17/2023] Open

Chen L, Pan X, Zhang YH, Huang T, Cai YD. Analysis of Gene Expression Differences between Different Pancreatic Cells. ACS OMEGA 2019;4:6421-6435. [DOI: 10.1021/acsomega.8b02171] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/30/2023]

Zhang Y, Dong D, Li D, Lu L, Li J, Zhang Y, Chen L. Computational Method for the Identification of Molecular Metabolites Involved in Cereal Hull Color Variations. Comb Chem High Throughput Screen 2019;21:760-770. [DOI: 10.2174/1386207322666190129105441] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2018] [Revised: 08/02/2018] [Accepted: 08/16/2018] [Indexed: 11/22/2022]

Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019;10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019] [Indexed: 12/11/2022] Open

Affiliation(s)

Bilal Mirza NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
Wei Wang NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA. Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
Jie Wang NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
Howard Choi NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
Neo Christopher Chung NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA. Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
Peipei Ping NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA. Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA. Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA.

Collapse

Wang S, Li J, Sun X, Zhang YH, Huang T, Cai Y. Computational Method for Identifying Malonylation Sites by Using Random Forest Algorithm. Comb Chem High Throughput Screen 2018;23:304-312. [PMID: 30588879 DOI: 10.2174/1386207322666181227144318] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2018] [Revised: 09/03/2018] [Accepted: 12/04/2018] [Indexed: 12/12/2022]

Chen L, Pan X, Zhang YH, Liu M, Huang T, Cai YD. Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network. Comput Struct Biotechnol J 2018;17:49-60. [PMID: 30595815 PMCID: PMC6307323 DOI: 10.1016/j.csbj.2018.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Revised: 12/07/2018] [Accepted: 12/09/2018] [Indexed: 02/06/2023] Open

Abstract

A tissue-specific gene expression shapes the formation of tissues, while gene expression changes reflect the immune response of the human body to environmental stimulations or pressure, particularly in disease conditions, such as cancers. A few genes are commonly expressed across tissues or various cancers, while others are not. To investigate the functional differences between widely and rarely expressed genes, we defined the genes that were expressed in 32 normal tissues/cancers (i.e., called widely expressed genes; FPKM >1 in all samples) and those that were not detected (i.e., called rarely expressed genes; FPKM <1 in all samples) based on the large gene expression data set provided by Uhlen et al. Each gene was encoded using the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores. Minimum redundancy maximum relevance (mRMR) was used to measure and rank these features on the mRMR feature list. Thereafter, we applied the incremental feature selection method with a supervised classifier recurrent neural network (RNN) to select the discriminate features for classifying widely expressed genes from rarely expressed genes and construct an optimum RNN classifier. The Youden's indexes generated by the optimum RNN classifier and evaluated using a 10-fold cross validation were 0.739 for normal tissues and 0.639 for cancers. Furthermore, the underlying mechanisms of the key discriminate GO and KEGG features were analyzed. Results can facilitate the identification of the expression landscape of genes and elucidation of how gene expression shapes tissues and the microenvironment of cancers.

•

Some genes are widely expressed across tissues or various cancers.

•

A number of genes are rarely expressed across tissues or various cancers.

•

The functional differences between widely and rarely expressed genes were studied.

•

Several GO terms and KEGG pathways were extracted and analyzed.

Collapse

Chen L, Zhang S, Pan X, Hu X, Zhang YH, Yuan F, Huang T, Cai YD. HIV infection alters the human epigenetic landscape. Gene Ther 2018;26:29-39. [PMID: 30443044 DOI: 10.1038/s41434-018-0051-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2018] [Revised: 10/30/2018] [Accepted: 10/31/2018] [Indexed: 02/07/2023]

Chen L, Zhang YH, Pan X, Liu M, Wang S, Huang T, Cai YD. Tissue Expression Difference between mRNAs and lncRNAs. Int J Mol Sci 2018;19:ijms19113416. [PMID: 30384456 PMCID: PMC6274976 DOI: 10.3390/ijms19113416] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Revised: 10/26/2018] [Accepted: 10/28/2018] [Indexed: 12/15/2022] Open

Abstract

Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next generation sequencing, increasing lncRNAs are identified. Many hidden functions of lncRNAs are also revealed. However, the differences in lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues have more lncRNAs expressed. To investigate such tissue expression difference between mRNAs and lncRNAs, we encoded 9339 lncRNAs and 14,294 mRNAs with 71 expression features, including 69 maximum expression features for 69 types of cells, one feature for the maximum expression in all cells, and one expression specificity feature that was measured as Chao-Shen-corrected Shannon's entropy. With advanced feature selection methods, such as maximum relevance minimum redundancy, incremental feature selection methods, and random forest algorithm, 13 features presented the dissimilarity of lncRNAs and mRNAs. The 11 cell subtype features indicated which cell types of the lncRNAs and mRNAs had the largest expression difference. Such cell subtypes may be the potential cell models for lncRNA identification and function investigation. The expression specificity feature suggested that the cell types to express mRNAs and lncRNAs were different. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs were different. In addition, the rule learning algorithm, repeated incremental pruning to produce error reduction algorithm, was also employed to produce effective classification rules for classifying lncRNAs and mRNAs, which gave competitive results compared with random forest and could give a clearer picture of different expression patterns between lncRNAs and mRNAs. Results not only revealed the heterogeneous expression pattern of lncRNA and mRNA, but also gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.

Collapse

Pan X, Hu X, Zhang YH, Chen L, Zhu L, Wan S, Huang T, Cai YD. Identification of the copy number variant biomarkers for breast cancer subtypes. Mol Genet Genomics 2018;294:95-110. [PMID: 30203254 DOI: 10.1007/s00438-018-1488-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2018] [Accepted: 09/03/2018] [Indexed: 01/07/2023]

A Computational Method for Classifying Different Human Tissues with Quantitatively Tissue-Specific Expressed Genes. Genes (Basel) 2018;9:genes9090449. [PMID: 30205473 PMCID: PMC6162521 DOI: 10.3390/genes9090449] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2018] [Revised: 09/01/2018] [Accepted: 09/04/2018] [Indexed: 02/06/2023] Open

An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data. BIOMED RESEARCH INTERNATIONAL 2018;2018:7538204. [PMID: 30228989 PMCID: PMC6136508 DOI: 10.1155/2018/7538204] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/02/2018] [Revised: 07/17/2018] [Accepted: 07/29/2018] [Indexed: 11/18/2022]

Yuan F, Lu L, Zhang Y, Wang S, Cai YD. Data mining of the cancer-related lncRNAs GO terms and KEGG pathways by using mRMR method. Math Biosci 2018;304:1-8. [PMID: 30086268 DOI: 10.1016/j.mbs.2018.08.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2018] [Revised: 06/15/2018] [Accepted: 08/01/2018] [Indexed: 02/07/2023]

Computational Approach to Investigating Key GO Terms and KEGG Pathways Associated with CNV. BIOMED RESEARCH INTERNATIONAL 2018;2018:8406857. [PMID: 29850576 PMCID: PMC5925134 DOI: 10.1155/2018/8406857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/26/2017] [Revised: 02/28/2018] [Accepted: 03/06/2018] [Indexed: 12/25/2022]

Wang D, Li JR, Zhang YH, Chen L, Huang T, Cai YD. Identification of Differentially Expressed Genes between Original Breast Cancer and Xenograft Using Machine Learning Algorithms. Genes (Basel) 2018. [PMID: 29534550 PMCID: PMC5867876 DOI: 10.3390/genes9030155] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open

Wang S, Wang D, Li J, Huang T, Cai YD. Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods. Mol Omics 2018;14:64-73. [DOI: 10.1039/c7mo00030h] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]