Reference Citation Analysis

For: Feng C, Liu S, Zhang H, Guan R, Li D, Zhou F, Liang Y, Feng X. Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study. Int J Mol Sci 2020;21:E2181. [PMID: 32235704 DOI: 10.3390/ijms21062181] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/20/2020] [Revised: 03/09/2020] [Accepted: 03/20/2020] [Indexed: 12/30/2022]
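The page does not define "Impact Index Per Article". The values listed here are consistent with the citation count divided by the number of calendar years from the publication year through 2023 inclusive, rounded to one decimal place (for example, 19 citations for a 2020 article gives 19/4, shown as 4.8). The sketch below encodes that inferred rule only; the 2023 reference year, the floor of one year for newer articles, and the function name are assumptions, not the site's documented formula.

```python
def impact_index_per_article(citations: int, pub_year: int, ref_year: int = 2023) -> float:
    """Citations per calendar year since publication (inferred, hypothetical rule).

    Counts the publication year through ref_year inclusive and never divides
    by less than one year, so articles newer than ref_year (e.g. 2024) do not
    cause a division by zero.
    """
    years = max(1, ref_year - pub_year + 1)
    return round(citations / years, 1)


# Reproduces entries on this page, e.g. 85 citations for a 2021 article -> 28.3.
print(impact_index_per_article(85, 2021))
```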

Cited by Other Article(s)

Aragones SD, Ferrer E. Clustering Analysis of Time Series of Affect in Dyadic Interactions. Multivariate Behav Res 2024;59:320-341. [PMID: 38407099 DOI: 10.1080/00273171.2023.2283633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]

Nazaret A, Fan JL, Lavallée VP, Cornish AE, Kiseliovas V, Masilionis I, Chun J, Bowman RL, Eisman SE, Wang J, Shi L, Levine RL, Mazutis L, Blei D, Pe'er D, Azizi E. Deep generative model deciphers derailed trajectories in acute myeloid leukemia. bioRxiv 2023:2023.11.11.566719. [PMID: 38014231 PMCID: PMC10680623 DOI: 10.1101/2023.11.11.566719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]

Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Rep Methods 2023;3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]

Li Y, Nguyen J, Anastasiu DC, Arriaga EA. CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis. Brief Bioinform 2023;24:bbad157. [PMID: 37150778 PMCID: PMC10199777 DOI: 10.1093/bib/bbad157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 03/28/2023] [Accepted: 04/02/2023] [Indexed: 05/09/2023]

Li K, Sun YH, Ouyang Z, Negi S, Gao Z, Zhu J, Wang W, Chen Y, Piya S, Hu W, Zavodszky MI, Yalamanchili H, Cao S, Gehrke A, Sheehan M, Huh D, Casey F, Zhang X, Zhang B. scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing. BMC Genomics 2023;24:228. [PMID: 37131143 PMCID: PMC10155351 DOI: 10.1186/s12864-023-09332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 04/25/2023] [Indexed: 05/04/2023]

Zhang Y, Sun H, Lian X, Tang J, Zhu F. ANPELA: Significantly Enhanced Quantification Tool for Cytometry-Based Single-Cell Proteomics. Adv Sci (Weinh) 2023;10:e2207061. [PMID: 36950745 DOI: 10.1002/advs.202207061] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/30/2022] [Revised: 02/13/2023] [Indexed: 05/27/2023]

Becker LM, Chen SH, Rodor J, de Rooij LPMH, Baker AH, Carmeliet P. Deciphering endothelial heterogeneity in health and disease at single-cell resolution: progress and perspectives. Cardiovasc Res 2023;119:6-27. [PMID: 35179567 PMCID: PMC10022871 DOI: 10.1093/cvr/cvac018] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 10/29/2021] [Revised: 12/16/2021] [Accepted: 02/16/2022] [Indexed: 11/14/2022]

Rather AA, Chachoo MA. Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping. Comput Biol Med 2023;155:106640. [PMID: 36774889 DOI: 10.1016/j.compbiomed.2023.106640] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 01/08/2023] [Accepted: 02/05/2023] [Indexed: 02/10/2023]

Abstract

Deciphering information hidden in gene expression assays to identify disease subtypes is of considerable importance in precision medicine. However, the intricacy of biological networks and the curse of dimensionality in gene expression data impose computational limitations on this process. Clustering therefore often becomes the first choice for exploratory data analysis to identify natural structures and intrinsic patterns in the data. However, the sparse and high-dimensional nature of omics data prevents conventional clustering algorithms from discovering subtypes that are clinically relevant and statistically significant. Hence, coupling non-linear dimensionality reduction with clustering often becomes imperative to improve the clustering results. In this study, we present a robust pipeline to discover disease subtypes with clinical relevance. Specifically, we focus on discovering patient sub-groups whose residual-life patterns differ markedly from those of other sub-groups. This is significant because, by refining prognosis, subtyping can reduce uncertainty in estimating patients' expected outcomes. The methodology presented is based on robust correlation estimation, UMAP (a non-linear dimensionality reduction method) and Mapper (a tool from topology). Notably, we suggest a method for improving the robustness of the correlation matrix of gene expression data in order to improve the clustering results. The performance of the model is evaluated by applying it to five cancer datasets obtained from TCGA, and comparisons are made with the state-of-the-art methods NEMO, RSC-OTRI and SNF with regard to the log-rank test and the Restricted Life Expectancy Difference. For example, in the GBM dataset, the minimum separation between any two discovered subtypes is 221 days, which is significantly higher than for the other methodologies.
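The abstract reports subtype separation in days of restricted life expectancy but does not give the formula. A common way to quantify such a gap, assumed here, is the difference in restricted mean survival time (RMST): the area under each group's Kaplan-Meier curve up to a horizon tau. The following is a minimal standard-library sketch under that assumption; the function names, the toy data, and the choice of tau are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the Kaplan-Meier estimator.

    events[i] is 1 if the event (e.g. death) was observed at times[i],
    0 if the subject was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return steps


def rmst(times: List[float], events: List[int], tau: float) -> float:
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    steps = kaplan_meier(times, events)
    area = 0.0
    for (t0, s0), (t1, _) in zip(steps, steps[1:] + [(float("inf"), 0.0)]):
        if t0 >= tau:
            break
        area += s0 * (min(t1, tau) - t0)  # survival is constant on [t0, t1)
    return area
```

The separation between two subtypes is then `rmst(times_b, events_b, tau) - rmst(times_a, events_a, tau)`, in the same time unit as the input (days, for the GBM figure quoted above).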
We also compared the results obtained without the robust correlation estimate and observed that robust correlation significantly improves the separability between survival curves. From the results, we infer that our methodology separates the survival curves of patient sub-groups better than the other methodologies, despite using single-omics patient profiles rather than the multi-omics profiles used by SNF and NEMO. Pathway over-representation analysis is performed on the final clustering results to investigate the biological underpinnings characterizing each subtype.
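The abstract credits a robust correlation estimate with the improved separability but does not describe the estimator. Purely as a generic illustration, the sketch below swaps in Spearman rank correlation, a standard outlier-resistant alternative to Pearson correlation; it is not the method proposed in the cited paper. It builds a correlation matrix between the rows of a matrix (for example, patients described by gene expression values). The function names are hypothetical, and rows are assumed to be non-constant.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Spearman correlation between every pair of rows (Pearson on ranks)."""
    ranked = [average_ranks(row) for row in rows]
    n = len(rows)
    return [[pearson(ranked[i], ranked[j]) for j in range(n)] for i in range(n)]


# One extreme value drags Pearson down to about 0.72; Spearman stays at 1.0.
print(pearson([1, 2, 3, 4, 5], [1, 4, 9, 16, 1000]))
print(spearman_matrix([[1, 2, 3, 4, 5], [1, 4, 9, 16, 1000]])[0][1])
```

Because ranks ignore the magnitude of extreme values, a single outlying measurement cannot dominate the similarity between two profiles, which is the property a robust correlation matrix is meant to provide.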

Tian SZ, Li G, Ning D, Jing K, Xu Y, Yang Y, Fullwood MJ, Yin P, Huang G, Plewczynski D, Zhai J, Dai Z, Chen W, Zheng M. MCIBox: a toolkit for single-molecule multi-way chromatin interaction visualization and micro-domains identification. Brief Bioinform 2022;23:6696142. [PMID: 36094071 DOI: 10.1093/bib/bbac380] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/15/2022] [Revised: 08/05/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022]

Affiliation(s)

Simon Zhongyuan Tian Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Guoliang Li National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, China
Duo Ning Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Kai Jing Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Yewen Xu Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Yang Yang Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Melissa J Fullwood School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, 637551, Singapore; Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, 117599, Singapore; Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), 61 Biopolis Dr, 138673, Singapore
Pengfei Yin Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Guangyu Huang Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Dariusz Plewczynski Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Pl. Politechniki 1, 00-661, Warsaw, Poland; Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, S. Banacha 2c, 00-927, Warsaw, Poland
Jixian Zhai Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China; Institute of Plant and Food Science, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China; Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Ziwei Dai Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Wei Chen Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China
Meizhen Zheng Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Rd, Nanshan District, Shenzhen, 518055, Guangdong, China

Chowdhury HA, Bhattacharyya DK, Kalita JK. UIPBC: An effective clustering for scRNA-seq data analysis without user input. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Dong J, Zhang Y, Wang F. scSemiAE: a deep model with semi-supervised learning for single-cell transcriptomics. BMC Bioinformatics 2022;23:161. [PMID: 35513780 PMCID: PMC9069784 DOI: 10.1186/s12859-022-04703-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/01/2021] [Accepted: 04/28/2022] [Indexed: 11/30/2022]

CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03440-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]

Seth S, Mallik S, Bhadra T, Zhao Z. Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data. Front Genet 2022;13:828479. [PMID: 35198011 PMCID: PMC8859265 DOI: 10.3389/fgene.2022.828479] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/03/2021] [Accepted: 01/05/2022] [Indexed: 02/02/2023]

Abstract

The major areas of interest in single-cell RNA sequencing analysis are the identification of existing and novel cell types, the characterization of cells, cell fate prediction, the classification of tumor types, and the investigation of heterogeneity across cells. Single-cell clustering plays an important role in answering these questions. Identifying clusters in high-dimensional single-cell sequencing data is challenging because of the nature of the data, and dimensionality reduction models can help address this. Here, we introduce a framework for discovering cluster-specific frequent biomarkers that uses dimensionality reduction and Louvain hierarchical agglomerative clustering for single-cell RNA sequencing data analysis. First, we pre-filtered features detected in few cells and cells with few detected features. Then we created a Seurat object to store the data and analysis together and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method "LogNormalize" for data normalization. Next, we identified the features that vary most from cell to cell in our dataset. Then, we applied a linear transformation and the linear dimensionality reduction technique Principal Component Analysis (PCA) to project the high-dimensional data into an optimal low-dimensional space. After identifying fifty "significant" principal components (PCs) based on strong enrichment of low p-value features, we applied the graph-based clustering algorithm Louvain to cluster cells on the top 10 significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1,872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.8851.
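The normalization step named above is Seurat's "LogNormalize": each cell's feature counts are divided by that cell's total count, multiplied by a scale factor (10,000 by default in Seurat), and transformed with the natural log(1 + x). The plain-Python sketch below shows only that arithmetic; the cells-by-features layout (Seurat stores features as rows), the handling of empty cells, and the function name are assumptions, and this is not Seurat's code.

```python
import math
from typing import List


def log_normalize(counts: List[List[float]], scale_factor: float = 1e4) -> List[List[float]]:
    """Global-scaling normalization in the style of Seurat's "LogNormalize".

    counts[c][g] is the raw count of feature g in cell c (hypothetical layout).
    Each cell is scaled to scale_factor total counts, then log1p-transformed.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Assumption: an empty cell stays all zeros instead of dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(x / total * scale_factor) for x in cell])
    return normalized


# Two features sharing a cell's 10 counts equally each become log1p(5000).
print(log_normalize([[0, 5, 5]]))
```

Scaling every cell to the same total removes differences in sequencing depth before the highly variable feature and PCA steps, and the log compresses the long right tail of count data.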
After detecting the cell clusters, we found 3,871 cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for extracting expression features from single-cell sequencing data, with a log₂FC threshold of 0.25 and a minimum feature detection rate of 25%. From these cluster-specific biomarkers, we found 1,892 most frequent markers, i.e., overlapping biomarkers. We performed degree-based hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of the cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
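The abstract gives two marker thresholds (log₂FC of 0.25 and a minimum detection rate of 25%) and then counts "overlapping" markers. The sketch below shows one plausible reading of each step in plain Python: a pre-filter applied to a gene before the MAST test, and a count of genes that appear in the marker lists of at least two clusters. The mean-based fold change with a pseudocount of 1, the restriction to up-regulated genes, the two-cluster cutoff, and all names are assumptions; this does not reproduce Seurat's or MAST's internals.

```python
import math
from collections import Counter
from typing import Dict, List


def passes_marker_filter(in_cluster: List[float], rest: List[float],
                         logfc_threshold: float = 0.25, min_pct: float = 0.25,
                         pseudocount: float = 1.0) -> bool:
    """Hypothetical pre-filter for one candidate marker gene.

    in_cluster / rest hold the gene's (non-log) expression in cells inside and
    outside the cluster. Keeps up-regulated genes whose log2 fold change of mean
    expression reaches logfc_threshold and that are detected (> 0) in at least
    min_pct of the cells in either group.
    """
    mean_in = sum(in_cluster) / len(in_cluster)
    mean_rest = sum(rest) / len(rest)
    log2fc = math.log2(mean_in + pseudocount) - math.log2(mean_rest + pseudocount)
    pct_in = sum(v > 0 for v in in_cluster) / len(in_cluster)
    pct_rest = sum(v > 0 for v in rest) / len(rest)
    return log2fc >= logfc_threshold and max(pct_in, pct_rest) >= min_pct


def overlapping_markers(markers_by_cluster: Dict[str, List[str]],
                        min_clusters: int = 2) -> List[str]:
    """Genes listed as markers for at least min_clusters clusters (assumed cutoff)."""
    counts = Counter(g for genes in markers_by_cluster.values() for g in set(genes))
    return sorted(g for g, c in counts.items() if c >= min_clusters)


print(overlapping_markers({"c1": ["A", "B"], "c2": ["B", "C"], "c3": ["B", "C", "D"]}))
```

The detection filter drops genes that are strongly up-regulated only because of a handful of cells, which keeps the downstream test focused on genes expressed broadly within a cluster.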

Vasighizaker A, Danda S, Rueda L. Discovering cell types using manifold learning and enhanced visualization of single-cell RNA-Seq data. Sci Rep 2022;12:120. [PMID: 34996927 PMCID: PMC8742092 DOI: 10.1038/s41598-021-03613-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/15/2021] [Accepted: 12/07/2021] [Indexed: 01/03/2023]

Mazan-Mamczarz K, Ha J, De S, Sen P. Single-Cell Analysis of the Transcriptome and Epigenome. Methods Mol Biol 2022;2399:21-60. [PMID: 35604552 PMCID: PMC9352558 DOI: 10.1007/978-1-0716-1831-8_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]

Ryu KH, Zhu Y, Schiefelbein J. Plant Cell Identity in the Era of Single-Cell Transcriptomics. Annu Rev Genet 2021;55:479-496. [PMID: 34530637 DOI: 10.1146/annurev-genet-071719-020453] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]

Chowdhury HA, Bhattacharyya DK, Kalita JK. UICPC: Centrality-based clustering for scRNA-seq data analysis without user input. Comput Biol Med 2021;137:104820. [PMID: 34508973 DOI: 10.1016/j.compbiomed.2021.104820] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2021] [Revised: 08/24/2021] [Accepted: 08/27/2021] [Indexed: 11/16/2022]

Zhang W, Xue X, Zheng X, Fan Z. NMFLRR: Clustering scRNA-seq data by integrating non-negative matrix factorization with low rank representation. IEEE J Biomed Health Inform 2021;26:1394-1405. [PMID: 34310328 DOI: 10.1109/jbhi.2021.3099127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]

Kopf A, Fortuin V, Somnath VR, Claassen M. Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations on single cell data. PLoS Comput Biol 2021;17:e1009086. [PMID: 34191792 PMCID: PMC8277074 DOI: 10.1371/journal.pcbi.1009086] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/15/2020] [Revised: 07/13/2021] [Accepted: 05/14/2021] [Indexed: 12/17/2022]

Gaydosik AM, Tabib T, Domsic R, Khanna D, Lafyatis R, Fuschiotti P. Single-cell transcriptome analysis identifies skin-specific T-cell responses in systemic sclerosis. Ann Rheum Dis 2021;80:1453-1460. [PMID: 34031030 DOI: 10.1136/annrheumdis-2021-220209] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 02/20/2021] [Accepted: 05/08/2021] [Indexed: 12/13/2022]

Moehlin J, Mollet B, Colombo BM, Mendoza-Parra MA. Inferring biologically relevant molecular tissue substructures by agglomerative clustering of digitized spatial transcriptomes with multilayer. Cell Syst 2021;12:694-705.e3. [PMID: 34159899 DOI: 10.1016/j.cels.2021.04.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/25/2020] [Revised: 01/08/2021] [Accepted: 04/13/2021] [Indexed: 01/04/2023]

Nussbaum YI, Manjunath Y, Suvilesh KN, Warren WC, Shyu CR, Kaifi JT, Ciorba MA, Mitchem JB. Current and Prospective Methods for Assessing Anti-Tumor Immunity in Colorectal Cancer. Int J Mol Sci 2021;22:4802. [PMID: 33946558 PMCID: PMC8125332 DOI: 10.3390/ijms22094802] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/19/2021] [Revised: 04/23/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023]

Xi NM, Li JJ. Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Syst 2021;12:176-194.e6. [PMID: 33338399 PMCID: PMC7897250 DOI: 10.1016/j.cels.2020.11.008] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 06/28/2020] [Revised: 10/06/2020] [Accepted: 11/19/2020] [Indexed: 12/29/2022]

Technique of Gene Expression Profiles Extraction Based on the Complex Use of Clustering and Classification Methods. Diagnostics (Basel) 2020;10:diagnostics10080584. [PMID: 32806785 PMCID: PMC7460566 DOI: 10.3390/diagnostics10080584] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/05/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 11/16/2022]

Abstract

In this paper, we present the results of research on extracting informative gene expression profiles from a high-dimensional array of gene expressions, taking into account the state of patients' health, using a clustering method, ML-based binary classifiers and a fuzzy inference system. Applying the proposed stepwise procedure allows us to extract the most informative genes, taking into account both the disease subtypes and the state of the patient's health, for subsequent reconstruction of gene regulatory networks based on the selected genes and simulation of the reconstructed models. As experimental data, we used publicly available gene expression data obtained from DNA microarray experiments, containing two types of patient gene expression profiles: patients with lung cancer tumors and healthy patients. The stepwise data-processing procedure consists of the following steps. First, we reduce the number of genes by removing non-informative genes according to statistical criteria and Shannon entropy. Then, we perform stepwise hierarchical clustering of the gene expression profiles at hierarchical levels 1 to 10 using the SOTA (Self-Organizing Tree Algorithm) clustering algorithm with a correlation distance metric. The quality of the resulting clustering was evaluated using a complex clustering quality criterion that considers both the distribution of gene expression profiles relative to the centers of the clusters to which they are allocated and the distribution of the cluster centers. This stage yielded a selection of the optimal cluster at each hierarchical level, corresponding to the minimum value of the quality criterion. In the next step, we implemented a classification procedure for the examined objects using four well-known binary classifiers: logistic regression, support vector machine, decision tree and random forest.
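SOTA is described above as clustering with a correlation distance metric. A common definition, assumed here because the abstract does not name the correlation coefficient, is d = 1 - r, where r is the Pearson correlation between two expression profiles, so d runs from 0 (same shape) to 2 (mirror-image shape). The sketch computes that distance and a full distance matrix of the kind a clustering algorithm would consume; SOTA itself is not implemented, and the function names are hypothetical.

```python
import math
from typing import List


def correlation_distance(x: List[float], y: List[float]) -> float:
    """Correlation distance d = 1 - r (r = Pearson correlation).

    Assumes neither profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def distance_matrix(profiles: List[List[float]]) -> List[List[float]]:
    """Pairwise correlation distances between all profiles."""
    return [[correlation_distance(p, q) for q in profiles] for p in profiles]


# Profiles with the same shape but different magnitudes are at distance ~0.
print(distance_matrix([[1, 2, 3], [10, 20, 30], [3, 2, 1]]))
```

Unlike Euclidean distance, this metric groups genes by the shape of their expression pattern across samples rather than by absolute expression level, which is why it is a usual choice for co-expression clustering.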
The effectiveness of each classifier was evaluated using ROC (Receiver Operating Characteristic) analysis with criteria whose components include errors of both the first and the second kind. The final decision on extracting the most informative subset of gene expression profiles was made using the fuzzy inference system, whose inputs are the outputs of the individual classifiers and whose output is the final decision on the patient's state of health. In our view, implementing the proposed stepwise procedure for extracting informative gene expression profiles creates the conditions for increasing the effectiveness of subsequent gene regulatory network reconstruction and simulation of the reconstructed models, taking into account the disease subtypes and/or the patient's state of health.
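The ROC-based criteria above are built from errors of the first and second kind. The sketch below computes those two rates from binary predictions; it assumes label 1 means lung cancer and 0 means healthy, and that both classes are present. How the authors combine the two rates into a single criterion, and the fuzzy inference step, are not specified in the abstract and are not reproduced; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (type I error rate, type II error rate) for binary labels.

    Assumed encoding: 1 = diseased (lung cancer), 0 = healthy.
    Type I  = false positive rate FP / (FP + TN): healthy patient called diseased.
    Type II = false negative rate FN / (FN + TP): diseased patient called healthy.
    Assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


# One healthy and one diseased patient misclassified out of two each -> (0.5, 0.5).
print(error_rates([0, 0, 1, 1], [0, 1, 1, 0]))
```

Each decision threshold of a classifier yields one such pair; plotting the type I rate against 1 minus the type II rate (the true positive rate) over all thresholds traces the ROC curve.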
