1
Stephan S, Galland S, Labbani Narsis O, Shoji K, Vachenc S, Gerart S, Nicolle C. Agent-based approaches for biological modeling in oncology: A literature review. Artif Intell Med 2024; 152:102884. [PMID: 38703466 DOI: 10.1016/j.artmed.2024.102884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/06/2024]
Abstract
CONTEXT: Computational modeling involves the use of computer simulations and models to study and understand real-world phenomena. Its application is particularly relevant in the study of potential interactions between biological elements. It is a promising approach to understand complex biological processes and predict their behavior under various conditions. METHODOLOGY: This paper is a review of the recent literature on computational modeling of biological systems. Our study focuses on the field of oncology and the use of artificial intelligence (AI) and, in particular, agent-based modeling (ABM), between 2010 and May 2023. RESULTS: Most of the articles studied focus on improving diagnosis and understanding the behaviors of biological entities, with metaheuristic algorithms being the most used models. Several challenges are highlighted: increasing and structuring knowledge about biological systems, developing holistic models that capture multiple scales and levels of organization, reproducing emergent behaviors of biological systems, validating models with experimental data, improving computational performance of models and algorithms, and ensuring privacy and personal data protection.
Affiliation(s)
- Simon Stephan
- UTBM, CIAD UMR 7533, Belfort, F-90010, France; Université de Bourgogne, CIAD UMR 7533, Dijon, F-21000, France.
- Kenji Shoji
- Oncodesign Precision Medicine (OPM), 18 Rue Jean Mazen, Dijon, F-21000, France
- Sébastien Vachenc
- Oncodesign Precision Medicine (OPM), 18 Rue Jean Mazen, Dijon, F-21000, France
- Stéphane Gerart
- Oncodesign Precision Medicine (OPM), 18 Rue Jean Mazen, Dijon, F-21000, France
2
Chen J, Wen B. Bi-level gene selection of cancer by combining clustering and sparse learning. Comput Biol Med 2024; 172:108236. [PMID: 38471351 DOI: 10.1016/j.compbiomed.2024.108236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/07/2024] [Accepted: 02/25/2024] [Indexed: 03/14/2024]
Abstract
The diagnosis of cancer based on gene expression profile data has attracted extensive attention in the field of biomedical science. This type of data is usually high-dimensional and noisy. In this paper, a hybrid gene selection method based on clustering and sparse learning is proposed to choose the key genes with high precision. We first propose a filter method that combines the k-means clustering algorithm with signal-to-noise ratio ranking; a weighted gene co-expression network is then applied to the reduced data set to identify modules corresponding to biological pathways. Next, we choose the key genes by using group bridge and sparse group lasso as wrapper methods. Finally, we conduct numerical experiments on six cancer datasets. The numerical results show that the proposed method achieves good performance in gene selection and cancer classification.
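The signal-to-noise ratio ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used two-class definition |mean1 - mean0| / (sd1 + sd0) per gene; the abstract does not state the exact formula, and the k-means step of the filter is not reproduced here. The function names and the toy data are hypothetical.

```python
from statistics import mean, stdev

def snr_scores(expr, labels):
    """Per-gene signal-to-noise ratio: |mean1 - mean0| / (sd1 + sd0).

    expr   : list of samples, each a list of gene expression values
    labels : list of 0/1 class labels, one per sample
    (Assumed definition; the paper's exact formula is not given in the abstract.)
    """
    scores = []
    for g in range(len(expr[0])):
        pos = [row[g] for row, y in zip(expr, labels) if y == 1]
        neg = [row[g] for row, y in zip(expr, labels) if y == 0]
        denom = stdev(pos) + stdev(neg)
        # Genes with zero spread in both classes get score 0 (a choice of this sketch).
        scores.append(abs(mean(pos) - mean(neg)) / denom if denom > 0 else 0.0)
    return scores

def top_genes(expr, labels, k):
    """Indices of the k genes with the highest SNR score."""
    scores = snr_scores(expr, labels)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]

# Toy data: 4 samples x 3 genes; gene 0 separates the classes, gene 2 does not.
expr = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 2.0],
    [1.0, 1.1, 3.0],
    [2.0, 0.9, 2.0],
]
labels = [1, 1, 0, 0]
print(top_genes(expr, labels, k=2))
```

In a filter of this kind, only the top-ranked genes would be passed on to the later network and sparse-learning stages.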
Affiliation(s)
- Junnan Chen
- School of Science, Hebei University of Technology, Tianjin, PR China.
- Bo Wen
- Institute of Mathematics, Hebei University of Technology, Tianjin, PR China.
3
Li J, Zhang H, Mu B, Zuo H, Zhou K. Identifying phenotype-associated subpopulations through LP_SGL. Brief Bioinform 2023; 25:bbad424. [PMID: 38008419 PMCID: PMC10753413 DOI: 10.1093/bib/bbad424] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/21/2023] [Revised: 09/28/2023] [Accepted: 10/31/2023] [Indexed: 11/28/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) enables the resolution of cellular heterogeneity in diseases and facilitates the identification of novel cell types and subtypes. However, the grouping effects caused by cell-cell interactions are often overlooked in the development of tools for identifying subpopulations. We propose LP_SGL, which incorporates cell group structure to identify phenotype-associated subpopulations by integrating scRNA-seq, bulk expression and bulk phenotype data. Cell groups from scRNA-seq data were obtained by the Leiden algorithm, which facilitates the identification of subpopulations and improves model robustness. LP_SGL identified a higher percentage of cancer cells, T cells and tumor-associated cells than Scissor and scAB on lung adenocarcinoma diagnosis, melanoma drug response and liver cancer survival datasets, respectively. Biological analysis on three original datasets and four independent external validation sets demonstrated that the signaling genes of this cell subset can predict cancer, immunotherapy and survival.
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, 46 Jianshe East Road, 453007, Xinxiang, China
- Hongmei Zhang
- College of Mathematics and Information Science, Henan Normal University, 46 Jianshe East Road, 453007, Xinxiang, China
- Bingyu Mu
- College of Arts and Design, Zhengzhou University of Light Industry, No. 5 Dongfeng Road, 450000, Zhengzhou, China
- Hongliang Zuo
- College of Mathematics and Information Science, Henan Normal University, 46 Jianshe East Road, 453007, Xinxiang, China
- Kanglei Zhou
- School of Computer Science and Engineering, Beihang University, 37 Xueyuan Road, Haidian District, 100191, Beijing, China
4
Song X, Liang K, Li J. WGRLR: A Weighted Group Regularized Logistic Regression for Cancer Diagnosis and Gene Selection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1563-1573. [PMID: 36044492 DOI: 10.1109/tcbb.2022.3203167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Sparse regressions applied to cancer diagnosis face challenges in noise reduction, gene grouping, and group significance evaluation. This paper presents the weighted group regularized logistic regression (WGRLR) to address these problems. Clean data were separated from noisy gene expression profile data, and gene grouping and model building were performed on the clean data. An interpretable gene group significance evaluation criterion was proposed based on symmetrical uncertainty and module eigengene. A group-wise individual gene significance evaluation criterion was also presented. The performance of the proposed method was compared with WGGL, ASGL-CMI, SGL, GL, Elastic Net, and lasso on acute leukemia and brain cancer data. Experimental results demonstrate that the proposed method is superior to the other six methods in cancer diagnosis accuracy and gene selection.
5
Feature selection using Information Gain and decision information in neighborhood decision system. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
6
Wang Y, Li X, Ruiz R. Feature Selection With Maximal Relevance and Minimal Supervised Redundancy. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:707-717. [PMID: 35130179 DOI: 10.1109/tcyb.2021.3139898] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/14/2023]
Abstract
Feature selection (FS) for classification is crucial for large-scale images and bio-microarray data using machine learning. It is challenging to select informative features from high-dimensional data which generally contains many irrelevant and redundant features. These features often impede classifier performance and misdirect classification tasks. In this article, we present an efficient FS algorithm to improve classification accuracy by taking into account both the relevance of the features and the pairwise features correlation in regard to class labels. Based on conditional mutual information and entropy, a new supervised similarity measure is proposed. The supervised similarity measure is connected with feature redundancy minimization evaluation and then combined with feature relevance maximization evaluation. A new criterion max-relevance and min-supervised-redundancy (MRMSR) is introduced and theoretically proved for FS. The proposed MRMSR-based method is compared to seven existing FS approaches on several frequently studied public benchmark datasets. Experimental results demonstrate that the proposal is more effective at selecting informative features and results in better competitive classification performance.
7
Li J, Cao F, Gao Q, Liang K, Tang Y. Improving diagnosis accuracy of non-small cell lung carcinoma on noisy data by adaptive group lasso regularized multinomial regression. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
8
Tuna E, Evren A, Ustaoğlu E, Şahin B, Şahinbaşoğlu ZZ. Testing Nonlinearity with Rényi and Tsallis Mutual Information with an Application in the EKC Hypothesis. ENTROPY (BASEL, SWITZERLAND) 2022; 25:79. [PMID: 36673220 PMCID: PMC9857815 DOI: 10.3390/e25010079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/20/2022] [Accepted: 12/28/2022] [Indexed: 06/17/2023]
Abstract
The nature of dependence between random variables has been the subject of many statistical problems for over a century. Even today, there is a great deal of research on this topic, especially on the analysis of nonlinearity. Shannon mutual information has been considered to be the most comprehensive measure of dependence for evaluating total dependence, and several methods have been suggested for discerning the linear and nonlinear components of dependence between two variables. In this study, we propose employing the Rényi and Tsallis mutual information measures for measuring total dependence because of their parametric nature. We first use a residual analysis to remove linear dependence between the variables, and then we compare the Rényi and Tsallis mutual information measures of the original data with those of the data lacking the linear component to determine the degree of nonlinearity. A comparison against the values of the Shannon mutual information measure is also provided. Finally, we apply our method to the environmental Kuznets curve (EKC) and demonstrate the validity of the EKC hypothesis for Eastern Asian and Asia-Pacific countries.
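As a rough illustration of the quantities named in this abstract, the sketch below computes Shannon mutual information and a Rényi-type mutual information for a discrete joint distribution. The Rényi version is taken here as the Rényi divergence of order alpha between the joint distribution and the product of its marginals, which is one of several definitions in the literature and is an assumption, not necessarily the one used by the authors. The Tsallis measure and the residual-analysis step are not reproduced, and all function names are hypothetical.

```python
import math

def marginals(joint):
    """Row and column marginals of a joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def shannon_mi(joint):
    """Shannon mutual information in nats."""
    px, py = marginals(joint)
    return sum(
        p * math.log(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def renyi_mi(joint, alpha):
    """Renyi divergence of order alpha between the joint and the product of marginals.

    Assumed definition: log(sum p^alpha * q^(1 - alpha)) / (alpha - 1), in nats.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    px, py = marginals(joint)
    s = sum(
        p ** alpha * (px[i] * py[j]) ** (1 - alpha)
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return math.log(s) / (alpha - 1)

independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(shannon_mi(independent), renyi_mi(independent, 2.0))
print(shannon_mi(dependent), renyi_mi(dependent, 2.0))
```

Both measures vanish for the independent table, while the order parameter alpha gives the Rényi version the parametric flexibility the abstract refers to.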
Affiliation(s)
- Elif Tuna
- Department of Statistics, Faculty of Sciences and Literature, Yildiz Technical University, Davutpasa, Esenler, 34210 Istanbul, Turkey
- Atıf Evren
- Department of Statistics, Faculty of Sciences and Literature, Yildiz Technical University, Davutpasa, Esenler, 34210 Istanbul, Turkey
- Erhan Ustaoğlu
- Department of Informatics, Faculty of Management, Marmara University, Göztepe, 34180 Istanbul, Turkey
- Büşra Şahin
- Department of Computer, Faculty of Engineering, Halic University, Eyupsultan, 34060 Istanbul, Turkey
- Zehra Zeynep Şahinbaşoğlu
- Department of Statistics, Faculty of Sciences and Literature, Yildiz Technical University, Davutpasa, Esenler, 34210 Istanbul, Turkey
9
Bai F, Puk KM, Liu J, Zhou H, Tao P, Zhou W, Wang S. Sparse group selection and analysis of function-related residue for protein-state recognition. J Comput Chem 2022; 43:1342-1354. [PMID: 35656889 PMCID: PMC9248267 DOI: 10.1002/jcc.26937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Revised: 03/23/2022] [Accepted: 05/08/2022] [Indexed: 11/08/2022]
Abstract
Machine learning methods have helped to advance a wide range of scientific and technological fields in recent years, including computational chemistry. Because chemical systems can be complex and high-dimensional, feature selection is critical but challenging when developing reliable machine learning based prediction models, especially for proteins as bio-macromolecules. In this study, we applied the sparse group lasso (SGL) method as a general feature selection method to develop a classification model for an allosteric protein in different functional states. This results in a much improved model with comparable accuracy (Acc) and only 28 selected features, compared to 289 selected features in a previous study. The Acc reaches 91.50% with 1936 selected features, which is far higher than that of baseline methods. In addition, grouping protein amino acids into secondary structures provides additional interpretability of the selected features. The selected features are verified as associated with key allosteric residues through comparison with both experimental and computational works about the model protein, and demonstrate the effectiveness and necessity of applying rigorous feature selection and evaluation methods on complex chemical systems.
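For readers unfamiliar with the sparse group lasso, the following sketch evaluates its penalty term in the widely used form lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2). The square-root group-size weights and the parameter names `lam` and `alpha` are assumptions of this sketch; the abstract does not specify the exact formulation or how the secondary-structure groups were weighted.

```python
import math

def sgl_penalty(beta, groups, lam, alpha):
    """Sparse group lasso penalty for a coefficient vector.

    beta   : list of coefficients
    groups : list of index lists, one per group (e.g. residues in one
             secondary-structure element; hypothetical grouping)
    lam    : overall regularization strength
    alpha  : mix between the lasso term (alpha=1) and the group lasso term (alpha=0)
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = sum(
        math.sqrt(len(idx)) * math.sqrt(sum(beta[i] ** 2 for i in idx))
        for idx in groups
    )
    return lam * (alpha * l1_term + (1 - alpha) * group_term)

beta = [3.0, 4.0, 0.0, 0.0]   # second group is entirely zero
groups = [[0, 1], [2, 3]]
print(sgl_penalty(beta, groups, lam=1.0, alpha=0.5))
```

The l1 term encourages zero coefficients within a group, while the group term can remove a whole group at once, which is why the method yields both sparse and group-interpretable feature sets.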
Affiliation(s)
- Fangyun Bai
- Department of Management Science and Engineering, Tongji University. Fangyun Bai and Kin Ming Puk contributed equally to this work
- Jin Liu
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center
- Hongyu Zhou
- Department of Chemistry, Center for Scientific Computation, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University
- Peng Tao
- Department of Chemistry, Center for Scientific Computation, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University
- Wenyong Zhou
- Department of Management Science and Engineering, Tongji University
- Shouyi Wang
- Corresponding author: Shouyi Wang, Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington.
10
11
Feature selection using self-information uncertainty measures in neighborhood information systems. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03760-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
12
13
Xu J, Qu K, Meng X, Sun Y, Hou Q. Feature selection based on multiview entropy measures in multiperspective rough set. INT J INTELL SYST 2022. [DOI: 10.1002/int.22878] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023]
Affiliation(s)
- Jiucheng Xu
- Engineering Lab of Intelligence Business & Internet of Things Henan Province Xinxiang China
- College of Computer and Information Engineering Henan Normal University Xinxiang China
- Kanglin Qu
- Engineering Lab of Intelligence Business & Internet of Things Henan Province Xinxiang China
- College of Computer and Information Engineering Henan Normal University Xinxiang China
- Xiangru Meng
- Engineering Lab of Intelligence Business & Internet of Things Henan Province Xinxiang China
- College of Computer and Information Engineering Henan Normal University Xinxiang China
- Yuanhao Sun
- Engineering Lab of Intelligence Business & Internet of Things Henan Province Xinxiang China
- College of Computer and Information Engineering Henan Normal University Xinxiang China
- Qincheng Hou
- Engineering Lab of Intelligence Business & Internet of Things Henan Province Xinxiang China
- College of Computer and Information Engineering Henan Normal University Xinxiang China
14
Li X, Wang Y, Ruiz R. A Survey on Sparse Learning Models for Feature Selection. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1642-1660. [PMID: 32386172 DOI: 10.1109/tcyb.2020.2982445] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/11/2023]
Abstract
Feature selection is important in both machine learning and pattern recognition. Successfully selecting informative features can significantly increase learning accuracy and improve result comprehensibility. Various methods have been proposed to identify informative features from high-dimensional data by removing redundant and irrelevant features to improve classification accuracy. In this article, we systematically survey existing sparse learning models for feature selection from the perspectives of individual sparse feature selection and group sparse feature selection, and analyze the differences and connections among various sparse learning models. Promising research directions and topics on sparse learning models are analyzed.
15
Cao W, Pomeroy MJ, Zhang S, Tan J, Liang Z, Gao Y, Abbasi AF, Pickhardt PJ. An Adaptive Learning Model for Multiscale Texture Features in Polyp Classification via Computed Tomographic Colonography. SENSORS (BASEL, SWITZERLAND) 2022; 22:907. [PMID: 35161653 PMCID: PMC8840570 DOI: 10.3390/s22030907] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Revised: 01/14/2022] [Accepted: 01/20/2022] [Indexed: 12/10/2022]
Abstract
Objective: As an effective depiction of lesion heterogeneity, texture information extracted from computed tomography has become increasingly important in polyp classification. However, variation and redundancy among multiple texture descriptors make it challenging to integrate them into a general characterization. Considering these two problems, this work proposes an adaptive learning model to integrate multi-scale texture features. Methods: To mitigate feature variation, the whole feature set is geometrically split into several independent subsets that are ranked by a learning evaluation measure after preliminary classifications. To reduce feature redundancy, a bottom-up hierarchical learning framework is proposed to ensure monotonic increase of classification performance while integrating these ranked sets selectively. Two types of classifiers, traditional (random forest + support vector machine)- and convolutional neural network (CNN)-based, are employed to perform the polyp classification under the proposed framework with extended Haralick measures and gray-level co-occurrence matrix (GLCM) as inputs, respectively. Experimental results are based on a retrospective dataset of 63 polyp masses (defined as greater than 3 cm in largest diameter), including 32 adenocarcinomas and 31 benign adenomas, from adult patients undergoing first-time computed tomography colonography and who had corresponding histopathology of the detected masses. Results: We evaluate the performance of the proposed models by the area under the curve (AUC) of the receiver operating characteristic curve. The proposed models show encouraging performance, with an AUC score of 0.925 for the traditional classification method and 0.902 for the CNN. The proposed adaptive learning framework significantly outperforms nine well-established classification methods, including six traditional methods and three deep learning ones, by a large margin. Conclusions: The proposed adaptive learning model can combat the challenges of feature variation through a multiscale grouping of feature inputs, and of feature redundancy through a hierarchical sorting of these feature groups. The improved classification performance against comparative models demonstrated the feasibility and utility of this adaptive learning procedure for feature integration.
Affiliation(s)
- Weiguo Cao
- Department of Radiology, Stony Brook University, Stony Brook, NY 11794, USA
- Marc J. Pomeroy
- Department of Radiology, Stony Brook University, Stony Brook, NY 11794, USA
- Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA
- Shu Zhang
- Department of Radiology, Stony Brook University, Stony Brook, NY 11794, USA
- Jiaxing Tan
- Department of Computer Science, City University of New York, New York, NY 10314, USA
- Zhengrong Liang
- Department of Radiology, Stony Brook University, Stony Brook, NY 11794, USA
- Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA
- Yongfeng Gao
- Department of Radiology, Stony Brook University, Stony Brook, NY 11794, USA
- Almas F. Abbasi
- Department of Radiology, Stony Brook University, Stony Brook, NY 11794, USA
- Perry J. Pickhardt
- Department of Radiology, School of Medicine, University of Wisconsin, Madison, WI 53792, USA
16
Weighted Gene Coexpression Network Analysis in Mouse Livers following Ischemia-Reperfusion and Extensive Hepatectomy. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2022; 2021:3897715. [PMID: 35003298 PMCID: PMC8736699 DOI: 10.1155/2021/3897715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2021] [Accepted: 11/23/2021] [Indexed: 11/17/2022]
Abstract
In mouse models, the recovery of liver volume after partial hepatectomy, which is commonly accompanied by ischemia-reperfusion, is mainly mediated by the proliferation of hepatocytes. Identifying differentially expressed genes in the liver following partial hepatectomy improves understanding of the molecular mechanisms of liver regeneration (LR), with applicable clinical significance. In particular, gene expression patterns in liver tissues from mice that survived extensive hepatectomy are of greater importance for LR than those from mice that survived appropriate hepatectomy. In this study, we performed weighted gene coexpression network analysis (WGCNA) to identify the central candidate genes and to construct scale-free gene coexpression networks using the dynamically differentially expressed genes identified in liver specimens from mice with 85% hepatectomy (20% seven-day survival rate) and 50% hepatectomy (100% seven-day survival rate under ischemia-reperfusion conditions, compared with the sham-group control mice). WGCNA combined with Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses pinpointed the importance of three gene expression modules: the blue module for apoptotic process, the turquoise module for lipid metabolism, and the green module for fatty acid metabolic process in LR following extensive hepatectomy. WGCNA and protein-protein interaction (PPI) network construction highlighted FAM175B, OGT, and PDE3B as the three potential hub genes in these three modules. This work may provide new clues for future fundamental studies and treatment strategies for LR following liver injury and hepatectomy.
17
Li J, Liang K, Song X. Logistic regression with adaptive sparse group lasso penalty and its application in acute leukemia diagnosis. Comput Biol Med 2021; 141:105154. [PMID: 34952336 DOI: 10.1016/j.compbiomed.2021.105154] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/22/2021] [Revised: 12/14/2021] [Accepted: 12/15/2021] [Indexed: 01/15/2023]
Abstract
Cancer diagnosis based on gene expression profile data has attracted extensive attention in computational biology and medicine. It suffers from three challenges in practical applications: noise, gene grouping, and adaptive gene selection. This paper aims to solve the above problems by developing the logistic regression with adaptive sparse group lasso penalty (LR-ASGL). A noise information processing method for cancer gene expression profile data is first presented via robust principal component analysis. Genes are then divided into groups by performing weighted gene co-expression network analysis on the clean matrix. By approximating the relative value of the noise size, gene reliability criterion and robust evaluation criterion are proposed. Finally, LR-ASGL is presented for simultaneous cancer diagnosis and adaptive gene selection. The performance of the proposed method is compared with the other four methods in three simulation settings: Gaussian noise, uniformly distributed noise, and mixed noise. The acute leukemia data are adopted as an experimental example to demonstrate the advantages of LR-ASGL in prediction and gene selection.
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China.
- Ke Liang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China.
- Xuekun Song
- College of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China.
18
Liu X, Luo Y, He T, Ren M, Xu Y. Predicting essential genes of 37 prokaryotes by combining information-theoretic features. J Microbiol Methods 2021; 188:106297. [PMID: 34343487 DOI: 10.1016/j.mimet.2021.106297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Revised: 07/30/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
Essential genes are required for the reproduction and survival of an organism. Rapid identification of essential genes has practical application value in biomedicine. Information theory is a discipline that studies information transmission. Based on the similarity between heredity and information transmission, measures derived from information theory can be applied to genetic sequence analysis on different scales. In this study, we employed 114 features extracted by information theory methods to construct an essential gene prediction model. We applied a backpropagation neural network to construct a classifier and employed it to predict essential genes of 37 prokaryotes. The performance of the classifier was evaluated by applying intra-organism prediction and leave-one-species-out prediction. Among 37 prokaryotes, intra-organism prediction and leave-one-species-out prediction yielded average AUC scores of 0.791 and 0.717, respectively. Considering the potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. In the above two prediction methods, the average AUC scores of 37 organisms obtained by using key features were 0.786 and 0.714, respectively. The results show the potential and universality of information-theoretic features in the study of prokaryotic essential gene prediction.
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
- Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
19
Feature Selection Combining Information Theory View and Algebraic View in the Neighborhood Decision System. ENTROPY 2021; 23:e23060704. [PMID: 34199499 PMCID: PMC8230021 DOI: 10.3390/e23060704] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/29/2021] [Revised: 05/30/2021] [Accepted: 05/31/2021] [Indexed: 11/17/2022]
Abstract
Feature selection is one of the core contents of rough set theory and application. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm that combines the information theory view and algebraic view in the neighborhood decision system. First, the neighborhood relationship in the neighborhood rough set model is used to retain the classification information of continuous data, to study some uncertainty measures of neighborhood information entropy. Second, to fully reflect the decision ability and classification performance of the neighborhood system, the neighborhood credibility and neighborhood coverage are defined and introduced into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, which improves the disadvantage that most feature selection algorithms only consider information theory definition or algebraic definition. Finally, experiments and statistical analyses on nine data sets prove that the algorithm can effectively select the optimal feature subset, and the selection result can maintain or improve the classification performance of the data set.
20
Chen L, Li J, Chang M. Cancer Diagnosis and Disease Gene Identification via Statistical Machine Learning. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200207094947] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/22/2022]
Abstract
Diagnosing cancer and identifying disease genes from DNA microarray gene expression data are hot topics in current bioinformatics. This paper is devoted to the latest developments in cancer diagnosis and gene selection via statistical machine learning. The support vector machine is first introduced for binary cancer diagnosis. Then the 1-norm support vector machine, the doubly regularized support vector machine, the adaptive huberized support vector machine and other extensions are presented to improve the performance of gene selection. Lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and other sparse regression methods are also introduced for performing binary cancer classification and gene selection simultaneously. In addition to three strategies for reducing multiclass problems to binary ones, methods that directly consider all classes of data in one learning model (multi-class support vector machine, sparse multinomial regression, adaptive multinomial regression and so on) are presented for diagnosing multiple cancer types. Limitations and promising directions are also discussed.
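For orientation, the penalties named in this abstract can be written in their commonly used textbook forms. The notation below is an assumption made for illustration and is not taken from the review itself: samples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, coefficient vector $\beta$ with intercept $\beta_0$, non-overlapping groups $g$ of size $p_g$, and tuning parameters $\lambda$, $\lambda_1$, $\lambda_2$, $\alpha \in [0, 1]$. The adaptive and huberized variants change the loss or add data-driven weights and are not shown.

```latex
\begin{align*}
\text{1-norm SVM:}\quad
  &\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\bigl[1-y_i(\beta_0+x_i^{\top}\beta)\bigr]_{+}
   +\lambda\lVert\beta\rVert_1\\
\text{Doubly regularized SVM:}\quad
  &\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\bigl[1-y_i(\beta_0+x_i^{\top}\beta)\bigr]_{+}
   +\lambda_1\lVert\beta\rVert_1+\tfrac{\lambda_2}{2}\lVert\beta\rVert_2^2\\
\text{Lasso penalty:}\quad
  &\lambda\lVert\beta\rVert_1\\
\text{Elastic net penalty:}\quad
  &\lambda_1\lVert\beta\rVert_1+\lambda_2\lVert\beta\rVert_2^2\\
\text{Group lasso penalty:}\quad
  &\lambda\sum_{g}\sqrt{p_g}\,\lVert\beta_g\rVert_2\\
\text{Sparse group lasso penalty:}\quad
  &(1-\alpha)\,\lambda\sum_{g}\sqrt{p_g}\,\lVert\beta_g\rVert_2
   +\alpha\,\lambda\lVert\beta\rVert_1
\end{align*}
```

The $\ell_1$ terms zero out individual genes, the group terms zero out whole gene groups, and the sparse group lasso combines both effects.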
Affiliation(s)
- Liuyuan Chen
  - Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Juntao Li
  - Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
  - Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
|
21
|
Li J, Chang M, Gao Q, Song X, Gao Z. Lung Cancer Classification and Gene Selection by Combining Affinity Propagation Clustering and Sparse Group Lasso. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017103557] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023]
Abstract
Background: Cancer seriously threatens human health. Diagnosing cancer via gene expression analysis is a hot topic in cancer research.
Objective: The study aimed to accurately diagnose the type of lung cancer and discover the pathogenic genes.
Methods: In this study, Affinity Propagation (AP) clustering with a similarity score was applied to each type of lung cancer and to normal lung. After grouping genes, sparse group lasso was adopted to construct four binary classifiers, and a voting strategy was used to integrate them.
Results: This study screened six gene groups, among 73 gene groups, that may be associated with different lung cancer subtypes, and identified three possible key pathogenic genes: KRAS, BRAF and VDR. Furthermore, this study achieved improved classification accuracies on the minority classes SQ and COID in comparison with four other methods.
Conclusion: We propose AP clustering-based sparse group lasso (AP-SGL), which provides an alternative for simultaneous diagnosis and gene selection for lung cancer.
Affiliation(s)
- Juntao Li
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Qinghui Gao
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Xuekun Song
  - School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
- Zhiyu Gao
  - School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
|
22
|
Cao W, Liang Z, Pomeroy MJ, Ng K, Zhang S, Gao Y, Pickhardt PJ, Barish MA, Abbasi AF, Lu H. Multilayer feature selection method for polyp classification via computed tomographic colonography. J Med Imaging (Bellingham) 2020; 6:044503. [PMID: 32280727 DOI: 10.1117/1.jmi.6.4.044503] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/08/2019] [Accepted: 12/05/2019] [Indexed: 01/22/2023] Open
Abstract
Polyp classification is a feature selection and clustering process. Picking the most effective features from multiple polyp descriptors without redundant information is a major challenge in this procedure. We propose a multilayer feature selection method to construct an optimized descriptor for polyp classification with a feature-grouping strategy in a hierarchical framework. First, the proposed method uses image metrics, such as intensity, gradient, and curvature, to divide their corresponding polyp descriptors into several feature groups, which are the preliminary units of this method. Then each preliminary unit generates two ranked descriptors, i.e., their optimized variable groups (OVGs) and preliminary classification measurements. Next, a feature dividing-merging (FDM) algorithm is designed to perform the feature merging operation hierarchically and iteratively. Unlike traditional feature selection methods, the proposed FDM algorithm includes two steps, feature dividing and feature merging. At each layer, feature dividing selects the OVG with the highest area under the receiver operating characteristic curve (AUC) as the baseline, while the other descriptors are treated as its complements. In the merging step, the FDM algorithm iteratively merges variables that yield gains from the complementary descriptors into the baseline on every layer until the final descriptor is obtained. The proposed model (including the forward step algorithm and the FDM algorithm) is a greedy method that guarantees clustering monotonicity of all OVGs from the bottom to the top layer. In our experiments, the selected results from each layer are reported by both graphical illustration and data analysis. The performance of the proposed method is compared with five existing classification methods on a polyp database of 63 samples with pathological reports. The experimental results show that the proposed method outperforms the other methods, with gains of 4% to 23% in AUC.
Affiliation(s)
- Weiguo Cao
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
- Zhengrong Liang
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
  - State University of New York, Department of Biomedical Engineering, Stony Brook, New York, United States
  - State University of New York, Department of Electrical and Computer Engineering, Stony Brook, New York, United States
- Marc J Pomeroy
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
  - State University of New York, Department of Biomedical Engineering, Stony Brook, New York, United States
- Kenneth Ng
  - State University of New York, Department of Electrical and Computer Engineering, Stony Brook, New York, United States
- Shu Zhang
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
- Yongfeng Gao
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
- Perry J Pickhardt
  - University of Wisconsin Medical School, Department of Radiology, Madison, Wisconsin, United States
- Matthew A Barish
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
- Almas F Abbasi
  - State University of New York, Department of Radiology, Stony Brook, New York, United States
- Hongbing Lu
  - The Fourth Medical University, Department of Biomedical Engineering, Xi'an, China
|
23
|
Che K, Chen X, Guo M, Wang C, Liu X. Genetic Variants Detection Based on Weighted Sparse Group Lasso. Front Genet 2020; 11:155. [PMID: 32194631 PMCID: PMC7063084 DOI: 10.3389/fgene.2020.00155] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/19/2019] [Accepted: 02/10/2020] [Indexed: 01/21/2023] Open
Abstract
Identification of genetic variants associated with complex traits is a critical step for improving plant resistance and breeding. Although the majority of existing methods for variant detection have good predictive performance in the average case, they cannot precisely identify the variants present in a small number of target genes. In this paper, we propose a weighted sparse group lasso (WSGL) method to select both common and low-frequency variants in groups. Under the biologically realistic assumption that complex traits are influenced by a few single loci in a small number of genes, our method uses a sparse group lasso approach to simultaneously select associated groups along with the loci within each group. To increase the probability of selecting low-frequency variants, biological prior information is introduced into the model by re-weighting the lasso regularization with weights calculated from the input data. Experimental results on both simulated and real data of single nucleotide polymorphisms (SNPs) associated with Arabidopsis flowering traits demonstrate the superiority of WSGL over other competitive approaches for genetic variant detection.
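As a rough illustration of how a sparse group lasso penalty selects groups and loci at the same time, the sketch below implements the usual two-step proximal (shrinkage) operator in NumPy. It is not the authors' WSGL code: the function names, the `weights` argument (per-coefficient multipliers on the lasso threshold) and the `sqrt(group size)` scaling are assumptions chosen for illustration, and the abstract does not specify how the weights are computed.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_weighted_sgl(beta, groups, lam_l1, lam_group, weights=None):
    """Shrink `beta` under a (hypothetically weighted) sparse group lasso penalty.

    groups  : list of index lists, one per non-overlapping group (e.g. a gene)
    weights : assumed per-coefficient multipliers on the lasso threshold;
              a smaller weight penalizes that coefficient less
    """
    beta = np.asarray(beta, dtype=float)
    if weights is None:
        weights = np.ones_like(beta)

    # Step 1: lasso-style shrinkage of individual coefficients.
    out = soft_threshold(beta, lam_l1 * np.asarray(weights, dtype=float))

    # Step 2: group-wise shrinkage; a group whose norm falls below its
    # threshold is set to zero as a whole.
    for idx in groups:
        idx = np.asarray(idx)
        norm = np.linalg.norm(out[idx])
        thresh = lam_group * np.sqrt(len(idx))
        if norm <= thresh:
            out[idx] = 0.0
        else:
            out[idx] *= 1.0 - thresh / norm
    return out


# Two groups of two coefficients each; the second group carries weak signal.
beta = [3.0, -0.5, 0.2, 0.1]
groups = [[0, 1], [2, 3]]
shrunk = prox_weighted_sgl(beta, groups, lam_l1=0.3, lam_group=0.5)
```

In this toy call the weak second group is removed entirely, while the first group is kept with shrunken coefficients. Inside a proximal gradient loop this operator would be applied after each gradient step on the loss.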
Affiliation(s)
- Kai Che
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xi Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
24
|
Neighborhood multi-granulation rough sets-based attribute reduction using Lebesgue and entropy measures in incomplete neighborhood decision systems. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105373] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 01/20/2023]
|
25
|
A study on metaheuristics approaches for gene selection in microarray data: algorithms, applications and open challenges. Evolutionary Intelligence 2019. [DOI: 10.1007/s12065-019-00306-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
|
26
|
Sun L, Zhang X, Qian Y, Xu J, Zhang S. Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.072] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Indexed: 02/01/2023]
|
27
|
Scaria LTT, Christopher T. A Bio-inspired Algorithm based Multi-class Classification Scheme for Microarray Gene Data. J Med Syst 2019; 43:208. [PMID: 31144036 DOI: 10.1007/s10916-019-1353-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/14/2019] [Accepted: 05/20/2019] [Indexed: 11/24/2022]
Abstract
Microarray gene data are widely known for their high dimensionality and volume. The use of microarray gene data is increasing nowadays, owing to advances in medical science. Microarray gene data help in diagnosing diseases quite accurately. However, processing microarray gene data is difficult and usually hard to understand. Taking this challenge into account, this work presents a user-friendly rule-based classification model that is easily understandable and does not require users to have prior knowledge. The classification rules are formed using the cuckoo search optimization algorithm, and the rules are pruned by association rule mining. Finally, classification is performed with the pruned rules. The performance of the proposed approach is satisfactory in terms of accuracy, sensitivity, specificity and time consumption.
Affiliation(s)
- L T Thomas Scaria
  - Department of Computer Science, St. Pius X College, Kasaragod, Kerala, India
- T Christopher
  - PG and Research Department of Information Technology, Government Arts College, Coimbatore, India
|
28
|
Sun L, Zhang X, Xu J, Zhang S. An Attribute Reduction Method Using Neighborhood Entropy Measures in Neighborhood Rough Sets. Entropy 2019; 21:e21020155. [PMID: 33266871 PMCID: PMC7514638 DOI: 10.3390/e21020155] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/08/2018] [Revised: 01/22/2019] [Accepted: 02/01/2019] [Indexed: 11/16/2022]
Abstract
Attribute reduction is an important preprocessing step for data mining and has become a hot research topic in rough set theory. Neighborhood rough set theory can overcome the shortcoming that classical rough set theory may lose useful information when discretizing continuous-valued data sets. In this paper, to improve the classification performance on complex data, a novel attribute reduction method for neighborhood rough sets is proposed that uses neighborhood entropy measures and combines the algebra view with the information view; it can deal with continuous data while maintaining the classification information of the original attributes. First, to efficiently analyze the uncertainty of knowledge in neighborhood rough sets, a new average neighborhood entropy is presented by combining neighborhood approximate precision with neighborhood entropy, based on the strong complementarity between the algebra definition of attribute significance and the definition in the information view. Then, a concept of decision neighborhood entropy is investigated for handling the uncertainty and noisiness of neighborhood decision systems, which integrates the credibility degree with the coverage degree of neighborhood decision systems to fully reflect the decision ability of attributes. Moreover, some of their properties are derived and the relationships among these measures are established, which helps to understand the essence of knowledge content and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is proposed to improve the classification performance on complex data sets. The experimental results on an instance and several public data sets demonstrate that the proposed method is very effective at selecting the most relevant attributes with high classification performance.
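To make the notion of a neighborhood-based uncertainty measure concrete, the sketch below computes a basic neighborhood entropy for a subset of continuous attributes. It uses a commonly cited form, the negative mean of log(|neighborhood of x| / |U|) over all samples, and is not the average or decision neighborhood entropy proposed in this paper. The Euclidean distance, the base-2 logarithm, the radius `delta` and the function names are assumptions made for illustration.

```python
import numpy as np


def neighborhood_sizes(X, attrs, delta):
    """Count, for each sample, the samples within distance `delta` of it
    when only the attribute columns in `attrs` are considered."""
    sub = np.asarray(X, dtype=float)[:, attrs]
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
    return (dist <= delta).sum(axis=1)       # each sample counts itself


def neighborhood_entropy(X, attrs, delta):
    """Neighborhood entropy of an attribute subset (assumed base-2 logarithm).

    Larger neighborhoods mean coarser knowledge and lower entropy; the value
    is 0 when every neighborhood is the whole sample set.
    """
    sizes = neighborhood_sizes(X, attrs, delta)
    return float(-np.mean(np.log2(sizes / len(sizes))))


# Toy data: four samples, two continuous attributes.
X = [[0.0, 0.0],
     [0.1, 5.0],
     [1.0, 0.1],
     [1.1, 5.1]]
h_first = neighborhood_entropy(X, [0], delta=0.2)    # first attribute only
h_both = neighborhood_entropy(X, [0, 1], delta=0.2)  # both attributes
```

Adding an attribute can only shrink neighborhoods under this distance, so `h_both` is at least `h_first`. A heuristic reduction algorithm of the kind the abstract describes would use such a measure to rank candidate attributes, with the radius chosen per data set.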
Affiliation(s)
- Lin Sun
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
  - Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan 453007, China
- Xiaoyu Zhang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Jiucheng Xu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
  - Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan 453007, China
- Shiguang Zhang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
  - Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan 453007, China
|
29
|
Adaptive multinomial regression with overlapping groups for multi-class classification of lung cancer. Comput Biol Med 2018; 100:1-9. [PMID: 29957558 DOI: 10.1016/j.compbiomed.2018.06.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/09/2018] [Revised: 06/17/2018] [Accepted: 06/17/2018] [Indexed: 02/07/2023]
Abstract
Multi-class classification has attracted much attention in cancer diagnosis and treatment, and many machine learning methods have recently emerged to address it. However, class imbalance and gene selection problems arise when classifying lung cancer data. In this paper, an adaptive multinomial regression with a sparse overlapping group lasso penalty is proposed to perform classification and grouped gene selection for lung cancer gene expression data. An overlapped grouping strategy with biological interpretability is proposed, which highlights the importance of gene groups from the minority classes. Using conditional mutual information, the significance of each gene within its group is evaluated and data-driven weights are constructed. Based on the grouping strategy and the constructed weights, a regularized adaptive multinomial regression is presented and a solving algorithm is developed, which can not only select the important gene groups for each class when performing multi-class classification, but also adaptively select important genes within each group. The experimental results show that the proposed method significantly outperforms six other methods in classification accuracy, and the selected genes are disease-causing genes for lung cancer.
|