1
|
Roy S, Singh J, Ray SS. Weighted Combination of Łukasiewicz implication and Fuzzy Jaccard similarity in Hybrid Ensemble Framework (WCLFJHEF) for Gene Selection. Comput Biol Med 2024; 170:107981. [PMID: 38262204 DOI: 10.1016/j.compbiomed.2024.107981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 01/02/2024] [Accepted: 01/12/2024] [Indexed: 01/25/2024]
Abstract
A framework for gene selection in cancer is developed by introducing fuzzy Jaccard similarity (FJS) and combining it with Łukasiewicz implication through weights. The method is called weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). The fuzziness in Jaccard similarity is incorporated using the existing Gödel fuzzy logic, and the weights are obtained by maximizing the average F-score of the selected genes in classifying cancer patients. The patients are first divided into clusters, based on the number of patient groups, using average linkage agglomerative clustering and a new score called WCLFJ (weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity). Genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created by taking the union of the selected features over all clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified with Naïve Bayes (NB) and Support Vector Machine (SVM) separately, using the selected genes, and the corresponding F-scores are calculated. The weights in WCLFJ are then updated iteratively to maximize the average F-score obtained from the classifiers. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average values of accuracy, F-score, recall, precision and MCC over all the datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values, and this information is further used to rank them. The relevance of the selected gene set is biologically validated using the KEGG Pathway, Gene Ontology (GO), and existing literature. The genes selected by WCLFJHEF are seen to be candidates for genomic alterations in various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.
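The two ingredients of the score can be illustrated with a minimal sketch. It assumes expression profiles already scaled to [0, 1], uses the Gödel operators (min for intersection, max for union) for the fuzzy Jaccard part, and uses the standard Łukasiewicz implication I(a, b) = min(1, 1 - a + b). The function `wclfj_score`, its single weight `w`, and the way the implication is averaged and symmetrised are assumptions made for illustration; the abstract does not give the exact formula.

```python
import numpy as np

def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity with Gödel operators: sum(min) / sum(max)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    union = np.maximum(a, b).sum()
    return 1.0 if union == 0 else float(np.minimum(a, b).sum() / union)

def lukasiewicz_implication(a, b):
    """Element-wise Łukasiewicz implication I(a, b) = min(1, 1 - a + b)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.minimum(1.0, 1.0 - a + b)

def wclfj_score(a, b, w):
    """Hypothetical weighted score: w * implication term + (1 - w) * FJS."""
    # Assumption: the implication is averaged over genes and over both directions.
    imp = 0.5 * (lukasiewicz_implication(a, b).mean()
                 + lukasiewicz_implication(b, a).mean())
    return float(w * imp + (1.0 - w) * fuzzy_jaccard(a, b))

patient_a = [0.2, 0.8, 0.5]
patient_b = [0.4, 0.6, 0.5]
score = wclfj_score(patient_a, patient_b, w=0.5)
```

In the described framework the weight would be tuned iteratively against the average F-score of the downstream classifiers; here it is simply a fixed argument.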
Affiliation(s)
- Sukriti Roy
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
| | - Joginder Singh
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
| | - Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
|
2
|
Yang J, Shu L, Han M, Pan J, Chen L, Yuan T, Tan L, Shu Q, Duan H, Li H. RDmaster: A novel phenotype-oriented dialogue system supporting differential diagnosis of rare disease. Comput Biol Med 2024; 169:107924. [PMID: 38181610 DOI: 10.1016/j.compbiomed.2024.107924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/18/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
BACKGROUND Clinicians often lack the necessary expertise to differentially diagnose multiple underlying rare diseases (RDs) due to their complex and overlapping clinical features, leading to misdiagnoses and delayed treatments. The aim of this study is to develop a novel electronic differential diagnostic support system for RDs. METHOD By integrating two Bayesian diagnostic methods, a candidate list with enhanced clinical interpretability was generated for subsequent Q&A-based differential diagnosis (DDX). To achieve an efficient Q&A dialogue strategy, we introduce a novel metric named the adaptive information gain and Gini index (AIGGI) to evaluate the expected gain of interrogated phenotypes within real-time diagnostic states. RESULTS This DDX tool, called RDmaster, has been implemented as a web-based platform (http://rdmaster.nbscn.org/). A diagnostic trial involving 238 published RD patients revealed that RDmaster outperformed existing RD diagnostic tools, as well as ChatGPT, and was shown to enhance diagnostic accuracy through its Q&A system. CONCLUSIONS RDmaster offers an effective multi-omics differential diagnostic technique and outperforms existing tools and popular large language models, particularly enhancing differential diagnosis by collecting diagnostically beneficial phenotypes.
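The idea of scoring a candidate question by its expected gain can be sketched as follows. This is not the AIGGI metric itself, whose definition is not given in the abstract; it only shows the two textbook quantities the name refers to (entropy-based information gain and Gini impurity) applied to a posterior over candidate diseases. The names `posterior`, `expected_gain`, and the `likelihood` vector (probability that a phenotype is present given each disease) are illustrative assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def gini(p):
    """Gini impurity of a probability vector."""
    return 1.0 - sum(x * x for x in p)

def posterior(prior, likelihood, observed):
    """Bayes update; likelihood[d] = P(phenotype present | disease d)."""
    post = [pr * (lk if observed else 1.0 - lk) for pr, lk in zip(prior, likelihood)]
    z = sum(post)
    return [x / z for x in post]

def expected_gain(prior, likelihood, impurity=entropy):
    """Expected drop in impurity from asking whether the phenotype is present."""
    p_yes = sum(pr * lk for pr, lk in zip(prior, likelihood))
    gain = impurity(prior)
    for observed, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * impurity(posterior(prior, likelihood, observed))
    return gain

# Two equally likely candidate diseases, one phenotype that separates them perfectly.
gain_bits = expected_gain([0.5, 0.5], [1.0, 0.0])
```

A dialogue strategy built on this would ask about the phenotype with the largest expected gain, update the posterior with the answer, and repeat.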
Affiliation(s)
- Jian Yang
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China; The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
| | - Liqi Shu
- Rhode Island Hospital, Warren Alpert Medical School of Brown University, Rhode Island, USA
| | - Mingyu Han
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
| | - Jiarong Pan
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
| | - Lihua Chen
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
| | - Tianming Yuan
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
| | - Linhua Tan
- Surgical Intensive Care Unit, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
| | - Qiang Shu
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
| | - Huilong Duan
- The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
| | - Haomin Li
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China.
|
3
|
Osama S, Ali M, Ali AA, Shaban H. Gene selection and tumor identification based on a hybrid of the multi-filter embedded recursive mountain gazelle algorithm. Comput Biol Med 2023; 167:107674. [PMID: 37976816 DOI: 10.1016/j.compbiomed.2023.107674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 10/09/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Microarray gene expression data are useful for identifying gene expression patterns associated with cancer outcomes; however, their high dimensionality makes it difficult to extract meaningful information and accurately classify tumors. Hence, developing effective methods for reducing dimensionality while preserving relevant information is a crucial task. Hybrid gene selection methods are widely proposed in the gene expression analysis domain and can still be enhanced in terms of efficiency and reliability. This study proposes a new hybrid gene selection method, called multi-filter embedded mountain gazelle optimizer (MUL-MGO), which uses two filters and an embedded method to remove irrelevant genes and then selects the most relevant genes using the recently developed MGO algorithm. To the best of our knowledge, this is the first work to exploit MGO as a gene or feature selection method. A new version of MGO, called recursive mountain gazelle optimizer (RMGO), is also developed; it applies the MGO algorithm recursively to avoid local optima, reduce the search space, and obtain the minimum gene count without decreasing the classifier's performance. The proposed RMGO is used to develop a second hybrid gene selection method that employs the same kind of filters and embedded methods as MUL-MGO, but with the recursive version of MGO. The resulting method is called multi-filter embedded recursive mountain gazelle optimizer (MUL-RMGO). Several classifiers are used for cancer classification, and experimental studies are performed on eight microarray gene expression datasets to assess MUL-MGO and MUL-RMGO. The experimental findings indicate the efficiency of the suggested MUL-MGO and MUL-RMGO methods for gene selection. The methods outperform cutting-edge methods in the literature, with MUL-RMGO exceeding MUL-MGO in terms of accuracy and selected gene count.
Affiliation(s)
- Sarah Osama
- Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
| | - Moatez Ali
- Department of Internal Medicine, St. Barnabas Hospital, NY, USA.
| | - Abdelmgeid A Ali
- Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
| | - Hassan Shaban
- Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
|
4
|
Moslemi A, Ahmadian A. Dual regularized subspace learning using adaptive graph learning and rank constraint: Unsupervised feature selection on gene expression microarray datasets. Comput Biol Med 2023; 167:107659. [PMID: 37950946 DOI: 10.1016/j.compbiomed.2023.107659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/13/2023] [Accepted: 10/31/2023] [Indexed: 11/13/2023]
Abstract
High-dimensional problems have increasingly drawn attention in gene selection and analysis. Moreover, the number of features in a microarray gene dataset is usually greater than the number of samples, which leads to an ill-posed, underdetermined system of equations. Redundant features in high-dimensional data result in poor performance and high computational time for learning algorithms. Feature selection is a pre-processing method that mitigates the curse of dimensionality by preserving information with maximum relevancy and minimum redundancy. Unsupervised feature selection is particularly important because collecting labels for data is expensive. In this paper, we develop a novel robust unsupervised feature selection method that selects a discriminative subset of features for unlabeled data, based on rank-constrained and dual-regularized nonnegative matrix factorization. The major focus of the proposed technique is to discard redundant features while keeping the informative ones. The proposed technique consists of nonnegative matrix factorization to decompose the data into a feature weight matrix and a representation matrix, an inner product norm as regularization for both matrices, adaptive structure learning to preserve local information, and the Schatten-p norm as a rank constraint. To demonstrate the effectiveness of the proposed method, numerical studies are conducted on six benchmark microarray datasets. The results show that the proposed technique outperforms eight state-of-the-art unsupervised feature selection techniques in terms of clustering accuracy and normalized mutual information.
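A stripped-down sketch of the factorization step may help. It assumes a non-negative data matrix X (samples by features) and the subspace-learning form X ≈ X W H, where W is the feature weight matrix and H the representation matrix, solved with plain multiplicative updates. The inner product regularizers, adaptive graph learning, and Schatten-p rank constraint described above are deliberately omitted, and the function name and defaults are hypothetical.

```python
import numpy as np

def nmf_feature_scores(X, k=3, n_iter=200, seed=0, eps=1e-10):
    """Fit X ≈ X W H with W, H >= 0; score each feature by the norm of its row in W."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.random((d, k))          # feature weight matrix (features x components)
    H = rng.random((k, d))          # representation matrix (components x features)
    G = X.T @ X                     # Gram matrix, non-negative when X is non-negative
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for min ||X - X W H||_F^2 under non-negativity.
        W *= (G @ H.T) / (G @ W @ H @ H.T + eps)
        H *= (W.T @ G) / (W.T @ G @ W @ H + eps)
        errors.append(float(np.linalg.norm(X - X @ W @ H)))
    return np.linalg.norm(W, axis=1), errors

rng = np.random.default_rng(1)
X = rng.random((20, 8))             # toy data: 20 samples, 8 features
scores, errors = nmf_feature_scores(X, k=3, n_iter=100)
selected = np.argsort(scores)[::-1][:4]   # keep the 4 highest-scoring features
```

Features whose rows in W have a large norm contribute most to reconstructing the data, which is the intuition behind ranking by row norm.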
Affiliation(s)
- Amir Moslemi
- Imaging Research and Physical Sciences, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada.
| | - Arash Ahmadian
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
|
5
|
Yang J, Hussein Kadir D. Data mining techniques in breast cancer diagnosis at the cellular-molecular level. J Cancer Res Clin Oncol 2023; 149:12605-12620. [PMID: 37442866 DOI: 10.1007/s00432-023-05090-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023]
Abstract
INTRODUCTION Studies on improving breast cancer diagnosis with machine learning and data mining techniques have consistently shown promise. A new diagnostic method can detect the characteristics of breast cancer in the early stages and support better treatment. The aim of this study is to provide a method for early detection of breast cancer that reduces human error by applying data mining techniques to accurate and rapid screening. METHODOLOGY The proposed method includes data pre-processing and image quality improvement in the first step. The second step consists of separating cancer cells from healthy breast tissue and removing outliers using image segmentation. Finally, in the third step, a classification model is configured by combining deep neural networks. The proposed ensemble classification model uses several effective features extracted from images and is based on majority vote. This model can be used as a screening system to diagnose the grade of invasive ductal carcinoma of the breast. RESULTS Evaluations were performed on two histopathological microscopic datasets that include patients with invasive ductal carcinoma of the breast. By extracting high-level features, the proposed method achieved average accuracies of 92.65% and 93.34% on these two datasets, diagnosing and classifying breast cancer quickly and with high performance. CONCLUSION Combining deep neural networks and extracting features that affect breast cancer makes highly accurate diagnosis possible, which is a step toward helping specialists and increasing patients' chances of survival.
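The majority-vote step of such an ensemble can be sketched in a few lines. The example below assumes each model has already produced one predicted label per image; the label values and the function name are illustrative, and ties are resolved in favour of the label seen first among the models.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per model; returns the most common label per sample."""
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count for this sample.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Hypothetical grade predictions from three networks for four images.
model_a = ["grade1", "grade2", "grade3", "grade1"]
model_b = ["grade1", "grade2", "grade2", "grade3"]
model_c = ["grade2", "grade2", "grade3", "grade1"]
ensemble = majority_vote([model_a, model_b, model_c])
```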
Affiliation(s)
- Jian Yang
- General Office of China Science and Technology Development Center for Chinese Medicine, Chaoyang District, Beijing, 100020, China.
| | - Dler Hussein Kadir
- Department of Statistics and Informatics, College of Administration and Economics, Salahaddin University, Erbil, Iraq
- Department of Business Administration, Cihan University-Erbil, Erbil, Iraq
|
6
|
Schürmeyer L, Schorning K, Rahnenführer J. Designs for the simultaneous inference of concentration-response curves. BMC Bioinformatics 2023; 24:393. [PMID: 37858091 PMCID: PMC10588042 DOI: 10.1186/s12859-023-05526-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 10/09/2023] [Indexed: 10/21/2023] Open
Abstract
BACKGROUND An important problem in toxicology in the context of gene expression data is the simultaneous inference of a large number of concentration-response relationships. The quality of the inference depends substantially on the experimental design, in particular on the set of concentrations at which observations are taken for the genes under consideration. As this set has to be the same for all genes, the efficient planning of such experiments is very challenging. We address this problem by determining efficient designs for the simultaneous inference of a large number of concentration-response models. For that purpose, we construct both a D-optimality criterion for simultaneous inference and a K-means procedure that clusters the support points of the locally D-optimal designs of the individual models. RESULTS We show that planning experiments for the simultaneous inference of a large number of concentration-response relationships yields a substantially more accurate statistical analysis. In particular, we compare the performance of the constructed designs with that of other commonly used designs in terms of D-efficiencies and in terms of the quality of the resulting model fits, using a real data example dealing with valproic acid. For the quality comparison we perform an extensive simulation study. CONCLUSIONS The design maximizing the D-optimality criterion for simultaneous inference improves the inference of the different concentration-response relationships substantially. The design based on the K-means procedure also performs well, whereas a log-equidistant design, which was also included in the analysis, performs poorly in terms of the quality of the simultaneous inference. Based on our findings, the D-optimal design for simultaneous inference should be used for upcoming analyses dealing with high-dimensional gene expression data.
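How D-efficiency compares two designs can be shown on a small stand-in problem. The sketch below uses quadratic regression on [-1, 1] rather than a concentration-response model, because its D-optimal design (equal weights on -1, 0 and 1) is a standard textbook result; for nonlinear concentration-response curves the regressor vector would be replaced by the gradient of the model with respect to its parameters at assumed parameter values. All names are illustrative.

```python
import numpy as np

def information_matrix(points, weights, regressors):
    """M = sum_i w_i f(x_i) f(x_i)^T for a design with support points and weights."""
    F = np.array([regressors(x) for x in points], dtype=float)   # one row per point
    w = np.asarray(weights, dtype=float)[:, None]
    return F.T @ (w * F)

def d_efficiency(design, reference, regressors):
    """(det M(design) / det M(reference))^(1/p), with p the number of parameters."""
    m_design = information_matrix(*design, regressors)
    m_ref = information_matrix(*reference, regressors)
    p = m_design.shape[0]
    return float((np.linalg.det(m_design) / np.linalg.det(m_ref)) ** (1.0 / p))

def quadratic(x):
    """Regressor vector of the model y = b0 + b1*x + b2*x^2."""
    return [1.0, x, x * x]

d_optimal = ([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3])
equidistant = ([-1.0, -0.5, 0.0, 0.5, 1.0], [0.2] * 5)
eff = d_efficiency(equidistant, d_optimal, quadratic)
```

An efficiency below 1 means the equidistant design would need proportionally more observations to match the precision of the D-optimal one.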
|
7
|
Wang J, Zhu X, Chen K, Hao L, Liu Y. HAHNet: a convolutional neural network for HER2 status classification of breast cancer. BMC Bioinformatics 2023; 24:353. [PMID: 37730567 PMCID: PMC10512620 DOI: 10.1186/s12859-023-05474-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 09/12/2023] [Indexed: 09/22/2023] Open
Abstract
OBJECTIVE Breast cancer is a significant health issue for women, and human epidermal growth factor receptor-2 (HER2) plays a crucial role as a vital prognostic and predictive factor. The HER2 status is essential for formulating effective treatment plans for breast cancer. However, the assessment of HER2 status using immunohistochemistry (IHC) is time-consuming and costly. Existing computational methods for evaluating HER2 status have limitations and lack sufficient accuracy. Therefore, there is an urgent need for an improved computational method to better assess HER2 status, which holds significant importance in saving lives and alleviating the burden on pathologists. RESULTS This paper analyzes the characteristics of histological images of breast cancer and proposes a neural network model named HAHNet that combines multi-scale features with attention mechanisms for HER2 status classification. HAHNet directly classifies the HER2 status from hematoxylin and eosin (H&E) stained histological images, reducing additional costs. It achieves superior performance compared to other computational methods. CONCLUSIONS According to our experimental results, the proposed HAHNet achieved high performance in classifying the HER2 status of breast cancer using only H&E stained samples. It can be applied in case classification, benefiting the work of pathologists and potentially helping more breast cancer patients.
Affiliation(s)
- Jiahao Wang
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
| | - Xiaodong Zhu
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Kai Chen
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
| | - Lei Hao
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
| | - Yuanning Liu
- College of Software, Jilin University, Changchun, 130012, China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
|
8
|
Angaitkar P, Aljrees T, Kumar Pandey S, Kumar A, Janghel RR, Sahu TP, Singh KU, Singh T. Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm. Sci Rep 2023; 13:14593. [PMID: 37670007 PMCID: PMC10480427 DOI: 10.1038/s41598-023-41179-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 08/23/2023] [Indexed: 09/07/2023] Open
Abstract
Linear-B cell epitopes (LBCE) play a vital role in vaccine design; thus, efficiently detecting them from protein sequences is of primary importance. These epitopes consist of amino acids arranged in continuous or discontinuous patterns. Vaccines employ attenuated viruses and purified antigens. LBCE stimulate humoral immunity in the body, where B and T cells target circulating infections. To predict LBCE, the underlying protein sequences undergo feature extraction, feature selection, and classification. Various system models have been proposed for this purpose, but their classification accuracy is only moderate. To enhance the accuracy of LBCE classification, this paper presents a novel 2-step metaheuristic variant-feature selection method that combines a linear support vector classifier (LSVC) with a Modified Genetic Algorithm (MGA). The feature selection model employs mono-peptide, dipeptide, and tripeptide features, focusing on the most diverse ones. The selected features are fed into a machine learning (ML)-based parallel ensemble classifier. The ensemble classifier combines correctly classified instances from various classifiers, including k-Nearest Neighbor (kNN), random forest (RF), logistic regression (LR), and support vector machine (SVM). The ensemble classifier achieved an accuracy of 99.3%, which is higher than that of the most recent state-of-the-art models for linear B-cell epitope classification. As a result, the entire system model can be used effectively in real-time clinical settings.
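Mono-, di- and tripeptide features are commonly computed as k-mer composition vectors. The following sketch shows one plain way to do this, assuming the 20 standard amino acids and relative frequencies over sliding windows; the exact encoding used in the paper is not specified in the abstract, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues

def kmer_composition(sequence, k):
    """Relative frequency of every length-k peptide (k=1 mono, k=2 di, k=3 tri)."""
    n_windows = len(sequence) - k + 1
    counts = Counter(sequence[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)           # avoid division by zero for short sequences
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return {key: counts.get(key, 0) / total for key in keys}

mono = kmer_composition("ACAC", 1)      # 20 features
di = kmer_composition("ACAC", 2)        # 400 features
```

The tripeptide vector has 8000 entries, which is why a feature selection stage is useful before classification.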
Affiliation(s)
- Pratik Angaitkar
- Department of Information Technology, National Institute of Technology, Raipur, G.E. Road, Raipur, 492010, Chhattisgarh, India
| | - Turki Aljrees
- College of Computer Science and Engineering, University of Hafr Al Batin, 39524, Hafar Al Batin, Saudi Arabia
| | - Saroj Kumar Pandey
- Department of Computer Engineering & Applications, GLA University, Mathura, India
| | - Ankit Kumar
- Department of Computer Engineering & Applications, GLA University, Mathura, India.
| | - Rekh Ram Janghel
- Department of Information Technology, National Institute of Technology, Raipur, G.E. Road, Raipur, 492010, Chhattisgarh, India
| | - Tirath Prasad Sahu
- Department of Information Technology, National Institute of Technology, Raipur, G.E. Road, Raipur, 492010, Chhattisgarh, India
| | | | - Teekam Singh
- Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, 248002, Uttarakhand, India
|
9
|
Yao N, Pan J, Chen X, Li P, Li Y, Wang Z, Yao T, Qian L, Yi D, Wu Y. Discovery of potential biomarkers for lung cancer classification based on human proteome microarrays using Stochastic Gradient Boosting approach. J Cancer Res Clin Oncol 2023; 149:6803-6812. [PMID: 36807761 DOI: 10.1007/s00432-023-04643-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 02/08/2023] [Indexed: 02/21/2023]
Abstract
PURPOSE Early identification of lung cancer (LC) will considerably facilitate the intervention and prevention of LC. The human proteome microarray approach can be used as a "liquid biopsy" to diagnose LC and complement conventional diagnosis, which requires advanced bioinformatics methods such as feature selection (FS) and refined machine learning models. METHODS A two-stage FS methodology combining Pearson's Correlation (PC) with a univariate filter (SBF) or recursive feature elimination (RFE) was used to reduce the redundancy of the original dataset. The Stochastic Gradient Boosting (SGB), Random Forest (RF), and Support Vector Machine (SVM) techniques were applied to build ensemble classifiers based on four subsets. The synthetic minority oversampling technique (SMOTE) was used in the preprocessing of imbalanced data. RESULTS The FS approach with SBF and RFE extracted 25 and 55 features, respectively, with 14 features in common. All three ensemble models demonstrated high accuracy (ranging from 0.867 to 0.967) and sensitivity (0.917 to 1.00) on the test datasets, with SGB on the SBF subset outperforming the others. The SMOTE technique improved model performance in the training process. Three of the top selected candidate biomarkers (LGR4, CDC34, and GHRHR) were strongly suggested to play a role in lung tumorigenesis. CONCLUSION A novel hybrid FS method with classical ensemble machine learning algorithms was used for the first time in the classification of protein microarray data. The parsimonious model constructed by the SGB algorithm with the appropriate FS and SMOTE approach performs well in the classification task, with higher sensitivity and specificity. Standardization and innovation of bioinformatics approaches for protein microarray analysis need further exploration and validation.
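The correlation stage of a two-stage selection scheme can be sketched as a greedy redundancy filter. The example assumes a samples-by-features matrix and a hypothetical absolute Pearson correlation cutoff of 0.9; the threshold and ordering used in the study are not stated in the abstract. A univariate filter or recursive feature elimination would then run on the surviving columns.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-by-feature |Pearson r|
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=50)
c = rng.normal(size=50)
X = np.column_stack([a, 2 * a + 1, c])   # column 1 is a linear copy of column 0
kept = drop_correlated(X, threshold=0.9)
```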
Affiliation(s)
- Ning Yao
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
- Chongqing Center for Disease Control and Prevention, No.8 Changjiang 2nd Street, Yuzhong District, Chongqing, 400042, China
| | - Jianbo Pan
- Center for Novel Target and Therapeutic Intervention, Institute of Life Sciences, Chongqing Medical University, Chongqing, 400016, China
| | - Xicheng Chen
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
| | - Pengpeng Li
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
| | - Yang Li
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
| | - Zhenyan Wang
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
| | - Tianhua Yao
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
| | - Li Qian
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China
| | - Dong Yi
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China.
| | - Yazhou Wu
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, No.30 Gaotanyan Street, Shapingba District, Chongqing, 400038, China.
|
10
|
Wang X, Han Y, Wang B. A Two-Phase Feature Selection Method for Identifying Influential Spreaders of Disease Epidemics in Complex Networks. Entropy (Basel) 2023; 25:1068. [PMID: 37510015 PMCID: PMC10378310 DOI: 10.3390/e25071068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 06/28/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023]
Abstract
Network epidemiology plays a fundamental role in understanding the relationship between network structure and epidemic dynamics, and identifying influential spreaders is an especially important part of it. Most previous studies propose a single centrality measure based on network topology to reflect the influence of spreaders, an approach that shows limited universality. Machine learning enhances the identification of influential spreaders by combining multiple centralities. However, several centrality measures used in machine learning methods, such as closeness centrality, have high computational complexity on large networks. Here, we propose a two-phase feature selection method for identifying influential spreaders with a reduced feature dimension. Depending on the definition of influential spreaders, we obtain the optimal feature combination for different synthetic networks. Our results demonstrate that when the datasets are mildly or moderately imbalanced, a combination of centralities that includes the two-hop neighborhood is fundamental for Barabasi-Albert (BA) scale-free networks, and a combination that includes the degree centrality is essential for Erdős-Rényi (ER) random graphs. Meanwhile, for Watts-Strogatz (WS) small-world networks, feature selection is unnecessary. We also conduct experiments on real-world networks, and the selected features are highly similar to those selected for synthetic networks. Our method provides a new path for identifying superspreaders for the control of epidemics.
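Both features highlighted above are cheap to compute from an adjacency list, which is part of their appeal compared with closeness centrality. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets and defines the two-hop neighborhood as the number of distinct nodes within two steps, excluding the node itself; the paper may use a different normalisation.

```python
def degree(adj, node):
    """Number of direct neighbors of the node."""
    return len(adj[node])

def two_hop_neighborhood(adj, node):
    """Number of distinct nodes reachable within two hops, excluding the node itself."""
    reach = set(adj[node])
    for neighbor in adj[node]:
        reach.update(adj[neighbor])
    reach.discard(node)
    return len(reach)

# Toy path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
features = {n: (degree(path, n), two_hop_neighborhood(path, n)) for n in path}
```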
Affiliation(s)
- Xiya Wang
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
| | - Yuexing Han
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou 311100, China
| | - Bing Wang
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
|
11
|
Park J, Lee JW, Park M. Comparison of cancer subtype identification methods combined with feature selection methods in omics data analysis. BioData Min 2023; 16:18. [PMID: 37420304 DOI: 10.1186/s13040-023-00334-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 06/30/2023] [Indexed: 07/09/2023] Open
Abstract
BACKGROUND Cancer subtype identification is important for the early diagnosis of cancer and the provision of adequate treatment. Prior to identifying the subtype of cancer in a patient, feature selection is also crucial for reducing the dimensionality of the data by detecting genes that contain important information about the cancer subtype. Numerous cancer subtyping methods have been developed, and their performance has been compared. However, combinations of feature selection and subtype identification methods have rarely been considered. This study aimed to identify the best combination of variable selection and subtype identification methods in single omics data analysis. RESULTS Combinations of six filter-based methods and six unsupervised subtype identification methods were investigated using The Cancer Genome Atlas (TCGA) datasets for four cancers. The number of features selected varied, and several evaluation metrics were used. Although no single combination was found to have a distinctively good performance, Consensus Clustering (CC) and Neighborhood-Based Multi-omics Clustering (NEMO) used with variance-based feature selection had a tendency to show lower p-values, and nonnegative matrix factorization (NMF) stably showed good performance in many cases unless the Dip test was used for feature selection. In terms of accuracy, the combination of NMF and similarity network fusion (SNF) with Monte Carlo Feature Selection (MCFS) and Minimum-Redundancy Maximum Relevance (mRMR) showed good overall performance. NMF always showed among the worst performances without feature selection in all datasets, but performed much better when used with various feature selection methods. iClusterBayes (ICB) had decent performance when used without feature selection. CONCLUSIONS Rather than a single method clearly emerging as optimal, the best methodology was different depending on the data used, the number of features selected, and the evaluation method. 
A guideline for choosing the best combination method under various situations is provided.
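As a minimal sketch of the simplest filter named in this abstract, variance-based feature selection keeps the genes whose expression varies most across patients before any clustering is run. The function name `top_variance_features`, the toy matrix, and the choice of `k` are illustrative assumptions and are not taken from the study.

```python
from statistics import pvariance


def top_variance_features(samples, k):
    """Return the (sorted) column indices of the k highest-variance features.

    samples: list of rows, one row per patient and one column per gene.
    """
    n_features = len(samples[0])
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    # Rank feature indices by descending variance; ties keep their original order.
    ranked = sorted(range(n_features), key=lambda j: -variances[j])
    return sorted(ranked[:k])


# Hypothetical toy expression matrix: 4 patients x 4 genes.
toy = [
    [1.0, 5.0, 0.1, 2.0],
    [1.0, 9.0, 0.2, 2.5],
    [1.0, 1.0, 0.1, 2.0],
    [1.0, 7.0, 0.2, 2.5],
]
selected = top_variance_features(toy, k=2)
# Reduced matrix that would be handed to a subtype identification method.
reduced = [[row[j] for j in selected] for row in toy]
```

The reduced matrix would then be passed to whichever clustering method is being compared; the clustering step itself is left out of this sketch.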
Affiliation(s)
- JiYoon Park
- Department of Statistics, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
| | - Jae Won Lee
- Department of Statistics, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
| | - Mira Park
- Department of Preventive Medicine, Eulji University, 77 Gyeryong-Ro, Jung-Gu, Daejeon, 34824, South Korea.
| |
|
12
|
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of feature selection that is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem that aims to simultaneously optimize two objectives: classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications; however, its random initialization can lead to blindness, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals that guide evolution are randomly chosen from the Pareto solutions, which may degrade the exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the defect of having less information in late evolution. Moreover, an improved elite selection mechanism with a Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with nine well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most of the high-dimensional cancer microarray datasets.
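The two objectives described here (higher classification accuracy, smaller gene subset) can be illustrated with a plain Pareto-dominance filter. This is a generic sketch of non-dominated filtering, not the improved MPA itself; the function names and candidate values are hypothetical.

```python
def dominates(a, b):
    """True if solution a dominates b.

    Each solution is (accuracy, n_genes): higher accuracy and fewer genes are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate gene subsets as (accuracy, number of genes).
candidates = [(0.95, 40), (0.93, 12), (0.95, 25), (0.90, 12), (0.97, 60)]
front = pareto_front(candidates)
```

Here `(0.95, 40)` is dropped because `(0.95, 25)` reaches the same accuracy with fewer genes, and `(0.90, 12)` is dropped because `(0.93, 12)` is more accurate at the same size.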
Affiliation(s)
- Qiyong Fu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
| | - Qi Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
| | - Xiaobo Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.
| |
|
13
|
Gurmani SH, Zhang Z, Zulqarnain RM, Askar S. An interaction and feedback mechanism-based group decision-making for emergency medical supplies supplier selection using T-spherical fuzzy information. Sci Rep 2023; 13:8726. [PMID: 37253823 DOI: 10.1038/s41598-023-35909-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2023] [Accepted: 05/25/2023] [Indexed: 06/01/2023] Open
Abstract
Selecting a supplier for emergency medical supplies during disasters can be considered a typical multiple attribute group decision-making (MAGDM) problem. MAGDM is an intriguing common problem that is rife with ambiguity and uncertainty. It becomes much more challenging when governments and medical care enterprises adjust their priorities in response to the escalating problems and the effectiveness of the actions taken in different countries. As decision-making problems become increasingly complicated nowadays, a growing number of experts are likely to use T-spherical fuzzy sets (T-SFSs) rather than exact numbers. T-SFS is a novel extension of fuzzy sets that can fully convey ambiguous and complicated information in MAGDM. The objective of this paper is to propose a MAGDM methodology based on interaction and feedback mechanism (IFM) and T-SFS theory. In it, we first introduce T-SF partitioned Bonferroni mean (T-SFPBM) and T-SF weighted partitioned Bonferroni mean (T-SFWPBM) operators to fuse the evaluation information provided by experts. Then, an IFM is designed to achieve a consensus between multiple experts. In the meantime, we also find the weights of experts by using T-SF information. Furthermore, in light of the combination of IFM and T-SFWPBM operator, an MAGDM algorithm is designed. Finally, an example of supplier selection for emergency medical supplies is provided to demonstrate the viability of the suggested approach. The influence of parameters on decision results and comparative analysis with the existing methods confirmed the reliability and accuracy of the suggested approach.
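For orientation, a T-spherical fuzzy value is usually defined as a triple of membership, abstinence, and non-membership grades in [0, 1] whose t-th powers sum to at most 1. The sketch below only checks that constraint; the function name and the example grades are assumptions for illustration, and the aggregation operators proposed in the paper are not reproduced.

```python
def is_t_spherical(membership, abstinence, non_membership, t):
    """Check the T-spherical fuzzy constraint m**t + a**t + n**t <= 1."""
    grades = (membership, abstinence, non_membership)
    if any(g < 0.0 or g > 1.0 for g in grades):
        return False
    return sum(g ** t for g in grades) <= 1.0


# Hypothetical expert evaluation: invalid for t = 1 and t = 2, valid for t = 3.
valid_for_t = {t: is_t_spherical(0.8, 0.5, 0.6, t) for t in (1, 2, 3)}
```

The example shows why a larger t gives experts more room to express their opinions: the same triple violates the constraint for t = 1 and t = 2 but satisfies it for t = 3.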
Affiliation(s)
- Shahid Hussain Gurmani
- School of Mathematical Sciences, Zhejiang Normal University, Jinhua, 321004, Zhejiang, China.
| | - Zhao Zhang
- School of Mathematical Sciences, Zhejiang Normal University, Jinhua, 321004, Zhejiang, China.
| | - Rana Muhammad Zulqarnain
- School of Mathematical Sciences, Zhejiang Normal University, Jinhua, 321004, Zhejiang, China
- Department of Mathematics, University of Management and Technology, Sialkot Campus, 51310, Pakistan
| | - Sameh Askar
- Department of Statistics and Operations Research, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
| |
|
14
|
Semi-supervised segmentation of coronary DSA using mixed networks and multi-strategies. Comput Biol Med 2023; 156:106493. [PMID: 36893708 DOI: 10.1016/j.compbiomed.2022.106493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Revised: 12/11/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
The coronary arteries, which originate from the root of the aorta and branch mainly into the left and right arteries, supply blood to the myocardium. X-ray digital subtraction angiography (DSA) is a technique for evaluating coronary artery plaques and narrowing that is widely used because of its time efficiency and cost-effectiveness. However, automated coronary vessel classification and segmentation remain challenging when little data is available. Therefore, the purpose of this study is twofold: one is to propose a more robust method for vessel segmentation; the other is to provide a solution that is feasible with a small amount of labeled data. Currently, there are three main types of vessel segmentation methods, i.e., graphical- and statistical-based methods, clustering theory-based methods, and deep learning-based methods for pixel-by-pixel probabilistic prediction, among which the last is the mainstream because of its high accuracy and automation. Following this trend, an Inception-SwinUnet (ISUnet) network combining the convolutional neural network and Transformer basic module was proposed in this paper. Considering that data-driven fully supervised learning (FSL) segmentation methods require a large set of paired data with high-quality pixel-level annotation, which is expertise-demanding and time-consuming, we proposed a Semi-supervised Learning (SSL) method to achieve better performance with a small amount of labeled data together with unlabeled data. Unlike the classical SSL method, i.e., Mean-Teacher, our method used two different networks for cross-teaching as the backbone. Meanwhile, inspired by deep supervision and confidence learning (CL), two effective strategies for SSL were adopted, named Pyramid-consistency Learning (PL) and Confidence Learning (CL), respectively. Both were designed to filter the noise and improve the credibility of pseudo labels generated from unlabeled data. 
Compared with existing FSL and SSL methods, ours achieved superior segmentation performance when trained with the same small number of labels. Code is available at https://github.com/Allenem/SSL4DSA.
|
15
|
Deng S, Wang L, Guan S, Li M, Wang L. Non-parametric Nearest Neighbor Classification Based on Global Variance Difference. INT J COMPUT INT SYS 2023. [DOI: 10.1007/s44196-023-00200-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/06/2023] Open
Abstract
As technology improves, extracting information from vast datasets is becoming more urgent. As is well known, k-nearest neighbor classifiers are conceptually simple and easy to implement. They are not without shortcomings, however: (1) there is still a sensitivity to the choice of k-values even when representative attributes are not considered in each class; (2) in some cases, the proximity between test samples and nearest neighbor samples cannot be reflected accurately due to proximity measurements, etc. Here, we propose a non-parametric nearest neighbor classification method based on global variance differences. First, the difference in variance is calculated before and after adding the sample to be tested, then the difference is divided by the variance before adding the sample to be tested, and the resulting quotient serves as the objective function. In the final step, the samples to be tested are classified into the class with the smallest objective function. Here, we discuss the theoretical aspects of this function. Using the Lagrange method, it can be shown that the objective function can be optimal when the sample centers of each class are averaged. Twelve real datasets from the University of California, Irvine are used to compare the proposed algorithm with competitors such as the local mean k-nearest neighbor algorithm and the pseudo-nearest neighbor algorithm. According to a comprehensive experimental study, the average accuracy on 12 datasets is as high as 86.27%, which is far higher than that of other algorithms. The experimental findings verify that the proposed algorithm produces results that are more dependable than other existing algorithms.
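The classification rule described in this abstract can be sketched as follows. The abstract does not say how the variance of a multi-feature class is measured, so this sketch assumes the sum of per-feature population variances; the function names and the toy data are likewise hypothetical.

```python
def total_variance(points):
    """Sum of per-feature population variances (an assumed reading of 'variance')."""
    n = len(points)
    dims = len(points[0])
    total = 0.0
    for j in range(dims):
        mean_j = sum(p[j] for p in points) / n
        total += sum((p[j] - mean_j) ** 2 for p in points) / n
    return total


def classify(train_by_class, x):
    """Assign x to the class with the smallest relative change in variance."""
    scores = {}
    for label, points in train_by_class.items():
        before = total_variance(points)
        after = total_variance(points + [x])
        # Objective: (variance after - variance before) / variance before.
        # Assumes each class has non-zero variance before the sample is added.
        scores[label] = (after - before) / before
    return min(scores, key=scores.get), scores


# Hypothetical two-class training set in two dimensions.
train = {
    "a": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "b": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]],
}
label, scores = classify(train, [0.5, 0.4])
```

A test point near the center of class "a" barely changes (here, slightly lowers) that class's variance but inflates the variance of class "b", so it is assigned to "a" without choosing any k.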
|
16
|
Liu X, Teng L, Zuo W, Zhong S, Xu Y, Sun J. Deafness gene screening based on a multilevel cascaded BPNN model. BMC Bioinformatics 2023; 24:56. [PMID: 36803022 PMCID: PMC9942297 DOI: 10.1186/s12859-023-05182-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Accepted: 02/11/2023] [Indexed: 02/22/2023] Open
Abstract
Sudden sensorineural hearing loss is a common and frequently occurring condition in otolaryngology. Existing studies have shown that sudden sensorineural hearing loss is closely associated with mutations in genes for inherited deafness. To identify these genes associated with deafness, researchers have mostly used biological experiments, which are accurate but time-consuming and laborious. In this paper, we proposed a computational method based on machine learning to predict deafness-associated genes. The model is based on several basic backpropagation neural networks (BPNNs), which were cascaded as multiple-level BPNN models. The cascaded BPNN model showed a stronger ability for screening deafness-associated genes than the conventional BPNN. A total of 211 of 214 deafness-associated genes from the deafness variant database (DVD v9.0) were used as positive data, and 2110 genes extracted from chromosomes were used as negative data to train our model. The test achieved a mean AUC higher than 0.98. Furthermore, to illustrate the predictive performance of the model for suspected deafness-associated genes, we analyzed the remaining 17,711 genes in the human genome and screened the 20 genes with the highest scores as highly suspected deafness-associated genes. Among these 20 predicted genes, three genes were mentioned as deafness-associated genes in the literature. The analysis showed that our approach has the potential to screen out highly suspected deafness-associated genes from a large number of genes, and our predictions could be valuable for future research and discovery of deafness-associated genes.
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 Shapingba District, Chongqing, 400044, China.
| | - Li Teng
- School of Microelectronics and Communication Engineering, Chongqing University, 174 Shapingba District, Chongqing, 400044, China
| | - Wenqi Zuo
- Department of Otolaryngology, The First Affiliated Hospital of Chongqing Medical University, NO. 1 Youyi Road, Yuzhong District, Chongqing, 400016, China
| | - Shixun Zhong
- Department of Otolaryngology, The First Affiliated Hospital of Chongqing Medical University, NO. 1 Youyi Road, Yuzhong District, Chongqing, 400016, China
| | - Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 Shapingba District, Chongqing, 400044, China
| | - Jing Sun
- School of Microelectronics and Communication Engineering, Chongqing University, 174 Shapingba District, Chongqing, 400044, China
| |
|
17
|
Kang IA, Njimbouom SN, Kim JD. Optimal Feature Selection-Based Dental Caries Prediction Model Using Machine Learning for Decision Support System. Bioengineering (Basel) 2023; 10:bioengineering10020245. [PMID: 36829739 PMCID: PMC9952690 DOI: 10.3390/bioengineering10020245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Revised: 02/07/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023] Open
Abstract
The high frequency of dental caries is a major public health concern worldwide. The condition is common, particularly in developing countries. Because there are no evident early-stage signs, dental caries frequently goes untreated. Meanwhile, early detection and timely clinical intervention are required to slow disease development. Machine learning (ML) models can benefit clinicians in the early detection of dental cavities through efficient and cost-effective computer-aided diagnoses. This study proposed a more effective method for diagnosing dental caries by integrating the GINI and mRMR algorithms with the GBDT classifier. Because just a few clinical test features are required for the diagnosis, this strategy could save time and money when screening for dental caries. The proposed method was compared to recently proposed dental procedures. Among these classifiers, the suggested GBDT trained with a reduced feature set achieved the best classification performance, with accuracy, F1-score, precision, and recall values of 95%, 93%, 99%, and 88%, respectively. Furthermore, the experimental results suggest that feature selection improved the performance of the various classifiers. The suggested method yielded a good predictive model for dental caries diagnosis, which might be used in more imbalanced medical datasets to identify disease more effectively.
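The reported metrics can be cross-checked against each other: the F1-score is the harmonic mean of precision and recall, so the stated 99% precision and 88% recall imply an F1-score of about 93%, in line with the abstract. The helper name `f1_score` below is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.99, 0.88)
```

The gap between precision and recall also indicates that the model misses more true caries cases than it raises false alarms, which matters for a screening use case.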
Affiliation(s)
- In-Ae Kang
- Department of Computer and Electronics Convergence Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
| | - Soualihou Ngnamsie Njimbouom
- Department of Computer and Electronics Convergence Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
| | - Jeong-Dong Kim
- Department of Computer and Electronics Convergence Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
- Department of Computer Science and Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
- Genome-Based BioIT Convergence Institute, Sun Moon University, Asan-si 31460, Republic of Korea
| |
|
18
|
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:bioengineering10020173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
|
19
|
MultiScale-CNN-4mCPred: a multi-scale CNN and adaptive embedding-based method for mouse genome DNA N4-methylcytosine prediction. BMC Bioinformatics 2023; 24:21. [PMID: 36653789 PMCID: PMC9847203 DOI: 10.1186/s12859-023-05135-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Accepted: 01/04/2023] [Indexed: 01/19/2023] Open
Abstract
N4-methylcytosine (4mC) is an important epigenetic mechanism, which regulates many cellular processes such as cell differentiation and gene expression. The knowledge about the 4mC sites is a key foundation to exploring its roles. Due to the limitation of techniques, precise detection of 4mC is still a challenging task. In this paper, we presented a multi-scale convolution neural network (CNN) and adaptive embedding-based computational method for predicting 4mC sites in mouse genome, which was referred to as MultiScale-CNN-4mCPred. The MultiScale-CNN-4mCPred used adaptive embedding to encode nucleotides, and then utilized multi-scale CNNs as well as long short-term memory to extract more in-depth local properties and contextual semantics in the sequences. The MultiScale-CNN-4mCPred is an end-to-end learning method, which requires no sophisticated feature design. The MultiScale-CNN-4mCPred reached an accuracy of 81.66% in the 10-fold cross-validation, and an accuracy of 84.69% in the independent test, outperforming state-of-the-art methods. We implemented the proposed method into a user-friendly web application which is freely available at: http://www.biolscience.cn/MultiScale-CNN-4mCPred/ .
|
20
|
Liu C, Wu S, Lai L, Liu J, Guo Z, Ye Z, Chen X. Comprehensive analysis of cuproptosis-related lncRNAs in immune infiltration and prognosis in hepatocellular carcinoma. BMC Bioinformatics 2023; 24:4. [PMID: 36597032 PMCID: PMC9811804 DOI: 10.1186/s12859-022-05091-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Accepted: 12/01/2022] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Hepatocellular carcinoma (HCC) is among the most common malignancies worldwide and is the third leading cause of cancer mortality. The regulation of cell death is the most crucial step in tumor progression and has become a crucial target for nearly all therapeutic options. Cuproptosis, a copper-induced cell death, was recently reported in Science. However, its primary function in carcinogenesis is still unclear. METHODS Cuproptosis-related lncRNAs significantly associated with overall survival (OS) were screened by stepwise univariate Cox regression. The signature of cuproptosis-related lncRNAs for HCC prognosis was constructed by the LASSO algorithm and multivariate Cox regression. Further Kaplan-Meier analysis, proportional hazards model, and ROC analysis were performed. Functional annotation was performed using gene set enrichment analysis (GSEA). The relationship between prognostic cuproptosis-related lncRNAs and HCC prognosis was further explored using the GEPIA (http://gepia.cancer-pku.cn/) online analysis tool. Finally, we used the ESTIMATE and XCELL algorithms to estimate stromal and immune cells in tumor tissue and cast each sample to infer the underlying mechanism of cuproptosis-related lncRNAs in the tumor immune microenvironment (TIME) of HCC patients. RESULTS Four cuproptosis-related lncRNAs were used to construct a prognostic lncRNA signature, which was an independent factor in predicting OS in HCC patients. Kaplan-Meier curves showed significant differences in survival rates between risk subgroups (p = 0.002). At the same time, we found that the expression levels of most immune checkpoint genes increased with increasing risk scores. Tumorigenesis and immunological-related pathways were primarily enhanced in the high-risk group, as determined by GSEA. 
The results of drug sensitivity analysis showed that compared with patients in the high-risk group, the IC50 values of erlotinib and lapatinib were lower in patients in the low-risk group, while the opposite was true for sunitinib, paclitaxel, gemcitabine, and imatinib. We also found that elevated AL133243.2 expression was significantly associated with worse OS and disease-free survival (DFS), more advanced T stage and higher tumor grade, and reduced immune cell infiltration, suggesting that HCC patients with low AL133243.2 expression in tumor tissues may have a better response to immunotherapy. CONCLUSION Collectively, the cuproptosis-associated lncRNA signature can serve as an independent predictor to guide individual treatment strategies. Furthermore, AL133243.2 is a promising marker for predicting immunotherapy response in HCC patients. This data may facilitate further exploration of more effective immunotherapy strategies for HCC.
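A signature built with Cox regression is typically applied as a linear risk score, and patients are then split into risk groups. The sketch below assumes a median cut-off, which the abstract does not state; the lncRNA names, coefficients, and expression values are placeholders, not the four lncRNAs or the fitted coefficients from the study.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum of coefficient * expression level."""
    return [
        sum(coefficients[gene] * patient[gene] for gene in coefficients)
        for patient in expression
    ]


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score (assumed cut-off)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Placeholder coefficients and expression values for illustration only.
coefficients = {"lnc_A": 0.5, "lnc_B": -0.25}
expression = [
    {"lnc_A": 2.0, "lnc_B": 1.0},
    {"lnc_A": 0.5, "lnc_B": 2.0},
    {"lnc_A": 3.0, "lnc_B": 0.0},
    {"lnc_A": 1.0, "lnc_B": 4.0},
]
scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```

Group labels produced this way are what a Kaplan-Meier comparison between risk subgroups would be stratified on.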
Affiliation(s)
- Chunhua Liu
- Rehabilitation Center, The Second Affiliated Hospital of Wenzhou Medical University, 108 Xueyuan West Road, Wenzhou, Zhejiang, China
| | - Simin Wu
- Rehabilitation Center, The Second Affiliated Hospital of Wenzhou Medical University, 108 Xueyuan West Road, Wenzhou, Zhejiang, China
| | - Liying Lai
- Department of Cancer Rehabilitation, Lishui Hospital of Traditional Chinese Medicine Affiliated to the Zhejiang University of Chinese Medicine, Lishui, Zhejiang, China
| | - Jinyu Liu
- Department of Cancer Rehabilitation, Lishui Hospital of Traditional Chinese Medicine Affiliated to the Zhejiang University of Chinese Medicine, Lishui, Zhejiang, China
| | - Zhaofu Guo
- Department of Cancer Rehabilitation, Lishui Hospital of Traditional Chinese Medicine Affiliated to the Zhejiang University of Chinese Medicine, Lishui, Zhejiang, China
| | - Zegen Ye
- Department of Cancer Rehabilitation, Lishui Hospital of Traditional Chinese Medicine Affiliated to the Zhejiang University of Chinese Medicine, Lishui, Zhejiang, China
| | - Xiang Chen
- Rehabilitation Center, The Second Affiliated Hospital of Wenzhou Medical University, 108 Xueyuan West Road, Wenzhou, Zhejiang, China.
| |
|
21
|
Sheikhpour R. A local spline regression-based framework for semi-supervised sparse feature selection. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
|
22
|
Xiang J, Wang X, Wang X, Zhang J, Yang S, Yang W, Han X, Liu Y. Automatic diagnosis and grading of Prostate Cancer with weakly supervised learning on whole slide images. Comput Biol Med 2023; 152:106340. [PMID: 36481762 DOI: 10.1016/j.compbiomed.2022.106340] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2022] [Revised: 11/02/2022] [Accepted: 11/16/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND The workflow of prostate cancer diagnosis and grading is cumbersome and the results suffer from substantial inter-observer variability. Recent trials have shown potential in using machine learning to develop automated systems to address this challenge. Most automated deep learning systems for prostate cancer Gleason grading have focused on supervised learning, which requires demanding fine-grained pixel-level annotations. METHODS A weakly-supervised deep learning model with slide-level labels is presented in this study for the diagnosis and grading of prostate cancer from whole slide images (WSIs). WSIs are first cropped into small patches and then processed with a deep learning model to extract patch-level features. A graph convolution network (GCN) is used to aggregate the features for classification. Throughout the training process, the noisy labels are progressively filtered out to reduce inter-observer variations in clinical reports. Finally, multi-center independent test cohorts with 6,174 slides are collected to evaluate the prostate cancer diagnosis and grading performance of our model. RESULTS The cancer diagnosis (2-level classification) results on two external test sets (n = 4,675 and n = 844) show an area under the receiver operating characteristic curve (AUC) of 0.985 and 0.986. The Gleason grading (6-level classification) results reach 0.931 quadratic weighted kappa on the internal test set (n = 531). It generalizes well on the external test dataset (n = 844) with 0.801 quadratic weighted kappa with the reference standard set independently. The model enables pathologically meaningful interpretability by visualizing the most attended lesions, which are highly consistent with expert annotations. CONCLUSION The proposed model incorporates a graph network in weakly supervised learning with only slide-level reports. A robust learning strategy is also employed to correct the label noise. 
It is highly accurate (>0.985 AUC for diagnosis) and also interpretable with intuitive heatmap visualization. It can be unified with a digital pathology pipeline to deliver prostate cancer metrics for a pathology report.
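Quadratic weighted kappa, the grading metric quoted in this abstract, penalizes a disagreement by the squared distance between the two grades, so confusing adjacent grades costs far less than confusing distant ones. The following is a generic sketch of the standard formula; the function name and the toy label lists are our own and unrelated to the study data.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for integer labels in range(n_classes).

    Assumes the labels are not all identical, otherwise the denominator is zero.
    """
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * true_hist[i] * pred_hist[j] / n
    return 1.0 - num / den


# Toy checks: perfect agreement on six grades, and chance-level agreement on two.
perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], 6)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which gives a scale for reading the reported 0.931 and 0.801.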
Affiliation(s)
| | - Xiyue Wang
- College of Computer Science, Sichuan University, Chengdu, China
| | - Xinran Wang
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
| | | | - Sen Yang
- AI Lab, Tencent, Shenzhen, China
| | - Wei Yang
- AI Lab, Tencent, Shenzhen, China
| | - Xiao Han
- AI Lab, Tencent, Shenzhen, China
| | - Yueping Liu
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China.
| |
|
23
|
Buckler AJ, Marlevi D, Skenteris NT, Lengquist M, Kronqvist M, Matic L, Hedin U. In silico model of atherosclerosis with individual patient calibration to enable precision medicine for cardiovascular disease. Comput Biol Med 2023; 152:106364. [PMID: 36525832 DOI: 10.1016/j.compbiomed.2022.106364] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Revised: 11/01/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022]
Abstract
OBJECTIVE Guidance for preventing myocardial infarction and ischemic stroke by tailoring treatment for individual patients with atherosclerosis is an unmet need. Such development may be possible with computational modeling. Given the multifactorial biology of atherosclerosis, modeling must be based on complete biological networks that capture protein-protein interactions estimated to drive disease progression. Here, we aimed to develop a clinically relevant scale model of atherosclerosis, calibrate it with individual patient data, and use it to simulate optimized pharmacotherapy for individual patients. APPROACH AND RESULTS The study used a uniquely constituted plaque proteomic dataset to create a comprehensive systems biology disease model for simulating individualized responses to pharmacotherapy. Plaque tissue was collected from 18 patients with 6735 proteins at two locations per patient. 113 pathways were identified and included in the systems biology model of endothelial cells, vascular smooth muscle cells, macrophages, lymphocytes, and the integrated intima, altogether spanning 4411 proteins, demonstrating a range of 39-96% plaque instability. After calibrating the systems biology models for individual patients, we simulated intensive lipid-lowering, anti-inflammatory, and anti-diabetic drugs. We also simulated a combination therapy. Drug response was evaluated as the degree of change in plaque stability, where an improvement was defined as a reduction of plaque instability. In patients with initially unstable lesions, simulated responses varied from high (20%, on combination therapy) to marginal improvement, whereas patients with initially stable plaques showed generally less improvement. 
CONCLUSION In this pilot study, proteomics-based systems biology modeling was shown to simulate drug response based on atherosclerotic plaque instability with a power of 90%, providing a potential strategy for improved personalized management of patients with cardiovascular disease.
Affiliation(s)
- Andrew J Buckler
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Elucid Bioimaging Inc., Boston, MA, USA
| | - David Marlevi
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Nikolaos T Skenteris
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Mariette Lengquist
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Malin Kronqvist
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Ljubica Matic
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Ulf Hedin
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden.
| |
Collapse
|
24
Liang JX, Chen Q, Gao W, Chen D, Qian XY, Bi JQ, Lin XC, Han BB, Liu JS. A novel glycosylation-related gene signature predicts survival in patients with lung adenocarcinoma. BMC Bioinformatics 2022; 23:562. [PMID: 36575396 PMCID: PMC9793550 DOI: 10.1186/s12859-022-05109-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 12/12/2022] [Indexed: 12/28/2022]
Abstract
BACKGROUND Lung adenocarcinoma (LUAD) is the most common malignant tumor that seriously affects human health. Previous studies have indicated that abnormal levels of glycosylation promote progression and poor prognosis of lung cancer. Thus, the present study aimed to explore the prognostic signature related to glycosyltransferases (GTs) for LUAD. METHODS The gene expression profiles were obtained from The Cancer Genome Atlas (TCGA) database, and GTs were obtained from the GlycomeDB database. Differentially expressed GTs-related genes (DGTs) were identified using edge package and Venn diagram. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and ingenuity pathway analysis (IPA) methods were used to investigate the biological processes of DGTs. Subsequently, Cox and Least Absolute Shrinkage and Selection Operator (LASSO) regression analyses were performed to construct a prognostic model for LUAD. Kaplan-Meier (K-M) analysis was adopted to explore the overall survival (OS) of LUAD patients. The accuracy and specificity of the prognostic model were evaluated by receiver operating characteristic analysis (ROC). In addition, single-sample gene set enrichment analysis (ssGSEA) algorithm was used to analyze the infiltrating immune cells in the tumor environment. RESULTS A total of 48 DGTs were mainly enriched in the processes of glycosylation, glycoprotein biosynthetic process, glycosphingolipid biosynthesis-lacto and neolacto series, and cell-mediated immune response. Furthermore, B3GNT3, MFNG, GYLTL1B, ALG3, and GALNT13 were screened as prognostic genes to construct a risk model for LUAD, and the LUAD patients were divided into high- and low-risk groups. K-M curve suggested that patients with a high-risk score had shorter OS than those with a low-risk score. The ROC analysis demonstrated that the risk model efficiently diagnoses LUAD. 
Additionally, the proportion of infiltrating aDCs (p < 0.05) and Tgds (p < 0.01) was higher in the high-risk group than in the low-risk group. Spearman's correlation analysis manifested that the prognostic genes (MFNG and ALG3) were significantly correlated with infiltrating immune cells. CONCLUSION In summary, this study established a novel GTs-related risk model for the prognosis of LUAD patients, providing new therapeutic targets for LUAD. However, the biological role of glycosylation-related genes in LUAD needs to be explored further.
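The risk-model step in this abstract can be pictured as a linear risk score: each patient's score is a weighted sum of the expression values of the five prognostic genes, and patients are then split into high- and low-risk groups. The sketch below is only an illustration under stated assumptions: the coefficients and expression values are made up, and the median cut-off is assumed, since the abstract does not state the threshold that was used.

```python
from statistics import median

# Prognostic genes named in the abstract.
GENES = ["B3GNT3", "MFNG", "GYLTL1B", "ALG3", "GALNT13"]

# Hypothetical coefficients for illustration only, NOT the fitted values from the study.
COEFS = {"B3GNT3": 0.12, "MFNG": -0.20, "GYLTL1B": 0.05, "ALG3": 0.30, "GALNT13": 0.08}


def risk_score(expression):
    """Weighted sum of gene expression values for one patient."""
    return sum(COEFS[g] * expression[g] for g in GENES)


def stratify(patients):
    """Label each patient 'high' or 'low' risk using the median score as cut-off (assumed)."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression profiles: every gene has the same level within a patient.
patients = {
    "p1": {g: 1.0 for g in GENES},
    "p2": {g: 2.0 for g in GENES},
    "p3": {g: 0.0 for g in GENES},
    "p4": {g: 3.0 for g in GENES},
}
groups = stratify(patients)
```

With groups defined this way, a survival comparison such as the Kaplan-Meier analysis mentioned in the abstract would be run separately on the "high" and "low" patients.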
Affiliation(s)
- Jin-Xiao Liang: Department of Oncological Surgery, Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), No. 1 of Banshan East Road, Hangzhou, 310022, Zhejiang Province, Republic of China; Institute of Cancer and Basic Medicine (IBMC), Chinese Academy of Sciences, Hangzhou, People's Republic of China
- Qian Chen: Department of Oncological Surgery, Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), No. 1 of Banshan East Road, Hangzhou, 310022, Zhejiang Province, Republic of China; Institute of Cancer and Basic Medicine (IBMC), Chinese Academy of Sciences, Hangzhou, People's Republic of China
- Wei Gao: School of Medicine, Zhejiang University City College, Hangzhou, People's Republic of China
- Da Chen: Department of Oncological Surgery, Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), No. 1 of Banshan East Road, Hangzhou, 310022, Zhejiang Province, Republic of China; Institute of Cancer and Basic Medicine (IBMC), Chinese Academy of Sciences, Hangzhou, People's Republic of China
- Xin-Yu Qian: School of Medicine, Zhejiang University City College, Hangzhou, People's Republic of China
- Jin-Qiao Bi: School of Medicine, Zhejiang University City College, Hangzhou, People's Republic of China
- Xing-Chen Lin: School of Medicine, Zhejiang University City College, Hangzhou, People's Republic of China
- Bing-Bing Han: School of Medicine, Zhejiang University City College, Hangzhou, People's Republic of China
- Jin-Shi Liu: Department of Oncological Surgery, Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), No. 1 of Banshan East Road, Hangzhou, 310022, Zhejiang Province, Republic of China; Institute of Cancer and Basic Medicine (IBMC), Chinese Academy of Sciences, Hangzhou, People's Republic of China
25
Zhu J, Jiang Z, Feng L. Improved neural network with least square support vector machine for wastewater treatment process. Chemosphere 2022; 308:136116. [PMID: 36037940 DOI: 10.1016/j.chemosphere.2022.136116] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/05/2022] [Revised: 07/22/2022] [Accepted: 08/16/2022] [Indexed: 06/15/2023]
Abstract
This research offers a unique interval prediction approach for effluent water quality indicators such as biochemical oxygen demand (BOD) and ammonia nitrogen (NH3-N), which are among the most significant quality metrics for water quality management and surveillance in wastewater treatment plants. First, the effluent data for BOD and NH3-N and their supplementary parameters are gathered; BOD and NH3-N are treated as the major feature sources for estimating water pollutants. When BOD is high, the oxygen level in the water is very low because of pollutants or algae. Ammonia nitrogen is an organic waste component in water from sewage. After basic data pre-processing, the characteristics that correlate well with BOD and NH3-N are examined and identified using a grey correlation analysis method. The BOD/NH3-N effluent values of a water treatment plant are then predicted using an upgraded feed-forward neural network with a least square support vector machine (FFNN-LSSVM). An optimization approach for the improved feed-forward neural network (IFFNN) is built with machine learning algorithms. The IFFNN uses regular influent water quality, influent flow rate, and wastewater performance monitoring and operational conditions as input parameters. For future prediction, the input variables are previous wastewater quality measurements. Finally, the analysis shows that, compared with other current algorithms, the proposed methodology forecasts BOD and NH3-N levels with high accuracy and limited computation time, with a mean error below 10% and an R2 of 90%.
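The grey correlation step mentioned in this abstract ranks candidate input variables by how closely their normalized series follow a reference series such as BOD. The sketch below uses the standard grey relational coefficient, (Δmin + ρ·Δmax) / (Δ + ρ·Δmax), averaged over time. It is an illustration only: the function names, the min-max normalization, the distinguishing coefficient ρ = 0.5, and the sample readings are all assumptions, not details taken from the paper.

```python
def min_max(seq):
    """Scale a series to the range [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference, candidates, rho=0.5):
    """Return the grey relational grade of each candidate series against the reference."""
    ref = min_max(reference)
    # Absolute differences between the normalized reference and each normalized candidate.
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, min_max(seq))]
        for name, seq in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every candidate matches the reference exactly.
        return {name: 1.0 for name in deltas}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


# Made-up readings: "cod" tracks BOD exactly after scaling, "ph" does not.
bod = [20, 25, 30, 28, 35]
candidates = {
    "cod": [40, 50, 60, 56, 70],
    "ph": [7.2, 7.0, 7.1, 7.3, 6.9],
}
grades = grey_relational_grades(bod, candidates)
```

Variables with the highest grades would be the ones kept as model inputs; where to draw that line is a modeling choice the abstract does not specify.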
Affiliation(s)
- Junren Zhu: Chongqing City Management College, Chongqing, 401331, PR China
- Zhenzhen Jiang: Chongqing Vocational Institute of Engineering, Chongqing, 402260, PR China
- Li Feng: School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou, 510006, Guangdong, PR China
26
Wang C. Efficient customer segmentation in digital marketing using deep learning with swarm intelligence approach. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.103085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
27
Dual Regularized Unsupervised Feature Selection Based on Matrix Factorization and Minimum Redundancy with application in gene selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
28
Alsaleem MN, Islam MS, Al-Ahmadi S, Soudani A. Multiscale Encoding of Electrocardiogram Signals with a Residual Network for the Detection of Atrial Fibrillation. Bioengineering (Basel, Switzerland) 2022; 9:480. [PMID: 36135025 PMCID: PMC9495512 DOI: 10.3390/bioengineering9090480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 09/09/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022]
Abstract
Atrial fibrillation (AF) is one of the most common cardiac arrhythmias, and it is an indication of high-risk factors for stroke, myocardial ischemia, and other malignant cardiovascular diseases. Most of the existing AF detection methods typically convert one-dimensional time-series electrocardiogram (ECG) signals into two-dimensional representations to train a deep and complex AF detection system, which results in heavy training computation and high implementation costs. In this paper, a multiscale signal encoding scheme is proposed to improve feature representation and detection performance without the need for using any transformation or handcrafted feature engineering techniques. The proposed scheme uses different kernel sizes to produce the encoded signal by using multiple streams that are passed into a one-dimensional sequence of blocks of a residual convolutional neural network (ResNet) to extract representative features from the input ECG signal. This also allows networks to grow in breadth rather than in depth, thus reducing the computing time by using the parallel processing capability of deep learning networks. We investigated the effects of the use of a different number of streams with different kernel sizes on the performance. Experiments were carried out for a performance evaluation using the publicly available PhysioNet CinC Challenge 2017 dataset. The proposed multiscale encoding scheme outperformed existing deep learning-based methods with an average F1 score of 98.54%, but with a lower network complexity.
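The idea of encoding one signal through several parallel streams with different kernel sizes can be sketched in a few lines. This is not the authors' network: the paper's kernels are learned and feed into ResNet blocks, whereas the sketch below uses fixed moving-average kernels purely to show how each stream views the same one-dimensional signal at a different scale. The function names, the kernel sizes, and the sample signal are hypothetical.

```python
def conv1d_same(signal, kernel):
    """1D sliding-window filter with zero padding so the output keeps the input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def multiscale_encode(signal, kernel_sizes=(3, 5, 7)):
    """Return one filtered copy of the signal per kernel size (one 'stream' each)."""
    streams = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # fixed averaging kernel; a real model would learn these weights
        streams.append(conv1d_same(signal, kernel))
    return streams


# Made-up signal with a single spike, standing in for an ECG segment.
ecg = [0.0, 0.0, 3.0, 0.0, 0.0]
streams = multiscale_encode(ecg, kernel_sizes=(1, 3))
```

Because the streams are independent of one another, they can be computed in parallel, which is the breadth-over-depth trade-off the abstract describes.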