1
Yang M, He L, Liu W, Zhang Y, Huang H. Performance improvement of atherosclerosis risk assessment based on feature interaction. Comput Methods Programs Biomed 2024; 249:108139. [PMID: 38554640 DOI: 10.1016/j.cmpb.2024.108139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 03/06/2024] [Accepted: 03/18/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND AND OBJECTIVE Cardiovascular disease is a leading cause of mortality and premature death. Early intervention in asymptomatic individuals through risk assessment can reduce the incidence of disease. Atherosclerosis is a major cause of cardiovascular disease, and detecting it early makes effective prevention and treatment possible. In this study, we used real patient data to evaluate the risk of atherosclerosis, with the aim of assisting doctors in diagnosis and reducing the incidence of cardiovascular disease. METHODS We proposed a multi-stage atherosclerosis risk assessment model with three main stages: (i) In the causal stability middle layer, SMOTE and a decorrelation weighting algorithm were used to address class imbalance in the dataset and to reduce the impact of feature-induced dataset distribution shifts on the model. (ii) The feature interaction layer considered possible feature interactions and classified features by category; adding this more effective feature information improved the accuracy and generalizability of the model. (iii) In the integrated model layer, we chose LightGBM as the decision-tree ensemble model for risk assessment because it offered higher accuracy and robustness than other machine learning algorithms. RESULTS The final model used a dataset containing 21 original features and 17 interaction features and was evaluated under a 10-fold cross-validation strategy. The macro accuracy reached 93.86%, macro precision 94.82%, macro recall 93.52%, and macro F1 score 93.37%. These indicators demonstrate the accuracy and robustness of the model in atherosclerosis risk assessment.
CONCLUSION The model provides strong support for the prevention and diagnosis of cardiovascular disease. Through atherosclerosis risk assessment, it can help doctors develop personalized prevention and treatment plans, which is of great significance for the prevention and treatment of cardiovascular disease.
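The macro-averaged metrics reported in this abstract can be illustrated with a minimal sketch. It shows how macro precision, recall and F1 are commonly computed (per-class scores averaged with equal weight). The function name `macro_scores` and the toy labels are assumptions made for illustration and are unrelated to the study's data; the abstract does not define "macro accuracy", so it is left out.

```python
def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical two-class example (toy data, not from the paper).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"macro precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Because every class counts equally, macro averages are sensitive to minority-class errors, which is presumably why they suit an imbalanced risk dataset.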
Affiliation(s)
- Mengdie Yang
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Lidan He
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Wenjun Liu
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China.
- Yudong Zhang
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Hui Huang
- Department of Ultrasound, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing 210029, China
2
Yin Y, Ochieng ND, Sun J, Bao X, Wang Z. PeNet: A feature excitation learning approach to advertisement click-through rate prediction. Neural Netw 2024; 172:106127. [PMID: 38232422 DOI: 10.1016/j.neunet.2024.106127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 11/13/2023] [Accepted: 01/11/2024] [Indexed: 01/19/2024]
Abstract
Since the physical meaning of the fields of the dataset is unknown, we have to use feature interaction methods to select the correlated features and exclude the uncorrelated ones. Current state-of-the-art methods predict advertisement Click-Through Rate (CTR) with various techniques based on feature interaction; however, they rarely consider feature interaction based on mining potential new features, which can provide effective assistance for feature interaction. This motivates us to investigate methods that combine potential new features and feature interactions. Thus, we propose a potential feature excitation learning network (PeNet), a neural network model based on feature combination and feature interaction. In PeNet, we treat the row compression and column compression of the original feature matrix as potential new features, and propose an excitation learning mechanism, which is a weighting mechanism based on the residual principle. Through this mechanism, the original embedded features and the potential new features are subjected to weighted interaction based on the residual principle. Moreover, a deep neural network is exploited to iteratively learn and combine features. The paper demonstrates the excitation learning structure of PeNet, that is, the control flow of embedding, compression, excitation and output, which strengthens the correlated features and weakens the uncorrelated features by compressing and expanding the features. Experimental results on multiple benchmark datasets indicate that PeNet, as a general-purpose plug-in, achieves better performance and efficiency than previous state-of-the-art methods.
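The compression-then-excitation flow described in this abstract can be pictured with a rough sketch. It is not the authors' architecture: the learned layers are omitted, the mean is used as the compression, and a sigmoid gate stands in for the excitation weights. The names `excite` and `sigmoid` and the toy matrix are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def excite(embeddings):
    """Reweight a fields-by-dimension embedding matrix and add it back residually.

    Assumption: compression is a plain mean and gates are unlearned sigmoids;
    a real model would learn these weights.
    """
    n_fields, dim = len(embeddings), len(embeddings[0])
    # Row compression: one summary value per field.
    row_summary = [sum(row) / dim for row in embeddings]
    # Column compression: one summary value per embedding dimension.
    col_summary = [sum(row[j] for row in embeddings) / n_fields for j in range(dim)]
    row_gate = [sigmoid(v) for v in row_summary]
    col_gate = [sigmoid(v) for v in col_summary]
    # Residual combination: original value plus its gated copy.
    return [
        [value + row_gate[i] * col_gate[j] * value for j, value in enumerate(row)]
        for i, row in enumerate(embeddings)
    ]


# Hypothetical 2-field, 2-dimensional embedding matrix.
toy = [[1.0, 1.0], [-1.0, -1.0]]
excited = excite(toy)
print(excited)
```

In this toy case the field with the larger summary receives the larger gate, so its entries are amplified more than those of the other field, which is the strengthening and weakening effect the abstract refers to.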
Affiliation(s)
- Yunfei Yin
- College of Computer Science, Chongqing University, Chongqing, 400044, China.
- Jingqin Sun
- College of Computer Science, Chongqing University, Chongqing, 400044, China
- Xianjian Bao
- Maharishi University of Management, Fairfield, IA, USA
- Zhuowei Wang
- Australian Artificial Intelligence Institute, University of Technology Sydney, NSW, Australia
3
Fuhr ACFP, Gonçalves IDM, Santos LO, Salau NPG. Machine learning modeling and additive explanation techniques for glutathione production from multiple experimental growth conditions of Saccharomyces cerevisiae. Int J Biol Macromol 2024; 262:130035. [PMID: 38336325 DOI: 10.1016/j.ijbiomac.2024.130035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 01/27/2024] [Accepted: 02/05/2024] [Indexed: 02/12/2024]
Abstract
Glutathione (GSH) production is of great industrial interest due to its essential properties. This study aimed to use machine learning (ML) methods to model GSH production under different growth conditions of Saccharomyces cerevisiae, namely cultivation time, culture volume, pressure, and magnetic field application. Different ML and regression models were evaluated on their statistics to select the most robust model. Results showed that eXtreme Gradient Boosting (XGB) had the best predictive performance. Additive explanation techniques were then applied to this best model to identify the importance of each process feature. According to the variable analysis, the best conditions to obtain the highest GSH concentrations would be cultivation times of 72-96 h, low magnetic field intensity (3.02 mT), low pressure (0.5 kgf·cm⁻²), and high culture volume (3.5 L). The use of XGB with additive explanation techniques proved promising for determining process optimization conditions and selecting the essential process variables.
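Additive explanation techniques of the kind mentioned here are usually built on Shapley values, which split a prediction into per-feature contributions that sum to the difference from a baseline prediction. The sketch below computes exact Shapley values by enumerating feature subsets. The function `shapley_values`, the toy model and its inputs are hypothetical, and whether the authors used this exact formulation is an assumption; practical libraries rely on faster approximations or tree-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from the baseline.
        point = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(point)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi += weight * (value(members | {i}) - value(members))
        contributions.append(phi)
    return contributions


def toy_model(point):
    # Hypothetical response: one main effect plus one pairwise interaction.
    return 2.0 * point[0] + point[1] * point[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The interaction term is shared equally between the two features that produce it, and the contributions add up to the gap between the prediction and the baseline, which is the additive property these explanations rely on.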
4
Qin C, Zheng B, Zeng J, Chen Z, Zhai Y, Genovese A, Piuri V, Scotti F. Dynamically aggregating MLPs and CNNs for skin lesion segmentation with geometry regularization. Comput Methods Programs Biomed 2023; 238:107601. [PMID: 37210926 DOI: 10.1016/j.cmpb.2023.107601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 04/24/2023] [Accepted: 05/13/2023] [Indexed: 05/23/2023]
Abstract
BACKGROUND AND OBJECTIVE Melanoma is a highly malignant skin tumor. Accurate segmentation of skin lesions from dermoscopy images is pivotal for computer-aided diagnosis of melanoma. However, blurred lesion boundaries, variable lesion shapes, and other interference factors pose a challenge in this regard. METHODS This work proposes a novel framework called CFF-Net (Cross Feature Fusion Network) for supervised skin lesion segmentation. The encoder of the network includes dual branches, where the CNNs branch aims to extract rich local features while MLPs branch is used to establish both the global-spatial-dependencies and global-channel-dependencies for precise delineation of skin lesions. Besides, a feature-interaction module between two branches is designed for strengthening the feature representation by allowing dynamic exchange of spatial and channel information, so as to retain more spatial details and inhibit irrelevant noise. Moreover, an auxiliary prediction task is introduced to learn the global geometric information, highlighting the boundary of the skin lesion. RESULTS Comprehensive experiments using four publicly available skin lesion datasets (i.e., ISIC 2018, ISIC 2017, ISIC 2016, and PH2) indicated that CFF-Net outperformed the state-of-the-art models. In particular, CFF-Net greatly increased the average Jaccard Index score from 79.71% to 81.86% in ISIC 2018, from 78.03% to 80.21% in ISIC 2017, from 82.58% to 85.38% in ISIC 2016, and from 84.18% to 89.71% in PH2 compared with U-Net. Ablation studies demonstrated the effectiveness of each proposed component. Cross-validation experiments in ISIC 2018 and PH2 datasets verified the generalizability of CFF-Net under different skin lesion data distributions. Finally, comparison experiments using three public datasets demonstrated the superior performance of our model. 
CONCLUSION The proposed CFF-Net performed well in four public skin lesion datasets, especially for challenging cases with blurred edges of skin lesions and low contrast between skin lesions and background. CFF-Net can be employed for other segmentation tasks with better prediction and more accurate delineation of boundaries.
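The Jaccard Index used to report these segmentation results is the ratio of the intersection to the union of the predicted and reference lesion masks. A minimal sketch follows; the function name `jaccard_index` and the tiny binary masks are made up for illustration and are not taken from the datasets named above.

```python
def jaccard_index(pred_mask, true_mask):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred_pixel, true_pixel in zip(pred_row, true_row):
            if pred_pixel and true_pixel:
                intersection += 1
            if pred_pixel or true_pixel:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 1 marks lesion pixels, 0 marks background.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = jaccard_index(pred, truth)
print(f"Jaccard index: {score:.2f}")
```

Dataset-level scores such as those quoted in the abstract are typically the mean of this per-image value.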
Affiliation(s)
- Chuanbo Qin
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
- Bin Zheng
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
- Junying Zeng
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China.
- Zhuyuan Chen
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
- Yikui Zhai
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China
- Angelo Genovese
- Departimento di Information, Università degli Studi di Milano, 20133 Milano, Italy
- Vincenzo Piuri
- Departimento di Information, Università degli Studi di Milano, 20133 Milano, Italy
- Fabio Scotti
- Departimento di Information, Università degli Studi di Milano, 20133 Milano, Italy
5
Chen L, Sun ZL. PmliHFM: Predicting Plant miRNA-lncRNA Interactions with Hybrid Feature Mining Network. Interdiscip Sci 2023; 15:44-54. [PMID: 36223068 DOI: 10.1007/s12539-022-00540-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 09/27/2022] [Accepted: 09/27/2022] [Indexed: 11/07/2022]
Abstract
Interactions between microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) play a crucial role in biological processes, so studying their biological functions is necessary. Various computational methods have been employed to predict miRNA-lncRNA interactions, compensating for the limitations of biological experiments. However, the existing methods do not consider the differences between miRNA and lncRNA in feature extraction. In this paper, we propose a hybrid feature mining network, named PmliHFM, for predicting plant miRNA-lncRNA interactions. Firstly, miRNA and lncRNA, which have different sequence lengths, are encoded with different encodings, which reduces the loss of information caused by using the same coding approach for both. Then, a hybrid feature mining network is designed to adapt to the different encoding methods and to extract more useful feature information than a single network. Finally, an ensemble module integrates the training results of the hybrid feature mining network, while a prediction module determines whether an interaction exists. In tests on multiple test sets, PmliHFM outperforms several state-of-the-art approaches. The results show that the AUC of PmliHFM achieves 0.8[Formula: see text], 3.1[Formula: see text] and 0.4[Formula: see text] improvement respectively on three balanced datasets, and achieves 2.1[Formula: see text] and 1.8[Formula: see text] improvement respectively on two imbalanced datasets. These experiments demonstrate the feasibility of the proposed method.
Affiliation(s)
- Lin Chen
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, Anhui, China
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
- Zhan-Li Sun
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, Anhui, China.
- School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China.
6
Sha C, Cuperlovic-Culf M, Hu T. SMILE: systems metabolomics using interpretable learning and evolution. BMC Bioinformatics 2021; 22:284. [PMID: 34049495 PMCID: PMC8161935 DOI: 10.1186/s12859-021-04209-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/15/2021] [Accepted: 05/18/2021] [Indexed: 11/23/2022] Open
Abstract
Background The direct link between metabolism and cell and organism phenotype in health and disease makes metabolomics, a high-throughput study of small molecular metabolites, an essential methodology for understanding and diagnosing disease development and progression. Machine learning methods have seen increasing adoption in metabolomics thanks to their powerful prediction abilities. However, the “black-box” nature of many machine learning models remains a major challenge for wide acceptance and utility, as it makes the decision process difficult to interpret. This challenge is particularly prominent in biomedical research, where understanding the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge. Results In this article, we propose a novel computational framework, Systems Metabolomics using Interpretable Learning and Evolution (SMILE), for supervised metabolomics data analysis. Our methodology uses an evolutionary algorithm to learn interpretable predictive models and to identify the most influential metabolites and their interactions in association with disease. Moreover, we have developed a web application with a graphical user interface that can be used for easy analysis, interpretation and visualization of the results. The performance of the method and the use of the web interface are shown using metabolomics data for Alzheimer’s disease (AD). Conclusions SMILE was able to identify several metabolites that are influential in AD and to provide interpretable predictive models that can be further used for a better understanding of the metabolic background of AD. SMILE addresses the emerging issue of interpretability and explainability in machine learning, and contributes to more transparent and powerful applications of machine learning in bioinformatics. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04209-1.
Affiliation(s)
- Chengyuan Sha
- School of Computing, Queen's University, Kingston, ON, Canada
- Ting Hu
- School of Computing, Queen's University, Kingston, ON, Canada.
7
Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH. Relief-based feature selection: Introduction and review. J Biomed Inform 2018; 85:189-203. [PMID: 30031057 PMCID: PMC6299836 DOI: 10.1016/j.jbi.2018.07.014] [Citation(s) in RCA: 298] [Impact Index Per Article: 49.7] [Received: 01/15/2018] [Revised: 06/29/2018] [Accepted: 07/14/2018] [Indexed: 01/25/2023]
Abstract
Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that have gained appeal by striking an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
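The original Relief algorithm mentioned in this abstract can be sketched in a few lines. For each instance it finds the nearest neighbor of the same class (hit) and of a different class (miss), then rewards features that differ on the miss and penalizes features that differ on the hit. This sketch makes two simplifying assumptions that are not from the review: it visits every instance once instead of sampling at random, and it breaks distance ties by taking the first neighbor found. The function name `relief_weights` and the XOR toy data are hypothetical.

```python
from itertools import product


def relief_weights(X, y):
    """Feature weights from a deterministic variant of the original Relief."""
    n, d = len(X), len(X[0])
    # Value range of each feature, used to normalize differences to [0, 1].
    spans = [
        (max(row[f] for row in X) - min(row[f] for row in X)) or 1.0
        for f in range(d)
    ]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hit = miss = None  # (distance, index) of nearest same/other-class neighbor
        for j in range(n):
            if j == i:
                continue
            dist = distance(X[i], X[j])
            if y[j] == y[i]:
                if hit is None or dist < hit[0]:
                    hit = (dist, j)
            elif miss is None or dist < miss[0]:
                miss = (dist, j)
        if hit is None or miss is None:
            continue
        for f in range(d):
            weights[f] += (diff(f, X[i], X[miss[1]]) - diff(f, X[i], X[hit[1]])) / n
    return weights


# Toy data: class is the XOR of features 0 and 1; feature 2 is irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [a ^ b for a, b, _ in X]
weights = relief_weights(X, y)
print(weights)
```

Neither XOR feature is informative on its own, yet both receive positive weight while the irrelevant feature is penalized, which illustrates how Relief can be sensitive to interactions without evaluating feature combinations explicitly.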
Affiliation(s)
- Ryan J Urbanowicz
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- William La Cava
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Randal S Olson
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
8
Abstract
Background Biological data such as microarrays contain a huge number of features. Thus, it is necessary to select a small number of novel features to characterize the entire dataset. All combinations of the features subset must be evaluated to produce an ideal feature subset, but this is impossible using currently available computing power. Feature selection or feature subset selection provides a sub-optimal solution within a reasonable amount of time. Results In this study, we propose an improved feature selection method that uses information based on all the pairwise evaluations for a given dataset. We modify the original feature selection algorithms to use pre-evaluation information. The pre-evaluation captures the quality and interactions between two features. The feature subset should be improved by using the top ranking pairs for two features in the selection process. Conclusions Experimental results demonstrated that the proposed method improved the quality of the feature subset produced by modified feature selection algorithms. The proposed method can be applied to microarray and other high-dimensional data.
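The idea of pre-evaluating every pair of features and ranking the pairs can be sketched as follows. The abstract does not say which evaluation measure is used, so the majority-class purity score below is purely an assumption chosen for simplicity, as are the names `pair_score` and `rank_pairs` and the toy data.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def pair_score(X, y, a, b):
    """Fraction of samples classified correctly by a majority vote within each
    joint value of features a and b (a hypothetical pair-quality measure)."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[(row[a], row[b])][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(y)


def rank_pairs(X, y):
    """Score all feature pairs and return them sorted from best to worst."""
    n_features = len(X[0])
    scored = [
        (pair_score(X, y, a, b), (a, b))
        for a, b in combinations(range(n_features), 2)
    ]
    return sorted(scored, key=lambda item: -item[0])


# Toy discrete data: the class depends jointly on features 0 and 1 only.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [a ^ b for a, b, _ in X]
ranked = rank_pairs(X, y)
print(ranked)
```

A selection procedure could then seed or bias its search with the top-ranked pairs. The number of pairs grows quadratically with the number of features, so on microarray-scale data this pre-evaluation is the expensive step.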
Affiliation(s)
- Songlu Li
- Department of Nanobiomedical Science, Dankook University, Cheonan, 330-714, Korea; Department of Computer Science and Technologies, Yanbian University of Science & Technology, Yanji City, China
- Sejong Oh
- Department of Nanobiomedical Science, Dankook University, Cheonan, 330-714, Korea.