1
Zhao H, Qiu S, Bai M, Wang L, Wang Z. Toxicity prediction and classification of Gunqile-7 with small sample based on transfer learning method. Comput Biol Med 2024; 173:108348. [PMID: 38531249 DOI: 10.1016/j.compbiomed.2024.108348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 03/10/2024] [Accepted: 03/17/2024] [Indexed: 03/28/2024]
Abstract
Drug-induced diseases are the most important component of iatrogenic disease. It is the duty of doctors to provide a reasonable and safe dose of medication. Gunqile-7 is a Mongolian medicine with analgesic and anti-inflammatory effects. As a foreign substance in the body, even with reasonable medication, it may produce varying degrees of adverse reactions or toxic side effects. Since the cost of collecting Gunqile-7 for pharmacological animal trials is high and the data sample is small, this paper employs transfer learning and data augmentation methods to study the toxicity of Gunqile-7. More specifically, to reduce the necessary number of training samples, the data augmentation approach is employed to extend the data set. Then, the transfer learning method and one-dimensional convolutional neural network are utilized to train the network. In addition, we use the support vector machine-recursive feature elimination method for feature selection to reduce features that have adverse effects on model predictions. Furthermore, due to the important role of the pre-trained model of transfer learning, we select a quantitative toxicity prediction model as the pre-trained model, which is consistent with the purpose of this paper. Lastly, the experimental results demonstrate the efficiency of the proposed method. Our method can improve accuracy by up to 9 percentage points compared to the method without transfer learning on a small sample set.
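The feature selection step named in the abstract, support vector machine-recursive feature elimination (SVM-RFE), can be illustrated with a minimal sketch. It assumes a bias-free linear SVM trained by full-batch subgradient descent and the classic squared-weight ranking criterion; the function names, hyperparameters, and toy data are hypothetical and are not taken from the paper.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a bias-free linear SVM by full-batch subgradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # this sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep):
    """Return the indices of the n_keep features that survive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion (assumed): squared weight; drop the lowest-ranked feature.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        remaining.pop(worst)
    return remaining


# Toy data (hypothetical): features 0 and 1 follow the label,
# features 2 and 3 are uncorrelated with it.
y = [1, 1, 1, 1, -1, -1, -1, -1]
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
X = [[2 * yi, yi, a, b] for yi, a, b in zip(y, noise_a, noise_b)]

selected = svm_rfe(X, y, n_keep=2)
print(selected)
```

On this toy set the two label-correlated features are retained. A practical pipeline would more likely rely on an established SVM implementation and cross-validate the number of features kept.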
Affiliation(s)
- Hongkai Zhao
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
- Sen Qiu
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
- Meirong Bai
- Key Laboratory of Ministry of Education of Mongolian Medicine RD Engineering, Inner Mongolia Minzu University, Tongliao 028000, China.
- Luyao Wang
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
- Zhelong Wang
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
2
Wang X, Yang Z, Ma N, Sun X, Li H, Zhou J, Yu X. A novel hypoglycemia alarm framework for type 2 diabetes with high glycemic variability. International Journal for Numerical Methods in Biomedical Engineering 2024; 40:e3799. [PMID: 38148660 DOI: 10.1002/cnm.3799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Revised: 10/29/2023] [Accepted: 11/27/2023] [Indexed: 12/28/2023]
Abstract
In patients with type 2 diabetes (T2D), accurate prediction of hypoglycemic events is crucial for maintaining glycemic control and reducing their frequency. However, individuals with high blood glucose variability experience significant fluctuations over time, posing a challenge for early warning models that rely on static features. This article proposes a novel hypoglycemia early alarm framework based on dynamic feature selection. The framework incorporates domain knowledge and introduces multi-scale blood glucose features, including predicted values, essential for early warnings. To address the complexity of the feature matrix, a dynamic feature selection mechanism (Relief-SVM-RFE) is designed to effectively eliminate redundancy. Furthermore, the framework employs online updates for the random forest model, enhancing the learning of more relevant features. The effectiveness of the framework was evaluated using a clinical dataset. For T2D patients with a high coefficient of variation (CV), the framework achieved a sensitivity of 81.15% and specificity of 98.14%, accurately predicting most hypoglycemic events. Notably, the proposed method outperformed other existing approaches. These results indicate the feasibility of anticipating hypoglycemic events in T2D patients with high CV using this innovative framework.
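The quantities reported in this abstract (coefficient of variation, sensitivity, specificity) follow standard definitions, sketched below. The glucose readings and alarm labels are made-up illustrative values, and the use of the population standard deviation for the CV is an assumption; the paper may use a different convention.

```python
from statistics import mean, pstdev


def coefficient_of_variation(glucose):
    """CV in percent: standard deviation divided by the mean, times 100."""
    return 100.0 * pstdev(glucose) / mean(glucose)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = hypoglycemic event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical glucose readings (mmol/L) and alarm outcomes.
readings = [4.0, 6.0, 8.0, 10.0]
cv = coefficient_of_variation(readings)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(cv, 2), round(sens, 3), round(spec, 3))
```

Sensitivity is the fraction of true events that raise an alarm, and specificity is the fraction of non-events that correctly raise none.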
Affiliation(s)
- Xinzhuo Wang
- College of Information Science and Engineering, Northeastern University, Shenyang, China
- Zi Yang
- College of Information Science and Engineering, Northeastern University, Shenyang, China
- Ning Ma
- College of Information Science and Engineering, Northeastern University, Shenyang, China
- Xiaoyu Sun
- College of Information Science and Engineering, Northeastern University, Shenyang, China
- Hongru Li
- College of Information Science and Engineering, Northeastern University, Shenyang, China
- Jian Zhou
- Department of Endocrinology and Metabolism, Shanghai Clinical Center for Diabetes, Shanghai Diabetes Institute, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Xia Yu
- College of Information Science and Engineering, Northeastern University, Shenyang, China
3
Payra AK, Saha B, Ghosh A. MEM-FET: Essential protein prediction using membership feature and machine learning approach. Proteins 2024; 92:60-75. [PMID: 37638618 DOI: 10.1002/prot.26577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 02/21/2023] [Accepted: 08/08/2023] [Indexed: 08/29/2023]
Abstract
Proteins play key roles in many biological functions. A protein's functional roles are enhanced through interaction with other proteins compared with acting individually. Identifying the essential proteins of an organism in the wet lab is time-consuming and costly, although wet-lab observations offer high biological reliability and accuracy. Essential protein prediction using computational approaches is an alternative that has rapidly proven its significance and effectively reduces wet-lab experimental cost. Existing computational methods have been implemented using protein-protein interaction networks (PPIN), sequence, gene expression datasets (GED), Gene Ontology (GO), orthologous groups, and subcellular localization datasets. Machine learning offers diverse categories of features that make it possible to model and predict essential macromolecules of understudied organisms. A novel methodology, MEM-FET (membership feature), is proposed that predicts essential proteins based on the following features: edge clustering coefficient, average clustering coefficient, subcellular localization, and Gene Ontology within a compartment of common neighbors. The accuracy (ACC) values of the predicted true positive (TP) essential proteins are 0.79, 0.74, 0.78, and 0.71 for the YHQ, YMIPS, YDIP, and YMBD datasets, respectively. An enriched set of essential proteins is also predicted using the MEM-FET algorithm. Ensemble ML also validated the proposed model with an accuracy of 60%. The MEM-FET algorithm is reported to outperform other existing algorithms, with an ACC value of 80% for the yeast dataset.
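One of the features listed in this abstract, the edge clustering coefficient, can be sketched as follows. The definition used here (common neighbors of the two endpoints divided by the smaller of the two degrees minus one) is a commonly used form and is an assumption; the paper's exact formula, the function names, and the toy network are not taken from the source.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number."""
    common = len(adj[u] & adj[v])  # each common neighbor closes one triangle
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0  # 0 when an endpoint has degree 1


# Hypothetical toy interaction network.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("B", "F"), ("B", "G")]
adj = build_adjacency(edges)

ecc_ab = edge_clustering_coefficient(adj, "A", "B")
ecc_ac = edge_clustering_coefficient(adj, "A", "C")
ecc_ad = edge_clustering_coefficient(adj, "A", "D")
print(ecc_ab, ecc_ac, ecc_ad)
```

Edges embedded in many triangles score close to 1, which is why such a coefficient is often treated as a signal that two interacting proteins belong to the same dense module.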
Affiliation(s)
- Anjan Kumar Payra
- Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, India
- Banani Saha
- Department of Computer Science and Engineering, University of Calcutta, Kolkata, India
- Anupam Ghosh
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
4
Ding X, Li Y, Chen S. Maximum margin and global criterion based-recursive feature selection. Neural Netw 2024; 169:597-606. [PMID: 37956576 DOI: 10.1016/j.neunet.2023.10.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/19/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
In this research paper, we aim to investigate and address the limitations of recursive feature elimination (RFE) and its variants in high-dimensional feature selection tasks. We identify two main challenges associated with these methods. Firstly, the feature ranking criterion utilized in these approaches is inconsistent with the maximum-margin theory. Secondly, the computation of the criterion is performed locally, lacking the ability to measure the importance of features globally. To overcome these challenges, we propose a novel feature ranking criterion called Maximum Margin and Global (MMG) criterion. This criterion utilizes the classification margin to determine the importance of features and computes it globally, enabling a more accurate assessment of feature importance. Moreover, we introduce an optimal feature subset evaluation algorithm that leverages the MMG criterion to determine the best subset of features. To enhance the efficiency of the proposed algorithms, we provide two alpha seeding strategies that significantly reduce computational costs while maintaining high accuracy. These strategies offer a practical means to expedite the feature selection process. Through extensive experiments conducted on ten benchmark datasets, we demonstrate that our proposed algorithms outperform current state-of-the-art methods. Additionally, the alpha seeding strategies yield significant speedups, further enhancing the efficiency of the feature selection process.
Affiliation(s)
- Xiaojian Ding
- College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China.
- Yi Li
- College of Economics and Management, Nanjing Agricultural University, Nanjing 210095, China
- Shilin Chen
- Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, Nanjing 221005, China
5
Ding X, Yang F, Ma F, Chen S. A Unified Multi-Class Feature Selection Framework for Microarray Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3725-3736. [PMID: 37698974 DOI: 10.1109/tcbb.2023.3314432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
In feature selection research, simultaneous multi-class feature selection technologies are popular because they simultaneously select informative features for all classes. Recursive feature elimination (RFE) methods are state-of-the-art binary feature selection algorithms. However, extending existing RFE algorithms to multi-class tasks may increase the computational cost and lead to performance degradation. With this motivation, we introduce a unified multi-class feature selection (UFS) framework for randomization-based neural networks to address these challenges. First, we propose a new multi-class feature ranking criterion using the output weights of neural networks. The heuristic underlying this criterion is that "the importance of a feature should be related to the magnitude of the output weights of a neural network". Subsequently, the UFS framework utilizes the original features to construct a training model based on a randomization-based neural network, ranks these features by the criterion of the norm of the output weights, and recursively removes a feature with the lowest ranking score. Extensive experiments on 15 real-world datasets suggest that our proposed framework outperforms state-of-the-art algorithms. The code of UFS is available at https://github.com/SVMrelated/UFS.git.
6
Zhao S, Meng J, Kang Q, Luan Y. Identifying LncRNA-Encoded Short Peptides Using Optimized Hybrid Features and Ensemble Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2873-2881. [PMID: 34383651 DOI: 10.1109/tcbb.2021.3104288] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Long non-coding RNA (lncRNA) contains short open reading frames (sORFs), and sORF-encoded short peptides (SEPs) have become the focus of scientific studies due to their crucial role in life activities. The identification of SEPs is vital to further understanding their regulatory function. Bioinformatics methods can quickly identify SEPs to provide credible candidate sequences for verifying SEPs by biological experiments. However, there is a lack of methods for identifying SEPs directly. In this study, a machine learning method to identify SEPs of plant lncRNA (ISPL) is proposed. Hybrid features including sequence features and physicochemical features are extracted manually or adaptively to construct different modal features. To keep feature selection stable, the non-linear correction applied in Max-Relevance-Max-Distance (nocRD) feature selection method is proposed, which integrates multiple feature ranking results and uses the iterative random forest for dimensionality reduction of the different modal features. Classification models with different modal features are constructed, and their outputs are combined for ensemble classification. The experimental results show that the accuracy of ISPL is 89.86% on the independent test set, which will have important implications for further studies of functional genomics.
7
Rani D, Gill NS, Gulia P, Chatterjee JM. An Ensemble-Based Multiclass Classifier for Intrusion Detection Using Internet of Things. Computational Intelligence and Neuroscience 2022; 2022:1668676. [PMID: 35634069 PMCID: PMC9142322 DOI: 10.1155/2022/1668676] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/25/2022] [Accepted: 04/26/2022] [Indexed: 11/18/2022]
Abstract
The Internet of Things (IoT) is the fastest growing technology, with applications in various domains such as healthcare and transportation. It interconnects trillions of smart devices through the Internet. A secure network is the basic necessity of the Internet of Things. Due to the increasing rate of interconnected and remotely accessible smart devices, more and more cybersecurity issues are being witnessed among cyber-physical systems. A perfect intrusion detection system (IDS) can probably identify various cybersecurity issues and their sources. In this article, using various telemetry datasets of different Internet of Things scenarios, we show that external users can access the IoT devices and infer the victim user's activity by sniffing the network traffic. Further, the article presents the performance of various bagging and boosting ensemble decision tree techniques of machine learning in the design of an efficient IDS. Most previous IDSs focused only on good accuracy and ignored the execution speed, which must be improved to optimize the performance of an ID model. Most earlier research also focused on binary classification. This study attempts to evaluate the performance of various ensemble machine learning multiclass classification algorithms by deploying them on the openly available "TON-IoT" datasets of IoT and Industrial IoT (IIoT) sensors.
Affiliation(s)
- Deepti Rani
- Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India
- Nasib Singh Gill
- Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India
- Preeti Gulia
- Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India
- Jyotir Moy Chatterjee
- Department of Information Technology, Lord Buddha Education Foundation, Kathmandu, Nepal
8
Yuan Y, Quan T, Song Y, Guan J, Zhou T, Wu R. Noise-immune Extreme Ensemble Learning for Early Diagnosis of Neuropsychiatric Systemic Lupus Erythematosus. IEEE J Biomed Health Inform 2022; 26:3495-3506. [PMID: 35380977 DOI: 10.1109/jbhi.2022.3164937] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
Early diagnosis is currently the most effective way of saving the lives of patients with neuropsychiatric systemic lupus erythematosus (NPSLE). However, it is rather difficult to detect this disease at an early stage because its symptomatic signals are subtle and elusive. Recent studies show that the 1H-MRS (proton magnetic resonance spectroscopy) imaging technique can capture more information reflecting the early appearance of this disease than conventional magnetic resonance imaging techniques. 1H-MRS data, however, also contain more noise, which can introduce serious diagnostic bias. We hence proposed a noise-immune extreme ensemble learning technique for effectively leveraging 1H-MRS data to advance the early diagnosis of NPSLE. Our main results are that 1) by developing a generalized maximum correntropy criterion in the kernel extreme learning setting, many types of non-Gaussian noise can be distinguished, and 2) weighted recursive feature elimination, using the maximal information coefficient to weight each feature's importance, helps to further alleviate the adverse impact of noise on diagnostic performance. The proposed method is assessed on a publicly available dataset with 97.5% accuracy, 95.8% sensitivity, and 99.9% specificity, which demonstrates its efficacy.
9
Ye Q, Zhang X, Lin X. Drug-target interaction prediction via multiple classification strategies. BMC Bioinformatics 2022; 22:461. [PMID: 35057737 PMCID: PMC8772044 DOI: 10.1186/s12859-021-04366-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Accepted: 09/08/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Computational prediction of the interaction between drugs and protein targets is very important for new drug discovery, as the experimental determination of drug-target interaction (DTI) is expensive and time-consuming. However, different protein targets have very different numbers of interactions. Specifically, most interactions concentrate on only a few targets. As a result, targets with larger numbers of interactions may have enough positive samples for predicting their interactions, but the positive samples for targets with smaller numbers of interactions may be insufficient. Using only one classification strategy may not be able to deal with both cases at the same time. To overcome this problem, in this paper, a drug-target interaction prediction method based on multiple classification strategies (MCSDTI) is proposed. In MCSDTI, targets are first divided into two parts according to their number of interactions, where one part contains targets with smaller numbers of interactions (TWSNI) and the other part contains targets with larger numbers of interactions (TWLNI). Different classification strategies are then designed for TWSNI and TWLNI, respectively, to predict the interactions. Furthermore, TWSNI and TWLNI are evaluated independently, which overcomes the problem that the result could be mainly determined by targets with large numbers of interactions when all targets are evaluated together. RESULTS We propose a new drug-target interaction prediction method (MCSDTI), which uses multiple classification strategies. MCSDTI is tested on five DTI datasets, namely nuclear receptors (NR), ion channels (IC), G protein coupled receptors (GPCR), enzymes (E), and drug bank (DB). Experiments show that the AUCs of our method are respectively 3.31%, 1.27%, 2.02%, 2.02% and 1.04% higher than that of the second best methods on NR, IC, GPCR and E for TWLNI; and the AUCs of our method are respectively 1.00%, 3.20% and 2.70% higher than the second best methods on NR, IC, and E for TWSNI. CONCLUSION MCSDTI is a competitive method compared to the previous methods for all target parts on most datasets, which demonstrates that using different classification strategies for different target parts is an effective way to improve the effectiveness of DTI prediction.
Affiliation(s)
- Qing Ye
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
- Xiaolong Zhang
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China.
- Xiaoli Lin
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
10
Xiang H, Li A, Lin X. An Optimization Method for Drug-Target Interaction Prediction Based on RandSAS Strategy. Lecture Notes in Computer Science 2022:547-555. [DOI: 10.1007/978-3-031-13829-4_47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
11
Lin X, Zhang X. Identification of hot regions in hub protein-protein interactions by clustering and PPRA optimization. BMC Med Inform Decis Mak 2021; 21:143. [PMID: 33941163 PMCID: PMC8094484 DOI: 10.1186/s12911-020-01350-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/19/2020] [Accepted: 11/23/2020] [Indexed: 11/24/2022] Open
Abstract
Background Protein-protein interactions (PPIs) are the core of protein function and provide an effective means to understand function at the cell level. Identification of PPIs is the crucial foundation of predicting drug-target interactions. Although traditional biological experiments for identifying PPIs are becoming available, these experiments remain extremely time-consuming and expensive. Therefore, various computational models have been introduced to identify PPIs. In a protein-protein interaction network (PPIN), a Hub protein, as a highly connected node, can coordinate PPIs and perform biological functions. Detecting hot regions on Hub protein interaction interfaces is an issue worth investigating. Methods Two clustering methods, LCSD and RCNOIK, are used in this paper to detect the hot regions on Hub protein interaction interfaces. To improve the efficiency of the K-means clustering algorithm, the best k value is selected by calculating the sum of squared distances and the average silhouette coefficients. Then, the optimization of residue coordination number strategy is used to calculate the average coordination number. In addition, the pair potentials and relative ASA (PPRA) strategy is used to optimize the predicted results. Results The DataHub and PartyHub datasets were used to train the two clustering models, respectively. Experiments show that LCSD and RCNOIK have the same coverage on the Hub protein datasets, and RCNOIK is slightly higher than LCSD in precision. The predicted hot regions are closer to the standard hot regions. Conclusions This paper optimizes two clustering methods based on the PPRA strategy. Compared with well-known approaches for hot region prediction, our improved methods have higher reliability and are effective for predicting hot regions on Hub protein interaction interfaces.
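The k-selection step described in the Methods (sum of squared distances plus average silhouette coefficient) can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate labelings are supplied by hand rather than produced by a K-means run, the points are made-up 1-D values, singleton clusters receive a silhouette of 0 by convention, and the function names are hypothetical.

```python
from math import dist


def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def mean_silhouette(points, labels):
    """Average silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette set to 0
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data and candidate labelings, keyed by the number of clusters k.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}

sse_by_k = {k: sse(points, labs) for k, labs in candidates.items()}
sil_by_k = {k: mean_silhouette(points, labs) for k, labs in candidates.items()}
best_k = max(sil_by_k, key=sil_by_k.get)
print(sse_by_k, sil_by_k, best_k)
```

The sum of squared distances always shrinks as k grows, so it is usually read for an elbow, while the silhouette average gives a direct score to maximize; here the silhouette favors k = 2.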
Affiliation(s)
- Xiaoli Lin
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, People's Republic of China.
- Xiaolong Zhang
- Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430065, People's Republic of China
12
Upadhyay D, Manero J, Zaman M, Sampalli S. Gradient Boosting Feature Selection With Machine Learning Classifiers for Intrusion Detection on Power Grids. IEEE Transactions on Network and Service Management 2021. [DOI: 10.1109/tnsm.2020.3032618] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/06/2022]