1
|
Zhong L, Huang R, Gao L, Yue J, Zhao B, Nie L, Li L, Wu A, Zhang K, Meng Z, Cao G, Zhang H, Zang H. A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration. Molecules 2023; 28:5672. [PMID: 37570642 PMCID: PMC10419756 DOI: 10.3390/molecules28155672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 07/17/2023] [Accepted: 07/19/2023] [Indexed: 08/13/2023]
Abstract
Variable (wavelength) selection is essential in the multivariate analysis of near-infrared spectra to improve model performance and provide a more straightforward interpretation. This paper proposed a new variable selection method, binning-normalized mutual information (B-NMI), based on information entropy theory. "Data binning" was applied to reduce the effects of minor measurement errors and enhance the features of near-infrared spectra. "Normalized mutual information" was employed to calculate the correlation between each wavelength and the reference values. The performance of B-NMI was evaluated on two experimental datasets (an ideal ternary solvent mixture dataset and a fluidized bed granulation dataset) and two public datasets (a gasoline octane dataset and a corn protein dataset). Compared with the classic methods of backward interval PLS (BiPLS), variable importance in projection (VIP), correlation coefficient (CC), uninformative variable elimination (UVE), and competitive adaptive reweighted sampling (CARS), B-NMI not only selected the most characteristic wavelengths from the spectra of complex real-world samples but also improved the stability and robustness of the variable selection results.
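As a rough illustration of the idea in this abstract, the sketch below bins each wavelength and the reference values, then scores each wavelength by a normalized mutual information. It is not the authors' implementation: the equal-width binning, the normalization by sqrt(H(X)·H(Y)), the default of four bins, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter

def equal_width_bins(values, n_bins):
    """Map each value to an integer bin index in [0, n_bins - 1] (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable falls into a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(x_bins, y_bins):
    """MI(X;Y) / sqrt(H(X) * H(Y)); this normalization is one common choice, assumed here."""
    h_x, h_y = entropy(x_bins), entropy(y_bins)
    h_xy = entropy(list(zip(x_bins, y_bins)))
    mi = h_x + h_y - h_xy
    denom = math.sqrt(h_x * h_y)
    return mi / denom if denom > 0 else 0.0

def rank_wavelengths(spectra, reference, n_bins=4):
    """Score every wavelength (column of `spectra`, samples x wavelengths) against `reference`."""
    y_bins = equal_width_bins(reference, n_bins)
    scores = []
    for j in range(len(spectra[0])):
        column = [row[j] for row in spectra]
        scores.append(normalized_mutual_information(equal_width_bins(column, n_bins), y_bins))
    return scores
```

Under these assumptions, a wavelength whose binned values match the binned reference exactly scores about 1.0, and a constant wavelength scores 0.0; wavelengths could then be kept if their score exceeds a chosen threshold.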
Affiliation(s)
- Liang Zhong, Ruiqi Huang, Lele Gao, Jianan Yue, Bing Zhao, Lei Nie, Lian Li, Aoli Wu, Kefan Zhang: NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- Zhaoqing Meng, Guiyun Cao: Shandong Hongjitang Pharmaceutical Group Co. Ltd., Jinan 250103, China
- Hui Zhang: NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China; National Glycoengineering Research Center, Shandong University, Jinan 250012, China
- Hengchang Zang: NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan 250012, China; National Glycoengineering Research Center, Shandong University, Jinan 250012, China; Key Laboratory of Chemical Biology, Ministry of Education, Shandong University, Jinan 250012, China
|
2
|
Monotone submodular subset for sentiment analysis of online reviews. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05845-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
3
|
Some Applications of ANN to Solar Radiation Estimation and Forecasting for Energy Applications. Appl Sci (Basel) 2019. [DOI: 10.3390/app9010209] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 11/16/2022]
Abstract
In solar energy, knowledge of solar radiation is very important for integrating energy systems into buildings or electrical networks. Global horizontal irradiation (GHI) is rarely measured worldwide, so an artificial neural network (ANN) model was built to calculate it from more widely available data. For the estimation of 5-min GHI, the normalized root mean square error (nRMSE) of the 6-input model is 19.35%. As solar collectors are often tilted, a second ANN model was developed to transform GHI into global tilted irradiation (GTI), a difficult task due to the anisotropy of scattering phenomena in the atmosphere. The GTI calculation from GHI was realized with an nRMSE of around 8% for the optimal configuration. These two models estimate solar data at time t from other data measured at the same time t. For optimal energy management, the development of forecasting tools is crucial because it allows anticipation of the production/consumption balance; thus, ANN models were developed to forecast hourly direct normal irradiation (DNI) and GHI for time horizons from one hour (h+1) to six hours (h+6). The forecasting of hourly solar irradiation from h+1 to h+6 using ANN was realized with an nRMSE from 22.57% for h+1 to 34.85% for h+6 for GHI and from 38.23% for h+1 to 61.88% for h+6 for DNI.
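For readers unfamiliar with the error metric quoted in this abstract, the sketch below computes an nRMSE. The abstract does not state how the RMSE is normalized; dividing by the mean of the measured values is a common convention in solar forecasting and is assumed here, as is the function name.

```python
import math

def nrmse(measured, predicted):
    """RMSE divided by the mean of the measured values (assumed normalization)."""
    n = len(measured)
    rmse = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    return rmse / (sum(measured) / n)

# Hypothetical irradiation values, for illustration only
error = nrmse([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(f"nRMSE = {100 * error:.2f}%")
```

Other normalizations (for example by the range of the measurements) give different percentages, so values are comparable only when the same convention is used.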
|
5
|
Tal O, Tran TD. New perspectives on multilocus ancestry informativeness. Math Biosci 2018; 306:60-81. [PMID: 30385120 DOI: 10.1016/j.mbs.2018.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2018] [Revised: 10/24/2018] [Accepted: 10/25/2018] [Indexed: 10/28/2022]
Abstract
We present an axiomatic approach for multilocus informativeness measures for determining the amount of information that a set of polymorphic genetic markers provides about individual ancestry. We then reveal several surprising properties of a decision-theoretic based measure that is consistent with the set of proposed criteria for multilocus informativeness. In particular, these properties highlight the interplay between information originating from population priors and the information extractable from the population genetic variants. This analysis then reveals a certain deficiency of mutual information based multilocus informativeness measures when such population priors are incorporated. Finally, we analyse and quantify the inevitable inherent decrease in informativeness due to learning from finite population samples.
Affiliation(s)
- Omri Tal, Tat Dat Tran: Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22, Leipzig D-04103, Germany
|
7
|
Lai CM. Multi-objective simplified swarm optimization with weighting scheme for gene selection. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.12.049] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/17/2022]
|
9
|
Lu H, Chen J, Yan K, Jin Q, Xue Y, Gao Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.07.080] [Citation(s) in RCA: 177] [Impact Index Per Article: 25.3] [Indexed: 12/31/2022]
|
10
|
Lai CM, Yeh WC, Chang CY. Gene selection using information gain and improved simplified swarm optimization. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.08.089] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 10/21/2022]
|
12
|
Wei M, Chow TW, Chan RH. Heterogeneous feature subset selection using mutual information-based feature transformation. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.05.053] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/15/2022]
|
15
|
Zhang Y, Yang C, Yang A, Xiong C, Zhou X, Zhang Z. Feature selection for classification with class-separability strategy and data envelopment analysis. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.03.081] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]
|
16
|
Chen C, Yan X. Optimization of a multilayer neural network by using minimal redundancy maximal relevance-partial mutual information clustering with least square regression. IEEE Trans Neural Netw Learn Syst 2015; 26:1177-1187. [PMID: 25055386 DOI: 10.1109/tnnls.2014.2334599] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 06/03/2023]
Abstract
In this paper, an optimized multilayer feed-forward network (MLFN) is developed to construct a soft sensor for controlling naphtha dry point. To overcome the two main flaws in the structure and weight of MLFNs, which are trained by a back-propagation learning algorithm, minimal redundancy maximal relevance-partial mutual information clustering (mPMIc) integrated with least square regression (LSR) is proposed to optimize the MLFN. The mPMIc can determine the location of hidden layer nodes using information in the hidden and output layers, as well as remove redundant hidden layer nodes. These selected nodes are highly related to output data, but are minimally correlated with other hidden layer nodes. The weights between the selected hidden layer nodes and output layer are then updated through LSR. When the redundant nodes from the hidden layer are removed, the ideal MLFN structure can be obtained according to the test error results. In actual applications, the naphtha dry point must be controlled accurately because it strongly affects the production yield and the stability of subsequent operational processes. The mPMIc-LSR MLFN with a simple network size performs better than other improved MLFN variants and existing efficient models.
|
17
|
Zhang Y, Yang A, Xiong C, Wang T, Zhang Z. Feature selection using data envelopment analysis. Knowl Based Syst 2014. [DOI: 10.1016/j.knosys.2014.03.022] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/24/2022]
|
18
|
Chang CJ, Li DC, Dai WL, Chen CC. A latent information function to extend domain attributes to improve the accuracy of small-data-set forecasting. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.09.024] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 10/26/2022]
|
19
|
Singh B, Kushwaha N, Vyas OP. A Feature Subset Selection Technique for High Dimensional Data Using Symmetric Uncertainty. Journal of Data Analysis and Information Processing 2014. [DOI: 10.4236/jdaip.2014.24012] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/20/2022]
|
22
|
Fan Y, Qin S. A new method of image classification based on local appearance and context information. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.04.041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
23
|
Abstract
In this paper, a novel feature selection method based on rough sets and mutual information is proposed. The dependency of each feature guides the selection, and mutual information is employed to discard features that do not significantly increase the dependency. As a result, the dependency of the subset found by our method reaches its maximum with a small number of features. Since our method evaluates both definitive relevance and uncertain relevance by a combined selection criterion of dependency and a class-based distance metric, the feature subset is more relevant than those of other rough-set-based methods, and the subset is a near-optimal solution. To verify the contribution, eight different classification applications are employed. Our method is also applied to a real Alzheimer's disease dataset, where it finds a feature subset with a classification accuracy of 81.3%. These results verify the contribution of our method.
Affiliation(s)
- Bing Li, Tommy W S Chow: Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
- Di Huang: Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
24
|
Maji P, Garai P. On fuzzy-rough attribute selection: Criteria of Max-Dependency, Max-Relevance, Min-Redundancy, and Max-Significance. Appl Soft Comput 2013. [DOI: 10.1016/j.asoc.2012.09.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 10/27/2022]
|
27
|
Sun X, Liu Y, Xu M, Chen H, Han J, Wang K. Feature selection using dynamic weights for classification. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2012.10.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 10/27/2022]
|
28
|
Feature selection based on cluster and variability analyses for ordinal multi-class classification problems. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2012.07.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
|
30
|
Ghasemi JB, Zolfonoun E. A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models. Bull Korean Chem Soc 2012. [DOI: 10.5012/bkcs.2012.33.5.1527] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
|
31
|
Oveisi F, Oveisi S, Erfanian A, Patras I. Tree-structured feature extraction using mutual information. IEEE Trans Neural Netw Learn Syst 2012; 23:127-137. [PMID: 24808462 DOI: 10.1109/tnnls.2011.2178447] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 06/03/2023]
Abstract
One of the most informative measures for feature extraction (FE) is mutual information (MI). In terms of MI, the optimal FE creates new features that jointly have the largest dependency on the target class. However, obtaining an accurate estimate of a high-dimensional MI as well as optimizing with respect to it is not always easy, especially when only small training sets are available. In this paper, we propose an efficient tree-based method for FE in which at each step a new feature is created by selecting and linearly combining two features such that the MI between the new feature and the class is maximized. Both the selection of the features to be combined and the estimation of the coefficients of the linear transform rely on estimating 2-D MIs. The estimation of the latter is computationally very efficient and robust. The effectiveness of our method is evaluated on several real-world data sets. The results show that the classification accuracy obtained by the proposed method is higher than that achieved by other FE methods.
|
32
|
RETRACTED ARTICLE: Feature selection for machine learning classification problems: a recent overview. Artif Intell Rev 2011. [DOI: 10.1007/s10462-011-9230-1] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/25/2022]
|
33
|
Han M, Liang Z, Li D. Sparse kernel density estimations and its application in variable selection based on quadratic Renyi entropy. Neurocomputing 2011. [DOI: 10.1016/j.neucom.2011.01.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
|
34
|
Liu H, Liu L, Zhang H. Boosting feature selection using information metric for classification. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2009.08.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 10/20/2022]
|
35
|
Chow T, Wang P, Ma E. A New Feature Selection Scheme Using a Data Distribution Factor for Unsupervised Nominal Data. IEEE Trans Syst Man Cybern B Cybern 2008; 38:499-509. [DOI: 10.1109/tsmcb.2007.914707] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
|
38
|
Huang D, Chow T, Ma E, Li J. Efficient selection of discriminative genes from microarray gene expression data for cancer diagnosis. IEEE Trans Circuits Syst I Regul Pap 2005. [DOI: 10.1109/tcsi.2005.852013] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
|