1
Hong S, Zhang Y, Li X, Teng A, Li L, Chen H. New approach for near-infrared wavelength selection using a combination of MIC and firefly evolution. Spectrochim Acta A Mol Biomol Spectrosc 2024; 316:124343. [PMID: 38676985 DOI: 10.1016/j.saa.2024.124343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 04/03/2024] [Accepted: 04/23/2024] [Indexed: 04/29/2024]
Abstract
Full-spectrum data analysis faces a major problem: the variables are highly collinear and correlated. Spectral wavelength selection remains an active topic in quantitative and qualitative analysis. In this paper, we propose a new approach for near-infrared (NIR) wavelength selection. The strategy combines a modification of the maximum information coefficient (MIC) method with an improvement of the firefly evolutionary algorithm. We introduce orthogonal decomposition to modify the MIC method, so as to search for the informative signals contained in the projection vectors. We also recast the common firefly algorithm (FA) in a discretized mode and design a novel adaptive mapping function to improve its search performance. In the experiment, the modified MIC (MICm) method and the adaptive discrete FA (DFAadp) are combined for joint optimization of the NIR calibration model. The combined modeling strategy is applied to the quantitative analysis of fishmeal samples in order to select their informative variables (wavelengths). Experimental results indicate that the combination of MICm and DFAadp performs better than the traditional MIC method and the common DFA. We conclude that the proposed combined optimization strategy is beneficial for wavelength selection in NIR spectral analysis. It is expected to be validated in a wider range of applications.
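As a rough illustration of the discretized firefly idea in the abstract above, the following is a minimal sketch and not the authors' DFAadp. It assumes a plain sigmoid with a fixed 0.5 cut in place of the paper's adaptive mapping function, and it scores a wavelength subset by the validation RMSE of an ordinary least-squares fit. The function names (`rmse_of_subset`, `discrete_firefly`) and all parameter values are hypothetical.

```python
import numpy as np

def rmse_of_subset(mask, X_train, y_train, X_val, y_val):
    """Validation RMSE of a least-squares model restricted to the masked columns."""
    if not mask.any():
        return np.inf  # an empty subset cannot predict anything
    A = np.column_stack([X_train[:, mask], np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([X_val[:, mask], np.ones(len(X_val))])
    return float(np.sqrt(np.mean((B @ coef - y_val) ** 2)))

def discrete_firefly(X_train, y_train, X_val, y_val, n_fireflies=12, n_iter=30,
                     beta0=1.0, gamma=0.01, alpha=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    pos = rng.normal(size=(n_fireflies, n_features))  # continuous positions

    def to_mask(p):
        # Assumption: fixed sigmoid mapping, a stand-in for the adaptive mapping.
        return 1.0 / (1.0 + np.exp(-p)) > 0.5

    def cost(p):
        return rmse_of_subset(to_mask(p), X_train, y_train, X_val, y_val)

    costs = np.array([cost(p) for p in pos])
    best_idx = int(np.argmin(costs))
    best_mask, best_cost = to_mask(pos[best_idx]), costs[best_idx]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter (lower RMSE)
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)  # attraction decays with distance
                    pos[i] += beta * (pos[j] - pos[i]) + alpha * (rng.random(n_features) - 0.5)
                    costs[i] = cost(pos[i])
                    if costs[i] < best_cost:
                        best_mask, best_cost = to_mask(pos[i]), costs[i]
    return best_mask, float(best_cost)
```

The returned boolean mask marks the selected columns (wavelengths). In practice a cross-validated PLS model would be a more typical fitness function for NIR calibration than the least-squares fit assumed here.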
Affiliation(s)
- Shaoyong Hong
  - School of Data Science, Guangzhou Huashang College, Guangzhou 511300, China
- Youyou Zhang
  - Department of General Education, Xuzhou College of Industrial Technology, Xuzhou, 221140, China
- Xinyi Li
  - School of Data Science, Guangzhou Huashang College, Guangzhou 511300, China
- An Teng
  - School of Data Science, Guangzhou Huashang College, Guangzhou 511300, China
- Linghui Li
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macau SAR 999078, China
- Huazhou Chen
  - School of Mathematics and Statistics, Guilin University of Technology, Guilin 541004, China.
2
Ge R, Zhou M, Luo Y, Meng Q, Mai G, Ma D, Wang G, Zhou F. McTwo: a two-step feature selection algorithm based on maximal information coefficient. BMC Bioinformatics 2016; 17:142. [PMID: 27006077 PMCID: PMC4804474 DOI: 10.1186/s12859-016-0990-0] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 12/02/2015] [Accepted: 03/14/2016] [Indexed: 01/16/2023]
Abstract
BACKGROUND: High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets.
RESULTS: This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature.
CONCLUSION: McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.
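The two-step idea in the abstract above (first filter features that are associated with the phenotype and independent of each other, then search for a subset that a nearest-neighbor classifier handles well) can be sketched as follows. This is one plausible reading of the abstract, not the published McTwo procedure. Absolute Pearson correlation is swapped in for MIC so that the sketch depends only on NumPy, and the function names, the threshold, and the greedy forward search are all assumptions.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation, used here as a stand-in for MIC."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else float(abs(a @ b) / denom)

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return float(np.mean(y[np.argmin(d, axis=1)] == y))

def two_step_select(X, y, relevance_threshold=0.2):
    # Step 1: keep relevant features, dropping any that depend more strongly
    # on an already-kept feature than on the label.
    relevance = np.array([abs_corr(X[:, k], y) for k in range(X.shape[1])])
    candidates = []
    for k in np.argsort(-relevance):  # most relevant first
        if relevance[k] < relevance_threshold:
            break
        if all(abs_corr(X[:, k], X[:, c]) < relevance[k] for c in candidates):
            candidates.append(int(k))

    # Step 2: greedy forward search; stop when no candidate improves accuracy.
    selected, best_acc = [], 0.0
    while True:
        trials = [(loo_1nn_accuracy(X[:, selected + [k]], y), k)
                  for k in candidates if k not in selected]
        if not trials:
            break
        acc, k = max(trials)
        if acc <= best_acc:
            break
        selected.append(k)
        best_acc = acc
    return selected, best_acc
```

With a real MIC implementation available, `abs_corr` would be the only function to replace. The redundancy rule in step 1 is what keeps near-duplicate features from entering the candidate set together.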
Affiliation(s)
- Ruiquan Ge
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
  - Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China
- Manli Zhou
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
  - Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China
- Youxi Luo
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
  - School of Science, Hubei University of Technology, Wuhan, Hubei, 430068, P.R. China
- Qinghan Meng
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
  - Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China
- Guoqin Mai
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
- Dongli Ma
  - Shenzhen Children's Hospital, Shenzhen, Guangdong, 518026, P.R. China.
- Guoqing Wang
  - Department of Pathogenobiology, Basic Medical College of Jilin University, Changchun, Jilin, China.
- Fengfeng Zhou
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China.