Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Alirezanejad M, Enayatifar R, Motameni H, Nematzadeh H. Heuristic filter feature selection methods for medical datasets. Genomics 2020;112:1173-81. [DOI: 10.1016/j.ygeno.2019.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2019] [Revised: 06/19/2019] [Accepted: 07/01/2019] [Indexed: 11/23/2022]

For:	Alirezanejad M, Enayatifar R, Motameni H, Nematzadeh H. Heuristic filter feature selection methods for medical datasets. Genomics 2020;112:1173-81. [DOI: 10.1016/j.ygeno.2019.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2019] [Revised: 06/19/2019] [Accepted: 07/01/2019] [Indexed: 11/23/2022]

Number

Cited by Other Article(s)

Robert Vincent ACS, Sengan S. Effective clinical decision support implementation using a multi filter and wrapper optimisation model for Internet of Things based healthcare data. Sci Rep 2024;14:21820. [PMID: 39294200 PMCID: PMC11410983 DOI: 10.1038/s41598-024-71726-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2024] [Accepted: 08/30/2024] [Indexed: 09/20/2024] Open

Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023;24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023] Open

Navin K, Nehemiah HK, Nancy Jane Y, Veena Saroji H. A classification framework using filter–wrapper based feature selection approach for the diagnosis of congenital heart failure. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2023. [DOI: 10.3233/jifs-221348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

Robust classification of heart valve sound based on adaptive EMD and feature fusion. PLoS One 2022;17:e0276264. [PMID: 36480575 PMCID: PMC9731417 DOI: 10.1371/journal.pone.0276264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Accepted: 10/03/2022] [Indexed: 12/13/2022] Open

Nematzadeh H, García-Nieto J, Navas-Delgado I, Aldana-Montes JF. Automatic frequency-based feature selection using discrete weighted evolution strategy. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Abdelwahed NM, El-Tawel GS, Makhlouf MA. Effective hybrid feature selection using different bootstrap enhances cancers classification performance. BioData Min 2022;15:24. [PMID: 36175944 PMCID: PMC9523996 DOI: 10.1186/s13040-022-00304-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Accepted: 08/31/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Machine learning can be used to predict the different onset of human cancers. Highly dimensional data have enormous, complicated problems. One of these is an excessive number of genes plus over-fitting, fitting time, and classification accuracy. Recursive Feature Elimination (RFE) is a wrapper method for selecting the best subset of features that cause the best accuracy. Despite the high performance of RFE, time computation and over-fitting are two disadvantages of this algorithm. Random forest for selection (RFS) proves its effectiveness in selecting the effective features and improving the over-fitting problem.

METHOD

This paper proposed a method, namely, positions first bootstrap step (PFBS) random forest selection recursive feature elimination (RFS-RFE) and its abbreviation is PFBS- RFS-RFE to enhance cancer classification performance. It used a bootstrap with many positions included in the outer first bootstrap step (OFBS), inner first bootstrap step (IFBS), and outer/ inner first bootstrap step (O/IFBS). In the first position, OFBS is applied as a resampling method (bootstrap) with replacement before selection step. The RFS is applied with bootstrap = false i.e., the whole datasets are used to build each tree. The importance features are hybrid with RFE to select the most relevant subset of features. In the second position, IFBS is applied as a resampling method (bootstrap) with replacement during applied RFS. The importance features are hybrid with RFE. In the third position, O/IFBS is applied as a hybrid of first and second positions. RFE used logistic regression (LR) as an estimator. The proposed methods are incorporated with four classifiers to solve the feature selection problems and modify the performance of RFE, in which five datasets with different size are used to assess the performance of the PFBS-RFS-RFE.

RESULTS

The results showed that the O/IFBS-RFS-RFE achieved the best performance compared with previous work and enhanced the accuracy, variance and ROC area for RNA gene and dermatology erythemato-squamous diseases datasets to become 99.994%, 0.0000004, 1.000 and 100.000%, 0.0 and 1.000, respectively.

CONCLUSION

High dimensional datasets and RFE algorithm face many troubles in cancers classification performance. PFBS-RFS-RFE is proposed to fix these troubles with different positions. The importance features which extracted from RFS are used with RFE to obtain the effective features.

Collapse

Abasabadi S, Nematzadeh H, Motameni H, Akbari E. Hybrid feature selection based on SLI and genetic algorithm for microarray datasets. THE JOURNAL OF SUPERCOMPUTING 2022;78:19725-19753. [PMID: 35789817 PMCID: PMC9244444 DOI: 10.1007/s11227-022-04650-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 06/08/2022] [Indexed: 06/15/2023]

Anuratha K, Parvathy M. Twitter Sentiment Analysis Using Social-Spider Lex Feature-Based Syntactic-Senti Rule Recurrent Neural Network Classification. INT J UNCERTAIN FUZZ 2022. [DOI: 10.1142/s0218488522400037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Abstract Social platforms have become one of the major sources of unstructured text. Investigating the unstructured text and interpreting the meaning is a complex job. Sentiment Analysis is an emerging approach as the social platforms have lot of opinionated data. 1 It uses language processing, classification of texts and linguistics to retrieve the opinions from the text. Twitter is a micro blogging site which is popular amongst the social users as it is a vast open data-platform and it witnesses lot of sentiments. Twitter Sentiment Analysis is a process of automatic mining of user tweets for opinions, emotions, attitude to derive useful insights into community opinions and classify the opinions as well. Due to the enormous increase in the number of collaborative tweets, it has become complex to identify the terms that carries sentiments. Also, the unstructured tweets may have non-relevant terms and reduce the classification accuracy. To address these issues, we propose a Social-Spider Lex Feature Ensemble Model-Based Syntactic-Senti Rule prediction Recurrent Neural Network Classifier (S2LFEM-S2RRNN) to obtain better classification accuracy. Twitter is used as source of data and we have extracted the tweets using Twitter API. Initially, data pre-processing is done to remove unwanted data, symbols and content terms are extracted to improvise the dataset. Then, the significant lexical content terms are extracted employing the proposed Social Spider Lex Feature Ensemble Model (S2LFEM) based on Syntactic-Senti Rule Prediction. The semantics 4 of the terms are analysed on the verbs, subjectivity of the tweet patterns to count the overall weightage of tweets. Based on tweet weightage Recurrent Neural Network is trained to classify the tweets int to positive, negative and neutral. The experiment results show that the proposed classifier outperforms the existing models for sentiment classification in terms of accuracy with a performance score 94.1%. Collapse

Zhang H, Gong M, Nie F, Li X. Unified Dual-label Semi-supervised Learning with Top-k Feature Selection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Wang Y, Gao X, Ru X, Sun P, Wang J. A hybrid feature selection algorithm and its application in bioinformatics. PeerJ Comput Sci 2022;8:e933. [PMID: 35494789 PMCID: PMC9044222 DOI: 10.7717/peerj-cs.933] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2022] [Accepted: 03/03/2022] [Indexed: 06/14/2023]

Alvarez-Gonzalez R, Mendez-Vazquez A. Deep Learning Architecture Reduction for fMRI Data. Brain Sci 2022;12:brainsci12020235. [PMID: 35203997 PMCID: PMC8870362 DOI: 10.3390/brainsci12020235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Accepted: 01/12/2022] [Indexed: 11/16/2022] Open

Lin S, Lin Y, Wu K, Wang Y, Feng Z, Duan M, Liu S, Fan Y, Huang L, Zhou F. FeCO3, constructing the network biomarkers using the inter-feature correlation coefficients and its application in detecting high-order breast cancer biomarkers. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220124123303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Abstract Aims: This study aims to formulate the inter-feature correlation as the engineered features. Background: Modern biotechnologies tend to generate a huge number of characteristics of a sample, while an OMIC dataset usually has a few dozens or hundreds of samples due to the high costs of generating the OMIC data. So many bio-OMIC studies assumed the inter-feature independence and selected a feature with a high phenotype-association. Objective: However, many features are closely associated with each other due to their physical or functional interactions, which may be utilized as a new view of features. Method: This study proposed a feature engineering algorithm based on the correlation coefficients (FeCO3) by utilizing the correlations between a given sample and a few reference samples. A comprehensive evaluation was carried out for the proposed FeCO3 network features using 24 bio-OMIC datasets. Result: The experimental data suggested that the newly calculated FeCO3 network features tended to achieve better classification performances than the original features, using the same popular feature selection and classification algorithms. The FeCO3 network features were also consistently supported by the literature. FeCO3 was utilized to investigate the high-order engineered biomarkers of breast cancer, and detected the PBX2 gene (Pre-B-Cell Leukemia Transcription Factor 2) as one of the candidate breast cancer biomarkers. Although the two methylated residues cg14851325 (Pvalue=8.06e-2) and cg16602460 (Pvalue=1.19e-1) within PBX2 did not have statistically significant association with breast cancers, the high-order inter-feature correlations showed a significant association with breast cancers. Conclusion: The proposed FeCO3 network features calculated the high-order inter-feature correlations as novel features, and may facilitate the investigations of complex diseases from this new perspective. The source code is available in FigShare at 10.6084/m9.figshare.13550051 or the web site http://www.healthinformaticslab.org/supp/ . Collapse

Affiliation(s)

Shenggeng Lin College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
Yuqi Lin College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Kexin Wu College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Yueying Wang College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, Jilin Province, China
Zixuan Feng College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Meiyu Duan College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Shuai Liu College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Yusi Fan College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Lan Huang College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
Fengfeng Zhou College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China

Collapse

An C, Zhou Q, Yang S. A reinforcement learning guided adaptive cost-sensitive feature acquisition method. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

Two-way threshold-based intelligent water drops feature selection algorithm for accurate detection of breast cancer. Soft comput 2021. [DOI: 10.1007/s00500-021-06498-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Zhang J, Wen X, Cho A, Whang M. An Empathy Evaluation System Using Spectrogram Image Features of Audio. SENSORS 2021;21:s21217111. [PMID: 34770419 PMCID: PMC8587789 DOI: 10.3390/s21217111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/11/2021] [Revised: 10/17/2021] [Accepted: 10/25/2021] [Indexed: 12/01/2022]

Hamid TMTA, Sallehuddin R, Yunos ZM, Ali A. Ensemble Based Filter Feature Selection with Harmonize Particle Swarm Optimization and Support Vector Machine for Optimal Cancer Classification. MACHINE LEARNING WITH APPLICATIONS 2021. [DOI: 10.1016/j.mlwa.2021.100054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022] Open

Mandal M, Singh PK, Ijaz MF, Shafi J, Sarkar R. A Tri-Stage Wrapper-Filter Feature Selection Framework for Disease Classification. SENSORS (BASEL, SWITZERLAND) 2021;21:5571. [PMID: 34451013 PMCID: PMC8402295 DOI: 10.3390/s21165571] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Revised: 08/10/2021] [Accepted: 08/13/2021] [Indexed: 12/24/2022]

Abstract

In machine learning and data science, feature selection is considered as a crucial step of data preprocessing. When we directly apply the raw data for classification or clustering purposes, sometimes we observe that the learning algorithms do not perform well. One possible reason for this is the presence of redundant, noisy, and non-informative features or attributes in the datasets. Hence, feature selection methods are used to identify the subset of relevant features that can maximize the model performance. Moreover, due to reduction in feature dimension, both training time and storage required by the model can be reduced as well. In this paper, we present a tri-stage wrapper-filter-based feature selection framework for the purpose of medical report-based disease detection. In the first stage, an ensemble was formed by four filter methods-Mutual Information, ReliefF, Chi Square, and Xvariance-and then each feature from the union set was assessed by three classification algorithms-support vector machine, naïve Bayes, and k-nearest neighbors-and an average accuracy was calculated. The features with higher accuracy were selected to obtain a preliminary subset of optimal features. In the second stage, Pearson correlation was used to discard highly correlated features. In these two stages, XGBoost classification algorithm was applied to obtain the most contributing features that, in turn, provide the best optimal subset. Then, in the final stage, we fed the obtained feature subset to a meta-heuristic algorithm, called whale optimization algorithm, in order to further reduce the feature set and to achieve higher accuracy. We evaluated the proposed feature selection framework on four publicly available disease datasets taken from the UCI machine learning repository, namely, arrhythmia, leukemia, DLBCL, and prostate cancer. Our obtained results confirm that the proposed method can perform better than many state-of-the-art methods and can detect important features as well. Less features ensure less medical tests for correct diagnosis, thus saving both time and cost.

Collapse

Prediction of Hypertension Outcomes Based on Gain Sequence Forward Tabu Search Feature Selection and XGBoost. Diagnostics (Basel) 2021;11:diagnostics11050792. [PMID: 33925766 PMCID: PMC8146551 DOI: 10.3390/diagnostics11050792] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 04/23/2021] [Accepted: 04/26/2021] [Indexed: 01/30/2023] Open

RIFS2D: A two-dimensional version of a randomly restarted incremental feature selection algorithm with an application for detecting low-ranked biomarkers. Comput Biol Med 2021;133:104405. [PMID: 33930763 DOI: 10.1016/j.compbiomed.2021.104405] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Revised: 04/13/2021] [Accepted: 04/13/2021] [Indexed: 12/20/2022]

Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics 2020;112:4370-4384. [PMID: 32717320 DOI: 10.1016/j.ygeno.2020.07.027] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Revised: 06/22/2020] [Accepted: 07/14/2020] [Indexed: 01/19/2023]

Tarekegn A, Ricceri F, Costa G, Ferracin E, Giacobini M. Predictive Modeling for Frailty Conditions in Elderly People: Machine Learning Approaches. JMIR Med Inform 2020;8:e16678. [PMID: 32442149 PMCID: PMC7303829 DOI: 10.2196/16678] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 01/07/2020] [Accepted: 02/16/2020] [Indexed: 12/15/2022] Open

Abstract

Background

Frailty is one of the most critical age-related conditions in older adults. It is often recognized as a syndrome of physiological decline in late life, characterized by a marked vulnerability to adverse health outcomes. A clear operational definition of frailty, however, has not been agreed so far. There is a wide range of studies on the detection of frailty and their association with mortality. Several of these studies have focused on the possible risk factors associated with frailty in the elderly population while predicting who will be at increased risk of frailty is still overlooked in clinical settings.

Objective

The objective of our study was to develop predictive models for frailty conditions in older people using different machine learning methods based on a database of clinical characteristics and socioeconomic factors.

Methods

An administrative health database containing 1,095,612 elderly people aged 65 or older with 58 input variables and 6 output variables was used. We first identify and define six problems/outputs as surrogates of frailty. We then resolve the imbalanced nature of the data through resampling process and a comparative study between the different machine learning (ML) algorithms – Artificial neural network (ANN), Genetic programming (GP), Support vector machines (SVM), Random Forest (RF), Logistic regression (LR) and Decision tree (DT) – was carried out. The performance of each model was evaluated using a separate unseen dataset.

Results

Predicting mortality outcome has shown higher performance with ANN (TPR 0.81, TNR 0.76, accuracy 0.78, F1-score 0.79) and SVM (TPR 0.77, TNR 0.80, accuracy 0.79, F1-score 0.78) than predicting the other outcomes. On average, over the six problems, the DT classifier has shown the lowest accuracy, while other models (GP, LR, RF, ANN, and SVM) performed better. All models have shown lower accuracy in predicting an event of an emergency admission with red code than predicting fracture and disability. In predicting urgent hospitalization, only SVM achieved better performance (TPR 0.75, TNR 0.77, accuracy 0.73, F1-score 0.76) with the 10-fold cross validation compared with other models in all evaluation metrics.

Conclusions

We developed machine learning models for predicting frailty conditions (mortality, urgent hospitalization, disability, fracture, and emergency admission). The results show that the prediction performance of machine learning models significantly varies from problem to problem in terms of different evaluation metrics. Through further improvement, the model that performs better can be used as a base for developing decision-support tools to improve early identification and prediction of frail older adults.

Collapse

Yoosefzadeh-Najafabadi M, Earl HJ, Tulpan D, Sulik J, Eskandari M. Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean. FRONTIERS IN PLANT SCIENCE 2020;11:624273. [PMID: 33510761 PMCID: PMC7835636 DOI: 10.3389/fpls.2020.624273] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/31/2020] [Accepted: 12/10/2020] [Indexed: 05/20/2023]

Abstract

Recent substantial advances in high-throughput field phenotyping have provided plant breeders with affordable and efficient tools for evaluating a large number of genotypes for important agronomic traits at early growth stages. Nevertheless, the implementation of large datasets generated by high-throughput phenotyping tools such as hyperspectral reflectance in cultivar development programs is still challenging due to the essential need for intensive knowledge in computational and statistical analyses. In this study, the robustness of three common machine learning (ML) algorithms, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), were evaluated for predicting soybean (Glycine max) seed yield using hyperspectral reflectance. For this aim, the hyperspectral reflectance data for the whole spectra ranged from 395 to 1005 nm, which were collected at the R4 and R5 growth stages on 250 soybean genotypes grown in four environments. The recursive feature elimination (RFE) approach was performed to reduce the dimensionality of the hyperspectral reflectance data and select variables with the largest importance values. The results indicated that R5 is more informative stage for measuring hyperspectral reflectance to predict seed yields. The 395 nm reflectance band was also identified as the high ranked band in predicting the soybean seed yield. By considering either full or selected variables as the input variables, the ML algorithms were evaluated individually and combined-version using the ensemble-stacking (E-S) method to predict the soybean yield. The RF algorithm had the highest performance with a value of 84% yield classification accuracy among all the individual tested algorithms. Therefore, by selecting RF as the metaClassifier for E-S method, the prediction accuracy increased to 0.93, using all variables, and 0.87, using selected variables showing the success of using E-S as one of the ensemble techniques. This study demonstrated that soybean breeders could implement E-S algorithm using either the full or selected spectra reflectance to select the high-yielding soybean genotypes, among a large number of genotypes, at early growth stages.

Collapse

A Framework for Feature Selection to Exploit Feature Group Structures. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING 2020. [PMCID: PMC7206161 DOI: 10.1007/978-3-030-47426-3_61] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]