1. Ibidoja OJ, Shan FP, Ali MKM. Modified sparse regression to solve heterogeneity and hybrid models for increasing the prediction accuracy of seaweed big data with outliers. Sci Rep 2024; 14:17599. [PMID: 39080303 PMCID: PMC11289475 DOI: 10.1038/s41598-024-60612-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 04/25/2024] [Indexed: 08/02/2024]
Abstract
Linear regression is critical for data modelling, especially for scientists. However, with the abundance of high-dimensional data, many data sets have more explanatory variables than observations. In such circumstances, traditional approaches fail. This paper proposes a modified sparse regression model that solves the problem of heterogeneity, using seaweed big data as a use case. The modified heterogeneity models for ridge, LASSO and Elastic net were used to model the data. The robust estimators M Bi-Square, M Hampel, M Huber, MM and S were used. Based on the results, the hybrid model of sparse regression for before, after, and modified heterogeneity robust regression with the 45 high-ranking variables and a 2-sigma limit can be used efficiently and effectively to reduce the outliers. The results confirm that the hybrid model of the modified sparse LASSO with the M Bi-Square estimator for the 45 high-ranking parameters performed better than other existing methods.
Affiliation(s)
- Olayemi Joshua Ibidoja
  - Department of Mathematics, Federal University Gusau, Gusau, Nigeria
  - School of Mathematical Sciences, Universiti Sains Malaysia (USM), 11800, Penang, Malaysia
- Fam Pei Shan
  - School of Mathematical Sciences, Universiti Sains Malaysia (USM), 11800, Penang, Malaysia
- Majid Khan Majahar Ali
  - School of Mathematical Sciences, Universiti Sains Malaysia (USM), 11800, Penang, Malaysia
2. Zhu Y, Wang K. Heterogeneous robust estimation with the mixed penalty in high-dimensional regression model. Commun Stat Theory Methods 2022. [DOI: 10.1080/03610926.2022.2148472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Affiliation(s)
- Yanling Zhu
  - School of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, China
- Kai Wang
  - School of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, China
3. Ewees AA, Al-qaness MAA, Abualigah L, Algamal ZY, Oliva D, Yousri D, Elaziz MA. Enhanced feature selection technique using slime mould algorithm: a case study on chemical data. Neural Comput Appl 2022; 35:3307-3324. [PMID: 36245794 PMCID: PMC9547998 DOI: 10.1007/s00521-022-07852-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Accepted: 09/16/2022] [Indexed: 01/31/2023]
Abstract
Feature selection (FS) techniques are considered one of the most important preprocessing steps, with the most significant influence on the performance of data analysis and decision making. These FS techniques aim to achieve several objectives at the same time (such as reducing classification error and minimizing the number of features) to increase the classification rate. FS based on metaheuristics (MH) is considered one of the most promising techniques for improving the classification process. This paper presents a modified version of the Slime Mould Algorithm that uses the operators of the Marine Predators Algorithm (MPA) as a local search strategy, which increases the convergence rate of the developed method, named SMAMPA, and helps it avoid attraction to local optima. The efficiency of SMAMPA is evaluated on twenty datasets, and its results are compared with state-of-the-art FS methods. In addition, the applicability of SMAMPA to real-world problems is evaluated by using it as a quantitative structure-activity relationship (QSAR) model. The results show the high ability of SMAMPA to reduce the dimension of the tested datasets while increasing the prediction rate. In addition, it provides better results than other FS techniques in terms of performance metrics.
Affiliation(s)
- Ahmed A. Ewees
  - Department of Information Systems, College of Computing and Information Technology, University of Bisha, Bisha, 61922 Saudi Arabia
  - Department of Computer, Damietta University, Damietta, 34517 Egypt
- Mohammed A. A. Al-qaness
  - College of Physics and Electronic Information Engineering, Zhejiang Normal University, Jinhua, 321004 China
- Laith Abualigah
  - Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, 19328 Jordan
  - Faculty of Information Technology, Middle East University, Amman, 11831 Jordan
- Zakariya Yahya Algamal
  - Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
  - College of Engineering, University of Warith Al-Anbiyaa, Karbala, Iraq
- Diego Oliva
  - Depto. de Ciencias Computacionales, Universidad de Guadalajara, CUCEI, Av. Revolución 1500, Guadalajara, Jal Mexico
- Dalia Yousri
  - Department of Electrical Engineering, Faculty of Engineering, Fayoum University, Fayoum, Egypt
- Mohamed Abd Elaziz
  - Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, 44519 Egypt
  - Faculty of Computer Science and Engineering, Galala University, Suez, Egypt
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, UAE
  - Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
4. Kosar Tas C, Guler H, Yalcin Y. The usage of bridge estimator to determine the order of integration for possibly integrated series as an alternative to Dickey–Pantula unit root test. Commun Stat Simul Comput 2022. [DOI: 10.1080/03610918.2022.2117826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Huseyin Guler
  - Department of Econometrics, Cukurova University, Adana, Turkey
- Yeliz Yalcin
  - Department of Econometrics, Ankara Haci Bayram Veli University, Adana, Turkey