1
Feng H, Wang S, Wang Y, Ni X, Yang Z, Hu X, Yang S. LncCat: An ORF attention model to identify LncRNA based on ensemble learning strategy and fused sequence information. Comput Struct Biotechnol J 2023; 21:1433-1447. [PMID: 36824229 PMCID: PMC9941877 DOI: 10.1016/j.csbj.2023.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/11/2022] [Revised: 02/06/2023] [Accepted: 02/06/2023] [Indexed: 02/10/2023]
Abstract
Background: Long non-coding RNA (lncRNA) is one of the most essential forms of transcripts; although it lacks protein-coding ability, it plays crucial regulatory roles in the development of cancers and other diseases. Short open reading frames (sORFs) in lncRNA were once assumed to be unlikely to be translated into proteins. However, recent research has shown that sORFs can encode peptides, which makes lncRNA harder to identify. Identifying lncRNAs with sORFs therefore facilitates finding novel regulatory factors.
Results: In this paper, we propose LncCat for identifying lncRNA based on category boosting (CatBoost) and ORF-attention features. LncCat combines five types of features to encode transcript sequences and employs CatBoost to build a prediction model. In addition, a visualization comparison reveals that the ORF-attention features of lncRNAs and protein-coding transcripts are significantly distinct. The comparison results show that LncCat outperforms competing methods on several benchmark datasets. For the Matthews correlation coefficient (MCC), LncCat achieves 0.9503, 0.9219, 0.8591, 0.8672, and 0.9047 on the human, mouse, zebrafish, wheat, and chicken datasets, with improvements of 1.90-7.82%, 1.49-17.63%, 6.11-21.50%, 3.02-51.64%, and 5.35-26.90%, respectively. Moreover, LncCat improves the MCC by at least 11.90%, 12.96%, and 42.61% on the sORF test datasets of human, mouse, and zebrafish, respectively.
Conclusions: Experiments indicate that LncCat performs better on both long-ORF and sORF datasets, and that ORF-attention features have a positive effect on predicting lncRNA. In brief, LncCat is a reliable method for identifying lncRNA. Additionally, a user-friendly web server is available for academic use at http://cczubio.top/lnccat.
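The results above are reported as Matthews correlation coefficients. As a reminder of how that metric is computed from a binary confusion matrix, here is a minimal Python sketch. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts with lncRNA as the positive class (not data from the paper).
score = matthews_corrcoef(tp=95, tn=90, fp=10, fn=5)
print(round(score, 4))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when the two classes are imbalanced.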
Affiliation(s)
- Hongqi Feng
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Shaocong Wang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Xinye Ni
  - The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
- Zexi Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Xuemei Hu
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Sen Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
  - The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
2
Jafari S, Byun YC. XGBoost-Based Remaining Useful Life Estimation Model with Extended Kalman Particle Filter for Lithium-Ion Batteries. Sensors (Basel, Switzerland) 2022; 22:9522. [PMID: 36502223 PMCID: PMC9736930 DOI: 10.3390/s22239522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 11/30/2022] [Accepted: 12/02/2022] [Indexed: 06/17/2023]
Abstract
Lithium-ion batteries combine high efficiency with low cost, but they suffer from instability and variable lifetime. Accurately predicting a piece of equipment's remaining useful life is essential for successful requirement-based maintenance, which improves dependability and lowers total maintenance costs. However, it is challenging to assess a battery's working capacity, and certain prediction methods are unable to represent the uncertainty. A scientific evaluation and prediction of a lithium-ion battery's state of health (SOH), mainly its remaining useful life (RUL), is crucial to ensuring the battery's safety and dependability over its entire life cycle and to preventing as many catastrophic accidents as possible. Many strategies have been developed to predict the RUL and SOH of lithium-ion batteries, including particle filters (PFs). This paper develops a novel PF-based technique for lithium-ion battery RUL estimation, combining a Kalman filter (KF) with a PF to analyze battery operating data. The PF method is used as the core, and extreme gradient boosting (XGBoost) supplies the observation for battery RUL prediction. Owing to its powerful nonlinear fitting capabilities, XGBoost is used to map the connection between the extracted features and the RUL. The life cycle testing aims to gather precise and trustworthy data for RUL prediction. The RUL prediction results demonstrate the improved accuracy of the proposed strategy compared with other methods. Experiments on a lithium-ion battery cycle-life data set show that the proposed technique yields a more accurate remaining useful life prediction.
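To make the particle-filter idea in this abstract concrete, the following Python sketch runs a plain bootstrap particle filter on a synthetic capacity-fade curve and converts the estimate into a remaining-cycle count. It is not the authors' extended Kalman particle filter: the exponential fade model, the noise levels, the 70% end-of-life threshold, and all names are assumptions, and a noisy capacity reading stands in for the XGBoost-based observation.

```python
import math
import random


def particle_filter_step(particles, observation, decay, process_std, obs_std, rng):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: push each particle through the assumed degradation model.
    predicted = [p * decay + rng.gauss(0.0, process_std) for p in particles]
    # Update: weight each particle by the Gaussian likelihood of the observation.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
fade_rate = 0.01                      # assumed per-cycle exponential fade rate
decay = math.exp(-fade_rate)
true_capacity = 2.0                   # assumed initial capacity in Ah
particles = [rng.gauss(2.0, 0.05) for _ in range(500)]

for cycle in range(20):
    true_capacity *= decay
    # Synthetic noisy observation (placeholder for a learned observation model).
    observation = true_capacity + rng.gauss(0.0, 0.02)
    particles = particle_filter_step(particles, observation, decay, 0.005, 0.02, rng)

estimate = sum(particles) / len(particles)
end_of_life = 0.7 * 2.0               # assumed threshold: 70% of initial capacity
# Remaining cycles until the estimated capacity decays to the threshold.
rul_cycles = math.log(estimate / end_of_life) / fade_rate
```

The spread of the resampled particles gives a direct picture of the uncertainty in the capacity estimate, which is the property the abstract highlights as missing from some other prediction methods.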
Affiliation(s)
- Sadiqa Jafari
  - Department of Electronic Engineering, Institute of Information Science & Technology, Jeju National University, Jeju 63243, Republic of Korea
- Yung-Cheol Byun
  - Department of Computer Engineering, Major of Electronic Engineering, Institute of Information Science & Technology, Jeju National University, Jeju 63243, Republic of Korea
3
Abd El-Aziz RM. Renewable power source energy consumption by hybrid machine learning model. Alexandria Engineering Journal 2022; 61:9447-9455. [DOI: 10.1016/j.aej.2022.03.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/02/2023]
4
Deep Learning Prediction for Rotational Speed of Turbine in Oscillating Water Column-Type Wave Energy Converter. Energies 2022. [DOI: 10.3390/en15020572] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
This study uses deep learning algorithms to predict the rotational speed of the turbine generator in an oscillating water column-type wave energy converter (OWC-WEC). The effective control and operation of OWC-WECs remain a challenge due to the variation in the input wave energy and the significantly high peak-to-average power ratio. Therefore, the rated power control of OWC-WECs is essential for increasing the operating time and power output. The existing rated power control method is based on the instantaneous rotational speed of the turbine generator. However, due to physical limitations, such as the valve operating time, a more refined rated power control method is required. Therefore, we propose a method that applies a deep learning algorithm. Our method predicts the instantaneous rotational speed of the turbine generator and the rated power control is performed based on the prediction. This enables precise control through the operation of the high-speed safety valve before the energy input exceeds the rated value. The prediction performances for various algorithms, such as a multi-layer perceptron (MLP), recurrent neural network (RNN), long short-term memory (LSTM), and convolutional neural network (CNN), are compared. In addition, the prediction performance of each algorithm as a function of the input datasets is investigated using various error evaluation methods. For the training datasets, the operation data from an OWC-WEC west of Jeju in South Korea is used. The analysis demonstrates that LSTM exhibits the most accurate prediction of the instantaneous rotational speed of a turbine generator and CNN has visible advantages when the data correlation is low.
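Networks such as the MLP, RNN, LSTM, and CNN compared here are usually trained on fixed-length windows of past samples paired with a future target. The Python sketch below shows that sliding-window framing. The window length, the prediction horizon, and the sample values are assumptions for illustration; the abstract does not state how the study's input datasets were built.

```python
def make_windows(series, n_inputs, horizon=1):
    """Split a 1-D series into (input window, target) pairs for supervised learning."""
    pairs = []
    for start in range(len(series) - n_inputs - horizon + 1):
        window = series[start:start + n_inputs]
        # The target lies `horizon` steps after the last sample in the window.
        target = series[start + n_inputs + horizon - 1]
        pairs.append((window, target))
    return pairs


# Hypothetical rotational-speed samples in rpm (not data from the study).
speeds = [310, 325, 340, 360, 355, 348, 330]
pairs = make_windows(speeds, n_inputs=3, horizon=2)
for window, target in pairs:
    print(window, "->", target)
```

A longer horizon gives a controller more time to act, for example to operate a valve, at the cost of a harder prediction problem.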
5
Şahin U, Ballı S, Chen Y. Forecasting seasonal electricity generation in European countries under Covid-19-induced lockdown using fractional grey prediction models and machine learning methods. Applied Energy 2021; 302:117540. [PMID: 36567791 PMCID: PMC9757929 DOI: 10.1016/j.apenergy.2021.117540] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/20/2021] [Revised: 07/28/2021] [Accepted: 08/03/2021] [Indexed: 05/03/2023]
Abstract
Balances in the energy sector have changed since the Covid-19 pandemic lockdown was implemented in Europe. This paper analyses how the lockdown affected electricity generation in European countries and how it will reshape future energy generation. Monthly electricity generation from total renewables and non-renewables in France, Germany, Spain, Turkey, and the UK from January 2017 to September 2020 was evaluated and compared. Four seasonal grey prediction models and three machine learning methods were used for forecasting; the quarterly results are presented to the end of 2021. Additionally, the share of electricity generation from renewables in total electricity generation from 2017 to 2021 for the selected countries was compared. Electricity generation from total non-renewables in the second quarter of 2020 for France, Germany, Spain, and the UK decreased by 21%-25% compared to the same period of 2019; the decline in Turkey was approximately 11%. Additionally, electricity generation from non-renewables in the third quarter of 2020 for all countries, except Turkey, decreased compared to the same period of the previous year. All grey prediction models and the support vector machine method forecast that the share of renewables in total electricity generation will increase continuously in France, Germany, Spain, and the UK to the end of 2021. The forecasting methods provided by this study open new avenues for research on the impact of the Covid-19 pandemic on the future of the energy sector.
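For readers unfamiliar with grey prediction, the Python sketch below implements the classic GM(1,1) model, which is the base model that seasonal and fractional variants extend. It is not one of the four models used in the paper: it has no seasonal or fractional-order component, and the function name and input series are illustrative assumptions.

```python
import math


def gm11_forecast(series, steps=1):
    """Fit the classic GM(1,1) grey model and forecast `steps` values ahead."""
    n = len(series)
    # Accumulated generating operation (1-AGO): running totals of the series.
    cumulative = [sum(series[:k + 1]) for k in range(n)]
    # Background values: means of consecutive accumulated values.
    z = [(cumulative[k] + cumulative[k - 1]) / 2 for k in range(1, n)]
    y = series[1:]
    # Ordinary least squares for the grey equation x0(k) = -a * z(k) + b.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m

    def accumulated(k):
        # Time response of the whitening equation; k = 0 is the first sample.
        # Assumes a != 0, i.e. the series is not constant.
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Forecasts are differences of consecutive accumulated predictions.
    return [accumulated(n - 1 + i) - accumulated(n - 2 + i) for i in range(1, steps + 1)]


# Hypothetical generation values growing 10% per period (not data from the paper).
history = [100 * 1.1 ** k for k in range(6)]
next_value = gm11_forecast(history, steps=1)[0]
```

GM(1,1) needs only a handful of points, which is one reason grey models are popular for short series such as a few years of monthly or quarterly generation data.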
Affiliation(s)
- Utkucan Şahin
  - Department of Energy Systems Engineering, Faculty of Technology, Muğla Sıtkı Koçman University, 48000 Muğla, Turkey
- Serkan Ballı
  - Department of Information Systems Engineering, Faculty of Technology, Muğla Sıtkı Koçman University, 48000 Muğla, Turkey
- Yan Chen
  - College of Management Engineering and Business, Hebei University of Engineering, 056038 Handan, China
6
Abstract
Energy producers must generate an amount of energy that accurately matches end-user demand. Consequently, energy prediction plays an essential role in the electric power sector. In this paper, we propose a hybrid ensemble deep learning model, which combines a multilayer perceptron (MLP), a convolutional neural network (CNN), long short-term memory (LSTM), and a hybrid CNN-LSTM to improve forecasting performance. These deep learning (DL) architectures are more popular than other machine learning (ML) models for time-series electrical load prediction and perform better. Hourly energy data were collected from Jeju Island, South Korea, and used for forecasting. We considered external features associated with meteorological conditions that affect energy. Two years of training data and one year of testing data are preprocessed and arranged into time series, on which each DL model is then trained. The forecasting results of the proposed ensemble model are evaluated using mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The error metrics are compared with those of stand-alone DL models such as MLP, CNN, LSTM, and CNN-LSTM. Our ensemble model performs better than the other forecasting models, with a minimum MAPE of 0.75%, and was proven to be inherently symmetric for forecasting time-series energy and demand data, which is of utmost concern to the power system sector.
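The abstract does not say how the individual model outputs are combined, so the Python sketch below assumes the simplest option, an unweighted average, and scores it with MAPE. The function names and all numbers are hypothetical and are not results from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def average_ensemble(*model_outputs):
    """Average the forecasts of several models, element by element."""
    return [sum(values) / len(values) for values in zip(*model_outputs)]


# Hypothetical hourly loads in MW and two stand-alone forecasts.
actual = [100.0, 110.0, 120.0]
mlp = [98.0, 113.0, 118.0]
lstm = [103.0, 108.0, 123.0]
combined = average_ensemble(mlp, lstm)

for name, forecast in [("MLP", mlp), ("LSTM", lstm), ("ensemble", combined)]:
    print(name, round(mape(actual, forecast), 3))
```

Averaging helps in this toy case because the two models err in opposite directions; when stand-alone models share the same bias, an average cannot remove it.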
7
Extreme Gradient Boosting for Recommendation System by Transforming Product Classification into Regression Based on Multi-Dimensional Word2Vec. Symmetry (Basel) 2021. [DOI: 10.3390/sym13050758] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Now that contactless ("untact") services are widespread worldwide, the number of users visiting online shopping malls has increased. For example, the recommendation systems of Netflix, Amazon, and others have gained a lot of attention by attracting many users and have generated large profits by recommending suitable products to their users. In this paper, we conduct a study to enhance recommendation accuracy using Word2Vec, which is widely used in natural language processing. We collect user shopping histories with personal click preference information on product items as data, each history representing a document for natural language processing. The sequence of product item clicks is fed into the Word2Vec algorithm to obtain vectors symmetrically representing all of the product items clicked by users. Training and test data have a series of vectors representing a sequence of clicked product items as inputs and a purchased product as the target. Machine learning models recommend a product as a symmetric vector for each input and calculate the similarity between the recommended vector and all other registered products sold in the system, in order to recommend multiple products as the final recommendation results. We use XGBoost regressor and classifier models to recommend products that users would like and evaluate the recommendation accuracy. The product finally recommended by the models is a vector, and the system recommends further products by calculating the similarity as mentioned above. We evaluated the classifier model’s recommendation accuracy first without Word2Vec encoding and then with it. Products can be represented with single- or multi-dimensional vectors; the experiments show that recommendation accuracy increases when multi-dimensional Word2Vec vectors are used. We also evaluated performance when the system recommends one or multiple products. For the recommendation of multiple products (five here), a regression model has higher accuracy than a classification model in all dimensions of vectors.
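The step in which a predicted vector is turned into several recommended products can be sketched as a nearest-neighbour lookup. The Python example below uses cosine similarity, which is an assumption because the abstract does not name the similarity measure. The catalog, the two-dimensional vectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_products(predicted, catalog, k=5):
    """Return the k catalog items whose vectors are most similar to `predicted`."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(predicted, catalog[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical product embeddings (real Word2Vec vectors would be learned from clicks).
catalog = {
    "shoes": [0.9, 0.1],
    "sandals": [0.8, 0.3],
    "laptop": [0.1, 0.9],
    "mouse": [0.2, 0.8],
}
# A vector such as a regressor might output for one click sequence.
recommended = top_k_products([1.0, 0.2], catalog, k=2)
print(recommended)
```

Because a regressor outputs a point in the embedding space and not a single class label, ranking by similarity extends naturally from one recommendation to several.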
8
Ensemble Prediction Approach Based on Learning to Statistical Model for Efficient Building Energy Consumption Management. Symmetry (Basel) 2021. [DOI: 10.3390/sym13030405] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/17/2022]
Abstract
With the development of modern power systems (smart grids), energy consumption prediction has become an essential aspect of resource planning and operations. In the last few decades, the consumption patterns of industrial and commercial buildings have been thoroughly investigated. However, owing to the unavailability of data, residential buildings have received little attention. During the last few years, many solutions have been devised for predicting electric consumption; however, it remains a challenging task because of the dynamic nature of residential consumption patterns. Therefore, a more robust solution is required to improve model performance and achieve better prediction accuracy. This paper presents an ensemble approach based on learning to a statistical model to predict the short-term energy consumption of a multifamily residential building. The proposed approach uses Long Short-Term Memory (LSTM) and a Kalman Filter (KF) to build an ensemble prediction model for the short-term energy demands of multifamily residential buildings. It uses real energy data acquired from a multifamily residential building in South Korea. Several statistical measures, namely mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the R² score, are used to evaluate the performance of the proposed approach and compare it with existing models. The experimental results reveal that the proposed approach predicts accurately and outperforms the existing models. Furthermore, a comparative analysis is performed to evaluate the proposed model against conventional machine learning models. The experimental results show the effectiveness and significance of the proposed approach compared to existing energy prediction models. The proposed approach will support energy management in effectively planning and managing the energy supply and demands of multifamily residential buildings.
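The abstract does not describe how the LSTM and the Kalman filter are coupled, so the Python sketch below shows only the filter itself: a scalar Kalman filter with a random-walk state model that smooths a noisy sequence. The variances, the readings, and the function name are assumptions chosen for illustration.

```python
def kalman_filter_1d(measurements, process_var, measurement_var,
                     initial_estimate, initial_var):
    """Scalar Kalman filter with a random-walk state model."""
    estimate, variance = initial_estimate, initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        variance += process_var
        # Update: blend the prediction and the measurement using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1 - gain
        filtered.append(estimate)
    return filtered


# Hypothetical hourly consumption readings in kWh (not data from the paper).
readings = [10.0, 12.0, 11.0, 13.0]
smoothed = kalman_filter_1d(readings, process_var=0.05, measurement_var=1.0,
                            initial_estimate=10.0, initial_var=1.0)
print([round(value, 3) for value in smoothed])
```

A larger `process_var` makes the filter follow the readings more closely, while a larger `measurement_var` makes it trust its own running estimate more. With `process_var=0.0` the filter reduces to a running average that includes the initial estimate.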
9
Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass. Forests 2021. [DOI: 10.3390/f12020216] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/13/2022]
Abstract
Increasing numbers of explanatory variables tend to result in information redundancy and “dimensional disaster” in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. 
The combination of RFE and CatBoost performed better than the VSURF–RFR combination, in which random forests were used for both feature selection and regression, indicating that feature selection and regression performed by a single machine learning algorithm may not always ensure optimal AGB estimation. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
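Recursive feature elimination, the selection method that performed best in this study, repeatedly fits a model and discards the least important feature. The Python sketch below illustrates the loop with a plain least-squares model and absolute coefficients as the importance measure. Both choices are simplifying assumptions (the study's learners and its importance measure are not specified here), the features are assumed to be on comparable scales, and the data are synthetic.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_linear(rows, targets, columns):
    """Least-squares coefficients (no intercept) for the chosen feature columns."""
    gram = [[sum(r[i] * r[j] for r in rows) for j in columns] for i in columns]
    moment = [sum(r[i] * t for r, t in zip(rows, targets)) for i in columns]
    return solve_linear(gram, moment)


def recursive_feature_elimination(rows, targets, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |coefficient|."""
    columns = list(range(len(rows[0])))
    while len(columns) > n_keep:
        coefficients = fit_linear(rows, targets, columns)
        weakest = min(range(len(columns)), key=lambda i: abs(coefficients[i]))
        del columns[weakest]
    return columns


# Synthetic data: the target depends on features 0 and 2 only.
rows = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
targets = [2 * r[0] - 3 * r[2] for r in rows]
selected = recursive_feature_elimination(rows, targets, n_keep=2)
print(selected)
```

Because the ranking is recomputed after every removal, the selector and the final regressor can be different algorithms, which is the kind of pairing the study evaluates with RFE and CatBoost.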