Severinsen I, Yu W, Walmsley T, Young B. COVERT: A classless approach to generating balanced datasets for process modelling. ISA Transactions 2024; 144:1-10. [PMID: 37951753 DOI: 10.1016/j.isatra.2023.10.031]
[Received: 09/27/2022] [Revised: 09/04/2023] [Accepted: 10/27/2023] [Indexed: 11/14/2023]
Abstract
In this work, a classless oversampling technique, COVERT, was developed to improve historical datasets from industrial processing plants for use in process modelling. Using kernel density estimation and nearest-neighbour algorithms, sparse regions of the data are identified and resampled, producing a more balanced dataset. When applied to a real dataset from a geothermal power plant, COVERT outperforms current best practice (SMOTE) in uniformly populating the input feature space and generating credible data in the output variable. When used to develop a data-driven model, COVERT improved model accuracy by 20% when predicting outside the original data's feature space, whereas SMOTE reduced model accuracy by 6% in the same feature space. Developing reliable models of industrial processes remains a significant hurdle in developing a digital twin. Using COVERT, existing imbalanced historical data can be used to extend the range of applicability of any process model.
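The general idea described in the abstract (estimate density, find sparse regions, resample them using nearest neighbours) can be illustrated with a short sketch. This is not the COVERT algorithm itself, whose details are not given in the abstract. It is a minimal illustration under stated assumptions: each row joins the input features and the continuous output into one vector, seed points are drawn with probability inversely proportional to a Gaussian kernel density estimate, and new points are made by SMOTE-style linear interpolation towards a random near neighbour. The function names, the bandwidth, and the value of k are all hypothetical choices.

```python
import math
import random


def gaussian_kde_density(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every point in `points`."""
    n = len(points)
    d = len(points[0])
    norm = n * (bandwidth * math.sqrt(2 * math.pi)) ** d
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        densities.append(total / norm)
    return densities


def nearest_neighbours(points, index, k):
    """Indices of the k points closest (Euclidean) to points[index]."""
    p = points[index]
    dists = sorted(
        (math.dist(p, q), j) for j, q in enumerate(points) if j != index
    )
    return [j for _, j in dists[:k]]


def oversample_sparse(points, n_new, k=3, bandwidth=0.5, seed=0):
    """Generate n_new synthetic rows, favouring low-density regions.

    Assumption: sampling weight = 1 / estimated density, so sparse rows are
    chosen as seeds more often. No class labels are needed, because the
    output variable is interpolated together with the inputs.
    """
    rng = random.Random(seed)
    densities = gaussian_kde_density(points, bandwidth)
    weights = [1.0 / dens for dens in densities]
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(points)), weights=weights, k=1)[0]
        j = rng.choice(nearest_neighbours(points, i, k))
        t = rng.random()  # position along the segment from point i to point j
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )
    return synthetic


# Toy data: a dense cluster near x = 0 and three sparse rows at larger x.
data = [(0.01 * i, 1.0) for i in range(20)] + [(5.0, 3.0), (6.0, 3.5), (7.0, 4.0)]
new_rows = oversample_sparse(data, n_new=10)
```

Because new rows are interpolated between existing ones, this sketch can only fill gaps inside the observed range; it does not extend the feature space on its own.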