Nolde JM, Carnagarin R, Lugo-Gavidia LM, Azzam O, Kiuchi MG, Robinson S, Mian A, Schlaich MP. Autoencoded deep features for semi-automatic, weakly supervised physiological signal labelling.
Comput Biol Med 2022; 143:105294. [PMID: 35203038 DOI: 10.1016/j.compbiomed.2022.105294]
[Received: 11/30/2021] [Revised: 01/23/2022] [Accepted: 02/02/2022] [Indexed: 11/19/2022]
Abstract
BACKGROUND AND AIMS
Machine Learning is transforming data processing in medical research and clinical practice. Missing data labels commonly limit the training of Machine Learning models. To overcome missing labels in a large dataset of microneurography recordings, a novel autoencoder-based, semi-supervised, iterative group-labelling methodology was developed.
METHODS
Autoencoders were systematically optimised to extract features from a dataset of 478621 signal excerpts from human microneurography recordings. Selected features were clustered with k-means, and randomly selected representations of the corresponding original signals were labelled as valid or non-valid muscle sympathetic nerve activity (MSNA) bursts by an expert rater in an iterative, purifying procedure. A deep neural network was then trained on the fully labelled dataset.
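The iterative group-labelling step described above can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: the two-dimensional synthetic points stand in for autoencoder features, `rate` is a hypothetical stand-in for the expert rater, and the names `kmeans`, `iterative_group_labelling`, `k`, `n_samples` and `max_rounds` are all assumptions introduced for illustration. A cluster is labelled as a whole only when every randomly shown member receives the same rating; mixed clusters are returned to the pool and re-clustered in the next round.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns one cluster index per point.
    """
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def iterative_group_labelling(features, rate, k=4, n_samples=5, max_rounds=10, seed=0):
    """Label whole clusters from a few expert-rated samples (assumed procedure).

    `rate(i)` is a hypothetical callback returning the expert's rating for item i.
    Returns (labels, unlabelled): a dict index -> label and the leftover indices.
    """
    rng = random.Random(seed)
    labels = {}
    unlabelled = list(range(len(features)))
    for _ in range(max_rounds):
        if len(unlabelled) < k:
            break
        assign = kmeans([features[i] for i in unlabelled], k, rng)
        remaining = []
        for c in range(k):
            cluster = [i for i, a in zip(unlabelled, assign) if a == c]
            if not cluster:
                continue
            shown = rng.sample(cluster, min(n_samples, len(cluster)))
            ratings = {rate(i) for i in shown}
            if len(ratings) == 1:
                # All shown samples agree: propagate the rating to the whole cluster.
                label = ratings.pop()
                for i in cluster:
                    labels[i] = label
            else:
                # Mixed cluster: keep it for re-clustering in the next round.
                remaining.extend(cluster)
        unlabelled = remaining
        if not unlabelled:
            break
    return labels, unlabelled


# Synthetic stand-in data (NOT the study's data): two Gaussian blobs in 2-D.
data_rng = random.Random(42)
truth = [i % 2 == 0 for i in range(200)]
features = [
    (data_rng.gauss(0.0 if t else 10.0, 1.0), data_rng.gauss(0.0 if t else 10.0, 1.0))
    for t in truth
]

# Hypothetical expert: here simply looks up the known synthetic ground truth.
labels, leftover = iterative_group_labelling(features, rate=lambda i: truth[i])
print(len(labels), "items labelled,", len(leftover), "left unlabelled")
```

Because only a few members per cluster are inspected, the propagated labels are weak labels: a cluster that is mixed but happens to yield unanimous samples would be labelled incorrectly, which is why the number of shown samples and the cluster count matter in a scheme like this.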
RESULTS
Three autoencoders, two based on fully connected neural networks and one based on a convolutional neural network, were chosen for feature learning. Iterative clustering followed by labelling of complete clusters resulted in all 478621 signal peak excerpts being labelled as valid or non-valid within 13 iterations. Neural networks trained with the labelled dataset achieved, in a cross-validation step with a testing dataset not included in training, on average 93.13% accuracy and 91% area under the receiver operating characteristic curve (AUC ROC).
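For readers unfamiliar with the two reported metrics, the following sketch shows how accuracy and AUC ROC can be computed for a binary classifier on a held-out test set. The labels and scores are invented toy values, not results from the study, and the function names are illustrative. The AUC is computed in its pairwise form: the fraction of (valid, non-valid) pairs in which the valid item receives the higher score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC ROC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count correctly ordered pairs; a tie contributes 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy held-out set (invented values): 1 = valid burst, 0 = non-valid.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)  # 5 of 6 predictions are correct
auc = roc_auc(y_true, scores)   # 8 of 9 positive/negative pairs are ordered correctly
print(f"accuracy={acc:.3f}, auc={auc:.3f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC ROC summarises ranking quality across all thresholds, which is why the two figures are typically reported together.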
DISCUSSION
The described labelling procedure enabled efficient labelling of a large dataset of physiological signals based on expert ratings. The autoencoder-based procedure may be broadly applicable to a wide range of unlabelled datasets that require expert input, and may be utilised for Machine Learning applications if weak labels are available.