He H, Yang H, Mercaldo F, Santone A, Huang P. Isolation forest-voting fusion-multioutput: A stroke risk classification method based on the multidimensional output of abnormal sample detection.
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 253:108255. [PMID: 38833760 DOI: 10.1016/j.cmpb.2024.108255]
[Received: 03/19/2021] [Revised: 12/23/2023] [Accepted: 05/26/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND AND OBJECTIVE
Stroke has become a major threat to health worldwide and is characterized by high incidence, high fatality, and a high recurrence rate. Stroke screening based on electronic medical records currently suffers from poor recognition accuracy and insufficient discrimination between stroke risk levels. These problems arise from systematic errors of medical equipment and from the characteristics of the collectors during electronic medical record collection, as well as from misreporting or underreporting by collection personnel and the strong subjectivity of the evaluation indicators.
METHODS
This paper proposes an isolation forest-voting fusion-multioutput algorithm model. First, the collected screening data are converted to numerical form and normalized. This paper's composite feature score index is then used to analyze the importance of risk factors, and the isolation forest algorithm is used to detect abnormal samples. Finally, the voting fusion algorithm proposed in this paper performs decision fusion for predictive classification, and the model outputs multidimensional results (risk factor importance score, abnormal sample label, risk level classification, and stroke prediction) that doctors and medical staff can use as auxiliary decision information.
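The normalization, abnormal sample detection, and decision fusion steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the feature values are invented, the helper names (`min_max_normalize`, `anomaly_scores`, `majority_vote`) are hypothetical, and plain majority voting stands in for the paper's voting fusion algorithm, whose details the abstract does not give.

```python
import math
import random
from collections import Counter


def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Isolation tree: split on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(row, tree, depth=0):
    """Depth at which `row` lands, adjusted for unsplit leaf size."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Isolation forest score per row; values near 1 suggest an anomaly."""
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(size)
    return [
        2 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
        for r in rows
    ]


def majority_vote(labels):
    """Assumption: simple majority vote, not the paper's fusion rule."""
    return Counter(labels).most_common(1)[0][0]


# Invented screening records: [systolic pressure, fasting glucose].
raw = [[120 + i % 5, 5.0 + (i % 4) * 0.1] for i in range(20)]
raw.append([220, 15.0])  # one deliberately implausible record

rows = min_max_normalize(raw)
scores = anomaly_scores(rows)
suspect = scores.index(max(scores))  # index of the most isolated record

# Labels from three hypothetical base classifiers for one sample.
fused = majority_vote(["low risk", "high risk", "high risk"])
```

In this sketch the record appended last is the easiest to isolate, so it receives the highest anomaly score; in a screening setting such records would be flagged with an abnormal sample label before classification. The fused label is simply the most frequent vote among the base classifiers.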
RESULTS
The proposed isolation forest-voting fusion-multioutput algorithm classifies samples into five categories: zero risk, low risk, high risk, ischemic stroke (TIA), and hemorrhagic stroke (HE). The average accuracy of stroke prediction reached 79.59%.
CONCLUSIONS
The proposed isolation forest-voting fusion-multioutput algorithm model not only accurately identifies the stroke risk levels and predicts stroke but also outputs multidimensional auxiliary decision-making information to support medical staff, thereby greatly improving screening efficiency.