1
Ortiz BL, Gupta V, Kumar R, Jalin A, Cao X, Ziegenbein C, Singhal A, Tewari M, Choi SW. Data Preprocessing Techniques for AI and Machine Learning Readiness: Scoping Review of Wearable Sensor Data in Cancer Care. JMIR Mhealth Uhealth 2024; 12:e59587. [PMID: 38626290 DOI: 10.2196/59587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/12/2024] [Accepted: 08/27/2024] [Indexed: 04/18/2024]
Abstract
BACKGROUND: Wearable sensors are increasingly being explored in health care, including in cancer care, for their potential in continuously monitoring patients. Despite their growing adoption, significant challenges remain in the quality and consistency of data collected from wearable sensors. Moreover, preprocessing pipelines to clean, transform, normalize, and standardize raw data have not yet been fully optimized.
OBJECTIVE: This study aims to conduct a scoping review of preprocessing techniques used on raw wearable sensor data in cancer care, specifically focusing on methods implemented to ensure their readiness for artificial intelligence and machine learning (AI/ML) applications. We sought to understand the current landscape of approaches for handling issues such as noise, missing values, normalization or standardization, and transformation, as well as techniques for extracting meaningful features from raw sensor outputs and converting them into usable formats for subsequent AI/ML analysis.
METHODS: We systematically searched IEEE Xplore, PubMed, Embase, and Scopus to identify potentially relevant studies for this review. The eligibility criteria included (1) mobile health and wearable sensor studies in cancer, (2) written and published in English, (3) published between January 2018 and December 2023, (4) full text available rather than abstracts, and (5) original studies published in peer-reviewed journals or conferences.
RESULTS: The initial search yielded 2147 articles, of which 20 (0.93%) met the inclusion criteria. Three major categories of preprocessing techniques were identified: data transformation (used in 12/20, 60% of the selected studies), data normalization and standardization (used in 8/20, 40% of the selected studies), and data cleaning (used in 8/20, 40% of the selected studies). Transformation methods aimed to convert raw data into more informative formats for analysis, such as by segmenting sensor streams or extracting statistical features. Normalization and standardization techniques typically rescaled feature ranges to improve comparability and model convergence. Cleaning methods focused on enhancing data reliability by handling artifacts like missing values, outliers, and inconsistencies.
CONCLUSIONS: While wearable sensors are gaining traction in cancer care, realizing their full potential hinges on the ability to reliably translate raw outputs into high-quality data suitable for AI/ML applications. This review found that researchers are using various preprocessing techniques to address this challenge, but there remains a lack of standardized best practices. Our findings suggest a pressing need to develop and adopt uniform data quality and preprocessing workflows for wearable sensor data that can support the breadth of cancer research and varied patient populations. Given the diverse preprocessing techniques identified in the literature, there is an urgent need for a framework that can guide researchers and clinicians in preparing wearable sensor data for AI/ML applications. Based on the scoping review as well as our own research, we propose a general framework for preprocessing wearable sensor data, designed to be adaptable across different disease settings, moving beyond cancer care.
Affiliation(s)
- Bengie L Ortiz
  - Department of Pediatrics, Hematology and Oncology Division, Michigan Medicine, University of Michigan Health System, Ann Arbor, MI, United States
- Vibhuti Gupta
  - School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, United States
- Rajnish Kumar
  - Department of Pediatrics, Hematology and Oncology Division, Michigan Medicine, University of Michigan Health System, Ann Arbor, MI, United States
- Aditya Jalin
  - Department of Pediatrics, Hematology and Oncology Division, Michigan Medicine, University of Michigan Health System, Ann Arbor, MI, United States
- Xiao Cao
  - Department of Pediatrics, Hematology and Oncology Division, Michigan Medicine, University of Michigan Health System, Ann Arbor, MI, United States
- Charles Ziegenbein
  - Department of Pediatrics, Hematology and Oncology Division, Michigan Medicine, University of Michigan Health System, Ann Arbor, MI, United States
  - Autonomous Systems Research Department, Peraton Labs, Basking Ridge, NJ, United States
- Ashutosh Singhal
  - School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, United States
- Muneesh Tewari
  - Department of Biomedical Engineering, College of Engineering, University of Michigan, Ann Arbor, MI, United States
  - Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI, United States
  - VA Ann Arbor Healthcare System, Ann Arbor, MI, United States
  - Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Sung Won Choi
  - Department of Pediatrics, Hematology and Oncology Division, Michigan Medicine, University of Michigan Health System, Ann Arbor, MI, United States
  - Rogel Comprehensive Cancer Center, University of Michigan, Ann Arbor, MI, United States
2
Kervezee L, Dashti HS, Pilz LK, Skarke C, Ruben MD. Using routinely collected clinical data for circadian medicine: A review of opportunities and challenges. PLOS Digital Health 2024; 3:e0000511. [PMID: 38781189 PMCID: PMC11115276 DOI: 10.1371/journal.pdig.0000511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
A wealth of data is available from electronic health records (EHR) that are collected as part of routine clinical care in hospitals worldwide. These rich, longitudinal data offer an attractive object of study for the field of circadian medicine, which aims to translate knowledge of circadian rhythms to improve patient health. This narrative review aims to discuss opportunities for EHR in studies of circadian medicine, highlight the methodological challenges, and provide recommendations for using these data to advance the field. In the existing literature, we find that data collected in real-world clinical settings have the potential to shed light on key questions in circadian medicine, including how 24-hour rhythms in clinical features are associated with, or even predictive of, health outcomes; whether the effect of medication or other clinical activities depends on time of day; and how circadian rhythms in physiology may influence clinical reference ranges or sampling protocols. However, optimal use of EHR to advance circadian medicine requires careful consideration of the limitations and sources of bias that are inherent to these data sources. In particular, time of day influences almost every interaction between a patient and the healthcare system, creating operational 24-hour patterns in the data that have little or nothing to do with biology. Addressing these challenges could help to expand the evidence base for the use of EHR in the field of circadian medicine.
Affiliation(s)
- Laura Kervezee
  - Group of Circadian Medicine, Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Hassan S. Dashti
  - Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Luísa K. Pilz
  - Department of Anesthesiology and Intensive Care Medicine CCM / CVK, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
  - ECRC Experimental and Clinical Research Center, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Carsten Skarke
  - Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Chronobiology and Sleep Institute (CSI), University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Marc D. Ruben
  - Divisions of Pulmonary and Sleep Medicine and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
3
Bekollari M, Dettoraki M, Stavrou V, Glotsos D, Liaparinos P. Computer-Aided Discrimination of Glaucoma Patients from Healthy Subjects Using the RETeval Portable Device. Diagnostics (Basel) 2024; 14:349. [PMID: 38396388 PMCID: PMC10888400 DOI: 10.3390/diagnostics14040349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 01/29/2024] [Accepted: 02/01/2024] [Indexed: 02/25/2024]
Abstract
Glaucoma is a chronic, progressive eye disease affecting the optic nerve, which may cause visual damage and blindness. In this study, we present a machine-learning investigation to distinguish patients with glaucoma (case group) from normal participants (control group). We examined 172 eyes at the Ophthalmology Clinic of the "Elpis" General Hospital of Athens between October 2022 and September 2023. We also examined classification performance by (a) eye selection and (b) gender. Our methodology was based on the features extracted via two diagnostic optical systems: (i) conventional optical coherence tomography (OCT) and (ii) a modern RETeval portable device. The machine-learning approach comprised three different classifiers: a Bayesian classifier, a Probabilistic Neural Network (PNN), and Support Vector Machines (SVMs). For all cases examined, classification accuracy was significantly higher with the RETeval device than with the OCT system, by 14.7% for all participants, 13.4% and 29.3% for eye selection (right and left, respectively), and 25.6% and 22.6% for gender (male and female, respectively). The SVM was the most efficient classifier, ahead of the PNN and Bayesian classifiers. In summary, all of these comparisons indicate that the RETeval device has an advantage over the OCT system for machine-learning classification of glaucoma patients.
Affiliation(s)
- Marsida Bekollari
  - Department of Biomedical Engineering, University of West Attica, Ag. Spyridonos, 12243 Athens, Greece
- Maria Dettoraki
  - Department of Ophthalmology, “Elpis” General Hospital, 11522 Athens, Greece
- Valentina Stavrou
  - Department of Ophthalmology, “Elpis” General Hospital, 11522 Athens, Greece
- Dimitris Glotsos
  - Department of Biomedical Engineering, University of West Attica, Ag. Spyridonos, 12243 Athens, Greece
- Panagiotis Liaparinos
  - Department of Biomedical Engineering, University of West Attica, Ag. Spyridonos, 12243 Athens, Greece
4
Syversen A, Dosis A, Jayne D, Zhang Z. Wearable Sensors as a Preoperative Assessment Tool: A Review. Sensors (Basel, Switzerland) 2024; 24:482. [PMID: 38257579 PMCID: PMC10820534 DOI: 10.3390/s24020482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 01/24/2024]
Abstract
Surgery is a common first-line treatment for many types of disease, including cancer. Mortality rates after general elective surgery have seen significant decreases whilst postoperative complications remain a frequent occurrence. Preoperative assessment tools are used to support patient risk stratification but do not always provide a precise and accessible assessment. Wearable sensors (WS) provide an accessible alternative that offers continuous monitoring in a non-clinical setting. They have shown consistent uptake across the perioperative period but there has been no review of WS as a preoperative assessment tool. This paper reviews the developments in WS research that have application to the preoperative period. Accelerometers were consistently employed as sensors in research and were frequently combined with photoplethysmography or electrocardiography sensors. Pre-processing methods were discussed and missing data was a common theme; this was dealt with in several ways, commonly by employing an extraction threshold or using imputation techniques. Research rarely processed raw data; commercial devices that employ internal proprietary algorithms with pre-calculated heart rate and step count were most commonly employed limiting further feature extraction. A range of machine learning models were used to predict outcomes including support vector machines, random forests and regression models. No individual model clearly outperformed others. Deep learning proved successful for predicting exercise testing outcomes but only within large sample-size studies. This review outlines the challenges of WS and provides recommendations for future research to develop WS as a viable preoperative assessment tool.
Affiliation(s)
- Aron Syversen
  - School of Computing, University of Leeds, Leeds LS2 9JT, UK
- Alexios Dosis
  - School of Medicine, University of Leeds, Leeds LS2 9JT, UK
- David Jayne
  - School of Medicine, University of Leeds, Leeds LS2 9JT, UK
- Zhiqiang Zhang
  - School of Electrical Engineering, University of Leeds, Leeds LS2 9JT, UK
5
Ciaraglia A, Osta E, Wang H, Cigarroa F, Thomas E, Fritze D, Nicholson S, Eastridge B, Convertino VA. Evidence for Beneficial Use of the Compensatory Reserve Measurement in Guiding Intraoperative Resuscitation: A Prospective Cohort Study of Orthotopic Liver Transplant Recipients. Shock 2024; 61:61-67. [PMID: 38010037 DOI: 10.1097/shk.0000000000002260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Introduction: The compensatory reserve measurement (CRM) is a continuous noninvasive monitoring technology that provides an assessment of the integrated capacity of all physiological mechanisms associated with responses to a hypovolemic stressor such as hemorrhagic shock. No prior studies have analyzed its use for intraoperative resuscitation guidance. Methods: A prospective observational study was conducted of 23 patients undergoing orthotopic liver transplant. Chart review was performed to identify timing of various intraoperative events. Data were compared based on predefined thresholds for existence of hemorrhagic shock: CRM lower than 40%, systolic blood pressure (SBP) lower than 90 mm Hg (SBP90), and heart rate (HR) higher than 100 beats per minute (HR100). Regression analysis was performed for predicting resuscitation events, and nonlinear eXtreme Gradient Boosting (XGBoost) models were used to compare CRM with standard vital sign measures. Results: Events where CRM dropped lower than 40% were 2.25 times more likely to lead to an intervention, whereas HR100 and SBP90 were not associated with intraoperative interventions. XGBoost prediction models showed superior discriminatory capacity of CRM alone compared with the model with SBP and HR and no difference when all three were combined (CRM-HR-SBP). All XGBoost models outperformed equivalent linear regression models. Conclusion: These results demonstrate that CRM can provide an adjunctive clinical tool that can augment early and accurate detection of hemodynamic compromise and promote goal-directed resuscitation in the perioperative setting.
Affiliation(s)
- Eri Osta
  - Division of Trauma and Critical Care, Department of Surgery
- Francisco Cigarroa
  - Division of Transplant and Hepatobiliary Surgery, Department of Surgery, University of Texas Health Science Center at San Antonio
- Elizabeth Thomas
  - Division of Transplant and Hepatobiliary Surgery, Department of Surgery, University of Texas Health Science Center at San Antonio
- Danielle Fritze
  - Division of Transplant and Hepatobiliary Surgery, Department of Surgery, University of Texas Health Science Center at San Antonio