Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cismondi F, Fialho AS, Vieira SM, Reti SR, Sousa JM, Finkelstein SN. Missing data in medical databases: Impute, delete or classify? Artif Intell Med 2013;58:63-72. [DOI: 10.1016/j.artmed.2013.01.003] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 06/18/2012] [Revised: 11/01/2012] [Accepted: 01/10/2013] [Indexed: 10/27/2022]



Cited by Other Article(s)

Gonçalves Pereira J, Fernandes J, Mendes T, Gonzalez FA, Fernandes SM. Artificial Intelligence to Close the Gap between Pharmacokinetic/Pharmacodynamic Targets and Clinical Outcomes in Critically Ill Patients: A Narrative Review on Beta Lactams. Antibiotics (Basel) 2024;13:853. [PMID: 39335027 PMCID: PMC11428226 DOI: 10.3390/antibiotics13090853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 08/30/2024] [Accepted: 09/04/2024] [Indexed: 09/30/2024] Open

Palomino-Echeverria S, Huergo E, Ortega-Legarreta A, Uson Raposo EM, Aguilar F, Peña-Ramirez CDL, López-Vicario C, Alessandria C, Laleman W, Queiroz Farias A, Moreau R, Fernandez J, Arroyo V, Caraceni P, Lagani V, Sánchez-Garrido C, Clària J, Tegner J, Trebicka J, Kiani NA, Planell N, Rautou PE, Gomez-Cabrero D. A robust clustering strategy for stratification unveils unique patient subgroups in acutely decompensated cirrhosis. J Transl Med 2024;22:599. [PMID: 38937846 PMCID: PMC11210156 DOI: 10.1186/s12967-024-05386-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 06/10/2024] [Indexed: 06/29/2024] Open

Abstract

BACKGROUND

Patient heterogeneity poses significant challenges for managing individuals and designing clinical trials, especially in complex diseases. Existing classifications rely on outcome-predicting scores, potentially overlooking crucial elements contributing to heterogeneity without necessarily impacting prognosis.

METHODS

To address patient heterogeneity, we developed ClustALL, a computational pipeline that simultaneously faces diverse clinical data challenges like mixed types, missing values, and collinearity. ClustALL enables the unsupervised identification of patient stratifications while filtering for stratifications that are robust against minor variations in the population (population-based) and against limited adjustments in the algorithm's parameters (parameter-based).

RESULTS

Applied to a European cohort of patients with acutely decompensated cirrhosis (n = 766), ClustALL identified five robust stratifications, using only data at hospital admission. All stratifications included markers of impaired liver function and number of organ dysfunction or failure, and most included precipitating events. When focusing on one of these stratifications, patients were categorized into three clusters characterized by typical clinical features; notably, the 3-cluster stratification showed a prognostic value. Re-assessment of patient stratification during follow-up delineated patients' outcomes, with further improvement of the prognostic value of the stratification. We validated these findings in an independent prospective multicentre cohort of patients from Latin America (n = 580).

CONCLUSIONS

By applying ClustALL to patients with acutely decompensated cirrhosis, we identified three patient clusters. Following these clusters over time offers insights that could guide future clinical trial design. ClustALL is a novel and robust stratification method capable of addressing the multiple challenges of patient stratification in most complex diseases.


Affiliation(s)

Sara Palomino-Echeverria: Unit of Translational Bioinformatics, Navarrabiomed - Fundación Miguel Servet, Pamplona, Spain
Estefania Huergo: Unit of Translational Bioinformatics, Navarrabiomed - Fundación Miguel Servet, Pamplona, Spain
Asier Ortega-Legarreta: Unit of Translational Bioinformatics, Navarrabiomed - Fundación Miguel Servet, Pamplona, Spain
Eva M Uson Raposo: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain
Ferran Aguilar: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain
Carlos de la Peña-Ramirez: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain
Cristina López-Vicario: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain; Biochemistry and Molecular Genetics Service, Hospital Clínic-IDIBAPS, Barcelona, Spain
Carlo Alessandria: Division of Gastroenterology and Hepatology, A.O.U. Città della Salute e della Scienza di Torino, Torino, Italy
Wim Laleman: Department of Gastroenterology & Hepatology, Section of Liver & Biliopancreatic disorders and Liver Transplantation, University Hospitals Leuven, KU LEUVEN, Leuven, Belgium
Alberto Queiroz Farias: Department of Gastroenterology, Hospital das Clínicas, University of São Paulo School of Medicine, Paulo School, Brazil
Richard Moreau: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain; Université Paris-Cité, Inserm, Centre de recherche sur l'inflammation, UMR 1149, Paris, France; Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France; Hôpital Beaujon, Service d'Hépatologie, Clichy, France
Javier Fernandez: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain
Vicente Arroyo: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain
Paolo Caraceni: Department of Medical and Surgical Science, University of Bologna, Bologna, Italy; IRCCS Azienda Ospedaliera-Universitaria di Bologna, Bologna, Italy
Vincenzo Lagani: Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, Thuwal, Saudi Arabia; Institute of Chemical Biology, Ilia State University, Tbilisi, 0162, Georgia
Cristina Sánchez-Garrido: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain
Joan Clària: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain; Biochemistry and Molecular Genetics Service, Hospital Clínic-IDIBAPS, Barcelona, Spain; CIBERehd, Barcelona, Spain; Department of Biomedical Sciences, University of Barcelona, Barcelona, Spain
Jesper Tegner: Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, Thuwal, Saudi Arabia; Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Jonel Trebicka: European Foundation for the Study of Chronic Liver Failure, Barcelona, Spain; Department of internal medicine B, University of Münster, Münster, Germany
Narsis A Kiani: Algorithmic Dynamics Lab, Center for Molecular Medicine, Karolinska Institutet, Solna, Sweden; Department of Oncology-Pathology, Karolinska Institutet, Solna, Sweden
Nuria Planell: Unit of Translational Bioinformatics, Navarrabiomed - Fundación Miguel Servet, Pamplona, Spain; Computational Biology Program, Universidad de Navarra, CIMA, Instituto de Investigación Sanitaria de Navarra (IdiSNA), Navarra, 31008, Spain
Pierre-Emmanuel Rautou: Université Paris-Cité, Inserm, Centre de recherche sur l'inflammation, UMR 1149, Paris, France; AP-HP, Hôpital Beaujon, Service d'Hépatologie, DMU DIGEST, Centre de Référence des Maladies Vasculaires du Foie, FILFOIE, ERN RARE-LIVER, Clichy, France
David Gomez-Cabrero: Unit of Translational Bioinformatics, Navarrabiomed - Fundación Miguel Servet, Pamplona, Spain; Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia


Perschinka F, Peer A, Joannidis M. [Artificial intelligence and acute kidney injury]. Med Klin Intensivmed Notfmed 2024;119:199-207. [PMID: 38396124 PMCID: PMC10995052 DOI: 10.1007/s00063-024-01111-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 02/25/2024]

Abstract

Digitalization is increasingly finding its way into intensive care units and with it artificial intelligence (AI) for critically ill patients. One promising area for the use of AI is in the field of acute kidney injury (AKI). The use of AI is primarily focused on the prediction of AKI, but further approaches are also being used to classify existing AKI into different phenotypes. Different AI models are used for prediction. The area under the receiver operating characteristic curve values (AUROC) achieved with these models vary and are influenced by several factors, such as the prediction time and the definition of AKI. Most models have an AUROC between 0.650 and 0.900, with lower values for predictions further into the future and when applying Acute Kidney Injury Network (AKIN) instead of KDIGO criteria. Classification into phenotypes already makes it possible to categorize patients into groups with different risks of mortality or requirement of renal replacement therapy (RRT), but the etiologies or therapeutic consequences derived from this are still lacking. However, all the models suffer from AI-specific shortcomings. The use of large databases does not make it possible to promptly include recent changes in therapy and the implementation of new biomarkers in a relevant proportion. For this reason, serum creatinine and urinary output, with their known limitations, dominate current AI models for prediction impairing the performance of the current models. On the other hand, the increasingly complex models no longer allow physicians to understand the basis on which the warning of a threatening AKI is calculated and subsequent initiation of therapy should take place. The successful use of AIs in routine clinical practice will be highly determined by the trust of the physicians in the systems and overcoming the aforementioned weaknesses. 
However, the clinician will remain irreplaceable as the decisive authority for critically ill patients by combining measurable and nonmeasurable parameters.


Ogasawara T, Mukaino M, Matsunaga K, Wada Y, Suzuki T, Aoshima Y, Furuzawa S, Kono Y, Saitoh E, Yamaguchi M, Otaka Y, Tsukada S. Prediction of stroke patients' bedroom-stay duration: machine-learning approach using wearable sensor data. Front Bioeng Biotechnol 2024;11:1285945. [PMID: 38234303 PMCID: PMC10791943 DOI: 10.3389/fbioe.2023.1285945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/11/2023] [Indexed: 01/19/2024] Open

Abstract

Background: The importance of being physically active and avoiding staying in bed has been recognized in stroke rehabilitation. However, studies have pointed out that stroke patients admitted to rehabilitation units often spend most of their day immobile and inactive, with limited opportunities for activity outside their bedrooms. To address this issue, it is necessary to record the duration of stroke patients staying in their bedrooms, but it is impractical for medical providers to do this manually during their daily work of providing care. Although an automated approach using wearable devices and access points is more practical, implementing these access points into medical facilities is costly. However, when combined with machine learning, predicting the duration of stroke patients staying in their bedrooms is possible with reduced cost. We assessed using machine learning to estimate bedroom-stay duration using activity data recorded with wearable devices. Method: We recruited 99 stroke hemiparesis inpatients and conducted 343 measurements. Data on electrocardiograms and chest acceleration were measured using a wearable device, and the location name of the access point that detected the signal of the device was recorded. We first investigated the correlation between bedroom-stay duration measured from the access point as the objective variable and activity data measured with a wearable device and demographic information as explanatory variables. To evaluate the duration predictability, we then compared machine-learning models commonly used in medical studies. Results: We conducted 228 measurements that surpassed a 90% data-acquisition rate using Bluetooth Low Energy. Among the explanatory variables, the period spent reclining and sitting/standing were correlated with bedroom-stay duration (Spearman's rank correlation coefficient (R) of 0.56 and -0.52, p < 0.001). 
Interestingly, the sum of the motor and cognitive categories of the functional independence measure, clinical indicators of the abilities of stroke patients, lacked correlation. The correlation between the actual bedroom-stay duration and predicted one using machine-learning models resulted in an R of 0.72 and p < 0.001, suggesting the possibility of predicting bedroom-stay duration from activity data and demographics. Conclusion: Wearable devices, coupled with machine learning, can predict the duration of patients staying in their bedrooms. Once trained, the machine-learning model can predict without continuously tracking the actual location, enabling more cost-effective and privacy-centric future measurements.


Gao W, Xie J, Ke Y, Tian M, Zeng Z, Ma X, Zhi M. A two-stage prediction filling method with support vector technologies optimized competitively in stages by grey wolf optimizer and particle swarm optimization for missing fasting blood glucose. Proc Inst Mech Eng H 2023;237:1427-1440. [PMID: 37873735 DOI: 10.1177/09544119231206456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]

Dhingra LS, Shen M, Mangla A, Khera R. Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record. Am J Cardiol 2023;203:136-148. [PMID: 37499593 PMCID: PMC10865722 DOI: 10.1016/j.amjcard.2023.06.104] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/28/2023] [Revised: 05/24/2023] [Accepted: 06/29/2023] [Indexed: 07/29/2023]

Tsai YH, Hung KY, Fang WF. Use of Peak Glucose Level and Peak Glycemic Gap in Mortality Risk Stratification in Critically Ill Patients with Sepsis and Prior Diabetes Mellitus of Different Body Mass Indexes. Nutrients 2023;15:3973. [PMID: 37764757 PMCID: PMC10534504 DOI: 10.3390/nu15183973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 09/12/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023] Open

Lee JM, Hauskrecht M. Personalized event prediction for Electronic Health Records. Artif Intell Med 2023;143:102620. [PMID: 37673563 PMCID: PMC10503594 DOI: 10.1016/j.artmed.2023.102620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 03/01/2023] [Accepted: 04/24/2023] [Indexed: 09/08/2023]

Schleicher M, Unnikrishnan V, Pryss R, Schobel J, Schlee W, Spiliopoulou M. Prediction meets time series with gaps: User clusters with specific usage behavior patterns. Artif Intell Med 2023;142:102575. [PMID: 37316098 DOI: 10.1016/j.artmed.2023.102575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 03/25/2023] [Accepted: 04/27/2023] [Indexed: 06/16/2023]

Liu Z, Chen C, Ma Q. Category-aware optimal transport for incomplete data classification. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]

Zhou Y, Shi J, Stein R, Liu X, Baldassano RN, Forrest CB, Chen Y, Huang J. Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research. J Am Med Inform Assoc 2023;30:1246-1256. [PMID: 37337922 PMCID: PMC10280351 DOI: 10.1093/jamia/ocad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 03/20/2023] [Accepted: 04/08/2023] [Indexed: 06/21/2023] Open

Ogasawara T, Mukaino M, Matsuura H, Aoshima Y, Suzuki T, Togo H, Nakashima H, Saitoh E, Yamaguchi M, Otaka Y, Tsukada S. Ensemble averaging for categorical variables: Validation study of imputing lost data in 24-h recorded postures of inpatients. Front Physiol 2023;14:1094946. [PMID: 36776969 PMCID: PMC9910696 DOI: 10.3389/fphys.2023.1094946] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/10/2022] [Accepted: 01/06/2023] [Indexed: 01/27/2023] Open

Mohammed H, Wang K, Wu H, Wang G. Subject-wise model generalization through pooling and patching for regression: Application on non-invasive systolic blood pressure estimation. Comput Biol Med 2022;151:106299. [PMID: 36423530 DOI: 10.1016/j.compbiomed.2022.106299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/01/2022] [Revised: 10/19/2022] [Accepted: 11/06/2022] [Indexed: 11/13/2022]

Abstract

BACKGROUND

Subject-wise modeling using machine learning is useful in many applications requiring low error and complexity, such as wearable medical devices. However, regression accuracy depends highly on the data available to train the model and the model's generalization ability. Adversely, the prediction error may increase severely if unknown data patterns test the model; such a model is known to be overfitted. In medicine-related applications, such as Non-Invasive Blood Pressure (NIBP) estimation, the high error renders the estimation model useless and dangerous.

METHODS

This paper presents a novel algorithm to handle overfitting by editing the training data to achieve generalization for subject-wise models. The pooling and patching (PaP) algorithms use a relatively short record segment of a subject as a Key-Segment (KS) to search through a larger dataset for similar subjects. Then samples taken from the matched subjects' pool records are used to patch the original subject's KS. Due to the significance of systolic blood pressure (SBP) and the complexity of its variability, non-invasive estimation of SBP from electrocardiography (ECG) and photoplethysmography (PPG) is introduced as an application to assess the algorithm. The study was performed on 2051 subjects with a wide range of age, height, weight, length, and health status. The subjects' records were taken from a large public dataset, VitalDB, which is acquired from subjects undergoing different surgeries. Finally, all the results are obtained without using other model generalization techniques.

RESULTS

The generalization effect of the proposed algorithm, PaP, significantly outperformed cross-validation, which is widely used in regression model generalization. Moreover, the testing results show that a KS of 200 to 2000 samples is sufficient for providing high accuracy for much longer testing data of about 12000 to 24000 samples long, which is less than 10% of the record length on average. Furthermore, compared to other works based on the same dataset, PaP provides a significantly lower mean error of -0.75 ± 5.51 mmHg, with a small training data portion of 15% over 2051 subjects.


Spyreli E, McGowan L, Heery E, Kelly A, Croker H, Lawlor C, O'Neill R, Kelleher CC, McCarthy M, Wall P, Heinen MM. Public beliefs about the consequences of living with obesity in the Republic of Ireland and Northern Ireland. BMC Public Health 2022;22:1910. [PMID: 36229815 PMCID: PMC9559245 DOI: 10.1186/s12889-022-14280-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 09/30/2022] [Indexed: 11/17/2022] Open

Abstract

Background

This study aimed to capture public beliefs about living with obesity, examine how these beliefs have changed over time and to explore whether certain characteristics were associated with them in a nationally representative sample of adults from the Republic of Ireland (RoI) and Northern Ireland (NI).

Methods

A cross-sectional survey employed a random quota sampling approach to recruit a nationally representative sample of 1046 adults across NI and RoI. Telephone interviews captured information on demographics; health behaviours & attitudes; and beliefs about the consequences of obesity (measured using the Obesity Beliefs Scale). Univariable analyses compared beliefs about the consequences of living with obesity between participants with a self-reported healthy weight and those living with overweight or obesity, and non-responders (those for whom weight status could not be ascertained due to missing data). Multiple linear regression examined associations between obesity-related beliefs and socio-demographics, self-rated health and perceived ability to change health behaviours. Multiple linear regression also compared changes in obesity-related beliefs between 2013 and 2020 in the RoI.

Results

Higher endorsement of the negative outcomes of obesity was significantly associated with living with a healthy weight, higher self-rated health, dietary quality and perceived ability to improve diet and physical activity. Those who lived with overweight, with obesity and non-responders were less likely to endorse the negative consequences of obesity. Those living with obesity and non-responders were also more likely to support there is an increased cost and effort in maintaining a healthy weight. Comparison with survey data from 2013 showed that currently, there is a greater endorsement of the health benefits of maintaining a healthy weight (p < 0.001), but also of the increased costs associated with it (p < 0.001).

Conclusion

Beliefs about the consequences of maintaining a healthy body weight are associated with individuals’ weight, self-rated health, diet and perceived ease of adoption of dietary and exercise-related improvements. Beliefs about the health risks of obesity and perceived greater costs associated with maintaining a healthy weight appear to have strengthened over time. Present findings are pertinent to researchers and policy makers involved in the design and framing of interventions to address obesity.

Supplementary information

The online version contains supplementary material available at 10.1186/s12889-022-14280-9.


Duan F, Yang Y. Recognizing Missing Electromyography Signal by Data Split Reorganization Strategy and Weight-Based Multiple Neural Network Voting Method. IEEE Transactions on Neural Networks and Learning Systems 2022;33:2070-2079. [PMID: 34460399 DOI: 10.1109/tnnls.2021.3105595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]

When Can I Expect the mHealth User to Return? Prediction Meets Time Series with Gaps. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]

Steif J, Brant R, Sreepada RS, West N, Murthy S, Görges M. Prediction Model Performance With Different Imputation Strategies: A Simulation Study Using a North American ICU Registry. Pediatr Crit Care Med 2022;23:e29-e44. [PMID: 34560774 PMCID: PMC8719509 DOI: 10.1097/pcc.0000000000002835] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]

Abstract

OBJECTIVES

To evaluate the performance of pragmatic imputation approaches when estimating model coefficients using datasets with varying degrees of data missingness.

DESIGN

Performance in predicting observed mortality in a registry dataset was evaluated using simulations of two simple logistic regression models with age-specific criteria for abnormal vital signs (mentation, systolic blood pressure, respiratory rate, WBC count, heart rate, and temperature). Starting with a dataset with complete information, increasing degrees of biased missingness of WBC and mentation were introduced, depending on the values of temperature and systolic blood pressure, respectively. Missing data approaches evaluated included analysis of complete cases only, assuming missing data are normal, and multiple imputation by chained equations. Percent bias and root mean square error, in relation to parameter estimates obtained from the original data, were evaluated as performance indicators.

SETTING

Data were obtained from the Virtual Pediatric Systems, LLC, database (Los Angeles, CA), which provides clinical markers and outcomes in prospectively collected records from 117 PICUs in the United States and Canada.

PATIENTS

Children admitted to a participating PICU in 2017, for whom all required data were available.

INTERVENTIONS

None.

MEASUREMENTS AND MAIN RESULTS

Simulations demonstrated that multiple imputation by chained equations is an effective strategy and that even a naive implementation of multiple imputation by chained equations significantly outperforms traditional approaches: the root mean square error for model coefficients was lower using multiple imputation by chained equations in 90 of 99 of all simulations (91%) compared with discarding cases with missing data and lower in 97 of 99 (98%) compared with models assuming missing values are in the normal range. Assuming missing data to be abnormal was inferior to all other approaches.

CONCLUSIONS

Analyses of large observational studies are likely to encounter the issue of missing data, which are likely not missing at random. Researchers should always consider multiple imputation by chained equations (or similar imputation approaches) when encountering even only small proportions of missing data in their work.
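The two performance indicators named in the Design paragraph above, percent bias and root mean square error relative to a reference parameter estimate, can be illustrated with a minimal sketch. The function names and the coefficient values below are hypothetical and are not taken from the study; the formulas are the standard definitions, which the abstract does not spell out.

```python
import math


def percent_bias(estimates, reference):
    """Signed deviation of the mean estimate from the reference, in percent."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - reference) / reference


def rmse(estimates, reference):
    """Root mean square error of the estimates around the reference value."""
    squared_errors = [(e - reference) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: one model coefficient estimated on the complete data
# (reference) and re-estimated in four simulated datasets with missing values.
reference_coef = 0.50
simulated_coefs = [0.55, 0.60, 0.65, 0.60]

bias = percent_bias(simulated_coefs, reference_coef)
error = rmse(simulated_coefs, reference_coef)
print(f"percent bias: {bias:.1f}%, RMSE: {error:.3f}")
```

Under this reading, a missing-data approach is preferable when both indicators are closer to zero across simulations.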


Wang S, Celebi ME, Zhang YD, Yu X, Lu S, Yao X, Zhou Q, Miguel MG, Tian Y, Gorriz JM, Tyukin I. Advances in Data Preprocessing for Biomedical Data Fusion: An Overview of the Methods, Challenges, and Prospects. Information Fusion 2021;76:376-421. [DOI: 10.1016/j.inffus.2021.07.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Indexed: 08/30/2023]

An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics. Sensors 2021;21:s21227595. [PMID: 34833670 PMCID: PMC8621076 DOI: 10.3390/s21227595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2021] [Revised: 11/12/2021] [Accepted: 11/15/2021] [Indexed: 11/25/2022]

Abstract

With rapid urbanization, awareness of environmental pollution is growing rapidly and, accordingly, interest in environmental sensors that measure atmospheric and indoor air quality is increasing. Since these IoT-based environmental sensors are sensitive and value reliability, it is essential to deal with missing values, which are one of the causes of reliability problems. Characteristics that can be used to impute missing values in environmental sensors are the time dependency of single variables and the correlation between multivariate variables. However, in the existing method of imputing missing values, only one characteristic has been used and there has been no case where both characteristics were used. In this work, we introduced a new ensemble imputation method reflecting this. First, the cases in which missing values occur frequently were divided into four cases and were generated into the experimental data: communication error (aperiodic, periodic), sensor error (rapid change, measurement range). To compare the existing method with the proposed method, five methods of univariate imputation and five methods of multivariate imputation—both of which are widely used—were used as a single model to predict missing values for the four cases. The values predicted by a single model were applied to the ensemble method. Among the ensemble methods, the weighted average and stacking methods were used to derive the final predicted values and replace the missing values. Finally, the predicted values, substituted with the original data, were evaluated by a comparison between the mean absolute error (MAE) and the root mean square error (RMSE). The proposed ensemble method generally performed better than the single method. In addition, this method simultaneously considers the correlation between variables and time dependence, which are characteristics that must be considered in the environmental sensor. 
As a result, our proposed ensemble technique can contribute to the replacement of the missing values generated by environmental sensors, which can help to increase the reliability of environmental sensor data.
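As a rough illustration of the weighted-average ensemble and the MAE/RMSE evaluation described in this abstract, the sketch below combines the outputs of one univariate and one multivariate imputer. All numbers are hypothetical, the equal weights are an assumption (the abstract does not say how weights were chosen), and the stacking variant is not shown.

```python
import math


def weighted_average(predictions, weights):
    """Combine per-model predictions position by position using the weights."""
    total = sum(weights)
    n_values = len(predictions[0])
    return [
        sum(w * model[i] for model, w in zip(predictions, weights)) / total
        for i in range(n_values)
    ]


def mae(predicted, truth):
    """Mean absolute error between imputed and true values."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


def rmse(predicted, truth):
    """Root mean square error between imputed and true values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


# Hypothetical sensor readings that were masked, plus two single-model imputations.
truth = [10.0, 12.0, 14.0]
univariate = [9.0, 12.0, 15.0]     # e.g. a time-dependency-based imputer
multivariate = [11.0, 13.0, 14.0]  # e.g. a cross-variable-correlation imputer

ensemble = weighted_average([univariate, multivariate], [0.5, 0.5])
for name, values in [("univariate", univariate),
                     ("multivariate", multivariate),
                     ("ensemble", ensemble)]:
    print(name, round(mae(values, truth), 3), round(rmse(values, truth), 3))
```

In this toy case the ensemble has lower MAE than either single model because the two models err in opposite directions; that outcome depends on the chosen numbers and is not a general guarantee.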


Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021;19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022] Open

Shahpari M, Hajji M, Mirnajafi-Zadeh J, Setoodeh P. Modeling plasticity during epileptogenesis by long short term memory neural networks. Cogn Neurodyn 2021;16:401-409. [PMID: 35401870 PMCID: PMC8934824 DOI: 10.1007/s11571-021-09698-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2020] [Revised: 05/30/2021] [Accepted: 07/07/2021] [Indexed: 10/20/2022]

E Moura FS, Amin K, Ekwobi C. Artificial intelligence in the management and treatment of burns: a systematic review. Burns & Trauma 2021;9:tkab022. [PMID: 34423054 PMCID: PMC8375569 DOI: 10.1093/burnst/tkab022] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/25/2020] [Revised: 03/08/2021] [Accepted: 04/30/2021] [Indexed: 06/13/2023]

Arfat Y, Mittone G, Esposito R, Cantalupo B, De Ferrari GM, Aldinucci M. A review of machine learning for cardiology. Minerva Cardiol Angiol 2021;70:75-91. [PMID: 34338485 DOI: 10.23736/s2724-5683.21.05709-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]

O'Hara C, Gibney ER. Meal Pattern Analysis in Nutritional Science: Recent Methods and Findings. Adv Nutr 2021;12:1365-1378. [PMID: 33460431 PMCID: PMC8321870 DOI: 10.1093/advances/nmaa175] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/20/2020] [Revised: 12/01/2020] [Accepted: 12/15/2020] [Indexed: 11/14/2022]

Bibicheva TS, Skazkina VV, Ogneva MV, Simonyan MA, Gridnev VI, Karavaev AS. Missing value imputation with linear methods in the database of cardiological patients in prediction of mortality. CARDIO-IT 2021. [DOI: 10.15275/cardioit.2021.0101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Cabitza F, Campagner A. The need to separate the wheat from the chaff in medical informatics: Introducing a comprehensive checklist for the (self)-assessment of medical AI studies. Int J Med Inform 2021;153:104510. [PMID: 34108105 DOI: 10.1016/j.ijmedinf.2021.104510] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 05/11/2021] [Revised: 05/26/2021] [Accepted: 05/27/2021] [Indexed: 12/23/2022]

A review of irregular time series data handling with gated recurrent neural networks. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.046] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 10/22/2022]

Ngueilbaye A, Wang H, Mahamat DA, Junaidu SB. Modulo 9 model-based learning for missing data imputation. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107167] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 10/22/2022]

Shin J, Yoon S, Kim Y, Kim T, Go B, Cha Y. Effects of class imbalance on resampling and ensemble learning for improved prediction of cyanobacteria blooms. Ecol Inform 2021. [DOI: 10.1016/j.ecoinf.2020.101202] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/16/2022]

Nonlinear compensation algorithm for multidimensional temporal data: A missing value imputation for the power grid applications. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106743] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 10/22/2022]

The visual outcomes of idiopathic epiretinal membrane removal in eyes with ectopic inner foveal layers and preserved macular segmentation. Graefes Arch Clin Exp Ophthalmol 2021;259:2193-2201. [PMID: 33528646 DOI: 10.1007/s00417-021-05102-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/29/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 10/22/2022]

Mostafa SM. Towards improving machine learning algorithms accuracy by benefiting from similarities between cases. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-201077] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]

Roland D, Suzen N, Coats TJ, Levesley J, Gorban AN, Mirkes EM. What can the randomness of missing values tell you about clinical practice in large data sets of children's vital signs? Pediatr Res 2021;89:16-21. [PMID: 32294665 DOI: 10.1038/s41390-020-0861-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2019] [Revised: 01/27/2020] [Accepted: 02/26/2020] [Indexed: 11/09/2022]

Zhang X, Yan C, Gao C, Malin BA, Chen Y. Predicting Missing Values in Medical Data via XGBoost Regression. Journal of Healthcare Informatics Research 2020;4:383-394. [PMID: 33283143 DOI: 10.1007/s41666-020-00077-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 01/04/2023]

Coito T, Martins MS, Viegas JL, Firme B, Figueiredo J, Vieira SM, Sousa JM. A Middleware Platform for Intelligent Automation: An Industrial Prototype Implementation. Comput Ind 2020. [DOI: 10.1016/j.compind.2020.103329] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/23/2022]

Lung PY, Zhong D, Pang X, Li Y, Zhang J. Maximizing the reusability of gene expression data by predicting missing metadata. PLoS Comput Biol 2020;16:e1007450. [PMID: 33156882 PMCID: PMC7673503 DOI: 10.1371/journal.pcbi.1007450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2019] [Revised: 11/18/2020] [Accepted: 10/09/2020] [Indexed: 11/18/2022]

Missing data techniques in classification for cardiovascular dysautonomias diagnosis. Med Biol Eng Comput 2020;58:2863-2878. [PMID: 32970269 DOI: 10.1007/s11517-020-02266-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2018] [Accepted: 09/08/2020] [Indexed: 10/23/2022]

Pan Y, Liu M, Lian C, Xia Y, Shen D. Spatially-Constrained Fisher Representation for Brain Disease Identification With Incomplete Multi-Modal Neuroimages. IEEE Transactions on Medical Imaging 2020;39:2965-2975. [PMID: 32217472 PMCID: PMC7485604 DOI: 10.1109/tmi.2020.2983085] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 05/10/2023]

Affiliation(s)

Y. Pan and Y. Xia are with the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072, China. M. Liu, C. Lian, and D. Shen are with the Department of Radiology and BRIC, University of North Carolina, Chapel Hill, NC 27599, USA. D. Shen is also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, South Korea.

Vilardell M, Buxó M, Clèries R, Martínez JM, Garcia G, Ameijide A, Font R, Civit S, Marcos-Gragera R, Vilardell ML, Carulla M, Espinàs JA, Galceran J, Izquierdo A, Borràs JM. Missing data imputation and synthetic data simulation through modeling graphical probabilistic dependencies between variables (ModGraProDep): An application to breast cancer survival. Artif Intell Med 2020;107:101875. [DOI: 10.1016/j.artmed.2020.101875] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/31/2019] [Revised: 02/12/2020] [Accepted: 05/02/2020] [Indexed: 12/29/2022]

Ferrão JC, Oliveira MD, Janela F, Martins HMG, Gartner D. Can structured EHR data support clinical coding? A data mining approach. Health Syst (Basingstoke) 2020;10:138-161. [PMID: 34104432 PMCID: PMC8143604 DOI: 10.1080/20476965.2020.1729666] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/25/2019] [Accepted: 10/22/2019] [Indexed: 10/24/2022]

Artificial Intelligence in Critical Care. Int Anesthesiol Clin 2020;57:89-102. [PMID: 30864993 DOI: 10.1097/aia.0000000000000221] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/23/2022]

Traina AJ, Brinis S, Pedrosa GV, Avalhais LP, Traina C. Querying on large and complex databases by content: Challenges on variety and veracity regarding real applications. Inform Syst 2019. [DOI: 10.1016/j.is.2019.03.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]

Mostafa SM. Imputing missing values using cumulative linear regression. CAAI Transactions on Intelligence Technology 2019. [DOI: 10.1049/trit.2019.0032] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/20/2022]

Venugopalan J, Chanani N, Maher K, Wang MD. Novel Data Imputation for Multiple Types of Missing Data in Intensive Care Units. IEEE J Biomed Health Inform 2019;23:1243-1250. [DOI: 10.1109/jbhi.2018.2883606] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/07/2022]

Imani F, Cheng C, Chen R, Yang H. Nested Gaussian process modeling and imputation of high-dimensional incomplete data under uncertainty. ACTA ACUST UNITED AC 2019. [DOI: 10.1080/24725579.2019.1583704] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]

Data preprocessing in predictive data mining. Knowl Eng Rev 2019. [DOI: 10.1017/s026988891800036x] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 11/07/2022]

Idri A, Benhar H, Fernández-Alemán JL, Kadi I. A systematic map of medical data preprocessing in knowledge discovery. Computer Methods and Programs in Biomedicine 2018;162:69-85. [PMID: 29903496 DOI: 10.1016/j.cmpb.2018.05.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/08/2017] [Revised: 04/25/2018] [Accepted: 05/03/2018] [Indexed: 06/08/2023]

Abstract

BACKGROUND AND OBJECTIVE

Data mining (DM) has, over the last decade, received increased attention in the medical domain and has been widely used to analyze medical datasets in order to extract useful knowledge and previously unknown patterns. However, historical medical data can often comprise inconsistent, noisy, imbalanced, missing and high dimensional data. These challenges lead to a serious bias in predictive modeling and reduce the performance of DM techniques. Data preprocessing is, therefore, an essential step in knowledge discovery as regards improving the quality of data and making it appropriate and suitable for DM techniques. The objective of this paper is to review the use of preprocessing techniques in clinical datasets.

METHODS

We performed a systematic map of studies regarding the application of data preprocessing to healthcare and published between January 2000 and December 2017. A search string was determined on the basis of the mapping questions and the PICO categories. The search string was then applied in digital databases covering the fields of computer science and medical informatics in order to identify relevant studies. The studies were initially selected by reading their titles, abstracts and keywords. Those that were selected at that stage were then reviewed using a set of inclusion and exclusion criteria in order to eliminate any that were not relevant. This process resulted in 126 primary studies.

RESULTS

Selected studies were analyzed and classified according to their publication years and channels, research type, empirical type and contribution type. The findings of this mapping study revealed that researchers have paid a considerable amount of attention to preprocessing in medical DM in the last decade. A significant number of the selected studies used data reduction and cleaning preprocessing tasks. Moreover, the disciplines in which preprocessing has received most attention are cardiology, endocrinology and oncology.

CONCLUSIONS

Researchers should develop and implement standards for an effective integration of multiple medical data types. Moreover, we identified the need to perform literature reviews.

Albers DJ, Elhadad N, Claassen J, Perotte R, Goldstein A, Hripcsak G. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms. J Biomed Inform 2018;78:87-101. [PMID: 29369797 PMCID: PMC5856130 DOI: 10.1016/j.jbi.2018.01.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/14/2017] [Revised: 12/05/2017] [Accepted: 01/14/2018] [Indexed: 01/12/2023]

Abstract

We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008) and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm that transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state. We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a gold standard generated by clinical review of patient records. We find that the PopKLD laboratory data summary is substantially better at predicting disease state. The PopKLD or PopKLD-CAT algorithms are not meant to be used as phenotyping algorithms, but we use the phenotyping task to show what information can be gained when using a more informative laboratory data summary. In the process of evaluating our method we show that the different clinical contexts and laboratory measurements necessitate different statistical summaries. Similarly, leveraging the principle of maximum entropy we argue that while some laboratory data only have sufficient information to estimate a mean and standard deviation, other laboratory data captured in an EHR contain substantially more information than can be captured in higher-parameter models.

Image recognition with missing-features based on Gaussian mixture model and graph constrained nonnegative matrix factorization. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2017;2017:3150-3153. [PMID: 29060566 DOI: 10.1109/embc.2017.8037525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]

Nancy JY, Khanna NH, Arputharaj K. Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2017.02.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/17/2022]