Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 44 (from Reference Citation Analysis)
Article PDFs: 10
Cited by > 0: 32
Searched Name: missing values

Citation Analysis

Nehler KJ, Schultze M. Simulation-Based Performance Evaluation of Missing Data Handling in Network Analysis. Multivariate Behav Res 2024:1-21. [PMID: 38247019 DOI: 10.1080/00273171.2023.2283638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2024]

Kim M, Kim TH, Kim D, Lee D, Kim D, Heo J, Kang S, Ha T, Kim J, Moon DH, Heo Y, Kim WJ, Lee SJ, Kim Y, Park SW, Han SS, Choi HS. In-Advance Prediction of Pressure Ulcers via Deep-Learning-Based Robust Missing Value Imputation on Real-Time Intensive Care Variables. J Clin Med 2023;13:36. [PMID: 38202043 PMCID: PMC10780209 DOI: 10.3390/jcm13010036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 12/06/2023] [Accepted: 12/15/2023] [Indexed: 01/12/2024] Open

Affiliation(s)

Minkyu Kim: Department of Research & Development, Ziovision Co., Ltd., Chuncheon 24341, Republic of Korea
Tae-Hoon Kim: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Dowon Kim: Department of Research & Development, Ziovision Co., Ltd., Chuncheon 24341, Republic of Korea
Donghoon Lee: Department of Research & Development, Ziovision Co., Ltd., Chuncheon 24341, Republic of Korea
Dohyun Kim: Department of Research & Development, Ziovision Co., Ltd., Chuncheon 24341, Republic of Korea
Jeongwon Heo: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Seonguk Kang: Department of Convergence Security, Kangwon National University, Chuncheon 24341, Republic of Korea
Taejun Ha: Biomedical Research Institute, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea
Jinju Kim: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Da Hye Moon: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; Department of Pulmonology, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea
Yeonjeong Heo: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; Department of Pulmonology, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea
Woo Jin Kim: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Seung-Joon Lee: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Yoon Kim: Department of Computer Science and Engineering, Kangwon National University, Chuncheon 24341, Republic of Korea
Sang Won Park: Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Seon-Sook Han: Department of Internal Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Hyun-Soo Choi: Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea

Aßmann C, Gaasch JC, Stingl D. A Bayesian Approach Towards Missing Covariate Data in Multilevel Latent Regression Models. Psychometrika 2023;88:1495-1528. [PMID: 36418780 PMCID: PMC10656345 DOI: 10.1007/s11336-022-09888-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 08/29/2022] [Accepted: 09/20/2022] [Indexed: 06/16/2023]

Karamti H, Alharthi R, Anizi AA, Alhebshi RM, Eshmawi AA, Alsubai S, Umer M. Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach. Cancers (Basel) 2023;15:4412. [PMID: 37686692 PMCID: PMC10486648 DOI: 10.3390/cancers15174412] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/16/2023] [Revised: 08/02/2023] [Accepted: 08/09/2023] [Indexed: 09/10/2023] Open

Vanderaa C, Gatto L. Revisiting the Thorny Issue of Missing Values in Single-Cell Proteomics. J Proteome Res 2023;22:2775-2784. [PMID: 37530557 DOI: 10.1021/acs.jproteome.3c00227] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/03/2023]

Stahlmann K, Reitsma JB, Zapf A. Missing values and inconclusive results in diagnostic studies - A scoping review of methods. Stat Methods Med Res 2023;32:1842-1855. [PMID: 37559474 PMCID: PMC10540494 DOI: 10.1177/09622802231192954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]

Mayer I, Josse J. Generalizing treatment effects with incomplete covariates: Identifying assumptions and multiple imputation algorithms. Biom J 2023;65:e2100294. [PMID: 36907999 DOI: 10.1002/bimj.202100294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/24/2023] [Accepted: 02/13/2023] [Indexed: 03/14/2023]

Pandolfi S, Bartolucci F, Pennoni F. A hidden Markov model for continuous longitudinal data with missing responses and dropout. Biom J 2023;65:e2200016. [PMID: 37035989 DOI: 10.1002/bimj.202200016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 01/04/2023] [Accepted: 01/10/2023] [Indexed: 04/11/2023]

Buczak P, Chen JJ, Pauly M. Analyzing the Effect of Imputation on Classification Performance under MCAR and MAR Missing Mechanisms. Entropy (Basel) 2023;25:521. [PMID: 36981409 PMCID: PMC10048089 DOI: 10.3390/e25030521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 03/10/2023] [Accepted: 03/14/2023] [Indexed: 06/18/2023]

Buyukozkan M, Benedetti E, Krumsiek J. rox: A Statistical Model for Regression with Missing Values. Metabolites 2023;13:metabo13010127. [PMID: 36677052 PMCID: PMC9861384 DOI: 10.3390/metabo13010127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 11/15/2022] [Accepted: 11/17/2022] [Indexed: 01/18/2023] Open

Chen X, Aljrees T, Umer M, Saidani O, Almuqren L, Mzoughi O, Ishaq A, Ashraf I. Cervical cancer detection using K nearest neighbor imputer and stacked ensemble learning model. Digit Health 2023;9:20552076231203802. [PMID: 37799501 PMCID: PMC10548812 DOI: 10.1177/20552076231203802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 09/08/2023] [Indexed: 10/07/2023] Open

Abstract

Objective

Cervical cancer stands as a leading cause of mortality among women in developing nations. To ensure the reduction of its adverse consequences, the primary protocols to be adhered to involve early detection and treatment under the guidance of expert medical professionals. An effective approach for identifying this form of malignancy involves the examination of Pap smear images. However, in the context of automating cervical cancer detection, many of the existing datasets frequently exhibit missing data points, a factor that can substantially impact the effectiveness of machine learning models.

Methods

In response to these hurdles, this research introduces an automated system designed to predict cervical cancer with a dual focus: adeptly managing missing data while attaining remarkable accuracy. The system's core is built upon a stacked ensemble voting classifier model, which amalgamates three distinct machine learning models, all harmoniously integrated with the KNN Imputer to address the issue of missing values.

Results

The model put forth attains an accuracy of 99.41%, precision of 97.63%, recall of 95.96%, and an F1 score of 96.76% when incorporating the KNN imputation method. The investigation conducts a comparative analysis, contrasting the performance of this model with seven alternative machine learning algorithms in two scenarios: one where missing values are eliminated, and another employing KNN imputation. This study offers validation of the effectiveness of the proposed model in comparison to current state-of-the-art methodologies.

Conclusions

This research delves into the challenge of handling missing data in the dataset utilized for cervical cancer detection. The findings have the potential to assist healthcare professionals in achieving early detection and enhancing the quality of care provided to individuals affected by cervical cancer.
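
The KNN imputation step mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation and not the scikit-learn `KNNImputer`; it is a simplified pure-Python version under stated assumptions: missing entries are marked with `None`, distance is computed only over columns observed in both rows, and a missing entry is filled with the unweighted mean of its `k` nearest donors. The function name `knn_impute` and the toy data are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows where column j is observed and a distance exists.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no donor is available
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical toy data: the first row is missing its third feature.
data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
```

With `k=2` the two rows closest to the incomplete one supply the fill value, while the distant fourth row is ignored. The alternative compared in the abstract, deleting rows with missing values, would instead discard the first row entirely.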

Witte J, Foraita R, Didelez V. Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data. Stat Med 2022;41:4716-4743. [PMID: 35908775 DOI: 10.1002/sim.9535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 06/12/2022] [Accepted: 07/11/2022] [Indexed: 11/08/2022]

Gu Y, Preisser JS, Zeng D, Shrestha P, Shah M, Simancas-Pallares MA, Ginnis J, Divaris K. Partitioning Around Medoids Clustering and Random Forest Classification for GIS-Informed Imputation of Fluoride Concentration Data. Ann Appl Stat 2022;16:551-572. [PMID: 35356492 DOI: 10.1214/21-aoas1516] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]

Abstract

Community water fluoridation is an important component of oral health promotion, as fluoride exposure is a well-documented dental caries-preventive agent. Direct measurements of domestic water fluoride content provide valuable information regarding individuals' fluoride exposure and thus caries risk; however, they are logistically challenging to carry out at a large scale in oral health research. This article describes the development and evaluation of a novel method for the imputation of missing domestic water fluoride concentration data informed by spatial autocorrelation. The context is a state-wide epidemiologic study of pediatric oral health in North Carolina, where domestic water fluoride concentration information was missing for approximately 75% of study participants with clinical data on dental caries. A new machine-learning-based imputation method that combines partitioning around medoids clustering and random forest classification (PAMRF) is developed and implemented. Imputed values are filtered according to allowable error rates or target sample size, depending on the requirements of each application. In leave-one-out cross-validation and simulation studies, PAMRF outperforms four existing imputation approaches: two conventional spatial interpolation methods (i.e., inverse-distance weighting, IDW, and universal kriging, UK) and two supervised learning methods (k-nearest neighbors, KNN, and classification and regression trees, CART). The inclusion of multiply imputed values in the estimation of the association between fluoride concentration and dental caries prevalence resulted in essentially no change in PAMRF estimates but substantial gains in precision due to larger effective sample size. PAMRF is a powerful new method for the imputation of missing fluoride values where geographical information exists.
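
Inverse-distance weighting, one of the conventional spatial baselines named in this abstract, can be sketched as follows. This is a minimal illustration of the baseline only, not PAMRF and not the authors' code. The function name `idw_impute`, the `power` parameter default, the coordinates, and the fluoride values are all hypothetical.

```python
import math

def idw_impute(known, target, power=2.0):
    """Estimate a value at `target` from `known`, a list of ((x, y), value).

    Each known site is weighted by 1 / distance**power, so nearby
    measurements dominate the estimate.
    """
    if not known:
        raise ValueError("at least one known site is required")
    weight_sum = 0.0
    weighted_values = 0.0
    for (x, y), value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a measured site
        w = 1.0 / d ** power
        weight_sum += w
        weighted_values += w * value
    return weighted_values / weight_sum

# Hypothetical measured sites: (coordinates, fluoride concentration).
sites = [((0.0, 0.0), 0.2), ((2.0, 0.0), 0.8)]
estimate = idw_impute(sites, (1.0, 0.0))  # equidistant from both sites
```

Because the target is equidistant from the two sites, both receive the same weight and the estimate is their plain average.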

Alsaber A, Al-Herz A, Pan J, Al-Sultan AT, Mishra D. Handling missing data in a rheumatoid arthritis registry using random forest approach. Int J Rheum Dis 2021;24:1282-1293. [PMID: 34382756 DOI: 10.1111/1756-185x.14203] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/14/2021] [Revised: 07/13/2021] [Accepted: 07/23/2021] [Indexed: 12/01/2022]

Amro L, Pauly M, Ramosaj B. Asymptotic-based bootstrap approach for matched pairs with missingness in a single arm. Biom J 2021;63:1389-1405. [PMID: 34240446 DOI: 10.1002/bimj.202000051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2020] [Revised: 12/11/2020] [Accepted: 01/20/2021] [Indexed: 11/06/2022]

Sendi P, Ramadani A, Bornstein MM. Prevalence of Missing Values and Protest Zeros in Contingent Valuation in Dental Medicine. Int J Environ Res Public Health 2021;18:7219. [PMID: 34299670 DOI: 10.3390/ijerph18147219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Revised: 06/30/2021] [Accepted: 07/02/2021] [Indexed: 12/26/2022]

Abstract

Background: The number of contingent valuation (CV) studies in dental medicine using willingness-to-pay (WTP) methodology has substantially increased in recent years. Missing values due to absent information (i.e., missingness) or false information (i.e., protest zeros) are a common problem in WTP studies. The objective of this study is to evaluate the prevalence of missing values in CV studies in dental medicine, to assess how these have been dealt with, and to suggest recommendations for future research. Methods: We systematically searched electronic databases (MEDLINE, Web of Science, Cochrane Library, PROSPERO) on 8 June 2021, and hand-searched references of selected reviews. CV studies in clinical dentistry using WTP for valuing a good or service were included. Results: We included 49 WTP studies in our review. Out of these, 19 (38.8%) reported missing values due to absent information, and 28 (57.1%) reported zero values (i.e., WTP valued at zero). Zero values were further classified into true zeros (i.e., representing the underlying preference of the respondent) or protest zeros (i.e., false information as a protest behavior) in only 9 studies. Most studies used a complete case analysis to address missingness while only one study used multiple imputation. Conclusions: There is uncertainty in the dental literature on how to address missing values and zero values in CV studies. Zero values need to be classified as true zeros versus protest zeros with follow-up questions after the WTP elicitation procedure, and then need to be handled differently. Advanced statistical methods are available to address both missing values due to missingness and due to protest zeros but these are currently underused in dental medicine. Failing to appropriately address missing values in CV studies may lead to biased WTP estimates of dental interventions.
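
The percentages reported in the Results follow directly from the study counts given in the abstract. The short check below recomputes them; the variable names are illustrative.

```python
# Counts taken from the abstract above.
total_studies = 49   # WTP studies included in the review
missing_info = 19    # studies reporting missing values due to absent information
zero_values = 28     # studies reporting zero values

# Share of included studies, rounded to one decimal place as in the abstract.
pct_missing = round(100 * missing_info / total_studies, 1)
pct_zero = round(100 * zero_values / total_studies, 1)

print(pct_missing, pct_zero)
```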

Egert J, Brombacher E, Warscheid B, Kreutz C. DIMA: Data-Driven Selection of an Imputation Algorithm. J Proteome Res 2021;20:3489-3496. [PMID: 34062065 DOI: 10.1021/acs.jproteome.1c00119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]

Dabke K, Kreimer S, Jones MR, Parker SJ. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets. J Proteome Res 2021;20:3214-3229. [PMID: 33939434 DOI: 10.1021/acs.jproteome.1c00070] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/07/2023]

Abstract

Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level (the fragment level) improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down to the larger proteomic data set's most accurate methods. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.

Zhang X, Yan C, Gao C, Malin BA, Chen Y. Predicting Missing Values in Medical Data via XGBoost Regression. J Healthc Inform Res 2020;4:383-394. [PMID: 33283143 DOI: 10.1007/s41666-020-00077-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 01/04/2023]

Yu SH, Kyriakidou P, Cox J. Isobaric Matching between Runs and Novel PSM-Level Normalization in MaxQuant Strongly Improve Reporter Ion-Based Quantification. J Proteome Res 2020;19:3945-3954. [PMID: 32892627 PMCID: PMC7586393 DOI: 10.1021/acs.jproteome.0c00209] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 01/04/2023]

Brombacher E, Schad A, Kreutz C. Tail-Robust Quantile Normalization. Proteomics 2020;20:e2000068. [PMID: 32865322 DOI: 10.1002/pmic.202000068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2020] [Revised: 08/25/2020] [Indexed: 11/07/2022]

Gillies CE, Jennaro TS, Puskarich MA, Sharma R, Ward KR, Fan X, Jones AE, Stringer KA. A Multilevel Bayesian Approach to Improve Effect Size Estimation in Regression Modeling of Metabolomics Data Utilizing Imputation with Uncertainty. Metabolites 2020;10:E319. [PMID: 32781624 PMCID: PMC7465156 DOI: 10.3390/metabo10080319] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/15/2020] [Revised: 07/29/2020] [Accepted: 08/03/2020] [Indexed: 01/12/2023] Open

Affiliation(s)

Christopher E. Gillies: Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science (MIDAS), Office of Research, University of Michigan, Ann Arbor, MI 48109, USA
Theodore S. Jennaro: Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA
Michael A. Puskarich: Department of Emergency Medicine, University of Minnesota, Minneapolis, MN 55455, USA
Ruchi Sharma: Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
Kevin R. Ward: Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science (MIDAS), Office of Research, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
Xudong Fan: Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, MI 48109, USA; Michigan Institute for Data Science (MIDAS), Office of Research, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
Alan E. Jones: Department of Emergency Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
Kathleen A. Stringer: Michigan Center for Integrative Research in Critical Care (MCIRCC), University of Michigan, Ann Arbor, MI 48109, USA; The NMR Metabolomics Laboratory, Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA; Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, School of Medicine, University of Michigan, Ann Arbor, MI 48109, USA

Kakileti ST, Manjunath G, Dekker A, Wee L. Robust Estimation of Breast Cancer Incidence Risk in Presence of Incomplete or Inaccurate Information. Asian Pac J Cancer Prev 2020;21:2307-2313. [PMID: 32856859 PMCID: PMC7771951 DOI: 10.31557/apjcp.2020.21.8.2307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2020] [Indexed: 11/25/2022] Open

Abstract

Purpose:

To evaluate the robustness of multiple machine learning classifiers for breast cancer risk estimation in the presence of incomplete or inaccurate information.

Data and methods:

Open data for this study was obtained from the BCSC Data Resource (http://breastscreening.cancer.gov/). We conducted two ablation-type experiments to compare the robustness of different classifiers where we randomly switched known information to missing with a missing probability of p_m in one experiment, and randomly corrupted the existing information with a probability of p_c in another experiment. We considered three prominent machine-learning classifiers such as Logistic regression (LR), Random Forests (RF) and a custom Neural Network (NN) architecture and compared their degradation of discrimination performance as a function of increasing probability of missing or inaccurate data.

Results:

LR, RF and custom NN resulted in an Area Under Curve (AUC) of 0.645, 0.643 and 0.649, respectively, on a test set with 500,000 total observations. When we manipulated the data by varying probabilities p_m and p_c from 0 to 1, NN resulted in better performance in terms of AUC compared to RF and LR as long as less than half the data was missing/inaccurate (that is, for values of p_m < 0.5 and p_c < 0.5). However, for missing (p_m) or corruption (p_c) probabilities above 0.5, LR gave similar performance as the custom NN. RF resulted in overall poorer performance when the data had additional missing or incorrect entries.

Conclusion:

In cases where the input information is missing or inaccurate, our experiments show that the proposed custom NN provides reliable risk estimates in medical datasets like BCSC. These results are particularly important in health care applications where not every attribute of the individual participant might be available.
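
The two ablation-type experiments described under "Data and methods" can be sketched as a single masking routine. This is a minimal sketch under stated assumptions, not the authors' code: missing values are represented by `None`, corruption replaces a value with a random draw from a fixed set of allowed values (the abstract does not say how values were corrupted), and the function name `ablate`, the `choices` parameter, and the toy records are hypothetical.

```python
import random

def ablate(records, p_m=0.0, p_c=0.0, choices=(0, 1), seed=0):
    """Return a copy of `records` with entries randomly removed or corrupted.

    p_m: probability that a known entry is switched to missing (None).
    p_c: probability that a remaining entry is replaced by a random value
         drawn from `choices` (which may equal the original by chance).
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    result = []
    for record in records:
        new_record = []
        for value in record:
            if rng.random() < p_m:
                new_record.append(None)
            elif rng.random() < p_c:
                new_record.append(rng.choice(choices))
            else:
                new_record.append(value)
        result.append(new_record)
    return result

# Hypothetical binary-coded records.
records = [[0, 1, 1], [1, 0, 0]]
unchanged = ablate(records)                # p_m = p_c = 0: data left intact
all_missing = ablate(records, p_m=1.0)     # every entry becomes missing
corrupted = ablate(records, p_c=0.5)       # corruption only, nothing missing
```

Sweeping `p_m` (with `p_c = 0`) or `p_c` (with `p_m = 0`) from 0 to 1 and re-scoring each classifier on the altered test data would give degradation curves of the kind the abstract describes.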

Hossain T, Ahad MAR, Inoue S. A Method for Sensor-Based Activity Recognition in Missing Data Scenario. Sensors (Basel) 2020;20:E3811. [PMID: 32650486 DOI: 10.3390/s20143811] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/24/2020] [Revised: 06/09/2020] [Accepted: 06/30/2020] [Indexed: 11/30/2022]

Abstract

Sensor-based human activity recognition has various applications in the arena of healthcare, elderly smart-home, sports, etc. There are numerous works in this field that aim to recognize various human activities from sensor data. However, those works are based on data patterns that are clean data and have almost no missing data, which is a genuine concern for real-life healthcare centers. Therefore, to address this problem, we explored the sensor-based activity recognition when some partial data were lost in a random pattern. In this paper, we propose a novel method to improve activity recognition while having missing data without any data recovery. For the missing data pattern, we considered data to be missing in a random pattern, which is a realistic missing pattern for sensor data collection. Initially, we created different percentages of random missing data only in the test data, while the training was performed on good quality data. In our proposed approach, we explicitly induce different percentages of missing data randomly in the raw sensor data to train the model with missing data. Learning with missing data reinforces the model to regulate missing data during the classification of various activities that have missing data in the test module. This approach demonstrates the plausibility of the machine learning model, as it can learn and predict from an identical domain. We exploited several time-series statistical features to extricate better features in order to comprehend various human activities. We explored both support vector machine and random forest as machine learning models for activity classification. We developed a synthetic dataset to empirically evaluate the performance and show that the method can effectively improve the recognition accuracy from 80.8% to 97.5%. Afterward, we tested our approach with activities from two challenging benchmark datasets: the human activity sensing consortium (HASC) dataset and single chest-mounted accelerometer dataset. We examined the method for different missing percentages, varied window sizes, and diverse window sliding widths. Our explorations demonstrated improved recognition performances even in the presence of missing data. The achieved results provide persuasive findings on sensor-based activity recognition in the presence of missing data.

Liu M, Dongre A. Proper imputation of missing values in proteomics datasets for differential expression analysis. Brief Bioinform 2020;22:5855395. [PMID: 32520347 DOI: 10.1093/bib/bbaa112] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/25/2020] [Revised: 04/16/2020] [Accepted: 05/11/2020] [Indexed: 01/01/2023] Open

Lin TI, Wang WL. Multivariate-t linear mixed models with censored responses, intermittent missing values and heavy tails. Stat Methods Med Res 2020;29:1288-1304. [PMID: 31242813 DOI: 10.1177/0962280219857103] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]

Awad A, Bader-El-Den M, McNicholas J, Briggs J, El-Sonbaty Y. Predicting hospital mortality for intensive care unit patients: Time-series analysis. Health Informatics J 2019;26:1043-1059. [PMID: 31347428 DOI: 10.1177/1460458219850323] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/20/2022]

Battey HS, Cox DR, Jackson MV. On the linear in probability model for binary data. R Soc Open Sci 2019;6:190067. [PMID: 31218050 PMCID: PMC6549984 DOI: 10.1098/rsos.190067] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 04/05/2019] [Indexed: 05/23/2023]

Dong X, Chen C, Geng Q, Cao Z, Chen X, Lin J, Jin Y, Zhang Z, Shi Y, Zhang XD. An Improved Method of Handling Missing Values in the Analysis of Sample Entropy for Continuous Monitoring of Physiological Signals. Entropy (Basel) 2019;21:e21030274. [PMID: 33266989 PMCID: PMC7514754 DOI: 10.3390/e21030274] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Revised: 03/08/2019] [Accepted: 03/09/2019] [Indexed: 11/17/2022]

Li L, Lee JH, Sutton SK, Simmons VN, Brandon TH. A Bayesian transition model for missing longitudinal binary outcomes and an application to a smoking cessation study. STAT MODEL 2019;20:310-338. [PMID: 33854408 DOI: 10.1177/1471082x18821489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Pohl S, von Davier M. Commentary: On the Importance of the Speed-Ability Trade-Off When Dealing With Not Reached Items. Front Psychol 2018;9:1988. [PMID: 30425667 PMCID: PMC6218577 DOI: 10.3389/fpsyg.2018.01988] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2018] [Accepted: 09/27/2018] [Indexed: 11/13/2022] Open

Ali Ali B, Lefering R, Belzunegui Otano T. Quality assessment of Major Trauma Registry of Navarra: completeness and correctness. Int J Inj Contr Saf Promot 2018;26:137-144. [PMID: 30251595 DOI: 10.1080/17457300.2018.1515229] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Rosinska M, Pantazis N, Janiec J, Pharris A, Amato-Gauci AJ, Quinten C. Potential adjustment methodology for missing data and reporting delay in the HIV Surveillance System, European Union/European Economic Area, 2015. Euro Surveill 2018;23:1700359. [PMID: 29897039 PMCID: PMC6152165 DOI: 10.2807/1560-7917.es.2018.23.23.1700359] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open

Ondeck NT, Fu MC, Skrip LA, McLynn RP, Su EP, Grauer JN. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research. J Arthroplasty 2018;33:661-7. [PMID: 29153865 DOI: 10.1016/j.arth.2017.10.034] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/10/2017] [Revised: 10/11/2017] [Accepted: 10/11/2017] [Indexed: 02/01/2023] Open

Abstract

BACKGROUND

Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty.

METHODS

Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared.

RESULTS

A total of 6117 patients were included, of whom 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study, whereas multiple imputation included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes.

CONCLUSION

The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research.
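The contrast between the two approaches can be sketched on a toy table. Everything here is hypothetical: the eight records, the variable names, and the imputation model are assumptions for illustration. In particular, the imputation step draws from the marginal distribution of the observed values, which is a deliberate simplification; a proper multiple imputation would condition on the other covariates.

```python
import random
import statistics

# Hypothetical patients: younger patients are more likely to lack an albumin value.
records = [
    {"age": 72, "albumin": 3.9},
    {"age": 68, "albumin": 4.1},
    {"age": 75, "albumin": 3.5},
    {"age": 51, "albumin": None},
    {"age": 48, "albumin": None},
    {"age": 70, "albumin": 3.8},
    {"age": 55, "albumin": None},
    {"age": 66, "albumin": 4.0},
]

# Complete case analysis: drop every record with a missing value.
complete_cases = [r for r in records if r["albumin"] is not None]
mean_age_all = statistics.mean(r["age"] for r in records)
mean_age_complete = statistics.mean(r["age"] for r in complete_cases)


def impute_once(rows, rng):
    """Fill each missing albumin with a random draw based on the observed values."""
    observed = [r["albumin"] for r in rows if r["albumin"] is not None]
    mu, sigma = statistics.mean(observed), statistics.stdev(observed)
    return [
        dict(r, albumin=r["albumin"] if r["albumin"] is not None else rng.gauss(mu, sigma))
        for r in rows
    ]


# Simplified multiple imputation: analyse several completed datasets, then pool.
rng = random.Random(42)
imputed_sets = [impute_once(records, rng) for _ in range(20)]
estimates = [statistics.mean(r["albumin"] for r in rows) for rows in imputed_sets]
pooled_estimate = statistics.mean(estimates)       # pooled point estimate
between_variance = statistics.variance(estimates)  # spread across imputations
```

In this toy table the complete cases are older on average than the full sample (`mean_age_complete` exceeds `mean_age_all`), which is the kind of selection effect the abstract describes, while the imputed datasets retain all eight patients.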

Ebert JF, Huibers L, Christensen B, Christensen MB. Paper- or Web-Based Questionnaire Invitations as a Method for Data Collection: Cross-Sectional Comparative Study of Differences in Response Rate, Completeness of Data, and Financial Cost. J Med Internet Res 2018;20:e24. [PMID: 29362206 PMCID: PMC5801515 DOI: 10.2196/jmir.8353] [Citation(s) in RCA: 171] [Impact Index Per Article: 28.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2017] [Revised: 10/12/2017] [Accepted: 11/16/2017] [Indexed: 11/27/2022] Open

Abstract

Background

Paper questionnaires have traditionally been the first choice for data collection in research. However, declining response rates over the past decade have increased the risk of selection bias in cross-sectional studies. The growing use of the Internet offers new ways of collecting data, but trials using Web-based questionnaires have so far seen mixed results. A secure, online digital mailbox (e-Boks) linked to a civil registration number became mandatory for all Danish citizens in 2014 (exemption granted only in extraordinary cases). Approximately 89% of the Danish population have a digital mailbox, which is used for correspondence with public authorities.

Objective

We aimed to compare response rates, completeness of data, and financial costs for different invitation methods: traditional surface mail and digital mail.

Methods

We designed a cross-sectional comparative study. An invitation to participate in a survey on help-seeking behavior in out-of-hours care was sent to two groups of randomly selected citizens from age groups 30-39 and 50-59 years and parents of children aged 0-4 years using either traditional surface mail (paper group) or digital mail sent to a secure online mailbox (digital group). Costs per respondent were measured by adding up all costs for handling, dispatch, printing, and work salary and then dividing the total figure by the number of respondents. Data completeness was assessed by comparing the number of missing values between the two methods. Socioeconomic variables (age, gender, family income, education duration, immigrant status, and job status) were compared both between respondents and nonrespondents and within these groups to evaluate the degree of selection bias.

Results

A total of 3600 citizens were invited in each group; 1303 (36.29%) responded to the digital invitation and 1653 (45.99%) to the paper invitation (difference 9.66%, 95% CI 7.40-11.92). The costs were €1.51 per respondent for the digital group and €15.67 per respondent for the paper group. Paper questionnaires generally had more missing values; this was significant in five of 17 variables (P<.05). Substantial differences were found in the socioeconomic variables between respondents and nonrespondents, whereas only minor differences were seen within the groups of respondents and nonrespondents.

Conclusions

Although we found lower response rates for Web-based invitations, this solution was more cost-effective (by a factor of 10) and had slightly lower numbers of missing values than questionnaires sent with paper invitations. Analyses of socioeconomic variables showed almost no difference between nonrespondents in both groups, which could imply that the lower response rate in the digital group does not necessarily increase the level of selection bias. Invitations to questionnaire studies via digital mail may be an excellent option for collecting research data in the future. This study may serve as the foundational pillar of digital data collection in health care research in Scandinavia and other countries considering implementing similar systems.
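The two quantities compared in this abstract, a difference in response rates with a confidence interval and a cost per respondent, can be computed as in the following sketch. The Wald (normal approximation) interval is an assumption, since the abstract does not name the method used, and the total cost passed to `cost_per_respondent` is a hypothetical figure because the abstract reports only the per-respondent result. Rates computed from the raw invitation counts need not match the published percentages exactly, which may rest on adjusted denominators.

```python
import math


def rate_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Difference in proportions (a minus b) with a Wald confidence interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    std_err = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * std_err, diff + z * std_err


def cost_per_respondent(total_cost, respondents):
    """Total cost (handling, dispatch, printing, salary) divided by respondents."""
    return total_cost / respondents


# Response counts taken from the abstract: paper group versus digital group.
diff, lower, upper = rate_difference_ci(1653, 3600, 1303, 3600)
print(f"difference {diff:.2%}, 95% CI {lower:.2%} to {upper:.2%}")

# Hypothetical total cost, used only to show the division.
print(f"cost per respondent: {cost_per_respondent(2000.0, 1303):.2f}")
```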

Cahsai A, Anagnostopoulos C, Triantafillou P. Scalable Data Quality for Big Data: The Pythia Framework for Handling Missing Values. Big Data 2015;3:159-172. [PMID: 27442958 DOI: 10.1089/big.2015.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Abstract

Solving the missing-value (MV) problem with small estimation errors in large-scale data environments is a notoriously resource-demanding task. The most widely used MV imputation approaches are computationally expensive because they explicitly depend on the volume and the dimension of the data. Moreover, as datasets and their user community continuously grow, the problem can only be exacerbated. In an attempt to deal with such a problem, in our previous work, we introduced a novel framework coined Pythia, which employs a number of distributed data nodes (cohorts), each of which contains a partition of the original dataset. To perform MV imputation, Pythia, based on specific machine and statistical learning structures (signatures), selects the most appropriate subset of cohorts to perform locally a missing value substitution algorithm (MVA). This selection relies on the principle that a particular subset of cohorts maintains the most relevant partition of the dataset. In addition, as Pythia uses only part of the dataset for imputation and accesses different cohorts in parallel, it improves efficiency, scalability, and accuracy compared to a single machine (coined Godzilla), which uses the entire massive dataset to compute imputation requests. Although this article is an extension of our previous work, we particularly investigate the robustness of the Pythia framework and show that Pythia is independent of any MVA and signature construction algorithms. To facilitate our research, we considered two well-known MVAs (namely K-nearest neighbor and expectation-maximization imputation algorithms), as well as two machine and neural computational learning signature construction algorithms based on adaptive vector quantization and competitive learning. We provide comprehensive experiments to assess the performance of Pythia against Godzilla and showcase the benefits stemming from this framework.
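The general pattern described here, choosing the most relevant cohorts by their signatures and then imputing locally, can be sketched as follows. This is a loose illustration, not the Pythia implementation: using a plain centroid as the signature is an assumption (the abstract names adaptive vector quantization and competitive learning), the cohort data are hypothetical, and the cohorts are scanned sequentially rather than in parallel.

```python
import math


def observed_distance(query, row):
    """Euclidean distance over the dimensions where `query` is observed."""
    return math.sqrt(sum((q - r) ** 2 for q, r in zip(query, row) if q is not None))


def centroid(rows):
    """Column-wise mean of a cohort, used here as a stand-in signature."""
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]


def select_cohorts(query, signatures, top):
    """Names of the `top` cohorts whose signatures lie closest to the query."""
    ranked = sorted(signatures, key=lambda name: observed_distance(query, signatures[name]))
    return ranked[:top]


def knn_impute(query, rows, k):
    """Fill missing entries with the mean of the k nearest rows."""
    neighbours = sorted(rows, key=lambda r: observed_distance(query, r))[:k]
    return [
        sum(r[d] for r in neighbours) / len(neighbours) if q is None else q
        for d, q in enumerate(query)
    ]


# Hypothetical cohorts, each holding one partition of the dataset.
cohorts = {
    "cohort_a": [[1.0, 2.0, 3.0], [1.2, 2.1, 3.3], [0.9, 1.8, 2.9]],
    "cohort_b": [[10.0, 20.0, 30.0], [11.0, 19.0, 31.0], [9.5, 21.0, 29.0]],
}
signatures = {name: centroid(rows) for name, rows in cohorts.items()}

query = [1.1, 2.0, None]                          # third value is missing
chosen = select_cohorts(query, signatures, top=1)
local_rows = [row for name in chosen for row in cohorts[name]]
imputed = knn_impute(query, local_rows, k=2)
```

Only the rows of the selected cohort are scanned during imputation, which is the source of the efficiency gain the abstract claims over a single machine that searches the whole dataset.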

Loh WY, He X, Man M. A regression tree approach to identifying subgroups with differential treatment effects. Stat Med 2015;34:1818-33. [PMID: 25656439 PMCID: PMC4393794 DOI: 10.1002/sim.6454] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2014] [Revised: 11/13/2014] [Accepted: 01/20/2015] [Indexed: 12/13/2022]

Goetz CG, Luo S, Wang L, Tilley BC, LaPelle NR, Stebbins GT. Handling missing values in the MDS-UPDRS. Mov Disord 2015;30:1632-8. [PMID: 25649812 PMCID: PMC5072275 DOI: 10.1002/mds.26153] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2014] [Revised: 12/16/2014] [Accepted: 12/18/2014] [Indexed: 11/10/2022] Open

Priya RD, Kuppuswami S. Drawing inferences from clinical studies with missing values using genetic algorithm. Int J Bioinform Res Appl 2014;10:613-27. [PMID: 25335566 DOI: 10.1504/ijbra.2014.065245] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Kwak YT, Yang Y, Park SG. Missing data analysis in drug-naïve Alzheimer's disease with behavioral and psychological symptoms. Yonsei Med J 2013;54:825-31. [PMID: 23709414 PMCID: PMC3663226 DOI: 10.3349/ymj.2013.54.4.825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open

Holditch-Davis D, Levy J. Potential Pitfalls in Collecting and Analyzing Longitudinal Data from Chronically Ill Populations. Newborn Infant Nurs Rev 2010;10:10-18. [PMID: 20190867 PMCID: PMC2826814 DOI: 10.1053/j.nainr.2009.12.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Plachta-Danielzik S, Bartel C, Raspe H, Thyen U, Landsberg B, Müller MJ. Assessment of representativity of a study population - experience of the Kiel Obesity Prevention Study (KOPS). Obes Facts 2008;1:325-30. [PMID: 20054196 PMCID: PMC6452140 DOI: 10.1159/000176609] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Velden MVD, Bijmolt THA. Generalized canonical correlation analysis of matrices with missing rows: a simulation study. Psychometrika 2006;71:323-331. [PMID: 28197957 DOI: 10.1007/s11336-004-1168-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/11/2006] [Accepted: 06/09/2006] [Indexed: 06/06/2023]

White IR, Moodie E, Thompson SG, Croudace T. A modelling strategy for the analysis of clinical trials with partly missing longitudinal data. Int J Methods Psychiatr Res 2003;12:139-50. [PMID: 12953141 PMCID: PMC6878453 DOI: 10.1002/mpr.150] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open