1
Ngo H, Fang H, Rumbut J, Wang H. Federated Fuzzy Clustering for Decentralized Incomplete Longitudinal Behavioral Data. IEEE INTERNET OF THINGS JOURNAL 2024; 11:14657-14670. [PMID: 38605934 PMCID: PMC11006372 DOI: 10.1109/jiot.2023.3343719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
The use of medical data for machine learning, including unsupervised methods such as clustering, is often restricted by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Medical data is sensitive and highly regulated and anonymization is often insufficient to protect a patient's identity. Traditional clustering algorithms are also unsuitable for longitudinal behavioral health trials, which often have missing data and observe individual behaviors over varying time periods. In this work, we develop a new decentralized federated multiple imputation-based fuzzy clustering algorithm for complex longitudinal behavioral trial data collected from multisite randomized controlled trials over different time periods. Federated learning (FL) preserves privacy by aggregating model parameters instead of data. Unlike previous FL methods, this proposed algorithm requires only two rounds of communication and handles clients with varying numbers of time points for incomplete longitudinal data. The model is evaluated on both empirical longitudinal dietary health data and simulated clusters with different numbers of clients, effect sizes, correlations, and sample sizes. The proposed algorithm converges rapidly and achieves desirable performance on multiple clustering metrics. This new method allows for targeted treatments for various patient groups while preserving their data privacy and enables the potential for broader applications in the Internet of Medical Things.
Affiliation(s)
- Hieu Ngo
  - College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA
- Joshua Rumbut
  - College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA
- Honggang Wang
  - Department of Graduate Computer Science and Engineering, Katz School of Science and Health, Yeshiva University, New York City, NY, 10033
2
Rumbut J, Fang H, Wang H. Topic modeling for systematic review of visual analytics in incomplete longitudinal behavioral trial data. SMART HEALTH (AMSTERDAM, NETHERLANDS) 2020; 18:100142. [PMID: 33344744 PMCID: PMC7745978 DOI: 10.1016/j.smhl.2020.100142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
Longitudinal observational and randomized controlled trials (RCT) are widely applied in biomedical behavioral studies and increasingly implemented in smart health systems. These trials frequently produce data that are high-dimensional, correlated, and contain missing values, posing significant analytic challenges. Notably, visual analytics are underdeveloped in this area. In this paper, we developed a longitudinal topic model to implement a systematic review of visual analytic methods presented at the IEEE VIS conference over its 28-year history, in comparison with MIFuzzy, an integrated and comprehensive soft computing tool for behavioral trajectory pattern recognition, validation, and visualization of incomplete longitudinal data. The findings of our longitudinal topic modeling highlight the trend patterns of visual analytics development in longitudinal behavioral trials and underscore the large gap between existing robust visual analytic methods and actual working algorithms for longitudinal behavioral trial data. Future research areas for visual analytics in behavioral trial studies and smart health systems are discussed.
Affiliation(s)
- Joshua Rumbut
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Honggang Wang
  - Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA
3
Mahmud MS, Fang H, Carreiro S, Wang H, Boyer EW. Wearables technology for drug abuse detection: A survey of recent advancement. SMART HEALTH (AMSTERDAM, NETHERLANDS) 2019. [DOI: 10.1016/j.smhl.2018.09.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
4
Fang H, Zhang Z. An Enhanced Visualization Method to Aid Behavioral Trajectory Pattern Recognition Infrastructure for Big Longitudinal Data. IEEE TRANSACTIONS ON BIG DATA 2018; 4:289-298. [PMID: 29888298 PMCID: PMC5990046 DOI: 10.1109/tbdata.2017.2653815] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]
Abstract
Big longitudinal data provide more reliable information for decision making and are common across many fields. Trajectory pattern recognition is urgently needed to discover important structures in such data. Developing better and more computationally efficient visualization tools is crucial to guide this technique. This paper proposes an enhanced projection pursuit (EPP) method to better project and visualize the structures (e.g. clusters) of big high-dimensional (HD) longitudinal data on a lower-dimensional plane. Unlike classic PP methods potentially useful for longitudinal data, EPP is built upon nonlinear mapping algorithms to compute its stress (error) function by balancing the paired weights for between- and within-structure stress while preserving original structure membership in the high-dimensional space. Specifically, EPP solves an NP-hard optimization problem by integrating gradual optimization and nonlinear mapping algorithms, and automates the search for an optimal number of iterations to display a stable structure for varying sample sizes and dimensions. Using public UCI and real longitudinal clinical trial datasets as well as simulation, EPP demonstrates better performance in visualizing big HD longitudinal data.
Affiliation(s)
- Hua Fang
  - Department of Computer and Information Science, Department of Mathematics, University of Massachusetts Dartmouth, 285 Old Westport Rd, Dartmouth, MA, 02747, and Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, 01605
- Zhaoyang Zhang
  - College of Engineering, University of Massachusetts Dartmouth and Department of Quantitative Health Sciences, University of Massachusetts Medical School
5
Gurugubelli VS, Li Z, Wang H, Fang H. eFCM: An Enhanced Fuzzy C-Means Algorithm for Longitudinal Intervention Data. INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS 2018; 2018:912-916. [PMID: 30906794 PMCID: PMC6428443 DOI: 10.1109/iccnc.2018.8390419] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
Clustering methods have become increasingly important in analyzing heterogeneity of treatment effects, especially in longitudinal behavioral intervention studies. Methods such as K-means and Fuzzy C-means (FCM) have been widely endorsed to identify distinct groups in different types of data. Building upon our MIFuzzy [1], our goal is to concurrently handle multiple methodological issues in studying high-dimensional longitudinal intervention data with missing values. In particular, this paper focuses on the initialization issue of FCM and proposes a new initialization method to overcome the local-optimum problem and decrease the convergence time in handling high-dimensional data with missing values for overlapping clusters. Based on the idea of K-means++ [9], we proposed an enhanced Fuzzy C-means clustering (eFCM) and incorporated it into our MIFuzzy. This method was evaluated using real longitudinal intervention data as well as classic and generic datasets. Compared to conventional FCM, our findings indicate that eFCM can improve computational efficiency and avoid local optima.
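The idea of seeding fuzzy c-means with a K-means++-style initialization can be illustrated with a minimal sketch. This is not the authors' eFCM implementation: it assumes complete (already imputed) data, uses the textbook FCM update rules, and all function names (`kmeanspp_seeds`, `memberships`, `update_centers`, `fuzzy_c_means`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, c, rng):
    """K-means++-style seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < c:
        d2 = [min(sq_dist(p, v) for v in centers) for p in points]
        if sum(d2) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def memberships(points, centers, m):
    """Textbook FCM membership update: u_ik is proportional to d_ik^(-2/(m-1))."""
    U = []
    for p in points:
        d2 = [sq_dist(p, v) for v in centers]
        zeros = [k for k, d in enumerate(d2) if d == 0.0]
        if zeros:
            # Point sits exactly on one or more centers: split membership among them.
            row = [1.0 / len(zeros) if k in zeros else 0.0
                   for k in range(len(centers))]
        else:
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            row = [v / total for v in inv]
        U.append(row)
    return U


def update_centers(points, U, old_centers, m):
    """Each center is the mean of the points weighted by u_ik^m."""
    dim = len(points[0])
    new_centers = []
    for k, old in enumerate(old_centers):
        w = [row[k] ** m for row in U]
        total = sum(w)
        if total == 0.0:
            new_centers.append(list(old))  # keep an empty cluster where it was
            continue
        new_centers.append(
            [sum(wi * p[j] for wi, p in zip(w, points)) / total for j in range(dim)]
        )
    return new_centers


def fuzzy_c_means(points, c, m=2.0, tol=1e-9, max_iter=200, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, c, rng)
    for _ in range(max_iter):
        U = memberships(points, centers, m)
        new_centers = update_centers(points, U, centers, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)
```

Calling `fuzzy_c_means(points, c=2)` on a list of numeric vectors returns the fitted centers and one membership row per point. Spreading the initial centers apart in this way is what is expected to reduce the risk of poor local optima compared with uniformly random starts.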
Affiliation(s)
- Venkata Sukumar Gurugubelli
  - Department of Computer and Information Science, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
- Zhouzhou Li
  - Department of Electrical and Computer Engineering, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
- Honggang Wang
  - Department of Electrical and Computer Engineering, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
6
Kim SS, Fang H, Bernstein K, Zhang Z, DiFranza J, Ziedonis D, Allison J. Acculturation, Depression, and Smoking Cessation: a trajectory pattern recognition approach. Tob Induc Dis 2017; 15:33. [PMID: 28747857 PMCID: PMC5525352 DOI: 10.1186/s12971-017-0135-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/28/2016] [Accepted: 07/06/2017] [Indexed: 12/02/2022]
Abstract
BACKGROUND Korean Americans are known for a high smoking prevalence within the Asian American population. This study examined the effects of acculturation and depression on Korean Americans' smoking cessation and abstinence. METHODS This is a secondary data analysis of a smoking cessation study that implemented eight weekly individualized counseling sessions of a culturally adapted cessation intervention for the treatment arm and a standard cognitive behavioral therapy for the comparison arm. Both arms also received nicotine patches for 8 weeks. A newly developed non-parametric trajectory pattern recognition model (MI-Fuzzy) was used to identify cognitive and behavioral response patterns to a smoking cessation intervention among 97 Korean American smokers (81 men and 16 women). RESULTS Three distinctive response patterns were revealed: (a) Culturally Adapted (CA), since all identified members received the culturally adapted intervention; (b) More Bicultural (MB), for having higher scores of bicultural acculturation; and (c) Less Bicultural (LB), for having lower scores of bicultural acculturation. The CA smokers were those from the treatment arm, while MB and LB groups were from the comparison arm. The LB group differed in depression from the CA and MB groups and no difference was found between the CA and MB groups. Although depression did not directly affect 12-month prolonged abstinence, the LB group was most depressed and achieved the lowest rate of abstinence (LB: 1.03%; MB: 5.15%; CA: 21.65%). CONCLUSION A culturally adaptive intervention should target Korean American smokers with a high level of depression and a low level of biculturalism to assist in their smoking cessation. TRIAL REGISTRATION NCT01091363. Registered 21 March 2010.
Affiliation(s)
- Sun S Kim
  - University of Massachusetts, Boston, Boston, MA 02125 USA
- Hua Fang
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
  - Department of Computer and Information Science, College of Engineering, University of Massachusetts Dartmouth, Dion Building, Room 317, 285 Old Westport Road, Dartmouth, MA 02747-2300 USA
  - Division of Biostatistics and Health Services Research, Department of Quantitative Health Sciences, University of Massachusetts Medical School, Albert Sherman Bldg, Office: AS8-2061, 368 Plantation St., Worcester, MA 01605-0002 USA
- Kunsook Bernstein
  - Hunter College, City University of New York, New York, New York 10010 USA
- Zhaoyang Zhang
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
- Joseph DiFranza
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
- Douglas Ziedonis
  - University of California San Diego, Department of Psychiatry, 9500 Gilman Drive #0602, La Jolla, CA 92093-0602 USA
- Jeroan Allison
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
7
Abstract
Missing data are common in longitudinal observational and randomized controlled trials in smart health studies. Multiple-imputation-based fuzzy clustering is an emerging non-parametric soft computing method, used for either semi-supervised or unsupervised learning. Multiple imputation (MI) has been widely used in missing data analyses, but has not yet been scrutinized for unsupervised learning methods, although they are important for explaining the heterogeneity of treatment effects. Built upon our previous work on MIFuzzy clustering, this paper introduces the MIFuzzy concepts and performance, and theoretically, empirically, and numerically demonstrates how an MI-based approach can reduce the uncertainty of clustering accuracy in comparison to non- and single-imputation-based clustering approaches. This paper advances our understanding of the utility and strength of the MIFuzzy clustering approach for processing incomplete longitudinal behavioral intervention data.
Affiliation(s)
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA 01655
8
Zhang Z, Fang H. Multiple- vs Non- or Single-Imputation based Fuzzy Clustering for Incomplete Longitudinal Behavioral Intervention Data. IEEE INTERNATIONAL CONFERENCE ON CONNECTED HEALTH: APPLICATIONS, SYSTEMS AND ENGINEERING TECHNOLOGIES 2016; 2016:219-228. [PMID: 29034067 PMCID: PMC5635859 DOI: 10.1109/chase.2016.19] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Abstract
Disentangling patients' behavioral variations is a critical step for better understanding an intervention's effects on individual outcomes. Missing data commonly exist in longitudinal behavioral intervention studies. Multiple imputation (MI) has been well studied for missing data analyses in the statistical field, but it has not yet been scrutinized for clustering or unsupervised learning, which are important techniques for explaining the heterogeneity of treatment effects. Built upon previous work on MI fuzzy clustering, this paper theoretically, empirically, and numerically demonstrates how an MI-based approach can reduce the uncertainty of clustering accuracy in comparison to non- and single-imputation-based clustering approaches. This paper advances our understanding of the utility and strength of the multiple-imputation (MI) based fuzzy clustering approach for processing incomplete longitudinal behavioral intervention data.
Affiliation(s)
- Zhaoyang Zhang
  - Division of Biostatistics and Health Services Research, Department of Quantitative Health Science, University of Massachusetts Medical School, Worcester, MA 01655
- Hua Fang
  - Division of Biostatistics and Health Services Research, Department of Quantitative Health Science, University of Massachusetts Medical School, Worcester, MA 01655
9
Zhang Z, Fang H, Wang H. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth. J Med Syst 2016; 40:146. [PMID: 27126063 DOI: 10.1007/s10916-016-0499-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 12/01/2015] [Accepted: 04/11/2016] [Indexed: 11/27/2022]
Abstract
Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
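For orientation, the Xie and Beni index named above can be sketched on a single complete dataset. This minimal example uses the standard definition (membership-weighted within-cluster scatter divided by n times the smallest squared distance between centers, lower is better); how the MIV framework combines the index across imputed datasets is not described here, so no pooling is attempted. The function name `xie_beni` and the toy data are assumptions for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, U, m=2.0):
    """Xie-Beni validity index for a fuzzy partition (needs >= 2 centers).

    points  : list of n feature vectors
    centers : list of c cluster centers
    U       : n x c membership matrix, each row summing to 1
    m       : fuzzifier applied to the memberships
    """
    n = len(points)
    c = len(centers)
    # Compactness: membership-weighted squared distances to the centers.
    compactness = sum(
        (U[i][k] ** m) * sq_dist(points[i], centers[k])
        for i in range(n)
        for k in range(c)
    )
    # Separation: smallest squared distance between two distinct centers.
    separation = min(
        sq_dist(centers[k], centers[l])
        for k in range(c)
        for l in range(k + 1, c)
    )
    return compactness / (n * separation)


# Hypothetical toy data: two obvious groups on a line.
points = [[0.0], [1.0], [10.0], [11.0]]

# A partition that matches the two groups (crisp memberships for simplicity).
good_centers = [[0.5], [10.5]]
good_U = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# A partition that isolates one point and lumps the rest together.
bad_centers = [[0.0], [22.0 / 3.0]]
bad_U = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

xb_good = xie_beni(points, good_centers, good_U)
xb_bad = xie_beni(points, bad_centers, bad_U)
```

Here `xb_good` comes out smaller than `xb_bad`, which is the behavior used when scanning candidate numbers of clusters: the candidate with the lowest index is preferred.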
Affiliation(s)
- Zhaoyang Zhang
  - Department of Quantitative Health Science, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Hua Fang
  - Department of Quantitative Health Science, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Honggang Wang
  - Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA