1
|
Ngo H, Fang H, Rumbut J, Wang H. Federated Fuzzy Clustering for Decentralized Incomplete Longitudinal Behavioral Data. IEEE INTERNET OF THINGS JOURNAL 2024; 11:14657-14670. [PMID: 38605934 PMCID: PMC11006372 DOI: 10.1109/jiot.2023.3343719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
The use of medical data for machine learning, including unsupervised methods such as clustering, is often restricted by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Medical data is sensitive and highly regulated and anonymization is often insufficient to protect a patient's identity. Traditional clustering algorithms are also unsuitable for longitudinal behavioral health trials, which often have missing data and observe individual behaviors over varying time periods. In this work, we develop a new decentralized federated multiple imputation-based fuzzy clustering algorithm for complex longitudinal behavioral trial data collected from multisite randomized controlled trials over different time periods. Federated learning (FL) preserves privacy by aggregating model parameters instead of data. Unlike previous FL methods, this proposed algorithm requires only two rounds of communication and handles clients with varying numbers of time points for incomplete longitudinal data. The model is evaluated on both empirical longitudinal dietary health data and simulated clusters with different numbers of clients, effect sizes, correlations, and sample sizes. The proposed algorithm converges rapidly and achieves desirable performance on multiple clustering metrics. This new method allows for targeted treatments for various patient groups while preserving their data privacy and enables the potential for broader applications in the Internet of Medical Things.
Affiliation(s)
- Hieu Ngo
  - College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA
- Joshua Rumbut
  - College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA
- Honggang Wang
  - Department of Graduate Computer Science and Engineering, Katz School of Science and Health, Yeshiva University, New York City, NY, 10033
|
2
|
|
3
|
Rumbut J, Fang H, Wang H. Topic modeling for systematic review of visual analytics in incomplete longitudinal behavioral trial data. SMART HEALTH (AMSTERDAM, NETHERLANDS) 2020; 18:100142. [PMID: 33344744 PMCID: PMC7745978 DOI: 10.1016/j.smhl.2020.100142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
Longitudinal observational and randomized controlled trials (RCTs) are widely applied in biomedical behavioral studies and increasingly implemented in smart health systems. These trials frequently produce data that are high-dimensional, correlated, and contain missing values, posing significant analytic challenges. Notably, visual analytics are underdeveloped in this area. In this paper, we developed a longitudinal topic model to implement the systematic review of visual analytic methods presented at the IEEE VIS conference over its 28-year history, in comparison with MIFuzzy, an integrated and comprehensive soft computing tool for behavioral trajectory pattern recognition, validation, and visualization of incomplete longitudinal data. The findings of our longitudinal topic modeling highlight the trend patterns of visual analytics development in longitudinal behavioral trials and underscore the gigantic gap of existing robust visual analytic methods and actual working algorithms for longitudinal behavioral trial data. Future research areas for visual analytics in behavioral trial studies and smart health systems are discussed.
Affiliation(s)
- Joshua Rumbut
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Honggang Wang
  - Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747, USA
|
4
|
Ning X, Li W, Xu J. The Principle of Homology Continuity and Geometrical Covering Learning for Pattern Recognition. INT J PATTERN RECOGN 2018. [DOI: 10.1142/s0218001418500428] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022]
Abstract
Homology Continuity is a fundamental property of nature, but few traditional pattern recognition algorithms account for it. First, this paper briefly describes the Principle of Homology Continuity (PHC) and attempts to redefine it mathematically. Then, we introduce a PHC-based pattern learning method, Geometrical Covering Learning (GCL), followed by the Hyper sausage neural network as an instance of GCL. Lastly, we propose a GCL solution to the “two-spirals” pattern recognition problem. The final experimental results show that the new method is feasible and efficient.
Affiliation(s)
- Xin Ning
  - Lab of Artificial Neural Networks, Institute of Semiconductors, CAS, Beijing 100083, P. R. China
  - School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100029, P. R. China
  - Cognitive Computing Technology Wave Joint Lab, Beijing 100083, P. R. China
- Weijun Li
  - Lab of Artificial Neural Networks, Institute of Semiconductors, CAS, Beijing 100083, P. R. China
  - School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100029, P. R. China
  - Cognitive Computing Technology Wave Joint Lab, Beijing 100083, P. R. China
- Jiang Xu
  - Lab of Artificial Neural Networks, Institute of Semiconductors, CAS, Beijing 100083, P. R. China
|
5
|
Gurugubelli VS, Li Z, Wang H, Fang H. eFCM: An Enhanced Fuzzy C-Means Algorithm for Longitudinal Intervention Data. INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING, AND COMMUNICATIONS : [PROCEEDINGS]. INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS 2018; 2018:912-916. [PMID: 30906794 PMCID: PMC6428443 DOI: 10.1109/iccnc.2018.8390419] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
Clustering methods are becoming increasingly important in analyzing heterogeneity of treatment effects, especially in longitudinal behavioral intervention studies. Methods such as K-means and Fuzzy C-means (FCM) have been widely used to identify distinct groups in different types of data. Building upon our MIFuzzy [1], our goal is to concurrently handle multiple methodological issues in studying high-dimensional longitudinal intervention data with missing values. In particular, this paper focuses on the initialization issue of FCM and proposes a new initialization method to overcome the local optimum problem and decrease the convergence time in handling high-dimensional data with missing values for overlapping clusters. Based on the idea of K-means++ [9], we propose an enhanced Fuzzy C-means clustering (eFCM) and incorporate it into our MIFuzzy. This method was evaluated using real longitudinal intervention data as well as classic and generic datasets. Compared to conventional FCM, our findings indicate that eFCM can improve computational efficiency and avoid local optima.
Affiliation(s)
- Venkata Sukumar Gurugubelli
  - Department of Computer and Information Science, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
- Zhouzhou Li
  - Department of Electrical and Computer Engineering, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
- Honggang Wang
  - Department of Electrical and Computer Engineering, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts - Dartmouth, Dartmouth, MA, 02747
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
|
6
|
Kim SS, Fang H, Bernstein K, Zhang Z, DiFranza J, Ziedonis D, Allison J. Acculturation, Depression, and Smoking Cessation: a trajectory pattern recognition approach. Tob Induc Dis 2017; 15:33. [PMID: 28747857 PMCID: PMC5525352 DOI: 10.1186/s12971-017-0135-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/28/2016] [Accepted: 07/06/2017] [Indexed: 12/02/2022]
Abstract
BACKGROUND Korean Americans are known for a high smoking prevalence within the Asian American population. This study examined the effects of acculturation and depression on Korean Americans' smoking cessation and abstinence. METHODS This is a secondary data analysis of a smoking cessation study that implemented eight weekly individualized counseling sessions of a culturally adapted cessation intervention for the treatment arm and a standard cognitive behavioral therapy for the comparison arm. Both arms also received nicotine patches for 8 weeks. A newly developed non-parametric trajectory pattern recognition model (MI-Fuzzy) was used to identify cognitive and behavioral response patterns to a smoking cessation intervention among 97 Korean American smokers (81 men and 16 women). RESULTS Three distinctive response patterns were revealed: (a) Culturally Adapted (CA), since all identified members received the culturally adapted intervention; (b) More Bicultural (MB), for having higher scores of bicultural acculturation; and (c) Less Bicultural (LB), for having lower scores of bicultural acculturation. The CA smokers were those from the treatment arm, while MB and LB groups were from the comparison arm. The LB group differed in depression from the CA and MB groups and no difference was found between the CA and MB groups. Although depression did not directly affect 12-month prolonged abstinence, the LB group was most depressed and achieved the lowest rate of abstinence (LB: 1.03%; MB: 5.15%; CA: 21.65%). CONCLUSION A culturally adaptive intervention should target Korean American smokers with a high level of depression and a low level of biculturalism to assist in their smoking cessation. TRIAL REGISTRATION NCT01091363. Registered 21 March 2010.
Affiliation(s)
- Sun S Kim
  - University of Massachusetts, Boston, Boston, MA 02125 USA
- Hua Fang
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
  - Department of Computer and Information Science, College of Engineering, University of Massachusetts Dartmouth, Dion Building, Room 317 285 Old Westport Road Dartmouth, Dartmouth, MA 02747-2300 USA
  - Division of Biostatistics and Health Services Research Department of Quantitative Health Sciences, University of Massachusetts Medical School, Albert Sherman Bldg, Office: AS8-2061, 368 Plantation St. Worcester, Dartmouth, MA 01605-0002 USA
- Kunsook Bernstein
  - Hunter College, City University of New York, New York, New York 10010 USA
- Zhaoyang Zhang
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
- Joseph DiFranza
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
- Douglas Ziedonis
  - University of California San Diego, Department of Psychiatry, 9500 Gilman Drive #0602, La Jolla, CA 92093-0602 USA
- Jeroan Allison
  - University of Massachusetts Dartmouth and Medical School Dartmouth, Dartmouth, MA 02747 USA
|
7
|
Abstract
Missing data are common in longitudinal observational and randomized controlled trials in smart health studies. Multiple-imputation-based fuzzy clustering is an emerging non-parametric soft computing method, used for either semi-supervised or unsupervised learning. Multiple imputation (MI) has been widely used in missing data analyses, but has not yet been scrutinized for unsupervised learning methods, although they are important for explaining the heterogeneity of treatment effects. Built upon our previous work on MIFuzzy clustering, this paper introduces the MIFuzzy concepts and performance, and demonstrates theoretically, empirically, and numerically how an MI-based approach can reduce the uncertainty of clustering accuracy in comparison to non- and single-imputation-based clustering approaches. This paper advances our understanding of the utility and strength of the MIFuzzy clustering approach to processing incomplete longitudinal behavioral intervention data.
Affiliation(s)
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA 01655
|
8
|
Zhang Z, Fang H, Wang H. A New MI-Based Visualization Aided Validation Index for Mining Big Longitudinal Web Trial Data. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2016; 4:2272-2280. [PMID: 27482473 PMCID: PMC4963037 DOI: 10.1109/access.2016.2569074] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/06/2023]
Abstract
Web-delivered clinical trials generate big complex data. To help untangle the heterogeneity of treatment effects, unsupervised learning methods have been widely applied. However, identifying valid patterns is a high-priority but challenging issue for these methods. This paper, built upon our previous research on multiple imputation (MI)-based fuzzy clustering and validation, proposes a new MI-based Visualization-aided validation index (MIVOOS) to determine the optimal number of clusters for big incomplete longitudinal Web-trial data with inflated zeros. Different from a recently developed fuzzy clustering validation index, MIVOOS uses more suitable overlap and separation measures for Web-trial data and, unlike the widely used Xie and Beni (XB) index, does not depend on the choice of fuzzifiers. Through optimizing the view angles of 3-D projections using Sammon mapping, the optimal 2-D projection-guided MIVOOS is obtained to better visualize and verify the patterns in conjunction with trajectory patterns. Compared with XB and VOS, our newly proposed MIVOOS shows its robustness in validating big Web-trial data under different missing data mechanisms using real and simulated Web-trial data.
Affiliation(s)
- Zhaoyang Zhang
  - Department of Quantitative Health Science, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Hua Fang
  - Department of Quantitative Health Science, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Honggang Wang
  - Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA 02747, USA
|