26
Faust L, Feldman K, Chawla NV. Examining the weekend effect across ICU performance metrics. Crit Care 2019; 23:207. [PMID: 31171026 PMCID: PMC6554947 DOI: 10.1186/s13054-019-2479-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Accepted: 05/16/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Known colloquially as the "weekend effect," the association between weekend admissions and increased mortality within hospital settings has become a highly contested topic over the last two decades. Drawing interest from practitioners and researchers alike, a wide variety of works have emerged arguing for and against the presence of the effect across various patient cohorts. However, it has become evident that simply studying population characteristics is insufficient for understanding how the effect manifests. Rather, to truly understand the effect, investigations into its underlying factors must be considered. As such, the work presented in this manuscript addresses this consideration by moving beyond identification of patient cohorts to examining the role of ICU performance. METHODS Employing a comprehensive, publicly available database of electronic medical records (EMR), we began by using multiple logistic regression to identify and isolate a specific cohort in which the weekend effect was present. Next, we leveraged the highly detailed nature of the EMR to evaluate ICU performance using well-established ICU quality scorecards to assess differences in clinical factors among patients admitted to an ICU on the weekend versus a weekday. RESULTS Our results demonstrate the weekend effect to be most prevalent among emergency surgery patients (OR 1.53; 95% CI 1.19, 1.96), specifically those diagnosed with circulatory diseases (P<.001). Differences between weekday and weekend admissions for this cohort included a variety of clinical factors such as ventilatory support and night-time discharges. CONCLUSIONS This work reinforces the importance of accounting for differences in clinical factors as well as patient cohorts in studies investigating the weekend effect.
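The headline result above is an odds ratio with a 95% confidence interval. The paper's estimate comes from multiple logistic regression with adjustment, which is not reproduced here; as a minimal illustration only, assuming a plain unadjusted 2×2 table and a Wald interval on the log scale, the arithmetic behind an OR and its CI looks like this. The function name and the counts are hypothetical.

```python
import math

def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval (hypothetical helper).

    a: weekend admissions who died     b: weekend admissions who survived
    c: weekday admissions who died     d: weekday admissions who survived
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR), computed from the four cell counts
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts, not taken from the study
print(odds_ratio_wald_ci(a=30, b=170, c=100, d=900))
```

The interval is symmetric on the log-odds scale, which is why it appears asymmetric around the OR itself; an adjusted analysis would instead read the OR and its interval off the exponentiated regression coefficient.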
27
Lin S, Faust L, Robles-Granda P, Kajdanowicz T, Chawla NV. Social network structure is predictive of health and wellness. PLoS One 2019; 14:e0217264. [PMID: 31170181 PMCID: PMC6553705 DOI: 10.1371/journal.pone.0217264] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/19/2019] [Accepted: 05/06/2019] [Indexed: 11/19/2022]
Abstract
Social networks influence health-related behavior, such as obesity and smoking. While researchers have studied social networks as a driver for diffusion of influences and behavior, it is less understood how the structure or topology of the network, in itself, impacts an individual’s health behavior and wellness state. In this paper, we investigate whether the structure or topology of a social network offers additional insight and predictability on an individual’s health and wellness. We develop a method called the Network-Driven health predictor (NetCARE) that leverages features representative of social network structure. Using a large longitudinal data set of students enrolled in the NetHealth study at the University of Notre Dame, we show that the NetCARE method improves the overall prediction performance over the baseline models (which use demographics and physical attributes) by 38%, 65%, 55%, and 54% for the wellness states of stress, happiness, positive attitude, and self-assessed health, respectively.
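The abstract does not list NetCARE's exact structural features. As an illustration only, assuming two standard structural features (node degree and local clustering coefficient) computed from an undirected friendship edge list, per-person network features could be derived as below; the function name and feature choice are hypothetical, not the paper's feature set.

```python
from collections import defaultdict
from itertools import combinations

def structural_features(edges):
    """Return {node: (degree, local_clustering)} for an undirected graph (hypothetical)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    features = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count edges that exist among the node's neighbors (closed triangles)
        links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        features[node] = (k, clustering)
    return features

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(structural_features(edges))
```

Features like these would be appended to the demographic and physical baseline features before training a predictor; the clustering coefficient is included because it captures how tightly knit a person's circle is, which degree alone does not.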
28
Faust L, Wang C, Hachen D, Lizardo O, Chawla NV. Physical Activity Trend eXtraction: A Framework for Extracting Moderate-Vigorous Physical Activity Trends From Wearable Fitness Tracker Data. JMIR Mhealth Uhealth 2019; 7:e11075. [PMID: 30860488 PMCID: PMC6434402 DOI: 10.2196/11075] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/16/2018] [Revised: 11/26/2018] [Accepted: 12/10/2018] [Indexed: 12/11/2022]
Abstract
Background Moderate-vigorous physical activity (MVPA) offers extensive health benefits but is neglected by many. As a result, a wide body of research investigating physical activity behavior change has been conducted. As many of these studies transition from paper-based methods of MVPA data collection to fitness trackers, a series of challenges arise in extracting insights from these new data. Objective The objective of this research was to develop a framework for preprocessing and extracting MVPA trends from wearable fitness tracker data to support MVPA behavior change studies. Methods Using heart rate data collected from fitness trackers, we propose Physical Activity Trend eXtraction (PATX), a framework that imputes missing data, recalculates personalized target heart rate zones, and extracts MVPA trends. We tested our framework on a dataset of 123 college study participants observed across 2 academic years (18 months) using Fitbit Charge HRs. To demonstrate the value of our framework’s output in supporting MVPA behavior change studies, we applied it to 2 case studies. Results Among the 123 participants analyzed, PATX labeled 41 participants as experiencing a significant increase in MVPA and 44 as experiencing a significant decrease in MVPA, with significance defined as P<.05. Our first case study was consistent with previous works investigating the associations between MVPA and mental health. The second, exploring how individuals perceive their own levels of MVPA relative to their friends, led to a novel observation: individuals were less likely to notice changes in their own MVPA when close ties in their social network mimicked their changes. Conclusions By providing meaningful and flexible outputs, PATX alleviates data concerns common with fitness trackers to support MVPA behavior change studies as they shift to more objective assessments of MVPA.
29
Tao J, Wang C, Chawla NV, Shi L, Kim SH. Semantic Flow Graph: A Framework for Discovering Object Relationships in Flow Fields. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:3200-3213. [PMID: 29990237 DOI: 10.1109/tvcg.2017.2773071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Visual exploration of flow fields is important for studying dynamic systems. We introduce semantic flow graph (SFG), a novel graph representation and interaction framework that enables users to explore the relationships among key objects (i.e., field lines, features, and spatiotemporal regions) of both steady and unsteady flow fields. The objects and their relationships are organized as a heterogeneous graph. We assign each object a set of attributes, based on which a semantic abstraction of the heterogeneous graph is generated. This semantic abstraction is SFG. We design a suite of operations to explore the underlying flow fields based on this graph representation and abstraction mechanism. Users can flexibly reconfigure SFG to examine the relationships among groups of objects at different abstraction levels. Three linked views are developed to display SFG, its node split criteria and history, and the objects in the spatial volume. For simplicity, we introduce SFG construction and exploration for steady flow fields with critical points being the only features. Then we demonstrate that SFG can be naturally extended to deal with unsteady flow fields and multiple types of features. We experiment with multiple data sets and conduct an expert evaluation to demonstrate the effectiveness of our approach.
30
Thomas PB, Robertson DH, Chawla NV. Predicting onset of complications from diabetes: a graph based approach. APPLIED NETWORK SCIENCE 2018; 3:48. [PMID: 30581983 PMCID: PMC6245137 DOI: 10.1007/s41109-018-0106-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/03/2018] [Accepted: 10/17/2018] [Indexed: 06/09/2023]
Abstract
Diabetes is a significant health concern, with more than 30 million Americans living with the disease. Onset of diabetes increases the risk for various complications, including kidney disease, myocardial infarctions, heart failure, stroke, retinopathy, and liver disease. In this paper, we study and predict the onset of these complications using a network-based approach by identifying fast and slow progressors. That is, given a patient's diagnosis of diabetes, we predict the likelihood of developing one or more of the possible complications, and which patients will develop complications quickly. This combination of "if a complication will be developed" with "how fast it will be developed" can aid the physician in developing a better diabetes management program for a given patient.
31
Tao J, Imre M, Wang C, Chawla NV, Guo H, Sever G, Kim SH. Exploring Time-Varying Multivariate Volume Data Using Matrix of Isosurface Similarity Maps. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:1236-1245. [PMID: 30130208 DOI: 10.1109/tvcg.2018.2864808] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
We present a novel visual representation and interface named the matrix of isosurface similarity maps (MISM) for effective exploration of large time-varying multivariate volumetric data sets. MISM synthesizes three types of similarity maps (i.e., self, temporal, and variable similarity maps) to capture the essential relationships among isosurfaces of different variables and time steps. Additionally, it serves as the main visual mapping and navigation tool for examining the vast number of isosurfaces and exploring the underlying time-varying multivariate data set. We present temporal clustering, variable grouping, and interactive filtering to reduce the huge exploration space of MISM. In conjunction with the isovalue and isosurface views, MISM allows users to identify important isosurfaces or isosurface pairs and compare them over space, time, and value range. More importantly, we introduce path recommendation that suggests, animates, and compares traversal paths for effectively exploring MISM under varied criteria and at different levels of detail. A silhouette-based method is applied to render multiple surfaces of interest in a visually succinct manner. We demonstrate the effectiveness of our approach with case studies of several time-varying multivariate data sets and an ensemble data set, and evaluate our work with two domain experts.
32
Feldman K, Johnson RA, Chawla NV. The State of Data in Healthcare: Path Towards Standardization. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2018; 2:248-271. [DOI: 10.1007/s41666-018-0019-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/09/2017] [Revised: 03/21/2018] [Accepted: 03/29/2018] [Indexed: 12/23/2022]
33
Fernandez A, Garcia S, Herrera F, Chawla NV. SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. J ARTIF INTELL RES 2018. [DOI: 10.1613/jair.1.11192] [Citation(s) in RCA: 490] [Impact Index Per Article: 81.7] [Indexed: 11/04/2022]
Abstract
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data. It is also featured in a number of different software packages, from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges in extending SMOTE to Big Data problems.
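For readers new to the technique, the core SMOTE step creates each synthetic minority sample by interpolating between a minority example and one of its k nearest minority-class neighbors. The sketch below is a minimal pure-Python version under simplifying assumptions (numeric features, Euclidean distance, base samples picked at random); it is not a reference implementation, and it omits the many SMOTE variants the paper surveys. The function name `smote` is hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic samples by SMOTE-style interpolation (sketch).

    minority: list of equal-length numeric feature vectors (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def k_nearest(i):
        # Indices of the k nearest *other* minority samples (Euclidean distance)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        return others[:k]

    neighbors = [k_nearest(i) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbors[i])       # one of its k nearest neighbors
        gap = rng.random()                 # position along the segment, in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
print(smote(minority, n_synthetic=3, k=2, seed=0))
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE fills in the minority region rather than duplicating points, which is the property that distinguishes it from random oversampling.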
34
Feldman K, Kotoulas S, Chawla NV. TIQS: Targeted Iterative Question Selection for Health Interventions. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2018; 2:205-227. [PMID: 35415407 DOI: 10.1007/s41666-018-0015-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/31/2017] [Revised: 02/08/2018] [Accepted: 02/21/2018] [Indexed: 11/28/2022]
Abstract
While healthcare has traditionally existed within the confines of formal clinical environments, the emergence of population health initiatives has given rise to a new and diverse set of community interventions. As the number of interventions continues to grow, the ability to quickly and accurately identify those most relevant to an individual's specific need has become essential in the care process. However, due to the diverse nature of the interventions, determining need often requires non-clinical social and behavioral information that must be collected from the individuals themselves. Although survey tools have demonstrated success in collecting this data, time restrictions and diminishing respondent interest have presented barriers to obtaining up-to-date information on a regular basis. In response, researchers have turned to analytical approaches to optimize surveys and quantify the importance of each question. To date, the majority of these works have approached the task from a univariate standpoint, identifying the next most important question to ask. However, such an approach fails to address the interconnected nature of the health conditions inherently captured by the broader set of survey questions. Utilizing data mining and machine learning methodology, this work demonstrates the value of capturing these relations. We present a novel framework that identifies a variable-length subset of survey questions most relevant in determining the need for a particular health intervention for a given individual. We evaluate the framework using a large national longitudinal dataset centered on aging, demonstrating the ability to identify the questions with the highest impact across a variety of interventions.
35
Nagrecha S, Johnson RA, Chawla NV. FraudBuster: Reducing Fraud in an Auto Insurance Market. BIG DATA 2018; 6:3-12. [PMID: 29570416 DOI: 10.1089/big.2017.0083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
Nonstandard insurers suffer from a peculiar variant of fraud wherein an overwhelming majority of claims have the semblance of fraud. We show that state-of-the-art fraud detection performs poorly when deployed at underwriting. Our proposed framework, "FraudBuster," represents a new paradigm in predicting segments of fraud at underwriting in an interpretable and regulation-compliant manner. We show that the most actionable and generalizable profile of fraud is represented by market segments with high confidence of fraud and high loss ratio. We show how these segments can be reported in terms of their constituent policy traits, expected loss ratios, support, and confidence of fraud. Overall, our predictive models successfully identify fraud with an area under the precision-recall curve of 0.63 and an F1 score of 0.769.
36
Nigam A, Dambanemuya HK, Joshi M, Chawla NV. Harvesting Social Signals to Inform Peace Processes Implementation and Monitoring. BIG DATA 2017; 5:337-355. [PMID: 29235916 PMCID: PMC5734239 DOI: 10.1089/big.2017.0055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Peace processes are complex, protracted, and contentious, involving significant bargaining and compromise among various societal and political stakeholders. In civil war terminations, it is pertinent to measure the pulse of the nation to ensure that the peace process is responsive to citizens' concerns. Social media wields tremendous power as a tool for dialogue, debate, organization, and mobilization, thereby adding more complexity to the peace process. Using Colombia's final peace agreement and national referendum as a case study, we investigate the influence of two important indicators: intergroup polarization and public sentiment toward the peace process. We present a detailed linguistic analysis to detect intergroup polarization and a predictive model that leverages Tweet structure, content, and user-based features to predict public sentiment toward the Colombian peace process. We demonstrate that had pro-accord stakeholders leveraged public opinion from social media, the outcome of the Colombian referendum could have been different.
37
Dong Y, Chawla NV, Tang J, Yang Y, Yang Y. User Modeling on Demographic Attributes in Big Mobile Social Networks. ACM T INFORM SYST 2017. [DOI: 10.1145/3057278] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 10/19/2022]
Abstract
Users with demographic profiles in social networks offer the potential to understand the social principles that underpin our highly connected world, from individuals, to groups, to societies. In this article, we harness the power of network and data sciences to model the interplay between user demographics and social behavior and further study to what extent users’ demographic profiles can be inferred from their mobile communication patterns. By modeling over 7 million users and 1 billion mobile communication records, we find that during the active dating period (i.e., 18–35 years old), users are active in broadening social connections with males and females alike, while after reaching 35 years of age people tend to keep small, closed, and same-gender social circles. Further, we formalize the demographic prediction problem of inferring users’ gender and age simultaneously. We propose a factor graph-based WhoAmI method to address the problem by leveraging not only the correlations between network features and users’ gender/age, but also the interrelations between gender and age. In addition, we identify a new problem, coupled network demographic prediction across multiple mobile operators, and present a coupled variant of the WhoAmI method to address its unique challenges. Our extensive experiments demonstrate the effectiveness, scalability, and applicability of the WhoAmI methods. Finally, our study finds a greater than 80% potential predictability for inferring users’ gender from phone call behavior and 73% for users’ age from text messaging interactions.
38
Wang S, Song J, Yang Y, Zhang Y, Chawla NV, Ma J, Wang H. Interaction between obesity and the Hypoxia Inducible Factor 3 Alpha Subunit rs3826795 polymorphism in relation with plasma alanine aminotransferase. BMC MEDICAL GENETICS 2017; 18:80. [PMID: 28754107 PMCID: PMC5534125 DOI: 10.1186/s12881-017-0437-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/14/2016] [Accepted: 07/13/2017] [Indexed: 12/14/2022]
Abstract
BACKGROUND DNA methylation of the Hypoxia Inducible Factor 3 Alpha Subunit (HIF3A) gene has been shown to be associated with obesity, and it also has a Body Mass Index (BMI)-independent association with plasma alanine aminotransferase (ALT). However, the relationship among obesity, plasma ALT, HIF3A polymorphism, and methylation remains unclear. This study aims to identify the association between HIF3A polymorphism and plasma ALT, and further to determine whether the effect of HIF3A polymorphism on ALT could be modified by obesity or mediated by DNA methylation. METHODS The HIF3A rs3826795 polymorphism was genotyped in a case-control study including 2030 Chinese children aged 7-18 years (705 obese cases and 1325 non-obese controls). Furthermore, HIF3A DNA methylation in peripheral blood was measured in 110 severely obese children and 110 age- and gender-matched normal-weight controls. RESULTS There was no overall association between the HIF3A rs3826795 polymorphism and ALT. A significant interaction between obesity and rs3826795 in relation to ALT was found (P inter = 0.042), with the rs3826795 G-allele number elevating ALT significantly only in obese children (β' = 0.075, P = 0.037), but not in non-obese children (β' = -0.009, P = 0.741). Additionally, a mediation effect of HIF3A methylation was found in the association between the HIF3A rs3826795 polymorphism and ALT among obese children (β' = 0.242, P = 0.014). CONCLUSION This is the first study to report the interaction between obesity and the HIF3A gene in relation to ALT, and also to reveal a mediation effect among the HIF3A polymorphism, methylation, and ALT. This study provides new evidence for the function of the HIF3A gene, which would be helpful for future risk assessment and personalized treatment of liver diseases.
39
Mursalin M, Zhang Y, Chen Y, Chawla NV. Automated epileptic seizure detection using improved correlation-based feature selection with random forest classifier. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.053] [Citation(s) in RCA: 151] [Impact Index Per Article: 21.6] [Indexed: 11/28/2022]
40
Wang S, Song J, Yang Y, Chawla NV, Ma J, Wang H. Rs12970134 near MC4R is associated with appetite and beverage intake in overweight and obese children: A family-based association study in Chinese population. PLoS One 2017; 12:e0177983. [PMID: 28520814 PMCID: PMC5433775 DOI: 10.1371/journal.pone.0177983] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/11/2016] [Accepted: 05/05/2017] [Indexed: 12/15/2022]
Abstract
Background Recent studies indicated that eating behaviors are under genetic influence, and that melanocortin 4 receptor (MC4R) gene polymorphisms can affect total energy intake and the consumption of fat, protein, and carbohydrates. Our study aims to investigate the association of the MC4R polymorphism with appetite and food intake among Chinese children. Methods A family-based association study was conducted among 151 Chinese trios whose offspring were overweight/obese children aged 9–15 years. The rs12970134 variant near MC4R was genotyped, and the Children Eating Behavior Questionnaire (CEBQ) and a self-designed questionnaire measuring food intake were administered. The FBAT and PBAT software packages were used. Results The family-based association analysis showed a significant association between rs12970134 and obesity (Z = 2.449, P = 0.014). After adjusting for age, gender, and standardized BMI, rs12970134 was significantly associated with food responsiveness (FR) among children (β'b = 0.077, Pb = 0.028), and with satiety responsiveness (SR) in trios (P = -0.026). The polymorphism was associated with beverage intake (β'b = 0.331, Pb = 0.00016 in children; P = 0.043 in trios), but not significantly associated with vegetable, fruit, or meat intake (P>0.050). We further found a significant mediation effect among rs12970134, FR, and beverage intake (b = 0.177, P = 0.047). Conclusions Our study is the first to report that rs12970134 near MC4R was associated with appetite and beverage intake, and that food responsiveness could mediate the effect of rs12970134 on beverage intake in overweight and obese Chinese children. Further studies are needed to uncover the genetic basis for eating behaviors, which could lead to the development and implementation of effective interventional strategies early in life.
41
Dasgupta D, Johnson RA, Chaudhry B, Reeves KG, Willaert P, Chawla NV. Design and Evaluation of a Medication Adherence Application with Communication for Seniors in Independent Living Communities. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:480-489. [PMID: 28269843 PMCID: PMC5333254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Medication non-adherence is a pressing concern among seniors, leading to a lower quality of life and higher healthcare costs. While mobile applications provide a viable medium for medication management, their utility can be limited without tackling the specific needs of seniors and facilitating the active involvement of care providers. To address these limitations, we are developing a tablet-based application designed specifically for seniors to track their medications and a web portal for their care providers to track medication adherence. In collaboration with a local Aging in Place program, we conducted a three-month study with sixteen participants from an independent living facility. Our study found that the application helped participants to effectively track their medications and improved their sense of wellbeing. Our findings highlight the importance of catering to the needs of seniors and of involving care providers in this process, with specific recommendations for the development of future medication management applications.
42
Xu J, Wickramarathne TL, Chawla NV. Representing higher-order dependencies in networks. SCIENCE ADVANCES 2016; 2:e1600028. [PMID: 27386539 PMCID: PMC4928957 DOI: 10.1126/sciadv.1600028] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 01/07/2016] [Accepted: 04/20/2016] [Indexed: 05/28/2023]
Abstract
To ensure the correctness of network analysis methods, the network (as the input) has to be a sufficiently accurate representation of the underlying data. However, when representing sequential data from complex systems, such as global shipping traffic or Web clickstream traffic as networks, conventional network representations that implicitly assume the Markov property (first-order dependency) can quickly become limiting. This assumption holds that, when movements are simulated on the network, the next movement depends only on the current node, discounting the fact that the movement may depend on several previous steps. However, we show that data derived from many complex systems can show up to fifth-order dependencies. In these cases, the oversimplifying assumption of the first-order network representation can lead to inaccurate network analysis results. To address this problem, we propose the higher-order network (HON) representation that can discover and embed variable orders of dependencies in a network representation. Through a comprehensive empirical evaluation and analysis, we establish several desirable characteristics of HON, including accuracy, scalability, and direct compatibility with the existing suite of network analysis methods. We illustrate how HON can be applied to a broad variety of tasks, such as random walking, clustering, and ranking, and we demonstrate that, by using it as input, HON yields more accurate results without any modification to these tasks.
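The limitation of the first-order (Markov) representation can be seen directly in transition counts. The sketch below, using hypothetical shipping-style paths, compares P(next | current) with P(next | previous, current); when the two disagree, a first-order network discards information that a second-order node such as (previous, current) would keep. This only illustrates the motivation; it is not the HON rule-extraction algorithm described in the paper, and `transition_probs` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def transition_probs(sequences, order):
    """Estimate P(next | last `order` steps) from a list of sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for t in range(order, len(seq)):
            history = tuple(seq[t - order:t])
            counts[history][seq[t]] += 1
    return {
        h: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
        for h, cnt in counts.items()
    }

# Hypothetical paths: where a ship goes after port C depends on where it came from
paths = [["A", "C", "D"], ["A", "C", "D"], ["B", "C", "E"], ["B", "C", "E"]]
first = transition_probs(paths, order=1)
second = transition_probs(paths, order=2)
print(first[("C",)])        # {'D': 0.5, 'E': 0.5}
print(second[("A", "C")])   # {'D': 1.0}
print(second[("B", "C")])   # {'E': 1.0}
```

A random walker on the first-order network leaving C picks D or E with equal probability, whereas the observed paths are fully determined by the previous port; splitting C into the higher-order nodes (A, C) and (B, C) restores that dependency while leaving the result an ordinary directed network that existing methods can consume.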
43
Feldman K, Chawla NV. Does Medical School Training Relate to Practice? Evidence from Big Data. BIG DATA 2015; 3:103-113. [PMID: 26487985 PMCID: PMC4605456 DOI: 10.1089/big.2014.0060] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
On April 2nd, 2014, the Department of Health and Human Services (HHS) announced a historic policy in its effort to increase transparency in the American healthcare system. The Centers for Medicare & Medicaid Services (CMS) would publicly release a dataset containing information about the types of Medicare services, requested charges, and payments issued by providers across the country. In its release, HHS stated that the data would shed light on "Medicare fraud, waste, and abuse." While this is certainly true, we believe that the data can provide much more. Beyond the purely financial aspects of procedure charges and payments, the procedures themselves may provide additional information, not only about the Medicare population, but also about the physicians themselves. The procedures a physician performs are for the most part not novel, but rather recommended, observed, and studied. However, whether a physician decides to advocate a procedure is somewhat discretionary. Some patients require a clear course of action, while others may benefit from a variety of options. This article poses the following question: How does a physician's past experience in medical school shape his or her practice decisions? This article aims to open the analysis into how data, such as the CMS Medicare release, can help further our understanding of knowledge transfer and how experiences during education can shape a physician's decisions over the course of his or her career. This work begins with an evaluation of similarities between medical school charges, procedures, and payments. It then details how schools' procedure choices may link them in other, more interesting ways. Finally, the article includes a geographic analysis of how medical school procedure payments and charges are distributed nationally, highlighting potential deviations.
44
Dong Y, Tang J, Chawla NV, Lou T, Yang Y, Wang B. Inferring social status and rich club effects in enterprise communication networks. PLoS One 2015; 10:e0119446. [PMID: 25822343 PMCID: PMC4379184 DOI: 10.1371/journal.pone.0119446] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/16/2014] [Accepted: 01/19/2015] [Indexed: 11/30/2022]
Abstract
Social status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behaviors. In this paper, we consider the notion of status from the perspective of a position or title held by a person in an enterprise. We study the intersection of social status and social networks in an enterprise, examining whether enterprise communication logs can help reveal how social interactions and individual status manifest themselves in social networks. To that end, we use two enterprise datasets with three communication channels (voice call, short message, and email) to demonstrate the social-behavioral differences among individuals with different status. We report several findings, and based on them we also develop a model to predict social status. On the individual level, high-status individuals are more likely to span structural holes by linking to people in parts of the enterprise networks that are otherwise not well connected to one another. On the community level, the principle of homophily, social balance, and clique theory generally indicate a “rich club” maintained by high-status individuals, in the sense that this community is much more connected, balanced, and dense. Our model can predict the social status of individuals with 93% accuracy.
45
Dong Y, Pinelli F, Gkoufas Y, Nabi Z, Calabrese F, Chawla NV. Inferring unusual crowd events from mobile phone call detail records. Machine Learning and Knowledge Discovery in Databases 2015. [DOI: 10.1007/978-3-319-23525-7_29]
46
Yang Y, Dong Y, Chawla NV. Predicting node degree centrality with the node prominence profile. Sci Rep 2014; 4:7236. [PMID: 25429797 PMCID: PMC4246206 DOI: 10.1038/srep07236] [Received: 06/26/2014] [Accepted: 11/05/2014]
Abstract
The centrality of a node measures its relative importance within a network. Centrality has a number of applications, including inferring the influence or success of an individual in a social network and studying the resulting social network dynamics. While we can compute the centrality of any node in a given network snapshot, many applications also need to know an individual's potential importance in the future. However, current centrality is not necessarily an effective predictor of future centrality. Of the various measures of centrality, we focus on degree centrality in this paper. We develop a method that reconciles preferential attachment and triadic closure to capture a node's prominence profile, and we show that the proposed node prominence profile method is an effective predictor of degree centrality. Notably, our analysis reveals that individuals in the early stage of evolution display a distinctive and robust signature in their degree centrality trend, which their prominence profile predicts adequately. We evaluate our work across four real-world social networks. Our findings have important implications for applications that require prediction of a node's future degree centrality, as well as for the study of social network dynamics.
47
Zhou ZH, Chawla NV, Jin Y, Williams GJ. Big data opportunities and challenges: discussions from data analytics perspectives [Discussion Forum]. IEEE Comput Intell Mag 2014. [DOI: 10.1109/mci.2014.2350953]
48
Lichtenwalter RN, Chawla NV. Vertex collocation profiles: theory, computation, and results. SpringerPlus 2014; 3:116. [PMID: 25392767 PMCID: PMC4212056 DOI: 10.1186/2193-1801-3-116] [Received: 09/13/2013] [Accepted: 02/03/2014]
Abstract
We describe the vertex collocation profile (VCP) concept. VCPs provide rich information about the local structure surrounding embedded vertex pairs. VCP analysis offers a new tool for researchers and domain experts to understand the underlying growth mechanisms in their networks and to analyze link formation mechanisms in the appropriate sociological, biological, physical, or other context. The same resolution that gives the VCP method its analytical power also enables it to perform well in link prediction. We first develop the theory, mathematics, and algorithms underlying VCPs, and we provide timing results demonstrating that the algorithms scale well even to large networks. We then show that VCP methods perform link prediction competitively with unsupervised and supervised methods across different network families. Unlike many analytical tools, VCPs inherently generalize to multirelational data, which gives them unique power in complex modeling tasks. To demonstrate this, we apply the VCP method to longitudinal networks by encoding temporally resolved information into different relations. In this way, the transitions between VCP elements represent temporal evolutionary patterns in the longitudinal network data. Results show that VCPs can use this additional data, which is typically difficult to exploit, to improve predictive model accuracy. We conclude with our perspectives on the VCP method and its future in network science, particularly in link prediction.
49
Rider AK, Siwo G, Emrich SJ, Ferdig MT, Chawla NV. A supervised learning approach to the ensemble clustering of genes. Int J Data Min Bioinform 2014; 9:199-219. [DOI: 10.1504/ijdmb.2014.059062]
50
Hoens TR, Blanton M, Steele A, Chawla NV. Reliable medical recommendation systems with patient privacy. ACM Trans Intell Syst Technol 2013. [DOI: 10.1145/2508037.2508048]
Abstract
One of the concerns patients have when confronted with a medical condition is which physician to trust. Any recommendation system that seeks to answer this question must ensure that any sensitive medical information it collects is properly secured. In this article, we codify these privacy concerns in a privacy-friendly framework and present two architectures that realize it: the Secure Processing Architecture (SPA) and the Anonymous Contributions Architecture (ACA). In SPA, patients submit their ratings in a protected form that reveals no information about their data, and recommendations are computed over the protected data using secure multiparty computation techniques. In ACA, patients submit their ratings in the clear, but no link can be made between a submission and the patient's data. We discuss various aspects of both architectures, including techniques for ensuring the reliability of computed recommendations and system performance, and compare the two.