1
Tang S, Mao S, Chen Y, Tan F, Duan L, Pian C, Zeng X. LRBmat: A novel gut microbial interaction and individual heterogeneity inference method for colorectal cancer. J Theor Biol 2023; 571:111538. [PMID: 37257720 DOI: 10.1016/j.jtbi.2023.111538] [Received: 12/13/2022] [Revised: 05/07/2023] [Accepted: 05/18/2023] [Indexed: 06/02/2023]
Abstract
The gut microbial community has been shown to play a significant role in various diseases, including colorectal cancer (CRC), which is a major public health concern worldwide. The accurate diagnosis and etiological analysis of CRC are crucial issues. Numerous methods have utilized gut microbiota to address these challenges; however, few have considered the complex interactions and individual heterogeneity of the gut microbiota, which are important issues in genetics and intestinal microbiology, particularly in high-dimensional cases. This paper presents a novel method called Binary matrix based on Logistic Regression (LRBmat) to address these concerns. The binary matrix in LRBmat can directly mitigate or eliminate the influence of heterogeneity, while also capturing information on gut microbial interactions of any order. LRBmat is highly adaptable and can be combined with any machine learning method to enhance its capabilities. The proposed method was evaluated using real CRC data and demonstrated superior classification performance compared to state-of-the-art methods. Furthermore, the association rules extracted from the binary matrix of the real data align well with biological properties and existing literature, thereby aiding in the etiological analysis of CRC.
Affiliation(s)
- Shan Tang
  - Department of Statistics, Hunan University, Changsha 410006, China
- Shanjun Mao
  - Department of Statistics, Hunan University, Changsha 410006, China
- Yangyang Chen
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Falong Tan
  - Department of Statistics, Hunan University, Changsha 410006, China
- Lihua Duan
  - Department of Rheumatology and Clinical Immunology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang University, Nanchang 330006, China
- Cong Pian
  - College of Sciences, Nanjing Agricultural University, Nanjing 210095, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410086, China
2
Wentzel A, Floricel C, Canahuate G, Naser MA, Mohamed AS, Fuller CD, van Dijk L, Marai GE. DASS Good: Explainable Data Mining of Spatial Cohort Data. Computer Graphics Forum 2023; 42:283-295. [PMID: 37854026 PMCID: PMC10583718 DOI: 10.1111/cgf.14830] [Indexed: 10/20/2023]
Abstract
Developing applicable clinical machine learning models is a difficult task when the data includes spatial information, for example, radiation dose distributions across adjacent organs at risk. We describe the co-design of a modeling system, DASS, to support the hybrid human-machine development and validation of predictive models for estimating long-term toxicities related to radiotherapy doses in head and neck cancer patients. Developed in collaboration with domain experts in oncology and data mining, DASS incorporates human-in-the-loop visual steering, spatial data, and explainable AI to augment domain knowledge with automatic data mining. We demonstrate DASS with the development of two practical clinical stratification models and report feedback from domain experts. Finally, we describe the design lessons learned from this collaborative experience.
Affiliation(s)
- A Wentzel
  - University of Illinois Chicago, Electronic Visualization Lab
- C Floricel
  - University of Illinois Chicago, Electronic Visualization Lab
- M A Naser
  - University of Texas MD Anderson Cancer Center
- A S Mohamed
  - University of Texas MD Anderson Cancer Center
- C D Fuller
  - University of Texas MD Anderson Cancer Center
- L van Dijk
  - University of Texas MD Anderson Cancer Center
- G E Marai
  - University of Illinois Chicago, Electronic Visualization Lab
3
Xenopoulos P, Rulff J, Nonato LG, Barr B, Silva C. Calibrate: Interactive Analysis of Probabilistic Model Output. IEEE Transactions on Visualization and Computer Graphics 2023; 29:853-863. [PMID: 36166523 DOI: 10.1109/tvcg.2022.3209489] [Indexed: 06/16/2023]
Abstract
Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, like accuracy, many applications, such as weather prediction, sports betting, or patient risk prediction, rely on a classifier's predicted probabilities rather than predicted labels. In these instances, practitioners are concerned with producing a calibrated model, that is, one which outputs probabilities that reflect those of the true distribution. Model calibration is often analyzed visually through static reliability diagrams; however, the traditional calibration visualization may suffer from a variety of drawbacks due to the strong aggregations it necessitates. Furthermore, count-based approaches are unable to sufficiently analyze model calibration. We present Calibrate, an interactive reliability diagram that addresses the aforementioned issues. Calibrate constructs a reliability diagram that is resistant to drawbacks in traditional approaches, and allows for interactive subgroup analysis and instance-level inspection. We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data. We further validate Calibrate by presenting the results of a think-aloud experiment with data scientists who routinely analyze model calibration.
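To make the notion of a reliability diagram concrete, the following is a minimal sketch of the traditional, aggregated form that the abstract contrasts Calibrate with. It is not Calibrate's own method. The function names `reliability_bins` and `expected_calibration_error`, the equal-width binning, and the toy data are assumptions chosen for illustration.

```python
from typing import List, Tuple


def reliability_bins(
    probs: List[float], labels: List[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed positive rate, count)
    for each non-empty equal-width bin on [0, 1]."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums[idx] += p
        hits[idx] += y
        counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


def expected_calibration_error(
    bins: List[Tuple[float, float, int]], total: int
) -> float:
    """Count-weighted mean gap between predicted and observed rates."""
    return sum(n / total * abs(conf - acc) for conf, acc, n in bins)


# Toy data (hypothetical): two confident negatives, two confident positives.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
bins = reliability_bins(probs, labels, n_bins=2)
ece = expected_calibration_error(bins, len(probs))
```

Each tuple in `bins` corresponds to one point of a static reliability diagram; a perfectly calibrated model would place every point on the diagonal. The choice of `n_bins` is the kind of aggregation decision the abstract identifies as a drawback.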
4
Chatzimparmpas A, Martins RM, Kucher K, Kerren A. FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1773-1791. [PMID: 34990365 DOI: 10.1109/tvcg.2022.3141040] [Indexed: 06/14/2023]
Abstract
The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data (including complex feature engineering processes) to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important features, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.
5
6
Kwon BC, Anand V, Severson KA, Ghosh S, Sun Z, Frohnert BI, Lundgren M, Ng K. DPVis: Visual Analytics With Hidden Markov Models for Disease Progression Pathways. IEEE Transactions on Visualization and Computer Graphics 2021; 27:3685-3700. [PMID: 32275600 DOI: 10.1109/tvcg.2020.2985689] [Indexed: 06/11/2023]
Abstract
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using the algorithms for discovering interesting patterns, it still remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and clinically make sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal of investigating disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this article, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
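As a rough illustration of how an HMM infers a sequence of health states from observations, the sketch below runs standard Viterbi decoding on a two-state toy model. It is not the DPVis pipeline: the state names, observation symbols, and all probabilities are hypothetical values chosen by hand, whereas a real disease progression model would learn them from longitudinal records.

```python
from typing import Dict, List


def viterbi(
    obs: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = prob * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state progression model (no recovery from "progressed").
states = ["stable", "progressed"]
start_p = {"stable": 0.9, "progressed": 0.1}
trans_p = {
    "stable": {"stable": 0.8, "progressed": 0.2},
    "progressed": {"stable": 0.0, "progressed": 1.0},
}
emit_p = {
    "stable": {"normal": 0.9, "abnormal": 0.1},
    "progressed": {"normal": 0.2, "abnormal": 0.8},
}
visits = ["normal", "normal", "abnormal", "abnormal"]
decoded = viterbi(visits, states, start_p, trans_p, emit_p)
```

Here `decoded` assigns one state per visit, which is the kind of per-patient state trajectory that a visual analytics tool could then summarize across a cohort.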
7
Dingen D, van 't Veer M, Wammes-van der Heijden E, Lazeron RHC, van Mastrigt G, Majoie M. Evaluation of two anti-seizure medication strategies in refractory epilepsy patients from a tertiary center with complementary insights from data visualization. Epilepsy Res 2021; 174:106667. [PMID: 33989886 DOI: 10.1016/j.eplepsyres.2021.106667] [Received: 01/05/2021] [Revised: 05/04/2021] [Accepted: 05/07/2021] [Indexed: 10/21/2022]
Abstract
OBJECTIVE: To evaluate the healthcare resources in a tertiary center related to the exclusive use of non-enzyme-inducing anti-seizure medications relative to the concomitant use of enzyme-inducing anti-seizure medications in patients with refractory epilepsy.

METHODS: In this retrospective case-time-control study, we compared the effects of two anti-seizure medication strategies: exclusively non-inducing anti-seizure medications (NIND) or a combination of NIND and inducing anti-seizure medications (IND+). The primary outcome parameter was the number of consultations with relevant healthcare professionals in our tertiary center, assessed with a negative binomial regression model, adjusting for several covariates such as blood drug level and time interval (TI). Results from statistical models were visualized to explore the contribution of all covariates to the outcome in the total population and in subgroups.

RESULTS: Of the 21538 patients with refractory epilepsy referred to our center, 1648 met the inclusion criteria. The regression model showed that the IND+ strategy was significantly associated with fewer consultations compared to the NIND strategy (p < 0.001), reflected in an incidence risk ratio (IRR) of 0.844 (0.799-0.890). Visualization of subgroups, defined by anti-seizure medication strategy, revealed patterns in the contribution of blood drug level measurements to the outcome. Although sex was not included as a covariate in the regression model, as it was eliminated by the backward-elimination approach, visualization of this subgroup showed differences in the effects of blood drug level and TI.

CONCLUSION: For patients with refractory epilepsy in our tertiary center, treatment following the IND+ strategy is associated with fewer consultations with healthcare professionals compared to the NIND strategy. Comprehensive visualization of the results facilitated the exploration of effects of covariates across subgroups.
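The reported ratio can be read back onto the scale of the regression model. In a count model with a log link, the ratio is the exponentiated coefficient, so the short sketch below recovers the coefficient, an approximate standard error, and the implied percentage change from the figures in the abstract. It assumes the interval 0.799-0.890 is a 95% Wald interval computed on the log scale, which the abstract does not state; the function name `summarize_ratio` is hypothetical.

```python
import math
from typing import Dict


def summarize_ratio(
    ratio: float, lower: float, upper: float, z_crit: float = 1.96
) -> Dict[str, float]:
    """Recover log-scale quantities from a ratio and its confidence interval.

    Assumes a symmetric Wald interval on the log scale (an assumption,
    not something the abstract confirms).
    """
    beta = math.log(ratio)  # regression coefficient on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return {
        "beta": beta,
        "se": se,
        "z": beta / se,
        "percent_change": (ratio - 1) * 100,
        # Interval rebuilt from beta and se, for a consistency check.
        "lower_rebuilt": math.exp(beta - z_crit * se),
        "upper_rebuilt": math.exp(beta + z_crit * se),
    }


# Values taken from the abstract: IRR 0.844 (0.799-0.890).
summary = summarize_ratio(0.844, 0.799, 0.890)
```

Under that assumption, a ratio of 0.844 corresponds to roughly 15.6% fewer consultations for the IND+ strategy, and the rebuilt interval agrees with the reported one to within rounding, which supports reading the interval as symmetric on the log scale.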
Affiliation(s)
- Dennis Dingen
  - Eindhoven University of Technology, Dept. Mathematics and Computer Science, the Netherlands
- Marcel van 't Veer
  - Eindhoven University of Technology, Dept. Biomedical Engineering, the Netherlands; Catharina Hospital Eindhoven, Dept. Research & Education, the Netherlands
- Richard H C Lazeron
  - Academic Center for Epileptology Kempenhaeghe and Maastricht University Medical Center, Heeze and Maastricht, the Netherlands; Eindhoven University of Technology, Dept. Electrical Engineering, the Netherlands
- Ghislaine van Mastrigt
  - CAPHRI, Research School for Public Health and Primary Care, Dept. Health Services Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, the Netherlands
- Marian Majoie
  - Academic Center for Epileptology Kempenhaeghe and Maastricht University Medical Center, Heeze and Maastricht, the Netherlands
8
Tena A, Claria F, Solsona F, Meister E, Povedano M. Detection of Bulbar Involvement in Patients With Amyotrophic Lateral Sclerosis by Machine Learning Voice Analysis: Diagnostic Decision Support Development Study. JMIR Med Inform 2021; 9:e21331. [PMID: 33688838 PMCID: PMC7991994 DOI: 10.2196/21331] [Received: 09/11/2020] [Revised: 10/26/2020] [Accepted: 01/17/2021] [Indexed: 11/13/2022]
Abstract
Background: Bulbar involvement is a term used in amyotrophic lateral sclerosis (ALS) that refers to motor neuron impairment in the corticobulbar area of the brainstem, which produces a dysfunction of speech and swallowing. One of the earliest symptoms of bulbar involvement is voice deterioration characterized by grossly defective articulation; extremely slow, laborious speech; marked hypernasality; and severe harshness. Bulbar involvement requires well-timed and carefully coordinated interventions. Therefore, early detection is crucial to improving the quality of life and lengthening the life expectancy of patients with ALS who present with this dysfunction. Recent research efforts have focused on voice analysis to capture bulbar involvement.

Objective: The main objectives of this paper were (1) to design a methodology for diagnosing bulbar involvement efficiently through the acoustic parameters of uttered vowels in Spanish, and (2) to demonstrate that the performance of the automated diagnosis of bulbar involvement is superior to human diagnosis.

Methods: The study focused on the extraction of features from the phonatory subsystem (jitter, shimmer, harmonics-to-noise ratio, and pitch) from the utterance of the five Spanish vowels. Then, we used various supervised classification algorithms, preceded by principal component analysis of the features obtained.

Results: To date, support vector machines have performed better (accuracy 95.8%) than the models analyzed in the related work. We also show how the model can improve human diagnosis, which can often misdiagnose bulbar involvement.

Conclusions: The results obtained are very encouraging and demonstrate the efficiency and applicability of the automated model presented in this paper. It may be an appropriate tool to help in the diagnosis of ALS by multidisciplinary clinical teams, in particular to improve the diagnosis of bulbar involvement.
Affiliation(s)
- Alberto Tena
  - Information and Communication Technologies Group, International Centre for Numerical Methods in Engineering, Barcelona, Spain
- Francec Claria
  - Department of Computer Science, Universitat de Lleida, Lleida, Spain
- Francesc Solsona
  - Department of Computer Science, Universitat de Lleida, Lleida, Spain
- Einar Meister
  - Institute of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Monica Povedano
  - Motoneuron Functional Unit, Hospital Universitari de Bellvitge, Barcelona, Spain
9
Abstract
The increasing use of electronic health record (EHR)-based systems has led to the generation of clinical data at an unprecedented rate, which produces an untapped resource for healthcare experts to improve the quality of care. Despite the growing demand for adopting EHRs, the large amount of clinical data has made some analytical and cognitive processes more challenging. The emergence of a type of computational system called visual analytics has the potential to handle information overload challenges in EHRs by integrating analytics techniques with interactive visualizations. In recent years, several EHR-based visual analytics systems have been developed to fulfill healthcare experts’ computational and cognitive demands. In this paper, we conduct a systematic literature review to present the research papers that describe the design of EHR-based visual analytics systems and provide a brief overview of 22 systems that met the selection criteria. We identify and explain the key dimensions of the EHR-based visual analytics design space, including visual analytics tasks, analytics, visualizations, and interactions. We evaluate the systems using the selected dimensions and identify the gaps and areas with little prior work.
10
Knittel J, Lalama A, Koch S, Ertl T. Visual Neural Decomposition to Explain Multivariate Data Sets. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1374-1384. [PMID: 33048724 DOI: 10.1109/tvcg.2020.3030420] [Indexed: 06/11/2023]
Abstract
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. More specifically, it is often valuable to understand which ranges of which input variables lead to particular values of a given target variable. Unfortunately, with an increasing number of independent variables, this process may become cumbersome and time-consuming due to the many possible combinations that have to be explored. In this paper, we propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables. We developed a visual model based on neural networks that can be explored in a guided way to help analysts find and understand such correlations. First, we train a neural network to predict the target from the input variables. Then, we visualize the inner workings of the resulting model to help understand relations within the data set. We further introduce a new regularization term for the backpropagation algorithm that encourages the neural network to learn representations that are easier to interpret visually. We apply our method to artificial and real-world data sets to show its utility.
11
Abstract
Feature analysis has become a critical task in data analysis and visualization. Graph structures are very flexible in terms of representation and may encode important information on features, but it is challenging to produce layouts that are adequate for analysis tasks. In this study, we propose and develop similarity-based graph layouts with the purpose of locating relevant patterns in sets of features, thus supporting feature analysis and selection. We apply a tree layout in the first step of the strategy, to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further grouping and relationships among features are revealed. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes to reveal the effectiveness of the feature set. Our results have shown that the tree-graph layout framework allows for a number of observations that are very important in user-centric feature selection and not easy to observe with any other available tool. They provide a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential tasks in different scenarios of data analysis. Case studies in application areas centered on documents, images, and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.
12
Li J, Ma X, Tobore I, Liu Y, Kandwal A, Wang L, Lu J, Lu W, Bao Y, Zhou J, Nie Z. A Novel CGM Metric-Gradient and Combining Mean Sensor Glucose Enable to Improve the Prediction of Nocturnal Hypoglycemic Events in Patients with Diabetes. J Diabetes Res 2020; 2020:8830774. [PMID: 33204733 PMCID: PMC7655247 DOI: 10.1155/2020/8830774] [Received: 08/14/2020] [Revised: 10/15/2020] [Accepted: 10/24/2020] [Indexed: 12/28/2022]
Abstract
Nocturnal hypoglycemia is a serious complication of insulin-treated diabetes, and it is often asymptomatic. A novel CGM metric, gradient, was proposed in this paper, and a method of combining mean sensor glucose (MSG) and gradient was presented for the prediction of nocturnal hypoglycemia. For this purpose, the data from continuous glucose monitoring (CGM) encompassing 1,921 patients with diabetes were analyzed, and a total of 302 nocturnal hypoglycemic events were recorded. The MSG and gradient values were calculated separately and then combined as a new metric (i.e., MSG+gradient). In addition, the prediction was conducted by four algorithms, namely, logistic regression, support vector machine, random forest, and long short-term memory. The results revealed that the gradient of CGM showed a downward trend before hypoglycemic events happened. Additionally, the results indicated that the specificity and sensitivity based on the proposed method were better than those of the conventional metrics of low blood glucose index (LBGI), coefficient of variation (CV), mean absolute glucose (MAG), lability index (LI), etc., and the complex metrics of MSG+LBGI, MSG+CV, MSG+MAG, MSG+LI, etc. Specifically, the specificity and sensitivity were greater than 96.07% and 96.03% at the prediction horizon of 15 minutes and greater than 87.79% and 90.07% at the prediction horizon of 30 minutes when the proposed method was adopted to predict nocturnal hypoglycemic events with the aforementioned four algorithms. Therefore, the proposed method of combining MSG and gradient may improve the prediction of nocturnal hypoglycemic events. Future studies are warranted to confirm the validity of this metric.
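The quantities named in the abstract can be sketched in a few lines. The abstract does not define the gradient metric, so the version below assumes it is the finite difference between consecutive CGM readings divided by the sampling interval; the 5-minute interval, the function names, and the sample readings are likewise hypothetical. Sensitivity and specificity follow their standard definitions.

```python
from typing import List, Sequence, Tuple


def mean_sensor_glucose(readings: Sequence[float]) -> float:
    """Arithmetic mean of the CGM readings in a window."""
    return sum(readings) / len(readings)


def gradients(readings: Sequence[float], interval_min: float = 5.0) -> List[float]:
    """Rate of change between consecutive readings (units per minute).

    Assumed definition: a simple finite difference, not confirmed by the paper.
    """
    return [(b - a) / interval_min for a, b in zip(readings, readings[1:])]


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Requires at least one positive and one negative in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical window of readings (mmol/L) that falls steadily.
window = [6.0, 5.5, 5.0, 4.5]
msg = mean_sensor_glucose(window)
grad = gradients(window)
# A combined feature vector such as a classifier might consume.
features = [msg, grad[-1]]

sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
```

A consistently negative `grad` mirrors the downward trend the abstract reports before hypoglycemic events; pairing it with `msg` gives a classifier both the level and the direction of glucose in the window.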
Affiliation(s)
- Jingzhen Li
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xiaojing Ma
  - Department of Endocrinology and Metabolism, Shanghai Clinical Center for Diabetes, Shanghai Diabetes Institute, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Igbe Tobore
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yuhang Liu
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Abhishek Kandwal
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lei Wang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Jingyi Lu
  - Department of Endocrinology and Metabolism, Shanghai Clinical Center for Diabetes, Shanghai Diabetes Institute, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Wei Lu
  - Department of Endocrinology and Metabolism, Shanghai Clinical Center for Diabetes, Shanghai Diabetes Institute, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Yuqian Bao
  - Department of Endocrinology and Metabolism, Shanghai Clinical Center for Diabetes, Shanghai Diabetes Institute, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Jian Zhou
  - Department of Endocrinology and Metabolism, Shanghai Clinical Center for Diabetes, Shanghai Diabetes Institute, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Zedong Nie
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China