1
Itzhak N, Jaroszewicz S, Moskovitch R. Event prediction by estimating continuously the completion of a single temporal pattern's instances. J Biomed Inform 2024; 156:104665. [PMID: 38852777 DOI: 10.1016/j.jbi.2024.104665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 05/10/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
OBJECTIVE Develop a new method for continuous prediction that uses a single temporal pattern ending with an event of interest, together with the pattern's multiple instances detected in the temporal data. METHODS Use temporal abstraction to transform time series, instantaneous events, and time intervals into a uniform representation of symbolic time intervals (STIs). Introduce a new approach to event prediction using a single time intervals-related pattern (TIRP), which can learn models to predict whether and when an event of interest will occur, based on multiple instances of a pattern that ends with the event. RESULTS Applied to real-life datasets, the proposed methods achieved an average AUROC improvement of 5% over LSTM-FCN, the best-performing of the evaluated baseline models (RawXGB, Resnet, LSTM-FCN, and ROCKET). CONCLUSION The proposed methods for continuous event prediction have the potential to be used in a wide range of real-world and real-time applications in diverse domains with heterogeneous multivariate temporal data. For example, they could be used to predict panic attacks early using wearable devices, or to predict complications early in intensive care unit patients.
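The temporal abstraction step described above can be illustrated with a minimal sketch. It assumes equal-width discretization of a single numeric variable (one of several possible abstraction methods; the paper's own choice is not stated in the abstract) and merges consecutive time points that share a state into one symbolic time interval. The names `STI`, `equal_width_states`, and `abstract_series`, and the `variable.state` symbol format, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STI:
    """A symbolic time interval: `symbol` holds from `start` to `end` (inclusive)."""
    start: int
    end: int
    symbol: str


def equal_width_states(values, n_bins):
    """Map each value to a state index using equal-width bins over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant series: avoid division by zero
    # The maximum value falls exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def abstract_series(variable, times, values, n_bins=3):
    """Convert one time-stamped variable into STIs by merging runs of equal states."""
    states = equal_width_states(values, n_bins)
    intervals = []
    start = times[0]
    for i in range(1, len(states)):
        if states[i] != states[i - 1]:
            # State changed: close the previous run at the last time point it covered.
            intervals.append(STI(start, times[i - 1], f"{variable}.{states[i - 1]}"))
            start = times[i]
    intervals.append(STI(start, times[-1], f"{variable}.{states[-1]}"))
    return intervals
```

For example, `abstract_series("HR", [0, 1, 2, 3, 4, 5], [0, 1, 4, 5, 9, 2])` yields four intervals: `HR.0` over 0 to 1, `HR.1` over 2 to 3, `HR.2` at 4, and `HR.0` at 5. Intervals from several variables produced this way share one representation, which is what allows pattern mining across heterogeneous sources.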
Affiliation(s)
- Nevo Itzhak
- Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel.
- Szymon Jaroszewicz
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
- Robert Moskovitch
- Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel.
2
Adam N, Wieder R. Temporal Association Rule Mining: Race-Based Patterns of Treatment-Adverse Events in Breast Cancer Patients Using SEER-Medicare Dataset. Biomedicines 2024; 12:1213. [PMID: 38927419 PMCID: PMC11200891 DOI: 10.3390/biomedicines12061213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/17/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024]
Abstract
PURPOSE Disparities in the screening, treatment, and survival of African American (AA) patients with breast cancer extend to adverse events experienced with systemic therapy. However, data are limited and difficult to obtain. We addressed this challenge by applying temporal association rule (TAR) mining to the SEER-Medicare dataset to identify differences between AA and White women in the association of specific adverse events (AEs) with treatments (TRs) for breast cancer. We considered two categories of cancer care providers and settings: practitioners providing care in the outpatient units of hospitals and institutions, and private practitioners providing care in their offices. PATIENTS AND METHODS We considered women enrolled in the Medicare fee-for-service option at age 65 who qualified by age rather than disability and who were diagnosed with breast cancer, with the attributed patient factors of age, race, marital status, comorbidities, prior malignancies, and prior therapy, and the disease factors of stage, grade, ER/PR and HER2 status, and laterality. We included 141 HCPCS drug J codes for chemotherapy, biotherapy, and hormone therapy drugs, which we consolidated into 46 mechanistic categories, and generated AE data. We consolidated AEs from ICD-9 codes into 18 categories associated with breast cancer therapy. We applied TAR mining to determine associations between the 46 TR and 18 AE categories in the context of the patient categories outlined. We applied the spark.mllib implementation of the FPGrowth algorithm, a parallel version called PFP. We considered between-group differences of at least one unit of lift to be significant. The model's results showed a high overlap between the set of TR-AE associations identified by the model and the actual set. RESULTS Our results demonstrate that specific TR/AE associations are highly dependent on race, stage, and venue of care administration.
CONCLUSIONS Our data demonstrate the usefulness of this approach in identifying differences in the associations between TRs and AEs in different populations and serve as a reference for predicting the likelihood of AEs in different patient populations treated for breast cancer. Our novel approach using unsupervised learning enables the discovery of association rules while paying special attention to temporal information, resulting in greater predictive and descriptive power as a patient's health and life status change over time.
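Lift, the measure used above to compare groups, is a rule's observed co-occurrence divided by what independence would predict: lift(A => B) = supp(A u B) / (supp(A) * supp(B)). The sketch below computes it on toy transactions and applies the one-unit between-group threshold. It is an illustration under stated assumptions: the treatment and adverse-event codes are hypothetical, and neither the temporal windowing that makes the mined rules temporal nor the parallel FP-growth mining itself is shown.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def lift(transactions, antecedent, consequent):
    """lift(A => B) = supp(A u B) / (supp(A) * supp(B)); 1.0 means independence."""
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    if s_a == 0 or s_b == 0:
        return float("nan")  # rule undefined when either side never occurs
    both = support(transactions, set(antecedent) | set(consequent))
    return both / (s_a * s_b)


def lift_gap(group_1, group_2, antecedent, consequent, threshold=1.0):
    """Return the between-group lift difference and whether it reaches `threshold`."""
    d = lift(group_1, antecedent, consequent) - lift(group_2, antecedent, consequent)
    return d, abs(d) >= threshold
```

With hypothetical per-patient transactions, a rule such as `{"TR1"} => {"AE1"}` might have lift 2.0 in one group and 1.0 in another; `lift_gap` then returns `(1.0, True)`, flagging the difference under the one-unit criterion.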
Affiliation(s)
- Nabil Adam
- Phalcon, LLC., Manhasset, NY 11030, USA;
- Rutgers University, Newark Campus, Newark, NJ 07102, USA
- Robert Wieder
- Rutgers New Jersey Medical School, Newark, NJ 07103, USA
- Rutgers Cancer Institute of New Jersey, Newark, NJ 07103, USA
3
Oh W, Jayaraman P, Tandon P, Chaddha US, Kovatch P, Charney AW, Glicksberg BS, Nadkarni GN. A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19. Artif Intell Med 2024; 148:102750. [PMID: 38325922 PMCID: PMC10864255 DOI: 10.1016/j.artmed.2023.102750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 12/12/2023] [Accepted: 12/14/2023] [Indexed: 02/09/2024]
Abstract
Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Numerous ongoing studies are dedicated to developing advanced computational subphenotyping methods for cross-sectional data. However, the potential of time-series data has so far been underexplored. Here, we propose a Multivariate Levenshtein Distance (MLD) that can account for correlation among multiple discrete features in time-series data. Our algorithm has two distinct components: the MLD itself, and an optimal threshold score that enhances sensitivity in discriminating between pairs of instances. We applied the proposed distance metric with the k-means clustering algorithm to derive temporal subphenotypes from time-series data of biomarkers and treatment administrations from 1039 critically ill patients with COVID-19, and compared its effectiveness to that of standard methods. In conclusion, the Multivariate Levenshtein Distance metric is a novel method to quantify the distance between multiple discrete features over time-series data, and it demonstrates superior clustering performance compared with competing time-series distance metrics.
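The abstract does not spell out how the MLD or its threshold score is computed. Purely as an illustrative assumption, one natural multivariate generalization of Levenshtein distance treats each time step as a tuple of discrete features, charges 1 for an insertion or deletion, and charges a substitution by the fraction of features that differ. This is a sketch of that idea, not the authors' MLD; `multivariate_levenshtein` is a hypothetical name.

```python
def multivariate_levenshtein(a, b):
    """Edit distance between two sequences of same-length discrete feature tuples.

    Assumed costs (illustrative only): insertion and deletion cost 1; substituting
    one time step for another costs the fraction of features on which they differ.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            sub = sum(u != v for u, v in zip(x, y)) / len(x)
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # partial or full substitution
            )
    return d[n][m]
```

With one feature per step this reduces to ordinary Levenshtein distance (for example, 3.0 for "kitten" versus "sitting"); with two features, changing one of them at one time step costs 0.5. A pairwise distance of this kind could then be supplied to a clustering routine that accepts a custom metric.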
Affiliation(s)
- Wonsuk Oh
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Pushkala Jayaraman
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pranai Tandon
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Udit S Chaddha
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Patricia Kovatch
- Department of Scientific Computing, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexander W Charney
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Character Biosciences, New York, NY, USA
- Girish N Nadkarni
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
4
Sheetrit E, Brief M, Elisha O. Predicting unplanned readmissions in the intensive care unit: a multimodality evaluation. Sci Rep 2023; 13:15426. [PMID: 37723231 PMCID: PMC10507073 DOI: 10.1038/s41598-023-42372-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 09/09/2023] [Indexed: 09/20/2023]
Abstract
A hospital readmission occurs when a patient who was discharged from the hospital is admitted again for the same or related care within a certain period. Hospital readmissions are a significant problem in the healthcare domain, as they lead to increased hospitalization costs, decreased patient satisfaction, and increased risk of adverse outcomes such as infections, medication errors, and even death. The problem of hospital readmissions is particularly acute in intensive care units (ICUs), due to the severity of the patients' conditions and the substantial risk of complications. Predicting unplanned readmissions in ICUs is a challenging task, as it involves analyzing different data modalities, such as static data, unstructured free text, sequences of diagnoses and procedures, and multivariate time series. Here, we investigate the effectiveness of each data modality separately, then in combination with the others, using state-of-the-art machine learning approaches in time-series analysis and natural language processing. Using our evaluation process, we are able to determine the contribution of each data modality and, for the first time in the context of readmission, establish a hierarchy of their predictive value. Additionally, we demonstrate the impact of Temporal Abstractions in enhancing the performance of time-series approaches to readmission prediction. Due to conflicting definitions in the literature, we also provide a clear definition of the term Unplanned Readmission to enhance the reproducibility and consistency of future research and to prevent misunderstandings that could result from diverse interpretations of the term. Our experimental results on a large benchmark clinical data set show that Discharge Notes written by physicians have better capabilities for readmission prediction than all other modalities.
5
Prediction of acute hypertensive episodes in critically ill patients. Artif Intell Med 2023; 139:102525. [PMID: 37100504 DOI: 10.1016/j.artmed.2023.102525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]
Abstract
Prevention and treatment of complications are the backbone of medical care, particularly in critical care settings. Early detection and prompt intervention can potentially prevent complications from occurring and improve outcomes. In this study, we use four longitudinal vital signs variables of intensive care unit patients, focusing on predicting acute hypertensive episodes (AHEs). These episodes represent elevations in blood pressure and may result in clinical damage or indicate a change in a patient's clinical situation, such as an elevation in intracranial pressure or kidney failure. Prediction of AHEs may allow clinicians to anticipate changes in the patient's condition and respond early to prevent these from occurring. Temporal abstraction was employed to transform the multivariate temporal data into a uniform representation of symbolic time intervals, from which frequent time-intervals-related patterns (TIRPs) are mined and used as features for AHE prediction. A novel TIRP metric for classification, called coverage, is introduced that measures the coverage of a TIRP's instances in a time window. For comparison, several baseline models, including logistic regression and sequential deep learning models, were applied to the raw time series data. Our results show that using frequent TIRPs as features outperforms the baseline models, and that the coverage metric outperforms other TIRP metrics. Two approaches to predicting AHEs in real-life application conditions are evaluated. Using a sliding window to continuously predict whether a patient would experience an AHE within a specific prediction time period ahead, our models produced an AUC-ROC of 82%, but with a low AUPRC. Alternatively, predicting whether an AHE would occur at any point during the entire admission resulted in an AUC-ROC of 74%.
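The abstract defines coverage only as a measure of "the coverage of a TIRP's instances in a time window". One plausible reading, assumed here rather than taken from the paper, is the fraction of the window spanned by the union of the instances' intervals, so that overlapping instances are not double-counted. `coverage` and the half-open `(start, end)` instance format are hypothetical.

```python
def coverage(instances, window_start, window_end):
    """Fraction of [window_start, window_end) covered by the union of instance spans.

    Each instance is a half-open (start, end) pair. Instances are clipped to the
    window and overlapping ones are merged, so shared time is counted only once.
    """
    length = window_end - window_start
    if length <= 0:
        raise ValueError("window must have positive length")
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in instances
        if e > window_start and s < window_end
    )
    covered, cur_s, cur_e = 0.0, None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Gap before this instance: bank the merged run built so far.
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            # Overlapping or touching: extend the current merged run.
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / length
```

For instances spanning (1, 3), (2, 5), and (8, 12) in a window [0, 10), the merged spans cover 6 of 10 time units, giving 0.6. In the sliding-window setting described above, such a value would be recomputed for each TIRP as the observation window advances.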
6
A time-interval-based active learning framework for enhanced PE malware acquisition and detection. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
7
Personalized insulin dose manipulation attack and its detection using interval-based temporal patterns and machine learning algorithms. J Biomed Inform 2022; 132:104129. [PMID: 35781036 DOI: 10.1016/j.jbi.2022.104129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 05/16/2022] [Accepted: 06/21/2022] [Indexed: 11/20/2022]
Abstract
Many patients with diabetes are currently being treated with insulin pumps and other diabetes devices, which improve their quality of life and enable effective treatment of diabetes. These devices are connected wirelessly and are therefore vulnerable to cyber-attacks, which have already been proven feasible. In this paper, we focus on two types of cyber-attacks on insulin pump systems: an overdose of insulin, which can cause hypoglycemia, and an underdose of insulin, which can cause hyperglycemia. Both of these attacks can result in a variety of complications and endanger a patient's life. Specifically, we propose a sophisticated and personalized insulin dose manipulation attack; this attack is based on a novel method of predicting the blood glucose (BG) level in response to insulin dose administration. To protect patients from the proposed sophisticated and malicious insulin dose manipulation attacks, we also present an automated machine-learning-based system for attack detection; the detection system is based on an advanced temporal pattern mining process, which is performed on the logs of real insulin pumps and continuous glucose monitors (CGMs). Our multivariate time-series data (MTSD) collection consists of 225,780 clinical logs, collected from real insulin pumps and CGMs of 47 patients with type I diabetes (13 adults and 34 children) from two different clinics at Soroka University Medical Center in Beer-Sheva, Israel, over a four-year period. We enriched our data collection with additional relevant medical information related to the subjects.
In the extensive experiments performed, we evaluated the proposed attack and detection system and examined whether: (1) it is possible to accurately predict BG levels in order to create malicious data that simulate a manipulation attack and the patient's bodily response to it; (2) it is possible to automatically detect such attacks using advanced machine learning (ML) methods that leverage temporal patterns; (3) the detection capabilities of the proposed detection system differ for insulin overdose and underdose attacks; and (4) the granularity of the learning model (general / adult vs. pediatric clinic / individual patient) affects the detection capabilities. Our results show that (a) it is possible to predict BG levels with nearly 90% accuracy using our proposed methods, thereby enabling the creation of malicious data for evaluating our detection system; (b) it is possible to accurately detect insulin manipulation attacks using temporal pattern mining with several ML methods, including Logistic Regression, Random Forest, TPF class model, TPF top k, and ANN algorithms; (c) overdose attacks are easier to detect than underdose attacks, by more than 25% in terms of AUC scores; and (d) the adult vs. pediatric model outperformed models of other granularities in detecting overdose attacks, while the general model outperformed the other models in detecting underdose attacks; for both attacks, detection among children was found to be more challenging than among adults. In addition to its use in evaluating our detection system, the proposed BG prediction method has great importance in the medical domain, where it can contribute to improved care of patients with diabetes.
8
All-cause mortality prediction in T2D patients with iTirps. Artif Intell Med 2022; 130:102325. [DOI: 10.1016/j.artmed.2022.102325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 05/17/2022] [Accepted: 05/17/2022] [Indexed: 11/17/2022]
9
Finder I, Sheetrit E, Nissim N. Time-interval temporal patterns can beat and explain the malware. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
10
Mordvanyuk N, López B, Bifet A. TA4L: Efficient temporal abstraction of multivariate time series. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
11
Lion M, Shahar Y. Implementation and evaluation of a multivariate abstraction-based, interval-based dynamic time-warping method as a similarity measure for longitudinal medical records. J Biomed Inform 2021; 123:103919. [PMID: 34628062 DOI: 10.1016/j.jbi.2021.103919] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Revised: 08/25/2021] [Accepted: 09/27/2021] [Indexed: 11/30/2022]
Abstract
OBJECTIVES A common prerequisite for tasks such as classification, prediction, clustering and retrieval of longitudinal medical records is a clinically meaningful similarity measure that considers both [multiple] variable (concept) values and their time. Currently, most similarity measures focus on raw, time-stamped data as these are stored in a medical record. However, clinicians think in terms of clinically meaningful temporal abstractions, such as "decreasing renal functions", enabling them to ignore minor time and value variations and focus on similarities among the clinical trajectories of different patients. Our objective was to define an abstraction- and interval-based methodology for matching longitudinal, multivariate medical records, and to rigorously assess its value versus the option of using only the raw, time-stamped data. METHODS We have developed a new methodology for determining the relative distance between a pair of longitudinal records, by extending the known dynamic time warping (DTW) method into an interval-based dynamic time warping (iDTW) methodology. The iDTW methodology includes (A) a three-step interval-based representation (iRep) method: [1] abstracting the raw, time-stamped data of the longitudinal records into clinically meaningful interval-based abstractions, using a domain-specific knowledge base, [2] scoping the period of comparison of the records, [3] creating from the intervals a symbolic time series, by partitioning them into a predetermined temporal granularity; and (B) an interval-based matching (iMatch) method to match each relevant pair of multivariate longitudinal records, each represented as multiple series of short symbolic intervals in the determined temporal granularity, using a modified DTW version.
EVALUATION Three classification or prediction tasks were defined: (1) classifying 161 records of oncology patients as having had autologous versus allogenic bone-marrow transplantation; (2) classifying the longitudinal records of 125 hepatitis patients as having B or C hepatitis; and (3) predicting micro- or macro-albuminuria in the second year, for 151 diabetes patients who were followed for five years. The raw, time-stamped, multivariate data within each medical record, for one, two, or three concepts out of four or five concepts judged as relevant in each medical domain, were abstracted into clinically meaningful intervals using the Knowledge-Based Temporal-Abstraction method, using previously acquired knowledge. We focused on two temporal-abstraction types: (1) State abstractions, which discretize a concept's raw value into a predetermined range (e.g., LOW or HIGH Hemoglobin); and (2) Gradient abstractions, which indicate the trend of the concept's value (e.g., INCREASING, DECREASING Hemoglobin value). We created all of the combinations of either uni-dimensional (State or Gradient) or multi-dimensional (State and Gradient) abstractions, of all of the concepts used. Classification of a record was determined by using a majority of the k-Nearest-Neighbors (KNN) of the given record, k ranging over the odd numbers (to break ties) from 1 to N, N being the size of the training set. We have experimented with all possible configurations of the parameters that our method uses. Overall, a total of 75,936 experiments were performed: 33,600 in the Oncology domain, 28,800 in the Hepatitis domain, and 13,536 in the Diabetes domain. Each experiment involved the performance of a 10-fold Cross Validation to compute the mean performance of a particular iDTW method-configuration set of settings, for a specific subset of one, two, or three concepts out of all of the domain-specific concepts relevant to the classification or prediction task on which the experiment focuses. 
We measured, for each such experimental combination, the Area Under the Curve (AUC) and the optimal Specificity/Sensitivity ratio using Youden's Index. We then aggregated the experiments by the types of unidimensional or multidimensional abstractions used in them (including the use of only raw concepts as a special case); for example, two state abstractions of different concepts, and one gradient abstraction of a third concept. We compared the mean AUC when using each such feature representation, or combination of abstractions, across all possible method-setting configurations, to the mean AUC when using as a feature representation, for the same task, only raw concepts, also across all possible method-setting configurations. Finally, we applied a paired t-test to determine whether the mean difference between the accuracy of each temporal-abstraction representation, across all concept and configuration combinations, and the respective raw-concept combinations, across all concept subset and configuration combinations, is significant (P < 0.05). RESULTS The mean performance of the classification and prediction tasks when using, as a feature representation, the various temporal-abstraction combinations, was significantly higher than the performance when using only raw data. Furthermore, in each domain and task, there existed at least one representation using interval-based abstractions whose use led, on average (over all concept subset combinations and method configurations), to significantly better performance than the use of only subsets of the raw time-stamped data. In seven of nine combinations of domain type (out of three) and number of concepts used (one, two, or three), the variance of the AUCs (for all representations and configurations) was considerably higher across all raw-concept subsets, compared to all abstract combinations. Increasing the number of features used by the matching task enhanced performance.
Using multi-dimensional abstractions of the same concept further enhanced the performance. When using only raw data, increasing the number of neighbors monotonically increased the mean performance (over all concept combinations and method configurations) until reaching an optimal saddle-point around N; when using abstractions, however, optimal mean performance was often reached after matching only five nearest neighbors. CONCLUSIONS Using multivariate and multidimensional interval-based, abstraction-based similarity measures is feasible, and consistently and significantly improved the mean classification and prediction performance in time-oriented domains, using DTW-inspired methods, compared to the use of only raw, time-stamped data. It also made the KNN classification more effective. Nevertheless, although the mean performance for the abstract representations was higher than the mean performance when using only raw-data concepts, the actual optimal classification performance in each domain and task depends on the choice of the specific raw or abstract concepts used as features.
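iDTW modifies DTW in ways the abstract only outlines. As a baseline sketch, assuming symbolic sequences at a fixed temporal granularity and a 0/1 symbol-mismatch cost, classic DTW combined with the odd-k nearest-neighbor majority vote described in the evaluation could look like this. The function names are hypothetical, and the iDTW-specific steps (knowledge-based abstraction, scoping, multivariate matching) are not shown.

```python
from collections import Counter


def dtw(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Classic dynamic time warping distance between two sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three allowed warping steps.
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]
            )
    return d[n][m]


def knn_predict(query, training, k):
    """Majority label among the k (sequence, label) records closest to `query`."""
    if k % 2 == 0:
        raise ValueError("use an odd k, as in the study, to avoid two-class ties")
    ranked = sorted(training, key=lambda rec: dtw(query, rec[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because DTW lets one symbol align with several, `dtw(list("LLHH"), list("LHHH"))` is 0.0: the sequences differ only in when the state change happens, which is the kind of minor time variation the abstract says clinicians tend to ignore.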
Affiliation(s)
- Matan Lion
- Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Yuval Shahar
- Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
12
Schvetz M, Fuchs L, Novack V, Moskovitch R. Outcomes prediction in longitudinal data: Study designs evaluation, use case in ICU acquired sepsis. J Biomed Inform 2021; 117:103734. [PMID: 33711544 DOI: 10.1016/j.jbi.2021.103734] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/18/2020] [Revised: 02/27/2021] [Accepted: 03/01/2021] [Indexed: 12/23/2022]
Abstract
Outcome prediction in Electronic Health Records (EHR), and specifically in critical care, is attracting increasing exploration and research. In this study, we used clinical data from the Intensive Care Unit (ICU), focusing on ICU-acquired sepsis. The current literature reports several evaluation approaches, inspired by epidemiological designs, some of which do not reflect real-life application conditions. This problem seems relevant to outcome prediction in longitudinal EHR data, and to longitudinal data in general, although in this study we focused on ICU data. Unlike most previous studies, which investigated all sepsis admissions, we focused specifically on ICU-acquired sepsis. Due to the sparse nature of the longitudinal data, we employed Temporal Abstraction and Time Interval-Related Patterns discovery, and used the discovered patterns as classification features. Two experiments were designed using three different outcome prediction study designs from the literature, implementing various levels of real-life conditions to evaluate the prediction models. The first experiment focused on predicting whether a patient would suffer from ICU-acquired sepsis and when during her admission, given a sliding observation time window, and on comparing the behavior of the three study designs. The second experiment focused only on predicting whether the patient would suffer from ICU-acquired sepsis, based on data taken relative to his admission start time. Our results show that using Temporal Discretization for Classification (TD4C) led to better performance than using Equal-Width Discretization, Knowledge-Based, or SAX. Also, abstraction into two states was better than into three or four. The default Binary TIRP representation method performed better than Mean Duration, Horizontal Support, and horizontally normalized horizontal support.
Using XGBoost as a classifier performed better than Logistic Regression, Neural Net, or Random Forest. Additionally, it is demonstrated why the use of case-crossover-control is most appropriate for real life application conditions evaluation, unlike other incomplete designs that may even result in "better performance".
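SAX, one of the discretization methods that TD4C is compared against above, z-normalizes a series, averages it over equal-length segments (piecewise aggregate approximation), and maps each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions; `alphabet_size` plays the role of the number of states. This is a minimal sketch, assuming the series length is divisible by the number of segments. TD4C itself, which uses the class labels to choose cutoffs, is not shown, and `sax` is a hypothetical name.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Symbolic Aggregate approXimation of a numeric series (minimal sketch)."""
    if len(series) % n_segments:
        raise ValueError("series length must be divisible by n_segments")
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-length segment.
    size = len(z) // n_segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # Breakpoints split N(0, 1) into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # A segment's symbol index is the number of breakpoints its mean exceeds.
    return "".join(letters[sum(v > b for b in breakpoints)] for v in paa)
```

For example, `sax([0, 0, 5, 5, 10, 10], n_segments=3, alphabet_size=3)` returns `"abc"`. Because the breakpoints depend only on the normal distribution, SAX ignores the outcome labels entirely, which is one way to see why a supervised discretization such as TD4C could yield more discriminative states for classification.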
Affiliation(s)
- Maya Schvetz
- Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer-Sheva, Israel.
- Lior Fuchs
- Medical Intensive Care Unit and Clinical Research Center, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Victor Novack
- Clinical Research Center, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Robert Moskovitch
- Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer-Sheva, Israel.
13
Shlomo A, Kalech M, Moskovitch R. Temporal pattern-based malicious activity detection in SCADA systems. Comput Secur 2021. [DOI: 10.1016/j.cose.2020.102153] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
14
Abstract
The increasing use of electronic health record (EHR)-based systems has led to the generation of clinical data at an unprecedented rate, which produces an untapped resource for healthcare experts to improve the quality of care. Despite the growing demand for adopting EHRs, the large amount of clinical data has made some analytical and cognitive processes more challenging. The emergence of a type of computational system called visual analytics has the potential to handle information overload challenges in EHRs by integrating analytics techniques with interactive visualizations. In recent years, several EHR-based visual analytics systems have been developed to fulfill healthcare experts’ computational and cognitive demands. In this paper, we conduct a systematic literature review to present the research papers that describe the design of EHR-based visual analytics systems and provide a brief overview of 22 systems that met the selection criteria. We identify and explain the key dimensions of the EHR-based visual analytics design space, including visual analytics tasks, analytics, visualizations, and interactions. We evaluate the systems using the selected dimensions and identify the gaps and areas with little prior work.
15
Morid MA, Sheng ORL, Kawamoto K, Abdelrahman S. Learning hidden patterns from patient multivariate time series data using convolutional neural networks: A case study of healthcare cost prediction. J Biomed Inform 2020; 111:103565. [DOI: 10.1016/j.jbi.2020.103565] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/20/2019] [Revised: 08/27/2020] [Accepted: 09/07/2020] [Indexed: 01/20/2023]
|
16
|
Quantitative and temporal approach to utilising electronic medical records from general practices in mental health prediction. Comput Biol Med 2020; 125:103973. [DOI: 10.1016/j.compbiomed.2020.103973] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2020] [Revised: 08/11/2020] [Accepted: 08/11/2020] [Indexed: 01/06/2023]
17
Dagliati A, Geifman N, Peek N, Holmes JH, Sacchi L, Bellazzi R, Sajjadi SE, Tucker A. Using topological data analysis and pseudo time series to infer temporal phenotypes from electronic health records. Artif Intell Med 2020; 108:101930. [PMID: 32972659 PMCID: PMC7536308 DOI: 10.1016/j.artmed.2020.101930] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/06/2020] [Revised: 05/21/2020] [Accepted: 07/11/2020] [Indexed: 11/17/2022]
Abstract
- Topological data analysis and pseudo time series are used to discover type 2 diabetes temporal phenotypes.
- Temporal phenotypes are inferred from a state-space model based on hidden-state transitions.
- Continuous transitions between states are presented visually in an easily explainable way.
- The mined phenotypes are characterized by significant differences in disease deterioration.
Temporal phenotyping enables clinicians to better understand observable characteristics of a disease as it progresses. Modelling disease progression that captures interactions between phenotypes is inherently challenging. Temporal models that capture change in disease over time can identify the key features that characterize disease subtypes that underpin these trajectories. These models will enable clinicians to identify early warning signs of progression in specific sub-types and therefore to make informed decisions tailored to individual patients. In this paper, we explore two approaches to building temporal phenotypes based on the topology of data: topological data analysis and pseudo time-series. Using type 2 diabetes data, we show that the topological data analysis approach is able to identify disease trajectories and that pseudo time-series can infer a state space model characterized by transitions between hidden states that represent distinct temporal phenotypes. Both approaches highlight lipid profiles as key factors in distinguishing the phenotypes.
Affiliation(s)
- Arianna Dagliati
- Centre for Health Informatics, University of Manchester, Manchester, United Kingdom; Manchester Molecular Pathology Innovation Centre, University of Manchester, United Kingdom; Department of Electrical, Computer & Biomedical Engineering, University of Pavia, Italy.
- Nophar Geifman
- Centre for Health Informatics, University of Manchester, Manchester, United Kingdom
- Niels Peek
- Centre for Health Informatics, University of Manchester, Manchester, United Kingdom; NIHR Manchester Biomedical Research Centre, University of Manchester, United Kingdom
- John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, Penn Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, USA
- Lucia Sacchi
- Department of Electrical, Computer & Biomedical Engineering, University of Pavia, Italy
- Riccardo Bellazzi
- Department of Electrical, Computer & Biomedical Engineering, University of Pavia, Italy
- Allan Tucker
- Department of Computer Science, Brunel University London, United Kingdom
18
Morid MA, Sheng ORL, Del Fiol G, Facelli JC, Bray BE, Abdelrahman S. Temporal Pattern Detection to Predict Adverse Events in Critical Care: Case Study With Acute Kidney Injury. JMIR Med Inform 2020; 8:e14272. [PMID: 32181753 PMCID: PMC7109618 DOI: 10.2196/14272] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/04/2019] [Revised: 11/23/2019] [Accepted: 01/22/2020] [Indexed: 12/21/2022]
Abstract
BACKGROUND More than 20% of patients admitted to the intensive care unit (ICU) develop an adverse event (AE). No previous study has leveraged patients' data to extract temporal features using their structural temporal patterns, that is, trends. OBJECTIVE This study aimed to improve AE prediction methods by using structural temporal pattern detection that captures global and local temporal trends and to demonstrate these improvements in the detection of acute kidney injury (AKI). METHODS Using the Medical Information Mart for Intensive Care dataset, containing 22,542 patients, we extracted both global and local trends using structural pattern detection methods to predict AKI (ie, binary prediction). Classifiers were built on 17 input features consisting of vital signs and laboratory test results using state-of-the-art models; the optimal classifier was selected for comparisons with previous approaches. The classifier with structural pattern detection features was compared with two baseline classifiers that used different temporal feature extraction approaches commonly used in the literature: (1) symbolic temporal pattern detection, which is the most common approach for multivariate time series classification; and (2) the last recorded value before the prediction point, which is the most common approach to extract temporal data in the AKI prediction literature. Moreover, we assessed the individual contribution of global and local trends. Classifier performance was measured in terms of accuracy (primary outcome), area under the curve, and F-measure. For all experiments, we employed 20-fold cross-validation. RESULTS Random forest was the best classifier using structural temporal pattern detection. The accuracy of the classifier with local and global trend features was significantly higher than that of the classifiers using symbolic temporal pattern detection or the last recorded value (81.3% vs 70.6% vs 58.1%; P<.001).
Excluding local or global features reduced the accuracy to 74.4% or 78.1%, respectively (P<.001). CONCLUSIONS Classifiers using features obtained from structural temporal pattern detection significantly improved the prediction of AKI onset in ICU patients over two baselines based on common previous approaches. The proposed method is a generalizable approach to predict AEs in critical care that may be used to help clinicians intervene in a timely manner to prevent or mitigate AEs.
Affiliation(s)
- Mohammad Amin Morid
- Department of Information Systems and Analytics, Leavey School of Business, Santa Clara University, Santa Clara, CA, United States
- Olivia R Liu Sheng
- Department of Operations and Information Systems, David Eccles School of Business, University of Utah, Salt Lake City, UT, United States
- Guilherme Del Fiol
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States
- Julio C Facelli
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States
- Center for Clinical and Translational Science, University of Utah, Salt Lake City, UT, United States
- Bruce E Bray
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States
- Division of Cardiovascular Medicine, School of Medicine, University of Utah, Salt Lake City, UT, United States
- Samir Abdelrahman
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States
- Computer Science Department, Faculty of Computers and Information, Cairo University, Cairo, Egypt
19
Mate S, Bürkle T, Kapsner LA, Toddenroth D, Kampf MO, Sedlmayr M, Castellanos I, Prokosch HU, Kraus S. A method for the graphical modeling of relative temporal constraints. J Biomed Inform 2019; 100:103314. [PMID: 31629921 DOI: 10.1016/j.jbi.2019.103314] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/16/2019] [Revised: 08/13/2019] [Accepted: 10/14/2019] [Indexed: 02/06/2023]
Abstract
Searching for patient cohorts in electronic patient data often requires the definition of temporal constraints between the selection criteria. However, beyond a certain degree of temporal complexity, the non-graphical, form-based approaches implemented in current translational research platforms may be limited when modeling such constraints. In our opinion, there is a need for an easily accessible and implementable, fully graphical method for creating temporal queries. We aim to respond to this challenge with a new graphical notation. Based on Allen's time interval algebra, it allows for modeling temporal queries by arranging simple horizontal bars depicting symbolic time intervals. To make our approach applicable to complex temporal patterns, we apply two extensions: with duration intervals, we enable the inference about relative temporal distances between patient events, and with time interval modifiers, we support counting and excluding patient events, as well as constraining numeric values. We describe how to generate database queries from this notation. We provide a prototypical implementation, consisting of a temporal query modeling frontend and an experimental backend that connects to an i2b2 system. We evaluate our modeling approach on the MIMIC-III database to demonstrate that it can be used for modeling typical temporal phenotyping queries.
Affiliation(s)
- Sebastian Mate
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany.
- Thomas Bürkle
- Bern University of Applied Sciences, Biel, Switzerland
- Lorenz A Kapsner
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Dennis Toddenroth
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Marvin O Kampf
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Ixchel Castellanos
- Department of Anesthesiology, Universitätsklinikum Erlangen, Erlangen, Germany
- Hans-Ulrich Prokosch
- Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany; Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Stefan Kraus
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
20
Georga EI, Tachos NS, Sakellarios AI, Kigka VI, Exarchos TP, Pelosi G, Parodi O, Michalis LK, Fotiadis DI. Artificial Intelligence and Data Mining Methods for Cardiovascular Risk Prediction. ACTA ACUST UNITED AC 2019. [DOI: 10.1007/978-981-10-5092-3_14] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
21
Moskovitch R, Shahar Y, Wang F, Hripcsak G. Temporal biomedical data analytics. J Biomed Inform 2019; 90:103092. [PMID: 30654029 PMCID: PMC9745669 DOI: 10.1016/j.jbi.2018.12.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/22/2018] [Accepted: 12/24/2018] [Indexed: 02/07/2023]
Affiliation(s)
- Robert Moskovitch
- Department of Information Systems Engineering, Ben Gurion University of the Negev, Beersheba, Israel
- Yuval Shahar
- Department of Information Systems Engineering, Ben Gurion University of the Negev, Beersheba, Israel
- Fei Wang
- Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, New York, NY, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
22
Nguyen D, Luo W, Phung D, Venkatesh S. LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.07.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
23
Incorporating repeating temporal association rules in Naïve Bayes classifiers for coronary heart disease diagnosis. J Biomed Inform 2018; 81:74-82. [PMID: 29555443 DOI: 10.1016/j.jbi.2018.03.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/01/2017] [Revised: 02/14/2018] [Accepted: 03/07/2018] [Indexed: 01/08/2023]
Abstract
In this paper, we develop a Naïve Bayes classification model integrated with temporal association rules (TARs). A temporal pattern mining algorithm is used to detect TARs by identifying the most frequent temporal relationships among the derived basic temporal abstractions (TAs). We develop and compare three classifiers that use the most frequent TARs as features, as follows: (i) representing the most frequent TARs detected within the target class ('Disease = Present'), (ii) representing the most frequent TARs from both classes ('Disease = Present', 'Disease = Absent'), and (iii) representing the most frequent TARs, after removing the ones that are low-risk predictors for the disease. These classifiers use as their features the horizontal support of TARs, which is the number of times that a particular temporal pattern is found in a patient's record. All of the developed classifiers are applied to the diagnosis of coronary heart disease (CHD) using a longitudinal dataset. We compare two ways of representing each TAR for a single patient: its horizontal support or its mean duration. The results of this comparison show that the horizontal support representation outperforms the mean duration. The main aim of our research is to demonstrate that where long time periods are of significance in a medical domain, such as the CHD domain, the detection of repeated occurrences of the most frequent TARs can yield better performance. We compared the best-performing classifier, which uses the horizontal support representation, with a Baseline Classifier that uses the binary representation of the most frequent TARs. The results illustrate the comparatively high performance of the horizontal-support classifier relative to the Baseline Classifier.
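As a rough illustration of the feature representations compared in this abstract, the Python sketch below counts, within one patient's record, how many instances of a two-symbol "A before B" pattern occur (horizontal support), the mean duration of those instances, and the binary presence feature of the kind used by the baseline. The `Interval` type, the symbol names, and the assumption that every qualifying (A, B) pair counts as one instance are illustrative choices, not the authors' definitions.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A symbolic time interval, e.g. a hypothetical abstraction such as 'HighBP'."""
    symbol: str
    start: int
    end: int


def before_instances(record: List[Interval], a: str, b: str) -> List[Tuple[Interval, Interval]]:
    """All (A, B) pairs in one record where the A-interval ends strictly before the B-interval starts.

    Assumption: each qualifying pair counts as a separate pattern instance.
    """
    a_ivs = [iv for iv in record if iv.symbol == a]
    b_ivs = [iv for iv in record if iv.symbol == b]
    return [(x, y) for x in a_ivs for y in b_ivs if x.end < y.start]


def horizontal_support(record: List[Interval], a: str, b: str) -> int:
    """Number of times the pattern 'a before b' occurs in this patient's record."""
    return len(before_instances(record, a, b))


def mean_duration(record: List[Interval], a: str, b: str) -> float:
    """Mean span (start of A to end of B) over the pattern's instances; 0.0 if absent."""
    inst = before_instances(record, a, b)
    if not inst:
        return 0.0
    return sum(y.end - x.start for x, y in inst) / len(inst)


def binary_feature(record: List[Interval], a: str, b: str) -> int:
    """Presence/absence representation of the pattern."""
    return int(horizontal_support(record, a, b) > 0)


# Hypothetical patient record (time in arbitrary units, e.g. months).
record = [
    Interval("HighBP", 0, 5),
    Interval("HighChol", 7, 9),
    Interval("HighBP", 10, 12),
    Interval("HighChol", 14, 20),
]
print(horizontal_support(record, "HighBP", "HighChol"))  # 3
print(mean_duration(record, "HighBP", "HighChol"))       # 13.0
print(binary_feature(record, "HighBP", "HighChol"))      # 1
```

A patient with three occurrences of the pattern and a patient with one both receive the binary value 1, whereas horizontal support keeps them apart; that repetition count is the information a binary representation discards.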
24
Liu L, Wang S, Su G, Hu B, Peng Y, Xiong Q, Wen J. A framework of mining semantic-based probabilistic event relations for complex activity recognition. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.07.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 10/19/2022]
25
Shknevsky A, Shahar Y, Moskovitch R. Consistent discovery of frequent interval-based temporal patterns in chronic patients' data. J Biomed Inform 2017; 75:83-95. [PMID: 28987378 DOI: 10.1016/j.jbi.2017.10.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 03/11/2017] [Revised: 08/23/2017] [Accepted: 10/02/2017] [Indexed: 11/24/2022]
Abstract
Increasingly, frequent temporal patterns discovered in longitudinal patient records are proposed as features for classification and prediction, and as a means of clustering patient clinical trajectories. However, to justify that, we must demonstrate that most frequent temporal patterns are indeed consistently discoverable within the records of different patient subsets within similar patient populations. We have developed several measures for the consistency of the discovery of temporal patterns. We focus on time-interval relations patterns (TIRPs) that can be discovered within different subsets of the same patient population. We expect the discovered TIRPs (1) to be frequent in each subset, (2) to preserve their "local" metrics (the absolute frequency of each pattern, measured by a Proportion Test), and (3) to preserve their "global" characteristics (their overall distribution, measured by a Kolmogorov-Smirnov test). We also wanted to examine the effect on consistency, over a variety of settings, of varying the minimal frequency threshold for TIRP discovery, and of using a TIRP-filtering criterion that we previously introduced, the Semantic Adjacency Criterion (SAC). We applied our methodology to three medical domains (oncology, infectious hepatitis, and diabetes). We found that, within the minimal frequency ranges we had examined, 70-95% of the discovered TIRPs were consistently discoverable; 40-48% of them maintained their local frequency. TIRP global distribution similarity varied widely, from 0% to 65%. Increasing the threshold usually increased the percentage of TIRPs that were repeatedly discovered across different patient subsets within the same domain, and the probability of a similar TIRP distribution. For most minimal support levels, using the SAC principle enhanced the percentage of repeating TIRPs, their local consistency, and their global consistency. The effect of using the SAC was further strengthened as the minimal frequency threshold was raised.
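The "local" consistency check described in this abstract compares a TIRP's frequency between two patient subsets. A minimal sketch of one common way to do that, a pooled two-sided two-proportion z-test in plain Python, is shown below. The abstract does not say which variant of the Proportion Test was used, so the choice of test, the function name, and the example counts are assumptions.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test (assumed variant of a 'Proportion Test').

    x1 of n1 patients in subset 1, and x2 of n2 in subset 2, contain the TIRP.
    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        # Both subsets have the same frequency (all 0 or all 1): no evidence of a difference.
        return 0.0, 1.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the TIRP appears in 120 of 400 patients in one subset
# and in 130 of 400 patients in another.
z, p = two_proportion_z_test(120, 400, 130, 400)
print(round(z, 3), round(p, 3))
```

A large p-value here would be read as the TIRP preserving its local frequency across the two subsets. For the "global" check, SciPy's `scipy.stats.ks_2samp` implements the two-sample Kolmogorov-Smirnov test and could be applied to the two subsets' TIRP frequency distributions.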
Affiliation(s)
- Alexander Shknevsky
- Software and Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel.
- Yuval Shahar
- Software and Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel.
- Robert Moskovitch
- Software and Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel.
26
Nissim N, Shahar Y, Elovici Y, Hripcsak G, Moskovitch R. Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods. Artif Intell Med 2017; 81:12-32. [PMID: 28456512 PMCID: PMC5937023 DOI: 10.1016/j.artmed.2017.03.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/01/2017] [Accepted: 03/03/2017] [Indexed: 01/20/2023]
Abstract
BACKGROUND AND OBJECTIVES Labeling instances by domain experts for classification is often time-consuming and expensive. To reduce such labeling efforts, we had proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and shown its significant reduction of labeling efforts. The use of any of three AL methods (one well known [SVM-Margin], and two that we introduced [Exploitation and Combination_XA]) significantly reduced (by 48% to 64%) condition labeling efforts, compared to standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue associated with learning methods, and AL methods in particular, is how best to use the labels provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) has an effect on the intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers. METHODS We used our CAESAR-ALE framework for classifying the severity of clinical conditions, the three AL methods and the passive learning method, as mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labels, represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center.
We analyzed the variance of the classification performance within (intra-labeler) and, especially, among (inter-labeler) the classification models that were induced by using the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label. RESULTS The AL methods produced, for the models induced from each labeler, smoother intra-labeler learning curves during the training phase, compared to the models produced when using the passive learning method. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: [0.0182 to 0.0496]) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: [0.0275 to 0.0724]). Using the AL methods resulted in a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models during the training phase, compared to the variance of the induced models' AUC values when using passive learning. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as the inter-labeler standard deviation using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was almost 50% higher than that of our two AL methods. The difference in the inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was not significant (p=0.29), nor was the difference between the Combination_XA and Exploitation methods (p=0.67).
Finally, using the consensus label led to a learning curve that had a higher mean intra-labeler variance, but eventually resulted in an AUC that was at least as high as the AUC achieved using the gold standard label and that was always higher than the expected mean AUC of a randomly selected labeler, regardless of the choice of learning method (including a passive learning method). Using a paired t-test, the difference between the intra-labeler AUC standard deviation when using the consensus label, versus that value when using the other two labeling strategies, was significant only when using the passive learning method (p=0.014), but not when using any of the three AL methods. CONCLUSIONS The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum that is significantly different in performance from the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces the dependence on the use of a particular labeler. In addition, the use of a consensus label, agreed upon by a rather uneven group of labelers, might be at least as good as using the gold standard labeler, who might not be available, and certainly better than randomly selecting one of the group's individual labelers. Finally, when trained with the consensus label, the AL methods reduced the intra-labeler AUC variance during the learning phase, compared to passive learning.
Affiliation(s)
- Nir Nissim
- Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Yuval Shahar
- Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yuval Elovici
- Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Robert Moskovitch
- Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Department of Biomedical Informatics, Columbia University, New York, NY, USA
27
Moskovitch R, Polubriaginof F, Weiss A, Ryan P, Tatonetti N. Procedure prediction from symbolic Electronic Health Records via time intervals analytics. J Biomed Inform 2017; 75:70-82. [PMID: 28823923 DOI: 10.1016/j.jbi.2017.07.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 05/15/2016] [Revised: 06/19/2017] [Accepted: 07/25/2017] [Indexed: 11/18/2022]
Abstract
Prediction of medical events, such as clinical procedures, is essential for preventing disease, understanding disease mechanisms, and improving patients' quality of care. Although longitudinal clinical data from Electronic Health Records provide opportunities to develop predictive models, the use of these data faces significant challenges. Primarily, while the data are longitudinal and represent thousands of conceptual events having duration, they are also sparse, complicating the application of traditional analysis approaches. The framework presented here takes advantage of event durations and the gaps between events. International standards for electronic healthcare data represent data elements, such as procedures, conditions, and drug exposures, using eras, or time intervals. Such eras contain both an event and a duration and enable the application of time-interval mining, a relatively new subfield of data mining. In this study, we present Maitreya, a framework for time-interval analytics in longitudinal clinical data. Maitreya discovers frequent time intervals related patterns (TIRPs), which we use as prognostic markers for modelling clinical events. We introduce three novel TIRP metrics that are normalized versions of the horizontal support, which represents the number of TIRP instances per patient. We evaluate Maitreya on 28 frequent and clinically important procedures, using the three novel TIRP representation metrics in comparison to no temporal representation and previous TIRP metrics. We also evaluate the epsilon value, which makes Allen's relations more flexible, with settings of 30, 60, 90, and 180 days in comparison to the default of zero. For twenty-two of these procedures, the use of temporal patterns as predictors was superior to non-temporal features, and the use of the vertically normalized horizontal support metric to represent TIRPs as features was most effective. An epsilon value of thirty days performed slightly better than zero.
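To make the role of epsilon concrete, the sketch below classifies a pair of symbolic time intervals into one of the seven Allen relations commonly used in time-interval mining (before, meets, overlaps, starts, finished-by, contains, equals), treating endpoints that differ by at most epsilon as equal. It is a minimal illustration under assumed conventions, not Maitreya's code: interval `a` must precede `b` in (start, end) order, time is in days, and the function name and the order in which conditions are checked are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. in days


def allen_relation(a: Interval, b: Interval, eps: float = 0.0) -> Optional[str]:
    """Return the Allen relation of a to b, with endpoint tolerance eps.

    Assumes a precedes b in (start, end) order. Conditions are checked in a
    fixed order; with eps > 0 some near-tie configurations match none of them
    and None is returned.
    """
    (s1, e1), (s2, e2) = a, b
    if (s1, e1) > (s2, e2):
        raise ValueError("a must precede b in (start, end) order")

    def close(x: float, y: float) -> bool:
        # Endpoints within eps of each other are treated as equal.
        return abs(x - y) <= eps

    if e1 + eps < s2:
        return "before"
    if close(e1, s2):
        return "meets"
    if close(s1, s2) and close(e1, e2):
        return "equals"
    if close(s1, s2) and e1 + eps < e2:
        return "starts"
    if s1 + eps < s2 and close(e1, e2):
        return "finished-by"
    if s1 + eps < s2 and e2 + eps < e1:
        return "contains"
    if s1 + eps < s2 and s2 + eps < e1 and e1 + eps < e2:
        return "overlaps"
    return None


# A 20-day gap is "before" with eps = 0 but "meets" with eps = 30.
print(allen_relation((0, 100), (120, 200)))          # before
print(allen_relation((0, 100), (120, 200), eps=30))  # meets
```

Raising epsilon merges relations whose endpoints are nearly aligned (here, "before" becomes "meets"), which is the sense in which epsilon makes Allen's relations more flexible.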
Affiliation(s)
- Robert Moskovitch
- Department of Biomedical Informatics, Columbia University, NY, USA; Department of Systems Biology, Columbia University, NY, USA; Department of Medicine, Columbia University, NY, USA; Observational Health Data Sciences and Informatics (OHDSI), NY, USA; Department of Software and Information Systems Engineering, Ben Gurion University, Beer Sheva, Israel.
- Fernanda Polubriaginof
- Department of Biomedical Informatics, Columbia University, NY, USA; Department of Systems Biology, Columbia University, NY, USA; Department of Medicine, Columbia University, NY, USA; Observational Health Data Sciences and Informatics (OHDSI), NY, USA
- Aviram Weiss
- Department of Software and Information Systems Engineering, Ben Gurion University, Beer Sheva, Israel
- Patrick Ryan
- Department of Biomedical Informatics, Columbia University, NY, USA
- Nicholas Tatonetti
- Department of Biomedical Informatics, Columbia University, NY, USA; Department of Systems Biology, Columbia University, NY, USA; Department of Medicine, Columbia University, NY, USA; Observational Health Data Sciences and Informatics (OHDSI), NY, USA.
28
29
Moskovitch R, Choi H, Hripcsak G, Tatonetti N. Prognosis of Clinical Outcomes with Temporal Patterns and Experiences with One Class Feature Selection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:555-563. [PMID: 27429447 PMCID: PMC5486920 DOI: 10.1109/tcbb.2016.2591539] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 05/03/2023]
Abstract
Accurate prognosis of outcome events, such as clinical procedures or disease diagnosis, is central in medicine. The emergence of longitudinal clinical data, such as Electronic Health Records (EHRs), represents an opportunity to develop automated methods for predicting patient outcomes. However, these data are high-dimensional and very sparse, complicating the application of predictive modeling techniques. Further, their temporal nature is not fully exploited by current methods; temporal abstraction, which has recently been applied to such data, results in a symbolic time-interval representation. We present Maitreya, a framework for the prediction of outcome events that leverages these symbolic time intervals. Using Maitreya, we learn temporal patterns in the clinical records that serve as prognostic markers and use these markers to train predictive models for eight clinical procedures. In order to decrease the number of patterns that are used as features, we propose the use of three one-class feature selection methods. We evaluate the performance of Maitreya under several parameter settings, including the one-class feature selection, and compare our results to those of atemporal approaches. In general, we found that the use of temporal patterns outperformed the atemporal methods when the features represented the number of pattern occurrences.
30
Deja R, Froelich W, Deja G, Wakulicz-Deja A. Hybrid approach to the generation of medical guidelines for insulin therapy for children. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.07.066] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/01/2022]
31
Kostakis O, Papapetrou P. On searching and indexing sequences of temporal intervals. Data Min Knowl Discov 2017. [DOI: 10.1007/s10618-016-0489-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
32
Luo Y, Szolovits P. Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records. BIOMEDICAL INFORMATICS INSIGHTS 2016; 8:29-38. [PMID: 27478379 PMCID: PMC4954589 DOI: 10.4137/bii.s38916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/04/2016] [Revised: 06/13/2016] [Accepted: 06/22/2016] [Indexed: 11/07/2022]
Abstract
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is, in general, logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show that it is not optimal when applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms, propose query reformulations, and propose augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretical lower bound. Update time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions.
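For readers unfamiliar with the baseline analyzed in this abstract, the sketch below is a textbook centered interval tree answering the basic stabbing query: which stored annotation spans contain a given character offset. It is not the authors' optimized algorithm and does not cover stabbing-max or the other Allen-relation queries; the `Span` encoding (inclusive start and end offsets) and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, both inclusive


@dataclass
class Node:
    center: int
    by_start: List[Span]     # spans containing center, ascending by start
    by_end: List[Span]       # the same spans, descending by end
    left: Optional["Node"]   # spans lying entirely before center
    right: Optional["Node"]  # spans lying entirely after center


def build(spans: List[Span]) -> Optional[Node]:
    """Build a centered interval tree, splitting at the median endpoint."""
    if not spans:
        return None
    points = sorted(p for span in spans for p in span)
    center = points[len(points) // 2]
    here = [s for s in spans if s[0] <= center <= s[1]]
    return Node(
        center=center,
        by_start=sorted(here),
        by_end=sorted(here, key=lambda s: s[1], reverse=True),
        left=build([s for s in spans if s[1] < center]),
        right=build([s for s in spans if s[0] > center]),
    )


def stab(node: Optional[Node], q: int) -> List[Span]:
    """Return every stored span with start <= q <= end."""
    out: List[Span] = []
    while node is not None:
        if q < node.center:
            # Spans here all end at or after center > q; keep those starting by q.
            for s in node.by_start:
                if s[0] > q:
                    break
                out.append(s)
            node = node.left
        elif q > node.center:
            # Spans here all start at or before center < q; keep those ending at or after q.
            for s in node.by_end:
                if s[1] < q:
                    break
                out.append(s)
            node = node.right
        else:
            out.extend(node.by_start)
            break
    return out


# Hypothetical annotation spans over a clinical note.
tree = build([(0, 10), (5, 20), (15, 25), (30, 40)])
print(sorted(stab(tree, 7)))  # [(0, 10), (5, 20)]
print(stab(tree, 27))         # []
```

Each query descends one path of the tree and stops scanning a node's sorted list at the first non-matching span, which is what keeps the basic stabbing query cheap; the Allen-relation queries studied in the abstract need further reformulation on top of this structure.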
Affiliation(s)
- Yuan Luo
- Assistant Professor, Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Peter Szolovits
- Professor, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
33
Pimus I, Peleg M, Schertz M. Sequence Mining of Comorbid Neurodevelopmental Disorders Using the SPADE Algorithm. Methods Inf Med 2016; 55:223-33. [PMID: 26848079 DOI: 10.3414/me15-01-0142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/27/2015] [Accepted: 12/21/2015] [Indexed: 11/09/2022]
Abstract
OBJECTIVES Understanding the progression of comorbid neurodevelopmental disorders (NDD) during different critical time periods may contribute to our comprehension of the underlying pathophysiology of NDDs. The objective of our study was to identify frequent temporal sequences of developmental diagnoses in noisy patient data. METHODS We used a data set of 2810 patients, documenting the NDD diagnoses given to them by an NDD expert at a child developmental center during multiple visits at different ages. Extensive preprocessing steps were developed to allow the data set to be processed by an efficient sequence mining algorithm (SPADE). RESULTS The discovered sequences were validated by cross-validation over 10 iterations; all correlation coefficients for the support, confidence, and lift measures were above 0.75, and their proportions were similar. No significant differences between the distributions of sequences were found using the Kolmogorov-Smirnov test. CONCLUSIONS We have demonstrated the feasibility of using the SPADE algorithm to discover valid temporal sequences of comorbid disorders in children with NDDs. The identification of such sequences would be beneficial from clinical and research perspectives. Moreover, these sequences could serve as features for developing a full-fledged temporal predictive model.
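SPADE works on a vertical "id-list" layout, where each item maps to the (sequence id, event id) pairs at which it occurs, and grows patterns by temporal joins of id-lists. The sketch below shows that core idea on hypothetical toy data (patients as sequences, visits as events, diagnoses labeled "A", "B", "C"), and computes support, confidence, and lift for a two-step pattern. It omits SPADE's equivalence-class search and candidate pruning, and the lift formula used is one common definition for sequential rules, assumed here rather than taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

History = List[Set[str]]        # visits in time order, each a set of diagnosis codes
IdList = List[Tuple[str, int]]  # (sequence id, event id) pairs, SPADE's vertical format


def build_id_lists(db: Dict[str, History]) -> Dict[str, IdList]:
    """Vertical layout: each item maps to every (patient, visit index) where it occurs."""
    id_lists: Dict[str, IdList] = defaultdict(list)
    for sid, visits in db.items():
        for eid, visit in enumerate(visits):
            for item in visit:
                id_lists[item].append((sid, eid))
    return id_lists


def temporal_join(prefix: IdList, item: IdList) -> IdList:
    """Id-list of 'prefix, then item at a strictly later visit' (sequence extension).

    Only the earliest end of the prefix per patient matters: any occurrence of the
    item after some prefix end is also after the earliest one.
    """
    earliest: Dict[str, int] = {}
    for sid, eid in prefix:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in item if sid in earliest and eid > earliest[sid]]


def support(id_list: IdList) -> int:
    """Number of distinct patients (sequences) containing the pattern."""
    return len({sid for sid, _ in id_list})


# Hypothetical toy database, not data from the study.
db = {
    "p1": [{"A"}, {"B"}, {"C"}],
    "p2": [{"A", "C"}, {"B"}],
    "p3": [{"B"}, {"A"}],
}
lists = build_id_lists(db)
a_then_b = temporal_join(lists["A"], lists["B"])
n = len(db)
confidence = support(a_then_b) / support(lists["A"])  # share of A-patients with B later
lift = confidence / (support(lists["B"]) / n)          # relative to B's overall rate
```

Because the join result is itself an id-list, longer patterns such as A, then B, then C follow by joining again, for example `temporal_join(a_then_b, lists["C"])`.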
Affiliation(s)
- Mor Peleg
- Mor Peleg, Ph.D., Assoc. Prof., Department of Information Systems, Rabin Building, room 7047, Faculty of Social Sciences, University of Haifa, Haifa, Israel, 3498838, E-mail:
34
Nissim N, Boland MR, Tatonetti NP, Elovici Y, Hripcsak G, Shahar Y, Moskovitch R. Improving condition severity classification with an efficient active learning based framework. J Biomed Inform 2016; 61:44-54. [PMID: 27016383 DOI: 10.1016/j.jbi.2016.03.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/29/2015] [Revised: 01/31/2016] [Accepted: 03/21/2016] [Indexed: 02/07/2023]
Abstract
Classification of condition severity can be useful for discriminating among sets of conditions or phenotypes, for example when prioritizing patient care or for other healthcare purposes. Electronic Health Records (EHRs) represent a rich source of labeled information that can be harnessed for severity classification. The labeling of EHRs is expensive and in many cases requires employing professionals with a high level of expertise. In this study, we demonstrate the use of Active Learning (AL) techniques to decrease expert labeling efforts. We employ three AL methods and demonstrate their ability to reduce labeling efforts while effectively discriminating condition severity. We incorporate the three AL methods into a new framework based on the original CAESAR (Classification Approach for Extracting Severity Automatically from Electronic Health Records) framework to create the Active Learning Enhancement framework (CAESAR-ALE). We applied CAESAR-ALE to a dataset containing 516 conditions of varying severity levels that were manually labeled by seven experts. Our dataset, called the "CAESAR dataset," was created from the medical records of 1.9 million patients treated at Columbia University Medical Center (CUMC). All three AL methods decreased labelers' efforts compared to the learning methods applied by the original CAESAR framework, in which the classifier was trained on the entire set of conditions; depending on the AL strategy used, the reduction ranged from 48% to 64%, which can result in significant savings in both time and money. In terms of PPV (precision), CAESAR-ALE achieved an absolute improvement of more than 13% in the predictive capabilities of the framework when classifying conditions as severe. These results demonstrate the potential of AL methods to decrease the labeling efforts of medical experts while increasing accuracy given the same (or even a smaller) number of acquired conditions.
We also demonstrated that the methods included in the CAESAR-ALE framework (Exploitation and Combination_XA) are more robust to the use of human labelers with different levels of professional expertise.
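The abstract names the CAESAR-ALE strategies (Exploitation and Combination_XA) without defining them, so the sketch below uses plain uncertainty sampling, a standard pool-based AL strategy, to show where labeling savings come from: the expert labels only `len(seed) + rounds * batch` conditions rather than the whole pool. It is not one of the paper's methods, and `predict_proba` and `oracle` are hypothetical callables standing in for the retrained classifier and the human labeler.

```python
from typing import Callable, Dict, List, Sequence, Set


def uncertainty_sample(probs: Sequence[float], k: int, labeled: Set[int]) -> List[int]:
    """Pick the k unlabeled items whose predicted P(severe) is closest to 0.5."""
    pool = [i for i in range(len(probs)) if i not in labeled]
    pool.sort(key=lambda i: abs(probs[i] - 0.5))
    return pool[:k]


def active_learning(
    predict_proba: Callable[[Dict[int, bool]], Sequence[float]],
    oracle: Callable[[int], bool],
    seed: Sequence[int],
    batch: int,
    rounds: int,
) -> Dict[int, bool]:
    """Pool-based loop: retrain, score the pool, ask the expert for `batch` labels.

    `predict_proba` (hypothetical) retrains on the current labels and returns a
    probability for every item in the pool; `oracle` (hypothetical) is the expert.
    """
    labels = {i: oracle(i) for i in seed}
    for _ in range(rounds):
        probs = predict_proba(labels)
        for i in uncertainty_sample(probs, batch, set(labels)):
            labels[i] = oracle(i)
    return labels
```

For example, with a fixed score list `probs = [0.9, 0.55, 0.1, 0.48, 0.7]`, one round with `seed=[0]` and `batch=2` queries items 3 and 1, the two scores nearest 0.5, and leaves the confidently scored items unlabeled.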
Affiliation(s)
- Nir Nissim
- Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Yuval Elovici
- Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Yuval Shahar
- Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Robert Moskovitch
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA.
35
Boland MR, Jacunski A, Lorberbaum T, Romano JD, Moskovitch R, Tatonetti NP. Systems biology approaches for identifying adverse drug reactions and elucidating their underlying biological mechanisms. Wiley Interdiscip Rev Syst Biol Med 2015; 8:104-22. [PMID: 26559926 DOI: 10.1002/wsbm.1323] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/29/2015] [Revised: 09/30/2015] [Accepted: 10/01/2015] [Indexed: 01/06/2023]
Abstract
Small molecules are indispensable to modern medical therapy. However, their use may lead to unintended, negative medical outcomes commonly referred to as adverse drug reactions (ADRs). These effects vary widely in mechanism, severity, and populations affected, making ADR prediction and identification important public health concerns. Current methods rely on clinical trials and postmarket surveillance programs to find novel ADRs; however, clinical trials are limited by small sample size, whereas postmarket surveillance methods may be biased and inherently leave patients at risk until sufficient clinical evidence has been gathered. Systems pharmacology, an emerging interdisciplinary field combining network and chemical biology, provides important tools to uncover and understand ADRs and may mitigate the drawbacks of traditional methods. In particular, network analysis allows researchers to integrate heterogeneous data sources and quantify the interactions between biological and chemical entities. Recent work in this area has combined chemical, biological, and large-scale observational health data to predict ADRs in both individual patients and global populations. In this review, we explore the rapid expansion of systems pharmacology in the study of ADRs. We enumerate the existing methods and strategies and illustrate progress in the field with a model framework that incorporates crucial data elements, such as diet and comorbidities, known to modulate ADR risk. Using this framework, we highlight avenues of research that may currently be underexplored, representing opportunities for future work.
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Science and Informatics (OHDSI), New York, NY, USA
- Alexandra Jacunski
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Integrated Program in Cellular, Molecular and Biomedical Studies, Columbia University, New York, NY, USA
- Tal Lorberbaum
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Department of Physiology and Cellular Biophysics, Columbia University, New York, NY, USA
- Joseph D Romano
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA
- Robert Moskovitch
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Science and Informatics (OHDSI), New York, NY, USA
36
Perer A, Wang F, Hu J. Mining and exploring care pathways from electronic medical records with visual analytics. J Biomed Inform 2015; 56:369-78. [PMID: 26146159 DOI: 10.1016/j.jbi.2015.06.020] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 12/03/2014] [Revised: 05/13/2015] [Accepted: 06/26/2015] [Indexed: 11/28/2022]
Abstract
OBJECTIVE To derive data-driven insights, we develop Care Pathway Explorer, a system that mines and visualizes a set of frequent event sequences from patient EMR data. The goal is to use historical EMR data to extract common sequences of medical events, such as diagnoses and treatments, and to investigate how these sequences correlate with patient outcome. MATERIALS AND METHODS Care Pathway Explorer uses a frequent sequence mining algorithm adapted to handle the real-world properties of EMR data, including techniques for handling event concurrency, multiple levels of detail, temporal context, and outcome. The mined patterns are then visualized in an interactive user interface consisting of novel overview and flow visualizations. RESULTS We use the proposed system to analyze the diagnoses and treatments of a cohort of hyperlipidemic patients with hypertension and diabetes pre-conditions, and demonstrate the clinical relevance of patterns mined from EMR data. The identified patterns corresponded to clinical and published knowledge, some of it unknown to the physician at the time of discovery. CONCLUSION Care Pathway Explorer, which combines frequent sequence mining techniques with advanced visualizations, supports the integration of data-driven insights into care pathway discovery.
Affiliation(s)
- Adam Perer
- IBM T.J. Watson Research Center, 1101 Kitchawan Road, P.O. Box 218, Yorktown Heights, NY 10598, USA.
- Fei Wang
- University of Connecticut, Storrs, CT, USA
- Jianying Hu
- IBM T.J. Watson Research Center, 1101 Kitchawan Road, P.O. Box 218, Yorktown Heights, NY 10598, USA
37
Klimov D, Shknevsky A, Shahar Y. Exploration of patterns predicting renal damage in patients with diabetes type II using a visual temporal analysis laboratory. J Am Med Inform Assoc 2014; 22:275-89. [DOI: 10.1136/amiajnl-2014-002927] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 11/04/2022]
Abstract
Objective To analyze the longitudinal data of multiple patients and to discover new temporal knowledge, we designed and developed the Visual Temporal Analysis Laboratory (ViTA-Lab). In this study, we demonstrate several of the capabilities of the ViTA-Lab framework through the exploration of renal-damage risk factors in patients with diabetes type II.
Materials and methods The ViTA-Lab framework combines data-driven temporal data mining techniques with interactive, query-driven, visual analytical capabilities to support, in an integrated fashion, an iterative investigation of time-oriented clinical data and of the patterns discovered in them. Patterns discovered through the data mining mode can be explored visually, and vice versa. Both analysis modes are supported by a rich underlying ontology of clinical concepts, their relations, and their temporal properties. This knowledge enables us to apply a temporal-abstraction preprocessing phase that abstracts raw time-stamped data, in a context-sensitive manner, into interval-based, clinically meaningful interpretations, increasing the significance of the results. We demonstrate our approach through the exploration of risk factors associated with future renal damage (micro-albuminuria and macro-albuminuria) and their relationship to the hemoglobin A1C (HbA1C) and creatinine level concepts, in the longitudinal records of 22 000 patients with diabetes type II followed for up to 5 years.
Results The iterative ViTA-Lab analysis process was highly feasible. Higher ranges of either normal albuminuria or normal creatinine values and their combination were shown to be significantly associated with future micro-albuminuria and macro-albuminuria. The risk increased given high HbA1C levels for women in the lower range of normal albuminuria, and for men in the higher range of albuminuria.
Conclusions The ViTA-Lab framework can potentially serve as a virtual laboratory for investigations of large masses of longitudinal clinical databases, for discovery of new knowledge through interactive exploration, clustering, classification, and prediction.
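The temporal-abstraction phase described in the methods turns raw time-stamped values into interval-based states. A minimal sketch of one simple form of it, state abstraction with gap-bounded merging, follows. The cutoffs, state labels, `max_gap`, and lab values are illustrative assumptions, not clinical thresholds from the paper; ViTA-Lab derives its abstractions from a context-sensitive clinical ontology, which this context-free sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StateInterval:
    state: str
    start: float
    end: float


def discretize(value: float, cutoffs: Sequence[Tuple[float, str]], top: str) -> str:
    """Map a raw value to a symbolic state; cutoffs are (upper_bound, label), ascending."""
    for upper, label in cutoffs:
        if value < upper:
            return label
    return top


def state_abstraction(
    samples: Sequence[Tuple[float, float]],
    cutoffs: Sequence[Tuple[float, str]],
    top: str,
    max_gap: float,
) -> List[StateInterval]:
    """Turn (time, value) samples into state intervals.

    Consecutive samples with the same state are joined into one interval when the
    time between them is at most max_gap; otherwise a new interval starts.
    """
    intervals: List[StateInterval] = []
    for t, v in sorted(samples):
        state = discretize(v, cutoffs, top)
        if intervals and intervals[-1].state == state and t - intervals[-1].end <= max_gap:
            intervals[-1].end = t
        else:
            intervals.append(StateInterval(state, t, t))
    return intervals


# Illustrative lab values (time in months); cutoffs and max_gap are made up.
lab = [(0, 6.0), (3, 6.2), (6, 7.5), (9, 7.8), (30, 7.9)]
result = state_abstraction(lab, cutoffs=[(5.7, "low"), (6.5, "mid")], top="high", max_gap=6)
```

Here the five samples collapse into three intervals: "mid" from month 0 to 3, "high" from 6 to 9, and a separate "high" at month 30, because the 21-month gap exceeds `max_gap`. Intervals of this kind are what interval-based pattern mining then operates on.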
Affiliation(s)
- Denis Klimov
- Department of Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Alexander Shknevsky
- Department of Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yuval Shahar
- Department of Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
39
Classification of multivariate time series via temporal abstraction and time intervals mining. Knowl Inf Syst 2014. [DOI: 10.1007/s10115-014-0784-5] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Indexed: 12/01/2022]