Reference Citation Analysis

For: Moskovitch R, Shahar Y. Classification-driven temporal discretization of multivariate time series. Data Min Knowl Discov 2015;29:871-913. [DOI: 10.1007/s10618-014-0380-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 10/24/2022]

Cited by Other Article(s)

Itzhak N, Jaroszewicz S, Moskovitch R. Event prediction by estimating continuously the completion of a single temporal pattern's instances. J Biomed Inform 2024;156:104665. [PMID: 38852777 DOI: 10.1016/j.jbi.2024.104665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 05/10/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]

Oh W, Jayaraman P, Tandon P, Chaddha US, Kovatch P, Charney AW, Glicksberg BS, Nadkarni GN. A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19. Artif Intell Med 2024;148:102750. [PMID: 38325922 PMCID: PMC10864255 DOI: 10.1016/j.artmed.2023.102750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 12/12/2023] [Accepted: 12/14/2023] [Indexed: 02/09/2024]

Affiliation(s)

Wonsuk Oh Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Pushkala Jayaraman Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Pranai Tandon Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Udit S Chaddha Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Patricia Kovatch Department of Scientific Computing, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Alexander W Charney Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Benjamin S Glicksberg Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Character Biosciences, New York, NY, USA
Girish N Nadkarni Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

García-Pavioni A, López B. Dimensionality reduction and features visual representation based on conditional probabilities applied to activity classification. Comput Biol Med 2023;167:107595. [PMID: 37925905 DOI: 10.1016/j.compbiomed.2023.107595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/05/2023] [Accepted: 10/17/2023] [Indexed: 11/07/2023]

Sheetrit E, Brief M, Elisha O. Predicting unplanned readmissions in the intensive care unit: a multimodality evaluation. Sci Rep 2023;13:15426. [PMID: 37723231 PMCID: PMC10507073 DOI: 10.1038/s41598-023-42372-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 09/09/2023] [Indexed: 09/20/2023]

Abstract

A hospital readmission is when a patient who was discharged from the hospital is admitted again for the same or related care within a certain period. Hospital readmissions are a significant problem in the healthcare domain, as they lead to increased hospitalization costs, decreased patient satisfaction, and increased risk of adverse outcomes such as infections, medication errors, and even death. The problem of hospital readmissions is particularly acute in intensive care units (ICUs), due to the severity of the patients' conditions and the substantial risk of complications. Predicting unplanned readmissions in ICUs is a challenging task, as it involves analyzing different data modalities, such as static data, unstructured free text, sequences of diagnoses and procedures, and multivariate time series. Here, we investigate the effectiveness of each data modality separately, and then alongside the others, using state-of-the-art machine learning approaches in time-series analysis and natural language processing. Using our evaluation process, we are able to determine the contribution of each data modality and, for the first time in the context of readmission, establish a hierarchy of their predictive value. Additionally, we demonstrate the impact of Temporal Abstractions in enhancing the performance of time-series approaches to readmission prediction. Due to conflicting definitions in the literature, we also provide a clear definition of the term Unplanned Readmission to enhance the reproducibility and consistency of future research and to prevent potential misunderstandings arising from diverse interpretations of the term. Our experimental results on a large benchmark clinical data set show that Discharge Notes written by physicians are more predictive of readmission than all other modalities.

Mazzacane S, Coccagna M, Manzella F, Pagliarini G, Sironi VA, Gatti A, Caselli E, Sciavicco G. Towards an objective theory of subjective liking: A first step in understanding the sense of beauty. PLoS One 2023;18:e0287513. [PMID: 37352316 PMCID: PMC10289447 DOI: 10.1371/journal.pone.0287513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 06/07/2023] [Indexed: 06/25/2023]

Shrestha A, Zikos D, Fegaras L, Blebea J, Sasso RA. A Bayesian method for the automatic extraction of meaningful clinical sequences from large clinical databases. Comput Methods Programs Biomed 2023;233:107392. [PMID: 36996758 DOI: 10.1016/j.cmpb.2023.107392] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/07/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 06/19/2023]

Prediction of acute hypertensive episodes in critically ill patients. Artif Intell Med 2023;139:102525. [PMID: 37100504 DOI: 10.1016/j.artmed.2023.102525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]

Abstract

Prevention and treatment of complications are the backbone of medical care, particularly in critical care settings. Early detection and prompt intervention can potentially prevent complications from occurring and improve outcomes. In this study, we use four longitudinal vital-sign variables of intensive care unit patients, focusing on predicting acute hypertensive episodes (AHEs). These episodes represent elevations in blood pressure and may result in clinical damage or indicate a change in a patient's clinical situation, such as an elevation in intracranial pressure or kidney failure. Prediction of AHEs may allow clinicians to anticipate changes in the patient's condition and respond early to prevent these from occurring. Temporal abstraction was employed to transform the multivariate temporal data into a uniform representation of symbolic time intervals, from which frequent time-intervals-related patterns (TIRPs) are mined and used as features for AHE prediction. A novel TIRP metric for classification, called coverage, is introduced; it measures the coverage of a TIRP's instances in a time window. For comparison, several baseline models, including logistic regression and sequential deep learning models, were applied to the raw time-series data. Our results show that using frequent TIRPs as features outperforms the baseline models, and that the coverage metric outperforms other TIRP metrics. Two approaches to predicting AHEs under real-life application conditions were evaluated. Using a sliding window to continuously predict whether a patient would experience an AHE within a specific prediction period ahead, our models achieved an AUC-ROC of 82%, although with a low AUPRC. Alternatively, predicting whether an AHE would occur at any point during the entire admission resulted in an AUC-ROC of 74%.
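The abstract does not spell out how coverage is computed. The sketch below assumes one plausible reading: the fraction of a time window's duration covered by the union of a TIRP's instance intervals, with overlapping instances counted once and instances clipped to the window. The function name `tirp_coverage` and the half-open `(start, end)` interval convention are illustrative assumptions, not the authors' definition.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def tirp_coverage(instances: List[Interval], window: Interval) -> float:
    """Fraction of `window` covered by the union of TIRP instance intervals.

    Assumed definition for illustration: overlapping instances are counted
    once, and instances are clipped to the window boundaries.
    """
    w_start, w_end = window
    if w_end <= w_start:
        raise ValueError("window must have positive length")

    # Clip every instance to the window, keeping only those that overlap it.
    clipped = sorted(
        (max(s, w_start), min(e, w_end))
        for s, e in instances
        if min(e, w_end) > max(s, w_start)
    )

    covered = 0
    run_start = run_end = None
    for s, e in clipped:
        if run_end is None or s > run_end:
            # Gap before this interval: close the current run, start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:
            # Overlapping or touching interval: extend the current run.
            run_end = max(run_end, e)
    if run_end is not None:
        covered += run_end - run_start

    return covered / (w_end - w_start)


# Three instances in a 20-unit window; [0, 6) and [10, 12) are covered.
print(tirp_coverage([(0, 4), (2, 6), (10, 12)], (0, 20)))  # 0.4
```

Unlike a simple count of instances, a union-based measure like this one rewards a pattern that persists through much of the window, which is one reason a duration-aware metric might help a sliding-window predictor.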

Bai W, Yamashita O, Yoshimoto J. Learning task-agnostic and interpretable subsequence-based representation of time series and its applications in fMRI analysis. Neural Netw 2023;163:327-340. [PMID: 37099896 DOI: 10.1016/j.neunet.2023.03.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 02/13/2023] [Accepted: 03/28/2023] [Indexed: 04/28/2023]

Abstract

The recent success of sequential learning models, such as deep recurrent neural networks, is largely due to their superior ability to learn informative representations of a targeted time series. The learning of these representations is generally goal-directed, making them task-specific: they give rise to excellent performance on a single downstream task but hinder between-task generalisation. Meanwhile, as sequential learning models grow increasingly intricate, the learned representations become difficult for humans to comprehend. Hence, we propose a unified local predictive model based on the multi-task learning paradigm to learn a task-agnostic and interpretable subsequence-based time series representation, allowing versatile use of the learned representations in temporal prediction, smoothing, and classification tasks. The targeted interpretable representation could convey the spectral information of the modelled time series to the level of human comprehension. Through a proof-of-concept evaluation study, we demonstrate the empirical superiority of the learned task-agnostic and interpretable representations over task-specific and conventional subsequence-based representations, such as symbolic and recurrent learning-based representations, in solving temporal prediction, smoothing, and classification tasks. These learned task-agnostic representations can also reveal the ground-truth periodicity of the modelled time series. We further propose two applications of our unified local predictive model in functional magnetic resonance imaging (fMRI) analysis: revealing the spectral characterisation of cortical areas at rest, and reconstructing smoother temporal dynamics of cortical activations in both resting-state and task-evoked fMRI data, giving rise to robust decoding.

Manzella F, Pagliarini G, Sciavicco G, Stan IE. The voice of COVID-19: Breath and cough recording classification with temporal decision trees and random forests. Artif Intell Med 2023;137:102486. [PMID: 36868683 PMCID: PMC9904537 DOI: 10.1016/j.artmed.2022.102486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 12/27/2022] [Accepted: 12/28/2022] [Indexed: 02/05/2023]

Novitski P, Cohen CM, Karasik A, Hodik G, Moskovitch R. Temporal patterns selection for All-Cause Mortality prediction in T2D with ANNs. J Biomed Inform 2022;134:104198. [PMID: 36100163 DOI: 10.1016/j.jbi.2022.104198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/17/2022] [Revised: 08/10/2022] [Accepted: 09/03/2022] [Indexed: 01/02/2023]

Abstract

Mortality prevention in the elderly T2D population with Chronic Kidney Disease (CKD) may be possible through risk assessment and predictive modeling. In this study, we investigate the ability to predict mortality using heterogeneous Electronic Health Records data. Temporal abstraction is employed to transform the heterogeneous multivariate temporal data into a uniform representation of symbolic time intervals, from which frequent Time Intervals Related Patterns (TIRPs) are then discovered. In this study, however, a novel representation of the TIRPs is introduced that enables their incorporation into Deep Learning networks. We describe the use of iTirps and bTirps, in which TIRPs are represented over time by an integer vector and a binary vector, respectively. While a bTirp represents whether a TIRP's instance was present, an iTirp also represents whether multiple instances were present. While the framework showed encouraging results, a major challenge is often the large number of TIRPs, which may cause the models to underperform. We introduce a novel TIRP selection method, called TIRP Ranking Criteria (TRC), which is based on the TIRP's metrics, such as the between-class differences in its recurrences, frequencies, and average duration. Additionally, we introduce an advanced version, called TRC Redundant TIRP Removal (TRC-RTR), in which highly correlated TIRPs are candidates for removal. The selected subset of iTirps/bTirps is then fed into a Deep Learning architecture such as a Recurrent Neural Network or a Convolutional Neural Network. Furthermore, a predictive committee is used in which both raw data and iTirp data serve as input. Our results show that iTirp-based models using a subset of iTirps selected by the TRC-RTR method outperform models that use raw data or the full set of discovered iTirps.
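As a rough illustration of the two encodings and of correlation-based redundancy removal, the sketch below assumes that the observation horizon is split into fixed-size time bins and that each TIRP instance is assigned to the bin containing its end time. The helper names (`itirp_vector`, `btirp_vector`, `remove_redundant`), the Pearson-correlation criterion, and the 0.9 threshold are hypothetical; the TRC ranking itself is taken as a given input rather than reimplemented.

```python
from typing import Dict, List, Sequence


def itirp_vector(instance_ends: Sequence[int], horizon: int, bin_size: int) -> List[int]:
    """Hypothetical iTirp encoding: number of TIRP instances ending in each time bin."""
    n_bins = -(-horizon // bin_size)  # ceiling division
    counts = [0] * n_bins
    for t in instance_ends:
        if 0 <= t < horizon:
            counts[t // bin_size] += 1
    return counts


def btirp_vector(itirp: Sequence[int]) -> List[int]:
    """Binary encoding: 1 if at least one instance occurred in the bin, else 0."""
    return [1 if c > 0 else 0 for c in itirp]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def remove_redundant(ranked: List[str], features: Dict[str, Sequence[float]],
                     threshold: float = 0.9) -> List[str]:
    """Walk TIRPs in ranked order (best first) and drop any TIRP whose feature
    vector is highly correlated with one already kept (illustrative criterion)."""
    kept: List[str] = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


counts = itirp_vector([0, 3, 4, 12], horizon=15, bin_size=5)
print(counts, btirp_vector(counts))  # [3, 0, 1] [1, 0, 1]
```

In practice the vectors passed to `remove_redundant` would be per-patient feature values, so the correlation is measured across patients; walking the list in ranked order means that, of two redundant TIRPs, the higher-ranked one survives.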

Shitrit G, Tractinsky N, Moskovitch R. Visualization of Frequent Temporal Patterns in Single or Two Populations. J Biomed Inform 2022;134:104169. [PMID: 36038065 DOI: 10.1016/j.jbi.2022.104169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 08/11/2022] [Accepted: 08/13/2022] [Indexed: 10/15/2022]

Personalized insulin dose manipulation attack and its detection using interval-based temporal patterns and machine learning algorithms. J Biomed Inform 2022;132:104129. [PMID: 35781036 DOI: 10.1016/j.jbi.2022.104129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 05/16/2022] [Accepted: 06/21/2022] [Indexed: 11/20/2022]

Abstract

Many patients with diabetes are currently being treated with insulin pumps and other diabetes devices which improve their quality of life and enable effective treatment of diabetes. These devices are connected wirelessly and thus, are vulnerable to cyber-attacks which have already been proven feasible. In this paper, we focus on two types of cyber-attacks on insulin pump systems: an overdose of insulin, which can cause hypoglycemia, and an underdose of insulin, which can cause hyperglycemia. Both of these attacks can result in a variety of complications and endanger a patient's life. Specifically, we propose a sophisticated and personalized insulin dose manipulation attack; this attack is based on a novel method of predicting the blood glucose (BG) level in response to insulin dose administration. To protect patients from the proposed sophisticated and malicious insulin dose manipulation attacks, we also present an automated machine learning based system for attack detection; the detection system is based on an advanced temporal pattern mining process, which is performed on the logs of real insulin pumps and continuous glucose monitors (CGMs). Our multivariate time-series data (MTSD) collection consists of 225,780 clinical logs, collected from real insulin pumps and CGMs of 47 patients with type I diabetes (13 adults and 34 children) from two different clinics at Soroka University Medical Center in Beer-Sheva, Israel over a four-year period. We enriched our data collection with additional relevant medical information related to the subjects. 
In the extensive experiments performed, we evaluated the proposed attack and detection system and examined whether: (1) it is possible to accurately predict BG levels in order to create malicious data that simulate a manipulation attack and the response of the patient's body to it; (2) it is possible to automatically detect such attacks based on advanced machine learning (ML) methods that leverage temporal patterns; (3) the detection capabilities of the proposed detection system differ for insulin overdose and underdose attacks; and (4) the granularity of the learning model (general / adult vs. pediatric clinic / individual patient) affects the detection capabilities. Our results show that (a) it is possible to predict BG levels with nearly 90% accuracy using our proposed methods, thereby enabling the creation of malicious data for evaluating our detection system; (b) it is possible to accurately detect insulin manipulation attacks using temporal pattern mining with several ML methods, including Logistic Regression, Random Forest, TPF class model, TPF top k, and ANN algorithms; (c) overdose attacks are easier to detect than underdose attacks, by more than 25% in terms of AUC; and (d) the adult vs. pediatric model outperformed models of other granularities in the detection of overdose attacks, while the general model outperformed the other models in detecting underdose attacks; for both attacks, attack detection among children was found to be more challenging than among adults. In addition to its use in the evaluation of our detection system, the proposed BG prediction method has great importance in the medical domain, where it can contribute to improved care of patients with diabetes.

All-cause mortality prediction in T2D patients with iTirps. Artif Intell Med 2022;130:102325. [DOI: 10.1016/j.artmed.2022.102325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 05/17/2022] [Accepted: 05/17/2022] [Indexed: 11/17/2022]

Mordvanyuk N, López B, Bifet A. TA4L: Efficient temporal abstraction of multivariate time series. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Lion M, Shahar Y. Implementation and evaluation of a multivariate abstraction-based, interval-based dynamic time-warping method as a similarity measure for longitudinal medical records. J Biomed Inform 2021;123:103919. [PMID: 34628062 DOI: 10.1016/j.jbi.2021.103919] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Revised: 08/25/2021] [Accepted: 09/27/2021] [Indexed: 11/30/2022]

Abstract

OBJECTIVES

A common prerequisite for tasks such as classification, prediction, clustering and retrieval of longitudinal medical records is a clinically meaningful similarity measure that considers both [multiple] variable (concept) values and their time. Currently, most similarity measures focus on raw, time-stamped data as these are stored in a medical record. However, clinicians think in terms of clinically meaningful temporal abstractions, such as "decreasing renal functions", enabling them to ignore minor time and value variations and focus on similarities among the clinical trajectories of different patients. Our objective was to define an abstraction- and interval-based methodology for matching longitudinal, multivariate medical records, and rigorously assess its value, versus the option of using just the raw, time-stamped data.

METHODS

We have developed a new methodology for determining the relative distance between a pair of longitudinal records by extending the known dynamic time warping (DTW) method into an interval-based dynamic time warping (iDTW) methodology. The iDTW methodology includes (A) a three-step interval-based representation (iRep) method: [1] abstracting the raw, time-stamped data of the longitudinal records into clinically meaningful interval-based abstractions, using a domain-specific knowledge base; [2] scoping the period of comparison of the records; and [3] creating a symbolic time series from the intervals by partitioning them into a predetermined temporal granularity; and (B) an interval-based matching (iMatch) method that matches each relevant pair of multivariate longitudinal records, each represented as multiple series of short symbolic intervals in the determined temporal granularity, using a modified DTW version.
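A minimal sketch of the two iDTW stages, under simplifying assumptions: step [3] of iRep labels each fixed-size slot with the symbol of the interval covering the slot's start, and iMatch is approximated by classic DTW whose local cost counts mismatched concept symbols. The published method uses a modified DTW whose details are not given here, so `to_symbolic_series`, `dtw_distance`, the `"NA"` gap symbol, and the mismatch cost are all illustrative choices.

```python
from typing import List, Sequence, Tuple

SymInterval = Tuple[int, int, str]  # half-open (start, end, symbol)


def to_symbolic_series(intervals: Sequence[SymInterval], start: int, end: int,
                       granularity: int, gap: str = "NA") -> List[str]:
    """Slice [start, end) into fixed-size slots; label each slot with the symbol
    of the interval covering the slot's start time, or `gap` if none does."""
    series = []
    for t in range(start, end, granularity):
        label = next((sym for s, e, sym in intervals if s <= t < e), gap)
        series.append(label)
    return series


def dtw_distance(a: Sequence[Tuple[str, ...]], b: Sequence[Tuple[str, ...]]) -> float:
    """Classic DTW over multivariate symbolic series. Each element holds one
    symbol per concept; the local cost is the number of mismatched concepts."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum(x != y for x, y in zip(a[i - 1], b[j - 1]))
            # Cheapest of the three allowed predecessor steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


hb = to_symbolic_series([(0, 10, "LOW"), (10, 20, "HIGH")], 0, 20, 5)
print(hb)  # ['LOW', 'LOW', 'HIGH', 'HIGH']
```

For several concepts, one symbolic series per concept can be zipped into tuples (for example `list(zip(hb_series, wbc_series))`) before calling `dtw_distance`; the warping then lets two records match even when the same abstract trajectory unfolds at slightly different speeds.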

EVALUATION

Three classification or prediction tasks were defined: (1) classifying 161 records of oncology patients as having had autologous versus allogenic bone-marrow transplantation; (2) classifying the longitudinal records of 125 hepatitis patients as having B or C hepatitis; and (3) predicting micro- or macro-albuminuria in the second year, for 151 diabetes patients who were followed for five years. The raw, time-stamped, multivariate data within each medical record, for one, two, or three concepts out of four or five concepts judged as relevant in each medical domain, were abstracted into clinically meaningful intervals using the Knowledge-Based Temporal-Abstraction method, using previously acquired knowledge. We focused on two temporal-abstraction types: (1) State abstractions, which discretize a concept's raw value into a predetermined range (e.g., LOW or HIGH Hemoglobin); and (2) Gradient abstractions, which indicate the trend of the concept's value (e.g., INCREASING, DECREASING Hemoglobin value). We created all of the combinations of either uni-dimensional (State or Gradient) or multi-dimensional (State and Gradient) abstractions, of all of the concepts used. Classification of a record was determined by using a majority of the k-Nearest-Neighbors (KNN) of the given record, k ranging over the odd numbers (to break ties) from 1 to N, N being the size of the training set. We have experimented with all possible configurations of the parameters that our method uses. Overall, a total of 75,936 experiments were performed: 33,600 in the Oncology domain, 28,800 in the Hepatitis domain, and 13,536 in the Diabetes domain. Each experiment involved the performance of a 10-fold Cross Validation to compute the mean performance of a particular iDTW method-configuration set of settings, for a specific subset of one, two, or three concepts out of all of the domain-specific concepts relevant to the classification or prediction task on which the experiment focuses. 
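The Knowledge-Based Temporal-Abstraction method relies on a domain knowledge base; the sketch below only illustrates the shape of the two abstraction types, using hard-coded, hypothetical cutoffs that are not clinical Hemoglobin ranges. State labels come from ascending cutoffs, gradient labels come from the sign of the slope between consecutive samples, and consecutive samples sharing a state are merged into closed `(start, end, label)` intervals. All function names are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def state_abstraction(values: Sequence[float], cutoffs: Sequence[float],
                      labels: Sequence[str]) -> List[str]:
    """Map each raw value to a state; `cutoffs` are ascending range boundaries."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return [labels[bisect_right(cutoffs, v)] for v in values]


def gradient_abstraction(times: Sequence[float], values: Sequence[float],
                         tolerance: float = 0.0) -> List[str]:
    """Label each step between consecutive samples by the sign of its slope."""
    labels = []
    for i in range(1, len(values)):
        slope = (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        if slope > tolerance:
            labels.append("INCREASING")
        elif slope < -tolerance:
            labels.append("DECREASING")
        else:
            labels.append("STEADY")
    return labels


def merge_to_intervals(times: Sequence[float],
                       labels: Sequence[str]) -> List[Tuple[float, float, str]]:
    """Merge consecutive samples sharing a label into closed (start, end, label) intervals."""
    intervals: List[Tuple[float, float, str]] = []
    for t, lab in zip(times, labels):
        if intervals and intervals[-1][2] == lab:
            intervals[-1] = (intervals[-1][0], t, lab)
        else:
            intervals.append((t, t, lab))
    return intervals


# Hypothetical cutoffs for illustration only, not clinical reference ranges.
times = [0, 1, 2, 3]
hb = [10.0, 12.0, 15.0, 9.0]
states = state_abstraction(hb, cutoffs=[11.0, 14.0], labels=["LOW", "NORMAL", "HIGH"])
print(states)                           # ['LOW', 'NORMAL', 'HIGH', 'LOW']
print(gradient_abstraction(times, hb))  # ['INCREASING', 'INCREASING', 'DECREASING']
```

A multi-dimensional abstraction of one concept, in the sense used above, would pair both label streams (state and gradient) for the same measurements.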
We measured for each such experimental combination the Area Under the Curve (AUC) and the optimal Specificity/Sensitivity ratio using Youden's Index. We then aggregated the experiments by the types of unidimensional or multidimensional abstractions used in them (including the use of only raw concepts as a special case); for example, two state abstractions of different concepts, and one gradient abstraction of a third concept. We compared the mean AUC when using each such feature representation, or combination of abstractions, across all possible method-setting configurations, to the mean AUC when using as a feature representation, for the same task, only raw concepts, also across all possible method-setting configurations. Finally, we applied a paired t-test, to determine whether the mean difference between the accuracy of each temporal-abstraction representation, across all concept and configuration combinations, and the respective raw-concept combinations, across all concept subset and configuration combinations, is significant (P < 0.05).
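Youden's index is J = sensitivity + specificity - 1, and the optimal cutoff is the score threshold that maximizes it. The sketch below computes that threshold, along with the paired t statistic t = mean(d) / (sd(d) / sqrt(n)) over paired differences d. The function names are illustrative. A p-value would additionally need the t distribution with n - 1 degrees of freedom (for example, via `scipy.stats.ttest_rel`), which is omitted here to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def youden_optimal_threshold(scores: Sequence[float],
                             labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score >= threshold; labels are 0/1.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best_thr, best_j = None, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


print(youden_optimal_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.35, 0.5)
```

In the setting described above, `a` and `b` would hold, for matched concept and configuration combinations, the AUC obtained with an abstraction-based representation and with the corresponding raw-concept representation.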

RESULTS

The mean performance of the classification and prediction tasks when using, as a feature representation, the various temporal-abstraction combinations, was significantly higher than that performance when using only raw data. Furthermore, in each domain and task, there existed at least one representation using interval-based abstractions whose use led, on average (over all concept subset combinations and method configurations) to a significantly better performance than the use of only subsets of the raw time-stamped data. In seven of nine combinations of domain type (out of three) and number of concepts used (one, two, or three), the variance of the AUCs (for all representations and configurations) was considerably higher across all raw-concept subsets, compared to all abstract combinations. Increasing the number of features used by the matching task enhanced performance. Using multi-dimensional abstractions of the same concept further enhanced the performance. When using only raw data, increasing the number of neighbors monotonically increased the mean performance (over all concept combinations and method configurations) until reaching an optimal saddle-point around N; when using abstractions, however, optimal mean performance was often reached after matching only five nearest neighbors.

CONCLUSIONS

Using multivariate and multidimensional interval-based, abstraction-based similarity measures is feasible, and consistently and significantly improved the mean classification and prediction performance in time-oriented domains, using DTW-inspired methods, compared to the use of only raw, time-stamped data. It also made the KNN classification more effective. Nevertheless, although the mean performance for the abstract representations was higher than the mean performance when using only raw-data concepts, the actual optimal classification performance in each domain and task depends on the choice of the specific raw or abstract concepts used as features.

Villa‐Blanco C, Larrañaga P, Bielza C. Multidimensional continuous time Bayesian network classifiers. Int J Intell Syst 2021. [DOI: 10.1002/int.22611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021;28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]

Abstract

OBJECTIVE

High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS

We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.
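In a transitive sequence, a record is paired with every later record of the same patient, not only with the next one, so A → C is found even when B occurs in between. The sketch below mines only two-element sequences and counts support as the number of patients containing each pair. The day-level timestamps, the diagnosis codes used in the example, the `min_support` filter, and the function names are illustrative assumptions, not the published algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str]  # (day, code)


def transitive_pairs(records: List[Record]) -> Set[Tuple[str, str]]:
    """All ordered (earlier, later) code pairs in one patient's history,
    including non-adjacent ones; same-day records are not ordered."""
    ordered = sorted(records)
    pairs = set()
    for (d1, c1), (d2, c2) in combinations(ordered, 2):
        if d1 < d2 and c1 != c2:
            pairs.add((c1, c2))
    return pairs


def mine_sequences(patients: Dict[str, List[Record]],
                   min_support: int) -> Dict[Tuple[str, str], int]:
    """Count, for each pair, the number of patients containing it; keep frequent pairs."""
    support: Counter = Counter()
    for records in patients.values():
        support.update(transitive_pairs(records))  # a set, so each patient counts once
    return {pair: n for pair, n in support.items() if n >= min_support}


cohort = {
    "p1": [(1, "E11"), (5, "I10"), (9, "N18")],
    "p2": [(2, "E11"), (3, "N18")],
    "p3": [(4, "N18"), (4, "E11")],
}
print(mine_sequences(cohort, min_support=2))  # {('E11', 'N18'): 2}
```

Here E11 → N18 reaches the support threshold only because patient p1 contributes it transitively, across the intervening I10 record; a standard adjacent-only scan would have found it in p2 alone.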

RESULTS

Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION

The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION

Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.

Classification of colposcopic images using a multi-breakpoints discretization approach on temporal patterns. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102918] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]

Singh A, Ramkumar K. Risk assessment for health insurance using equation modeling and machine learning. International Journal of Knowledge-Based and Intelligent Engineering Systems 2021. [DOI: 10.3233/kes-210065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Schvetz M, Fuchs L, Novack V, Moskovitch R. Outcomes prediction in longitudinal data: Study designs evaluation, use case in ICU acquired sepsis. J Biomed Inform 2021;117:103734. [PMID: 33711544 DOI: 10.1016/j.jbi.2021.103734] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/18/2020] [Revised: 02/27/2021] [Accepted: 03/01/2021] [Indexed: 12/23/2022]

Abstract

Outcome prediction in Electronic Health Records (EHR), and specifically in critical care, is attracting increasing research attention. In this study, we used clinical data from the Intensive Care Unit (ICU), focusing on ICU-acquired sepsis. The current literature reports several evaluation approaches inspired by epidemiological study designs, and some of these do not reflect the conditions of real-life application. This problem appears relevant to outcome prediction in longitudinal EHR data, and in longitudinal data more generally. In this study, we focused on ICU data. Most previous studies investigated all sepsis admissions, but we focused specifically on ICU-acquired sepsis. Because the longitudinal data are sparse, we used Temporal Abstraction and Time Interval-Related Pattern (TIRP) discovery, and the discovered patterns served as classification features. We designed two experiments using three outcome prediction study designs from the literature, which reflect real-life conditions to varying degrees, to evaluate the prediction models. The first experiment focused on predicting whether a patient would develop ICU-acquired sepsis, and when during the admission, given a sliding observation time window. It also compared the behavior of the three study designs. The second experiment focused only on predicting whether the patient would develop ICU-acquired sepsis, based on data taken relative to the admission start time. Our results show that Temporal Discretization for Classification (TD4C) led to better performance than Equal-Width Discretization, knowledge-based discretization, or SAX. Two-state abstraction performed better than three or four states. The default binary TIRP representation performed better than Mean Duration, Horizontal Support, and horizontally normalized horizontal support. XGBoost performed better as a classifier than Logistic Regression, a neural network, or Random Forest. We also demonstrate why the case-crossover-control design is the most appropriate for evaluating models under real-life application conditions. Other, incomplete designs may even yield seemingly "better performance".
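The following sketch contrasts an unsupervised cutoff rule with a class-driven one. `equal_width_cutoffs` implements standard equal-width discretization. `greedy_class_cutoffs` is a simplified, TD4C-inspired procedure: it greedily adds the cutoff that maximizes the symmetric Kullback-Leibler divergence between the two classes' state distributions. The candidate-cutoff scheme, the smoothing constant, and the choice of divergence are assumptions. TD4C as published also considers other distance measures, and this is not the authors' implementation.

```python
import math
from bisect import bisect_left


def equal_width_cutoffs(values, n_states):
    """Cutoffs that split [min, max] into n_states equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_states
    return [lo + width * k for k in range(1, n_states)]


def to_states(values, cutoffs):
    """Map each value to a state index: the number of cutoffs strictly below it."""
    cuts = sorted(cutoffs)
    return [bisect_left(cuts, v) for v in values]


def state_distribution(states, n_states, eps=1e-6):
    """Relative frequency of each state, smoothed by eps to avoid log(0)."""
    counts = [eps] * n_states
    for s in states:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p)."""
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))


def greedy_class_cutoffs(pos, neg, n_states, n_candidates=20):
    """Greedily add n_states - 1 cutoffs. At each step, choose the candidate that
    maximizes the symmetric KL divergence between the two classes' state
    distributions (a simplified, TD4C-inspired criterion)."""
    pooled = sorted(pos + neg)
    # Candidate cutoffs: evenly spaced order statistics of the pooled values.
    candidates = sorted({pooled[len(pooled) * k // (n_candidates + 1)]
                         for k in range(1, n_candidates + 1)})
    chosen = []
    for _ in range(n_states - 1):
        best, best_score = None, -math.inf
        for c in candidates:
            if c in chosen:
                continue
            cuts = sorted(chosen + [c])
            p = state_distribution(to_states(pos, cuts), len(cuts) + 1)
            q = state_distribution(to_states(neg, cuts), len(cuts) + 1)
            score = symmetric_kl(p, q)
            if score > best_score:
                best, best_score = c, score
        if best is None:  # ran out of candidates
            break
        chosen.append(best)
    return sorted(chosen)


# Hypothetical lab values for two outcome classes; 19.0 is an outlier.
pos = [7.5, 8.1, 8.4, 19.0]
neg = [5.2, 5.6, 5.9, 6.1]
print(equal_width_cutoffs(pos + neg, 2))   # a single cutoff near 12.1
print(greedy_class_cutoffs(pos, neg, 2))   # [6.1]
```

With the outlier present, the equal-width cutoff lands far above most values of both classes. The class-driven cutoff, by contrast, separates the two classes cleanly.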

Rebane J, Karlsson I, Bornemann L, Papapetrou P. SMILE: a feature-based temporal abstraction framework for event-interval sequence classification. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00719-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]

Falls Prediction in Care Homes Using Mobile App Data Collection. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_36] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 04/05/2023]

Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. PATTERNS (NEW YORK, N.Y.) 2020;1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]

Affiliation(s)

Hossein Estiri: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA
Zachary H. Strasser: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
Jeffery G. Klann: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA
Thomas H. McCoy: Harvard Medical School, Boston, MA 02115, USA; Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
Kavishwar B. Wagholikar: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA
Sebastien Vasey: Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
Victor M. Castro: Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
MaryKate E. Murphy: Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
Shawn N. Murphy: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA

A multi-breakpoints approach for symbolic discretization of time series. Knowl Inf Syst 2020. [DOI: 10.1007/s10115-020-01437-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]

Gharghabi S, Imani S, Bagnall A, Darvishzadeh A, Keogh E. An ultra-fast time series distance measure to allow data mining in more complex real-world deployments. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00695-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]

Roe KD, Jawa V, Zhang X, Chute CG, Epstein JA, Matelsky J, Shpitser I, Taylor CO. Feature engineering with clinical expert knowledge: A case study assessment of machine learning model complexity and performance. PLoS One 2020;15:e0231300. [PMID: 32324754 PMCID: PMC7179831 DOI: 10.1371/journal.pone.0231300] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/25/2018] [Accepted: 03/20/2020] [Indexed: 11/19/2022]

Affiliation(s)

Kenneth D. Roe: Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America; The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America
Vibhu Jawa: Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America; Department of Computer Science, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, United States of America
Xiaohan Zhang: Division of Health Sciences Informatics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
Christopher G. Chute: Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America; The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America; Division of Health Sciences Informatics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America; Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
Jeremy A. Epstein: Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
Jordan Matelsky: Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States of America
Ilya Shpitser: Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America; Department of Computer Science, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, United States of America
Casey Overby Taylor: Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America; The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America; Division of Health Sciences Informatics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America; Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America

Song X, Waitman LR, Yu AS, Robbins DC, Hu Y, Liu M. Longitudinal Risk Prediction of Chronic Kidney Disease in Diabetic Patients Using a Temporal-Enhanced Gradient Boosting Machine: Retrospective Cohort Study. JMIR Med Inform 2020;8:e15510. [PMID: 32012067 PMCID: PMC7055762 DOI: 10.2196/15510] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/16/2019] [Revised: 10/31/2019] [Accepted: 10/31/2019] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Artificial intelligence-enabled electronic health record (EHR) analysis can revolutionize medical practice from the diagnosis and prediction of complex diseases to making recommendations in patient care, especially for chronic conditions such as chronic kidney disease (CKD), which is one of the most frequent complications in patients with diabetes and is associated with substantial morbidity and mortality.

OBJECTIVE

The longitudinal prediction of health outcomes requires effective representation of temporal data in the EHR. In this study, we proposed a novel temporal-enhanced gradient boosting machine (GBM) model that dynamically updates and ensembles learners based on new events in patient timelines to improve the prediction accuracy of CKD among patients with diabetes.

METHODS

Using a broad spectrum of deidentified EHR data on a retrospective cohort of 14,039 adult patients with type 2 diabetes and GBM as the base learner, we validated our proposed Landmark-Boosting model against three state-of-the-art temporal models for rolling predictions of 1-year CKD risk.
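Rolling 1-year predictions at successive time points are commonly set up as landmark datasets. Each patient contributes one row per landmark, with features restricted to data observed up to that landmark and a label for an outcome within the following horizon. The sketch below assumes that times are measured in years since diabetes onset and that features are simple event counts. It builds only the training rows. It does not reproduce the Landmark-Boosting model's updating and ensembling of GBM learners. All names and data are hypothetical.

```python
from collections import Counter


def landmark_rows(patients, landmarks, horizon=1.0):
    """Build (patient_id, landmark, features, label) rows for rolling prediction.

    patients: dict of id -> {"events": [(t, name), ...], "outcome_time": float or None}
    All times are assumed to be years since diabetes onset.
    """
    rows = []
    for pid, p in patients.items():
        t_out = p["outcome_time"]
        for lm in landmarks:
            # Patients who already had the outcome are no longer at risk.
            if t_out is not None and t_out <= lm:
                continue
            # Features use only data observed up to the landmark (no look-ahead).
            feats = Counter(name for t, name in p["events"] if t <= lm)
            # Label: the outcome occurs within (lm, lm + horizon].
            label = int(t_out is not None and lm < t_out <= lm + horizon)
            rows.append((pid, lm, dict(feats), label))
    return rows


# Hypothetical toy cohort.
patients = {
    "a": {"events": [(0.5, "hba1c_high"), (1.5, "egfr_low"), (2.2, "egfr_low")],
          "outcome_time": 2.6},
    "b": {"events": [(0.3, "hba1c_high")], "outcome_time": None},
}
for row in landmark_rows(patients, landmarks=[1, 2, 3]):
    print(row)
```

Patient "a" contributes rows only at landmarks 1 and 2, because the outcome at 2.6 years removes them from the risk set by landmark 3.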

RESULTS

The proposed model uniformly outperformed the other models, achieving an area under the receiver operating characteristic curve of 0.83 (95% CI 0.76-0.85), 0.78 (95% CI 0.75-0.82), and 0.82 (95% CI 0.78-0.86) in predicting CKD risk with automatic accumulation of new data in later years (years 2, 3, and 4 since diabetes mellitus onset, respectively). The Landmark-Boosting model also maintained the best calibration across moderate- and high-risk groups and over time. The experimental results demonstrated that the proposed temporal model can not only accurately predict 1-year CKD risk but also improve its performance over time as additional data accumulate. This is essential for clinical use in improving the renal management of patients with diabetes.

CONCLUSIONS

Incorporating temporal information in EHR data can significantly improve predictive model performance and will particularly benefit patients who follow up with their physicians as recommended.

Novitski P, Cohen CM, Karasik A, Shalev V, Hodik G, Moskovitch R. All-Cause Mortality Prediction in T2D Patients. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/11/2023]

Acute Hypertensive Episodes Prediction. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_35] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/08/2022]

Estiri H, Vasey S, Murphy SN. Transitive Sequential Pattern Mining for Discrete Clinical Data. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_37] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]

Kate RJ, Pearce N, Mazumdar D, Nilakantan V. A continual prediction model for inpatient acute kidney injury. Comput Biol Med 2020;116:103580. [DOI: 10.1016/j.compbiomed.2019.103580] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/27/2019] [Revised: 12/07/2019] [Accepted: 12/09/2019] [Indexed: 12/11/2022]

J48SS: A Novel Decision Tree Approach for the Handling of Sequential and Time Series Data. COMPUTERS 2019. [DOI: 10.3390/computers8010021] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]

Moskovitch R, Shahar Y, Wang F, Hripcsak G. Temporal biomedical data analytics. J Biomed Inform 2019;90:103092. [PMID: 30654029 PMCID: PMC9745669 DOI: 10.1016/j.jbi.2018.12.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/22/2018] [Accepted: 12/24/2018] [Indexed: 02/07/2023]

C-LACE2: computational risk assessment tool for 30-day post hospital discharge mortality. HEALTH AND TECHNOLOGY 2018. [DOI: 10.1007/s12553-018-0263-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/17/2023]

Utilizing electronic health records to predict multi-type major adverse cardiovascular events after acute coronary syndrome. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1270-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]

Forestier G, Petitjean F, Senin P, Despinoy F, Huaulmé A, Fawaz HI, Weber J, Idoumghar L, Muller PA, Jannin P. Surgical motion analysis using discriminative interpretable patterns. Artif Intell Med 2018;91:3-11. [PMID: 30172445 DOI: 10.1016/j.artmed.2018.08.002] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 12/08/2017] [Revised: 07/06/2018] [Accepted: 08/13/2018] [Indexed: 11/29/2022]

Trusted system-calls analysis methodology aimed at detection of compromised virtual machines using sequential mining. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.04.033] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 10/17/2022]

Wu S, Liu S, Sohn S, Moon S, Wi CI, Juhn Y, Liu H. Modeling asynchronous event sequences with RNNs. J Biomed Inform 2018;83:167-177. [PMID: 29883623 PMCID: PMC6103779 DOI: 10.1016/j.jbi.2018.05.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/26/2017] [Revised: 05/10/2018] [Accepted: 05/26/2018] [Indexed: 12/14/2022]

Shknevsky A, Shahar Y, Moskovitch R. Consistent discovery of frequent interval-based temporal patterns in chronic patients' data. J Biomed Inform 2017;75:83-95. [PMID: 28987378 DOI: 10.1016/j.jbi.2017.10.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 03/11/2017] [Revised: 08/23/2017] [Accepted: 10/02/2017] [Indexed: 11/24/2022]

Abstract

Frequent temporal patterns discovered in longitudinal patient records are increasingly proposed as features for classification and prediction, and as a means to cluster patient clinical trajectories. To justify such use, however, we must demonstrate that most frequent temporal patterns are indeed consistently discoverable within the records of different patient subsets drawn from similar patient populations. We have developed several measures of the consistency of temporal pattern discovery. We focus on time-interval related patterns (TIRPs) that can be discovered within different subsets of the same patient population. We expect the discovered TIRPs (1) to be frequent in each subset, (2) to preserve their "local" metrics, that is, the absolute frequency of each pattern, measured by a Proportion Test, and (3) to preserve their "global" characteristics, that is, their overall distribution, measured by a Kolmogorov-Smirnov test. Over a variety of settings, we also examined how consistency is affected by two factors: varying the minimal frequency threshold for TIRP discovery, and using a TIRP-filtering criterion that we previously introduced, the Semantic Adjacency Criterion (SAC). We applied our methodology to three medical domains (oncology, infectious hepatitis, and diabetes). Within the minimal frequency ranges we examined, 70-95% of the discovered TIRPs were consistently discoverable, and 40-48% of them maintained their local frequency. TIRP global distribution similarity varied widely, from 0% to 65%. Increasing the threshold usually increased the percentage of TIRPs that were repeatedly discovered across different patient subsets within the same domain. It also usually increased the probability of a similar TIRP distribution. For most minimal support levels, using the SAC principle increased the percentage of repeating TIRPs, as well as their local and global consistency. The effect of using the SAC was further strengthened as the minimal frequency threshold was raised.
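The two statistical checks named above can be sketched with the standard library. A pooled two-proportion z-test checks whether a TIRP's frequency differs between two patient subsets. The two-sample Kolmogorov-Smirnov statistic compares distributions across subsets. The exact quantities the authors compared and the significance thresholds they used are not stated here, so the example inputs are hypothetical.

```python
import math
from bisect import bisect_right


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 with a pooled proportion.

    x1, x2: number of patients in which the TIRP was found in each subset.
    n1, n2: subset sizes. Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # both proportions are 0 or both are 1
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        # Empirical CDFs of both samples evaluated at x.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical: a TIRP found in 120 of 400 patients in subset A, 100 of 400 in B.
z, p = two_proportion_z_test(120, 400, 100, 400)
print(round(z, 3), round(p, 3))
```

The KS statistic only needs to be checked at observed values, because both empirical CDFs are step functions that change only at those points.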

Nissim N, Shahar Y, Elovici Y, Hripcsak G, Moskovitch R. Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods. Artif Intell Med 2017;81:12-32. [PMID: 28456512 PMCID: PMC5937023 DOI: 10.1016/j.artmed.2017.03.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/01/2017] [Accepted: 03/03/2017] [Indexed: 01/20/2023]

Abstract

BACKGROUND AND OBJECTIVES

Labeling instances by domain experts for classification is often time-consuming and expensive. To reduce such labeling efforts, we previously proposed applying active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and showed that it significantly reduces labeling efforts. Any of three AL methods significantly reduced condition labeling efforts (by 48% to 64%), compared with standard passive (random instance-selection) SVM learning. One of these methods is well known [SVM-Margin], and we introduced the other two [Exploitation and Combination_XA]. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, labelers have varying levels of expertise. A major issue for learning methods, and for AL methods in particular, is therefore how best to use the labels provided by a committee of labelers. First, we used the labelers' learning curves to determine whether AL methods (versus standard passive learning methods) affect two kinds of variability: intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we examined the effect of learning (either passively or actively) from labels created by the majority consensus of a group of labelers.
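The SVM-Margin strategy queries the unlabeled instances closest to the current decision boundary. The following minimal sketch assumes a linear model whose weights `w` and bias `b` have already been learned from the labeled pool. The names and the toy pool are hypothetical, and the authors' Exploitation and Combination_XA methods are not reproduced.

```python
def decision_value(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_svm_margin(w, b, unlabeled, k=1):
    """Indices of the k unlabeled instances closest to the hyperplane.

    For a fixed linear model, |w.x + b| / ||w|| is the distance to the
    hyperplane. Because ||w|| is the same for every instance, ranking by
    |w.x + b| gives the same order.
    """
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: abs(decision_value(w, b, unlabeled[i])))
    return ranked[:k]


# Hypothetical 2-D model and unlabeled pool.
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.2, 0.1], [-2.0, 1.0], [0.5, 0.7]]
print(select_svm_margin(w, b, pool, k=2))  # [1, 3]
```

In a full active-learning loop, the selected instances would be labeled and added to the training set, and the model would be retrained before the next query.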

METHODS

We used our CAESAR-ALE framework for classifying the severity of clinical conditions, together with the three AL methods and the passive learning method mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labels. These were represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center. We analyzed the variance of classification performance within (intra-labeler) and, especially, among (inter-labeler) the classification models induced using the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label.

RESULTS

For the models induced from each labeler, the AL methods produced smoother intra-labeler learning curves during the training phase than the passive learning method did. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: [0.0182 to 0.0496]) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: [0.0275 to 0.0724]). During the training phase, the AL methods also yielded a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models than passive learning did. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as the inter-labeler standard deviation using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was almost 50% higher than that of our two AL methods. The difference in inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was not significant (p=0.29), nor was the difference between the Combination_XA and Exploitation methods (p=0.67). Finally, using the consensus label led to a learning curve with a higher mean intra-labeler variance. However, it eventually resulted in an AUC that was at least as high as the AUC achieved using the gold standard label. That AUC was also always higher than the expected mean AUC of a randomly selected labeler, regardless of the learning method (including passive learning). We used a paired t-test to compare the intra-labeler AUC standard deviation under the consensus label with that under the other two labeling strategies. The difference was significant only for the passive learning method (p=0.014), not for any of the three AL methods.

CONCLUSIONS

The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum that differs significantly in performance from the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces dependence on a particular labeler. In addition, a consensus label agreed upon by a rather uneven group of labelers might be at least as good as the gold standard labeler, who might not be available. It is certainly better than randomly selecting one of the group's individual labelers. Finally, when the consensus label was used, the AL methods reduced the intra-labeler AUC variance during the learning phase compared with passive learning.

Moskovitch R, Polubriaginof F, Weiss A, Ryan P, Tatonetti N. Procedure prediction from symbolic Electronic Health Records via time intervals analytics. J Biomed Inform 2017;75:70-82. [PMID: 28823923 DOI: 10.1016/j.jbi.2017.07.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 05/15/2016] [Revised: 06/19/2017] [Accepted: 07/25/2017] [Indexed: 11/18/2022]

Abstract

Prediction of medical events, such as clinical procedures, is essential for preventing disease, understanding disease mechanisms, and improving the quality of patient care. Longitudinal clinical data from Electronic Health Records provide opportunities to develop predictive models, but using these data poses significant challenges. Primarily, the data are longitudinal and represent thousands of conceptual events that have duration, yet they are also sparse, which complicates the application of traditional analysis approaches. The framework presented here takes advantage of the events' durations and of the gaps between them. International standards for electronic healthcare data represent data elements, such as procedures, conditions, and drug exposures, as eras, or time intervals. Such eras contain both an event and a duration, which enables the application of time intervals mining, a relatively new subfield of data mining. In this study, we present Maitreya, a framework for time intervals analytics in longitudinal clinical data. Maitreya discovers frequent time intervals related patterns (TIRPs), which we use as prognostic markers for modelling clinical events. We introduce three novel TIRP metrics that are normalized versions of horizontal support, which represents the number of TIRP instances per patient. We evaluate Maitreya on 28 frequent and clinically important procedures. The evaluation compares the three novel TIRP representation metrics with no temporal representation and with previous TIRP metrics. We also evaluate an epsilon value that makes Allen's relations more flexible, using settings of 30, 60, 90, and 180 days, in comparison with the default of zero. For 22 of these procedures, temporal patterns were superior to non-temporal features as predictors, and the vertically normalized horizontal support metric was the most effective TIRP feature representation. An epsilon of 30 days was slightly better than zero.
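Two notions in this abstract can be made concrete in code. The first is an epsilon tolerance: when classifying the relation between two intervals, endpoints closer than epsilon count as equal. The second is horizontal support, the number of instances of a pattern within one patient's record. The sketch uses the reduced set of seven Allen's relations commonly used in TIRP mining and handles only two-interval patterns. The exact epsilon semantics in Maitreya may differ, and the paper's normalized horizontal-support metrics are not reproduced. Function names and data are hypothetical.

```python
def allen_relation(a, b, eps=0.0):
    """Relation of interval a = (start, end) to interval b.

    Assumes a comes first in (start, end) order, so that when the starts
    coincide, a does not end after b. Endpoints within eps count as equal.
    """
    (s1, e1), (s2, e2) = a, b

    def eq(x, y):
        return abs(x - y) <= eps

    def lt(x, y):
        return y - x > eps

    if lt(e1, s2):
        return "before"
    if eq(e1, s2):
        return "meets"
    if eq(s1, s2):
        return "equals" if eq(e1, e2) else "starts"
    # From here on, a starts strictly before b and ends after b starts.
    if eq(e1, e2):
        return "finished-by"
    if lt(e2, e1):
        return "contains"
    return "overlaps"


def horizontal_support(intervals, sym_a, sym_b, relation, eps=0.0):
    """Number of instances of the two-interval pattern (sym_a relation sym_b)
    in one patient's list of (start, end, symbol) intervals."""
    ordered = sorted(intervals)
    count = 0
    for i, (s1, e1, x) in enumerate(ordered):
        for s2, e2, y in ordered[i + 1:]:
            if (x, y) == (sym_a, sym_b) and \
                    allen_relation((s1, e1), (s2, e2), eps) == relation:
                count += 1
    return count


# Hypothetical intervals in days: (start, end, symbol).
patient = [(0, 30, "drugA"), (40, 80, "condB"), (100, 130, "drugA"), (150, 170, "condB")]
print(horizontal_support(patient, "drugA", "condB", "before"))          # 3
print(horizontal_support(patient, "drugA", "condB", "before", eps=10))  # 2
```

With a 10-day epsilon, the first pair is reclassified from `before` to `meets`, so the horizontal support of the `before` pattern drops from 3 to 2.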

Casanova IJ, Campos M, Juarez JM, Fernandez-Fernandez-Arroyo A, Lorente JA. Impact of time series discretization on intensive care burn unit survival classification. PROGRESS IN ARTIFICIAL INTELLIGENCE 2017. [DOI: 10.1007/s13748-017-0130-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]

Moskovitch R, Choi H, Hripcsak G, Tatonetti N. Prognosis of Clinical Outcomes with Temporal Patterns and Experiences with One Class Feature Selection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017;14:555-563. [PMID: 27429447 PMCID: PMC5486920 DOI: 10.1109/tcbb.2016.2591539] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 05/03/2023]

Deja R, Froelich W, Deja G, Wakulicz-Deja A. Hybrid approach to the generation of medical guidelines for insulin therapy for children. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.07.066] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/01/2022]

Tuarob S, Tucker CS, Kumara S, Giles CL, Pincus AL, Conroy DE, Ram N. How are you feeling?: A personalized methodology for predicting mental states from temporally observable physical and behavioral information. J Biomed Inform 2017;68:1-19. [PMID: 28213145 PMCID: PMC5453908 DOI: 10.1016/j.jbi.2017.02.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 10/18/2016] [Revised: 02/12/2017] [Accepted: 02/13/2017] [Indexed: 01/07/2023]

Abstract

Anomalous mental states such as stress and anxiety are believed not only to cause suffering for individuals but also to lead to tragedies in some extreme cases. The ability to predict an individual's mental state at both current and future time periods could prove critical to healthcare practitioners. Currently, the practical way to predict an individual's mental state is through mental examinations in which psychological experts perform the evaluations. However, such methods can be time- and resource-consuming, which limits their broad applicability to a wide population. Furthermore, some individuals may be unaware of their mental states or may feel uncomfortable expressing themselves during the evaluations. Hence, their anomalous mental states could remain undetected for a prolonged period. The objective of this work is to demonstrate the use of advanced machine learning approaches to generate mathematical models that predict the current and future mental states of an individual. The problem of mental state prediction is transformed into a time series forecasting problem, in which an individual is represented as a multivariate time series stream of monitored physical and behavioral attributes. A personalized mathematical model is then automatically generated to capture the dependencies among these attributes, and this model is used to predict the mental states of each individual. In particular, we first illustrate the drawbacks of traditional multivariate time series forecasting methodologies such as vector autoregression. We then show that these issues can be mitigated by machine learning regression techniques modified to capture temporal dependencies in time series data. A case study using data from 150 human participants illustrates that the proposed machine learning-based forecasting methods are more suitable for high-dimensional psychological data than the traditional vector autoregressive model. This holds in terms of both error magnitude and directional accuracy. These results demonstrate a successful use of machine learning techniques in psychological studies. They also serve as a building block for multiple medical applications that could rely on an automated system to gauge individuals' mental states.
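Forecasting can be recast as supervised regression by turning a multivariate series into lagged feature vectors. Each row holds the previous `p` observations of every attribute, and the target is the next value of the attribute being forecast. A VAR(p) model is a linear regression on exactly these lagged values, so a nonlinear regressor can be trained on the same rows instead. This generic sliding-window construction is an assumption about the kind of transformation involved, not a reproduction of the authors' personalized models. Attribute names and values are hypothetical.

```python
def lagged_dataset(series, target, p):
    """Build (X, y) for one-step-ahead forecasting.

    series: list of dicts, one per time step, mapping attribute -> value.
    target: attribute to forecast at time t.
    p: number of lags; row t uses steps t-1, ..., t-p of every attribute.
    """
    attrs = sorted(series[0])  # fixed attribute order for every row
    X, y = [], []
    for t in range(p, len(series)):
        row = []
        for lag in range(1, p + 1):
            row.extend(series[t - lag][a] for a in attrs)
        X.append(row)
        y.append(series[t][target])
    return X, y


# Hypothetical daily observations for one participant.
series = [
    {"sleep": 7, "steps": 5000, "stress": 2},
    {"sleep": 6, "steps": 4000, "stress": 3},
    {"sleep": 5, "steps": 3000, "stress": 4},
    {"sleep": 8, "steps": 6000, "stress": 1},
]
X, y = lagged_dataset(series, target="stress", p=2)
print(X[0], y[0])  # [6, 4000, 3, 7, 5000, 2] 4
```

Any regressor that accepts a feature matrix can then be fitted to `X` and `y`. A per-participant fit corresponds to the personalized setting described above.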

Kostakis O, Papapetrou P. On searching and indexing sequences of temporal intervals. Data Min Knowl Discov 2017. [DOI: 10.1007/s10618-016-0489-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]

MEMOD: a novel multivariate evolutionary multi-objective discretization. Soft comput 2017. [DOI: 10.1007/s00500-016-2475-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 10/20/2022]

Discovering Discriminative and Interpretable Patterns for Surgical Motion Analysis. Artif Intell Med 2017. [DOI: 10.1007/978-3-319-59758-4_15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]

Zhao J, Papapetrou P, Asker L, Boström H. Learning from heterogeneous temporal data in electronic health records. J Biomed Inform 2016;65:105-119. [PMID: 27919732 DOI: 10.1016/j.jbi.2016.11.006] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 07/14/2016] [Revised: 10/21/2016] [Accepted: 11/21/2016] [Indexed: 11/30/2022]

Cardiac arrhythmia classification using multi-granulation rough set approaches. INT J MACH LEARN CYB 2016. [DOI: 10.1007/s13042-016-0594-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]