1
|
Omen: discovering sequential patterns with reliable prediction delays. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01660-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Suppose we are given a discrete-valued time series $X$ of observed events and an equally long binary sequence $Y$ that indicates whether something of interest happened at that particular point in time. We consider the problem of mining serial episodes, sequential patterns allowing for gaps, from $X$ that reliably predict those interesting events. By reliable we mean patterns that not only predict that an interesting event is likely to follow, but in particular that we can also accurately tell how long until that event will happen. In other words, we are specifically interested in patterns with a highly skewed distribution of delays between pattern occurrences and predicted events. As it is unlikely that a single pattern can explain a complex real-world process, we are after the smallest, least redundant set of such patterns that together explain the interesting events well. We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best patterns as those that describe the occurrences of interesting events $Y$ most succinctly given the data over $X$. As neither discovering the optimal explanation of $Y$ given a set of patterns, nor the discovery of the optimal pattern set, are problems that allow for straightforward optimization, we break the problem in two and propose effective heuristics for both. Through extensive empirical evaluation, we show that both our main method, Omen, and its fast approximation fOmen, work well in practice and both quantitatively and qualitatively beat the state of the art.
|
2
|
Lee JM, Hauskrecht M. Modeling multivariate clinical event time-series with recurrent temporal mechanisms. Artif Intell Med 2021; 112:102021. [PMID: 33581828 PMCID: PMC7943294 DOI: 10.1016/j.artmed.2021.102021] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/22/2020] [Revised: 12/26/2020] [Accepted: 01/10/2021] [Indexed: 12/18/2022]
Abstract
In this work, we propose a novel autoregressive event time-series model that can predict future occurrences of multivariate clinical events. Our model represents multivariate event time-series using different temporal mechanisms, each aimed at fitting different temporal characteristics of the time-series. In particular, information about the distant past is modeled through the hidden state space defined by an LSTM-based model, information on recently observed clinical events is modeled through discriminative projections, and information about periodic (repeated) events is modeled using a special recurrent mechanism based on probability distributions of inter-event gaps compiled from past data. We evaluate our proposed model on electronic health record (EHR) data derived from the MIMIC-III dataset. We show that our new model, equipped with the above temporal mechanisms, leads to improved prediction performance compared to multiple baselines.
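As a rough illustration of the inter-event gap idea only (the LSTM and discriminative-projection components are not covered), the following Python sketch builds an empirical gap distribution for a single event type and converts it into a conditional recurrence probability. The function names and the hourly timestamps are hypothetical and are not taken from the paper.

```python
from collections import Counter

def gap_distribution(times):
    """Empirical distribution of gaps between consecutive occurrences."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    counts = Counter(gaps)
    return {gap: count / len(gaps) for gap, count in counts.items()}

def recurrence_probability(dist, elapsed):
    """P(gap == elapsed | gap >= elapsed) under the empirical distribution."""
    still_possible = sum(p for gap, p in dist.items() if gap >= elapsed)
    if still_possible == 0:
        return 0.0
    return dist.get(elapsed, 0.0) / still_possible

# Hypothetical hours at which one lab test was ordered for one patient.
times = [0, 24, 48, 60, 84]
dist = gap_distribution(times)
print(dist)
print(recurrence_probability(dist, 12), recurrence_probability(dist, 24))
```

Conditioning on the time already elapsed since the last occurrence is one simple way to turn a gap histogram into a prediction signal for the next time step; how the authors combine this signal with the other mechanisms is not specified in the abstract.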
Affiliation(s)
- Jeong Min Lee
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA.
- Milos Hauskrecht
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA.
|
3
|
Jane YN, Nehemiah HK, Kannan A. Classifying unevenly spaced clinical time series data using forecast error approximation based bottom-up (FeAB) segmented time delay neural network. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2021. [DOI: 10.1080/21681163.2020.1817791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Y. Nancy Jane
  - Department of Computer Technology, Anna University, Chennai, India
- Arputharaj Kannan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
|
4
|
Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. Patterns (New York, N.Y.) 2020; 1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]
Abstract
Electronic health records (EHRs) contain important temporal information about the progression of disease and treatment outcomes. This paper proposes a transitive sequencing approach for constructing temporal representations from EHR observations for downstream machine learning. Using clinical data from a cohort of patients with congestive heart failure, we mined temporal representations by transitive sequencing of EHR medication and diagnosis records for classification and prediction tasks. We compared the classification and prediction performances of the transitive sequential representations (bag-of-sequences approach) with the conventional approach of using aggregated vectors of EHR data (aggregated vector representation) across different classifiers. We found that the transitive sequential representations are better phenotype "differentiators" and predictors than the "atemporal" EHR records. Our results also demonstrated that data representations obtained from transitive sequencing of EHR observations can present novel insights about the progression of the disease that are difficult to discern when clinical data are treated independently of the patient's history.
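One plausible reading of transitive sequencing, sketched in Python under stated assumptions: every record in a patient's history is paired with every later record, and the resulting ordered pairs form a bag-of-sequences feature set. The function name, the `(time, code)` record layout, and the example codes are hypothetical; the paper's actual pipeline may differ in details such as deduplication or tie handling.

```python
from collections import Counter
from itertools import combinations

def transitive_sequences(records):
    """Bag of ordered (earlier_code, later_code) pairs for one patient.

    `records` is an iterable of (time, code) tuples. Pairs recorded at the
    same time are skipped because their order is undefined.
    """
    ordered = sorted(records, key=lambda record: record[0])
    bag = Counter()
    for (t1, c1), (t2, c2) in combinations(ordered, 2):
        if t1 < t2:
            bag[(c1, c2)] += 1
    return bag

# Hypothetical patient history with medication (rx) and diagnosis (dx) records.
records = [(3, "dx:heart_failure"), (1, "rx:metformin"), (2, "dx:hypertension")]
bag = transitive_sequences(records)
for pair, count in bag.items():
    print(pair, count)
```

In contrast, an aggregated vector representation would only count each code per patient, discarding which record came before which.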
Affiliation(s)
- Hossein Estiri
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Zachary H. Strasser
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
  - Harvard Medical School, Boston, MA 02115, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Jeffery G. Klann
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Thomas H. McCoy
  - Harvard Medical School, Boston, MA 02115, USA
  - Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
- Kavishwar B. Wagholikar
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Sebastien Vasey
  - Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
- Victor M. Castro
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- MaryKate E. Murphy
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Shawn N. Murphy
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
  - Harvard Medical School, Boston, MA 02115, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
  - Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
|
6
|
Kocheturov A, Momcilovic P, Bihorac A, Pardalos PM. Extended vertical lists for temporal pattern mining from multivariate time series. Expert Systems 2019; 36:e12448. [PMID: 33162636 PMCID: PMC7646935 DOI: 10.1111/exsy.12448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/21/2018] [Accepted: 05/10/2019] [Indexed: 06/11/2023]
Abstract
In this paper, the problem of mining complex temporal patterns in the context of multivariate time series is considered. A new method called the Fast Temporal Pattern Mining with Extended Vertical Lists is introduced. The method is based on an extension of the level-wise property, which requires a more complex pattern to start at positions within a record where all of the subpatterns of the pattern start. The approach is built around a novel data structure called the Extended Vertical List that tracks positions of the first state of the pattern inside records and links them to appropriate positions of a specific subpattern of the pattern called the prefix. Extensive computational results indicate that the new method performs significantly faster than the previous version of the algorithm for Temporal Pattern Mining; however, the increase in speed comes at the expense of increased memory usage.
Affiliation(s)
- Anton Kocheturov
  - Center for Applied Optimization, Industrial and Systems Engineering, University of Florida, Gainesville, Florida
- Petar Momcilovic
  - Industrial and Systems Engineering, University of Florida, Gainesville, Florida
- Azra Bihorac
  - Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida
- Panos M. Pardalos
  - Center for Applied Optimization, Industrial and Systems Engineering, University of Florida, Gainesville, Florida
|
9
|
Georga EI, Tachos NS, Sakellarios AI, Kigka VI, Exarchos TP, Pelosi G, Parodi O, Michalis LK, Fotiadis DI. Artificial Intelligence and Data Mining Methods for Cardiovascular Risk Prediction. ACTA ACUST UNITED AC 2019. [DOI: 10.1007/978-981-10-5092-3_14] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
|
11
|
Li J, Tan X, Xu X, Wang F. Efficient Mining Template of Predictive Temporal Clinical Event Patterns From Patient Electronic Medical Records. IEEE J Biomed Health Inform 2018; 23:2138-2147. [PMID: 30346297 DOI: 10.1109/jbhi.2018.2877255] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
Exploring the temporal relationships among events in patient electronic medical records (EMR) is an important problem in biomedical informatics, and the results can reveal patients' impending disease conditions. In this paper, we investigate the problem of mining patterns from a sequence of point events, i.e., we only have information on when each event happens, with no duration or numerical value available. We propose a complete pipeline, including event preprocessing, pattern mining, and outcome analysis, to mine the patterns and evaluate their effectiveness and discriminative power. Finally, we treat the mined patterns as additional features and evaluate them in a predictive modeling task for the early detection of congestive heart failure. On a real-world EMR data warehouse, we found that adding those sequential pattern features improved the prediction performance significantly, by approximately 0.1.
|
12
|
Incorporating repeating temporal association rules in Naïve Bayes classifiers for coronary heart disease diagnosis. J Biomed Inform 2018; 81:74-82. [PMID: 29555443 DOI: 10.1016/j.jbi.2018.03.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/01/2017] [Revised: 02/14/2018] [Accepted: 03/07/2018] [Indexed: 01/08/2023]
Abstract
In this paper, we develop a Naïve Bayes classification model integrated with temporal association rules (TARs). A temporal pattern mining algorithm is used to detect TARs by identifying the most frequent temporal relationships among the derived basic temporal abstractions (TA). We develop and compare three classifiers that use as features the most frequent TARs as follows: (i) representing the most frequent TARs detected within the target class ('Disease = Present'), (ii) representing the most frequent TARs from both classes ('Disease = Present', 'Disease = Absent'), (iii) representing the most frequent TARs, after removing the ones that are low-risk predictors for the disease. These classifiers incorporate the horizontal support of TARs, which defines the number of times that a particular temporal pattern is found in some patient's record, as their features. All of the developed classifiers are applied for diagnosis of coronary heart disease (CHD) using a longitudinal dataset. We compare two ways of feature representation, using horizontal support or the mean duration of each TAR, on a single patient. The results obtained from this comparison show that the horizontal support representation outperforms the mean duration. The main effort of our research is to demonstrate that where long time periods are of significance in some medical domain, such as the CHD domain, the detection of the repeated occurrences of the most frequent TARs can yield better performances. We compared the classifier that uses the horizontal support representation and has the best performance with a Baseline Classifier which uses the binary representation of the most frequent TARs. The results obtained illustrate the comparatively high performance of the classifier representing the horizontal support, over the Baseline Classifier.
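The difference between the horizontal support representation and the binary baseline can be sketched in Python as follows. This is a deliberate simplification: a temporal pattern is treated here as a run of consecutive abstraction labels, whereas the TARs in the paper encode richer temporal relationships. The function names and the labels are hypothetical.

```python
def horizontal_support(record, pattern):
    """Number of positions in one patient's record where `pattern` occurs
    as a run of consecutive labels (overlapping matches are counted)."""
    n, m = len(record), len(pattern)
    return sum(
        1 for i in range(n - m + 1) if tuple(record[i:i + m]) == tuple(pattern)
    )

def feature_vector(record, patterns, binary=False):
    """Per-pattern features: occurrence counts, or 0/1 presence if `binary`."""
    counts = [horizontal_support(record, pattern) for pattern in patterns]
    return [int(count > 0) for count in counts] if binary else counts

# Hypothetical sequence of temporal abstractions for one patient.
record = ["BP_high", "HR_up", "BP_normal", "BP_high", "HR_up"]
patterns = [("BP_high", "HR_up"), ("HR_up", "BP_normal"), ("BP_normal", "HR_up")]

print(feature_vector(record, patterns))               # horizontal support
print(feature_vector(record, patterns, binary=True))  # binary baseline
```

The count-based vector keeps the information that a pattern repeats within a record, which is the property the abstract argues matters when long time periods are significant.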
|