Abstract
Time series obtained from time-dependent experiments contain rich information on kinetics and dynamics of the system under investigation. This work describes an unsupervised learning framework, along with the derivation of the necessary analytical expressions, for the analysis of Gaussian-distributed time series that exhibit discrete states. After the time series has been partitioned into segments in a model-free manner using the previously developed change-point (CP) method, this protocol starts with an agglomerative hierarchical clustering algorithm to classify the detected segments into possible states. The initial state clustering is further refined using an expectation-maximization (EM) procedure, and the number of states is determined by a Bayesian information criterion (BIC). Also introduced here is an achievement scalarization function, usually seen in artificial intelligence literature, for quantitatively assessing the performance of state determination. The statistical learning framework, which is comprised of three stages, detection of signal change, clustering, and number-of-state determination, was thoroughly characterized using simulated trajectories with random intensity segments that have no underlying kinetics, and its performance was critically evaluated. The application to experimental data is also demonstrated. The results suggested that this general framework, the implementation of which is based on firm theoretical foundations and does not require the imposition of any kinetics model, is powerful in determining the number of states, the parameters contained in each state, as well as the associated statistical significance.
Collapse