1
Nomura Y. Boltzmann machines and quantum many-body problems. Journal of Physics: Condensed Matter 2023; 36:073001. PMID: 37918107. DOI: 10.1088/1361-648x/ad0916.
Abstract
Analyzing quantum many-body problems and elucidating the entangled structure of quantum states is a significant challenge common to a wide range of fields. Recently, a novel approach using machine learning was introduced to address this challenge. The idea is to 'embed' nontrivial quantum correlations (quantum entanglement) into artificial neural networks. Through intensive developments, artificial neural network methods are becoming new powerful tools for analyzing quantum many-body problems. Among various artificial neural networks, this topical review focuses on Boltzmann machines and provides an overview of recent developments and applications.
Affiliation(s)
- Yusuke Nomura
- Department of Applied Physics and Physico-Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
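The "embedding" of quantum correlations into a network that this review surveys can be sketched in a few lines. This is a minimal illustration, not the review's own code; the system sizes and parameter scales below are illustrative assumptions. A restricted Boltzmann machine assigns each spin configuration an unnormalized amplitude, with the hidden units traced out analytically:

```python
import numpy as np

# Minimal sketch of a neural-network quantum state (sizes and parameter scales
# are illustrative assumptions, not taken from the review): a restricted
# Boltzmann machine assigns each spin configuration an unnormalized amplitude,
# with the hidden units traced out analytically.
def rbm_amplitude(sigma, a, b, W):
    """Unnormalized RBM amplitude psi(sigma) for one configuration in {-1,+1}^n."""
    theta = b + W.T @ sigma                      # effective fields on hidden units
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(theta))

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
a = rng.normal(scale=0.1, size=n_visible)        # visible biases
b = rng.normal(scale=0.1, size=n_hidden)         # hidden biases
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))

# Enumerate all 2^n configurations (feasible only for tiny systems; in practice
# expectation values are estimated by Monte Carlo sampling instead).
configs = np.array([[1 if (i >> k) & 1 else -1 for k in range(n_visible)]
                    for i in range(2 ** n_visible)])
amps = np.array([rbm_amplitude(s, a, b, W) for s in configs])
probs = amps ** 2 / np.sum(amps ** 2)            # Born probabilities |psi|^2 / Z
print(probs.sum())  # ≈ 1.0
```

The exponential cost of the explicit normalization is exactly why the variational Monte Carlo machinery discussed in the review matters for realistic system sizes.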
2
Jiang Z, Su YH, Yin H. Quantifying Information of Dynamical Biochemical Reaction Networks. Entropy 2023; 25:887. PMID: 37372231. DOI: 10.3390/e25060887.
Abstract
Complex biochemical reaction networks underlie gene expression, cell development, and cell differentiation in vivo, among many other processes. The processes underlying biochemical reactions transmit information from internal or external cellular signaling, yet how to measure this information remains an open question. In this paper, we apply the method of information length, which combines Fisher information and information geometry, to study linear and nonlinear biochemical reaction chains. Through extensive random simulations, we find that the amount of information does not always increase with the length of a linear reaction chain; instead, it varies significantly while the chain is short and hardly changes once the chain length reaches a certain value. For nonlinear reaction chains, the amount of information changes not only with chain length but also with reaction coefficients and rates, and it increases with the length of the chain. Our results help clarify the role of biochemical reaction networks in cells.
Affiliation(s)
- Zhiyuan Jiang
- School of Science, Shenyang University of Technology, Shenyang 110870, China
- School of Mathematics and Statistics, Xuzhou University of Technology, Xuzhou 221018, China
- You-Hui Su
- School of Mathematics and Statistics, Xuzhou University of Technology, Xuzhou 221018, China
- Hongwei Yin
- School of Mathematics and Statistics, Xuzhou University of Technology, Xuzhou 221018, China
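The information-length quantity this paper applies can be illustrated on a toy case rather than a reaction chain (the drifting Gaussian and the grid sizes below are assumptions chosen to make the example checkable, not the paper's setup). The quantity integrates the square root of the time-wise Fisher information, and for a Gaussian of fixed width σ whose mean moves by Δμ it reduces to |Δμ|/σ:

```python
import numpy as np

# Hedged sketch of the information-length diagnostic:
# L = ∫ sqrt( Σ_x (∂_t p(x,t))² / p(x,t) ) dt over discretized probability
# masses p. For a Gaussian of fixed width σ whose mean moves by Δμ, theory
# gives L = |Δμ|/σ, which makes this toy case easy to verify.
def information_length(p_series, dt):
    """p_series: array (T, n_states) of probability vectors at successive times."""
    dp = np.gradient(p_series, dt, axis=0)
    E = np.sum(dp ** 2 / np.maximum(p_series, 1e-300), axis=1)  # Fisher info in t
    f = np.sqrt(E)
    return np.sum(0.5 * (f[1:] + f[:-1])) * dt  # trapezoidal time integral

x = np.linspace(-7.0, 10.0, 1701)           # spatial grid
t = np.linspace(0.0, 1.0, 1001)             # time grid
sigma, mu = 1.0, 3.0 * t                    # mean drifts from 0 to 3
p = np.exp(-(x - mu[:, None]) ** 2 / (2 * sigma ** 2))
p /= p.sum(axis=1, keepdims=True)           # normalize to probability masses

L = information_length(p, t[1] - t[0])
print(L)  # ≈ |Δμ|/σ = 3
```

For a reaction chain, `p_series` would instead hold the time-dependent state distribution of the chemical master equation or its simulation.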
3
Chen X, Zhou J. Multisensor Estimation Fusion on Statistical Manifold. Entropy 2022; 24:1802. PMID: 36554207. PMCID: PMC9777556. DOI: 10.3390/e24121802.
Abstract
In this paper, we characterize local estimates from multiple distributed sensors as posterior probability densities, which are assumed to belong to a common parametric family. Adopting the information-geometric viewpoint, we consider such a family as a Riemannian manifold endowed with the Fisher metric, and then formulate the fused density as an informative barycenter by minimizing the sum of its geodesic distances to all local posterior densities. Under the assumption of a multivariate elliptical distribution (MED), two fusion methods are developed using the minimal Manhattan distance in place of the geodesic distance on the manifold of MEDs; both yield the same mean estimation fusion but different covariance estimation fusions. One obtains the fused covariance estimate by a robust fixed-point iterative algorithm with guaranteed convergence, and the other provides an explicit expression for the fused covariance estimate. At different heavy-tailed levels, the fusion results of two local estimates for a static target show that the two methods approximate the informative barycenter better than some existing fusion methods. An application to distributed estimation fusion for dynamic systems with heavy-tailed process and observation noises demonstrates the performance of the two proposed fusion algorithms.
Affiliation(s)
- Xiangbing Chen
- Division of Mathematics, Sichuan University Jinjiang College, Meishan 620860, China
- Jie Zhou
- College of Mathematics, Sichuan University, Chengdu 610064, China
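One concrete way to picture "fusing covariances on a manifold" rather than averaging them is the log-Euclidean barycenter of SPD matrices. This is a hedged, generic sketch only: it is not the paper's MED fixed-point or explicit fusion rule, and the matrices below are made-up examples.

```python
import numpy as np

# Hedged sketch: a log-Euclidean mean of covariance estimates, one simple way
# to approximate a Riemannian barycenter of SPD matrices. NOT the paper's
# MED-based fusion rules -- just an illustration of manifold-aware fusion.
def spd_log(P):
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T     # V diag(log w) V^T

def spd_exp(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T     # V diag(exp w) V^T

def log_euclidean_mean(covs):
    """Barycenter of SPD matrices under the log-Euclidean metric."""
    return spd_exp(np.mean([spd_log(P) for P in covs], axis=0))

P1 = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative local covariance estimates
P2 = np.array([[1.0, -0.2], [-0.2, 3.0]])
P_fused = log_euclidean_mean([P1, P2])
print(np.linalg.eigvalsh(P_fused))         # both eigenvalues positive: SPD result
```

Unlike the arithmetic mean, this barycenter commutes with matrix inversion and never leaves the SPD cone, which is the kind of geometric consistency the paper's fixed-point iteration also enforces.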
4
Variational inference as iterative projection in a Bayesian Hilbert space with application to robotic state estimation. Robotica 2022. DOI: 10.1017/s0263574722001497.
Abstract
Variational Bayesian inference is an important machine learning tool that finds application from statistics to robotics. The goal is to find an approximate probability density function (PDF) from a chosen family that is in some sense “closest” to the full Bayesian posterior. Closeness is typically defined through the selection of an appropriate loss functional such as the Kullback-Leibler (KL) divergence. In this paper, we explore a new formulation of variational inference by exploiting the fact that (most) PDFs are members of a Bayesian Hilbert space under careful definitions of vector addition, scalar multiplication, and an inner product. We show that, under the right conditions, variational inference based on KL divergence can amount to iterative projection, in the Euclidean sense, of the Bayesian posterior onto a subspace corresponding to the selected approximation family. We work through the details of this general framework for the specific case of the Gaussian approximation family and show the equivalence to another Gaussian variational inference approach. We furthermore discuss the implications for systems that exhibit sparsity, which is handled naturally in Bayesian space, and give an example of a high-dimensional robotic state estimation problem that can be handled as a result. We provide some preliminary examples of how the approach could be applied to non-Gaussian inference and discuss the limitations of the approach in detail to encourage follow-on work along these lines.
5
Wang X, Jiang B, Ding SX, Lu N, Li Y. Extended Relevance Vector Machine-Based Remaining Useful Life Prediction for DC-Link Capacitor in High-Speed Train. IEEE Transactions on Cybernetics 2022; 52:9746-9755. PMID: 33382664. DOI: 10.1109/tcyb.2020.3035796.
Abstract
Remaining useful life (RUL) prediction is a reliable tool for the health management of components, and its central challenge is predicting the RUL accurately under uncertainty. To enhance prediction accuracy under uncertain conditions, the relevance vector machine (RVM) is extended onto the probabilistic manifold to compensate for the weakness caused by the evidence approximation of the RVM. First, tendency features are selected based on the batch samples. Then, a dynamic multistep regression model is built to describe the influence of uncertainties. Furthermore, the degradation tendency is estimated to monitor the degradation status continuously. Because poorly estimated RVM hyperparameters may result in low prediction accuracy, the established RVM model is extended to the probabilistic manifold to estimate the degradation tendency exactly. The RUL is then prognosticated by the first-hitting-time (FHT) method based on the estimated degradation tendency. The proposed schemes are illustrated by a case study investigating capacitor performance degradation in the traction systems of high-speed trains.
6
Langer C, Ay N. How Morphological Computation Shapes Integrated Information in Embodied Agents. Frontiers in Psychology 2021; 12:716433. PMID: 34912262. PMCID: PMC8666602. DOI: 10.3389/fpsyg.2021.716433.
Abstract
The Integrated Information Theory provides a quantitative approach to consciousness and can be applied to neural networks. An embodied agent controlled by such a network influences and is influenced by its environment. This involves, on the one hand, morphological computation within goal-directed action and, on the other hand, integrated information within the controller, the agent's brain. In this article, we combine different methods to examine the information flows among and within the body, the brain, and the environment of an agent, which allows us to relate these flows to each other. We test this framework in a simple experimental setup: we calculate the optimal policy for goal-directed behavior based on the "planning as inference" method, in which the information-geometric em-algorithm is used to optimize the likelihood of the goal. Morphological computation and integrated information are then calculated with respect to the optimal policies. Comparing the dynamics of these measures under changing morphological circumstances highlights the antagonistic relationship between the two concepts: the more morphological computation is involved, the less information integration within the brain is required. To determine the influence of the brain on the behavior of the agent, it is necessary to additionally measure the information flow to and from the brain.
Affiliation(s)
- Carlotta Langer
- Hamburg University of Technology, Hamburg, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Nihat Ay
- Hamburg University of Technology, Hamburg, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Santa Fe Institute, Santa Fe, NM, United States; Leipzig University, Leipzig, Germany
7
Nomura Y, Yoshioka N, Nori F. Purifying Deep Boltzmann Machines for Thermal Quantum States. Physical Review Letters 2021; 127:060601. PMID: 34420335. DOI: 10.1103/physrevlett.127.060601.
Abstract
We develop two approaches to construct deep neural networks representing the purified finite-temperature states of quantum many-body systems. Both methods aim to represent the Gibbs state by a highly expressive neural-network wave function, exemplifying the idea of purification. The first is an entirely deterministic approach that generates deep Boltzmann machines representing the purified Gibbs state exactly; this guarantees the remarkable flexibility of the ansatz, which can fully exploit the quantum-to-classical mapping. The second employs stochastic sampling to optimize the network parameters so that the imaginary-time evolution is well approximated within the expressibility of neural networks. Numerical demonstrations for transverse-field Ising models and Heisenberg models show that our methods are powerful enough to investigate the finite-temperature properties of strongly correlated quantum many-body systems, even in the presence of the problematic effect of frustration.
Affiliation(s)
- Yusuke Nomura
- RIKEN Center for Emergent Matter Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Nobuyuki Yoshioka
- Department of Applied Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Theoretical Quantum Physics Laboratory, RIKEN Cluster for Pioneering Research (CPR), Wako-shi, Saitama 351-0198, Japan
- Franco Nori
- Theoretical Quantum Physics Laboratory, RIKEN Cluster for Pioneering Research (CPR), Wako-shi, Saitama 351-0198, Japan
- RIKEN Center for Quantum Computing (RQC), Wako-shi, Saitama 351-0198, Japan
- Physics Department, University of Michigan, Ann Arbor, Michigan 48109-1040, USA
8
Information geometry of hyperbolic-valued Boltzmann machines. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.048.
9
Aguilera M, Moosavi SA, Shimazaki H. A unifying framework for mean-field theories of asymmetric kinetic Ising systems. Nature Communications 2021; 12:1197. PMID: 33608507. PMCID: PMC7895831. DOI: 10.1038/s41467-021-20890-5.
Abstract
Kinetic Ising models are powerful tools for studying the non-equilibrium dynamics of complex systems. As their behavior is not tractable for large networks, many mean-field methods have been proposed for their analysis, each based on unique assumptions about the system’s temporal evolution. This disparity of approaches makes it challenging to systematically advance mean-field methods beyond previous contributions. Here, we propose a unifying framework for mean-field theories of asymmetric kinetic Ising systems from an information geometry perspective. The framework is built on Plefka expansions of a system around a simplified model obtained by an orthogonal projection to a sub-manifold of tractable probability distributions. This view not only unifies previous methods but also allows us to develop novel methods that, in contrast with traditional approaches, preserve the system’s correlations. We show that these new methods can outperform previous ones in predicting and assessing network properties near maximally fluctuating regimes.
Affiliation(s)
- Miguel Aguilera
- IAS-Research Center for Life, Mind, and Society, Department of Logic and Philosophy of Science, University of the Basque Country, Donostia, Spain; Department of Informatics & Sussex Neuroscience, University of Sussex, Falmer, Brighton, UK; ISAAC Lab, Aragón Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
- S Amin Moosavi
- Graduate School of Informatics, Kyoto University, Kyoto, Japan; Department of Neuroscience, Brown University, Providence, RI, USA
- Hideaki Shimazaki
- Graduate School of Informatics, Kyoto University, Kyoto, Japan; Center for Human Nature, Artificial Intelligence, and Neuroscience (CHAIN), Hokkaido University, Sapporo, Hokkaido, Japan
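The lowest rung of the ladder this paper unifies can be sketched directly (network size, coupling scale, and initial state below are illustrative assumptions): the parallel-update kinetic Ising model with asymmetric couplings, and its naive mean-field closure — the first-order Plefka approximation — compared against a Monte Carlo simulation of the exact dynamics.

```python
import numpy as np

# Hedged sketch: parallel-update kinetic Ising dynamics with asymmetric
# couplings J, and the naive mean-field closure
#   m_i(t+1) = tanh(H_i + sum_j J_ij m_j(t)),
# the simplest member of the family of Plefka-expansion approximations.
rng = np.random.default_rng(1)
n, T, n_samples = 5, 20, 20000
H = rng.normal(scale=0.2, size=n)
J = rng.normal(scale=0.1 / np.sqrt(n), size=(n, n))  # no symmetry imposed

# Naive mean-field trajectory of the magnetizations
m = np.ones(n)
for _ in range(T):
    m = np.tanh(H + J @ m)

# Monte Carlo simulation of the exact parallel dynamics, same initial state
s = np.ones((n_samples, n))
for _ in range(T):
    theta = H + s @ J.T                            # fields given previous state
    p_up = 1.0 / (1.0 + np.exp(-2.0 * theta))      # P(s_i(t+1) = +1)
    s = np.where(rng.random(s.shape) < p_up, 1.0, -1.0)

print(np.max(np.abs(m - s.mean(axis=0))))  # small when couplings are weak
```

The paper's point is precisely that this naive closure discards correlations; its higher-order projections retain them, which matters near maximally fluctuating regimes where the weak-coupling agreement above breaks down.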
10
Langer C, Ay N. Complexity as Causal Information Integration. Entropy 2020; 22:1107. PMID: 33286876. PMCID: PMC7597220. DOI: 10.3390/e22101107.
Abstract
Complexity measures in the context of the Integrated Information Theory of consciousness try to quantify the strength of the causal connections between different neurons. This is done by minimizing the KL-divergence between a full system and one without causal cross-connections. Various measures have been proposed and compared in this setting. We discuss a class of information-geometric measures that aim at assessing the intrinsic causal cross-influences in a system. One promising candidate among these measures, denoted Φ_CIS, is based on conditional independence statements and satisfies all of the properties that have been postulated as desirable. Unfortunately, it does not have a graphical representation, which makes it less intuitive and difficult to analyze. We propose an alternative approach using a latent variable that models a common exterior influence. This leads to a measure Φ_CII, Causal Information Integration, that satisfies all of the required conditions. Our measure can be calculated using an iterative information-geometric algorithm, the em-algorithm, which allows us to compare its behavior to existing integrated information measures.
Affiliation(s)
- Carlotta Langer
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
- Nihat Ay
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
- Faculty of Mathematics and Computer Science, University of Leipzig, PF 100920, 04009 Leipzig, Germany
- Santa Fe Institute, Santa Fe, NM 87501, USA
11
Zhao X, Hou Y, Song D, Li W. A Confident Information First Principle for Parameter Reduction and Model Selection of Boltzmann Machines. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1608-1621. PMID: 28328513. DOI: 10.1109/tnnls.2017.2664100.
Abstract
Typical dimensionality reduction (DR) methods are data-oriented, focusing on directly reducing the number of random variables (or features) while retaining the maximal variations in the high-dimensional data. Targeting unsupervised situations, this paper aims to address the problem from a novel perspective and considers model-oriented DR in parameter spaces of binary multivariate distributions. Specifically, we propose a general parameter reduction criterion, called confident-information-first (CIF) principle, to maximally preserve confident parameters and rule out less confident ones. Formally, the confidence of each parameter can be assessed by its contribution to the expected Fisher information distance within a geometric manifold over the neighborhood of the underlying real distribution. Then, we demonstrate two implementations of CIF in different scenarios. First, when there are no observed samples, we revisit the Boltzmann machines (BMs) from a model selection perspective and theoretically show that both the fully visible BM and the BM with hidden units can be derived from the general binary multivariate distribution using the CIF principle. This finding would help us uncover and formalize the essential parts of the target density that BM aims to capture and the nonessential parts that BM should discard. Second, when there exist observed samples, we apply CIF to the model selection for BM, which is in turn made adaptive to the observed samples. The sample-specific CIF is a heuristic method to decide the priority order of parameters, which can improve the search efficiency without degrading the quality of model selection results as shown in a series of density estimation experiments.
13
The Geometry of Signal Detection with Applications to Radar Signal Processing. Entropy 2016; 18:381. DOI: 10.3390/e18110381.
14
Takenouchi T. A Novel Parameter Estimation Method for Boltzmann Machines. Neural Computation 2015; 27:2423-2446. PMID: 26378877. DOI: 10.1162/neco_a_00781.
Abstract
We propose a novel estimator for a specific class of probabilistic models on discrete spaces, such as the Boltzmann machine. The proposed estimator is derived from the minimization of a convex risk function and can be constructed without calculating the normalization constant, whose computational cost is of exponential order. We investigate statistical properties of the proposed estimator, such as consistency and asymptotic normality, in the framework of estimating functions. Small experiments show that the proposed estimator attains performance comparable to maximum likelihood estimation at a much lower computational cost and is applicable to high-dimensional data.
16
Nakada Y, Wakahara M, Matsumoto T. Online Bayesian learning with natural sequential prior distribution. IEEE Transactions on Neural Networks and Learning Systems 2014; 25:40-54. PMID: 24806643. DOI: 10.1109/tnnls.2013.2250999.
Abstract
Online Bayesian learning has been successfully applied to online learning for multilayer perceptrons and radial basis functions, typically with the conventional transition model. Although that model is based on the squared norm of the difference between the current and previous parameter vectors, it does not adequately consider the difference between the current and previous observation models. To account for this difference, we propose a natural sequential prior: a transition model that uses the Fisher information matrix to capture the difference between observation models more naturally. For validation, the proposed transition model is applied to an online learning problem for a three-layer perceptron.
17
Dreaming of mathematical neuroscience for half a century. Neural Networks 2013; 37:48-51. DOI: 10.1016/j.neunet.2012.09.014.
18
Galtier MN, Faugeras OD, Bressloff PC. Hebbian learning of recurrent connections: a geometrical perspective. Neural Computation 2012; 24:2346-2383. PMID: 22594830. DOI: 10.1162/neco_a_00322.
Abstract
We show how a Hopfield network with modifiable recurrent connections undergoing slow Hebbian learning can extract the underlying geometry of an input space. First, we use a slow/fast analysis to derive an averaged system whose dynamics derive from an energy function and therefore always converge to equilibrium points. The equilibria reflect the correlation structure of the inputs, a global object extracted through local recurrent interactions only. Second, we use numerical methods to illustrate how learning extracts the hidden geometrical structure of the inputs. Indeed, multidimensional scaling methods make it possible to project the final connectivity matrix onto a Euclidean distance matrix in a high-dimensional space, with the neurons labeled by spatial position within this space. The resulting network structure turns out to be roughly convolutional. The residual of the projection defines the nonconvolutional part of the connectivity, which is minimized in the process. Third, we show how restricting the dimension of the space in which the neurons live gives rise to patterns similar to cortical maps, motivated by an energy-efficiency argument based on wire-length minimization. Finally, we show how this approach leads to the emergence of ocular dominance or orientation columns in primary visual cortex via the self-organization of recurrent rather than feedforward connections. In addition, we establish that the nonconvolutional (or long-range) connectivity is patchy and is co-aligned in the case of orientation learning.
Affiliation(s)
- Mathieu N Galtier
- NeuroMathComp Project Team, INRIA Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France.
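The averaged picture described in this abstract can be caricatured in a few lines. This is a toy sketch under stated assumptions (linear update, made-up sizes and rates), not the paper's Hopfield-network analysis: slow Hebbian growth with decay averages the input outer products, so the recurrent weights settle on the input correlation matrix — the "global object extracted through local interactions".

```python
import numpy as np

# Toy sketch (all sizes and rates are illustrative assumptions): slow Hebbian
# learning with decay, W <- W + eta * (x x^T - W), averages the input outer
# products, so W converges to the input correlation matrix C = E[x x^T].
rng = np.random.default_rng(2)
n, n_steps, eta = 6, 50000, 0.002

A = rng.normal(size=(n, n)) / np.sqrt(n)
C = A @ A.T + 0.5 * np.eye(n)        # known input covariance
Lc = np.linalg.cholesky(C)

W = np.zeros((n, n))
for _ in range(n_steps):
    x = Lc @ rng.normal(size=n)      # correlated zero-mean input
    W += eta * (np.outer(x, x) - W)  # Hebbian growth, multiplicative decay

print(np.max(np.abs(W - C)))         # small: W has extracted C
```

The paper's multidimensional-scaling step then asks what geometry such a correlation-shaped connectivity encodes; the sketch only shows the convergence half of the story.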
19
State-space analysis of time-varying higher-order spike correlation for multiple neural spike train data. PLoS Computational Biology 2012; 8:e1002385. PMID: 22412358. PMCID: PMC3297562. DOI: 10.1371/journal.pcbi.1002385.
Abstract
Precise spike coordination between the spiking activities of multiple neurons is suggested as an indication of coordinated network activity in active cell assemblies. Spike correlation analysis aims to identify such cooperative network activity by detecting excess spike synchrony in simultaneously recorded multiple neural spike sequences. Cooperative activity is expected to organize dynamically during behavior and cognition; therefore currently available analysis techniques must be extended to enable the estimation of multiple time-varying spike interactions between neurons simultaneously. In particular, new methods must take advantage of the simultaneous observations of multiple neurons by addressing their higher-order dependencies, which cannot be revealed by pairwise analyses alone. In this paper, we develop a method for estimating time-varying spike interactions by means of a state-space analysis. Discretized parallel spike sequences are modeled as multi-variate binary processes using a log-linear model that provides a well-defined measure of higher-order spike correlation in an information geometry framework. We construct a recursive Bayesian filter/smoother for the extraction of spike interaction parameters. This method can simultaneously estimate the dynamic pairwise spike interactions of multiple single neurons, thereby extending the Ising/spin-glass model analysis of multiple neural spike train data to a nonstationary analysis. Furthermore, the method can estimate dynamic higher-order spike interactions. To validate the inclusion of the higher-order terms in the model, we construct an approximation method to assess the goodness-of-fit to spike data. In addition, we formulate a test method for the presence of higher-order spike correlation even in nonstationary spike data, e.g., data from awake behaving animals. The utility of the proposed methods is tested using simulated spike data with known underlying correlation dynamics. 
Finally, we apply the methods to neural spike data simultaneously recorded from the motor cortex of an awake monkey and demonstrate that the higher-order spike correlation organizes dynamically in relation to a behavioral demand.

Author summary: Nearly half a century ago, the Canadian psychologist D. O. Hebb postulated the formation of assemblies of tightly connected cells in cortical recurrent networks because of changes in synaptic weight (Hebb's learning rule) by repetitive sensory stimulation of the network. Consequently, the activation of such an assembly for processing sensory or behavioral information is likely to be expressed by precisely coordinated spiking activities of the participating neurons. However, the available analysis techniques for multiple parallel neural spike data do not allow us to reveal the detailed structure of transiently active assemblies as indicated by their dynamical pairwise and higher-order spike correlations. Here, we construct a state-space model of dynamic spike interactions and present a recursive Bayesian method that makes it possible to trace multiple neurons exhibiting such precisely coordinated spiking activities in a time-varying manner. We also formulate a hypothesis test of the underlying dynamic spike correlation, which enables us to detect the assemblies activated in association with behavioral events. The proposed method can therefore serve as a useful tool to test Hebb's cell assembly hypothesis.
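The log-linear model at the heart of this framework can be written down exactly for a tiny stationary example (three neurons with hand-picked θ values; the paper's contribution is making these parameters time-varying and estimating them by recursive Bayesian filtering):

```python
import numpy as np
from itertools import product

# Hedged sketch of a stationary 3-neuron log-linear (Ising-type) spike-word
# model with hand-picked parameters (the paper's thetas are time-varying):
#   p(x) ∝ exp( Σ_i θ_i x_i + Σ_{i<j} θ_ij x_i x_j + θ_123 x_1 x_2 x_3 )
theta1 = np.array([-1.0, -1.0, -1.0])             # first-order (rate) terms
theta2 = {(0, 1): 0.4, (0, 2): 0.4, (1, 2): 0.4}  # pairwise interactions
theta3 = 0.5                                      # triple-wise interaction

words = np.array(list(product([0, 1], repeat=3)))  # all 8 binary spike words
log_p = words @ theta1
for (i, j), th in theta2.items():
    log_p = log_p + th * words[:, i] * words[:, j]
log_p = log_p + theta3 * words[:, 0] * words[:, 1] * words[:, 2]
p = np.exp(log_p)
p /= p.sum()

# Excess synchrony: P(all three fire) versus the same model with theta3 = 0
p_sync = p[-1]
p0 = np.exp(log_p - theta3 * words[:, 0] * words[:, 1] * words[:, 2])
p0 /= p0.sum()
print(bool(p_sync > p0[-1]))  # -> True: positive θ_123 raises full synchrony
```

The triple-wise term is exactly the kind of higher-order structure that pairwise analyses cannot reveal, which motivates the goodness-of-fit test the paper constructs for including it.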
21
Takenouchi T, Ishii S. A multiclass classification method based on decoding of binary classifiers. Neural Computation 2009; 21:2049-2081. PMID: 19292646. DOI: 10.1162/neco.2009.03-08-740.
Abstract
In this letter, we present new methods of multiclass classification that combine multiple binary classifiers. Misclassification of each binary classifier is formulated as a bit inversion error with probabilistic models by making an analogy to the context of information transmission theory. Dependence between binary classifiers is incorporated into our model, which makes a decoder a type of Boltzmann machine. We performed experimental studies using a synthetic data set, data sets from the UCI repository, and bioinformatics data sets, and the results show that the proposed methods are superior to the existing multiclass classification methods.
Affiliation(s)
- Takashi Takenouchi
- Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan.
22
Turchetti C, Crippa P, Pirani M, Biagetti G. Representation of nonlinear random transformations by non-Gaussian stochastic neural networks. IEEE Transactions on Neural Networks 2008; 19:1033-1060. PMID: 18541503. DOI: 10.1109/tnn.2007.2000055.
Abstract
The learning capability of neural networks is equivalent to modeling physical events that occur in the real environment. Several early works demonstrated that neural networks of certain classes are universal approximators of deterministic input-output functions. Recent works extend this ability to the approximation of random functions using a class of networks named stochastic neural networks (SNNs). In the language of system theory, the approximation of both deterministic and stochastic functions falls within the identification of nonlinear memoryless systems. However, the results presented so far are restricted to Gaussian stochastic processes (SPs), or to linear transformations that preserve Gaussianity. This paper investigates the ability of stochastic neural networks to approximate nonlinear input-output random transformations, thus widening the applicability of these networks to nonlinear systems with memory. In particular, this study shows that networks belonging to a class named non-Gaussian stochastic approximate identity neural networks (SAINNs) are capable of approximating the solutions of large classes of nonlinear random ordinary differential transformations. The effectiveness of this approach is demonstrated and discussed through application examples.
Affiliation(s)
- Claudio Turchetti
- DEIT-Dipartimento di Elettronica, Intelligenza Artificiale e Telecomunicazioni, Università Politecnica delle Marche, I-60131 Ancona, Italy.
23
Cousseau F, Ozeki T, Amari SI. Dynamics of Learning in Multilayer Perceptrons Near Singularities. IEEE Trans Neural Netw 2008; 19:1313-28. [DOI: 10.1109/tnn.2008.2000391] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
24
Abstract
We considered a gamma distribution of interspike intervals as a statistical model for neuronal spike generation. A gamma distribution is a natural extension of the Poisson process taking the effect of a refractory period into account. The model is specified by two parameters: a time-dependent firing rate and a shape parameter that characterizes spiking irregularities of individual neurons. Because the environment changes over time, observed data are generated from a model with a time-dependent firing rate, which is an unknown function. A statistical model with an unknown function is called a semiparametric model and is generally very difficult to solve. We used a novel method of estimating functions in information geometry to estimate the shape parameter without estimating the unknown function. We obtained an optimal estimating function analytically for the shape parameter independent of the functional form of the firing rate. This estimation is efficient without Fisher information loss and better than maximum likelihood estimation. We suggest a measure of spiking irregularity based on the estimating function, which may be useful for characterizing individual neurons in changing environments.
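A minimal sketch of rate-free shape estimation from consecutive interspike-interval pairs, in the spirit of the estimating-function approach: the statistic log(4·T1·T2/(T1+T2)²) depends only on the shape parameter when adjacent intervals share the same rate. The moment-inversion estimator and the constant-rate test data below are illustrative, not the paper's exact optimal estimating function:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(0)

def estimate_shape(isi):
    """Estimate the gamma shape parameter k from consecutive ISI pairs.
    For T1, T2 iid Gamma(k, theta), 4*T1*T2/(T1+T2)^2 = 4U(1-U) with
    U ~ Beta(k, k), so E[log(.)] = log 4 + 2*(digamma(k) - digamma(2k))
    independently of the rate; invert this numerically."""
    t1, t2 = isi[:-1], isi[1:]
    s = np.mean(np.log(4 * t1 * t2 / (t1 + t2) ** 2))
    f = lambda k: np.log(4) + 2 * (digamma(k) - digamma(2 * k)) - s
    return brentq(f, 1e-3, 1e3)

isi = rng.gamma(shape=2.0, scale=0.5, size=20000)
print(estimate_shape(isi))  # should be close to the true shape 2.0
```

Because the statistic cancels the scale within each pair, the estimate is insensitive to a slowly varying firing rate, which is the essence of the semiparametric setting described above.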
Affiliation(s)
- Keiji Miura
- Department of Physics, Kyoto University, Kyoto 606-8502, and Intelligent Cooperation and Control, PRESTO, JST, Chiba 277-8561, Japan.
25
Tatsuno M, Okada M. Investigation of possible neural architectures underlying information-geometric measures. Neural Comput 2004; 16:737-65. [PMID: 15025828 DOI: 10.1162/089976604322860686] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
A novel analytical method based on information geometry was recently proposed, and this method may provide useful insights into the statistical interactions within neural groups. The link between information-geometric measures and the structure of neural interactions has not yet been elucidated, however, because of the ill-posed nature of the problem. Here, possible neural architectures underlying information-geometric measures are investigated using an isolated pair and an isolated triplet of model neurons. By assuming the existence of equilibrium states, we derive analytically the relationship between the information-geometric parameters and these simple neural architectures. For symmetric networks, the first- and second-order information-geometric parameters represent, respectively, the external input and the underlying connections between the neurons provided that the number of neurons used in the parameter estimation in the log-linear model and the number of neurons in the network are the same. For asymmetric networks, however, these parameters are dependent on both the intrinsic connections and the external inputs to each neuron. In addition, we derive the relation between the information-geometric parameter corresponding to the two-neuron interaction and a conventional cross-correlation measure. We also show that the information-geometric parameters vary depending on the number of neurons assumed for parameter estimation in the log-linear model. This finding suggests a need to examine the information-geometric method carefully. A possible criterion for choosing an appropriate orthogonal coordinate is also discussed. This article points out the importance of a model-based approach and sheds light on the possible neural structure underlying the application of information geometry to neural network analysis.
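For an isolated pair of binary neurons, the information-geometric (log-linear) parameters can be read off directly from the joint firing probabilities; a minimal sketch with illustrative probabilities:

```python
import numpy as np

# Log-linear coordinates for two binary neurons:
#   log p(x1, x2) = theta1*x1 + theta2*x2 + theta12*x1*x2 - psi.
# The joint probabilities below are illustrative.
p = {  # states are (x1, x2)
    (0, 0): 0.5, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.1,
}

theta1 = np.log(p[(1, 0)] / p[(0, 0)])
theta2 = np.log(p[(0, 1)] / p[(0, 0)])
theta12 = np.log(p[(1, 1)] * p[(0, 0)] / (p[(1, 0)] * p[(0, 1)]))

# theta12 > 0 indicates excess coincident firing beyond independence.
print(round(theta12, 3))
```

The second-order parameter theta12 is a log odds ratio, orthogonal (in the Fisher-information sense) to the marginal firing rates, which is what makes the decomposition useful for separating inputs from connections.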
Affiliation(s)
- Masami Tatsuno
- ARL Division of Neural Systems, Memory and Aging, The University of Arizona, Tucson, AZ 85724-5115, USA.
26
Abstract
From a smooth, strictly convex function Φ: Rn → R, a parametric family of divergence functions DΦ(α) may be introduced: [Formula: see text] for x, y ∈ int dom(Φ) and for α ∈ R, with DΦ(±1) defined through taking the limit of α. Each member is shown to induce an α-independent Riemannian metric, as well as a pair of dual α-connections, which are generally nonflat, except for α = ±1. In the latter case, DΦ(±1) reduces to the (nonparametric) Bregman divergence, which is representable using Φ and its convex conjugate Φ* and becomes the canonical divergence for dually flat spaces (Amari, 1982, 1985; Amari & Nagaoka, 2000). This formulation based on convex analysis naturally extends the information-geometric interpretation of divergence functions (Eguchi, 1983) to allow the distinction between two different kinds of duality: referential duality (α ↔ -α) and representational duality (Φ ↔ Φ*). When applied to (not necessarily normalized) probability densities, the concept of conjugated representations of densities is introduced, so that ±α-connections defined on probability densities embody both referential and representational duality and are hence themselves bidual. When restricted to a finite-dimensional affine submanifold, the natural parameters of a certain representation of densities and the expectation parameters under its conjugate representation form biorthogonal coordinates. The alpha representation (indexed by β now, β ∈ [-1, 1]) is shown to be the only measure-invariant representation. The resulting two-parameter family of divergence functionals D(α, β), (α, β) ∈ [-1, 1] × [-1, 1], induces identical Fisher information but bidual alpha-connection pairs; it reduces in form to Amari's alpha-divergence family when α = ±1 or when β = 1, but to the family of Jensen difference (Rao, 1987) when β = -1.
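For reference, the elided divergence family has the standard convex-mixture form, stated here as an assumption consistent with the limits described in the abstract:

```latex
% Divergence induced by a smooth, strictly convex \Phi, for \alpha \in (-1, 1):
D_{\Phi}^{(\alpha)}(x, y) = \frac{4}{1-\alpha^{2}}
  \left[ \frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y)
       - \Phi\!\left( \frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y \right) \right]
% As \alpha \to \pm 1 this converges to the Bregman divergences
% B_\Phi(x, y) and B_\Phi(y, x), matching the dually flat case above.
```

Convexity of Φ makes the bracketed Jensen gap nonnegative, so each member of the family is a valid divergence.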
Affiliation(s)
- Jun Zhang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA.
27
Abstract
This study introduces information-geometric measures to analyze neural firing patterns by taking not only the second-order but also higher-order interactions among neurons into account. Information geometry provides useful tools and concepts for this purpose, including the orthogonality of coordinate parameters and the Pythagoras relation in the Kullback-Leibler divergence. Based on this orthogonality, we show a novel method for analyzing spike firing patterns by decomposing the interactions of neurons of various orders. As a result, purely pairwise, triple-wise, and higher-order interactions are singled out. We also demonstrate the benefits of our proposal by using several examples.
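The decomposition can be illustrated on an isolated triplet: the purely triple-wise parameter is an alternating sum of log-probabilities over the 2³ firing patterns, and it vanishes for independent neurons. A minimal sketch (the probabilities are illustrative):

```python
import numpy as np
from itertools import product

# Purely triple-wise interaction theta_123 of three binary neurons, from the
# log-linear expansion
#   log p(x) = sum_i th_i x_i + sum_{i<j} th_ij x_i x_j + th_123 x1 x2 x3 - psi.
def theta_123(p):
    """Alternating (Moebius) sum of log-probabilities over the 2^3 states."""
    total = 0.0
    for x in product([0, 1], repeat=3):
        sign = (-1) ** (3 - sum(x))      # +1 when an even number of zeros
        total += sign * np.log(p[x])
    return total

# Independent neurons: log p is first-order in x, so theta_123 must vanish.
q = (0.3, 0.6, 0.5)
p_ind = {x: np.prod([q[i] if x[i] else 1 - q[i] for i in range(3)])
         for x in product([0, 1], repeat=3)}
print(abs(theta_123(p_ind)) < 1e-9)  # → True
```

The orthogonality mentioned above means this triple-wise coordinate can be tested for zero without being confounded by firing rates or pairwise correlations.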
Affiliation(s)
- Hiroyuki Nakahara
- Laboratory for Mathematical Neuroscience, RIKEN Brain Science Institute, Wako, Saitama, 351-0198, Japan.
28
Albizuri F, Gonzalez A, Graña M, d'Anjou A. Neural learning for distributions on categorical data. Neurocomputing 2000. [DOI: 10.1016/s0925-2312(00)00291-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
29
Kappen HJ, Spanjers JJ. Mean field theory for asymmetric neural networks. PHYSICAL REVIEW. E, STATISTICAL PHYSICS, PLASMAS, FLUIDS, AND RELATED INTERDISCIPLINARY TOPICS 2000; 61:5658-5663. [PMID: 11031623 DOI: 10.1103/physreve.61.5658] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/1999] [Indexed: 05/23/2023]
Abstract
The computation of mean firing rates and correlations is intractable for large neural networks. For symmetric networks one can derive mean field approximations using the Taylor series expansion of the free energy as proposed by Plefka. In asymmetric networks, the concept of free energy is absent. Therefore, it is not immediately obvious how to extend this method to asymmetric networks. In this paper we extend Plefka's approach to asymmetric networks and in fact to arbitrary probability distributions. The method is based on an information geometric argument. The method is illustrated for asymmetric neural networks with sequential dynamics. We compare our approximate analytical results with Monte Carlo simulations for a network of 100 neurons. It is shown that the quality of the approximation for asymmetric networks is as good as for symmetric networks.
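The lowest order of such an expansion is the naive mean-field fixed point, which applies verbatim to asymmetric couplings; a minimal sketch for a 100-neuron network (the couplings and damping schedule are illustrative, and the higher-order corrections derived in the paper are omitted):

```python
import numpy as np

# Naive mean-field approximation for a network of +/-1 units with possibly
# asymmetric couplings W and biases h: iterate m_i = tanh(h_i + sum_j W_ij m_j).
def mean_field_rates(W, h, iters=200, damping=0.5):
    m = np.zeros(len(h))
    for _ in range(iters):
        m = (1 - damping) * np.tanh(h + W @ m) + damping * m  # damped update
    return m

rng = np.random.default_rng(1)
n = 100
W = rng.normal(scale=0.1 / np.sqrt(n), size=(n, n))  # asymmetric: W != W.T
h = rng.normal(scale=0.1, size=n)
m = mean_field_rates(W, h)
print(np.all(np.abs(m) < 1))  # mean activities stay in (-1, 1) → True
```

Nothing in this iteration requires W to be symmetric, which reflects the paper's point that the expansion can be organized without a free energy.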
Affiliation(s)
- H J Kappen
- SNN University of Nijmegen, The Netherlands
30
Abstract
We introduce an efficient method for learning and inference in higher order Boltzmann machines. The method is based on mean field theory with the linear response correction. We compute the correlations using the exact and the approximated method for a fully connected third order network of ten neurons. In addition, we compare the results of the exact and approximate learning algorithm. Finally we use the presented method to solve the shifter problem. We conclude that the linear response approximation gives good results as long as the couplings are not too large.
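A minimal sketch of the linear-response recipe in the pairwise case (the paper treats third-order networks; the couplings here are illustrative): solve the mean-field equations for the magnetizations m, then approximate the connected correlations by inverting A_ij = δ_ij/(1-m_i²) - w_ij:

```python
import numpy as np

def linear_response_correlations(w, h, iters=500):
    """Mean-field magnetizations plus linear-response correlation estimate
    for a pairwise Boltzmann machine with symmetric couplings w, biases h."""
    n = len(h)
    m = np.zeros(n)
    for _ in range(iters):                   # naive mean-field fixed point
        m = np.tanh(h + w @ m)
    A = np.diag(1.0 / (1.0 - m**2)) - w
    C = np.linalg.inv(A)                     # approximate connected correlations
    return m, C

n = 5
rng = np.random.default_rng(2)
w = rng.normal(scale=0.1, size=(n, n))
w = (w + w.T) / 2                            # symmetric couplings
np.fill_diagonal(w, 0)
h = rng.normal(scale=0.1, size=n)
m, C = linear_response_correlations(w, h)
print(bool(np.allclose(C, C.T)))  # → True
```

As the abstract notes, this approximation is accurate for weak couplings; for large couplings both the fixed-point iteration and the inversion degrade.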
Affiliation(s)
- M A Leisink
- Department of Biophysics, University of Nijmegen, The Netherlands. www.mbfys.kun.nl/martijn
31
Abstract
When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution). The dynamical behavior of natural gradient online learning is analyzed and is proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters. This suggests that the plateau phenomenon, which appears in the backpropagation learning algorithm of multilayer perceptrons, might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.
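A toy sketch of natural-gradient learning for a two-parameter Gaussian model, with the Fisher matrix estimated from the empirical outer product of per-sample scores (the model, parameterization, and step size are illustrative; closed-form Fisher information exists for this model):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.5, size=5000)

def score(x, mu, log_sigma):
    """Per-sample gradient of the Gaussian log-density w.r.t. (mu, log sigma)."""
    s = np.exp(log_sigma)
    d_mu = (x - mu) / s**2
    d_ls = (x - mu) ** 2 / s**2 - 1.0
    return np.stack([d_mu, d_ls], axis=-1)

def natural_gradient_step(mu, log_sigma, lr=0.5):
    g = score(data, mu, log_sigma)           # per-sample scores, shape (N, 2)
    grad = g.mean(axis=0)                    # gradient of mean log-likelihood
    fisher = g.T @ g / len(data)             # empirical Fisher matrix
    step = np.linalg.solve(fisher + 1e-8 * np.eye(2), grad)
    return mu + lr * step[0], log_sigma + lr * step[1]

mu, ls = 0.0, 0.0
for _ in range(50):
    mu, ls = natural_gradient_step(mu, ls)
print(mu, np.exp(ls))  # should be close to 2.0 and 1.5
```

Preconditioning by the Fisher matrix rescales the step to the information geometry of the parameter space, which is what removes the plateaus that plain gradient ascent exhibits.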
32
Ma S, Ji C. Fast training of recurrent networks based on the EM algorithm. IEEE Trans Neural Netw 1998; 9:11-26. [DOI: 10.1109/72.655025] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
33
34
Farmer J, Ji C, Ma S. An Efficient EM-based Training Algorithm for Feedforward Neural Networks. Neural Netw 1997; 10:243-256. [PMID: 12662523 DOI: 10.1016/s0893-6080(96)00049-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
A fast training algorithm is developed for two-layer feedforward neural networks based on a probabilistic model for hidden representations and the EM algorithm. The algorithm decomposes training the original two-layer networks into training a set of single neurons. The individual neurons are then trained via a linear weighted regression algorithm. Significant improvements in training speed have been obtained with this algorithm on several benchmark problems. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
35
Kogiantis AG, Papantoni-Kazakos T. Operations and learning in neural networks for robust prediction. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS. PART B, CYBERNETICS : A PUBLICATION OF THE IEEE SYSTEMS, MAN, AND CYBERNETICS SOCIETY 1997; 27:402-11. [PMID: 18255880 DOI: 10.1109/3477.584948] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
We consider stochastic neural networks, the objective of which is robust prediction for spatial control. We develop neural structures and operations, in which the representations of the environment are preprocessed and provided in quantized format to the prediction layer, and in which the response of each neuron is binary. We also identify the pertinent stochastic network parameters, and subsequently develop a supervised learning algorithm for them. The on-line learning algorithm is based on the Kullback-Leibler performance criterion, it induces backpropagation, and guarantees fast convergence to the prediction probabilities induced by the environment, with probability one.
Affiliation(s)
- A G Kogiantis
- Dept. of Electr. & Comput. Eng., Univ. of Southwestern Louisiana, Lafayette, LA
36
Yatsenko V. Determining the characteristics of water pollutants by neural sensors and pattern recognition methods. J Chromatogr A 1996; 722:233-43. [PMID: 9019297 DOI: 10.1016/0021-9673(95)00571-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
In this paper we investigate the influence of pollutants on biological objects such as photosynthesizing systems, in order to assess their capabilities and features as controlling sensors in integrated ecological monitoring microsystems. We propose to build the intelligent sensor on the basis of: (1) neural network technologies; (2) the possibility of separating the characteristics of the substances dissolved in water by methods that recognize patterns in a functional space of fluorescence curves; (3) the results of the chromatographic analysis of standard water samples. This sensor makes it possible to predict the water state and to make optimal decisions for correcting an ecosystem's condition. The efficiency of such a system for water analysis can be improved using the dual measurement principle, which calls for identifying a biosensor model from experimental data.
Affiliation(s)
- V Yatsenko
- Department of Control Systems, Institute of Cybernetics, Kiev, Ukraine
37
Martignon L, Von Hasseln H, Grün S, Aertsen A, Palm G. Detecting higher-order interactions among the spiking events in a group of neurons. BIOLOGICAL CYBERNETICS 1995; 73:69-81. [PMID: 7654851 DOI: 10.1007/bf00199057] [Citation(s) in RCA: 52] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
We propose a formal framework for the description of interactions among groups of neurons. This framework is not restricted to the common case of pair interactions, but also incorporates higher-order interactions, which cannot be reduced to lower-order ones. We derive quantitative measures to detect the presence of such interactions in experimental data, by statistical analysis of the frequency distribution of higher-order correlations in multiple neuron spike train data. Our first step is to represent a frequency distribution as a Markov field on the minimal graph it induces. We then show the invariance of this graph with regard to changes of state. Clearly, only linear Markov fields can be adequately represented by graphs. Higher-order interdependencies, which are reflected by the energy expansion of the distribution, require more complex graphical schemes, like constellations or assembly diagrams, which we introduce and discuss. The coefficients of the energy expansion not only point to the interactions among neurons but are also a measure of their strength. We investigate the statistical meaning of detected interactions in an information theoretic sense and propose minimum relative entropy approximations as null hypotheses for significance tests. We demonstrate the various steps of our method in the situation of an empirical frequency distribution on six neurons, extracted from data on simultaneous multineuron recordings from the frontal cortex of a behaving monkey and close with a brief outlook on future work.
Affiliation(s)
- L Martignon
- Department of Neural Information Processing, University of Ulm, Germany
38
Anderson NH, Titterington DM. Beyond the binary Boltzmann machine. IEEE TRANSACTIONS ON NEURAL NETWORKS 1995; 6:1229-1236. [PMID: 18263410 DOI: 10.1109/72.410364] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
The definition of the usual Boltzmann machine is extended to allow for neurons with polytomous (multicategory) rather than simply binary responses. Updating rules are defined along with the associated stationary distributions, and an alternating minimization method is described for the purposes of training. Emphasis is placed on the relevance of statistical ideas, including polytomous logistic regression, the iterative proportional fitting procedure and the EM algorithm.
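The polytomous updating rule amounts to sampling each unit's category from a softmax over the energies of its K candidate states; a minimal Gibbs-sampling sketch (the network size, couplings, and absence of bias terms are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3  # N polytomous units, each taking one of K categories

# Pairwise couplings between (unit i, category a) and (unit j, category b).
W = rng.normal(scale=0.3, size=(N, K, N, K))
W = (W + W.transpose(2, 3, 0, 1)) / 2        # symmetrize for a stationary dist.
for i in range(N):
    W[i, :, i, :] = 0.0                      # no self-coupling

def gibbs_sweep(state):
    """One sweep of categorical Gibbs updates: each unit samples its
    category with probability proportional to exp(local field)."""
    for i in range(N):
        field = np.array([sum(W[i, a, j, state[j]] for j in range(N))
                          for a in range(K)])
        p = np.exp(field - field.max())      # softmax over K candidate states
        p /= p.sum()
        state[i] = rng.choice(K, p=p)
    return state

state = rng.integers(K, size=N)
for _ in range(100):
    state = gibbs_sweep(state)
print(all(0 <= int(s) < K for s in state))  # → True
```

With K = 2 and a suitable encoding, the softmax reduces to the usual logistic update of the binary Boltzmann machine, which is the sense in which the polytomous model is an extension.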
39
40
Abstract
Hidden units play an important role in neural networks, although their activation values are unknown in many learning situations. The EM algorithm (statistical algorithm) and the em algorithm (information-geometric one) have been proposed so far in this connection, and the effectiveness of such algorithms is recognized in many areas of research. The present note points out that these two algorithms are equivalent under a certain condition, although they are different in general.
Affiliation(s)
- Shun-ichi Amari
- Department of Mathematical Engineering, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan
41
Kosmatopoulos EB, Christodoulou MA. The Boltzmann g-RHONN: A learning machine for estimating unknown probability distributions. Neural Netw 1994. [DOI: 10.1016/0893-6080(94)90021-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
42
43