1
Verzelli P, Alippi C, Livi L, Tino P. Input-to-State Representation in Linear Reservoirs Dynamics. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4598-4609. [PMID: 33651697] [DOI: 10.1109/tnnls.2021.3059389]
Abstract
Reservoir computing is a popular approach to design recurrent neural networks, due to its training simplicity and approximation performance. The recurrent part of these networks is not trained (e.g., via gradient descent), making them appealing for analytical studies by a large community of researchers with backgrounds spanning from dynamical systems to neuroscience. However, even in the simple linear case, the working principle of these networks is not fully understood and their design is usually driven by heuristics. A novel analysis of the dynamics of such networks is proposed, which allows the investigator to express the state evolution using the controllability matrix. Such a matrix encodes salient characteristics of the network dynamics; in particular, its rank represents an input-independent measure of the memory capacity of the network. Using the proposed approach, it is possible to compare different reservoir architectures and explain why a cyclic topology achieves favorable results as verified by practitioners.
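To make the controllability-matrix view summarized above concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code; the reservoir size, spectral radius, and random seed are arbitrary choices). It builds the matrix K = [w_in, W w_in, W^2 w_in, ...] for a dense random reservoir and for a cyclic one and reports its rank, the input-independent memory measure mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                        # reservoir size (arbitrary)

# Dense random reservoir, rescaled to spectral radius 0.9
W = rng.standard_normal((N, N))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()
w_in = rng.standard_normal(N)                 # input weight vector

# Cyclic (ring) reservoir with a single uniform weight per connection
W_cyc = 0.9 * np.roll(np.eye(N), 1, axis=0)

def controllability_rank(W, w_in, horizon):
    """Rank of K = [w_in, W w_in, ..., W^(horizon-1) w_in]."""
    cols, v = [], w_in.copy()
    for _ in range(horizon):
        cols.append(v)
        v = W @ v
    return np.linalg.matrix_rank(np.column_stack(cols))

print("dense reservoir rank :", controllability_rank(W, w_in, N))
print("cyclic reservoir rank:", controllability_rank(W_cyc, w_in, N))
```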
2
Gain-scheduled state estimation for discrete-time complex networks under bit-rate constraints. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.03.002]
3
Dynamical Systems Implementation of Intrinsic Sentence Meaning. Minds Mach (Dordr) 2022. [DOI: 10.1007/s11023-022-09590-1]
4
Oliva C, Lago-Fernández LF. Stability of internal states in recurrent neural networks trained on regular languages. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.058]
5
Wang HT, Liu ZT, He Y. Exponential stability criterion of the switched neural networks with time-varying delay. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.11.022]
6
Wang Q, Zhang K, Ororbia AG II, Xing X, Liu X, Giles CL. An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks. Neural Comput 2018; 30:2568-2591. [PMID: 30021081] [DOI: 10.1162/neco_a_01111]
Abstract
Rule extraction from black box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly nonlinear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order RNNs trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases, the rules outperform the trained RNNs.
Affiliation(s)
- Kaixuan Zhang, Pennsylvania State University, State College, PA 16801, U.S.A.
- Xinyu Xing, Pennsylvania State University, State College, PA 16801, U.S.A.
- Xue Liu, McGill University, Montreal, Quebec H3A 0G4, Canada
- C Lee Giles, Pennsylvania State University, State College, PA 16801, U.S.A.
7
Luo Y, Song B, Liang J, Dobaie AM. Finite-time state estimation for jumping recurrent neural networks with deficient transition probabilities and linear fractional uncertainties. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.04.039]
8
Shen H, Zhu Y, Zhang L, Park JH. Extended Dissipative State Estimation for Markov Jump Neural Networks With Unreliable Links. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:346-358. [PMID: 26761905] [DOI: 10.1109/tnnls.2015.2511196]
Abstract
This paper is concerned with the problem of extended dissipativity-based state estimation for discrete-time Markov jump neural networks (NNs), where the variation of the piecewise time-varying transition probabilities of the Markov chain is subject to a set of switching signals satisfying an average dwell-time property. The communication links between the NNs and the estimator are assumed to be imperfect, where the phenomena of signal quantization and data packet dropouts occur simultaneously. The aim of this paper is to contribute a Markov switching estimator design method, which ensures that the resulting error system is extended stochastically dissipative in the simultaneous presence of packet dropouts and signal quantization stemming from unreliable communication links. Sufficient conditions for the solvability of such a problem are established. Based on the derived conditions, an explicit expression of the desired Markov switching estimator is presented. Finally, two illustrative examples are given to show the effectiveness of the proposed design method.
9
Prakash M, Balasubramaniam P, Lakshmanan S. Synchronization of Markovian jumping inertial neural networks and its applications in image encryption. Neural Netw 2016; 83:86-93. [DOI: 10.1016/j.neunet.2016.07.001]
10
Cui J, Liu Z, Xu L. Modelling and simulation for table tennis referee regulation based on finite state machine. J Sports Sci 2016; 35:1888-1896. [PMID: 27733099] [DOI: 10.1080/02640414.2016.1241417]
Abstract
As referees' decisions in traditional table tennis matches are made manually, many factors in a match, such as fatigue and subjective tendency, may lead to unjust decisions. Based on a finite state machine (FSM), this paper presents a model for table tennis referee regulation that substitutes for manual decisions. In this model, the trajectory of the ball is recorded through a binocular visual system, while the complete rules extracted from the International Table Tennis Federation (ITTF) rules are described as an FSM. The final decision for the competition is made based on expert system theory. Simulation results show that the proposed model has high accuracy and can be generalised to other similar games such as badminton and volleyball.
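To illustrate the general idea of encoding match rules as an FSM, here is a hypothetical, heavily simplified fragment (illustration only; the paper's model covers the full ITTF rules and is driven by ball trajectories from a binocular vision system, and the states and events below are invented for this sketch):

```python
RALLY_FSM = {
    # (state, event) -> next state
    ("serve",   "ball_hits_server_court"):   "served",
    ("served",  "ball_hits_receiver_court"): "in_play",
    ("served",  "ball_misses_table"):        "point_to_receiver",
    ("in_play", "ball_hits_opponent_court"): "in_play",
    ("in_play", "ball_misses_table"):        "point_to_non_striker",
    ("in_play", "ball_hits_net_and_stops"):  "point_to_non_striker",
}

def referee(events, start="serve"):
    """Drive the observed event sequence through the FSM and return the decision state."""
    state = start
    for ev in events:
        state = RALLY_FSM.get((state, ev), "let")   # unmodelled event -> let (replay)
        if state.startswith("point"):
            break
    return state

print(referee(["ball_hits_server_court",
               "ball_hits_receiver_court",
               "ball_misses_table"]))               # -> point_to_non_striker
```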
Affiliation(s)
- Jianjiang Cui, School of Information Science and Engineering, Northeastern University, Shenyang, China
- Zixuan Liu, School of Information Science and Engineering, Northeastern University, Shenyang, China
- Long Xu, School of Information Science and Engineering, Northeastern University, Shenyang, China
11
Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2014; 61:85-117. [PMID: 25462637] [DOI: 10.1016/j.neunet.2014.09.003]
Abstract
In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and Deep Learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
Affiliation(s)
- Jürgen Schmidhuber, Swiss AI Lab IDSIA, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, University of Lugano & SUPSI, Galleria 2, 6928 Manno-Lugano, Switzerland.
12
Beer RD, Williams PL. Information Processing and Dynamics in Minimally Cognitive Agents. Cogn Sci 2014; 39:1-38. [DOI: 10.1111/cogs.12142]
Affiliation(s)
- Randall D. Beer, Cognitive Science Program, School of Informatics and Computing, Indiana University
13
Abstract
The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty.
14
The Linguistic Subversion of Mental Representation. Minds Mach (Dordr) 2012. [DOI: 10.1007/s11023-012-9275-1]
15
16
Petersson KM, Folia V, Hagoort P. What artificial grammar learning reveals about the neurobiology of syntax. Brain and Language 2012; 120:83-95. [PMID: 20943261] [DOI: 10.1016/j.bandl.2010.08.003]
Abstract
In this paper we examine the neurobiological correlates of syntax, the processing of structured sequences, by comparing FMRI results on artificial and natural language syntax. We discuss these and similar findings in the context of formal language and computability theory. We used a simple right-linear unification grammar in an implicit artificial grammar learning paradigm in 32 healthy Dutch university students (natural language FMRI data were already acquired for these participants). We predicted that artificial syntax processing would engage the left inferior frontal region (BA 44/45) and that this activation would overlap with syntax-related variability observed in the natural language experiment. The main findings of this study show that the left inferior frontal region centered on BA 44/45 is active during artificial syntax processing of well-formed (grammatical) sequence independent of local subsequence familiarity. The same region is engaged to a greater extent when a syntactic violation is present and structural unification becomes difficult or impossible. The effects related to artificial syntax in the left inferior frontal region (BA 44/45) were essentially identical when we masked these with activity related to natural syntax in the same subjects. Finally, the medial temporal lobe was deactivated during this operation, consistent with the view that implicit processing does not rely on declarative memory mechanisms that engage the medial temporal lobe. In the context of recent FMRI findings, we raise the question whether Broca's region (or subregions) is specifically related to syntactic movement operations or the processing of hierarchically nested non-adjacent dependencies in the discussion section. We conclude that this is not the case. Instead, we argue that the left inferior frontal region is a generic on-line sequence processor that unifies information from various sources in an incremental and recursive manner, independent of whether there are any processing requirements related to syntactic movement or hierarchically nested structures. In addition, we argue that the Chomsky hierarchy is not directly relevant for neurobiological systems.
Affiliation(s)
- Karl-Magnus Petersson, Max Planck Institute for Psycholinguistics, P.O. Box 310, NL-6500 AH Nijmegen, The Netherlands.
17
Wu ZG, Shi P, Su H, Chu J. Delay-Dependent Stability Analysis for Switched Neural Networks With Time-Varying Delay. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2011; 41:1522-30. [DOI: 10.1109/tsmcb.2011.2157140]
18
19
Liu Y, Wang Z, Liang J, Liu X. Stability and Synchronization of Discrete-Time Markovian Jumping Neural Networks With Mixed Mode-Dependent Time Delays. IEEE Transactions on Neural Networks 2009; 20:1102-16. [DOI: 10.1109/tnn.2009.2016210]
20
Liu Y, Wang Z, Liu X. On Global Stability of Delayed BAM Stochastic Neural Networks with Markovian Switching. Neural Process Lett 2009. [DOI: 10.1007/s11063-009-9107-3]
21
Wang Z, Liu Y, Liu X. State estimation for jumping recurrent neural networks with discrete and distributed delays. Neural Netw 2008; 22:41-8. [PMID: 19041222] [DOI: 10.1016/j.neunet.2008.09.015]
Abstract
This paper is concerned with the state estimation problem for a class of Markovian neural networks with discrete and distributed time-delays. The neural networks have a finite number of modes, and the modes may jump from one to another according to a Markov chain. The main purpose is to estimate the neuron states, through available output measurements, such that for all admissible time-delays, the dynamics of the estimation error is globally asymptotically stable in the mean square. An effective linear matrix inequality approach is developed to solve the neuron state estimation problem. Both the existence conditions and the explicit characterization of the desired estimator are derived. Furthermore, it is shown that the traditional stability analysis issue for delayed neural networks with Markovian jumping parameters can be included as a special case of our main results. Finally, numerical examples are given to illustrate the applicability of the proposed design method.
Affiliation(s)
- Zidong Wang, Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK.
22
Liu Y, Wang Z, Liang J, Liu X. Synchronization and State Estimation for Discrete-Time Complex Networks With Distributed Delays. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2008; 38:1314-25. [PMID: 18784014] [DOI: 10.1109/tsmcb.2008.925745]
Affiliation(s)
- Yurong Liu, Department of Mathematics, Yangzhou University, Yangzhou 225002, China.
23
Cartling B. On the implicit acquisition of a context-free grammar by a simple recurrent neural network. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.05.006]
24
25
Abstract
This letter presents an algorithm, CrySSMEx, for extracting minimal finite state machine descriptions of dynamic systems such as recurrent neural networks. Unlike previous algorithms, CrySSMEx is parameter free and deterministic, and it efficiently generates a series of increasingly refined models. A novel finite stochastic model of dynamic systems and a novel vector quantization function have been developed to take into account the state-space dynamics of the system. The experiments show that (1) extraction from systems that can be described as regular grammars is trivial, (2) extraction from high-dimensional systems is feasible, and (3) extraction of approximative models from chaotic systems is possible. The results are promising, and an analysis of shortcomings suggests some possible further improvements. Some largely overlooked connections of the field of rule extraction from recurrent neural networks to other fields are also identified.
26
Sun GZ, Giles CL, Chen HH. The neural network pushdown automaton: Architecture, dynamics and training. Lecture Notes in Computer Science 2006. [DOI: 10.1007/bfb0054003]
27
Abstract
We investigate possibilities of inducing temporal structures without fading memory in recurrent networks of spiking neurons strictly operating in the pulse-coding regime. We extend the existing gradient-based algorithm for training feedforward spiking neuron networks, SpikeProp (Bohte, Kok, & La Poutré, 2002), to recurrent network topologies, so that temporal dependencies in the input stream are taken into account. It is shown that temporal structures with unbounded input memory specified by simple Moore machines (MM) can be induced by recurrent spiking neuron networks (RSNN). The networks are able to discover pulse-coded representations of abstract information processing states coding potentially unbounded histories of processed inputs. We show that it is often possible to extract from trained RSNN the target MM by grouping together similar spike trains appearing in the recurrent layer. Even when the target MM was not perfectly induced in a RSNN, the extraction procedure was able to reveal weaknesses of the induced mechanism and the extent to which the target machine had been learned.
Affiliation(s)
- Ashely J. S. Mills, School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
28
Abstract
Rule extraction (RE) from recurrent neural networks (RNNs) refers to finding models of the underlying RNN, typically in the form of finite state machines, that mimic the network to a satisfactory degree while having the advantage of being more transparent. RE from RNNs can be argued to allow a deeper and more profound form of analysis of RNNs than other, more or less ad hoc methods. RE may give us understanding of RNNs in the intermediate levels between quite abstract theoretical knowledge of RNNs as a class of computing devices and quantitative performance evaluations of RNN instantiations. The development of techniques for extraction of rules from RNNs has been an active field since the early 1990s. This article reviews the progress of this development and analyzes it in detail. In order to structure the survey and evaluate the techniques, a taxonomy specifically designed for this purpose has been developed. Moreover, important open research issues are identified that, if addressed properly, possibly can give the field a significant push forward.
Affiliation(s)
- Henrik Jacobsson, School of Humanities and Informatics, University of Skövde, Skövde, Sweden, and Department of Computer Science, University of Sheffield, United Kingdom
29
30
Robust Simulations of Turing Machines with Analytic Maps and Flows. New Computational Paradigms 2005. [DOI: 10.1007/11494645_21]
31
32
Šter B. An integrated learning approach to environment modelling in mobile robot navigation. Neurocomputing 2004. [DOI: 10.1016/j.neucom.2003.10.005]
33
Vahed A, Omlin CW. A Machine Learning Method for Extracting Symbolic Knowledge from Recurrent Neural Networks. Neural Comput 2004; 16:59-71. [PMID: 15006023] [DOI: 10.1162/08997660460733994]
Abstract
Neural networks do not readily provide an explanation of the knowledge stored in their weights as part of their information processing. Until recently, neural networks were considered to be black boxes, with the knowledge stored in their weights not readily accessible. Since then, research has resulted in a number of algorithms for extracting knowledge in symbolic form from trained neural networks. This article addresses the extraction of knowledge in symbolic form from recurrent neural networks trained to behave like deterministic finite-state automata (DFAs). To date, methods used to extract knowledge from such networks have relied on the hypothesis that networks' states tend to cluster and that clusters of network states correspond to DFA states. The computational complexity of such a cluster analysis has led to heuristics that either limit the number of clusters that may form during training or limit the exploration of the space of hidden recurrent state neurons. These limitations, while necessary, may lead to decreased fidelity, in which the extracted knowledge may not model the true behavior of a trained network, perhaps not even for the training set. The method proposed here uses a polynomial time, symbolic learning algorithm to infer DFAs solely from the observation of a trained network's input-output behavior. Thus, this method has the potential to increase the fidelity of the extracted knowledge.
Affiliation(s)
- A Vahed, Department of Computer Science, University of the Western Cape, Bellville, South Africa.
34
Tino P, Cernanský M, Benusková L. Markovian Architectural Bias of Recurrent Neural Networks. IEEE Transactions on Neural Networks 2004; 15:6-15. [PMID: 15387243] [DOI: 10.1109/tnn.2003.820839]
Abstract
In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, while not throwing away the continuous state space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models, called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during the training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure.
Index Terms: complex symbolic sequences, information latching problem, iterative function systems, Markov models, recurrent neural networks (RNNs).
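A rough sketch of the neural prediction machine (NPM) idea described above (our own toy illustration, assuming NumPy and scikit-learn; the weight scale, number of clusters, and driving source are arbitrary choices, not the authors' experimental setup): drive an untrained small-weight sigmoid RNN with a symbol sequence, cluster its recurrent activations, and estimate next-symbol probabilities per cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
alphabet = ["a", "b"]

# Toy driving sequence from a "sticky" two-symbol Markov source
seq, s = [], 0
for _ in range(2000):
    s = s if rng.random() < 0.9 else 1 - s
    seq.append(alphabet[s])

H = 8                                        # hidden units
W_rec = 0.1 * rng.standard_normal((H, H))    # "small" recurrent weights, never trained
W_in = 0.1 * rng.standard_normal((H, len(alphabet)))
one_hot = {sym: np.eye(len(alphabet))[i] for i, sym in enumerate(alphabet)}

# Drive the untrained sigmoid RNN and record its recurrent activations
h, states = np.zeros(H), []
for sym in seq:
    h = 1.0 / (1.0 + np.exp(-(W_rec @ h + W_in @ one_hot[sym])))
    states.append(h)
states = np.array(states)

# Quantize the state space: each cluster acts as a Markov-like prediction context
K = 4
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(states)

counts = np.ones((K, len(alphabet)))         # add-one smoothing
for t in range(len(seq) - 1):
    counts[labels[t], alphabet.index(seq[t + 1])] += 1
npm = counts / counts.sum(axis=1, keepdims=True)
print("next-symbol distribution per prediction context:\n", np.round(npm, 3))
```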
Affiliation(s)
- Peter Tino, School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, U.K.
35
Síma J, Orponen P. General-Purpose Computation with Neural Networks: A Survey of Complexity Theoretic Results. Neural Comput 2003; 15:2727-78. [PMID: 14629867] [DOI: 10.1162/089976603322518731]
Abstract
We survey and summarize the literature on the computational aspects of neural network models by presenting a detailed taxonomy of the various models according to their complexity theoretic characteristics. The criteria of classification include the architecture of the network (feedforward versus recurrent), time model (discrete versus continuous), state type (binary versus analog), weight constraints (symmetric versus asymmetric), network size (finite nets versus infinite families), and computation type (deterministic versus probabilistic), among others. The underlying results concerning the computational power and complexity issues of perceptron, radial basis function, winner-take-all, and spiking neural networks are briefly surveyed, with pointers to the relevant literature. In our survey, we focus mainly on the digital computation whose inputs and outputs are binary in nature, although their values are quite often encoded as analog neuron states. We omit the important learning issues.
Affiliation(s)
- Jirí Síma, Institute of Computer Science, Academy of Sciences of the Czech Republic, P.O. Box 5, Prague 8, Czech Republic.
36
Chalup SK, Blair AD. Incremental training of first order recurrent neural networks to predict a context-sensitive language. Neural Netw 2003; 16:955-72. [PMID: 14692631] [DOI: 10.1016/s0893-6080(03)00054-6]
Abstract
In recent years it has been shown that first order recurrent neural networks trained by gradient-descent can learn not only regular but also simple context-free and context-sensitive languages. However, the success rate was generally low and severe instability issues were encountered. The present study examines the hypothesis that a combination of evolutionary hill climbing with incremental learning and a well-balanced training set enables first order recurrent networks to reliably learn context-free and mildly context-sensitive languages. In particular, we trained the networks to predict symbols in string sequences of the context-sensitive language {a^n b^n c^n; n ≥ 1}. Comparative experiments with and without incremental learning indicated that incremental learning can accelerate and facilitate training. Furthermore, incrementally trained networks generally resulted in monotonic trajectories in hidden unit activation space, while the trajectories of non-incrementally trained networks were oscillating. The non-incrementally trained networks were more likely to generalise.
Affiliation(s)
- Stephan K Chalup, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW 2308, Australia.
37
Abstract
We have recently shown that when initialized with “small” weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tiňo, 2002; Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
Affiliation(s)
- Peter Tiňo, Aston University, Birmingham B4 7ET, U.K.
38
Demaris D. Dimension change, coarse grained coding and pattern recognition in spatio-temporal nonlinear systems. J Integr Neurosci 2003; 2:71-102. [PMID: 15011278] [DOI: 10.1142/s0219635203000160]
Abstract
Several research programs employing spatio-temporal recurrent dynamics and changes in dimensionality have extended the dialog on neural computation and coding beyond classical frameworks such as feed forward and attractor neural networks and feature detectors. Some have emphasized spiking networks, while others emphasize oscillations and synchronization as the locus of computation and coding. In this paper, the formalism of locally connected homogeneous coupled map lattices is described. Its deployment in an extended version of the dynamical recognizer framework is described, and is compared with density coding, computational mechanics, and liquid state machine frameworks for neural computation. A population coding strategy based on coarse graining the continuous valued distribution of all sites in the lattice is developed and examined as a form of dimension reduction. Results on recognition of 3-D objects are reported. In order to better understand the dynamics supporting recognition, measures suggested by these other research programs and computational frameworks were examined. Dynamical trajectories from object recognition trials were examined for correlation with recognition rates, with measures of the distance of the representation space statistics between the target objects and noise initial conditions, and with the intrinsic separation between different objects in the set to be classified. These results raise questions about the efficacy of density coding as an explanation for the results, and about the validity of recent criticisms that chaotic systems cannot satisfy separation requirements required for real time computation.
Affiliation(s)
- David Demaris, IBM Corporation, 11400 Burnet Rd, Austin, TX 78758, USA.
39
Síma J, Orponen P. Continuous-time symmetric Hopfield nets are computationally universal. Neural Comput 2003; 15:693-733. [PMID: 12620163] [DOI: 10.1162/089976603321192130]
Abstract
We establish a fundamental result in the theory of computation by continuous-time dynamical systems by showing that systems corresponding to so-called continuous-time symmetric Hopfield nets are capable of general computation. As is well known, such networks have very constrained Lyapunov-function controlled dynamics. Nevertheless, we show that they are universal and efficient computational devices, in the sense that any convergent synchronous fully parallel computation by a recurrent network of n discrete-time binary neurons, with in general asymmetric coupling weights, can be simulated by a symmetric continuous-time Hopfield net containing only 18n + 7 units employing the saturated-linear activation function. Moreover, if the asymmetric network has maximum integer weight size w_max and converges in discrete time t*, then the corresponding Hopfield net can be designed to operate in continuous time Θ(t*/ε) for any ε > 0 such that w_max·2^(12n) ≤ ε·2^(1/ε). In terms of standard discrete computation models, our result implies that any polynomially space-bounded Turing machine can be simulated by a family of polynomial-size continuous-time symmetric Hopfield nets.
Affiliation(s)
- Jirí Síma, Institute of Computer Science, Academy of Sciences of the Czech Republic, P.O. Box 5, 182 07 Prague 8, Czech Republic.
40
Gabrijel I, Dobnikar A. On-line identification and reconstruction of finite automata with generalized recurrent neural networks. Neural Netw 2003; 16:101-20. [PMID: 12576110] [DOI: 10.1016/s0893-6080(02)00221-6]
Abstract
In this paper finite automata are treated as general discrete dynamical systems from the viewpoint of systems theory. The unconditional on-line identification of an unknown finite automaton is the problem considered. A generalized architecture of recurrent neural networks with a corresponding on-line learning scheme is proposed as a solution to the problem. An on-line rule-extraction algorithm is further introduced. The architecture presented, the on-line learning scheme and the on-line rule-extraction method are tested on different, strongly connected automata, ranging from a very simple example with two states only to a more interesting and complex one with 64 states; the results of both training and extraction processes are very promising.
Affiliation(s)
- Ivan Gabrijel, Faculty of Computer and Information Science, University of Ljubljana, Trzaska c. 25, SI-1001 Ljubljana, Slovenia.
41
Abstract
A simple associationist neural network learns to factor abstract rules (i.e., grammars) from sequences of arbitrary input symbols by inventing abstract representations that accommodate unseen symbol sets as well as unseen but similar grammars. The neural network is shown to have the ability to transfer grammatical knowledge to both new symbol vocabularies and new grammars. Analysis of the state-space shows that the network learns generalized abstract structures of the input and is not simply memorizing the input strings. These representations are context sensitive, hierarchical, and based on the state variable of the finite-state machines that the neural network has learned. Generalization to new symbol sets or grammars arises from the spatial nature of the internal representations used by the network, allowing new symbol sets to be encoded close to symbol sets that have already been learned in the hidden unit space of the network. The results are counter to the arguments that learning algorithms based on weight adaptation after each exemplar presentation (such as the long term potentiation found in the mammalian nervous system) cannot in principle extract symbolic knowledge from positive examples as prescribed by prevailing human linguistic theory and evolutionary psychology.
42
Rule Extraction from Self-Organizing Networks. Lecture Notes in Computer Science 2002. [DOI: 10.1007/3-540-46084-5_142]
43
44
Rodriguez P. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Comput 2001; 13:2093-118. [PMID: 11516359] [DOI: 10.1162/089976601750399326]
Abstract
It has been shown that if a recurrent neural network (RNN) learns to process a regular language, one can extract a finite-state machine (FSM) by treating regions of phase-space as FSM states. However, it has also been shown that one can construct an RNN to implement Turing machines by using RNN dynamics as counters. But how does a network learn languages that require counting? Rodriguez, Wiles, and Elman (1999) showed that a simple recurrent network (SRN) can learn to process a simple context-free language (CFL) by counting up and down. This article extends that to show a range of language tasks in which an SRN develops solutions that not only count but also copy and store counting information. In one case, the network stores information like an explicit storage mechanism. In other cases, the network stores information more indirectly in trajectories that are sensitive to slight displacements that depend on context. In this sense, an SRN can learn analog computation as a set of interdependent counters. This demonstrates how SRNs may be an alternative psychological model of language or sequence processing.
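The counting mechanism discussed above can be mimicked by hand with a single analog state variable (a toy illustration of the mechanism, not a trained SRN; the reciprocal contraction/expansion factors are an arbitrary choice): contracting the state for each a and expanding it for each b returns it to its starting value exactly when the string has the form a^n b^n.

```python
def counter_accepts(string, contract=0.5, expand=2.0, tol=1e-9):
    """Accept a^n b^n (n >= 1) with one analog counter-like state variable."""
    x, seen_b = 1.0, False
    for ch in string:
        if ch == "a":
            if seen_b:               # an 'a' after a 'b' can never be a^n b^n
                return False
            x *= contract            # count up (push)
        elif ch == "b":
            seen_b = True
            x *= expand              # count down (pop)
            if x > 1.0 + tol:        # more b's than a's
                return False
        else:
            return False
    return seen_b and abs(x - 1.0) < tol    # counts balanced

print([counter_accepts(s) for s in ["ab", "aaabbb", "aabbb", "aaabb", "ba"]])
# -> [True, True, False, False, False]
```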
Affiliation(s)
- P Rodriguez, Department of Cognitive Science, University of California at San Diego, La Jolla, CA 92093, USA
45
46
Tino P, Horne BG, Giles CL. Attractive periodic sets in discrete-time recurrent networks (with emphasis on fixed-point stability and bifurcations in two-neuron networks). Neural Comput 2001; 13:1379-414. [PMID: 11387050] [DOI: 10.1162/08997660152002898]
Abstract
We perform a detailed fixed-point analysis of two-unit recurrent neural networks with sigmoid-shaped transfer functions. Using geometrical arguments in the space of transfer function derivatives, we partition the network state-space into distinct regions corresponding to stability types of the fixed points. Unlike in the previous studies, we do not assume any special form of connectivity pattern between the neurons, and all free parameters are allowed to vary. We also prove that when both neurons have excitatory self-connections and the mutual interaction pattern is the same (i.e., the neurons mutually inhibit or excite themselves), new attractive fixed points are created through the saddle-node bifurcation. Finally, for an N-neuron recurrent network, we give lower bounds on the rate of convergence of attractive periodic points toward the saturation values of neuron activations, as the absolute values of connection weights grow.
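A small numerical companion to this kind of fixed-point analysis (our own sketch, not the paper's derivation; the weight matrix is an arbitrary example with excitatory self-connections and mutual inhibition, and tanh stands in for the sigmoid-shaped transfer function): locate fixed points of a two-neuron network from many random starts and classify their stability by the spectral radius of the Jacobian.

```python
import numpy as np
from scipy.optimize import fsolve

W = np.array([[ 4.0, -1.0],       # excitatory self-connections,
              [-1.0,  4.0]])      # mutually inhibitory coupling (arbitrary example)
b = np.zeros(2)

def f(x):
    return np.tanh(W @ x + b) - x            # fixed points of the map x <- tanh(W x + b)

def jacobian(x):
    d = 1.0 - np.tanh(W @ x + b) ** 2        # derivative of tanh at the fixed point
    return np.diag(d) @ W

fixed_points = []
for x0 in np.random.default_rng(2).uniform(-1, 1, size=(200, 2)):
    x, _, ok, _ = fsolve(f, x0, full_output=True)
    if ok == 1 and not any(np.allclose(x, p, atol=1e-6) for p in fixed_points):
        fixed_points.append(x)

for p in fixed_points:
    rho = np.abs(np.linalg.eigvals(jacobian(p))).max()
    kind = "attractive" if rho < 1 else "repulsive/saddle"
    print(np.round(p, 4), f"spectral radius = {rho:.3f} -> {kind}")
```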
Affiliation(s)
- P Tino, Aston University, Birmingham B4 7ET, U.K., and Department of Computer Science and Engineering, Slovak University of Technology, 812 19 Bratislava, Slovakia
47
48
Gers FA, Schmidhuber J. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks 2001; 12:1333-40. [PMID: 18249962] [DOI: 10.1109/72.963769]
Abstract
Previous work on learning regular languages from exemplary training sequences showed that long short-term memory (LSTM) outperforms traditional recurrent neural networks (RNNs). We demonstrate LSTM's superior performance on context-free language benchmarks for RNNs, and show that it works even better than previous hardwired or highly specialized architectures. To the best of our knowledge, LSTM variants are also the first RNNs to learn a simple context-sensitive language, namely a^n b^n c^n.
Affiliation(s)
- F A Gers, IDSIA, 6928 Manno, Switzerland.
49
Sima J, Orponen P, Antti-Poika T. On the computational complexity of binary and analog symmetric Hopfield nets. Neural Comput 2000; 12:2965-89. [PMID: 11112262] [DOI: 10.1162/089976600300014791]
Abstract
We investigate the computational properties of finite binary- and analog-state discrete-time symmetric Hopfield nets. For binary networks, we obtain a simulation of convergent asymmetric networks by symmetric networks with only a linear increase in network size and computation time. Then we analyze the convergence time of Hopfield nets in terms of the length of their bit representations. Here we construct an analog symmetric network whose convergence time exceeds the convergence time of any binary Hopfield net with the same representation length. Further, we prove that the MIN ENERGY problem for analog Hopfield nets is NP-hard and provide a polynomial time approximation algorithm for this problem in the case of binary nets. Finally, we show that symmetric analog nets with an external clock are computationally Turing universal.
Affiliation(s)
- J Sima, Department of Theoretical Computer Science, Institute of Computer Science, Academy of Sciences of the Czech Republic, P.O. Box 5, 182 07 Prague 8, Czech Republic
50
Carrasco RC, Forcada ML, Valdés-Muñoz MA, Neco RP. Stable encoding of finite-state machines in discrete-time recurrent neural nets with sigmoid units. Neural Comput 2000; 12:2129-74. [PMID: 10976142] [DOI: 10.1162/089976600300015097]
Abstract
There has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, with interesting results regarding the induction of simple finite-state machines from input-output strings. Parallel work has studied the computational power of DTRNN in connection with finite-state computation. This article describes a simple strategy to devise stable encodings of finite-state machines in computationally capable discrete-time recurrent neural architectures with sigmoid units and gives a detailed presentation on how this strategy may be applied to encode a general class of finite-state machines in a variety of commonly used first- and second-order recurrent neural networks. Unlike previous work that either imposed some restrictions to state values or used a detailed analysis based on fixed-point attractors, our approach applies to any positive, bounded, strictly growing, continuous activation function and uses simple bounding criteria based on a study of the conditions under which a proposed encoding scheme guarantees that the DTRNN is actually behaving as a finite-state machine.
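A minimal sketch of one way to hard-wire a finite-state machine into a sigmoid recurrent network (a generic illustrative construction using a second-order, i.e. multiplicative, architecture and a hand-picked gain H; the example DFA and all sizes are ours, and this is not the specific encoding scheme of the paper): with one unit per DFA state and a sufficiently large H, the unit activations stay near a one-hot pattern that tracks the automaton state.

```python
import numpy as np
from itertools import product

states, symbols = [0, 1, 2], ["a", "b"]
delta = {(0, "a"): 1, (0, "b"): 0,        # example DFA transition function
         (1, "a"): 2, (1, "b"): 0,
         (2, "a"): 2, (2, "b"): 1}

H = 12.0                                  # gain: large enough to keep units saturated
T = np.zeros((3, 3, 2))                   # T[next_state, current_state, symbol]
for (q, a), q_next in delta.items():
    T[q_next, q, symbols.index(a)] = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_network(word, q0=0):
    s = np.eye(3)[q0]                     # one-hot activation of the initial state
    for a in word:
        x = np.eye(2)[symbols.index(a)]
        s = sigmoid(H * np.einsum("ijk,j,k->i", T, s, x) - H / 2.0)
    return int(np.argmax(s))

def run_dfa(word, q0=0):
    q = q0
    for a in word:
        q = delta[(q, a)]
    return q

# The network should track the DFA state on every string up to length 8
words = ["".join(w) for n in range(1, 9) for w in product(symbols, repeat=n)]
print(all(run_network(w) == run_dfa(w) for w in words))   # expected: True
```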
Affiliation(s)
- R C Carrasco, Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain