1. Dulberg Z, Dubey R, Berwian IM, Cohen JD. Having multiple selves helps learning agents explore and adapt in complex changing worlds. Proc Natl Acad Sci U S A 2023; 120:e2221180120. [PMID: 37399387; PMCID: PMC10334746; DOI: 10.1073/pnas.2221180120]
Abstract
Satisfying a variety of conflicting needs in a changing environment is a fundamental challenge for any adaptive agent. Here, we show that designing an agent in a modular fashion, as a collection of subagents each dedicated to a separate need, powerfully enhanced the agent's capacity to satisfy its overall needs. We used the formalism of deep reinforcement learning to investigate a biologically relevant multiobjective task: continually maintaining homeostasis of a set of physiologic variables. We then conducted simulations in a variety of environments and compared how modular agents performed relative to standard monolithic agents (i.e., agents that aimed to satisfy all needs in an integrated manner using a single aggregate measure of success). Simulations revealed that modular agents a) exhibited a form of exploration that was intrinsic and emergent rather than extrinsically imposed; b) were robust to changes in nonstationary environments; and c) scaled gracefully in their ability to maintain homeostasis as the number of conflicting objectives increased. Supporting analysis suggested that the robustness to changing environments and to increasing numbers of needs was due to the intrinsic exploration and efficiency of representation afforded by the modular architecture. These results suggest that the normative principles by which agents have adapted to complex changing environments may also explain why humans have long been described as consisting of "multiple selves."
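The modular architecture lends itself to a compact illustration. The sketch below is a minimal tabular stand-in for the paper's setup, assuming a toy two-need environment and "greatest-mass" action selection (summing subagent Q-values before acting); the paper itself uses deep Q-networks and richer homeostatic dynamics, so every constant here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NEEDS, N_ACTIONS, N_LEVELS = 2, 3, 10   # actions 0-1 replenish a need; 2 = wait

def step(levels, action):
    levels = levels - 1                              # metabolic drift
    if action < N_NEEDS:
        levels[action] = levels[action] + 3          # replenish one need
    levels = np.clip(levels, 0, N_LEVELS - 1)
    rewards = -(levels - (N_LEVELS - 1)) ** 2        # one reward per need
    return levels, rewards

# One Q-table per need (the modular agent); a monolithic agent would
# train a single table on rewards.sum() instead.
Q = np.zeros((N_NEEDS, N_LEVELS, N_LEVELS, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1

levels = np.full(N_NEEDS, N_LEVELS - 1)
for t in range(20000):
    s0, s1 = levels
    # Greatest-mass selection: sum module Q-values, then act greedily.
    prefs = Q[:, s0, s1, :].sum(axis=0)
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(prefs.argmax())
    levels, rewards = step(levels, a)
    for m in range(N_NEEDS):                         # independent module updates
        target = rewards[m] + gamma * Q[m, levels[0], levels[1]].max()
        Q[m, s0, s1, a] += alpha * (target - Q[m, s0, s1, a])

print("final need levels:", levels)
```

Summing module values is only one simple aggregation rule; the paper's contribution is the comparison of such modular agents against monolithic baselines trained on an aggregate reward.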
Affiliation(s)
- Zack Dulberg, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Rachit Dubey, Department of Computer Science, Princeton University, Princeton, NJ 08544
- Isabel M. Berwian, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Jonathan D. Cohen, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
2.
Abstract
Learning from demonstration, or imitation learning, is the process of learning to act in an environment from examples provided by a teacher. Inverse reinforcement learning (IRL) is a specific form of learning from demonstration that attempts to estimate the reward function of a Markov decision process from examples provided by the teacher. The reward function is often considered the most succinct description of a task. In simple applications, the reward function may be known or easily derived from properties of the system and hard-coded into the learning process. However, in complex applications, this may not be possible, and it may be easier to learn the reward function by observing the actions of the teacher. This paper provides a comprehensive survey of the literature on IRL. The survey outlines the differences between IRL and two similar methods, apprenticeship learning and inverse optimal control; organizes the IRL literature based on the principal method; describes applications of IRL algorithms; and identifies areas for future research.
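To make the problem statement concrete, here is a minimal sketch of likelihood-based IRL on a five-state chain MDP with known dynamics: a simulated teacher acts via a softmax policy under a hidden reward, and a reward vector is recovered by hill-climbing the likelihood of its demonstrations. The chain MDP, the Boltzmann teacher, and the crude coordinate-ascent optimizer are all illustrative assumptions; the surveyed algorithms (max-margin, max-entropy, Bayesian IRL) are far more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(1)
N_S, N_A, GAMMA = 5, 2, 0.9                # chain MDP; actions: 0 left, 1 right

def nxt(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_S - 1)

def softmax_policy(reward, beta=5.0, iters=100):
    """Value iteration on a candidate reward, then a Boltzmann policy."""
    Q = np.zeros((N_S, N_A))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = np.array([[reward[nxt(s, a)] + GAMMA * V[nxt(s, a)]
                       for a in range(N_A)] for s in range(N_S)])
    P = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return P / P.sum(axis=1, keepdims=True)

# The "teacher": a hidden reward at the right end of the chain.
true_reward = np.array([0., 0., 0., 0., 1.])
expert = softmax_policy(true_reward)
demos = [(int(s), int(rng.choice(N_A, p=expert[s])))
         for s in rng.integers(0, N_S, 500)]

def log_lik(reward):
    pi = softmax_policy(reward)
    return sum(np.log(pi[s, a]) for s, a in demos)

# Crude coordinate ascent with two-point numerical gradients; adequate
# only at this toy scale.
r_hat = np.zeros(N_S)
for _ in range(30):
    for i in range(N_S):
        up = log_lik(r_hat + 0.1 * np.eye(N_S)[i])
        dn = log_lik(r_hat - 0.1 * np.eye(N_S)[i])
        r_hat[i] += 0.05 * np.sign(up - dn)

print("recovered reward (shifted):", np.round(r_hat - r_hat.min(), 2))
```

Note that the recovered reward is only identified up to transformations that preserve the policy, which is one of the central ambiguities the IRL literature addresses.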
3. Arora S, Doshi P. A survey of inverse reinforcement learning: Challenges, methods and progress. Artif Intell 2021. [DOI: 10.1016/j.artint.2021.103500]
4. Ballard DH, Zhang R. The Hierarchical Evolution in Human Vision Modeling. Top Cogn Sci 2021; 13:309-328. [PMID: 33838010; PMCID: PMC9462461; DOI: 10.1111/tops.12527]
Abstract
Computational models of primate vision advanced significantly with David Marr's tripartite separation of the vision enterprise into problem formulation, algorithm, and neural implementation; however, many subsequent parallel developments in robotics and modeling greatly refined the algorithm descriptions into very distinct levels that complement each other. This review traces the time course of these developments and shows how the current perspective evolved to have its alternative internal hierarchical organization.
Affiliation(s)
- Dana H Ballard, Department of Computer Science, The University of Texas at Austin
- Ruohan Zhang, Department of Computer Science, The University of Texas at Austin
5. Muryy A, Siddharth N, Nardelli N, Glennerster A, Torr PHS. Lessons from reinforcement learning for biological representations of space. Vision Res 2020; 174:79-93. [PMID: 32683096; DOI: 10.1016/j.visres.2020.05.009]
Abstract
Neuroscientists postulate 3D representations in the brain in a variety of different coordinate frames (e.g. 'head-centred', 'hand-centred' and 'world-based'). Recent advances in reinforcement learning demonstrate a quite different approach that may provide a more promising model for biological representations underlying spatial perception and navigation. In this paper, we focus on reinforcement learning methods that reward an agent for arriving at a target image without any attempt to build up a 3D 'map'. We test the ability of this type of representation to support geometrically consistent spatial tasks such as interpolating between learned locations using decoding of feature vectors. We introduce a hand-crafted representation that has, by design, a high degree of geometric consistency and demonstrate that, in this case, information about the persistence of features as the camera translates (e.g. distant features persist) can improve performance on the geometric tasks. These examples avoid Cartesian (in this case, 2D) representations of space. Non-Cartesian, learned representations provide an important stimulus in neuroscience to the search for alternatives to a 'cognitive map'.
Affiliation(s)
- Alex Muryy, School of Psychology and Clinical Language Sciences, University of Reading, UK
- N Siddharth, Department of Engineering Science, University of Oxford, UK
- Andrew Glennerster, School of Psychology and Clinical Language Sciences, University of Reading, UK
6. Bhattacharyya R, Hazarika SM. A knowledge-driven layered inverse reinforcement learning approach for recognizing human intents. J Exp Theor Artif Intell 2020. [DOI: 10.1080/0952813x.2020.1718773]
Affiliation(s)
- R. Bhattacharyya, Computer Science and Engineering, Indian Institute of Information Technology Bhagalpur, Bihar, India
- S. M. Hazarika, Biomimetic Robotics and Artificial Intelligence Lab, Mechanical Engineering, Indian Institute of Technology Guwahati, Assam, India
7. Zhang R, Zhang S, Tong MH, Cui Y, Rothkopf CA, Ballard DH, Hayhoe MM. Modeling sensory-motor decisions in natural behavior. PLoS Comput Biol 2018; 14:e1006518. [PMID: 30359364; PMCID: PMC6219815; DOI: 10.1371/journal.pcbi.1006518]
Abstract
Although a standard reinforcement learning model can capture many aspects of reward-seeking behaviors, it may not be practical for modeling human natural behaviors because of the richness of dynamic environments and limitations in cognitive resources. We propose a modular reinforcement learning model that addresses these factors. Based on this model, a modular inverse reinforcement learning algorithm is developed to estimate both the rewards and discount factors from human behavioral data, which allows predictions of human navigation behaviors in virtual reality with high accuracy across different subjects and with different tasks. Complex human navigation trajectories in novel environments can be reproduced by an artificial agent that is based on the modular model. This model provides a strategy for estimating the subjective value of actions and how they influence sensory-motor decisions in natural behavior.
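A distinctive feature of this model is that each module carries its own discount factor as well as its own reward, and both are estimated from behavior. The sketch below illustrates the discount-estimation half of that idea under strong simplifying assumptions: one module, a five-state chain with a known reward, and a softmax behaver whose hidden discount is recovered by scanning candidates for the one that maximizes demonstration likelihood. The paper's algorithm jointly estimates rewards and discounts for several modules from human trajectories; everything here is a toy assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
N_S, N_A = 5, 2

def soft_q_policy(reward, gamma, beta=5.0, iters=150):
    """Value iteration under a candidate discount, then a Boltzmann policy."""
    nxt = lambda s, a: max(s - 1, 0) if a == 0 else min(s + 1, N_S - 1)
    Q = np.zeros((N_S, N_A))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = np.array([[reward[nxt(s, a)] + gamma * V[nxt(s, a)]
                       for a in range(N_A)] for s in range(N_S)])
    P = np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return P / P.sum(axis=1, keepdims=True)

reward = np.array([0., 0., 0., 0., 1.])               # known (unlike the paper)
behaver = soft_q_policy(reward, gamma=0.6)            # hidden discount
demos = [(int(s), int(rng.choice(N_A, p=behaver[s])))
         for s in rng.integers(0, N_S, 300)]

# Scan candidate discounts; the best one maximizes demo likelihood,
# because the discount shapes how sharply far-from-goal choices differ.
candidates = np.linspace(0.1, 0.9, 17)
lls = []
for g in candidates:
    pi = soft_q_policy(reward, g)
    lls.append(sum(np.log(pi[s, a]) for s, a in demos))
print("recovered discount:", round(candidates[int(np.argmax(lls))], 2))
```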
Affiliation(s)
- Ruohan Zhang, Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Shun Zhang, Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Matthew H. Tong, Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA
- Yuchen Cui, Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Constantin A. Rothkopf, Cognitive Science Center and Institute of Psychology, Technical University Darmstadt, Darmstadt, Germany
- Dana H. Ballard, Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Mary M. Hayhoe, Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA
8. Hayhoe MM, Matthis JS. Control of gaze in natural environments: effects of rewards and costs, uncertainty and memory in target selection. Interface Focus 2018; 8:20180009. [PMID: 29951189; DOI: 10.1098/rsfs.2018.0009]
Abstract
The development of better eye and body tracking systems and of more flexible virtual environments has allowed more systematic exploration of natural vision and contributed a number of insights. In natural visually guided behaviour, humans make continuous sequences of sensory-motor decisions to satisfy current goals, and the role of vision is to provide the relevant information to achieve those goals. This paper reviews the factors that control gaze in natural visually guided actions such as locomotion, including the rewards and costs associated with the immediate behavioural goals, uncertainty about the state of the world, and prior knowledge of the environment. These general features of human gaze control may inform the development of artificial systems.
Affiliation(s)
- Mary M Hayhoe, Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
9. Yamaguchi S, Naoki H, Ikeda M, Tsukada Y, Nakano S, Mori I, Ishii S. Identification of animal behavioral strategies by inverse reinforcement learning. PLoS Comput Biol 2018; 14:e1006122. [PMID: 29718905; PMCID: PMC5951592; DOI: 10.1371/journal.pcbi.1006122]
Abstract
Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identification of the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissecting the information processing performed by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement-learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. We applied this framework to C. elegans thermotactic behavior; after cultivation at a constant temperature with or without food, fed worms prefer, while starved worms avoid, the cultivation temperature on a thermal gradient. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative and that their behavior involved two strategies: directed migration (DM) and isothermal migration (IM). With DM, worms efficiently reached specific temperatures, which explains their thermotactic behavior when fed. With IM, worms moved along a constant temperature, reflecting the isothermal tracking well observed in previous studies. In contrast to fed animals, starved worms escaped the cultivation temperature using only the absolute temperature, not its temporal derivative. We also investigated the neural basis underlying these strategies by applying our method to thermosensory neuron-deficient worms. Our IRL-based approach is thus useful for identifying animal strategies from behavioral time-series data and could be applied to a wide range of behavioral studies, including studies of decision-making, in other organisms.

Understanding animal decision-making has been a fundamental problem in neuroscience and behavioral ecology. Many studies have analyzed the actions representing decision-making in behavioral tasks, in which rewards are artificially designed with specific objectives. However, such artificially designed experiments cannot be extended to natural environments, where the rewards for freely behaving animals cannot be clearly defined. We therefore sought to reverse the current paradigm so that rewards could be identified from behavioral data. Here, we propose a reverse-engineering approach (inverse reinforcement learning) that can estimate a behavioral strategy from time-series data of freely behaving animals. By applying this technique to C. elegans thermotaxis, we successfully identified the respective reward-based behavioral strategies.
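The core representational move is to express a strategy as a reward over sensory features. The sketch below makes that concrete with a feature vector of (closeness to the cultivation temperature, temporal derivative of temperature); the specific features and all weight values are invented for illustration, and the paper estimates such rewards from time-series data rather than positing them. Fed and starved "strategies" then become different weight vectors over the same features.

```python
import numpy as np

def reward(T, dT, w, T_cult=20.0):
    """Linear-in-features reward over temperature and its derivative."""
    phi = np.array([-(T - T_cult) ** 2,   # closeness to cultivation temperature
                    dT])                  # temporal derivative of temperature
    return float(w @ phi)

w_fed = np.array([1.0, 0.5])        # fed: seek T_cult, also use the gradient
w_starved = np.array([-1.0, 0.0])   # starved: avoid T_cult; derivative unused

for T in (17.0, 20.0, 23.0):
    print(f"T={T}: fed={reward(T, 0.0, w_fed):+.1f}  "
          f"starved={reward(T, 0.0, w_starved):+.1f}")
```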
Affiliation(s)
- Shoichiro Yamaguchi, Integrated Systems Biology Laboratory, Graduate School of Informatics, Kyoto University, Sakyo, Kyoto, Japan
- Honda Naoki, Laboratory of Theoretical Biology, Graduate School of Biostudies, Kyoto University, Yoshidakonoecho, Sakyo, Kyoto, Japan; Data-driven Modeling Team, Research Center for Dynamic Living Systems, Graduate School of Biostudies, Kyoto University, Yoshidakonoecho, Sakyo, Kyoto, Japan
- Muneki Ikeda, Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan
- Yuki Tsukada, Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan
- Shunji Nakano, Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan
- Ikue Mori, Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan
- Shin Ishii, Integrated Systems Biology Laboratory, Graduate School of Informatics, Kyoto University, Sakyo, Kyoto, Japan
10. Hayhoe MM. Davida Teller Award Lecture 2017: What can be learned from natural behavior? J Vis 2018; 18:10. [PMID: 29710300; PMCID: PMC5895074; DOI: 10.1167/18.4.10]
Abstract
The essentially active nature of vision has long been acknowledged but has been difficult to investigate because of limitations in the available instrumentation, both for measuring eye and body movements and for presenting realistic stimuli in the context of active behavior. These limitations have been substantially reduced in recent years, opening up a wider range of contexts where experimental control is possible. Given this, it is important to examine just what the benefits of exploring natural vision are, despite its attendant disadvantages. Work over the last two decades provides insights into these benefits. Natural behavior turns out to be a rich domain for investigation: it is remarkably stable, it opens up new questions, and the behavioral context helps specify the momentary visual computations and their temporal evolution.
Affiliation(s)
- Mary M Hayhoe, Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
11. Muelling K, Boularias A, Mohler B, Schölkopf B, Peters J. Learning strategies in table tennis using inverse reinforcement learning. Biol Cybern 2014; 108:603-619. [PMID: 24756167; DOI: 10.1007/s00422-014-0599-1]
Abstract
Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent's court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, in which the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework makes it possible to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function captured expert-specific strategic information that sufficed to distinguish the expert among players with different skill levels as well as different playing styles.
Affiliation(s)
- Katharina Muelling, Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tuebingen, Germany
12. Sullivan BT, Johnson L, Rothkopf CA, Ballard D, Hayhoe M. The role of uncertainty and reward on eye movements in a virtual driving task. J Vis 2012; 12:19. [PMID: 23262151; DOI: 10.1167/12.13.19]
Abstract
Eye movements during natural tasks are well coordinated with ongoing task demands, and many variables could influence gaze strategies. Sprague and Ballard (2003) proposed a gaze-scheduling model that uses a utility-weighted uncertainty metric to prioritize fixations on task-relevant objects, predicting that human gaze should be influenced by both reward structure and task-relevant uncertainties. To test this conjecture, we tracked the eye movements of participants in a simulated driving task in which uncertainty and implicit reward (via task priority) were varied. Participants were instructed to simultaneously perform a Follow Task, in which they followed a lead car at a specific distance, and a Speed Task, in which they drove at an exact speed. We varied implicit reward by instructing participants to emphasize one task over the other, and we varied uncertainty in the Speed Task by the presence or absence of uniform noise added to the car's velocity. Subjects' gaze data were classified for the image content near fixation and segmented into looks. Gaze measures, including look proportion, duration, and interlook interval, showed that drivers monitored the speedometer more closely when it had a high level of uncertainty, but only if it was also associated with high task priority or implicit reward. The observed interaction appears to be an example of a simple mechanism whereby the reduction of visual uncertainty is gated by behavioral relevance. This lends qualitative support to the primary variables controlling gaze allocation proposed in the Sprague and Ballard model.
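The gaze-scheduling logic under test reduces to a simple loop: each task's state uncertainty grows while unattended, and gaze goes to the task with the largest utility-weighted uncertainty. The sketch below captures that interaction with invented weights and noise levels; it is a caricature of the Sprague and Ballard model, not the authors' implementation, but it reproduces the qualitative prediction that looks concentrate on tasks that are both uncertain and important.

```python
import numpy as np

TASKS = ["follow", "speed"]
weight = np.array([1.0, 0.5])     # implicit reward: Follow Task emphasized
drift = np.array([0.05, 0.20])    # added velocity noise inflates speed-task drift

variance = np.ones(2)             # per-task state uncertainty
looks = np.zeros(2, dtype=int)
for t in range(1000):
    variance += drift                            # uncertainty grows while unattended
    target = int(np.argmax(weight * variance))   # utility-weighted uncertainty
    variance[target] *= 0.3                      # a fixation shrinks uncertainty
    looks[target] += 1

for name, n in zip(TASKS, looks):
    print(f"{name}: {n} looks")
```

Raising either a task's weight or its drift rate increases its share of looks, which mirrors the interaction of task priority and uncertainty reported in the study.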
Affiliation(s)
- Brian T Sullivan, Smith-Kettlewell Eye Research Institute, San Francisco, CA, USA