1. Alonso A, Kirkegaard JB. Learning optimal integration of spatial and temporal information in noisy chemotaxis. PNAS Nexus 2024; 3:pgae235. PMID: 38952456; PMCID: PMC11216223; DOI: 10.1093/pnasnexus/pgae235.
Abstract
We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether the optimal strategy switches discontinuously or transitions continuously between the two. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming, in the transition region, both the constrained variants and models that explicitly integrate spatial and temporal information. Finally, using the attribution method of integrated gradients, we show that the policy relies on a nontrivial combination of spatially and temporally derived gradient information, in a ratio that varies dynamically during the chemotactic trajectories.
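The trade-off the abstract describes can be made concrete with a back-of-the-envelope Monte Carlo estimate. The sketch below is our own illustration, not the authors' model: it compares the signal-to-noise ratio of a spatial gradient estimate, taken as the difference between noisy concentration readings at the two ends of a cell of diameter L, against a temporal estimate taken from successive readings along the path. The gradient g, noise sigma, speed v, and time step dt are assumed parameters.

```python
# A minimal sketch (not the authors' model) of the spatial-vs-temporal
# trade-off: a cell of diameter L in a linear concentration gradient g
# estimates the gradient either spatially (difference between its two
# ends, one snapshot) or temporally (difference between successive
# positions along its path).  Noise level and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
g, sigma, v, dt, trials = 1.0, 0.5, 1.0, 1.0, 100_000

def estimate_snr(L):
    # spatial: c(x + L/2) - c(x - L/2) = g*L, plus noise on both readings
    spatial = g * L + rng.normal(0, sigma, trials) - rng.normal(0, sigma, trials)
    # temporal: c(t + dt) - c(t) = g*v*dt, plus noise on both readings
    temporal = g * v * dt + rng.normal(0, sigma, trials) - rng.normal(0, sigma, trials)
    snr = lambda s: s.mean() / s.std()
    return snr(spatial), snr(temporal)

for L in (0.1, 1.0, 10.0):
    s, t = estimate_snr(L)
    print(f"L={L:5.1f}  spatial SNR={s:5.2f}  temporal SNR={t:5.2f}")
```

Increasing L improves only the spatial SNR, which is consistent with the abstract's finding that purely temporal strategies dominate at small cell sizes.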
Affiliation(s)
- Albert Alonso
- Niels Bohr Institute, University of Copenhagen, Copenhagen 2100, Denmark
- Julius B Kirkegaard
- Niels Bohr Institute, University of Copenhagen, Copenhagen 2100, Denmark
- Department of Computer Science, University of Copenhagen, Copenhagen 2100, Denmark
2. Rando M, James M, Verri A, Rosasco L, Seminara A. Q-learning to navigate turbulence without a map. arXiv 2024; arXiv:2404.17495v1. PMID: 38711433; PMCID: PMC11071615.
Abstract
We consider the problem of olfactory search in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor location. We ask whether navigation strategies to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent plumes. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy, and we show that it is mostly crosswind casting, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.
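As a rough illustration of the approach, the sketch below trains a tabular Q-learner whose state is a small discrete olfactory state: the number of steps since the last odor detection, capped at a fixed memory. The toy cone-shaped plume, the four actions, and all parameters are our assumptions, not the authors' environment.

```python
# A minimal tabular Q-learning sketch in the spirit of the paper: the
# agent's state is a discrete olfactory state built from a temporal
# memory (steps since last odor hit).  Environment and parameters are
# assumptions, not the authors' turbulent plume.
import numpy as np

rng = np.random.default_rng(1)
MEMORY = 5                                   # cap on steps since last hit
ACTIONS = ["upwind", "downwind", "cast_left", "cast_right"]
Q = np.zeros((MEMORY + 1, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

def olfactory_state(steps_since_hit):
    return min(steps_since_hit, MEMORY)

def step_env(x, y, action):
    # toy plume: odor is detected sporadically inside a downwind cone
    dx = {"upwind": -1, "downwind": 1, "cast_left": 0, "cast_right": 0}[action]
    dy = {"upwind": 0, "downwind": 0, "cast_left": -1, "cast_right": 1}[action]
    x, y = x + dx, y + dy
    in_cone = x > 0 and abs(y) < 1 + 0.3 * x
    hit = in_cone and rng.random() < 0.3     # sparse, intermittent cue
    done = x <= 0 and abs(y) <= 1            # reached the source
    return x, y, hit, done

for episode in range(2000):
    x, y, since_hit = 40, int(rng.integers(-10, 10)), MEMORY
    for t in range(300):
        s = olfactory_state(since_hit)
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(Q[s].argmax())
        x, y, hit, done = step_env(x, y, ACTIONS[a])
        since_hit = 0 if hit else since_hit + 1
        s2, r = olfactory_state(since_hit), (1.0 if done else -0.01)
        Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
        if done:
            break

print(Q.round(2))   # row = memory state, column = action preference
```

In this simplified setting, low-memory states (recent hits) favor upwind motion while the saturated-memory state favors casting, mirroring the recovery behavior described in the abstract.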
3. Stupski SD, van Breugel F. Wind gates search states in free flight. bioRxiv 2024; 2023.11.30.569086. PMID: 38076971; PMCID: PMC10705368; DOI: 10.1101/2023.11.30.569086.
Abstract
For any organism tracking a chemical cue to its source, the motion of the surrounding fluid provides crucial information for success. For both swimming and flying animals engaged in olfaction-driven search, turning into the direction of the oncoming wind or water current is often a critical first step [1, 2]. However, in nature, wind and water currents may not always provide a reliable directional cue [3, 4, 5]. Because of the challenges of separately controlling flow and chemical encounters, it is unclear how organisms adjust their search strategies accordingly. Here, we use the genetic toolkit of Drosophila melanogaster, a model organism for olfaction [6], to develop an optogenetic paradigm that delivers temporally precise "virtual" olfactory experiences to free-flying animals while independently manipulating the wind conditions. We show that in free flight, Drosophila melanogaster adopt distinct search routines that are gated by whether they are flying in laminar wind or in still air. We first confirm that in laminar wind flies turn upwind, and we further show that they achieve this with a rapid turn. In still air, flies adopt a remarkably stereotyped "sink and circle" search state characterized by ~60° turns at 3-4 Hz, biased in a consistent direction. In both laminar wind and still air, immediately after odor onset, flies decelerate and often perform a rapid turn. Both maneuvers are consistent with predictions from recent control-theoretic analyses of how insects may estimate properties of wind while in flight [7, 8]. We suggest that flies may use their deceleration and "anemometric" turn as active sensing maneuvers to rapidly gauge properties of their wind environment before initiating a proximal or upwind search routine.
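A toy simulation helps to visualize the two wind-gated routines: in laminar wind the model fly executes a single rapid upwind turn, while in still air it performs the reported ~60° turns at 3-4 Hz with a consistent directional bias. Apart from the quoted turn size and rate, all kinematic numbers below are our assumptions, not the authors' measurements.

```python
# A toy sketch (our own, not the authors' analysis) of the two gated
# search routines: upwind turn in laminar wind vs. biased "sink and
# circle" turning in still air.  Speeds and the bias are assumptions.
import numpy as np

def search_after_odor_onset(wind_speed, duration=3.0, dt=0.01, seed=2):
    rng = np.random.default_rng(seed)
    heading, speed = rng.uniform(0, 2 * np.pi), 0.6   # m/s, assumed
    xs, ys, x, y = [], [], 0.0, 0.0
    turn_interval = 1 / 3.5                            # ~3-4 Hz saccades
    next_turn = 0.0
    for t in np.arange(0.0, duration, dt):
        if wind_speed > 0 and t == 0.0:
            heading = np.pi                            # rapid upwind turn (wind blows toward +x)
        elif wind_speed == 0 and t >= next_turn:
            heading += np.deg2rad(60) * (1 if rng.random() < 0.8 else -1)
            next_turn += turn_interval                 # biased "sink and circle"
        x += speed * dt * np.cos(heading)
        y += speed * dt * np.sin(heading)
        xs.append(x); ys.append(y)
    return np.array(xs), np.array(ys)

laminar_x, _ = search_after_odor_onset(wind_speed=0.4)
still_x, _ = search_after_odor_onset(wind_speed=0.0)
print("laminar net x displacement:", round(float(laminar_x[-1]), 2))
print("still-air net x displacement:", round(float(still_x[-1]), 2))
```

The laminar-wind routine produces steady upwind progress, while the still-air routine keeps the fly circling near the point of odor onset, as in the proximal search the abstract describes.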
4. Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. PMID: 37695776; PMCID: PMC10513382; DOI: 10.1371/journal.pcbi.1011067.
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs": optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
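The core idea, value estimation by a recurrent network trained with temporal-difference errors, can be sketched compactly. For brevity the example below uses a fixed random recurrent reservoir and learns only a linear value readout with semi-gradient TD(0), whereas the paper trains the full RNN; the trace-conditioning-style task (a cue followed by a delayed reward, with only cue and reward observable) and all parameters are assumptions.

```python
# A minimal sketch of TD value learning on top of a recurrent
# representation.  A fixed random "reservoir" carries memory of the
# cue; only the linear value readout w is learned via TD errors.
# Task and parameters are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(3)
H, gamma, alpha, DELAY, T = 64, 0.98, 0.01, 8, 20
W_in = rng.normal(0, 1.0, (H, 2))                  # observation -> hidden
W_rec = rng.normal(0, 1.0, (H, H))
W_rec *= 0.9 / max(abs(np.linalg.eigvals(W_rec)))  # keep dynamics stable
w = np.zeros(H)                                    # learned value readout

for episode in range(3000):
    h = np.zeros(H)
    cue_t = int(rng.integers(0, 5))
    for t in range(T):
        obs = np.array([float(t == cue_t), 0.0])   # observable: [cue, reward]
        r = 1.0 if t == cue_t + DELAY else 0.0
        h_next = np.tanh(W_rec @ h + W_in @ obs)
        delta = r + gamma * (w @ h_next) - (w @ h)  # TD error
        w += alpha * delta * h                      # semi-gradient TD(0)
        h = h_next

# After learning, the value estimate ramps between cue and reward:
h = np.zeros(H)
for t in range(15):
    obs = np.array([float(t == 0), 0.0])
    h = np.tanh(W_rec @ h + W_in @ obs)
    print(t, round(float(w @ h), 3))
```

Even this simplified readout produces TD errors at cue and reward times; the paper's question is whether a fully trained RNN's hidden state additionally encodes belief information, which this sketch does not address.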
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
5. Loisy A, Heinonen RA. Deep reinforcement learning for the olfactory search POMDP: a quantitative benchmark. Eur Phys J E Soft Matter 2023; 46:17. PMID: 36939979; DOI: 10.1140/epje/s10189-023-00277-8.
Abstract
The olfactory search POMDP (partially observable Markov decision process) is a sequential decision-making problem designed to mimic the task faced by insects searching for a source of odor in turbulence, and its solutions have applications to sniffer robots. As exact solutions are out of reach, the challenge is to find the best possible approximate solutions while keeping the computational cost reasonable. We provide a quantitative benchmark of a solver based on deep reinforcement learning against traditional POMDP approximate solvers. We show that deep reinforcement learning is a competitive alternative to standard methods, in particular for generating lightweight policies suitable for robots.
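Common to all the solvers being benchmarked is a Bayesian belief over the source location, updated after each binary detection or non-detection. The sketch below shows that update on a grid; the exponential detection model is our assumption, standing in for the turbulence-derived likelihood used in the actual POMDP.

```python
# A minimal sketch of the belief update at the heart of the olfactory
# search POMDP: a posterior over source location on a grid, updated
# after each binary odor detection.  The detection model is an assumed
# stand-in for the turbulence-derived likelihood.
import numpy as np

N = 41
xs, ys = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
belief = np.ones((N, N)) / N**2                 # flat prior over source

def p_hit(agent, dist_scale=6.0):
    # probability of detecting odor, decaying with distance to source
    d = np.hypot(xs - agent[0], ys - agent[1])
    return np.exp(-d / dist_scale)

def update(belief, agent, detected):
    like = p_hit(agent) if detected else 1.0 - p_hit(agent)
    post = belief * like
    return post / post.sum()                    # Bayes rule, renormalized

# Example: two misses followed by a hit sharpen the posterior into a
# ring of likely source locations around the agent.
agent = (20, 20)
for obs in (False, False, True):
    belief = update(belief, agent, obs)
print("posterior peak at", np.unravel_index(belief.argmax(), belief.shape))
```

Approximate solvers differ in how they map this belief to actions; the deep RL approach benchmarked here learns that mapping with a neural network rather than planning over the belief directly.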
Affiliation(s)
- Aurore Loisy
- Aix Marseille Univ, CNRS, Centrale Marseille, IRPHE, Marseille, France.
- Robin A Heinonen
- Department of Physics and INFN, University of Rome "Tor Vergata", Via della Ricerca Scientifica 1, 00133, Rome, Italy.