1
Mueller D, Giglio E, Chen CS, Holm A, Ebitz RB, Grissom NM. Touchscreen response precision is sensitive to the explore/exploit tradeoff. bioRxiv [Preprint] 2024:2024.10.23.619903. PMID: 39484597; PMCID: PMC11526980; DOI: 10.1101/2024.10.23.619903.
Abstract
The explore/exploit tradeoff is a fundamental property of choice selection during reward-guided decision making. In perceptual decision making, higher certainty decisions are more motorically precise, even when the decision does not require motor accuracy. However, while we can parametrically control uncertainty in perceptual tasks, we do not know what variables - if any - shape motor precision and reflect subjective certainty during reward-guided decision making. Touchscreens are increasingly used across species to measure choice, but provide no tactile feedback on whether an action is precise or not, and therefore provide a valuable opportunity to determine whether actions differ in precision due to explore/exploit state, reward, or individual variables. We find that all three of these factors exert independent drives towards increased precision. During exploit states, successive touches to the same choice are closer together than those made in an explore state, consistent with exploit states reflecting higher certainty and/or motor stereotypy in responding. However, exploit decisions might be expected to be rewarded more frequently than explore decisions. We find that exploit choice precision is increased independently of a separate increase in precision due to immediate past reward, suggesting multiple mechanisms regulating choice precision. Finally, we see evidence that male mice in general are less precise in their interactions with the touchscreen than females, even when exploiting a choice. These results suggest that as exploit behavior emerges in reward-guided decision making, individuals become more motorically precise, reflecting increased certainty, even when decision choice does not require additional motor accuracy, but this is influenced by individual differences and prior reward. These data highlight the potential for touchscreen tasks in any species to uncover the latent neural states that unite cognition and movement.
Affiliation(s)
- Dana Mueller
- Department of Psychology, University of Minnesota, Minneapolis MN 55455
- Erin Giglio
- Department of Psychology, University of Minnesota, Minneapolis MN 55455
- Cathy S Chen
- Department of Psychology, University of Minnesota, Minneapolis MN 55455
- Aspen Holm
- Department of Psychology, University of Minnesota, Minneapolis MN 55455
- R Becket Ebitz
- Department of Neurosciences, Université de Montréal, Quebec, Canada
- Nicola M Grissom
- Department of Psychology, University of Minnesota, Minneapolis MN 55455
2
Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. bioRxiv [Preprint] 2024:2024.02.28.582617. PMID: 38464244; PMCID: PMC10925334; DOI: 10.1101/2024.02.28.582617.
Abstract
Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture differences between automaticity vs. deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and OFC neural encoding during the task, suggesting that these states are capturing real shifts in dynamics.
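A rough illustration of the generative structure described here (not the authors' fitting code; the agents, mixture weights, and transition probabilities below are made up): a model-free learner and a static bias agent are mixed through hidden states that reweight their contributions trial by trial.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_mf_update(q, a, r, alpha=0.3):
    """Model-free (MF) agent: delta-rule update of the chosen action's value."""
    q = q.copy()
    q[a] += alpha * (r - q[a])
    return q

# Illustrative mixture-of-agents HMM: two agents (MF values and a side bias),
# three hidden states that reweight the agents, and a sticky transition matrix.
n_states, n_actions = 3, 2
weights = np.array([[2.0, 0.0],    # state 0: value-driven
                    [0.5, 1.5],    # state 1: mixed
                    [0.0, 2.0]])   # state 2: bias-driven (reduced engagement)
trans = np.full((n_states, n_states), 0.01)
np.fill_diagonal(trans, 0.98)      # rows sum to 1: states persist across trials

q = np.zeros(n_actions)            # MF action values
bias = np.array([0.0, 1.0])        # static side-bias "agent"
state = 0
for t in range(200):
    logits = weights[state, 0] * q + weights[state, 1] * bias
    p = np.exp(logits - logits.max()); p /= p.sum()           # softmax over mixed agent values
    a = rng.choice(n_actions, p=p)
    r = float(rng.random() < (0.8 if a == 0 else 0.2))        # toy reward contingency
    q = q_mf_update(q, a, r)
    state = rng.choice(n_states, p=trans[state])               # hidden state drifts within the session
```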
3
Yan X, Ebitz RB, Grissom N, Darrow DP, Herman AB. Distinct computational mechanisms of uncertainty processing explain opposing exploratory behaviors in anxiety and apathy. bioRxiv [Preprint] 2024:2024.06.04.597412. PMID: 38895240; PMCID: PMC11185698; DOI: 10.1101/2024.06.04.597412.
Abstract
Decision-making in uncertain environments often leads to varied outcomes. Understanding how individuals interpret the causes of unexpected feedback is crucial for adaptive behavior and mental well-being. Uncertainty can be broadly categorized into two components: volatility and stochasticity. Volatility is about how quickly conditions change, impacting results. Stochasticity, on the other hand, refers to outcomes affected by random chance or "luck". Understanding these factors enables individuals to have more effective environmental analysis and strategy implementation (explore or exploit) for future decisions. This study investigates how anxiety and apathy, two prevalent affective states, influence the perceptions of uncertainty and exploratory behavior. Participants (N = 1001) completed a restless three-armed bandit task that was analyzed using latent state models. Anxious individuals perceived uncertainty as more volatile, leading to increased exploration and learning rates, especially after reward omission. Conversely, apathetic individuals viewed uncertainty as more stochastic, resulting in decreased exploration and learning rates. The perceived volatility-to-stochasticity ratio mediated the anxiety-exploration relationship post-adverse outcomes. Dimensionality reduction showed exploration and uncertainty estimation to be distinct but related latent factors shaping a manifold of adaptive behavior that is modulated by anxiety and apathy. These findings reveal distinct computational mechanisms for how anxiety and apathy influence decision-making, providing a framework for understanding cognitive and affective processes in neuropsychiatric disorders.
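A rough sketch of how volatility and stochasticity pull a learner in opposite directions, using a generic Kalman-filter bandit update with illustrative parameter values (not the latent state model fit in the study):

```python
import numpy as np

def kalman_bandit_update(mean, var, reward, volatility, stochasticity):
    """One Kalman-filter update for the chosen arm of a restless bandit.
    volatility    = variance added to the latent reward rate each trial (process noise)
    stochasticity = variance of the reward around that rate (observation noise)."""
    var = var + volatility                   # the latent value may have drifted
    gain = var / (var + stochasticity)       # effective learning rate
    mean = mean + gain * (reward - mean)
    var = (1.0 - gain) * var
    return mean, var, gain

# Same surprise, different interpretation: a high volatility-to-stochasticity ratio
# yields a large update (anxiety-like), a low ratio writes the outcome off as noise (apathy-like).
m, v = 0.5, 0.1
print(kalman_bandit_update(m, v, reward=1.0, volatility=0.05, stochasticity=0.01)[2])  # high gain
print(kalman_bandit_update(m, v, reward=1.0, volatility=0.01, stochasticity=0.30)[2])  # low gain
```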
Affiliation(s)
- Xinyuan Yan
- Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- R. Becket Ebitz
- Department of Neuroscience, Universite de Montreal, 2900 Edouard Montpetit Blvd, Montreal, Quebec H3T 1J4, Canada
- Nicola Grissom
- Department of Psychology, University of Minnesota, 75 E River Rd, Minneapolis, MN 55455, USA
- David P. Darrow
- Department of Neurosurgery, University of Minnesota, Minneapolis, MN 55455, USA
- Alexander B. Herman
- Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, MN 55455, USA
4
Goudar V, Kim JW, Liu Y, Dede AJO, Jutras MJ, Skelin I, Ruvalcaba M, Chang W, Ram B, Fairhall AL, Lin JJ, Knight RT, Buffalo EA, Wang XJ. A Comparison of Rapid Rule-Learning Strategies in Humans and Monkeys. J Neurosci 2024;44:e0231232024. PMID: 38871463; PMCID: PMC11236592; DOI: 10.1523/jneurosci.0231-23.2024.
Abstract
Interspecies comparisons are key to deriving an understanding of the behavioral and neural correlates of human cognition from animal models. We perform a detailed comparison of the strategies of female macaque monkeys to male and female humans on a variant of the Wisconsin Card Sorting Test (WCST), a widely studied and applied task that provides a multiattribute measure of cognitive function and depends on the frontal lobe. WCST performance requires the inference of a rule change given ambiguous feedback. We found that well-trained monkeys infer new rules three times more slowly than minimally instructed humans. Input-dependent hidden Markov model-generalized linear models were fit to their choices, revealing hidden states akin to feature-based attention in both species. Decision processes resembled a win-stay, lose-shift strategy with interspecies similarities as well as key differences. Monkeys and humans both test multiple rule hypotheses over a series of rule-search trials and perform inference-like computations to exclude candidate choice options. We quantitatively show that perseveration, random exploration, and poor sensitivity to negative feedback account for the slower task-switching performance in monkeys.
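The win-stay, lose-shift statistics referred to here can be computed directly from a choice-outcome sequence. A minimal sketch, assuming binary rewards and arbitrary choice labels (a descriptive summary statistic, not the input-dependent HMM-GLM fit in the paper):

```python
import numpy as np

def win_stay_lose_shift(choices, rewards):
    """Return P(stay | previous trial rewarded) and P(shift | previous trial unrewarded).

    choices : 1-D array of discrete choice labels
    rewards : 1-D array of 0/1 outcomes, same length
    """
    choices = np.asarray(choices)
    rewards = np.asarray(rewards).astype(bool)
    stay = choices[1:] == choices[:-1]
    prev_win = rewards[:-1]
    win_stay = stay[prev_win].mean() if prev_win.any() else np.nan
    lose_shift = (~stay[~prev_win]).mean() if (~prev_win).any() else np.nan
    return win_stay, lose_shift

# Example: a deterministic win-stay, lose-shift agent scores 1.0 on both measures.
c = [0, 0, 1, 1, 1, 0]
r = [1, 0, 1, 1, 0, 1]
print(win_stay_lose_shift(c, r))
```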
Affiliation(s)
- Vishwa Goudar
- Center for Neural Science, New York University, New York 10003
- Jeong-Woo Kim
- Center for Neural Science, New York University, New York 10003
- Yue Liu
- Center for Neural Science, New York University, New York 10003
- Adam J O Dede
- Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Michael J Jutras
- Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Ivan Skelin
- Department of Neurology, University of California, Davis, California 95616
- The Center for Mind and Brain, University of California, Davis, California 95616
- Michael Ruvalcaba
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- William Chang
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Bhargavi Ram
- Department of Neurology, University of California, Davis, California 95616
- The Center for Mind and Brain, University of California, Davis, California 95616
- Adrienne L Fairhall
- Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Jack J Lin
- Department of Neurology, University of California, Davis, California 95616
- The Center for Mind and Brain, University of California, Davis, California 95616
- Robert T Knight
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Department of Psychology, University of California, Berkeley, California 94720
- Elizabeth A Buffalo
- Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98195
- Washington Primate Research Center, University of Washington, Seattle, Washington 98195
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York 10003
5
Zid M, Laurie VJ, Levine-Champagne A, Shourkeshti A, Harrell D, Herman AB, Ebitz RB. Humans forage for reward in reinforcement learning tasks. bioRxiv [Preprint] 2024:2024.07.08.602539. PMID: 39026817; PMCID: PMC11257465; DOI: 10.1101/2024.07.08.602539.
Abstract
How do we make good decisions in uncertain environments? In psychology and neuroscience, the classic answer is that we calculate the value of each option and then compare the values to choose the most rewarding, modulo some exploratory noise. An ethologist, conversely, would argue that we commit to one option until its value drops below a threshold, at which point we start exploring other options. In order to determine which view better describes human decision-making, we developed a novel, foraging-inspired sequential decision-making model and used it to ask whether humans compare to threshold ("Forage") or compare alternatives ("Reinforcement-Learn" [RL]). We found that the foraging model was a better fit for participant behavior, better predicted the participants' tendency to repeat choices, and predicted the existence of held-out participants with a pattern of choice that was almost impossible under RL. Together, these results suggest that humans use foraging computations, rather than RL, even in classic reinforcement learning tasks.
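A minimal sketch of the two choice rules being contrasted, with illustrative values and thresholds (not the authors' foraging model): an RL-style rule compares the learned values of all options, whereas a forage-style rule compares the current option to a threshold and only samples alternatives once it drops below it.

```python
import numpy as np

rng = np.random.default_rng(1)

def rl_choice(q, beta=5.0):
    """Compare alternatives: softmax over the learned values of all options."""
    p = np.exp(beta * (q - q.max()))
    p /= p.sum()
    return int(rng.choice(len(q), p=p))

def forage_choice(q, current, threshold=0.4):
    """Compare to threshold: stay with the current option until its value drops
    below threshold, then explore among the other options."""
    if q[current] >= threshold:
        return current
    alternatives = [a for a in range(len(q)) if a != current]
    return int(rng.choice(alternatives))

q = np.array([0.35, 0.6, 0.5])
print(rl_choice(q))                  # may pick any option, biased toward the best
print(forage_choice(q, current=1))   # stays on option 1 while it remains above threshold
```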
Affiliation(s)
- Meriam Zid
- Department of Neuroscience, University of Montreal, Montreal, QC H3T 1J4, Canada
- Veldon-James Laurie
- Department of Neuroscience, University of Montreal, Montreal, QC H3T 1J4, Canada
- Akram Shourkeshti
- Department of Neuroscience, University of Montreal, Montreal, QC H3T 1J4, Canada
- Dameon Harrell
- Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- Alexander B. Herman
- Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- R. Becket Ebitz
- Department of Neuroscience, University of Montreal, Montreal, QC H3T 1J4, Canada
6
Atlan G, Matosevich N, Peretz-Rivlin N, Marsh-Yvgi I, Zelinger N, Chen E, Kleinman T, Bleistein N, Sheinbach E, Groysman M, Nir Y, Citri A. Claustrum neurons projecting to the anterior cingulate restrict engagement during sleep and behavior. Nat Commun 2024;15:5415. PMID: 38926345; PMCID: PMC11208603; DOI: 10.1038/s41467-024-48829-6.
Abstract
The claustrum has been linked to attention and sleep. We hypothesized that this reflects a shared function, determining responsiveness to stimuli, which spans the axis of engagement. To test this hypothesis, we recorded claustrum population dynamics from male mice during both sleep and an attentional task ('ENGAGE'). Heightened activity in claustrum neurons projecting to the anterior cingulate cortex (ACCp) corresponded to reduced sensory responsiveness during sleep. Similarly, in the ENGAGE task, heightened ACCp activity correlated with disengagement and behavioral lapses, while low ACCp activity correlated with hyper-engagement and impulsive errors. Chemogenetic elevation of ACCp activity reduced both awakenings during sleep and impulsive errors in the ENGAGE task. Furthermore, mice employing an exploration strategy in the task showed a stronger correlation between ACCp activity and performance compared to mice employing an exploitation strategy which reduced task complexity. Our results implicate ACCp claustrum neurons in restricting engagement during sleep and goal-directed behavior.
Affiliation(s)
- Gal Atlan
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Noa Matosevich
- Department of Physiology & Pharmacology, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Noa Peretz-Rivlin
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Idit Marsh-Yvgi
- The Alexander Silberman Institute of Life Science, Faculty of Science, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Noam Zelinger
- Department of Physiology & Pharmacology, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Eden Chen
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Timna Kleinman
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Noa Bleistein
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- The Alexander Silberman Institute of Life Science, Faculty of Science, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Efrat Sheinbach
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- The Alexander Silberman Institute of Life Science, Faculty of Science, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Maya Groysman
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Yuval Nir
- Department of Physiology & Pharmacology, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv, Israel
- The Sieratzki-Sagol Center for Sleep Medicine, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Sagol Brain Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Ami Citri
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- The Alexander Silberman Institute of Life Science, Faculty of Science, The Hebrew University of Jerusalem; Edmond J. Safra Campus, Givat Ram, Jerusalem, Israel
- Program in Child and Brain Development, Canadian Institute for Advanced Research; MaRS Centre, Toronto, ON, Canada
7
Gilmour W, Mackenzie G, Feile M, Tayler-Grint L, Suveges S, Macfarlane JA, Macleod AD, Marshall V, Grunwald IQ, Steele JD, Gilbertson T. Impaired value-based decision-making in Parkinson's disease apathy. Brain 2024;147:1362-1376. PMID: 38305691; PMCID: PMC10994558; DOI: 10.1093/brain/awae025.
Abstract
Apathy is a common and disabling complication of Parkinson's disease characterized by reduced goal-directed behaviour. Several studies have reported dysfunction within prefrontal cortical regions and projections from brainstem nuclei whose neuromodulators include dopamine, serotonin and noradrenaline. Work in animal and human neuroscience has confirmed contributions of these neuromodulators to aspects of motivated decision-making. Specifically, these neuromodulators have overlapping contributions to encoding the value of decisions, and influence whether to explore alternative courses of action or persist in an existing strategy to achieve a rewarding goal. Building upon this work, we hypothesized that apathy in Parkinson's disease should be associated with an impairment in value-based learning. Using a four-armed restless bandit reinforcement learning task, we studied decision-making in 75 volunteers; 53 patients with Parkinson's disease, with and without clinical apathy, and 22 age-matched healthy control subjects. Patients with apathy exhibited impaired ability to choose the highest value bandit. Task performance predicted an individual patient's apathy severity measured using the Lille Apathy Rating Scale (R = -0.46, P < 0.001). Computational modelling of the patients' choices confirmed that the apathy group made decisions that were indifferent to the learnt value of the options, consistent with previous reports of reward insensitivity. Further analysis demonstrated a shift away from exploiting the highest value option and a reduction in perseveration, which also correlated with apathy scores (R = -0.5, P < 0.001). We went on to acquire functional MRI in 59 volunteers; a group of 19 patients with and 20 without apathy and 20 age-matched controls performing the Restless Bandit Task. Analysis of the functional MRI signal at the point of reward feedback confirmed diminished signal within ventromedial prefrontal cortex in Parkinson's disease, which was more marked in apathy, but not predictive of their individual apathy severity. Using a model-based categorization of choice type, decisions to explore lower value bandits in the apathy group activated prefrontal cortex to a similar degree to the age-matched controls. In contrast, Parkinson's patients without apathy demonstrated significantly increased activation across a distributed thalamo-cortical network. Enhanced activity in the thalamus predicted individual apathy severity across both patient groups and exhibited functional connectivity with dorsal anterior cingulate cortex and anterior insula. Given that task performance in patients without apathy was no different to the age-matched control subjects, we interpret the recruitment of this network as a compensatory mechanism that may protect against the symptomatic manifestation of apathy in Parkinson's disease.
Affiliation(s)
- William Gilmour
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Department of Neurology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
- Graeme Mackenzie
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Department of Neurology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
- Mathias Feile
- Rehabilitation Psychiatry, Murray Royal Hospital, Perth PH2 7BH, UK
- Szabolcs Suveges
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Jennifer A Macfarlane
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Medical Physics, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
- SINAPSE, University of Glasgow, Imaging Centre of Excellence, Level 2, Queen Elizabeth University Hospital, Glasgow G51 4TF, Scotland, UK
- Angus D Macleod
- Institute of Applied Health Sciences, School of Medicine, University of Aberdeen, Foresterhill, Aberdeen AB24 2ZD, UK
- Department of Neurology, Aberdeen Royal Infirmary, Foresterhill, Aberdeen AB24 2ZD, UK
- Vicky Marshall
- Institute of Neurological Sciences, Queen Elizabeth University Hospital, Glasgow G51 4TF, UK
- Iris Q Grunwald
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- J Douglas Steele
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Tom Gilbertson
- Division of Imaging Science and Technology, Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Department of Neurology, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK
8
Jahn CI, Markov NT, Morea B, Daw ND, Ebitz RB, Buschman TJ. Learning attentional templates for value-based decision-making. Cell 2024;187:1476-1489.e21. PMID: 38401541; DOI: 10.1016/j.cell.2024.01.041.
Abstract
Attention filters sensory inputs to enhance task-relevant information. It is guided by an "attentional template" that represents the stimulus features that are currently relevant. To understand how the brain learns and uses templates, we trained monkeys to perform a visual search task that required them to repeatedly learn new attentional templates. Neural recordings found that templates were represented across the prefrontal and parietal cortex in a structured manner, such that perceptually neighboring templates had similar neural representations. When the task changed, a new attentional template was learned by incrementally shifting the template toward rewarded features. Finally, we found that attentional templates transformed stimulus features into a common value representation that allowed the same decision-making mechanisms to deploy attention, regardless of the identity of the template. Altogether, our results provide insight into the neural mechanisms by which the brain learns to control attention and how attention can be flexibly deployed across tasks.
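A minimal sketch of the incremental template update described here, assuming a continuous feature space and a made-up learning rate (illustrative, not the animals' fitted learning rule):

```python
import numpy as np

def update_template(template, target_feature, reward, lr=0.2):
    """Shift the attentional template toward the feature of the rewarded target.
    template, target_feature : vectors in a continuous feature space (e.g., color coordinates)
    reward : 1 if the choice was rewarded, else 0 (no update)."""
    template = np.asarray(template, dtype=float)
    target_feature = np.asarray(target_feature, dtype=float)
    return template + lr * reward * (target_feature - template)

template = np.array([0.0, 0.0])
for target, r in [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.2, 0.9], 0)]:
    template = update_template(template, target, r)
print(template)   # drifts toward the rewarded feature region
```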
Affiliation(s)
- Caroline I Jahn
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Nikola T Markov
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Britney Morea
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Nathaniel D Daw
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- R Becket Ebitz
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Neurosciences, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Timothy J Buschman
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
9
Segraves MA. Using Natural Scenes to Enhance our Understanding of the Cerebral Cortex's Role in Visual Search. Annu Rev Vis Sci 2023;9:435-454. PMID: 37164028; DOI: 10.1146/annurev-vision-100720-124033.
Abstract
Using natural scenes is an approach to studying the visual and eye movement systems that approximates how these systems function in everyday life. This review examines the results from behavioral and neurophysiological studies using natural scene viewing in humans and monkeys. The use of natural scenes for the study of cerebral cortical activity is relatively new and presents challenges for data analysis. Methods and results from the use of natural scenes to study the visual and eye movement cortex are presented, with emphasis on the new insights this method provides beyond what is known about these cortical regions from conventional methods.
Affiliation(s)
- Mark A Segraves
- Department of Neurobiology, Northwestern University, Evanston, Illinois, USA
10
Tranter MM, Aggarwal S, Young JW, Dillon DG, Barnes SA. Reinforcement learning deficits exhibited by postnatal PCP-treated rats enable deep neural network classification. Neuropsychopharmacology 2023;48:1377-1385. PMID: 36509858; PMCID: PMC10354061; DOI: 10.1038/s41386-022-01514-y.
Abstract
The ability to appropriately update the value of a given action is a critical component of flexible decision making. Several psychiatric disorders, including schizophrenia, are associated with impairments in flexible decision making that can be evaluated using the probabilistic reversal learning (PRL) task. The PRL task has been reverse-translated for use in rodents. Disrupting glutamate neurotransmission during early postnatal neurodevelopment in rodents has induced behavioral, cognitive, and neuropathophysiological abnormalities relevant to schizophrenia. Here, we tested the hypothesis that using the NMDA receptor antagonist phencyclidine (PCP) to disrupt postnatal glutamatergic transmission in rats would lead to impaired decision making in the PRL. Consistent with this hypothesis, compared to controls the postnatal PCP-treated rats completed fewer reversals and exhibited disruptions in reward and punishment sensitivity (i.e., win-stay and lose-shift responding, respectively). Moreover, computational analysis of behavior revealed that postnatal PCP-treatment resulted in a pronounced impairment in the learning rate throughout PRL testing. Finally, a deep neural network (DNN) trained on the rodent behavior could accurately predict the treatment group of subjects. These data demonstrate that disrupting early postnatal glutamatergic neurotransmission impairs flexible decision making and provides evidence that DNNs can be trained on behavioral datasets to accurately predict the treatment group of new subjects, highlighting the potential for DNNs to aid in the diagnosis of schizophrenia.
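A minimal sketch of how a blunted learning rate slows adaptation in a probabilistic reversal learning task, using a generic delta-rule learner with illustrative parameters (not the computational model fit in the study):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_prl(alpha, beta=4.0, n_trials=400, p_good=0.8, reversal_every=80):
    """Delta-rule learner on a 2-choice probabilistic reversal task.
    Returns the fraction of trials on which the currently 'good' option was chosen."""
    q = np.zeros(2)
    good = 0
    correct = 0
    for t in range(n_trials):
        if t > 0 and t % reversal_every == 0:
            good = 1 - good                            # contingency reversal
        p = np.exp(beta * (q - q.max())); p /= p.sum() # softmax choice
        a = rng.choice(2, p=p)
        r = float(rng.random() < (p_good if a == good else 1 - p_good))
        q[a] += alpha * (r - q[a])                     # delta-rule value update
        correct += (a == good)
    return correct / n_trials

print(simulate_prl(alpha=0.4))    # faster learner tracks reversals better
print(simulate_prl(alpha=0.05))   # blunted learning rate completes fewer effective reversals
```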
Affiliation(s)
- Michael M Tranter
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
- Samarth Aggarwal
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Jared W Young
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
- Daniel G Dillon
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont, MA, 02478, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Samuel A Barnes
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Mental Health, VA San Diego Healthcare System, La Jolla, CA, 92093, USA
11
Yan X, Ebitz RB, Grissom N, Darrow DP, Herman AB. A low dimensional manifold of human exploratory behavior reveals opposing roles for apathy and anxiety. bioRxiv [Preprint] 2023:2023.06.19.545645. PMID: 37425723; PMCID: PMC10327047; DOI: 10.1101/2023.06.19.545645.
Abstract
Exploration-exploitation decision-making is a feature of daily life that is altered in a number of neuropsychiatric conditions. Humans display a range of exploration and exploitation behaviors, which can be affected by apathy and anxiety. It remains unknown how factors underlying decision-making generate the spectrum of observed exploration-exploitation behavior and how they relate to states of anxiety and apathy. Here, we report a latent structure underlying sequential exploration and exploitation decisions that explains variation in anxiety and apathy. 1001 participants in a gender-balanced sample completed a three-armed restless bandit task along with psychiatric symptom surveys. Using dimensionality reduction methods, we found that decision sequences reduced to a low-dimensional manifold. The axes of this manifold explained individual differences in the balance between states of exploration and exploitation and the stability of those states, as determined by a statistical mechanics model of decision-making. Position along the balance axis was correlated with opposing symptoms of behavioral apathy and anxiety, while position along the stability axis correlated with the level of emotional apathy. This result resolves a paradox over how these symptoms can be correlated in samples but have opposite effects on behavior. Furthermore, this work provides a basis for using behavioral manifolds to reveal relationships between behavioral dynamics and affective states, with important implications for behavioral measurement approaches to neuropsychiatric conditions.
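A minimal sketch of the general dimensionality-reduction step described here, assuming each participant's choice sequence has already been summarized as a small feature vector (the features and the use of plain PCA are illustrative, not the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy per-participant behavioral features derived from choice sequences
# (e.g., P(explore), P(stay | win), P(stay | loss), mean run length).
features = rng.random((1001, 4))

# PCA via SVD of the centered feature matrix: rows of `scores` are participants'
# coordinates on the low-dimensional behavioral manifold.
X = features - features.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * S                      # participant scores on each principal axis
explained = S**2 / np.sum(S**2)     # variance explained per axis
print(scores[:, :2].shape, explained[:2])
```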
12
Shourkeshti A, Marrocco G, Jurewicz K, Moore T, Ebitz RB. Pupil size predicts the onset of exploration in brain and behavior. bioRxiv [Preprint] 2023:2023.05.24.541981. PMID: 37292773; PMCID: PMC10245915; DOI: 10.1101/2023.05.24.541981.
Abstract
In uncertain environments, intelligent decision-makers exploit actions that have been rewarding in the past, but also explore actions that could be even better. Several neuromodulatory systems are implicated in exploration, based, in part, on work linking exploration to pupil size, a peripheral correlate of neuromodulatory tone and index of arousal. However, pupil size could instead track variables that make exploration more likely, like volatility or reward, without directly predicting either exploration or its neural bases. Here, we simultaneously measured pupil size, exploration, and neural population activity in the prefrontal cortex while two rhesus macaques explored and exploited in a dynamic environment. We found that pupil size under constant luminance specifically predicted the onset of exploration, beyond what could be explained by reward history. Pupil size also predicted disorganized patterns of prefrontal neural activity at both the single neuron and population levels, even within periods of exploitation. Ultimately, our results support a model in which pupil-linked mechanisms promote the onset of exploration by driving the prefrontal cortex through a critical tipping point where prefrontal control dynamics become disorganized and exploratory decisions are possible.
Affiliation(s)
- Akram Shourkeshti
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Gabriel Marrocco
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Katarzyna Jurewicz
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Department of Physiology, McGill University, Montréal, QC, Canada
- Tirin Moore
- Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- R. Becket Ebitz
- Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
13
Jahn CI, Grohn J, Cuell S, Emberton A, Bouret S, Walton ME, Kolling N, Sallet J. Neural responses in macaque prefrontal cortex are linked to strategic exploration. PLoS Biol 2023;21:e3001985. PMID: 36716348; PMCID: PMC9910800; DOI: 10.1371/journal.pbio.3001985.
Abstract
Humans have been shown to strategically explore. They can identify situations in which gathering information about distant and uncertain options is beneficial for the future. Because primates rely on scarce resources when they forage, they are also thought to strategically explore, but whether they use the same strategies as humans, and the neural bases of strategic exploration in monkeys, are largely unknown. We designed a sequential choice task to investigate whether monkeys mobilize strategic exploration based on whether information can improve subsequent choice, but also to ask the novel question of whether monkeys adjust their exploratory choices based on the contingency between choice and information, by sometimes providing the counterfactual feedback about the unchosen option. We show that monkeys decreased their reliance on expected value when exploration could be beneficial, but this was not mediated by changes in the effect of uncertainty on choices. We found strategic exploratory signals in anterior and mid-cingulate cortex (ACC/MCC) and dorsolateral prefrontal cortex (dlPFC). This network was most active when a low value option was chosen, which suggests a role in counteracting expected value signals when exploration away from value should be considered. Such strategic exploration was abolished when the counterfactual feedback was available. Learning from counterfactual outcomes was associated with the recruitment of a different circuit centered on the medial orbitofrontal cortex (OFC), where we showed that monkeys represent chosen and unchosen reward prediction errors. Overall, our study shows how ACC/MCC-dlPFC and OFC circuits together could support exploitation of available information to the fullest and drive behavior towards finding more information through exploration when it is beneficial.
Affiliation(s)
- Caroline I. Jahn
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Motivation, Brain and Behavior Team, Institut du Cerveau et de la Moelle Epinière, Paris, France
- Sorbonne Paris Cité universités, Université Paris Descartes, Frontières du Vivant, Paris, France
- Jan Grohn
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Steven Cuell
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Andrew Emberton
- Biomedical Science Services, University of Oxford, Oxford, United Kingdom
- Sebastien Bouret
- Motivation, Brain and Behavior Team, Institut du Cerveau et de la Moelle Epinière, Paris, France
- Mark E. Walton
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Nils Kolling
- Wellcome Centre for Integrative Neuroimaging, OBHA, University of Oxford, Headington, United Kingdom
- Univ Lyon, Université Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
- Jérôme Sallet
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Univ Lyon, Université Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
14
Brown VM, Hallquist MN, Frank MJ, Dombrovski AY. Humans adaptively resolve the explore-exploit dilemma under cognitive constraints: Evidence from a multi-armed bandit task. Cognition 2022;229:105233. PMID: 35917612; PMCID: PMC9530017; DOI: 10.1016/j.cognition.2022.105233.
Abstract
When navigating uncertain worlds, humans must balance exploring new options versus exploiting known rewards. Longer horizons and spatially structured option values encourage humans to explore, but the impact of real-world cognitive constraints such as environment size and memory demands on explore-exploit decisions is unclear. In the present study, humans chose between options varying in uncertainty during a multi-armed bandit task with varying environment size and memory demands. Regression and cognitive computational models of choice behavior showed that with a lower cognitive load, humans are more exploratory than a simulated value-maximizing learner, but under cognitive constraints, they adaptively scale down exploration to maintain exploitation. Thus, while humans are curious, cognitive constraints force people to decrease their strategic exploration in a resource-rational-like manner to focus on harvesting known rewards.
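A minimal sketch of the kind of simulated learner such behavior can be compared against, with one illustrative way of scaling exploration down as the environment grows (not the authors' cognitive computational model):

```python
import numpy as np

rng = np.random.default_rng(4)

def choose(q, counts, t, explore_bonus=1.0, beta=3.0):
    """Softmax choice over value plus an uncertainty bonus (UCB-style).
    Setting explore_bonus=0 and a large beta recovers a value-maximizing learner."""
    bonus = explore_bonus * np.sqrt(np.log(t + 1) / (counts + 1))
    logits = beta * (q + bonus)
    p = np.exp(logits - logits.max()); p /= p.sum()
    return int(rng.choice(len(q), p=p))

# Scaling exploration down with environment size, as a resource-rational learner might:
n_arms = 8
q, counts = np.zeros(n_arms), np.zeros(n_arms)
bonus_scale = 1.0 / np.log(n_arms)   # illustrative: less exploration in larger environments
a = choose(q, counts, t=0, explore_bonus=bonus_scale)
print(a)
```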
Affiliation(s)
- Vanessa M Brown
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Michael N Hallquist
- Department of Psychology, Pennsylvania State University, State College, PA, USA; Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Michael J Frank
- Department of Cognitive, Linguistic, and Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA
15
Post RJ, Bulkin DA, Ebitz RB, Lee V, Han K, Warden MR. Tonic activity in lateral habenula neurons acts as a neutral valence brake on reward-seeking behavior. Curr Biol 2022;32:4325-4336.e5. PMID: 36049479; PMCID: PMC9613558; DOI: 10.1016/j.cub.2022.08.016.
Abstract
Survival requires both the ability to persistently pursue goals and the ability to determine when it is time to stop, an adaptive balance of perseverance and disengagement. Neural activity in the lateral habenula (LHb) has been linked to negative valence, but its role in regulating the balance between engaged reward seeking and disengaged behavioral states remains unclear. Here, we show that LHb neural activity is tonically elevated during minutes-long periods of disengagement from reward-seeking behavior, both when due to repeated reward omission (negative valence) and when sufficient reward has been consumed (positive valence). Furthermore, we show that LHb inhibition extends ongoing reward-seeking behavioral states but does not prompt task re-engagement. We find no evidence for similar tonic activity changes in ventral tegmental area dopamine neurons. Our findings support a framework in which tonic activity in LHb neurons suppresses engagement in reward-seeking behavior in response to both negatively and positively valenced factors.
Affiliation(s)
- Ryan J Post
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA; Cornell Neurotech, Cornell University, Ithaca, NY 14853, USA
- David A Bulkin
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA; Cornell Neurotech, Cornell University, Ithaca, NY 14853, USA
- R Becket Ebitz
- Department of Neuroscience, Université de Montréal, Montréal, QC H3T 1J4, Canada
- Vladlena Lee
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Kasey Han
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Melissa R Warden
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA; Cornell Neurotech, Cornell University, Ithaca, NY 14853, USA
16
Li Y, Daddaoua N, Horan M, Foley NC, Gottlieb J. Uncertainty modulates visual maps during noninstrumental information demand. Nat Commun 2022;13:5911. PMID: 36207316; PMCID: PMC9547007; DOI: 10.1038/s41467-022-33585-2.
Abstract
Animals are intrinsically motivated to obtain information independently of instrumental incentives. This motivation depends on two factors: a desire to resolve uncertainty by gathering accurate information and a desire to obtain positively-valenced observations, which predict favorable rather than unfavorable outcomes. To understand the neural mechanisms, we recorded parietal cortical activity implicated in prioritizing stimuli for spatial attention and gaze, in a task in which monkeys were free (but not trained) to obtain information about probabilistic non-contingent rewards. We show that valence and uncertainty independently modulated parietal neuronal activity, and uncertainty but not reward-related enhancement consistently correlated with behavioral sensitivity. The findings suggest uncertainty-driven and valence-driven information demand depend on partially distinct pathways, with the former being consistently related to parietal responses and the latter depending on additional mechanisms implemented in downstream structures.
Affiliation(s)
- Yvonne Li
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Nabil Daddaoua
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Mattias Horan
- Department of Neuroscience, Columbia University, New York, NY, USA
- Nicholas C Foley
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Jacqueline Gottlieb
- Department of Neuroscience, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Kavli Institute for Brain Science, Columbia University, New York, NY, USA
17
Rojas GR, Curry-Pochy LS, Chen CS, Heller AT, Grissom NM. Sequential delay and probability discounting tasks in mice reveal anchoring effects partially attributable to decision noise. Behav Brain Res 2022;431:113951. PMID: 35661751; PMCID: PMC9844124; DOI: 10.1016/j.bbr.2022.113951.
Abstract
Delay discounting and probability discounting decision making tasks in rodent models have high translational potential. However, it is unclear whether the discounted value of the large reward option is the main contributor to variability in animals' choices in either task, which may limit translation to humans. Male and female mice underwent sessions of delay and probability discounting in sequence to assess how choice behavior adapts over experience with each task. To control for "anchoring" (persistent choices based on the initial delay or probability), mice experienced "Worsening" schedules where the large reward was offered under initially favorable conditions that became less favorable during testing, followed by "Improving" schedules where the large reward was offered under initially unfavorable conditions that improved over a session. During delay discounting, both male and female mice showed elimination of anchoring effects over training. In probability discounting, both sexes of mice continued to show some anchoring even after months of training. One possibility is that "noisy", exploratory choices could contribute to these persistent anchoring effects, rather than constant fluctuations in value discounting. We fit choice behavior in individual animals using models that included both a value-based discounting parameter and a decision noise parameter that captured variability in choices deviating from value maximization. Changes in anchoring behavior over time were tracked by changes in both the value and decision noise parameters in delay discounting, but by the decision noise parameter in probability discounting. Exploratory decision making was also reflected in choice response times that tracked the degree of conflict caused by both uncertainty and temporal cost, but was not linked with differences in locomotor activity reflecting chamber exploration. Thus, variable discounting behavior in mice can result from changes in exploration of the decision options rather than changes in reward valuation.
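A minimal sketch of the two-parameter choice model described here: a hyperbolic discounting parameter sets the subjective value of the delayed option, and a softmax inverse temperature captures decision noise (reward magnitudes and parameter values are illustrative):

```python
import numpy as np

def p_choose_large(delay, k, beta, large=4.0, small=1.0):
    """Probability of choosing the large delayed reward over the small immediate one.
    k    : hyperbolic discounting parameter (value sensitivity to delay)
    beta : softmax inverse temperature (low beta = noisy, exploratory choices)."""
    v_large = large / (1.0 + k * delay)   # hyperbolic discounted value
    v_small = small                        # immediate small reward
    return 1.0 / (1.0 + np.exp(-beta * (v_large - v_small)))

# Same discounting, different noise: anchoring-like flat choice curves can come
# from low beta (noisy deciding) rather than from a change in k itself.
delays = np.array([0, 5, 10, 20, 40])
print(np.round(p_choose_large(delays, k=0.2, beta=3.0), 2))   # sharp, value-driven
print(np.round(p_choose_large(delays, k=0.2, beta=0.3), 2))   # flat, noise-dominated
```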
18
Csorba BA, Krause MR, Zanos TP, Pack CC. Long-range cortical synchronization supports abrupt visual learning. Curr Biol 2022;32:2467-2479.e4. PMID: 35523181; DOI: 10.1016/j.cub.2022.04.029.
Abstract
Visual plasticity declines sharply after the critical period, yet we easily learn to recognize new faces and places, even as adults. Such learning is often characterized by a "moment of insight," an abrupt and dramatic improvement in recognition. The mechanisms that support abrupt learning are unknown, but one hypothesis is that they involve changes in synchronization between brain regions. To test this hypothesis, we used a behavioral task in which non-human primates rapidly learned to recognize novel images and to associate them with specific responses. Simultaneous recordings from inferotemporal and prefrontal cortices revealed a transient synchronization of neural activity between these areas that peaked around the moment of insight. Synchronization was strongest between inferotemporal sites that encoded images and reward-sensitive prefrontal sites. Moreover, its magnitude intensified gradually over image exposures, suggesting that abrupt learning is the culmination of a search for informative signals within a circuit linking sensory information to task demands.
Affiliation(s)
- Bennett A Csorba
- Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
- Matthew R Krause
- Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
- Christopher C Pack
- Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
19
Beron CC, Neufeld SQ, Linderman SW, Sabatini BL. Mice exhibit stochastic and efficient action switching during probabilistic decision making. Proc Natl Acad Sci U S A 2022;119:e2113961119. PMID: 35385355; PMCID: PMC9169659; DOI: 10.1073/pnas.2113961119.
Abstract
In probabilistic and nonstationary environments, individuals must use internal and external cues to flexibly make decisions that lead to desirable outcomes. To gain insight into the process by which animals choose between actions, we trained mice in a task with time-varying reward probabilities. In our implementation of such a two-armed bandit task, thirsty mice use information about recent action and action–outcome histories to choose between two ports that deliver water probabilistically. Here we comprehensively modeled choice behavior in this task, including the trial-to-trial changes in port selection, i.e., action switching behavior. We find that mouse behavior is, at times, deterministic and, at others, apparently stochastic. The behavior deviates from that of a theoretically optimal agent performing Bayesian inference in a hidden Markov model (HMM). We formulate a set of models based on logistic regression, reinforcement learning, and sticky Bayesian inference that we demonstrate are mathematically equivalent and that accurately describe mouse behavior. The switching behavior of mice in the task is captured in each model by a stochastic action policy, a history-dependent representation of action value, and a tendency to repeat actions despite incoming evidence. The models parsimoniously capture behavior across different environmental conditions by varying the stickiness parameter, and like the mice, they achieve nearly maximal reward rates. These results indicate that mouse behavior reaches near-maximal performance with reduced action switching and can be described by a set of equivalent models with a small number of relatively fixed parameters.
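A minimal sketch of the shared "sticky" ingredient: a softmax policy over a history-dependent action value plus a perseveration bonus for repeating the previous choice (the parameterization is illustrative, not the paper's fitted models):

```python
import numpy as np

rng = np.random.default_rng(5)

def sticky_q_choice(q, prev_choice, beta=4.0, kappa=1.5):
    """Softmax over action values plus a stickiness bonus for repeating the last choice.
    kappa > 0 makes the agent repeat actions despite incoming evidence."""
    logits = beta * np.asarray(q, dtype=float).copy()
    if prev_choice is not None:
        logits[prev_choice] += kappa
    p = np.exp(logits - logits.max()); p /= p.sum()
    return int(rng.choice(len(q), p=p))

# With two actions this reduces to a logistic regression on (value difference, previous
# choice), which is one way the model families named in the abstract can line up.
q = np.array([0.6, 0.4])
print(sticky_q_choice(q, prev_choice=1))   # value favors action 0, stickiness favors action 1
```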
Affiliation(s)
- Celia C. Beron
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
- Shay Q. Neufeld
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
- Scott W. Linderman
- Department of Statistics, Stanford University, Stanford, CA 94305
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305
- Bernardo L. Sabatini
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
- HHMI, Harvard Medical School, Boston, MA 02115
20
Smith R, Taylor S, Wilson RC, Chuning AE, Persich MR, Wang S, Killgore WDS. Lower Levels of Directed Exploration and Reflective Thinking Are Associated With Greater Anxiety and Depression. Front Psychiatry 2022;12:782136. PMID: 35126200; PMCID: PMC8808291; DOI: 10.3389/fpsyt.2021.782136.
Abstract
Anxiety and depression are often associated with strong beliefs that entering specific situations will lead to aversive outcomes - even when these situations are objectively safe and avoiding them reduces well-being. A possible mechanism underlying this maladaptive avoidance behavior is a failure to reflect on: (1) appropriate levels of uncertainty about the situation, and (2) how this uncertainty could be reduced by seeking further information (i.e., exploration). To test this hypothesis, we asked a community sample of 416 individuals to complete measures of reflective cognition, exploration, and symptoms of anxiety and depression. Consistent with our hypotheses, we found significant associations between each of these measures in expected directions (i.e., positive relationships between reflective cognition and strategic information-seeking behavior or "directed exploration", and negative relationships between these measures and anxiety/depression symptoms). Further analyses suggested that the relationship between directed exploration and depression/anxiety was due in part to an ambiguity aversion promoting exploration in conditions where information-seeking was not beneficial (as opposed to only being due to under-exploration when more information would aid future choices). In contrast, reflectiveness was associated with greater exploration in appropriate settings and separately accounted for differences in reaction times, decision noise, and choice accuracy in expected directions. These results shed light on the mechanisms underlying information-seeking behavior and how they may contribute to symptoms of emotional disorders. They also highlight the potential clinical relevance of individual differences in reflectiveness and exploration and should motivate future research on their possible contributions to vulnerability and/or maintenance of affective disorders.
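A minimal sketch distinguishing directed exploration (an uncertainty bonus added to each option's value) from random exploration (softmax temperature), the two components referenced above; the parameter names and values are illustrative, not the task's fitted model:

```python
import numpy as np

rng = np.random.default_rng(6)

def explore_choice(mean, uncertainty, info_bonus=0.5, temperature=0.3):
    """Directed exploration: add an uncertainty bonus to each option's mean value.
    Random exploration: sample through a softmax at the given temperature."""
    value = np.asarray(mean) + info_bonus * np.asarray(uncertainty)  # directed component
    logits = value / temperature                                      # random component
    p = np.exp(logits - logits.max()); p /= p.sum()
    return int(rng.choice(len(mean), p=p))

# Lowering info_bonus mimics reduced directed exploration while leaving
# random exploration (temperature) unchanged.
mean = np.array([0.5, 0.45]); unc = np.array([0.05, 0.4])
print(explore_choice(mean, unc, info_bonus=0.8))   # often samples the uncertain option
print(explore_choice(mean, unc, info_bonus=0.0))   # mostly exploits the higher mean
```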
Affiliation(s)
- Ryan Smith
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Samuel Taylor
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, United States
- Anne E. Chuning
- Laureate Institute for Brain Research, Tulsa, OK, United States
- Siyu Wang
- Department of Psychology, University of Arizona, Tucson, AZ, United States
- William D. S. Killgore
- Department of Psychology, University of Arizona, Tucson, AZ, United States
- Department of Psychiatry, University of Arizona, Tucson, AZ, United States
21
Averbeck B, O'Doherty JP. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 2022;47:147-162. PMID: 34354249; PMCID: PMC8616931; DOI: 10.1038/s41386-021-01108-0.
Abstract
We review the current state of knowledge on the computational and neural mechanisms of reinforcement-learning with a particular focus on fronto-striatal circuits. We divide the literature in this area into five broad research themes: the target of the learning - whether it be learning about the value of stimuli or about the value of actions; the nature and complexity of the algorithm used to drive the learning and inference process; how learned values get converted into choices and associated actions; the nature of state representations, and of other cognitive machinery that support the implementation of various reinforcement-learning operations. An emerging fifth area focuses on how the brain allocates or arbitrates control over different reinforcement-learning sub-systems or "experts". We will outline what is known about the role of the prefrontal cortex and striatum in implementing each of these functions. We then conclude by arguing that it will be necessary to build bridges from algorithmic level descriptions of computational reinforcement-learning to implementational level models to better understand how reinforcement-learning emerges from multiple distributed neural networks in the brain.
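As a concrete, generic illustration of two of the themes this review organizes (learning values from outcomes and converting values into choices), the sketch below implements textbook model-free learning with a softmax choice rule for a two-armed bandit. It is not a model drawn from the review itself; the learning rate, inverse temperature, and reward probabilities are arbitrary placeholder values.
```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal model-free learner for a two-armed bandit: values are updated
# from reward prediction errors and converted into choices via a softmax.
n_trials = 500
alpha, beta = 0.1, 3.0           # learning rate, inverse temperature
p_reward = np.array([0.7, 0.3])  # true (unknown) reward probabilities
Q = np.zeros(2)                  # learned action values

rewards = []
for t in range(n_trials):
    logits = beta * Q
    p_choice = np.exp(logits - logits.max())
    p_choice /= p_choice.sum()
    a = rng.choice(2, p=p_choice)          # stochastic choice
    r = float(rng.random() < p_reward[a])  # Bernoulli reward
    Q[a] += alpha * (r - Q[a])             # prediction-error update
    rewards.append(r)

print("final values:", np.round(Q, 2), "mean reward:", np.mean(rewards))
```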
Collapse
Affiliation(s)
| | - John P O'Doherty
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA.
| |
Collapse
|
22
|
Monosov IE, Rushworth MFS. Interactions between ventrolateral prefrontal and anterior cingulate cortex during learning and behavioural change. Neuropsychopharmacology 2022; 47:196-210. [PMID: 34234288 PMCID: PMC8617208 DOI: 10.1038/s41386-021-01079-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/07/2021] [Revised: 05/27/2021] [Accepted: 06/15/2021] [Indexed: 02/06/2023]
Abstract
Hypotheses and beliefs guide credit assignment - the process of determining which previous events or actions caused an outcome. Adaptive hypothesis formation and testing are crucial in uncertain and changing environments in which associations and meanings are volatile. Despite primates' abilities to form and test hypotheses, establishing what is causally responsible for the occurrence of particular outcomes remains a fundamental challenge for credit assignment and learning. Hypotheses about what surprises are due to stochasticity inherent in an environment as opposed to real, systematic changes are necessary for identifying the environment's predictive features, but are often hard to test. We review evidence that two highly interconnected frontal cortical regions, anterior cingulate cortex and ventrolateral prefrontal area 47/12o, provide a biological substrate for linking two crucial components of hypothesis-formation and testing: the control of information seeking and credit assignment. Neuroimaging, targeted disruptions, and neurophysiological studies link an anterior cingulate - 47/12o circuit to generation of exploratory behaviour, non-instrumental information seeking, and interpretation of subsequent feedback in the service of credit assignment. Our observations support the idea that information seeking and credit assignment are linked at the level of neural circuits and explain why this circuit is important for ensuring behaviour is flexible and adaptive.
Collapse
Affiliation(s)
- Ilya E Monosov
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA.
- Department of Biomedical Engineering, Washington University, St. Louis, MO, USA.
- Department of Electrical Engineering, Washington University, St. Louis, MO, USA.
- Department of Neurosurgery, Washington University, St. Louis, MO, USA.
- Pain Center, Washington University, St. Louis, MO, USA.
| | - Matthew F S Rushworth
- Wellcome Centre for Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, Oxford, UK.
| |
Collapse
|
23
|
Ebitz RB, Hayden BY. The population doctrine in cognitive neuroscience. Neuron 2021; 109:3055-3068. [PMID: 34416170 PMCID: PMC8725976 DOI: 10.1016/j.neuron.2021.07.011] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Revised: 07/02/2021] [Accepted: 07/13/2021] [Indexed: 01/08/2023]
Abstract
A major shift is happening within neurophysiology: a population doctrine is drawing level with the single-neuron doctrine that has long dominated the field. Population-level ideas have so far had their greatest impact in motor neuroscience, but they hold great promise for resolving open questions in cognition as well. Here, we codify the population doctrine and survey recent work that leverages this view to specifically probe cognition. Our discussion is organized around five core concepts that provide a foundation for population-level thinking: (1) state spaces, (2) manifolds, (3) coding dimensions, (4) subspaces, and (5) dynamics. The work we review illustrates the progress and promise that population-level thinking holds for cognitive neuroscience - for delivering new insight into attention, working memory, decision-making, executive function, learning, and reward processing.
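To make the "state space" idea concrete, here is a minimal sketch that projects synthetic population activity onto two principal components, so that each time point becomes a point on a low-dimensional neural trajectory. The synthetic data, the choice of PCA, and the two-component cut-off are illustrative assumptions, not analysis choices prescribed by the review.
```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic "population activity": 100 neurons driven by two latent
# signals plus noise (time points x neurons).
n_time, n_neurons = 400, 100
latents = np.stack([np.sin(np.linspace(0, 8 * np.pi, n_time)),
                    np.cos(np.linspace(0, 4 * np.pi, n_time))], axis=1)
loadings = rng.normal(size=(2, n_neurons))
activity = latents @ loadings + 0.5 * rng.normal(size=(n_time, n_neurons))

# Project into a 2-D "state space": the population response becomes
# a trajectory through a low-dimensional space.
trajectory = PCA(n_components=2).fit_transform(activity)
print(trajectory.shape)  # (400, 2) low-dimensional neural trajectory
```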
Collapse
Affiliation(s)
- R Becket Ebitz
- Department of Neurosciences, Faculté de médecine, Université de Montréal, Montréal, QC, Canada.
| | - Benjamin Y Hayden
- Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|
24
|
Yu LQ, Wilson RC, Nassar MR. Adaptive learning is structure learning in time. Neurosci Biobehav Rev 2021; 128:270-281. [PMID: 34144114 PMCID: PMC8422504 DOI: 10.1016/j.neubiorev.2021.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2020] [Revised: 04/19/2021] [Accepted: 06/11/2021] [Indexed: 10/21/2022]
Abstract
People use information flexibly. They often combine multiple sources of relevant information over time in order to inform decisions with little or no interference from intervening irrelevant sources. They adjust the degree to which they use new information over time rationally in accordance with environmental statistics and their own uncertainty. They can even use information gained in one situation to solve a problem in a very different one. Learning flexibly rests on the ability to infer the context at a given time, and therefore knowing which pieces of information to combine and which to separate. We review the psychological and neural mechanisms behind adaptive learning and structure learning to outline how people pool together relevant information, demarcate contexts, prevent interference between information collected in different contexts, and transfer information from one context to another. By examining all of these processes through the lens of optimal inference we bridge concepts from multiple fields to provide a unified multi-system view of how the brain exploits structure in time to optimize learning.
Collapse
Affiliation(s)
- Linda Q Yu
- Carney Institute for Brain Sciences, Brown University, 164 Angell Street, Providence, RI, 02912, USA.
| | - Robert C Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, 85721, USA
| | - Matthew R Nassar
- Carney Institute for Brain Sciences, Brown University, 164 Angell Street, Providence, RI, 02912, USA
| |
Collapse
|
25
|
Tardiff N, Medaglia JD, Bassett DS, Thompson-Schill SL. The modulation of brain network integration and arousal during exploration. Neuroimage 2021; 240:118369. [PMID: 34242784 PMCID: PMC8507424 DOI: 10.1016/j.neuroimage.2021.118369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Revised: 07/01/2021] [Accepted: 07/05/2021] [Indexed: 11/08/2022] Open
Abstract
There is growing interest in how neuromodulators shape brain networks. Recent neuroimaging studies provide evidence that brainstem arousal systems, such as the locus coeruleus-norepinephrine system (LC-NE), influence functional connectivity and brain network topology, suggesting they have a role in flexibly reconfiguring brain networks in order to adapt behavior and cognition to environmental demands. To date, however, the relationship between brainstem arousal systems and functional connectivity has not been assessed within the context of a task with an established relationship between arousal and behavior, with most prior studies relying on incidental variations in arousal or pharmacological manipulation and static brain networks constructed over long periods of time. These factors have likely contributed to a heterogeneity of effects across studies. To address these issues, we took advantage of the association between LC-NE-linked arousal and exploration to probe the relationships between exploratory choice, arousal—as measured indirectly via pupil diameter—and brain network dynamics. Exploration in a bandit task was associated with a shift toward fewer, more weakly connected modules that were more segregated in terms of connectivity and topology but more integrated with respect to the diversity of cognitive systems represented in each module. Functional connectivity strength decreased, and changes in connectivity were correlated with changes in pupil diameter, in line with the hypothesis that brainstem arousal systems influence the dynamic reorganization of brain networks. More broadly, we argue that carefully aligning dynamic network analyses with task designs can increase the temporal resolution at which behaviorally- and cognitively-relevant modulations can be identified, and offer these results as a proof of concept of this approach.
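The study's dynamic network pipeline is far richer than this, but as a hedged illustration of the modularity measures discussed above, the following sketch builds a toy functional connectivity graph from synthetic regional time series and estimates modules and modularity with networkx. The 0.3 correlation threshold and the synthetic two-block structure are assumptions made purely for the example.
```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

rng = np.random.default_rng(2)

# Toy "functional connectivity": correlations between 30 synthetic
# regional time series, thresholded into a binary graph.
ts = rng.normal(size=(200, 30))
ts[:, :15] += rng.normal(size=(200, 1))   # induce one correlated block
ts[:, 15:] += rng.normal(size=(200, 1))   # and a second block
corr = np.corrcoef(ts.T)
adj = (corr > 0.3).astype(int)
np.fill_diagonal(adj, 0)

G = nx.from_numpy_array(adj)
modules = community.greedy_modularity_communities(G)
Q = community.modularity(G, modules)
print(f"{len(modules)} modules, modularity Q = {Q:.2f}")
```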
Collapse
Affiliation(s)
- Nathan Tardiff
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, United States.
| | - John D Medaglia
- Department of Psychology, Drexel University, Philadelphia, PA, United States; Department of Neurology, Drexel University, Philadelphia, PA, United States; Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Danielle S Bassett
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, United States; Department of Electrical & Systems Engineering, University of Pennsylvania, Philadelphia, PA, United States; Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, United States; Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA, United States; Santa Fe Institute, Santa Fe, NM, United States
| | | |
Collapse
|
26
|
Koralek AC, Costa RM. Dichotomous dopaminergic and noradrenergic neural states mediate distinct aspects of exploitative behavioral states. SCIENCE ADVANCES 2021; 7:7/30/eabh2059. [PMID: 34301604 PMCID: PMC8302134 DOI: 10.1126/sciadv.abh2059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 06/07/2021] [Indexed: 06/13/2023]
Abstract
The balance between exploiting known actions and exploring alternatives is critical for survival and hypothesized to rely on shifts in neuromodulation. We developed a behavioral paradigm to capture exploitative and exploratory states and imaged calcium dynamics in genetically identified dopaminergic and noradrenergic neurons. During exploitative states, characterized by motivated repetition of the same action choice, dopamine neurons in SNc encoding movement vigor showed sustained elevation of basal activity that lasted many seconds. This sustained activity emerged from longer positive responses, which accumulated during exploitative action-reward bouts, and hysteretic dynamics. Conversely, noradrenergic neurons in LC showed sustained inhibition of basal activity due to the accumulation of longer negative responses in LC. Chemogenetic manipulation of these sustained dynamics revealed that dopaminergic activity mediates action drive, whereas noradrenergic activity modulates choice diversity. These data uncover the emergence of sustained neural states in dopaminergic and noradrenergic networks that mediate dissociable aspects of exploitative bouts.
Collapse
Affiliation(s)
- Aaron C Koralek
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
| | - Rui M Costa
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
- Champalimaud Neuroscience Programme, Champalimaud Centre for the Unknown, Lisbon, Portugal
| |
Collapse
|
27
|
Tervo DGR, Kuleshova E, Manakov M, Proskurin M, Karlsson M, Lustig A, Behnam R, Karpova AY. The anterior cingulate cortex directs exploration of alternative strategies. Neuron 2021; 109:1876-1887.e6. [PMID: 33852896 DOI: 10.1016/j.neuron.2021.03.028] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Revised: 12/31/2020] [Accepted: 03/22/2021] [Indexed: 12/26/2022]
Abstract
The ability to adjust one's behavioral strategy in complex environments is at the core of cognition. Doing so efficiently requires monitoring the reliability of the ongoing strategy and, when appropriate, switching away from it to evaluate alternatives. Studies in humans and non-human primates have uncovered signals in the anterior cingulate cortex (ACC) that reflect the pressure to switch away from the ongoing strategy, whereas other ACC signals relate to the pursuit of alternatives. However, whether these signals underlie computations that actually underpin strategy switching or merely reflect tracking of related variables remains unclear. Here we provide causal evidence that the rodent ACC actively arbitrates between persisting with the ongoing behavioral strategy and temporarily switching away to re-evaluate alternatives. Furthermore, by individually perturbing distinct output pathways, we establish that the two associated computations-determining whether to switch strategy and committing to the pursuit of a specific alternative-are segregated in the ACC microcircuitry.
Collapse
Affiliation(s)
| | - Elena Kuleshova
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA; Institute of Higher Nervous Activity and Neurophysiology of the Russian Academy of Sciences, Moscow, Russia
| | - Maxim Manakov
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA; Department of Neuroscience, Johns Hopkins University Medical School, Baltimore, MD, USA
| | - Mikhail Proskurin
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA; Department of Neuroscience, Johns Hopkins University Medical School, Baltimore, MD, USA
| | - Mattias Karlsson
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA; SpikeGadgets, San Francisco, CA, USA
| | - Andy Lustig
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
| | - Reza Behnam
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
| | - Alla Y Karpova
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA.
| |
Collapse
|
28
|
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them, with exploiting known options, for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
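A minimal sketch of the two strategies described above, assuming a simple bandit setting: an information bonus scaled by how rarely an option has been sampled stands in for directed exploration, and a softmax temperature stands in for random exploration. The uncertainty proxy and all parameter values are illustrative, not taken from the review.
```python
import numpy as np

def choose(values, counts, info_bonus=0.0, temperature=1.0,
           rng=np.random.default_rng()):
    """Pick an option given estimated values and how often each was sampled.

    info_bonus > 0 implements directed exploration (a bias toward
    less-sampled, more informative options); temperature > 0 implements
    random exploration (decision noise via a softmax).
    """
    uncertainty = 1.0 / np.sqrt(np.asarray(counts) + 1.0)  # crude proxy
    utilities = np.asarray(values) + info_bonus * uncertainty
    logits = utilities / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Example: option 1 has a higher estimated value but has been sampled far
# more often, so the information bonus can pull choices toward option 0.
print(choose(values=[0.4, 0.6], counts=[2, 50], info_bonus=0.5, temperature=0.1))
```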
Collapse
Affiliation(s)
- Robert C. Wilson
- Department of Psychology, University of Arizona, Tucson AZ USA
- Cognitive Science Program, University of Arizona, Tucson AZ USA
- Evelyn F. McKnight Brain Institute, University of Arizona, Tucson AZ USA
| | | | - Vincent D. Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland OR USA
| | - R. Becket Ebitz
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
| |
Collapse
|
30
|
Feng SF, Wang S, Zarnescu S, Wilson RC. The dynamics of explore-exploit decisions reveal a signal-to-noise mechanism for random exploration. Sci Rep 2021; 11:3077. [PMID: 33542333 PMCID: PMC7862437 DOI: 10.1038/s41598-021-82530-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 12/16/2020] [Indexed: 12/29/2022] Open
Abstract
Growing evidence suggests that behavioral variability plays a critical role in how humans manage the tradeoff between exploration and exploitation. In these decisions a little variability can help us to overcome the desire to exploit known rewards by encouraging us to randomly explore something else. Here we investigate how such 'random exploration' could be controlled using a drift-diffusion model of the explore-exploit choice. In this model, variability is controlled by either the signal-to-noise ratio with which reward is encoded (the 'drift rate'), or the amount of information required before a decision is made (the 'threshold'). By fitting this model to behavior, we find that while, statistically, both drift and threshold change when people randomly explore, numerically, the change in drift rate has by far the largest effect. This suggests that random exploration is primarily driven by changes in the signal-to-noise ratio with which reward information is represented in the brain.
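The following toy simulation, under assumed parameter values, illustrates the drift-versus-threshold distinction in the abstract: in a generic drift-diffusion process, lowering either the drift rate or the decision threshold makes choices more variable, which is why both are candidate knobs for random exploration. This is a schematic illustration, not the authors' fitted model.
```python
import numpy as np

def ddm_choices(drift, threshold, n_trials=500, dt=0.005, noise=1.0,
                rng=np.random.default_rng(3)):
    """Simulate first-passage choices of a drift-diffusion process.

    Returns the fraction of trials on which the upper bound (the
    higher-valued option) is reached first.
    """
    upper_hits = 0
    for _ in range(n_trials):
        x = 0.0
        while abs(x) < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        upper_hits += x >= threshold
    return upper_hits / n_trials

# Lowering drift or threshold both push choice proportions toward 0.5,
# i.e., both make behavior more variable.
print("baseline       :", ddm_choices(drift=1.0, threshold=1.0))
print("lower drift    :", ddm_choices(drift=0.2, threshold=1.0))
print("lower threshold:", ddm_choices(drift=1.0, threshold=0.3))
```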
Collapse
Affiliation(s)
- Samuel F Feng
- Department of Mathematics, Khalifa University of Science and Technology, Abu Dhabi, UAE
- Khalifa University Centre for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE
| | - Siyu Wang
- Department of Psychology, University of Arizona, Tucson, AZ, USA
| | - Sylvia Zarnescu
- Department of Psychology, University of Arizona, Tucson, AZ, USA
| | - Robert C Wilson
- Department of Psychology, University of Arizona, Tucson, AZ, USA.
- Cognitive Science Program, University of Arizona, Tucson, AZ, USA.
| |
Collapse
|
31
|
Ebitz RB, Tu JC, Hayden BY. Rules warp feature encoding in decision-making circuits. PLoS Biol 2020; 18:e3000951. [PMID: 33253163 PMCID: PMC7728226 DOI: 10.1371/journal.pbio.3000951] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Revised: 12/10/2020] [Accepted: 11/02/2020] [Indexed: 01/22/2023] Open
Abstract
We have the capacity to follow arbitrary stimulus-response rules, meaning simple policies that guide our behavior. Rule identity is broadly encoded across decision-making circuits, but there are less data on how rules shape the computations that lead to choices. One idea is that rules could simplify these computations. When we follow a rule, there is no need to encode or compute information that is irrelevant to the current rule, which could reduce the metabolic or energetic demands of decision-making. However, it is not clear if the brain can actually take advantage of this computational simplicity. To test this idea, we recorded from neurons in 3 regions linked to decision-making, the orbitofrontal cortex (OFC), ventral striatum (VS), and dorsal striatum (DS), while macaques performed a rule-based decision-making task. Rule-based decisions were identified via modeling rules as the latent causes of decisions. This left us with a set of physically identical choices that maximized reward and information, but could not be explained by simple stimulus-response rules. Contrasting rule-based choices with these residual choices revealed that following rules (1) decreased the energetic cost of decision-making; and (2) expanded rule-relevant coding dimensions and compressed rule-irrelevant ones. Together, these results suggest that we use rules, in part, because they reduce the costs of decision-making through a distributed representational warping in decision-making circuits.
Collapse
Affiliation(s)
- R. Becket Ebitz
- Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Jiaxin Cindy Tu
- Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Benjamin Y. Hayden
- Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
Collapse
|
32
|
Taghizadeh B, Foley NC, Karimimehr S, Cohanpour M, Semework M, Sheth SA, Lashgari R, Gottlieb J. Reward uncertainty asymmetrically affects information transmission within the monkey fronto-parietal network. Commun Biol 2020; 3:594. [PMID: 33087809 PMCID: PMC7578031 DOI: 10.1038/s42003-020-01320-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2020] [Accepted: 09/25/2020] [Indexed: 01/02/2023] Open
Abstract
A central hypothesis in research on executive function is that controlled information processing is costly and is allocated according to the behavioral benefits it brings. However, while computational theories predict that the benefits of new information depend on prior uncertainty, the cellular effects of uncertainty on the executive network are incompletely understood. Using simultaneous recordings in monkeys, we describe several mechanisms by which the fronto-parietal network reacts to uncertainty. We show that the variance of expected rewards, independently of the value of the rewards, was encoded in single neuron and population spiking activity and local field potential (LFP) oscillations, and, importantly, asymmetrically affected fronto-parietal information transmission (measured through the coherence between spikes and LFPs). Higher uncertainty selectively enhanced information transmission from the parietal to the frontal lobe and suppressed it in the opposite direction, consistent with Bayesian principles that prioritize sensory information according to a decision maker’s prior uncertainty.
Bahareh Taghizadeh and Nicholas Foley et al. show that individual neuronal responses, population spiking activity, and local field potential oscillations encode the variance of expected rewards independent of their value. They also demonstrate that reward uncertainty asymmetrically affects neuronal transmission within the monkey fronto-parietal network.
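Spike-LFP coherence is a standard spectral quantity; as a generic, hedged illustration (not the authors' pipeline), the sketch below computes coherence between two synthetic signals that share a 20 Hz component using scipy. Real spike trains would first be binned or convolved into a continuous rate, a step omitted here.
```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(4)

fs = 1000.0                                  # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
shared = np.sin(2 * np.pi * 20 * t)          # common 20 Hz component

# Two continuous signals standing in for an LFP and a smoothed spike rate.
parietal = shared + 0.8 * rng.normal(size=t.size)
frontal = 0.5 * shared + 0.8 * rng.normal(size=t.size)

f, cxy = coherence(parietal, frontal, fs=fs, nperseg=1024)
print(f"coherence peaks near {f[np.argmax(cxy)]:.1f} Hz")
```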
Collapse
Affiliation(s)
- Bahareh Taghizadeh
- Brain Engineering Research Center, Institute for Research in Fundamental Sciences, Tehran, Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
| | - Nicholas C Foley
- Department of Neuroscience, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Saeed Karimimehr
- Brain Engineering Research Center, Institute for Research in Fundamental Sciences, Tehran, Iran; School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
| | - Michael Cohanpour
- Department of Neuroscience, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Mulugeta Semework
- Department of Neuroscience, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Sameer A Sheth
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Reza Lashgari
- Brain Engineering Research Center, Institute for Research in Fundamental Sciences, Tehran, Iran; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Jacqueline Gottlieb
- Department of Neuroscience, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; The Kavli Institute for Brain Science, Columbia University, New York, NY, USA
| |
Collapse
|
33
|
Taswell CA, Costa VD, Basile BM, Pujara MS, Jones B, Manem N, Murray EA, Averbeck BB. Effects of Amygdala Lesions on Object-Based Versus Action-Based Learning in Macaques. Cereb Cortex 2020; 31:529-546. [PMID: 32954409 DOI: 10.1093/cercor/bhaa241] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Revised: 08/05/2020] [Accepted: 08/05/2020] [Indexed: 01/01/2023] Open
Abstract
The neural systems that underlie reinforcement learning (RL) allow animals to adapt to changes in their environment. In the present study, we examined the hypothesis that the amygdala would have a preferential role in learning the values of visual objects. We compared a group of monkeys (Macaca mulatta) with amygdala lesions to a group of unoperated controls on a two-armed bandit reversal learning task. The task had two conditions. In the What condition, the animals had to learn to select a visual object, independent of its location. And in the Where condition, the animals had to learn to saccade to a location, independent of the object at the location. In both conditions choice-outcome mappings reversed in the middle of the block. We found that monkeys with amygdala lesions had learning deficits in both conditions. Monkeys with amygdala lesions did not have deficits in learning to reverse choice-outcome mappings. Rather, amygdala lesions caused the monkeys to become overly sensitive to negative feedback which impaired their ability to consistently select the more highly valued action or object. These results imply that the amygdala is generally necessary for RL.
Collapse
Affiliation(s)
- Craig A Taswell
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Vincent D Costa
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Benjamin M Basile
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Maia S Pujara
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Breonda Jones
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Nihita Manem
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Elisabeth A Murray
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892-4415, USA
| |
Collapse
|
34
|
Moreno-Bote R, Ramírez-Ruiz J, Drugowitsch J, Hayden BY. Heuristics and optimal solutions to the breadth-depth dilemma. Proc Natl Acad Sci U S A 2020; 117:19799-19808. [PMID: 32759219 PMCID: PMC7443877 DOI: 10.1073/pnas.2004929117] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
In multialternative risky choice, we are often faced with the opportunity to allocate our limited information-gathering capacity between several options before receiving feedback. In such cases, we face a natural trade-off between breadth - spreading our capacity across many options - and depth - gaining more information about a smaller number of options. Despite its broad relevance to daily life, including in many naturalistic foraging situations, the optimal strategy in the breadth-depth trade-off has not been delineated. Here, we formalize the breadth-depth dilemma through a finite-sample capacity model. We find that, if capacity is small (∼10 samples), it is optimal to draw one sample per alternative, favoring breadth. However, for larger capacities, a sharp transition is observed, and it becomes best to deeply sample a very small fraction of alternatives, which roughly decreases with the square root of capacity. Thus, ignoring most options, even when capacity is large enough to shallowly sample all of them, is a signature of optimal behavior. Our results also provide a rich casuistic for metareasoning in multialternative decisions with bounded capacity using close-to-optimal heuristics.
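A small Monte Carlo sketch of the finite-sample-capacity idea, under assumed Bernoulli options with uniform priors: a fixed budget of samples is spread evenly over a varying number of alternatives, and the expected quality of the alternative that looks best is compared across allocations. This is an illustrative toy, not the paper's analytical treatment.
```python
import numpy as np

rng = np.random.default_rng(5)

def expected_payoff(capacity, n_alternatives, n_sims=5000):
    """Spread `capacity` samples evenly over `n_alternatives` Bernoulli options
    (success probabilities drawn Uniform(0,1)), then commit to the option with
    the best observed mean. Returns the average true value of that option."""
    per_option = capacity // n_alternatives
    if per_option == 0:
        return np.nan
    total = 0.0
    for _ in range(n_sims):
        p = rng.random(n_alternatives)                  # true qualities
        obs = rng.binomial(per_option, p) / per_option  # sample means
        total += p[np.argmax(obs)]                      # value of the pick
    return total / n_sims

capacity = 32
for n_alt in (2, 4, 8, 16, 32):
    print(f"sample {n_alt:2d} options with {capacity // n_alt:2d} draws each:"
          f" expected value of chosen option = {expected_payoff(capacity, n_alt):.3f}")
```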
Collapse
Affiliation(s)
- Rubén Moreno-Bote
- Center for Brain and Cognition, Universitat Pompeu Fabra, 08002 Barcelona, Spain;
- Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08002 Barcelona, Spain
- Serra Húnter Fellow Programme, Universitat Pompeu Fabra, 08002 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies-Academia, Universitat Pompeu Fabra, 08002 Barcelona, Spain
| | - Jorge Ramírez-Ruiz
- Center for Brain and Cognition, Universitat Pompeu Fabra, 08002 Barcelona, Spain
- Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08002 Barcelona, Spain
| | - Jan Drugowitsch
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115
| | - Benjamin Y Hayden
- Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455
- Center for Neural Engineering, University of Minnesota, Minneapolis, MN 55455
| |
Collapse
|
35
|
Chakroun K, Mathar D, Wiehler A, Ganzer F, Peters J. Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making. eLife 2020; 9:e51260. [PMID: 32484779 PMCID: PMC7266623 DOI: 10.7554/elife.51260] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2019] [Accepted: 05/01/2020] [Indexed: 01/15/2023] Open
Abstract
Involvement of dopamine in regulating exploration during decision-making has long been hypothesized, but direct causal evidence in humans is still lacking. Here, we use a combination of computational modeling, pharmacological intervention and functional magnetic resonance imaging to address this issue. Thirty-one healthy male participants performed a restless four-armed bandit task in a within-subjects design under three drug conditions: 150 mg of the dopamine precursor L-dopa, 2 mg of the D2 receptor antagonist haloperidol, and placebo. Choices were best explained by an extension of an established Bayesian learning model accounting for perseveration, directed exploration and random exploration. Modeling revealed attenuated directed exploration under L-dopa, while neural signatures of exploration, exploitation and prediction error were unaffected. Instead, L-dopa attenuated neural representations of overall uncertainty in insula and dorsal anterior cingulate cortex. Our results highlight the computational role of these regions in exploration and suggest that dopamine modulates how this circuit tracks accumulating uncertainty during decision-making.
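The sketch below illustrates the general class of model described above, assuming a Kalman filter learner for a restless bandit whose choice rule combines learned value, an uncertainty (directed exploration) bonus, a perseveration bonus, and softmax (random exploration) noise. Parameter values and the exact functional forms are placeholders rather than the fitted model from this study.
```python
import numpy as np

rng = np.random.default_rng(6)

n_arms, n_trials = 4, 300
mu = np.zeros(n_arms)             # belief mean for each arm's payoff
var = np.full(n_arms, 100.0)      # belief variance for each arm
obs_noise, diffusion = 16.0, 4.0  # observation and random-walk variance
beta, phi, rho = 0.2, 2.0, 1.0    # softmax, directed-exploration, perseveration

true_means = rng.normal(50, 10, n_arms)
prev_choice = None
for t in range(n_trials):
    bonus = phi * np.sqrt(var)                      # directed exploration
    stick = np.zeros(n_arms)
    if prev_choice is not None:
        stick[prev_choice] = rho                    # perseveration
    logits = beta * (mu + bonus + stick)
    p = np.exp(logits - logits.max())
    p /= p.sum()                                    # random exploration
    a = rng.choice(n_arms, p=p)

    reward = rng.normal(true_means[a], np.sqrt(obs_noise))
    k = var[a] / (var[a] + obs_noise)               # Kalman gain
    mu[a] += k * (reward - mu[a])
    var[a] *= (1 - k)
    var += diffusion                                # uncertainty grows between trials
    true_means += rng.normal(0, np.sqrt(diffusion), n_arms)  # restless arms drift
    prev_choice = a

print("belief means:", np.round(mu, 1), "true means:", np.round(true_means, 1))
```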
Collapse
Affiliation(s)
- Karima Chakroun
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - David Mathar
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
| | - Antonius Wiehler
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Institut du Cerveau et de la Moelle épinière - ICM, Centre de NeuroImagerie de Recherche - CENIR, Sorbonne Universités, Groupe Hospitalier Pitié-Salpêtrière, Paris, France
| | - Florian Ganzer
- German Center for Addiction Research in Childhood and Adolescence, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Jan Peters
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Psychology, Biological Psychology, University of Cologne, Cologne, Germany
| |
Collapse
|
36
|
Bartolo R, Averbeck BB. Prefrontal Cortex Predicts State Switches during Reversal Learning. Neuron 2020; 106:1044-1054.e4. [PMID: 32315603 DOI: 10.1016/j.neuron.2020.03.024] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2019] [Revised: 01/28/2020] [Accepted: 03/24/2020] [Indexed: 11/25/2022]
Abstract
Reinforcement learning allows organisms to predict future outcomes and to update their beliefs about value in the world. The dorsal-lateral prefrontal cortex (dlPFC) integrates information carried by reward circuits, which can be used to infer the current state of the world under uncertainty. Here, we explored the dlPFC computations related to updating current beliefs during stochastic reversal learning. We recorded the activity of populations of up to 1,000 neurons simultaneously in two male macaques while they executed a two-armed bandit reversal learning task. Behavioral analyses using a Bayesian framework showed that animals inferred reversals and switched their choice preference rapidly, rather than slowly updating choice values, consistent with state inference. Furthermore, dlPFC neural populations accurately encoded choice preference switches. These results suggest that prefrontal neurons dynamically encode decisions associated with Bayesian subjective values, highlighting the role of the PFC in representing a belief about the current state of the world.
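To make the state-inference idea concrete, here is a minimal sketch, assuming a two-armed bandit with known reward probabilities and a fixed reversal hazard rate: the learner updates the posterior probability that one arm is currently the good one, which supports abrupt preference switches rather than slow value updating. It illustrates the general idea, not the Bayesian model fitted in the paper.
```python
import numpy as np

def update_state_belief(p_state1, choice, reward,
                        p_good=0.7, p_bad=0.3, hazard=0.05):
    """Posterior that 'arm 0 is the good arm' after one choice and outcome,
    for a two-armed bandit that occasionally reverses (hazard rate)."""
    # Likelihood of the outcome under each hidden state.
    p_r_state1 = p_good if choice == 0 else p_bad   # state 1: arm 0 is good
    p_r_state2 = p_bad if choice == 0 else p_good   # state 2: arm 1 is good
    lik1 = p_r_state1 if reward else 1 - p_r_state1
    lik2 = p_r_state2 if reward else 1 - p_r_state2
    post = lik1 * p_state1 / (lik1 * p_state1 + lik2 * (1 - p_state1))
    # Account for the chance that the state reverses before the next trial.
    return post * (1 - hazard) + (1 - post) * hazard

belief = 0.5
# A run of unrewarded choices of arm 0 quickly pushes the belief toward
# 'arm 1 is now the good arm', producing a rapid preference switch.
for outcome in (1, 1, 1, 0, 0, 0):
    belief = update_state_belief(belief, choice=0, reward=outcome)
    print(f"P(arm 0 is good) = {belief:.2f}")
```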
Collapse
Affiliation(s)
- Ramon Bartolo
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, MD 20892-4415, USA.
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, MD 20892-4415, USA
| |
Collapse
|
37
|
Yoo SBM, Hayden BY. The Transition from Evaluation to Selection Involves Neural Subspace Reorganization in Core Reward Regions. Neuron 2020; 105:712-724.e4. [PMID: 31836322 PMCID: PMC7035164 DOI: 10.1016/j.neuron.2019.11.013] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Revised: 10/13/2019] [Accepted: 11/08/2019] [Indexed: 11/29/2022]
Abstract
Economic choice proceeds from evaluation, in which we contemplate options, to selection, in which we weigh options and choose one. These stages must be differentiated so that decision makers do not proceed to selection before evaluation is complete. We examined responses of neurons in two core reward regions, orbitofrontal (OFC) and ventromedial prefrontal cortex (vmPFC), during two-option choice with asynchronous offer presentation. Our data suggest that neurons selective during the first (presumed evaluation) and second (presumed comparison and selection) offer epochs come from a single pool. Stage transition is accompanied by a shift toward orthogonality in the low-dimensional population response manifold. Nonetheless, the relative position of each option in driving responses in the population subspace is preserved. The orthogonalization we observe supports the hypothesis that the transition from evaluation to selection leads to reorganization of response subspace and suggests a mechanism by which value-related signals are prevented from prematurely driving choice.
Collapse
Affiliation(s)
- Seng Bum Michael Yoo
- Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Benjamin Y Hayden
- Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, University of Minnesota, Minneapolis, MN 55455, USA
| |
Collapse
|
38
|
Prefrontal attentional saccades explore space rhythmically. Nat Commun 2020; 11:925. [PMID: 32066740 PMCID: PMC7026397 DOI: 10.1038/s41467-020-14649-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2019] [Accepted: 01/25/2020] [Indexed: 01/01/2023] Open
Abstract
Recent studies suggest that attention samples space rhythmically through oscillatory interactions in the frontoparietal network. How these attentional fluctuations coincide with spatial exploration/displacement and exploitation/selection by a dynamic attentional spotlight under top-down control is unclear. Here, we show a direct contribution of prefrontal attention selection mechanisms to a continuous space exploration. Specifically, we provide a direct high spatio-temporal resolution prefrontal population decoding of the covert attentional spotlight. We show that it continuously explores space at a 7-12 Hz rhythm. Sensory encoding and behavioral reports are increased at a specific optimal phase with respect to this rhythm. We propose that this prefrontal neuronal rhythm reflects an alpha-clocked sampling of the visual environment in the absence of eye movements. These attentional explorations are highly flexible: how they spatially unfold depends on both within-trial and across-task contingencies. These results are discussed in the context of exploration-exploitation strategies and prefrontal top-down attentional control.
Collapse
|
39
|
O C Jordan H, Navarro DM, Stringer SM. The formation and use of hierarchical cognitive maps in the brain: A neural network model. NETWORK (BRISTOL, ENGLAND) 2020; 31:37-141. [PMID: 32746663 DOI: 10.1080/0954898x.2020.1798531] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/18/2020] [Revised: 06/21/2020] [Accepted: 07/16/2020] [Indexed: 06/11/2023]
Abstract
Many researchers have tried to model how environmental knowledge is learned by the brain and used in the form of cognitive maps. However, previous work was limited in various important ways: there was little consensus on how these cognitive maps were formed and represented, the planning mechanism was inherently limited to performing relatively simple tasks, and there was little consideration of how these mechanisms would scale up. This paper makes several significant advances. Firstly, the planning mechanism used by the majority of previous work propagates a decaying signal through the network to create a gradient that points towards the goal. However, this decaying signal limited the scale and complexity of tasks that can be solved in this manner. Here we propose several ways in which a network can self-organize a novel planning mechanism that does not require decaying activity. We also extend this model with a hierarchical planning mechanism: a layer of cells that identify frequently-used sequences of actions and reuse them to significantly increase the efficiency of planning. We speculate that our results may explain the apparent ability of humans and animals to perform model-based planning on both small and large scales without a noticeable loss of efficiency.
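For contrast with the self-organizing mechanism the paper proposes, the toy sketch below implements the decaying-signal planner the abstract describes as the common earlier approach: a goal signal decays as it spreads through a small grid of place representations, and greedy ascent of the resulting gradient yields a path to the goal. The grid size, decay factor, and update scheme are assumptions for illustration only.
```python
import numpy as np

# A 5x5 grid world represented as an adjacency structure of "place cells".
size = 5

def neighbors(cell):
    r, c = divmod(cell, size)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < size and 0 <= cc < size:
            yield rr * size + cc

def goal_gradient(goal, decay=0.8, n_iters=50):
    """Propagate a decaying goal signal through the map: each cell takes
    `decay` times the strongest signal among its neighbours."""
    signal = np.zeros(size * size)
    signal[goal] = 1.0
    for _ in range(n_iters):
        for cell in range(size * size):
            if cell != goal:
                signal[cell] = decay * max(signal[n] for n in neighbors(cell))
    return signal

def plan(start, goal):
    """Greedy ascent of the goal gradient yields a path to the goal."""
    signal, path, cell = goal_gradient(goal), [start], start
    while cell != goal:
        cell = max(neighbors(cell), key=lambda n: signal[n])
        path.append(cell)
    return path

print(plan(start=0, goal=24))  # a shortest path from one corner to the other
```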
Collapse
|
40
|
Khalighinejad N, Bongioanni A, Verhagen L, Folloni D, Attali D, Aubry JF, Sallet J, Rushworth MFS. A Basal Forebrain-Cingulate Circuit in Macaques Decides It Is Time to Act. Neuron 2019; 105:370-384.e8. [PMID: 31813653 PMCID: PMC6975166 DOI: 10.1016/j.neuron.2019.10.030] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2019] [Revised: 10/02/2019] [Accepted: 10/22/2019] [Indexed: 12/22/2022]
Abstract
The medial frontal cortex has been linked to voluntary action, but an explanation of why decisions to act emerge at particular points in time has been lacking. We show that, in macaques, decisions about whether and when to act are predicted by a set of features defining the animal’s current and past context; for example, respectively, cues indicating the current average rate of reward and recent previous voluntary action decisions. We show that activity in two brain areas—the anterior cingulate cortex and basal forebrain—tracks these contextual factors and mediates their effects on behavior in distinct ways. We use focused transcranial ultrasound to selectively and effectively stimulate deep in the brain, even as deep as the basal forebrain, and demonstrate that alteration of activity in the two areas changes decisions about when to act.
Highlights: Likelihood and timing of voluntary action in macaques can be partially predicted. Recent experience and present context influence when voluntary action occurs. A basal forebrain-cingulate circuit mediated effects of these factors on behavior. Stimulation of this circuit by ultrasound changed decisions about when to act.
Collapse
Affiliation(s)
- Nima Khalighinejad
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, UK.
| | - Alessandro Bongioanni
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, UK
| | - Lennart Verhagen
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, UK; Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen 6525 XZ, the Netherlands
| | - Davide Folloni
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, UK
| | - David Attali
- Physics for Medicine Paris, INSERM U1273, ESPCI Paris, CNRS FRE 2031, PSL Research University, Paris 75012, France; Pathophysiology of Psychiatric Disorders Laboratory, Inserm U1266, Institute of Psychiatry and Neuroscience of Paris, Paris Descartes University, Paris University, Paris 75014, France; Service Hospitalo-Universitaire, Sainte-Anne Hospital, UGH Paris Psychiatry and Neurosciences, Paris 75014, France
| | - Jean-Francois Aubry
- Physics for Medicine Paris, INSERM U1273, ESPCI Paris, CNRS FRE 2031, PSL Research University, Paris 75012, France
| | - Jerome Sallet
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, UK
| | - Matthew F S Rushworth
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX1 3SR, UK
| |
Collapse
|
41
|
Ebitz RB, Sleezer BJ, Jedema HP, Bradberry CW, Hayden BY. Tonic exploration governs both flexibility and lapses. PLoS Comput Biol 2019; 15:e1007475. [PMID: 31703063 PMCID: PMC6867658 DOI: 10.1371/journal.pcbi.1007475] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2019] [Revised: 11/20/2019] [Accepted: 10/10/2019] [Indexed: 11/20/2022] Open
Abstract
In many cognitive tasks, lapses (spontaneous errors) are tacitly dismissed as the result of nuisance processes like sensorimotor noise, fatigue, or disengagement. However, some lapses could also be caused by exploratory noise: randomness in behavior that facilitates learning in changing environments. If so, then strategic processes would need only up-regulate (rather than generate) exploration to adapt to a changing environment. This view predicts that more frequent lapses should be associated with greater flexibility because these behaviors share a common cause. Here, we report that when rhesus macaques performed a set-shifting task, lapse rates were negatively correlated with perseverative error frequency across sessions, consistent with a common basis in exploration. The results could not be explained by local failures to learn. Furthermore, chronic exposure to cocaine, which is known to impair cognitive flexibility, did increase perseverative errors, but, surprisingly, also improved overall set-shifting task performance by reducing lapse rates. We reconcile these results with a state-switching model in which cocaine decreases exploration by deepening attractor basins corresponding to rule states. These results support the idea that exploratory noise contributes to lapses, affecting rule-based decision-making even when it has no strategic value, and suggest that one key mechanism for regulating exploration may be the depth of rule states.
Collapse
Affiliation(s)
- R. Becket Ebitz
- Department of Neuroscience and Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, United States of America
| | - Brianna J. Sleezer
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY, United States of America
| | - Hank P. Jedema
- NIDA Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, United States of America
| | - Charles W. Bradberry
- NIDA Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, United States of America
| | - Benjamin Y. Hayden
- Department of Neuroscience and Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, United States of America
| |
Collapse
|
42
|
Costa VD, Mitz AR, Averbeck BB. Subcortical Substrates of Explore-Exploit Decisions in Primates. Neuron 2019; 103:533-545.e5. [PMID: 31196672 PMCID: PMC6687547 DOI: 10.1016/j.neuron.2019.05.017] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2018] [Revised: 03/27/2019] [Accepted: 05/08/2019] [Indexed: 01/06/2023]
Abstract
The explore-exploit dilemma refers to the challenge of deciding when to forego immediate rewards and explore new opportunities that could lead to greater rewards in the future. While motivational neural circuits facilitate learning based on past choices and outcomes, it is unclear whether they also support computations relevant for deciding when to explore. We recorded neural activity in the amygdala and ventral striatum of rhesus macaques as they solved a task that required them to balance novelty-driven exploration with exploitation of what they had already learned. Using a partially observable Markov decision process (POMDP) model to quantify explore-exploit trade-offs, we identified that the ventral striatum and amygdala differ in how they represent the immediate value of exploitative choices and the future value of exploratory choices. These findings show that subcortical motivational circuits are important in guiding explore-exploit decisions.
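The paper values exploration with a POMDP; the sketch below uses a different and much simpler technique, Thompson sampling on a beta-Bernoulli bandit, only to illustrate the same dilemma: posterior uncertainty about a newly introduced option automatically generates exploratory choices before behavior settles back into exploitation. Reward probabilities and the timing of the novel option are arbitrary.
```python
import numpy as np

rng = np.random.default_rng(7)

# Thompson sampling on a bandit where a novel option appears mid-session.
p_true = [0.6, 0.4]                      # familiar options
alpha = np.ones(2)                       # Beta(1, 1) priors
beta_ = np.ones(2)

for t in range(300):
    if t == 150:                         # a novel option is introduced
        p_true.append(0.8)
        alpha = np.append(alpha, 1.0)
        beta_ = np.append(beta_, 1.0)
    samples = rng.beta(alpha, beta_)     # one posterior sample per arm
    a = int(np.argmax(samples))          # explore/exploit emerges here
    r = rng.random() < p_true[a]
    alpha[a] += r
    beta_[a] += 1 - r

print("posterior means:", np.round(alpha / (alpha + beta_), 2))
print("times each option was chosen:", (alpha + beta_ - 2).astype(int))
```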
Collapse
Affiliation(s)
- Vincent D Costa
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institute of Health, Bethesda, MD 20892, USA; Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA; Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA.
| | - Andrew R Mitz
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institute of Health, Bethesda, MD 20892, USA
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institute of Health, Bethesda, MD 20892, USA
| |
Collapse
|
43
|
Bari BA, Grossman CD, Lubin EE, Rajagopalan AE, Cressy JI, Cohen JY. Stable Representations of Decision Variables for Flexible Behavior. Neuron 2019; 103:922-933.e7. [PMID: 31280924 DOI: 10.1016/j.neuron.2019.06.001] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Revised: 05/03/2019] [Accepted: 05/31/2019] [Indexed: 12/25/2022]
Abstract
Decisions occur in dynamic environments. In the framework of reinforcement learning, the probability of performing an action is influenced by decision variables. Discrepancies between predicted and obtained rewards (reward prediction errors) update these variables, but they are otherwise stable between decisions. Although reward prediction errors have been mapped to midbrain dopamine neurons, it is unclear how the brain represents decision variables themselves. We trained mice on a dynamic foraging task in which they chose between alternatives that delivered reward with changing probabilities. Neurons in the medial prefrontal cortex, including projections to the dorsomedial striatum, maintained persistent firing rate changes over long timescales. These changes stably represented relative action values (to bias choices) and total action values (to bias response times) with slow decay. In contrast, decision variables were weakly represented in the anterolateral motor cortex, a region necessary for generating choices. Thus, we define a stable neural mechanism to drive flexible behavior.
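A minimal sketch of the decision variables named above, under assumed functional forms: a learner maintains two action values, their difference biases which side is chosen, and their sum modulates a toy response-time rule. None of the specific equations or parameters are taken from the paper.
```python
import numpy as np

rng = np.random.default_rng(8)

alpha, beta = 0.2, 4.0
Q = np.zeros(2)                     # action values for left/right
p_reward = np.array([0.8, 0.2])     # reward probabilities (could change over time)

for t in range(200):
    relative_value = Q[0] - Q[1]    # biases WHICH action is chosen
    total_value = Q[0] + Q[1]       # biases HOW FAST responses are
    p_left = 1.0 / (1.0 + np.exp(-beta * relative_value))
    a = 0 if rng.random() < p_left else 1
    rt = 0.8 - 0.3 * total_value + 0.05 * rng.normal()   # toy response-time rule
    r = float(rng.random() < p_reward[a])
    Q[a] += alpha * (r - Q[a])      # prediction-error update of the chosen value

print("values:", np.round(Q, 2), "last p(left):", round(p_left, 2),
      "last RT (s):", round(rt, 2))
```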
Collapse
Affiliation(s)
- Bilal A Bari
- The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Cooper D Grossman
- The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Emily E Lubin
- The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Adithya E Rajagopalan
- The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Jianna I Cressy
- The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Jeremiah Y Cohen
- The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
| |
Collapse
|
44
|
Ebitz RB, Moore T. Both a Gauge and a Filter: Cognitive Modulations of Pupil Size. Front Neurol 2019; 9:1190. [PMID: 30723454 PMCID: PMC6350273 DOI: 10.3389/fneur.2018.01190] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2018] [Accepted: 12/27/2018] [Indexed: 01/21/2023] Open
Abstract
Over 50 years of research have established that cognitive processes influence pupil size. This has led to the widespread use of pupil size as a peripheral measure of cortical processing in psychology and neuroscience. However, the function of cortical control over the pupil remains poorly understood. Why does visual attention change the pupil light reflex? Why do mental effort and surprise cause pupil dilation? Here, we consider these functional questions as we review and synthesize two literatures on cognitive effects on the pupil: how cognition affects pupil light response and how cognition affects pupil size under constant luminance. We propose that cognition may have co-opted control of the pupil in order to filter incoming visual information to optimize it for particular goals. This could complement other cortical mechanisms through which cognition shapes visual perception.
Collapse
Affiliation(s)
- R. Becket Ebitz
- Department of Neuroscience and Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, United States
| | - Tirin Moore
- Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, United States
- Howard Hughes Medical Institute, Seattle, WA, United States
| |
Collapse
|
45
|
|