1
Feng YY, Bromberg-Martin ES, Monosov IE. Dorsal raphe neurons integrate the values of reward amount, delay, and uncertainty in multi-attribute decision-making. Cell Rep 2024; 43:114341. [PMID: 38878290] [DOI: 10.1016/j.celrep.2024.114341]
Abstract
The dorsal raphe nucleus (DRN) is implicated in psychiatric disorders that feature impaired sensitivity to reward amount, impulsivity when facing reward delays, and risk-seeking when confronting reward uncertainty. However, it has been unclear whether and how DRN neurons signal reward amount, reward delay, and reward uncertainty during multi-attribute value-based decision-making, where subjects consider these attributes to make a choice. We recorded DRN neurons as monkeys chose between offers whose attributes, namely expected reward amount, reward delay, and reward uncertainty, varied independently. Many DRN neurons signaled offer attributes, and this population tended to integrate the attributes in a manner that reflected monkeys' preferences for amount, delay, and uncertainty. After decision-making, in response to post-decision feedback, these same neurons signaled signed reward prediction errors, suggesting a broader role in tracking value across task epochs and behavioral contexts. Our data illustrate how the DRN participates in value computations, guiding theories about the role of the DRN in decision-making and psychiatric disease.
Affiliation(s)
- Yang-Yang Feng
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA; Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Ilya E Monosov
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA; Department of Biomedical Engineering, Washington University, St. Louis, MO, USA; Washington University Pain Center, Washington University, St. Louis, MO, USA; Department of Neurosurgery, Washington University, St. Louis, MO, USA; Department of Electrical Engineering, Washington University, St. Louis, MO, USA.
2
Schultz W. A dopamine mechanism for reward maximization. Proc Natl Acad Sci U S A 2024; 121:e2316658121. [PMID: 38717856] [PMCID: PMC11098095] [DOI: 10.1073/pnas.2316658121]
Abstract
Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against these increased predictions, obtaining a similar dopamine RPE signal again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward predictions via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve as a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
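The recursive reward-RPE-prediction iteration described in this abstract can be sketched with a standard prediction-error update (an illustrative Rescorla-Wagner-style sketch, not the paper's model; the learning rate and reward values are arbitrary):

```python
def rpe_step(prediction, reward, alpha=0.5):
    """One learning step: RPE = reward - prediction; prediction moves toward reward."""
    rpe = reward - prediction
    return prediction + alpha * rpe, rpe

v = 0.0
v, rpe1 = rpe_step(v, reward=1.0)   # novel reward: large positive RPE
v, rpe2 = rpe_step(v, reward=1.0)   # same reward, higher prediction: smaller RPE
v, rpe3 = rpe_step(v, reward=2.0)   # only a better reward restores a large RPE
```

Against a rising prediction, the same reward yields a shrinking RPE, so an agent chasing positive RPEs is driven toward ever-better rewards.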
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
3
Hill DF, Hickman RW, Al-Mohammad A, Stasiak A, Schultz W. Dopamine neurons encode trial-by-trial subjective reward value in an auction-like task. bioRxiv 2024:2023.01.20.524896. [PMID: 36711724] [PMCID: PMC9882283] [DOI: 10.1101/2023.01.20.524896]
Abstract
The dopamine reward prediction error signal is known to be subjective but has so far only been assessed in aggregate choices. However, personal choices fluctuate across trials and thus reflect instantaneous subjective reward value. In the well-established Becker-DeGroot-Marschak (BDM) auction-like mechanism, participants are encouraged to place bids that accurately reveal their instantaneous subjective reward value; inaccurate bidding results in suboptimal reward ('incentive compatibility'). In our experiment, male rhesus monkeys gained experience over several years in placing accurate BDM bids for juice rewards without specific external constraints. Their bids for physically identical rewards varied trial by trial and increased overall for larger rewards. In these highly experienced animals, responses of midbrain dopamine neurons followed the trial-by-trial variations of bids despite constant, explicitly predicted reward amounts. Conversely, dopamine responses were similar when bids were similar for different physical reward amounts. Support vector regression accurately predicted the animals' bids from as few as twenty dopamine neurons. Thus, the phasic dopamine reward signal reflects instantaneous subjective reward value.
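The incentive compatibility of the BDM mechanism mentioned in this abstract can be verified in closed form; the sketch below assumes (purely for illustration) a competing price drawn uniformly from [0, 1], under which bidding one's true value maximizes expected utility:

```python
def bdm_expected_utility(bid, value):
    """Expected utility under BDM with price ~ Uniform(0, 1).

    The bidder wins whenever price <= bid and pays the price, so
    E[u] = integral from 0 to bid of (value - p) dp = value*bid - bid**2 / 2,
    which is maximized exactly at bid == value.
    """
    return value * bid - bid ** 2 / 2

value = 0.6
truthful = bdm_expected_utility(0.6, value)   # bid the true value
underbid = bdm_expected_utility(0.4, value)   # risks losing good deals
overbid = bdm_expected_utility(0.8, value)    # risks overpaying
```

Both deviations from truthful bidding lower expected utility, which is why BDM bids can be read as revealed subjective values.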
Affiliation(s)
- Daniel F Hill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Robert W Hickman
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Alaa Al-Mohammad
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Arkadiusz Stasiak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
4
Hughes NC, Qian H, Zargari M, Zhao Z, Singh B, Wang Z, Fulton JN, Johnson GW, Li R, Dawant BM, Englot DJ, Constantinidis C, Roberson SW, Bick SK. Reward Circuit Local Field Potential Modulations Precede Risk Taking. bioRxiv 2024:2024.04.10.588629. [PMID: 38645237] [PMCID: PMC11030333] [DOI: 10.1101/2024.04.10.588629]
Abstract
Risk-taking behavior is a symptom of multiple neuropsychiatric disorders and often lacks effective treatments. Neuroimaging studies have implicated reward circuitry regions, including the amygdala, orbitofrontal cortex, insula, and anterior cingulate, in risk-taking, but the electrophysiological activity associated with risk-taking in these regions is not well understood in humans. Further characterizing the neural signaling that underlies risk-taking may provide therapeutic insight into the associated disorders. Eleven patients with pharmacoresistant epilepsy who underwent stereotactic electroencephalography with electrodes in the amygdala, orbitofrontal cortex, insula, and/or anterior cingulate participated in a gambling task, wagering $5 or $20 that a visible playing card would be higher than a hidden card while local field potentials were recorded from the implanted electrodes. We used cluster-based permutation testing to identify reward prediction error signals by comparing oscillatory power following unexpected and expected rewards. We also used cluster-based permutation testing to compare power preceding high and low bets in high-risk (<50% chance of winning) trials, and a two-way ANOVA with bet and risk level to identify signals associated with risky, risk-averse, and optimized decisions. We used linear mixed-effects models to evaluate the relationship between reward prediction error and risky decision signals across trials, and a linear regression model to test associations between risky decision signal power and each patient's Barratt Impulsiveness Scale score. Reward prediction error signals were identified in the amygdala (p = 0.0066), anterior cingulate (p = 0.0092), and orbitofrontal cortex (p = 6.0E-4, p = 4.0E-4). Risky decisions were predicted by increased high-gamma power during card presentation in the orbitofrontal cortex (p = 0.0022), and by increased power following bet cue presentation in the theta-to-beta range in the orbitofrontal cortex (p = 0.0022), high-gamma in the anterior cingulate (p = 0.0004), and high-gamma in the insula (p = 0.0014). Risk-averse decisions were predicted by decreased orbitofrontal cortex gamma power (p = 2.0E-4). Optimized decisions that maximized earnings were preceded by power decreases in the theta-to-beta range in the orbitofrontal cortex (p = 2.0E-4), broad frequencies in the amygdala (p = 2.0E-4), and theta to low-gamma in the insula (p = 4.0E-4). Insula risky decision power was associated with the orbitofrontal cortex high-gamma reward prediction error signal (p = 0.0048) and with patient impulsivity (p = 0.00478). Our findings identify and help characterize reward circuitry activity predictive of risk-taking in humans. These findings may serve as potential biomarkers to inform the development of novel treatment strategies, such as closed-loop neuromodulation, for disorders of risk-taking.
5
Sasaki R, Ohta Y, Onoe H, Yamaguchi R, Miyamoto T, Tokuda T, Tamaki Y, Isa K, Takahashi J, Kobayashi K, Ohta J, Isa T. Balancing risk-return decisions by manipulating the mesofrontal circuits in primates. Science 2024; 383:55-61. [PMID: 38175903] [DOI: 10.1126/science.adj6645]
Abstract
Decision-making is always coupled with some level of risk, with more pathological forms of risk-taking decisions manifesting as gambling disorders. In macaque monkeys trained in a high risk-high return (HH) versus low risk-low return (LL) choice task, we found that the reversible pharmacological inactivation of ventral Brodmann area 6 (area 6V) impaired the risk dependency of decision-making. Selective optogenetic activation of the mesofrontal pathway from the ventral tegmental area (VTA) to the ventral aspect of 6V resulted in stronger preference for HH, whereas activation of the pathway from the VTA to the dorsal aspect of 6V led to LL preference. Finally, computational decoding captured the modulations of behavioral preference. Our results suggest that VTA inputs to area 6V determine the decision balance between HH and LL.
Affiliation(s)
- Ryo Sasaki
- Division of Physiology and Neurobiology, Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
- Yasumi Ohta
- Division of Materials Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma-shi, Nara 630-0192, Japan
- Hirotaka Onoe
- Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8507, Japan
- Reona Yamaguchi
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
- Takeshi Miyamoto
- Division of Physiology and Neurobiology, Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
- Japan Society for the Promotion of Science, Chiyoda-ku, Tokyo 102-0083, Japan
- Takashi Tokuda
- Institute of Innovative Research, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Yuki Tamaki
- Division of Physiology and Neurobiology, Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
- Kaoru Isa
- Division of Physiology and Neurobiology, Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
- Jun Takahashi
- Department of Clinical Application, Center for iPS Cell Research and Application, Kyoto University, Kyoto-shi, Kyoto 606-8507, Japan
- Kenta Kobayashi
- Section of Viral Vector Development, National Institute for Physiological Sciences, Okazaki-shi, Aichi 444-8585, Japan
- Jun Ohta
- Division of Materials Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma-shi, Nara 630-0192, Japan
- Tadashi Isa
- Division of Physiology and Neurobiology, Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
- Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto-shi, Kyoto 606-8507, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto-shi, Kyoto 606-8501, Japan
6
Chan HK, Toyoizumi T. A multi-stage anticipated surprise model with dynamic expectation for economic decision-making. Sci Rep 2024; 14:657. [PMID: 38182692] [PMCID: PMC10770108] [DOI: 10.1038/s41598-023-50529-y]
Abstract
Many modeling works aim to explain behaviors that violate classical economic theories. However, these models often do not take full account of the multi-stage nature of real-life problems and people's tendency to solve complicated problems sequentially. In this work, we propose a descriptive decision-making model for multi-stage problems with perceived post-decision information. In the model, decisions are chosen based on a quantity we call the 'anticipated surprise'. The reference point is determined by the expected value of the possible outcomes, which we assume to change dynamically during the mental simulation of a sequence of events. We illustrate how our formalism can help us understand prominent economic paradoxes and gambling behaviors that involve multi-stage or sequential planning. We also discuss how neuroscience findings, such as prediction error signals and introspective neuronal replay, as well as psychological theories such as affective forecasting, relate to features of our model. This provides hints for future experiments to investigate the role of these quantities in decision-making.
Affiliation(s)
- Ho Ka Chan
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, Wako, Japan.
- Taro Toyoizumi
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, Wako, Japan.
- Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan.
7
Konova AB, Ceceli AO, Horga G, Moeller SJ, Alia-Klein N, Goldstein RZ. Reduced neural encoding of utility prediction errors in cocaine addiction. Neuron 2023; 111:4058-4070.e6. [PMID: 37883973] [PMCID: PMC10880133] [DOI: 10.1016/j.neuron.2023.09.015]
Abstract
Influential accounts of addiction posit alterations in adaptive behavior driven by deficient dopaminergic prediction errors (PEs), which signal the discrepancy between actual and expected reward. Dopamine neurons encode these error signals in subjective terms, calibrated by individual risk preferences, as "utility" PEs. It remains unclear, however, whether people with drug addiction have PE deficits and, if so, what their computational source is. Here, using a task analogous to prior single-unit studies with known expectancies, we show that fMRI-measured PEs similarly reflect utility PEs. Relative to control participants, people with chronic cocaine addiction demonstrate reduced utility PEs in the dopaminoceptive ventral striatum, with similar trends in the orbitofrontal cortex. Dissecting this PE signal into its subcomponent terms attributed these reductions to weaker striatal responses to received reward/utility, whereas the suppression of activity with reward expectation was unchanged. These findings support the view that addiction may fundamentally disrupt PE signaling and reveal an underappreciated role for perceived reward value in this mechanism.
Affiliation(s)
- Anna B Konova
- Department of Psychiatry, University Behavioral Health Care & the Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ 08855, USA
- Ahmet O Ceceli
- Departments of Psychiatry & Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Guillermo Horga
- Department of Psychiatry, Columbia University, New York, NY 10024, USA
- Scott J Moeller
- Department of Psychiatry, Renaissance School of Medicine at Stony Brook University, Stony Brook, NY 11794, USA
- Nelly Alia-Klein
- Departments of Psychiatry & Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Rita Z Goldstein
- Departments of Psychiatry & Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
8
Pinto SR, Uchida N. Tonic dopamine and biases in value learning linked through a biologically inspired reinforcement learning model. bioRxiv 2023:2023.11.10.566580. [PMID: 38014087] [PMCID: PMC10680794] [DOI: 10.1101/2023.11.10.566580]
Abstract
A hallmark of various psychiatric disorders is biased future predictions. Here we examined the mechanisms of biased value learning using reinforcement learning models that incorporate recent findings on synaptic plasticity and opponent circuit mechanisms in the basal ganglia. We show that variations in tonic dopamine can alter the balance between learning from positive and negative reward prediction errors, leading to biased value predictions. This bias arises from the sigmoidal shapes of the dose-occupancy curves and the distinct affinities of D1- and D2-type dopamine receptors: changes in tonic dopamine differentially alter the slopes of these receptors' dose-occupancy curves, and thus their sensitivities, at baseline dopamine concentrations. We show that this mechanism can explain biased value learning in both mice and humans and may also contribute to symptoms observed in psychiatric disorders. Our model provides a foundation for understanding the basal ganglia circuit and underscores the significance of tonic dopamine in modulating learning processes.
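The dose-occupancy argument in this abstract can be illustrated with the Hill-Langmuir equation; the dissociation constants below are placeholder orders of magnitude reflecting only that D2-like receptors bind dopamine with much higher affinity than D1-like receptors, not the paper's fitted values:

```python
def occupancy(c, kd):
    """Hill-Langmuir fractional receptor occupancy at ligand concentration c."""
    return c / (c + kd)

def sensitivity(c, kd):
    """Slope of the occupancy curve at c, i.e. how strongly a phasic
    dopamine transient changes occupancy around that baseline."""
    return kd / (c + kd) ** 2

KD_D2 = 10.0     # high-affinity D2-like receptor (illustrative value, nM)
KD_D1 = 1000.0   # low-affinity D1-like receptor (illustrative value, nM)

low, high = 20.0, 200.0  # two tonic dopamine baselines (nM)
# Raising tonic dopamine pushes D2 receptors toward saturation, collapsing
# their sensitivity, while D1 receptors stay on the shallow rising part of
# their curve; the asymmetry shifts the balance of learning from positive
# versus negative prediction errors.
d2_drop = sensitivity(low, KD_D2) / sensitivity(high, KD_D2)
d1_drop = sensitivity(low, KD_D1) / sensitivity(high, KD_D1)
```

With these placeholder affinities, the same baseline shift costs the D2 curve far more sensitivity than the D1 curve, which is the asymmetry the model exploits.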
Affiliation(s)
- Sandra Romero Pinto
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
9
Ferrari-Toniolo S, Schultz W. Reliable population code for subjective economic value from heterogeneous neuronal signals in primate orbitofrontal cortex. Neuron 2023; 111:3683-3696.e7. [PMID: 37678250] [DOI: 10.1016/j.neuron.2023.08.009]
Abstract
Behavior-related neuronal signals often vary between neurons, which might reflect the unreliability of individual neurons or a truly heterogeneous code. This notion may also apply to economic ("value-based") choices and the underlying reward signals. Reward value is subjective and can be described by a nonlinearly weighted magnitude (utility) and probability. Defining subjective values relies on the continuity axiom, whose testing involves structured variations of a wide range of reward magnitudes and probabilities. Axiom compliance demonstrates understanding of the stimuli and the meaningful character of choices. Using these tests, we investigated the encoding of subjective economic value by neurons in a key economic-decision structure of the monkey brain, the orbitofrontal cortex (OFC). We found that individual neurons carry heterogeneous neuronal value signals that largely fail to match the animal's choices. However, neuronal population signals matched the animal's choices well, suggesting accurate subjective economic value encoding by a heterogeneous population of unreliable neurons.
Affiliation(s)
- Simone Ferrari-Toniolo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK.
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
10
Deng Y, Song D, Ni J, Qing H, Quan Z. Reward prediction error in learning-related behaviors. Front Neurosci 2023; 17:1171612. [PMID: 37662112] [PMCID: PMC10471312] [DOI: 10.3389/fnins.2023.1171612]
Abstract
Learning is a complex process during which our opinions and decisions are easily changed by unexpected information, but the neural mechanism underlying revision and correction during learning remains unclear. For decades, prediction error has been regarded as the core of changes to perception in learning, even driving learning itself. In this article, we reviewed the concept of reward prediction error, the encoding mechanisms of dopaminergic neurons, and the related neural circuits. We also discussed the relationship between reward prediction error and learning-related behaviors, including reversal learning. We then summarized evidence of reward prediction error signals in several diseases, including Parkinson's disease and addiction. These observations may help to better understand the regulatory role of reward prediction error in learning-related behaviors.
Affiliation(s)
- Yujun Deng
- Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing, China
- Da Song
- Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing, China
- Junjun Ni
- Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing, China
- Hong Qing
- Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing, China
- Department of Biology, Shenzhen MSU-BIT University, Shenzhen, China
- Zhenzhen Quan
- Key Laboratory of Molecular Medicine and Biotherapy, School of Life Science, Beijing Institute of Technology, Beijing, China
11
Pastor-Bernier A, Volkmann K, Chi U Seak L, Stasiak A, Plott CR, Schultz W. Studying neural responses for multi-component economic choices in human and non-human primates using concept-based behavioral choice experiments. STAR Protoc 2023; 4:102296. [PMID: 37294630] [PMCID: PMC10323126] [DOI: 10.1016/j.xpro.2023.102296]
Abstract
Realistic, everyday rewards contain multiple components, such as taste and size. However, our reward valuations and the associated neural reward signals are single-dimensional (a vector-to-scalar transformation). Here, we present a protocol to identify these single-dimensional neural responses to multi-component choice options in humans and monkeys using concept-based behavioral choice experiments. We describe the use of stringent economic concepts to develop and implement behavioral tasks. We detail regional neuroimaging in humans and fine-grained neurophysiology in monkeys and describe approaches for data analysis. For complete details on the use and execution of this protocol, please refer to our work in humans (Seak et al.; Pastor-Bernier et al.) and monkeys (Pastor-Bernier et al.).
Affiliation(s)
- Alexandre Pastor-Bernier
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Konstantin Volkmann
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Leo Chi U Seak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Arkadiusz Stasiak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
- Charles R Plott
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, UK
12
Odland AU, Sandahl R, Andreasen JT. Chronic corticosterone improves perseverative behavior in mice during sequential reversal learning. Behav Brain Res 2023; 450:114479. [PMID: 37169127] [DOI: 10.1016/j.bbr.2023.114479]
Abstract
BACKGROUND: Stressful life events can both trigger the development of psychiatric disorders and promote positive behavioral changes in response to adversity. The relationship between stress and cognitive flexibility is complex, and conflicting effects of stress manifest in both humans and laboratory animals.
OBJECTIVE: To mirror the clinical situation in which stressful life events impair mental health or promote behavioral change, we examined the post-exposure effects of stress on cognitive flexibility in mice.
METHODS: We tested female C57BL/6JOlaHsd mice in the touchscreen-based sequential reversal learning test. Corticosterone (CORT) was used as a model of stress and was administered in the drinking water for two weeks before reversal learning; control animals received drinking water without CORT. Supplementary behavioral tests were included to exclude non-specific confounding effects of CORT and improve interpretation of the results.
RESULTS: CORT-treated mice were similar to controls on all touchscreen parameters before reversal. During the low-accuracy phase of reversal learning, CORT reduced the perseveration index, a measure of perseverative responding, but did not affect acquisition of the new reward contingency. This effect was not related to non-specific deficits in chamber activity. CORT increased anxiety-like behavior in the elevated zero maze test and repetitive digging in the marble burying test, and reduced locomotor activity, but did not affect spontaneous alternation behavior.
CONCLUSION: CORT improved cognitive flexibility in the reversal learning test by extinguishing prepotent responses that were no longer rewarded, an effect possibly related to a stress-mediated increase in sensitivity to negative feedback that should be confirmed in a larger study.
Affiliation(s)
- Anna U Odland
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100, Copenhagen, Denmark
- Rune Sandahl
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100, Copenhagen, Denmark
- Jesper T Andreasen
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100, Copenhagen, Denmark
13
Hong T, Stauffer WR. Computational complexity drives sustained deliberation. Nat Neurosci 2023; 26:850-857. [PMID: 37095398] [PMCID: PMC10166852] [DOI: 10.1038/s41593-023-01307-6]
Abstract
Economic deliberations are slow, effortful, and intentional searches for solutions to difficult economic problems. Although such deliberations are critical for making sound decisions, the underlying reasoning strategies and neurobiological substrates remain poorly understood. Here, two nonhuman primates performed a combinatorial optimization task to identify valuable subsets and satisfy predefined constraints. Their behavior revealed evidence of combinatorial reasoning: when low-complexity algorithms that consider items one at a time provided optimal solutions, the animals adopted low-complexity reasoning strategies; when greater computational resources were required, the animals approximated high-complexity algorithms that search for optimal combinations. Deliberation times reflected the demands created by computational complexity: high-complexity algorithms require more operations and, concomitantly, the animals deliberated for longer durations. Recurrent neural networks that mimicked the low- and high-complexity algorithms also reproduced the behavioral deliberation times and were used to reveal algorithm-specific computations that support economic deliberation. These findings provide evidence for algorithm-based reasoning and establish a paradigm for studying the neurophysiological basis of sustained deliberation.
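The low- versus high-complexity contrast in this abstract can be made concrete with a toy knapsack-style subset problem (the items and capacity below are invented for illustration, not the task used in the study): a greedy rule that considers items one at a time is cheap but can miss the optimal combination that exhaustive search over all subsets finds.

```python
from itertools import combinations

ITEMS = [(10, 10), (7, 5), (7, 5)]  # (value, weight) pairs, illustrative only
CAPACITY = 10

def greedy(items, capacity):
    """Low-complexity strategy: take items one at a time, best value first."""
    total_v = total_w = 0
    for v, w in sorted(items, reverse=True):
        if total_w + w <= capacity:
            total_v += v
            total_w += w
    return total_v

def exhaustive(items, capacity):
    """High-complexity strategy: check all 2^n subsets for the optimum."""
    best = 0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(w for _, w in combo) <= capacity:
                best = max(best, sum(v for v, _ in combo))
    return best
```

Here greedy grabs the single high-value item and fills the capacity, while exhaustive search finds the better two-item combination, at the cost of exponentially more operations, which is the kind of demand that scales deliberation time.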
Affiliation(s)
- Tao Hong
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, USA
- Program in Neural Computation, Carnegie Mellon University, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
- William R Stauffer
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, USA
- Program in Neural Computation, Carnegie Mellon University, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
14
Huang FY, Grabenhorst F. Nutrient-Sensitive Reinforcement Learning in Monkeys. J Neurosci 2023; 43:1714-1730. [PMID: 36669886] [PMCID: PMC10010454] [DOI: 10.1523/jneurosci.0752-22.2022]
Abstract
In reinforcement learning (RL), animals choose by assigning values to options and learn by updating these values from reward outcomes. This framework has been instrumental in identifying fundamental learning variables and their neuronal implementations. However, canonical RL models do not explain how reward values are constructed from biologically critical intrinsic reward components, such as nutrients. From an ecological perspective, animals should adapt their foraging choices in dynamic environments to acquire nutrients that are essential for survival. Here, to advance the biological and ecological validity of RL models, we investigated how (male) monkeys adapt their choices to obtain preferred nutrient rewards under varying reward probabilities. We found that the nutrient composition of rewards strongly influenced learning and choices. The animals' preferences for specific nutrients (sugar, fat) affected how they adapted to changing reward probabilities; the history of recent rewards influenced the monkeys' choices more strongly if these rewards contained their preferred nutrients (nutrient-specific reward history). The monkeys also chose preferred nutrients even when they were associated with lower reward probability. A nutrient-sensitive RL model captured these processes; it updated the values of individual sugar and fat components of expected rewards based on experience and integrated them into subjective values that explained the monkeys' choices. Nutrient-specific reward prediction errors guided this value-updating process. Our results identify nutrients as important reward components that guide learning and choice by influencing the subjective value of choice options. Extending RL models with nutrient-value functions may enhance their biological validity and uncover nutrient-specific learning and decision variables.
SIGNIFICANCE STATEMENT RL is an influential framework that formalizes how animals learn from experienced rewards. Although reward is a foundational concept in RL theory, canonical RL models cannot explain how learning depends on specific reward properties, such as nutrients. Intuitively, learning should be sensitive to the nutrient components of the reward to benefit health and survival. Here, we show that the nutrient (fat, sugar) composition of rewards affects how monkeys choose and learn in an RL paradigm, and that key learning variables, including reward history and reward prediction error, should be modified with nutrient-specific components to account for the observed choice behavior. By incorporating biologically critical nutrient rewards into the RL framework, our findings help advance the ecological validity of RL models.
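The nutrient-sensitive update the abstract describes can be sketched as follows. This is our illustration, not the authors' model code: separate sugar and fat value components, each updated by its own nutrient-specific prediction error, are combined into one subjective value. The learning rate and preference weights are made-up parameters.

```python
# Hypothetical sketch of nutrient-sensitive RL (not the authors' code).
# Each nutrient component has its own value estimate, updated by its own
# prediction error; components are integrated via preference weights.
ALPHA = 0.3                          # learning rate (assumed)
PREF = {"sugar": 0.7, "fat": 0.3}    # nutrient preference weights (assumed)

def update(values, outcome):
    """Update per-nutrient values from one reward outcome (dicts keyed by nutrient)."""
    for nutrient, delivered in outcome.items():
        rpe = delivered - values[nutrient]     # nutrient-specific prediction error
        values[nutrient] += ALPHA * rpe
    return values

def subjective_value(values):
    """Integrate nutrient components into one scalar subjective value."""
    return sum(PREF[n] * v for n, v in values.items())

v = {"sugar": 0.0, "fat": 0.0}
for _ in range(50):                            # repeated sugar-rich outcomes
    update(v, {"sugar": 1.0, "fat": 0.2})
print(round(subjective_value(v), 3))
```

After many identical outcomes the component values converge to the delivered amounts, so the integrated value settles near 0.7 * 1.0 + 0.3 * 0.2 = 0.76.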
Affiliation(s)
- Fei-Yang Huang
- Department of Experimental Psychology, University of Oxford, Oxford OX1 3TA, United Kingdom
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Fabian Grabenhorst
- Department of Experimental Psychology, University of Oxford, Oxford OX1 3TA, United Kingdom
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
15
Seak LCU, Ferrari-Toniolo S, Jain R, Nielsen K, Schultz W. Systematic comparison of risky choices in humans and monkeys. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.07.527517. [PMID: 36798272 PMCID: PMC9934584 DOI: 10.1101/2023.02.07.527517] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
The past decades have seen tremendous progress in fundamental studies of economic choice in humans. However, elucidating the underlying neuronal processes requires invasive neurophysiological studies that are difficult to perform in humans. Monkeys, as our closest evolutionary relatives, offer a solution. The animals display sophisticated and well-controllable behavior that makes it possible to implement key constructs of proven economic choice theories. However, the similarity of economic choice between the two species has never been systematically investigated. We investigated compliance with the independence axiom (IA) of expected utility theory, one of the most demanding choice tests, and compared IA violations between humans and monkeys. Using generalized linear modeling and cumulative prospect theory (CPT), we found that humans and monkeys made comparable risky choices, although their subjective values (utilities) differed. These results suggest similar fundamental choice mechanisms across these primate species and encourage the study of their underlying neurophysiological mechanisms.
Affiliation(s)
- Leo Chi U Seak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Simone Ferrari-Toniolo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Ritesh Jain
- Management School, University of Liverpool, Liverpool L69 7ZY, United Kingdom
- Kirby Nielsen
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena CA 91125, USA
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
16
Mkrtchian A, Valton V, Roiser JP. Reliability of Decision-Making and Reinforcement Learning Computational Parameters. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2023; 7:30-46. [PMID: 38774643 PMCID: PMC11104400 DOI: 10.5334/cpsy.86] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Computational models can offer mechanistic insight into cognition and therefore have the potential to transform our understanding of psychiatric disorders and their treatment. For translational efforts to be successful, it is imperative that computational measures capture individual characteristics reliably. Here we examine the reliability of reinforcement learning and economic models derived from two commonly used tasks. Healthy individuals (N = 50) completed a restless four-armed bandit and a calibrated gambling task twice, two weeks apart. Reward and punishment learning rates from the reinforcement learning model showed good reliability, and reward and punishment sensitivity from the same model had fair reliability, while risk aversion and loss aversion parameters from a prospect theory model exhibited good and excellent reliability, respectively. Both models were further able to predict future behaviour above chance within individuals. This prediction was better when based on participants' own model parameters than on other participants' parameter estimates. These results suggest that reinforcement learning, and particularly prospect theory, parameters, as derived from a restless four-armed bandit and a calibrated gambling task, can be measured reliably to assess learning and decision-making mechanisms. Overall, these findings indicate the translational potential of clinically relevant computational parameters for precision psychiatry.
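A test-retest reliability check of the kind described above can be sketched as follows. The paper reports intraclass-correlation-based reliability; here a plain Pearson correlation between two sessions is shown as a simpler stand-in, and the parameter values are made up.

```python
# Illustrative test-retest reliability of fitted model parameters (our sketch;
# a Pearson correlation stands in for the paper's ICC-based analysis).
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical learning-rate estimates for 6 participants, two weeks apart.
session1 = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
session2 = [0.12, 0.22, 0.45, 0.50, 0.75, 0.80]

r = pearson(session1, session2)
print(round(r, 3))
```

A parameter is useful for precision psychiatry only if this between-session correlation stays high; with the made-up values above it is close to 1.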
Affiliation(s)
- Anahit Mkrtchian
- Neuroscience and Mental Health Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Applied Computational Psychiatry Lab, Mental Health Neuroscience Department, Division of Psychiatry and Max Planck Centre for Computational Psychiatry and Ageing Research, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Vincent Valton
- Neuroscience and Mental Health Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Jonathan P. Roiser
- Neuroscience and Mental Health Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom
17
A neuronal prospect theory model in the brain reward circuitry. Nat Commun 2022; 13:5855. [PMID: 36195765 PMCID: PMC9532451 DOI: 10.1038/s41467-022-33579-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Accepted: 09/22/2022] [Indexed: 11/23/2022] Open
Abstract
Prospect theory, arguably the most prominent theory of choice, is an obvious candidate for neural valuation models. How the activity of individual neurons, a possible computational unit, obeys prospect theory remains unknown. Here, we show, with theoretical accuracy equivalent to that of human neuroimaging studies, that single-neuron activity in four core reward-related cortical and subcortical regions represents the subjective valuation of risky gambles in monkeys. The activity of individual neurons in monkeys passively viewing a lottery reflects the desirability of probabilistic rewards parameterized as a multiplicative combination of utility and probability weighting functions, as in the prospect theory framework. The diverse patterns of valuation signals were not localized but distributed throughout most parts of the reward circuitry. A network model aggregating these signals reconstructed the risk preferences and subjective probability weighting revealed by the animals’ choices. Thus, distributed neural coding explains the computation of subjective valuations under risk. It is unclear how the activity of individual neurons conforms to prospect theory. Here, the authors demonstrate that the activity of single neurons in various reward-related regions in the monkey brain can be described as encoding a multiplicative combination of utility and probability weighting, and that this subjective valuation process is achieved via a distributed coding scheme.
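The valuation scheme in this abstract, a multiplicative combination of utility and probability weighting, can be sketched as follows. The power utility and one-parameter Prelec weighting below are common textbook forms with made-up parameters, not the paper's fitted functions.

```python
# Prospect-theory-style valuation sketch (textbook functional forms, assumed
# parameters): subjective value = probability weight * utility of magnitude.
import math

def utility(x, rho=0.8):
    """Concave power utility over reward magnitude."""
    return x ** rho

def weight(p, gamma=0.6):
    """Prelec probability weighting: overweights small p, underweights large p."""
    return math.exp(-(-math.log(p)) ** gamma)

def value(x, p):
    """Multiplicative combination, as in the prospect theory framework."""
    return weight(p) * utility(x)

# The weighting function distorts objective probabilities:
print(weight(0.05) > 0.05)   # True: small probabilities are overweighted
print(weight(0.95) < 0.95)   # True: large probabilities are underweighted
```

In the paper's framework, single-neuron activity is described as tracking such a value signal; here the point is only the shape of the computation.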
18
Karin O, Alon U. The dopamine circuit as a reward-taxis navigation system. PLoS Comput Biol 2022; 18:e1010340. [PMID: 35877694 PMCID: PMC9352198 DOI: 10.1371/journal.pcbi.1010340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 08/04/2022] [Accepted: 06/29/2022] [Indexed: 01/29/2023] Open
Abstract
Studying the brain circuits that control behavior is challenging, since in addition to their structural complexity there are continuous feedback interactions between actions and sensed inputs from the environment. It is therefore important to identify mathematical principles that can be used to develop testable hypotheses. In this study, we use ideas and concepts from systems biology to study the dopamine system, which controls learning, motivation, and movement. Using data from neuronal recordings in behavioral experiments, we developed a mathematical model for dopamine responses and the effect of dopamine on movement. We show that the dopamine system shares core functional analogies with bacterial chemotaxis. Just as chemotaxis robustly climbs chemical attractant gradients, the dopamine circuit performs ‘reward-taxis’ where the attractant is the expected value of reward. The reward-taxis mechanism provides a simple explanation for scale-invariant dopaminergic responses and for matching in free operant settings, and makes testable quantitative predictions. We propose that reward-taxis is a simple and robust navigation strategy that complements other, more goal-directed navigation mechanisms.
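The scale invariance at the heart of the reward-taxis analogy can be sketched as follows. This is our illustration of the property, under the assumption (consistent with the abstract's framing) that the dopamine-like response tracks the temporal derivative of the log of expected reward; the reward trajectory is made up.

```python
# Sketch of scale-invariant 'reward-taxis' responses (our illustration).
# If the response is d/dt log(expected reward), multiplying all rewards by a
# constant leaves the response unchanged, analogous to fold-change detection
# in bacterial chemotaxis.
import math

def log_derivative_response(rewards, dt=1.0):
    """Discrete-time derivative of log expected reward."""
    return [(math.log(b) - math.log(a)) / dt for a, b in zip(rewards, rewards[1:])]

trajectory = [1.0, 2.0, 4.0, 8.0]        # expected reward climbing a gradient
scaled = [5.0 * r for r in trajectory]   # same gradient, 5x richer environment

print(log_derivative_response(trajectory))
print(log_derivative_response(scaled))   # same values (up to floating point)
```

Because the response depends only on relative changes, the same navigation strategy works across rich and poor environments, which is the robustness property the abstract emphasizes.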
Affiliation(s)
- Omer Karin
- Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Dept. of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, United Kingdom
- Uri Alon
- Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
19
Ferrari-Toniolo S, Seak LCU, Schultz W. Risky choice: Probability weighting explains independence axiom violations in monkeys. JOURNAL OF RISK AND UNCERTAINTY 2022; 65:319-351. [PMID: 36654986 PMCID: PMC9840594 DOI: 10.1007/s11166-022-09388-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 06/02/2022] [Indexed: 06/17/2023]
Abstract
Expected Utility Theory (EUT) provides axioms for maximizing utility in risky choice. The Independence Axiom (IA) is its most demanding axiom: preferences between two options should not change when altering both options equally by mixing them with a common gamble. We tested common consequence (CC) and common ratio (CR) violations of the IA over several months in thousands of stochastic choices using a large variety of binary option sets. Three monkeys showed consistently few outright Preference Reversals (8%) but substantial graded Preference Changes (46%) between the initial preferred gamble and the corresponding altered gamble. Linear Discriminant Analysis (LDA) indicated that gamble probabilities predicted most Preference Changes in CC (72%) and CR (88%) tests. The Akaike Information Criterion indicated that probability weighting within Cumulative Prospect Theory (CPT) explained choices better than models using Expected Value (EV) or EUT. Fitting by utility and probability weighting functions of CPT resulted in nonlinear and non-parallel indifference curves (IC) in the Marschak-Machina triangle and suggested IA non-compliance of models using EV or EUT. Indeed, CPT models predicted Preference Changes better than EV and EUT models. Indifference points in out-of-sample tests were closer to CPT-estimated ICs than EV and EUT ICs. Finally, while the few outright Preference Reversals may reflect the long experience of our monkeys, their more graded Preference Changes corresponded to those reported for humans. Benefitting from the wide testing possibilities in monkeys, our stringent axiomatic tests contribute critical information about risky decision-making and serve as a basis for investigating neuronal decision mechanisms. Supplementary information: The online version contains supplementary material available at 10.1007/s11166-022-09388-7.
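A common-ratio (CR) test of the kind described above can be sketched numerically. This is our illustration with made-up parameters, not the paper's fits: under EUT, scaling both gambles' probabilities by a common ratio cannot reverse a preference, but a CPT-style nonlinear probability weighting can produce exactly such a reversal.

```python
# Illustrative common-ratio IA violation via probability weighting (assumed
# Prelec weighting and linear utility; parameters are made up).
import math

def w(p, gamma=0.5):
    """Prelec probability weighting; certain outcomes keep weight 1."""
    if p >= 1.0:
        return 1.0
    return math.exp(-(-math.log(p)) ** gamma)

def cpt_value(x, p):
    """Linear utility isolates the effect of probability weighting."""
    return w(p) * x

# Original pair: safe (3 for sure) vs risky (4 with p = 0.8).
safe, risky = cpt_value(3, 1.0), cpt_value(4, 0.8)
# Common-ratio pair: both probabilities scaled by 0.25.
safe_cr, risky_cr = cpt_value(3, 0.25), cpt_value(4, 0.2)

print(safe > risky)        # True: prefers the safe option...
print(safe_cr < risky_cr)  # True: ...but reverses after common-ratio scaling
```

The reversal arises because the weighting function compresses the difference between 0.25 and 0.2 much more than the difference between 1.0 and 0.8, which is the mechanism the abstract identifies behind CR violations.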
Affiliation(s)
- Simone Ferrari-Toniolo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Leo Chi U Seak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
20
Louie K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput Biol 2022; 18:e1010350. [PMID: 35862443 PMCID: PMC9345478 DOI: 10.1371/journal.pcbi.1010350] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2022] [Revised: 08/02/2022] [Accepted: 07/01/2022] [Indexed: 11/18/2022] Open
Abstract
Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for an incorporation of biologically valid value functions in computational models of learning and decision-making. Reinforcement learning models are widely used to characterize reward-driven learning in biological and computational agents. Standard reinforcement learning models use linear value functions, despite strong empirical evidence that biological value representations are nonlinear functions of external rewards. Here, we examine the properties of a biologically-based nonlinear reinforcement learning algorithm employing the canonical divisive normalization function, a neural computation commonly found in sensory, cognitive, and reward coding. We show that this normalized reinforcement learning algorithm implements a simple but powerful control of how reward learning reflects relative gains and losses. This property explains diverse behavioral and neural phenomena, and suggests the importance of using biologically valid value functions in computational models of learning and decision-making.
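The core mechanism of normalized reinforcement learning described above can be sketched as follows. This is our reading of the abstract, with made-up parameters: the reward enters the prediction error through a divisively normalized, saturating value function, whose curvature makes equal-sized gains and losses produce unequal prediction errors.

```python
# Sketch of normalized RL (our illustration, assumed parameters): a divisively
# normalized value function u(r) = r / (sigma + r) creates an intrinsic,
# sigma-tunable asymmetry in prediction-error coding.
SIGMA = 1.0   # normalization constant (assumed)
ALPHA = 0.5   # learning rate (assumed)

def u(r):
    """Divisive normalization: saturating, nonlinear value of reward r >= 0."""
    return r / (SIGMA + r)

def nrl_update(v, reward):
    """Standard delta-rule update, but on the normalized reward value."""
    return v + ALPHA * (u(reward) - v)

# Around a reference reward of 1.0, a +0.5 gain moves u less than a -0.5 loss:
gain = u(1.5) - u(1.0)
loss = u(1.0) - u(0.5)
print(gain < loss)   # True: asymmetric coding of equal-sized gains and losses
```

Varying SIGMA tunes this asymmetry, which is the behavioral and computational flexibility the abstract links to risk preferences and distributional RL.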
Affiliation(s)
- Kenway Louie
- Center for Neural Science, New York University, New York, United States of America
- Neuroscience Institute, New York University Grossman School of Medicine, New York, United States of America
21
Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat Neurosci 2022; 25:738-748. [PMID: 35668173 DOI: 10.1038/s41593-022-01085-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Accepted: 04/26/2022] [Indexed: 11/26/2022]
Abstract
Reward expectations based on internal knowledge of the external environment are a core component of adaptive behavior. However, internal knowledge may be inaccurate or incomplete due to errors in sensory measurements. Some features of the environment may also be encoded inaccurately to minimize representational costs associated with their processing. In this study, we investigated how reward expectations are affected by features of internal representations by studying behavior and dopaminergic activity while mice make time-based decisions. We show that several possible representations allow a reinforcement learning agent to model animals' overall performance during the task. However, only a small subset of highly compressed representations simultaneously reproduced the co-variability in animals' choice behavior and dopaminergic activity. Strikingly, these representations predict an unusual distribution of response times that closely match animals' behavior. These results inform how constraints of representational efficiency may be expressed in encoding representations of dynamic cognitive variables used for reward-based computations.
22
Bujold PM, Seak LCU, Schultz W, Ferrari-Toniolo S. Comparing utility functions between risky and riskless choice in rhesus monkeys. Anim Cogn 2022; 25:385-399. [PMID: 34568979 PMCID: PMC8940808 DOI: 10.1007/s10071-021-01560-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Revised: 08/10/2021] [Accepted: 09/14/2021] [Indexed: 11/29/2022]
Abstract
Decisions can be risky or riskless, depending on the outcomes of the choice. Expected utility theory describes risky choices as a utility maximization process: we choose the option with the highest subjective value (utility), which we compute considering both the option's value and its associated risk. According to the random utility maximization framework, riskless choices could also be based on a utility measure. Neuronal mechanisms of utility-based choice may thus be common to both risky and riskless choices. This assumption would require the existence of a utility function that accounts for both risky and riskless decisions. Here, we investigated whether the choice behavior of two macaque monkeys in risky and riskless decisions could be described by a common underlying utility function. We found that the utility functions elicited in the two choice scenarios were different from each other, even after taking into account the contribution of subjective probability weighting. Our results suggest that distinct utility representations exist for risky and riskless choices, which could reflect distinct neuronal representations of the utility quantities, or distinct brain mechanisms for risky and riskless choices. The different utility functions should be taken into account in neuronal investigations of utility-based choice.
Affiliation(s)
- Philipe M. Bujold
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY UK
- Leo Chi U. Seak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY UK
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY UK
- Simone Ferrari-Toniolo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY UK
23
Soutschek A, Jetter A, Tobler PN. Towards a Unifying Account of Dopamine’s Role in Cost-Benefit Decision Making. BIOLOGICAL PSYCHIATRY GLOBAL OPEN SCIENCE 2022; 3:179-186. [PMID: 37124350 PMCID: PMC10140448 DOI: 10.1016/j.bpsgos.2022.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 02/25/2022] [Accepted: 02/25/2022] [Indexed: 10/18/2022] Open
Abstract
Dopamine is thought to play a crucial role in cost-benefit decision making, but so far there is no consensus on the precise role of dopamine in decision making. Here, we review the literature on dopaminergic manipulations of cost-benefit decision making in humans and evaluate how well different theoretical accounts explain the existing body of evidence. Reduced D2 stimulation tends to increase the willingness to bear delay and risk costs (i.e., wait for later rewards, take riskier options), while increased D1 and D2 receptor stimulation increases willingness to bear effort costs. We argue that the empirical findings can best be explained by combining the strengths of two theoretical accounts: in cost-benefit decision making, dopamine may play a dual role both in promoting the pursuit of psychologically close options (e.g., sooner and safer rewards) and in computing which costs are acceptable for a reward at stake. Moreover, we identify several limiting factors in the study designs of previous investigations that prevented a fuller understanding of dopamine's role in value-based choice. Together, the proposed theoretical framework and the methodological suggestions for future studies may bring us closer to a unifying account of dopamine in healthy and impaired cost-benefit decision making.
24
Al-Mohammad A, Schultz W. Reward Value Revealed by Auction in Rhesus Monkeys. J Neurosci 2022; 42:1510-1528. [PMID: 34937703 PMCID: PMC8883853 DOI: 10.1523/jneurosci.1275-21.2021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2021] [Revised: 11/25/2021] [Accepted: 12/01/2021] [Indexed: 11/21/2022] Open
Abstract
Economic choice is thought to involve the elicitation of the subjective values of the choice options. Thus far, value estimation in animals has relied on stochastic choices between multiple options presented in repeated trials and expressed as averages over dozens of trials. However, subjective reward valuations are made moment-to-moment and do not always require alternative options; their consequences are usually felt immediately. Here, we describe a Becker-DeGroot-Marschak (BDM) auction-like mechanism that provides more direct and simple valuations with immediate consequences. The BDM encourages agents to reveal their true subjective value in individual choices ("incentive compatibility"). Male monkeys reliably placed well-ranked BDM bids for up to five juice volumes while paying from a water budget. The bids closely approximated the average subjective values estimated with conventional binary choices (BCs), thus demonstrating procedural invariance and aligning with the wealth of knowledge acquired with these less direct estimation methods. The feasibility of BDM bidding in monkeys paves the way for an analysis of subjective neuronal value signals in single trials rather than from averages; it also bridges the gap to the increasingly used BDM method in human neuroeconomics.
SIGNIFICANCE STATEMENT The subjective economic value of rewards cannot be measured directly but must be inferred from observable behavior. Until now, the estimation method in animals was rather complex and required comparison between several choice options during repeated choices; thus, such methods did not respect the imminence of the outcome from individual choices. However, human economic research has developed a simple auction-like procedure that can reveal, in a direct and immediate manner, the true subjective value in individual choices [the Becker-DeGroot-Marschak (BDM) mechanism]. The current study implemented this mechanism in rhesus monkeys and demonstrates its usefulness for eliciting meaningful value estimates of liquid rewards. The mechanism allows future neurophysiological assessment of subjective reward value signals in single trials of controlled animal tasks.
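The incentive compatibility of the BDM mechanism can be sketched with a small simulation. This is the textbook argument, not the authors' code, and all numbers are made up: a random price is drawn, and if the bid exceeds the price the bidder pays the price (not the bid) and receives the good; expected surplus is then maximized by bidding the true subjective value.

```python
# Monte Carlo sketch of BDM incentive compatibility (textbook mechanism,
# made-up values): truthful bidding maximizes expected surplus.
import random

def expected_surplus(bid, true_value, n=200_000, seed=0):
    """Average surplus when a uniform random price decides win and payment."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        price = rng.uniform(0.0, 1.0)   # random price draw
        if bid >= price:
            total += true_value - price # win: pay the price, gain the value
    return total / n

TRUE_VALUE = 0.6   # hypothetical subjective value of the reward
truthful = expected_surplus(TRUE_VALUE, TRUE_VALUE)
overbid = expected_surplus(0.9, TRUE_VALUE)
underbid = expected_surplus(0.3, TRUE_VALUE)
print(truthful >= overbid and truthful >= underbid)   # True: honesty is optimal
```

Overbidding risks paying prices above the reward's value, while underbidding forgoes profitable wins; this is why the bids themselves can serve as direct single-trial value estimates.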
Affiliation(s)
- Alaa Al-Mohammad
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
25
Seeking motivation and reward: roles of dopamine, hippocampus and supramammillo-septal pathway. Prog Neurobiol 2022; 212:102252. [PMID: 35227866 PMCID: PMC8961455 DOI: 10.1016/j.pneurobio.2022.102252] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2021] [Revised: 02/09/2022] [Accepted: 02/23/2022] [Indexed: 01/07/2023]
Abstract
Reinforcement learning and goal-seeking behavior are thought to be mediated by midbrain dopamine neurons. However, little is known about neural substrates of curiosity and exploratory behavior, which occur in the absence of clear goal or reward. This is despite behavioral scientists having long suggested that curiosity and exploratory behaviors are regulated by an innate drive. We refer to such behavior as information-seeking behavior and propose 1) key neural substrates and 2) the concept of environment prediction error as a framework to understand information-seeking processes. The cognitive aspect of information-seeking behavior, including the perception of salience and uncertainty, involves, in part, the pathways from the posterior hypothalamic supramammillary region to the hippocampal formation. The vigor of such behavior is modulated by the following: supramammillary glutamatergic neurons; their projections to medial septal glutamatergic neurons; and the projections of medial septal glutamatergic neurons to ventral tegmental dopaminergic neurons. Phasic responses of dopaminergic neurons are characterized as signaling potentially important stimuli rather than rewards. This paper describes how novel stimuli and uncertainty trigger seeking motivation and how these neural substrates modulate information-seeking behavior.
26
He J, Kleyman M, Chen J, Alikaya A, Rothenhoefer KM, Ozturk BE, Wirthlin M, Bostan AC, Fish K, Byrne LC, Pfenning AR, Stauffer WR. Transcriptional and anatomical diversity of medium spiny neurons in the primate striatum. Curr Biol 2021; 31:5473-5486.e6. [PMID: 34727523 PMCID: PMC9359438 DOI: 10.1016/j.cub.2021.10.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2021] [Revised: 09/17/2021] [Accepted: 10/06/2021] [Indexed: 10/20/2022]
Abstract
Medium spiny neurons (MSNs) constitute the vast majority of striatal neurons and the principal interface between dopamine reward signals and functionally diverse cortico-basal ganglia circuits. Information processing in these circuits is dependent on distinct MSN types: cell types that are traditionally defined according to their projection targets or dopamine receptor expression. Single-cell transcriptional studies have revealed greater MSN heterogeneity than predicted by traditional circuit models, but the transcriptional landscape in the primate striatum remains unknown. Here, we set out to establish molecular definitions for MSN subtypes in Rhesus monkeys and to explore the relationships between transcriptionally defined subtypes and anatomical subdivisions of the striatum. Our results suggest at least nine MSN subtypes, including dorsal striatum subtypes associated with striosome and matrix compartments, ventral striatum subtypes associated with the nucleus accumbens shell and olfactory tubercle, and an MSN-like cell type restricted to μ-opioid receptor rich islands in the ventral striatum. Although each subtype was demarcated by discontinuities in gene expression, continuous variation within subtypes defined gradients corresponding to anatomical locations and, potentially, functional specializations. These results lay the foundation for achieving cell-type-specific transgenesis in the primate striatum and provide a blueprint for investigating circuit-specific information processing.
Affiliation(s)
- Jing He
- Department of Neurobiology, Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Michael Kleyman
- Department of Computational Biology, School of Computer Science, Neuroscience Institute, Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
- Jianjiao Chen
- Department of Neurobiology, Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Aydin Alikaya
- Department of Neurobiology, Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Kathryn M Rothenhoefer
- Department of Neurobiology, Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
- Bilge Esin Ozturk
- Department of Ophthalmology, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Morgan Wirthlin
- Department of Computational Biology, School of Computer Science, Neuroscience Institute, Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
| | - Andreea C Bostan
- Department of Neurobiology, Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
| | - Kenneth Fish
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Leah C Byrne
- Department of Ophthalmology, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Andreas R Pfenning
- Department of Computational Biology, School of Computer Science, Neuroscience Institute, Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA.
| | - William R Stauffer
- Department of Neurobiology, Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA.
| |
|
27
|
Abstract
We confirm that rats can act as rational economic agents, making choices about how much work to do to obtain a reward in a way that optimally trades off the value of the reward against the cost of the effort. Contrary to the notion that bigger rewards are more motivating, rats worked harder in economies where rewards were small, ensuring a sufficient minimum income of water. But they chose to earn and consume more water per day when water was “cheap” (available for little work). We present a mathematical model explaining why rats work when they do (surprisingly, not just when they are thirsty) and suggesting where in the brain animals might compute the current value of working for water. In the laboratory, animals’ motivation to work tends to be positively correlated with reward magnitude. But in nature, rewards earned by work are essential to survival (e.g., working to find water), and the payoff of that work can vary on long timescales (e.g., seasonally). Under these constraints, the strategy of working less when rewards are small could be fatal. We found that instead, rats in a closed economy did more work for water rewards when the rewards were stably smaller, a phenomenon also observed in human labor supply curves. Like human consumers, rats showed elasticity of demand, consuming far more water per day when its price in effort was lower. The neural mechanisms underlying such “rational” market behaviors remain largely unexplored. We propose a dynamic utility maximization model that can account for the dependence of rat labor supply (trials/day) on the wage rate (milliliter/trial) and also predict the temporal dynamics of when rats work. Based on data from mice, we hypothesize that glutamatergic neurons in the subfornical organ in lamina terminalis continuously compute the instantaneous marginal utility of voluntary work for water reward and causally determine the amount and timing of work.
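The labor-supply logic of this abstract can be illustrated with a toy dynamic-utility model. The saturating utility, effort cost, and parameter values below are our own assumptions, not the authors' fitted model; the sketch only shows how a satiation point makes the simulated rat work more trials when the wage (ml/trial) is low, while still consuming more water per day when water is cheap.

```python
import math

def optimal_trials(wage_ml_per_trial, effort_cost=0.001, target_ml=20.0,
                   max_trials=5000):
    """Trials/day maximizing a saturating utility of water minus effort cost."""
    def net_utility(n):
        water = wage_ml_per_trial * n
        # Marginal value of water falls as daily intake nears satiation.
        return target_ml * (1.0 - math.exp(-water / target_ml)) - effort_cost * n
    return max(range(1, max_trials + 1), key=net_utility)

# A "poor" economy (small rewards per trial) versus a "rich" one.
n_poor = optimal_trials(wage_ml_per_trial=0.02)
n_rich = optimal_trials(wage_ml_per_trial=0.08)
water_poor = 0.02 * n_poor   # total daily intake, ml
water_rich = 0.08 * n_rich
```

In this sketch more trials are worked in the poor economy (income maintenance) even though less water is consumed there per day (elastic demand), matching the qualitative pattern described above.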
|
28
|
Báez-Mendoza R, Vázquez Y, Mastrobattista EP, Williams ZM. Neuronal Circuits for Social Decision-Making and Their Clinical Implications. Front Neurosci 2021; 15:720294. [PMID: 34658766 PMCID: PMC8517320 DOI: 10.3389/fnins.2021.720294] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2021] [Accepted: 09/09/2021] [Indexed: 11/13/2022] Open
Abstract
Social living facilitates individual access to rewards, cognitive resources, and objects that would not otherwise be accessible. There are, however, some drawbacks to social living, particularly when competing for scarce resources. Furthermore, variability in our ability to make social decisions can be associated with neuropsychiatric disorders. The neuronal mechanisms underlying social decision-making are beginning to be understood. Momentum to study this phenomenon has been partly carried over from the study of economic decision-making. Yet, because of the similarities between these types of decision-making, it remains unclear what constitutes a social decision. Here, we propose a definition of social decision-making as choices made in a context where one or more conspecifics are involved in the decision or its consequences. Social decisions can be conceptualized as complex economic decisions because they are based on subjective preferences between different goods. During social decisions, individuals choose based on their internal value estimates of the different alternatives. These are complex decisions, given that conspecifics' beliefs or actions can modify the subject's internal valuations at every choice. Here, we first review recent developments in our collective understanding of the neuronal mechanisms and circuits of social decision-making in primates. We then review literature characterizing populations with neuropsychiatric disorders that show deficits in social decision-making and the neuronal circuitry associated with these deficits.
Affiliation(s)
- Raymundo Báez-Mendoza
- Department of Neurosurgery, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Yuriria Vázquez
- Laboratory of Neural Systems, The Rockefeller University, New York, NY, United States
| | - Emma P. Mastrobattista
- Department of Neurosurgery, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Ziv M. Williams
- Department of Neurosurgery, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| |
|
29
|
Tanaka S, Taylor JE, Sakagami M. The effect of effort on reward prediction error signals in midbrain dopamine neurons. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
|
30
|
Schultz W, Stauffer WR, Lak A, Pastor-Bernier A. Smarter than humans: rationality reflected in primate neuronal reward signals. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.03.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
31
|
Bujold PM, Ferrari-Toniolo S, Schultz W. Adaptation of utility functions to reward distribution in rhesus monkeys. Cognition 2021; 214:104764. [PMID: 34000666 PMCID: PMC8346953 DOI: 10.1016/j.cognition.2021.104764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Revised: 04/26/2021] [Accepted: 05/04/2021] [Indexed: 10/25/2022]
Abstract
This study investigated how the experience of different reward distributions shapes the utility functions that can be inferred from economic choice. Despite the generally accepted notion that utility functions are sensitive to external references, exactly how such changes take place remains largely unknown. Here we benefitted from the capacity to engage in thorough and prolonged empirical tests of economic choice by one of our evolutionary cousins, the rhesus macaque. We analyzed data from thousands of binary choices and found that the animals' preferences changed depending on the statistics of rewards experienced in the past (up to weeks earlier) and that these changes could reflect monkeys adapting their expectations of reward. The utility functions we elicited from their choices stretched and shifted over several months of sequential changes in the mean and range of rewards that the macaques experienced. However, this adaptation was usually incomplete, suggesting that, even after months, past experiences held weight when monkeys assigned value to future rewards. Rather than displaying the stable, fixed preferences assumed by normative economic models, rhesus macaques flexibly shape their preferences around the past and present statistics of their environment. That is, rather than relying on a single reference point, reference-dependent preferences likely capture a monkey's range of expectations.
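The incomplete adaptation described here can be pictured as a reference point that tracks experienced rewards as an exponential moving average with a small learning rate; the rate and reward values below are illustrative assumptions rather than fitted quantities.

```python
def update_reference(reference, reward, rate=0.05):
    """Shift the reference point a small step toward each experienced reward."""
    return reference + rate * (reward - reference)

ref = 0.2            # expectation (ml) carried over from an earlier block
for _ in range(60):
    ref = update_reference(ref, reward=0.5)   # new block of larger rewards
# After 60 trials the reference has moved most, but not all, of the way
# to 0.5 ml, so past experience still weighs on how rewards are valued.
```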
Affiliation(s)
- Philipe M Bujold
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom.
| | - Simone Ferrari-Toniolo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
| | - Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom.
| |
|
32
|
Pastor-Bernier A, Stasiak A, Schultz W. Reward-specific satiety affects subjective value signals in orbitofrontal cortex during multicomponent economic choice. Proc Natl Acad Sci U S A 2021; 118:e2022650118. [PMID: 34285071 PMCID: PMC8325167 DOI: 10.1073/pnas.2022650118] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Sensitivity to satiety constitutes a basic requirement for neuronal coding of subjective reward value. Satiety from natural ongoing consumption affects reward functions in learning and approach behavior. More specifically, satiety reduces the subjective economic value of individual rewards during choice between options that typically contain multiple reward components. The unconfounded assessment of economic reward value requires tests at choice indifference between two options, which is difficult to achieve with sated rewards. By conceptualizing choices between options with multiple reward components ("bundles"), Revealed Preference Theory may offer a solution. Despite satiety, choices against an unaltered reference bundle may remain indifferent when the reduced value of a sated bundle reward is compensated by larger amounts of an unsated reward of the same bundle, and then the value loss of the sated reward is indicated by the amount of the added unsated reward. Here, we show psychophysically titrated choice indifference in monkeys between bundles of differently sated rewards. Neuronal chosen value signals in the orbitofrontal cortex (OFC) followed closely the subjective value change within recording periods of individual neurons. A neuronal classifier distinguishing the bundles and predicting choice substantiated the subjective value change. The choice between conventional single rewards confirmed the neuronal changes seen with two-reward bundles. Thus, reward-specific satiety reduces subjective reward value signals in OFC. With satiety being an important factor of subjective reward value, these results extend the notion of subjective economic reward value coding in OFC neurons.
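The compensation logic, reading a sated reward's value loss off the amount of unsated reward needed to restore bundle indifference, can be sketched numerically. The square-root utilities and the multiplicative satiety factor are hypothetical choices for illustration, not the quantities estimated in the study.

```python
def utility_a(ml, satiety=0.0):
    """Utility of bundle component A; satiety scales its value down."""
    return (1.0 - satiety) * ml ** 0.5

def utility_b(ml):
    """Utility of the unsated bundle component B."""
    return ml ** 0.5

def compensation(a_ml, b_ml, satiety, step=1e-4):
    """Extra amount of B restoring indifference with the pre-satiety bundle."""
    target = utility_a(a_ml) + utility_b(b_ml)   # unaltered reference bundle
    extra = 0.0
    while utility_a(a_ml, satiety) + utility_b(b_ml + extra) < target:
        extra += step
    return extra

# The added amount of B measures the value lost by the sated reward A.
value_loss_in_b = compensation(a_ml=0.4, b_ml=0.2, satiety=0.5)
```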
Affiliation(s)
- Alexandre Pastor-Bernier
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
| | - Arkadiusz Stasiak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
| | - Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
| |
|
33
|
Wang G, Li J, Zhu C, Wang S, Jiang S. How Do Reference Points Influence the Representation of the N200 for Consumer Preference? Front Psychol 2021; 12:645775. [PMID: 34248744 PMCID: PMC8266263 DOI: 10.3389/fpsyg.2021.645775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2021] [Accepted: 05/10/2021] [Indexed: 11/20/2022] Open
Abstract
Recent studies have suggested that event-related brain potentials (ERPs) can represent consumer preference, and there is consensus that the N200 is the best indicator of consumer preference. Measurement of reference-dependent consumer preference, in turn, requires a reference point, but it remains largely unknown how reference points modulate the preference-related N200. We designed an experiment to investigate how reference points affect the N200 based on classical paradigms. In the single-reference condition, one product was displayed in each trial; in the conjoined-reference condition, a pair of products was displayed simultaneously. Our results showed that in the single-reference condition, low-preference products elicited a more negative N200 than high-preference products, replicating previous results, but the N200 could not distinguish between low- and high-preference products when participants viewed two options of similar subjective value in the conjoined-reference condition. These findings suggest that reference points modulate how the N200 represents consumer preference. When viewing only one product, participants make a value judgment based on their expectations. However, when viewing two products simultaneously, both their expectations and the alternative product can serve as reference points, and whether the N200 can represent consumer preference depends on which reference point is dominant. In future research, reference points must be controlled when the N200 is used to explore value-related decision-making.
Affiliation(s)
- Guangrong Wang
- Neural Decision Science Laboratory, School of Economics and Management, Weifang University, Weifang, China.,Institute for Study of Brain-Like Economics, School of Economics, Shandong University, Jinan, China
| | - Jianbiao Li
- Institute for Study of Brain-Like Economics, School of Economics, Shandong University, Jinan, China.,Department of Economics and Management, Nankai University Binhai College, Tianjin, China
| | - Chengkang Zhu
- Institute for Study of Brain-Like Economics, School of Economics, Shandong University, Jinan, China
| | - Shenru Wang
- School of Mechanical Engineering and Automation, Beihang University, Beijing, China
| | - Shenzhou Jiang
- School of Business Administration, Guangxi University of Finance and Economics, Nanning, China
| |
|
34
|
Oleson EB, Hamilton LR, Gomez DM. Cannabinoid Modulation of Dopamine Release During Motivation, Periodic Reinforcement, Exploratory Behavior, Habit Formation, and Attention. Front Synaptic Neurosci 2021; 13:660218. [PMID: 34177546 PMCID: PMC8222827 DOI: 10.3389/fnsyn.2021.660218] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Accepted: 05/05/2021] [Indexed: 12/12/2022] Open
Abstract
Motivational and attentional processes energize action sequences to facilitate evolutionary competition and promote behavioral fitness. Decades of neuropharmacology, electrophysiology, and electrochemistry research indicate that the mesocorticolimbic DA pathway modulates both motivation and attention. More recently, it was realized that mesocorticolimbic DA function is tightly regulated by the brain's endocannabinoid system and greatly influenced by exogenous cannabinoids, which have been harnessed by humanity for medicinal, ritualistic, and recreational uses for 12,000 years. Exogenous cannabinoids, like the primary psychoactive component of cannabis, delta-9-tetrahydrocannabinol, produce their effects by acting at binding sites for naturally occurring endocannabinoids. The brain's endocannabinoid system consists of two G-protein-coupled receptors, endogenous lipid ligands for these receptor targets, and several synthetic and metabolic enzymes involved in their production and degradation. Emerging evidence indicates that the endocannabinoid 2-arachidonoylglycerol is necessary to observe concurrent increases in DA release and motivated behavior, and the historical pharmacology literature indicates a role for cannabinoid signaling in both motivational and attentional processes. While both types of behaviors have been scrutinized under manipulation by either DA or cannabinoid agents, there is considerably less insight into prospective interactions between these two important signaling systems. This review summarizes the relevance of cannabinoid modulation of DA release during operant tasks designed to investigate either motivational or attentional control of behavior. We first describe how cannabinoids influence DA release and goal-directed action under a variety of reinforcement contingencies. Then we consider the role that endocannabinoids might play in switching an animal's motivation from a goal-directed action to the search for an alternative outcome, as well as in the formation of long-term habits. Finally, dissociable features of attentional behavior, measured with the 5-choice serial reaction time task and the attentional set-shifting task, are discussed along with their distinct modulation by DA and cannabinoids. We end by discussing potential targets for further research on DA-cannabinoid interactions within key substrates involved in motivation and attention.
Affiliation(s)
- Erik B. Oleson
- Department of Psychology, University of Colorado Denver, Denver, CO, United States
| | - Lindsey R. Hamilton
- Department of Psychology, University of Colorado Denver, Denver, CO, United States
| | - Devan M. Gomez
- Department of Biomedical Sciences, Marquette University, Milwaukee, WI, United States
| |
|
35
|
Ghazizadeh A, Hikosaka O. Common coding of expected value and value uncertainty memories in the prefrontal cortex and basal ganglia output. SCIENCE ADVANCES 2021; 7:eabe0693. [PMID: 33980480 PMCID: PMC8115923 DOI: 10.1126/sciadv.abe0693] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/08/2020] [Accepted: 03/23/2021] [Indexed: 05/12/2023]
Abstract
Recent evidence implicates both the basal ganglia and the ventrolateral prefrontal cortex (vlPFC) in encoding value memories. However, the comparative roles of cortical and basal ganglia nodes in value memory are not well understood. Here, single-unit recordings in vlPFC and substantia nigra pars reticulata (SNr) in macaque monkeys revealed a larger value signal in SNr that was nevertheless correlated with, and had a comparable onset to, the vlPFC value signal. The value signal was maintained for many objects (>90) many weeks after reward learning and was resistant to extinction in both regions and to repetition suppression in vlPFC. Both regions showed comparable granularity in encoding expected value and value uncertainty, which was paralleled by enhanced gaze bias during free viewing. The value signal dynamics in SNr could be predicted by combining the responses of vlPFC neurons according to their value preferences, consistent with a scheme in which cortical signals reach SNr via direct and indirect pathways.
Affiliation(s)
- Ali Ghazizadeh
- Bio-intelligence Research Unit, Electrical Engineering Department, Sharif University of Technology, Tehran 11365-11155, Iran.
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran 19395-5746, Iran
| | - Okihide Hikosaka
- Laboratory of Sensorimotor Research, National Eye Institute, NIH, Bethesda, MD 20892, USA
- National Institute on Drug Abuse, NIH, Baltimore, MD 21224, USA
| |
|
36
|
Morville T, Madsen KH, Siebner HR, Hulme OJ. Reward signalling in brainstem nuclei under fluctuating blood glucose. PLoS One 2021; 16:e0243899. [PMID: 33826633 PMCID: PMC8026025 DOI: 10.1371/journal.pone.0243899] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Accepted: 12/01/2020] [Indexed: 11/18/2022] Open
Abstract
Phasic dopamine release from midbrain dopaminergic neurons is thought to signal reward prediction errors (RPEs). If reward maximisation is to maintain homeostasis, then the value of primary rewards should be coupled to the homeostatic errors they remediate. This leads to the prediction that RPE signals should be configured as a function of homeostatic state and thus diminish as homeostatic error is attenuated. To test this hypothesis, we collected a large volume of functional MRI data from five human volunteers on four separate days. After fasting for 12 hours, subjects consumed preloads that differed in glucose concentration. Participants then underwent a Pavlovian cue-conditioning paradigm in which the colour of a fixation cross was stochastically associated with the delivery of water or glucose via a gustometer. This design afforded computation of RPEs separately for better- and worse-than-expected outcomes during ascending and descending trajectories of serum glucose fluctuations. In the parabrachial nuclei, regional activity coding positive RPEs scaled positively with serum glucose for both ascending and descending glucose levels. The ventral tegmental area and substantia nigra became more sensitive to negative RPEs when glucose levels were ascending. Together, the results suggest that RPE signals in key brainstem structures are modulated by homeostatic trajectories of naturally occurring glycaemic flux, revealing a tight interplay between homeostatic state and the neural encoding of primary reward in the human brain.
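The paper's opening prediction, that a primary reward's value should be coupled to the homeostatic error it remediates, can be stated in a few lines. The glucose set point and numbers below are arbitrary illustrative values, not quantities from the study.

```python
def homeostatic_error(glucose, set_point=90.0):
    """Deficit relative to the set point (zero when at or above it)."""
    return max(set_point - glucose, 0.0)

def reward_value(glucose_delivered, glucose_state, set_point=90.0):
    """Value = how much the reward reduces the homeostatic error."""
    before = homeostatic_error(glucose_state, set_point)
    after = homeostatic_error(glucose_state + glucose_delivered, set_point)
    return before - after

def rpe(value, expected=0.0):
    """Reward prediction error for an unexpected delivery."""
    return value - expected

# The same 10-unit glucose reward is worth more when fasted (glucose 70)
# than near the set point (glucose 88), so its RPE is larger too.
v_fasted = reward_value(10.0, 70.0)
v_sated = reward_value(10.0, 88.0)
```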
Affiliation(s)
- Tobias Morville
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
| | - Kristoffer H. Madsen
- DTU Compute, Department of Informatics and Mathematical Modelling, Technical University of Denmark, Copenhagen, Denmark
| | - Hartwig R. Siebner
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
- Department of Neurology, Copenhagen University Hospital Bispebjerg, Copenhagen, Denmark
| | - Oliver J. Hulme
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
| |
|
37
|
Rothenhoefer KM, Hong T, Alikaya A, Stauffer WR. Rare rewards amplify dopamine responses. Nat Neurosci 2021; 24:465-469. [PMID: 33686298 PMCID: PMC9373731 DOI: 10.1038/s41593-021-00807-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2019] [Accepted: 01/20/2021] [Indexed: 01/02/2023]
Abstract
Dopamine prediction error responses are essential components of universal learning mechanisms. However, it is unknown whether individual dopamine neurons reflect the shape of reward distributions. Here, we used symmetrical distributions with differently weighted tails to investigate how the frequency of rewards and reward prediction errors influence dopamine signals. Rare rewards amplified dopamine responses, even when conventional prediction errors were identical, indicating a mechanism for learning the complexities of real-world incentives.
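A minimal sketch of the contrast the study exploits: two outcomes can carry identical conventional prediction errors while differing in frequency. The inverse-probability gain below is one illustrative way a response could scale with rarity, not the mechanism established by the paper.

```python
def conventional_rpe(reward, expected_value):
    """Standard prediction error: outcome minus expectation."""
    return reward - expected_value

def rarity_amplified_rpe(reward, expected_value, outcome_probability):
    # Scale the error by how infrequent (surprising) the outcome is.
    return (reward - expected_value) / outcome_probability

ev = 0.5
# The same reward yields the same conventional RPE in both distributions...
delta = conventional_rpe(1.0, ev)
# ...but the amplified response is larger when that outcome sits in a
# lightly weighted tail (p = 0.1) than when it is frequent (p = 0.4).
amp_rare = rarity_amplified_rpe(1.0, ev, 0.1)
amp_frequent = rarity_amplified_rpe(1.0, ev, 0.4)
```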
Affiliation(s)
- Kathryn M Rothenhoefer
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
| | - Tao Hong
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Program in Neural Computation, Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Aydin Alikaya
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA
| | - William R Stauffer
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA.
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA.
- Systems Neuroscience Center, University of Pittsburgh, Pittsburgh, PA, USA.
- The Brain Institute, University of Pittsburgh, Pittsburgh, PA, USA.
| |
|
38
|
Ferrari-Toniolo S, Bujold PM, Grabenhorst F, Báez-Mendoza R, Schultz W. Nonhuman Primates Satisfy Utility Maximization in Compliance with the Continuity Axiom of Expected Utility Theory. J Neurosci 2021; 41:2964-2979. [PMID: 33542082 PMCID: PMC8018892 DOI: 10.1523/jneurosci.0955-20.2020] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Revised: 11/13/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022] Open
Abstract
Expected Utility Theory (EUT), the first axiomatic theory of risky choice, describes choices as a utility-maximization process: decision makers assign a subjective value (utility) to each choice option and choose the one with the highest utility. The continuity axiom, central to EUT and its modifications, is a necessary and sufficient condition for the definition of numerical utilities. The axiom requires decision makers to be indifferent between a gamble and a specific probabilistic combination of a more preferred and a less preferred gamble. While previous studies demonstrated that monkeys choose according to combinations of objective reward magnitude and probability, a concept-driven experimental approach for assessing the axiomatically defined conditions for utility maximization by animals has been missing. We experimentally tested the continuity axiom for a broad class of gamble types in four male rhesus macaque monkeys, showing that their choice behavior complied with the existence of a numerical utility measure as defined by economic theory. We used the numerical quantity specified in the continuity axiom to characterize subjective preferences in a magnitude-probability space. This mapping highlighted a trade-off between reward magnitudes and probabilities, compatible with the existence of a utility function underlying subjective value computation. These results support the existence of a numerical utility function able to describe choices, allowing for investigation of the neuronal substrates responsible for coding such a rigorously defined quantity. SIGNIFICANCE STATEMENT: A common assumption of several economic choice theories is that decisions result from the comparison of subjectively assigned values (utilities). This study demonstrated the compliance of monkey behavior with the continuity axiom of Expected Utility Theory, implying a subjective magnitude-probability trade-off that supports the existence of numerical utility directly linked to the theoretical economic framework. We determined a numerical utility measure able to describe choices, which can serve as a correlate for neuronal activity in the quest for the brain structures and mechanisms that guide decisions.
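The indifference construction at the heart of the continuity axiom can be written out directly. The power-law utility and reward magnitudes below are illustrative assumptions; the point is only that, for gambles A > B > C, the indifference probability itself serves as the middle gamble's utility once u(A) and u(C) are anchored at 1 and 0.

```python
def utility(ml, rho=0.6):
    """Hypothetical concave utility over reward magnitude."""
    return ml ** rho

def indifference_p(a_ml, b_ml, c_ml):
    """p such that the agent is indifferent between B and p*A + (1-p)*C."""
    # Expected utility of the mixture: p*u(A) + (1-p)*u(C) = u(B)
    return (utility(b_ml) - utility(c_ml)) / (utility(a_ml) - utility(c_ml))

# On a scale where u(C)=0 and u(A)=1, u(B) equals the indifference point.
p = indifference_p(a_ml=1.0, b_ml=0.5, c_ml=0.1)
```

With a concave utility, p exceeds the value a risk-neutral chooser would give, which is how the indifference point reveals the curvature of the underlying utility function.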
Affiliation(s)
- Simone Ferrari-Toniolo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
| | - Philipe M Bujold
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
| | - Fabian Grabenhorst
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
| | - Raymundo Báez-Mendoza
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114
| | - Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, United Kingdom
| |
|
39
|
Fung BJ, Sutlief E, Hussain Shuler MG. Dopamine and the interdependency of time perception and reward. Neurosci Biobehav Rev 2021; 125:380-391. [PMID: 33652021 DOI: 10.1016/j.neubiorev.2021.02.030] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Revised: 02/16/2021] [Accepted: 02/19/2021] [Indexed: 01/14/2023]
Abstract
Time is a fundamental dimension of our perception of the world and is therefore of critical importance to the organization of human behavior. A corpus of work, including recent optogenetic evidence, implicates striatal dopamine as a crucial factor influencing the perception of time. Another stream of literature implicates dopamine in reward and motivation processes. However, these two domains of research have remained largely separate, despite their neurobiological overlap and the apothegmatic notion that "time flies when you're having fun". This article reviews the literature linking time perception and reward, including neurobiological and behavioral studies. Together, these provide compelling support for the idea that time perception and reward processing interact via a common dopaminergic mechanism.
Affiliation(s)
- Bowen J Fung
- The Behavioural Insights Team, Suite 3, Level 13/9 Hunter St, Sydney NSW 2000, Australia.
| | - Elissa Sutlief
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Woods Basic Science Building Rm914, 725 N. Wolfe Street, Baltimore, MD 21205, USA
| | - Marshall G Hussain Shuler
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Woods Basic Science Building Rm914, 725 N. Wolfe Street, Baltimore, MD 21205, USA; Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, 725 N Wolfe Street, Baltimore, MD 21205, USA.
| |
|
40
|
Hesp C, Smith R, Parr T, Allen M, Friston KJ, Ramstead MJD. Deeply Felt Affect: The Emergence of Valence in Deep Active Inference. Neural Comput 2021; 33:398-446. [PMID: 33253028 PMCID: PMC8594962 DOI: 10.1162/neco_a_01341] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Accepted: 08/17/2020] [Indexed: 01/20/2023]
Abstract
The positive-negative axis of emotional valence has long been recognized as fundamental to adaptive behavior, but its origin and underlying function have largely eluded formal theorizing and computational modeling. Using deep active inference, a hierarchical inference scheme that rests on inverting a model of how sensory data are generated, we develop a principled Bayesian model of emotional valence. This formulation asserts that agents infer their valence state based on the expected precision of their action model, an internal estimate of overall model fitness ("subjective fitness"). This index of subjective fitness can be estimated within any environment and exploits the domain generality of second-order beliefs (beliefs about beliefs). We show how maintaining internal valence representations allows the ensuing affective agent to optimize confidence in action selection preemptively. Valence representations can in turn be optimized by leveraging the (Bayes-optimal) updating term for subjective fitness, which we label affective charge (AC). AC tracks changes in fitness estimates and lends a sign to otherwise unsigned divergences between predictions and outcomes. We simulate the resulting affective inference by subjecting an in silico affective agent to a T-maze paradigm requiring context learning, followed by context reversal. This formulation of affective inference offers a principled account of the link between affect, (mental) action, and implicit metacognition. It characterizes how a deep biological system can infer its affective state and reduce uncertainty about such inferences through internal action (i.e., top-down modulation of priors that underwrite confidence). Thus, we demonstrate the potential of active inference to provide a formal and computationally tractable account of affect. Our demonstration of the face validity and potential utility of this formulation represents the first step within a larger research program. Next, this model can be leveraged to test the hypothesized role of valence by fitting the model to behavioral and neuronal responses.
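The affective-charge update described in the abstract can be sketched in a few lines: AC is the shift in policy beliefs (prior to posterior), projected onto negative expected free energy. The sketch below is a hypothetical toy, not the authors' implementation; the softmax policy prior, the evidence vector, and the scalar precision `gamma` are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def affective_charge(G, evidence, gamma):
    """Toy sketch of affective charge (AC): the change in policy beliefs,
    projected onto negative expected free energy (-G). AC is positive when
    outcomes favour policies the agent already expected to be good
    (confidence rises), and negative otherwise."""
    prior = softmax(-gamma * G)                 # policy beliefs before outcomes
    posterior = softmax(-gamma * G + evidence)  # policy beliefs after outcomes
    return (posterior - prior) @ (-G)

G = np.array([1.0, 3.0])  # expected free energy per policy (lower is better)
# Evidence confirming the preferred policy yields AC > 0 ...
assert affective_charge(G, np.array([2.0, 0.0]), gamma=1.0) > 0
# ... and disconfirming evidence yields AC < 0.
assert affective_charge(G, np.array([0.0, 2.0]), gamma=1.0) < 0
```

Lending a sign to otherwise unsigned prediction-outcome divergences in this way is what lets AC serve as a candidate valence signal.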
Collapse
Affiliation(s)
- Casper Hesp
- Department of Psychology and Amsterdam Brain and Cognition Centre, University of Amsterdam, 1098 XH Amsterdam, Netherlands; Institute for Advanced Study, University of Amsterdam, 1012 GC Amsterdam, Netherlands; and Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, U.K.
| | - Ryan Smith
- Laureate Institute for Brain Research, Tulsa, OK 74136, U.S.A.
| | - Thomas Parr
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, U.K.
| | - Micah Allen
- Aarhus Institute of Advanced Studies, Aarhus University, Aarhus 8000, Denmark; Centre of Functionally Integrative Neuroscience, Aarhus University Hospital, Aarhus 8200, Denmark; and Cambridge Psychiatry, Cambridge University, Cambridge CB2 8AH, U.K.
| | - Karl J Friston
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, U.K.
| | - Maxwell J D Ramstead
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, U.K.; Division of Social and Transcultural Psychiatry, Department of Psychiatry and Culture, Mind, and Brain Program, McGill University, Montreal H3A 0G4, QC, Canada
| |
Collapse
|
41
|
Verstynen T, Dunovan K, Walsh C, Kuan CH, Manuck SB, Gianaros PJ. Adiposity covaries with signatures of asymmetric feedback learning during adaptive decisions. Soc Cogn Affect Neurosci 2020; 15:1145-1156. [PMID: 32608485 PMCID: PMC7657458 DOI: 10.1093/scan/nsaa088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2019] [Revised: 06/03/2020] [Accepted: 06/15/2020] [Indexed: 12/19/2022] Open
Abstract
Unhealthy weight gain relates, in part, to how people make decisions based on prior experience. Here we conducted post hoc analysis on an archival data set to evaluate whether individual differences in adiposity, an anthropometric construct encompassing a spectrum of body types, from lean to obese, associate with signatures of asymmetric feedback learning during value-based decision-making. In a sample of neurologically healthy adults (N = 433), ventral striatal responses to rewards, measured using fMRI, were not directly associated with adiposity, but rather moderated its relationship with feedback-driven learning in the Iowa gambling task, tested outside the scanner. Using a biologically inspired model of basal ganglia-dependent decision processes, we found this moderating effect of reward reactivity to be explained by an asymmetrical use of feedback to drive learning; that is, with more plasticity for gains than for losses, stronger reward reactivity leads to decisions that minimize exploration for maximizing long-term outcomes. Follow-up analysis confirmed that individual differences in adiposity correlated with signatures of asymmetric use of feedback cues during learning, suggesting that reward reactivity may especially relate to adiposity, and possibly obesity risk, when gains impact future decisions more than losses.
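The "asymmetric use of feedback" at the core of this result can be illustrated with a minimal delta-rule learner that applies separate learning rates to gains and losses. This is a generic sketch under assumed parameters, not the biologically inspired basal-ganglia model the authors fit.

```python
import random

def asymmetric_update(q, outcome, alpha_gain, alpha_loss):
    """One prediction-error update with separate learning rates for
    positive and negative errors (toy sketch of asymmetric plasticity)."""
    pe = outcome - q
    alpha = alpha_gain if pe > 0 else alpha_loss
    return q + alpha * pe

# With more plasticity for gains than losses, the learned value of a
# zero-mean gamble settles above its true mean: an optimistic estimate
# that discourages further exploration.
random.seed(0)
q, history = 0.0, []
for _ in range(2000):
    outcome = random.choice([+1.0, -1.0])  # 50/50 win/loss, true mean 0
    q = asymmetric_update(q, outcome, alpha_gain=0.2, alpha_loss=0.05)
    history.append(q)
avg = sum(history[1000:]) / 1000
assert avg > 0.3  # long-run estimate sits well above the true mean of 0
```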
Collapse
Affiliation(s)
- Timothy Verstynen
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Carnegie Mellon Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kyle Dunovan
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Catherine Walsh
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA; Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Chieh-Hsin Kuan
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA; Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Stephen B Manuck
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Peter J Gianaros
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA; Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15260, USA
| |
Collapse
|
42
|
Mendoza JA, Lafferty CK, Yang AK, Britt JP. Cue-Evoked Dopamine Neuron Activity Helps Maintain but Does Not Encode Expected Value. Cell Rep 2020; 29:1429-1437.e3. [PMID: 31693885 DOI: 10.1016/j.celrep.2019.09.077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2019] [Revised: 08/21/2019] [Accepted: 09/26/2019] [Indexed: 11/16/2022] Open
Abstract
Cue-evoked midbrain dopamine (DA) neuron activity reflects expected value, but its influence on reward assessment is unclear. In mice performing a trial-based operant task, we test if bidirectional manipulations of cue or operant-associated DA neuron activity drive learning as a result of under- or overexpectation of reward value. We target optogenetic manipulations to different components of forced trials, when only one lever is presented, and assess lever biases on choice trials in the absence of photomanipulation. Although lever biases are demonstrated to be flexible and sensitive to changes in expected value, augmentation of cue or operant-associated DA signaling does not significantly alter choice behavior, and blunting DA signaling during any component of the forced trials reduces choice trial responses on the associated lever. These data suggest cue-evoked DA helps maintain cue-value associations but does not encode expected value so as to set the benchmark against which received reward is judged.
Collapse
Affiliation(s)
- Jesse A Mendoza
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
| | - Christopher K Lafferty
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
| | - Angela K Yang
- Integrated Program in Neuroscience, McGill University, Montreal, QC H3A 2B4, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada
| | - Jonathan P Britt
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Integrated Program in Neuroscience, McGill University, Montreal, QC H3A 2B4, Canada; Center for Studies in Behavioral Neurobiology, Concordia University, Montreal, QC H4B 1R6, Canada.
| |
Collapse
|
43
|
Neuser MP, Kühnel A, Svaldi J, Kroemer NB. Beyond the average: The role of variable reward sensitivity in eating disorders. Physiol Behav 2020; 223:112971. [DOI: 10.1016/j.physbeh.2020.112971] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2019] [Revised: 04/30/2020] [Accepted: 05/13/2020] [Indexed: 01/13/2023]
|
44
|
Emberly E, Seamans JK. Abrupt, Asynchronous Changes in Action Representations by Anterior Cingulate Cortex Neurons during Trial and Error Learning. Cereb Cortex 2020; 30:4336-4345. [PMID: 32239139 DOI: 10.1093/cercor/bhaa019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2019] [Revised: 01/09/2020] [Accepted: 01/12/2020] [Indexed: 11/13/2022] Open
Abstract
The ability to act on knowledge about the value of stimuli or actions factors into simple foraging behaviors as well as complex forms of decision-making. In striatal regions, action representations are thought to acquire value through a gradual (reinforcement-learning based) process. It is unclear whether this is also true for anterior cingulate cortex (ACC) where neuronal representations tend to change abruptly. We recorded from ensembles of ACC neurons as rats deduced which of 3 levers was rewarded each day. The rat's lever preferences changed gradually throughout the sessions as they eventually came to focus on the rewarded lever. Most individual neurons changed their responses to both rewarded and nonrewarded lever presses abruptly (<2 trials). These transitions occurred asynchronously across the population but peaked near the point where the rats began to focus on the rewarded lever. Because the individual transitions were asynchronous, the overall change at the population level appeared gradual. Abrupt transitions in action representations of ACC neurons may be part of a mechanism that alters choice strategies as new information is acquired.
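The key statistical point, that abrupt but asynchronous single-neuron transitions sum to a gradual population change, is easy to reproduce in a toy simulation (the neuron count and switch-trial range below are illustrative assumptions, not the recorded data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 200, 100

# Each simulated neuron switches its response abruptly (a step function),
# but the switch trial differs across neurons (asynchronous transitions).
switch_trials = rng.integers(20, 80, size=n_neurons)
trials = np.arange(n_trials)
responses = (trials[None, :] >= switch_trials[:, None]).astype(float)

# Every individual neuron changes within a single trial ...
assert np.all(np.abs(np.diff(responses, axis=1)).sum(axis=1) == 1.0)

# ... yet the population average ramps up gradually across ~60 trials.
pop = responses.mean(axis=0)
assert pop[20] < 0.1 and 0.3 < pop[50] < 0.7 and pop[80] > 0.9
```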
Collapse
Affiliation(s)
- Eldon Emberly
- Department of Physics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Jeremy K Seamans
- Department of Psychiatry, Centre for Brain Health, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| |
Collapse
|
45
|
Yoon T, Jaleel A, Ahmed AA, Shadmehr R. Saccade vigor and the subjective economic value of visual stimuli. J Neurophysiol 2020; 123:2161-2172. [PMID: 32374201 DOI: 10.1152/jn.00700.2019] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Decisions are made based on the subjective value that the brain assigns to options. However, subjective value is a mathematical construct that cannot be measured directly, but rather is inferred from choices. Recent results have demonstrated that reaction time, amplitude, and velocity of movements are modulated by reward, raising the possibility that there is a link between how the brain evaluates an option and how it controls movements toward that option. Here, we asked people to choose among risky options represented by abstract stimuli, some associated with gain (points in a game), and others with loss. From their choices we estimated the subjective value that they assigned to each stimulus. In probe trials, a single stimulus appeared at center, instructing subjects to make a saccade to a peripheral target. We found that the reaction time, peak velocity, and amplitude of the peripherally directed saccade varied roughly linearly with the subjective value that the participant had assigned to the central stimulus: reaction time was shorter, velocity was higher, and amplitude was larger for stimuli that the participant valued more. Naturally, participants differed in how much they valued a given stimulus. Remarkably, those who valued a stimulus more, as evidenced by their choices in decision trials, tended to move with shorter reaction time and greater velocity in response to that stimulus in probe trials. Overall, the reaction time of the saccade in response to a stimulus partly predicted the subjective value that the brain assigned to that stimulus. NEW & NOTEWORTHY Behavioral economics relies on subjective evaluation, an abstract quantity that cannot be measured directly but must be inferred by fitting decision models to the choice patterns. Here, we present a new approach to estimate subjective value: with nothing to fit, we show that it is possible to estimate subjective value based on movement kinematics, providing a modest ability to predict a participant's preferences without prior measurement of their choice patterns.
Collapse
Affiliation(s)
- Tehrim Yoon
- Laboratory for Computational Motor Control, Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Afareen Jaleel
- Laboratory for Computational Motor Control, Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Alaa A Ahmed
- Departments of Integrative Physiology and Mechanical Engineering, University of Colorado, Boulder, Colorado
| | - Reza Shadmehr
- Laboratory for Computational Motor Control, Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland
| |
Collapse
|
46
|
van Swieten MMH, Bogacz R. Modeling the effects of motivation on choice and learning in the basal ganglia. PLoS Comput Biol 2020; 16:e1007465. [PMID: 32453725 PMCID: PMC7274475 DOI: 10.1371/journal.pcbi.1007465] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2019] [Revised: 06/05/2020] [Accepted: 04/03/2020] [Indexed: 01/08/2023] Open
Abstract
Decision making relies on adequately evaluating the consequences of actions on the basis of past experience and the current physiological state. A key role in this process is played by the basal ganglia, where neural activity and plasticity are modulated by dopaminergic input from the midbrain. Internal physiological factors, such as hunger, scale signals encoded by dopaminergic neurons and thus they alter the motivation for taking actions and learning. However, to our knowledge, no formal mathematical formulation exists for how a physiological state affects learning and action selection in the basal ganglia. We developed a framework for modelling the effect of motivation on choice and learning. The framework defines the motivation to obtain a particular resource as the difference between the desired and the current level of this resource, and proposes how the utility of reinforcements depends on the motivation. To account for dopaminergic activity previously recorded in different physiological states, the paper argues that the prediction error encoded in the dopaminergic activity needs to be redefined as the difference between utility and expected utility, which depends on both the objective reinforcement and the motivation. We also demonstrate a possible mechanism by which the evaluation and learning of utility of actions can be implemented in the basal ganglia network. The presented theory brings together models of learning in the basal ganglia with the incentive salience theory in a single simple framework, and it provides a mechanistic insight into how decision processes and learning in the basal ganglia are modulated by the motivation. Moreover, this theory is also consistent with data on neural underpinnings of overeating and obesity, and makes further experimental predictions.
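The framework's central definitions translate directly into a few lines of code: motivation as the gap between desired and current resource level, utility as motivation-scaled reinforcement, and prediction error as utility minus expected utility. The linear utility mapping below is an illustrative assumption; the paper develops the full mapping and its basal ganglia implementation.

```python
def motivation(desired, current):
    """Motivation for a resource: desired level minus current level."""
    return desired - current

def utility(reinforcement, m):
    """Illustrative assumption: subjective utility scales linearly with
    motivation, so the same objective reward is worth more when deprived."""
    return m * reinforcement

def prediction_error(reinforcement, expected_utility, m):
    """Redefined dopaminergic prediction error: utility minus expected
    utility, which depends on both the reinforcement and the motivation."""
    return utility(reinforcement, m) - expected_utility

# The same 1 unit of food yields a larger teaching signal when the agent
# is far from satiety (motivation 3) than when nearly sated (motivation 1).
hungry = prediction_error(1.0, expected_utility=0.5, m=motivation(4, 1))
sated = prediction_error(1.0, expected_utility=0.5, m=motivation(4, 3))
assert hungry > sated
```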
Collapse
Affiliation(s)
| | - Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
47
|
Rationalization of emotion is also rational. Behav Brain Sci 2020; 43:e43. [PMID: 32292159 DOI: 10.1017/s0140525x19002292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Cushman seeks to explain rationalization in terms of fundamental mental processes, and he hypothesizes a selected-for function: information exchange between "rational" and "non-rational" processes in the brain. While this is plausible, his account overlooks the importance - and information value - of rationalizing the emotions of ourselves and others. Incorporating such rationalization would help explain the effectiveness of rationalization and its connection with valuation, as well as raise a challenge to his way of bifurcating "rational" and "non-rational" processes.
Collapse
|
48
|
Bayer J, Rusch T, Zhang L, Gläscher J, Sommer T. Dose-dependent effects of estrogen on prediction error related neural activity in the nucleus accumbens of healthy young women. Psychopharmacology (Berl) 2020; 237:745-755. [PMID: 31773208 DOI: 10.1007/s00213-019-05409-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/26/2019] [Accepted: 11/18/2019] [Indexed: 12/24/2022]
Abstract
RATIONALE Whereas the effect of the sex steroid 17-beta-estradiol (E2) on dopaminergic (DA) transmission in the nucleus accumbens (NAc) is well evidenced in female rats, studies in humans are inconsistent. Moreover, linear and inverted U-shaped dose-response curves have been observed for E2's effects on hippocampal plasticity, but the shape of dose-response curves for E2's effects on the NAc is much less characterized. OBJECTIVES Investigation of dose-response curves for E2's effects on DA-related neural activity in the human NAc. METHODS Placebo or E2 valerate in doses of 2, 4, 6 or 12 mg was orally administered to 125 naturally cycling young women during the low-hormone menstruation phase on two consecutive days using a randomized, double-blinded design. The E2 treatment regimen induced a wide range of E2 levels, from physiological (2- and 4-mg groups; equivalent to cycle peak) to supraphysiological levels (6- and 12-mg groups; equivalent to early pregnancy). This made it possible to study different dose-response functions for E2's effects on NAc activity. During the E2 peak, participants performed a well-established reversal learning paradigm. We used trial-wise prediction errors (PEs) estimated via a computational reinforcement learning model as a proxy for dopaminergic activity. Linear and quadratic regression analyses predicting PE-related NAc activity from salivary E2 levels were calculated. RESULTS There was a positive linear relationship between PE-associated NAc activity and salivary E2 increases. CONCLUSIONS The randomized, placebo-controlled elevation of E2 levels stimulates NAc activity in the human brain, likely mediated by dopaminergic processes.
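The dose-response model comparison reduces to fitting linear and quadratic regressions of PE-related activity on E2 level. The sketch below uses simulated numbers (hypothetical, not study data) with a linear ground truth, which is recovered as a reliable slope and a near-zero quadratic term:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated salivary E2 levels and PE-related NAc responses: a purely
# linear dose-response relationship (slope 0.4) plus Gaussian noise.
e2 = rng.uniform(0, 10, size=200)
nac = 0.4 * e2 + rng.normal(0, 0.5, size=200)

# Fit the two candidate dose-response curves the study compared.
lin = np.polyfit(e2, nac, 1)   # [slope, intercept]
quad = np.polyfit(e2, nac, 2)  # [quadratic, linear, intercept]

# Under a linear ground truth, the linear slope is recovered and the
# fitted quadratic coefficient is near zero (no inverted-U shape).
assert abs(lin[0] - 0.4) < 0.05
assert abs(quad[0]) < 0.05
```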
Collapse
Affiliation(s)
- Janine Bayer
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany.
| | - Tessa Rusch
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany
| | - Lei Zhang
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany.,Department of Basic Psychological Research and Research Methods, University of Vienna, Liebiggasse 5, 1010, Vienna, Austria
| | - Jan Gläscher
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany
| | - Tobias Sommer
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246, Hamburg, Germany
| |
Collapse
|
49
|
A distributional code for value in dopamine-based reinforcement learning. Nature 2020; 577:671-675. [PMID: 31942076 DOI: 10.1038/s41586-019-1924-6] [Citation(s) in RCA: 170] [Impact Index Per Article: 42.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2019] [Accepted: 11/19/2019] [Indexed: 12/12/2022]
Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
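The core computational idea, that units scaling positive and negative prediction errors asymmetrically come to tile the reward distribution, can be sketched as a simple expectile learner (toy parameters, not the paper's analysis code):

```python
def learn_expectile(rewards, tau, alpha=0.05, n_sweeps=200):
    """Asymmetric prediction-error updates, as in distributional RL: a unit
    that scales positive errors by tau and negative errors by (1 - tau)
    converges to the tau-expectile of the reward distribution."""
    v = 0.0
    for _ in range(n_sweeps):
        for r in rewards:
            pe = r - v
            v += alpha * (tau if pe > 0 else 1 - tau) * pe
    return v

rewards = [0.0, 1.0]  # 50/50 gamble
# A balanced unit (tau = 0.5) learns the mean ...
assert abs(learn_expectile(rewards, tau=0.5) - 0.5) < 0.05
# ... an "optimistic" unit (tau = 0.9) settles above it,
# and a "pessimistic" unit (tau = 0.1) settles below it.
assert learn_expectile(rewards, tau=0.9) > 0.6
assert learn_expectile(rewards, tau=0.1) < 0.4
```

A population of such units with diverse tau values jointly encodes the full distribution of future reward rather than only its mean.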
Collapse
|
50
|
Do domain-general executive resources play a role in linguistic prediction? Re-evaluation of the evidence and a path forward. Neuropsychologia 2020; 136:107258. [DOI: 10.1016/j.neuropsychologia.2019.107258] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2019] [Revised: 11/07/2019] [Accepted: 11/07/2019] [Indexed: 12/13/2022]
|