Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Averbeck BB. Theory of choice in bandit, information sampling and foraging tasks. PLoS Comput Biol 2015;11:e1004164. [PMID: 25815510 PMCID: PMC4376795 DOI: 10.1371/journal.pcbi.1004164] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2014] [Accepted: 01/30/2015] [Indexed: 11/18/2022] Open

For:	Averbeck BB. Theory of choice in bandit, information sampling and foraging tasks. PLoS Comput Biol 2015;11:e1004164. [PMID: 25815510 PMCID: PMC4376795 DOI: 10.1371/journal.pcbi.1004164] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2014] [Accepted: 01/30/2015] [Indexed: 11/18/2022] Open

Number

Cited by Other Article(s)

Ramírez-Ruiz J, Grytskyy D, Mastrogiuseppe C, Habib Y, Moreno-Bote R. Complex behavior from intrinsic motivation to occupy future action-state path space. Nat Commun 2024;15:6368. [PMID: 39075046 PMCID: PMC11286966 DOI: 10.1038/s41467-024-49711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Accepted: 06/13/2024] [Indexed: 07/31/2024] Open

Kobayashi K, Kable JW. Neural mechanisms of information seeking. Neuron 2024;112:1741-1756. [PMID: 38703774 DOI: 10.1016/j.neuron.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 01/30/2024] [Accepted: 04/08/2024] [Indexed: 05/06/2024]

Tang H, Bartolo-Orozco R, Averbeck BB. Ventral frontostriatal circuitry mediates the computation of reinforcement from symbolic gains and losses. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.03.587097. [PMID: 38617219 PMCID: PMC11014508 DOI: 10.1101/2024.04.03.587097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/16/2024]

Alejandro RJ, Holroyd CB. Hierarchical control over foraging behavior by anterior cingulate cortex. Neurosci Biobehav Rev 2024;160:105623. [PMID: 38490499 DOI: 10.1016/j.neubiorev.2024.105623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 02/14/2024] [Accepted: 03/13/2024] [Indexed: 03/17/2024]

Venditto SJC, Miller KJ, Brody CD, Daw ND. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.28.582617. [PMID: 38464244 PMCID: PMC10925334 DOI: 10.1101/2024.02.28.582617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/12/2024]

Giarrocco F, Costa VD, Basile BM, Pujara MS, Murray EA, Averbeck BB. Motor System-Dependent Effects of Amygdala and Ventral Striatum Lesions on Explore-Exploit Behaviors. J Neurosci 2024;44:e1206232023. [PMID: 38296647 PMCID: PMC10860650 DOI: 10.1523/jneurosci.1206-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 11/17/2023] [Accepted: 11/21/2023] [Indexed: 02/02/2024] Open

Rolls ET, Deco G, Huang CC, Feng J. The connectivity of the human frontal pole cortex, and a theory of its involvement in exploit versus explore. Cereb Cortex 2024;34:bhad416. [PMID: 37991264 DOI: 10.1093/cercor/bhad416] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 11/23/2023] Open

Wyatt LE, Hewan PA, Hogeveen J, Spreng RN, Turner GR. Exploration versus exploitation decisions in the human brain: A systematic review of functional neuroimaging and neuropsychological studies. Neuropsychologia 2024;192:108740. [PMID: 38036246 DOI: 10.1016/j.neuropsychologia.2023.108740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2023] [Revised: 10/15/2023] [Accepted: 11/21/2023] [Indexed: 12/02/2023]

Abstract

Thoughts and actions are often driven by a decision to either explore new avenues with unknown outcomes, or to exploit known options with predictable outcomes. Yet, the neural mechanisms underlying this exploration-exploitation trade-off in humans remain poorly understood. This is attributable to variability in the operationalization of exploration and exploitation as psychological constructs, as well as the heterogeneity of experimental protocols and paradigms used to study these choice behaviours. To address this gap, here we present a comprehensive review of the literature to investigate the neural basis of explore-exploit decision-making in humans. We first conducted a systematic review of functional magnetic resonance imaging (fMRI) studies of exploration-versus exploitation-based decision-making in healthy adult humans during foraging, reinforcement learning, and information search. Eleven fMRI studies met inclusion criterion for this review. Adopting a network neuroscience framework, synthesis of the findings across these studies revealed that exploration-based choice was associated with the engagement of attentional, control, and salience networks. In contrast, exploitation-based choice was associated with engagement of default network brain regions. We interpret these results in the context of a network architecture that supports the flexible switching between externally and internally directed cognitive processes, necessary for adaptive, goal-directed behaviour. To further investigate potential neural mechanisms underlying the exploration-exploitation trade-off we next surveyed studies involving neurodevelopmental, neuropsychological, and neuropsychiatric disorders, as well as lifespan development, and neurodegenerative diseases. We observed striking differences in patterns of explore-exploit decision-making across these populations, again suggesting that these two decision-making modes are supported by independent neural circuits. Taken together, our review highlights the need for precision-mapping of the neural circuitry and behavioural correlates associated with exploration and exploitation in humans. Characterizing exploration versus exploitation decision-making biases may offer a novel, trans-diagnostic approach to assessment, surveillance, and intervention for cognitive decline and dysfunction in normal development and clinical populations.

Collapse

Xu Y, Harms MB, Green CS, Wilson RC, Pollak SD. Childhood unpredictability and the development of exploration. Proc Natl Acad Sci U S A 2023;120:e2303869120. [PMID: 38011553 DOI: 10.1073/pnas.2303869120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2023] [Accepted: 09/12/2023] [Indexed: 11/29/2023] Open

Campbell EM, Singh G, Claus ED, Witkiewitz K, Costa VD, Hogeveen J, Cavanagh JF. Electrophysiological Markers of Aberrant Cue-Specific Exploration in Hazardous Drinkers. COMPUTATIONAL PSYCHIATRY (CAMBRIDGE, MASS.) 2023;7:47-59. [PMID: 38774639 PMCID: PMC11104413 DOI: 10.5334/cpsy.96] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/04/2023] [Accepted: 06/28/2023] [Indexed: 05/24/2024]

Abstract

Background

Hazardous drinking is associated with maladaptive alcohol-related decision-making. Existing studies have often focused on how participants learn to exploit familiar cues based on prior reinforcement, but little is known about the mechanisms that drive hazardous drinkers to explore novel alcohol cues when their value is not known.

Methods

We investigated exploration of novel alcohol and non-alcohol cues in hazardous drinkers (N = 27) and control participants (N = 26) during electroencephalography (EEG). A normative computational model with two free parameters was fit to estimate participants' weighting of the future value of exploration and immediate value of exploitation.

Results

Hazardous drinkers demonstrated increased exploration of novel alcohol cues, and conversely, increased probability of exploiting familiar alternatives instead of exploring novel non-alcohol cues. The motivation to explore novel alcohol stimuli in hazardous drinkers was driven by an elevated relative future valuation of uncertain alcohol cues. P3a predicted more exploratory decision policies driven by an enhanced relative future valuation of novel alcohol cues. P3b did not predict choice behavior, but computational parameter estimates suggested that hazardous drinkers with enhanced P3b to alcohol cues were likely to learn to exploit their immediate expected value.

Conclusions

Hazardous drinkers did not display atypical choice behavior, different P3a/P3b amplitudes, or computational estimates to novel non-alcohol cues-diverging from previous studies in addiction showing atypical generalized explore-exploit decisions with non-drug-related cues. These findings reveal that cue-specific neural computations may drive aberrant alcohol-related decision-making in hazardous drinkers-highlighting the importance of drug-relevant cues in studies of decision-making in addiction.

Collapse

Lee JK, Rouault M, Wyart V. Adaptive tuning of human learning and choice variability to unexpected uncertainty. SCIENCE ADVANCES 2023;9:eadd0501. [PMID: 36989365 PMCID: PMC10058239 DOI: 10.1126/sciadv.add0501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Accepted: 02/28/2023] [Indexed: 06/19/2023]

Khatib D, Morris G. Spontaneous behaviour is shaped by dopamine in two ways. Nature 2023;614:36-37. [PMID: 36653602 DOI: 10.1038/d41586-023-00004-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]

Burk DC, Averbeck BB. Environmental uncertainty and the advantage of impulsive choice strategies. PLoS Comput Biol 2023;19:e1010873. [PMID: 36716320 PMCID: PMC9910799 DOI: 10.1371/journal.pcbi.1010873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Revised: 02/09/2023] [Accepted: 01/15/2023] [Indexed: 02/01/2023] Open

Abstract

Choice impulsivity is characterized by the choice of immediate, smaller reward options over future, larger reward options, and is often thought to be associated with negative life outcomes. However, some environments make future rewards more uncertain, and in these environments impulsive choices can be beneficial. Here we examined the conditions under which impulsive vs. non-impulsive decision strategies would be advantageous. We used Markov Decision Processes (MDPs) to model three common decision-making tasks: Temporal Discounting, Information Sampling, and an Explore-Exploit task. We manipulated environmental variables to create circumstances where future outcomes were relatively uncertain. We then manipulated the discount factor of an MDP agent, which affects the value of immediate versus future rewards, to model impulsive and non-impulsive behavior. This allowed us to examine the performance of impulsive and non-impulsive agents in more or less predictable environments. In Temporal Discounting, we manipulated the transition probability to delayed rewards and found that the agent with the lower discount factor (i.e. the impulsive agent) collected more average reward than the agent with a higher discount factor (the non-impulsive agent) by selecting immediate reward options when the probability of receiving the future reward was low. In the Information Sampling task, we manipulated the amount of information obtained with each sample. When sampling led to small information gains, the impulsive MDP agent collected more average reward than the non-impulsive agent. Third, in the Explore-Exploit task, we manipulated the substitution rate for novel options. When the substitution rate was high, the impulsive agent again performed better than the non-impulsive agent, as it explored the novel options less and instead exploited options with known reward values. The results of these analyses show that impulsivity can be advantageous in environments that are unexpectedly uncertain.

Collapse

Schach S, Lindner A, Braun DA. Bounded rational decision-making models suggest capacity-limited concurrent motor planning in human posterior parietal and frontal cortex. PLoS Comput Biol 2022;18:e1010585. [PMID: 36227842 PMCID: PMC9560147 DOI: 10.1371/journal.pcbi.1010585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Accepted: 09/18/2022] [Indexed: 11/05/2022] Open

Abstract

While traditional theories of sensorimotor processing have often assumed a serial decision-making pipeline, more recent approaches have suggested that multiple actions may be planned concurrently and vie for execution. Evidence for the latter almost exclusively stems from electrophysiological studies in posterior parietal and premotor cortex of monkeys. Here we study concurrent prospective motor planning in humans by recording functional magnetic resonance imaging (fMRI) during a delayed response task engaging movement sequences towards multiple potential targets. We find that also in human posterior parietal and premotor cortex delay activity modulates both with sequence complexity and the number of potential targets. We tested the hypothesis that this modulation is best explained by concurrent prospective planning as opposed to the mere maintenance of potential targets in memory. We devise a bounded rationality model with information constraints that optimally assigns information resources for planning and memory for this task and determine predicted information profiles according to the two hypotheses. When regressing delay activity on these model predictions, we find that the concurrent prospective planning strategy provides a significantly better explanation of the fMRI-signal modulations. Moreover, we find that concurrent prospective planning is more costly and thus limited for most subjects, as expressed by the best fitting information capacities. We conclude that bounded rational decision-making models allow relating both behavior and neural representations to utilitarian task descriptions based on bounded optimal information-processing assumptions.

When the future is uncertain, it can be beneficial to concurrently plan several action possibilities in advance. Electrophysiological research found evidence in monkeys that brain regions in posterior parietal and promotor cortex are indeed capable of planning several actions in parallel. We now used fMRI to study brain activity in these brain regions in humans. For our analyses we applied bounded rationality models that optimally assign information resources to fMRI activity in a complex motor planning task. We find that theoretical information costs of concurrent prospective planning explained fMRI activity profiles significantly better than assuming alternative memory-based strategies. Moreover, exploiting the model allowed us to quantify the individual capacity limit for concurrent planning and to relate these individual limits to both subjects’ behavior and to their neural representations of planning.

Collapse

Pupil dilation and response slowing distinguish deliberate explorative choices in the probabilistic learning task. COGNITIVE, AFFECTIVE, & BEHAVIORAL NEUROSCIENCE 2022;22:1108-1129. [PMID: 35359274 PMCID: PMC9458574 DOI: 10.3758/s13415-022-00996-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Accepted: 03/07/2022] [Indexed: 12/22/2022]

Rethinking delusions: A selective review of delusion research through a computational lens. Schizophr Res 2022;245:23-41. [PMID: 33676820 PMCID: PMC8413395 DOI: 10.1016/j.schres.2021.01.023] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2020] [Revised: 01/27/2021] [Accepted: 01/29/2021] [Indexed: 02/06/2023]

Shared mechanisms mediate the explore-exploit tradeoff in macaques and humans. Neuron 2022;110:1751-1753. [PMID: 35654023 DOI: 10.1016/j.neuron.2022.05.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Hogeveen J, Mullins TS, Romero JD, Eversole E, Rogge-Obando K, Mayer AR, Costa VD. The neurocomputational bases of explore-exploit decision-making. Neuron 2022;110:1869-1879.e5. [PMID: 35390278 PMCID: PMC9167768 DOI: 10.1016/j.neuron.2022.03.014] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Revised: 12/11/2021] [Accepted: 03/10/2022] [Indexed: 02/04/2023]

Leopold DA, Averbeck BB. Self-tuition as an essential design feature of the brain. Philos Trans R Soc Lond B Biol Sci 2022;377:20200530. [PMID: 34957855 PMCID: PMC8710880 DOI: 10.1098/rstb.2020.0530] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open

Overcoming cognitive set bias requires more than seeing an alternative strategy. Sci Rep 2022;12:2179. [PMID: 35140344 PMCID: PMC8828898 DOI: 10.1038/s41598-022-06237-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2021] [Accepted: 01/21/2022] [Indexed: 12/05/2022] Open

Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior. Cell Rep 2022;38:110198. [PMID: 34986350 PMCID: PMC9608360 DOI: 10.1016/j.celrep.2021.110198] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Revised: 07/08/2021] [Accepted: 12/10/2021] [Indexed: 02/04/2023] Open

Averbeck B, O'Doherty JP. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 2022;47:147-162. [PMID: 34354249 PMCID: PMC8616931 DOI: 10.1038/s41386-021-01108-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Revised: 07/06/2021] [Accepted: 07/09/2021] [Indexed: 01/03/2023]

Curiosity or savouring? Information seeking is modulated by both uncertainty and valence. PLoS One 2021;16:e0257011. [PMID: 34559816 PMCID: PMC8462690 DOI: 10.1371/journal.pone.0257011] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2020] [Accepted: 08/20/2021] [Indexed: 11/19/2022] Open

Petitet P, Attaallah B, Manohar SG, Husain M. The computational cost of active information sampling before decision-making under uncertainty. Nat Hum Behav 2021;5:935-946. [PMID: 34045719 DOI: 10.1038/s41562-021-01116-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2020] [Accepted: 04/14/2021] [Indexed: 01/30/2023]

Livermore JJA, Holmes CL, Cutler J, Levstek M, Moga G, Brittain JRC, Campbell-Meiklejohn D. Selective effects of serotonin on choices to gather more information. J Psychopharmacol 2021;35:631-640. [PMID: 33601931 PMCID: PMC8278551 DOI: 10.1177/0269881121991571] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Gardner MPH, Schoenbaum G. The orbitofrontal cartographer. Behav Neurosci 2021;135:267-276. [PMID: 34060879 PMCID: PMC8177731 DOI: 10.1037/bne0000463] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021;38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Ferrari-Toniolo S, Bujold PM, Grabenhorst F, Báez-Mendoza R, Schultz W. Nonhuman Primates Satisfy Utility Maximization in Compliance with the Continuity Axiom of Expected Utility Theory. J Neurosci 2021;41:2964-2979. [PMID: 33542082 PMCID: PMC8018892 DOI: 10.1523/jneurosci.0955-20.2020] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2020] [Revised: 11/13/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022] Open

Abstract

Expected Utility Theory (EUT), the first axiomatic theory of risky choice, describes choices as a utility maximization process: decision makers assign a subjective value (utility) to each choice option and choose the one with the highest utility. The continuity axiom, central to Expected Utility Theory and its modifications, is a necessary and sufficient condition for the definition of numerical utilities. The axiom requires decision makers to be indifferent between a gamble and a specific probabilistic combination of a more preferred and a less preferred gamble. While previous studies demonstrated that monkeys choose according to combinations of objective reward magnitude and probability, a concept-driven experimental approach for assessing the axiomatically defined conditions for maximizing utility by animals is missing. We experimentally tested the continuity axiom for a broad class of gamble types in 4 male rhesus macaque monkeys, showing that their choice behavior complied with the existence of a numerical utility measure as defined by the economic theory. We used the numerical quantity specified in the continuity axiom to characterize subjective preferences in a magnitude-probability space. This mapping highlighted a trade-off relation between reward magnitudes and probabilities, compatible with the existence of a utility function underlying subjective value computation. These results support the existence of a numerical utility function able to describe choices, allowing for the investigation of the neuronal substrates responsible for coding such rigorously defined quantity.SIGNIFICANCE STATEMENT A common assumption of several economic choice theories is that decisions result from the comparison of subjectively assigned values (utilities). This study demonstrated the compliance of monkey behavior with the continuity axiom of Expected Utility Theory, implying a subjective magnitude-probability trade-off relation, which supports the existence of numerical utility directly linked to the theoretical economic framework. We determined a numerical utility measure able to describe choices, which can serve as a correlate for the neuronal activity in the quest for brain structures and mechanisms guiding decisions.

Collapse

Feng SF, Wang S, Zarnescu S, Wilson RC. The dynamics of explore-exploit decisions reveal a signal-to-noise mechanism for random exploration. Sci Rep 2021;11:3077. [PMID: 33542333 PMCID: PMC7862437 DOI: 10.1038/s41598-021-82530-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 12/16/2020] [Indexed: 12/29/2022] Open

van Lieshout LLF, de Lange FP, Cools R. Why so curious? Quantifying mechanisms of information seeking. Curr Opin Behav Sci 2020. [DOI: 10.1016/j.cobeha.2020.08.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Averbeck BB, Murray EA. Hypothalamic Interactions with Large-Scale Neural Circuits Underlying Reinforcement Learning and Motivated Behavior. Trends Neurosci 2020;43:681-694. [PMID: 32762959 PMCID: PMC7483858 DOI: 10.1016/j.tins.2020.06.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Revised: 06/02/2020] [Accepted: 06/19/2020] [Indexed: 02/02/2023]

Moreno-Bote R, Ramírez-Ruiz J, Drugowitsch J, Hayden BY. Heuristics and optimal solutions to the breadth-depth dilemma. Proc Natl Acad Sci U S A 2020;117:19799-19808. [PMID: 32759219 PMCID: PMC7443877 DOI: 10.1073/pnas.2004929117] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open

Soltani A, Izquierdo A. Adaptive learning under expected and unexpected uncertainty. Nat Rev Neurosci 2020;20:635-644. [PMID: 31147631 DOI: 10.1038/s41583-019-0180-y] [Citation(s) in RCA: 105] [Impact Index Per Article: 26.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs. J Neurosci 2020;40:2553-2561. [PMID: 32060169 DOI: 10.1523/jneurosci.2355-19.2020] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2019] [Revised: 01/26/2020] [Accepted: 02/09/2020] [Indexed: 11/21/2022] Open

Abstract

Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is managing explore-exploit tradeoffs, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this tradeoff, as well as other RL related variables, in orbitofrontal cortex (OFC) while three male monkeys performed a three-armed bandit learning task. During the task, novel choice options periodically replaced familiar options. The values of the novel options were unknown, and the monkeys had to explore them to see if they were better than other currently available options. The identity of the chosen stimulus and the reward outcome were strongly encoded in the responses of single OFC neurons. These two variables define the states and state transitions in our model that are relevant to decision-making. The chosen value of the option and the relative value of exploring that option were encoded at intermediate levels. We also found that OFC value coding was stimulus specific, as opposed to coding value independent of the identity of the option. The location of the option and the value of the current environment were encoded at low levels. Therefore, we found encoding of the variables relevant to learning and managing explore-exploit tradeoffs in OFC. These results are consistent with findings in the ventral striatum and amygdala and show that this monosynaptically connected network plays an important role in learning based on the immediate and future consequences of choices.SIGNIFICANCE STATEMENT Orbitofrontal cortex (OFC) has been implicated in representing the expected values of choices. Here we extend these results and show that OFC also encodes information relevant to managing explore-exploit tradeoffs. Specifically, OFC encodes an exploration bonus, which characterizes the relative value of exploring novel choice options. OFC also strongly encodes the identity of the chosen stimulus, and reward outcomes, which are necessary for computing the value of novel and familiar options.

Collapse

Ebitz RB, Sleezer BJ, Jedema HP, Bradberry CW, Hayden BY. Tonic exploration governs both flexibility and lapses. PLoS Comput Biol 2019;15:e1007475. [PMID: 31703063 PMCID: PMC6867658 DOI: 10.1371/journal.pcbi.1007475] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2019] [Revised: 11/20/2019] [Accepted: 10/10/2019] [Indexed: 11/20/2022] Open

Reitich-Stolero T, Aberg KC, Paz R. Re-exploring Mechanisms of Exploration. Neuron 2019;103:360-363. [PMID: 31394060 DOI: 10.1016/j.neuron.2019.07.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

Costa VD, Mitz AR, Averbeck BB. Subcortical Substrates of Explore-Exploit Decisions in Primates. Neuron 2019;103:533-545.e5. [PMID: 31196672 PMCID: PMC6687547 DOI: 10.1016/j.neuron.2019.05.017] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2018] [Revised: 03/27/2019] [Accepted: 05/08/2019] [Indexed: 01/06/2023]

Baker SC, Konova AB, Daw ND, Horga G. A distinct inferential mechanism for delusions in schizophrenia. Brain 2019;142:1797-1812. [PMID: 30895299 PMCID: PMC6644849 DOI: 10.1093/brain/awz051] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Revised: 12/27/2018] [Accepted: 01/16/2019] [Indexed: 12/14/2022] Open

Abstract

Delusions, a core symptom of psychosis, are false beliefs that are rigidly held with strong conviction despite contradictory evidence. Alterations in inferential processes have long been proposed to underlie delusional pathology, but previous attempts to show this have failed to yield compelling evidence for a specific relationship between inferential abnormalities and delusional severity in schizophrenia. Using a novel, incentivized information-sampling task (a modified version of the beads task), alongside well-characterized decision-making tasks, we sought a mechanistic understanding of delusions in a sample of medicated and unmedicated patients with schizophrenia who exhibited a wide range of delusion severity. In this novel task, participants chose whether to draw beads from one of two hidden jars or to guess the identity of the hidden jar, in order to minimize financial loss from a monetary endowment, and concurrently reported their probability estimates for the hidden jar. We found that patients with higher delusion severity exhibited increased information seeking (i.e. increased draws-to-decision behaviour). This increase was highly specific to delusion severity as compared to the severity of other psychotic symptoms, working-memory capacity, and other clinical and socio-demographic characteristics. Delusion-related increases in information seeking were present in unmedicated patients, indicating that they were unlikely due to antipsychotic medication. In addition, after adjusting for delusion severity, patients as a whole exhibited decreased information seeking relative to healthy individuals, a decrease that correlated with lower socioeconomic status. Computational analyses of reported probability estimates further showed that more delusional patients exhibited abnormal belief updating characterized by stronger reliance on prior beliefs formed early in the inferential process, a feature that correlated with increased information seeking in patients. Other decision-making parameters that could have theoretically explained the delusion effects, such as those related to subjective valuation, were uncorrelated with both delusional severity and information seeking among the patients. In turn, we found some preliminary evidence that subjective valuation (rather than belief updating) may explain group differences in information seeking unrelated to delusions. Together, these results suggest that abnormalities in belief updating, characterized by stronger reliance on prior beliefs formed by incorporating information presented earlier in the inferential process, may be a core computational mechanism of delusional ideation in psychosis. Our results thus provide direct empirical support for an inferential mechanism that naturally captures the characteristic rigidity associated with delusional beliefs.

Collapse

Dopamine blockade impairs the exploration-exploitation trade-off in rats. Sci Rep 2019;9:6770. [PMID: 31043685 PMCID: PMC6494917 DOI: 10.1038/s41598-019-43245-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2018] [Accepted: 04/18/2019] [Indexed: 01/30/2023] Open

Monkeys are curious about counterfactual outcomes. Cognition 2019;189:1-10. [PMID: 30889493 DOI: 10.1016/j.cognition.2019.03.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2018] [Revised: 03/11/2019] [Accepted: 03/13/2019] [Indexed: 11/22/2022]

Furl N, Averbeck BB, McKay RT. Looking for Mr(s) Right: Decision bias can prevent us from finding the most attractive face. Cogn Psychol 2019;111:1-14. [PMID: 30826584 DOI: 10.1016/j.cogpsych.2019.02.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Revised: 12/21/2018] [Accepted: 02/22/2019] [Indexed: 01/28/2023]

Reinforcement learning in artificial and biological systems. NAT MACH INTELL 2019. [DOI: 10.1038/s42256-019-0025-4] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Dale G, Sampers D, Loo S, Green CS. Individual differences in exploration and persistence: Grit and beliefs about ability and reward. PLoS One 2018;13:e0203131. [PMID: 30180200 PMCID: PMC6122809 DOI: 10.1371/journal.pone.0203131] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2018] [Accepted: 08/15/2018] [Indexed: 11/18/2022] Open

Abstract

The tradeoff between knowing when to seek greater rewards (exploration), and knowing when to settle (exploitation), is critical to success. One dispositional factor that may modulate this tradeoff is "grit." Gritty individuals tend to persist in the face of difficulty and consequently experience greater life success. It is possible that they may also experience a greater tendency to explore in a reward task. However, although most exploration/exploitation tasks manipulate beliefs about the presence/magnitude of rewards in the environment, the belief of one's ability to actually achieve a reward is also critical. As such, we investigated whether individuals higher in grit were more likely to explore, and how beliefs about the magnitude/presence of rewards, and the perceived ability to achieve a reward, modulated their exploration tendencies. Over two experiments, participants completed 4 different exploration/persistence tasks: two that tapped into participant beliefs about the presence/magnitude of rewards, and two that tapped into participant beliefs about their ability to achieve a reward. Participants also completed measures of dispositional grit (Experiment 1a and 1b), conscientiousness (Experiment 1b), and working memory (Experiment 1a and 1b). In both experiments, we found a relationship between the two "belief of rewards" tasks, as well as between the two "belief of ability" tasks, but performance was unrelated across the two types of task. We also found that dispositional grit was strongly associated with greater exploration, but only on the "belief of ability" tasks. Finally, in Experiment 1b we showed that conscientiousness better predicted exploration on the "belief of ability" tasks than grit, suggesting that it is not grittiness per se that is associated with exploration. Overall, our findings showed that individuals high in grit/conscientiousness are more likely to explore, but only when there is a known reward available that they believe they have the ability to achieve.

Collapse

Fetsch CR, Odean NN, Jeurissen D, El-Shamayleh Y, Horwitz GD, Shadlen MN. Focal optogenetic suppression in macaque area MT biases direction discrimination and decision confidence, but only transiently. eLife 2018;7:e36523. [PMID: 30051817 PMCID: PMC6086666 DOI: 10.7554/elife.36523] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2018] [Accepted: 07/12/2018] [Indexed: 12/29/2022] Open

Martinelli C, Rigoli F, Averbeck B, Shergill SS. The value of novelty in schizophrenia. Schizophr Res 2018;192:287-293. [PMID: 28495493 PMCID: PMC5890442 DOI: 10.1016/j.schres.2017.05.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/06/2016] [Revised: 04/27/2017] [Accepted: 05/06/2017] [Indexed: 11/15/2022]

Cogliati Dezza I, Yu AJ, Cleeremans A, Alexander W. Learning the value of information and reward over time when solving exploration-exploitation problems. Sci Rep 2017;7:16919. [PMID: 29209058 PMCID: PMC5717252 DOI: 10.1038/s41598-017-17237-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2017] [Accepted: 11/22/2017] [Indexed: 11/09/2022] Open

Vicario-Feliciano R, Murray EA, Averbeck BB. Ventral striatum lesions do not affect reinforcement learning with deterministic outcomes on slow time scales. Behav Neurosci 2017;131:385-91. [PMID: 28805428 DOI: 10.1037/bne0000211] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Autonomous robotic exploration using a utility function based on Rényi’s general theory of entropy. Auton Robots 2017. [DOI: 10.1007/s10514-017-9662-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

van den Berg R, Zylberberg A, Kiani R, Shadlen MN, Wolpert DM. Confidence Is the Bridge between Multi-stage Decisions. Curr Biol 2016;26:3157-3168. [PMID: 27866891 PMCID: PMC5154755 DOI: 10.1016/j.cub.2016.10.021] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Revised: 09/18/2016] [Accepted: 10/12/2016] [Indexed: 11/30/2022]

Yang SCH, Wolpert DM, Lengyel M. Theoretical perspectives on active sensing. Curr Opin Behav Sci 2016;11:100-108. [PMID: 30175197 DOI: 10.1016/j.cobeha.2016.06.009] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]