51
Yi R, Landes RD, Bickel WK. Novel Models of Intertemporal Valuation: Past and Future Outcomes. J Neurosci Psychol Econ 2009; 2:102. [PMID: 20157625] [DOI: 10.1037/a0017571]
Abstract
Temporal discounting refers to the reduction in the present subjective value of an outcome as a function of the temporal distance to that outcome. Though a number of mathematical models have been proposed to describe this time/value relationship, this search has largely excluded insights from the literature on memory decay. This study examines the utility of memory decay models by comparing the fits of four of these models to fits from established temporal discounting models using past and future temporal discounting data. These results (1) suggest that a single model describes valuation of both future and past outcomes, (2) indicate the exponential-power model, from memory decay literature, is statistically superior in fitting discounting data from both past and future outcomes, and (3) support the advancing perspective of the psychological interconnectedness of the future and past.
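The model classes being compared can be sketched as simple value functions: two standard discounting forms (exponential and hyperbolic) alongside the exponential-power form borrowed from the memory-decay literature. The parameter names and values below are arbitrary illustrations, not the paper's notation or fitted estimates:

```python
import math

# Candidate discounting functions mapping amount A and delay d to present value.
def exponential(A, d, k):
    return A * math.exp(-k * d)

def hyperbolic(A, d, k):
    return A / (1.0 + k * d)

def exponential_power(A, d, k, s):
    # Exponential-power form: decay is exponential in d**s rather than in d,
    # which lets the curve fall steeply at first and flatten at long delays.
    return A * math.exp(-k * d ** s)

A, k, s = 100.0, 0.05, 0.6
for d in (0, 1, 10, 100):
    print(d, round(exponential(A, d, k), 1),
          round(hyperbolic(A, d, k), 1),
          round(exponential_power(A, d, k, s), 1))
```

All three functions return the full amount at zero delay and decline monotonically; model comparison then turns on which curvature best fits observed past- and future-discounting data.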
Affiliation(s)
- Richard Yi
- University of Arkansas for Medical Sciences

52
Han CE, Arbib MA, Schweighofer N. Stroke rehabilitation reaches a threshold. PLoS Comput Biol 2008; 4:e1000133. [PMID: 18769588] [PMCID: PMC2527783] [DOI: 10.1371/journal.pcbi.1000133]
Abstract
Motor training with the upper limb affected by stroke partially reverses the loss of cortical representation after lesion and has been proposed to increase spontaneous arm use. Moreover, repeated attempts to use the affected hand in daily activities create a form of practice that can potentially lead to further improvement in motor performance. We thus hypothesized that if motor retraining after stroke increases spontaneous arm use sufficiently, then the patient will enter a virtuous circle in which spontaneous arm use and motor performance reinforce each other. In contrast, if the dose of therapy is not sufficient to bring spontaneous use above threshold, then performance will not increase and the patient will further develop compensatory strategies with the less affected hand. To refine this hypothesis, we developed a computational model of bilateral hand use in arm reaching to study the interactions between adaptive decision making and motor relearning after motor cortex lesion. The model contains a left and a right motor cortex, each controlling the opposite arm, and a single action choice module. The action choice module learns, via reinforcement learning, the value of using each arm for reaching in specific directions. Each motor cortex uses a neural population code to specify the initial direction along which the contralateral hand moves towards a target. The motor cortex learns to minimize directional errors and to maximize neuronal activity for each movement. The derived learning rule accounts for the reversal of the loss of cortical representation after rehabilitation and the increase of this loss after stroke with insufficient rehabilitation. Further, our model exhibits nonlinear and bistable behavior: if natural recovery, motor training, or both, brings performance above a certain threshold, then training can be stopped, as the repeated spontaneous arm use provides a form of motor learning that further bootstraps performance and spontaneous use. 
Below this threshold, motor training is "in vain": there is little spontaneous arm use after training, the model exhibits learned nonuse, and compensatory movements with the less affected hand are reinforced. By exploring the nonlinear dynamics of stroke recovery using a biologically plausible neural model that accounts for reversal of the loss of motor cortex representation following rehabilitation or the lack thereof, we can explain previously hard-to-reconcile data on spontaneous arm use in stroke recovery. Further, our threshold prediction could be tested with an adaptive train-wait-train paradigm: if spontaneous arm use has increased in the "wait" period, then the threshold has been reached, and rehabilitation can be stopped. If spontaneous arm use is still low or has decreased, then another bout of rehabilitation is to be provided.

Stroke often leaves patients with predominantly unilateral functional limitations of the arm and hand. Although recovery of function after stroke is often achieved by compensatory use of the less affected limb, improving use of the more affected limb has been associated with increased quality of life. Here, we developed a biologically plausible model of bilateral reaching movements to investigate the mechanisms and conditions leading to effective rehabilitation. Our motor cortex model accounts for the experimental observation that motor training can reverse the loss of cortical representation due to lesion. Further, our model predicts that if spontaneous arm use is above a certain threshold, then training can be stopped, as the repeated spontaneous use provides a form of motor learning that further improves performance and spontaneous use. Below this threshold, training is "in vain," and compensatory movements with the less affected hand are reinforced. Our model is a first step in the development of adaptive and cost-effective rehabilitation methods tailored to individuals poststroke.
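The interaction the model describes, in which arm-choice values and motor performance reinforce each other, can be caricatured as a small reinforcement-learning loop. This is a sketch under stated assumptions (softmax action selection and scalar success probabilities standing in for the paper's neural populations), not the authors' implementation:

```python
import math
import random

random.seed(0)

# Action values learned by the choice module; the stroke-affected arm
# starts out both undervalued and less skilled.
value = {"affected": 0.2, "unaffected": 0.8}
performance = {"affected": 0.3, "unaffected": 0.9}  # probability of task success
alpha, beta = 0.1, 5.0  # learning rate, softmax inverse temperature

def choose():
    """Softmax selection over the two arms' action values."""
    weights = {arm: math.exp(beta * v) for arm, v in value.items()}
    r = random.random() * sum(weights.values())
    for arm, w in weights.items():
        r -= w
        if r <= 0:
            return arm
    return arm  # numerical edge case: fall back to the last arm

for _ in range(2000):
    arm = choose()
    reward = 1.0 if random.random() < performance[arm] else 0.0
    value[arm] += alpha * (reward - value[arm])  # value update toward outcome
    if arm == "affected":
        # Use-dependent plasticity: practice slowly improves performance,
        # a scalar stand-in for the paper's cortical relearning rule.
        performance["affected"] = min(1.0, performance["affected"] + 0.001)
```

With these starting values the affected arm is rarely selected, so its performance barely improves: the learned-nonuse regime. Raising its initial performance above the model's threshold would instead let spontaneous use and performance bootstrap each other.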
Affiliation(s)
- Cheol E. Han
- Department of Computer Science, University of Southern California, Los Angeles, California, United States of America
- USC Brain Project, University of Southern California, Los Angeles, California, United States of America
- Michael A. Arbib
- USC Brain Project, University of Southern California, Los Angeles, California, United States of America
- Department of Computer Science, University of Southern California, Los Angeles, California, United States of America
- Department of Neuroscience, University of Southern California, Los Angeles, California, United States of America
- Nicolas Schweighofer
- Department of Biokinesiology and Physical Therapy, University of Southern California, Los Angeles, California, United States of America

53
Reinforcement learning: the good, the bad and the ugly. Curr Opin Neurobiol 2008; 18:185-96. [PMID: 18708140] [DOI: 10.1016/j.conb.2008.08.003]
Abstract
Reinforcement learning provides both qualitative and quantitative frameworks for understanding and modeling adaptive decision-making in the face of rewards and punishments. Here we review the latest dispatches from the forefront of this field, and map out some of the territories where lie monsters.
54
Kim S, Hwang J, Lee D. Prefrontal coding of temporally discounted values during intertemporal choice. Neuron 2008; 59:161-72. [PMID: 18614037] [DOI: 10.1016/j.neuron.2008.05.010]
Abstract
Reward from a particular action is seldom immediate, and the influence of such delayed outcome on choice decreases with delay. It has been postulated that when faced with immediate and delayed rewards, decision makers choose the option with maximum temporally discounted value. We examined the preference of monkeys for delayed reward in an intertemporal choice task and the neural basis for real-time computation of temporally discounted values in the dorsolateral prefrontal cortex. During this task, the locations of the targets associated with small or large rewards and their corresponding delays were randomly varied. We found that prefrontal neurons often encoded the temporally discounted value of reward expected from a particular option. Furthermore, activity tended to increase with discounted values for targets presented in the neuron's preferred direction, suggesting that activity related to temporally discounted values in the prefrontal cortex might determine the animal's behavior during intertemporal choice.
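The decision rule under test, choosing the option with the maximum temporally discounted value, is easy to state in code. A hyperbolic discount function and the value of k are assumed here purely for illustration:

```python
def discounted_value(amount, delay, k=0.25):
    # Hyperbolic discounting: value falls off as 1 / (1 + k * delay).
    return amount / (1.0 + k * delay)

def choose(small, large, k=0.25):
    # Each option is an (amount, delay) pair; pick the larger discounted value.
    return "small" if discounted_value(*small, k) > discounted_value(*large, k) else "large"

# A short delay leaves the large reward more valuable...
print(choose((1, 0), (2, 2)))
# ...but a long enough delay reverses the preference toward the
# small immediate reward.
print(choose((1, 0), (2, 8)))
```

The paper's finding is that dorsolateral prefrontal activity tracks a quantity like `discounted_value` for the option in a neuron's preferred direction, computed trial by trial as amounts and delays vary.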
Affiliation(s)
- Soyoun Kim
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA

55
La Camera G, Richmond BJ. Modeling the violation of reward maximization and invariance in reinforcement schedules. PLoS Comput Biol 2008; 4:e1000131. [PMID: 18688266] [PMCID: PMC2453237] [DOI: 10.1371/journal.pcbi.1000131]
Abstract
It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as "schedule length effect"). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: "framing," wherein equivalent options are treated differently depending on the context in which they are presented, and the "sunk cost" effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.
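For reference, the "standard methods of Reinforcement Learning" against which the schedule-length effect is contrasted can be sketched as a tabular TD(0) update over a chain of schedule states (a generic textbook version, not the authors' modified model):

```python
# Tabular TD(0) over a chain of schedule states leading to reward.
states = ["s1", "s2", "s3", "reward"]
V = {s: 0.0 for s in states}
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def td_episode():
    # Sweep the chain once, updating each state toward its successor.
    for i, s in enumerate(states[:-1]):
        nxt = states[i + 1]
        r = 1.0 if nxt == "reward" else 0.0
        delta = r + gamma * V[nxt] - V[s]  # TD error
        V[s] += alpha * delta

for _ in range(500):
    td_episode()

print({s: round(v, 2) for s, v in V.items()})
```

In this baseline, value depends only on distance to reward, so states equally distant from reward in short and long schedules receive identical values; the schedule-length effect reported in the monkeys is precisely what such a model cannot produce.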
Affiliation(s)
- Giancarlo La Camera
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland, United States of America
- Barry J. Richmond
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland, United States of America

56
Abstract
Previous animal experiments have shown that serotonin is involved in the control of impulsive choice, as characterized by a high preference for small immediate rewards over larger delayed rewards. Previous human studies under serotonin manipulation, however, have been either inconclusive on the effect on impulsivity or have shown an effect on the speed of action-reward learning or the optimality of action choice. Here, we manipulated central serotonergic levels of healthy volunteers by dietary tryptophan depletion and loading. Subjects performed a "dynamic" delayed reward choice task that required a continuous update of the reward value estimates to maximize total gain. By using a computational model of delayed reward choice learning, we estimated the parameters governing the subjects' reward choices in low-, normal-, and high-serotonin conditions. We found an increase in the proportion of small reward choices, together with an increase in the rate of discounting of delayed rewards, in the low-serotonin condition compared with the control and high-serotonin conditions. There were no significant differences between conditions in the speed of learning of the estimated delayed reward values or in the variability of reward choice. Therefore, in line with previous animal experiments, our results show that low serotonin levels steepen delayed reward discounting in humans. The combined results of our previous and current studies suggest that serotonin may adjust the rate of delayed reward discounting via the modulation of specific loops in parallel corticobasal ganglia circuits.
57
Abstract
Pavlovian predictions of future aversive outcomes lead to behavioral inhibition, suppression, and withdrawal. There is considerable evidence for the involvement of serotonin in both the learning of these predictions and the inhibitory consequences that ensue, although less for a causal relationship between the two. In the context of a highly simplified model of chains of affectively charged thoughts, we interpret the combined effects of serotonin in terms of pruning a tree of possible decisions (i.e., eliminating those choices that have low or negative expected outcomes). We show how a drop in behavioral inhibition, putatively resulting from an experimentally or psychiatrically influenced drop in serotonin, could result in unexpectedly large negative prediction errors and a significant aversive shift in reinforcement statistics. We suggest an interpretation of this finding that helps dissolve an apparent contradiction: inhibition of serotonin reuptake is the first-line treatment of depression, although serotonin itself is most strongly linked with aversive rather than appetitive outcomes and predictions. Serotonin is an evolutionarily ancient neuromodulator, probably best known for its role in psychiatric disorders. However, that role has long appeared contradictory to its role in normal function, and indeed its various roles in normal affective behaviors have been hard to reconcile. Here, we model two predominant functions of normal serotonin in a highly simplified reinforcement learning model and show how these may explain some of its complex roles in depression and anxiety.
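The pruning mechanism can be illustrated with a toy tree search in which any branch whose immediate outcome falls below an inhibition threshold is abandoned before its subtree is evaluated, the threshold standing in for serotonergic behavioral inhibition. The tree, payoffs, and threshold values below are invented for illustration:

```python
# Each node is (immediate_outcome, children); leaves have no children.
tree = (0, [
    (-2, [(10, []), (-1, [])]),  # aversive first step hiding a large payoff
    (1,  [(1, []),  (2, [])]),   # safe branch with modest payoffs
])

def best_value(node, threshold):
    outcome, children = node
    if outcome < threshold:
        # Pruned: the chain of thought is inhibited here, so the
        # subtree's true value is never discovered.
        return outcome
    if not children:
        return outcome
    return outcome + max(best_value(c, threshold) for c in children)

# Strong inhibition (high threshold) prunes the aversive branch and
# settles for the safe one; weak inhibition searches through the
# aversive step and finds the large payoff behind it.
print(best_value(tree, threshold=-1))
print(best_value(tree, threshold=-5))
```

Lowering the threshold also exposes the agent to branches with genuinely bad outcomes it would otherwise never sample, which is the route by which reduced inhibition can shift the experienced reinforcement statistics aversively.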
Affiliation(s)
- Peter Dayan
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
- Quentin J. M. Huys
- Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
- Center for Theoretical Neuroscience, Columbia University, New York, New York, United States of America

58
Kalenscher T, Pennartz CM. Is a bird in the hand worth two in the future? The neuroeconomics of intertemporal decision-making. Prog Neurobiol 2008; 84:284-315. [DOI: 10.1016/j.pneurobio.2007.11.004]
59
Tanaka SC, Schweighofer N, Asahi S, Shishida K, Okamoto Y, Yamawaki S, Doya K. Serotonin differentially regulates short- and long-term prediction of rewards in the ventral and dorsal striatum. PLoS One 2007; 2:e1333. [PMID: 18091999] [PMCID: PMC2129114] [DOI: 10.1371/journal.pone.0001333]
Abstract
Background: The ability to select an action by considering both delays and amount of reward outcome is critical for maximizing long-term benefits. Although previous animal experiments on impulsivity have suggested a role of serotonin in behaviors requiring prediction of delayed rewards, the underlying neural mechanism is unclear.

Methodology/Principal Findings: To elucidate the role of serotonin in the evaluation of delayed rewards, we performed a functional brain imaging experiment in which subjects chose small-immediate or large-delayed liquid rewards under dietary regulation of tryptophan, a precursor of serotonin. A model-based analysis revealed that the activity of the ventral part of the striatum was correlated with reward prediction at shorter time scales, and this correlated activity was stronger at low serotonin levels. By contrast, the activity of the dorsal part of the striatum was correlated with reward prediction at longer time scales, and this correlated activity was stronger at high serotonin levels.

Conclusions/Significance: Our results suggest that serotonin controls the time scale of reward prediction by differentially regulating activities within the striatum.
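The "time scale of reward prediction" maps onto the discount factor of an exponentially discounted reward sum: a small gamma makes the prediction short-sighted, while a gamma near one integrates rewards far into the future. The gamma values here are illustrative only, not the study's estimates:

```python
def predicted_value(rewards, gamma):
    # V = sum over t of gamma**t * r_t: gamma near 0 weights only imminent
    # rewards, gamma near 1 integrates rewards far into the future.
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# One unit of reward delivered after 10 steps.
delayed = [0.0] * 10 + [1.0]

short_sighted = predicted_value(delayed, gamma=0.3)  # nearly worthless now
far_sighted = predicted_value(delayed, gamma=0.99)   # retains most of its value

print(round(short_sighted, 4), round(far_sighted, 4))
```

In the study's model-based analysis, ventral striatal activity tracked predictions computed with small gammas and dorsal striatal activity those with large gammas, with serotonin levels modulating the two regimes in opposite directions.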
Affiliation(s)
- Saori C. Tanaka
- Department of Computational Neurobiology, ATR Computational Neuroscience Laboratories, Seika, Souraku, Kyoto, Japan
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Nicolas Schweighofer
- Department of Computational Neurobiology, ATR Computational Neuroscience Laboratories, Seika, Souraku, Kyoto, Japan
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Department of Biokinesiology and Physical Therapy, University of Southern California, Los Angeles, California, United States of America
- Shuji Asahi
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Department of Psychiatry and Neurosciences, Hiroshima University, Minamiku, Hiroshima, Japan
- Kazuhiro Shishida
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Department of Psychiatry and Neurosciences, Hiroshima University, Minamiku, Hiroshima, Japan
- Yasumasa Okamoto
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Department of Psychiatry and Neurosciences, Hiroshima University, Minamiku, Hiroshima, Japan
- Shigeto Yamawaki
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Department of Psychiatry and Neurosciences, Hiroshima University, Minamiku, Hiroshima, Japan
- Kenji Doya
- Department of Computational Neurobiology, ATR Computational Neuroscience Laboratories, Seika, Souraku, Kyoto, Japan
- Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency, Seika, Souraku, Kyoto, Japan
- Neural Computational Unit, Okinawa Institute of Science and Technology, Suzaki, Uruma, Okinawa, Japan

60
Rosati AG, Stevens JR, Hare B, Hauser MD. The evolutionary origins of human patience: temporal preferences in chimpanzees, bonobos, and human adults. Curr Biol 2007; 17:1663-8. [PMID: 17900899] [DOI: 10.1016/j.cub.2007.08.033]
Abstract
To make adaptive choices, individuals must sometimes exhibit patience, forgoing immediate benefits to acquire more valuable future rewards [1-3]. Although humans account for future consequences when making temporal decisions [4], many animal species wait only a few seconds for delayed benefits [5-10]. Current research thus suggests a phylogenetic gap between patient humans and impulsive, present-oriented animals [9, 11], a distinction with implications for our understanding of economic decision making [12] and the origins of human cooperation [13]. On the basis of a series of experimental results, we reject this conclusion. First, bonobos (Pan paniscus) and chimpanzees (Pan troglodytes) exhibit a degree of patience not seen in other animals tested thus far. Second, humans are less willing to wait for food rewards than are chimpanzees. Third, humans are more willing to wait for monetary rewards than for food, and show the highest degree of patience only in response to decisions about money involving low opportunity costs. These findings suggest that core components of the capacity for future-oriented decisions evolved before the human lineage diverged from apes. Moreover, the different levels of patience that humans exhibit might be driven by fundamental differences in the mechanisms representing biological versus abstract rewards.
Affiliation(s)
- Alexandra G Rosati
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig D-04103, Germany

61
Vardavas R, Breban R, Blower S. Can influenza epidemics be prevented by voluntary vaccination? PLoS Comput Biol 2007; 3:e85. [PMID: 17480117] [PMCID: PMC1864996] [DOI: 10.1371/journal.pcbi.0030085]
Abstract
Previous modeling studies have identified the vaccination coverage level necessary for preventing influenza epidemics, but have not shown whether this critical coverage can be reached. Here we use computational modeling to determine, for the first time, whether the critical coverage for influenza can be achieved by voluntary vaccination. We construct a novel individual-level model of human cognition and behavior; individuals are characterized by two biological attributes (memory and adaptability) that they use when making vaccination decisions. We couple this model with a population-level model of influenza that includes vaccination dynamics. The coupled models allow individual-level decisions to influence influenza epidemiology and, conversely, influenza epidemiology to influence individual-level decisions. By including the effects of adaptive decision-making within an epidemic model, we can reproduce two essential characteristics of influenza epidemiology: annual variation in epidemic severity and sporadic occurrence of severe epidemics. We suggest that individual-level adaptive decision-making may be an important (previously overlooked) causal factor in driving influenza epidemiology. We find that severe epidemics cannot be prevented unless vaccination programs offer incentives. Frequency of severe epidemics could be reduced if programs provide, as an incentive to be vaccinated, several years of free vaccines to individuals who pay for one year of vaccination. The magnitude of epidemic amelioration will be determined by the number of years of free vaccination, individuals' adaptability in decision-making, and their memory. This type of incentive program could control epidemics if individuals are very adaptable and have long-term memories. However, incentive-based programs that provide free vaccination for families could increase the frequency of severe epidemics.
We conclude that incentive-based vaccination programs are necessary to control influenza, but some may be detrimental. Surprisingly, we find that individuals' memories and flexibility in adaptive decision-making can be extremely important factors in determining the success of influenza vaccination programs. Finally, we discuss the implication of our results for controlling pandemics.
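A minimal caricature of such an individual-level model: each season, a fraction of individuals (set by an adaptability parameter) revise a memory-weighted propensity to vaccinate according to whether an epidemic occurred. The update rule and all parameter values below are invented for illustration and are not the authors' formulation:

```python
import random

random.seed(1)

N = 1000                  # population size
critical = 0.6            # vaccination coverage needed to prevent an epidemic
memory = 0.7              # weight on an individual's past propensity
adaptability = 0.3        # chance an individual revises their strategy each year

propensity = [0.5] * N    # initial probability of choosing vaccination
history = []              # coverage achieved each season

for season in range(50):
    vaccinated = [random.random() < p for p in propensity]
    coverage = sum(vaccinated) / N
    history.append(coverage)
    epidemic = coverage < critical
    # Inductive update: an epidemic year makes vaccination look worthwhile,
    # while an epidemic-free year invites free-riding.
    target = 1.0 if epidemic else 0.0
    for i in range(N):
        if random.random() < adaptability:
            propensity[i] = memory * propensity[i] + (1 - memory) * target

print([round(c, 2) for c in history[-5:]])
```

Because free-riding is rewarded whenever coverage clears the threshold, coverage in this toy dynamic hovers around the critical level rather than settling safely above it, which is the qualitative reason voluntary vaccination alone fails to prevent severe epidemics.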
Affiliation(s)
- Raffaele Vardavas
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America
- Romulus Breban
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America
- Sally Blower
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America