1
Cleaveland JM. The active time model of concurrent choice. PLoS One 2024; 19:e0301173. [PMID: 38771859 PMCID: PMC11108226 DOI: 10.1371/journal.pone.0301173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 03/12/2024] [Indexed: 05/23/2024]
Abstract
The following paper describes a steady-state model of concurrent choice, termed the active time model (ATM). ATM is derived from maximization principles and is characterized by a semi-Markov process. The model proposes that the controlling stimulus in concurrent variable-interval (VI) VI schedules of reinforcement is the time interval since the most recent response, termed here "the active interresponse time" or simply "active time." In the model, after a response is generated, it is categorized by a function that relates active times to switch/stay probabilities. In the paper, the output of ATM is compared with predictions made by three other models of operant conditioning: melioration, a version of scalar expectancy theory (SET), and momentary maximization. Data sets considered include preferences in multiple-concurrent VI VI schedules, molecular choice patterns, correlations between switching and perseveration, and molar choice proportions. It is shown that ATM can account for all of these data sets, while the other models produce more limited fits. However, rather than argue that ATM is the singular model for concurrent VI VI choice, a consideration of its concept space leads to the conclusion that operant choice is multiply determined, and that an adaptive viewpoint (one that considers experimental procedures both as selecting mechanisms for animal choice and as tests of the controlling variables of that choice) is warranted.
Affiliation(s)
- J. Mark Cleaveland
- Department of Psychological Science, Vassar College, Poughkeepsie, NY, United States of America
2
Kane GA, Senne RA, Scott BB. Rat movements reflect internal decision dynamics in an evidence accumulation task. bioRxiv 2023:2023.09.11.556575. [PMID: 37745309 PMCID: PMC10515875 DOI: 10.1101/2023.09.11.556575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Perceptual decision-making involves multiple cognitive processes, including accumulation of sensory evidence, planning, and executing a motor action. How these processes are intertwined is unclear; some models assume that decision-related processes precede motor execution, whereas others propose that movements reflecting on-going decision processes occur before commitment to a choice. Here we develop and apply two complementary methods to study the relationship between decision processes and the movements leading up to a choice. The first is a free response pulse-based evidence accumulation task, in which stimuli continue until choice is reported. The second is a motion-based drift diffusion model (mDDM), in which movement variables from video pose estimation constrain decision parameters on a trial-by-trial basis. We find the mDDM provides a better model fit to rats' decisions in the free response accumulation task than traditional DDM models. Interestingly, on each trial we observed a period of time, prior to choice, that was characterized by head immobility. The length of this period was positively correlated with the rats' decision bounds and stimuli presented during this period had the greatest impact on choice. Together these results support a model in which internal decision dynamics are reflected in movements and demonstrate that inclusion of movement parameters improves the performance of diffusion-to-bound decision models.
Affiliation(s)
- Gary A. Kane
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston MA
- Ryan A. Senne
- Graduate Program in Neuroscience, Boston University, Boston MA
- Benjamin B. Scott
- Department of Psychological and Brain Sciences and Center for Systems Neuroscience, Boston University, Boston MA
3
Undermatching Is a Consequence of Policy Compression. J Neurosci 2023; 43:447-457. [PMID: 36639891 PMCID: PMC9864556 DOI: 10.1523/jneurosci.1003-22.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022]
Abstract
The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch or bias choices toward the poorer option. Overmatching, or the tendency to bias choices toward the richer option, is rarely observed. Despite the ubiquity of undermatching, it has received an inadequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which the policy of an agent is state dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, using mouse behavioral data (male), we validate a novel prediction about which task conditions exaggerate undermatching. Finally, in patients with Parkinson's disease (male and female), we argue that a reduction in undermatching with higher dopamine levels is consistent with an increased policy complexity.
Significance Statement: The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of reward received. For example, if option a yields twice as much reward as option b, matching states that agents will choose option a twice as much. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
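The matching relation described in this abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the widely used generalized matching form, in which the choice ratio equals a bias term times the reward ratio raised to a sensitivity exponent. The names `choice_ratio`, `sensitivity`, and `bias` are hypothetical. A sensitivity of 1 gives strict matching, and a sensitivity below 1 gives undermatching, where the poorer option is chosen more often than strict matching predicts.

```python
def choice_ratio(reward_ratio, sensitivity=1.0, bias=1.0):
    """Predicted ratio of choices (a : b) under an assumed generalized matching form."""
    if reward_ratio <= 0:
        raise ValueError("reward_ratio must be positive")
    return bias * reward_ratio ** sensitivity


# Option a yields twice as much reward as option b (the abstract's example).
perfect = choice_ratio(2.0)                 # strict matching: a is chosen twice as often
under = choice_ratio(2.0, sensitivity=0.8)  # illustrative exponent, not a value from the paper
print(perfect, under)
```

With the illustrative exponent, the predicted preference for the richer option falls between indifference (a ratio of 1) and strict matching (a ratio of 2), which is the pattern the abstract calls undermatching.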
4
Food cue regulation of AGRP hunger neurons guides learning. Nature 2021; 595:695-700. [PMID: 34262177 PMCID: PMC8522184 DOI: 10.1038/s41586-021-03729-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 11/13/2019] [Accepted: 06/15/2021] [Indexed: 02/04/2023]
Abstract
Agouti-related peptide (AGRP)-expressing neurons are activated by fasting; this causes hunger [1-4], an aversive state that motivates the seeking and consumption of food [5,6]. Eating returns AGRP neuron activity towards baseline on three distinct timescales: rapidly and transiently following sensory detection of food cues [6-8], slowly and longer-lasting in response to nutrients in the gut [9,10], and even more slowly and permanently with restoration of energy balance [9,11]. The rapid regulation by food cues is of particular interest as its neurobiological basis and purpose are unknown. Given that AGRP neuron activity is aversive [6], the sensory cue-linked reductions in activity could function to guide behaviour. To evaluate this, we first identified the circuit mediating sensory cue inhibition and then selectively perturbed it to determine function. Here, we show that a lateral hypothalamic glutamatergic → dorsomedial hypothalamic GABAergic (γ-aminobutyric acid-producing) [12] → AGRP neuron circuit mediates this regulation. Interference with this circuit impairs food cue inhibition of AGRP neurons and, notably, greatly impairs learning of a sensory cue-initiated food-acquisition task. This is specific for food, as learning of an identical water-acquisition task is unaffected. We propose that decreases in aversive AGRP neuron activity [6] mediated by this food-specific circuit increase the incentive salience [13] of food cues, and thus facilitate the learning of food-acquisition tasks.
5
Machado A, Vasconcelos M. Dissolving the molar-molecular controversy. J Exp Anal Behav 2021; 115:596-603. [PMID: 33497470 DOI: 10.1002/jeab.675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2020] [Revised: 11/26/2020] [Accepted: 12/09/2020] [Indexed: 11/07/2022]
6
Killeen PR. Moles and Molecules. J Exp Anal Behav 2021; 115:584-595. [PMID: 33428792 DOI: 10.1002/jeab.667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2020] [Revised: 11/18/2020] [Accepted: 11/19/2020] [Indexed: 11/06/2022]
7
Limited evidence for probability matching as a strategy in probability learning tasks. Psychology of Learning and Motivation 2021. [DOI: 10.1016/bs.plm.2021.02.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
8
Shimp CP. Molecular (moment-to-moment) and molar (aggregate) analyses of behavior. J Exp Anal Behav 2020; 114:394-429. [DOI: 10.1002/jeab.626] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/21/2020] [Revised: 08/24/2020] [Accepted: 08/26/2020] [Indexed: 11/06/2022]
9
Hachiga Y, Schwartz LP, Tripoli C, Michaels S, Kearns D, Silberberg A. Like chimpanzees (Pan troglodytes), pigeons (Columba livia domestica) match and Nash equilibrate where humans (Homo sapiens) do not. J Comp Psychol 2018; 133:197-206. [PMID: 30372107 DOI: 10.1037/com0000144] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Martin, Bhui, Bossaerts, Matsuzawa, and Camerer (2014) found that chimpanzee pairs competing in matching-pennies games achieved the Nash equilibrium whereas human pairs did not. They hypothesized this outcome may be due to (a) chimpanzee ecology producing evolutionary changes that give them a cognitive advantage over humans in these games, and (b) humans being disadvantaged because the cognition necessary for optimal game play was traded off in evolution to support language. We provide data relevant to their hypotheses by exposing pairs of pigeons to the same games. Pigeons also achieved the Nash equilibrium, but did so while also conforming with the matching law prediction on concurrent schedules where choice ratios covary with reinforcer ratios. The cumulative effects model, which produces matching on concurrent schedules, also achieved the Nash equilibrium when it was simulated on matching-pennies games. The empirical and simulated compatibility between matching law and Nash equilibrium predictions can be explained in two ways. Choice to concurrent schedules, where matching obtains, and choice in game play, where the Nash equilibrium is achieved, may reflect the operation of a common process in choice (e.g., reinforcer maximization) for which matching and achieving the Nash equilibrium are derivative. Alternatively, if matching in choice is innate as some accounts argue, then achieving the Nash equilibrium may be an epiphenomenon of matching. Regardless, the wide species generality of matching relations in nonhuman choice suggests game play in chimpanzees would not prove advantaged relative to most species in the animal kingdom. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
10
The temporal dynamics of waiting when reward is increasing. Behav Processes 2018; 149:16-26. [DOI: 10.1016/j.beproc.2018.01.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/06/2017] [Revised: 09/27/2017] [Accepted: 01/19/2018] [Indexed: 11/18/2022]
11
Kearns DN, Kim JS, Tunstall BJ, Silberberg A. Essential values of cocaine and non-drug alternatives predict the choice between them. Addict Biol 2017; 22:1501-1514. [PMID: 27623729 DOI: 10.1111/adb.12450] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Revised: 07/26/2016] [Accepted: 08/14/2016] [Indexed: 02/04/2023]
Abstract
This study investigated the relationship between reinforcer value and choice between cocaine and two non-drug alternative reinforcers in rats. The essential value (EV, a behavioral economic measure based on elasticity of demand) of intravenous cocaine and food (Experiment 1) or saccharin (Experiment 2) was determined in the first phase of each experiment. Food had higher EV than cocaine, whereas the EVs of cocaine and saccharin did not differ. In the second phase of each experiment, rats were allowed to make mutually exclusive choices between cocaine and the non-drug alternative reinforcer. The main findings were that the EV of cocaine was a positive predictor of cocaine preference and the EV of food or saccharin was a negative predictor of cocaine preference. An analysis of within-session patterns of choice behavior revealed sequential dependencies, whereby rats were more likely to choose cocaine on a particular trial after having chosen the non-drug alternative on previous trials. When the time between choices was increased, these sequential dependencies disappeared. The results of these experiments are consistent with the suggestion that addiction-like behavior involves both overvaluation of drug reinforcers and undervaluation of non-drug reinforcers.
Affiliation(s)
- David N. Kearns
- Psychology Department, American University, Washington DC, USA
- Jung S. Kim
- Psychology Department, American University, Washington DC, USA
- Brendan J. Tunstall
- Intramural Research Program, National Institute on Drug Abuse, Baltimore MD, USA
- Alan Silberberg
- Psychology Department, American University, Washington DC, USA
13
Kupalov Conditioning: Molecular Control of Response Sequences. Psychological Record 2017. [DOI: 10.1007/bf03394763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
14
Abstract
The history of psychology is full of disputes among various “-isms”: behaviorism, cognitivism, functionalism, and many others. Nevertheless, all are unanimous in their opposition to one other -ism: reductionism. From Skinner to Simon, there is tacit agreement that behavior (or mind) is a subject matter in its own right that need not, perhaps cannot, be “reduced to” neurophysiology. This consensus has begun to crack in recent decades, with advances in neurobiology and the growth of understanding of the properties of brainlike theoretical systems. What, then, is the status of the study of behavior in its own right? This paper proposes a framework in which realtime theoretical models provide the link between behavioral research and the structure and function of the nervous system. We argue that such models arise most naturally from studies at the behavioral level, especially when the behavior under study depends on context and remote past history, as in learning and memory. We conclude that Skinner was probably right to argue that behavior must be understood in its own right before we can expect to understand brain—behavior relations. But he was wrong in limiting behavioral science to descriptive laws and catalogs of input-output relationships.
15
Tanno T. Response-bout analysis of interresponse times in variable-ratio and variable-interval schedules. Behav Processes 2016; 132:12-21. [DOI: 10.1016/j.beproc.2016.09.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/18/2016] [Revised: 09/05/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]
16
Abstract
Using a concurrent-chains procedure, the present study examined preference of monkeys between forced-choice and free-choice with multiple alternatives. The forced-choice was followed by the reinforcer with a probability of .8. The free-choice was between reinforcement alternatives of high probability (p = .8) and low probability (p = .5) in which the monkeys mostly chose the former. The number of keys with each kind of alternative in the free-choice was manipulated under four conditions. Preference for free-choice increased with the number of keys for the preferred alternative. However, the number of keys for the less-preferred alternative had no influence on preference. Effects of the less-preferred alternative on preference were inconsistent with previous studies using pigeons.
17
Willmore CB. The Cognitive Effect Profiles of NMDA Receptor Modulating Drugs are Resolvable If Stimulus Complexity Is Varied in a Number Discernment Task. Behav Cogn Neurosci Rev 2016. [DOI: 10.1177/1534582303002002004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
Number discernment is at the heart of task accuracy for laboratory animals performing Fixed Consecutive Number (FCN) operant tasks. Narrow-limit FCN tasks, in particular, are useful for measuring working memory in rat subjects because performance efficacy, which is set up to concord with food delivery, depends on a fairly precise quantification of cues generated by the rat's ongoing behavior. Reported here is a behavioral pharmacology study that utilized a group of overtrained and FCN-schedule-compliant rats injected in a randomized series of testing sessions with different types of N-methyl-D-aspartate (NMDA) receptor modulating drugs. Modifications made to the narrow-limit FCN schedule permitted a simultaneous measure of drug-induced compromises in subjects' sensory integrative or motor coordinating capabilities. This highly sensitive model implicated the intrachannel and the glutamate recognition NMDA receptor binding sites as prime mediators of NMDA antagonist associated memory impairments because drugs acting at the mentioned sites lowered counting efficacy without altering sensorimotor function.
18
Hachiga Y, Sakagami T, Silberberg A. Preference pulses and the win-stay, fix-and-sample model of choice. J Exp Anal Behav 2015; 104:274-95. [DOI: 10.1002/jeab.170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/17/2014] [Accepted: 09/01/2015] [Indexed: 11/10/2022]
20
The copyist model and the shaping view of reinforcement. Behav Processes 2015; 114:72-7. [DOI: 10.1016/j.beproc.2015.02.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/31/2014] [Revised: 02/05/2015] [Accepted: 02/18/2015] [Indexed: 11/16/2022]
21
Catania AC, Reilly MP, Hand D, Kehle LK, Valentine L, Shimoff E. A quantitative analysis of the behavior maintained by delayed reinforcers. J Exp Anal Behav 2015; 103:288-31. [DOI: 10.1002/jeab.138] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
22
Abstract
In the Monty Hall dilemma, humans are initially given a choice among three alternatives, one of which has a hidden prize. After choosing, but before it is revealed whether they have won the prize, they are shown that one of the remaining alternatives does not have the prize. They are then asked whether they want to stay with their original choice or switch to the remaining alternative. Although switching results in obtaining the prize two thirds of the time, humans consistently fail to adopt the optimal strategy of switching even after considerable training. Interestingly, there is evidence that pigeons show more optimal switching performance with this task than humans. Because humans often view even random choices already made as being more valuable than choices not made, we reasoned that if pigeons made a greater investment, it might produce an endowment or ownership effect resulting in more human-like suboptimal performance. On the other hand, the greater investment in the initial choice by the pigeons might facilitate switching behavior by helping them to better discriminate their staying versus switching behavior. In Experiment 1, we examined the effect of requiring pigeons to make a greater investment in their initial choice (20 pecks rather than the usual 1 peck). We found that the increased response requirement facilitated acquisition of the switching response. In Experiment 2, we showed that facilitation of switching due to the increased response requirement did not result from extinction of responding to the initially chosen location.
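The two-thirds figure quoted in this abstract can be checked by enumerating every case. The sketch below is an illustration rather than anything from the study: it assumes the standard rules (the prize location and first pick are uniform over three doors, and the host opens a non-prize, non-chosen door, picking at random when two are available). The function name `win_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def win_probability(switch):
    """Exact probability of winning the prize when always switching or always staying."""
    doors = range(3)
    wins = Fraction(0)
    for prize, pick in product(doors, repeat=2):
        case_prob = Fraction(1, 9)  # prize and first pick are independent and uniform
        # The host may open any door that is neither the first pick nor the prize.
        openable = [d for d in doors if d != pick and d != prize]
        for opened in openable:
            if switch:
                final = next(d for d in doors if d not in (pick, opened))
            else:
                final = pick
            if final == prize:
                wins += case_prob / len(openable)
    return wins


print("switch:", win_probability(True))
print("stay:  ", win_probability(False))
```

Under these assumptions, switching wins whenever the first pick was wrong, which is why its probability is twice that of staying.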
23
Greenwald MK, Ledgerwood DM, Lundahl LH, Steinmiller CL. Effect of experimental analogs of contingency management treatment on cocaine seeking behavior. Drug Alcohol Depend 2014; 139:164-8. [PMID: 24685561 PMCID: PMC5532806 DOI: 10.1016/j.drugalcdep.2014.03.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/24/2013] [Revised: 03/07/2014] [Accepted: 03/07/2014] [Indexed: 11/20/2022]
Abstract
BACKGROUND Contingency management (CM) treatment is effective for treating cocaine dependence but further mechanistic studies of its efficacy are warranted. This study aimed to determine whether: (a) higher vs. lower predictable money amounts ($3 vs. $1; analogs of standard voucher-based CM) increase cocaine demand elasticity; and (b) probabilistic amounts matched for expected value with the $3-predictable amount (50% chance of $6; 25% chance of $12; and 12.5% chance of $24; analogs of prize CM) similarly affect cocaine choice. METHODS Each of 15 cocaine-dependent participants first completed a qualifying session to ensure that intranasal cocaine functioned as a reinforcer, then completed a 10-session, within-subject, randomized crossover study. During each of the 10 sessions, the participant responded on a progressive ratio schedule to earn units of cocaine (5-mg or 10-mg) and/or money (five monetary conditions above). RESULTS During the reinforcement qualifying session (10-mg vs. 0-mg units; no money alternative), cocaine choice was high. The $3-predictable amount significantly decreased cocaine choice relative to both the $1-predictable amount and the qualifying session. Cocaine-choices in the probabilistic conditions were similar to the $3 predictable condition. CONCLUSIONS These findings indicate that CM interventions targeted at reducing cocaine self-administration are more likely to succeed with higher value non-drug reinforcement.
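The claim that the probabilistic amounts are matched for expected value with the $3 predictable amount is simple arithmetic, shown below as a quick check. The probabilities and dollar amounts come from the abstract; the dictionary layout and the variable names `conditions` and `expected_values` are only illustrative.

```python
from fractions import Fraction

# (probability of payout, payout in dollars) for each condition named in the abstract
conditions = {
    "$3 predictable": (Fraction(1), 3),
    "50% chance of $6": (Fraction(1, 2), 6),
    "25% chance of $12": (Fraction(1, 4), 12),
    "12.5% chance of $24": (Fraction(1, 8), 24),
}

# Expected value = probability x amount
expected_values = {name: p * amount for name, (p, amount) in conditions.items()}

for name, ev in expected_values.items():
    print(f"{name}: expected value = ${ev}")
```

Each condition works out to the same expected value, so differences in cocaine choice between them would reflect the probabilistic delivery itself rather than the average amount of money on offer.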
Affiliation(s)
- Mark K Greenwald
- Substance Abuse Research Division, Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI 48201, USA
- David M Ledgerwood
- Substance Abuse Research Division, Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Leslie H Lundahl
- Substance Abuse Research Division, Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Caren L Steinmiller
- Substance Abuse Research Division, Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI 48201, USA; Department of Pharmacology, University of Toledo, Toledo, OH 43614, USA
26
Crawford LL. Behavior analysis takes a field trip: a review of Krebs and Davies' Behavioural Ecology: An Evolutionary Approach. J Exp Anal Behav 2013. [DOI: 10.1901/jeab.1986.46-395] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
28
Hinson JM. Seeking the natural lines of fracture: a review of Thompson and Zeiler's Analysis and Integration of Behavioral Units. J Exp Anal Behav 2013. [DOI: 10.1901/jeab.1987.47-133] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
30
McDowell JJ, Popa A, Calvin NT. Selection dynamics in joint matching to rate and magnitude of reinforcement. J Exp Anal Behav 2012; 98:199-212. [PMID: 23008523 PMCID: PMC3449856 DOI: 10.1901/jeab.2012.98-199] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 05/13/2012] [Accepted: 07/03/2012] [Indexed: 11/22/2022]
Abstract
Virtual organisms animated by a selectionist theory of behavior dynamics worked on concurrent random interval schedules where both the rate and magnitude of reinforcement were varied. The selectionist theory consists of a set of simple rules of selection, recombination, and mutation that act on a population of potential behaviors by means of a genetic algorithm. An extension of the power function matching equation, which expresses behavior allocation as a joint function of exponentiated reinforcement rate and reinforcer magnitude ratios, was fitted to the virtual organisms' data, and over a range of moderate mutation rates was found to provide an excellent description of their behavior without residual trends. The mean exponents in this range of mutation rates were 0.83 for the reinforcement rate ratio and 0.68 for the reinforcer magnitude ratio, which are values that are comparable to those obtained in experiments with live organisms. These findings add to the evidence supporting the selectionist theory, which asserts that the world of behavior we observe and measure is created by evolutionary dynamics.
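The abstract does not reproduce the fitted equation. One common way to write a joint power-function matching relation, offered here as an assumption rather than the authors' exact notation, is shown below. The symbols are hypothetical: $B_i$ is behavior allocated to alternative $i$, $r_i$ its reinforcement rate, $m_i$ its reinforcer magnitude, $b$ a bias term, and $a_r$, $a_m$ the two exponents.

```latex
\frac{B_1}{B_2} = b \left(\frac{r_1}{r_2}\right)^{a_r} \left(\frac{m_1}{m_2}\right)^{a_m}

% Worked example with the mean exponents reported in the abstract
% (a_r = 0.83, a_m = 0.68), assuming no bias (b = 1) and both ratios equal to 2:
\frac{B_1}{B_2} = 2^{0.83} \cdot 2^{0.68} = 2^{1.51} \approx 2.85
```

Under this assumed form, strict matching to both variables ($a_r = a_m = 1$) would give a ratio of 4 in the same example, so exponents below 1 correspond to undermatching on both dimensions.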
Affiliation(s)
- J J McDowell
- Department of Psychology, Emory University, Atlanta, GA 30322, USA.
31
Misak P, Cleaveland JM. Preference as a function of active interresponse times: a test of the active time model. J Exp Anal Behav 2012; 96:215-25. [PMID: 21909165 DOI: 10.1901/jeab.2011.96-215] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/10/2010] [Accepted: 05/25/2011] [Indexed: 10/28/2022]
Abstract
In this article, we describe a test of the active time model for concurrent variable interval (VI) choice. The active time model (ATM) suggests that the time since the most recent response is one of the variables controlling choice in concurrent VI VI schedules of reinforcement. In our experiment, pigeons were trained in a multiple concurrent similar to that employed by Belke (1992), with VI 20-s and VI 40-s schedules in one component, and VI 40-s and VI 80-s schedules in the other component. However, rather than use a free-operant design, we used a discrete-trial procedure that restricted interresponse times to a range of 0.5-9.0 s. After 45 sessions of training, unreinforced probe periods were mixed with reinforced training periods. These probes paired the two stimuli associated with the VI 40-s schedules. Further, the probes were defined such that during their occurrence, interresponse times were either "short" (0.5-3.0 s) or "long" (7.5-9.0 s). All pigeons showed a preference for the stimulus associated with the relatively rich VI 40-s schedule, a result mirroring that of Belke. We also observed, though, that this preference was more extreme during long probes than during short probes, a result predicted by ATM.
Affiliation(s)
- Paul Misak
- Vassar College, Poughkeepsie, NY 12603, USA
33
Abstract
Can the output of human cognition be predicted from the assumption that it is an optimal response to the information-processing demands of the environment? A methodology called rational analysis is described for deriving predictions about cognitive phenomena using optimization assumptions. The predictions flow from the statistical structure of the environment and not the assumed structure of the mind. Bayesian inference is used, assuming that people start with a weak prior model of the world which they integrate with experience to develop stronger models of specific aspects of the world. Cognitive performance maximizes the difference between the expected gain and cost of mental effort. (1) Memory performance can be predicted on the assumption that retrieval seeks a maximal trade-off between the probability of finding the relevant memories and the effort required to do so; in (2) categorization performance there is a similar trade-off between accuracy in predicting object features and the cost of hypothesis formation; in (3) causal inference the trade-off is between accuracy in predicting future events and the cost of hypothesis formation; and in (4) problem solving it is between the probability of achieving goals and the cost of both external and mental problem-solving search. The implementation of these rational prescriptions in neurally plausible architecture is also discussed.
39
Deciding when to explore and when to persist: a comparison of honeybees and bumblebees in their response to downshifts in reward. Behav Ecol Sociobiol 2010. [DOI: 10.1007/s00265-010-1047-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 02/03/2023]
40
Abstract
When a pigeon's choices between two keys are probabilistically reinforced, as in discrete trial probability learning procedures and in concurrent variable-interval schedules, the bird tends to maximize, or to choose the alternative with the higher probability of reinforcement. In concurrent variable-interval schedules, steady-state matching, which is an approximate equality between the relative frequency of a response and the relative frequency of reinforcement of that response, has previously been obtained only as a consequence of maximizing. In the present experiment, maximizing was impossible. A choice of one of two keys was reinforced only if it formed, together with the three preceding choices, the sequence of four successive choices that had occurred least often. This sequence was determined by a Bernoulli-trials process with parameter p. Each of three pigeons matched when p was (1/2) or (1/4). Therefore, steady-state matching by individual birds is not always a consequence of maximizing. Choice probability varied between successive reinforcements, and sequential statistics revealed dependencies which were adequately described by a Bernoulli-trials process with p depending on the time since the preceding reinforcement.
41
Abstract
Pigeons were trained to peck at red or green keys presented simultaneously in discrete trials. In one experiment, reinforcements were arranged by concurrent variable-interval schedules. The proportion of responses to green approximately matched the proportion of reinforcements produced by pecking green. Detailed analysis of responding revealed a systematic decrease in the probability of switching from green to red within sequences of trials after reinforcement. This trend corresponded to sequential changes in the relative frequency of reinforcement, and not to sequential changes in probability of reinforcement. In a second experiment, reinforcements were scheduled probabilistically every seventh trial. Even though there were no contingencies on pecking during the first six post-reinforcement trials, choices of green on the first response after reinforcement matched the proportion of reinforcements for pecking green. These results extend the generality of overall matching under concurrent reinforcement.
42
Abstract
Six pigeons pecked for food in a three-key experiment. A subject at any time could choose the left or right key and receive reinforcement according to one two-key concurrent variable-interval variable-interval schedule of reinforcement, or it could peck the center key. A peck on the center key arranged the complementary two-key concurrent variable-interval variable-interval schedule on the left and right keys. The two different two-key concurrent schedules arranged reinforcements concurrently and were signalled by two different colors of key lights. Choice behavior in the presence of a given color conformed to the usual relationship in two-key concurrent schedules: the relative frequency of responding on a key approximately equalled the relative frequency of reinforcement on that key. Preference for a two-key concurrent schedule, which was equivalent to preference for a color, was measured by the percentage of all responses on the left and right keys in the presence of that color: this percentage approximately equalled the percentage of all reinforcements that were delivered in the presence of that color. Thus, choice between concurrent schedules conforms approximately to the same relationship as does choice between alternatives in a single concurrent schedule.
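The two matching relations described in this abstract can be illustrated with a minimal sketch. The counts below are hypothetical, chosen only to show the arithmetic; they are not data from the study, and all function and variable names are assumptions introduced here.

```python
def relative_frequency(first, second):
    """Proportion of events that fall on the first of two alternatives."""
    total = first + second
    if total == 0:
        raise ValueError("no events recorded")
    return first / total


# Hypothetical counts (NOT from the study): (left key, right key) per key-light color.
responses = {"red": (600, 200), "green": (150, 50)}
reinforcers = {"red": (45, 15), "green": (12, 8)}


def within_schedule(color):
    """Relative responding and relative reinforcement on the left key for one color."""
    return (relative_frequency(*responses[color]),
            relative_frequency(*reinforcers[color]))


def between_schedules(color):
    """Share of all responses and of all reinforcers that occurred under one color."""
    other = "green" if color == "red" else "red"
    resp = {c: sum(v) for c, v in responses.items()}
    rein = {c: sum(v) for c, v in reinforcers.items()}
    return (relative_frequency(resp[color], resp[other]),
            relative_frequency(rein[color], rein[other]))


if __name__ == "__main__":
    for color in responses:
        print(color, within_schedule(color), between_schedules(color))
```

Under approximate matching, the two numbers in each returned pair would be close to one another, both within a single concurrent schedule and between the two schedules.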
43
Abstract
Three pigeons received training on multiple variable-interval schedules with brief alternating components, concurrently with a fixed-interval schedule of food reinforcement on a second key. Fixed-interval performance exhibited typical increases in rate within the interval, and was independent of multiple-schedule responding. Responding on the multiple-schedule key decreased as a function of proximity to reinforcement on the fixed-interval key. The overall relative rate of responding in one component of the multiple schedule roughly matched the overall relative rate of reinforcement. Within the fixed interval, response rate during one multiple-schedule component was a monotonic, negatively accelerated function of response rate during the other component. To a first approximation, the data were described by a power function, where the exponent depended on the relative rate of reinforcement obtained in the two components. The relative rate of responding in one component of the multiple schedule increased as a function of proximity to fixed-interval reinforcement, and often exceeded the overall obtained relative rate of reinforcement. The form of the function relating response rates is discussed in relation to findings on rate-dependent effects of drugs, chaining, and the relation between response rate and reinforcement rate in single-schedule conditions.
44
Bradshaw CM, Szabadi E, Bevan P. Relationship between response rate and reinforcement frequency in variable-interval schedules: the effect of the concentration of sucrose reinforcement. J Exp Anal Behav 1978; 29:447-52. [PMID: 16812068 PMCID: PMC1332842 DOI: 10.1901/jeab.1978.29-447]
Abstract
Four rats were exposed to variable-interval schedules specifying a range of different reinforcement frequencies, using sucrose of two different concentrations and distilled water as the reinforcer. With sucrose, the rates of responding of all four rats were increasing negatively accelerated functions of reinforcement frequency, the data conforming closely to Herrnstein's equation; this was also true of the data from three of the four rats when distilled water was used as the reinforcer. The values of both constants in Herrnstein's equation were related to the sucrose concentration: the asymptotic response rate decreased, and the reinforcement frequency corresponding to the half-maximal response rate increased, with decreasing sucrose concentration.
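For orientation, Herrnstein's equation is commonly written as the hyperbola below. The symbols are the conventional ones and are an assumption here, since the abstract does not give its notation: $R$ is response rate, $r$ is reinforcement frequency, $k$ is the asymptotic response rate, and $r_e$ is the reinforcement frequency at which responding is half-maximal.

```latex
R = \frac{k\,r}{r + r_e},
\qquad
R\big|_{r = r_e} = \frac{k\,r_e}{2\,r_e} = \frac{k}{2}
```

In these terms, the reported effect of decreasing sucrose concentration corresponds to a smaller $k$ and a larger $r_e$.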
45
Abstract
Pigeons were studied in an experiment involving two concurrently available response keys. Conditions were such that in the first condition the predictions of melioration (Herrnstein & Vaughan, 1980), minimization of deviation from matching, and maximization were identical: relative time on the right key should have fallen between .125 and .25, which in fact occurred. In the second condition, melioration predicted a shift in relative time on the right to between .75 and .875, which would involve a transient deviation from matching as well as a substantial drop in rate of reinforcement. All three birds eventually shifted their distribution of behavior to within the range predicted by melioration.
46
47
Abstract
Lever pressing by two squirrel monkeys was maintained under a variable-interval 60-second schedule of food presentation. When response-dependent electric shock was made contingent on comparatively long interresponse times, response rate increased, and further increases were obtained when the minimum interresponse-time requirement was decreased. When an equal proportion of responses produced shock without regard to interresponse time, rates decreased. Thus, shock contingent on long interresponse times selectively decreased the relative frequency of those interresponse times, and increased the relative frequency of shorter interresponse times, whereas shock delivered independent of interresponse times decreased the relative frequency of shorter interresponse times while increasing the frequency of longer ones. The results provide preliminary evidence that interresponse times may be differentiated by punishment, further supporting the notion that interresponse times may be considered functional units of behavior.
48
Abstract
We present a classification and theoretical analysis of discrete-trial and free-operant choice procedures in which reinforcement is assigned to one alternative only, or independently to both, is either always available or conditionally available, and is either "held" or not from trial to trial. Momentary-maximizing and (globally) optimal choice sequences are defined in terms of initializing and marker events. Free-operant choice is analyzed in terms of a clock space whose axes are the times since the last A and B choices. The analysis shows that most molar matching data are derivable from momentary maximizing, and that the momentary-maximizing hypothesis has not been adequately tested in either discrete-trial or free-operant situations.
49
Abstract
In simple situations, animals consistently choose the better of two alternatives. On concurrent variable-interval variable-interval and variable-interval variable-ratio schedules, they approximately match aggregate choice and reinforcement ratios. The matching law attempts to explain the latter result but does not address the former. Hill-climbing rules such as momentary maximizing can account for both. We show that momentary maximizing constrains molar choice to approximate matching; that molar choice covaries with pigeons' momentary-maximizing estimate; and that the "generalized matching law" follows from almost any hill-climbing rule.
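For reference, the generalized matching law named in this abstract is usually stated in the logarithmic form below. The notation is the conventional one and is an assumption here, not taken from the abstract: $B_1, B_2$ are the response (or time) totals on the two alternatives, $R_1, R_2$ are the obtained reinforcement totals, $a$ is sensitivity, and $b$ is bias.

```latex
\log\frac{B_1}{B_2} = a\,\log\frac{R_1}{R_2} + \log b
```

Strict matching is the special case $a = 1$, $b = 1$, which reduces to $B_1/B_2 = R_1/R_2$.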
50
Abstract
A case history illustrates how one research program in the experimental analysis of behavior evolved somewhat differently from the modal research program represented in this journal. A chief issue that seems to be responsible for this difference is the role attributed to theory in behavioral research: Skinner's views on the nature and function of theory and on the nature of observation combine to produce a certain kind of picture of behavior. The classic conception of reinforcement contingencies is tied to this particular picture. But this picture may be incompatible with, and certainly is different from, other possible pictures. Reinforcement contingencies that place greater emphasis on the local temporal patterning of behavior seem tied to some of these alternative pictures of what behavior is. These other pictures encourage a wide range of theoretical approaches, including cognitive ones, various kinds of mathematical analyses, and computer-simulation methods to characterize entire behavior streams. In the future, perhaps the experimental analysis of behavior will accept a somewhat different range of views on the nature and function of theory, a correspondingly different set of experimental methods, and alternative ways of talking about behavior.