1. Ghita RS. Do MTurkers collude in interactive online experiments? Behav Res Methods 2023. DOI: 10.3758/s13428-023-02220-3. PMID: 37658256.
Abstract
One issue that can threaten the internal validity of interactive online experiments recruiting through crowdsourcing platforms is collusion: participants may act on information shared through channels external to the experimental design. Using two experiments, I measure how prevalent collusion is among MTurk workers and whether it depends on experimental design choices. Despite having incentives to collude, MTurk workers showed no evidence of collusion in the treatments that resembled the design of most other interactive online experiments, suggesting that collusion is not a data quality concern in typical studies of this kind. However, approximately 3% of MTurk workers colluded when the payoff from collusion was unusually high. Collusion should therefore not be overlooked as a threat to data validity in interactive experiments whose participants have strong incentives to engage in it.
Affiliation(s)
- Razvan S Ghita
- Department of Business and Management, University of Southern Denmark, Universitetsparken 1, Kolding, 6000, Denmark

2.
Abstract
We examine key aspects of data quality for online behavioral research across selected platforms (Amazon Mechanical Turk, CloudResearch, and Prolific) and panels (Qualtrics and Dynata). To identify the key aspects of data quality, we first engaged with the behavioral research community and found that the aspects most critical to researchers include attention, comprehension, honesty, and reliability. We then explored differences in these aspects in two studies (N ~ 4,000), with and without data quality filters (approval ratings). We found considerable differences between the sites, especially in comprehension, attention, and dishonesty. In Study 1 (without filters), only Prolific provided high data quality on all measures. In Study 2 (with filters), CloudResearch and Prolific provided high data quality; MTurk showed alarmingly low data quality even with data quality filters. We also found that while reputation (approval rating) did not predict data quality, frequency and purpose of usage did, especially on MTurk: the lowest data quality came from participants who report using the site as their main source of income but spend few hours on it per week. We provide a framework for future investigation into the ever-changing nature of data quality in online research and into how the evolving set of platforms and panels performs on these key aspects.
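
As a toy illustration of the kind of screening comparison this abstract describes, the Python sketch below computes per-platform pass rates on attention, comprehension, and honesty checks. The file name and column names are hypothetical, not from the study's materials.

```python
# Illustrative only: per-platform pass rates on common data quality checks.
# Assumes a hypothetical export with one row per participant and 0/1 columns
# for each check; none of these names come from the paper.
import pandas as pd

df = pd.read_csv("responses.csv")

checks = ["attention_pass", "comprehension_pass", "honesty_pass"]
summary = (
    df.groupby("platform")[checks]
      .mean()                      # proportion of participants passing each check
      .round(3)
      .sort_values("attention_pass", ascending=False)
)
print(summary)
```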

3.
Abstract
Non-experts have long made important contributions to machine learning (ML) by contributing training data, and recent work has shown that non-experts can also help with feature engineering by suggesting novel predictive features. However, non-experts have so far only contributed features to prediction tasks already posed by experienced ML practitioners. Here we study how non-experts can design prediction tasks themselves, what types of tasks they will design, and whether predictive models can be automatically trained on data sourced for those tasks. We use a crowdsourcing platform where non-experts design predictive tasks that are then categorized and ranked by the crowd. Crowdsourced data are collected for top-ranked tasks, and predictive models are then trained and evaluated automatically on those data. We show that individuals without ML experience can collectively construct useful datasets and that predictive models can be learned from these datasets, but challenges remain. The prediction tasks designed by non-experts covered a broad range of domains, from politics and current events to health behavior, demographics, and more. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions influence the types of prediction tasks being proposed. In general, a better understanding of how non-experts can contribute to ML can further leverage advances in automated machine learning and has important implications as ML continues to drive workplace automation.
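
The "trained and evaluated automatically" step can be pictured with a minimal Python sketch: fit a generic baseline model to crowd-collected data for one task, with no task-specific tuning. This is an illustration of the general technique, not the authors' pipeline; the file name, column names, and model choice are assumptions.

```python
# A minimal sketch of automatic training/evaluation on a crowdsourced dataset.
# "task_42_responses.csv" and the "target" column are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = pd.read_csv("task_42_responses.csv")      # crowd-collected dataset
X = pd.get_dummies(data.drop(columns="target"))  # one-hot encode categorical features
y = data["target"]

# Generic baseline model: no per-task feature engineering or tuning.
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5)      # automatic evaluation
print(f"mean CV accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```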
Affiliation(s)
- James P. Bagrow
- Mathematics & Statistics, University of Vermont, Burlington, VT, USA
- Vermont Complex Systems Center, University of Vermont, Burlington, VT, USA

4. Lindner P, Ramnerö J, Ivanova E, Carlbring P. Studying Gambling Behaviors and Responsible Gambling Tools in a Simulated Online Casino Integrated With Amazon Mechanical Turk: Development and Initial Validation of Survey Data and Platform Mechanics of the Frescati Online Research Casino. Front Psychiatry 2020; 11:571954. DOI: 10.3389/fpsyt.2020.571954. PMID: 33613331; PMCID: PMC7892621.
Abstract
Introduction: Online gambling, popular among both problem and recreational gamblers, entails both heightened addiction risks and unique opportunities for prevention and intervention. There is a need to bridge the growing literature on learning and extinction mechanisms of gambling behavior with account-tracking studies that use real-life gambling data. In this study, we describe the development and validation of the Frescati Online Research Casino (FORC): a simulated online casino where games, visual themes, outcome sizes, probabilities, and other variables of interest can be experimentally manipulated to conduct behavior-analytic studies and evaluate the efficacy of responsible gambling tools.
Methods: FORC features an initial survey for self-reporting of gambling and gambling problems, along with several games resembling regular real-life casino games, designed to allow Pavlovian and instrumental learning. FORC was developed with maximum flexibility in mind, allowing detailed experiment specification, including the display of messages, by setting parameters through an online interface. To allow convenient and rapid data collection from diverse samples, FORC is independently hosted yet integrated with the popular crowdsourcing platform Amazon Mechanical Turk through a reimbursement key mechanism. To validate FORC's survey data quality and game mechanics, n = 101 participants were recruited; they answered a questionnaire on gambling habits and problems, then played both slot machine and card-draw games. Questionnaire and trial-by-trial behavioral data were analyzed using standard psychometric tests and outcome distribution modeling.
Results: The expected associations among variables in the introductory questionnaire were found, along with good psychometric properties, suggesting good quality data. Only 6% of participants provided seemingly poor behavioral data. Game mechanics worked as intended: gambling outcomes showed the expected pattern of random sampling with replacement and were normally distributed around the set percentages, while balances developed according to the set return-to-player rate.
Conclusions: FORC appears to be a valid paradigm for simulating online gambling and for collecting survey and behavioral data, offering a valuable compromise between stringent experimental paradigms with lower external validity and real-world gambling account-tracking data with lower internal validity.
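
The outcome validation described in the Results (random sampling with replacement, with balances tracking a configured return-to-player rate) can be illustrated with a toy simulation. This is not FORC's actual code; the outcome table and RTP value are assumed for the example.

```python
# Toy slot-machine simulation: outcomes are sampled independently with
# replacement, so the observed RTP should converge on the configured rate.
import random

RTP = 0.90          # configured return-to-player rate (assumed value)
STAKE = 1.0
N_SPINS = 100_000

# Outcome table chosen so expected payout per spin = STAKE * RTP:
# a 10% chance of a 9x win gives 0.10 * 9.0 = 0.90.
outcomes = [(0.10, 9.0), (0.90, 0.0)]  # (probability, payout multiplier)

balance = 0.0
for _ in range(N_SPINS):
    balance -= STAKE
    r = random.random()
    cum = 0.0
    for p, payout in outcomes:
        cum += p
        if r < cum:
            balance += payout * STAKE
            break

# Observed RTP = total payouts / total stakes = 1 + balance / (N_SPINS * STAKE)
print(f"observed RTP: {1 + balance / (N_SPINS * STAKE):.3f} (set: {RTP})")
```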
Affiliation(s)
- Philip Lindner
- Department of Psychology, Stockholm University, Stockholm, Sweden; Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet & Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden
- Jonas Ramnerö
- Department of Psychology, Stockholm University, Stockholm, Sweden; Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet & Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden
- Ekaterina Ivanova
- Department of Psychology, Stockholm University, Stockholm, Sweden; Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet & Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden
- Per Carlbring
- Department of Psychology, Stockholm University, Stockholm, Sweden

5. Greiner Safi A, Reyes C, Jesch E, Steinhardt J, Niederdeppe J, Skurka C, Kalaji M, Scolere L, Byrne S. Comparing in person and internet methods to recruit low-SES populations for tobacco control policy research. Soc Sci Med 2019; 242:112597. DOI: 10.1016/j.socscimed.2019.112597. PMID: 31670216.
Abstract
Tobacco use and its associated consequences are much more prevalent among low-SES populations in the U.S., yet tobacco research often does not include these harder-to-reach populations. This paper compares the effectiveness and drawbacks of three methods of recruiting low-SES adult smokers in the Northeast. From a 5-year, [funding blinded] grant on the impacts of graphic warning labels on tobacco products, three separate means of recruiting low-SES adult smokers emerged: 1) in person in the field with a mobile lab vehicle, 2) in person in the field with tablet computers, and 3) online via Amazon Mechanical Turk (MTurk). We compared these methods in terms of the resulting participant demographics and the pros and cons of each approach, including quality control, logistics, cost, and engagement. Field-based methods (with a mobile lab or in person with a tablet) yielded a greater proportion of disadvantaged participants who could be biochemically verified as current smokers: 45% of the field-based sample had an annual income of <$10,000 compared to 16% of the MTurk sample, and 40-45% of the field-based sample did not complete high school compared to 2.6% of the MTurk sample. MTurk-based recruitment was substantially less expensive to operate (1/14th the cost of field-based methods), was faster, and involved less logistical coordination, though it could not provide immediate biochemical verification of current smoking status. Both MTurk and field-based methods provide access to low-SES participants; the difference lies in the proportion and the degree of disadvantage. For research and interventions where inclusion considerations or external validity with low-SES populations is critical, especially for the most disadvantaged, our research supports the use of field-based methods. It also highlights the importance of adequate funding and time to enable the recruitment and participation of these harder-to-reach populations.
Affiliation(s)
- Amelia Greiner Safi
- Department of Communication, Cornell University, 450B Mann Library Building, Ithaca, NY, 14853, USA; Master of Public Health Program, Department of Population Medicine and Diagnostic Sciences, Cornell University, S2002 Schurman Hall, Ithaca, NY, 14853, USA
- Carolyn Reyes
- Department of Communication, Cornell University, 450B Mann Library Building, Ithaca, NY, 14853, USA; Department of Agricultural Economics, Sociology, and Education, Pennsylvania State University, 111 Armsby Building, University Park, PA, 16802, USA
- Emma Jesch
- Department of Communication, Cornell University, 450B Mann Library Building, Ithaca, NY, 14853, USA; Annenberg School of Communication, The University of Pennsylvania, 3620 Walnut Street, Philadelphia, PA, 19104, USA
- Joseph Steinhardt
- Department of Advertising and Public Relations, Michigan State University, 404 Wilson Road, Office 377, East Lansing, MI, 48824, USA
- Jeff Niederdeppe
- Department of Communication, Cornell University, 450B Mann Library Building, Ithaca, NY, 14853, USA
- Christofer Skurka
- Donald P. Bellisario College of Communications, Pennsylvania State University, 222 Carnegie Building, University Park, PA, 16802, USA
- Motasem Kalaji
- Department of Communication, Cornell University, 450B Mann Library Building, Ithaca, NY, 14853, USA
- Leah Scolere
- Department of Design and Merchandising, Colorado State University, 1100 Meridian Avenue, Fort Collins, CO, 80521, USA
- Sahara Byrne
- Department of Communication, Cornell University, 450B Mann Library Building, Ithaca, NY, 14853, USA

6. Becirevic A, Critchfield TS, Reed DD. On the Social Acceptability of Behavior-Analytic Terms: Crowdsourced Comparisons of Lay and Technical Language. Behav Anal 2016; 39:305-317. DOI: 10.1007/s40614-016-0067-4. PMID: 31976979; PMCID: PMC6701255.
Abstract
Behavior analysis has a marketing problem. Although behavior analysts have speculated that technical behavior-analytic terminology hinders the dissemination of behavior analysis to outsiders, few have investigated the social acceptability of that terminology. The present paper reports the general public's reactions to technical behavioral jargon versus non-technical substitute terms that refer to applied behavior-analytic techniques. Two hundred participants, all non-behavior analysts, were recruited from Amazon Mechanical Turk and completed a survey on the social acceptability of behavioral jargon and non-technical terms. Specifically, participants rated how acceptable six pairs of terms (technical and non-technical) sounded if the corresponding treatments were to be implemented for each of 10 potential client populations that behavior analysts typically serve. Overall, members of the general public found non-technical substitute terms more acceptable than technical behavior-analytic terms, suggesting that the specialized vocabulary of behavior analysis may create hurdles to the acceptability of applied behavior-analytic services. These findings underscore the importance of systematically investigating listener behavior with respect to behavior-analytic terms.
Affiliation(s)
- Amel Becirevic
- University of Kansas, 4048 Dole Human Development Center, 1000 Sunnyside Avenue, Lawrence, KS 66045-7555 USA
- Derek D. Reed
- University of Kansas, 4048 Dole Human Development Center, 1000 Sunnyside Avenue, Lawrence, KS 66045-7555 USA