1. Szulc J, Fletcher K. Numerical versus graphical aids for decision-making in a multi-cue signal identification task. Applied Ergonomics 2024; 118:104260. PMID: 38417229. DOI: 10.1016/j.apergo.2024.104260.
Abstract
Decision aids are commonly used in tactical decision-making environments to help humans integrate base-rate and multi-cue information. However, it is important that users appropriately trust and rely on aids. Decision aids can be presented in many ways, but the literature lacks clarity over the conditions surrounding their effectiveness. This research aims to determine whether a numerical or graphical aid more effectively supports human performance, and explores the relationships between aid presentation, trust, and workload. Participants (N = 30) completed a signal-identification task that required integration of readings from a set of three dynamic gauges. Participants experienced three conditions: unaided, using a numerical aid, and using a graphical aid. The aids combined gauge and base-rate information in a statistically optimal fashion. Participants also indicated how much they trusted the system and how hard they worked during the task. Analyses explored the impact of aid condition on sensitivity, response bias, response time, trust, and workload. Both the numerical and graphical aids produced significant increases in sensitivity and trust, and significant decreases in workload, compared with the unaided condition. The difference in response time between the graphical and unaided conditions approached significance, with participants responding faster using the graphical aid without decrements in sensitivity. Significant interactions between aid and signal type indicated that both aided conditions promoted faster responding to non-hostile signals, with larger mean differences in the graphical aid condition. Practically, graphical aids in which suggestions are more salient to users may promote faster responding in tactical environments, at negligible cost to accuracy.
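For concreteness, the statistically optimal combination the aids performed can be sketched in the usual Bayesian form: the base rate supplies the prior odds, and each cue contributes a likelihood ratio. The sketch below assumes conditionally independent gauge readings and uses illustrative likelihood values, not those of the study.

```python
import math

def posterior_hostile(base_rate, likelihood_ratios):
    """Combine a base rate with independent cue likelihood ratios.

    base_rate: P(hostile) before any gauge readings are observed.
    likelihood_ratios: per-gauge P(reading | hostile) / P(reading | non-hostile).
    Returns the posterior probability that the signal is hostile.
    """
    log_odds = math.log(base_rate / (1.0 - base_rate))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)  # independent cues add in log-odds space
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical example: a 20% base rate and three gauge readings,
# each twice as likely under a hostile signal.
print(posterior_hostile(0.20, [2.0, 2.0, 2.0]))  # -> ~0.667
```

With these illustrative numbers, three cues that each double the odds raise a 20% base rate to a posterior of about .67.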
2. Dunning RE, Fischhoff B, Davis AL. When Do Humans Heed AI Agents' Advice? When Should They? Human Factors 2024; 66:1914-1927. PMID: 37553098. PMCID: PMC11089830. DOI: 10.1177/00187208231190459.
Abstract
OBJECTIVE We manipulate the presence, skill, and display of artificial intelligence (AI) recommendations in a strategy game to measure their effect on users' performance. BACKGROUND Many applications of AI require humans and AI agents to make decisions collaboratively. Success depends on how appropriately humans rely on the AI agent. We demonstrate an evaluation method for a platform that uses neural network agents of varying skill levels for the simple strategic game of Connect Four. METHODS We report results from a 2 × 3 between-subjects factorial experiment that varies the format of AI recommendations (categorical or probabilistic) and the AI agent's amount of training (low, medium, or high). On each round of 10 games, participants proposed a move, saw the AI agent's recommendations, and then moved. RESULTS Participants' performance improved with a highly skilled agent, but quickly plateaued, as they relied uncritically on the agent. Participants relied too little on lower skilled agents. The display format had no effect on users' skill or choices. CONCLUSIONS The value of these AI agents depended on their skill level and users' ability to extract lessons from their advice. APPLICATION Organizations employing AI decision support systems must consider behavioral aspects of the human-agent team. We demonstrate an approach to evaluating competing designs and assessing their performance.
3. Carragher DJ, Sturman D, Hancock PJB. Trust in automation and the accuracy of human-algorithm teams performing one-to-one face matching tasks. Cognitive Research: Principles and Implications 2024; 9:41. PMID: 38902539. PMCID: PMC11190114. DOI: 10.1186/s41235-024-00564-8.
Abstract
The human face is commonly used for identity verification. While this task was once performed exclusively by humans, technological advancements have seen automated facial recognition systems (AFRS) integrated into many identification scenarios. Although many state-of-the-art AFRS are exceptionally accurate, they often require human oversight or involvement, such that a human operator makes the final decision. We have previously shown that, on average, humans assisted by a simulated AFRS (sAFRS) failed to reach the level of accuracy achieved by the same sAFRS alone, due to overturning the system's correct decisions and/or failing to correct sAFRS errors. The aim of the current study was to investigate whether participants' trust in automation was related to their performance on a one-to-one face matching task when assisted by a sAFRS. Participants (n = 160) completed a standard face matching task in two phases: an unassisted baseline phase, and an assisted phase in which they were shown the identification decision (95% accurate) made by a sAFRS before submitting their own decision. While most participants improved with sAFRS assistance, those with greater relative trust in automation achieved larger gains in performance. However, the average aided performance of participants still failed to reach that of the sAFRS alone, regardless of trust status. Nonetheless, further analysis revealed a small subset of participants who achieved 100% accuracy when aided by the sAFRS. Our results speak to the importance of considering individual differences when selecting employees for roles requiring human-algorithm interaction, including identity verification tasks that incorporate facial recognition technologies.
Affiliations
- Daniel J Carragher: School of Psychology, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Daniel Sturman: School of Psychology, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Peter J B Hancock: Psychology, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK
4. Raikwar A, Mifsud D, Wickens CD, Batmaz AU, Warden AC, Kelley B, Clegg BA, Ortega FR. Beyond the Wizard of Oz: Negative Effects of Imperfect Machine Learning to Examine the Impact of Reliability of Augmented Reality Cues on Visual Search Performance. IEEE Transactions on Visualization and Computer Graphics 2024; 30:2662-2670. PMID: 38437133. DOI: 10.1109/tvcg.2024.3372062.
Abstract
Despite knowing exactly what an object looks like, searching for it in a person's visual field is a time-consuming and error-prone experience. In augmented reality systems, new algorithms have been proposed to speed up search and reduce human error. However, these algorithms might not always provide 100% accurate visual cues, which can affect users' perceived reliability of the algorithm and, thus, search performance. Here, we examined the detrimental effects of automation bias caused by imperfect cues presented in an augmented reality head-mounted display using the YOLOv5 machine learning model. Fifty-three participants in two groups received either 100% accurate visual cues or 88.9% accurate visual cues. Their performance was compared with a control condition that did not include any additional cues. The results show that cueing can improve performance and shorten search times. They also show that performance with imperfect automation was much worse than with perfect automation and that, consistent with automation bias, participants were frequently enticed by incorrect cues.
5. Rieger T, Manzey D. Understanding the Impact of Time Pressure and Automation Support in a Visual Search Task. Human Factors 2024; 66:770-786. PMID: 35770911. DOI: 10.1177/00187208221111236.
Abstract
OBJECTIVE To understand the impact of time pressure and automated decision support systems (DSS) in a simulated medical visual search task. BACKGROUND Time pressure usually impairs manual performance in visual search tasks, but DSS support might neutralize this negative effect. Moreover, understanding the impact of time pressure and DSS support seems relevant for many real-world applications of visual search. METHOD We used a visual search paradigm where participants had to search for target letters in a simulated medical image. Participants performed the task either manually or with support of a highly reliable DSS. Time pressure was varied within-subjects by either a trialwise time-pressure manipulation (Experiment 1) or a blockwise manipulation (Experiment 2). Performance was assessed based on signal detection measures. To further analyze visual search behavior, a mouse-over approach was used. RESULTS In both experiments, results showed impaired sensitivity under high compared to low time pressure in the manual condition, but no negative effect of time pressure when working with a highly reliable DSS. Moreover, participants searched less under time pressure and when receiving DSS support, indicating participants followed the automation without thoroughly checking recommendations. However, the human-DSS team's sensitivity was always worse than that of the DSS alone, independent of the strength of time pressure. CONCLUSION Negative effects of time pressure can be ameliorated when receiving support by a DSS, but joint overall performance remains below DSS-alone performance. APPLICATION Highly reliable DSS seem capable of ameliorating the negative impact of time pressure in complex detection tasks.
6. Elder H, Canfield C, Shank DB, Rieger T, Hines C. Knowing When to Pass: The Effect of AI Reliability in Risky Decision Contexts. Human Factors 2024; 66:348-362. PMID: 35603703. DOI: 10.1177/00187208221100691.
Abstract
OBJECTIVE This study manipulates the presence and reliability of AI recommendations for risky decisions to measure the effect on task performance, behavioral consequences of trust, and deviation from a probability-matching collaborative decision-making model. BACKGROUND Although AI decision support improves performance, people tend to underutilize AI recommendations, particularly when outcomes are uncertain. As AI reliability increases, task performance improves, largely due to higher rates of compliance (following action recommendations) and reliance (following no-action recommendations). METHODS In a between-subjects design, participants were assigned to a high-reliability AI, low-reliability AI, or control condition. Participants decided whether to bet that their team would win in a series of basketball games, with compensation tied to performance. We evaluated task performance (in accuracy and signal detection terms) and the behavioral consequences of trust (via compliance and reliance). RESULTS AI recommendations improved task performance, had limited impact on risk-taking behavior, and were undervalued by participants. Accuracy, sensitivity (d'), and reliance increased in the high-reliability AI condition, but there was no effect on response bias (c) or compliance. Participant behavior was consistent with a probability-matching model only for compliance in the low-reliability condition. CONCLUSION In a pay-off structure that incentivized risk-taking, the primary value of the AI recommendations was in determining when to perform no action (i.e., pass on bets). APPLICATION In risky contexts, designers need to consider whether action or no-action recommendations will be more influential in order to design appropriate interventions.
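The signal detection measures reported here, sensitivity (d') and response bias (c), follow the standard equal-variance Gaussian model. A minimal sketch of their computation from hit and false-alarm rates (whether the authors applied corrections for extreme rates is not stated in the abstract):

```python
from scipy.stats import norm

def sdt_measures(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: sensitivity d' and criterion c."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa        # discriminability
    c = -0.5 * (z_hit + z_fa)     # positive c = conservative responding
    return d_prime, c

print(sdt_measures(0.85, 0.20))  # -> (~1.88, ~-0.10)
```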
Affiliations
- Hannah Elder: Technische Universität Berlin, Berlin, Germany, and University of Missouri-Columbia, Columbia, Missouri, USA
- Casey Canfield: Missouri University of Science & Technology, Rolla, Missouri, USA
- Daniel B Shank: Missouri University of Science & Technology, Rolla, Missouri, USA
- Casey Hines: Missouri University of Science & Technology, Rolla, Missouri, USA
7. Patton CE, Wickens CD, Smith CAP, Noble KM, Clegg BA. Supporting detection of hostile intentions: automated assistance in a dynamic decision-making context. Cognitive Research: Principles and Implications 2023; 8:69. PMID: 37980697. PMCID: PMC10657914. DOI: 10.1186/s41235-023-00519-5.
Abstract
In a dynamic decision-making task simulating basic ship movements, participants attempted, through a series of actions, to elicit and identify which one of six other ships was exhibiting either of two hostile behaviors. A high-performing, though imperfect, automated attention aid was introduced. It visually highlighted the ship categorized by an algorithm as the most likely to be hostile. Half of the participants also received automation transparency in the form of a statement about why the hostile ship was highlighted. Results indicated that participants often complied with the aid's advice, producing higher accuracy and shorter response times, but detection was still suboptimal. Additionally, transparency had little impact on any aspect of performance. Implications for detection of hostile intentions and the challenges of supporting dynamic decision making are discussed.
Affiliations
- Colleen E Patton: Department of Psychology, Colorado State University, Fort Collins, USA
- C A P Smith: Department of Psychology, Colorado State University, Fort Collins, USA
- Kayla M Noble: Department of Psychology, Colorado State University, Fort Collins, USA
8. Cockram L, Bartlett ML, McCarley JS. Simple manipulations of anthropomorphism fail to induce perceptions of humanness or improve trust in an automated agent. Applied Ergonomics 2023; 111:104027. PMID: 37100010. DOI: 10.1016/j.apergo.2023.104027.
Abstract
Although automation is employed as an aid to human performance, operators often interact with automated decision aids inefficiently. The current study investigated whether anthropomorphic automation would engender higher trust and use, subsequently improving human-automation team performance. Participants performed a multi-element probabilistic signal detection task in which they diagnosed a hypothetical nuclear reactor as in a state of safety or danger. The task was completed unassisted and assisted by a 93%-reliable agent varying in anthropomorphism. Results gave no evidence that participants' perceptions of anthropomorphism differed between conditions. Further, anthropomorphic automation failed to bolster trust and automation-aided performance. Findings suggest that the benefits of anthropomorphism may be limited in some contexts.
Affiliations
- Lewis Cockram: Discipline of Psychology, Flinders University, GPO Box 2100, Adelaide, South Australia 5001, Australia
- Megan L Bartlett: Discipline of Psychology, Flinders University, GPO Box 2100, Adelaide, South Australia 5001, Australia
- Jason S McCarley: School of Psychological Science, Oregon State University, 1500 SW Jefferson Way, Corvallis, OR 97331, United States
9. Knocton S, Hunter A, Connors W, Dithurbide L, Neyedli HF. The Effect of Informing Participants of the Response Bias of an Automated Target Recognition System on Trust and Reliance Behavior. Human Factors 2023; 65:189-199. PMID: 34078167. PMCID: PMC9969489. DOI: 10.1177/00187208211021711.
Abstract
OBJECTIVE To determine how changing and informing a user of the false alarm (FA) rate of an automated target recognition (ATR) system affects the user's trust in and reliance on the system, and their performance, during an underwater mine detection task. BACKGROUND ATR systems are designed to operate with high sensitivity and a liberal decision criterion to reduce the risk of the system missing a target. A high number of FAs, however, may decrease operator trust and reliance. METHODS Participants viewed sonar images and were asked to identify mines in the images. They performed the task without ATR and with ATR at a lower and a higher FA rate. Participants were split into two groups: one informed and one uninformed of the change in FA rate. Trust and/or confidence in detecting mines was measured after each block. RESULTS When not informed of the FA rate, the FA rate had a significant effect on participants' response bias. Participants had greater trust in the system and a more consistent response bias when informed of the FA rate. Sensitivity and confidence were not influenced by disclosure of the FA rate but were significantly worse in the high-FA-rate condition than when performing without the ATR. CONCLUSION AND APPLICATION Informing users of the FA rate of automation may positively influence their trust in and reliance on the aid.
Affiliations
- Aren Hunter: Defence Research and Development Canada, Dartmouth, Nova Scotia, Canada
- Warren Connors: Defence Research and Development Canada, Dartmouth, Nova Scotia, Canada
10. Boskemper MM, Bartlett ML, McCarley JS. Measuring the Efficiency of Automation-Aided Performance in a Simulated Baggage Screening Task. Human Factors 2022; 64:945-961. PMID: 33508964. DOI: 10.1177/0018720820983632.
Abstract
OBJECTIVE The present study replicated and extended prior findings of suboptimal automation use in a signal detection task, benchmarking automation-aided performance against the predictions of several statistical models of collaborative decision making. BACKGROUND Though automated decision aids can assist human operators in performing complex tasks, operators often use the aids suboptimally, achieving performance lower than statistically ideal. METHOD Participants performed a simulated security screening task requiring them to judge whether a target (a knife) was present or absent in a series of colored X-ray images of passenger baggage. They completed the task both with and without assistance from a 93%-reliable automated decision aid that provided a binary text diagnosis. A series of three experiments varied task characteristics including the timing of the aid's judgment relative to the raw stimuli, target certainty, and target prevalence. RESULTS AND CONCLUSION Automation-aided performance fell closest to the predictions of the most suboptimal model under consideration, which assumes the participant defers to the aid's diagnosis with a probability of 50%. Performance was similar across experiments. APPLICATION Results suggest that human operators' performance in a naturalistic search task falls far short of optimal, and well below levels observed in prior studies using an abstract signal detection task.
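The most suboptimal benchmark mentioned above has a simple closed form: if the operator defers to the aid on a random half of trials and otherwise answers alone, the team's hit and false-alarm rates are equal mixtures of the human's and the aid's. A sketch of that prediction, with illustrative parameter values rather than the study's:

```python
from scipy.stats import norm

def coin_flip_team(hr_human, fa_human, hr_aid, fa_aid, p_defer=0.5):
    """Team performance when the operator defers to the aid with
    probability p_defer, independently of the evidence on each trial."""
    hr = p_defer * hr_aid + (1 - p_defer) * hr_human
    fa = p_defer * fa_aid + (1 - p_defer) * fa_human
    d_prime = norm.ppf(hr) - norm.ppf(fa)   # resulting team sensitivity
    return hr, fa, d_prime

# Illustrative: a 93%-reliable aid paired with a weaker human observer.
print(coin_flip_team(hr_human=0.75, fa_human=0.25, hr_aid=0.93, fa_aid=0.07))
```

Because the coin flip ignores the evidence on each trial, the predicted team sensitivity falls between the human's and the aid's rather than exceeding both.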
11. Rieger T, Manzey D. Human Performance Consequences of Automated Decision Aids: The Impact of Time Pressure. Human Factors 2022; 64:617-634. PMID: 33111557. DOI: 10.1177/0018720820965019.
Abstract
OBJECTIVE The study addresses the impact of time pressure on human interactions with automated decision support systems (DSSs) and the related performance consequences. BACKGROUND When humans interact with DSSs, this often results in worse performance than could be expected from the automation alone. Previous research has suggested that time pressure might make a difference by leading humans to rely more on a DSS. METHOD In two laboratory experiments, participants performed a luggage screening task either manually, supported by a highly reliable DSS, or supported by a less reliable DSS. Time provided for inspecting the X-rays (4.5 s vs. 9 s) was varied within subjects as the time-pressure manipulation. Participants in the automation conditions were shown the automation's advice either prior to (Experiment 1) or following (Experiment 2) their own inspection, before they made their final decision. RESULTS In Experiment 1, time pressure compromised performance independent of whether the task was performed manually or with automation support. In Experiment 2, the negative impact of time pressure was found only in the manual condition, not in the two automation conditions. However, neither experiment revealed any positive impact of time pressure on overall performance, and the joint performance of human and automation was mostly worse than the performance of the automation alone. CONCLUSION Time pressure compromises the quality of decision making. Providing a DSS can reduce this effect, but only if the automation's advice follows the human's own assessment. APPLICATION The study provides suggestions for the effective implementation of DSSs and supports concerns that highly reliable DSSs are not used optimally by human operators.
12. Rieger T, Roesler E, Manzey D. Challenging presumed technological superiority when working with (artificial) colleagues. Scientific Reports 2022; 12:3768. PMID: 35260683. PMCID: PMC8904495. DOI: 10.1038/s41598-022-07808-x.
Abstract
Technological advancements are ubiquitously supporting or even replacing humans in all areas of life, bringing the potential for human-technology symbiosis but also novel challenges. To address these challenges, we conducted three experiments in task contexts ranging from loan assignment and X-ray evaluation to process industry. Specifically, we investigated the impact of support agent (artificial intelligence, decision support system, or human) and failure experience (one vs. none) on trust-related aspects of human-agent interaction. This included not only the subjective evaluation of the respective agent in terms of trust, reliability, and responsibility when working together, but also a change of perspective: the willingness to be assessed by the agent oneself. In contrast to a presumed technological superiority, we show a general advantage of human support over both technical support systems (i.e., artificial intelligence and decision support system) with regard to trust and responsibility from the collaborative perspective, regardless of task context. This effect reversed to a preference for technical systems when the perspective switched to being assessed. These findings illustrate an imperfect automation schema from the perspective of the advice-taker and demonstrate the importance of perspective when working with, or being assessed by, machine intelligence.
Affiliations
- Tobias Rieger: Department of Psychology and Ergonomics, Technische Universität Berlin, Marchstr. 12, F7, 10587 Berlin, Germany
- Eileen Roesler: Department of Psychology and Ergonomics, Technische Universität Berlin, Marchstr. 12, F7, 10587 Berlin, Germany
- Dietrich Manzey: Department of Psychology and Ergonomics, Technische Universität Berlin, Marchstr. 12, F7, 10587 Berlin, Germany
13. Douer N, Meyer J. Judging One's Own or Another Person's Responsibility in Interactions With Automation. Human Factors 2022; 64:359-371. PMID: 32749166. PMCID: PMC8943263. DOI: 10.1177/0018720820940516.
Abstract
OBJECTIVE We explore users' and observers' subjective assessments of human and automation capabilities and of human causal responsibility for outcomes. BACKGROUND With intelligent systems and advanced automation, human responsibility for outcomes becomes equivocal, as do subjective perceptions of responsibility. In particular, actors who actively work with a system may perceive responsibility differently from observers. METHOD In a laboratory experiment with pairs of participants, one participant (the "actor") performed a decision task, aided by an automated system, and the other (the "observer") passively observed the actor. We compared perceptions of responsibility between the two roles when interacting with two systems of different capabilities. RESULTS Actors' behavior matched the theoretical predictions, and actors and observers assessed the system and human capabilities and the comparative human responsibility similarly. However, actors tended to attribute adverse outcomes more to system characteristics than to their own limitations, whereas observers insufficiently considered system capabilities when evaluating the actors' comparative responsibility. CONCLUSION When intelligent systems greatly exceed human capabilities, users may correctly feel they contribute little to system performance. They may interfere more than necessary, impairing overall performance. Outside observers, such as managers, may overweight users' contribution to outcomes, holding users responsible for adverse outcomes even when the users rightly trusted the system. APPLICATION Presenting users of intelligent systems and others with performance measures and the comparative human responsibility may help them calibrate subjective assessments of performance, reducing users' and outside observers' biases and attribution errors.
14. Liang G, Sloane JF, Donkin C, Newell BR. Adapting to the algorithm: how accuracy comparisons promote the use of a decision aid. Cognitive Research: Principles and Implications 2022; 7:14. PMID: 35133521. PMCID: PMC8825899. DOI: 10.1186/s41235-022-00364-y.
Abstract
In three experiments, we sought to understand when and why people use an algorithmic decision aid. Distinct from recent approaches, we explicitly stated the algorithm's accuracy while also providing summary feedback and training that allowed participants to assess their own skills. Our results highlight that such direct performance comparisons between the algorithm and the individual encourage a strategy of selective reliance on the decision aid; individuals ignored the algorithm when the task was easier and relied on the algorithm when the task was harder. Our systematic investigation of summary feedback, training experience, and strategy-hint manipulations shows that further opportunities to learn about the algorithm encourage not only increased reliance on the algorithm but also experimentation with, and verification of, its recommendations. Together, our findings emphasize the decision-maker's capacity to learn about the algorithm, providing insights into how the use of decision aids can be improved.
Affiliations
- Garston Liang: School of Psychology, The University of New South Wales Sydney, Kensington, NSW 2052, Australia
- Jennifer F Sloane: School of Psychology, The University of New South Wales Sydney, Kensington, NSW 2052, Australia
- Christopher Donkin: School of Psychology, The University of New South Wales Sydney, Kensington, NSW 2052, Australia
- Ben R Newell: School of Psychology, The University of New South Wales Sydney, Kensington, NSW 2052, Australia
15. Douer N, Meyer J. Theoretical, Measured, and Subjective Responsibility in Aided Decision Making. ACM Transactions on Interactive Intelligent Systems 2021. DOI: 10.1145/3425732.
Abstract
When humans interact with intelligent systems, their causal responsibility for outcomes becomes equivocal. We analyze the descriptive abilities of a newly developed responsibility quantification model (ResQu) to predict actual human responsibility and perceptions of responsibility in the interaction with intelligent systems. In two laboratory experiments, participants performed a classification task. They were aided by classification systems with different capabilities. We compared the predicted theoretical responsibility values to the actual measured responsibility participants took on and to their subjective rankings of responsibility. The model predictions were strongly correlated with both measured and subjective responsibility. Participants’ behavior with each system was influenced by the system and human capabilities, but also by the subjective perceptions of these capabilities and the perception of the participant's own contribution. A bias existed only when participants with poor classification capabilities relied less than optimally on a system that had superior classification capabilities and assumed higher-than-optimal responsibility. The study implies that when humans interact with advanced intelligent systems, with capabilities that greatly exceed their own, their comparative causal responsibility will be small, even if formally the human is assigned major roles. Simply putting a human into the loop does not ensure that the human will meaningfully contribute to the outcomes. The results demonstrate the descriptive value of the ResQu model to predict behavior and perceptions of responsibility by considering the characteristics of the human, the intelligent system, the environment, and some systematic behavioral biases. The ResQu model is a new quantitative method that can be used in system design and can guide policy and legal decisions regarding human responsibility in events involving intelligent systems.
16. Rieger T, Heilmann L, Manzey D. Visual search behavior and performance in luggage screening: effects of time pressure, automation aid, and target expectancy. Cognitive Research: Principles and Implications 2021; 6:12. PMID: 33630179. PMCID: PMC7907401. DOI: 10.1186/s41235-021-00280-7.
Abstract
Visual inspection of luggage using X-ray technology at airports is a time-sensitive task that is often supported by automated systems to increase performance and reduce workload. The present study evaluated how time pressure and automation support influence visual search behavior and performance in a simulated luggage screening task. We also investigated how target expectancy (i.e., whether targets appeared in a target-often location) influenced performance and visual search behavior. We used a paradigm in which participants used the mouse to uncover a portion of the screen, which allowed us to track how much of the stimulus participants uncovered prior to their decision. Participants were randomly assigned to either a high (5 s per trial) or a low (10 s per trial) time-pressure condition. In half of the trials, participants were supported by an automated diagnostic aid (85% reliability) in deciding whether a threat item was present. Moreover, within each half, targets in target-present trials appeared in a predictable location (i.e., 70% of targets appeared in the same quadrant of the image) to investigate effects of target expectancy. The results revealed better detection performance under low time pressure and faster response times under high time pressure. There was an overall negative effect of automation support because the automation was only moderately reliable. Participants also uncovered a smaller amount of the stimulus under high time pressure in target-absent trials. Target expectancy improved accuracy and speed and reduced the amount of uncovered space needed for the search.

Significance statement: Luggage screening is a safety-critical, real-world visual search task that often has to be done under time pressure. The present research found that time pressure compromises performance and increases the risk of missing critical items, even with automation support. Moreover, even highly reliable automated support may not improve performance if it does not exceed the manual capabilities of the human screener. Lastly, the present research showed that heuristic search strategies (e.g., prioritizing areas where targets appear more often) also seem to guide attention in luggage screening.
Affiliations
- Tobias Rieger: Department of Psychology and Ergonomics, Chair of Work, Engineering, and Organizational Psychology, F7, Technische Universität Berlin, Marchstr. 12, 10587 Berlin, Germany
- Lydia Heilmann: Department of Psychology and Ergonomics, Chair of Work, Engineering, and Organizational Psychology, F7, Technische Universität Berlin, Marchstr. 12, 10587 Berlin, Germany
- Dietrich Manzey: Department of Psychology and Ergonomics, Chair of Work, Engineering, and Organizational Psychology, F7, Technische Universität Berlin, Marchstr. 12, 10587 Berlin, Germany
17. Bartlett ML, McCarley JS. Ironic efficiency in automation-aided signal detection. Ergonomics 2021; 64:103-112. PMID: 32790530. DOI: 10.1080/00140139.2020.1809716.
Abstract
Decision makers often make poor use of the information provided by an automated signal detection aid; recent studies have found that participants assisted by an automated aid fell well short of best-possible sensitivity levels. The present study tested the generalisability of this finding over varying levels of aid reliability. Participants performed a binary signal detection task either unaided or with assistance from a decision aid that was 60%, 85%, or 96% reliable. Assistance from a highly reliable aid (85% or 96%) improved discrimination performance, while assistance from a low-reliability aid (60%) did not. Because the ideal strategy is to place less weight on less reliable cues, however, the decision makers' tendency to disuse the aid became more appropriate as the aid's reliability declined. Automation-aided efficiency was thus near optimal when the aid performed close to chance but became, ironically, highly inefficient as the aid's reliability increased.

Practitioner summary: Investigating operators' automation-aided information integration strategies allows human factors practitioners to predict the level of performance the operator will attain. Ironically, in an aided signal detection task, performance when assisted by a highly reliable aid is far less efficient than that obtained with a far less reliable aid.

Abbreviations: OW: optimal weighting; UW: uniform weighting; CC: contingent criterion; BD: best decides; CF: coin flip; PM: probability matching; HDI: highest density interval; MCMC: Markov chain Monte Carlo; HR: hit rate; FAR: false alarm rate.
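Among the models abbreviated above, optimal weighting (OW) sets the statistical ceiling. For an equal-variance Gaussian detection task with statistically independent human and aid judgments, the standard result is that the best achievable team sensitivity combines the individual sensitivities quadratically:

```latex
d'_{\mathrm{OW}} = \sqrt{\left(d'_{\mathrm{human}}\right)^{2} + \left(d'_{\mathrm{aid}}\right)^{2}}
```

Automation-aided efficiency can then be read as the ratio of observed aided sensitivity to this prediction, which is why a near-chance aid (small d' for the aid) leaves little room for inefficiency while a highly reliable aid leaves a great deal.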
Affiliations
- Megan L Bartlett: College of Education, Psychology and Social Work, Flinders University, Adelaide, Australia
- Jason S McCarley: School of Psychological Science, Oregon State University, Corvallis, OR, USA
18. Huegli D, Merks S, Schwaninger A. Automation reliability, human-machine system performance, and operator compliance: A study with airport security screeners supported by automated explosives detection systems for cabin baggage screening. Applied Ergonomics 2020; 86:103094. PMID: 32342885. DOI: 10.1016/j.apergo.2020.103094.
Abstract
Using a simulated X-ray screening task, we tested 122 airport security screeners working with the support of explosives detection systems for cabin baggage screening (EDSCB) as low-level automation. EDSCB varied systematically on three automation reliability measures: accuracy, d', and positive predictive value (PPV). Results showed that when unaided performance was high, operator confidence was high and automation provided only small benefits. When unaided performance was lower, operator confidence was lower and automation with higher d' provided large benefits. Operator compliance depended on the PPV of the automation: we found lower compliance for lower PPV. Automation with a high false alarm rate of 20% and a low PPV of .3 resulted in operators ignoring about one-half of the true automation alarms on difficult targets: a strong cry-wolf effect. Our results suggest that automation reliability described by d' and PPV is more valid than accuracy alone. When the PPV is below .5, operators should receive clear instructions on how to respond to automation alarms.
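The dissociation between false-alarm rate and PPV described here follows directly from Bayes' rule: PPV depends on target prevalence as well as on the hit and false-alarm rates, so a sensitive system can still have a low PPV when targets are rare. A minimal sketch (the prevalence value is illustrative, not taken from the study):

```python
def ppv(hit_rate, fa_rate, prevalence):
    """Positive predictive value: P(target present | automation alarm)."""
    true_alarms = hit_rate * prevalence
    false_alarms = fa_rate * (1 - prevalence)
    return true_alarms / (true_alarms + false_alarms)

# With rare targets, even a sensitive system yields a low PPV:
print(ppv(hit_rate=0.90, fa_rate=0.20, prevalence=0.10))  # -> ~0.33
```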
Affiliations
- David Huegli: University of Applied Sciences and Arts Northwestern Switzerland, School of Applied Psychology, Institute Humans in Complex Systems, Riggenbachstrasse 16, CH-4600 Olten, Switzerland
- Sarah Merks: University of Applied Sciences and Arts Northwestern Switzerland, School of Applied Psychology, Institute Humans in Complex Systems, Riggenbachstrasse 16, CH-4600 Olten, Switzerland
- Adrian Schwaninger: University of Applied Sciences and Arts Northwestern Switzerland, School of Applied Psychology, Institute Humans in Complex Systems, Riggenbachstrasse 16, CH-4600 Olten, Switzerland
19. Wiczorek R, Meyer J. Effects of Trust, Self-Confidence, and Feedback on the Use of Decision Automation. Frontiers in Psychology 2019; 10:519. PMID: 30915005. PMCID: PMC6423180. DOI: 10.3389/fpsyg.2019.00519.
Abstract
Operators often fail to rely sufficiently on alarm systems. This results in a joint human-machine (JHM) sensitivity below that of the alarm system alone. The 'confidence vs. trust hypothesis' assumes that use of the system depends on a weighting of the two values: when confidence is higher, the task is performed manually; when trust is higher, the user relies on the system. Insufficient reliance may thus be due to operators' overconfidence in their own abilities and/or insufficient trust in the decision automation, and might be mitigated by providing feedback. We investigated this in a signal detection task in which participants were supported by a system with either higher sensitivity (HSS) or lower sensitivity (LSS) than their own, with or without feedback. We expected disuse of the LSS and insufficient reliance on the HSS in the condition without feedback. Feedback was expected to increase reliance on the HSS through an increase in trust and/or a decrease in confidence, and thus to improve performance. The hypotheses were partly supported. Confidence in manual performance was similar to trust in the HSS, even though humans' sensitivity was significantly lower than the system's. While confidence had no effect on reliance or JHM sensitivity, trust was positively related to both. We found disuse of the HSS, which feedback reduced while also increasing trust and JHM sensitivity. However, contrary to the 'confidence vs. trust' prediction, participants also made use of the LSS, and this misuse could not be reduced by feedback. The results indicate that feedback benefits overall performance (with the HSS only). The findings do not support the idea that misuse or disuse of a system results from a comparison of confidence and trust. It may instead be the product of a mistaken function-allocation strategy, based on a notion of teamwork combined with a missing assignment of responsibility. We discuss this alternative explanation.
Affiliations
- Rebecca Wiczorek: Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
- Joachim Meyer: Department of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel
20. Bartlett ML, McCarley JS. No Effect of Cue Format on Automation Dependence in an Aided Signal Detection Task. Human Factors 2019; 61:169-190. PMID: 30335518. DOI: 10.1177/0018720818802961.
Abstract
OBJECTIVE To investigate whether manipulating the format of an automated decision aid's cues can improve participants' information integration strategies in a signal detection task. BACKGROUND Automation-aided decision making is often suboptimal, falling well short of statistically ideal levels. The choice of format in which the cues from the aid are displayed may help users to better understand and integrate the aid's judgments with their own. METHOD Participants performed a signal detection task that asked them to classify random dot images as either blue or orange dominant. They made their judgments either unaided or with assistance from a 93% reliable automated decision aid. The aid provided a binary judgment, along with an estimate of signal strength in the form of either a raw value, a likelihood ratio, or a confidence rating (Experiments 1 and 2) or a binary judgment along with either a verbal or verbal-visuospatial expression of confidence (Experiment 3). Aided sensitivity was benchmarked to the predictions of various statistical models of collaborative decision making. RESULTS AND CONCLUSION Aided performance was suboptimal, matching the predictions of some of the least efficient models. Most importantly, performance was similar across cue formats. APPLICATION Results indicate that changes to the format in which cues from a signal detection aid are rendered are unlikely to dramatically improve the efficiency of automation-aided decision making.
21. Yamani Y, McCarley JS. Effects of Task Difficulty and Display Format on Automation Usage Strategy: A Workload Capacity Analysis. Human Factors 2018; 60:527-537. PMID: 29470135. DOI: 10.1177/0018720818759356.
Abstract
Objective An experiment used workload capacity analysis to quantify automation usage strategy across different task difficulty and display format types in a speeded task. Background Workload capacity measures the efficiency of concurrent information processing and can serve as a gauge of automation usage strategy in speeded decision tasks. The present study used workload capacity analysis to investigate automation usage strategy while information display format and task difficulty were manipulated. Method Subjects performed a speeded judgment task assisted by an automated aid that issued decision cues at varying onset times. Response time distributions were converted to measures of workload capacity. Results Two variants of a workload capacity measure, CzOR and CzAND, gave evidence that under difficult task conditions, operators moderated their own decision times both in anticipation of and following the arrival of the aid's diagnosis, regardless of display format. Conclusion Assistance from an automated decision aid may cause operators to delay their own responses in a speeded decision task, producing joint response time distributions that are slower than optimal. Application Even when it renders its own judgments quickly and with high accuracy, an automated decision aid may slow responses from a user. Automation designers should consider the relative costs and benefits of response accuracy and time when choosing whether and how to implement an automated decision aid.
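For reference, CzOR and CzAND are z-standardized variants of the conventional capacity coefficients from systems factorial technology. A sketch of the underlying definitions, with A and B denoting the two single-source conditions and AB the joint condition:

```latex
C_{\mathrm{OR}}(t)  = \frac{H_{AB}(t)}{H_{A}(t) + H_{B}(t)}, \qquad
C_{\mathrm{AND}}(t) = \frac{K_{A}(t) + K_{B}(t)}{K_{AB}(t)}
```

Here H(t) = -ln S(t) is the cumulative hazard and K(t) = ln F(t) the cumulative reverse hazard of the response time distribution; values below 1 indicate limited-capacity processing, slower than the independent-channels baseline.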
Affiliations
- Yusuke Yamani: Old Dominion University, Norfolk, Virginia, and Oregon State University, Corvallis