1
|
Hoffman RR, Mueller ST, Klein G, Litman J. Measures for explainable AI: Explanation goodness, user satisfaction, mental models, curiosity, trust, and human-AI performance. FRONTIERS IN COMPUTER SCIENCE 2023. [DOI: 10.3389/fcomp.2023.1096257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
If a user is presented an AI system that portends to explain how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? This question entails some key concepts of measurement such as explanation goodness and trust. We present methods for enabling developers and researchers to: (1) Assess the a priori goodness of explanations, (2) Assess users' satisfaction with explanations, (3) Reveal user's mental model of an AI system, (4) Assess user's curiosity or need for explanations, (5) Assess whether the user's trust and reliance on the AI are appropriate, and finally, (6) Assess how the human-XAI work system performs. The methods we present derive from our integration of extensive research literatures and our own psychometric evaluations. We point to the previous research that led to the measurement scales which we aggregated and tailored specifically for the XAI context. Scales are presented in sufficient detail to enable their use by XAI researchers. For Mental Model assessment and Work System Performance, XAI researchers have choices. We point to a number of methods, expressed in terms of methods' strengths and weaknesses, and pertinent measurement issues.
Collapse
|
2
|
Abstract
The military environment generates a large amount of data of great importance, which makes necessary the use of machine learning for its processing. Its ability to learn and predict possible scenarios by analyzing the huge volume of information generated provides automatic learning and decision support. This paper aims to present a model of a machine learning architecture applied to a military organization, carried out and supported by a bibliometric study applied to an architecture model of a nonmilitary organization. For this purpose, a bibliometric analysis up to the year 2021 was carried out, making a strategic diagram and interpreting the results. The information used has been extracted from one of the main databases widely accepted by the scientific community, ISI WoS. No direct military sources were used. This work is divided into five parts: the study of previous research related to machine learning in the military world; the explanation of our research methodology using the SciMat, Excel and VosViewer tools; the use of this methodology based on data mining, preprocessing, cluster normalization, a strategic diagram and the analysis of its results to investigate machine learning in the military context; based on these results, a conceptual architecture of the practical use of ML in the military context is drawn up; and, finally, we present the conclusions, where we will see the most important areas and the latest advances in machine learning applied, in this case, to a military environment, to analyze a large set of data, providing utility, machine learning and decision support.
Collapse
|
3
|
Khanna R, Dodge J, Anderson A, Dikkala R, Irvine J, Shureih Z, Lam KH, Matthews CR, Lin Z, Kahng M, Fern A, Burnett M. Finding AI’s Faults with AAR/AI: An Empirical Study. ACM T INTERACT INTEL 2022. [DOI: 10.1145/3487065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
Would you allow an AI agent to make decisions on your behalf? If the answer is “not always,” the next question becomes “in what circumstances”? Answering this question requires human users to be able to assess an AI agent—and not just with overall pass/fail assessments or statistics. Here users need to be able to
localize
an agent’s bugs so that they can determine when they are willing to rely on the agent and when they are not. After-Action Review for AI (AAR/AI), a new AI assessment process for integration with Explainable AI systems, aims to support human users in this endeavor, and in this article we empirically investigate AAR/AI’s effectiveness with domain-knowledgeable users. Our results show that AAR/AI participants not only located significantly
more
bugs than non-AAR/AI participants did (i.e., showed greater recall) but also located them more
precisely
(i.e., with greater precision). In fact, AAR/AI participants outperformed non-AAR/AI participants on every bug and were, on average, almost six times as likely as non-AAR/AI participants to find any particular bug. Finally, evidence suggests that incorporating labeling into the AAR/AI process may encourage domain-knowledgeable users to abstract above individual instances of bugs; we hypothesize that doing so may have contributed further to AAR/AI participants’ effectiveness.
Collapse
Affiliation(s)
| | | | | | | | - Jed Irvine
- Oregon State University, Corvallis, OR, USA
| | | | - Kin-Ho Lam
- Oregon State University, Corvallis, OR, USA
| | | | | | | | - Alan Fern
- Oregon State University, Corvallis, OR, USA
| | | |
Collapse
|