1. Martire KA, Chin JM, Davis C, Edmond G, Growns B, Gorski S, Kemp RI, Lee Z, Verdon CM, Jansen G, Lang T, Neal TM, Searston RA, Slocum J, Summersby S, Tangen JM, Thompson MB, Towler A, Watson D, Werrett MV, Younan M, Ballantyne KN. Understanding 'error' in the forensic sciences: A primer. Forensic Sci Int Synerg 2024; 8:100470. PMID: 39005839; PMCID: PMC11240290; DOI: 10.1016/j.fsisyn.2024.100470.
Abstract
This paper distils seven key lessons about 'error' from a collaborative webinar series between practitioners at Victoria Police Forensic Services Department and academics. It aims to provide the common understanding of error necessary to foster interdisciplinary dialogue, collaboration and research. The lessons underscore the inevitability, complexity and subjectivity of error, as well as opportunities for learning and growth. Ultimately, we argue that error can be a potent tool for continuous improvement and accountability, enhancing the reliability of forensic sciences and public trust. It is hoped the shared understanding provided by this paper will support future initiatives and funding for collaborative developments in this vital domain.
Affiliation(s)
- Kristy A. Martire
- School of Psychology, University of New South Wales Sydney, Australia
- Carolyn Davis
- Major Crime Scene Unit, Victoria Police Forensic Services Department, Australia
- Gary Edmond
- School of Law, Society & Criminology, University of New South Wales Sydney, Australia
- Bethany Growns
- School of Psychology, Speech and Hearing, University of Canterbury, New Zealand
- Stacey Gorski
- Biological Sciences Group, Victoria Police Forensic Services Department, Australia
- Richard I. Kemp
- School of Psychology, University of New South Wales Sydney, Australia
- Zara Lee
- Fingerprint Sciences Group, Victoria Police Forensic Services Department, Australia
- Gabrielle Jansen
- Morwell Forensic Hub, Victoria Police Forensic Services Department, Australia
- Tanya Lang
- Major Crime Scene Unit, Victoria Police Forensic Services Department, Australia
- Joshua Slocum
- Fingerprint Sciences Group, Victoria Police Forensic Services Department, Australia
- Stephanie Summersby
- Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Australia
- Jason M. Tangen
- School of Psychology, The University of Queensland, Australia
- Matthew B. Thompson
- School of Psychology, Murdoch University, Australia
- Centre for Biosecurity and One Health, Harry Butler Institute, Murdoch University, Australia
- Alice Towler
- School of Psychology, The University of Queensland, Australia
- Darren Watson
- Ballistics Unit, Victoria Police Forensic Services Department, Australia
- Melissa V. Werrett
- Chemical Trace Unit, Chemical and Physical Sciences Group, Victoria Police Forensic Services Department, Australia
- Mariam Younan
- School of Psychology, University of New South Wales Sydney, Australia
- Kaye N. Ballantyne
- Office of the Chief Forensic Scientist, Victoria Police Forensic Services Department, Australia
2. Koehler JJ, Mnookin JL, Saks MJ. The scientific reinvention of forensic science. Proc Natl Acad Sci U S A 2023; 120:e2301840120. PMID: 37782789; PMCID: PMC10576124; DOI: 10.1073/pnas.2301840120.
Abstract
Forensic science is undergoing an evolution in which a long-standing "trust the examiner" focus is being replaced by a "trust the scientific method" focus. This shift, which is in progress and still partial, is critical to ensure that the legal system uses forensic information in an accurate and valid way. In this Perspective, we discuss the ways in which the move to a more empirically grounded scientific culture for the forensic sciences impacts testing, error rate analyses, procedural safeguards, and the reporting of forensic results. However, we caution that the ultimate success of this scientific reinvention likely depends on whether the courts begin to engage with forensic science claims in a more rigorous way.
Affiliation(s)
- Michael J. Saks
- Sandra Day O’Connor College of Law, Arizona State University, Phoenix, AZ 85004
3. Warren EM, Sheets HD. The inconclusive category, entropy, and forensic firearm identification. Forensic Sci Int 2023; 349:111741. PMID: 37279628; DOI: 10.1016/j.forsciint.2023.111741.
Abstract
There has been extensive recent discussion of the difficulty in estimating meaningful error rates in forensic firearms examinations and other areas of pattern evidence. The 2016 President's Council of Advisors on Science and Technology (PCAST) report was clear in criticizing many forensic disciplines as lacking the types of studies that would provide error rate measurements seen in other scientific fields. However, there is a substantial lack of consensus on how to measure an "error rate" for fields, such as forensic firearm examination, whose conclusion scales include an "inconclusive" category, as in the Association of Firearm and Tool Mark Examiners (AFTE) Range of Conclusions and many other such fields. Many authors appear to assume that the error rate calculated in the binary decision model is the only appropriate way to report errors, but attempts have been made to adapt that error rate to scientific fields in which the inconclusive category is viewed as a meaningful outcome of the examination process. In this study we present three neural networks of differing complexity and performance, trained to classify the outlines of ejector marks on cartridge cases fired from different firearm models, as a model system for examining the performance of various error metrics in systems using the inconclusive category. We also discuss an entropy-based (information-based) method for assessing the similarity of classifications to ground truth that is applicable to range-of-conclusions scales, even when the inconclusive category is used.
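The entropy-based idea this abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the authors' exact metric: it treats the spread of responses over a conclusion scale (identification / inconclusive / elimination) as a probability distribution and scores it with Shannon entropy, so heavier use of the inconclusive category shows up as higher entropy, i.e. less information about ground truth.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a distribution over conclusion categories."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical response distributions over the conclusion scale
# (identification, inconclusive, elimination) for same-source test items.
confident = [0.90, 0.08, 0.02]  # responses concentrate on one category
hedging = [0.40, 0.50, 0.10]    # heavy use of the inconclusive category

print(shannon_entropy(confident))  # lower entropy: more informative responses
print(shannon_entropy(hedging))    # higher entropy: less informative responses
```

A uniform distribution over the scale would maximize entropy, corresponding to responses that convey no information about ground truth.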
Affiliation(s)
- E M Warren
- SEP Forensic Consultants, 296 Washington Ave., Memphis, TN 38103, USA
- H D Sheets
- Data Analytics Program, Department of Quantitative Science, Canisius College, 2001 Main Street, Buffalo, NY 14208, USA
4. Reidy S, Harris R, Gwinnett C, Reel S. Planning and developing a method for collecting ground truth data relating to footwear mark evidence. Sci Justice 2022; 62:632-643. DOI: 10.1016/j.scijus.2022.09.006.
5. Decision Theory and Linear Sequential Unmasking in Forensic Fire Debris Analysis: A Proposed Workflow. Forensic Chem 2022. DOI: 10.1016/j.forc.2022.100426.
6. Monson KL, Smith ED, Bajic SJ. Planning, design and logistics of a decision analysis study: The FBI/Ames study involving forensic firearms examiners. Forensic Sci Int Synerg 2022; 4:100221. PMID: 35243285; PMCID: PMC8860930; DOI: 10.1016/j.fsisyn.2022.100221.
Abstract
This paper describes design and logistical aspects of a decision analysis study to assess the performance of qualified firearms examiners working in accredited laboratories in the United States in terms of accuracy (error rate), repeatability, and reproducibility of decisions involving comparisons of fired bullets and cartridge cases. The purpose of the study was to validate current practice of the forensic discipline of firearms/toolmarks (F/T) examination. It elicited error rate data by counting the number of false positive and false negative conclusions. Preceded by the experimental design, decisions, and logistics described herein, testing was ultimately administered to 173 qualified, practicing F/T examiners in public and private crime laboratories. The first round of testing evaluated accuracy, while two subsequent rounds evaluated repeatability and reproducibility of examiner conclusions. This project expands on previous studies by involving many F/T examiners in challenging comparisons and by executing the study in the recommended double-blind format.
7. Dorfman AH, Valliant R. Inconclusives, errors, and error rates in forensic firearms analysis: Three statistical perspectives. Forensic Sci Int Synerg 2022; 5:100273. PMID: 35800204; PMCID: PMC9254335; DOI: 10.1016/j.fsisyn.2022.100273.
Abstract
Error rates that have been published in recent open black box studies of forensic firearms examiner performance have been very low, typically below one percent. These low error rates have been challenged, however, as not properly taking into account one of the categories, “Inconclusive”, that examiners can reach in comparing a pair of bullets or cartridges. These challenges have themselves been challenged; how to consider the inconclusives and their effect on error rates is currently a matter of sharp debate. We review several viewpoints that have been put forth, and then examine the impact of inconclusives on error rates from three fresh statistical perspectives: (a) an ideal perspective using objective measurements combined with statistical algorithms, (b) basic sampling theory and practice, and (c) standards of experimental design in human studies. Our conclusions vary with the perspective: (a) inconclusives can be simple errors (or, on the other hand, simply correct or at least well justified); (b) inconclusives need not be counted as errors to bring into doubt assessments of error rates; (c) inconclusives are potential errors, more explicitly, inconclusives in studies are not necessarily the equivalent of inconclusives in casework and can mask potential errors in casework. From all these perspectives, it is impossible to simply read out trustworthy estimates of error rates from those studies which have been carried out to date. At most, one can put reasonable bounds on the potential error rates. These are much larger than the nominal rates reported in the studies. To get straightforward, sound estimates of error rates requires a challenging but critical improvement to the design of firearms studies. A proper study—one in which inconclusives are not potential errors, and which yields direct, sound estimates of error rates—will require new objective measures or blind proficiency testing embedded in ordinary casework.
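How strongly the treatment of inconclusive decisions drives a reported error rate can be made concrete with a small sketch. The counts below are hypothetical, not taken from any of the studies the authors review; the three estimates bracket the range between the nominal rate and the upper bound the abstract alludes to.

```python
def error_rates(errors, correct, inconclusive):
    """Point estimates of an error rate under three treatments of inconclusives."""
    total = errors + correct + inconclusive
    return {
        "inconclusives_excluded": errors / (errors + correct),
        "inconclusives_correct": errors / total,
        "inconclusives_errors": (errors + inconclusive) / total,  # upper bound
    }

# Hypothetical counts for different-source comparisons in a black-box study.
rates = error_rates(errors=4, correct=900, inconclusive=96)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.3%}")
```

With these counts the nominal rate is below half a percent however the inconclusives are scored as correct or excluded, yet the upper bound, treating every inconclusive as a potential error, is ten percent, illustrating why the authors argue that trustworthy rates cannot simply be read out of existing studies.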
Affiliation(s)
- Alan H. Dorfman
- National Center for Health Statistics (retired), Bethesda, MD, 20814-1345, USA
- Corresponding author.
8. Smith AM, Neal TMS. The distinction between discriminability and reliability in forensic science. Sci Justice 2021; 61:319-331. PMID: 34172120; DOI: 10.1016/j.scijus.2021.04.002.
Abstract
Forensic science plays an increasingly important role in the criminal justice system; yet, many forensic procedures have not been subject to the empirical scrutiny that is expected in other scientific disciplines. Over the past two decades, the scientific community has done well to bridge the gap, but has likely only scratched the surface. We offer the discriminability-reliability distinction as a critical framework to guide future research on diagnostic-testing procedures in the forensic science domain. We argue that the primary concern of the scientist ought to be maximizing discriminability and that the primary concern of the criminal justice system ought to be assessing the reliability of evidence. We argue that Receiver Operating Characteristic (ROC) analysis is uniquely equipped for determining which of two procedures or conditions has better discriminability, and we demonstrate how estimates of reliability can be extracted from this signal detection framework.
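The ROC-based comparison of discriminability the abstract advocates can be sketched with a minimal example (the scores and function names below are hypothetical). It uses the Mann-Whitney identity: the area under the ROC curve equals the probability that a randomly chosen same-source comparison score exceeds a randomly chosen different-source score.

```python
import itertools

def auc(same_source_scores, diff_source_scores):
    """Area under the ROC curve via the Mann-Whitney identity: the probability
    that a random same-source score exceeds a random different-source score
    (ties count as one half)."""
    pairs = list(itertools.product(same_source_scores, diff_source_scores))
    wins = sum(1.0 if s > d else 0.5 if s == d else 0.0 for s, d in pairs)
    return wins / len(pairs)

# Hypothetical similarity scores produced by a comparison procedure.
same_source = [0.9, 0.8, 0.75, 0.6]
diff_source = [0.7, 0.5, 0.4, 0.3]
print(auc(same_source, diff_source))  # 0.9375: 15 of 16 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level discriminability and 1.0 to perfect separation, which is what makes the measure useful for comparing two procedures independently of where each places its decision threshold.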
9. Mannering WM, Vogelsang MD, Busey TA, Mannering FL. Are forensic scientists too risk averse? J Forensic Sci 2021; 66:1377-1400. PMID: 33748945; DOI: 10.1111/1556-4029.14700.
Abstract
Fingerprint examiners maintain decision thresholds that represent the amount of evidence required for an identification or exclusion conclusion. As measured by error rate studies (Proc Natl Acad Sci USA. 2011;108(19):7733-8), these decision thresholds currently exhibit a preference for preventing erroneous identification errors at the expense of preventing erroneous exclusion errors. The goal of this study is to measure the decision thresholds for both fingerprint examiners and members of the general public, to determine whether examiners are more risk averse than potential jury members. To externally measure these decision thresholds, subjects manipulated decision criteria in a web-based visualization that reflects the trade-offs between erroneous identification decisions and erroneous exclusion decisions. Data from fingerprint examiners and the general public were compared to determine whether both groups have similar values as expressed by the placement of the decision criteria. The results of this study show that fingerprint examiners are more risk averse than members of the general public, although they align with error rate studies of fingerprint examiners. Demographic data demonstrate those factors that may contribute to differences in decision criterion placement, both between the two groups and between individuals within a group. The experimental methods provide a rich framework for measuring, interpreting, and responding to the values of society as applied to forensic decision-making.
Affiliation(s)
- Willa M Mannering
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA
- Thomas A Busey
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA
- Fred L Mannering
- Department of Civil and Environmental Engineering, University of South Florida, Tampa, FL, USA
10. Growns B, Kukucka J. The prevalence effect in fingerprint identification: Match and non-match base-rates impact misses and false alarms. Appl Cogn Psychol 2021. DOI: 10.1002/acp.3800.
Affiliation(s)
- Bethany Growns
- School of Social and Behavioral Sciences, Arizona State University, Arizona, USA
- Jeff Kukucka
- Department of Psychology, Towson University, Maryland, USA
11. De Beuf TLF, de Ruiter C, Edens JF, de Vogel V. Taking "the boss" into the real world: Field interrater reliability of the Short-Term Assessment of Risk and Treatability: Adolescent Version. Behav Sci Law 2021; 39:123-144. PMID: 33569848; PMCID: PMC7986435; DOI: 10.1002/bsl.2503.
Abstract
There is emerging evidence that the performance of risk assessment instruments is weaker when used for clinical decision-making than for research purposes. For instance, research has found lower agreement between evaluators when the risk assessments are conducted during routine practice. We examined the field interrater reliability of the Short-Term Assessment of Risk and Treatability: Adolescent Version (START:AV). Clinicians in a Dutch secure youth care facility completed START:AV assessments as part of the treatment routine. Consistent with previous literature, interrater reliability of the items and total scores was lower than previously reported in non-field studies. Nevertheless, moderate to good interrater reliability was found for final risk judgments on most adverse outcomes. Field studies provide insights into the actual performance of structured risk assessment in real-world settings, exposing factors that affect reliability. This information is relevant for those who wish to implement structured risk assessment with a level of reliability that is defensible considering the high stakes.
Affiliation(s)
- Tamara L. F. De Beuf
- Research Department, Ottho Gerhard Heldring Institution, Zetten, Netherlands
- Department of Clinical Psychological Science, Maastricht University, Maastricht, Netherlands
- Corine de Ruiter
- Department of Clinical Psychological Science, Maastricht University, Maastricht, Netherlands
- John F. Edens
- Department of Psychological and Brain Sciences, Texas A&M University, College Station, Texas, USA
- Vivienne de Vogel
- Research Department, De Forensische Zorgspecialisten, Utrecht, Netherlands
12. Koehler JJ, Liu S. Fingerprint error rate on close non-matches. J Forensic Sci 2020; 66:129-134. PMID: 32990979; DOI: 10.1111/1556-4029.14580.
Abstract
The accuracy of fingerprint identifications is critically important to the administration of criminal justice. Accuracy is challenging when two prints from different sources have many common features and few dissimilar features. Such print pairs, known as close non-matches (CNMs), are increasingly likely to arise as ever-growing databases are searched with greater frequency. In this study, 125 fingerprint agencies completed a mandatory proficiency test that included two pairs of CNMs. The false-positive error rates on the two CNMs were 15.9% (17 out of 107, 95% C.I.: 9.5%, 24.2%) and 28.1% (27 out of 96, 95% C.I.: 19.4%, 38.2%), respectively. These CNM error rates are (a) inconsistent with the popular notion that fingerprint evidence is nearly infallible, and (b) larger than error rates reported in leading fingerprint studies. We conclude that, when the risk of CNMs is high, the probative value of a reported fingerprint identification may be severely diminished due to an elevated false-positive error risk. We call for additional CNM research, including a replication and expansion of the present study using a representative selection of CNMs from database searches.
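The interval estimates quoted in this abstract can be approximated with a standard binomial interval. The abstract does not state which interval method the authors used; the Wilson score interval below is one common choice and yields bounds close to, but not identical with, the reported 9.5% to 24.2%.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 17 false-positive identifications out of 107 agencies on the first CNM pair.
low, high = wilson_interval(17, 107)
print(f"{17/107:.1%} (95% CI roughly {low:.1%} to {high:.1%})")
```

An exact (Clopper-Pearson) interval would give slightly wider bounds; the point is that even the lower bound of the interval sits well above the sub-1% error rates often cited for fingerprint comparison.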
Affiliation(s)
- Shiquan Liu
- Institute of Evidence Law and Forensic Science, China University of Political Science and Law, Beijing, China
13. Mattijssen EJAT, Witteman CLM, Berger CEH, Zheng XA, Soons JA, Stoel RD. Firearm examination: Examiner judgments and computer-based comparisons. J Forensic Sci 2020; 66:96-111. PMID: 32970858; PMCID: PMC7821150; DOI: 10.1111/1556-4029.14557.
Abstract
Forensic firearm examination provides the court of law with information about the source of fired cartridge cases. We assessed the validity of source decisions of a computer-based method and of 73 firearm examiners who compared breechface and firing pin impressions of 48 comparison sets. We also compared the computer-based method's comparison scores with the examiners' degree-of-support judgments and assessed the validity of the latter. The true-positive rate (sensitivity) and true-negative rate (specificity) of the computer-based method (for the comparison of both the breechface and firing pin impressions) were 94.4% and at least 91.7%, respectively. For the examiners, the true-positive rate was at least 95.3% and the true-negative rate was at least 86.2%. The validity of the source decisions improved when the evaluations of breechface and firing pin impressions were combined, and for the examiners also when the perceived difficulty of the comparison decreased. The examiners were reluctant to provide source decisions for "difficult" comparisons even though their source decisions were mostly correct. The correlation between the computer-based method's comparison scores and the examiners' degree-of-support judgments ranged from low for the same-source comparisons to negligible for the different-source comparisons. Combining the outcomes of computer-based methods with the judgments of examiners could increase the validity of firearm examinations. The examiners' numerical degree-of-support judgments for their source decisions were not well-calibrated and showed clear signs of overconfidence. We suggest studying the merits of performance feedback to calibrate these judgments.
Affiliation(s)
- Erwin J A T Mattijssen
- Behavioural Science Institute, Radboud University Nijmegen, Nijmegen, The Netherlands; Netherlands Forensic Institute, The Hague, The Netherlands
- Cilia L M Witteman
- Behavioural Science Institute, Radboud University Nijmegen, Nijmegen, The Netherlands
- Charles E H Berger
- Netherlands Forensic Institute, The Hague, The Netherlands; Institute for Criminal Law and Criminology, Leiden University, Leiden, The Netherlands
- Xiaoyu A Zheng
- Sensor Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
- Johannes A Soons
- Sensor Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
14. Dror IE, Scurich N. (Mis)use of scientific measurements in forensic science. Forensic Sci Int Synerg 2020; 2:333-338. PMID: 33385131; PMCID: PMC7770438; DOI: 10.1016/j.fsisyn.2020.08.006.
Abstract
Forensic science error rate studies have not given sufficient attention or weight to inconclusive evidence and inconclusive decisions. Inconclusive decisions can be correct decisions, but they can also be incorrect decisions. Errors can occur when inconclusive evidence is determined as an identification or exclusion, or conversely, when same- or different-source evidence is incorrectly determined as inconclusive. We present four common flaws in error rate studies: 1. Not including test items which are more prone to error; 2. Excluding inconclusive decisions from error rate calculations; 3. Counting inconclusive decisions as correct in error rate calculations; and 4. Examiners resorting to more inconclusive decisions during error rate studies than they do in casework. These flaws seriously undermine the credibility and accuracy of error rates reported in studies. To remedy these shortcomings, we present the problems and show the way forward by providing a corrected experimental design that quantifies error rates more accurately.
Affiliation(s)
- Itiel E. Dror
- University College London (UCL), 35 Tavistock Square, London, WC1H 9EZ, UK
- Corresponding author.
- Nicholas Scurich
- University of California, Irvine, 4312 Social and Behavioral Sciences Gateway, Irvine, CA, 92697, USA