1. Barash M, McNevin D, Fedorenko V, Giverts P. Machine learning applications in forensic DNA profiling: A critical review. Forensic Sci Int Genet 2024; 69:102994. [PMID: 38086200 DOI: 10.1016/j.fsigen.2023.102994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/29/2023] [Revised: 11/06/2023] [Accepted: 11/26/2023] [Indexed: 01/29/2024]
Abstract
Machine learning (ML) is a range of powerful computational algorithms capable of generating predictive models via intelligent autonomous analysis of relatively large and often unstructured data. ML has become an integral part of our daily lives with a plethora of applications, including web, business, automotive industry, clinical diagnostics, scientific research, and more recently, forensic science. In the field of forensic DNA, the manual analysis of complex data can be challenging, time-consuming, and error-prone. The integration of novel ML-based methods may aid in streamlining this process while maintaining the high accuracy and reproducibility required for forensic tools. Due to the relative novelty of such applications, the forensic community is largely unaware of ML capabilities and limitations. Furthermore, computer science and ML professionals are often unfamiliar with the forensic science field and its specific requirements. This manuscript offers a brief introduction to the capabilities of machine learning methods and their applications in the context of forensic DNA analysis and offers a critical review of the current literature in this rapidly developing field.
Affiliation(s)
- Mark Barash
  - Department of Justice Studies, San José State University, San Jose, CA, United States; Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia
- Dennis McNevin
  - Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia
- Vladimir Fedorenko
  - The Educational and Scientific Laboratory of Forensic Materials Engineering of the Saratov State University, Russia
- Pavel Giverts
  - Division of Identification and Forensic Science, Israel Police HQ, Haim Bar-Lev Road, Jerusalem, Israel
2. Kruijver M, Curran JM. The number of alleles in DNA mixtures with related contributors. Forensic Sci Int Genet 2022; 61:102748. [DOI: 10.1016/j.fsigen.2022.102748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 07/11/2022] [Accepted: 07/19/2022] [Indexed: 11/24/2022]
3. Holland MM, Tiedge TM, Bender AJ, Gaston-Sanchez SA, McElhoe JA. MaSTR™: an effective probabilistic genotyping tool for interpretation of STR mixtures associated with differentially degraded DNA. Int J Legal Med 2022; 136:433-446. [DOI: 10.1007/s00414-021-02771-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/20/2021] [Accepted: 12/21/2021] [Indexed: 11/30/2022]
4. Noël J, Noël S, Mailly F, Granger D, Lefebvre JF, Milot E, Séguin D. Total allele count distribution (TAC curves) improves number of contributor estimation for complex DNA mixtures. Canadian Society of Forensic Science Journal 2022. [DOI: 10.1080/00085030.2022.2028359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Affiliation(s)
- Josée Noël
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- Sarah Noël
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- France Mailly
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- Dominic Granger
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- Emmanuel Milot
  - Laboratoire de Recherche en Criminalistique, Department of Chemistry, Biochemistry and Physics and Centre International de Criminologie Comparée, Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada
- Diane Séguin
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
5. Gill P, Benschop C, Buckleton J, Bleka Ø, Taylor D. A Review of Probabilistic Genotyping Systems: EuroForMix, DNAStatistX and STRmix™. Genes (Basel) 2021; 12:1559. [PMID: 34680954 PMCID: PMC8535381 DOI: 10.3390/genes12101559] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/22/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/24/2022]
Abstract
Probabilistic genotyping has become widespread. EuroForMix and DNAStatistX are both based upon maximum likelihood estimation using a γ model, whereas STRmix™ is a Bayesian approach that specifies prior distributions on the unknown model parameters. A general overview is provided of the historical development of probabilistic genotyping. Some general principles of interpretation are described, including: the application to investigative vs. evaluative reporting; detection of contamination events; inter and intra laboratory studies; numbers of contributors; proposition setting and validation of software and its performance. This is followed by details of the evolution, utility, practice and adoption of the software discussed.
Affiliation(s)
- Peter Gill
  - Forensic Genetics Research Group, Department of Forensic Sciences, Oslo University Hospital, 0372 Oslo, Norway
  - Department of Forensic Medicine, Institute of Clinical Medicine, University of Oslo, 0315 Oslo, Norway
- Corina Benschop
  - Division of Biological Traces, Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands
- John Buckleton
  - Department of Statistics, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
  - Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand
- Øyvind Bleka
  - Forensic Genetics Research Group, Department of Forensic Sciences, Oslo University Hospital, 0372 Oslo, Norway
- Duncan Taylor
  - Forensic Science SA, GPO Box 2790, Adelaide, SA 5001, Australia
  - School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia
6. Estimating the number of contributors to a DNA profile using decision trees. Forensic Sci Int Genet 2020; 50:102407. [PMID: 33197741 DOI: 10.1016/j.fsigen.2020.102407] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/21/2020] [Revised: 09/30/2020] [Accepted: 10/03/2020] [Indexed: 11/20/2022]
Abstract
The interpretation of DNA profiles typically starts with an assessment of the number of contributors. In the last two decades, several methods have been proposed to assist with this assessment. We describe a relatively simple method using decision trees, that is fast to run and fully transparent to a forensic analyst. We use mixtures from the publicly available PROVEDIt dataset to demonstrate the performance of the method. We show that the performance of the method crucially depends on the performance of filters for stutter and other artefacts. We compare the performance of the decision tree method with other published methods for the same dataset.
7. Benschop CC, van der Linden J, Hoogenboom J, Ypma R, Haned H. Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach. Forensic Sci Int Genet 2019; 43:102150. [DOI: 10.1016/j.fsigen.2019.102150] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/25/2019] [Revised: 08/15/2019] [Accepted: 08/19/2019] [Indexed: 01/19/2023]
8. Young BA, Gettings KB, McCord B, Vallone PM. Estimating number of contributors in massively parallel sequencing data of STR loci. Forensic Sci Int Genet 2019; 38:15-22. [DOI: 10.1016/j.fsigen.2018.09.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/27/2018] [Revised: 09/12/2018] [Accepted: 09/24/2018] [Indexed: 12/21/2022]
9. Alfonse LE, Garrett AD, Lun DS, Duffy KR, Grgicak CM. A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt. Forensic Sci Int Genet 2018; 32:62-70. [DOI: 10.1016/j.fsigen.2017.10.006] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 07/14/2017] [Revised: 09/07/2017] [Accepted: 10/20/2017] [Indexed: 01/15/2023]
10. Alfonse LE, Tejada G, Swaminathan H, Lun DS, Grgicak CM. Inferring the Number of Contributors to Complex DNA Mixtures Using Three Methods: Exploring the Limits of Low-Template DNA Interpretation. J Forensic Sci 2016; 62:308-316. [DOI: 10.1111/1556-4029.13284] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/24/2015] [Revised: 06/11/2016] [Accepted: 06/25/2016] [Indexed: 11/29/2022]
Affiliation(s)
- Lauren E. Alfonse
  - Biomedical Forensic Sciences; Boston University School of Medicine; Boston MA
- Genesis Tejada
  - Biomedical Forensic Sciences; Boston University School of Medicine; Boston MA
- Harish Swaminathan
  - Center for Computational and Integrative Biology; Rutgers University; Camden NJ
- Desmond S. Lun
  - Center for Computational and Integrative Biology; Rutgers University; Camden NJ
  - School of Information Technology and Mathematical Sciences; University of South Australia; Adelaide SA Australia
11. The effect of varying the number of contributors on likelihood ratios for complex DNA mixtures. Forensic Sci Int Genet 2015. [DOI: 10.1016/j.fsigen.2015.07.003] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/19/2022]
12. Haned H, Benschop CCG, Gill PD, Sijen T. Complex DNA mixture analysis in a forensic context: evaluating the probative value using a likelihood ratio model. Forensic Sci Int Genet 2014; 16:17-25. [PMID: 25485478 DOI: 10.1016/j.fsigen.2014.11.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/04/2014] [Revised: 11/14/2014] [Accepted: 11/16/2014] [Indexed: 12/01/2022]
Abstract
The interpretation of mixed DNA profiles obtained from low template DNA samples has proven to be a particularly difficult task in forensic casework. Newly developed likelihood ratio (LR) models that account for PCR-related stochastic effects, such as allelic drop-out, drop-in and stutters, have enabled the analysis of complex cases that would otherwise have been reported as inconclusive. In such samples, there are uncertainties about the number of contributors, and the correct sets of propositions to consider. Using experimental samples, where the genotypes of the donors are known, we evaluated the feasibility and the relevance of the interpretation of high order mixtures, of three, four and five donors. The relative risks of analyzing high order mixtures of three, four, and five donors, were established by comparison of a 'gold standard' LR, to the LR that would be obtained in casework. The 'gold standard' LR is the ideal LR: since the genotypes and number of contributors are known, it follows that the parameters needed to compute the LR can be determined per contributor. The 'casework LR' was calculated as used in standard practice, where unknown donors are assumed; the parameters were estimated from the available data. Both LRs were calculated using the basic standard model, also termed the drop-out/drop-in model, implemented in the LRmix module of the R package Forensim. We show how our results furthered the understanding of the relevance of analyzing high order mixtures in a forensic context. Limitations are highlighted, and it is illustrated how our study serves as a guide to implement likelihood ratio interpretation of complex DNA profiles in forensic casework.
Affiliation(s)
- Hinda Haned
  - Department of Human Biological Traces, Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands
- Corina C G Benschop
  - Department of Human Biological Traces, Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands
- Peter D Gill
  - National Institute of Public Health, Department of Forensic Biology, P.O. Box 4404 Nydalen, 0403 Oslo, Norway; National Institute of Public Health, Department of Forensic Medicine, P.O. Box 4950 Nydalen, 0424 Oslo, Norway
- Titia Sijen
  - Department of Human Biological Traces, Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands