1
Barash M, McNevin D, Fedorenko V, Giverts P. Machine learning applications in forensic DNA profiling: A critical review. Forensic Sci Int Genet 2024; 69:102994. [PMID: 38086200 DOI: 10.1016/j.fsigen.2023.102994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/29/2023] [Revised: 11/06/2023] [Accepted: 11/26/2023] [Indexed: 01/29/2024]
Abstract
Machine learning (ML) is a range of powerful computational algorithms capable of generating predictive models via intelligent autonomous analysis of relatively large and often unstructured data. ML has become an integral part of our daily lives with a plethora of applications, including web, business, automotive industry, clinical diagnostics, scientific research, and more recently, forensic science. In the field of forensic DNA, the manual analysis of complex data can be challenging, time-consuming, and error-prone. The integration of novel ML-based methods may aid in streamlining this process while maintaining the high accuracy and reproducibility required for forensic tools. Due to the relative novelty of such applications, the forensic community is largely unaware of ML capabilities and limitations. Furthermore, computer science and ML professionals are often unfamiliar with the forensic science field and its specific requirements. This manuscript offers a brief introduction to the capabilities of machine learning methods and their applications in the context of forensic DNA analysis and offers a critical review of the current literature in this rapidly developing field.
Affiliation(s)
- Mark Barash
- Department of Justice Studies, San José State University, San Jose, CA, United States; Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia.
- Dennis McNevin
- Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia
- Vladimir Fedorenko
- The Educational and Scientific Laboratory of Forensic Materials Engineering of the Saratov State University, Russia
- Pavel Giverts
- Division of Identification and Forensic Science, Israel Police HQ, Haim Bar-Lev Road, Jerusalem, Israel
2
Chen D, Tan M, Xue J, Wu M, Song J, Wu Q, Liu G, Zheng Y, Xiao Y, Lv M, Liao M, Qu S, Liang W. Optimizing Analytical Thresholds for Low-Template DNA Analysis: Insights from Multi-Laboratory Negative Controls. Genes (Basel) 2024; 15:117. [PMID: 38255006 PMCID: PMC10815623 DOI: 10.3390/genes15010117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 01/15/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024] [Open Access]
Abstract
When analyzing challenging samples, such as low-template DNA, analysts aim to maximize information while minimizing noise, often by adjusting the analytical threshold (AT) for optimal results. A potential approach involves calculating the AT based on the baseline signal distribution in electrophoresis results. This study investigates the impact of reagent kits, testing quarters, environmental conditions, and amplification cycles on baseline signals using historical records and experimental data on low-template DNA. Variations in these aspects contribute to differences in baseline signal patterns. Analysts should remain vigilant regarding routine instrument maintenance and reagent replacement, as these may affect baseline signals. Prompt analysis of baseline status and tailored adjustments to ATs under specific laboratory conditions are advised. A comparative analysis of published methods for calculating the optimal AT from a negative signal distribution highlighted the efficiency of utilizing baseline signals to enhance forensic genetic analysis, with the exception of extremely low-template samples and high-amplification cycles. Moreover, a user-friendly program for real-time analysis was developed, enabling prompt adjustments to ATs based on negative control profiles. In conclusion, this study provides insights into baseline signals, aiming to enhance genetic analysis accuracy across diverse laboratories. Practical recommendations are offered for optimizing ATs in forensic DNA analysis.
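As a rough illustration of deriving an AT from the baseline signal distribution, the sketch below uses one commonly described rule: the mean baseline signal plus a multiple of its standard deviation. This is not necessarily the method or the program from the study. The function name, the multiplier `k`, and the baseline readings are all hypothetical.

```python
import statistics

def analytical_threshold(baseline_rfu, k=3.0):
    """Return mean + k * sample standard deviation of baseline signal (in rfu).

    The mean-plus-k-SD rule and the default k are illustrative assumptions,
    not values taken from the study.
    """
    mean = statistics.fmean(baseline_rfu)
    sd = statistics.stdev(baseline_rfu)
    return mean + k * sd

# Hypothetical baseline heights (rfu) read from one negative-control profile.
baseline = [2, 3, 1, 4, 2, 3, 5, 2, 3, 2]
at = analytical_threshold(baseline, k=3.0)
print(f"Analytical threshold: {at:.2f} rfu")
```

In practice a laboratory would probably compute this separately for each dye channel, kit, and cycle number, since the abstract reports that baseline patterns vary with those factors.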
Affiliation(s)
- Dezhi Chen
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Mengyu Tan
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Jiaming Xue
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Mengna Wu
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Jinlong Song
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Qiushuo Wu
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Guihong Liu
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Yazi Zheng
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Yuanyuan Xiao
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Meili Lv
- Department of Immunology, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- Miao Liao
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
- West China Forensics Center, Sichuan University, No. 16, Section 3, Renmin South Road, Wuhou District, Chengdu 610041, China
- Shengqiu Qu
- West China Forensics Center, Sichuan University, No. 16, Section 3, Renmin South Road, Wuhou District, Chengdu 610041, China
- Weibo Liang
- Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
3
Taylor D, Abarno D. A lights-out forensic DNA analysis workflow for no-suspect crime. Forensic Sci Int Genet 2023; 66:102907. [PMID: 37379740 DOI: 10.1016/j.fsigen.2023.102907] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Revised: 06/13/2023] [Accepted: 06/15/2023] [Indexed: 06/30/2023]
Abstract
An automated system of DNA profile processing (termed a 'lights-out' workflow) was trialled for no-suspect cases over a three-month period at Forensic Science SA (FSSA). The lights-out workflow utilised automated DNA profile reading using the neural network reading feature in FaSTR™ DNA with no analytical threshold. The profile information from FaSTR™ DNA was then processed in STRmix™ using a top-down analysis and automatically compared to a de-identified South Australian searchable DNA database. Computer scripts were used to generate link reports and upload reports, and these were compared to the links and uploads obtained for the same cases during their standard processing within the laboratory. The lights-out workflow produced an increase in both uploads and links compared to the standard workflow, with minimal adventitious links or erroneous uploads. Overall, the proof-of-concept study shows the potential for using automated DNA profile reading and top-down analysis to improve workflow efficiency in a no-suspect workflow.
Affiliation(s)
- Duncan Taylor
- Forensic Science SA, Adelaide, Australia; Flinders University, Adelaide, Australia.
- Damien Abarno
- Forensic Science SA, Adelaide, Australia; Flinders University, Adelaide, Australia
4
Taylor D, Buckleton J. Combining artificial neural network classification with fully continuous probabilistic genotyping to remove the need for an analytical threshold and electropherogram reading. Forensic Sci Int Genet 2023; 62:102787. [PMID: 36270165 DOI: 10.1016/j.fsigen.2022.102787] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/10/2022] [Revised: 09/22/2022] [Accepted: 10/05/2022] [Indexed: 12/14/2022]
Abstract
Standard processing of electrophoretic data within a forensic DNA laboratory is for one (or two) analysts to designate peaks as either artefactual or non-artefactual in a process commonly referred to as profile 'reading'. Recently, FaSTR™ DNA has been developed to use artificial neural networks to automatically classify fluorescence within an electropherogram as baseline, allele, stutter or pull-up. These classifications are based on probabilities assigned to each timepoint (scan) within the electropherogram. Instead of using the probabilities to assign fluorescence to a category, they can be used directly in the profile analysis. This has a number of advantages: increased objectivity in DNA profile processing, removal of the need for analysts to read profiles, and removal of the need for an analytical threshold. Models within STRmix™ were extended to incorporate the peak label probabilities assigned by FaSTR™ DNA. The performance of the model extensions was tested on a DNA mixture dataset comprising 2-4 person samples. This dataset was processed in a 'standard' manner using an analytical threshold of 50 rfu, analyst peak designations and STRmix™ V2.9 models. The same dataset was then processed in an automated manner using no analytical threshold, no analysts reading the profile and the STRmix™ models extended to incorporate peak label probabilities. The results of both processes were compared to the known DNA donors and a set of non-donors. The two processes performed very similarly, but the 0 rfu process gave a large efficiency gain. Utilising peak label probabilities opens up the possibility of a range of workflow efficiency gains and, beyond this, allows full use of all data within an electropherogram.
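The difference between assigning each scan to a single category and carrying the per-scan probabilities forward can be sketched as follows. This is a minimal illustration, not the FaSTR™ DNA or STRmix™ implementation. The four labels come from the abstract, while the raw scores, the softmax step, and the function names are assumptions.

```python
import math

LABELS = ("baseline", "allele", "stutter", "pull-up")

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_probabilities(scan_scores):
    """Return one {label: probability} dict per scan point."""
    return [dict(zip(LABELS, softmax(s))) for s in scan_scores]

def hard_labels(probs):
    """Collapse each scan to its single most probable label."""
    return [max(p, key=p.get) for p in probs]

# Hypothetical raw scores for three scan points (one row per scan).
scores = [
    [4.0, 0.5, 0.2, 0.1],  # clearly baseline
    [0.3, 3.5, 1.0, 0.2],  # clearly allele
    [1.0, 1.2, 1.1, 0.2],  # ambiguous
]
probs = label_probabilities(scores)
labels = hard_labels(probs)
for label, p in zip(labels, probs):
    print(label, {k: round(v, 3) for k, v in p.items()})
```

The third scan shows what hard assignment loses: it is labelled "allele" even though that label holds only about a third of the probability mass. Retaining the full distribution keeps that uncertainty available to the downstream model.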
Affiliation(s)
- Duncan Taylor
- Forensic Science SA, GPO Box 2790, Adelaide, SA 5001, Australia; School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia.
- John Buckleton
- Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand; University of Auckland, Department of Statistics, Auckland, New Zealand
5
Taylor D. Using a multi-head, convolutional neural network with data augmentation to improve electropherogram classification performance. Forensic Sci Int Genet 2021; 56:102605. [PMID: 34688114 DOI: 10.1016/j.fsigen.2021.102605] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/15/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 11/30/2022]
Abstract
DNA profiles are generated in forensic biology laboratories around the world. It is possible that these profiles are assessed by two independent people in order for the profiles to be 'read'. Recent work has been carried out to develop a neural network model to classify fluorescence in a DNA profile electropherogram and potentially replace one, or both human readers. The ability to use neural networks for this function has been programmed into the software FaSTR™ DNA, which has been validated for use in at least one laboratory in Australia. The work that previously developed a neural network system had a number of limitations, specifically it was computer intensive, did not make the best use of available data, and consequently the performance of this model was sub-optimal in some conditions (particularly for low-intensity peaks). In the current work a new neural network model is developed that makes various improvements on the old model, by using convolutional layers, a multi-head architecture and data augmentation. Results indicate that an improved performance can be expected for low-intensity profiles.
Affiliation(s)
- Duncan Taylor
- Forensic Science South Australia, 21 Divett Place, Adelaide, SA 5000, Australia; Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia.
6
Volgin L, Taylor D, Bright JA, Lin MH. Validation of a neural network approach for STR typing to replace human reading. Forensic Sci Int Genet 2021; 55:102591. [PMID: 34530398 DOI: 10.1016/j.fsigen.2021.102591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/19/2021] [Revised: 07/28/2021] [Accepted: 09/03/2021] [Indexed: 10/20/2022]
Abstract
A typical forensic laboratory process for interpreting STR capillary electrophoresis profile data is for two people to independently 'read' the profiles, compare results, and resolve any differences. Recently, work has been conducted to develop a machine learning tool called an artificial neural network (ANN) to carry out the same function as a human profile reader, by classifying areas of fluorescence in the capillary electrophoresis profile raw signal data. The ANN approach has been embedded in commercial software FaSTR™ DNA to read GlobalFiler™ DNA profiles. The ANN feature of FaSTR™ DNA was investigated during validation at Forensic Science South Australia (FSSA) to determine whether one of the human profile readers could be replaced by an ANN reader. FaSTR™ DNA accuracy in detecting allele peaks in reference profiles was 99.7% and was deemed high enough that a one-reader workflow could be implemented into the reference reading workflow at FSSA.
Affiliation(s)
- Luke Volgin
- Forensic Science SA, PO Box 2790, Adelaide, SA 5000, Australia.
- Duncan Taylor
- Forensic Science SA, PO Box 2790, Adelaide, SA 5000, Australia; College of Science and Engineering, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia
- Jo-Anne Bright
- Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand
- Meng-Han Lin
- Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand
7
Lin MH, Lee SI, Zhang X, Russell L, Kelly H, Cheng K, Cooper S, Wivell R, Kerr Z, Morawitz J, Bright JA. Developmental validation of FaSTR™ DNA: Software for the analysis of forensic DNA profiles. Forensic Science International: Reports 2021. [DOI: 10.1016/j.fsir.2021.100217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022] [Open Access]
8
Comparison of Diagnosis Accuracy between a Backpropagation Artificial Neural Network Model and Linear Regression in Digestive Disease Patients: an Empirical Research. Computational and Mathematical Methods in Medicine 2021; 2021:6662779. [PMID: 33727951 PMCID: PMC7937476 DOI: 10.1155/2021/6662779] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2020] [Revised: 12/10/2020] [Accepted: 02/18/2021] [Indexed: 02/08/2023]
Abstract
Introduction: A noninvasive diagnosis model for digestive diseases is a vital issue in current clinical research. Our systematic review aims to compare the diagnostic accuracy of the BP-ANN algorithm and linear regression in digestive disease patients, including their activation functions and data structure. Methods: We reported the systematic review according to the PRISMA guidelines. We searched seven electronic scholarly databases for articles comparing the diagnostic accuracy of BP-ANN and linear regression. The characteristics, patient number, input/output markers, diagnostic accuracy, and results/conclusions related to the comparison were extracted independently based on the inclusion criteria. Results: Nine articles met all the criteria and were included in our review. The publication years of the included articles ranged from 1991 to 2017. The sample size ranged from 42 to 3222 digestive disease patients, and all of the patients showed comparable biomarkers between the BP-ANN algorithm and linear regression. Eight articles demonstrated that the BP-ANN model is superior to linear regression in predicting the disease outcome based on AUROC results. One article reported linear regression to be superior to BP-ANN for the early diagnosis of colorectal cancer. Conclusion: The BP-ANN algorithm and linear regression both had high capacity in fitting the diagnostic model, and BP-ANN displayed higher prediction accuracy for the noninvasive diagnosis model of digestive diseases. We compared the activation functions and data structure between BP-ANN and linear regression for fitting the diagnosis model, and the data suggested that BP-ANN was a comprehensive recommendation algorithm.
9
Nistal-Nuño B. A neural network for prediction of risk of nosocomial infection at intensive care units: a didactic preliminary model. Einstein (São Paulo) 2020; 18:eAO5480. [PMID: 33237246 PMCID: PMC7664827 DOI: 10.31744/einstein_journal/2020ao5480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/01/2019] [Accepted: 06/24/2020] [Indexed: 11/08/2022] [Open Access]
Abstract
OBJECTIVE To propose a preliminary artificial intelligence model, based on artificial neural networks, for predicting the risk of nosocomial infection at intensive care units. METHODS An artificial neural network is designed that employs supervised learning. The generation of the datasets was based on data derived from the Japanese Nosocomial Infection Surveillance system. It is studied how the Java Neural Network Simulator learns to categorize these patients to predict their risk of nosocomial infection. The simulations are performed with several backpropagation learning algorithms and with several groups of parameters, comparing their results through the sum of the squared errors and mean errors per pattern. RESULTS The backpropagation with momentum algorithm showed better performance than the backpropagation algorithm. The performance improved with the xor. README file parameter values compared to the default parameters. There were no failures in the categorization of the patients into their risk of nosocomial infection. CONCLUSION While this model is still based on a synthetic dataset, the excellent performance observed with a small number of patterns suggests that using higher numbers of variables and network layers to analyze larger volumes of data can create powerful artificial neural networks, potentially capable of precisely anticipating nosocomial infection at intensive care units. Using a real database during the simulations has the potential to realize the predictive ability of this model.
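The momentum variant compared in this abstract differs from plain backpropagation only in its weight-update rule: each step adds a fraction of the previous step to the new gradient step. The sketch below applies both rules to a one-parameter toy error surface. It is not the Java Neural Network Simulator's implementation, and the error function, learning rate, momentum coefficient, and step count are all assumed for illustration.

```python
def grad(w):
    # Gradient of the toy error surface E(w) = w**2 (an illustrative assumption).
    return 2.0 * w

def descend(w0, lr, momentum, steps):
    """Gradient descent with a momentum term; momentum=0.0 gives the plain rule."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # New step = fraction of the previous step minus the scaled gradient.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

w_plain = descend(1.0, lr=0.01, momentum=0.0, steps=50)
w_momentum = descend(1.0, lr=0.01, momentum=0.9, steps=50)
print(f"|w| after 50 steps: plain={abs(w_plain):.4f}, momentum={abs(w_momentum):.4f}")
```

On this toy surface with a small learning rate, the momentum run ends closer to the minimum at zero after the same number of steps. This only illustrates the update rule and does not reproduce the study's result.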
Affiliation(s)
- Beatriz Nistal-Nuño
- Complejo Hospitalario Universitario de Santiago de Compostela, Santiago de Compostela, Spain
10
Taylor D, Kitselaar M, Powers D. The generalisability of artificial neural networks used to classify electrophoretic data produced under different conditions. Forensic Sci Int Genet 2018; 38:181-184. [PMID: 30419517 DOI: 10.1016/j.fsigen.2018.10.019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/05/2018] [Revised: 10/02/2018] [Accepted: 10/31/2018] [Indexed: 11/26/2022]
Abstract
Previous work has shown that artificial neural networks can be used to classify signal in an electropherogram into categories that have interpretational meaning (such as allele, baseline, pull-up or stutter). The previous work trained the neural networks on a single data type, produced under a single laboratory condition and applied it to data that was matched in these factors. In this work we investigate the ability of neural networks to be trained on data of different types (i.e. single sourced profiles or mixed DNA profiles) and from different laboratory conditions (specifically the model of electrophoresis instrument) to determine whether a set of neural networks is required for each different type of data produced or whether a single neural network can be used for a broad range of data and still achieve the same level of performance. The results of our study have implications as to how a laboratory would choose to train and apply neural networks to classify data in electropherograms produced in their laboratory.
Affiliation(s)
- Duncan Taylor
- Forensic Science South Australia, GPO Box 2790, Adelaide SA 5001, Australia; Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia.
- David Powers
- Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia
11
Marciano MA, Williamson VR, Adelman JD. A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning. Forensic Sci Int Genet 2018; 35:26-37. [DOI: 10.1016/j.fsigen.2018.03.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/28/2017] [Revised: 03/30/2018] [Accepted: 03/30/2018] [Indexed: 11/26/2022]