1
Inokuchi S, Nakanishi H, Takada A, Saito K. Uncertainty in the number of contributor estimation methods applied to a Y-STR profile. Forensic Sci Int Genet 2025; 74:103145. [PMID: 39288689 DOI: 10.1016/j.fsigen.2024.103145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 09/01/2024] [Accepted: 09/05/2024] [Indexed: 09/19/2024]
Abstract
Maximum allele count (MAC) and total allele count (TAC) methods are widely used in forensic laboratories to estimate the number of contributors (NoC) to autosomal short tandem repeat (STR) profiles. In this study, we applied NoC estimation methods to mixed Y-STR profiles and evaluated their uncertainty and performance. For the MAC method, as recent Y-STR typing kits involve single- and multi-copy loci, we defined "MAC-single" for use across only single-copy loci and "MAC-multi" for use across only multi-copy loci. We generated in silico a dataset containing 120,000 Y-STR profiles for one- to six-person mixtures, based on previously reported haplotype frequencies of 27 Y-STR loci in Yfiler Plus for the U.S. population (reported by NIST) and the Henan Han population. The dataset was randomly split into a training set and a test set. The training set was used to construct a TAC distribution (TAC curve), whereas the test set was used to calculate the performance metrics (accuracy, precision, recall, and F1-score). In addition, the effect of the upper limit of NoC considered for estimation on overall accuracy was evaluated. The overall accuracies of the MAC-single, MAC-multi, and TAC methods when the upper limit of NoC was set to six persons were 0.7920, 0.4329, and 0.7877 for the U.S. population and 0.8207, 0.4609, and 0.8385 for the Henan Han population, respectively. Our results suggest that the MAC-single and TAC methods can estimate the NoC for mixed Y-STR profiles with high levels of accuracy.
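The allele-counting idea behind MAC and TAC can be illustrated with a minimal Python sketch. It assumes the generic MAC rule (the largest number of distinct alleles at any locus, divided by the number of alleles one person can show there); the abstract does not give the authors' exact definitions of MAC-single or MAC-multi, so the function names, the `copies_per_contributor` parameter, and the example alleles are illustrative assumptions only.

```python
from math import ceil


def mac_noc(profile, copies_per_contributor=1):
    """Minimum number of contributors implied by the maximum allele count.

    profile: dict mapping locus name -> iterable of observed alleles.
    copies_per_contributor: alleles one person can show at a locus
    (assumed 1 for a single-copy Y-STR locus, 2 for an autosomal locus).
    """
    max_count = max(len(set(alleles)) for alleles in profile.values())
    return max(1, ceil(max_count / copies_per_contributor))


def total_allele_count(profile):
    """Total number of distinct alleles summed over all loci (the TAC value)."""
    return sum(len(set(alleles)) for alleles in profile.values())


# Hypothetical mixed profile at three single-copy Y-STR loci (made-up alleles).
y_profile = {"DYS19": [14, 15], "DYS390": [23, 24, 25], "DYS391": [10]}

print(mac_noc(y_profile))             # 3: DYS390 shows three distinct alleles
print(total_allele_count(y_profile))  # 6: 2 + 3 + 1
```

In a TAC approach, the observed total would then be compared with TAC distributions built from training profiles for each candidate NoC; that step is not shown here.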
Affiliation(s)
- Shota Inokuchi
  - Department of Forensic Medicine, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-ku, Tokyo, Japan; Forensic Science Laboratory, Tokyo Metropolitan Police Department, 3-35-21 Shakujiidai, Nerima-ku, Tokyo, Japan.
- Hiroaki Nakanishi
  - Department of Forensic Medicine, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-ku, Tokyo, Japan
- Aya Takada
  - Department of Forensic Medicine, Saitama Medical University, 38 Moroyamamachimorohongo, Saitama, Japan
- Kazuyuki Saito
  - Department of Forensic Medicine, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-ku, Tokyo, Japan; Department of Forensic Medicine, Saitama Medical University, 38 Moroyamamachimorohongo, Saitama, Japan
2
Huang Y, Wang M, Liu C, He G. Comprehensive landscape of non-CODIS STRs in global populations provides new insights into challenging DNA profiles. Forensic Sci Int Genet 2024; 70:103010. [PMID: 38271830 DOI: 10.1016/j.fsigen.2024.103010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/13/2024] [Accepted: 01/14/2024] [Indexed: 01/27/2024]
Abstract
The worldwide implementation of short tandem repeat (STR) profiles in forensic genetics necessitated establishing and expanding the CODIS core loci set to facilitate efficient data management and exchange. Currently, the mainstay CODIS STRs are adopted in most general-purpose forensic kits. However, relying solely on these loci fails to yield satisfactory results for challenging tasks, such as bio-geographical ancestry inference, complex DNA mixture profile interpretation, and distant kinship analysis. In this context, non-CODIS STRs are potent supplements to enhance the systematic discriminating power, particularly when combined with the high-throughput next-generation sequencing (NGS) technique. Nevertheless, comprehensive evaluation of non-CODIS STRs in diverse populations has been scarce, hindering their further application in routine casework. To address this gap, we investigated genetic variations of 178 historically available non-CODIS STRs from ethnolinguistically different worldwide populations and studied their characteristics and forensic potentials via high-coverage whole genome sequencing (WGS) data. Initially, we delineated the genomic properties of these non-CODIS markers through sequence searching, repeat structure scanning, and manual inspection. Subsequent population genetics analysis suggested that these non-CODIS STRs had comparable polymorphism levels and forensic utility to CODIS STRs. Furthermore, we constructed a theoretical NGS panel comprising 108 STRs (20 CODIS STRs and 88 non-CODIS STRs), and evaluated its performance in inferring bio-geographical ancestry origins, deconvoluting complex DNA mixtures, and differentiating distant kinships using real and simulated datasets. Our findings demonstrated that incorporating supplementary non-CODIS STRs enabled the extrapolation of multidimensional information from a single STR profile, thereby facilitating the analysis of challenging forensic tasks. In conclusion, this study presents an extensive genomic landscape of forensic non-CODIS STRs among global populations, and emphasizes the imperative inclusion of additional polymorphic non-CODIS STRs in future NGS-based forensic systems.
Affiliation(s)
- Yuguo Huang
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610041, China.
- Mengge Wang
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610041, China
- Chao Liu
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou 510230, China; Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China.
- Guanglin He
  - Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610041, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China.
3
Barash M, McNevin D, Fedorenko V, Giverts P. Machine learning applications in forensic DNA profiling: A critical review. Forensic Sci Int Genet 2024; 69:102994. [PMID: 38086200 DOI: 10.1016/j.fsigen.2023.102994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/29/2023] [Revised: 11/06/2023] [Accepted: 11/26/2023] [Indexed: 01/29/2024]
Abstract
Machine learning (ML) is a range of powerful computational algorithms capable of generating predictive models via intelligent autonomous analysis of relatively large and often unstructured data. ML has become an integral part of our daily lives with a plethora of applications, including web, business, automotive industry, clinical diagnostics, scientific research, and more recently, forensic science. In the field of forensic DNA, the manual analysis of complex data can be challenging, time-consuming, and error-prone. The integration of novel ML-based methods may aid in streamlining this process while maintaining the high accuracy and reproducibility required for forensic tools. Due to the relative novelty of such applications, the forensic community is largely unaware of ML capabilities and limitations. Furthermore, computer science and ML professionals are often unfamiliar with the forensic science field and its specific requirements. This manuscript offers a brief introduction to the capabilities of machine learning methods and their applications in the context of forensic DNA analysis and offers a critical review of the current literature in this rapidly developing field.
Affiliation(s)
- Mark Barash
  - Department of Justice Studies, San José State University, San Jose, CA, United States; Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia.
- Dennis McNevin
  - Centre for Forensic Science, School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Broadway, Ultimo, NSW 2007, Australia
- Vladimir Fedorenko
  - The Educational and Scientific Laboratory of Forensic Materials Engineering of the Saratov State University, Russia
- Pavel Giverts
  - Division of Identification and Forensic Science, Israel Police HQ, Haim Bar-Lev Road, Jerusalem, Israel
4
Wang H, Zhu Q, Huang Y, Cao Y, Hu Y, Wei Y, Wang Y, Hou T, Shan T, Dai X, Zhang X, Wang Y, Zhang J. Using simulated microhaplotype genotyping data to evaluate the value of machine learning algorithms for inferring DNA mixture contributor numbers. Forensic Sci Int Genet 2024; 69:103008. [PMID: 38244524 DOI: 10.1016/j.fsigen.2024.103008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 12/01/2023] [Accepted: 01/05/2024] [Indexed: 01/22/2024]
Abstract
Inferring the number of contributors (NoC) is a crucial step in interpreting DNA mixtures, as it directly affects the accuracy of the likelihood ratio calculation and the assessment of evidence strength. However, obtaining the correct NoC in complex DNA mixtures remains challenging due to the high degree of allele sharing and dropout. This study aimed to analyze the impact of allele sharing and dropout on NoC inference in complex DNA mixtures when using microhaplotypes (MH). The effectiveness and value of highly polymorphic MH for NoC inference in complex DNA mixtures were evaluated by comparing the performance of three NoC inference methods: the maximum allele count (MAC) method, the maximum likelihood estimation (MLE) method, and the random forest classification (RFC) algorithm. In this study, we selected the top 100 most polymorphic MH from the Southern Han Chinese (CHS) population, and simulated over 40 million complex DNA mixture profiles with the NoC ranging from 2 to 8. These profiles involve unrelated individuals (RM type) and related pairs of individuals, including parent-offspring pairs (PO type), full-sibling pairs (FS type), and second-degree kinship pairs (SE type). Our results indicated how the number of detected alleles in DNA mixture profiles varied with the markers' polymorphism, the involvement of kinship, the NoC, and the dropout settings. Across different types of DNA mixtures, the MAC and MLE methods performed best in the RM type, followed by SE, FS, and PO types, while RFC models showed the best performance in the PO type, followed by RM, SE, and FS types. The recall of all three methods for NoC inference decreased as the NoC and dropout levels increased. Furthermore, the MLE method performed better at low NoC, whereas RFC models excelled at high NoC and/or high dropout levels, regardless of the availability of a priori information about related pairs of individuals in DNA mixtures. However, the RFC models that considered the aforementioned a priori information and were trained specifically on each type of DNA mixture profile outperformed the RFC_ALL model, which did not consider such information. Finally, we provided recommendations for model building when applying machine learning algorithms to NoC inference.
Affiliation(s)
- Haoyu Wang
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Qiang Zhu
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Yuguo Huang
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Yueyan Cao
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Yuhan Hu
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Yifan Wei
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Yuting Wang
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Tingyun Hou
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Tiantian Shan
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Xuan Dai
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Xiaokang Zhang
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China
- Yufang Wang
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China.
- Ji Zhang
  - West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, China.
5
Zhang N, Shi S, Lin S, Bai Z, Ling X, Gao J, Yan R, Ou X. Application of SNPs with low minor allele frequencies in missing person identification (MPI) through kinship analysis of DNA mixtures. Electrophoresis 2023; 44:1569-1578. [PMID: 37454302 DOI: 10.1002/elps.202300111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 06/18/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
The need to identify a missing person (MP) through kinship analysis of DNA samples found at a crime scene has become increasingly prevalent. DNA samples from MPs can be severely degraded, contain little DNA, and be mixed with DNA from other contributors, which often makes it difficult to apply conventional methods in practice. This study developed a massively parallel sequencing-based panel that contains 1661 single-nucleotide polymorphisms (SNPs) with low minor allele frequencies (MAFs) (averaging 0.0613) in the Chinese Han population, together with a strategy for relationship inference from DNA mixtures comprising different numbers of contributors (NOCs) and varying allele dropout probabilities. Based on the simulated dataset and genotyping results of 42 artificial DNA mixtures (NOC = 2-4), it was observed that the present SNP panel was sufficient for balanced mixtures when referenced to the closest relatives (parents/offspring and full siblings). When the mixture profiles suffered from dropout, incorrect assignments were markedly associated with relatedness, NOC and the dropout level. We therefore indicate that SNPs with low MAFs could be reliably interpreted for MP identification through the kinship analysis of complex DNA mixtures. Further studies should be extended to more possible scenarios to test the feasibility of the present approach.
Affiliation(s)
- Nan Zhang
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
- Shanshan Shi
  - Fetal Medicine Department, The First Affiliated Hospital of Jinan University, Guangzhou, P. R. China
- Shaobin Lin
  - Fetal Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, P. R. China
- Zhaochen Bai
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
- Xiaohua Ling
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
- Jun Gao
  - Reproductive Medicine Center, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, P. R. China
- Ruiling Yan
  - Fetal Medicine Department, The First Affiliated Hospital of Jinan University, Guangzhou, P. R. China
- Xueling Ou
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
6
Kruijver M, Bright JA. A tool for simulating single source and mixed DNA profiles. Forensic Sci Int Genet 2022; 60:102746. [PMID: 35843122 DOI: 10.1016/j.fsigen.2022.102746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2022] [Revised: 06/22/2022] [Accepted: 07/05/2022] [Indexed: 11/04/2022]
Abstract
Simulation studies play an important role in the study of probabilistic genotyping systems, as a low cost and fast alternative to in vitro studies. With ongoing calls for further study of the behaviour of probabilistic genotyping systems, there is a continuous need for such studies. In most cases, researchers use simplified models, for example ignoring complexities such as peak height variability due to lack of availability of advanced tools. We fill this void and describe a tool that can simulate DNA profiles in silico for the validation and investigation of probabilistic genotyping software. Contributor genotypes are simulated by randomly sampling alleles from selected allele frequencies. Some or all contributors may be related to a pedigree and the genotypes of non-founders are obtained by random gene dropping. The number of contributors per profile, and ranges for parameters such as DNA template amount and degradation parameters can be configured. Peak height variability is modelled using a lognormal distribution or a gamma distribution. Profile behaviour of simulated profiles is shown to be broadly similar to laboratory generated profiles though the latter shows more variation. Simulation studies do not remove the need for experimental data. The tool has been made available as an R-package named simDNAmixtures.
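The two simulation steps named in this abstract (sampling contributor alleles from allele frequencies, and adding lognormal peak height variability) can be sketched in a few lines of Python. This is not the simDNAmixtures API, which is an R package; the function names, the allele frequencies, the template values, and the `sigma` parameter below are all hypothetical, and stutter, degradation, dropout, and relatedness are ignored.

```python
import random
from math import log


def sample_genotype(allele_freqs, rng):
    """Draw two alleles per locus independently from the allele frequencies."""
    genotype = {}
    for locus, freqs in allele_freqs.items():
        alleles = list(freqs)
        weights = [freqs[a] for a in alleles]
        genotype[locus] = tuple(sorted(rng.choices(alleles, weights=weights, k=2)))
    return genotype


def simulate_mixture(allele_freqs, templates, sigma=0.3, seed=0):
    """Sum expected peak heights over contributors, then add lognormal noise.

    templates: one expected total peak height per contributor (assumed units: rfu).
    sigma: standard deviation of log peak height (an illustrative value).
    """
    rng = random.Random(seed)
    expected = {locus: {} for locus in allele_freqs}
    for template in templates:
        genotype = sample_genotype(allele_freqs, rng)
        for locus, pair in genotype.items():
            for allele in pair:
                # Each allele copy carries half of the contributor's template.
                expected[locus][allele] = expected[locus].get(allele, 0.0) + template / 2
    return {
        locus: {a: rng.lognormvariate(log(h), sigma) for a, h in peaks.items()}
        for locus, peaks in expected.items()
    }


# Made-up allele frequencies for two autosomal loci.
freqs = {
    "D3S1358": {14: 0.1, 15: 0.3, 16: 0.3, 17: 0.2, 18: 0.1},
    "vWA": {15: 0.1, 16: 0.2, 17: 0.3, 18: 0.2, 19: 0.2},
}
profile = simulate_mixture(freqs, templates=[500.0, 100.0], seed=1)
for locus, peaks in profile.items():
    print(locus, {allele: round(height) for allele, height in sorted(peaks.items())})
```

Fixing the seed makes a simulated profile reproducible, which is convenient when the same profile has to be interpreted under several settings.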
7
Kelly H, Coble M, Kruijver M, Wivell R, Bright JA. Exploring likelihood ratios assigned for siblings of the true mixture contributor as an alternate contributor. J Forensic Sci 2022; 67:1167-1175. [PMID: 35211970 DOI: 10.1111/1556-4029.15020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/06/2021] [Revised: 01/23/2022] [Accepted: 02/14/2022] [Indexed: 11/30/2022]
Abstract
Relatives tend to have more DNA in common than unrelated people. The closer the biological relationship, the higher the chance of alleles being identical by descent between the individuals. Therefore, when considering a mixed DNA profile, close relatives of the true contributor may not always be excluded as a possible contributor to a mixture due to allele sharing. In these situations, it might be more appropriate under the alternate proposition to consider that the DNA could have originated from a relative of the person of interest rather than an unrelated individual. The probabilistic genotyping software STRmix™ automatically provides LRs considering close biological relatives as alternate sources of the DNA. In this paper, we investigate the support for siblings of the true contributor to a mixture (who are not present in the mixture themselves). We interpret the mixtures and assign LRs using STRmix™ and investigate whether the resulting LRs could be used to indicate whether the true contributor could be a sibling of the POI. Most siblings will have one or more alleles that are not observed in the mixture profile. Support for siblings to have contributed can only occur when allelic dropout is a possibility at the loci where the siblings have alleles that are not observed in the profile. In these data, that was only observed in components with assigned template of 588 rfu or less.
Affiliation(s)
- Hannah Kelly
  - Institute of Environmental Science and Research Limited, Auckland, New Zealand
- Michael Coble
  - Center for Human Identification, Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, Texas, USA
- Maarten Kruijver
  - Institute of Environmental Science and Research Limited, Auckland, New Zealand
- Richard Wivell
  - Institute of Environmental Science and Research Limited, Auckland, New Zealand
- Jo-Anne Bright
  - Institute of Environmental Science and Research Limited, Auckland, New Zealand
8
TAWSEEM: A Deep-Learning-Based Tool for Estimating the Number of Unknown Contributors in DNA Profiling. Electronics 2022. [DOI: 10.3390/electronics11040548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
DNA profiling involves the analysis of sequences of an individual or mixed DNA profiles to identify the persons that these profiles belong to. A critically important application of DNA profiling is in forensic science to identify criminals by finding a match between their blood samples and the DNA profile found at the crime scene. Other applications include paternity tests, disaster victim identification, missing person investigations, and mapping genetic diseases. A crucial task in DNA profiling is the determination of the number of contributors in a DNA mixture profile, which is challenging due to issues that include allele dropout, stutter, blobs, and noise in DNA profiles; these issues negatively affect the estimation accuracy and the computational complexity. Machine-learning-based methods have been applied for estimating the number of unknowns; however, there is limited work in this area, and considerable effort is still required to develop robust models and train them on large and diverse datasets. In this paper, we propose and develop a software tool called TAWSEEM that employs a multilayer perceptron (MLP) neural network deep learning model for estimating the number of unknown contributors in DNA mixture profiles using PROVEDIt, the largest publicly available dataset. We investigate the performance of our developed deep learning model using four performance metrics, namely accuracy, F1-score, recall, and precision. The novelty of our tool is evident in the fact that it provides the highest accuracy (97%) compared to any existing work on the most diverse dataset (in terms of the profiles, loci, multiplexes, etc.). We also provide a detailed background on DNA profiling and a literature review, and a detailed account of the deep learning tool development and the performance investigation of the deep learning method.
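The four performance metrics named in this abstract can be computed for a multi-class NoC classifier as in the following Python sketch. It uses the standard per-class definitions of precision, recall, and F1-score; the function name and the example labels are made up for illustration and are not taken from the paper.

```python
def noc_metrics(true_noc, predicted_noc):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    pairs = list(zip(true_noc, predicted_noc))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for c in sorted(set(true_noc) | set(predicted_noc)):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly called class c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly called class c
        fn = sum(t == c and p != c for t, p in pairs)  # class c that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return accuracy, report


# Hypothetical true and predicted numbers of contributors for six mixtures.
true_noc = [1, 2, 2, 3, 3, 3]
predicted_noc = [1, 2, 3, 3, 3, 2]

accuracy, report = noc_metrics(true_noc, predicted_noc)
print(round(accuracy, 3))  # 0.667 (4 of 6 correct)
print(report[2])           # (0.5, 0.5, 0.5)
```

Per-class recall is the informative figure when performance degrades for higher NoC, because overall accuracy can hide poor results in the rarer classes.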
9
Noël J, Noël S, Mailly F, Granger D, Lefebvre JF, Milot E, Séguin D. Total allele count distribution (TAC curves) improves number of contributor estimation for complex DNA mixtures. Canadian Society of Forensic Science Journal 2022. [DOI: 10.1080/00085030.2022.2028359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Affiliation(s)
- Josée Noël
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- Sarah Noël
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- France Mailly
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- Dominic Granger
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
- Emmanuel Milot
  - Laboratoire de Recherche en Criminalistique, Department of Chemistry, Biochemistry and Physics and Centre International de Criminologie Comparée, Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada
- Diane Séguin
  - Laboratoire de Sciences Judiciaires et de Médecine Légale, Montréal, Québec, Canada
10
Yang J, Chen J, Ji Q, Yu Y, Li K, Kong X, Xie S, Zhan W, Mao Z, Yu Y, Li D, Chen P, Chen F. A highly polymorphic panel of 40-plex microhaplotypes for the Chinese Han population and its application in estimating the number of contributors in DNA mixtures. Forensic Sci Int Genet 2021; 56:102600. [PMID: 34688115 DOI: 10.1016/j.fsigen.2021.102600] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/20/2021] [Revised: 08/29/2021] [Accepted: 10/04/2021] [Indexed: 12/11/2022]
Abstract
Microhaplotypes (MHs) have great potential in multiple forensic applications and have proven to be promising markers in complex DNA mixture analysis. In this study, we developed a multiplex panel of 40 highly polymorphic MHs for the Chinese Han population, evaluated its forensic values, and explored its application in predicting the number of contributors (NOCs) in DNA mixtures. The panel consisted of 20 newly proposed loci and 20 previously reported loci with lengths spanning less than 120 bp. The average effective number of alleles (Ae) was 3.77, and the cumulative matching probability (CMP) and the cumulative power of exclusion (CPE) reached 1.2E-37 and 1-2.1E-12, respectively, in the Chinese Han population from the 1000 Genomes Project. Further validation on 150 Chinese Han individuals showed that Ae ranged from 2.62 to 4.41 with a mean value of 3.61, and CMP and CPE were 3.61E-36 and 1-1.84E-12, respectively, indicating that this panel was informative for personal identification and paternity testing in the studied population. To estimate NOC in DNA mixtures, we developed a machine learning model based on this panel. As a result, the accuracies in artificial DNA mixtures reached 95.24% for 2- to 4-person mixtures and 83.33% for 2- to 6-person mixtures. Furthermore, the NOC estimation on simulated profiles with allele dropout showed that this panel was still robust under slight dropout. In conclusion, this panel has value for forensic identification and NOC estimation of DNA mixtures.
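For readers unfamiliar with the effective number of alleles (Ae) reported here, the following Python sketch shows the commonly used definition, Ae = 1 / Σ p_i², where p_i are the allele (haplotype) frequencies at a locus. The abstract does not state the formula, so this definition is an assumption about what was computed, and the frequencies below are made up.

```python
def effective_number_of_alleles(freqs):
    """Ae = 1 / sum(p_i ** 2) for the allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 / sum(p * p for p in freqs)


# Four equally frequent haplotypes give the maximum Ae for four alleles.
print(effective_number_of_alleles([0.25, 0.25, 0.25, 0.25]))       # 4.0

# Uneven frequencies lower Ae below the raw allele count.
print(round(effective_number_of_alleles([0.5, 0.3, 0.2]), 3))      # 2.632
```

A higher Ae means contributors are less likely to share alleles at the locus, which is why highly polymorphic markers help when counting contributors in a mixture.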
Affiliation(s)
- Jiawen Yang
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Ji Chen
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Qiang Ji
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Youjia Yu
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Kai Li
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Xiaochao Kong
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Sumei Xie
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Wenxuan Zhan
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Zhengsheng Mao
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Yanfang Yu
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Ding Li
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China
- Peng Chen
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China; Department of Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, 43210, USA.
- Feng Chen
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China; Key Laboratory of Targeted Intervention of Cardiovascular Disease, Collaborative Innovation Center for Cardiovascular Disease Translational Medicine, Nanjing Medical University, Nanjing, Jiangsu, 211166, PR China.
11
Gill P, Benschop C, Buckleton J, Bleka Ø, Taylor D. A Review of Probabilistic Genotyping Systems: EuroForMix, DNAStatistX and STRmix™. Genes (Basel) 2021; 12:1559. [PMID: 34680954 PMCID: PMC8535381 DOI: 10.3390/genes12101559] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/22/2021] [Revised: 09/24/2021] [Accepted: 09/28/2021] [Indexed: 11/24/2022] [Open access]
Abstract
Probabilistic genotyping has become widespread. EuroForMix and DNAStatistX are both based upon maximum likelihood estimation using a γ model, whereas STRmix™ is a Bayesian approach that specifies prior distributions on the unknown model parameters. A general overview is provided of the historical development of probabilistic genotyping. Some general principles of interpretation are described, including: the application to investigative vs. evaluative reporting; detection of contamination events; inter and intra laboratory studies; numbers of contributors; proposition setting and validation of software and its performance. This is followed by details of the evolution, utility, practice and adoption of the software discussed.
Affiliation(s)
- Peter Gill
  - Forensic Genetics Research Group, Department of Forensic Sciences, Oslo University Hospital, 0372 Oslo, Norway
  - Department of Forensic Medicine, Institute of Clinical Medicine, University of Oslo, 0315 Oslo, Norway
- Corina Benschop
  - Division of Biological Traces, Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands
- John Buckleton
  - Department of Statistics, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
  - Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand
- Øyvind Bleka
  - Forensic Genetics Research Group, Department of Forensic Sciences, Oslo University Hospital, 0372 Oslo, Norway
- Duncan Taylor
  - Forensic Science SA, GPO Box 2790, Adelaide, SA 5001, Australia
  - School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia
12
Grgicak CM, Duffy KR, Lun DS. The a posteriori probability of the number of contributors when conditioned on an assumed contributor. Forensic Sci Int Genet 2021; 54:102563. [PMID: 34284325 DOI: 10.1016/j.fsigen.2021.102563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Revised: 06/24/2021] [Accepted: 07/03/2021] [Indexed: 10/20/2022]
Abstract
Forensic DNA signal is notoriously challenging to assess, requiring computational tools to support its interpretation. Over-expressions of stutter, allele drop-out, allele drop-in, degradation, differential degradation, and the like, make forensic DNA profiles too complicated to evaluate by manual methods. In response, computational tools that make point estimates on the Number of Contributors (NOC) to a sample have been developed, as have Bayesian methods that evaluate an A Posteriori Probability (APP) distribution on the NOC. In cases where an overly narrow NOC range is assumed, the downstream strength of evidence may be incomplete insofar as the evidence is evaluated with an inadequate set of propositions. In the current paper, we extend previous work on NOCIt, a Bayesian method that determines an APP on the NOC given an electropherogram, by reporting on an implementation where the user can add assumed contributors. NOCIt is a continuous system that incorporates models of peak height (including degradation and differential degradation), forward and reverse stutter, noise, and allelic drop-out, while being cognizant of allele frequencies in a reference population. When conditioned on a known contributor, we found that the mode of the APP distribution can shift to one greater when compared with the circumstance where no known contributor is assumed, and that occurred most often when the assumed contributor was the minor constituent to the mixture. In a development of a result of Slooten and Caliebe (FSI:G, 2018) that, under suitable assumptions, establishes the NOC can be treated as a nuisance variable in the computation of a likelihood ratio between the prosecution and defense hypotheses, we show that this computation must not only use coincident models, but also coincident contextual information. The results reported here, therefore, illustrate the power of modern probabilistic systems to assess full weights-of-evidence, and to provide information on reasonable NOC ranges across multiple contexts.
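The nuisance-variable statement in this abstract can be illustrated with a short worked derivation. This is a sketch under stated assumptions, not the derivation given in the paper: the symbols are chosen here for illustration, with E the evidence, I the contextual information (for example, assumed contributors), H_p and H_d the prosecution and defense propositions, and n the number of contributors. It assumes that the prior on n is the same under both propositions and that P(E | n, H_d, I) > 0 for every n in the sum.

```latex
% Assumption (ours): the prior P(n | I) does not depend on H_p or H_d.
\begin{align}
\mathrm{LR}
  &= \frac{P(E \mid H_p, I)}{P(E \mid H_d, I)}
   = \frac{\sum_{n} P(E \mid n, H_p, I)\, P(n \mid I)}
          {\sum_{m} P(E \mid m, H_d, I)\, P(m \mid I)} \\
  &= \sum_{n} \frac{P(E \mid n, H_p, I)}{P(E \mid n, H_d, I)}
     \cdot \frac{P(E \mid n, H_d, I)\, P(n \mid I)}
                {\sum_{m} P(E \mid m, H_d, I)\, P(m \mid I)} \\
  &= \sum_{n} \mathrm{LR}_n \; P(n \mid E, H_d, I)
\end{align}
```

Under these assumptions the overall likelihood ratio is an average of the per-n likelihood ratios, weighted by the APP on n under the defense proposition. Every term carries the same I, which is consistent with the abstract's point that the per-n ratios and the APP need to share both the model and the contextual information.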
Affiliation(s)
- Catherine M Grgicak
  - Department of Chemistry, Rutgers University, Camden, NJ 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
- Ken R Duffy
  - Hamilton Institute, Maynooth University, Ireland
- Desmond S Lun
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Department of Computer Science, Rutgers University, Camden, NJ 08102, USA; Department of Plant Biology, Rutgers University, New Brunswick, NJ 08901, USA
13
Lin MH, Lee SI, Zhang X, Russell L, Kelly H, Cheng K, Cooper S, Wivell R, Kerr Z, Morawitz J, Bright JA. Developmental validation of FaSTR™ DNA: Software for the analysis of forensic DNA profiles. Forensic Science International: Reports 2021. [DOI: 10.1016/j.fsir.2021.100217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
14
Valtl J, Mönich UJ, Lun DS, Kelley J, Grgicak CM. A series of developmental validation tests for Number of Contributors platforms: Exemplars using NOCIt and a neural network. Forensic Sci Int Genet 2021; 54:102556. [PMID: 34225042 DOI: 10.1016/j.fsigen.2021.102556] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/21/2020] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 10/21/2022]
Abstract
Complex DNA mixtures are challenging to interpret and require computational tools that aid in that interpretation. Recently, several computational methods that estimate the number of contributors (NOC) to a sample have been developed. Unlike analogous tools that interpret profiles and report LRs, NOC tools vary widely in their operational principle where some are Bayesian and others are machine learning tools. Conjunctionally, NOC tools may return a single n estimate, or a distribution on n. This vast array of constructs, coupled with a gap in standardized methods by which to validate NOC systems, warrants an exploration into the measures by which differing NOC systems might be tested for operations. In the current paper, we use two exemplar NOC systems: a probabilistic system named NOCIt, which renders an a posteriori probability (APP) distribution on the number of contributors given an electropherogram and an artificial neural network (ANN). NOCIt is a continuous Bayesian inference system incorporating models of peak height, degradation, differential degradation, forward and reverse stutter, noise and allelic drop-out while considering allele frequencies in a reference population. The ANN is also a continuous method, taking all the same features (barring degradation) into account. Unlike its Bayesian counterpart, it demands substantively more data to parameterize, requiring synthetic data. We explore each system's performance by conducting tests on 214 PROVEDIt mixtures where the limit of detection was 1-copy of DNA. We found that after a lengthy training period of approximately 24 h, the ANN's evaluation process was very fast and perfectly repeatable. In contrast, NOCIt only took a few minutes to train but took tens of minutes to complete each sample and was less repeatable. In addition, it rendered a probability distribution that was more sensitive and specific, affording a reasonable method by which to report all reasonable n that explain the evidence for a given sample. Whatever the method, by acknowledging the inherent differences between NOC systems, we demonstrate that validation constructs will necessarily be guided by the needs of the forensic domain and be dependent upon whether the laboratory seeks to assign a single n or range of n.
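The distinction this abstract draws between assigning a single n and reporting a range of n can be sketched in a few lines of Python. This is a minimal illustration, not code from NOCIt or the ANN: the function names, the 0.99 coverage threshold, and the example probabilities are all assumptions made here, and the APP distribution is taken as a plain mapping from n to probability.

```python
from typing import Dict, List


def point_estimate(app: Dict[int, float]) -> int:
    """Return the mode of the APP distribution (a single-n report).

    Ties in probability are broken in favour of the smaller n.
    """
    return max(app, key=lambda n: (app[n], -n))


def credible_set(app: Dict[int, float], mass: float = 0.99) -> List[int]:
    """Return the smallest set of n whose normalized probability reaches `mass`.

    Hypothetical helper: values of n are added in order of decreasing
    probability until the requested coverage is met.
    """
    total = sum(app.values())
    if total <= 0:
        raise ValueError("APP distribution must have positive total mass")
    ranked = sorted(app.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[int] = []
    covered = 0.0
    for n, p in ranked:
        chosen.append(n)
        covered += p / total
        if covered >= mass:
            break
    return sorted(chosen)


# Illustrative values only; they are not taken from any of the cited papers.
example_app = {1: 0.001, 2: 0.12, 3: 0.80, 4: 0.079}
```

With these illustrative values, `point_estimate(example_app)` gives 3, while `credible_set(example_app, 0.99)` gives `[2, 3, 4]`, which shows how a range report can retain plausible values of n that a point estimate discards.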
Affiliation(s)
- Jakob Valtl
  - Lehrstuhl für Theoretische Informationstechnik, Technische Universität München, 80333 Munich, Germany
- Ullrich J Mönich
  - Lehrstuhl für Theoretische Informationstechnik, Technische Universität München, 80333 Munich, Germany
- Desmond S Lun
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Department of Computer Science, Rutgers University, Camden, NJ 08102, USA; Department of Plant Biology, Rutgers University, New Brunswick, NJ 08901, USA
- James Kelley
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
- Catherine M Grgicak
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA
15
Kruijver M, Taylor D, Bright JA. Evaluating DNA evidence possibly involving multiple (mixed) samples, common donors and related contributors. Forensic Sci Int Genet 2021; 54:102532. [PMID: 34130043 DOI: 10.1016/j.fsigen.2021.102532] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/17/2020] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 11/18/2022]
Abstract
Forensic DNA profiling is used in various circumstances to evaluate support for two competing propositions with the assignment of a likelihood ratio. Many software implementations exist that tackle a range of inference problems spanning identification and relationship testing. We propose a flexible likelihood ratio framework that caters to inference problems in forensic genetics. The framework allows for investigation of the degree of support for the contribution of multiple persons to multiple samples allowing for persons to be related according to a pedigree, including inbred relationships. We explain how a number of routine as well as more complex problems can be treated within this framework.
Affiliation(s)
- Maarten Kruijver
  - Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand
- Duncan Taylor
  - College of Science and Engineering, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia; Forensic Science SA, GPO Box 2790, Adelaide, SA 5001, Australia
- Jo-Anne Bright
  - Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand