1
Grgicak CM, Bhembe Q, Slooten K, Sheth NC, Duffy KR, Lun DS. Single-cell investigative genetics: Single-cell data produces genotype distributions concentrated at the true genotype across all mixture complexities. Forensic Sci Int Genet 2024; 69:103000. [PMID: 38199167 DOI: 10.1016/j.fsigen.2023.103000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 11/07/2023] [Accepted: 12/12/2023] [Indexed: 01/12/2024]
Abstract
In the absence of a suspect the forensic aim is investigative, and the focus is one of discerning what genotypes best explain the evidence. In traditional systems, the list of candidate genotypes may become vast if the sample contains DNA from many donors or the information from a minor contributor is swamped by that of major contributors, leading to lower evidential value for a true donor's contribution and, as a result, possibly overlooked or inefficient investigative leads. Recent developments in single-cell analysis offer a way forward, by producing data capable of discriminating genotypes. This is accomplished by first clustering single-cell data by similarity without reference to a known genotype. With good clustering it is reasonable to assume that the scEPGs in a cluster are of a single contributor. With that assumption we determine the probability of a cluster's content given each possible genotype at each locus, which is then used to determine the posterior probability mass distribution for all genotypes by application of Bayes' rule. A decision criterion is then applied such that the sum of the ranked probabilities of all genotypes falling in the set is at least 1-α. This is the credible genotype set and is used to inform database search criteria. Within this work we demonstrate the salience of single-cell analysis by performance testing a set of 630 previously constructed admixtures containing up to 5 donors of balanced and unbalanced contributions. We use scEPGs that were generated by isolating single cells, employing a direct-to-PCR extraction treatment, amplifying STRs that are compliant with existing national databases and applying post-PCR treatments that elicit a detection limit of one DNA copy. We determined that, for these test data, 99.3% of the true genotypes are included in the 99.8% credible set, regardless of the number of donors that comprised the mixture. 
We also determined that the most probable genotype was the true genotype for 97% of the loci when the number of cells in a cluster was at least two. Since efficient investigative leads will be borne by posterior mass distributions that are narrow and concentrated at the true genotype, we report that, for this test set, 47,900 (86%) loci returned only one credible genotype and of these 47,551 (99%) were the true genotype. When determining the LR for true contributors, 91% of the clusters rendered LR > 10^18, showing the potential of single-cell data to positively affect investigative reporting.
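The credible genotype set described in this abstract can be illustrated with a minimal sketch. It assumes a finite list of candidate genotypes at one locus, and the prior and likelihood values are hypothetical numbers chosen for illustration; the function names `posterior` and `credible_set` are not from the article.

```python
from typing import Dict, List, Tuple


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over a finite set of candidate genotypes at one locus."""
    unnormalised = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalised.values())
    return {g: v / total for g, v in unnormalised.items()}


def credible_set(post: Dict[str, float], alpha: float) -> List[Tuple[str, float]]:
    """Smallest set of top-ranked genotypes whose posterior mass is at least 1 - alpha."""
    ranked = sorted(post.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[Tuple[str, float]] = []
    mass = 0.0
    for genotype, prob in ranked:
        chosen.append((genotype, prob))
        mass += prob
        if mass >= 1.0 - alpha:
            break
    return chosen


# Hypothetical prior (e.g. from allele frequencies) and hypothetical
# probabilities of the cluster's content given each genotype.
prior = {"12,12": 0.25, "12,13": 0.50, "13,13": 0.25}
likelihood = {"12,12": 1e-6, "12,13": 0.8, "13,13": 1e-4}

post = posterior(prior, likelihood)
cs = credible_set(post, alpha=0.002)  # 99.8% credible set
print(cs)
```

With these made-up inputs the posterior mass is concentrated on a single genotype, so the credible set contains one entry, which is the situation the abstract reports for most loci.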
Affiliation(s)
- Catherine M Grgicak: Department of Chemistry, Rutgers University, Camden, NJ 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Program in Biomedical Forensic Sciences, Boston University, Boston, MA 02118, USA
- Qhawe Bhembe: Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
- Klaas Slooten: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, the Netherlands; VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, the Netherlands
- Nidhi C Sheth: Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
- Ken R Duffy: Department of Mathematics, Northeastern University, Boston, MA 02115, USA; Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA; Hamilton Institute, Maynooth University, Ireland
- Desmond S Lun: Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Department of Computer Science, Rutgers University, Camden, NJ 08102, USA
2
Slooten K. The comparison of DNA mixture profiles with multiple persons of interest. Forensic Sci Int Genet 2021; 56:102592. [PMID: 34739935 DOI: 10.1016/j.fsigen.2021.102592] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/22/2021] [Revised: 08/27/2021] [Accepted: 09/06/2021] [Indexed: 12/29/2022]
Abstract
When we compare a DNA mixture profile to a single person of interest, there are often just two competing explanations considered, and the comparison of how likely these are to lead to the observed mixture is summarized by a likelihood ratio. However, in more complex cases this does not suffice, e.g., when there are multiple persons of interest. One can then compute several likelihood ratios, corresponding to several pairs of hypotheses, and subsequently decide which one(s) to report. This may lead to the computation of a rather large number of such likelihood ratios. In this article we advocate a systematic approach that starts by describing all relevant hypotheses. For each hypothesis, we then compute its likelihood (i.e., the probability to see the genetic data if the hypothesis is true). Based on the likelihoods of all considered hypotheses, one can then make a summary of the findings to report. This may be on the level of the considered hypotheses and/or with likelihood ratios per person of interest. We illustrate with several examples how this approach assists interpretation. The likelihoods summarize how the trace can help to distinguish between the considered hypotheses, in the sense that they transform the prior odds on them into posterior odds, without having to assign prior probabilities on the hypotheses for the calculation of the likelihoods themselves. On the other hand, likelihood ratios (LR's) for individual PoI's cannot be obtained without these priors. In many cases these LR's will be quite insensitive to the choice of prior probabilities but in other cases they will be; we give examples of both. We argue that the table of likelihoods of the considered hypotheses is a more natural analog of the LR provided in the simple case with one PoI and two considered hypotheses, compared to the computation of an LR per PoI. We end with a discussion of the choice of prior probabilities, of the existing recommendations for this situation, and on reporting.
Affiliation(s)
- K Slooten: Netherlands Forensic Institute; VU University Amsterdam
3
Slooten K. The analogy between DNA kinship and DNA mixture evaluation, with applications for the interpretation of likelihood ratios produced by possibly imperfect models. Forensic Sci Int Genet 2020; 52:102449. [PMID: 33517022 DOI: 10.1016/j.fsigen.2020.102449] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/22/2020] [Revised: 11/19/2020] [Accepted: 12/01/2020] [Indexed: 12/22/2022]
Abstract
Two main applications of forensic DNA analysis are the investigation of possible relatedness and the investigation whether a person left DNA in a trace. Both of these are usually carried out by the calculation of likelihood ratios. In the kinship case, it is standard to let the likelihood ratio express the support in favour of the investigated relatedness versus no relatedness, and in the investigation of traces, one by default compares the hypothesis that the person of interest contributed DNA, versus that he is unrelated to any of the actual contributors. In both cases however, we can also view the probabilistic procedure as an inference of the profile of the person we look for: in other words, in both cases we carry out probabilistic genotyping. In this article we use this general analogy to develop various more specific analogies between kinship and mixture likelihood ratios. These analogies help to understand the concepts that play a role, and also to understand the importance of the statistical modeling needed for DNA mixtures. In this article, we apply our findings to consider what we can and cannot conclude from a likelihood ratio in favour of contribution to a mixed DNA profile, if that is computed by a model whose specifics are not entirely known to us, or where we do not know whether they provide a good description of the stochastic effects involved in the generation of DNA trace profiles. We show that, if unrelated individuals are adequately modeled, we can give bounds on how often LR's coming from certain types of black box models may arise, both for persons who are actual contributors and who are unrelated. In particular we show that no model, provided it satisfies basic requirements, can overestimate the evidence found for actual contributors both often and strongly.
Affiliation(s)
- Klaas Slooten: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands; VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
4
Vergeer P, Leegwater AJ, Slooten K. Evaluation of glass evidence at activity level: A new distribution for the background population. Forensic Sci Int 2020; 316:110431. [PMID: 32980719 DOI: 10.1016/j.forsciint.2020.110431] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/25/2020] [Revised: 07/17/2020] [Accepted: 07/20/2020] [Indexed: 11/30/2022]
Abstract
For evidence evaluation of the physicochemical properties of glass at activity level a well-known formula introduced by Evett & Buckleton [1,2] is commonly used. Parameters in this formula are, amongst others, the probability in a background population to find on somebody's clothing the observed number of glass sources and the probability in a background population to find on somebody's clothing a group of fragments with the same size as the observed matching group. Recently, for efficiency reasons, the Netherlands Forensic Institute changed its methodology to measure not all the glass fragments but a subset of glass fragments found on clothing. Due to the measurement of subsets, it is difficult to get accurate estimates for these parameters in this formula. We offer a solution to this problem. The heart of the solution consists of relaxing the assumption of conditional independence of group sizes of background fragments, and modelling the probability of an allocation of background fragments into groups given a total number of background fragments by a two-parameter Chinese restaurant process (CRP) [3]. Under the assumption of random sampling of fragments to be measured from recovered fragments in the laboratory, parameter values for the Chinese restaurant process may be estimated from a relatively small dataset of glass in other relevant cases. We demonstrate this for a dataset of glass fragments collected from upper garments in casework, show model fit and provide a prototypical calculation of an LR at activity level accompanied with a parameter sensitivity analysis for reasonable ranges of the CRP parameter values. Considering that other laboratories may want to measure subsets as well, we believe this is an important alternative approach to the evaluation of numerical LRs for glass analyses at activity level.
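As a rough illustration of the two-parameter Chinese restaurant process mentioned in this abstract, the sketch below samples an allocation of fragments into groups using the standard sequential seating rule. The parameter names `theta` (concentration) and `d` (discount) and the chosen values are assumptions for illustration; the article's own notation and estimated values are not given here.

```python
import random
from typing import List


def sample_crp(n: int, theta: float, d: float, rng: random.Random) -> List[int]:
    """Sample group sizes for n fragments from a two-parameter CRP.

    Standard seating rule (requires 0 <= d < 1 and theta > -d): with i
    fragments already placed in K groups, the next fragment joins an
    existing group of size s with weight (s - d) and opens a new group
    with weight (theta + K * d).
    """
    sizes: List[int] = []
    for _ in range(n):
        if not sizes:
            sizes.append(1)  # the first fragment always opens a group
            continue
        weights = [s - d for s in sizes] + [theta + len(sizes) * d]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes


# Hypothetical parameter values, chosen only to show the mechanics.
sizes = sample_crp(n=50, theta=1.0, d=0.5, rng=random.Random(0))
print(len(sizes), "groups with sizes", sorted(sizes, reverse=True))
```

Because group sizes are generated jointly rather than independently, a model of this kind can describe background fragments without the conditional independence assumption that the abstract says is relaxed.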
Affiliation(s)
- Peter Vergeer: The Netherlands Forensic Institute, P.O. Box 24044, 2490 AA, The Hague, the Netherlands
- Klaas Slooten: The Netherlands Forensic Institute, P.O. Box 24044, 2490 AA, The Hague, the Netherlands; VU University Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, the Netherlands
5
Slooten K. Likelihood ratio distributions and the (ir)relevance of error rates. Forensic Sci Int Genet 2020; 44:102173. [DOI: 10.1016/j.fsigen.2019.102173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/16/2019] [Revised: 08/29/2019] [Accepted: 09/30/2019] [Indexed: 12/15/2022]
6
Meester RWJ, Slooten K. DNA database matches: A p versus np problem. Forensic Sci Int Genet 2019; 46:102229. [PMID: 32058298 DOI: 10.1016/j.fsigen.2019.102229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2019] [Revised: 12/18/2019] [Accepted: 12/18/2019] [Indexed: 11/17/2022]
Abstract
The evidential value of a unique DNA database match has been extensively discussed. In principle the matter has been mathematically resolved, since the posterior odds on the match being with the trace donor are unambiguously defined. There are multiple ways to express these odds as a product of likelihood ratio and prior odds, and so the mathematics do not immediately tell us what to do in concrete cases, in particular which likelihood ratio to choose for reporting. With p the random match probability for the matching person, if innocent, and n the database size, both 1/p, originating from a suspect-centered framework, and 1/(np), originating from a database-centered framework, arise as likelihood ratios. Both have been defended and both have been criticized in the literature. We will clarify the situation by not introducing models and choices of prior probabilities until they are needed. This allows us to derive the posterior odds in their most general form, which applies whenever we know that a single person among a list is not excluded as potential trace donor. We show that we need only three probabilities, that pertain to the observed match, to the database, and to the matching person respectively. How these required probabilities behave in a given context, then, differs from one situation to another. This is understandable since database searches may be done under various circumstances. They may be carried out with or without a suspect already in mind and, depending on the operational procedures, one may or may not be informed about the personal details of the person who gives the match. We show how to evaluate the required probabilities in all such cases. We will motivate why we believe that for some database searches, the 1/p likelihood ratio is more natural, whereas for others, 1/(np) seems the more sensible choice. This is not motivated by the mathematics: mathematically, the approaches are equivalent. It is motivated by considering which model best reflects the actual situation, taking into account what question was asked to begin with, and by the practical consideration of judging which likelihood ratio comes closer to the posterior odds based on the information available in the case. This article is intended to be both a research and a review article, and we end with an in-depth discussion of various arguments that have been brought forward in favor of or against either 1/p or 1/(np).
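The equivalence of the two frameworks can be checked numerically in a simplified setting. The sketch below assumes a population of N equally likely potential donors (a uniform prior, which is an assumption made here and not taken from the article), a database holding n of them, and a single match with all other database members excluded. The function names and the numbers are hypothetical.

```python
from fractions import Fraction


def posterior_odds_suspect_centered(p: Fraction, n: int, N: int) -> Fraction:
    """LR = 1/p for 'the matching person is the donor'.

    Prior odds are taken after the other n - 1 database members are
    excluded: the matching person against the N - n people outside the
    database, under a uniform prior.
    """
    likelihood_ratio = 1 / p
    prior_odds = Fraction(1, N - n)
    return likelihood_ratio * prior_odds


def posterior_odds_database_centered(p: Fraction, n: int, N: int) -> Fraction:
    """LR = 1/(np) for 'the donor is in the database'.

    Prior odds that the donor is among the n database members, under a
    uniform prior, are n / (N - n).
    """
    likelihood_ratio = 1 / (n * p)
    prior_odds = Fraction(n, N - n)
    return likelihood_ratio * prior_odds


# Hypothetical values for illustration only.
p = Fraction(1, 10**6)   # random match probability
n = 10**4                # database size
N = 10**7                # size of the population of potential donors

a = posterior_odds_suspect_centered(p, n, N)
b = posterior_odds_database_centered(p, n, N)
print(a, b, a == b)
```

Under these assumptions both products reduce to 1/((N - n)p), so the choice between 1/p and 1/(np) changes how the posterior odds are split into likelihood ratio and prior odds, not the posterior odds themselves.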
Affiliation(s)
- K Slooten: VU University Amsterdam, The Netherlands; Netherlands Forensic Institute, The Netherlands
7
Abstract
In this response paper, part of the Virtual Special Issue on "Measuring and Reporting the Precision of Forensic Likelihood Ratios", we further develop our position on likelihood ratios which we described previously in Berger et al. (2016) "The LR does not exist". Our exposition is inspired by an example given in Martire et al. (2016) "On the likelihood of encapsulating all uncertainty", where the consequences of obtaining additional information on the LR were discussed. In their example, two experts use the same data in a different way, and the LRs of these experts change differently when new data are taken into account. Using this example as a starting point we will demonstrate that the probability distribution for the frequency of the characteristic observed in trace and reference material can be used to predict how much an LR will change when new data become available. This distribution can thus be useful for such a sensitivity analysis, and address the question of whether to obtain additional data or not. But it does not change the answer to the original question of how to update one's prior odds based on the evidence, and it does not represent an uncertainty on the likelihood ratio based on the current data.
Affiliation(s)
- Klaas Slooten: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands; VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
- Charles E H Berger: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands; Leiden University, Institute for Criminal Law and Criminology, P.O. Box 9520, 2300 RA Leiden, The Netherlands
8
Slooten K. Identifying common donors in DNA mixtures, with applications to database searches. Forensic Sci Int Genet 2017; 26:40-47. [DOI: 10.1016/j.fsigen.2016.10.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 03/11/2016] [Revised: 08/13/2016] [Accepted: 10/01/2016] [Indexed: 12/15/2022]
9
Slooten K. Accurate assessment of the weight of evidence for DNA mixtures by integrating the likelihood ratio. Forensic Sci Int Genet 2016; 27:1-16. [PMID: 27914277 DOI: 10.1016/j.fsigen.2016.11.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/18/2016] [Revised: 10/26/2016] [Accepted: 11/06/2016] [Indexed: 01/24/2023]
Abstract
Several methods exist for weight of evidence calculations on DNA mixtures. Especially if dropout is a possibility, it may be difficult to estimate mixture specific parameters needed for the evaluation. For semi-continuous models, the LR for a person to have contributed to a mixture depends on the specified number of contributors and the probability of dropout for each. We show here that, for the semi-continuous model that we consider, the weight of evidence can be accurately obtained by applying the standard statistical technique of integrating the likelihood ratio against the parameter likelihoods obtained from the mixture data. This method takes into account all likelihood ratios belonging to every choice of parameters, but LR's belonging to parameters that provide a better explanation of the mixture data contribute more weight to the final result. We therefore avoid having to estimate the number of contributors or their probabilities of dropout, and let the whole evaluation depend on the mixture data and the allele frequencies, which is a practical advantage as well as a gain in objectivity. Using simulated mixtures, we compare the LR obtained in this way with the best informed LR, i.e., the LR using the parameters that were used to generate the data, and show that results obtained by integration of the LR closely approximate these ideal values. We investigate both contributors and non-contributors for mixtures with various numbers of contributors. For contributors we always obtain a result close to the best informed LR whereas non-contributors are excluded more strongly if a smaller dropout probability is imposed for them. The results therefore naturally lead us to reconsider what we mean by a contributor, or by the number of contributors.
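One way to read "integrating the likelihood ratio against the parameter likelihoods" is the general identity below. The notation is assumed here for illustration: E is the mixture data, θ collects the nuisance parameters (such as number of contributors and dropout probabilities), π(θ) is a prior on them, and H_p and H_d are the two competing hypotheses. The article's exact choice of weighting may differ from this sketch.

```latex
\begin{align}
\mathrm{LR}
  &= \frac{\int P(E \mid H_p, \theta)\,\pi(\theta)\,d\theta}
          {\int P(E \mid H_d, \theta)\,\pi(\theta)\,d\theta} \\
  &= \int \mathrm{LR}(\theta)\, w(\theta)\, d\theta,
\qquad
\mathrm{LR}(\theta) = \frac{P(E \mid H_p, \theta)}{P(E \mid H_d, \theta)},
\qquad
w(\theta) = \frac{P(E \mid H_d, \theta)\,\pi(\theta)}
                 {\int P(E \mid H_d, \theta')\,\pi(\theta')\,d\theta'} .
\end{align}
```

The weights w(θ) integrate to one and are largest for parameter values that explain the mixture data well, which matches the abstract's remark that such parameter choices contribute more to the final result.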
Affiliation(s)
- Klaas Slooten: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands; VU University Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
10
11
Slooten K. Familial searching on DNA mixtures with dropout. Forensic Sci Int Genet 2016; 22:128-138. [DOI: 10.1016/j.fsigen.2016.02.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/16/2015] [Revised: 12/11/2015] [Accepted: 02/01/2016] [Indexed: 11/25/2022]
12
Slooten K. Distinguishing between donors and their relatives in complex DNA mixtures with binary models. Forensic Sci Int Genet 2015; 21:95-109. [PMID: 26745184 DOI: 10.1016/j.fsigen.2015.12.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/05/2015] [Revised: 10/02/2015] [Accepted: 12/01/2015] [Indexed: 12/18/2022]
Abstract
While likelihood ratio calculations were until the recent past limited to the evaluation of mixtures in which all alleles of all donors are present in the DNA mixture profile, more recent methods are able to deal with allelic dropout and drop-in. This opens up the possibility to obtain likelihood ratios for mixtures where this was not previously possible, but it also means that a full match between the alleged contributor and the crime stain is no longer necessary. We investigate in this article what the consequences are for relatives of the actual donors, because they typically share more alleles with the true donor than an unrelated individual. We do this with a semi-continuous binary approach, where the likelihood ratios are based on the observed alleles and the dropout probabilities for each donor, but not on the peak heights themselves. These models are widespread in the forensic community. Since in many cases a simple model is used where a uniform dropout probability is assumed for all (or for all unknown) contributors, we explore the extent to which this alters the false positive probabilities for relatives of donors, compared to what would have been obtained with the correct probabilities of dropout for each donor.
Affiliation(s)
- K Slooten: Netherlands Forensic Institute (NFI), The Netherlands; VU University Amsterdam, The Netherlands
13
Slooten K. Distinguishing mixture donors from relatives, illustrated by an example. Forensic Science International: Genetics Supplement Series 2015. [DOI: 10.1016/j.fsigss.2015.09.142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
14
Kruijver M, Meester R, Slooten K. Optimal strategies for familial searching. Forensic Sci Int Genet 2014; 13:90-103. [DOI: 10.1016/j.fsigen.2014.06.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/09/2014] [Accepted: 06/11/2014] [Indexed: 11/28/2022]
15
Ricciardi F, Slooten K. Mutation models for DVI analysis. Forensic Sci Int Genet 2014; 11:85-95. [DOI: 10.1016/j.fsigen.2014.02.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/19/2013] [Revised: 01/24/2014] [Accepted: 02/15/2014] [Indexed: 11/27/2022]
16
Affiliation(s)
- Klaas Slooten: Netherlands Forensic Institute, The Hague, The Netherlands
17
Slooten K, Ricciardi F. Estimation of mutation probabilities for autosomal STR markers. Forensic Sci Int Genet 2013; 7:337-44. [DOI: 10.1016/j.fsigen.2013.01.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 05/02/2012] [Revised: 01/21/2013] [Accepted: 01/27/2013] [Indexed: 10/27/2022]
18
Haned H, Slooten K, Gill P. Exploratory data analysis for the interpretation of low template DNA mixtures. Forensic Sci Int Genet 2012; 6:762-74. [DOI: 10.1016/j.fsigen.2012.08.008] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Received: 05/17/2012] [Revised: 08/03/2012] [Accepted: 08/16/2012] [Indexed: 01/13/2023]
19
20
van Dongen C, Slooten K, Slagter M, Burgers W, Wiegerinck W. Bonaparte: Application of new software for missing persons program. Forensic Science International: Genetics Supplement Series 2011. [DOI: 10.1016/j.fsigss.2011.08.059] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 10/17/2022]
21
Abstract
A defendant whose DNA profile matches that of a crime stain may argue that he has several, say n, brothers and that one of them may have been the origin of the crime stain. If the probability for any of the brothers considered separately to match the crime stain profile is p, we show that the probability that at least one of the n brothers matches is strictly smaller than np. This latter quantity therefore is an easy-to-compute and conservative value to report.
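The bound can be illustrated numerically with a minimal sketch. It makes a simplifying assumption that is not in the abstract, namely that the n brothers match independently of one another, each with probability p; the general inequality P(at least one match) ≤ np holds without independence by the union bound. The function name and the value of p are hypothetical.

```python
def prob_at_least_one_match(p: float, n: int) -> float:
    """P(at least one of n brothers matches), assuming independent matches."""
    return 1.0 - (1.0 - p) ** n


p = 0.01  # hypothetical per-brother match probability
for n in range(2, 6):
    exact = prob_at_least_one_match(p, n)
    bound = n * p  # the conservative value to report
    print(f"n={n}: exact={exact:.6f}  np={bound:.6f}")
```

For small p the two values are close, so reporting np loses little while never understating the probability that some brother matches.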
Affiliation(s)
- K Slooten: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands
22