1
Liu Z, Wu E, Li R, Liu J, Zang Y, Cong B, Wu R, Xie B, Sun H. Improved individual identification in DNA mixtures of unrelated or related contributors through massively parallel sequencing. Forensic Sci Int Genet 2024; 72:103078. [PMID: 38889491 DOI: 10.1016/j.fsigen.2024.103078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 06/07/2024] [Accepted: 06/11/2024] [Indexed: 06/20/2024]
Abstract
DNA mixtures are a common sample type in forensic genetics, and we typically assume that contributors to the mixture are unrelated when calculating the likelihood ratio (LR). However, scenarios involving mixtures with related contributors, such as in family murder or incest cases, can also be encountered. Compared to mixtures with unrelated contributors, kinship within the mixture brings additional challenges for the inference of the number of contributors (NOC) and the construction of probabilistic genotyping models. To evaluate the influence of potential kinship on the individual identification of the person of interest (POI), we conducted simulations of two-person (2P) and three-person (3P) DNA mixtures containing unrelated or related contributors (parent-child, full-sibling, and uncle-nephew) at different mixing ratios (for 2P: 1:1, 4:1, 9:1, and 19:1; for 3P: 1:1:1, 2:1:1, 5:4:1, and 10:5:1), and performed massively parallel sequencing (MPS) using the MGIEasy Signature Identification Library Prep Kit on the MGI platform. In addition, in silico simulations of mixtures with unrelated and related contributors were also performed. In this study, we evaluated 1) the MPS performance; 2) the influence of multiple genetic markers on determining the presence of related contributors and inferring the NOC within the mixture; 3) the probability distribution of MAC (maximum allele count) and TAC (total allele count) based on in silico mixture profiles; 4) trends in LR values with and without considering kinship in mixtures with related and unrelated contributors; 5) trends in LR values with length- and sequence-based STR genotypes. Results indicated that multiple numbers and types of genetic markers positively influenced kinship and NOC inference in a mixture. The LR values of the POI were strongly dependent on the mixing ratio. Non-kinship and correct-kinship hypotheses essentially did not affect the individual identification of the major POI; the correct kinship hypothesis yielded more conservative LR values; the incorrect kinship hypothesis did not necessarily lead to the failure of POI individual identification. However, it is noteworthy that these considerations could lead to uncertain outcomes in the identification of minor contributors. Compared to length-based STR genotyping, using sequence-based STR genotypes increased the individual identification power of the POI, concurrently improving the accuracy of mixing ratio inference using EuroForMix. In conclusion, the MGIEasy Signature Identification Library Prep Kit demonstrated robust individual identification power, making it a viable MPS panel for forensic DNA mixture interpretation, whether involving unrelated or related contributors.
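As an illustration of the MAC and TAC statistics mentioned in the abstract above, the following minimal Python sketch counts distinct alleles per locus in a mixture profile. The profile, the function names, and the lower bound ceil(MAC/2) on the number of contributors are illustrative assumptions, not the authors' implementation. The bound only reflects that a diploid contributor carries at most two alleles at an autosomal locus.

```python
from math import ceil


def allele_counts(profile):
    """Return (MAC, TAC) for a mixture profile.

    profile maps a locus name to the set of distinct alleles observed there.
    """
    counts = [len(alleles) for alleles in profile.values()]
    mac = max(counts)  # maximum allele count across loci
    tac = sum(counts)  # total allele count over all loci
    return mac, tac


def min_contributors(mac):
    # Each diploid contributor adds at most two alleles per autosomal locus.
    return ceil(mac / 2)


# Hypothetical three-locus profile, for illustration only.
example_profile = {
    "D3S1358": {"15", "16", "17"},
    "vWA": {"14", "16", "17", "18", "19"},
    "FGA": {"21", "22"},
}

mac, tac = allele_counts(example_profile)
print(mac, tac, min_contributors(mac))
```

With related contributors, shared alleles tend to lower both counts, which is one reason an allele-count bound alone can underestimate the NOC.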
Affiliation(s)
- Zhiyong Liu
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
- Enlin Wu
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
- Ran Li
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China; School of Medicine, Jiaying University, Meizhou 514015, China
- Jiajun Liu
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
- Yu Zang
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
- Bin Cong
  - College of Forensic Medicine, Hebei Medical University, Hebei Key Laboratory of Forensic Medicine, Shijiazhuang 050017, China
- Riga Wu
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
- Bo Xie
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
- Hongyu Sun
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou 510080, China
2
Riman S, Ghemrawi M, Borsuk LA, Mahfouz R, Walsh S, Vallone PM. Sequence-based allelic variations and frequencies for 22 autosomal STR loci in the Lebanese population. Forensic Sci Int Genet 2023; 65:102872. [PMID: 37068444 DOI: 10.1016/j.fsigen.2023.102872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 04/06/2023] [Accepted: 04/08/2023] [Indexed: 04/19/2023]
Abstract
This is the first study that characterizes the sequence-based allelic variations of 22 autosomal Short Tandem Repeat (aSTR) loci in a population dataset collected from Lebanon. Genomic DNA extracts from 195 unrelated Lebanese individuals were amplified with the PowerSeq 46GY System Prototype. Targeted amplicons were subjected to DNA library preparation and sequenced on the Verogen MiSeq FGx Sequencing System. Raw FASTQ data files were processed by STRait Razor v3. Sequence strings were annotated according to the considerations of the DNA Commission of the International Society for Forensic Genetics (ISFG) and tabulated herein with their respective allelic frequencies and GenBank accession and version numbers. The sequenced Lebanese dataset resulted in 429 distinct allelic sequences as compared to the 236 alleles identified by length only. The increase in the number of alleles was observed at 18 out of 22 aSTR loci and was attributed to the sequence variations residing in both the STR repeat motifs and flanking regions. The study uncovered 25 novel aSTR allelic sequences across 12 loci for which GenBank records did not previously exist in the STRSeq BioProject, PRJNA380127. For a concordance check, the length-based allelic calls derived from the full sequences were compared to those genotyped using capillary electrophoresis (CE) methods. Population genetic parameters relevant to the evaluation of forensic DNA evidence were assessed for the sequence-based data and compared to the parameters generated from the length-based information. Using the sequence-based data, Analysis of MOlecular VAriance (AMOVA), genetic distances, and population genetic structure were evaluated for 1231 individuals sampled from the Lebanese and four U.S. populations (African American, Asian, Caucasian, and Hispanic). The results were tabulated and visualized in a population tree, multidimensional scaling scatter plots, and bar plots. This newly established sequence-based database for the Lebanese population can be beneficial for extending NGS applicability to casework or paternity testing and assessing the strength of evidence for NGS-STR profiles. The described novel sequence variants at certain loci can further help in the effort to characterize the sequence diversity of STR markers from different populations around the world.
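The gap between length-based and sequence-based allele counts reported above (236 versus 429) arises because alleles of the same length can differ in their underlying sequence. A minimal Python sketch of that counting, using made-up observations and a hypothetical `count_alleles` helper rather than the study's data or pipeline, might look like this:

```python
from collections import defaultdict


def count_alleles(observations):
    """Count distinct alleles by length and by sequence.

    observations: iterable of (locus, length_allele, sequence_string) tuples.
    Returns (number of length-based alleles, number of sequence-based alleles),
    each summed over loci.
    """
    by_length = defaultdict(set)
    by_sequence = defaultdict(set)
    for locus, length_allele, sequence in observations:
        by_length[locus].add(length_allele)
        by_sequence[locus].add(sequence)
    n_length = sum(len(alleles) for alleles in by_length.values())
    n_sequence = sum(len(alleles) for alleles in by_sequence.values())
    return n_length, n_sequence


# Hypothetical observations in bracketed repeat notation, for illustration only.
observations = [
    ("D12S391", "20", "[AGAT]11 [AGAC]9"),
    ("D12S391", "20", "[AGAT]12 [AGAC]8"),  # same length, different sequence
    ("D12S391", "21", "[AGAT]12 [AGAC]9"),
    ("TPOX", "8", "[AATG]8"),
]

n_length, n_sequence = count_alleles(observations)
print(n_length, n_sequence)
```

Here the two D12S391 alleles of length 20 collapse into one length-based allele but remain two sequence-based alleles.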
Affiliation(s)
- Sarah Riman
  - Applied Genetics Group, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- Mirna Ghemrawi
  - Department of Chemistry and Biochemistry and International Forensic Research Institute, Florida International University, Miami, FL 33199, USA
- Lisa A Borsuk
  - Applied Genetics Group, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- Rami Mahfouz
  - Department of Pathology and Laboratory Medicine, American University of Beirut Medical Center, Beirut, Lebanon
- Susan Walsh
  - Department of Biology, Indiana University Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Peter M Vallone
  - Applied Genetics Group, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
3
Abstract
This review paper covers the forensic-relevant literature in biological sciences from 2019 to 2022 as a part of the 20th INTERPOL International Forensic Science Managers Symposium. Topics reviewed include rapid DNA testing, using law enforcement DNA databases plus investigative genetic genealogy DNA databases along with privacy/ethical issues, forensic biology and body fluid identification, DNA extraction and typing methods, mixture interpretation involving probabilistic genotyping software (PGS), DNA transfer and activity-level evaluations, next-generation sequencing (NGS), DNA phenotyping, lineage markers (Y-chromosome, mitochondrial DNA, X-chromosome), new markers and approaches (microhaplotypes, proteomics, and microbial DNA), kinship analysis and human identification with disaster victim identification (DVI), and non-human DNA testing including wildlife forensics. Available books and review articles are summarized as well as 70 guidance documents to assist in quality control that were published in the past three years by various groups within the United States and around the world.
4
Identification of missing persons through kinship analysis by microhaplotype sequencing of single-source DNA and two-person DNA mixtures. Forensic Sci Int Genet 2022; 58:102689. [DOI: 10.1016/j.fsigen.2022.102689] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Revised: 02/22/2022] [Accepted: 03/14/2022] [Indexed: 11/04/2022]
5
Phan NN, Chattopadhyay A, Lee TT, Yin HI, Lu TP, Lai LC, Hwa HL, Tsai MH, Chuang EY. High-performance deep learning pipeline predicts individuals in mixtures of DNA using sequencing data. Brief Bioinform 2021; 22:6345217. [PMID: 34368845 DOI: 10.1093/bib/bbab283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 06/20/2021] [Accepted: 07/03/2021] [Indexed: 11/14/2022] Open
Abstract
In this study, we proposed a deep learning (DL) model for classifying individuals from mixtures of DNA samples using 27 short tandem repeats and 94 single nucleotide polymorphisms obtained through a massively parallel sequencing protocol. The model was trained/tested/validated with sequenced data from 6 individuals and then evaluated using mixtures from forensic DNA samples. The model successfully identified both the major and the minor contributors with 100% accuracy for 90 DNA mixtures, which were manually prepared by mixing sequence reads of 3 individuals at different ratios. Furthermore, the model identified 100% of the major contributors and 50-80% of the minor contributors in 20 two-sample external mixed samples at ratios of 1:39 and 1:9, respectively. To further demonstrate the versatility and applicability of the pipeline, we tested it on whole exome sequence data to classify subtypes of 20 breast cancer patients and achieved an area under curve of 0.85. Overall, we present, for the first time, a complete pipeline, including sequencing data processing steps and DL steps, that is applicable across different NGS platforms. We also introduced a sliding window approach to overcome the sequence length variation problem of sequencing data, and demonstrate that it improves the model performance dramatically.
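The abstract does not give the details of the sliding window approach, so the following Python sketch only illustrates the general idea: turning variable-length reads into fixed-length windows that a model with a fixed input size can consume. The function name, the window size, the stride, and the padding with `N` are all assumptions made for illustration, not the authors' settings.

```python
def sliding_windows(sequence, window=8, step=4):
    """Split a read into fixed-length windows.

    Reads shorter than the window are right-padded with 'N' (an assumption).
    """
    if len(sequence) <= window:
        return [sequence.ljust(window, "N")]
    windows = [
        sequence[i:i + window]
        for i in range(0, len(sequence) - window + 1, step)
    ]
    # Add a final window flush with the read end if the stride skipped the tail.
    if (len(sequence) - window) % step != 0:
        windows.append(sequence[-window:])
    return windows


# Reads of different lengths all yield windows of the same length.
for read in ["ACGTA", "ACGTACGTAC", "ACGTACGTACGT"]:
    print(read, sliding_windows(read))
```

Every window has the same length regardless of read length, which is the property a fixed-input network needs.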
Affiliation(s)
- Nam Nhut Phan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan; Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
- Amrita Chattopadhyay
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan
- Tsui-Ting Lee
  - Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Hsiang-I Yin
  - Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Tzu-Pin Lu
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan; Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10055, Taiwan
- Liang-Chuan Lai
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan; Graduate Institute of Physiology, College of Medicine, National Taiwan University, Taipei 10051, Taiwan
- Hsiao-Lin Hwa
  - Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Mong-Hsun Tsai
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan; Institute of Biotechnology, National Taiwan University, Taipei 10672, Taiwan; Center of Biotechnology, National Taiwan University, Taipei 10672, Taiwan
- Eric Y Chuang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan; Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei 10055, Taiwan; Master Program for Biomedical Engineering, China Medical University, Taichung 110122, Taiwan
6
Yang TW, Li YH, Chou CF, Lai FP, Chien YH, Yin HI, Lee TT, Hwa HL. DNA mixture interpretation using linear regression and neural networks on massively parallel sequencing data of single nucleotide polymorphisms. Aust J Forensic Sci 2021. [DOI: 10.1080/00450618.2020.1807050] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Affiliation(s)
- Ta-Wei Yang
  - Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
- Yi-Hao Li
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Cheng-Fu Chou
  - Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Fei-Pei Lai
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
  - Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
- Yin-Hsiu Chien
  - Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Hsiang-I Yin
  - Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Tsui-Ting Lee
  - Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Hsiao-Lin Hwa
  - Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
  - Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Obstetrics and Gynecology, National Taiwan University Hospital, Taipei, Taiwan
7
Evaluation of a Microhaplotype-Based Noninvasive Prenatal Test in Twin Gestations: Determination of Paternity, Zygosity, and Fetal Fraction. Genes (Basel) 2020; 12:genes12010026. [PMID: 33375453 PMCID: PMC7823673 DOI: 10.3390/genes12010026] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/21/2020] [Revised: 12/23/2020] [Accepted: 12/24/2020] [Indexed: 12/17/2022] Open
Abstract
As a novel type of genetic marker, the microhaplotype has shown promising potential in forensic research. In the present study, we analyzed maternal plasma cell-free DNA (cfDNA) samples from twin pregnancies to validate microhaplotype-based noninvasive prenatal testing (NIPT) for paternity, zygosity, and fetal fraction (FF). Paternity was determined with the combined use of the relMix package, zygosity was evaluated by examining the presence of informative loci with two fetal genome complements, and FF was assessed through fetal allele ratios. Paternity was determined in 19 twin cases, among which 13 cases were considered dizygotic (DZ) twins based on the presence of 3 to 10 informative loci and the remaining 6 cases were considered monozygotic (MZ) twins because no informative locus was observed. With the fetal genomic genotypes as a reference, the accuracy of paternity and zygosity determination was confirmed by standard short tandem repeat (STR) analysis. Moreover, the lower FF, higher FF, and combined FF in each DZ plasma sample were closely related to the estimated value. This preliminary study proposes that microhaplotype-based NIPT is applicable for paternity, zygosity, and FF determination in twin pregnancies, which is expected to be advantageous for both forensic and clinical settings.
8
An examination of STR nomenclatures, filters and models for MPS mixture interpretation. Forensic Sci Int Genet 2020; 48:102319. [DOI: 10.1016/j.fsigen.2020.102319] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/10/2020] [Revised: 05/19/2020] [Accepted: 06/01/2020] [Indexed: 11/20/2022]