26
|
Turner SD, Nagraj VP, Scholz M, Jessa S, Acevedo C, Ge J, Woerner AE, Budowle B. skater: an R package for SNP-based kinship analysis, testing, and evaluation. F1000Res 2022; 11:18. [PMID: 35222994 PMCID: PMC8844523 DOI: 10.12688/f1000research.76004.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 12/20/2021] [Indexed: 11/20/2022] Open
Abstract
Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.
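As a rough illustration of what "relationship degree inference" can mean, the sketch below maps a kinship coefficient to a degree by assuming the expected coefficient for degree d is 2^-(d+1) (0.5 for self or identical twins, 0.25 for first degree, and so on). It is written in Python rather than R, and the function name `kinship_to_degree` and the `max_degree` cutoff are hypothetical; it does not reproduce the skater API.

```python
import math
from typing import Optional


def kinship_to_degree(phi: float, max_degree: int = 3) -> Optional[int]:
    """Map a kinship coefficient to a relationship degree.

    Assumes the expected kinship coefficient for degree d is 2 ** -(d + 1),
    so degree 0 (self / identical twin) is 0.5 and degree 1 is 0.25.
    Returns None when the pair looks unrelated (beyond max_degree).
    """
    if phi <= 0:
        return None
    # Invert phi = 2 ** -(d + 1) and snap to the nearest whole degree.
    degree = max(round(-math.log2(phi) - 1), 0)
    return degree if degree <= max_degree else None


if __name__ == "__main__":
    for phi in (0.5, 0.25, 0.125, 0.0625, 0.01):
        print(phi, kinship_to_degree(phi))
```

Rounding on the log scale places the bin boundaries at the geometric midpoints between expected values, which is one common convention; a real analysis would choose its own thresholds.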
|
27
|
Crysup B, Budowle B, Woerner AE. ProSynAR: a reference aware read merger. Bioinformatics 2022; 38:2052-2053. [PMID: 35020788 DOI: 10.1093/bioinformatics/btac022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/31/2021] [Revised: 01/05/2022] [Accepted: 01/07/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Read-merging algorithms that look solely at the reads can misalign and mis-merge the reads (especially near repetitive sequences). RESULTS The C++ program ProSynAR has been written to take the reads' position in the reference into account when performing (and deciding whether to perform) a merge. AVAILABILITY *Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/prosynar). Windows binary available at https://github.com/Benjamin-Crysup/prosynar/releases/download/1.0/prosynar.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
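To make the idea of a reference-aware merge concrete, here is a deliberately simplified Python sketch. It assumes both reads are already in reference orientation, aligned without gaps, and annotated with a 0-based reference start coordinate; overlapping positions are resolved by taking the base with the higher quality. The function `merge_by_reference` and all of its parameters are hypothetical and do not reflect how ProSynAR itself is implemented.

```python
from typing import List, Optional


def merge_by_reference(read1: str, start1: int, qual1: List[int],
                       read2: str, start2: int, qual2: List[int]) -> Optional[str]:
    """Merge two ungapped reads using their reference start coordinates.

    Returns the merged sequence, or None when the reads do not overlap on
    the reference (in which case no merge is attempted).
    """
    end1 = start1 + len(read1)
    end2 = start2 + len(read2)
    if start2 >= end1 or start1 >= end2:
        return None  # no shared reference positions, so do not merge

    merged = []
    for pos in range(min(start1, start2), max(end1, end2)):
        in1 = start1 <= pos < end1
        in2 = start2 <= pos < end2
        if in1 and in2:
            # Both reads cover this position: keep the higher-quality base.
            i, j = pos - start1, pos - start2
            merged.append(read1[i] if qual1[i] >= qual2[j] else read2[j])
        elif in1:
            merged.append(read1[pos - start1])
        else:
            merged.append(read2[pos - start2])
    return "".join(merged)


if __name__ == "__main__":
    print(merge_by_reference("ACGTAC", 0, [30] * 6, "TACGG", 3, [30] * 5))
```

Because the overlap is decided by reference coordinates rather than by sequence similarity between the reads, a repeat inside the reads cannot shift the merge point in this toy version.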
|
28
|
Ceresa L, Chavez J, Bus MM, Budowle B, Kitchner E, Kimball J, Gryczynski I, Gryczynski Z. Förster Resonance Energy Transfer-Enhanced Detection of Minute Amounts of DNA. Anal Chem 2022; 94:5062-5068. [PMID: 35286067 DOI: 10.1021/acs.analchem.1c05275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
This article presents a novel approach to increase the detection sensitivity of trace amounts of DNA in a sample by employing Förster resonance energy transfer (FRET) between intercalating dyes. Two intercalators that present efficient FRET were used to enhance sensitivity and improve specificity in detecting minute amounts of DNA. Comparison of steady-state acceptor emission spectra with and without the donor allows for simple and specific detection of DNA (acceptor bound to DNA) down to 100 pg/μL. When utilizing as an acceptor a dye with a significantly longer lifetime (e.g., ethidium bromide bound to DNA), multipulse pumping and time-gated detection enable imaging/visualization of picograms of DNA present in a microliter of an unprocessed sample or DNA collected on a swab or other substrate materials.
|
29
|
Vuorio A, Budowle B, Kovanen PT. Airborne particles and cardiovascular morbidity in severe inherited hypercholesterolemia: Vulnerable endothelium under multiple attacks. Bioessays 2021; 44:e2100273. [PMID: 34967031 DOI: 10.1002/bies.202100273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2021] [Revised: 12/17/2021] [Accepted: 12/20/2021] [Indexed: 11/07/2022]
Abstract
Despite recent advances in the research related to air pollution and associated adverse cardiovascular events, the combined effects of air pollution, climate change, and SARS-CoV-2 infection on cardiovascular health need to be researched further. This Commentary addresses their impacts on cardiovascular health in the approximately 25 million people with a severe form of inherited hypercholesterolemia, called familial hypercholesterolemia (FH). The arterial endothelium in these individuals is potentially under multiple attacks caused by particles of both endogenous and exogenous origin. Thus, they have a lifelong highly elevated level of circulating low density lipoprotein (LDL) cholesterol which drives premature atherosclerosis. The high levels of LDL particles, often associated with an elevated level of circulating lipoprotein(a) particles, are both capable of inducing and maintaining endothelial dysfunction. Such pre-existing endothelial dysfunction can be exacerbated by exposure to SARS-CoV-2 viral particles, by exposure to fine particulate matter generated by climate change-associated wildfires, and by dehydration during deadly heatwaves linked to the globally rising temperatures. The external factors can severely worsen the pre-existing endothelial dysfunction, and thereby significantly increase the risk of a cardiovascular event in the exposed FH patients.
|
30
|
Ge J, King J, Mandape S, Budowle B. Enhanced mixture interpretation with macrohaplotypes based on long-read DNA sequencing. Int J Legal Med 2021; 135:2189-2198. [PMID: 34378071 DOI: 10.1007/s00414-021-02679-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/01/2021] [Accepted: 07/30/2021] [Indexed: 12/18/2022]
Abstract
Deconvoluting mixture samples is one of the most challenging problems confronting DNA forensic laboratories. Efforts have been made to provide solutions regarding mixture interpretation. The probabilistic interpretation of Short Tandem Repeat (STR) profiles has increased the number of complex mixtures that can be analyzed. A portion of complex mixture profiles, particularly for mixtures with a high number of contributors, are still being deemed uninterpretable. Novel forensic markers, such as Single Nucleotide Variants (SNV) and microhaplotypes, also have been proposed to allow for better mixture interpretation. However, these markers both have lower discrimination power than STRs and are not compatible with CODIS or other national DNA databanks worldwide. Short-read sequencing (SRS) technologies can facilitate mixture interpretation by identifying intra-allelic variations within STRs. Unfortunately, the short size of the amplicons containing STR markers and of the sequence reads limits the alleles that can be attained per STR. The latest long-read sequencing (LRS) technologies can overcome this limitation in some samples in which larger DNA fragments (including both STRs and SNVs) with definitive phasing are available. Based on the LRS technologies, this study developed a novel CODIS-compatible forensic marker, called a macrohaplotype, which combines a CODIS STR and flanking variants to offer an extremely high number of haplotypes and hence very high discrimination power per marker. The macrohaplotype will substantially improve mixture interpretation capabilities. Based on publicly accessible data, a panel of 20 macrohaplotypes with sizes of ~8 kbp and maximal discrimination power was designed. The statistical evaluation demonstrates that these macrohaplotypes substantially outperform CODIS STRs for mixture interpretation, particularly for mixtures with a high number of contributors, as well as other forensic applications. Based on these results, efforts should be undertaken to build a complete workflow, both wet-lab and bioinformatics, to precisely call the variants and generate the macrohaplotypes based on the LRS technologies.
|
31
|
Ceresa L, Kitchner E, Seung M, Bus MM, Budowle B, Chavez J, Gryczynski I, Gryczynski Z. A novel approach to imaging and visualization of minute amounts of DNA in small volume samples. Analyst 2021; 146:6520-6527. [PMID: 34559174 DOI: 10.1039/d1an01391b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
This report presents a novel approach for detecting and visualizing small to trace amounts of DNA in a sample. By utilizing both the change in emission spectrum and the change in fluorescence lifetime, there is a significant increase in detection sensitivity, allowing for the imaging/visualization of picogram amounts of DNA in microliter volumes. As in previous reports, one of the oldest DNA intercalators, Ethidium Bromide (EtBr), is employed as a model system. With this new approach, it is feasible to visualize just a few hundred picograms of DNA without the need for prior DNA amplification. The sensitivity can later be largely improved by using an intercalator that exhibits a higher affinity to DNA and a larger fluorescence change upon binding to DNA (e.g., ethidium homodimer, YOYO, or Diamond nucleic acid dyes).
|
32
|
Ge J, King JL, Smuts A, Budowle B. Precision DNA Mixture Interpretation with Single-Cell Profiling. Genes (Basel) 2021; 12:1649. [PMID: 34828255 PMCID: PMC8623868 DOI: 10.3390/genes12111649] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/29/2021] [Revised: 10/14/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022] Open
Abstract
Wet-lab based studies have exploited emerging single-cell technologies to address the challenges of interpreting forensic mixture evidence. However, little effort has been dedicated to developing a systematic approach to interpreting the single-cell profiles derived from the mixtures. This study is the first attempt to develop a comprehensive interpretation workflow in which single-cell profiles from mixtures are interpreted individually and holistically. In this approach, the genotypes from each cell are assessed, the number of contributors (NOC) of the single-cell profiles is estimated, followed by developing a consensus profile of each contributor, and finally the consensus profile(s) can be used for a DNA database search or comparing with known profiles to determine their potential sources. The potential of this single-cell interpretation workflow was assessed by simulation with various mixture scenarios and empirical allele drop-out and drop-in rates, the accuracies of estimating the NOC, the accuracies of recovering the true alleles by consensus, and the capabilities of deconvolving mixtures with related contributors. The results support that the single-cell based mixture interpretation can provide a precision that cannot be achieved with current standard CE-STR analyses. A new paradigm for mixture interpretation is available to enhance the interpretation of forensic genetic casework.
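The consensus step of such a workflow can be pictured with a small Python sketch. It assumes the cells have already been grouped by contributor and that each cell profile is a mapping from locus name to the set of alleles observed; an allele is kept when it appears in at least a chosen fraction of the cells. The function `consensus_profile`, the `min_fraction` parameter, and the toy alleles are all hypothetical, and the published workflow may build its consensus differently.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_profile(cell_profiles: List[Dict[str, Set[str]]],
                      min_fraction: float = 0.5) -> Dict[str, Set[str]]:
    """Build a per-locus consensus from the cells assigned to one contributor.

    An allele is retained when it is observed in at least `min_fraction`
    of the cells, which damps sporadic drop-in and tolerates some drop-out.
    """
    n_cells = len(cell_profiles)
    counts: Dict[str, Counter] = {}
    for profile in cell_profiles:
        for locus, alleles in profile.items():
            # Count each allele at most once per cell.
            counts.setdefault(locus, Counter()).update(set(alleles))

    return {
        locus: {a for a, c in counter.items() if c / n_cells >= min_fraction}
        for locus, counter in counts.items()
    }


if __name__ == "__main__":
    cells = [
        {"D3S1358": {"15", "16"}, "TH01": {"6"}},            # 9.3 dropped out
        {"D3S1358": {"15"}, "TH01": {"6", "9.3"}},           # 16 dropped out
        {"D3S1358": {"15", "16", "17"}, "TH01": {"6", "9.3"}},  # 17 dropped in
    ]
    print(consensus_profile(cells))
```

In this toy input the drop-in allele 17 is seen in only one of three cells and is filtered out, while alleles lost to drop-out in a single cell are still recovered.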
|
33
|
Mandape SN, Smart U, King JL, Muenzler M, Kapema KB, Budowle B, Woerner AE. MMDIT: A tool for the deconvolution and interpretation of mitochondrial DNA mixtures. Forensic Sci Int Genet 2021; 55:102568. [PMID: 34416654 DOI: 10.1016/j.fsigen.2021.102568] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/04/2021] [Revised: 06/22/2021] [Accepted: 08/03/2021] [Indexed: 01/01/2023]
Abstract
Short tandem repeats of the nuclear genome have been the preferred markers for analyzing forensic DNA mixtures. However, when nuclear DNA in a sample is degraded or limited, mitochondrial DNA (mtDNA) markers provide a powerful alternative. Though historically considered challenging, the interpretation and analysis of mtDNA mixtures have recently seen renewed interest with the advent of massively parallel sequencing. However, there are only a few software tools available for mtDNA mixture interpretation. To address this gap, the Mitochondrial Mixture Deconvolution and Interpretation Tool (MMDIT) was developed. MMDIT is an interactive application complete with a graphical user interface that allows users to deconvolve mtDNA (whole or partial genomes) mixtures into constituent donor haplotypes and estimate random match probabilities on these resultant haplotypes. In cases where deconvolution might not be feasible, the software allows mixture analysis directly within a binary framework (i.e. qualitatively, only using data on allele presence/absence). This paper explains the functionality of MMDIT, using an example of an in vitro two-person mtDNA mixture with a ratio of 1:4. The uniqueness of MMDIT lies in its ability to resolve mixtures into complete donor haplotypes using a statistical phasing framework before mixture analysis and evaluating statistical weights employing a novel graph algorithm approach. MMDIT is the first available open-source software that can automate mtDNA mixture deconvolution and analysis. The MMDIT web application can be accessed online at https://www.unthsc.edu/mmdit/. The source code is available at https://github.com/SammedMandape/MMDIT_UI and archived on zenodo (https://doi.org/10.5281/zenodo.4770184).
|
34
|
Neyra Rivera CD, Delgado Ramos E, Díaz Soria F, Quispe Ramírez JS, Ge J, Budowle B. Genetic study with autosomal STR markers in people of the Peruvian jungle for human identification purposes. CANADIAN SOCIETY OF FORENSIC SCIENCE JOURNAL 2021. [DOI: 10.1080/00085030.2021.1933811] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
|
35
|
Smart U, Cihlar JC, Budowle B. International Wildlife Trafficking: A perspective on the challenges and potential forensic genetics solutions. Forensic Sci Int Genet 2021; 54:102551. [PMID: 34134047 DOI: 10.1016/j.fsigen.2021.102551] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/10/2021] [Revised: 06/03/2021] [Accepted: 06/04/2021] [Indexed: 12/29/2022]
Abstract
International wildlife trafficking (IWT) is a thriving and pervasive illegal enterprise that adversely affects modern societies. Yet, despite being globally recognized as a threat to biodiversity, national security, economy, and biosecurity, IWT remains largely unabated and is proliferating at an alarming rate. The increase in IWT is generally attributed to a lack of prioritization to curb wildlife crime through legal and scientific infrastructure. This review: (1) lays out the damaging scope and influence of IWT; (2) discusses the potential of DNA marker systems, barcodes, and emerging molecular technologies, such as long-read portable sequencing, to facilitate rapid, in situ identification of species and individuals; and (3) encourages initiatives that promote quality and innovation. Interdisciplinary collaboration promises to be one of the most effective ways forward to surmounting the complex scientific and legal challenges posed by IWT.
|
36
|
Santos CGM, Rolim-Filho NG, Domingues CA, Dornelas-Ribeiro M, King JL, Budowle B, Moura-Neto RS, Silva R. Association of whole mtDNA, an NADPH G11914A variant, and haplogroups with high physical performance in an elite military troop. Braz J Med Biol Res 2021; 54:e10317. [PMID: 33909855 PMCID: PMC8075130 DOI: 10.1590/1414-431x202010317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2020] [Accepted: 12/29/2020] [Indexed: 11/22/2022]
Abstract
Physical performance is a multifactorial and complex trait influenced by environmental and hereditary factors. Environmental factors alone have been insufficient to characterize all outstanding phenotypes. Recent advances in genomic technologies have enabled the investigation of whole nuclear and mitochondrial genome sequences, increasing our ability to understand interindividual variability in physical performance. Our objective was to evaluate the association of mitochondrial polymorphic loci with physical performance in Brazilian elite military personnel. Eighty-eight male military personnel who participated in the Command Actions Course of the Army were selected. Total DNA was obtained from blood samples and a complete mitochondrial genome (mtDNA) was sequenced using the Illumina MiSeq platform. Twenty-nine subjects completed the training program (FINISHED, 'F'), and fifty-nine failed to complete it (NOT_FINISHED, 'NF'). The mtDNA from NF was slightly more similar to genomes from African countries frequently related to endurance level. Twenty-two distinct mtDNA haplogroups were identified, corroborating the intense genetic admixture of the Brazilian population, but their distribution was similar between the two groups (FST=0.0009). Of 745 polymorphisms detected in the mtDNA, the position G11914A within the NADPH gene component of the electron transport chain was statistically different between the F and NF groups (P=0.011; OR: 4.286; 95%CI: 1.198-16.719), with a higher frequency of the G allele in group F individuals. The high performance of military personnel may be mediated by performance-related genomic traits. Thus, mitochondrial genetic markers such as the ND4 gene may play an important role in physical performance variability.
|
37
|
Li R, Budowle B, Sun H, Ge J. Linkage and linkage disequilibrium among the markers in the forensic MPS panels. J Forensic Sci 2021; 66:1637-1646. [PMID: 33885147 DOI: 10.1111/1556-4029.14724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Revised: 03/11/2021] [Accepted: 03/22/2021] [Indexed: 11/28/2022]
Abstract
For the past two to three decades, forensic DNA evidence has been analyzed with a limited number of short tandem repeats (STRs), and these STRs are usually assumed to be independent for statistical calculations. With the development and implementation of the MPS technologies, more autosomal markers, both single nucleotide polymorphisms (SNPs) and STRs, can be analyzed. A number of these markers are physically very close to each other, and it may not be appropriate to assume all these markers are genetically unlinked or in linkage equilibrium. In this study, publicly accessible genomic data from five representative populations were used to evaluate the genetic linkage and linkage disequilibrium (LD) between autosomal markers represented in six major commercial panels (in total, 362 markers). Among the 3041 syntenic marker pairs, 1524 pairs had sex-average genetic distances <50 cM, and thus, these marker pairs can be considered as genetically linked. Among the 143 marker pairs with physical distances <1 Mb, 19 LD haplotype blocks (comprising 39 SNPs in total) were detected for at least one of the tested populations. Statistical methods for interpreting linked markers and/or markers in LD were suggested for various case scenarios.
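The 50 cM cutoff mentioned above can be related to recombination with a short Python sketch. The abstract does not say which map function was used; Haldane's map function, r = 0.5 (1 - e^(-2d)) with d in Morgans, is assumed here purely for illustration, and the helper names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Convert a genetic distance in centimorgans to a recombination
    fraction using Haldane's map function (assumes no interference)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def is_genetically_linked(distance_cm: float, threshold_cm: float = 50.0) -> bool:
    """Flag a syntenic marker pair as linked when its sex-averaged genetic
    distance is below the threshold (50 cM in the abstract)."""
    return distance_cm < threshold_cm


if __name__ == "__main__":
    for d in (1.0, 10.0, 50.0, 200.0):
        r = haldane_recombination_fraction(d)
        print(f"{d:6.1f} cM -> r = {r:.4f}, linked = {is_genetically_linked(d)}")
```

Under this assumption the recombination fraction stays below 0.5 for any finite distance and only approaches 0.5 (independent assortment) as the distance grows, which is why a distance cutoff is a convention rather than a sharp biological boundary.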
|
38
|
Moura-Neto R, King JL, Mello I, Dias V, Crysup B, Woerner AE, Budowle B, Silva R. Evaluation of Promega PowerSeq™ Auto/Y systems prototype on an admixed sample of Rio de Janeiro, Brazil: Population data, sensitivity, stutter and mixture studies. Forensic Sci Int Genet 2021; 53:102516. [PMID: 33878618 DOI: 10.1016/j.fsigen.2021.102516] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/02/2020] [Revised: 03/06/2021] [Accepted: 03/29/2021] [Indexed: 02/01/2023]
Abstract
Forensic DNA typing typically relies on the length-based (LB) separation of PCR products containing short tandem repeat loci (STRs). Massively parallel sequencing (MPS) elucidates an additional level of STR motif and flanking region variation. Also, MPS enables simultaneous analysis of different marker-types - autosomal STRs, SNPs for lineage and identification purposes, reducing both the amount of sample used and the turn-around-time of analysis. Therefore, MPS methodologies are being considered as an additional tool in forensic genetic casework. The PowerSeq™ Auto/Y System (Promega Corp), a multiplex forensic kit for MPS, enables analysis of the 22 autosomal STR markers (plus Amelogenin) from the PowerPlex® Fusion 6C kit and 23 Y-STR markers from the PowerPlex® Y23 kit. Population data were generated from 140 individuals from an admixed sample from Rio de Janeiro, Brazil. All samples were processed according to the manufacturers' recommended protocols. Raw data (FastQ) were generated for each indexed sample and analyzed using STRait Razor v2s and PowerSeqv2.config file. The subsequent population data showed the largest increase in expected heterozygosity (23%), from LB to sequence-based (SB) analyses, at the D5S818 locus. An unreported allele was found at the D21S11 locus. The random match probability across all loci decreased from 5.9 × 10⁻²⁸ to 7.6 × 10⁻³³. Sensitivity studies using 1, 0.25, 0.062 and 0.016 ng of DNA input were analyzed in triplicate. Full Y-STR profiles were detected in all samples, and no autosomal allele drop-out was observed with 62 pg of input DNA. For mixture studies, 1 ng of genomic DNA from a male and female sample at 1:1, 1:4, 1:9, 1:19 and 1:49 proportions were analyzed in triplicate. Clearly resolvable alleles (i.e., no stacking or shared alleles) were obtained at a 1:19 male to female contributor ratio. The minus one stutter (-1) increased with the longest uninterrupted stretch (LUS) allele size reads and according to simple or compound/complex repeats. The haplotype-specific stutter rates add more information for mixed samples interpretation. These data support the use of the PowerSeq™ Auto/Y systems prototype kit (22 autosomal STR loci, 23 Y-STR loci and Amelogenin) for forensic genetics applications.
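The gain in expected heterozygosity when moving from length-based to sequence-based allele calls can be illustrated with a small Python sketch. It uses the plain estimator He = 1 - Σ pᵢ² (no sample-size correction), and the allele labels below are invented toy data, not values from the study; `expected_heterozygosity` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def expected_heterozygosity(alleles: Iterable[str]) -> float:
    """Expected heterozygosity, 1 - sum(p_i ** 2), from observed allele
    copies at one locus (uncorrected for sample size)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Toy data: two length-11 alleles that differ in sequence are
    # indistinguishable by length but distinct by sequence.
    length_based = ["11", "11", "12", "12"]
    sequence_based = ["11a", "11b", "12", "12"]
    print(expected_heterozygosity(length_based))
    print(expected_heterozygosity(sequence_based))
```

Splitting one length class into several sequence variants lowers the individual allele frequencies, so the sum of squared frequencies falls and the expected heterozygosity rises.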
|
39
|
Neyra-Rivera CD, Ticona Arenas A, Delgado Ramos E, Velasquez Reinoso MRE, Caceres Rey OA, Budowle B. Population data of 27 Y-chromosome STRS in Aymara population from Peru. AUST J FORENSIC SCI 2021. [DOI: 10.1080/00450618.2021.1882571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
40
|
Crysup B, Woerner AE, King JL, Budowle B. Graph Algorithms for Mixture Interpretation. Genes (Basel) 2021; 12:185. [PMID: 33514030 PMCID: PMC7911948 DOI: 10.3390/genes12020185] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2020] [Revised: 01/22/2021] [Accepted: 01/26/2021] [Indexed: 12/19/2022] Open
Abstract
The scale of genetic methods is presently being expanded: forensic genetic assays previously were limited to tens of loci, but now technologies allow for a transition to forensic genomic approaches that assess thousands to millions of loci. However, there are subtle distinctions between genetic assays and their genomic counterparts (especially in the context of forensics). For instance, forensic genetic approaches tend to describe a locus as a haplotype, be it a microhaplotype or a short tandem repeat with its accompanying flanking information. In contrast, genomic assays tend to provide not haplotypes but sequence variants or differences, variants which in turn describe how the alleles apparently differ from the reference sequence. By the given construction, mitochondrial genetic assays can be thought of as genomic as they often describe genetic differences in a similar way. The mitochondrial genetics literature makes clear that sequence differences, unlike the haplotypes they encode, are not comparable to each other. Different alignment algorithms and different variant calling conventions may cause the same haplotype to be encoded in multiple ways. This ambiguity can affect evidence and reference profile comparisons as well as how “match” statistics are computed. In this study, a graph algorithm is described (and implemented in the MMDIT (Mitochondrial Mixture Database and Interpretation Tool) R package) that permits the assessment of forensic match statistics on mitochondrial DNA mixtures in a way that is invariant to both the variant calling conventions followed and the alignment parameters considered. The algorithm described, given a few modest constraints, can be used to compute the “random man not excluded” statistic or the likelihood ratio. The performance of the approach is assessed in in silico mitochondrial DNA mixtures.
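The point that one haplotype can be encoded by different variant lists is easy to demonstrate. The Python sketch below applies variants to a toy reference and shows that deleting either the first or the last C of a CCC run gives the same sequence. It only illustrates the ambiguity; it is not the graph algorithm described in the paper, and `apply_variants` and its tuple layout are hypothetical.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str]  # (0-based position, reference allele, alternate allele)


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Rebuild a haplotype by applying non-overlapping variants to a reference."""
    haplotype = reference
    # Apply right to left so earlier coordinates are not shifted by indels.
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        if haplotype[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        haplotype = haplotype[:pos] + alt_allele + haplotype[pos + len(ref_allele):]
    return haplotype


if __name__ == "__main__":
    reference = "GACCCT"
    left_aligned = [(2, "C", "")]    # delete the first C of the run
    right_aligned = [(4, "C", "")]   # delete the last C of the run
    print(apply_variants(reference, left_aligned))
    print(apply_variants(reference, right_aligned))
    print(left_aligned == right_aligned)
```

Comparing the two variant lists directly reports a difference, whereas comparing the reconstructed haplotypes does not, which is why comparisons at the haplotype level are invariant to the calling convention.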
|
41
|
Smart U, Cihlar JC, Mandape SN, Muenzler M, King JL, Budowle B, Woerner AE. A Continuous Statistical Phasing Framework for the Analysis of Forensic Mitochondrial DNA Mixtures. Genes (Basel) 2021; 12:128. [PMID: 33498312 PMCID: PMC7909279 DOI: 10.3390/genes12020128] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/21/2020] [Revised: 01/14/2021] [Accepted: 01/15/2021] [Indexed: 11/16/2022] Open
Abstract
Despite the benefits of quantitative data generated by massively parallel sequencing, resolving mitotypes from mixtures occurring in certain ratios remains challenging. In this study, a bioinformatic mixture deconvolution method centered on population-based phasing was developed and validated. The method was first tested on 270 in silico two-person mixtures varying in mixture proportions. An assortment of external reference panels containing information on haplotypic variation (from similar and different haplogroups) was leveraged to assess the effect of panel composition on phasing accuracy. Building on these simulations, mitochondrial genomes from the Human Mitochondrial DataBase were sourced to populate the panels and key parameter values were identified by deconvolving an additional 7290 in silico two-person mixtures. Finally, employing an optimized reference panel and phasing parameters, the approach was validated with in vitro two-person mixtures with differing proportions. Deconvolution was most accurate when the haplotypes in the mixture were similar to haplotypes present in the reference panel and when the mixture ratios were neither highly imbalanced nor subequal (e.g., 4:1). Overall, errors in haplotype estimation were largely bounded by the accuracy of the mixture's genotype results. The proposed framework is the first available approach that automates the reconstruction of complete individual mitotypes from mixtures, even in ratios that have traditionally been considered problematic.
|
42
|
Crysup B, Budowle B, Woerner AE. ProDerAl: Reference Position Dependent Alignment. Bioinformatics 2021; 37:2479-2480. [PMID: 33459758 DOI: 10.1093/bioinformatics/btab008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2020] [Revised: 12/11/2020] [Accepted: 01/04/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Current read-mapping software uses a singular specification of alignment parameters with respect to the reference. In the presence of varying reference structures (such as the repetitive regions of the human genome), alignments can be improved if those parameters are allowed to vary. RESULTS To that end, the C++ program ProDerAl was written to refine previously generated alignments using varying parameters for these problematic regions. Synthetic benchmarks show that this realignment can result in an order of magnitude fewer misaligned bases. AVAILABILITY *Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/proderal.git). Windows binary available at https://github.com/Benjamin-Crysup/proderal/releases/download/v1.1/proderal.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
43
|
King JL, Woerner AE, Mandape SN, Kapema KB, Moura-Neto RS, Silva R, Budowle B. STRait Razor Online: An enhanced user interface to facilitate interpretation of MPS data. Forensic Sci Int Genet 2021; 52:102463. [PMID: 33493821 DOI: 10.1016/j.fsigen.2021.102463] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/04/2020] [Revised: 11/06/2020] [Accepted: 12/29/2020] [Indexed: 12/17/2022]
Abstract
Since 2013, STRait Razor has enabled analysis of massively parallel sequencing (MPS) data from various marker systems such as short tandem repeats, single nucleotide polymorphisms, insertion/deletions, and mitochondrial DNA. In this paper, STRait Razor Online (SRO), available at https://www.unthsc.edu/straitrazor, is introduced as an interactive, Shiny-based user interface for primary analysis of MPS data and secondary analysis of STRait Razor haplotype pileups. This software can be accessed from any common browser via desktop, tablet, or smartphone device. SRO is available also as a standalone application and open-source R script available at https://github.com/ExpectationsManaged/STRaitRazorOnline. The local application is capable of batch processing of both fastq files and primary analysis output. Processed batches generate individual report folders and summary reports at the locus- and haplotype-level in a matter of minutes. For example, the processing of data from ∼700 samples generated with the ForenSeq Signature Preparation Kit from allsequences.txt to a final table can be performed in ∼40 min whereas the Excel-based workbooks can take 35-60 h to compile a subset of the tables generated by SRO. To facilitate analysis of single-source, reference samples, a preliminary triaging system was implemented that calls potential alleles and flags loci suspected of severe heterozygote imbalance. When compared to published, manually curated data sets, 98.72 % of software-assigned allele calls without manual interpretation were consistent with curated data sets, 0.99 % loci were presented to the user for interpretation due to heterozygote imbalance, and the remaining 0.29 % of loci were inconsistent due to the analytical thresholds used across the studies.
|
44
|
Kitchner E, Chavez J, Ceresa L, Bus MM, Budowle B, Gryczynski Z. A novel approach for visualization and localization of small amounts of DNA on swabs to improve DNA collection and recovery process. Analyst 2021; 146:1198-1206. [PMID: 33393553 DOI: 10.1039/d0an02043e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
In this report, a simple and practical procedure is proposed for DNA localization on a solid matrix, e.g., a collection swab. The approach is straightforward and employs spectrum decomposition using a model DNA intercalator, Ethidium Bromide (EtBr). The proposed approach can detect picograms of DNA in solution and nanograms of DNA on solid surfaces (swabs) without the need for PCR amplification. The proposed technology offers the possibility for developing an inexpensive, sensitive, rapid, and practical method for localizing and recovering DNA deposited on collection swabs during routine DNA screening. Improved detection of low DNA concentrations is needed and, if feasible, will allow for better decision making in clinical medicine, biological and environmental research, and human identification in forensic investigations.
|
45
|
Guevara EK, Palo JU, Översti S, King JL, Seidel M, Stoljarova M, Wendt FR, Bus MM, Guengerich A, Church WB, Guillén S, Roewer L, Budowle B, Sajantila A. Genetic assessment reveals no population substructure and divergent regional and sex-specific histories in the Chachapoyas from northeast Peru. PLoS One 2020; 15:e0244497. [PMID: 33382772 PMCID: PMC7774974 DOI: 10.1371/journal.pone.0244497] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/27/2020] [Accepted: 12/10/2020] [Indexed: 12/31/2022] Open
Abstract
Many native populations in South America have been severely impacted by two relatively recent historical events, the Inca and the Spanish conquest. However decisive these disruptive events may have been, the populations and their gene pools have been shaped markedly also by the history prior to the conquests. This study focuses mainly on the Chachapoya peoples that inhabit the montane forests on the eastern slopes of the northern Peruvian Andes, but also includes three distinct neighboring populations (the Jívaro, the Huancas and the Cajamarca). By assessing mitochondrial, Y-chromosomal and autosomal diversity in the region, we explore questions that have emerged from archaeological and historical studies of the regional culture(s). These studies have shown, among other things, that Chachapoyas was a crossroads for Coast-Andes-Amazon interactions since very early times. In this study, we examine the following questions: 1) was there pre-Hispanic genetic population substructure in the Chachapoyas sample? 2) did the Spanish conquest cause a more severe population decline on Chachapoyan males than on females? 3) can we detect different patterns of European gene flow in the Chachapoyas region? and, 4) did the demographic history in the Chachapoyas resemble the one from the Andean area? Despite cultural differences within the Chachapoyas region as shown by archaeological and ethnohistorical research, genetic markers show no significant evidence for past or current population substructure, although an Amazonian gene flow dynamic in the northern part of this territory is suggested. The data also indicate a bottleneck c. 25 generations ago that was more severe among males than females, as well as divergent population histories for populations in the Andean and Amazonian regions. In line with previous studies, we observe high genetic diversity in the Chachapoyas, despite the documented dramatic population declines.
The diverse topography and great biodiversity of the northeastern Peruvian montane forests are potential contributing agents in shaping and maintaining the high genetic diversity in the Chachapoyas region.
|
46
|
Woerner AE, Mandape S, King JL, Muenzler M, Crysup B, Budowle B. Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers. Forensic Sci Int Genet 2020; 51:102459. [PMID: 33429137 DOI: 10.1016/j.fsigen.2020.102459] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/11/2020] [Revised: 10/28/2020] [Accepted: 12/21/2020] [Indexed: 12/24/2022]
Abstract
Unique molecular identifiers (UMIs) are a promising approach to contend with errors generated during PCR and massively parallel sequencing (MPS). With UMI technology, random molecular barcodes are ligated to template DNA molecules prior to PCR, allowing PCR and sequencing error to be tracked and corrected bioinformatically. UMIs have the potential to be particularly informative for the interpretation of short tandem repeats (STRs). Traditional MPS approaches may simply lead to the observation of alleles that are consistent with the hypotheses of stutter, while with UMIs stutter products may be bioinformatically re-associated with their parental alleles and subsequently removed. Herein, a bioinformatics pipeline named strumi is described that is designed for the analysis of STRs that are tagged with UMIs. Unlike other tools, strumi is an alignment-free, machine-learning-driven algorithm that clusters individual MPS reads into UMI families, infers consensus super-reads that represent each family, and provides an estimate of the resulting haplotype's accuracy. Super-reads, in turn, approximate independent measurements not of the PCR products, but of the original template molecules, both in terms of quantity and sequence identity. Provisional assessments show that naïve threshold-based approaches generate super-reads that are accurate (∼97 % haplotype accuracy, compared to ∼78 % when UMIs are not used), and the application of a more nuanced machine learning approach increases the accuracy to ∼99.5 % depending on the level of certainty desired. With these features, UMIs may greatly simplify probabilistic genotyping systems and reduce uncertainty. However, the ability to interpret alleles at trace levels also permits the interpretation, characterization and quantification of contamination as well as somatic variation (including somatic stutter), which may present newfound challenges.
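The idea of grouping reads into UMI families and deriving one super-read per family can be sketched in Python as below. This is a generic, naive threshold-based illustration and not the strumi algorithm: it matches UMIs exactly (no error-tolerant clustering, no machine learning), and the function name `umi_super_reads`, its parameters, and the toy reads are all assumptions.

```python
from collections import Counter, defaultdict


def umi_super_reads(reads, min_family_size=3, min_fraction=0.5):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    A family yields a super-read only if it has at least min_family_size
    reads and its most common sequence exceeds min_fraction of the family.
    Both thresholds are hypothetical illustration values.
    """
    families = defaultdict(Counter)
    for umi, seq in reads:
        families[umi][seq] += 1

    super_reads = {}
    for umi, counts in families.items():
        total = sum(counts.values())
        if total < min_family_size:
            continue  # too few reads to trust this family
        seq, n = counts.most_common(1)[0]
        if n / total > min_fraction:
            super_reads[umi] = seq
    return super_reads


parent = "TCTA" * 4   # toy 4-repeat allele
stutter = "TCTA" * 3  # one repeat shorter, as a stutter product would be
toy_reads = [
    ("AAC", parent), ("AAC", parent), ("AAC", parent), ("AAC", stutter),
    ("GGT", stutter),  # singleton family, discarded by the size threshold
]
consensus = umi_super_reads(toy_reads)
```

Here the stutter read that shares a UMI with three parental reads is outvoted, so the family is reported as a single template molecule carrying the parental allele.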
|
47
|
Ge J, Budowle B. Forensic investigation approaches of searching relatives in DNA databases. J Forensic Sci 2020; 66:430-443. [PMID: 33136341 DOI: 10.1111/1556-4029.14615] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/10/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 11/29/2022]
Abstract
There are several indirect database searching approaches to identify the potential source of a forensic biological sample. These DNA-based approaches are familial searching, Y-STR database searching, and investigative genetic genealogy (IGG). The first two strategies use forensic DNA databases managed by the government, and the latter uses databases managed by private citizens or companies. Each of these search strategies relies on DNA testing to identify relatives of the donor of the crime scene sample, provided such profiles reside in the DNA database(s). All three approaches have been successfully used to identify the donor of biological evidence, which assisted in solving criminal cases or identifying unknown human remains. This paper describes and compares these approaches in terms of genotyping technologies, searching methods, database structures, searching efficiency, data quality, data security, and costs, and raises some potential privacy and legal considerations for further discussion by stakeholders and scientists. Y-STR database searching and IGG are advantageous since they are able to assist in more cases than familial searching by readily identifying distant relatives. In contrast, familial searching can be performed more readily with existing laboratory systems. Every country or state may have its own unique economic, technical, cultural, and legal considerations and should decide the best approach(es) to fit those circumstances. Regardless of the approach, the ultimate goal should be the same: generate investigative leads and solve active and cold criminal cases to support public safety, under stringent policies and security practices designed to protect the privacy of its citizenry.
|
48
|
Buckleton JS, Pugh SN, Bright JA, Taylor DA, Curran JM, Kruijver M, Gill P, Budowle B, Cheng K. Are low LRs reliable? Forensic Sci Int Genet 2020; 49:102350. [DOI: 10.1016/j.fsigen.2020.102350] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/05/2020] [Revised: 06/09/2020] [Accepted: 06/26/2020] [Indexed: 12/20/2022]
|
49
|
Neyra-Rivera CD, Ticona Arenas A, Delgado Ramos E, Velasquez Reinoso MRE, Budowle B. Allelic frequencies with 23 autosomic STRS in the Aymara population of Peru. Int J Legal Med 2020; 135:779-781. [PMID: 33089341 DOI: 10.1007/s00414-020-02448-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/06/2020] [Accepted: 10/14/2020] [Indexed: 11/29/2022]
Abstract
Population data of the Aymara in the province of Puno were established for 23 autosomal STR markers. DNA was obtained from unrelated individuals (n = 190) who reside in three areas of the Floating Islands of Lake Titicaca, residents on the border with Bolivia and residents who are not from the border with Bolivia. The PENTA E marker presented the highest PD (0.9738), PIC (0.8793), and PM (0.7847) values. The combined PD was greater than 0.99999999 and the combined PE was 0.99999994. The largest distance, based on Fst values, was between the Aymara population and the Ashaninca population (0.04022), and the smallest distance was with the populations of Bolivia (0.00136) and Peru (0.00525).
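A combined power of discrimination (PD) such as the one reported here is conventionally obtained from per-locus values as 1 − ∏(1 − PD_i), which treats the loci as independent. The abstract does not state how its figures were computed, so the Python sketch below is only an illustration of that convention; the function names and all per-locus values except the reported Penta E PD (0.9738) are hypothetical.

```python
import math


def combined_match_probability(pd_values):
    """Chance that two unrelated people match at every locus,
    assuming independent loci and a per-locus match chance of 1 - PD."""
    return math.prod(1.0 - pd for pd in pd_values)


def combined_pd(pd_values):
    """Combined power of discrimination across loci: 1 - prod(1 - PD_i)."""
    return 1.0 - combined_match_probability(pd_values)


# Penta E value from the abstract plus three made-up loci for illustration.
example_pd = [0.9738, 0.95, 0.93, 0.91]
print(combined_match_probability(example_pd))
print(combined_pd(example_pd))
```

With many loci the combined PD rounds to 1.0 in floating point, so the combined match probability is usually the more informative number to report.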
|
50
|
|