1
Challenges in selecting admixture models and marker sets to infer genetic ancestry in a Brazilian admixed population. Sci Rep 2022; 12:21240. [PMID: 36481695 PMCID: PMC9731996 DOI: 10.1038/s41598-022-25521-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022]
Abstract
The inference of genetic ancestry plays an increasingly prominent role in clinical, population, and forensic genetics studies. Several genotyping strategies and analytical methodologies have been developed over the last few decades to assign individuals to specific biogeographic regions. However, despite these efforts, ancestry inference in populations with a recent history of admixture, such as those in Brazil, remains a challenge. In admixed populations, the proportions and components of genetic ancestry vary on different levels: (i) between populations; (ii) between individuals of the same population; and (iii) across an individual's genome. The present study evaluated 1171 admixed Brazilian samples to compare the genetic ancestry inferred by tri-/tetra-hybrid admixture models and evaluated different marker sets, from panels with small numbers of ancestry informative markers (AIMs) to high-density SNP (HDSNP) and whole-genome sequence (WGS) data. Analyses revealed greater variation in the correlation coefficient of ancestry components within and between admixed populations, especially for minority ancestral components. We also observed a positive correlation between the number of markers in the AIMs panel and HDSNP/WGS. Furthermore, the greater the number of markers, the more accurate the tri-/tetra-hybrid admixture models.
2
Liang X, Han X, Liu C, Du W, Zhong P, Huang L, Huang M, Fu L, Liu C, Chen L. Integrating the salivary microbiome in the forensic toolkit by 16S rRNA gene: potential application in body fluid identification and biogeographic inference. Int J Legal Med 2022; 136:975-985. [PMID: 35536322 DOI: 10.1007/s00414-022-02831-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/19/2022] [Accepted: 04/21/2022] [Indexed: 11/30/2022]
Abstract
Saliva is a common body fluid with significant forensic value used to investigate criminal cases such as murder and assault. In the past, saliva identification often relied on the α-amylase test; however, this method has low specificity and is prone to false positives. Accordingly, forensic researchers have been working to find new specific molecular markers to refine the current saliva identification approach. At present, research on immunological methods, mRNA, microRNA, circRNA, and DNA methylation is still in the exploratory stage, and the application of these markers still has various limitations. It has been established that salivary microorganisms exhibit good specificity and stability. In this study, 16S rDNA sequencing technology was used to sequence the V3-V4 hypervariable regions in saliva samples from five regions to reveal the effect of regional location on the heterogeneity of the microbial profile in saliva. Although the relative abundance of salivary flora was affected to a certain extent by geographical factors, the salivary flora of each sample was still dominated by Streptococcus, Neisseria, and Rothia. In addition, the microbial community in the saliva samples in this study was significantly different from that in the vaginal secretions, semen, and skin samples reported in our previous studies. Accordingly, saliva can be distinguished from the other three body fluids and tissues. Moreover, we established a prediction model based on the random forest algorithm that could distinguish saliva from different regions at the genus level, although the model has a certain probability of misjudgment, which needs more in-depth research. Overall, the microbial community information in saliva stains might have prospects for potential application in body fluid identification and biogeographic inference.
Affiliation(s)
- Xiaomin Liang
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Xiaolong Han
  - Guangzhou Forensic Science Institute, Guangzhou, 510030, People's Republic of China
- Changhui Liu
  - Guangzhou Forensic Science Institute, Guangzhou, 510030, People's Republic of China
- Weian Du
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Peiwen Zhong
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Litao Huang
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Manling Huang
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Linhe Fu
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China
- Chao Liu
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China.
  - Guangzhou Forensic Science Institute, Guangzhou, 510030, People's Republic of China.
- Ling Chen
  - Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, People's Republic of China.
3
Pfaffelhuber P, Sester-Huss E, Baumdicker F, Naue J, Lutz-Bonengel S, Staubach F. Inference of recent admixture using genotype data. Forensic Sci Int Genet 2021; 56:102593. [PMID: 34735936 DOI: 10.1016/j.fsigen.2021.102593] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/16/2020] [Revised: 07/30/2021] [Accepted: 09/07/2021] [Indexed: 12/23/2022]
Abstract
The inference of biogeographic ancestry (BGA) has become a focus of forensic genetics. Misinference of BGA can have profound unwanted consequences for investigations and society. We show that recent admixture can lead to misclassification and erroneous inference of ancestry proportions, using state-of-the-art analysis tools with (i) simulations, (ii) 1000 Genomes Project data, and (iii) two individuals analyzed using the ForenSeq DNA Signature Prep Kit. Subsequently, we extend existing tools for estimation of individual ancestry (IA) by allowing for different IA in both parents, leading to estimates of parental individual ancestry (PIA) and a statistical test for recent admixture. Estimation of PIA outperforms IA in most scenarios of recent admixture. Furthermore, additional information about parental ancestry can be acquired with PIA that may guide casework.
Affiliation(s)
- Peter Pfaffelhuber
  - Institute for Mathematics, University of Freiburg, Ernst-Zermelo-Str. 1, 79104 Freiburg, Germany.
- Elisabeth Sester-Huss
  - Institute for Mathematics, University of Freiburg, Ernst-Zermelo-Str. 1, 79104 Freiburg, Germany
- Franz Baumdicker
  - Cluster of Excellence CMFI, Mathematical and Computational Population Genetics, University of Tübingen, Sand 14, 72076 Tübingen, Germany
- Jana Naue
  - Institute of Forensic Medicine, Medical Center, Faculty of Medicine, University of Freiburg, Albertstraße 9, 79104 Freiburg, Germany
- Sabine Lutz-Bonengel
  - Institute of Forensic Medicine, Medical Center, Faculty of Medicine, University of Freiburg, Albertstraße 9, 79104 Freiburg, Germany
- Fabian Staubach
  - Biology I, Evolution & Ecology, University of Freiburg, Hauptstraße 1, 79104 Freiburg, Germany
4
Validation of BMI genetic risk score and DNA methylation in a Korean population. Int J Legal Med 2021; 135:1201-1212. [PMID: 33594455 DOI: 10.1007/s00414-021-02517-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 01/27/2021] [Indexed: 12/19/2022]
Abstract
When DNA profiles obtained from biological evidence at a crime scene fail to match suspects or anyone in the database, forensic DNA phenotyping, which is the prediction of externally visible characteristics, can facilitate a traced search for an unknown suspect by limiting the search range. Therefore, age, trait, or lifestyle predictors, as well as predictors for coloration, have been researched in the forensic field. In the present study, for the development of a prediction model for BMI or obesity, we investigated several previously reported BMI- or obesity-associated genetic and epigenetic markers that included four CpGs (cg06500161, cg00574958, cg12593793, and cg10505902 of the ABCG1, CPT1A, LMNA, and PDE4DIP genes, respectively), and eight SNPs (rs12463617, rs1558902, rs591166, rs11030104, rs11671664, rs6545814, rs16858082, and rs574367 near the TMEM18, FTO, MC4R, BDNF, GIPR/QPCTL, ADCY3/RBJ, GNPDA2, and SEC16B genes, respectively) in 700 Koreans with BMI ranging from 16.1 to 40.6 (27.6 ± 4.5) kg/m². Linear regression analysis showed that DNA methylation of the four CpG sites explained 10.9% of the total variance in BMI, and the model constructed using age information, the genetic score from eight SNPs, and DNA methylation at four CpG sites could account for 17.4% of BMI variance. Using data mining techniques, i.e., decision tree (Entropy and Gini), random forest, and bagging, a total of eight models with BMI 31 or 32 as a cutoff value were also constructed based on the data obtained from 490 training samples with age and sex as covariates. Among them, a random forest model with a cutoff value of 31 showed the best performance, with 63.3% accuracy and an AUC value of 0.682 in 210 test set samples. In the present study, we could replicate the previous finding that DNA methylation contributes more to BMI than do genetic factors. In addition, although the accuracy for the prediction of BMI was not high, our study is meaningful in respect of the ability to use a small number of markers to achieve similar prediction accuracy to that obtained from a model composed of more than a thousand markers, which adds support to continued research to identify a small set of predictive markers for practical application in the forensic field.
5
Wen D, Sun S, Liu Y, Li J, Yang Z, Kureshi A, Fu Y, Li H, Jiang B, Jin C, Cai J, Zha L. Considering the flanking region variants of nonbinary SNP and phenotype-informative SNP to constitute 30 microhaplotype loci for increasing the discriminative ability of forensic applications. Electrophoresis 2021; 42:1115-1126. [PMID: 33483973 DOI: 10.1002/elps.202000341] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/02/2020] [Revised: 12/22/2020] [Accepted: 01/15/2021] [Indexed: 12/15/2022]
Abstract
The flanking region variants of nonbinary SNPs and phenotype-informative SNPs (piSNPs) have been observed, which may greatly improve the discriminative ability after constituting microhaplotypes. In this study, 30 microhaplotype loci based on the nonbinary SNPs and piSNPs (shown to be related to phenotypes such as hair and eye color) were selected. Genotyping was conducted on 100 unrelated northern Han Chinese, and the 26 populations from the 1000 Genomes Project were also included for comparison of population differentiation. A simulation study was conducted to evaluate the efficiency of kinship testing. The 30 selected microhaplotype loci had good polymorphism, with a mean effective number of alleles (Ae) of 3.46. The average Ae increase was 1.27 compared with the target SNPs. The populations from the five regions worldwide could also be distinguished using these loci. The results of kinship testing showed that these microhaplotype loci had a similar ability to the 15 STR loci of the AmpFlSTR® Identifiler® PCR Amplification Kit to identify the biological parent and a stronger ability to exclude nonbiological parents. Thus, these 30 microhaplotype loci may be multifunctional for forensic application, including the ability of personal identification and kinship testing equivalent to 15 STR loci, and the power of ancestry inference for distinguishing the main intercontinental populations. Moreover, our selected phenotypic microhaplotype loci may theoretically have phenotype prediction capabilities. However, the phenotype prediction efficiency of these phenotypic microhaplotype loci may be worse than that of piSNPs, and the detailed prediction accuracy in different populations needs to be further studied.
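The effective number of alleles reported above can be illustrated with a short sketch. It assumes the standard population-genetics definition, Ae = 1 / Σ p_i², where p_i is the frequency of allele i at a locus; the abstract does not state the formula, and the function name and the toy allele lists below are hypothetical, not data from the study.

```python
from collections import Counter


def effective_number_of_alleles(alleles):
    """Return Ae = 1 / sum(p_i^2) for a list of observed alleles at one locus.

    Assumption: this is the standard definition of Ae; the abstract itself
    does not spell out how Ae was computed.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum((n / total) ** 2 for n in counts.values())
    return 1.0 / homozygosity


# Hypothetical toy data: a binary SNP versus a microhaplotype built from
# the same SNP plus one flanking variant.
snp_alleles = ["A"] * 50 + ["G"] * 50
microhap_alleles = ["AC"] * 25 + ["AT"] * 25 + ["GC"] * 25 + ["GT"] * 25

ae_snp = effective_number_of_alleles(snp_alleles)
ae_microhap = effective_number_of_alleles(microhap_alleles)
ae_gain = ae_microhap - ae_snp  # increase from adding the flanking variant
```

In this toy case, combining the target SNP with a flanking variant raises Ae because the same chromosomes are split into more, evenly distributed haplotypes; this is the kind of gain the abstract summarizes as an average Ae increase relative to the target SNPs.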
Affiliation(s)
- Dan Wen
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
- Shule Sun
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
- Ying Liu
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
- Jienan Li
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
- Zedeng Yang
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
- Aliye Kureshi
  - School of Basic Medical Sciences, Xinjiang Medical University, Urumqi, P. R. China
- Yan Fu
  - Huazhi Biotech Co., Ltd, Changsha, P. R. China
- Henan Li
  - Microanaly Gene Technologies Co., Ltd, Hefei, P. R. China
- Bowei Jiang
  - The First Research Institute of the Ministry of Public Security P.R.C, Beijing, P. R. China
- Chuan Jin
  - The First Research Institute of the Ministry of Public Security P.R.C, Beijing, P. R. China
- Jifeng Cai
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
- Lagabaiyila Zha
  - Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, P. R. China
6
Parfenchyk MS, Kotava SA. The Theoretical Framework for the Panels of DNA Markers Formation in the Forensic Determination of an Individual Ancestral Origin. RUSS J GENET+ 2021. [DOI: 10.1134/s1022795421010105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
7
Truelsen D, Pereira V, Phillips C, Morling N, Børsting C. Evaluation of a custom GeneRead™ massively parallel sequencing assay with 210 ancestry informative SNPs using the Ion S5™ and MiSeq platforms. Forensic Sci Int Genet 2020; 50:102411. [PMID: 33176271 DOI: 10.1016/j.fsigen.2020.102411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/03/2020] [Revised: 10/14/2020] [Accepted: 10/19/2020] [Indexed: 01/20/2023]
Abstract
A custom GeneRead DNAseq SNP panel with 210 markers was evaluated using the Ion S5 and MiSeq sequencing platforms. Sensitivity, PCR cycle number, and the use of half volume of reagents for target enrichment and library preparation were tested. Furthermore, genotype concordance between results obtained with the different sequencing platforms and with known profiles generated using other sequencing assays was analysed. The GeneRead DNAseq SNP assay gave reproducible results with an input of 200 pg DNA on both platforms. A total of 204 loci were successfully sequenced. Three loci failed completely in the PCR amplification, and three additional loci displayed frequent locus drop-outs due to low read depth or high heterozygote imbalance. Overall, the read depth across the loci was better balanced with the MiSeq, while the heterozygote balance was less variable with the Ion S5. Noise levels were low on both platforms (median < 0.2%). Two simple criteria for genotyping were applied: a minimum threshold of 45 reads and an acceptable heterozygote balance range of 0.3-3.0. Complete concordance between platforms was observed except for three genotypes in one of the poorly performing loci, rs1470637. This locus had relatively low read depths on both platforms, skewed heterozygote balance, and frequent locus drop-outs. There was also full genotype concordance between the results from the GeneRead assay and known profiles generated with the QIAseq and Ion AmpliSeq assays. The few discordant results were either due to locus drop-outs in the poorly performing loci or allele drop-outs in the QIAseq assay. Profiles with a minimum of 179 SNPs were obtained from four challenging case work samples (blood swabs, bone, or blood from a corpse). Overall, the GeneRead DNAseq assay showed considerable potential and could provide a reliable method for SNP genotyping in cases involving identification of individuals, prediction of phenotypic traits, and ancestry inference.
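The two genotyping criteria quoted above (at least 45 reads and a heterozygote balance between 0.3 and 3.0) can be sketched as a simple filter. This is an illustration only: the abstract does not say whether the 45-read threshold applies to the whole locus or to each allele, nor how heterozygote balance was computed, so the sketch assumes a locus-level depth threshold and a balance defined as the read count of one allele divided by the read count of the other. The function and constant names are hypothetical.

```python
MIN_READS = 45              # minimum read threshold from the abstract
HB_RANGE = (0.3, 3.0)       # acceptable heterozygote balance range from the abstract


def heterozygote_call_accepted(reads_allele_1, reads_allele_2):
    """Check a putative heterozygous genotype against the two criteria.

    Assumptions (not stated in the abstract): the read threshold is applied
    to the summed depth of both alleles, and heterozygote balance is
    reads_allele_1 / reads_allele_2.
    """
    total_reads = reads_allele_1 + reads_allele_2
    if total_reads < MIN_READS:
        return False  # locus drop-out: depth too low to call
    if reads_allele_2 == 0:
        return False  # balance undefined, so not a valid heterozygote
    balance = reads_allele_1 / reads_allele_2
    return HB_RANGE[0] <= balance <= HB_RANGE[1]


# Hypothetical read counts for three loci.
balanced_locus = heterozygote_call_accepted(60, 55)    # depth 115, balance ~1.09
shallow_locus = heterozygote_call_accepted(20, 20)     # depth 40, below threshold
skewed_locus = heterozygote_call_accepted(180, 20)     # balance 9.0, outside range
```

Under these assumptions, a locus such as the poorly performing one described in the abstract would tend to fail either the depth check or the balance check, which is consistent with the reported locus drop-outs.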
Affiliation(s)
- Ditte Truelsen
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark.
- Vania Pereira
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
- Chris Phillips
  - Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- Niels Morling
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark; Department of Mathematical Sciences, Aalborg University, DK-9220 Aalborg East, Denmark
- Claus Børsting
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
8
Pfaffelhuber P, Grundner-Culemann F, Lipphardt V, Baumdicker F. How to choose sets of ancestry informative markers: A supervised feature selection approach. Forensic Sci Int Genet 2020; 46:102259. [PMID: 32105949 DOI: 10.1016/j.fsigen.2020.102259] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/05/2019] [Revised: 12/23/2019] [Accepted: 02/01/2020] [Indexed: 01/06/2023]
Abstract
Inference of the Biogeographical Ancestry (BGA) of a person or trace relies on three ingredients: (1) a reference database of DNA samples including BGA information; (2) a statistical clustering method; (3) a set of loci which segregate dependent on geographical location, i.e. a set of so-called Ancestry Informative Markers (AIMs). We used the theory of feature selection from statistical learning in order to obtain AIMsets for BGA inference. Using simulations, we show that this learning procedure works in various cases and outperforms ad hoc methods based on statistics like FST or informativeness for the choice of AIMs. Applying our method to data from the 1000 Genomes Project (excluding Admixed Americans), we identified an AIMset of 12 SNPs, which gives a vanishing misclassification error on a continental scale, as do other published AIMsets. In fact, cross-validation shows that there exists a multitude of sets with comparable performance to the optimal AIMset. On a sub-continental scale, we find a set of 55 SNPs for distinguishing the five European populations. The misclassification error is reduced by a factor of two relative to published AIMsets, but is still 30% and therefore too large to be useful in forensic applications.
Affiliation(s)
- Peter Pfaffelhuber
  - University of Freiburg, Department of Mathematical Stochastics, Ernst-Zermelo-Straße 1, D-79104 Freiburg, Germany.
- Veronika Lipphardt
  - University College Freiburg, Bertoldstraße 17, D-79098 Freiburg, Germany
- Franz Baumdicker
  - University of Freiburg, Department of Mathematical Stochastics, Ernst-Zermelo-Straße 1, D-79104 Freiburg, Germany