1
Govender P, Fashoto SG, Maharaj L, Adeleke MA, Mbunge E, Olamijuwon J, Akinnuwesi B, Okpeku M. The application of machine learning to predict genetic relatedness using human mtDNA hypervariable region I sequences. PLoS One 2022; 17:e0263790. [PMID: 35180257 PMCID: PMC8856515 DOI: 10.1371/journal.pone.0263790] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/27/2021] [Accepted: 01/26/2022] [Indexed: 11/21/2022]
Abstract
Human identification of unknown samples following disasters and mass casualty events is essential, especially to bring closure to the family and friends of the deceased. Unfortunately, victim identification is often challenging for forensic investigators, as analysis becomes complicated when biological samples are degraded or of poor quality as a result of exposure to harsh environmental factors. Mitochondrial DNA then becomes the ideal option for analysis, particularly for determining the origin of the samples. In such events, the estimation of genetic parameters plays an important role in modelling and predicting genetic relatedness and is useful in assigning unknown individuals to an ethnic group. Various techniques exist for estimating genetic relatedness, but machine learning (ML) algorithms are novel and presently the least used in forensic genetic studies. In this study, we investigated the ability of ML algorithms to predict genetic relatedness using hypervariable region I sequences retrieved from the GenBank database for three race groups, namely African, Asian and Caucasian. Four ML classification algorithms, namely support vector machines (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and random forest (RF), were hybridised with one-hot encoding, principal component analysis (PCA) and Bag of Words (BoW), and were compared for inferring genetic relatedness. In WEKA, genetic inferences based on PCA-SVM achieved an overall accuracy of 80–90% and consistently outperformed PCA-LDA, PCA-RF and PCA-QDA. In Python, BoW-PCA-RF achieved 94.4% accuracy, outperforming BoW-PCA-SVM, BoW-PCA-LDA and BoW-PCA-QDA. The ML results obtained with both WEKA and Python displayed higher accuracies than the analysis of molecular variance results.
Given these results, SVM and RF algorithms are likely to be useful in other sequence classification applications as well, making them promising tools in genetics and forensic science. The study provides evidence that ML can be used as a supplementary tool in forensic genetics casework analysis.
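The abstract above describes a preprocessing step: one-hot encoding of sequences followed by PCA, before classification. Below is a minimal sketch of that step under stated assumptions. It is not the authors' WEKA or Python pipeline. It assumes the hypervariable region I sequences are already aligned to equal length, and it treats any character other than A, C, G or T as missing. PCA is computed with a plain NumPy SVD. The function names `one_hot` and `pca_project` are hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode an aligned sequence as a flat 0/1 vector, four columns per site.

    Assumption: gaps, N and other ambiguity codes become all-zero columns.
    """
    encoded = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded.ravel()

def pca_project(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X onto their top principal components.

    Columns are mean-centred; the right singular vectors of the centred
    matrix are the principal axes, ordered by explained variance.
    """
    X_centred = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ Vt[:n_components].T

# Made-up aligned fragments for illustration (not real HVS-I data).
sequences = ["ACGTAC", "ACGTTC", "TCGTAC", "ACG-AC"]
X = np.vstack([one_hot(s) for s in sequences])
scores = pca_project(X, n_components=2)
print(X.shape, scores.shape)  # (4, 24) (4, 2)
```

In a full workflow, the component scores would then be passed to a classifier such as SVM or RF. The study compared several such encoder and classifier combinations.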
Affiliation(s)
- Priyanka Govender
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville, South Africa
- Stephen Gbenga Fashoto
- Faculty of Science and Engineering, Department of Computer Science, Computational Intelligence and Health Informatics Research Group, University of Eswatini, Kwaluseni, Kingdom of Eswatini
- Leah Maharaj
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville, South Africa
- Matthew A. Adeleke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville, South Africa
- Elliot Mbunge
- Faculty of Science and Engineering, Department of Computer Science, Computational Intelligence and Health Informatics Research Group, University of Eswatini, Kwaluseni, Kingdom of Eswatini
- Jeremiah Olamijuwon
- Faculty of Science and Engineering, Department of Computer Science, Computational Intelligence and Health Informatics Research Group, University of Eswatini, Kwaluseni, Kingdom of Eswatini
- Boluwaji Akinnuwesi
- Faculty of Science and Engineering, Department of Computer Science, Computational Intelligence and Health Informatics Research Group, University of Eswatini, Kwaluseni, Kingdom of Eswatini
- Moses Okpeku
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Westville, South Africa
2
Yang FC, Tseng B, Lin CY, Yu YJ, Linacre A, Lee JCI. Population inference based on mitochondrial DNA control region data by the nearest neighbors algorithm. Int J Legal Med 2021; 135:1191-1199. [PMID: 33586030 DOI: 10.1007/s00414-021-02520-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Accepted: 01/27/2021] [Indexed: 11/24/2022]
Abstract
Population and geographic assignment are frequently undertaken using DNA sequences of the mitochondrial genome. Assignment to broad continental populations is common, although finer resolution to subpopulations can be less accurate. This is because of shared genetic ancestry at a local level and because members of different ancestral subpopulations cohabit the same geographic area. This study reports on the accuracy of population and subpopulation assignment using sequence data from 3070 mitochondrial genomes and the K-nearest neighbors (KNN) algorithm. These data comprised the training samples used for continental and population assignment: 1105 Europeans (including Austria, France, Germany, Spain, England and Caucasian countries), 374 Africans (including North and East Africa and a non-specific area (Pan-Africa)), and 1591 Asians (including Japan, the Philippines, and Taiwan). The subpopulation data comprised 1153 mitochondrial DNA (mtDNA) control region sequences from 12 subpopulations in Taiwan (Han, Hakka, Ami, Atayal, Bunun, Paiwan, Puyuma, Rukai, Saisiyat, Tsou, Tao, and Pingpu). Additionally, control region sequence data from a further 50 samples, obtained from the Sigma Company, were included after they were amplified and sequenced. These 50 additional samples served as the "testing samples" to verify the accuracy of population assignment. Using genetic distance as the metric, we applied the KNN algorithm and the K-weighted-nearest neighbors (KWNN) algorithm, weighted by genetic distance, to classify individuals into continental populations and into subpopulations within the same continent. Accuracies of ethnic inference were obtained for both the KNN and KWNN algorithms at the continental and subpopulation levels. The training sample set achieved an overall accuracy of 99 to 82% for assignment to continental populations with K values from 1 to 101.
Assignment to subpopulations with K values from 1 to 5 reached an accuracy of 77 to 54%. Four of the 12 Taiwanese populations returned an assignment accuracy of over 60%: Ami (66%), Atayal (67%), Saisiyat (66%), and Tao (80%). For the testing sample set, ethnic prediction for continental populations used the recommended K values of 5, 10, and 35, chosen from the training-set results, and achieved an overall accuracy of 100 to 94%. This study provides an accurate method of population assignment not only for continental populations but also for subpopulations, which can be useful in forensic and anthropological studies.
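The abstract above contrasts plain KNN with KWNN. In KWNN, each neighbour's vote is weighted by its genetic distance to the query. Below is a minimal sketch of the difference under stated assumptions. It assumes the genetic distance is simply the number of mismatched sites between aligned control-region sequences, and it assumes each weight is the inverse distance. The paper's exact distance measure and weighting scheme may differ, and the function names are hypothetical.

```python
from collections import defaultdict

def mismatch_distance(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

def predict_population(query, training, k, weighted=True, eps=1e-6):
    """Assign `query` to a population from its k nearest training haplotypes.

    `training` is a list of (sequence, population) pairs. With weighted=False
    every neighbour casts one vote (plain KNN); with weighted=True each vote
    is 1 / (distance + eps), so closer haplotypes count more (KWNN-style).
    """
    ranked = sorted(training, key=lambda item: mismatch_distance(query, item[0]))
    votes = defaultdict(float)
    for seq, population in ranked[:k]:
        if weighted:
            votes[population] += 1.0 / (mismatch_distance(query, seq) + eps)
        else:
            votes[population] += 1.0
    return max(votes, key=votes.get)

# Made-up toy haplotypes chosen so that the two voting rules disagree.
training = [("AAAA", "Asia"), ("AAAT", "Africa"), ("AATA", "Africa"), ("TTTT", "Europe")]
print(predict_population("AAAA", training, k=3, weighted=False))  # Africa
print(predict_population("AAAA", training, k=3, weighted=True))   # Asia
```

The toy case shows why weighting can matter. With K = 3, two moderately distant neighbours outvote an exact match under plain KNN, but not under the distance-weighted rule.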
Affiliation(s)
- Fu-Chi Yang
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan
- Bill Tseng
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan
- Chun-Yen Lin
- Institute of Forensic Medicine, Ministry of Justice, New Taipei City, 23016, Taiwan
- Yu-Jen Yu
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan
- Adrian Linacre
- College of Science & Engineering, Flinders University, Adelaide, 5001, Australia
- James Chun-I Lee
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan.
3
Mogensen HS, Tvedebrink T, Børsting C, Pereira V, Morling N. Ancestry prediction efficiency of the software GenoGeographer using a z-score method and the ancestry informative markers in the Precision ID Ancestry Panel. Forensic Sci Int Genet 2020; 44:102154. [DOI: 10.1016/j.fsigen.2019.102154] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/22/2019] [Revised: 07/25/2019] [Accepted: 08/24/2019] [Indexed: 10/25/2022]
4
Pardo-Seco J, Gómez-Carballa A, Bello X, Martinón-Torres F, Salas A. Biogeographical informativeness of Y-STR haplotypes. Sci Bull (Beijing) 2019; 64:1381-1384. [PMID: 36659691 DOI: 10.1016/j.scib.2019.07.025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/21/2023]
Affiliation(s)
- Jacobo Pardo-Seco
- Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain; Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Universidade de Santiago de Compostela, and Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Alberto Gómez-Carballa
- Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain; Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Universidade de Santiago de Compostela, and Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Xabier Bello
- Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain; Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Universidade de Santiago de Compostela, and Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Federico Martinón-Torres
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Universidade de Santiago de Compostela, and Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain
- Antonio Salas
- Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain; Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Universidade de Santiago de Compostela, and Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain.
5
Looking at cancer health disparities without the colored lenses. Cancer Health Disparities 2019; 3:e1-e9. [PMID: 31440743 DOI: 10.9777/chd.2019.1004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
Abstract
Cancer health disparities (CHDs), defined as adverse differences in cancer incidence and mortality, are prevalent in certain racial and ethnic groups. The underlying causes of CHDs are multifactorial and debated. Low socioeconomic status, geographical location, and lifestyle and behavioral factors are widely believed to contribute to CHDs regardless of ethnic and racial background. However, substantial data now also support a genetic basis for such disparities. CHDs may therefore best be understood by studying the interplay of multiple genetic and non-genetic factors and then translating the resulting knowledge into effective approaches for narrowing the existing disparity gaps. This review briefly highlights these aspects. It calls on people with different expertise to work together to make an impact and tackle the challenges associated with CHDs.
6
Biogeographical origin and timing of the founder ichthyosis TGM1 c.1187G > A mutation in an isolated Ecuadorian population. Sci Rep 2019; 9:7175. [PMID: 31073126 PMCID: PMC6509209 DOI: 10.1038/s41598-019-43133-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/10/2018] [Accepted: 04/11/2019] [Indexed: 11/22/2022]
Abstract
An unusually high frequency of the lamellar ichthyosis TGM1 mutation c.1187G > A has been observed in the Ecuadorian province of Manabí. Recently, the same mutation was detected in a Galician patient (northwest Spain). We analyzed patterns of genetic variation around this mutation in Ecuadorian patients and population-matched controls. This allowed us to estimate the age of c.1187G > A and the time to the most recent common ancestor (TMRCA) of Ecuadorian c.1187G > A carriers. While the estimated mutation age is 41 generations (~1,025 years ago [ya]), the TMRCA of Ecuadorian c.1187G > A carrier haplotypes dates to just 17 generations (~425 ya). Probability-based inferences of local ancestry allowed us to infer a most likely European origin for a few (16% to 30%) of the Ecuadorian haplotypes carrying this mutation. In addition, inferences of historical demographic change based on Ecuadorian c.1187G > A carrier haplotypes indicated exponential population growth starting ~20 generations ago, compatible with a recent founder effect in Manabí. Two main hypotheses can be considered for the origin of c.1187G > A. First, the mutation could have arisen in Spain >1,000 ya (with Galicia as the possible homeland) and then been carried to Ecuador by Spaniards in colonial times ~400 ya. Second, two independent mutational events could have given rise to this mutation in Ecuador and Galicia. The geographic and cultural characteristics of Manabí could have favored a founder effect that explains the high prevalence of TGM1 c.1187G > A in this region.
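The year estimates in the abstract above follow from the generation counts. Dividing 1,025 years by 41 generations implies a generation time of 25 years. The same figure reproduces ~425 years for 17 generations. The sketch below makes that assumed conversion explicit. The 25-year figure is inferred from the abstract's own numbers, and the helper name is hypothetical.

```python
# Generation time implied by the abstract's own figures: 1,025 years / 41 generations.
GENERATION_TIME_YEARS = 25

def generations_to_years(generations: float, generation_time: float = GENERATION_TIME_YEARS) -> float:
    """Convert a time expressed in generations to years before present."""
    return generations * generation_time

for label, generations in [("mutation age", 41), ("TMRCA of carriers", 17), ("onset of growth", 20)]:
    print(f"{label}: {generations} generations ~ {generations_to_years(generations)} years ago")
```

Under the same assumption, the onset of exponential growth at ~20 generations corresponds to roughly 500 years ago.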
7
Weight of the evidence of genetic investigations of ancestry informative markers. Theor Popul Biol 2018; 120:1-10. [DOI: 10.1016/j.tpb.2017.12.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/30/2017] [Revised: 12/11/2017] [Accepted: 12/14/2017] [Indexed: 01/03/2023]
8
Toscanini U, Gaviria A, Pardo-Seco J, Gómez-Carballa A, Moscoso F, Vela M, Cobos S, Lupero A, Zambrano AK, Martinón-Torres F, Carabajo-Marcillo A, Yunga-León R, Ugalde-Noritz N, Ordoñez-Ugalde A, Salas A. The geographic mosaic of Ecuadorian Y-chromosome ancestry. Forensic Sci Int Genet 2017; 33:59-65. [PMID: 29197245 DOI: 10.1016/j.fsigen.2017.11.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/28/2017] [Revised: 11/16/2017] [Accepted: 11/18/2017] [Indexed: 11/29/2022]
Abstract
Ecuadorians originated from a complex mixture of Native American indigenous people with Europeans and Africans. We analyzed Y-chromosome STRs (Y-STRs) in a sample of 415 Ecuadorians. Of these, 145 were typed with the AmpFlSTR® Yfiler™ system (Life Technologies, USA; hereafter Yfiler) and 270 with the PowerPlex® Y23 system (Promega Corp., USA; hereafter PPY23). The sample represents the three main ecological continental regions of the country, namely the Amazon rainforest, the Andes, and the Pacific coast. Diversity values are high in all three regions, and PPY23 exhibits higher discrimination power than the Yfiler set. Summary statistics, AMOVA, and RST distances show low to moderate levels of population stratification. By contrast, ancestry inferred from Y-STRs reveals clear patterns of geographic variation. The major ancestry in Ecuadorian males is European (61%), followed by an important Native American component (34%), whereas African ancestry (5%) is mainly concentrated in the northwest corner of the country. We conclude that classical procedures for measuring population stratification lack the desired sensitivity. Statistical inference of ancestry from Y-STRs is a satisfactory alternative for revealing patterns of spatial variation that would go unnoticed with popular summary statistics.
Affiliation(s)
- U Toscanini
- Pricai-Fundación Favaloro, Buenos Aires, Argentina; Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia (SERGAS), Spain
- A Gaviria
- Laboratorio de Genética Molecular, Centros Médicos Especializados Cruz Roja Ecuatoriana-Cruz Vital, Quito, Ecuador
- J Pardo-Seco
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia (SERGAS), Spain; Translational Pediatrics and Infectious Diseases, Hospital Clínico Universitario de Santiago, Santiago de Compostela, Spain; GENVIP Research Group, Instituto de Investigación Sanitaria de Santiago, Galicia, Spain
- A Gómez-Carballa
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia (SERGAS), Spain; Translational Pediatrics and Infectious Diseases, Hospital Clínico Universitario de Santiago, Santiago de Compostela, Spain; GENVIP Research Group, Instituto de Investigación Sanitaria de Santiago, Galicia, Spain
- F Moscoso
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia (SERGAS), Spain; Laboratorio Biomolecular, Cuenca, Ecuador
- M Vela
- Laboratorio de Genética Molecular, Centros Médicos Especializados Cruz Roja Ecuatoriana-Cruz Vital, Quito, Ecuador
- S Cobos
- Laboratorio de Genética Molecular, Centros Médicos Especializados Cruz Roja Ecuatoriana-Cruz Vital, Quito, Ecuador
- A Lupero
- Laboratorio de Genética Molecular, Centros Médicos Especializados Cruz Roja Ecuatoriana-Cruz Vital, Quito, Ecuador
- A K Zambrano
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad Tecnológica Equinoccial, Quito, 1701129, Ecuador
- F Martinón-Torres
- Translational Pediatrics and Infectious Diseases, Hospital Clínico Universitario de Santiago, Santiago de Compostela, Spain; GENVIP Research Group, Instituto de Investigación Sanitaria de Santiago, Galicia, Spain
- A Ordoñez-Ugalde
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia (SERGAS), Spain; Laboratorio Biomolecular, Cuenca, Ecuador; Neurogenetics Group, FPGMX-IDIS, Santiago de Compostela, Spain
- A Salas
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia (SERGAS), Spain.
9
Ponomarenko P, Ryutov A, Maglinte DT, Baranova A, Tatarinova TV, Gai X. Clinical utility of the low-density Infinium QC genotyping Array in a genomics-based diagnostics laboratory. BMC Med Genomics 2017; 10:57. [PMID: 28985730 PMCID: PMC5639583 DOI: 10.1186/s12920-017-0297-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/17/2017] [Accepted: 10/02/2017] [Indexed: 11/10/2022]
Abstract
Background: With 15,949 markers, the low-density Infinium QC Array-24 BeadChip enables a range of analyses. These are linkage analysis, HLA haplotyping, fingerprinting, ethnicity determination, analysis of mitochondrial genome variation, blood group typing and pharmacogenomics. It represents an attractive independent QC option for NGS-based diagnostic laboratories. It also provides a cost-efficient means of determining gender, ethnic ancestry, and sample kinship, which are important for interpreting the data from NGS-based genetic tests. Methods: We evaluated the accuracy and reproducibility of Infinium QC genotype calls. To do so, we compared them with genotyping data for the same samples from other genotyping platforms and from whole genome/exome sequencing. The accuracy and robustness of determining gender, provenance, and kinship were assessed. Results: Concordance of genotype calls between Infinium QC and other platforms was above 99%. Here we show that the chip’s ancestry informative markers are sufficient for ethnicity determination at continental and sometimes subcontinental levels. Assignment accuracy varied with the coverage of a particular region and ethnic group. Mean accuracies of provenance prediction at a regional level were 81% for Asia, 86% for Africa, 89% for the Americas, 97% for Oceania, 98% for Europe, and 100% for India. Mean accuracy of ethnicity assignment predictions was 63%. Pairwise concordances of AFR samples with samples from any other superpopulation were the lowest (0.39–0.43), while concordances within the same population were relatively high (0.55–0.61). For all populations except African, cross-population comparisons had concordance ranges similar to the range of within-population concordances (0.54–0.57). Gender determination was correct in all tested cases.
Conclusions: Our results indicate that the Infinium QC Array-24 chip is suitable for cost-efficient, independent QC assays in an NGS-based molecular diagnostic laboratory; hence, we recommend its integration into the standard laboratory workflow. Low-density chips can provide sample-specific measures of variant call accuracy, prevent sample mix-ups, validate self-reported ethnicities, and detect consanguineous cases. Integrating low-density chips into QC procedures aids the proper interpretation of candidate sequence variants. To enhance the utility of this low-density chip, we recommend expanding its ADME and mitochondrial markers. Inexpensive Infinium-like low-density human chips have the potential to become a “Swiss army knife” among genotyping assays, suitable for many applications requiring high-throughput assays. Electronic supplementary material: The online version of this article (10.1186/s12920-017-0297-7) contains supplementary material, which is available to authorized users.
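The abstract above reports pairwise genotype concordances between samples but does not define the measure. The sketch below therefore assumes a simple definition: the fraction of markers at which two samples carry the same unordered genotype, skipping markers where either call is missing. It is an illustration under that assumption, not the authors' method. The function name and the "--" missing-call code are hypothetical.

```python
def genotype_concordance(calls_a, calls_b, missing="--"):
    """Fraction of markers where two samples share the same genotype call.

    Genotypes are two-letter strings such as "AG"; allele order is ignored,
    so "AG" and "GA" count as identical. Markers with a missing call in
    either sample are excluded from the denominator.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both samples must be typed at the same markers")
    compared = matched = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if sorted(a) == sorted(b):
            matched += 1
    return matched / compared if compared else float("nan")

# Made-up calls at five markers.
sample_1 = ["AA", "AG", "GG", "--", "CT"]
sample_2 = ["AA", "GA", "GT", "CC", "CC"]
print(genotype_concordance(sample_1, sample_2))  # 0.5
```

Excluding missing calls from the denominator keeps poorly typed samples from looking artificially discordant. A different convention would change the absolute values.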
Affiliation(s)
- Petr Ponomarenko
- Department of Biology, University of La Verne, La Verne, CA, USA
- Alex Ryutov
- Center for Personalized Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA
- Dennis T Maglinte
- Center for Personalized Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA
- Ancha Baranova
- School of Systems Biology, George Mason University, Fairfax, VA, USA; Research Center for Medical Genetics, Moscow, Russia; Atlas Biomed Group, Moscow, Russia
- Tatiana V Tatarinova
- Department of Biology, University of La Verne, La Verne, CA, USA; School of Systems Biology, George Mason University, Fairfax, VA, USA; Atlas Biomed Group, Moscow, Russia.
- Xiaowu Gai
- Center for Personalized Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, USC Keck School of Medicine, Los Angeles, CA, USA.
10
Heinz T, Pala M, Gómez-Carballa A, Richards MB, Salas A. Updating the African human mitochondrial DNA tree: Relevance to forensic and population genetics. Forensic Sci Int Genet 2016; 27:156-159. [PMID: 28086175 DOI: 10.1016/j.fsigen.2016.12.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/15/2016] [Revised: 12/14/2016] [Accepted: 12/30/2016] [Indexed: 11/24/2022]
Abstract
Analysis of human mitochondrial DNA (mtDNA) variation plays an important role in forensic genetic investigations, especially of degraded biological samples and hair shafts. Many aspects of the mtDNA phylogeny are of special interest to the forensic community, such as haplogroup classification or the post hoc investigation of potential errors in mtDNA datasets. We analyzed >2200 mitogenomes of African ancestry with the aim of improving the known worldwide phylogeny. More than 300 new minor subclades were identified, and the time to the most recent common ancestor (TMRCA) was estimated for each node of the phylogeny. Phylogeographic details are provided that might also be relevant to forensic genetics. The present study is of special interest for forensic investigations because the analysis and interpretation of mtDNA casework rest on a solid worldwide phylogeny. This is evident from the role that phylogeny plays in popular resources in the field (e.g. PhyloTree), software (e.g. Haplogrep 2), and databases (e.g. EMPOP). Beyond its forensic genetic interest, we also highlight the impact of this research on anthropological studies, such as those related to the reconstruction of the transatlantic slave trade.
Affiliation(s)
- Tanja Heinz
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain
- Maria Pala
- Department of Biological Sciences, School of Applied Sciences, University of Huddersfield, Huddersfield, United Kingdom
- Alberto Gómez-Carballa
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain
- Martin B Richards
- Department of Biological Sciences, School of Applied Sciences, University of Huddersfield, Huddersfield, United Kingdom
- Antonio Salas
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain.
11
Barral-Arca R, Pischedda S, Gómez-Carballa A, Pastoriza A, Mosquera-Miguel A, López-Soto M, Martinón-Torres F, Álvarez-Iglesias V, Salas A. Meta-Analysis of Mitochondrial DNA Variation in the Iberian Peninsula. PLoS One 2016; 11:e0159735. [PMID: 27441366 PMCID: PMC4956223 DOI: 10.1371/journal.pone.0159735] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/15/2016] [Accepted: 07/07/2016] [Indexed: 12/14/2022]
Abstract
The Iberian Peninsula has been the focus of numerous studies of mitochondrial DNA (mtDNA) variation, most of them targeting the control region. In the present study we sequenced the control region of 3,024 Spanish individuals from areas where available data were still limited. We also compiled mtDNA haplotypes from the literature, involving 4,588 sequences and 28 population groups or small regions. We meta-analyzed all these data to shed further light on patterns of geographic variation. This approach takes advantage of the large sample size and geographic coverage, in contrast with the atomized sampling strategy of previous work. The results indicate that the main mtDNA haplogroups show primarily clinal geographic patterns across Iberia, roughly along a north-south axis. Haplogroup HV0 (within which haplogroup V is nested) is more prevalent in the Franco-Cantabrian region. This agrees well with previous findings that identified this area as a climate refuge during the Last Glacial Maximum (LGM), prior to a subsequent demographic re-expansion towards Central Europe and the Mediterranean. Typical sub-Saharan and North African lineages are slightly more prevalent in southern Iberia, although at low frequencies. This pattern has been shaped mainly by the transatlantic slave trade and the Arab invasion of the Iberian Peninsula. The results also indicate that summary statistics that aim to measure molecular variation, or AMOVA, have limited sensitivity to detect population substructure, in contrast to the patterns revealed by phylogeographic analysis. Overall, the results suggest that mtDNA variation in Iberia is substantially stratified. These patterns might be relevant in biomedical studies, given that stratification is a common cause of false positives in case-control mtDNA association studies. They should also be considered when weighing DNA evidence in forensic casework, which is strongly dependent on haplotype frequencies.
Affiliation(s)
- Ruth Barral-Arca
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- GenPop Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain
- Sara Pischedda
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- GenPop Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain
- Alberto Gómez-Carballa
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- GenPop Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Hospital Clínico Universitario and Universidade de Santiago de Compostela (USC), Galicia, Spain
- Ana Pastoriza
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Ana Mosquera-Miguel
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Manuel López-Soto
- Servicio de Biología, Instituto Nacional de Toxicología y Ciencias Forenses, Departamento de Sevilla, Sevilla, Spain
- Federico Martinón-Torres
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Hospital Clínico Universitario and Universidade de Santiago de Compostela (USC), Galicia, Spain
- Pediatric Emergency and Critical Care Division, Department of Pediatrics, Hospital Clínico Universitario de Santiago, Santiago de Compostela, Galicia, Spain
- Vanesa Álvarez-Iglesias
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Antonio Salas
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Ciencias Forenses, Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- GenPop Research Group, Instituto de Investigaciones Sanitarias (IDIS), Hospital Clínico Universitario de Santiago, Galicia, Spain
12
Toscanini U, Brisighelli F, Llull C, Berardi G, Gómez A, Andreatta F, Pardo-Seco J, Gómez-Carballa A, Martinón-Torres F, Álvarez-Iglesias V, Salas A. Charting the Y-chromosome ancestry of present-day Argentinean Mennonites. J Hum Genet 2016; 61:507-13. [PMID: 26841831 DOI: 10.1038/jhg.2016.3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/08/2015] [Revised: 12/27/2015] [Accepted: 01/06/2016] [Indexed: 11/09/2022]
Abstract
Old Order Mennonite communities initially arose in Northern Europe (centered in the Netherlands) and derived from the Anabaptist movement of the 16th century. Mennonites migrated to the New World in the early 18th century, first to North America, and more recently to Mesoamerica and South America. We analyzed Y-chromosome short tandem repeats (STRs) and single nucleotide polymorphisms in males from a community of Mennonites, 'La Nueva Esperanza', which arrived in Argentina in 1985 from colonies in Bolivia and Mexico. Molecular diversity indices coupled with demographic simulations show that Mennonites have reduced variability compared with local Argentinean populations and 69 European population samples. Mennonite Y-STR haplotypes were observed mainly in Central Europe. In agreement, multidimensional scaling analyses based on RST genetic distances indicate that Mennonite Y-chromosomes are closely related to those of Central/Northern Europeans (the Netherlands, Switzerland and Denmark). In addition, statistical inferences on the most likely geographic origin of Y-chromosome haplotypes point more specifically to the Netherlands as the population that best represents the majority of the Mennonite Y-chromosomes. Overall, Y-chromosome variation in Mennonites shows signatures of a moderate reduction of variability compared with source populations, in good agreement with their lifestyle in small endogamous demes. These genetic singularities could also help in understanding disease conditions that are more prevalent among Mennonites.
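The abstract does not name the molecular diversity indices used. One index commonly reported for Y-STR haplotypes is Nei's unbiased haplotype diversity, H = n/(n − 1)·(1 − Σ p_i²), where p_i is the sample frequency of haplotype i; the sketch below assumes that index and treats each haplotype as an opaque string. It is illustrative, not the authors' code, and the function name is hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)
```

For four hypothetical Y-STR haplotypes with counts 2, 1 and 1, for example `["14-13-30", "14-13-30", "15-13-29", "16-12-30"]`, H is 5/6 (about 0.833); a sample in which every man shares one haplotype gives H = 0, the extreme of reduced variability.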
Affiliation(s)
- Ulises Toscanini
- PRICAI-Fundación Favaloro, Buenos Aires, Argentina
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Francesca Brisighelli
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Cintia Llull
- PRICAI-Fundación Favaloro, Buenos Aires, Argentina
- Andrea Gómez
- PRICAI-Fundación Favaloro, Buenos Aires, Argentina
- Jacobo Pardo-Seco
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Hospital Clínico Universitario and Universidade de Santiago de Compostela (USC), Galicia, Spain
- Alberto Gómez-Carballa
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Hospital Clínico Universitario and Universidade de Santiago de Compostela (USC), Galicia, Spain
- Federico Martinón-Torres
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Hospital Clínico Universitario and Universidade de Santiago de Compostela (USC), Galicia, Spain
- Translational Pediatrics and Infectious Diseases, Department of Pediatrics, Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain
- Vanesa Álvarez-Iglesias
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Antonio Salas
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, Galicia, Spain
- Grupo de Investigación en Genética, Vacunas, Infecciones y Pediatría (GENVIP), Hospital Clínico Universitario and Universidade de Santiago de Compostela (USC), Galicia, Spain
13
Toscanini U, Vullo C, Berardi G, Llull C, Borosky A, Gómez A, Pardo-Seco J, Salas A. A comprehensive Y-STR portrait of Argentinean populations. Forensic Sci Int Genet 2016; 20:1-5. [DOI: 10.1016/j.fsigen.2015.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/17/2015] [Revised: 08/09/2015] [Accepted: 09/02/2015] [Indexed: 01/10/2023]
14
Mersha TB, Abebe T. Self-reported race/ethnicity in the age of genomic research: its potential impact on understanding health disparities. Hum Genomics 2015; 9:1. [PMID: 25563503 PMCID: PMC4307746 DOI: 10.1186/s40246-014-0023-x] [Citation(s) in RCA: 255] [Impact Index Per Article: 28.3] [Received: 07/26/2014] [Accepted: 12/01/2014] [Indexed: 12/02/2022] [Open access]
Abstract
This review explores the limitations of self-reported race, ethnicity, and genetic ancestry in biomedical research. Various terms are used to classify human differences in genomic research, including race, ethnicity, and ancestry. Although race and ethnicity are related, race refers to a person's physical appearance, such as skin color and eye color. Ethnicity, on the other hand, refers to communality in cultural heritage, language, social practice, traditions, and geopolitical factors. Genetic ancestry inferred using ancestry informative markers (AIMs) is based on genetic/genomic data. Phenotype-based race/ethnicity information and data computed using AIMs often disagree. For example, self-identified African Americans can have drastically different levels of African or European ancestry. Genetic analysis of individual ancestry shows that some self-identified African Americans have up to 99% European ancestry, whereas some self-identified European Americans have substantial African admixture. Similarly, African ancestry in the Latino population ranges from 3% in Mexican Americans to 16% in Puerto Ricans. The implication is that, in African American or Latino populations, self-reported ancestry may not be as accurate as direct assessment of individual genomic information in predicting treatment outcomes. To better understand human genetic variation in the context of health disparities, we suggest using "ancestry" (or biogeographical ancestry) to describe actual genetic variation, "race" to describe health disparity in societies characterized by racial categories, and "ethnicity" to describe traditions, lifestyle, diet, and values. We also suggest using ancestry informative markers for precise characterization of individuals' biological ancestry. Understanding the sources of human genetic variation and the causes of health disparities could lead to interventions that would improve the health of all individuals.
Affiliation(s)
- Tesfaye B Mersha
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA.
- Tilahun Abebe
- Department of Biology, University of Northern Iowa, Cedar Falls, IA, USA.
15
DNA Commission of the International Society for Forensic Genetics: Revised and extended guidelines for mitochondrial DNA typing. Forensic Sci Int Genet 2014; 13:134-42. [DOI: 10.1016/j.fsigen.2014.07.010] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Received: 07/15/2014] [Accepted: 07/19/2014] [Indexed: 11/21/2022]
16
Pardo-Seco J, Martinón-Torres F, Salas A. Evaluating the accuracy of AIM panels at quantifying genome ancestry. BMC Genomics 2014; 15:543. [PMID: 24981136 PMCID: PMC4101176 DOI: 10.1186/1471-2164-15-543] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/10/2014] [Accepted: 06/19/2014] [Indexed: 01/24/2023] [Open access]
Abstract
Background: There is growing interest among geneticists in developing panels of Ancestry Informative Markers (AIMs) to measure the biogeographical ancestry of individual genomes. The efficiency of these panels is commonly tested empirically by contrasting self-reported ancestry with the ancestry estimated from the panels. Results: Using SNP data from HapMap, we carried out a simulation-based study to measure the effect of SNP coverage on the estimation of genome ancestry. For three of the main continental groups (Africans, East Asians, Europeans), ancestry was first estimated using the whole HapMap SNP database as a proxy for global genome ancestry; these estimates were subsequently compared to those obtained from pre-designed AIM panels. Panels with >400 AIMs capture genome ancestry reasonably well, while those containing a few dozen AIMs show large variability in ancestry estimates. Curiously, 500-1,000 SNPs selected at random from the genome provide an unbiased estimate of genome ancestry and perform as well as any AIM panel of similar size. In simulated scenarios of population admixture, panels containing few AIMs also show important deficiencies in measuring genome ancestry. Conclusions: The results indicate that the ability to estimate genome ancestry is strongly dependent on the number of AIMs used, and not primarily on their individual informativeness. Caution should be taken when making individual (medical, forensic, or anthropological) inferences based on AIMs.
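The dependence of ancestry estimates on panel size can be illustrated with a toy simulation, which is not the HapMap-based design of the study: two source populations with hypothetical allele frequencies drawn uniformly from [0.05, 0.95], one admixed individual with true proportion q from source A, and a maximum-likelihood estimate of q found by grid search with the source frequencies assumed known. All function names and parameter values are assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_individual(n_snps, true_q, rng):
    """Hypothetical source allele frequencies plus one admixed diploid genotype (0, 1 or 2 copies) per SNP."""
    freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    genotypes = []
    for pa, pb in zip(freqs_a, freqs_b):
        p = true_q * pa + (1 - true_q) * pb  # allele frequency in the admixed individual
        genotypes.append(int(rng.random() < p) + int(rng.random() < p))
    return freqs_a, freqs_b, genotypes

def estimate_q(freqs_a, freqs_b, genotypes, grid=200):
    """Grid-search maximum-likelihood estimate of the ancestry proportion from source A."""
    best_q, best_ll = 0.0, -math.inf
    for k in range(grid + 1):
        q = k / grid
        ll = 0.0
        for pa, pb, g in zip(freqs_a, freqs_b, genotypes):
            p = q * pa + (1 - q) * pb
            # Binomial log-likelihood of the genotype, dropping the constant coefficient
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

def spread_of_estimates(n_snps, true_q=0.3, replicates=10, seed=1):
    """Standard deviation of q estimates across independent simulated panels of one size."""
    rng = random.Random(seed)
    estimates = [estimate_q(*simulate_individual(n_snps, true_q, rng)) for _ in range(replicates)]
    return statistics.pstdev(estimates)
```

Comparing `spread_of_estimates(24)` with `spread_of_estimates(1000)` reproduces, qualitatively, the pattern the abstract reports: a few dozen markers give widely scattered estimates, while around a thousand markers give much tighter ones.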
Affiliation(s)
- Antonio Salas
- Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, and Instituto de Ciencias Forenses, Grupo de Medicina Xenómica (GMX), Facultade de Medicina, Universidade de Santiago de Compostela, 15872 Santiago de Compostela, Galicia, Spain.
17
Cardena MMSG, Ribeiro-dos-Santos Â, Santos S, Mansur AJ, Pereira AC, Fridman C. Assessment of the relationship between self-declared ethnicity, mitochondrial haplogroups and genomic ancestry in Brazilian individuals. PLoS One 2013; 8:e62005. [PMID: 23637946 PMCID: PMC3634831 DOI: 10.1371/journal.pone.0062005] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 12/07/2012] [Accepted: 03/15/2013] [Indexed: 11/19/2022] [Open access]
Abstract
In populations with a high degree of admixture, such as that of Brazil, self-declared ethnicity alone is not a good basis for classifying individuals by ethnicity. Here, we evaluate the relationship of self-declared ethnicity with genomic ancestry and mitochondrial haplogroups in 492 individuals from southeastern Brazil. Mitochondrial haplogroups were obtained by analyzing the hypervariable regions of the mitochondrial DNA (mtDNA), and genomic ancestry was estimated using 48 autosomal insertion-deletion ancestry informative markers (AIMs). Of the 492 individuals, 74.6% self-declared as White, 13.8% as Brown and 10.4% as Black. Classification of the mtDNA haplogroups showed that 46.3% had African mtDNA, and the genomic ancestry analysis showed that the main contribution was European (57.4%). Among the 367 individuals who self-declared as White, 37.6% showed African mtDNA; they had a high contribution of European genomic ancestry (63.3%) but also a significant contribution of African ancestry (22.2%). Of the 68 individuals who self-declared as Brown, 25% showed Amerindian mtDNA, and their European and African genomic ancestry contributions were similar. Of the 51 subjects who self-declared as Black, 80.4% had African mtDNA, and the main contribution of genomic ancestry was African (55.6%), but they also had a significant proportion of European ancestry (32.1%). The Brazilian population had a uniform degree of Amerindian genomic ancestry, and only genetic markers (autosomal or mitochondrial) captured this Amerindian ancestry information. Additionally, both types of genetic markers revealed a high degree of heterogeneity in ancestry, reflecting the high genetic admixture present in the Brazilian population. We suggest that in epidemiological studies, these methods could provide complementary information.
Affiliation(s)
- Mari M. S. G. Cardena
- Department of Legal Medicine, Ethics and Occupational Health, Medical School, University of São Paulo, São Paulo, São Paulo, Brazil
- Sidney Santos
- Laboratory of Human Genetics and Medicine, Federal University of Pará, Belém, Pará, Brazil
- Alfredo J. Mansur
- Department of Cardiology, Laboratory of Genetics and Molecular Cardiology, Heart Institute, Medical School, University of São Paulo, São Paulo, São Paulo, Brazil
- Alexandre C. Pereira
- Department of Cardiology, Laboratory of Genetics and Molecular Cardiology, Heart Institute, Medical School, University of São Paulo, São Paulo, São Paulo, Brazil
- Cintia Fridman
- Department of Legal Medicine, Ethics and Occupational Health, Medical School, University of São Paulo, São Paulo, São Paulo, Brazil
18
Prieto L, Alves C, Zimmermann B, Tagliabracci A, Prieto V, Montesino M, Whittle M, Anjos M, Cardoso S, Heinrichs B, Hernandez A, López-Parra A, Sala A, Saragoni V, Burgos G, Marino M, Paredes M, Mora-Torres C, Angulo R, Chemale G, Vullo C, Sánchez-Simón M, Comas D, Puente J, López-Cubría C, Modesti N, Aler M, Merigioli S, Betancor E, Pedrosa S, Plaza G, Masciovecchio M, Schneider P, Parson W. GHEP-ISFG proficiency test 2011: Paper challenge on evaluation of mitochondrial DNA results. Forensic Sci Int Genet 2013; 7:10-5. [DOI: 10.1016/j.fsigen.2012.04.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/07/2011] [Revised: 04/06/2012] [Accepted: 04/20/2012] [Indexed: 11/16/2022]
19
Fine-scale estimation of location of birth from genome-wide single-nucleotide polymorphism data. Genetics 2011; 190:669-77. [PMID: 22095078 DOI: 10.1534/genetics.111.135657] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] [Open access]
Abstract
Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals' birthplace from genetic data. Such inference has been demonstrated for individuals' country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location at a finer scale. Principal component analysis (PCA) shows that although the first PCs are particularly informative for location, the higher-order PCs also carry location information that cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals' PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing models with successively more PCs and finding the model with the top 23 PCs to be the most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.
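The averaging step described for pcLOCATE can be sketched as a probability-weighted mean of town coordinates, together with a great-circle (haversine) distance that could be used to score an estimate in kilometres. The per-town birth probabilities, which pcLOCATE derives from PC values with a Bayesian model, are simply taken as inputs here; the function names, town names and coordinates are hypothetical, and averaging degrees directly assumes a small region away from the ±180° meridian.

```python
import math

def expected_location(town_probs, town_coords):
    """Probability-weighted mean (latitude, longitude) over candidate towns.

    town_probs: town -> probability of birth (assumed already estimated).
    town_coords: town -> (latitude, longitude) in degrees.
    Averaging degrees directly is only reasonable within a small region.
    """
    total = sum(town_probs.values())
    lat = sum(p * town_coords[t][0] for t, p in town_probs.items()) / total
    lon = sum(p * town_coords[t][1] for t, p in town_probs.items()) / total
    return lat, lon

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))
```

Taking the median of `haversine_km(estimate, truth)` over a cohort (for instance with `statistics.median`) yields the kind of median error in kilometres that the abstract reports.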
20
A statistical framework for the interpretation of mtDNA mixtures: forensic and medical applications. PLoS One 2011; 6:e26723. [PMID: 22053205 PMCID: PMC3203886 DOI: 10.1371/journal.pone.0026723] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/25/2011] [Accepted: 10/02/2011] [Indexed: 11/19/2022] [Open access]
Abstract
Background: Mitochondrial DNA (mtDNA) variation is commonly analyzed in a wide range of biomedical applications. A mixture arises when more than one individual contributes to a stain genotyped from some biological material. Most forensic mixture cases are analyzed using autosomal markers. In rape cases, Y-chromosome markers typically add useful information. However, there are important cases where autosomal and Y-chromosome markers fail to provide useful profiles. In some instances, usually involving small amounts of DNA or degraded DNA, mtDNA may be the only useful genetic evidence available. Mitochondrial DNA mixtures also arise in studies of the role of mtDNA variation in tumorigenesis. Such mixtures may be generated by the tumor, but they could also originate in vitro through inadvertent contamination or a sample mix-up. Methods/Principal Findings: We present the statistical methods needed for mixture interpretation and emphasize the modifications required to generalize the better-known methods based on conventional markers to mtDNA mixtures. Two scenarios are considered. In the first, only categorical mtDNA data are assumed to be available, that is, the variants contributing to the mixture. In the second, quantitative data (peak heights or areas) on the allelic variants are also accessible. When quantitative information is available in addition to allele designations, more precise information can be extracted using regression models. More precisely, using quantitative information may lead to a unique solution in cases where the qualitative approach points to several possibilities. Importantly, these methods also apply to clinical cases where contamination is a potential alternative explanation for the data. Conclusions/Significance: We argue that clinical and forensic scientists should give greater consideration to mtDNA for mixture interpretation. The results and examples show that the analysis of mtDNA mixtures contributes substantially to forensic casework and may also clarify erroneous claims made in clinical genetics regarding tumorigenesis.
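To illustrate how quantitative peak data can sharpen mixture interpretation, the sketch below fits a deliberately simple two-contributor model by least squares: at each aligned site, the normalised peak height of base b is modelled as α·[contributor 1 carries b] + (1 − α)·[contributor 2 carries b] plus noise, which gives a closed-form estimate of the mixture proportion α. The model, the function name and the input format are assumptions for illustration, not the regression framework of the paper.

```python
def mixture_proportion(contrib1, contrib2, peak_fractions):
    """Least-squares estimate of alpha, the fraction of the mixture from contributor 1.

    contrib1, contrib2: aligned haplotype strings of the two putative contributors.
    peak_fractions: one dict per site mapping base -> normalised peak height (summing to 1).
    Assumed model: h[site][base] = alpha*[c1 == base] + (1 - alpha)*[c2 == base] + noise.
    """
    if not len(contrib1) == len(contrib2) == len(peak_fractions):
        raise ValueError("inputs must cover the same aligned sites")
    num = den = 0.0
    for c1, c2, heights in zip(contrib1, contrib2, peak_fractions):
        # Include the contributors' bases even if no peak was recorded for them.
        for base in set(heights) | {c1, c2}:
            h = heights.get(base, 0.0)
            x = 1.0 if base == c1 else 0.0
            z = 1.0 if base == c2 else 0.0
            # Residual is (h - z) - alpha*(x - z); sites where x == z carry no information.
            num += (x - z) * (h - z)
            den += (x - z) ** 2
    if den == 0:
        raise ValueError("contributors never differ, so alpha is not identifiable")
    return min(1.0, max(0.0, num / den))  # clamp to a valid proportion
```

With contributors ACGT and ACGA and a 70:30 T/A peak ratio at the last site, the estimate is 0.7. Only sites where the two haplotypes differ inform α, which is why identical candidate haplotypes raise an error rather than returning an arbitrary value.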
21
Abstract
BACKGROUND: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because it is maternally inherited, present in high copy number, and persists robustly in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome. RESULTS: We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome. CONCLUSIONS: Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, using only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to also be useful in other DNA sequence classification applications.
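The missing-data strategy the abstract favours, restricting features to positions observed both in every training sequence and in the test sequence, can be sketched as a preprocessing step that would precede any classifier such as an SVM. The sketch assumes pre-aligned HVR-I sequences of equal length, treats 'N' as the only missing-data symbol, and one-hot encodes each retained position over A, C, G and T; the function names are hypothetical, and training the SVM itself is left to a library of choice.

```python
BASES = "ACGT"

def common_positions(training, test, missing="N"):
    """Indices where the test sequence and every training sequence have an observed base."""
    seqs = list(training) + [test]
    length = len(test)
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    return [i for i in range(length) if all(s[i] != missing for s in seqs)]

def one_hot(seq, positions):
    """Flattened one-hot vector (A, C, G, T) over the selected positions; other symbols map to zeros."""
    vec = []
    for i in positions:
        vec.extend(1 if seq[i] == b else 0 for b in BASES)
    return vec
```

For training sequences ACGTN and ACGAA and test sequence NCGTA, only positions 1 to 3 (zero-based) are shared, so every sequence is encoded as a 12-element vector over the same features, and the classifier would be retrained on that feature set for this particular test sequence.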
Affiliation(s)
- Chih Lee
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Ion I Măndoiu
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Craig E Nelson
- Molecular and Cell Biology Department, University of Connecticut, Storrs, CT, USA
22
Abstract
The genetic structure of human populations is important in population genetics, forensics and medicine. Using genome-wide scans and individuals with all four grandparents born in the same settlement, we here demonstrate remarkable geographical structure across 8-30 km in three different parts of rural Europe. After excluding close kin and inbreeding, village of origin could still be predicted correctly on the basis of genetic data for 89-100% of individuals.
23
Egeland T, Salas A. Estimating haplotype frequency and coverage of databases. PLoS One 2008; 3:e3988. [PMID: 19098988 PMCID: PMC2602601 DOI: 10.1371/journal.pone.0003988] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 06/18/2008] [Accepted: 11/18/2008] [Indexed: 11/30/2022] [Open access]
Abstract
A variety of forensic, population, and disease studies are based on haploid DNA (e.g. mitochondrial DNA or Y-chromosome data). For any set of genetic markers, databases of conventional size will normally contain only a fraction of all haplotypes. For several applications, reliable estimates of haplotype frequencies, of the total number of haplotypes, and of database coverage (the probability that the next random haplotype is contained in the database) will be useful. We propose different approaches to the problem based on classical methods as well as new applications of Principal Component Analysis (PCA). We also discuss previous proposals based on saturation curves. Several conclusions can be inferred from simulated and real data. First, classical estimates of the fraction of unseen haplotypes can be seriously biased. Second, there is no obvious way to decide on the required sample size based on traditional approaches. Methods based on hypothesis testing or on the length of confidence intervals may appear artificial, since no single test or parameter stands out as particularly relevant. Rather, coverage may be more relevant, since it indicates the percentage of different haplotypes that are contained in a database; if the coverage is low, there is a considerable chance that the next haplotype to be observed does not appear in the database, which indicates that the database needs to be expanded. Finally, freeware and example data sets accompany the methods discussed in this paper: http://folk.uio.no/thoree/nhap/.
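As a baseline for the coverage concept defined in this abstract, the classical Good-Turing estimate takes the probability that the next haplotype is new to be approximately the fraction of singletons (haplotypes seen exactly once), so coverage ≈ 1 − n₁/N. This is a classical estimator of the kind the abstract cautions may be biased, not the PCA-based approach the authors propose, and the function name is hypothetical.

```python
from collections import Counter

def good_turing_coverage(haplotypes) -> float:
    """Good-Turing coverage estimate: 1 - (number of singleton haplotypes) / (sample size)."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("empty database")
    singletons = sum(1 for c in Counter(haplotypes).values() if c == 1)
    return 1 - singletons / n
```

For a database of four haplotypes with counts 2, 1 and 1, the two singletons give an estimated coverage of 1 − 2/4 = 0.5; a database in which every haplotype is unique gets an estimated coverage of 0, signalling that it needs to be expanded.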
Affiliation(s)
- Thore Egeland
- Institute of Forensic Medicine, University of Oslo, Oslo, Norway.
24
Phillips C, Salas A, Sánchez J, Fondevila M, Gómez-Tato A, Álvarez-Dios J, Calaza M, de Cal MC, Ballard D, Lareu M, Carracedo Á. Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs. Forensic Sci Int Genet 2007; 1:273-80. [DOI: 10.1016/j.fsigen.2007.06.008] [Citation(s) in RCA: 218] [Impact Index Per Article: 12.8] [Received: 06/18/2007] [Revised: 06/25/2007] [Accepted: 06/27/2007] [Indexed: 10/22/2022]
25
Montesino M, Salas A, Crespillo M, Albarrán C, Alonso A, Alvarez-Iglesias V, Cano JA, Carvalho M, Corach D, Cruz C, Di Lonardo A, Espinheira R, Farfán MJ, Filippini S, García-Hirschfeld J, Hernández A, Lima G, López-Cubría CM, López-Soto M, Pagano S, Paredes M, Pinheiro MF, Rodríguez-Monge AM, Sala A, Sóñora S, Sumita DR, Vide MC, Whittle MR, Zurita A, Prieto L. Analysis of body fluid mixtures by mtDNA sequencing: An inter-laboratory study of the GEP-ISFG working group. Forensic Sci Int 2006; 168:42-56. [PMID: 16899347 DOI: 10.1016/j.forsciint.2006.06.066] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 03/24/2006] [Revised: 06/15/2006] [Accepted: 06/17/2006] [Indexed: 10/24/2022]
Abstract
The mitochondrial DNA (mtDNA) working group of the GEP-ISFG (Spanish and Portuguese Group of the International Society for Forensic Genetics) carried out an inter-laboratory exercise consisting of the analysis of mtDNA sequencing patterns in mixed stains (saliva/semen and blood/semen). Mixtures were prepared with saliva or blood from a female donor and three different semen dilutions (pure, 1:10 and 1:20) in order to simulate forensic casework. All labs extracted the DNA by preferential lysis and amplified and sequenced the first mtDNA hypervariable region (HVS-I). Autosomal and Y-STR markers were also analysed in order to compare nuclear and mitochondrial results from the same DNA extracts. A mixed stain prepared using semen from a vasectomized individual was also analysed. The results were reasonably consistent among labs for the first fractions but not for the second ones, for which some laboratories reported contamination problems. In the first fractions, both the female and male haplotypes were generally detected in samples prepared with undiluted semen. In contrast, most of the mixtures prepared with diluted semen yielded only the female haplotype, suggesting that the mtDNA copy number per cell is smaller in semen than in saliva or blood. Although the detection level of the male component decreased in accordance with the degree of semen dilution, the loss of signal was not uniform across each electropherogram. Moreover, differences between mixtures prepared from different donors and different body fluids were also observed. We conclude that the particular characteristics of each mixed stain can strongly influence the interpretation of the mtDNA evidence in forensic mixtures (leading in some cases to false exclusions). Preliminary tests to identify the fluids involved in the mixture are therefore an essential tool. In addition, to prevent incorrect conclusions when interpreting electropherograms, we strongly recommend: (i) using additional sequencing primers to confirm the sequencing results and (ii) interpreting the results in the light of the phylogenetic perspective.
26
Salas A, Bandelt HJ, Macaulay V, Richards MB. Phylogeographic investigations: the role of trees in forensic genetics. Forensic Sci Int 2006; 168:1-13. [PMID: 16814504 DOI: 10.1016/j.forsciint.2006.05.037] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Received: 04/20/2006] [Revised: 05/19/2006] [Accepted: 05/21/2006] [Indexed: 11/16/2022]
Abstract
The human mitochondrial DNA (mtDNA) genome is commonly analyzed in various disciplines, such as population, medical, and forensic genetics, but conceptual and scientific exchange between them is still limited. Here we review several aspects of the mtDNA phylogeny that are particularly--but not exclusively--of interest to the forensic community. Among the issues that arise, we emphasize the importance of integrating evolutionary concepts into the forensic routine. We also discuss topics such as mtDNA mutation-rate heterogeneity and the weight of evidence, ethnic affiliations of mtDNA profiles, and the abuse of reference databases. Finally, we show the usefulness of coding-region variation in a forensic context.
Affiliation(s)
- A Salas
- Unidad de Genética, Instituto de Medicina Legal, Facultad de Medicina, Universidad de Santiago de Compostela, 15782 Galicia, Spain.
27
Salas A, Phillips C, Carracedo A. Ancestry vs physical traits: the search for ancestry informative markers (AIMs). Int J Legal Med 2005; 120:188-9; author reply 190. [PMID: 16133562 DOI: 10.1007/s00414-005-0032-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/08/2005] [Accepted: 07/08/2005] [Indexed: 10/25/2022]
28
Affiliation(s)
- T A Brettell
- Office of Forensic Sciences, New Jersey State Police, New Jersey Forensic Science and Technology Complex, 1200 Negron Road, Horizon Center, Hamilton, New Jersey 08691, USA