1
Ko S, Chu BB, Peterson D, Okenwa C, Papp JC, Alexander DH, Sobel EM, Zhou H, Lange KL. Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets. Am J Hum Genet 2023; 110:314-325. [PMID: 36610401 PMCID: PMC9943729 DOI: 10.1016/j.ajhg.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/12/2022] [Indexed: 01/09/2023] Open
Abstract
Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
Affiliation(s)
- Seyoon Ko
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Benjamin B. Chu
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Daniel Peterson
- Department of Mathematics, Brigham Young University, Provo, UT 84602, USA
- Chidera Okenwa
- Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720, USA
- Jeanette C. Papp
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eric M. Sobel
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Corresponding author
- Hua Zhou
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kenneth L. Lange
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
2
Boudeau S, Ramakodi MP, Zhou Y, Liu JC, Ragin C, Kulathinal RJ. Extensive set of African ancestry-informative markers (AIMs) to study ancestry and population health. Front Genet 2023; 14:1061781. [PMID: 36911410 PMCID: PMC9997643 DOI: 10.3389/fgene.2023.1061781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 01/20/2023] [Indexed: 02/16/2023] Open
Abstract
Introduction: Human populations are often highly structured due to differences in genetic ancestry among groups, posing difficulties in associating genes with diseases. Ancestry-informative markers (AIMs) aid in the detection of population stratification and provide an alternative approach to map population-specific alleles to disease. Here, we identify and characterize a novel set of African AIMs that separate populations of African ancestry from other global populations including those of European ancestry. Methods: Using data from the 1000 Genomes Project, highly informative SNP markers from five African subpopulations were selected based on estimates of informativeness (In) and compared against the European population to generate a final set of 46,737 African ancestry-informative markers (AIMs). The AIMs identified were validated using an independent set and functionally annotated using tools such as SIFT and PolyPhen. They were also investigated for representation on commonly used SNP arrays. Results: This set of African AIMs effectively separates populations of African ancestry from other global populations and further identifies substructure between populations of African ancestry. When a subset of these AIMs was studied in an independent dataset, they differentiated people who self-identify as African American or Black from those who identify their ancestry as primarily European. Most of the AIMs were found in intergenic and intronic regions, with only 0.6% in the coding regions of the genome. Most of the commonly used SNP arrays investigated contained less than 10% of the AIMs. Discussion: While several functional annotations of both coding and non-coding African AIMs are supported by the literature and link these high-frequency African alleles to diseases in African populations, more effort is needed to map genes to diseases in these genetically diverse subpopulations. The relative dearth of these African AIMs on current genotyping platforms (the array with the highest fraction, Illumina's Omni 5, harbors less than a quarter of AIMs) further demonstrates a greater need to better represent historically understudied populations.
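As an illustration of the marker-ranking idea mentioned in this abstract, the following is a minimal sketch of an informativeness score for a single locus. It assumes that "In" refers to the informativeness-for-assignment statistic, computed from per-population allele frequencies with equal population weights and natural logarithms; the function name `informativeness_in` and the toy frequencies are hypothetical and are not taken from the cited study.

```python
import math
from typing import Sequence


def informativeness_in(pop_freqs: Sequence[Sequence[float]]) -> float:
    """Informativeness for assignment (I_n) at one locus.

    pop_freqs[i][j] is the frequency of allele j in population i.
    Populations are weighted equally (an assumption of this sketch),
    natural logs are used, and 0 * log(0) is treated as 0.
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])

    def plogp(p: float) -> float:
        return p * math.log(p) if p > 0.0 else 0.0

    total = 0.0
    for j in range(n_alleles):
        # Average frequency of allele j across populations.
        p_bar = sum(pop[j] for pop in pop_freqs) / k
        total += -plogp(p_bar) + sum(plogp(pop[j]) for pop in pop_freqs) / k
    return total


# Toy biallelic loci in two populations (made-up frequencies).
fixed_difference = informativeness_in([[1.0, 0.0], [0.0, 1.0]])
no_difference = informativeness_in([[0.3, 0.7], [0.3, 0.7]])
```

Under these assumptions a locus with a fixed difference between two populations scores log(2), the maximum for two populations, while a locus with identical frequencies scores zero, so ranking loci by this value favors markers whose frequencies diverge across groups.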
Affiliation(s)
- Samantha Boudeau
- Department of Biology, Temple University, Philadelphia, PA, United States; Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, PA, United States; African Caribbean Cancer Consortium, Fox Chase Cancer Center, Philadelphia, PA, United States
- Meganathan P Ramakodi
- Department of Biology, Temple University, Philadelphia, PA, United States; Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, PA, United States; African Caribbean Cancer Consortium, Fox Chase Cancer Center, Philadelphia, PA, United States
- Yan Zhou
- Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, Philadelphia, PA, United States
- Jeffrey C Liu
- Department of Otolaryngology, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, United States; Department of Surgical Oncology, Fox Chase Cancer Center, Philadelphia, PA, United States
- Camille Ragin
- Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, PA, United States; African Caribbean Cancer Consortium, Fox Chase Cancer Center, Philadelphia, PA, United States
- Rob J Kulathinal
- Department of Biology, Temple University, Philadelphia, PA, United States; African Caribbean Cancer Consortium, Fox Chase Cancer Center, Philadelphia, PA, United States
3
Cao Y, Zhu Q, Huang Y, Li X, Wei Y, Wang H, Zhang J. An efficient ancestry informative SNPs panel for further discriminating East Asian populations. Electrophoresis 2022; 43:1774-1783. [PMID: 35749689 DOI: 10.1002/elps.202100349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 05/19/2022] [Accepted: 06/17/2022] [Indexed: 11/07/2022]
Abstract
In forensic genetics, the use of ancestry informative single-nucleotide polymorphisms (AISNPs) panels can narrow the direction of the investigation by estimating an individual's biogeographic ancestry. However, distinguishing subgroups within continental regions requires more specific panels. In this study, we screened 19 AISNPs from the 1000 Genomes Project (1KG) based on their FST values to distinguish target populations in East Asia and obtained genotypes through SNaPshot. The 19 AISNPs could divide the global population of the 1KG into five clusters and could further divide the East Asian population into four clusters: Japanese, Han Chinese, Dai Chinese, and Kinh in Ho Chi Minh City of Vietnam. In summary, the 19-AISNP panel may serve as a useful and cost-effective tool for forensic ancestry inference in East Asian populations at a finer scale.
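The FST-based screening described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses Wright's FST for a biallelic SNP computed from population allele frequencies with equal population weights, which may differ from the estimator the authors used; the names `wright_fst` and `top_snps_by_fst`, the SNP labels, and the frequencies are all hypothetical.

```python
from typing import Dict, List, Sequence


def wright_fst(freqs: Sequence[float]) -> float:
    """Wright's F_ST for one biallelic SNP.

    freqs holds the frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic SNP carries no ancestry information
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_total - h_within) / h_total


def top_snps_by_fst(freq_table: Dict[str, Sequence[float]], n: int) -> List[str]:
    """Return the n SNP identifiers with the largest F_ST."""
    ranked = sorted(freq_table, key=lambda snp: wright_fst(freq_table[snp]), reverse=True)
    return ranked[:n]


# Toy allele frequencies in two populations (made-up values).
toy = {
    "snp_a": [0.10, 0.90],
    "snp_b": [0.50, 0.50],
    "snp_c": [0.30, 0.60],
}
ranked = top_snps_by_fst(toy, 2)
```

In this toy table `snp_a` has the most divergent frequencies and ranks first, while `snp_b` has identical frequencies and is never selected. A real panel design would also need to account for sample sizes, linkage between markers, and assay constraints, none of which this sketch addresses.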
Affiliation(s)
- Yueyan Cao
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Qiang Zhu
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Yuguo Huang
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Xi Li
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Yifan Wei
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Haoyu Wang
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
- Ji Zhang
- West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, P. R. China
4
Lan Q, Zhao C, Chen C, Xu H, Fang Y, Yao H, Zhu B. Forensic Feature Exploration and Comprehensive Genetic Insights Into Yugu Ethnic Minority and Northern Han Population via a Novel NGS-Based Marker Set. Front Genet 2022; 13:816737. [PMID: 35601485 PMCID: PMC9121381 DOI: 10.3389/fgene.2022.816737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Accepted: 03/08/2022] [Indexed: 12/02/2022] Open
Abstract
MPS technology has expanded the potential applications of DNA markers and increased the discrimination power of the targeted loci by taking variations in their flanking regions into consideration. Here, a collection of nuclear and extranuclear DNA markers (six kinds of nuclear genetic markers in total, plus mtDNA hypervariable region variations) was comprehensively and systematically assessed for polymorphism detection and further employed to dissect the population backgrounds of the Yugu ethnic group from Gansu province (Yugu) and the Han population from the Inner Mongolia Autonomous Region (NMH) of China. The elevated efficiencies of the marker set in separating full-sibling and challenging half-sibling determination cases in parentage tests (iiSNPs), as well as predicting ancestry origins of unknown individuals from at least four continental populations (aiSNPs) and providing informative characteristic-related clues for Chinese populations (piSNPs), are highlighted in the present study. To sum up, different sets of DNA markers revealed sufficient efficiencies to serve as promising tools in forensic applications. Genetic insights from the perspectives of autosomal DNA, Y chromosomal DNA, and mtDNA variations indicated that the Yugu ethnic group was genetically closely related to the Han populations of the northern region. However, we acknowledge that more reference populations (such as Mongolian, Tibetan, Hui, and Tu) should be incorporated to gain a refined genetic background landscape of the Yugu group in future studies.
Affiliation(s)
- Qiong Lan
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Congying Zhao
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Chong Chen
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Hui Xu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Yating Fang
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Hongbing Yao
- Belt and Road Research Center for Forensic Molecular Anthropology, Gansu University of Political Science and Law, Lanzhou, China
- Bofeng Zhu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- *Correspondence: Bofeng Zhu,
5
Chen C, Jin X, Zhang X, Zhang W, Guo Y, Tao R, Chen A, Xu Q, Li M, Yang Y, Zhu B. Comprehensive Insights Into Forensic Features and Genetic Background of Chinese Northwest Hui Group Using Six Distinct Categories of 231 Molecular Markers. Front Genet 2021; 12:705753. [PMID: 34721519 PMCID: PMC8555763 DOI: 10.3389/fgene.2021.705753] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2021] [Accepted: 09/07/2021] [Indexed: 11/13/2022] Open
Abstract
The Hui minority is predominantly composed of Chinese-speaking Islamic adherents distributed throughout China and mainly concentrated in Northwest China. In the present study, we applied a length- and sequence-polymorphism-based typing system of 231 molecular markers, i.e., amelogenin, 22 phenotypic-informative single nucleotide polymorphisms (PISNPs), 94 identity-informative single nucleotide polymorphisms (IISNPs), 24 Y-chromosomal short tandem repeats (Y-STRs), 56 ancestry-informative single nucleotide polymorphisms (AISNPs), 7 X-chromosomal short tandem repeats (X-STRs), and 27 autosomal short tandem repeats (A-STRs), to 90 unrelated male individuals from the Chinese Northwest Hui group to comprehensively explore its forensic characteristics and genetic background. A total of 451 length-based and 652 sequence-based distinct alleles were identified from 58 short tandem repeats (STRs) in 90 unrelated Northwest Hui individuals, indicating that the sequence-based genetic markers could provide markedly more genetic information than length-based markers. The forensic characteristics and efficiencies of STRs and IISNPs were estimated; both exhibited high polymorphism in the Northwest Hui group and could be further utilized in forensic investigations. No significant departure from the Hardy-Weinberg equilibrium (HWE) expectation was observed after the Bonferroni correction. Additionally, four sets of reference population data were used to dissect the genetic background of the Northwest Hui group separately from different perspectives, which contained 26 populations for 93 IISNPs, 58 populations for 17 Y-STRs, 26 populations for 55 AISNPs (raw data), and 109 populations for 55 AISNPs (allele frequencies). As a result, the analyses based on the Y-STRs indicated that the Northwest Hui group primarily exhibited intimate genetic relationships with reference Hui groups from different regions of China except for the Sichuan Hui group and secondarily displayed close genetic relationships with populations from Central and West Asia, as well as several Chinese groups. However, the AISNP analyses demonstrated that the Northwest Hui group shared more intimate relationships with current East Asian populations apart from the reference Hui group, harboring a large proportion of ancestral component contributed by East Asia.
Affiliation(s)
- Chong Chen
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Xiaoye Jin
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Xingru Zhang
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Wenqing Zhang
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Yuxin Guo
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Ruiyang Tao
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Sciences, Ministry of Justice, Shanghai, China
- Anqi Chen
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Sciences, Ministry of Justice, Shanghai, China
- Department of Forensic Medicine, Shanghai Medical College of Fudan University, Shanghai, China
- Qiannan Xu
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Sciences, Ministry of Justice, Shanghai, China
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Min Li
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Sciences, Ministry of Justice, Shanghai, China
- Institute of Forensic Medicine, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China
- Yue Yang
- Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Academy of Forensic Sciences, Ministry of Justice, Shanghai, China
- School of Basic Medicine, Inner Mongolia Medical University, Hohhot, China
- Bofeng Zhu
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Department of Forensic Genetics, Multi-Omics Innovative Research Center of Forensic Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
6
Differentiation of Hispanic biogeographic ancestry with 80 ancestry informative markers. Sci Rep 2020; 10:7745. [PMID: 32385290 PMCID: PMC7210943 DOI: 10.1038/s41598-020-64245-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/17/2019] [Accepted: 04/03/2020] [Indexed: 11/09/2022] Open
Abstract
Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations. Utilizing unrelated individuals from the Genomic Origins and Admixture in Latinos dataset (GOAL, n = 160), we designed an 80 SNP panel (Setser80) that accurately depicts BGA through STRUCTURE and PCA. We compared our Setser80 to the Seldin and Kidd panels via resampling simulations, which model data based on allele frequencies. We incorporated Admixed American 1000 Genomes populations (1000 G, n = 347) into a combined populations dataset to determine robustness. Using multinomial logistic regression (MLR), we compared the 3 panels on the combined dataset and found overall MLR classification accuracies: 93.2% Setser80, 87.9% Seldin panel, 71.4% Kidd panel. Naïve Bayesian classification had similar results on the combined dataset: 91.5% Setser80, 84.7% Seldin panel, 71.1% Kidd panel. Although Peru and Mexico were absent from panel design, we achieved high classification accuracy on the combined populations for Peru (MLR = 100%, naïve Bayes = 98%) and Mexico (MLR = 90%, naïve Bayes = 83.4%) as evidence of the portability of the Setser80. Our results indicate the Setser80 SNP panel can reliably classify BGA for individuals of presumed Hispanic origin.
7
Yahya P, Sulong S, Harun A, Wangkumhang P, Wilantho A, Ngamphiw C, Tongsima S, Zilfalil BA. Ancestry-informative marker (AIM) SNP panel for the Malay population. Int J Legal Med 2019; 134:123-134. [PMID: 31760471 DOI: 10.1007/s00414-019-02184-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2019] [Accepted: 10/15/2019] [Indexed: 10/25/2022]
Abstract
Ancestry-informative markers (AIMs) can be used to infer the ancestry of an individual to minimize the inaccuracy of self-reported ethnicity in biomedical research. In this study, we describe three methods for selecting AIM SNPs for the Malay population (Malay AIM panel) using different approaches based on pairwise FST, informativeness for assignment (In), and PCA-correlated SNPs (PCAIMs). These Malay AIM panels were extracted from genotype data stored in SNP arrays hosted by the Malaysian node of the Human Variome Project (MyHVP) and the Singapore Genome Variation Project (SGVP). In particular, genotype data from a total of 165 Malay individuals were analyzed, comprising data on 117 individual genotypes from the Affymetrix SNP-6 SNP array platform and data on 48 individual genotypes from the OMNI 2.5 Illumina SNP array platform. The HapMap phase 3 database (1397 individuals from 11 populations) was used as a reference for comparison with the Malay genotype data. The accuracy of each resulting Malay AIM panel was evaluated using a machine learning "ancestry-predictive model" constructed by using WEKA, a comprehensive machine learning platform written in Java. A total of 1250 SNPs were finally selected, which successfully identified Malay individuals from other world populations with an accuracy of 90%, but the accuracy decreased to 80% using 157 SNPs according to the pairwise FST method, while a panel of 200 SNPs selected using In and PCAIMs could be used to identify Malay individuals with an accuracy of approximately 80%.
Affiliation(s)
- Padillah Yahya
- Department of Paediatrics, School of Medical Sciences, Universiti Sains Malaysia, 16150, Kubang Kerian, Kelantan, Malaysia
- Sarina Sulong
- Human Genome Centre, School of Medical Sciences, Universiti Sains Malaysia, 16150, Kubang Kerian, Kelantan, Malaysia
- Azian Harun
- Department of Medical Microbiology and Parasitology, School of Medical Sciences, Universiti Sains Malaysia, 16150, Kubang Kerian, Kelantan, Malaysia
- Pongsakorn Wangkumhang
- National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand Science Park, Khlong Luang District, Pathum Thani, 12120, Thailand
- Alisa Wilantho
- National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand Science Park, Khlong Luang District, Pathum Thani, 12120, Thailand
- Chumpol Ngamphiw
- National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand Science Park, Khlong Luang District, Pathum Thani, 12120, Thailand
- Sissades Tongsima
- National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand Science Park, Khlong Luang District, Pathum Thani, 12120, Thailand
- Bin Alwi Zilfalil
- Department of Paediatrics, School of Medical Sciences, Universiti Sains Malaysia, 16150, Kubang Kerian, Kelantan, Malaysia.
8
Phillips C, McNevin D, Kidd K, Lagacé R, Wootton S, de la Puente M, Freire-Aradas A, Mosquera-Miguel A, Eduardoff M, Gross T, Dagostino L, Power D, Olson S, Hashiyada M, Oz C, Parson W, Schneider P, Lareu M, Daniel R. MAPlex - A massively parallel sequencing ancestry analysis multiplex for Asia-Pacific populations. Forensic Sci Int Genet 2019; 42:213-226. [DOI: 10.1016/j.fsigen.2019.06.022] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 02/23/2019] [Revised: 06/04/2019] [Accepted: 06/26/2019] [Indexed: 11/25/2022]
9
Gauch HG, Qian S, Piepho HP, Zhou L, Chen R. Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure. PLoS One 2019; 14:e0218306. [PMID: 31211811 PMCID: PMC6581268 DOI: 10.1371/journal.pone.0218306] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/27/2018] [Accepted: 05/30/2019] [Indexed: 11/23/2022] Open
Abstract
SNP datasets are high-dimensional, often with thousands to millions of SNPs and hundreds to thousands of samples or individuals. Accordingly, PCA graphs are frequently used to provide a low-dimensional visualization in order to display and discover patterns in SNP data from humans, animals, plants, and microbes, especially to elucidate population structure. PCA is not a single method that is always done the same way, but rather requires three choices which we explore as a three-way factorial: two kinds of PCA graphs by three SNP codings by six PCA variants. Our main three recommendations are simple and easily implemented: Use PCA biplots, SNP coding 1 for the rare allele and 0 for the common allele, and double-centered PCA (or AMMI1 if main effects are also of interest). We also document contemporary practices by a literature survey of 125 representative articles that apply PCA to SNP data, and find that virtually none implement our recommendations. The ultimate benefit from informed and optimal choices of PCA graph, SNP coding, and PCA variant is expected to be discovery of more biology, and thereby acceleration of medical, agricultural, and other vital applications.
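Two of the recommendations in this abstract, coding the rare allele as 1 and double-centering the data before PCA, can be sketched as a preprocessing step. The sketch below assumes a binary SNP-by-sample matrix (rows are SNPs, columns are samples), which is a simplification; the function names `code_rare_as_one` and `double_center` and the toy matrix are hypothetical, and the decomposition itself (for example a singular value decomposition of the centered matrix) is left out.

```python
from typing import List

Matrix = List[List[float]]


def code_rare_as_one(matrix: List[List[int]]) -> List[List[int]]:
    """Recode each SNP (row) so that 1 marks the rarer allele in the sample."""
    recoded = []
    for row in matrix:
        # If 1 is the majority value in this row, swap the two codes.
        flip = 2 * sum(row) > len(row)
        recoded.append([1 - x if flip else x for x in row])
    return recoded


def double_center(matrix: List[List[int]]) -> Matrix:
    """Subtract row and column means and add back the grand mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy data: 2 SNPs by 4 samples (made-up values).
toy = [[1, 1, 1, 0], [0, 1, 0, 0]]
recoded = code_rare_as_one(toy)
centered = double_center(recoded)
```

After double-centering, every row and every column of the matrix sums to zero, so neither SNP main effects nor sample main effects remain in the data passed to the decomposition.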
Affiliation(s)
- Hugh G. Gauch
- Soil and Crop Sciences, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
- * E-mail:
- Sheng Qian
- Biological Statistics and Computational Biology, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
- Hans-Peter Piepho
- University of Hohenheim, Institute of Crop Science, Biostatistics Unit, Stuttgart, Germany
- Linda Zhou
- Soil and Crop Sciences, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
- Rui Chen
- Biological Statistics and Computational Biology, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
10
Liang Z, Bu L, Qin Y, Peng Y, Yang R, Zhao Y. Selection of Optimal Ancestry Informative Markers for Classification and Ancestry Proportion Estimation in Pigs. Front Genet 2019; 10:183. [PMID: 30915106 PMCID: PMC6421339 DOI: 10.3389/fgene.2019.00183] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/28/2018] [Accepted: 02/19/2019] [Indexed: 12/26/2022] Open
Abstract
Using small sets of ancestry informative markers (AIMs) constitutes a cost-effective method to accurately estimate the ancestry proportions of individuals. This study aimed to generate a small and effective number of AIMs from ∼60 K porcine single nucleotide polymorphism (SNP) data and estimate three ancestry proportions [East China pig (ECHP), South China pig (SCHP), and European commercial pig (EUCP)] from Asian breeds and European domestic breeds. A total of 186 samples of 10 pure breeds were divided into three groups: ECHP, SCHP, and EUCP. Using these samples and a one-vs.-rest SVM classifier, we found that using only seven AIMs could completely separate the three groups. Subsequently, we utilized supervised ADMIXTURE to calculate ancestry proportions and found that the 129 AIMs performed well on ancestry estimates when pseudo admixed individuals were used. Furthermore, another 969 samples of 61 populations were applied to evaluate the performance of the 129 AIMs. We also observed that the 129 AIMs were highly correlated with estimates using ∼60 K SNP data for three ancestry components: ECHP (Pearson correlation coefficient (r) = 0.94), SCHP (r = 0.94), and EUCP (r = 0.99). Our results provided an example of using a small number of pig AIMs for classifications and estimating ancestry proportions with high accuracy and in a cost-effective manner.
Affiliation(s)
- Zuoxiang Liang
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, China; State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Lina Bu
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, China; State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Yidi Qin
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Yebo Peng
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, China; State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Ruifei Yang
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, China; State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Yiqiang Zhao
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, China; State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
11
Fowke JH, Koyama T, Dai Q, Zheng SL, Xu J, Howard LE, Freedland SJ. Blood and dietary magnesium levels are not linked with lower prostate cancer risk in black or white men. Cancer Lett 2019; 449:99-105. [PMID: 30776477 DOI: 10.1016/j.canlet.2019.02.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/13/2018] [Revised: 01/18/2019] [Accepted: 02/11/2019] [Indexed: 12/15/2022]
Abstract
Recent studies suggest a diet low in dietary magnesium intake or lower blood magnesium levels is linked with increased prostate cancer risk. This study investigates the race-specific link between blood magnesium and calcium levels, or dietary magnesium intake, and the diagnosis of low-grade and high-grade prostate cancer. The study included 637 prostate cancer cases and 715 biopsy-negative controls (50% black) recruited from Nashville, TN or Durham, NC. Blood was collected at the time of recruitment, and dietary intake was assessed by food frequency questionnaire. Percent genetic African ancestry was determined as a complement to self-reported race. Blood magnesium levels and dietary magnesium intake were significantly lower in black compared to white men. However, magnesium levels or intake were not associated with risk of total prostate cancer or aggressive prostate cancer. Indeed, a higher calcium-to-magnesium diet intake was significantly protective for high-grade prostate cancer in black (OR = 0.66 (0.45, 0.96), p = 0.03) but not white (OR = 1.00 (0.79, 1.26), p = 0.99) men. In summary, there was a statistically significant difference in magnesium intake between black and white men, but the biological impact was unclear, and we did not confirm a lower prostate cancer risk associated with magnesium levels.
Affiliation(s)
- Jay H Fowke
- Department of Preventive Medicine, University of Tennessee Health Science Center, TN, USA.
| | - Tatsuki Koyama
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.
| | - Qi Dai
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
| | - S Lilly Zheng
- Program for Personalized Cancer Care, NorthShore University HealthSystem, Evanston, IL, USA.
| | - Jianfeng Xu
- Program for Personalized Cancer Care, NorthShore University HealthSystem, Evanston, IL, USA.
| | - Lauren E Howard
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA; Surgery Section, Durham VA Medical Center, Durham, NC, USA.
| | - Stephen J Freedland
- Surgery Section, Durham VA Medical Center, Durham, NC, USA; Department of Surgery, Division of Urology, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
| |
|
12
|
Woerner AE, Novroski NM, Wendt FR, Ambers A, Wiley R, Schmedes SE, Budowle B. Forensic human identification with targeted microbiome markers using nearest neighbor classification. Forensic Sci Int Genet 2019; 38:130-139. [DOI: 10.1016/j.fsigen.2018.10.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/29/2018] [Revised: 10/02/2018] [Accepted: 10/03/2018] [Indexed: 02/09/2023]
|
13
|
Davis MB, Newman LA. Breast Cancer Disparities: How Can We Leverage Genomics to Improve Outcomes? Surg Oncol Clin N Am 2018; 27:217-234. [PMID: 29132562 DOI: 10.1016/j.soc.2017.07.009] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/29/2022]
Abstract
Breast cancer mortality rates are higher in African American compared with white American women. Disproportionately rising incidence rates, coupled with higher rates of biologically aggressive disease among African Americans, are resulting in a widening of the mortality disparity. Higher rates of triple-negative breast cancer among African American women, as well as women from western sub-Saharan Africa, have prompted questions regarding the role of African ancestry as a marker of hereditary susceptibility for specific disease phenotypes. Advances in germline genetics, as well as somatic tumor genomic research, hold great promise in the effort to understand the biology of breast cancer variations between different population subsets.
Affiliation(s)
- Melissa B Davis
- Henry Ford Cancer Institute, 2799 West Grand Boulevard, Detroit, MI 48202, USA
| | - Lisa A Newman
- Breast Oncology Program, Department of Surgery, Henry Ford Health System, Henry Ford Cancer Institute, International Center for the Study of Breast Cancer Subtypes, 2799 West Grand Boulevard, Detroit, MI 48202, USA.
| |
|
14
|
Prieto-Fernández E, Kleinbielen T, Baeta M, de Pancorbo MM. In-silico evaluation based on public data: In search of forensically efficient tri- and tetrallelic X-SNPs. Forensic Sci Int Genet 2017; 32:e5-e6. [PMID: 29162489 DOI: 10.1016/j.fsigen.2017.11.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/26/2017] [Accepted: 11/13/2017] [Indexed: 11/19/2022]
Affiliation(s)
- Endika Prieto-Fernández
- BIOMICs Research Group, Lascaray Research Center, University of the Basque Country UPV/EHU, Avda. Miguel de Unamuno, 3, 01006 Vitoria-Gasteiz, Spain
| | - Tamara Kleinbielen
- BIOMICs Research Group, Lascaray Research Center, University of the Basque Country UPV/EHU, Avda. Miguel de Unamuno, 3, 01006 Vitoria-Gasteiz, Spain
| | - Miriam Baeta
- BIOMICs Research Group, Lascaray Research Center, University of the Basque Country UPV/EHU, Avda. Miguel de Unamuno, 3, 01006 Vitoria-Gasteiz, Spain
| | - Marian M de Pancorbo
- BIOMICs Research Group, Lascaray Research Center, University of the Basque Country UPV/EHU, Avda. Miguel de Unamuno, 3, 01006 Vitoria-Gasteiz, Spain.
| |
|
15
|
Ramani A, Wong Y, Tan SZ, Shue BH, Syn C. Ancestry prediction in Singapore population samples using the Illumina ForenSeq kit. Forensic Sci Int Genet 2017; 31:171-179. [DOI: 10.1016/j.fsigen.2017.08.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/18/2017] [Revised: 07/18/2017] [Accepted: 08/11/2017] [Indexed: 11/24/2022]
|
16
|
Liu Y, Liao H, Liu Y, Guo J, Sun Y, Fu X, Xiao D, Cai J, Lan L, Xie P, Zha L. Developing a new nonbinary SNP fluorescent multiplex detection system for forensic application in China. Electrophoresis 2017; 38:1154-1162. [PMID: 28168762 DOI: 10.1002/elps.201600379] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/15/2016] [Revised: 01/21/2017] [Accepted: 01/31/2017] [Indexed: 01/23/2023]
Affiliation(s)
- Yanfang Liu
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Huidan Liao
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Ying Liu
- Department of Oral and Maxillofacial Surgery, Xiangya Stomatological Hospital; Central South University; Changsha P.R. China
| | - Juanjuan Guo
- Department of Oral and Maxillofacial Surgery, Xiangya Stomatological Hospital; Central South University; Changsha P.R. China
| | - Yi Sun
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Xiaoliang Fu
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Ding Xiao
- Research Center of Carcinogenesis and Targeted Therapy, Xiangya Hospital; Central South University; Changsha P.R. China
| | - Jifeng Cai
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Lingmei Lan
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Pingli Xie
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| | - Lagabaiyila Zha
- Department of Forensic Science, School of Basic Medical Sciences; Central South University; Changsha P.R. China
| |
|
17
|
Chen P, Zhu J, Pu Y, Jiang Y, Chen D, Wang H, Mao J, Zhou B, Gao L, Bai P, Liang W, Zhang L. Microhaplotype identified and performed in genetic investigation using PCR-SSCP. Forensic Sci Int Genet 2017; 28:e1-e7. [PMID: 28174015 DOI: 10.1016/j.fsigen.2017.01.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/03/2016] [Revised: 12/02/2016] [Accepted: 01/17/2017] [Indexed: 01/04/2023]
Abstract
The recently introduced concept of microhaplotype loci has attracted attention in forensics. Previous studies generally estimated allele frequencies by obtaining genotypic data on the individual SNPs from a larger set of unrelated individuals and then phasing microhaplotypes by statistical and computational techniques. Determining phase for a single new individual requires the larger set of individuals to have been genotyped previously. Rare microhaplotypes possessed only by the target individual, or microhaplotypes private to a specific population not previously studied, are unlikely to be accurately phased using data sets of SNPs. Thus, there is a demand for an approach that can directly determine a single individual's precise microhaplotype information. In the present study, we introduced potential approaches of single-chain sequencing based on massively parallel sequencing technology (MiSeq) and PCR-based single-strand conformational polymorphism (SSCP) technology, which was simple, accurate, and cost-effective. The results indicated that microhaplotypes contain much more polymorphic information than the individual SNPs at each locus (average heterozygosity 0.61 for microhaplotypes vs. 0.41 for SNPs). When microhaplotype allele frequencies were compared among five Chinese ethnic populations, significantly different distributions were found between the Han and Uyghur populations. Further analysis of pairwise Fst values and analysis of molecular variance (AMOVA) showed significant population differentiation between the Uyghur and other populations.
Affiliation(s)
- Peng Chen
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China
| | - Jing Zhu
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China
| | - Yan Pu
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China
| | - Youjing Jiang
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China
| | - Dan Chen
- Department of Forensic Genetics, Institute of Forensic Science, Chengdu Public Security Bureau, Chengdu 610081, Sichuan, PR China
| | - Hui Wang
- Department of Forensic Genetics, Institute of Forensic Science, Chengdu Public Security Bureau, Chengdu 610081, Sichuan, PR China
| | - Jiong Mao
- Department of Forensic Genetics, Institute of Forensic Science, Chengdu Public Security Bureau, Chengdu 610081, Sichuan, PR China
| | - Bin Zhou
- Laboratory of Molecular Translational Medicine, West China Institute of Women and Children's Health, Key Laboratory of Obstetric & Gynecologic and Pediatric Diseases and Birth Defects of Ministry of Education, West China Second University Hospital, Sichuan University, Chengdu 610041, PR China
| | - Linbo Gao
- Laboratory of Molecular Translational Medicine, West China Institute of Women and Children's Health, Key Laboratory of Obstetric & Gynecologic and Pediatric Diseases and Birth Defects of Ministry of Education, West China Second University Hospital, Sichuan University, Chengdu 610041, PR China
| | - Peng Bai
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China
| | - Weibo Liang
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China.
| | - Lin Zhang
- Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu 610041, PR China; Laboratory of Molecular Translational Medicine, West China Institute of Women and Children's Health, Key Laboratory of Obstetric & Gynecologic and Pediatric Diseases and Birth Defects of Ministry of Education, West China Second University Hospital, Sichuan University, Chengdu 610041, PR China.
| |
|
18
|
冯 杏, 孙 启, 刘 宏, 魏 以, 杜 蔚, 李 彩, 陈 玲, 刘 超. [Efficiency of 27-plex single nucleotide polymorphism multiplex system for ancestry inference in different populations]. NAN FANG YI KE DA XUE XUE BAO = JOURNAL OF SOUTHERN MEDICAL UNIVERSITY 2016; 37:555-562. [PMID: 28446414 PMCID: PMC6744106 DOI: 10.3969/j.issn.1673-4254.2017.04.24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2016] [Indexed: 06/07/2023]
Abstract
OBJECTIVE To validate the efficiency of a 27-plex single nucleotide polymorphism (SNP) multiplex system for ancestry inference. METHODS The 27-plex SNP system was validated for its sensitivity and species specificity. A total of 533 samples were collected from African, Southern Chinese Han, China's ethnic minorities (Yi, Hui, Miao, Tibet, and Uygur), European, Central Asian, Western Asian, Southern Asian, Southeast Asian and South American populations for clustering analysis of the genotypes, using 3 representative continental ancestral groups [East Asia (CHB), Europe (CEU), and Africa (YRI)] from the HapMap database as references. RESULTS The system sensitivity is 0.125 ng. Twenty and six genotypes were detected in chimpanzee and monkeys, respectively. Except at rs10496971, no products were found in other animals. The system was capable of differentiating intercontinental populations but not of distinguishing between East Asian and Southeast Asian populations or between the Southern Chinese Han population and Chinese ethnic populations (Hui, Miao, Yi and Tibet). This system achieved 100% accuracy for intercontinental population source inference in 46 blind test samples. CONCLUSION The 27-plex SNP multiplex system has high sensitivity and species specificity and can correctly differentiate the ancestral origins of African, European and East Asian individuals for criminal case investigation. However, this system is not capable of distinguishing subpopulation groups, and more specific ancestry-informative markers are needed to improve its recognition of Southeast Asian and Chinese ethnic populations.
Affiliation(s)
- 杏玲 冯
- School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - 启凡 孙
- National Engineering Laboratory for Crime Scene Evidence Examination, Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Beijing 100038, China
| | - 宏 刘
- Guangzhou Institute of Criminal Science and Technology/Key Laboratory of Forensic Pathology of Ministry of Public Security, Guangzhou 510030, China
| | - 以梁 魏
- Tianjin Medical University, Tianjin 300070, China
| | - 蔚安 杜
- School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - 彩霞 李
- National Engineering Laboratory for Crime Scene Evidence Examination, Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Beijing 100038, China
| | - 玲 陈
- School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| | - 超 刘
- School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
| |
|
19
|
Wei YL, Sun QF, Li Q, Yi JL, Zhao L, Ou Y, Jiang L, Zhang T, Liu HB, Chen JG, Zhu BF, Ye J, Hu L, Li CX. Genetic structure and differentiation analysis of a Eurasian Uyghur population by use of 27 continental ancestry-informative SNPs. Int J Legal Med 2016; 130:897-903. [DOI: 10.1007/s00414-016-1335-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/03/2015] [Accepted: 02/10/2016] [Indexed: 01/12/2023]
|
20
|
Zeng X, Warshauer DH, King JL, Churchill JD, Chakraborty R, Budowle B. Empirical testing of a 23-AIMs panel of SNPs for ancestry evaluations in four major US populations. Int J Legal Med 2016; 130:891-896. [DOI: 10.1007/s00414-016-1333-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/15/2015] [Accepted: 02/05/2016] [Indexed: 10/22/2022]
|