1
Gauch HG, Qian S, Piepho HP, Zhou L, Chen R. Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure. PLoS One 2019; 14:e0218306. [PMID: 31211811 PMCID: PMC6581268 DOI: 10.1371/journal.pone.0218306] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/27/2018] [Accepted: 05/30/2019] [Indexed: 11/23/2022]
Abstract
SNP datasets are high-dimensional, often with thousands to millions of SNPs and hundreds to thousands of samples or individuals. Accordingly, PCA graphs are frequently used to provide a low-dimensional visualization in order to display and discover patterns in SNP data from humans, animals, plants, and microbes, especially to elucidate population structure. PCA is not a single method that is always done the same way, but rather requires three choices, which we explore as a three-way factorial: two kinds of PCA graphs by three SNP codings by six PCA variants. Our three main recommendations are simple and easily implemented: use PCA biplots, SNP coding 1 for the rare allele and 0 for the common allele, and double-centered PCA (or AMMI1 if main effects are also of interest). We also document contemporary practices by a literature survey of 125 representative articles that apply PCA to SNP data, and find that virtually none implement our recommendations. The ultimate benefit from informed and optimal choices of PCA graph, SNP coding, and PCA variant is expected to be discovery of more biology, and thereby acceleration of medical, agricultural, and other vital applications.
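The two numerical recommendations in this abstract (rare allele coded 1, double-centered PCA) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a binary SNPs-by-individuals matrix, a toy data set invented for illustration, and the symmetric biplot scaling in which each singular value is split evenly between row and column scores; the function names are hypothetical.

```python
import numpy as np


def code_rare_allele_as_one(genotypes):
    """Recode a binary SNPs-by-individuals matrix so that 1 marks the rarer allele."""
    g = np.array(genotypes, dtype=float)   # copy, so the caller's data is untouched
    common_is_one = g.mean(axis=1) > 0.5   # SNPs where the allele coded 1 is the common one
    g[common_is_one] = 1.0 - g[common_is_one]
    return g


def double_center(matrix):
    """Subtract row means and column means, then add back the grand mean."""
    m = np.asarray(matrix, dtype=float)
    return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()


def double_centered_pca(matrix, n_components=2):
    """Return biplot scores for SNPs (rows) and individuals (columns)."""
    residual = double_center(matrix)
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    scale = np.sqrt(s[:n_components])      # assumed symmetric scaling of both score sets
    snp_scores = u[:, :n_components] * scale
    individual_scores = vt[:n_components].T * scale
    return snp_scores, individual_scores, s


# Hypothetical toy data: 5 SNPs (rows) by 4 individuals (columns).
toy = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
coded = code_rare_allele_as_one(toy)
snp_scores, individual_scores, singular_values = double_centered_pca(coded)
```

Because both sets of scores share the same axes, SNPs and individuals can be drawn in a single scatter plot, which is what makes the graph a biplot.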
Affiliation(s)
- Hugh G. Gauch
  - Soil and Crop Sciences, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
- Sheng Qian
  - Biological Statistics and Computational Biology, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
- Hans-Peter Piepho
  - University of Hohenheim, Institute of Crop Science, Biostatistics Unit, Stuttgart, Germany
- Linda Zhou
  - Soil and Crop Sciences, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
- Rui Chen
  - Biological Statistics and Computational Biology, College of Agriculture and Life Sciences, Cornell University, Ithaca, New York, United States of America
2
Fowke JH, Koyama T, Dai Q, Zheng SL, Xu J, Howard LE, Freedland SJ. Blood and dietary magnesium levels are not linked with lower prostate cancer risk in black or white men. Cancer Lett 2019; 449:99-105. [PMID: 30776477 DOI: 10.1016/j.canlet.2019.02.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/13/2018] [Revised: 01/18/2019] [Accepted: 02/11/2019] [Indexed: 12/15/2022]
Abstract
Recent studies suggest that low dietary magnesium intake or lower blood magnesium levels are linked with increased prostate cancer risk. This study investigates the race-specific link between blood magnesium and calcium levels, or dietary magnesium intake, and the diagnosis of low-grade and high-grade prostate cancer. The study included 637 prostate cancer cases and 715 biopsy-negative controls (50% black) recruited from Nashville, TN or Durham, NC. Blood was collected at the time of recruitment, and dietary intake was assessed by food frequency questionnaire. Percent genetic African ancestry was determined as a complement to self-reported race. Blood magnesium levels and dietary magnesium intake were significantly lower in black compared to white men. However, neither magnesium levels nor intake were associated with risk of total prostate cancer or aggressive prostate cancer. Indeed, a higher dietary calcium-to-magnesium intake ratio was significantly protective for high-grade prostate cancer in black (OR = 0.66 (0.45, 0.96), p = 0.03) but not white (OR = 1.00 (0.79, 1.26), p = 0.99) men. In summary, there was a statistically significant difference in magnesium intake between black and white men, but the biological impact was unclear, and we did not confirm a lower prostate cancer risk associated with magnesium levels.
Affiliation(s)
- Jay H Fowke
  - Department of Preventive Medicine, University of Tennessee Health Science Center, TN, USA.
- Tatsuki Koyama
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.
- Qi Dai
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
- S Lilly Zheng
  - Program for Personalized Cancer Care, NorthShore University HealthSystem, Evanston, IL, USA.
- Jianfeng Xu
  - Program for Personalized Cancer Care, NorthShore University HealthSystem, Evanston, IL, USA.
- Lauren E Howard
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA; Surgery Section, Durham VA Medical Center, Durham, NC, USA.
- Stephen J Freedland
  - Surgery Section, Durham VA Medical Center, Durham, NC, USA; Department of Surgery, Division of Urology, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
3
King JL, Churchill JD, Novroski NM, Zeng X, Warshauer DH, Seah LH, Budowle B. Increasing the discrimination power of ancestry- and identity-informative SNP loci within the ForenSeq™ DNA Signature Prep Kit. Forensic Sci Int Genet 2018; 36:60-76. [DOI: 10.1016/j.fsigen.2018.06.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 12/11/2017] [Revised: 05/03/2018] [Accepted: 06/05/2018] [Indexed: 11/26/2022]
4
King JL, Wendt FR, Sun J, Budowle B. STRait Razor v2s: Advancing sequence-based STR allele reporting and beyond to other marker systems. Forensic Sci Int Genet 2017; 29:21-28. [PMID: 28343097 DOI: 10.1016/j.fsigen.2017.03.013] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/06/2017] [Revised: 03/06/2017] [Accepted: 03/09/2017] [Indexed: 11/27/2022]
Abstract
STRait Razor has provided the forensic community with a free-to-use, open-source tool for short tandem repeat (STR) analysis of massively parallel sequencing (MPS) data. STRait Razor v2s (SRv2s) allows users to capture physically phased haplotypes within the full amplicon of both commercial (ForenSeq) and "early access" panels (PowerSeq, Mixture ID). STRait Razor v2s may be run in batch mode to facilitate population-level analysis and is supported by all Unix distributions (including macOS). Data are reported in tables in string (haplotype), length-based (e.g., vWA allele 14), and International Society of Forensic Genetics (ISFG)-recommended (vWA [CE 14]-GRCh38-chr12:5983950-5984049 (TAGA)10 (CAGA)3 TAGA) formats. STRait Razor v2s currently contains a database of ∼2500 unique sequences. This database is used by SRv2s to match strings to the appropriate allele in ISFG-recommended format. In addition to STRs, SRv2s has the configuration files necessary to capture and report haplotypes from all marker types included in these multiplexes (e.g., SNPs, InDels, and microhaplotypes). To facilitate mixture interpretation, data from all markers may be displayed in a format similar to that of electropherograms displayed by traditional forensic software. The download package for SRv2s may be found at https://www.unthsc.edu/graduate-school-of-biomedical-sciences/molecular-and-medical-genetics/laboratory-faculty-and-staff/strait-razor.
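To make the relationship between the string, length-based, and bracketed repeat formats concrete, the sketch below collapses a repeat string into motif counts. It is a plain run-length collapse over a fixed 4-base frame, which is not how SRv2s works (the abstract states that SRv2s matches strings against a sequence database). The input is a toy string rebuilt from the bracketed example in the abstract, not a real vWA amplicon, and the function name is hypothetical.

```python
from itertools import groupby


def collapse_repeats(sequence, motif_length=4):
    """Collapse consecutive identical motifs; return (bracketed notation, repeat count)."""
    if len(sequence) % motif_length != 0:
        raise ValueError("sequence length is not a multiple of the motif length")
    # Cut the string into fixed-width units, reading frame starting at position 0.
    units = [sequence[i:i + motif_length] for i in range(0, len(sequence), motif_length)]
    parts = []
    for motif, run in groupby(units):
        count = len(list(run))
        # A motif that occurs once is written bare, as in the abstract's example.
        parts.append(f"({motif}){count}" if count > 1 else motif)
    return " ".join(parts), len(units)


# Toy string assembled from the abstract's example, for illustration only.
toy_string = "TAGA" * 10 + "CAGA" * 3 + "TAGA"
notation, repeat_count = collapse_repeats(toy_string)
print(notation, repeat_count)
```

The total number of units is what a length-based report would give as the allele number, while the bracketed notation preserves the internal sequence variation that a length-based call hides.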
Affiliation(s)
- Jonathan L King
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Frank R Wendt
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Jie Sun
  - Institute of Molecular Medicine, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- Bruce Budowle
  - Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
5
Massively parallel sequencing of 68 insertion/deletion markers identifies novel microhaplotypes for utility in human identity testing. Forensic Sci Int Genet 2016; 25:198-209. [PMID: 27685342 DOI: 10.1016/j.fsigen.2016.09.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 05/28/2016] [Revised: 08/01/2016] [Accepted: 09/19/2016] [Indexed: 11/23/2022]
Abstract
Short tandem repeat (STR) loci are the traditional markers used for kinship, missing persons, and direct comparison human identity testing. These markers hold considerable value due to their highly polymorphic nature, amplicon size, and ability to be multiplexed. However, many STRs are still too large for use in analysis of highly degraded DNA. Small bi-allelic polymorphisms, such as insertions/deletions (INDELs), may be better suited for analyzing compromised samples, and their allele size differences are amenable to analysis by capillary electrophoresis. The INDEL marker allelic states range in size from 2 to 6 base pairs, enabling small amplicon size. In addition, heterozygote balance may be increased by minimizing preferential amplification of the smaller allele, as is more common with STR markers. Multiplexing a large number of INDELs allows for generating panels with high discrimination power. The Nextera™ Rapid Capture Custom Enrichment Kit (Illumina, Inc., San Diego, CA) and massively parallel sequencing (MPS) on the Illumina MiSeq were used to sequence 68 well-characterized INDELs in four major US population groups. In addition, the STR Allele Identification Tool: Razor (STRait Razor) was used in a novel way to analyze INDEL sequences and detect adjacent single nucleotide polymorphisms (SNPs) and other polymorphisms. This application enabled the discovery of unique allelic variants, which increased the discrimination power and decreased the single-locus random match probabilities (RMPs) of 22 of these well-characterized INDELs which can be considered as microhaplotypes. These findings suggest that additional microhaplotypes containing human identification (HID) INDELs may exist elsewhere in the genome.
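The link between additional allelic variants and a lower random match probability can be shown with a short calculation. This is a sketch under stated assumptions, not the authors' analysis: genotype frequencies are derived from allele frequencies under Hardy-Weinberg equilibrium, loci are treated as independent, no population-structure correction is applied, and the allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def single_locus_rmp(allele_freqs):
    """Random match probability: sum of squared expected genotype frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    rmp = 0.0
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[i], allele_freqs[j]
        # Hardy-Weinberg expectation: p^2 for homozygotes, 2pq for heterozygotes.
        genotype_freq = p * p if i == j else 2.0 * p * q
        rmp += genotype_freq ** 2
    return rmp


def combined_rmp(loci):
    """Multiply single-locus RMPs, assuming the loci are independent."""
    return prod(single_locus_rmp(freqs) for freqs in loci)


# Hypothetical frequencies: a bi-allelic INDEL, then the same locus after an
# adjacent SNP splits one of its alleles into two distinguishable variants.
rmp_biallelic = single_locus_rmp([0.5, 0.5])
rmp_microhaplotype = single_locus_rmp([0.5, 0.25, 0.25])
discrimination_power = 1.0 - rmp_microhaplotype
```

With these made-up frequencies the three-allele version has the lower RMP, which is the direction of effect the abstract reports for the 22 INDELs treated as microhaplotypes.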
6
冯 杏, 孙 启, 刘 宏, 魏 以, 杜 蔚, 李 彩, 陈 玲, 刘 超. [Efficiency of 27-plex single nucleotide polymorphism multiplex system for ancestry inference in different populations]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2017; 37:555-562. [PMID: 28446414 PMCID: PMC6744106 DOI: 10.3969/j.issn.1673-4254.2017.04.24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2016] [Indexed: 06/07/2023]
Abstract
OBJECTIVE: To validate the efficiency of a 27-plex single nucleotide polymorphism (SNP) multiplex system for ancestry inference. METHODS: The 27-plex SNP system was validated for its sensitivity and species specificity. A total of 533 samples were collected from African, Southern Chinese Han, Chinese ethnic minority (Yi, Hui, Miao, Tibetan, and Uygur), European, Central Asian, Western Asian, Southern Asian, Southeast Asian and South American populations for clustering analysis of the genotypes, using 3 representative continental ancestral groups [East Asia (CHB), Europe (CEU), and Africa (YRI)] from the HapMap database as references. RESULTS: The system sensitivity was 0.125 ng. Twenty and six genotypes were detected in chimpanzee and monkeys, respectively. Except at rs10496971, no products were found in other animals. The system was capable of differentiating intercontinental populations but not of distinguishing between East Asian and Southeast Asian populations or between the Southern Chinese Han population and Chinese ethnic minority populations (Hui, Miao, Yi and Tibetan). The system achieved 100% accuracy for intercontinental population source inference for 46 blind test samples. CONCLUSION: The 27-plex SNP multiplex system has high sensitivity and species specificity and can correctly differentiate the ancestry of individuals of African, European and East Asian origin for criminal case investigation. However, the system is not capable of distinguishing subpopulation groups, and more specific ancestry-informative markers are needed to improve its recognition of Southeast Asian and Chinese ethnic minority populations.
Affiliation(s)
- 杏玲 冯
  - School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- 启凡 孙
  - National Engineering Laboratory for Crime Scene Evidence Examination, Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Beijing 100038, China
- 宏 刘
  - Guangzhou Institute of Criminal Science and Technology/Key Laboratory of Forensic Pathology of Ministry of Public Security, Guangzhou 510030, China
- 以梁 魏
  - Tianjin Medical University, Tianjin 300070, China
- 蔚安 杜
  - School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- 彩霞 李
  - National Engineering Laboratory for Crime Scene Evidence Examination, Key Laboratory of Forensic Genetics of Ministry of Public Security, Institute of Forensic Science, Beijing 100038, China
- 玲 陈
  - School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- 超 刘
  - School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China