Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Journal Articles

Rank	Citation Analysis	Article Type	Number of Years	Citation(s) in RCA
1	Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, Periyasamy S, Lam M, Iyegbe C, Strawbridge RJ, Brick L, Carey CE, Martin AR, Meyers JL, Su J, Chen J, Edwards AC, Kalungi A, Koen N, Majara L, Schwarz E, Smoller JW, Stahl EA, Sullivan PF, Vassos E, Mowry B, Prieto ML, Cuellar-Barboza A, Bigdeli TB, Edenberg HJ, Huang H, Duncan LE. Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell 2019;179:589-603. [PMID: 31607513 PMCID: PMC6939869 DOI: 10.1016/j.cell.2019.08.051] [Citation(s) in RCA: 465] [Impact Index Per Article: 77.5] [Received: 03/26/2019] [Revised: 07/10/2019] [Accepted: 08/26/2019] [Indexed: 12/19/2022] Abstract Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well.
Key Words: GWAS; admixed populations; ancestry; complex disease; cross-ancestry; diversity; population genetics; psychiatry; trans-ancestry; trans-ethnic. MeSH Headings: Data Accuracy; Genetic Variation; Genetics, Population/methods; Genetics, Population/standards; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Genotyping Techniques/methods; Genotyping Techniques/standards; Human Genetics/methods; Human Genetics/standards; Humans; Pedigree. Grants: U01 MH109499 NIMH NIH HHS; K01 DK114379 NIDDK NIH HHS; MR/S003061/1 Medical Research Council; U01 MH109514 NIMH NIH HHS; K99 MH117229 NIMH NIH HHS; R21 AI139012 NIAID NIH HHS; U01 MH094432 NIMH NIH HHS; U01 MH109528 NIMH NIH HHS; UL1 TR003142 NCATS NIH HHS; U01 MH109536 NIMH NIH HHS; U01 MH109501 NIMH NIH HHS; U41 HG009649 NHGRI NIH HHS; UL1 TR001085 NCATS NIH HHS; K01 MH113848 NIMH NIH HHS; U01 MH109532 NIMH NIH HHS; Wellcome Trust; U01 MH109539 NIMH NIH HHS.	Research Support, N.I.H., Extramural	6	465
2	Sonah H, Bastien M, Iquira E, Tardivel A, Légaré G, Boyle B, Normandeau É, Laroche J, Larose S, Jean M, Belzile F. An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping. PLoS One 2013;8:e54603. [PMID: 23372741 PMCID: PMC3553054 DOI: 10.1371/journal.pone.0054603] [Citation(s) in RCA: 286] [Impact Index Per Article: 23.8] [Received: 10/11/2012] [Accepted: 12/14/2012] [Indexed: 11/24/2022] Open Abstract Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads and these were processed using our pipeline. A total of 10,120 high quality SNPs were obtained and the distribution of these SNPs mirrored closely the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate.
We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in the throughput and a significant decrease in the per sample cost. The approach to obtain high quality SNPs developed here will be helpful for marker assisted genomics as well as assessment of available genetic resources for effective utilisation in a wide number of species. MeSH Headings: Chromosome Mapping; Evolution, Molecular; Genome, Plant; Genomics; Genotype; Genotyping Techniques/methods; Genotyping Techniques/standards; High-Throughput Nucleotide Sequencing; Phylogeny; Polymorphism, Single Nucleotide; Reproducibility of Results; Glycine max/classification; Glycine max/genetics.	research-article	12	286
3	Castel SE, Levy-Moonshine A, Mohammadi P, Banks E, Lappalainen T. Tools and best practices for data processing in allelic expression analysis. Genome Biol 2015;16:195. [PMID: 26381377 PMCID: PMC4574606 DOI: 10.1186/s13059-015-0762-6] [Citation(s) in RCA: 247] [Impact Index Per Article: 24.7] [Received: 05/29/2015] [Accepted: 08/28/2015] [Indexed: 12/25/2022] Open Abstract Allelic expression analysis has become important for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. We analyze the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting such errors, show that our quality control measures improve the detection of relevant allelic expression, and introduce tools for the high-throughput production of allelic expression data from RNA-sequencing data. MeSH Headings: Alleles; Cell Line; Data Interpretation, Statistical; Gene Expression; Gene Expression Profiling/methods; Gene Expression Profiling/standards; Genotyping Techniques/standards; Humans; Sequence Analysis, RNA; Software. Grants: R01 DA006227 NIDA NIH HHS; U01 HG006569 NHGRI NIH HHS; R01 MH090936 NIMH NIH HHS; HHSN261200800001C NCI NIH HHS; 3R01MH101814-02S1 NIMH NIH HHS; R01 MH090951 NIMH NIH HHS; R01 MH090937 NIMH NIH HHS; 5U01HG006569 NHGRI NIH HHS; R01 MH090948 NIMH NIH HHS; R01 MH090941 NIMH NIH HHS; HHSN268201000029C NHLBI NIH HHS; HHSN261200800001E NCI NIH HHS; R01 MH101814 NIMH NIH HHS.	Research Support, N.I.H., Extramural	10	247
4	Hwang S, Kim E, Lee I, Marcotte EM. Systematic comparison of variant calling pipelines using gold standard personal exome variants. Sci Rep 2015;5:17875. [PMID: 26639839 PMCID: PMC4671096 DOI: 10.1038/srep17875] [Citation(s) in RCA: 198] [Impact Index Per Article: 19.8] [Received: 03/18/2015] [Accepted: 11/06/2015] [Indexed: 01/08/2023] Open Abstract The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers (Genome Analysis Tool Kit HaplotypeCaller [GATK-HC], Samtools mpileup, Freebayes, and Ion Proton Variant Caller [TVC]), for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.
MeSH Headings: Base Sequence; Exome/genetics; Genetic Variation; Genotyping Techniques/methods; Genotyping Techniques/standards; High-Throughput Nucleotide Sequencing; Humans; Polymorphism, Single Nucleotide/genetics; Reference Standards. Grants: DP1 GM106408 NIGMS NIH HHS.	Comparative Study	10	198
5	Castro F, Dirks WG, Fähnrich S, Hotz-Wagenblatt A, Pawlita M, Schmitt M. High-throughput SNP-based authentication of human cell lines. Int J Cancer 2013;132:308-14. [PMID: 22700458 PMCID: PMC3492511 DOI: 10.1002/ijc.27675] [Citation(s) in RCA: 154] [Impact Index Per Article: 12.8] [Received: 11/03/2011] [Revised: 04/12/2012] [Accepted: 04/16/2012] [Indexed: 12/15/2022] Abstract Use of false cell lines remains a major problem in biological research. Short tandem repeat (STR) profiling represents the gold standard technique for cell line authentication. However, mismatch repair (MMR)-deficient cell lines are characterized by microsatellite instability, which could force allelic drifts in combination with a selective outgrowth of otherwise persisting side lines, and, thus, are likely to be misclassified by STR profiling. On the basis of the high-throughput Luminex platform, we developed a 24-plex single nucleotide polymorphism profiling assay, called multiplex cell authentication (MCA), for determining authentication of human cell lines. MCA was evaluated by analyzing a collection of 436 human cell lines from the German Collection of Microorganisms and Cell Cultures, previously characterized by eight-loci STR profiling. Both assays showed a very high degree of concordance and similar average matching probabilities (~1 × 10^-8 for STR profiling and ~1 × 10^-9 for MCA). MCA enabled the detection of less than 3% of contaminating human cells. By analyzing MMR-deficient cell lines, evidence was obtained for a higher robustness of the MCA compared to STR profiling. In conclusion, MCA could complement routine cell line authentication and replace the standard authentication STR technique in case of MSI cell lines.
Key Words: multiplex cell authentication (mca); snp; str profiling; luminex; cell line cross-contamination; mmr deficiency. MeSH Headings: Cell Culture Techniques; Cell Line; DNA Mismatch Repair/genetics; Genetic Loci; Genotyping Techniques/standards; Humans; Limit of Detection; Microsatellite Instability; Polymorphism, Single Nucleotide; Reference Standards; Reproducibility of Results. Grants: ZIA CP010210 Intramural NIH HHS; ZIA CP010210-01 Intramural NIH HHS.	research-article	12	154
6	Dilthey A, Cox C, Iqbal Z, Nelson MR, McVean G. Improved genome inference in the MHC using a population reference graph. Nat Genet 2015;47:682-8. [PMID: 25915597 PMCID: PMC4449272 DOI: 10.1038/ng.3257] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 07/02/2014] [Accepted: 03/03/2015] [Indexed: 12/21/2022] Abstract Although much is known about human genetic variation, such information is typically ignored in assembling new genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogs of variation. The genomes of new samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5-Mb extended MHC region on human chromosome 6, combining 8 assembled haplotypes, the sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate using simulations, SNP genotyping, and short-read and long-read data how the method improves the accuracy of genome inference and identifies regions where the current set of reference sequences is substantially incomplete. MeSH Headings: Algorithms; Computer Simulation; Genome, Human; Genotyping Techniques/standards; Haplotypes; Histocompatibility Antigens Class II/genetics; Humans; Models, Genetic; Polymorphism, Single Nucleotide; Reference Standards; Sequence Analysis, DNA. Grants: 102541 Wellcome Trust; 100956 Wellcome Trust; 090532 Wellcome Trust; 100956/Z/13/Z Wellcome Trust; 102541/Z/13/Z Wellcome Trust.	research-article	10	121
7	Almeida JL, Cole KD, Plant AL. Standards for Cell Line Authentication and Beyond. PLoS Biol 2016;14:e1002476. [PMID: 27300367 PMCID: PMC4907466 DOI: 10.1371/journal.pbio.1002476] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Indexed: 01/30/2023] Open Abstract Different genomic technologies have been applied to cell line authentication, but only one method (short tandem repeat [STR] profiling) has been the subject of a comprehensive and definitive standard (ASN-0002). Here we discuss the power of this document and why standards such as this are so critical for establishing the consensus technical criteria and practices that can enable progress in the fields of research that use cell lines. We also examine other methods that could be used for authentication and discuss how a combination of methods could be used in a holistic fashion to assess various critical aspects of the quality of cell lines. MeSH Headings: Animals; Cell Line; DNA Barcoding, Taxonomic/methods; DNA Barcoding, Taxonomic/standards; Gene Expression Profiling/methods; Gene Expression Profiling/standards; Genotyping Techniques/methods; Genotyping Techniques/standards; Humans; Microsatellite Repeats/genetics; Polymorphism, Single Nucleotide; Reference Standards; Reproducibility of Results.	other	9	90
8	Brandies P, Peel E, Hogg CJ, Belov K. The Value of Reference Genomes in the Conservation of Threatened Species. Genes (Basel) 2019;10:E846. [PMID: 31717707 PMCID: PMC6895880 DOI: 10.3390/genes10110846] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 10/18/2019] [Revised: 10/18/2019] [Accepted: 10/23/2019] [Indexed: 12/17/2022] Open Abstract Conservation initiatives are now more crucial than ever: over a million plant and animal species are at risk of extinction over the coming decades. The genetic management of threatened species held in insurance programs is recommended; however, few are taking advantage of the full range of genomic technologies available today. Less than 1% of the 13505 species currently listed as threatened by the International Union for Conservation of Nature (IUCN) have a published genome. While there has been much discussion in the literature about the importance of genomics for conservation, there are limited examples of how having a reference genome has changed conservation management practice. The Tasmanian devil (Sarcophilus harrisii) is an endangered Australian marsupial, threatened by an infectious clonal cancer, devil facial tumor disease (DFTD). Populations have declined by 80% since the disease was first recorded in 1996. A reference genome for this species was published in 2012 and has been crucial for understanding DFTD and the management of the species in the wild. Here we use the Tasmanian devil as an example of how a reference genome has influenced management actions in the conservation of a species.
Key Words: Tasmanian devil; conservation; genomes. MeSH Headings: Animals; Endangered Species; Genome; Genomics/standards; Genotyping Techniques/standards; Marsupialia/genetics; Reference Standards.	Review	6	82
9	Bush SJ, Foster D, Eyre DW, Clark EL, De Maio N, Shaw LP, Stoesser N, Peto TEA, Crook DW, Walker AS. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines. Gigascience 2020;9:giaa007. [PMID: 32025702 PMCID: PMC7002876 DOI: 10.1093/gigascience/giaa007] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 05/28/2019] [Revised: 12/02/2019] [Accepted: 01/15/2020] [Indexed: 02/06/2023] Open Abstract BACKGROUND Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. RESULTS We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome.
The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis. CONCLUSIONS The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka. Key Words: SNP calling; bacteria; benchmarking; evaluation; variant calling. MeSH Headings: Escherichia coli/genetics; Genome, Bacterial; Genomics/methods; Genomics/standards; Genotyping Techniques/methods; Genotyping Techniques/standards; Mycobacterium tuberculosis/genetics; Polymorphism, Single Nucleotide; Recombination, Genetic; Sequence Alignment/methods; Sequence Alignment/standards; Software/standards. Grants: Department of Health; BB/P013740/1 Biotechnology and Biological Sciences Research Council.	research-article	5	76
10	Welsh S, Peakman T, Sheard S, Almond R. Comparison of DNA quantification methodology used in the DNA extraction protocol for the UK Biobank cohort. BMC Genomics 2017;18:26. [PMID: 28056765 PMCID: PMC5217214 DOI: 10.1186/s12864-016-3391-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 01/16/2016] [Accepted: 12/07/2016] [Indexed: 01/03/2023] Open Abstract BACKGROUND UK Biobank is a large prospective cohort study in the UK established by the Medical Research Council (MRC) and the Wellcome Trust to enable approved researchers to investigate the role of genetic factors, environmental exposures and lifestyle in the causes of major diseases of late and middle age. A wide range of phenotypic data has been collected at recruitment and has recently been enhanced by the UK Biobank Genotyping Project. All UK Biobank participants (500,000) have been genotyped on either the UK Biobank Axiom® Array or the Affymetrix UK BiLEVE Axiom® Array and the workflow for preparing samples for genotyping is described. It is hoped that the genetic data will provide further insight into the genetics of disease. All data, including the genetic data, is available for access to approved researchers. Data for two methods of DNA quantification (ultraviolet-visible spectroscopy [UV/Vis] measured on the Trinean DropSense™ 96, and PicoGreen®) were compared by two laboratories (UK Biobank and Affymetrix). RESULTS The sample processing workflow established at UK Biobank, for genotyping on the custom Affymetrix Axiom® array, resulted in high quality DNA (average DNA concentration 38.13 ng/μL, average 260/280 absorbance 1.91). The DNA generated high quality genotype data (average call rate 99.48% and pass rate 99.45%).
The DNA concentration measured on the Trinean DropSense™ 96 at UK Biobank correlated well with DNA concentration measured by PicoGreen® at Affymetrix (r = 0.85). CONCLUSIONS The UK Biobank Genotyping Project demonstrated that the high throughput DNA extraction protocol described generates high quality DNA suitable for genotyping on the Affymetrix Axiom array. The correlation between DNA concentration derived from UV/Vis and PicoGreen® quantification methods suggests, in large-scale genetic studies involving two laboratories, it may be possible to remove the DNA quantification step in one laboratory without affecting downstream analyses. This would result in reductions in cost and time to complete the project, allowing generation of genetic data faster and cheaper. Key Words: Affymetrix; DNA concentration; Genotyping; PicoGreen; Quantification; Trinean; UK Biobank; UV/Vis. MeSH Headings: Algorithms; Biological Specimen Banks; DNA/isolation & purification; Genotyping Techniques/methods; Genotyping Techniques/standards; Humans; Specimen Handling; United Kingdom. Grants: Wellcome Trust; British Heart Foundation; Medical Research Council; British Heart Foundation (GB); National Institute for Health Research.	research-article	8	71
11	Sigmon JS, Blanchard MW, Baric RS, Bell TA, Brennan J, Brockmann GA, Burks AW, Calabrese JM, Caron KM, Cheney RE, Ciavatta D, Conlon F, Darr DB, Faber J, Franklin C, Gershon TR, Gralinski L, Gu B, Gaines CH, Hagan RS, Heimsath EG, Heise MT, Hock P, Ideraabdullah F, Jennette JC, Kafri T, Kashfeen A, Kulis M, Kumar V, Linnertz C, Livraghi-Butrico A, Lloyd KCK, Lutz C, Lynch RM, Magnuson T, Matsushima GK, McMullan R, Miller DR, Mohlke KL, Moy SS, Murphy CEY, Najarian M, O'Brien L, Palmer AA, Philpot BD, Randell SH, Reinholdt L, Ren Y, Rockwood S, Rogala AR, Saraswatula A, Sassetti CM, Schisler JC, Schoenrock SA, Shaw GD, Shorter JR, Smith CM, St Pierre CL, Tarantino LM, Threadgill DW, Valdar W, Vilen BJ, Wardwell K, Whitmire JK, Williams L, Zylka MJ, Ferris MT, McMillan L, Manuel de Villena FP. Content and Performance of the MiniMUGA Genotyping Array: A New Tool To Improve Rigor and Reproducibility in Mouse Research. Genetics 2020;216:905-930. [PMID: 33067325 PMCID: PMC7768238 DOI: 10.1534/genetics.120.303596] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 08/11/2020] [Accepted: 10/06/2020] [Indexed: 12/14/2022] Open Abstract The laboratory mouse is the most widely used animal model for biomedical research, due in part to its well-annotated genome, wealth of genetic resources, and the ability to precisely manipulate its genome. Despite the importance of genetics for mouse research, genetic quality control (QC) is not standardized, in part due to the lack of cost-effective, informative, and robust platforms. Genotyping arrays are standard tools for mouse research and remain an attractive alternative even in the era of high-throughput whole-genome sequencing.
Here, we describe the content and performance of a new iteration of the Mouse Universal Genotyping Array (MUGA), MiniMUGA, an array-based genetic QC platform with over 11,000 probes. In addition to robust discrimination between most classical and wild-derived laboratory strains, MiniMUGA was designed to contain features not available in other platforms: (1) chromosomal sex determination, (2) discrimination between substrains from multiple commercial vendors, (3) diagnostic SNPs for popular laboratory strains, (4) detection of constructs used in genetically engineered mice, and (5) an easy-to-interpret report summarizing these results. In-depth annotation of all probes should facilitate custom analyses by individual researchers. To determine the performance of MiniMUGA, we genotyped 6899 samples from a wide variety of genetic backgrounds. The performance of MiniMUGA compares favorably with three previous iterations of the MUGA family of arrays, both in discrimination capabilities and robustness. We have generated publicly available consensus genotypes for 241 inbred strains including classical, wild-derived, and recombinant inbred lines. Here, we also report the detection of a substantial number of XO and XXY individuals across a variety of sample types, new markers that expand the utility of reduced complexity crosses to genetic backgrounds other than C57BL/6, and the robust detection of 17 genetic constructs. We provide preliminary evidence that the array can be used to identify both partial sex chromosome duplication and mosaicism, and that diagnostic SNPs can be used to determine how long inbred mice have been bred independently from the relevant main stock. We conclude that MiniMUGA is a valuable platform for genetic QC, and an important new tool to increase the rigor and reproducibility of mouse research. 
Key Words: chromosomal sex; diagnostic SNPs; genetic QC; genetic background; genetic constructs; substrains. MeSH Headings: Animals; Female; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Genotype; Genotyping Techniques/methods; Genotyping Techniques/standards; Male; Mice/genetics; Mice, Inbred C57BL; Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/standards; Polymorphism, Genetic; Reproducibility of Results; Sex Determination Processes. Grants: R01 DK058702 NIDDK NIH HHS; U42 RR014821 NCRR NIH HHS; R01 AI143894 NIAID NIH HHS; R01 AG066710 NIA NIH HHS; R01 HL128119 NHLBI NIH HHS; K22 ES023849 NIEHS NIH HHS; P01 AI059443 NIAID NIH HHS; R01 GM061728 NIGMS NIH HHS; R37 HL065619 NHLBI NIH HHS; U24 HG010100 NHGRI NIH HHS; R01 AI138337 NIAID NIH HHS; P01 DK058335 NIDDK NIH HHS; K08 HL143271 NHLBI NIH HHS; F32 GM085999 NIGMS NIH HHS; P01 AI132130 NIAID NIH HHS; P50 DA039841 NIDA NIH HHS; P30 ES010126 NIEHS NIH HHS; P30 CA016086 NCI NIH HHS; U42 OD010924 NIH HHS; U42 OD012210 NIH HHS; R01 ES029925 NIEHS NIH HHS; U42 OD010921 NIH HHS; R01 MH100241 NIMH NIH HHS; R21 AI117575 NIAID NIH HHS; R01 HL155986 NHLBI NIH HHS; U42 OD010918 NIH HHS; P42 ES031007 NIEHS NIH HHS; R01 GM121806 NIGMS NIH HHS; U19 AI100625 NIAID NIH HHS; R01 GM134531 NIGMS NIH HHS; P30 DK065988 NIDDK NIH HHS.	Research Support, N.I.H., Extramural	5	64
12	Farshim PP, Bates GP. Mouse Models of Huntington's Disease. Methods Mol Biol 2018;1780:97-120. [PMID: 29856016 DOI: 10.1007/978-1-4939-7825-0_6] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 06/08/2023] Abstract The identification of the mutation causing Huntington's disease (HD) has led to the generation of a large number of mouse models. These models are used to further enhance our understanding of the mechanisms underlying the disease, as well as investigating and identifying therapeutic targets for this disorder. Here we review the transgenic, knock-in mice commonly used to model HD, as well as those that have been generated to study specific disease mechanisms. We then provide a brief overview of the importance of standardizing the use of HD mice and describe brief protocols used for genotyping the mouse models used within the Bates Laboratory. Key Words: CAG repeat; Huntingtin; Huntington’s disease; Inbred strain; Mouse models; N-terminal fragment; Polyglutamine; Transgenic; knock-in. MeSH Headings: Animals; Disease Models, Animal; Gene Knock-In Techniques/methods; Gene Knock-In Techniques/standards; Genotyping Techniques/methods; Genotyping Techniques/standards; Humans; Huntingtin Protein/genetics; Huntingtin Protein/metabolism; Huntington Disease/genetics; Huntington Disease/pathology; Mice; Mice, Transgenic; Mutation. Grants: Medical Research Council; Wellcome Trust.	Review	7	54
13	Yu YW, Yorukoglu D, Peng J, Berger B. Quality score compression improves genotyping accuracy. Nat Biotechnol 2015;33:240-3. [PMID: 25748910 PMCID: PMC4439189 DOI: 10.1038/nbt.3170] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 11/08/2022] MeSH Headings: Algorithms; Data Compression; Genotyping Techniques/standards; High-Throughput Nucleotide Sequencing; Humans; ROC Curve. Grants: R01 GM108348 NIGMS NIH HHS; GM108348 NIGMS NIH HHS.	Letter	10	46
14	Ramstetter MD, Dyer TD, Lehman DM, Curran JE, Duggirala R, Blangero J, Mezey JG, Williams AL. Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives. Genetics 2017;207:75-82. [PMID: 28739658 PMCID: PMC5586387 DOI: 10.1534/genetics.117.1122] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/04/2017] [Accepted: 07/08/2017] [Indexed: 01/03/2023] Open Abstract Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to less than 43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for more than 76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
Collapse Key Words admixture identical by descent relatedness estimation Collapse MESH Headings Benchmarking/methods Benchmarking/standards Genome, Human Genome-Wide Association Study/methods Genome-Wide Association Study/standards Genotyping Techniques/methods Genotyping Techniques/standards Humans Models, Genetic Pedigree Population/genetics Collapse Grants R01 DK047482 NIDDK NIH HHS R01 DK053889 NIDDK NIH HHS R01 EB015611 NIBIB NIH HHS R01 HL113323 NHLBI NIH HHS Collapse Collaborators Collapse	Research Support, N.I.H., Extramural	8	45
15	Milius RP, Mack SJ, Hollenbach JA, Pollack J, Heuer ML, Gragert L, Spellman S, Guethlein LA, Trachtenberg EA, Cooley S, Bochtler W, Mueller CR, Robinson J, Marsh SGE, Maiers M. Genotype List String: a grammar for describing HLA and KIR genotyping results in a text string. TISSUE ANTIGENS 2013;82:106-12. [PMID: 23849068 PMCID: PMC3715123 DOI: 10.1111/tan.12150] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 05/04/2013] [Accepted: 05/22/2013] [Indexed: 01/19/2023] Abstract Knowledge of an individual's human leukocyte antigen (HLA) genotype is essential for modern medical genetics, and is crucial for hematopoietic stem cell and solid-organ transplantation. However, the high levels of polymorphism known for the HLA genes make it difficult to generate an HLA genotype that unambiguously identifies the alleles that are present at a given HLA locus in an individual. For the last 20 years, the histocompatibility and immunogenetics community has recorded this HLA genotyping ambiguity using allele codes developed by the National Marrow Donor Program (NMDP). While these allele codes may have been effective for recording an HLA genotyping result when initially developed, their use today results in increased ambiguity in an HLA genotype, and they are no longer suitable in the era of rapid allele discovery and ultra-high allele polymorphism. Here, we present a text string format capable of fully representing HLA genotyping results. This Genotype List (GL) String format is an extension of a proposed standard for reporting killer-cell immunoglobulin-like receptor (KIR) genotype data that can be applied to any genetic data that use a standard nomenclature for identifying variants. The GL String format uses a hierarchical set of operators to describe the relationships between alleles, lists of possible alleles, phased alleles, genotypes, lists of possible genotypes, and multilocus unphased genotypes, without losing typing information or increasing typing ambiguity. When used in concert with appropriate tools to create, exchange, and parse these strings, we anticipate that GL Strings will replace NMDP allele codes for reporting HLA genotypes. Key Words: Genotype List String genotype human leukocyte antigen killer-cell immunoglobulin-like receptor MESH Headings: Algorithms Alleles Gene Frequency Genotype Genotyping Techniques/standards Genotyping Techniques/statistics & numerical data HLA Antigens/genetics HLA Antigens/immunology Hematopoietic Stem Cell Transplantation Histocompatibility Testing/standards Histocompatibility Testing/statistics & numerical data Humans Organ Transplantation Polymorphism, Genetic Receptors, KIR/genetics Receptors, KIR/immunology Sequence Analysis, DNA Terminology as Topic Unrelated Donors Grants: P01 CA111412 NCI NIH HHS P01 111412 PHS HHS	Research Support, N.I.H., Extramural	12	44
16	Zhang F, Flickinger M, Taliun SAG, Abecasis GR, Scott LJ, McCarroll SA, Pato CN, Boehnke M, Kang HM. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res 2020;30:185-194. [PMID: 31980570 PMCID: PMC7050530 DOI: 10.1101/gr.246934.118] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 11/28/2018] [Accepted: 03/11/2019] [Indexed: 11/24/2022] Abstract Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual in early stage of sequence analysis is impractical or even impossible for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations. On the other hand, incorrectly specified allele frequencies may result in substantial bias in estimated contamination rates. For example, we observed that existing methods often fail to identify 10% contaminated samples at a typical 3% contamination exclusion threshold when genetic ancestry is misspecified. Such an incomplete screening of contaminated samples substantially inflates the estimated rate of genotyping errors even in deeply sequenced genomes and exomes. We propose a robust statistical method that accurately estimates DNA contamination and is agnostic to genetic ancestry of the intended or contaminating sample. Our method integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates. Our method can also be used for estimating genetic ancestries, similar to LASER or TRACE, but simultaneously accounting for potential contamination. We demonstrate that our method robustly estimates contamination rates and genetic ancestries across populations and contamination scenarios. We further demonstrate that, in the presence of contamination, genetic ancestry inference can be substantially biased with existing methods that ignore contamination, while our method corrects for such biases. MESH Headings: Alleles DNA/genetics DNA Contamination Exome/genetics Gene Frequency/genetics Genetics, Population Genotype Genotyping Techniques/standards Humans Polymorphism, Single Nucleotide/genetics Sequence Analysis, DNA Grants: R01 MH104964 NIMH NIH HHS U01 MH105653 NIMH NIH HHS R01 MH123451 NIMH NIH HHS U01 HL137182 NHLBI NIH HHS R01 HG007022 NHGRI NIH HHS R01 HG009976 NHGRI NIH HHS R01 MH085548 NIMH NIH HHS NIH NHGRI NHLBI NIMH	Research Support, N.I.H., Extramural	5	41
17	Renaud G, Hanghøj K, Korneliussen TS, Willerslev E, Orlando L. Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples. Genetics 2019;212:587-614. [PMID: 31088861 PMCID: PMC6614887 DOI: 10.1534/genetics.119.302057] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 02/26/2019] [Accepted: 05/01/2019] [Indexed: 11/18/2022] Open Abstract Both the total amount and the distribution of heterozygous sites within individual genomes are informative about the genetic diversity of the population they belong to. Detecting true heterozygous sites in ancient genomes is complicated by the generally limited coverage achieved and the presence of post-mortem damage inflating sequencing errors. Additionally, large runs of homozygosity found in the genomes of particularly inbred individuals and of domestic animals can skew estimates of genome-wide heterozygosity rates. Current computational tools aimed at estimating runs of homozygosity and genome-wide heterozygosity levels are generally sensitive to such limitations. Here, we introduce ROHan, a probabilistic method which substantially improves the estimate of heterozygosity rates both genome-wide and for genomic local windows. It combines a local Bayesian model and a Hidden Markov Model at the genome-wide level and can work both on modern and ancient samples. We show that our algorithm outperforms currently available methods for predicting heterozygosity rates for ancient samples. Specifically, ROHan can delineate large runs of homozygosity (at megabase scales) and produce a reliable confidence interval for the genome-wide rate of heterozygosity outside of such regions from modern genomes with a depth of coverage as low as 5-6× and down to 7-8× for ancient samples showing moderate DNA damage. We apply ROHan to a series of modern and ancient genomes previously published and revise available estimates of heterozygosity for humans, chimpanzees and horses. Key Words: Ancient DNA Runs of homozygosity effective population size heterozygosity inbreeding MESH Headings: Animals Bayes Theorem DNA, Ancient Genotyping Techniques/methods Genotyping Techniques/standards Heterozygote Homozygote Humans Markov Chains	research-article	6	40
18	Eklund C, Forslund O, Wallin KL, Dillner J. Continuing global improvement in human papillomavirus DNA genotyping services: The 2013 and 2014 HPV LabNet international proficiency studies. J Clin Virol 2018;101:74-85. [PMID: 29433017 DOI: 10.1016/j.jcv.2018.01.016] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 12/14/2017] [Revised: 01/17/2018] [Accepted: 01/26/2018] [Indexed: 11/19/2022] Abstract BACKGROUND: Accurate and internationally comparable human papillomavirus (HPV) DNA detection and typing services are essential for HPV vaccine research and surveillance. OBJECTIVES: This study assessed the proficiency of different HPV typing services offered routinely in laboratories worldwide. STUDY DESIGN: The HPV Laboratory Network (LabNet) has designed international proficiency panels that can be regularly issued. The HPV genotyping proficiency panels of 2013 and 2014 contained 43 and 41 coded samples, respectively, composed of purified plasmids of sixteen HPV types (HPV 6, 11, 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, 68a and 68b) and 3 extraction controls. Proficient typing was defined as detection in both single and multiple infections of 50 International Units of HPV 16 and HPV 18 and 500 genome equivalents for the other 14 HPV types, with at least 97% specificity. RESULTS: Ninety-six laboratories submitted 136 datasets in 2013 and 121 laboratories submitted 148 datasets in 2014. Thirty-four different HPV genotyping assays were used, notably Linear Array, HPV Direct Flow-chip, GenoFlow HPV array, Anyplex HPV 28, Inno-LiPa, and PGMY-CHUV assays. A trend towards increased sensitivity and specificity was observed. In 2013, 59 data sets (44%) were 100% proficient compared to 86 data sets (59%) in 2014. This is a definite improvement compared to the first proficiency panel, issued in 2008, when only 19 data sets (26%) were fully proficient. CONCLUSION: The regularly issued global proficiency program has documented an ongoing worldwide improvement in comparability and reliability of HPV genotyping services. Key Words: International standards Quality assurance Vaccinology MESH Headings: Female Genotyping Techniques/standards Global Health Health Services Research Humans International Cooperation Laboratories Laboratory Proficiency Testing Papillomaviridae/classification Papillomaviridae/genetics Papillomaviridae/isolation & purification Papillomavirus Infections/virology Sensitivity and Specificity Virology/standards	Research Support, Non-U.S. Gov't	7	33
19	Naj AC, Lin H, Vardarajan BN, White S, Lancour D, Ma Y, Schmidt M, Sun F, Butkiewicz M, Bush WS, Kunkle BW, Malamon J, Amin N, Choi SH, Hamilton-Nelson KL, van der Lee SJ, Gupta N, Koboldt DC, Saad M, Wang B, Nato AQ, Sohi HK, Kuzma A, Wang LS, Cupples LA, van Duijn C, Seshadri S, Schellenberg GD, Boerwinkle E, Bis JC, Dupuis J, Salerno WJ, Wijsman EM, Martin ER, DeStefano AL. Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project. Genomics 2019;111:808-818. [PMID: 29857119 PMCID: PMC6397097 DOI: 10.1016/j.ygeno.2018.05.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 11/29/2017] [Revised: 04/03/2018] [Accepted: 05/06/2018] [Indexed: 12/30/2022] Abstract The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available. Key Words: Atlas Consensus calling GATK Mendelian inconsistencies Quality control Whole genome sequencing MESH Headings: Algorithms Alzheimer Disease/genetics Female Genome-Wide Association Study/methods Genome-Wide Association Study/standards Genotype Genotyping Techniques/methods Genotyping Techniques/standards Humans Male Polymorphism, Genetic Quality Control Whole Genome Sequencing/methods Whole Genome Sequencing/standards Grants: R01 AG054060 NIA NIH HHS R01 AG054076 NIA NIH HHS U54 HG003067 NHGRI NIH HHS U24 AG021886 NIA NIH HHS P50 AG008702 NIA NIH HHS U01 AG016976 NIA NIH HHS P50 AG005136 NIA NIH HHS R01 HL105756 NHLBI NIH HHS U24 AG041689 NIA NIH HHS R01 AG033193 NIA NIH HHS HHSN268201100009C NHLBI NIH HHS P30 AG010129 NIA NIH HHS HHSN268201100006C NHLBI NIH HHS HHSN268201100010C NHLBI NIH HHS U01 AG049505 NIA NIH HHS HHSN268201100008C NHLBI NIH HHS RC2 HL102419 NHLBI NIH HHS U01 AG058654 NIA NIH HHS U54 AG052427 NIA NIH HHS R01 NS017950 NINDS NIH HHS HHSN268201100007C NHLBI NIH HHS U24 AG072122 NIA NIH HHS U01 AG049507 NIA NIH HHS U01 AG032984 NIA NIH HHS HHSN268201100011C NHLBI NIH HHS UF1 AG047133 NIA NIH HHS U54 HG003273 NHGRI NIH HHS HHSN268201100012C NHLBI NIH HHS U01 AG049508 NIA NIH HHS HHSN268201100005C NHLBI NIH HHS U01 AG062602 NIA NIH HHS P30 AG066546 NIA NIH HHS U54 HG003079 NHGRI NIH HHS U01 AG052409 NIA NIH HHS	Research Support, N.I.H., Extramural	6	30
20	Capes-Davis A, Neve RM. Authentication: A Standard Problem or a Problem of Standards? PLoS Biol 2016;14:e1002477. [PMID: 27300550 PMCID: PMC4907433 DOI: 10.1371/journal.pbio.1002477] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 01/24/2023] Open Abstract Reproducibility and transparency in biomedical sciences have been called into question, and scientists have been found wanting as a result. Putting aside deliberate fraud, there is evidence that a major contributor to lack of reproducibility is insufficient quality assurance of reagents used in preclinical research. Cell lines are widely used in biomedical research to understand fundamental biological processes and disease states, yet most researchers do not perform a simple, affordable test to authenticate these key resources. Here, we provide a synopsis of the problems we face and how standards can contribute to an achievable solution. A major contributor to lack of reproducibility in preclinical research is insufficient quality assurance of reagents used. This article examines the problems surrounding the authentication of cell lines and discusses potential solutions. MESH Headings: Biomedical Research/methods Biomedical Research/standards Cell Line Gene Expression Profiling/methods Gene Expression Profiling/standards Genotyping Techniques/methods Genotyping Techniques/standards Humans Microsatellite Repeats/genetics Polymorphism, Single Nucleotide Publications/standards Reference Standards Reproducibility of Results	Comment	9	25
21	Fang H, Liu X, Ramírez J, Choudhury N, Kubo M, Im HK, Konkashbaev A, Cox NJ, Ratain MJ, Nakamura Y, O’Donnell PH. Establishment of CYP2D6 reference samples by multiple validated genotyping platforms. THE PHARMACOGENOMICS JOURNAL 2014;14:564-72. [PMID: 24980783 PMCID: PMC4237721 DOI: 10.1038/tpj.2014.27] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 12/18/2013] [Revised: 04/30/2014] [Accepted: 05/22/2014] [Indexed: 11/30/2022] Abstract Cytochrome P450 2D6 (cytochrome P450, family 2, subfamily D, polypeptide 6 (CYP2D6)), a highly polymorphic drug-metabolizing enzyme, is involved in the metabolism of one-quarter of the most commonly prescribed medications. Here we have applied multiple genotyping methods and Sanger sequencing to assign precise and reproducible CYP2D6 genotypes, including copy numbers, for 48 HapMap samples. Furthermore, by analyzing a set of 50 human liver microsomes using endoxifen formation from N-desmethyl-tamoxifen as the phenotype of interest, we observed a significant positive correlation between CYP2D6 genotype-assigned activity score and endoxifen formation rate (rs = 0.68 by rank correlation test, P = 5.3 × 10⁻⁸), which corroborated the genotype-phenotype prediction derived from our genotyping methodologies. In the future, these 48 publicly available HapMap samples characterized by multiple substantiated CYP2D6 genotyping platforms could serve as a reference resource for assay development, validation, quality control and proficiency testing for other CYP2D6 genotyping projects and for programs pursuing clinical pharmacogenomic testing implementation. Key Words: cyp2d6 genotyping pharmacogenomics clinical implementation sequencing MESH Headings: Alleles Cytochrome P-450 CYP2D6/genetics Genetic Variation/genetics Genotype Genotyping Techniques/standards Humans Liver/cytology Liver/enzymology Microsomes, Liver/enzymology Reference Standards Reproducibility of Results Grants: U01 GM061393 NIGMS NIH HHS K12 CA139160 NCI NIH HHS T32 GM007019 NIGMS NIH HHS NIH K12 CA139160 NCI NIH HHS NIH T32GM007019 NIGMS NIH HHS K23 GM100288 NIGMS NIH HHS K23 GM100288-01A1 NIGMS NIH HHS N01DK92310 NIDDK NIH HHS NIH U01GM061393 NIGMS NIH HHS	Research Support, N.I.H., Extramural	11	25
22	Stone EA. Joint genotyping on the fly: identifying variation among a sequenced panel of inbred lines. Genome Res 2012;22:966-74. [PMID: 22367192 PMCID: PMC3337441 DOI: 10.1101/gr.129122.111] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 07/15/2011] [Accepted: 02/21/2012] [Indexed: 02/03/2023] Abstract High-throughput sequencing is enabling remarkably deep surveys of genomic variation. It is now possible to completely sequence multiple individuals from a single species, yet the identification of variation among them remains an evolving computational challenge. This challenge is compounded for experimental organisms when strains are studied instead of individuals. In response, we present the Joint Genotyper for Inbred Lines (JGIL) as a method for obtaining genotypes and identifying variation among a large panel of inbred strains or lines. JGIL inputs the sequence reads from each line after their alignment to a common reference. Its probabilistic model includes site-specific parameters common to all lines that describe the frequency of nucleotides segregating in the population from which the inbred panel was derived. The distribution of line genotypes is conditional on these parameters and reflects the experimental design. Site-specific error probabilities, also common to all lines, parameterize the distribution of reads conditional on line genotype and realized coverage. Both sets of parameters are estimated per site from the aggregate read data, and posterior probabilities are calculated to decode the genotype of each line. We present an application of JGIL to 162 inbred Drosophila melanogaster lines from the Drosophila Genetic Reference Panel. We explore by simulation the effect of varying coverage, sequencing error, mapping error, and the number of lines. In doing so, we illustrate how JGIL is robust to moderate levels of error. Supported by these analyses, we advocate the importance of modeling the data and the experimental design when possible. MESH Headings: Algorithms Animals Chromosome Mapping Computer Simulation Drosophila melanogaster/genetics Genetic Variation Genotyping Techniques/standards Inbreeding Likelihood Functions Models, Genetic Polymorphism, Single Nucleotide Reference Standards Sequence Analysis, DNA Grants: R01 GM045146 NIGMS NIH HHS R01GM045146 NIGMS NIH HHS	Research Support, N.I.H., Extramural	13	23
23	Crysnanto D, Wurmser C, Pausch H. Accurate sequence variant genotyping in cattle using variation-aware genome graphs. Genet Sel Evol 2019;51:21. [PMID: 31092189 PMCID: PMC6521551 DOI: 10.1186/s12711-019-0462-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/02/2018] [Accepted: 05/03/2019] [Indexed: 12/22/2022] Open Abstract BACKGROUND: Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome. Because a linear reference genome represents only a small fraction of all the DNA sequence variation within a species, reference allele bias may occur at highly polymorphic or divergent regions of the genome. Graph-based methods facilitate the comparison of sequencing reads to a variation-aware genome graph, which incorporates a collection of non-redundant DNA sequences that segregate within a species. We compared the accuracy and sensitivity of graph-based sequence variant genotyping using the Graphtyper software to two widely-used methods, i.e., GATK and SAMtools, which rely on linear reference genomes using whole-genome sequencing data from 49 Original Braunvieh cattle. RESULTS: We discovered 21,140,196, 20,262,913, and 20,668,459 polymorphic sites using GATK, Graphtyper, and SAMtools, respectively. Comparisons between sequence variant genotypes and microarray-derived genotypes showed that Graphtyper outperformed both GATK and SAMtools in terms of genotype concordance, non-reference sensitivity, and non-reference discrepancy. The sequence variant genotypes that were obtained using Graphtyper had the smallest number of Mendelian inconsistencies between sequence-derived single nucleotide polymorphisms and indels in nine sire-son pairs. Genotype phasing and imputation using the Beagle software improved the quality of the sequence variant genotypes for all the tools evaluated, particularly for animals that were sequenced at low coverage. Following imputation, the concordance between sequence- and microarray-derived genotypes was almost identical for the three methods evaluated, i.e., 99.32, 99.46, and 99.24% for GATK, Graphtyper, and SAMtools, respectively. Variant filtration based on commonly used criteria improved genotype concordance slightly but it also decreased sensitivity. Graphtyper required considerably more computing resources than SAMtools but less than GATK. CONCLUSIONS: Sequence variant genotyping using Graphtyper is accurate, sensitive and computationally feasible in cattle. Graph-based methods enable sequence variant genotyping from variation-aware reference genomes that may incorporate cohort-specific sequence variants, which is not possible with the current implementation of state-of-the-art methods that rely on linear reference genomes. MESH Headings: Animals Cattle/genetics Genome-Wide Association Study/methods Genotyping Techniques/methods Genotyping Techniques/standards Polymorphism, Genetic Software	Comparative Study	6	23
24	Kubik S, Marques AC, Xing X, Silvery J, Bertelli C, De Maio F, Pournaras S, Burr T, Duffourd Y, Siemens H, Alloui C, Song L, Wenger Y, Saitta A, Macheret M, Smith EW, Menu P, Brayer M, Steinmetz LM, Si-Mohammed A, Chuisseu J, Stevens R, Constantoulakis P, Sali M, Greub G, Tiemann C, Pelechano V, Willig A, Xu Z. Recommendations for accurate genotyping of SARS-CoV-2 using amplicon-based sequencing of clinical samples. Clin Microbiol Infect 2021;27:1036.e1-1036.e8. [PMID: 33813118 PMCID: PMC8016543 DOI: 10.1016/j.cmi.2021.03.029] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/21/2020] [Revised: 02/12/2021] [Accepted: 03/06/2021] [Indexed: 01/03/2023] Abstract OBJECTIVES: Genotyping of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been instrumental in monitoring viral evolution and transmission during the pandemic. The quality of the sequence data obtained from these genotyping efforts depends on several factors, including the quantity/integrity of the input material, the technology, and laboratory-specific implementation. The current lack of guidelines for SARS-CoV-2 genotyping leads to inclusion of error-containing genome sequences in genomic epidemiology studies. We aimed to establish clear and broadly applicable recommendations for reliable virus genotyping. METHODS: We established and used a sequencing data analysis workflow that reliably identifies and removes technical artefacts; such artefacts can result in miscalls when using alternative pipelines to process clinical samples and synthetic viral genomes with an amplicon-based genotyping approach. We evaluated the impact of experimental factors, including viral load and sequencing depth, on correct sequence determination. RESULTS: We found that at least 1000 viral genomes are necessary to confidently detect variants in the SARS-CoV-2 genome at frequencies of ≥10%. The broad applicability of our recommendations was validated in over 200 clinical samples from six independent laboratories. The genotypes we determined for clinical isolates with sufficient quality cluster by sampling location and period. Our analysis also supports the rise in frequencies of 20A.EU1 and 20A.EU2, two recently reported European strains whose dissemination was facilitated by travel during the summer of 2020. CONCLUSIONS: We present much-needed recommendations for the reliable determination of SARS-CoV-2 genome sequences and demonstrate their broad applicability in a large cohort of clinical samples. Key Words: Amplicon Coronavirus Genome Genotyping Guidelines NGS Next-generation sequencing Recommendations SARS-CoV-2 MESH Headings: Artifacts COVID-19/diagnosis COVID-19/virology Genome, Viral Genotyping Techniques/methods Genotyping Techniques/standards Guidelines as Topic High-Throughput Nucleotide Sequencing/methods High-Throughput Nucleotide Sequencing/standards Humans RNA, Viral Reproducibility of Results SARS-CoV-2/genetics SARS-CoV-2/isolation & purification Sensitivity and Specificity Whole Genome Sequencing/methods Whole Genome Sequencing/standards Workflow	research-article	4	22
25	Semagn K, Beyene Y, Makumbi D, Mugo S, Prasanna BM, Magorokosho C, Atlin G. Quality control genotyping for assessment of genetic identity and purity in diverse tropical maize inbred lines. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2012;125:1487-501. [PMID: 22801872 DOI: 10.1007/s00122-012-1928-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 06/01/2011] [Accepted: 06/16/2012] [Indexed: 05/20/2023] Abstract Quality control (QC) genotyping is an important component in breeding, but to our knowledge there are not well established protocols for its implementation in practical breeding programs. The objectives of our study were to (a) ascertain genetic identity among 2-4 seed sources of the same inbred line, (b) evaluate the extent of genetic homogeneity within inbred lines, and (c) identify a subset of highly informative single-nucleotide polymorphism (SNP) markers for routine and low-cost QC genotyping and suggest guidelines for data interpretation. We used a total of 28 maize inbred lines to study genetic identity among different seed sources by genotyping them with 532 and 1,065 SNPs using the KASPar and GoldenGate platforms, respectively. An additional set of 544 inbred lines was used for studying genetic homogeneity. The proportion of alleles that differed between seed sources of the same inbred line varied from 0.1 to 42.3 %. Seed sources exhibiting high levels of genetic distance are mis-labeled, while those with lower levels of difference are contaminated or still segregating. Genetic homogeneity varied from 68.7 to 100 % with 71.3 % of the inbred lines considered to be homogenous. Based on the data sets obtained for a wide range of sample sizes and diverse genetic backgrounds, we recommended a subset of 50-100 SNPs for routine and low-cost QC genotyping, verified them in a different set of double haploid and inbred lines, and outlined a protocol that could be used to minimize errors in genetic analyses and breeding. MESH Headings: Alleles Genetic Heterogeneity Genetic Loci/genetics Genotype Genotyping Techniques/methods Genotyping Techniques/standards Haploidy Inbreeding Phylogeny Polymorphism, Single Nucleotide/genetics Quality Control Selection, Genetic Tropical Climate Zea mays/genetics		13	21
