1
Chen H, Wang B, Cai L, Yang X, Hu Y, Zhang Y, Leng X, Liu W, Fan D, Niu B, Zhou Q. A comprehensive performance evaluation, comparison, and integration of computational methods for detecting and estimating cross-contamination of human samples in cancer next-generation sequencing analysis. J Biomed Inform 2024; 152:104625. [PMID: 38479675 DOI: 10.1016/j.jbi.2024.104625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 02/25/2024] [Accepted: 03/10/2024] [Indexed: 03/17/2024]
Abstract
Cross-sample contamination is one of the major issues in next-generation sequencing (NGS)-based molecular assays. This type of contamination, even at very low levels, can significantly impact the results of an analysis, especially in the detection of somatic alterations in tumor samples. Several contamination identification tools have been developed and implemented as a crucial quality-control step in the routine NGS bioinformatic pipeline. However, no study has been published that comprehensively and systematically investigates, evaluates, and compares these computational methods in cancer NGS analysis. In this study, we comprehensively investigated nine state-of-the-art computational methods for detecting cross-sample contamination. To explore their application in cancer NGS analysis, we further compared the performance of five representative tools by qualitative and quantitative analyses using in silico and simulated experimental NGS data. The results showed that Conpair achieved the best performance for identifying contamination and predicting the level of contamination in solid-tumor NGS analysis. Moreover, based on Conpair, we developed a Python script, Contamination Source Predictor (ConSPr), to identify the source of contamination. We anticipate that this comprehensive survey and the proposed tool for predicting the source of contamination will assist researchers in selecting appropriate cross-contamination detection tools in cancer NGS analysis and inspire the future development of computational methods for detecting sample cross-contamination and identifying its source.
Affiliation(s)
- Huijuan Chen
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; WillingMed Technology Beijing Co. Ltd., Beijing 100176, China
- Bing Wang
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Lili Cai
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Xiaotian Yang
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Yali Hu
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Yiran Zhang
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Xue Leng
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Wen Liu
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China
- Dongjie Fan
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Disease, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Beifang Niu
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; ChosenMed Technology (Zhejiang) Co. Ltd., Zhejiang 311103, China
- Qiming Zhou
  - Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing 100176, China; ChosenMed Technology (Zhejiang) Co. Ltd., Zhejiang 311103, China
2
Lopez-Medina AI, Campos-Staffico AM, A Chahal CA, Volkers I, Jacoby JP, Berenfeld O, Luzum JA. Genetic risk factors for drug-induced long QT syndrome: findings from a large real-world case-control study. Pharmacogenomics 2024; 25:117-131. [PMID: 38506312 DOI: 10.2217/pgs-2023-0229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
Aim: Drug-induced long QT syndrome (diLQTS), an adverse effect of many drugs, can lead to sudden cardiac death. Candidate genetic variants in cardiac ion channels have been associated with diLQTS, but several limitations of previous studies hamper clinical utility. Materials & methods: Thus, the purpose of this study was to assess the associations of KCNE1-D85N, KCNE2-I57T and SCN5A-G615E with diLQTS in a large observational case-control study (6,083 self-reported white patients treated with 27 different high-risk QT-prolonging medications; 12.0% with diLQTS). Results: KCNE1-D85N was significantly associated with diLQTS (adjusted odds ratio: 2.24 [95% CI: 1.35-3.58]; p = 0.001). Given low minor allele frequencies, the study had insufficient power to analyze KCNE2-I57T and SCN5A-G615E. Conclusion: KCNE1-D85N is a risk factor for diLQTS that should be considered in future clinical practice guidelines.
Grants
- F32 HL162231, K08 HL146990, R01-HL156961, R21-EB032661, R21-HL153694, T32 TR004371 CSR NIH HHS
Affiliation(s)
- Ana I Lopez-Medina
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, MI, USA
- Choudhary Anwar A Chahal
  - Center for Inherited Cardiovascular Diseases, WellSpan Health, Lancaster, PA, USA
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
  - Department of Cardiology, Barts Heart Centre, London, UK
- Isabella Volkers
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, MI, USA
- Juliet P Jacoby
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, MI, USA
- Omer Berenfeld
  - Center for Arrhythmia Research, Departments of Internal Medicine - Cardiology, Biomedical Engineering, & Applied Physics, University of Michigan, Ann Arbor, MI, USA
- Jasmine A Luzum
  - Department of Clinical Pharmacy, University of Michigan College of Pharmacy, Ann Arbor, MI, USA
3
Zawistowski M, Fritsche LG, Pandit A, Vanderwerff B, Patil S, Schmidt EM, VandeHaar P, Willer CJ, Brummett CM, Kheterpal S, Zhou X, Boehnke M, Abecasis GR, Zöllner S. The Michigan Genomics Initiative: A biobank linking genotypes and electronic clinical records in Michigan Medicine patients. Cell Genomics 2023; 3:100257. [PMID: 36819667 PMCID: PMC9932985 DOI: 10.1016/j.xgen.2023.100257] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 12/15/2021] [Revised: 06/07/2022] [Accepted: 01/05/2023] [Indexed: 02/04/2023]
Abstract
Biobanks of linked clinical patient histories and biological samples are an efficient strategy to generate large cohorts for modern genetics research. Biobank recruitment varies by factors such as geographic catchment and sampling strategy, which affect biobank demographics and research utility. Here, we describe the Michigan Genomics Initiative (MGI), a single-health-system biobank currently consisting of >91,000 participants recruited primarily during surgical encounters at Michigan Medicine. The surgical enrollment results in a biobank enriched for many diseases and ideally suited for a disease genetics cohort. Compared with the much larger population-based UK Biobank, MGI has higher prevalence for nearly all diagnosis-code-based phenotypes and larger absolute case counts for many phenotypes. Genome-wide association study (GWAS) results replicate known findings, thereby validating the genetic and clinical data. Our results illustrate that opportunistic biobank sampling within single health systems provides a unique and complementary resource for exploring the genetics of complex diseases.
Affiliation(s)
- Matthew Zawistowski
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Lars G. Fritsche
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Anita Pandit
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Brett Vanderwerff
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Snehal Patil
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Ellen M. Schmidt
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Peter VandeHaar
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Cristen J. Willer
  - Department of Internal Medicine, Division of Cardiovascular Medicine, Department of Human Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Chad M. Brummett
  - Department of Anesthesiology, University of Michigan, Ann Arbor, MI 48103, USA
- Sachin Kheterpal
  - Department of Anesthesiology, University of Michigan, Ann Arbor, MI 48103, USA
- Xiang Zhou
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Michael Boehnke
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
- Gonçalo R. Abecasis
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA
- Sebastian Zöllner
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48103, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI 48103, USA
4
Campos-Staffico AM, Dorsch MP, Barnes GD, Zhu HJ, Limdi NA, Luzum JA. Eight pharmacokinetic genetic variants are not associated with the risk of bleeding from direct oral anticoagulants in non-valvular atrial fibrillation patients. Front Pharmacol 2022; 13:1007113. [PMID: 36506510 PMCID: PMC9730333 DOI: 10.3389/fphar.2022.1007113] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/29/2022] [Accepted: 11/07/2022] [Indexed: 11/25/2022]
Abstract
Background: Atrial fibrillation (AF) is the leading cause of ischemic stroke and treatment has focused on reducing this risk through anticoagulation. Direct Oral Anticoagulants (DOACs) are the first-line guideline-recommended therapy since they are as effective as, and overall safer than, warfarin in preventing AF-related stroke. Although patients bleed less from DOACs compared to warfarin, bleeding remains the primary safety concern with this therapy. Hypothesis: Genetic variants known to modify the function of metabolic enzymes or transporters involved in the pharmacokinetics (PK) of DOACs could increase the risk of bleeding. Aim: To assess the association of eight functional PK-related single nucleotide variants (SNVs) in five genes (ABCB1, ABCG2, CYP2J2, CYP3A4, CYP3A5) with the risk of bleeding from DOACs in non-valvular AF patients. Methods: A retrospective cohort study was carried out with 2,364 self-identified white non-valvular AF patients treated with either rivaroxaban or apixaban. Genotyping was performed with Illumina Infinium CoreExome v12.1 bead arrays by the Michigan Genomics Initiative biobank. The primary endpoint was a composite of major and clinically relevant non-major bleeding. Cox proportional hazards regression with time-varying analysis assessed the association of the eight PK-related SNVs with the risk of bleeding from DOACs in unadjusted and covariate-adjusted models. The pre-specified primary analysis used the covariate-adjusted, additive genetic models. Six tests were performed in the primary analysis as three SNVs are in the same haplotype, and thus p-values below the Bonferroni-corrected level of 8.33e-3 were considered statistically significant. Results: In the primary analysis, none of the SNVs met the Bonferroni-corrected level of statistical significance (all p > 0.1). In exploratory analyses with other genetic models, the ABCB1 (rs4148732) GG genotype tended to be associated with the risk of bleeding from rivaroxaban [HR: 1.391 (95% CI: 1.019-1.900); p = 0.038] but not from apixaban (p = 0.487). Conclusion: Eight functional PK-related genetic variants were not significantly associated with bleeding from either rivaroxaban or apixaban in more than 2,000 self-identified white AF outpatients.
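The Bonferroni-corrected level quoted in this abstract comes from dividing a family-wise significance level by the number of tests. A minimal sketch, assuming a family-wise level of 0.05 (the abstract does not state it, but 0.05 / 6 reproduces the reported 8.33e-3); the function name is illustrative and not taken from the paper:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: family-wise alpha of 0.05; six tests as stated in the abstract.
threshold = bonferroni_threshold(0.05, 6)
print(f"{threshold:.2e}")  # prints 8.33e-03

# The exploratory rivaroxaban p-value (0.038) is above this threshold.
print(0.038 < threshold)  # prints False
```

This is why the exploratory ABCB1 result is described as a tendency: it is below the conventional 0.05 level but not below the corrected level.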
Affiliation(s)
- Michael P. Dorsch
  - Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI, United States
- Geoffrey D. Barnes
  - Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Hao-Jie Zhu
  - Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI, United States
- Nita A. Limdi
  - Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Jasmine A. Luzum
  - Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, Ann Arbor, MI, United States (corresponding author)
5
Yoon CJ, Kim SY, Nam CH, Lee J, Park JW, Mun J, Park S, Lee S, Yi B, Min KI, Wiley B, Bolton KL, Lee JH, Kim E, Yoo HJ, Jun JK, Choi JS, Griffith M, Griffith OL, Ju YS. Estimation of intrafamilial DNA contamination in family trio genome sequencing using deviation from Mendelian inheritance. Genome Res 2022; 32:2134-2144. [PMID: 36617634 PMCID: PMC9808622 DOI: 10.1101/gr.276794.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 10/31/2022] [Indexed: 12/12/2022]
Abstract
With the increasing number of sequencing projects involving families, quality control tools optimized for family genome sequencing are needed. However, accurately quantifying contamination in a DNA mixture is particularly difficult when genetically related family members are the sources. We developed TrioMix, a maximum likelihood estimation (MLE) framework based on Mendel's law of inheritance, to quantify DNA mixture between family members in genome sequencing data of parent-offspring trios. TrioMix can accurately deconvolute any intrafamilial DNA contamination, including parent-offspring, sibling-sibling, parent-parent, and even multiple familial sources. In addition, TrioMix can be applied to detect genomic abnormalities that deviate from Mendelian inheritance patterns, such as uniparental disomy (UPD) and chimerism. A genome-wide depth and variant allele frequency plot generated by TrioMix facilitates tracing the origin of Mendelian inheritance deviations. We showed that TrioMix could accurately deconvolute genomes in both simulated and real data sets.
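The idea of estimating a mixture fraction by maximum likelihood from deviations in expected allele fractions can be illustrated with a toy model. This sketch is not TrioMix's implementation; it rests on simplifying assumptions that are not from the abstract: only sites where the father is homozygous reference and the mother is homozygous alternate are used (so the child is an obligate heterozygote), the only contaminant is maternal DNA at fraction `c`, read counts are binomial, and sequencing error is ignored. All names and read counts are made up.

```python
import math


def log_likelihood(c, alt_counts, depths):
    """Binomial log-likelihood (up to a constant) of contamination fraction c.

    Toy assumption: at each site the child is heterozygous and the
    contaminating mother is homozygous alternate, so the expected
    alternate-allele fraction is 0.5 * (1 - c) + 1.0 * c = 0.5 + c / 2.
    """
    p = 0.5 + c / 2.0
    total = 0.0
    for alt, depth in zip(alt_counts, depths):
        total += alt * math.log(p) + (depth - alt) * math.log(1.0 - p)
    return total


def estimate_contamination(alt_counts, depths, grid_size=1000):
    """Grid-search MLE of c over [0, 1); c = 1 is excluded to keep log finite."""
    grid = [i / grid_size for i in range(grid_size)]
    return max(grid, key=lambda c: log_likelihood(c, alt_counts, depths))


# Made-up alternate-allele read counts and depths at three informative sites.
alt_counts = [33, 27, 30]
depths = [50, 50, 50]
c_hat = estimate_contamination(alt_counts, depths)
print(c_hat)
```

With these counts the pooled alternate-allele fraction is 90 / 150 = 0.6, so the likelihood peaks at `c = 2 * 0.6 - 1 = 0.2`. A real tool would need to handle many genotype configurations, sequencing error, and multiple possible contaminant sources.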
Affiliation(s)
- Christopher J. Yoon
  - Department of Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Research Center for Natural Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea; Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea; McDonnell Genome Institute, St. Louis, Missouri 63108, USA
- Su Yeon Kim
  - Research Center for Natural Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Chang Hyun Nam
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Junehawk Lee
  - Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Korea
- Jung Woo Park
  - Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Korea
- Jihyeob Mun
  - Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Korea
- Soyoung Lee
  - GENOME INSIGHT Incorporated, Daejeon 34051, Korea
- Boram Yi
  - GENOME INSIGHT Incorporated, Daejeon 34051, Korea
- Kyoung Il Min
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Brian Wiley
  - Department of Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Kelly L. Bolton
  - Department of Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Jeong Ho Lee
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Eunjoon Kim
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea; Center for Synaptic Brain Dysfunctions, Institute for Basic Science, Daejeon 34141, Korea
- Hee Jeong Yoo
  - Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam 13620, Korea; Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Korea
- Jong Kwan Jun
  - Department of Obstetrics and Gynecology, Seoul National University College of Medicine, Seoul 03080, Korea
- Ji Seon Choi
  - Department of Laboratory Medicine, International St. Mary's Hospital, Catholic Kwandong University College of Medicine, Incheon 22711, Korea
- Malachi Griffith
  - Department of Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA; McDonnell Genome Institute, St. Louis, Missouri 63108, USA
- Obi L. Griffith
  - Department of Medicine, Washington University School of Medicine, St. Louis, Missouri 63110, USA; McDonnell Genome Institute, St. Louis, Missouri 63108, USA
- Young Seok Ju
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea; GENOME INSIGHT Incorporated, Daejeon 34051, Korea
6
Si Y, Vanderwerff B, Zöllner S. Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms. Genetics 2021; 217:iyab011. [PMID: 33686438 PMCID: PMC8049559 DOI: 10.1093/genetics/iyab011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/09/2020] [Accepted: 12/15/2020] [Indexed: 01/13/2023]
Abstract
Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency < 1% without sequencing. Although it is critical to consider limits of this approach, imputation methods for rare variants have only done so empirically; the theoretical basis of their imputation accuracy has not been explored. To provide theoretical consideration of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants, leveraging the joint genealogy of the sample to be imputed and reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit the ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles mostly depends on the allele count in the reference sample. We summarize the impact of this imputation error on association tests by calculating the r² between imputed and true genotype and show that even when modeling other sources of error, the model misspecification has a significant impact on the r² of rare variants. To evaluate these predictions in practice, we compare the imputation of the same dataset across imputation panels of different sizes. Although this empirical imputation accuracy is substantially lower than our theoretical prediction, modeling misspecification seems to further decrease imputation accuracy for variants with low allele counts in the reference. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.
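The squared correlation (r²) between imputed and true genotypes mentioned in this abstract can be illustrated with a small worked example. This is a sketch only: it uses the ordinary squared Pearson correlation, the genotype vectors are made up, and the function name is illustrative rather than anything from the paper.

```python
from statistics import mean


def squared_correlation(imputed, truth):
    """Squared Pearson correlation between imputed and true genotypes."""
    if len(imputed) != len(truth) or len(imputed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = mean(imputed), mean(truth)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)


# Made-up example: one true minor-allele carrier among four samples.
truth = [0, 0, 0, 1]
perfect = [0, 0, 0, 1]
one_false_carrier = [0, 0, 1, 1]  # one falsely imputed minor allele

print(squared_correlation(perfect, truth))            # 1.0
print(squared_correlation(one_false_carrier, truth))  # about 0.333
```

In this toy case a single falsely imputed minor allele drops r² from 1 to one third, which gives some intuition for why false minor alleles matter so much when the true allele count is low.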
Affiliation(s)
- Yichen Si
  - Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Brett Vanderwerff
  - Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Sebastian Zöllner
  - Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
  - Department of Psychiatry, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA