1
A high-throughput real-time PCR tissue-of-origin test to distinguish blood from lymphoblastoid cell line DNA for (epi)genomic studies. Sci Rep 2022; 12:4684. [PMID: 35304543 PMCID: PMC8933453 DOI: 10.1038/s41598-022-08663-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Accepted: 03/09/2022] [Indexed: 12/13/2022]
Abstract
Lymphoblastoid cell lines (LCLs) derive from blood infected in vitro by Epstein–Barr virus (EBV) and have been used in several genetic, transcriptomic and epigenomic studies. Although few differences have been shown between LCL and blood genotypes (SNPs), validating their use in genetics, more have been reported for other genomic features and in their transcriptome and epigenome. This could make LCLs less appropriate for such studies, notably when blood DNA is still available. Here we developed a simple, high-throughput and cost-effective real-time PCR approach that distinguishes blood from LCL DNA samples based on relative EBV load and the presence of rearranged T-cell receptors γ and β. The approach achieved 98.5% sensitivity and 100% specificity on DNA of known origin (458 blood and 316 LCL DNA samples). It was further applied to 1957 DNA samples from the CEPH Aging cohort comprising DNA of uncertain origin, identifying 784 blood and 1016 LCL DNA samples. A subset of these samples was further analyzed with an epigenetic clock, indicating that DNA extracted from blood should be preferred to LCL DNA for DNA methylation-based age prediction. Our approach could thereby be a powerful tool to ascertain the origin of DNA in old collections prior to (epi)genomic studies.
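The sensitivity and specificity figures quoted above follow the usual confusion-matrix definitions. A minimal Python sketch of that arithmetic is given below; the abstract does not state which sample type is treated as the positive class or how the misclassified samples break down, so the counts in the example are hypothetical and chosen only to illustrate the formulas.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing predicted origin against known origin."""
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly positive samples that the test calls positive."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of truly negative samples that the test calls negative."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts (not taken from the paper): 200 positives, 100 negatives.
example = ConfusionCounts(true_pos=197, false_neg=3, true_neg=100, false_pos=0)
print(f"sensitivity = {sensitivity(example):.3f}")
print(f"specificity = {specificity(example):.3f}")
```

A specificity of 100% simply means no negative-class sample was called positive, whatever the number of missed positives.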
2
Liu C, Fetterman JL, Sun X, Yan K, Liu P, Luo Y, Ding J, Zhu J, Levy D. Comparison of mitochondrial DNA sequences from whole blood and lymphoblastoid cell lines. Sci Rep 2022; 12:1801. [PMID: 35110616 PMCID: PMC8810874 DOI: 10.1038/s41598-022-05814-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2018] [Accepted: 01/10/2022] [Indexed: 01/19/2023]
Abstract
Lymphoblastoid cell lines (LCLs) provide an unlimited source of genomic DNA for genetic studies. Here, we compared mtDNA sequence variants, heteroplasmic or homoplasmic, between LCL samples (sequenced by the mitoRCA-seq method) and whole blood samples (sequenced by whole genome sequencing) of the same 130 participants in the Framingham Heart Study. We applied harmonization of sequence coverage and consistent quality control to the mtDNA sequences. We identified 866 variant sites in the 130 LCL samples and 666 sites in the 130 blood samples. More than 94% of the identified homoplasmies were present in both LCL and blood samples, while more than 70% of heteroplasmic sites were present only in LCL or only in blood samples. The LCL and whole blood samples carried a similar number of homoplasmic variants per sample (p = 0.45), while the LCL samples carried a greater number of heteroplasmic variants per sample than whole blood (p < 2.2e-16). Furthermore, the LCL samples tended to accumulate more low-level heteroplasmies (heteroplasmy level of 3-25%) than their paired blood samples (p = 0.001). These results suggest that caution should be taken in the interpretation and comparison of findings when different tissues/cell types or different sequencing technologies are used to obtain mtDNA sequences.
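The shared-versus-unique comparison described above can be illustrated with plain set operations. The sketch below is only an illustration under assumptions: the function name `site_overlap`, the positions in the example, and the choice of the union of sites as the denominator are all hypothetical, and the abstract does not specify how the authors computed their percentages.

```python
def site_overlap(lcl_sites: set[int], blood_sites: set[int]) -> dict[str, float]:
    """Summarize how many variant sites are shared between two sample sets.

    Sites are mtDNA positions. The shared fraction is taken over the union
    of both sets, which is an assumption made for this sketch.
    """
    shared = lcl_sites & blood_sites
    union = lcl_sites | blood_sites
    return {
        "shared": len(shared),
        "lcl_only": len(lcl_sites - blood_sites),
        "blood_only": len(blood_sites - lcl_sites),
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical positions, for illustration only.
lcl = {73, 263, 750, 16519}
blood = {73, 263, 1438}
summary = site_overlap(lcl, blood)
print(summary)
```

Running the same summary separately for homoplasmic and heteroplasmic sites would give the kind of contrast the abstract reports.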
Affiliation(s)
- Chunyu Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, 02118, USA
- Xianbang Sun
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, 02118, USA
- Kaiyu Yan
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, 02118, USA
- Poching Liu
  - DNA Sequencing and Genomics Core, NHLBI/NIH, Bethesda, MD, 20892, USA
- Yan Luo
  - DNA Sequencing and Genomics Core, NHLBI/NIH, Bethesda, MD, 20892, USA
- Jun Ding
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD, 21224, USA
- Jun Zhu
  - System Biology Center, NHLBI/NIH, Bethesda, MD, 20892, USA
- Daniel Levy
  - Population Sciences Branch, NHLBI/NIH, Bethesda, MD, 20892, USA
  - Framingham Heart Study, Framingham, MA, 01702, USA
3
Paskov K, Jung JY, Chrisman B, Stockham NT, Washington P, Varma M, Sun MW, Wall DP. Estimating sequencing error rates using families. BioData Min 2021; 14:27. [PMID: 33892748 PMCID: PMC8063364 DOI: 10.1186/s13040-021-00259-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/26/2020] [Accepted: 03/29/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample.
RESULTS: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method's versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
CONCLUSION: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
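To make the notion of a Mendelian error concrete, the following Python sketch flags trio genotypes that cannot arise from the parents' genotypes. It covers only the detection step for biallelic autosomal sites, with genotypes encoded as alternate-allele counts (0, 1, 2); it is not the authors' estimation method, and the function names and example genotypes are hypothetical.

```python
def transmissible(genotype: int) -> set[int]:
    """Alternate-allele counts (0 or 1) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(child: int, mother: int, father: int) -> bool:
    """True if the child's genotype cannot be formed from one allele per parent.

    Assumes a biallelic autosomal site and ignores de novo mutation.
    """
    possible = {m + f for m in transmissible(mother) for f in transmissible(father)}
    return child not in possible


def mendelian_error_rate(trios: list[tuple[int, int, int]]) -> float:
    """Fraction of (child, mother, father) genotype triples that are inconsistent."""
    if not trios:
        return 0.0
    return sum(is_mendelian_error(c, m, f) for c, m, f in trios) / len(trios)


# Hypothetical genotype calls at four sites for one trio.
calls = [(0, 0, 0), (1, 0, 0), (1, 2, 0), (2, 1, 1)]
print(mendelian_error_rate(calls))
```

Only a fraction of genotyping errors show up this way, since many erroneous calls remain consistent with Mendelian inheritance; this is why the abstract speaks of extrapolating from observed Mendelian errors to genome-wide estimates.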
Affiliation(s)
- Kelley Paskov
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jae-Yoon Jung
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Brianna Chrisman
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Nate T Stockham
  - Department of Neuroscience, Stanford University, Stanford, CA, USA
- Peter Washington
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Maya Varma
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Min Woo Sun
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Dennis P Wall
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
4
Lavrichenko K, Helgeland Ø, Njølstad PR, Jonassen I, Johansson S. SeeCiTe: a method to assess CNV calls from SNP arrays using trio data. Bioinformatics 2021; 37:1876-1883. [PMID: 33459766 PMCID: PMC8317106 DOI: 10.1093/bioinformatics/btab028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2020] [Revised: 12/17/2020] [Accepted: 01/11/2021] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Single nucleotide polymorphism (SNP) genotyping arrays remain an attractive platform for assaying copy number variants (CNVs) in large population-wide cohorts. However, current tools for calling CNVs are still prone to extensive false positive calls when applied to biobank scale arrays. Moreover, there is a lack of methods exploiting cohorts with trios available (e.g. nuclear family) to assist in quality control and downstream analyses following the calling.
RESULTS: We developed SeeCiTe (Seeing CNVs in Trios), a novel CNV-quality control tool that postprocesses output from current CNV-calling tools exploiting child-parent trio data to classify calls in quality categories and provide a set of visualizations for each putative CNV call in the offspring. We apply it to the Norwegian Mother, Father and Child Cohort Study (MoBa) and show that SeeCiTe improves the specificity and sensitivity compared to the common empiric filtering strategies. To our knowledge, it is the first tool that utilizes probe-level CNV data in trios (and singletons) to systematically highlight potential artifacts and visualize signal intensities in a streamlined fashion suitable for biobank scale studies.
AVAILABILITY AND IMPLEMENTATION: The software is implemented in R with the source code freely available at https://github.com/aksenia/SeeCiTe.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ksenia Lavrichenko
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Øyvind Helgeland
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
- Pål R Njølstad
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen, Norway
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Stefan Johansson
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
5
Diagnosing Cornelia de Lange syndrome and related neurodevelopmental disorders using RNA sequencing. Genet Med 2020; 22:927-936. [PMID: 31911672 DOI: 10.1038/s41436-019-0741-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 10/02/2019] [Accepted: 12/19/2019] [Indexed: 01/05/2023]
Abstract
PURPOSE: Neurodevelopmental disorders represent a frequent indication for clinical exome sequencing. Fifty percent of cases, however, remain undiagnosed even upon exome reanalysis. Here we show that RNA sequencing (RNA-seq) on human B-lymphoblastoid cell lines (LCL) is highly suitable for neurodevelopmental Mendelian gene testing and demonstrate the utility of this approach in suspected cases of Cornelia de Lange syndrome (CdLS).
METHODS: Genotype-Tissue Expression project transcriptome data for LCL, blood, and brain were assessed for neurodevelopmental Mendelian gene expression. Detection of abnormal splicing and pathogenic variants in these genes was performed with a novel RNA-seq diagnostic pipeline and using a validation CdLS-LCL cohort (n = 10) and a test cohort of patients who carry a clinical diagnosis of CdLS but negative genetic testing (n = 5).
RESULTS: LCLs share isoform diversity of brain tissue for a large subset of neurodevelopmental genes and express 1.8-fold more of these genes compared with blood (LCL, n = 1706; whole blood, n = 917). This enables testing of more than 1000 genetic syndromes. The RNA-seq pipeline had 90% sensitivity for detecting pathogenic events and revealed novel diagnoses such as abnormal splice products in NIPBL and pathogenic coding variants in BRD4 and ANKRD11.
CONCLUSION: The LCL transcriptome enables robust frontline and/or reflexive diagnostic testing for neurodevelopmental disorders.
6
Investigating mitonuclear interactions in human admixed populations. Nat Ecol Evol 2019; 3:213-222. [PMID: 30643241 PMCID: PMC6925600 DOI: 10.1038/s41559-018-0766-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 07/02/2018] [Accepted: 11/22/2018] [Indexed: 12/13/2022]
Abstract
To function properly, mitochondria utilize products of 37 mitochondrial and >1,000 nuclear genes, which should be compatible with each other. Discordance between mitochondrial and nuclear genetic ancestry could contribute to phenotypic variation in admixed populations. Here, we explored potential mitonuclear incompatibility in six admixed human populations from the Americas: African Americans, African Caribbeans, Colombians, Mexicans, Peruvians and Puerto Ricans. By comparing nuclear versus mitochondrial ancestry in these populations, we first show that mitochondrial DNA (mtDNA) copy number decreases with increasing discordance between nuclear and mtDNA ancestry. The direction of this effect is consistent across mtDNA haplogroups of different geographic origins. This observation indicates suboptimal regulation of mtDNA replication when its components are encoded by nuclear and mtDNA genes with different ancestry. Second, while most populations analysed exhibit no such trend, in African Americans and Puerto Ricans, we find a significant enrichment of ancestry at nuclear-encoded mitochondrial genes towards the source populations contributing the most prevalent mtDNA haplogroups (African and Native American, respectively). This possibly reflects compensatory effects of selection in recovering mitonuclear interactions optimized in the source populations. Our results provide evidence of mitonuclear interactions in human admixed populations and we discuss their implications for human health and disease.