1. Wang Y, Lee H, Fear JM, Berger I, Oliver B, Przytycka TM. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Received: 08/23/2021] [Accepted: 11/04/2022]
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. In addition to gene expression, leading algorithms use prior knowledge such as Transcription Factor (TF) DNA-binding motifs or the results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that combines Collaborative Filtering, to address the incompleteness of the prior knowledge, with a biologically justified model of gene expression (a sparse Network Component Analysis-based model). We validated NetREX-CF using yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we combined large-scale RNA-Seq analysis with high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown results not only extensively validated the GRN we built but also provide a benchmark that the community can use to evaluate GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq data and that it outperforms other methods.
Affiliation(s)
- Yijie Wang
- Computer Science Department, Indiana University, Bloomington, IN, 47408, USA
- Hangnoh Lee
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Justin M Fear
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Isabelle Berger
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Brian Oliver
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA

2. Library adaptors with integrated reference controls improve the accuracy and reliability of nanopore sequencing. Nat Commun 2022; 13:6437. [PMID: 36307482 PMCID: PMC9616880 DOI: 10.1038/s41467-022-34028-8] [Received: 12/03/2021] [Accepted: 10/11/2022]
Abstract
Library adaptors are short oligonucleotides that are attached to RNA and DNA samples in preparation for next-generation sequencing (NGS). Adaptors can also include additional functional elements, such as sample indexes and unique molecular identifiers, to improve library analysis. Here, we describe Control Library Adaptors, termed CAPTORs, that measure the accuracy and reliability of NGS. CAPTORs can be integrated within the library preparation of RNA and DNA samples, and their encoded information is retrieved during sequencing. We show how CAPTORs can measure the accuracy of nanopore sequencing, evaluate the quantitative performance of metagenomic and RNA sequencing, and improve normalisation between samples. CAPTORs can also be customised for clinical diagnoses, correcting systematic sequencing errors and improving the diagnosis of pathogenic BRCA1/2 variants in breast cancer. CAPTORs are a simple and effective method to increase the accuracy and reliability of NGS, enabling comparisons between samples, reagents and laboratories, and supporting the use of nanopore sequencing for clinical diagnosis.

3. Yao H, Lu S, Williams BA, Flanagan BM, Gidley MJ, Mikkelsen D. Absolute abundance values reveal microbial shifts and co-occurrence patterns during gut microbiota fermentation of dietary fibres in vitro. Food Hydrocoll 2022. [DOI: 10.1016/j.foodhyd.2021.107422]

4. Pfeifer JD, Loberg R, Lofton-Day C, Zehnbauer BA. Reference Samples to Compare Next-Generation Sequencing Test Performance for Oncology Therapeutics and Diagnostics. Am J Clin Pathol 2022; 157:628-638. [PMID: 34871357 DOI: 10.1093/ajcp/aqab164] [Received: 06/22/2021] [Accepted: 08/24/2021]
Abstract
OBJECTIVES Diversity of laboratory-developed tests (LDTs) using next-generation sequencing (NGS) raises concerns about their accuracy for selection of targeted therapies. A working group developed a pilot study of traceable reference samples to measure NGS LDT performance among a cohort of clinical laboratories. METHODS Human cell lines were engineered via CRISPR/Cas9 and prepared as formalin-fixed, paraffin-embedded cell pellets ("wet" samples) to assess the entire NGS test cycle. In silico mutagenized NGS sequence files ("dry" samples) were used to assess the bioinformatics component of the NGS test cycle. Single and multinucleotide variants (n = 36) of KRAS and NRAS were tested at 5% or 15% variant allele fraction to determine eligibility for therapy with the EGFR inhibitor panitumumab in the setting of metastatic colorectal cancer. RESULTS Twenty-one (21/21) laboratories tested wet samples; 19 of 21 analyzed dry samples. Of the laboratories that tested both the wet and dry samples, 7 (37%) of 19 laboratories correctly reported all variants, 3 (16%) of 19 had fewer than five errors, and 9 (47%) of 19 had five or more errors. Most errors were false negatives. CONCLUSIONS Genetically engineered cell lines and mutagenized sequence files are complementary reference samples for evaluating NGS test performance among clinical laboratories using LDTs. Variable accuracy in detection of genetic variants among some LDTs may identify different patient populations for targeted therapy.
Affiliation(s)
- John D Pfeifer
- Department of Pathology, Washington University School of Medicine, St Louis, MO, USA
- Robert Loberg
- Clinical Biomarkers and Diagnostics, Thousand Oaks, CA, USA
- Barbara A Zehnbauer
- Department of Pathology, Emory University School of Medicine, Atlanta, GA, USA

5. Lou RN, Therkildsen NO. Batch effects in population genomic studies with low-coverage whole genome sequencing data: Causes, detection and mitigation. Mol Ecol Resour 2021; 22:1678-1692. [PMID: 34825778 DOI: 10.1111/1755-0998.13559] [Received: 07/22/2021] [Revised: 11/05/2021] [Accepted: 11/11/2021]
Abstract
Over the past few decades, there has been an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining data sets to achieve unprecedented sample sizes, spatial coverage or temporal replication in population genomic studies. However, a common concern is that nonbiological differences between data sets may generate patterns of variation in the data that can confound real biological patterns, a problem known as batch effects. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a "batch-effect-naive" bioinformatic pipeline, batch effects systematically biased our genetic diversity estimates, population structure inference and selection scans. We then demonstrate that these batch effects resulted from multiple technical differences between our data sets, including the sequencing chemistry (four-channel vs. two-channel), sequencing run, read type (single-end vs. paired-end), read length (125 vs. 150 bp), DNA degradation level (degraded vs. well preserved) and sequencing depth (0.8× vs. 0.3× on average). Lastly, we illustrate that a set of simple bioinformatic strategies (such as different read trimming and single nucleotide polymorphism filtering) can be used to detect batch effects in our data and substantially mitigate their impact. We conclude that combining data sets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.
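The depth difference between the two batches (0.8× vs. 0.3× on average) can by itself produce very different missing-data patterns. The rough illustration below is not the authors' analysis. It assumes reads land uniformly at random, so per-site depth follows a Poisson distribution (the idealized Lander-Waterman model). Under that assumption, the expected fraction of sites with zero reads is exp(-depth):

```python
import math

def poisson_pmf(k: int, mean_depth: float) -> float:
    """Probability that a site is covered by exactly k reads under a Poisson model."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)

def fraction_uncovered(mean_depth: float) -> float:
    """Expected fraction of sites with zero reads: P(X = 0) = exp(-mean_depth)."""
    return poisson_pmf(0, mean_depth)

def fraction_at_least(min_reads: int, mean_depth: float) -> float:
    """Expected fraction of sites covered by at least min_reads reads."""
    return 1.0 - sum(poisson_pmf(k, mean_depth) for k in range(min_reads))

# Average depths of the two batches reported in the abstract.
for depth in (0.8, 0.3):
    print(f"{depth}x: uncovered={fraction_uncovered(depth):.3f}, "
          f">=2 reads={fraction_at_least(2, depth):.3f}")
```

Under this idealization, about 45% of sites would have no reads at 0.8× versus about 74% at 0.3×. Sites with two or more reads, the minimum needed to see both alleles at a heterozygous site, drop from roughly 19% to under 4%.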

6. Lou RN, Jacobs A, Wilder A, Therkildsen NO. A beginner's guide to low-coverage whole genome sequencing for population genomics. Mol Ecol 2021; 30:5966-5993. [PMID: 34250668 DOI: 10.1111/mec.16077] [Received: 12/01/2020] [Revised: 06/30/2021] [Accepted: 07/01/2021]
Abstract
Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency and genetic diversity estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
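The sketch below makes "genotype uncertainty" concrete by computing genotype likelihoods at one biallelic site from the observed read bases. It is a hypothetical illustration built on simplifying assumptions, not the model of any specific package reviewed in the guide:

- each read samples either chromosome with equal probability;
- each base is miscalled with a fixed probability `error`;
- a miscall flips the reference allele to the alternative allele, or vice versa.

```python
from math import prod

def genotype_likelihoods(bases: str, error: float = 0.01) -> dict[int, float]:
    """Likelihood of each biallelic genotype given the read bases at one site.

    Genotypes are keyed by the number of alternative-allele copies (0, 1 or 2).
    Bases are coded 'A' for the alternative allele; any other character is
    treated as the reference allele.
    """
    likelihoods = {}
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # A read shows the alternative allele if it samples an alternative
        # chromosome and is read correctly, or samples a reference chromosome
        # and is miscalled (errors assumed to flip the allele).
        p_alt = frac_alt * (1 - error) + (1 - frac_alt) * error
        likelihoods[alt_copies] = prod(p_alt if b == "A" else 1 - p_alt for b in bases)
    return likelihoods

def normalize(likelihoods: dict[int, float]) -> dict[int, float]:
    """Rescale likelihoods to sum to 1 (a posterior under a flat genotype prior)."""
    total = sum(likelihoods.values())
    return {g: value / total for g, value in likelihoods.items()}

for reads in ("R", "RRRR", "RA"):
    posterior = normalize(genotype_likelihoods(reads))
    print(reads, {g: round(p, 3) for g, p in posterior.items()})
```

With a single reference read, the heterozygote keeps about a third of the normalized likelihood (0.333, versus 0.66 for homozygous reference). This is why lcWGS analyses carry genotype uncertainty forward rather than hard-calling genotypes at low depth.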
Affiliation(s)
- Runyang Nicolas Lou
- Department of Natural Resources and the Environment, Cornell University, Ithaca, NY, 14853, USA
- Arne Jacobs
- Department of Natural Resources and the Environment, Cornell University, Ithaca, NY, 14853, USA; Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, G12 8QQ, UK
- Aryn Wilder
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027, USA
- Nina O Therkildsen
- Department of Natural Resources and the Environment, Cornell University, Ithaca, NY, 14853, USA

7. Genome-Wide Analysis of Off-Target CRISPR/Cas9 Activity in Single-Cell-Derived Human Hematopoietic Stem and Progenitor Cell Clones. Genes (Basel) 2020; 11:genes11121501. [PMID: 33322084 PMCID: PMC7762975 DOI: 10.3390/genes11121501] [Received: 11/11/2020] [Revised: 11/28/2020] [Accepted: 12/11/2020]
Abstract
CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9)-mediated genome editing holds remarkable promise for the treatment of human genetic diseases. However, the possibility of off-target Cas9 activity remains a concern. To address this issue using clinically relevant target cells, we electroporated Cas9 ribonucleoprotein (RNP) complexes (independently targeted to two different genomic loci, the CXCR4 locus on chromosome 2 and the AAVS1 locus on chromosome 19) into human mobilized peripheral blood-derived hematopoietic stem and progenitor cells (HSPCs) and assessed the acquisition of somatic mutations in an unbiased, genome-wide manner via whole genome sequencing (WGS) of single-cell-derived HSPC clones. Bioinformatic analysis identified >20,000 total somatic variants (indels, single nucleotide variants, and structural variants) distributed among Cas9-treated and non-Cas9-treated control HSPC clones. Statistical analysis revealed no significant difference in the number of novel non-targeted indels among the samples. Moreover, data analysis showed no evidence of Cas9-mediated indel formation at 623 predicted off-target sites. The median number of novel single nucleotide variants was slightly elevated in Cas9 RNP-recipient sample groups compared to baseline, but did not reach statistical significance. Structural variants were rare and demonstrated no clear causal connection to Cas9-mediated gene editing procedures. We find that the collective somatic mutational burden observed within Cas9 RNP-edited human HSPC clones is indistinguishable from naturally occurring levels of background genetic heterogeneity.

8. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 2020; 583:590-595. [PMID: 32669714 PMCID: PMC8240505 DOI: 10.1038/s41586-020-2496-1] [Received: 06/05/2019] [Accepted: 05/07/2020]
Abstract
Ageing is characterized by a progressive loss of physiological integrity, leading to impaired function and increased vulnerability to death [1]. Despite rapid advances over recent years, many of the molecular and cellular processes that underlie the progressive loss of healthy physiology are poorly understood [2]. To gain better insight into these processes, here we generate a single-cell transcriptomic atlas across the lifespan of Mus musculus that includes data from 23 tissues and organs. We found cell-specific changes occurring across multiple cell types and organs, as well as age-related changes in the cellular composition of different organs. Using single-cell transcriptomic data, we assessed cell-type-specific manifestations of different hallmarks of ageing, such as senescence [3], genomic instability [4] and changes in the immune system [2]. This transcriptomic atlas, which we denote Tabula Muris Senis or 'Mouse Ageing Cell Atlas', provides molecular information about how the most important hallmarks of ageing are reflected in a broad range of tissues and cell types.

9. Use of Spiked Normalizers to More Precisely Quantify Tumor Markers and Viral Genomes by Massive Parallel Sequencing of Plasma DNA. J Mol Diagn 2020; 22:437-446. [PMID: 32036092 DOI: 10.1016/j.jmoldx.2020.01.012] [Received: 11/14/2019] [Revised: 12/19/2019] [Accepted: 01/14/2020]
Abstract
A problematic aspect of massively parallel sequencing is that somatic mutations and viral loads are typically quantified as a fraction relative to wild-type human DNA, yet wild-type levels vary with diverse biologic and preanalytic interferences. A novel strategy was devised to quantify target analytes in copies per mL of plasma after normalizing for read counts of spiked DNAs. Five synthetic DNAs (called EndoGenus spikes) were added to plasma before library preparation (modified ArcherDX LiquidPlex 28). By normalizing to the fractional recovery of EndoGenus spike reads, numerical values for each disease marker were reportable in units of copies per mL. To show how well this system operates, replicate assays were performed on 40 mock plasmas having 23 engineered mutations and on 21 natural plasmas. Reads for all five EndoGenus spikes were recovered (means, 313 and 376 copies/mL in mock and natural plasmas, respectively). Normalizing read counts for the proportional recovery of spikes helped control for variables in the multistep protocol, reducing the CV in replicate tests from 34% to 22% for mutations and from 25% to 7% for viral loads. In conclusion, the EndoGenus system is useful for evaluating the efficiency of the total test system and for precisely quantifying target molecules. This system may benefit patients being monitored for disease burden while also tracking emerging subclones.
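The normalization idea can be sketched as follows. The sketch rests on two hypothetical assumptions:

- a known number of copies of each spike is added to a known plasma volume;
- target molecules are recovered with the same average efficiency as the spikes.

The function names, parameters and numbers are illustrative only. They are not taken from the EndoGenus protocol or the ArcherDX kit.

```python
from statistics import mean

def reads_per_input_copy(spike_reads: dict[str, int], spike_copies_added: dict[str, float]) -> float:
    """Average number of reads recovered per spiked-in copy across all spikes."""
    return mean(spike_reads[name] / copies for name, copies in spike_copies_added.items())

def copies_per_ml(target_reads: int, recovery: float, plasma_ml: float) -> float:
    """Back-calculate a target's input copies and express them per mL of plasma."""
    input_copies = target_reads / recovery
    return input_copies / plasma_ml

# Hypothetical example: 600 copies of each of five spikes added to 2 mL of plasma.
added = {f"spike_{i}": 600.0 for i in range(1, 6)}
observed = {"spike_1": 1200, "spike_2": 900, "spike_3": 1500, "spike_4": 1200, "spike_5": 1200}
recovery = reads_per_input_copy(observed, added)  # 2.0 reads per input copy
print(copies_per_ml(target_reads=500, recovery=recovery, plasma_ml=2.0))  # 125.0
```

Each replicate is rescaled by its own spike recovery. As a result, losses that affect spikes and targets equally, for example during extraction or library preparation, cancel out. That is the intuition behind the reduced replicate CVs reported in the abstract.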

10. Son DH, Hwang NH, Chung WH, Seong HS, Lim H, Cho ES, Choi JW, Kang KS, Kim YM. Whole-genome resequencing analysis of 20 Micro-pigs. Genes Genomics 2019; 42:263-272. [PMID: 31833050 DOI: 10.1007/s13258-019-00891-x] [Received: 08/07/2019] [Accepted: 11/14/2019]
Abstract
BACKGROUND Miniature pigs have been increasingly used as mammalian model animals for biomedical research because of their similarity to human beings in terms of their metabolic features and proportional organ sizes. However, despite their importance, there is a severe lack of genome-wide studies on miniature pigs. OBJECTIVE In this study, we performed whole-genome sequencing analysis of 20 Micro-pigs obtained from Medi Kinetics to elucidate their genomic characteristics. RESULTS Approximately 595 gigabase pairs (Gb) of sequence reads were generated and mapped to the swine reference genome assembly (Sus scrofa 10.2); on average, the sequence reads covered 99.15% of the reference genome at an average depth of 9.6-fold. We detected a total of 19,518,548 SNPs, of which 8.7% were found to be novel. With further annotation of all of the SNPs, we retrieved 144,507 nonsynonymous SNPs (nsSNPs); of these, 5968 were found in all 20 individuals used in this study. SIFT analysis of these SNPs predicted 812 nsSNPs in 402 genes to be deleterious. Among these 402 genes, we identified some genes that could potentially affect traits of interest in Micro-pigs, such as RHEB and FRAS1. Furthermore, we performed a runs-of-homozygosity analysis to locate potential selection signatures in the genome, detecting several loci that might be involved in phenotypic characteristics in Micro-pigs, such as MSTN, GDF5, and GDF11. CONCLUSION In this study, we identified numerous nsSNPs that could be used as candidate genetic markers with involvement in traits of interest. Furthermore, we detected putative selection footprints that might be associated with recent selection applied to miniature pigs.
Affiliation(s)
- Da-Hye Son
- College of Animal Life Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Nam-Hyun Hwang
- College of Animal Life Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Won-Hyong Chung
- Research Division of Food Functionality, Research Group of Healthcare, 245, Nongsaengmyeong-ro, Iseo-myeon, Wanju-gun, Jeollabuk-do, 55365, Republic of Korea
- Ha-Seung Seong
- College of Animal Life Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Hyungbum Lim
- Medikinetics Co., Ltd, 4 Hansan-gil, Cheongbuk-eup, Pyeongtaek-si, Gyeonggi-do, 17792, Republic of Korea
- Eun-Seok Cho
- Division of Swine Science, National Institute of Animal Science, RDA, Cheonan, 31000, Republic of Korea
- Jung-Woo Choi
- College of Animal Life Science, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Kyung-Soo Kang
- Medikinetics Co., Ltd, 4 Hansan-gil, Cheongbuk-eup, Pyeongtaek-si, Gyeonggi-do, 17792, Republic of Korea
- Yong-Min Kim
- Division of Swine Science, National Institute of Animal Science, RDA, Cheonan, 31000, Republic of Korea

11. Benner L, Castro EA, Whitworth C, Venken KJT, Yang H, Fang J, Oliver B, Cook KR, Lerit DA. Drosophila Heterochromatin Stabilization Requires the Zinc-Finger Protein Small Ovary. Genetics 2019; 213:877-895. [PMID: 31558581 PMCID: PMC6827387 DOI: 10.1534/genetics.119.302590] [Received: 08/02/2019] [Accepted: 09/21/2019]
Abstract
Heterochromatin-mediated repression is essential for controlling the expression of transposons and for coordinated cell type-specific gene regulation. The small ovary (sov) locus was identified in a screen for female-sterile mutations in Drosophila melanogaster, and mutants show dramatic ovarian morphogenesis defects. We show that the null sov phenotype is lethal and map the locus to the uncharacterized gene CG14438, which encodes a nuclear zinc-finger protein that colocalizes with the essential Heterochromatin Protein 1 (HP1a). We demonstrate that Sov functions to repress inappropriate gene expression in the ovary, silence transposons, and suppress position-effect variegation in the eye, suggesting a central role in heterochromatin stabilization.
Affiliation(s)
- Leif Benner
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892
- Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218
- Elias A Castro
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia 30322
- Cale Whitworth
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Koen J T Venken
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- McNair Medical Institute at the Robert and Janice McNair Foundation
- Dan L. Duncan Cancer Center, Center for Drug Discovery
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas 77030
- Haiwang Yang
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892
- Junnan Fang
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia 30322
- Brian Oliver
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892
- Kevin R Cook
- Department of Biology, Indiana University, Bloomington, Indiana 47405
- Dorothy A Lerit
- Department of Cell Biology, Emory University School of Medicine, Atlanta, Georgia 30322

12. Patil SA, Mujacic I, Ritterhouse LL, Segal JP, Kadri S. insiM. J Mol Diagn 2019; 21:19-26. [DOI: 10.1016/j.jmoldx.2018.08.001] [Received: 03/07/2018] [Revised: 07/16/2018] [Accepted: 08/14/2018]

13. Yang H, Jaime M, Polihronakis M, Kanegawa K, Markow T, Kaneshiro K, Oliver B. Re-annotation of eight Drosophila genomes. Life Sci Alliance 2018; 1:e201800156. [PMID: 30599046 PMCID: PMC6305970 DOI: 10.26508/lsa.201800156] [Received: 08/13/2018] [Revised: 12/15/2018] [Accepted: 12/16/2018]
Abstract
The sequenced genomes of the Drosophila phylogeny are a central resource for comparative work supporting the understanding of the Drosophila melanogaster non-mammalian model system. These have also facilitated evolutionary studies on the selected and random differences that distinguish the thousands of extant species of Drosophila. However, full utility has been hampered by uneven genome annotation. We have generated a large expression profile dataset for nine species of Drosophila and trained a transcriptome assembly approach on D. melanogaster that best matched the extensively curated annotation. We then applied this approach to the other species to add more than 10,000 transcript models per species. We also developed new orthologs to facilitate cross-species comparisons. We validated the new annotation of the distantly related Drosophila grimshawi with an extensive collection of newly sequenced cDNAs. This re-annotation will facilitate understanding both the core commonalities and the species differences in this important group of model organisms, and suggests a strategy for annotating the many forthcoming genomes covering the tree of life.
Affiliation(s)
- Haiwang Yang
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Maria Jaime
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Maxi Polihronakis
- Drosophila Species Stock Center, Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Kelvin Kanegawa
- Hawaiian Drosophila Research Stock Center, Pacific Biosciences Research Center, University of Hawai'i at Manoa, Honolulu, HI, USA
- Therese Markow
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Irapuato, Guanajuato, Mexico; Drosophila Species Stock Center, Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Kenneth Kaneshiro
- Hawaiian Drosophila Research Stock Center, Pacific Biosciences Research Center, University of Hawai'i at Manoa, Honolulu, HI, USA
- Brian Oliver
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA

14. Khazaei T, Barlow JT, Schoepp NG, Ismagilov RF. RNA markers enable phenotypic test of antibiotic susceptibility in Neisseria gonorrhoeae after 10 minutes of ciprofloxacin exposure. Sci Rep 2018; 8:11606. [PMID: 30072794 PMCID: PMC6072703 DOI: 10.1038/s41598-018-29707-w] [Received: 04/11/2018] [Accepted: 07/11/2018]
Abstract
Antimicrobial-resistant Neisseria gonorrhoeae is an urgent public-health threat, with continued worldwide incidents of infection and rising resistance to antimicrobials. Traditional culture-based methods for antibiotic susceptibility testing are unacceptably slow (1-2 days), resulting in the use of broad-spectrum antibiotics and the further development and spread of resistance. Critically needed is a rapid antibiotic susceptibility test (AST) that can guide treatment at the point-of-care. Rapid phenotypic approaches using quantification of DNA have been demonstrated for fast-growing organisms (e.g. E. coli) but are challenging for slower-growing pathogens such as N. gonorrhoeae. Here, we investigate the potential of RNA signatures to provide phenotypic responses to antibiotics in N. gonorrhoeae that are faster and greater in magnitude compared with DNA. Using RNA sequencing, we identified antibiotic-responsive transcripts. Significant shifts (>4-fold change) in transcript levels occurred within 5 min of antibiotic exposure. We designed assays for responsive transcripts with the highest abundances and fold changes, and validated gene expression using digital PCR. Using the top two markers (porB and rpmB) we correctly determined the antibiotic susceptibility and resistance of 49 clinical isolates after 10 min exposure to ciprofloxacin. RNA signatures are therefore promising as an approach on which to build rapid AST devices for N. gonorrhoeae at the point-of-care, which is critical for disease management, surveillance, and antibiotic stewardship efforts.
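A minimal sketch of the phenotypic readout, under hypothetical assumptions:

- marker abundance is measured (for example by digital PCR) in paired antibiotic-exposed and unexposed aliquots of the same isolate;
- an isolate is called susceptible when any marker shifts by more than a chosen fold-change cutoff.

The marker names porB and rpmB come from the abstract. The abstract does not state the direction of change, so the sketch uses the absolute log2 fold change. The 4-fold cutoff only reuses the magnitude mentioned above and is not the published decision rule. The copy numbers are invented.

```python
import math

def log2_fold_change(treated_copies: float, control_copies: float) -> float:
    """log2 ratio of marker abundance in the exposed vs. the unexposed aliquot."""
    return math.log2(treated_copies / control_copies)

def call_susceptibility(markers: dict[str, tuple[float, float]], min_fold: float = 4.0) -> str:
    """Return 'susceptible' if any marker shifts, up or down, by more than min_fold.

    `markers` maps a marker name to (treated_copies, control_copies).
    """
    threshold = math.log2(min_fold)
    shifted = any(abs(log2_fold_change(t, c)) > threshold for t, c in markers.values())
    return "susceptible" if shifted else "resistant"

# Invented copy numbers (treated, control) for two hypothetical isolates.
print(call_susceptibility({"porB": (200.0, 1000.0), "rpmB": (900.0, 1000.0)}))   # susceptible
print(call_susceptibility({"porB": (950.0, 1000.0), "rpmB": (1100.0, 1000.0)}))  # resistant
```

Working on the log2 scale treats a 5-fold drop and a 5-fold rise symmetrically. A real assay would also need controls for measurement noise between aliquots.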
Affiliation(s)
- Tahmineh Khazaei
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA, United States of America
- Jacob T Barlow
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA, United States of America
- Nathan G Schoepp
- Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA, United States of America
- Rustem F Ismagilov
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA, United States of America
- Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA, United States of America

15. Saad C, Noé L, Richard H, Leclerc J, Buisine MP, Touzet H, Figeac M. DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data. BMC Bioinformatics 2018; 19:223. [PMID: 29890948 PMCID: PMC5996464 DOI: 10.1186/s12859-018-2215-1] [Received: 05/31/2017] [Accepted: 05/21/2018]
Abstract
Background Discovering over-represented approximate motifs in DNA sequences is an essential part of bioinformatics. This topic has been studied extensively because of the increasing number of potential applications. However, it remains a difficult challenge, especially with the huge quantity of data generated by high-throughput sequencing technologies. To overcome this problem, existing tools use greedy algorithms and probabilistic approaches to find motifs in reasonable time. Nevertheless, these approaches lack sensitivity and have difficulty coping with rare and subtle motifs. Results We developed DiNAMO (for DNA MOtif), a new software tool based on an exhaustive and efficient algorithm for IUPAC motif discovery. We evaluated DiNAMO on synthetic and real datasets in two different applications, namely ChIP-seq peaks and systematic sequencing error analysis. DiNAMO compares favorably with other existing methods and is robust to noise. Conclusions We show that DiNAMO can serve as a tool to search for degenerate motifs in an exact manner using IUPAC models. DiNAMO can be used in scanning mode with sliding windows or in fixed-position mode, which makes it suitable for numerous potential applications. Availability https://github.com/bonsai-team/DiNAMO.
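For readers unfamiliar with IUPAC motifs, the sketch below shows what a degenerate motif matches. It also shows how occurrences of one given motif could be located with a sliding window, in the spirit of the scanning mode described above. It is a naive, hypothetical illustration of IUPAC matching only. It is not DiNAMO's algorithm, which searches the space of IUPAC motifs exhaustively rather than testing a single motif supplied in advance. Sequences are assumed to be uppercase A/C/G/T.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches(motif: str, kmer: str) -> bool:
    """True if every base of kmer is allowed by the IUPAC code at that position."""
    return len(motif) == len(kmer) and all(
        base in IUPAC[code] for code, base in zip(motif, kmer)
    )

def scan(motif: str, sequence: str) -> list[int]:
    """0-based start positions where the motif matches, using a sliding window."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1) if matches(motif, sequence[i:i + k])]

print(scan("GRTN", "AGATCGGTTAGGTA"))  # prints [1, 5, 10]
```

Here GRTN stands for G, then A or G, then T, then any base. It therefore matches GATC, GGTT and GGTA in the example sequence.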
Affiliation(s)
- Chadi Saad
- Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France; Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France
- Laurent Noé
- Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France
- Hugues Richard
- Sorbonne Université, UMR7238, Laboratory Computational and Quantitative Biology, LCQB, Paris, F-75005, France
- Julie Leclerc
- Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France
- Marie-Pierre Buisine
- Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France
- Hélène Touzet
- Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France
- Martin Figeac
- Univ. Lille, Plateau de génomique fonctionnelle et structurale, Lille, F-59000, France
16
Jaime MDLA, Hurtado J, Loustalot-Laclette MR, Oliver B, Markow T. Exploring Effects of Sex and Diet on Drosophila melanogaster Head Gene Expression. J Genomics 2017; 5:128-131. [PMID: 29109800 PMCID: PMC5666516 DOI: 10.7150/jgen.22393] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/15/2017] [Accepted: 10/04/2017] [Indexed: 02/02/2023]
Abstract
Gene expression depends on sex and environment. We stringently explored the contributions of these effects in Drosophila melanogaster by rearing three distinct wild-type genotypes on isocaloric diets high in either protein or sugar, followed by expression profiling of heads from both sexes. By using different genotypes as replicates, we identified robust sex- and diet-biased expression responses.
Affiliation(s)
- Maria D L A Jaime
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Juan Hurtado
- Department of Ecology, Genetics and Evolution, IEGEBA (CONICET-UBA), Faculty of Exact and Natural Sciences, University of Buenos Aires, Ciudad Universitaria, C1428EHA, Buenos Aires, Argentina; National Laboratory for the Genomics of Biodiversity, CINVESTAV, Irapuato, Guanajuato, Mexico
- Mariana Ramirez Loustalot-Laclette
- Department of Ecology, Genetics and Evolution, IEGEBA (CONICET-UBA), Faculty of Exact and Natural Sciences, University of Buenos Aires, Ciudad Universitaria, C1428EHA, Buenos Aires, Argentina
- Brian Oliver
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Therese Markow
- National Laboratory for the Genomics of Biodiversity, CINVESTAV, Irapuato, Guanajuato, Mexico; Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA
17
Hardwick SA, Deveson IW, Mercer TR. Reference standards for next-generation sequencing. Nat Rev Genet 2017. [DOI: 10.1038/nrg.2017.44] [Citation(s) in RCA: 148] [Impact Index Per Article: 21.1] [Indexed: 02/07/2023]
18
Jennings LJ, Arcila ME, Corless C, Kamel-Reid S, Lubin IM, Pfeifer J, Temple-Smolkin RL, Voelkerding KV, Nikiforova MN. Guidelines for Validation of Next-Generation Sequencing-Based Oncology Panels: A Joint Consensus Recommendation of the Association for Molecular Pathology and College of American Pathologists. J Mol Diagn 2017; 19:341-365. [PMID: 28341590 DOI: 10.1016/j.jmoldx.2017.01.011] [Citation(s) in RCA: 440] [Impact Index Per Article: 62.9] [Received: 01/14/2017] [Accepted: 01/24/2017] [Indexed: 02/07/2023]
Abstract
Next-generation sequencing (NGS) methods for cancer testing have been rapidly adopted by clinical laboratories. To establish analytical validation best practice guidelines for NGS gene panel testing of somatic variants, a working group was convened by the Association for Molecular Pathology with liaison representation from the College of American Pathologists. These joint consensus recommendations address NGS test development, optimization, and validation, including recommendations on panel content selection and the rationale for an optimization and familiarization phase conducted before test validation; use of reference cell lines and reference materials for evaluation of assay performance; determination of positive percentage agreement and positive predictive value for each variant type; and requirements for the minimal depth of coverage and minimum number of samples that should be used to establish test performance characteristics. The recommendations emphasize the role of the laboratory director in using an error-based approach that identifies potential sources of error throughout the analytical process and addresses these potential errors through test design, method validation, or quality controls so that no harm comes to the patient. The recommendations contained herein are intended to assist clinical laboratories with the validation and ongoing monitoring of NGS testing for detection of somatic variants and to ensure high-quality sequencing results.
Affiliation(s)
- Lawrence J Jennings
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; Ann & Robert H. Lurie Children's Hospital of Chicago, Northwestern University's Feinberg School of Medicine, Chicago, Illinois.
- Maria E Arcila
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; Memorial Sloan Kettering Cancer Center, New York, New York
- Christopher Corless
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; Department of Pathology and Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon
- Suzanne Kamel-Reid
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; Department of Clinical Laboratory Genetics, University Health Network, Toronto, Ontario, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Ira M Lubin
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; Centers for Disease Control and Prevention, Atlanta, Georgia
- John Pfeifer
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; Washington University School of Medicine, St. Louis, Missouri
- Karl V Voelkerding
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; ARUP Laboratories, Salt Lake City, Utah; Department of Pathology, University of Utah, Salt Lake City, Utah
- Marina N Nikiforova
- Next-Generation Sequencing Analytical Validation Working Group of the Clinical Practice Committee, Bethesda, Maryland; University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
19
Schuelke M, Øien NC, Oldfors A. Myopathology in the times of modern genetics. Neuropathol Appl Neurobiol 2017; 43:44-61. [DOI: 10.1111/nan.12374] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/05/2016] [Revised: 12/03/2016] [Accepted: 12/23/2016] [Indexed: 12/14/2022]
Affiliation(s)
- M. Schuelke
- Department of Neuropediatrics and NeuroCure Clinical Research Center; Charité-Universitätsmedizin; Berlin Germany
- N. C. Øien
- Department of Neuropediatrics and NeuroCure Clinical Research Center; Charité-Universitätsmedizin; Berlin Germany
- Max-Delbrück-Center for Molecular Medicine; Berlin Germany
- A. Oldfors
- Department of Pathology and Genetics; Institute of Biomedicine; University of Gothenburg; Gothenburg Sweden
20
Zampetaki A, Mayr M. Circulating microRNAs as Novel Biomarkers in Cardiovascular Disease: Basic and Technical Principles. Non-coding RNAs in the Vasculature 2017. [DOI: 10.1007/978-3-319-52945-5_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023]
21
Duncavage EJ, Abel HJ, Pfeifer JD. In Silico Proficiency Testing for Clinical Next-Generation Sequencing. J Mol Diagn 2016; 19:35-42. [PMID: 27863262 DOI: 10.1016/j.jmoldx.2016.09.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/14/2016] [Revised: 09/09/2016] [Accepted: 09/13/2016] [Indexed: 12/18/2022]
Abstract
Quality assurance for clinical next-generation sequencing (NGS)-based assays is difficult given the complex methods and the range of sequence variants such assays can detect. As the number and range of mutations detected by clinical NGS assays have increased, it has become difficult to apply standard analyte-specific proficiency testing (PT). Most current PT challenges for NGS are methods-based PT surveys that use DNA from reference samples engineered to harbor specific mutations, testing both sequence generation and bioinformatics analysis. These methods-based PTs are limited by the number and types of mutations that can be physically introduced into a single DNA sample. In silico PT, which evaluates only the bioinformatics component of NGS assays, is a recently introduced PT method that allows evaluation of numerous mutations spanning a range of variant classes. In silico PT data sets can be generated from simulated or actual sequencing data and are used to test the steps from alignment through variant detection and annotation. In silico PT has several advantages over the use of physical samples, including greater flexibility in tested variants, the ability to design laboratory-specific challenges, and lower costs. Herein, we review the use of in silico PT as an alternative to traditional methods-based PT as it is evolving in oncology applications and discuss how the approach is applicable more broadly.
Affiliation(s)
- Eric J Duncavage
- Department of Pathology, Washington University School of Medicine, St. Louis, Missouri
- Haley J Abel
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri
- John D Pfeifer
- Department of Pathology, Washington University School of Medicine, St. Louis, Missouri
22
Lee H, Cho DY, Whitworth C, Eisman R, Phelps M, Roote J, Kaufman T, Cook K, Russell S, Przytycka T, Oliver B. Effects of Gene Dose, Chromatin, and Network Topology on Expression in Drosophila melanogaster. PLoS Genet 2016; 12:e1006295. [PMID: 27599372 PMCID: PMC5012587 DOI: 10.1371/journal.pgen.1006295] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/20/2015] [Accepted: 08/10/2016] [Indexed: 11/18/2022]
Abstract
Deletions, commonly referred to as deficiencies by Drosophila geneticists, are valuable tools for mapping genes and for genetic pathway discovery via dose-dependent suppressor and enhancer screens. More recently, it has become clear that deviations from normal gene dosage are associated with multiple disorders in a range of species, including humans. While we are beginning to understand some of the transcriptional effects brought about by gene dosage changes and by the chromosome rearrangement breakpoints associated with them, much of this work relies on isolated examples. We systematically examined deficiencies of the left arm of chromosome 2 and characterized gene-by-gene dosage responses that vary from collapsed expression through modest partial dosage compensation to full compensation or even overcompensation. We found negligible long-range effects of creating novel chromosome domains at deletion breakpoints, suggesting that cases of gene regulation due to altered nuclear architecture are rare. These rare cases include trans de-repression when deficiencies delete chromatin characterized as repressive in other studies. Generally, effects of breakpoints on expression are promoter-proximal (~100 bp) or in the gene body. Genome-wide, the effects of deficiencies are found in genes with regulatory relationships to genes within the deleted segments, highlighting subtle expression network defects in these sensitized genetic backgrounds.
Affiliation(s)
- Hangnoh Lee
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Dong-Yeon Cho
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Cale Whitworth
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- Robert Eisman
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- Melissa Phelps
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- John Roote
- Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, United Kingdom
- Thomas Kaufman
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- Kevin Cook
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- Steven Russell
- Department of Genetics and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, United Kingdom
- Teresa Przytycka
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Brian Oliver
- Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
23
Roy S, Pfeifer JD, LaFramboise WA, Pantanowitz L. Molecular digital pathology: progress and potential of exchanging molecular data. Expert Rev Mol Diagn 2016; 16:941-7. [PMID: 27471996 DOI: 10.1080/14737159.2016.1206472] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
Many of the demands of performing next-generation sequencing (NGS) in the clinical laboratory can be met using the principles of telepathology. Molecular telepathology can allow facilities to outsource all or part of their NGS operation, such as cloud computing, bioinformatics pipelines, variant data management, and knowledge curation. Clinical pathology laboratories can electronically share diverse types of molecular data with reference laboratories, technology service providers, and/or regulatory agencies. Exchange of electronic molecular data allows laboratories to validate tests for rare diseases using external data, check the accuracy of their test results against benchmarks, and leverage in silico proficiency testing. This review covers the emerging subject of molecular telepathology, describes clinical use cases for the appropriate exchange of molecular data, and highlights key issues such as data integrity, interoperable formats for massive genomic datasets, security, malpractice, and emerging regulations involved in this novel practice.
Affiliation(s)
- Somak Roy
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- John D Pfeifer
- Department of Pathology, Washington University, St Louis, MO, USA
- William A LaFramboise
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Liron Pantanowitz
- Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
24
Duncavage EJ, Abel HJ, Merker JD, Bodner JB, Zhao Q, Voelkerding KV, Pfeifer JD. A Model Study of In Silico Proficiency Testing for Clinical Next-Generation Sequencing. Arch Pathol Lab Med 2016; 140:1085-91. [PMID: 27388684 DOI: 10.5858/arpa.2016-0194-cp] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/06/2022]
Abstract
CONTEXT: Most current proficiency testing challenges for next-generation sequencing assays are methods-based proficiency testing surveys that use DNA from characterized reference samples to test both the wet-bench and bioinformatics/dry-bench aspects of the tests. Methods-based proficiency testing surveys are limited by the number and types of mutations that either are naturally present or can be introduced into a single DNA sample. OBJECTIVE: To address these limitations by exploring a model of in silico proficiency testing in which sequence data from a single well-characterized specimen are manipulated electronically. DESIGN: DNA from the College of American Pathologists reference genome was enriched using the Illumina TruSeq and Life Technologies AmpliSeq panels and sequenced on the MiSeq and Ion Torrent platforms, respectively. The resulting data were mutagenized in silico and 26 variants, including single-nucleotide variants, deletions, and dinucleotide substitutions, were added at variant allele fractions (VAFs) from 10% to 50%. Participating clinical laboratories downloaded these files and analyzed them using their clinical bioinformatics pipelines. RESULTS: Laboratories using the AmpliSeq/Ion Torrent and/or the TruSeq/MiSeq participated in the 2 surveys. On average, laboratories identified 24.6 of 26 variants (95%) overall and 21.4 of 22 variants (97%) with VAFs greater than 15%. No false-positive calls were reported. The most frequently missed variants were single-nucleotide variants with VAFs less than 15%. Across both challenges, reported VAF concordance was excellent, with less than 1% median absolute difference between the simulated VAF and mean reported VAF. CONCLUSIONS: The results indicate that in silico proficiency testing is a feasible approach for methods-based proficiency testing, and demonstrate that the sensitivity and specificity of current next-generation sequencing bioinformatics across clinical laboratories are high.
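The survey statistics reported here (overall detection, detection above a 15% VAF cutoff, false positives, and VAF concordance) can be sketched as a small scoring function. This is a hypothetical Python illustration, not the survey's actual scoring code: variants are keyed by an arbitrary identifier, VAFs are fractions, and the example data are made up.

```python
from statistics import median


def summarize_pt(spiked: dict[str, float], reported: dict[str, float],
                 vaf_cutoff: float = 0.15) -> dict:
    """Score one laboratory's reported calls against in silico spiked-in variants.

    spiked:   variant id -> simulated VAF (fraction, e.g. 0.10 for 10%)
    reported: variant id -> VAF reported by the laboratory
    """
    detected = [v for v in spiked if v in reported]
    above = [v for v in spiked if spiked[v] > vaf_cutoff]
    detected_above = [v for v in above if v in reported]
    false_positives = [v for v in reported if v not in spiked]
    # Absolute VAF differences, computed only for variants the lab detected.
    vaf_diffs = [abs(spiked[v] - reported[v]) for v in detected]
    return {
        "sensitivity": len(detected) / len(spiked),
        "sensitivity_above_cutoff": (len(detected_above) / len(above)
                                     if above else float("nan")),
        "false_positives": len(false_positives),
        "median_abs_vaf_diff": median(vaf_diffs) if vaf_diffs else float("nan"),
    }


# Made-up example: the 10% VAF variant is missed, the other two are found.
spiked = {"snv1": 0.10, "snv2": 0.20, "del1": 0.50}
reported = {"snv2": 0.21, "del1": 0.49}
result = summarize_pt(spiked, reported)
print(result)
```

As a check on the averages in the abstract, 24.6 / 26 ≈ 0.946 and 21.4 / 22 ≈ 0.973, which round to the quoted 95% and 97%.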
Affiliation(s)
- Eric J Duncavage
- From the Departments of Pathology (Drs Duncavage and Pfeifer) and Genetics (Dr Abel), Washington University School of Medicine, St Louis, Missouri; the Department of Pathology (Dr Merker), Stanford University School of Medicine, Stanford, California; Product Development, Laboratory Improvement Program (Mr Bodner), and the Surveys Department (Dr Zhao), College of American Pathologists, Northfield, Illinois; and the Department of Pathology and ARUP Laboratories, University of Utah, Salt Lake City (Dr Voelkerding)
25
Venezuelan Equine Encephalitis Virus Induces Apoptosis through the Unfolded Protein Response Activation of EGR1. J Virol 2016; 90:3558-72. [PMID: 26792742 PMCID: PMC4794670 DOI: 10.1128/jvi.02827-15] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 11/05/2015] [Accepted: 01/08/2016] [Indexed: 12/11/2022]
Abstract
Venezuelan equine encephalitis virus (VEEV) is a previously weaponized arthropod-borne virus responsible for causing acute and fatal encephalitis in animal and human hosts. The increased circulation and spread in the Americas of VEEV and other encephalitic arboviruses, such as eastern equine encephalitis virus and West Nile virus, underscore the need for research aimed at characterizing the pathogenesis of viral encephalomyelitis for the development of novel medical countermeasures. The host-pathogen dynamics of VEEV Trinidad donkey-infected human astrocytoma U87MG cells were determined by carrying out RNA sequencing (RNA-Seq) of poly(A) and mRNAs. To identify the critical alterations that take place in the host transcriptome following VEEV infection, samples were collected at 4, 8, and 16 h postinfection and RNA-Seq data were acquired using an Ion Torrent PGM platform. Differential expression of interferon response, stress response factors, and components of the unfolded protein response (UPR) was observed. The protein kinase RNA-like endoplasmic reticulum kinase (PERK) arm of the UPR was activated, as the expression of both activating transcription factor 4 (ATF4) and CHOP (DDIT3), critical regulators of the pathway, was altered after infection. Expression of the transcription factor early growth response 1 (EGR1) was induced in a PERK-dependent manner. EGR1−/− mouse embryonic fibroblasts (MEFs) demonstrated lower susceptibility to VEEV-induced cell death than isogenic wild-type MEFs, indicating that EGR1 modulates proapoptotic pathways following VEEV infection. The influence of EGR1 is of great importance, as neuronal damage can lead to long-term sequelae in individuals who have survived VEEV infection.
IMPORTANCE: Alphaviruses represent a group of clinically relevant viruses transmitted by mosquitoes to humans. In severe cases, viral spread targets neuronal tissue, resulting in significant and life-threatening inflammation dependent on a combination of virus-host interactions. Currently there are no therapeutics for infections caused by encephalitic alphaviruses, owing to an incomplete understanding of their molecular pathogenesis. Venezuelan equine encephalitis virus (VEEV) is an alphavirus that is prevalent in the Americas and that is capable of infecting horses and humans. Here we utilized next-generation RNA sequencing to identify differential alterations in VEEV-infected astrocytes. Our results indicated that the abundance of transcripts associated with the interferon and the unfolded protein response pathways was altered following infection and demonstrated that early growth response 1 (EGR1) contributed to VEEV-induced cell death.
26
Krasnov GS, Dmitriev AA, Kudryavtseva AV, Shargunov AV, Karpov DS, Uroshlev LA, Melnikova NV, Blinov VM, Poverennaya EV, Archakov AI, Lisitsa AV, Ponomarenko EA. PPLine: An Automated Pipeline for SNP, SAP, and Splice Variant Detection in the Context of Proteogenomics. J Proteome Res 2015; 14:3729-37. [DOI: 10.1021/acs.jproteome.5b00490] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Indexed: 11/29/2022]
Affiliation(s)
- George Sergeevich Krasnov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 111991 Russia
- Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow, 119121 Russia
- Mechnikov Research Institute of Vaccines and Sera, Moscow, 105064 Russia
- Anna Viktorovna Kudryavtseva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 111991 Russia
- Herzen Moscow Cancer Research Institute, Ministry of Healthcare of the Russian Federation, Moscow, 125284 Russia
- Alexander Valerievich Shargunov
- Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow, 119121 Russia
- Mechnikov Research Institute of Vaccines and Sera, Moscow, 105064 Russia
- Dmitry Sergeevich Karpov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 111991 Russia
- Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow, 119121 Russia
- Vladimir Mikhailovich Blinov
- Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow, 119121 Russia
- Mechnikov Research Institute of Vaccines and Sera, Moscow, 105064 Russia
- Andrey Valerievich Lisitsa
- Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow, 119121 Russia
27
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 2015. [PMID: 26217378 PMCID: PMC4493402 DOI: 10.3389/fgene.2015.00235] [Citation(s) in RCA: 109] [Impact Index Per Article: 12.1] [Indexed: 12/30/2022]
Abstract
Innovations in sequencing technologies have allowed biologists to make remarkable advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics, as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with many potential sources of error that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we provide an overview of variant calling procedures and the potential sources of error associated with them. We then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to applied settings, standardized methods for performance evaluation and reporting are required; we hope that this review provides the groundwork for the development of these standards.
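At its core, evaluating a variant caller against a validated reference call set means counting true positives, false positives, and false negatives and deriving precision and recall from them. A minimal Python sketch, assuming variants have already been normalized to identical (chromosome, position, ref, alt) tuples; the names `Variant` and `evaluate_calls` and the example calls are hypothetical. Real comparisons must also reconcile different representations of the same variant, and the review describes statistical methods beyond these point estimates.

```python
# Hypothetical representation: a variant is (chromosome, position, ref, alt).
Variant = tuple[str, int, str, str]


def evaluate_calls(calls: set[Variant], truth: set[Variant]) -> dict:
    """Compare a caller's output with a reference (truth) call set."""
    tp = len(calls & truth)   # called and present in the reference set
    fp = len(calls - truth)   # called but absent from the reference set
    fn = len(truth - calls)   # present in the reference set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up example: two shared SNVs, one false call, one missed SNV.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "G", "A")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 400, "T", "C")}
metrics = evaluate_calls(calls, truth)
print(metrics)
```

Set operations keep the comparison exact and order-independent; the hard part in practice is building a trustworthy truth set, which is why the review stresses the choice of evaluation datasets.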
Affiliation(s)
- Nathan D Olson
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Steven P Lund
- Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Rebecca E Colman
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Jeffrey T Foster
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jason W Sahl
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- James M Schupp
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Paul Keim
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jayne B Morrow
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Marc L Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
- Justin M Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
28
Oh SA, Yang I, Hahn Y, Kang YK, Chung SK, Jeong S. SiNG-PCRseq: Accurate inter-sequence quantification achieved by spiking-in a neighbor genome for competitive PCR amplicon sequencing. Sci Rep 2015; 5:11879. [PMID: 26144254 PMCID: PMC4491706 DOI: 10.1038/srep11879] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/22/2014] [Accepted: 04/10/2015] [Indexed: 01/04/2023]
Abstract
Despite recent technological advances in DNA quantitation by sequencing, accurate delineation of the quantitative relationships among different DNA sequences has yet to be achieved because of difficulties in correcting sequence-specific quantitation biases. Here we developed a novel DNA quantitation method based on spiking in a neighbor genome for competitive PCR amplicon sequencing (SiNG-PCRseq). This method uses genome-wide homologous sequences that are chemically equivalent but easily discriminable, with a known copy arrangement in the neighbor genome. By comparing the amounts of selected human DNA sequences simultaneously with those of matched sequences in the orangutan genome, we could accurately determine the quantitative relationships among those sequences in the human genome (root-mean-square deviations < 0.05). Technical replicates of cDNA quantitation performed using different reagents at different time points also yielded excellent correlations (R² > 0.95). cDNA quantitation using SiNG-PCRseq was highly concordant with RNA-seq-derived quantitation in inter-sample comparisons (R² = 0.88) but relatively discordant in inter-sequence quantitation (R² < 0.44), indicating a considerable level of sequence-dependent quantitative bias in RNA-seq. Because its measurement structure explicitly relates the amounts of different sequences within a sample, SiNG-PCRseq will facilitate sharing and comparing quantitation data generated under different spatio-temporal settings.
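The idea of spiking in a neighbor genome can be reduced to a ratio: if a human sequence and its matched orangutan homolog amplify with equal efficiency in the same reaction, their read ratio reflects their copy ratio, and the homolog's known copy number turns that ratio into an estimate. A minimal Python sketch under that equal-efficiency assumption; the function names and read counts are hypothetical, and the published method may model the data differently.

```python
import math


def estimate_copies(target_reads: int, spike_reads: int, spike_copies: float) -> float:
    """Estimate copies of a human target from its read ratio to the matched spike-in.

    Assumes the target and its neighbor-genome homolog amplify with equal
    efficiency, so the read ratio equals the copy ratio.
    """
    return target_reads / spike_reads * spike_copies


def rmsd(estimated: list[float], expected: list[float]) -> float:
    """Root-mean-square deviation between estimated and expected values."""
    squared = [(est - exp) ** 2 for est, exp in zip(estimated, expected)]
    return math.sqrt(sum(squared) / len(expected))


# Made-up read counts for two human sequences and their matched spike-in
# homologs, each homolog present at a known copy number of 1.
estimates = [estimate_copies(300, 150, 1.0), estimate_copies(100, 100, 1.0)]
print(estimates)                          # [2.0, 1.0]
print(estimates[0] / estimates[1])        # relative quantity: 2.0
print(rmsd(estimates, [2.0, 1.0]))        # 0.0
```

Because each target is measured against its own internal competitor, sequence-specific amplification differences between targets cancel out, which is what makes inter-sequence comparisons possible.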
Affiliation(s)
- Soo A Oh
- Medical Research Division, Korea Institute of Oriental Medicine (KIOM), Daejeon 305-811, Korea
- Inchul Yang
- Center for Bioanalysis, Korea Research Institute of Standards and Science (KRISS), Daejeon 305-340, Korea
- Yoonsoo Hahn
- Department of Life Science, Chung-Ang University, Seoul 156-756, Korea
- Yong-Kook Kang
- Development and Differentiation Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Sun-Ku Chung
- Medical Research Division, Korea Institute of Oriental Medicine (KIOM), Daejeon 305-811, Korea
- Sangkyun Jeong
- Medical Research Division, Korea Institute of Oriental Medicine (KIOM), Daejeon 305-811, Korea
| |
Collapse
|
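The competitive-amplification idea in the SiNG-PCRseq abstract can be illustrated with a small sketch. It assumes that each human amplicon and its orangutan homolog amplify with equal efficiency, so their read ratio equals their template ratio; the function names, read counts and the two-copy spike-in level are hypothetical, and this is not the authors' pipeline.

```python
import math


def estimate_copies(target_reads: int, spike_reads: int, spike_copies: float) -> float:
    """Estimate template copies of a target amplicon from its spiked-in homolog.

    Assumes target and spike-in amplify with equal efficiency, so the read
    ratio equals the template ratio (hypothetical helper, not SiNG-PCRseq code).
    """
    if spike_reads <= 0:
        raise ValueError("the spike-in amplicon needs at least one read")
    return spike_copies * target_reads / spike_reads


def rmsd(observed, expected) -> float:
    """Root-mean-square deviation between two equal-length sequences."""
    if not observed or len(observed) != len(expected):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed))


# Hypothetical read counts for three human amplicons and their spiked-in homologs,
# with the neighbor genome contributing 2 copies of each homolog.
human_reads = [1000, 1980, 510]
spike_reads = [1000, 1000, 1000]
estimates = [estimate_copies(h, s, 2.0) for h, s in zip(human_reads, spike_reads)]

# Express each estimate relative to the first amplicon and compare with
# hypothetical expected relative copy numbers 1 : 2 : 0.5.
relative = [e / estimates[0] for e in estimates]
deviation = rmsd(relative, [1.0, 2.0, 0.5])
print(estimates, round(deviation, 4))
```

Expressing each estimate relative to a reference amplicon before computing the RMSD keeps the focus on quantitative relationships among sequences rather than on absolute amounts.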
29
Locati MD, Terpstra I, de Leeuw WC, Kuzak M, Rauwerda H, Ensink WA, van Leeuwen S, Nehrdich U, Spaink HP, Jonker MJ, Breit TM, Dekker RJ. Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization. Nucleic Acids Res 2015; 43:e89. [PMID: 25870415 PMCID: PMC4538800 DOI: 10.1093/nar/gkv303] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/06/2014] [Accepted: 03/27/2015] [Indexed: 01/18/2023]
Abstract
There is increasing interest in complementing RNA-seq experiments with small-RNA (sRNA) expression data to obtain a comprehensive view of a transcriptome. Currently, two main experimental challenges concerning sRNA-seq exist: how to check the size distribution of isolated sRNAs, given the sensitive size-selection steps in the protocol; and how to normalize data between samples, given the low complexity of sRNA types. We here present two separate sets of synthetic RNA spike-ins for monitoring size selection and for performing data normalization in sRNA-seq. The size-range quality control (SRQC) spike-in set, consisting of 11 oligoribonucleotides (10–70 nucleotides), was tested by intentionally altering the size-selection protocol and verified via several comparative experiments. We demonstrate that the SRQC set is useful for reproducibly tracking down biases in size selection in sRNA-seq. The external reference for data-normalization (ERDN) spike-in set, consisting of 19 oligoribonucleotides, was developed for sample-to-sample normalization in differential-expression analysis of sRNA-seq data. Testing and applying the ERDN set showed that it can reproducibly detect differential expression over a dynamic range of 2^18. Hence, biological variation in sRNA composition and content between samples is preserved while technical variation is effectively minimized. Together, both spike-in sets can significantly improve the technical reproducibility of sRNA-seq.
Affiliation(s)
- Mauro D Locati
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Inez Terpstra
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Wim C de Leeuw
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands; Netherlands eScience Center, Amsterdam 1098 XG, The Netherlands
- Mateusz Kuzak
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands; Netherlands eScience Center, Amsterdam 1098 XG, The Netherlands
- Han Rauwerda
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands; Netherlands eScience Center, Amsterdam 1098 XG, The Netherlands
- Wim A Ensink
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Selina van Leeuwen
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Ulrike Nehrdich
- Department of Molecular Cell Biology, Institute of Biology, Leiden University, Gorlaeus Laboratories - Cell Observatorium, Leiden 2333 CE, The Netherlands
- Herman P Spaink
- Department of Molecular Cell Biology, Institute of Biology, Leiden University, Gorlaeus Laboratories - Cell Observatorium, Leiden 2333 CE, The Netherlands
- Martijs J Jonker
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Timo M Breit
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Rob J Dekker
- RNA Biology & Applied Bioinformatics research group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam 1090 GE, The Netherlands
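Spike-in-based normalization of the kind the ERDN set is designed for can be sketched as follows. The abstract does not give the authors' normalization formula, so this sketch swaps in a generic median-of-ratios scaling computed only on spike-in counts (an assumption); the function names and all counts are hypothetical.

```python
import math
from statistics import median


def spike_in_size_factors(spike_counts):
    """Median-of-ratios size factors computed only from spike-in reads.

    spike_counts holds one list per sample, each giving counts for the same
    spike-ins in the same order. Spike-ins with a zero count in any sample are
    skipped, because their log geometric mean is undefined.
    """
    n_spikes = len(spike_counts[0])
    usable = [j for j in range(n_spikes) if all(sample[j] > 0 for sample in spike_counts)]
    if not usable:
        raise ValueError("no spike-in was detected in every sample")
    log_geo_mean = {
        j: sum(math.log(sample[j]) for sample in spike_counts) / len(spike_counts)
        for j in usable
    }
    return [
        math.exp(median(math.log(sample[j]) - log_geo_mean[j] for j in usable))
        for sample in spike_counts
    ]


def normalize(counts, size_factor):
    """Divide one sample's endogenous sRNA counts by its size factor."""
    return [c / size_factor for c in counts]


# Hypothetical data: sample B was sequenced twice as deeply as sample A.
spikes = [[100, 200, 400], [200, 400, 800]]
factors = spike_in_size_factors(spikes)
mirna_a = normalize([50, 10], factors[0])
mirna_b = normalize([100, 20], factors[1])
print(factors, mirna_a, mirna_b)
```

Deriving the scale factors from spike-ins rather than from endogenous sRNAs means that a genuine global shift in sRNA content is not normalized away, which matches the abstract's point that biological variation in sRNA content is preserved.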
30
Abstract
RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the discovery of novel transcripts, identification of alternatively spliced genes, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to data analysis, have enabled researchers to further elucidate the functional complexity of the transcription. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be applied to investigate different populations of RNA, including total RNA, pre-mRNA, and noncoding RNA, such as microRNA and long ncRNA. This article provides an introduction to RNA-Seq methods, including applications, experimental design, and technical challenges.
Affiliation(s)
- Kimberly R Kukurba
- Department of Pathology, Stanford University School of Medicine, Stanford, California 94305; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, California 94305; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305; Department of Computer Science, Stanford University School of Medicine, Stanford, California 94305
31
Olson ND, Lund SP, Zook JM, Rojas-Cornejo F, Beck B, Foy C, Huggett J, Whale AS, Sui Z, Baoutina A, Dobeson M, Partis L, Morrow JB. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons. Biomolecular Detection and Quantification 2015; 3:17-24. [PMID: 27077030 PMCID: PMC4822220 DOI: 10.1016/j.bdq.2015.01.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/30/2014] [Revised: 01/27/2015] [Accepted: 01/27/2015] [Indexed: 12/30/2022]
Abstract
This study presents the results of an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants used the sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved positions, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified with each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were reproducible only for high-throughput sequencing methods. Furthermore, the likely variant combination set was reproducible only with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.
Affiliation(s)
- Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD 20899, USA
- Steven P Lund
- Statistical Engineering Division, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD 20899, USA
- Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD 20899, USA
- Brian Beck
- American Type Culture Collection, 10801 University Boulevard, Manassas, VA 20110, USA
- Carole Foy
- Science and Innovation Division, LGC, Queens Rd, Teddington, Middlesex TW11 0LY, UK
- Jim Huggett
- Science and Innovation Division, LGC, Queens Rd, Teddington, Middlesex TW11 0LY, UK
- Alexandra S Whale
- Science and Innovation Division, LGC, Queens Rd, Teddington, Middlesex TW11 0LY, UK
- Zhiwei Sui
- National Institute of Metrology, Beijing 100013, China
- Anna Baoutina
- National Measurement Institute, West Lindfield, NSW 2070, Australia
- Michael Dobeson
- National Measurement Institute, West Lindfield, NSW 2070, Australia
- Lina Partis
- National Measurement Institute, West Lindfield, NSW 2070, Australia
- Jayne B Morrow
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD 20899, USA
32
Gonçalves da Silva A, Barendse W, Kijas JW, Barris WC, McWilliam S, Bunch RJ, McCullough R, Harrison B, Hoelzel AR, England PR. SNP discovery in nonmodel organisms: strand bias and base-substitution errors reduce conversion rates. Mol Ecol Resour 2014; 15:723-36. [PMID: 25388640 DOI: 10.1111/1755-0998.12343] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/06/2014] [Revised: 10/31/2014] [Accepted: 11/03/2014] [Indexed: 11/28/2022]
Abstract
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms identify putative SNPs with filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep-sea fish orange roughy (Hoplostethus atlanticus) to assess how failing to account for systematic sequencing errors when filtering polymorphisms affects SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize 'bycatch', that is, polymorphisms caused by sequencing or assembly error. An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single-copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.
Affiliation(s)
- Anders Gonçalves da Silva
- CSIRO Oceans and Atmosphere, GPO Box 1538, Hobart, Tas., 7001, Australia; School of Biological Sciences, Monash University, 18 Innovation Walk, Clayton, Vic, 3800, Australia
- William Barendse
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- James W Kijas
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- Wes C Barris
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- Sean McWilliam
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- Rowan J Bunch
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- Russell McCullough
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- Blair Harrison
- CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St Lucia, Qld, 4067, Australia
- A Rus Hoelzel
- School of Biological and Biomedical Sciences, Durham University, Durham, DH1 3LE, UK
33
Li S, Tighe SW, Nicolet CM, Grove D, Levy S, Farmerie W, Viale A, Wright C, Schweitzer PA, Gao Y, Kim D, Boland J, Hicks B, Kim R, Chhangawala S, Jafari N, Raghavachari N, Gandara J, Garcia-Reyero N, Hendrickson C, Roberson D, Rosenfeld J, Smith T, Underwood JG, Wang M, Zumbo P, Baldwin DA, Grills GS, Mason CE. Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol 2014; 32:915-925. [PMID: 25150835 PMCID: PMC4167418 DOI: 10.1038/nbt.2972] [Citation(s) in RCA: 160] [Impact Index Per Article: 16.0] [Received: 05/14/2013] [Accepted: 07/01/2014] [Indexed: 01/17/2023]
Abstract
High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intra-platform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
Affiliation(s)
- Sheng Li
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA
- Scott W. Tighe
- Vermont Cancer Center, University of Vermont, Burlington, Vermont, USA
- Charles M. Nicolet
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Deborah Grove
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Shawn Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
- William Farmerie
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, Florida, USA
- Agnes Viale
- Memorial Sloan-Kettering Cancer Institute, New York, New York, USA
- Chris Wright
- Roy J. Carver Biotechnology Center, University of Illinois, Urbana, Illinois, USA
- Peter A. Schweitzer
- Biotechnology Resource Center, Institute of Biotechnology, Cornell University, Ithaca, New York, USA
- Yuan Gao
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Dewey Kim
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Joe Boland
- NIH/NCI/SAIC-Frederick, Gaithersburg, Maryland, USA
- Ryan Kim
- Genome Center, University of California, Davis, Davis, California, USA
- Sagar Chhangawala
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA
- Nadereh Jafari
- Center for Genetic Medicine, Northwestern University, Chicago, Illinois, USA
- Jorge Gandara
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA
- Natàlia Garcia-Reyero
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, Mississippi, USA
- Jeffrey Rosenfeld
- Division of High Performance and Research Computing, University of Medicine and Dentistry of New Jersey, Newark, New Jersey, USA
- Todd Smith
- PerkinElmer Inc., Seattle, Washington, USA
- Jason G. Underwood
- University of Washington, Department of Genome Sciences, Seattle, Washington, USA
- May Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
- Paul Zumbo
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA
- George S. Grills
- Biotechnology Resource Center, Institute of Biotechnology, Cornell University, Ithaca, New York, USA
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA
34
Plant AL, Locascio LE, May WE, Gallagher PD. Improved reproducibility by assuring confidence in measurements in biomedical research. Nat Methods 2014; 11:895-8. [DOI: 10.1038/nmeth.3076] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 11/10/2022]
35
Pant S, Weiner R, Marton MJ. Navigating the rapids: the development of regulated next-generation sequencing-based clinical trial assays and companion diagnostics. Front Oncol 2014; 4:78. [PMID: 24860780 PMCID: PMC4029014 DOI: 10.3389/fonc.2014.00078] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 12/19/2013] [Accepted: 03/28/2014] [Indexed: 12/11/2022]
Abstract
Over the past decade, next-generation sequencing (NGS) technology has experienced meteoric growth in the aspects of platform, technology, and supporting bioinformatics development allowing its widespread and rapid uptake in research settings. More recently, NGS-based genomic data have been exploited to better understand disease development and patient characteristics that influence response to a given therapeutic intervention. Cancer, as a disease characterized by and driven by the tumor genetic landscape, is particularly amenable to NGS-based diagnostic (Dx) approaches. NGS-based technologies are particularly well suited to studying cancer disease development, progression and emergence of resistance, all key factors in the development of next-generation cancer Dxs. Yet, to achieve the promise of NGS-based patient treatment, drug developers will need to overcome a number of operational, technical, regulatory, and strategic challenges. Here, we provide a succinct overview of the state of the clinical NGS field in terms of the available clinically targeted platforms and sequencing technologies. We discuss the various operational and practical aspects of clinical NGS testing that will facilitate or limit the uptake of such assays in routine clinical care. We examine the current strategies for analytical validation and Food and Drug Administration (FDA)-approval of NGS-based assays and ongoing efforts to standardize clinical NGS and build quality control standards for the same. The rapidly evolving companion diagnostic (CDx) landscape for NGS-based assays will be reviewed, highlighting the key areas of concern and suggesting strategies to mitigate risk. The review will conclude with a series of strategic questions that face drug developers and a discussion of the likely future course of NGS-based CDx development efforts.
Affiliation(s)
- Saumya Pant
- Merck Research Laboratories, Molecular Biomarkers and Diagnostics, Rahway, NJ, USA
- Russell Weiner
- Merck Research Laboratories, Molecular Biomarkers and Diagnostics, Rahway, NJ, USA
- Matthew J Marton
- Merck Research Laboratories, Molecular Biomarkers and Diagnostics, Rahway, NJ, USA
36
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 2014; 32:246-51. [PMID: 24531798 DOI: 10.1038/nbt.2835] [Citation(s) in RCA: 502] [Impact Index Per Article: 50.2] [Received: 12/14/2013] [Accepted: 01/27/2014] [Indexed: 12/30/2022]
Abstract
Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
37
He Z, O'Roak BJ, Smith JD, Wang G, Hooker S, Santos-Cortez RLP, Li B, Kan M, Krumm N, Nickerson DA, Shendure J, Eichler EE, Leal SM. Rare-variant extensions of the transmission disequilibrium test: application to autism exome sequence data. Am J Hum Genet 2014; 94:33-46. [PMID: 24360806 DOI: 10.1016/j.ajhg.2013.11.021] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 10/08/2013] [Accepted: 11/26/2013] [Indexed: 11/18/2022]
Abstract
Many population-based rare-variant (RV) association tests, which aggregate variants across a region, have been developed to analyze sequence data. A drawback of analyzing population-based data is that it is difficult to adequately control for population substructure and admixture, and spurious associations can occur. For RVs, this problem can be substantial, because the spectrum of rare variation can differ greatly between populations. A solution is to analyze parent-child trio data by using the transmission disequilibrium test (TDT), which is robust to population substructure and admixture. We extended the TDT to test for RV associations using four commonly used methods. We demonstrate that for all RV-TDT methods, using proper analysis strategies, type I error is well controlled even when there are high levels of population substructure or admixture. For trio data, unlike for population-based data, RV allele-counting association methods will lead to inflated type I errors. However, type I errors can be properly controlled by obtaining p values empirically through haplotype permutation. The power of the RV-TDT methods was evaluated and compared to the analysis of case-control data with a number of genetic and disease models. The RV-TDT was also used to analyze exome data from 199 Simons Simplex Collection autism trios, and an association was observed with variants in ABCA7. Given the problem of adequately controlling for population substructure and admixture in RV association studies and the growing number of sequence-based trio studies, the RV-TDT is extremely beneficial for elucidating the involvement of RVs in the etiology of complex traits.
Affiliation(s)
- Zongxiao He
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Brian J O'Roak
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Joshua D Smith
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Gao Wang
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Stanley Hooker
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Regie Lyn P Santos-Cortez
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Biao Li
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Mengyuan Kan
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Nik Krumm
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Deborah A Nickerson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Suzanne M Leal
- Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
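The classical transmission disequilibrium test that the RV-TDT methods extend compares how often alleles are transmitted versus not transmitted from heterozygous parents. The sketch below shows that single-statistic TDT with an asymptotic chi-square p value and, following the abstract's point about haplotype permutation, an empirical p value obtained by randomly swapping each parent's transmitted and untransmitted haplotypes. It illustrates these assumptions only and is not one of the four RV-TDT methods from the paper; the function names and toy data are hypothetical.

```python
import math
import random


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT chi-square, (b - c)^2 / (b + c)."""
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def haplotype_permutation_pvalue(parent_haplotypes, n_perm=2000, seed=0):
    """Empirical p value from swapping transmitted/untransmitted haplotypes.

    parent_haplotypes: (t, u) pairs giving the number of rare alleles on the
    transmitted and the untransmitted haplotype of each parent. Under the null
    hypothesis either haplotype is equally likely to be transmitted.
    """
    rng = random.Random(seed)

    def aggregate_stat(pairs):
        return tdt_statistic(sum(t for t, _ in pairs), sum(u for _, u in pairs))

    observed = aggregate_stat(parent_haplotypes)
    hits = 0
    for _ in range(n_perm):
        permuted = [(t, u) if rng.random() < 0.5 else (u, t) for t, u in parent_haplotypes]
        if aggregate_stat(permuted) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 parents, each transmitting its only rare allele,
# versus a balanced set with as many transmissions as non-transmissions.
skewed = [(1, 0)] * 20
balanced = [(1, 0), (0, 1)] * 10
print(tdt_statistic(20, 0), chi2_1df_pvalue(tdt_statistic(20, 0)))
print(haplotype_permutation_pvalue(skewed), haplotype_permutation_pvalue(balanced))
```

Permuting whole haplotypes per parent, rather than individual alleles, keeps the rare alleles on one haplotype together, which is the property the abstract relies on when it contrasts haplotype permutation with allele counting.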
38
van Gurp TP, McIntyre LM, Verhoeven KJF. Consistent errors in first strand cDNA due to random hexamer mispriming. PLoS One 2013; 8:e85583. [PMID: 24386481 PMCID: PMC3875578 DOI: 10.1371/journal.pone.0085583] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 06/18/2013] [Accepted: 11/28/2013] [Indexed: 11/19/2022]
Abstract
Priming of random hexamers in cDNA synthesis is known to show sequence bias, but in addition it has been suggested recently that mismatches in random hexamer priming could be a cause of mismatches between the original RNA fragment and observed sequence reads. To explore random hexamer mispriming as a potential source of these errors, we analyzed two independently generated RNA-seq datasets of synthetic ERCC spikes for which the reference is known. First strand cDNA synthesized by random hexamer priming on RNA showed consistent position and nucleotide-specific mismatch errors in the first seven nucleotides. The mismatch errors found in both datasets are consistent in distribution and thermodynamically stable mismatches are more common. This strongly indicates that RNA-DNA mispriming of specific random hexamers causes these errors. Due to their consistency and specificity, mispriming errors can have profound implications for downstream applications if not dealt with properly.
Affiliation(s)
- Thomas P. van Gurp
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, The Netherlands
- Lauren M. McIntyre
- Genetics Institute, University of Florida, Gainesville, Florida, United States of America
- Koen J. F. Verhoeven
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, The Netherlands
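A position-specific mismatch profile of the kind described in the van Gurp et al. abstract can be tabulated by comparing reads with a known reference, such as a synthetic ERCC spike-in sequence. This sketch assumes gap-free alignments that start at the first read base (indels and soft clipping are ignored); the function name and toy reads are hypothetical, and it is not the authors' analysis code.

```python
from collections import Counter


def mismatch_profile(alignments, length):
    """Per-position mismatch rate and substitution tally over the first `length` bases.

    alignments: (read, reference) string pairs, aligned base-for-base from the
    read's 5' end with no gaps.
    Returns (rates, substitutions), where substitutions counts
    (position, reference_base, read_base) triples for mismatched bases.
    """
    mismatches = [0] * length
    observed = [0] * length
    substitutions = Counter()
    for read, ref in alignments:
        for pos, (read_base, ref_base) in enumerate(zip(read[:length], ref[:length])):
            observed[pos] += 1
            if read_base != ref_base:
                mismatches[pos] += 1
                substitutions[(pos, ref_base, read_base)] += 1
    rates = [m / n if n else 0.0 for m, n in zip(mismatches, observed)]
    return rates, substitutions


# Hypothetical reads against one reference; two carry a G->A change at 0-based position 2.
reference = "ACGTACGTAC"
reads = ["ACATACGTAC", "ACATACGTAC", "ACGTACGTAC", "ACGTACGTAC"]
rates, subs = mismatch_profile([(r, reference) for r in reads], length=10)
print(rates[:4], subs.most_common(1))
```

With real data, mispriming of the kind the abstract describes would appear as elevated, nucleotide-specific entries in `rates` and `subs` confined to roughly the first seven read positions.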
39
Thomas AC, Jarman SN, Haman KH, Trites AW, Deagle BE. Improving accuracy of DNA diet estimates using food tissue control materials and an evaluation of proxies for digestion bias. Mol Ecol 2013; 23:3706-18. [PMID: 24102760 DOI: 10.1111/mec.12523] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 04/18/2013] [Revised: 08/30/2013] [Accepted: 09/09/2013] [Indexed: 11/30/2022]
Abstract
Ecologists are increasingly interested in quantifying consumer diets based on food DNA in dietary samples and high-throughput sequencing of marker genes. It is tempting to assume that food DNA sequence proportions recovered from diet samples are representative of consumers' diet proportions, despite the fact that captive feeding studies do not support that assumption. Here, we examine the idea of sequencing control materials of known composition along with dietary samples in order to correct for technical biases introduced during amplicon sequencing and biological biases such as variable gene copy number. Using the Ion Torrent PGM, we sequenced prey DNA amplified from scats of captive harbour seals (Phoca vitulina) fed a constant diet including three fish species in known proportions. Alongside, we sequenced a prey tissue mix matching the seals' diet to generate tissue correction factors (TCFs). TCFs improved the diet estimates (based on sequence proportions) for all species and reduced the average estimate error from 28 ± 15% (uncorrected) to 14 ± 9% (TCF-corrected). The experimental design also allowed us to infer the magnitude of prey-specific digestion biases and calculate digestion correction factors (DCFs). The DCFs were compared with possible proxies for differential digestion (e.g. fish protein %, fish lipid %), revealing a strong relationship between the DCFs and percent lipid of the fish prey, suggesting that prey-specific corrections based on lipid content would produce accurate diet estimates in this study system. These findings demonstrate the value of parallel sequencing of food tissue mixtures in diet studies and offer new directions for future research in quantitative DNA diet analysis.
Collapse
Affiliation(s)
- Austen C Thomas
- Marine Mammal Research Unit, Fisheries Centre, and Department of Zoology, University of British Columbia, 2202 Main Mall, Vancouver, BC, V6T 1Z4, Canada
40
Qing T, Yu Y, Du T, Shi L. mRNA enrichment protocols determine the quantification characteristics of external RNA spike-in controls in RNA-Seq studies. Sci China Life Sci 2013; 56:134-42. [PMID: 23393029 DOI: 10.1007/s11427-013-4437-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/23/2012] [Accepted: 01/07/2013] [Indexed: 11/29/2022]
Abstract
RNA-Seq promises to become a gene-expression profiling tool in clinical settings; however, questions about its variability and biases remain to be addressed. RNA controls with known concentrations and sequence identities, originally developed by the External RNA Control Consortium (ERCC) for microarray and qPCR platforms, have recently been proposed for RNA-Seq, but only with a limited number of samples. In this study, we report our analysis of RNA-Seq data from 92 ERCC controls spiked into a diverse collection of 447 RNA samples from eight ongoing studies involving five species (human, rat, mouse, chicken, and Schistosoma japonicum) and two mRNA enrichment protocols, poly(A) and RiboZero. The entire collection of datasets comprised 15,650,143,175 short sequence reads, of which 131,603,796 (0.84%) mapped to the 92 ERCC references. The overall ERCC mapping ratio of 0.84% is close to the expected value of 1.0% when assuming a 2.0% mRNA fraction in total RNA, but the ratio varied 2.8-fold across studies and 4.3-fold among samples from the same study with one tissue type. This level of fluctuation may prevent the ERCC controls from being used for cross-sample normalization in RNA-Seq. Furthermore, we observed striking transcript-specific quantification biases between poly(A) and RiboZero. For example, ERCC-00116 showed a 7.3-fold under-enrichment in poly(A) compared with RiboZero. Extra care is therefore needed in integrative analysis of multiple datasets, and technical artifacts of protocol differences should not be taken as true biological findings.
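The overall mapping ratio is the number of ERCC-mapped reads divided by the total number of reads. The Python sketch below reproduces the 0.84% figure from the stated totals. It also shows one way a fold spread, like the reported 2.8-fold and 4.3-fold differences, could be computed from per-study or per-sample ratios. The function names are hypothetical, and defining "fold difference" as maximum over minimum is an assumption.

```python
def mapping_ratio(mapped_reads: int, total_reads: int) -> float:
    """Fraction of all reads that map to the spike-in references."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def fold_range(ratios) -> float:
    """Assumed fold-difference definition: largest ratio divided by smallest ratio."""
    ratios = list(ratios)
    if not ratios or min(ratios) <= 0:
        raise ValueError("need at least one positive ratio")
    return max(ratios) / min(ratios)


# Read totals as reported in the abstract
TOTAL_READS = 15_650_143_175
ERCC_READS = 131_603_796

overall = mapping_ratio(ERCC_READS, TOTAL_READS)
print(f"overall ERCC mapping ratio: {overall:.2%}")
```

Because `fold_range` depends only on the extreme values, a single outlier sample can dominate it. This sensitivity is part of why such spread limits the use of spike-ins for cross-sample normalization.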
Affiliation(s)
- Tao Qing
- Center for Pharmacogenomics, School of Pharmacy, Fudan University, Shanghai 201203, China